We have been having a problem with service broker for quite a while now and searching on these forums and more generally on the web has not yielded any kind of answer...
Our application utilises service broker within a single database (there is no cross database, cross instance or cross server communication). There are approximately 12 queues which are used to varying degrees. A few of the queues have a throughput rate in the order of up to 100s per minute at peak periods.
In some scenarios we are able to make use of persistent conversations but the majority of messages are sent on their own conversation or in small batches on their own conversation.
Some time ago we found that the database was growing in size more than expected due to a build up of data in the service broker meta data internal tables, exposed via the sys.conversation_endpoints system view.
We identified a problem in the application that was preventing some conversations from being closed and have now fixed this.
However, we are still experiencing a build up in the service broker tables and sys.conversation_endpoints shows this is now due to a very large number of conversations in the "CLOSED" state.
I know that conversations are kept around to prevent replay attacks but thought they were supposed to be cleared after about 30 minutes. This is certainly not happening as we currently have conversations that were opened on the 18th September, a full week ago. We currently have about 350,000 closed conversations and this figure is increasing!
We have tried restarting the SQL Server instance with no effect.
I have been using a script to loop through all closed conversations and get rid of them using the "WITH CLEANUP" clause but I'm loath to create a scheduled task that does this in the background when service broker should be doing the job itself.
Has anybody experience this problem and, even better, have any idea how to solve it?
Daniel
Hi Daniel!
Sounds very interesting. Do you close the conversation on both ends of the dialog with an END CONVERSATION statement?
Normally the conversations should go away from sys.conversation_endpoints within 30 minutes as you have wrote...
Thanks
Klaus Aschenbrenner
www.sqljunkies.com/weblog/klaus.aschenbrenner
www.csharp.at
Hi Daniel!
Sounds very interesting. Do you close the conversation on both ends of the dialog with an END CONVERSATION statement?
Normally the conversations should go away from sys.conversation_endpoints within 30 minutes as you have wrote...
Thanks
Klaus Aschenbrenner
www.sqljunkies.com/weblog/klaus.aschenbrenner
www.csharp.at
The endpoint states meaning is described here: http://msdn2.microsoft.com/en-us/library/ms176082.aspx. CLOSED indeed means both endpoints had issued END CONVERSATION.
What is you message exchange pattern? Who is issuing END CONVERSATION first, the initiator or the target? Is the target ever sending any replies back to the initiator?
Thanks,
~ Remus
Thanks for considering this Remus. I'm a little busy and haven't yet had a chance to look into this to keep comprehensive answers; maybe I can spend more time on this later today.
We have a mixture of message exchange patterns. Most are fire and forget (no response from target to initiator) and in these situations the initiator starts a conversation sends a message and ends its side of the conversation with no delay; the target then picks up the message at some point in the future, processes it and then ends the other side of the conversation. We also have a few request/response conversations; generally we have seperate processes responsible for sending requests and processing responses; there are no pseudo-synchronous conversations.
I may have some more detailed information once I've analysed a trace of service broker activity in this database.
|||Here are some examples of the service broker activity:
546BC38E-B300-40BC-9A54-2C6F281E928D 2006-09-27 09:35:50.920 Initiator DISCONNECTED_OUTBOUND
546BC38E-B300-40BC-9A54-2C6F281E928D 2006-09-27 09:35:50.920 Target DISCONNECTED_INBOUND
546BC38E-B300-40BC-9A54-2C6F281E928D 2006-09-27 09:35:50.920 Initiator CLOSED221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.083 Initiator STARTED_OUTBOUND
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.100 Initiator CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.100 Target STARTED_INBOUND
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.100 Target CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.113 Initiator CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.113 Target CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.113 Initiator CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.113 Target CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.130 Initiator CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.130 Target CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.130 Initiator CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.130 Target CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.130 Initiator CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.147 Target CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.147 Initiator CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.147 Target CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.147 Initiator CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.147 Target CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.160 Initiator CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.160 Target CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.177 Initiator CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.177 Target CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.177 Initiator CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.177 Target CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.193 Initiator CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.193 Target CONVERSING
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.207 Initiator DISCONNECTED_OUTBOUND
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.207 Target DISCONNECTED_INBOUND
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.207 Initiator CLOSED
221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:58:38.330 Target CLOSED
C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:38:12.977 Initiator STARTED_OUTBOUND
C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:38:12.993 Initiator CONVERSING
C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:38:12.993 Target STARTED_INBOUND
C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:38:12.993 Target CONVERSING
C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:38:13.180 Initiator CONVERSING
C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:38:13.180 Target CONVERSING
C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:54:17.837 Initiator DISCONNECTED_OUTBOUND
C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:54:17.837 Target DISCONNECTED_INBOUND
C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:54:17.853 Initiator CLOSED
I'm not sure this tells us much but I am suprised about the location of the DISCONNECTED_OUTBOUND messages as I thought our code typcically ended the conversation immediately after sending the message(s) without waiting for a response. Is this because I traced the service broker activity events rather than the actuall T-SQL statements issuing the service broker commands ("END CONVERSATION" etc)?
Daniel
|||The pattern that leaves target endpoints stranded is the one where the target never sends any message (including EndDialog message). So try not to do fire and forget patterns, they have other faults too. If you don't want/can't change this, then you should set up a task to periodically clean them up (using ENd CONVERSATION ... WITH CLEANUP). The criteria is conversation endpoints that are CLOSED and last activity is more than 30 minutes in the past.
HTH,
~ Remus
Daniel Renshaw wrote:
I'm not sure this tells us much but I am suprised about the location of the DISCONNECTED_OUTBOUND messages as I thought our code typcically ended the conversation immediately after sending the message(s) without waiting for a response. Is this because I traced the service broker activity events rather than the actuall T-SQL statements issuing the service broker commands ("END CONVERSATION" etc)?
I hop I'll g et this right, I'm doing it from memory:
The transition CONVERSING->DISCONNECTED_OUTBOUND is when the initiator issues END CONVERSATION.
The transition CONVERSING->DISCONNECTED_INBOUND is when the EndDialog message sent by initiator arrives at the target
The transition DISCONNECTED_OUTBOUND->CLOSED is when the acknowledgement for the initiators EndDialog message comes back to it.
The transition DISCONNECTED_INBOUND->CLOSED is when the target issues END CONVERSATION in response to the EndDialog message received from initiator.
The pattern shown here is one of the pattern that leaves the target endpoint stranded. Because it never sent any message to the initiator, it never received an acknowledgement. An ack from the initiator sent back to the target serves as positive proof that the initiator will not retry to send message 0. W/o this proof the target cannot delete itself, it has to stay around indefinetly to prevent a replay of message 0, resulting in a stranded target endpoint. That's another reason why the fire-and-forget pattern is ... unrecommended.
BTW, you should also add the EventSubclass column to the profiler view, it details what kind of conversation event we're seeing.
HTH,
~ Remus
Thanks Remus, I'll implement the background cleanup for now. We may be revisting our core principles at some point in the future based on the lessons learnt over the past few months. We've found that tables may have been a more appropriate solution than the many queues we've ended up with.
For example, a message arrives and we put a record in a table as an audit/status information. This continues through the system, each message/step typically generates further records being inserted/updated in regular tables so that we can keep track of what's going on. This has caused tremendous bloating/duplication of data and massive transaction logs so we're now asking ourselves: why not put the message in that same table and forget the queues.
When we come to reconsider all that, we'll ensure fire-and-forget is removed from any service broker elements that remain.
Daniel
|||This saved me from pulling my hair out... I've added it to my blog.http://www.nootz.net/index.php/notification-services-conversation-cleanup.html
Thanks,
Garick
No comments:
Post a Comment