Friday, March 9, 2012

Maximum number of Conversation Timers?

Is there an upper limit on the number of Conversation Timers?

We're using Conversation Timers to implement a retry mechanism for outbound web service calls. The idea is to use Service Broker to queue up a number of web service requests. If a web service call fails to contact its target server, we would like to reschedule that request to retry after a few minutes' delay. It appears that when we have a few thousand Conversation Timers queued up, SQL Server (June CTP) will crash, or at least cause a failover to its mirror partner.

TIA -- Keith.
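
For readers unfamiliar with the pattern described above, here is a minimal T-SQL sketch of a timer-based retry. It is not Keith's actual code: the queue and service names (RetryDemoQueue, RetryDemoService) and the 180-second delay are assumptions made for illustration, and the database is assumed to have Service Broker enabled. It arms a timer on a dialog with BEGIN CONVERSATION TIMER and then recognizes the resulting DialogTimer message when it arrives on the queue.

```sql
-- Hypothetical objects for illustration only.
CREATE QUEUE dbo.RetryDemoQueue;
CREATE SERVICE RetryDemoService ON QUEUE dbo.RetryDemoQueue ([DEFAULT]);
GO

DECLARE @handle UNIQUEIDENTIFIER;

-- Open a dialog that represents one pending web service request.
BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE RetryDemoService
    TO SERVICE N'RetryDemoService'
    ON CONTRACT [DEFAULT]
    WITH ENCRYPTION = OFF;

-- The web service call failed: ask Service Broker to deliver a
-- DialogTimer message on this conversation after 180 seconds.
BEGIN CONVERSATION TIMER (@handle) TIMEOUT = 180;
GO

-- Queue reader: wait up to 5 seconds (5000 ms) for the next message.
DECLARE @h UNIQUEIDENTIFIER, @type SYSNAME;

WAITFOR (
    RECEIVE TOP (1)
        @h    = conversation_handle,
        @type = message_type_name
    FROM dbo.RetryDemoQueue
), TIMEOUT 5000;

-- @type stays NULL if nothing arrived within the wait.
IF @type = N'http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer'
    PRINT 'Timer fired: retry the web service call for this conversation.';
```

Each armed timer lives on its own conversation endpoint, so a few thousand pending retries means a few thousand open dialogs with timers set.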

There is no hard limit on the number of conversations. However, conversation timers do require system resources (memory, processor time, tempdb space, etc.), and if you push more than your hardware can handle, the system might become unresponsive, which can cause mirroring to fail over.

In order to decide whether this is the case or something else is happening, I would need more info.
- Is it really a crash or just a failover? Does the SQL LOG folder contain any dumps?
- What kind of hardware is this happening on: architecture, number and speed of procs, RAM available?
- When the failover occurs, does any entry appear in the ERRORLOG or in the system Event Viewer, on the principal or on the mirror?
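
One way to answer the crash-versus-failover question, offered here as a sketch rather than as part of the original reply, is to query the sys.database_mirroring catalog view on each partner. The mirroring_role_sequence column counts role switches, so a value that has increased since the test started suggests a failover took place.

```sql
-- Run on both partners; lists only databases that participate in mirroring.
SELECT DB_NAME(database_id)          AS database_name,
       mirroring_role_desc,          -- PRINCIPAL or MIRROR
       mirroring_state_desc,         -- e.g. SYNCHRONIZED, DISCONNECTED
       mirroring_role_sequence,      -- number of role switches so far
       mirroring_witness_state_desc  -- connection state of the witness
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;
```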

Thanks,
~ Remus

|||

We are running a load test which is attempting to benchmark the maximum capability of the system. The h/w configuration:

2 x HP DL380s running the SQL 2005 June CTP on Windows Server 2003 Standard Edition SP1. The servers are using synchronous Database Mirroring with a 3rd machine acting as a Witness. Each DL380 has 2 x 3.4GHz Xeons and 4GB RAM. Each DL380 also has a RAID5 SCSI array attached.

@@VERSION = Microsoft SQL Server 2005 - 9.00.1187.07 (Intel X86) May 24 2005 18:22:46 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 1)

There are also 2 x HP DL360s running Windows Server 2003 Web Edition SP1. Each DL360 has 2 x 3.4GHz Xeons with lots of RAM. These are being used as load-balanced web application front ends, running ASP.NET 2.0. There are also 3 .NET 2.0 based Windows services running against the database.

There are no indications of a crash: no dumps in the SQL LOG folder. The starting PRIMARY server has the following log entries at the time of failover:

09/09/2005 21:25:36,Logon,Unknown,Login failed for user 'XXXUser'. [CLIENT: 10.0.10.14]
09/09/2005 21:25:36,Logon,Unknown,Error: 18456<c/> Severity: 14<c/> State: 16.
09/09/2005 21:25:36,spid27s,Unknown,Bypassing recovery for database 'XXX' because it is marked as a mirror database<c/> which cannot be recovered. This is an informational message only. No user action is required.
09/09/2005 21:25:35,spid27s,Unknown,Starting up database 'XXX'.
09/09/2005 21:25:34,Logon,Unknown,Login failed for user 'XXXUser'. [CLIENT: 10.0.10.15]
09/09/2005 21:25:34,Logon,Unknown,Error: 18456<c/> Severity: 14<c/> State: 16.
09/09/2005 21:25:34,spid17s,Unknown,Database mirroring is inactive for database 'XXX'. This is an informational message only. No user action is required.
09/09/2005 21:20:17,Backup,Unknown,Log was backed up. Database: XXX<c/> creation date(time): 2005/09/02(23:31:06)<c/> first LSN: 2489:314:1<c/> last LSN: 2492:2360:1<c/> number of dump devices: 1<c/> device information: (FILE=1<c/> TYPE=DISK: {'D:\SQL_Data\MSSQL.1\MSSQL\Backup\tlogs\XXX_backup_200509092120.TRN'}). This is an informational message only. No user action is required.

The error messages for failed logons continue to repeat. The starting SECONDARY server has the following log entries at the time of failover:

09/09/2005 21:49:37,spid15s,Unknown,Database mirroring is inactive for database 'XXX'. This is an informational message only. No user action is required.
09/09/2005 21:25:39,spid15s,Unknown,Database mirroring is active with database 'XXX' as the principal copy. This is an informational message only. No user action is required.
09/09/2005 21:25:35,spid15s,Unknown,Recovery is writing a checkpoint in database 'XXX' (5). This is an informational message only. No user action is required.
09/09/2005 21:25:33,spid15s,Unknown,Database mirroring is inactive for database 'XXX'. This is an informational message only. No user action is required.
09/09/2005 15:51:17,spid15s,Unknown,Database mirroring is active with database 'XXX' as the mirror copy. This is an informational message only. No user action is required.

This is an order processing application using 4 Service Broker queues, all in the same database. One queue is being used by the two ASP.NET 2.0 front ends for Order Entry. The three .NET 2.0 services each process one of the remaining three queues. The services are multi-threaded, with 4, 16, and 16 threads respectively. Each thread waits on its respective queue for an available request.

The system overall runs as expected when no conversation timers are injected into the queues. The application front ends run at nearly 100% CPU utilization (as expected) and the database servers seem to run at around 20% CPU without extensive memory pressure.

Simulating one of our failure conditions causes the program logic to use the conversation timers in one of the queues as part of the retry mechanism. The failover and logon problems start after only a few minutes of running our load test. I would estimate that fewer than 500 orders have been processed at that time.

-Keith.

|||

How big are the conversation timer intervals?

If I understand correctly, you say that when there are about 500 conversation timers 'armed' for a timeout of X, the system becomes unresponsive (user logins are denied, mirroring timeouts occur and trigger a failover, etc.).
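
A rough way to check how many timers are armed at a given moment, sketched here as an assumption rather than taken from the thread, is to group the rows of sys.conversation_endpoints by the dialog_timer column, which holds the time at which each conversation's timer is due to fire. Endpoints with no timer set are expected to share a single placeholder value, so they should collapse into one group.

```sql
-- Run in the database that hosts the queues.
SELECT dialog_timer,            -- when the timer for the endpoint is due
       state_desc,              -- conversation state, e.g. CONVERSING
       COUNT(*) AS endpoints
FROM sys.conversation_endpoints
GROUP BY dialog_timer, state_desc
ORDER BY dialog_timer;
```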

We'd better take this offline. I will probably have more questions for you, and you probably don't want some of the details posted on a public forum. Can you send me, at r e m u s r @. m i c r o s o f t . c o m, a mail address where I can contact you?

I will post back the conclusion here.

Thanks,
~ Remus
