Showing posts with label error. Show all posts

Tuesday, March 27, 2012

Cluster service error

I had a Windows Server 2003 (Enterprise Edition) cluster with 2 nodes
running.
When I try to start the service through Cluster Administrator, which is
installed on the local machine, I get "Error 1060: The specified service
does not exist as an installed service".
The Cluster service does not appear to be installed on this node, but it is
installed on the other node.
Shouldn't this service be installed along with the OS? If so, what do you
suggest?
Any help would be appreciated, as I cannot find any information on this
problem anywhere.
Hi,
I'm not quite sure which service you are trying to start, but the SQL Server
services will not be running on the passive node of the cluster. When you
move your resources to that node, the services will be started there (and
stopped on the node that is now passive).
The Cluster service will be running on both servers, though.
--
Regards
Steen Schlüter Persson
Database Administrator / System Administrator
|||Hi Steen,
The Cluster service is not starting on NODE-1. When I try to start it, it
says the Cluster service does not exist as an installed service. I also tried
to start the service from C:\WINDOWS\Cluster, but it still does not start.
Is there a way to install this service? I cannot see it on NODE-1, but it is
present on NODE-2, which is active. However, all drives are offline except:
1) Cluster IP Address
2) Cluster Name
3) Disk Q: (Quorum)
4) MSDTC
I think if the service starts on NODE-1 the cluster will be up.
Thanks,
Rajesh|||
Are you sure NODE-1 is set up to be part of the SQL cluster?
(Keep in mind that clustering SQL Server is two steps: creating the nodes as
a Windows cluster, and then defining/creating the SQL cluster.
For example, you could have 3 nodes in a Windows cluster: A, B, and C. A and B
could be set up as a SQL cluster, and B and C as an Exchange cluster.)
And all services should start on any valid cluster node; there is no
preferred one.
Greg Moore
SQL Server DBA Consulting
sql (at) greenms.com http://www.greenms.com

Cluster Service

For a particular node, I stopped the Cluster service. When I tried to start the service again, I received an error stating that it could not start due to a logon failure.
Can someone point me in the right direction to resolve this?
That's because the password for the SQL Server service account does not
match the Windows password.
Mike
Principal Mentor
Solid Quality Learning
"More than just Training"
SQL Server MVP
http://www.solidqualitylearning.com
http://www.mssqlserver.com
|||This would indicate that the password for the account that the cluster
service starts under has been changed. Go to Administrative Tools -
Services and enter the correct password for the account that the cluster
service is starting under.
Rand
This posting is provided "as is" with no warranties and confers no rights.

Cluster problems?

Every now and then (about once a week) our SQL server cluster
(active/passive) becomes unresponsive, and the following events are logged:
Type: Error
Event ID: 17052
Source: MSSQLSERVER
User: N/A
Generated: 8/7/2005 7:58:33 AM
Category: Failover
Message: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
Data: 9C 42 00 40 01 00 00 00 0B 00 00 00 44 00 41 00 54 00 41 00 42 00 41
00 53 00 45 00 30 00 31 00 00 00 00 00 00 00
Type: Error
Event ID: 17052
Source: MSSQLSERVER
User: N/A
Generated: 8/7/2005 7:58:33 AM
Category: Failover
Message: [sqsrvres] printODBCError: sqlstate = HYT00; native error = 0;
message = [Microsoft][ODBC SQL Server Driver]Timeout expired
Data: 9C 42 00 40 01 00 00 00 0B 00 00 00 44 00 41 00 54 00 41 00 42 00 41
00 53 00 45 00 30 00 31 00 00 00 00 00 00 00
Type: Error
Event ID: 17052
Source: MSSQLSERVER
User: N/A
Generated: 8/7/2005 7:58:33 AM
Category: Failover
Message: [sqsrvres] OnlineThread: QP is not online.
Data: 9C 42 00 40 01 00 00 00 0B 00 00 00 44 00 41 00 54 00 41 00 42 00 41
00 53 00 45 00 30 00 31 00 00 00 00 00 00 00
SQL server errorlog contains these errors:
2005-08-07 07:52:16 - ! [165] ODBC Error: 0, Timeout expired [SQLSTATE HYT00]
2005-08-07 07:52:16 - ! [382] Logon to server '(local)' failed (JobManager)
2005-08-07 07:52:19 - ! [165] ODBC Error: 0, Timeout expired [SQLSTATE HYT00]
2005-08-07 07:52:19 - ! [382] Logon to server '(local)' failed (JobManager)
2005-08-07 07:53:27 - ! [165] ODBC Error: 0, Timeout expired [SQLSTATE HYT00]
2005-08-07 07:53:27 - ! [382] Logon to server '(local)' failed (JobManager)
2005-08-07 07:53:47 - ! [165] ODBC Error: 0, Timeout expired [SQLSTATE HYT00]
2005-08-07 07:53:47 - ! [382] Logon to server '(local)' failed (JobManager)
2005-08-07 07:54:27 - ! [298] SQLServer Error: 11, General network error.
Check your network documentation. [SQLSTATE 08001]
2005-08-07 07:54:27 - ! [298] SQLServer Error: 65534, ConnectionOpen
(PreLoginHandshake()). [SQLSTATE 01000]
2005-08-07 07:54:27 - ! [382] Logon to server '(local)' failed (JobManager)
2005-08-07 07:54:27 - ! [165] ODBC Error: 0, Timeout expired [SQLSTATE HYT00]
2005-08-07 07:54:27 - ! [382] Logon to server '(local)' failed (JobManager)
2005-08-07 07:55:05 - ! [298] SQLServer Error: 11, General network error.
Check your network documentation. [SQLSTATE 08001]
2005-08-07 07:55:05 - ! [298] SQLServer Error: 65534, ConnectionOpen
(PreLoginHandshake()). [SQLSTATE 01000]
2005-08-07 07:57:03 - ! [298] SQLServer Error: 11, General network error.
Check your network documentation. [SQLSTATE 08001]
2005-08-07 07:57:03 - ! [298] SQLServer Error: 65534, ConnectionOpen
(PreLoginHandshake()). [SQLSTATE 01000]
2005-08-07 07:57:03 - ! [382] Logon to server '(local)' failed (JobManager)
During this time, connections are terminated and SQL Server does not allow
legitimate accounts to log in. It lasts for a couple of minutes. The server is
clearly not overloaded, and it has no specific jobs or anything else to
do at this time. I would like to rule out network problems too, because the
machine is otherwise responsive and fine. It's just the SQL Server service
that seems to "take a nap". Has anyone seen this before? Win2k3 Ent, SQL 2000
Ent.
Thanks
I forgot one thing: Win2k3 has SP1, and SQL Server has SP3 applied.
- Gabor

Sunday, March 25, 2012

Cluster Log error message

Hello to all,
I'm investigating an application outage issue on a 2-node W2K Adv Server
cluster. A Generic Application is running on the cluster in a remote site,
and connects to a UNIX-based Oracle db at my site. There are intermittent
MRxSMB 3019 Event IDs which I'm not too worried about. I just used Q138365
to adjust the autodisconnect value to -1.
But I'm finding this error message in the Cluster log:
Network Name <Network Name CLUSTER VIRTUAL NAME>: Failed to register DNS PTR
record X.X.X.X.in-addr.arpa. for host CLUSTER VIRTUAL NAME FQDN, status 9005
I can't find anything documented about this error, so I can't tell if this
is something to be concerned about. Is it nothing, a cluster service
problem, or a DNS registration problem?
Any thoughts would be greatly appreciated.
Thanks in advance.
Hey, I had this error as well when I was setting up my cluster.
To fix it:
o Right-click the network connection for your heartbeat adapter, and then
click Properties.
o Click Internet Protocol (TCP/IP), and then click Properties.
o Click Advanced. On the DNS tab, verify that no values are defined. Make
sure that the "Register this connection's addresses in DNS" check box is cleared.
Hope this helps!!

Cluster install fails on SQL Native client

When I try to install SQL 2k5 Developer on a Windows 2003 SP1 cluster, I keep getting an error during installation. The error looks like this:

An Installation package for the product Microsoft SQL Server Native Client cannot be found. Try the installation again using a valid copy of the installation package 'sqlncli.msi'

Can someone please tell me what is wrong?

I figured out what the problem was. I uninstalled SQL Native Client from both nodes and reinstalled it; setup then got past this problem and installed properly on both nodes as an Active/Active cluster.|||

I struggled with this problem for about 4 hours while doing a simple install on an XP box and on a Win2K box.

Yes, your solution works: Run SQLEXPR.exe, uninstall SQL Native Client, then re-run SQLEXPR.exe.


Thursday, March 22, 2012

Cluster Failover

Each morning I find that one of my servers has failed over, and I'm
getting the following error.
Does anyone have any ideas?
Many thanks
Event Type: Error
Event Source: ClusSvc
Event Category: Startup/Shutdown
Event ID: 1234
Date: 06/09/2006
Time: 09:09:18
User: N/A
Computer: NXNHPSQLSRV1
Description:
The Cluster service account does not have the following required user
rights:
Act as part of the operating system
These user rights were granted to the Cluster service account during
cluster setup and must not be removed.
User Action
Assign these rights to the Cluster service account. One way to do this
is to use Local Security Settings (Secpol.msc). Another way is to edit
the Group Policy object that is associated with the Cluster service
account's user object in Active Directory.
If you have already assigned these rights to the Cluster service
account, and the user rights appear to be removed, a Group Policy
object might be removing the rights. Check with your domain
administrator to find out if this is happening.
Hi
Have you checked the rights? I assume it is a domain account that you are
using for the cluster service account?
John
|||It's probably time to work with your server admin folks to see what else is
being modified. Nobody should be revoking privileges like this from service
accounts. Whether it's modified systematically via some group policy or
manually by someone mucking around, you've got a potentially big problem.
Linchi
|||Sounds like a Group Policy issue: rights are being removed from the cluster
service account. Check with the domain admins about providing a static
profile for the cluster service account.
--
Arnie Rowland, Ph.D.
Westwood Consulting, Inc
Most good judgment comes from experience.
Most experience comes from bad judgment.
- Anonymous

Cluster failing connections with SSL error message

We recently migrated our production environment from a Win2k3/SQL 2000 EE cluster to a new 64-bit Win2k3/SQL 2005 SP1 cluster.

The cluster works fine for a while (3-6 hours), then our logins start failing. We receive the following error in the event log:

"The server was unable to load the SSL provider library needed to log in; the connection has been closed. SSL is used to encrypt either the login sequence or all communications, depending on how the administrator has configured the server. See Books Online for information on this error message"

The only way to restore connectivity is to take the SQL Server service offline and bring it back online in the cluster.

The only reference to this error that I can find has something to do with a hotfix that might be available.

It's sad to say but we are experiencing the same issue.
Only we get this after a good 30 minutes...

Someone from Microsoft Spain is flying over here.

Very painful

|||

In working with Microsoft PSS, we found the issue to be a permissions issue, of all things.

Our service account did not have the right to lock pages in memory on either machine in the cluster (even though it is a domain administrator).

Once we gave it this right and restarted the SQL Server object in the cluster, we have not had a similar failure since.
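
One possible way to confirm that the right took effect after the restart, offered as a sketch and not as the check PSS used in this thread: on 64-bit SQL Server 2005, memory obtained through locked pages is commonly reported in the awe_allocated_kb column of sys.dm_os_memory_clerks. The assumption here is that a non-zero total means Lock Pages in Memory is in use; the column alias is made up for illustration.

```sql
-- Sketch: total memory allocated through the AWE / locked-pages mechanism.
-- Assumption: a value greater than 0 on a 64-bit instance indicates that
-- the service account's "Lock pages in memory" right is in effect.
SELECT SUM(awe_allocated_kb) / 1024 AS locked_pages_mb
FROM sys.dm_os_memory_clerks;
```

If the total stays at 0 after the SQL Server resource has been restarted, it may be worth re-checking the user right on both nodes.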

|||

This was unfortunately enabled already so it did not cause the problem here :-(

|||Hi Wesley,

Any joy with this? I'm currently going through the same thing, and I also have Lock Pages in Memory enabled for my SQL service account. I'm currently going through MS Support, but we seem to be going around the houses :o(

Thanks

Stu|||Any luck? We have a similar problem.
We noticed that memory usage slowly grows beyond physical memory (32 GB); then connection failures start and the server becomes inoperable.
Our workaround is to reduce max server memory from 25 GB down to 18 GB.
This causes the server and OS to release memory. After the system stabilized, we increased max server memory back to its original setting. Lock Pages in Memory is set.
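
The post does not show the commands used, so the following is only a sketch of how the workaround above might be applied with sp_configure. It assumes the "18G" and "25" figures are gigabytes and converts them to the megabyte values that the 'max server memory (MB)' option expects.

```sql
-- 'max server memory (MB)' is an advanced option, so expose it first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Temporarily lower the cap to 18 GB (18 * 1024 = 18432 MB, assumed value).
EXEC sp_configure 'max server memory (MB)', 18432;
RECONFIGURE;

-- Once memory usage has stabilized, restore the original cap,
-- assumed here to be 25 GB (25 * 1024 = 25600 MB).
EXEC sp_configure 'max server memory (MB)', 25600;
RECONFIGURE;
```

The setting is dynamic, which fits the poster's point that the server does not have to be bounced.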

Hope this helps.

This is a workaround, so we don't have to bounce the server; we are still looking for the cause.
esk.

Setup: 2-node cluster, 4 x dual 64-bit AMD, 32 GB, SQL 2005 Enterprise SP0.|||

Hi,

Apparently we have hit a bug where the user cache becomes too large, and for every execution of a statement SQL Server has to check this store.
This store is not cleaned up and keeps growing to the point where lookups become blocked by spinlocks. If you want, I can give you the commands to check this.

This will only be fixed in SP2 because of the major impact a fix would have. The workaround is adding the users to the sysadmin role or clearing the cache manually with DBCC FREESYSTEMCACHE.
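
The post does not give the exact DBCC call. A minimal sketch, assuming the cache in question is the security token store that appears as USERSTORE_TOKENPERM in sys.dm_os_memory_clerks (the clerk type queried later in this thread), would be:

```sql
-- Sketch: clear only the token and permission cache instead of every
-- system cache. 'TokenAndPermUserStore' is assumed to be the store
-- behind the USERSTORE_TOKENPERM memory clerk.
DBCC FREESYSTEMCACHE ('TokenAndPermUserStore');
```

Naming the store keeps the impact narrower than DBCC FREESYSTEMCACHE ('ALL'), which would flush other caches as well. Since the store grows back under load, this would probably need to be repeated, for example from a scheduled job.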

HTH

Kind regards,
Wesley

|||Thanks for the replies !

Wes, if you could give the command to check the spinlocks that would be great.

many thanks

Stu|||

Stu,

It's actually the query for checking the cache size.

SELECT * FROM sys.dm_os_memory_clerks
WHERE type = 'USERSTORE_TOKENPERM'

We have seen problems starting at around 150 MB in our environment, but this probably depends on the load and the types of queries. Add up the single pages and multi pages to determine the size.
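
The addition described above can be folded into the query itself. This is a sketch that assumes the SQL Server 2005 column names single_pages_kb and multi_pages_kb (later versions replace them with a single pages_kb column); the alias token_store_mb is made up for illustration.

```sql
-- Sum single-page and multi-page allocations for the token store
-- and convert the total from KB to MB.
SELECT SUM(single_pages_kb + multi_pages_kb) / 1024.0 AS token_store_mb
FROM sys.dm_os_memory_clerks
WHERE type = 'USERSTORE_TOKENPERM';
```

Comparing token_store_mb against the roughly 150 MB mentioned above gives a quick indication of whether the cache is approaching the size where problems were seen in that environment.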

HTH
Wesley

|||Thanks Wesley!

After much investigation it turned out the problem was the iLO Management Channel Interface Driver (Cpqcidrv.sys) from Hewlett-Packard. Apparently this is known to cause this issue on x64 editions of SQL Server 2005. The driver causes the SQL Server 2005 64-bit working set to be trimmed!

We do not use iLO on our HP servers, so I disabled it in Device Manager, and so far so good (fingers crossed)!!!

Thanks

Stu|||

Great news!

There seems to be a hotfix for Windows 2003 concerning working set trims too.
Apparently working sets get trimmed when you use Terminal Services on the server... so beware ;-)

|||

If possible could you share the hotfix number? Thanks in advance,

Drew

|||

Hey Wesley. Can you please elaborate on "working sets get trimmed" and the "beware", and on the hotfix that is available? Thank you very much. I've had problems with this also.

|||

I am also having a problem with this issue. I am going to try disabling iLO. The Terminal Services hotfix seems to be this one:

http://support.microsoft.com/kb/905865/en-us

sqlsql

Cluster failing connections with SSL error message

We recently migrated our production environment from a win2k3/SQL 2000 EE cluster to a new 64 bit win2k3/ SQL 2005 SP1 Cluster.

Cluster works fine for a while (3-6 hours), then our logins start failing. We recieve the following error in the event log:

"The server was unable to load the SSL provider library needed to log in; the connection has been closed. SSL is used to encrypt either the login sequence or all communications, depending on how the administrator has configured the server. See Books Online for information on this error message"

The only way to resume connectivity is to offline and online the sql service in the cluster.

The only reference to this error that I can find has something to do with a hotfix that might be available?

It's sad to say but we are experiencing the same issue.
Only we get this after a good 30 minutes...

Someone from Microsoft Spain is flying over here.

Very painful

|||

In working with microsoft PSS, we found the issue to be a permissions issue of all things.

Our service account did not have rights to lock pages in memory on either machines in the cluster. (even though is a domain administrator).

Once we gave it this right and restarted the SQL Server object in the cluster we have not had a similar failure since.

|||

This was unfortunately enabled already so it did not cause the problem here :-(

|||Hi Wesley,

Any joy with this? I'm currently going throught the same thing, and also have locked pages enabled for my SQL service account. I'm currently going through MS Support, but seem to be going around the houses :o(

Thanks

Stu|||Any luck, we have a similar problem.
We noticed that memory usage grows slowly beyond physical memory (32G) then connection failiurs start and server becomes inoperable.
Workaround solution is to reduce max mem in the server, down to 18G from (25).
This causes the server and OS to release memory. After the system stabilized we increased the maxmem in the server back to its original settng. Lock pages in memory is set.

Hope this helps.

This is a workaround, so we don't have to bounce the server, we are still looking for the cause.
esk.

Setup: 2-node cluster, 4*dual 64-bit AMD, 32 GB, SQL 2005 Enterprise SP0.|||

Hi,

Apparently we have hit a bug where the user cache becomes too large, and SQL Server has to check this store for every execution of a statement.
The store is not cleaned up and keeps growing to the point where lookups become blocked by spinlocks. If you want, I can give you the commands to check this.

The fix is scheduled for SP2 rather than an earlier hotfix, because of the major impact the change would have. The workaround is to add the users to the sysadmin role or to clear the cache manually with DBCC FREESYSTEMCACHE.
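
A minimal sketch of the manual cache clear follows. The post does not name the store; 'TokenAndPermUserStore' is the name of the security token cache in SQL Server 2005 and is assumed here to be the one meant.

```sql
-- Assumption: the cache in question is the security token cache.
-- Clearing it forces tokens to be rebuilt, so expect a brief CPU bump.
DBCC FREESYSTEMCACHE ('TokenAndPermUserStore');
```

Because the store grows again under load, this would have to be repeated periodically (for example from a scheduled job) until a permanent fix is applied.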

HTH

Kind regards,
Wesley

|||Thanks for the replies !

Wes, if you could give the command to check the spinlocks that would be great.

many thanks

Stu|||

Stu,

It's actually the query for checking the cache size.

SELECT * FROM sys.dm_os_memory_clerks
WHERE type = 'USERSTORE_TOKENPERM'

We have seen problems starting at around 150 MB in our environment, but this probably depends on the load and the types of queries. Add up the single pages and multi pages to determine the size.
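
The addition described above can be done in the query itself. This sketch assumes the SQL Server 2005 column names single_pages_kb and multi_pages_kb; the alias tokenperm_mb is only illustrative.

```sql
-- Total size of the token/permission cache in MB.
-- SUM is used because the DMV can return more than one row per clerk type.
SELECT SUM(single_pages_kb + multi_pages_kb) / 1024.0 AS tokenperm_mb
FROM sys.dm_os_memory_clerks
WHERE type = 'USERSTORE_TOKENPERM';
```

Sampling this value at intervals shows whether the store is growing steadily towards the size at which problems were seen.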

HTH
Wesley

|||Thanks Wesley!

After much investigation, it turned out the problem was the iLO Management Channel Interface Driver (Cpqcidrv.sys) from Hewlett-Packard. Apparently this is known to cause this issue on x64 editions of SQL Server 2005. The driver causes the SQL Server 2005 64-bit working set to be trimmed!

We do not use iLO on our HP servers, so I disabled it in Device Manager, and so far so good (fingers crossed)!!!

Thanks

Stu|||

Great news!

There seems to be a hotfix for Windows 2003 concerning working set trims too.
Apparently working sets get trimmed when you use Terminal Services on the server... so beware ;-)

|||

If possible could you share the hotfix number? Thanks in advance,

Drew

|||

Hey Wesley. Can you please elaborate on what you mean by "working sets get trimmed" and "beware", and on the hotfix that is available? Thank you very much. I've had problems with this as well.

|||

I am also having a problem with this issue. I am going to try disabling iLO. The Terminal Services hotfix seems to be this one:

http://support.microsoft.com/kb/905865/en-us

Tuesday, March 20, 2012

cluster error

I think I originally posted this in the wrong forum of SQL Server General Tools.

We have a one-node cluster that is logging the error messages below in the cluster.log and the Event Viewer. The SQL Server 2005 SP2 job that fails reports a slightly different message. The OS is Windows 2003 SP1. This is not a 64-bit box. We have checked the permissions of the cluster service account in SQL Server to be sure they are correct. Can anyone please help with this issue?

00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]TCP Provider: An existing connection was forcibly closed by the remote host.

00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] OnlineThread: QP is not online.
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure

Based on your error message, you are hitting the SynAttackProtect issue. Take a look at section 4.1.2.

http://support.microsoft.com/kb/910228

|||

I spent months trying to resolve this issue and found it to be a hardware driver issue. I'm not sure what hardware you're running, but if it's an HP ProLiant it's worth checking this, as a simple driver update sorted out the problem we had (even though we were not using iLO).

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c00688313&jumpid=reg_R1002_USEN

Advisory: (Revision) Integrated Lights-Out Management (iLO) Interface Driver for Windows May Cause the System to Become Unresponsive if the Driver Does Not Allocate Extra Contiguous Memory Blocks Under the 4 GB Space

|||thanks for the info Pete.|||

I have the same problem with a SQL Server cluster. I disabled SynAttackProtect, but it had no effect.

|||

Andrew,

Did you disable SynAttackProtect on all nodes and restart? Can you post the exact error?

|||

Yes.

Errors (Windows Application Log):

20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] OnlineThread: QP is not online.
"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Communication link failure
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]TCP Provider: An existing connection was forcibly closed by the remote host.

"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] OnlineThread: QP is not online.
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Communication link failure
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]TCP Provider: An existing connection was forcibly closed by the remote host.

"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"

These errors last occurred on 20.08.2007. The previous occurrence was on 13.08.2007.

I have a two-node cluster with Windows Server 2003 R2 and MS SQL Server 2005 Standard.

|||

Andrew,

Does it happen every 7 days? Can you check the system event log to see whether there was any network outage at 20.08.2007 17:18:58?

Basically, the cluster service was using TCP to connect to SQL Server to check its health, and it couldn't. The original hint was "an existing connection was forcibly closed...", which pointed to SynAttackProtect.

Since you have already turned off the DoS protection flag (SynAttackProtect), the only thing left to check is the system itself.
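
To reproduce the health check by hand, you can run a trivial query over TCP against the virtual server name (for example from sqlcmd, using a tcp: prefix on the server name). As far as I know, the SQL Server resource DLL issues something like the statement below for its IsAlive check; treat the exact statement as an assumption.

```sql
-- Assumption: this mirrors the lightweight query the cluster resource DLL
-- (sqsrvres) sends when it checks whether SQL Server is alive.
-- If this fails or hangs over TCP while the errors are being logged,
-- the problem is reproducible outside the cluster service.
SELECT @@SERVERNAME;
```

Running it in a loop from the active node around the time the errors usually appear may help separate a network or driver problem from a problem inside SQL Server.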

|||

Yes, I checked the system log. There were no errors at 20.08.2007 17:18:58 (the last event on 20.08 was at 15:08, and it was an Information event). All nodes in the cluster were available at 20.08.2007 17:18:58. Only SQL Server failed.

|||

!!!!

Today at 10:57:50 it happened again!!!

I checked the system and application logs.

System event log:

23.08.2007 11:01:14 ClusSvc Information Failover Mgr 1201 N/A MSNODE1 "The Cluster Service brought the Resource Group ""SQL Server 2005"" online."
23.08.2007 11:01:14 Service Control Manager Information None 7036 N/A MSNODE1 The SQL Server Agent (MSSQLSERVER) service entered the running state.
23.08.2007 11:01:14 Service Control Manager Information None 7035 POLAD\ClusterService MSNODE1 The SQL Server Agent (MSSQLSERVER) service was successfully sent a start control.
23.08.2007 11:01:13 Service Control Manager Information None 7036 N/A MSNODE1 The SQL Server (MSSQLSERVER) service entered the running state.
23.08.2007 11:01:07 Service Control Manager Information None 7035 POLAD\ClusterService MSNODE1 The SQL Server (MSSQLSERVER) service was successfully sent a start control.
23.08.2007 11:01:06 Service Control Manager Information None 7036 N/A MSNODE1 The SQL Server (MSSQLSERVER) service entered the stopped state.
23.08.2007 11:00:24 Service Control Manager Information None 7035 POLAD\ClusterService MSNODE1 The SQL Server (MSSQLSERVER) service was successfully sent a stop control.
23.08.2007 11:00:23 Service Control Manager Information None 7036 N/A MSNODE1 The SQL Server Agent (MSSQLSERVER) service entered the stopped state.
23.08.2007 10:59:57 Service Control Manager Information None 7035 POLAD\ClusterService MSNODE1 The SQL Server Agent (MSSQLSERVER) service was successfully sent a stop control.
23.08.2007 10:59:57 ClusSvc Error Failover Mgr 1069 N/A MSNODE1 Cluster resource 'SQL Server' in Resource Group 'SQL Server 2005' failed.

23.08.2007 10:57:58 ClusSvc Information Event Logger 1202 N/A MSNODE1 The time delta between node MSNODE1 and node MSNODE2 is 1751833(in 100 nanosecs).
23.08.2007 10:57:24 TermServDevices Error None 1111 N/A MSNODE2 Driver Microsoft XPS Document Writer required for printer Microsoft XPS Document Writer is unknown. Contact the administrator to install the driver before you log in again.
23.08.2007 10:57:23 TermServDevices Error None 1111 N/A MSNODE2 Driver Canon MF5700 Series required for printer Canon MF5700 Series is unknown. Contact the administrator to install the driver before you log in again.
23.08.2007 10:57:23 TermServDevices Error None 1111 N/A MSNODE2 Driver hp LaserJet 1000 required for printer !!C528!hp LaserJet 1000 is unknown. Contact the administrator to install the driver before you log in again.
23.08.2007 10:57:23 Print Information None 2 NT AUTHORITY\SYSTEM MSNODE2 Printer HP LaserJet 2100 PCL6 on c634.local.polad.ru (from N486) in session 1 was created.
23.08.2007 10:57:22 Print Information None 9 NT AUTHORITY\SYSTEM MSNODE2 Printer HP LaserJet 2100 PCL6 on c634.local.polad.ru (from N486) in session 1 was set.
23.08.2007 10:57:57 ClusSvc Information Event Logger 1202 N/A MSNODE1 The time delta between node MSNODE1 and node MSNODE2 is 269397841(in 100 nanosecs).
23.08.2007 10:26:26 Print Warning None 3 NT AUTHORITY\SYSTEM MSNODE2 Printer HP LaserJet 2100 PCL6 on c634.local.polad.ru (from N486) in session 1 was deleted.
23.08.2007 10:26:26 Print Warning None 4 NT AUTHORITY\SYSTEM MSNODE2 Printer HP LaserJet 2100 PCL6 on c634.local.polad.ru (from N486) in session 1 is pending deletion.
23.08.2007 10:26:26 Print Warning None 8 NT AUTHORITY\SYSTEM MSNODE2 Printer HP LaserJet 2100 PCL6 on c634.local.polad.ru (from N486) in session 1 was purged.
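
The time deltas reported by event 1202 in the log above are given in 100-nanosecond units, which makes them hard to read. A quick conversion to seconds (the column aliases are only illustrative):

```sql
-- 10,000,000 units of 100 ns make one second.
SELECT 1751833   / 10000000.0 AS delta_seconds_1,  -- roughly 0.18 s
       269397841 / 10000000.0 AS delta_seconds_2;  -- roughly 26.9 s
```

Whether a delta of this size between the nodes is related to the failure is not established in this thread; the conversion only makes the numbers easier to compare with the event timestamps.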

Application event log (I use the Russian edition of SQL Server 2005, so some events contain Russian text; the English translation is given in brackets):

23.08.2007 10:59:36 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] checkODBCConnectError: sqlstate = 08001; native error = 0; message = [Microsoft][SQL Native Client]Не удается завершить вход в систему из-за задержки при открытии соединения с сервером (Unable to complete login process due to delay in opening server connection)
"
23.08.2007 10:59:36 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] ODBC sqldriverconnect failed
"
23.08.2007 10:58:51 Application Hang Error (101) 1002 N/A MSNODE2 Hanging application cluadmin.exe, version 5.2.3790.3959, hang module hungapp, version 0.0.0.0, hang address 0x00000000.
23.08.2007 10:57:59 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Ошибка связи (Communication link failure)
"
23.08.2007 10:57:59 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
23.08.2007 10:57:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Ошибка связи (Communication link failure)
"
23.08.2007 10:57:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
23.08.2007 10:57:57 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Ошибка связи (Communication link failure)
"
23.08.2007 10:57:57 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
23.08.2007 10:57:55 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Ошибка связи (Communication link failure)
"
23.08.2007 10:57:55 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
23.08.2007 10:57:54 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Ошибка связи (Communication link failure)
"
23.08.2007 10:57:53 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
23.08.2007 10:57:51 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] OnlineThread: QP is not online.
"
23.08.2007 10:57:51 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Ошибка связи (Communication link failure)
"
23.08.2007 10:57:50 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Поставщик TCP (TCP Provider): An existing connection was forcibly closed by the remote host.

"
23.08.2007 10:57:50 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"

I need help !

-

Andrew Mishechkin

ICQ: 101861332

|||

Thanks for the logs.

The application log shows a connection error, which escalates to a ClusSvc error in the system log. Next, I would suggest checking for faulty hardware (e.g. NICs). I've seen strange, hard-to-diagnose issues caused by faulty hardware.

|||We have exactly the same error, and we are using an HP DL585 G2 running Win2k3 SP1. Are there any steps I can take to isolate this further (diagnostic tools)?

thanks,
Tony B|||

Hello,

I have the same problem with a SQL 2005 SP2 cluster. I get several SQL 2005 errors in the application log, and shortly afterwards an error in the system log about the cluster service.
This is the cluster:

*clusternode1: HP DL50G5. 4 P dual core with HTT. W2003 R2 EE SP2 x64.
*clusternode2: the same as clusternode1.


I tried to solve the problem by taking these actions:

*Disable the ilo driver.
*Update NICs drivers.
*Create a DWORD SynAttackProtect with value 0.


Any ideas on how to solve the problem?

Thanks
pablo

sqlsql

cluster error

I think I originally posted this in the wrong forum of SQL Server General Tools.

We have a one node cluster that is receiving the below error messages in the cluster.log and the Event Viewer. In the SQL Server 2005 SP2 job that fails we receive a slighlty different message. The OS is Windows 2003 SP1. This is not a 64-bit box. We have checked the permissions of the cluster service account in SQL Server to be sure they are correct. Can anyone please help with this issue?

00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]TCP Provider: An existing connection was forcibly closed by the remote host.

00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] OnlineThread: QP is not online.
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure

You encounter the SynAttachProtect issue (based on your error message). Take a look at section 4.1.2.

http://support.microsoft.com/kb/910228

|||

I spent months trying to resolve this issue but found it to be a hardware driver issue. Not sure what hardware you're running but if its HP Proliant its worth checking this as it sorted the problem we had (even though we were not using iLO) with a simple driver update.

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c00688313&jumpid=reg_R1002_USEN

Advisory: (Revision) Integrated Lights-Out Management (iLO) Interface Driver for Windows May Cause the System to Become Unresponsive if the Driver Does Not Allocate Extra Contiguous Memory Blocks Under the 4 GB Space

|||thanks for the info Pete.|||

I have a same problem with a SQL Server Cluster and I disabled SynAttackProtect. But this is take no effect

|||

Andrew,

You disable SYN on all nodes and restart? Can you post the exact error?

|||

Yes.

Errors (Windows Application Log):

20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] OnlineThread: QP is not online.
"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Communication link failure
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]TCP Provider: An existing connection was forcibly closed by the remote host.

"
20.08.2007 17:27:21 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] OnlineThread: QP is not online.
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Communication link failure
"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Поставщик TCP (TCP Provider): An existing connection was forcibly closed by the remote host.

"
20.08.2007 17:18:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"

These errors last occurred on 20.08.2007. The previous occurrence was on 13.08.2007.

I have a two-node cluster running Windows Server 2003 R2 and MS SQL Server 2005 Standard.

|||

Andrew,

Does it happen every 7 days? Can you check the system event log to see whether there was any network outage at 20.08.2007 17:18:58?

Basically, the cluster service uses TCP to connect to SQL Server to check its health, and it couldn't. The original hint was "an existing connection was forcibly closed...", which pointed to SynAttackProtect.

Since you have already turned off the DoS protection flag, the only thing left to check is the system itself.
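
To help tell a network problem apart from a SQL Server problem, something like the following could be left running on the node during the next incident. This is only a rough sketch: it opens a plain TCP connection, which is a much weaker test than the query the cluster resource DLL runs (the `sqlexecdirect` call in the log). The port 1433 in the comments is the default SQL Server port and is an assumption here, and `probe_tcp` is a name made up for illustration.

```python
import socket
import time


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> tuple:
    """Attempt a single TCP connection and report the outcome.

    Returns (ok, detail), where detail is either the connect time or the
    error raised by the socket layer (refused, reset, timed out, ...).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
            return True, f"connected in {elapsed:.3f}s"
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"


def log_probe(host: str, port: int) -> str:
    """Format one probe result with a local timestamp for later comparison
    with the event log entries."""
    ok, detail = probe_tcp(host, port)
    stamp = time.strftime("%d.%m.%Y %H:%M:%S")
    status = "OK" if ok else "FAIL"
    return f"{stamp} {host}:{port} {status} {detail}"


if __name__ == "__main__":
    # Hypothetical target: replace with the SQL Server virtual name and port
    # (1433 is only the default and may differ on your cluster).
    print(log_probe("127.0.0.1", 1433))
```

If the probe keeps succeeding while the `sqsrvres` errors appear, the network path is probably fine and attention can shift to SQL Server itself or to the drivers.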

|||

Yes, I checked the system log: there were no errors at 20.08.2007 17:18:58 (the last event on 20.08 was at 15:08, and it was an Information event). All nodes in the cluster were available at 20.08.2007 17:18:58. Only SQL Server failed.

|||

!!!!

Today at 10:57:50 it happened again!!!

I checked the system and application logs.

System event log:

23.08.2007 11:01:14 ClusSvc Information Failover Mgr 1201 N/A MSNODE1 "The Cluster Service brought the Resource Group ""SQL Server 2005"" online."
23.08.2007 11:01:14 Service Control Manager Information None 7036 N/A MSNODE1 The SQL Server Agent (MSSQLSERVER) service entered the running state.
23.08.2007 11:01:14 Service Control Manager Information None 7035 POLAD\ClusterService MSNODE1 The SQL Server Agent (MSSQLSERVER) service was successfully sent a start control.
23.08.2007 11:01:13 Service Control Manager Information None 7036 N/A MSNODE1 The SQL Server (MSSQLSERVER) service entered the running state.
23.08.2007 11:01:07 Service Control Manager Information None 7035 POLAD\ClusterService MSNODE1 The SQL Server (MSSQLSERVER) service was successfully sent a start control.
23.08.2007 11:01:06 Service Control Manager Information None 7036 N/A MSNODE1 The SQL Server (MSSQLSERVER) service entered the stopped state.
23.08.2007 11:00:24 Service Control Manager Information None 7035 POLAD\ClusterService MSNODE1 The SQL Server (MSSQLSERVER) service was successfully sent a stop control.
23.08.2007 11:00:23 Service Control Manager Information None 7036 N/A MSNODE1 The SQL Server Agent (MSSQLSERVER) service entered the stopped state.
23.08.2007 10:59:57 Service Control Manager Information None 7035 POLAD\ClusterService MSNODE1 The SQL Server Agent (MSSQLSERVER) service was successfully sent a stop control.
23.08.2007 10:59:57 ClusSvc Error Failover Mgr 1069 N/A MSNODE1 Cluster resource 'SQL Server' in Resource Group 'SQL Server 2005' failed.

23.08.2007 10:57:58 ClusSvc Information Event Logger 1202 N/A MSNODE1 The time delta between node MSNODE1 and node MSNODE2 is 1751833(in 100 nanosecs).
23.08.2007 10:57:24 TermServDevices Error None 1111 N/A MSNODE2 Driver Microsoft XPS Document Writer required for printer Microsoft XPS Document Writer is unknown. Contact the administrator to install the driver before you log in again.
23.08.2007 10:57:23 TermServDevices Error None 1111 N/A MSNODE2 Driver Canon MF5700 Series required for printer Canon MF5700 Series is unknown. Contact the administrator to install the driver before you log in again.
23.08.2007 10:57:23 TermServDevices Error None 1111 N/A MSNODE2 Driver hp LaserJet 1000 required for printer !!C528!hp LaserJet 1000 is unknown. Contact the administrator to install the driver before you log in again.
23.08.2007 10:57:23 Print Information None 2 NT AUTHORITY\SYSTEM MSNODE2 Printer HP LaserJet 2100 PCL6 on c634.local.polad.ru (from N486) in session 1 was created.
23.08.2007 10:57:22 Print Information None 9 NT AUTHORITY\SYSTEM MSNODE2 Printer HP LaserJet 2100 PCL6 on c634.local.polad.ru (from N486) in session 1 was set.
23.08.2007 10:57:57 ClusSvc Information Event Logger 1202 N/A MSNODE1 The time delta between node MSNODE1 and node MSNODE2 is 269397841(in 100 nanosecs).
23.08.2007 10:26:26 Print Warning None 3 NT AUTHORITY\SYSTEM MSNODE2 Printer HP LaserJet 2100 PCL6 on c634.local.polad.ru (from N486) in session 1 was deleted.
23.08.2007 10:26:26 Print Warning None 4 NT AUTHORITY\SYSTEM MSNODE2 Printer HP LaserJet 2100 PCL6 on c634.local.polad.ru (from N486) in session 1 is pending deletion.
23.08.2007 10:26:26 Print Warning None 8 NT AUTHORITY\SYSTEM MSNODE2 Printer HP LaserJet 2100 PCL6 on c634.local.polad.ru (from N486) in session 1 was purged.

Application event log (because I use the Russian edition of SQL Server 2005, some events contain Russian text; I have added the English translation in brackets):

23.08.2007 10:59:36 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] checkODBCConnectError: sqlstate = 08001; native error = 0; message = [Microsoft][SQL Native Client]Не удается завершить вход в систему из-за задержки при открытии соединения с сервером (Unable to complete login process due to delay in opening server connection)
"
23.08.2007 10:59:36 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] ODBC sqldriverconnect failed
"
23.08.2007 10:58:51 Application Hang Error (101) 1002 N/A MSNODE2 Hanging application cluadmin.exe, version 5.2.3790.3959, hang module hungapp, version 0.0.0.0, hang address 0x00000000.
23.08.2007 10:57:59 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Ошибка связи (Communication link failure)
"
23.08.2007 10:57:59 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
23.08.2007 10:57:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Ошибка связи (Communication link failure)
"
23.08.2007 10:57:58 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
23.08.2007 10:57:57 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Ошибка связи (Communication link failure)
"
23.08.2007 10:57:57 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
23.08.2007 10:57:55 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Ошибка связи (Communication link failure)
"
23.08.2007 10:57:55 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
23.08.2007 10:57:54 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Ошибка связи (Communication link failure)
"
23.08.2007 10:57:53 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"
23.08.2007 10:57:51 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] OnlineThread: QP is not online.
"
23.08.2007 10:57:51 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Ошибка связи (Communication link failure)
"
23.08.2007 10:57:50 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Поставщик TCP (TCP Provider): An existing connection was forcibly closed by the remote host.

"
23.08.2007 10:57:50 MSSQLSERVER Error (3) 19019 N/A MSNODE1 "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
"

I need help!

-

Andrew Mishechkin

ICQ: 101861332

|||

Thanks for the logs.

The application log shows a connection error, which then escalates to a ClusSvc error in the system log. Next, I would suggest checking for faulty hardware (e.g., NICs). I've seen strange and hard-to-diagnose issues caused by faulty hardware.
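
One detail in the system log above may be worth a second look: the two event 1202 entries report the time delta between MSNODE1 and MSNODE2 in 100-nanosecond units. A quick conversion, sketched below in Python, puts the entry at 10:57:57 at roughly 26.9 seconds and the one at 10:57:58 at roughly 0.18 seconds. Whether that has anything to do with the failure is not established in this thread; the numbers are simply easier to reason about in seconds. The function name is made up for illustration.

```python
# Event 1202 reports the inter-node time delta in units of 100 nanoseconds,
# so there are 10,000,000 units per second.
UNITS_PER_SECOND = 10_000_000


def delta_to_seconds(delta_100ns: int) -> float:
    """Convert a ClusSvc event 1202 time delta to seconds."""
    return delta_100ns / UNITS_PER_SECOND


if __name__ == "__main__":
    # Values copied from the system event log posted above.
    for stamp, raw in (("10:57:57", 269397841), ("10:57:58", 1751833)):
        print(f"{stamp}: {raw} x 100 ns = {delta_to_seconds(raw):.3f} s")
```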

|||We also have exactly the same error, and we are using an HP DL 585 G2 running Win2k3 SP1. Are there any steps I can take to isolate it further (tools for diagnostics)?

thanks,
Tony B|||

Hello,

I have the same problem with a SQL 2005 SP2 cluster. I get several SQL 2005 errors in the application log and, a little later, an error in the system log about the cluster service.
This is the cluster:

*clusternode1: HP DL50G5. 4 P dual core with HTT. W2003 R2 EE SP2 x64.
*clusternode2: the same as clusternode1.


I tried to solve the problem by taking these actions:

*Disable the ilo driver.
*Update NICs drivers.
*Create a DWORD SynAttackProtect with value 0.


Any ideas on how to solve the problem?

Thanks
pablo

cluster error

I think I originally posted this in the wrong forum of SQL Server General Tools.

We have a one-node cluster that is receiving the error messages below in the cluster.log and the Event Viewer. In the SQL Server 2005 SP2 job that fails, we receive a slightly different message. The OS is Windows 2003 SP1. This is not a 64-bit box. We have checked the permissions of the cluster service account in SQL Server to be sure they are correct. Can anyone please help with this issue?

00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]TCP Provider: An existing connection was forcibly closed by the remote host.

00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] OnlineThread: QP is not online.
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
00000a0c.0000089c::2007/06/12-12:51:27.459 ERR SQL Server <SQL Server>: [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure
