Showing posts with label availability. Show all posts

Tuesday, March 27, 2012

Cluster recovery from node failure

Cluster services give us the high availability we need, which is great.
But I have never seen any discussion of what happens when a node
fails: what do you do to get everything back to the active-passive
tandem?

I imagine the recovery procedure does not differ much between the
active and the passive node, so I will describe a scenario we have
encountered. The system hard drive (not the shared disk) on the primary
node fails, and the cluster fails over to the passive node. These are
the problems I have at hand:
- After reinstalling Windows, I need to install the drivers and configure
the permissions to access the SAN. I see no way to do that, since the
secondary node has exclusive access to the disks.
- Assuming I get that working, is there any way to install SQL Server so
that it knows this server used to be the primary node and attaches the
database and transaction log automatically?
- Finally, there is no proper way to apply SQL 2000 Service Pack 3a.
Originally, when the cluster was fully functional, the service pack was
applied on the active node and that automatically upgraded the passive
node. Now we have one machine without 3a and one machine with 3a already
installed. See any problem?

Consider all of the above as one big question: what is the proper
procedure to restore a cluster when one of the nodes goes down, whether
it is the active or the passive node?

"gotdough" <praemonitus@.hotmail.com> wrote in message
news:1ad01306.0409120058.3df26726@.posting.google.com...

This KB article might help:

http://support.microsoft.com/defaul...0&Product=sql2k

You should probably also post this in microsoft.public.sqlserver.clustering
to see if you get a better response.

Simon
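
The service pack concern in the question comes down to the two nodes ending up on different SQL Server builds after one of them is rebuilt. As a rough illustration only, the sketch below compares dotted build strings gathered from each node and lists the nodes that lag behind the newest one. The node names and build strings are hypothetical example values, and how each node's build is collected is deliberately left out.

```python
from typing import Dict, List, Tuple


def parse_build(version: str) -> Tuple[int, ...]:
    """Turn a dotted build string such as '8.00.760' into (8, 0, 760)."""
    return tuple(int(part) for part in version.split("."))


def nodes_behind(builds: Dict[str, str]) -> List[str]:
    """Return the nodes whose build is lower than the highest build seen."""
    newest = max(parse_build(v) for v in builds.values())
    return sorted(n for n, v in builds.items() if parse_build(v) < newest)


# Hypothetical node names and build strings, for illustration only.
cluster = {"NODE1": "8.00.194", "NODE2": "8.00.760"}
print(nodes_behind(cluster))  # ['NODE1']
```

An empty list would mean every node reports the same build, which is the state the cluster should be back in before it is considered restored.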

Sunday, March 25, 2012

Cluster or Mirror

Hi,
I'm currently looking into the possibilities for a SQL Server install. The web
project needs a SQL database with high availability. Speed and size are not
that important, as the load will be medium (especially for the new enterprise
class servers), but it cannot be offline for more than a few minutes. Assuming
there is sufficient budget, which option would be preferred:
HP DL380 cluster with SAN: disk set in SAN RAID 1+0 with SQL Enterprise and
SQL virtual server setup
OR
2x fully redundant HP DL380s, each with its own disk set (RAID 1+0), one active SQL
Std and one passive, with replication using DoubleTake, Neverfail or the like.
The failover should also be automatic and should react within 5 minutes at
most.
Any ideas on pros and cons?
Thanks

Have you looked at Log Shipping as an alternative?
Regards
Darryl Pollock
Squirrel Consulting
www.remotesquirrel.com - Performance Monitoring - Anytime ! Anywhere!
"christophe" <christophe@.discussions.microsoft.com> wrote in message
news:250E4D88-6464-4E88-80AA-E0D8262B4CD1@.microsoft.com...

Yes, but the data can be out of sync for a few minutes at most, and from what
I have read, replication is faster for something like that. I also need an
automated switch to the standby server in case of a failure of the primary
server. The application only knows 1 SQL server connection.
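
To make the "automatic switch within 5 minutes" requirement concrete, here is a minimal sketch of the detection side only: a monitor that declares the primary dead after a number of consecutive failed health probes, plus a worst-case estimate of how long that takes. The class, the function and all the numbers are assumptions for illustration; they are not part of Windows Clustering, DoubleTake or Neverfail, and the time the standby needs to come online is not included.

```python
class FailoverMonitor:
    """Counts consecutive failed probes of the primary server."""

    def __init__(self, failures_needed: int) -> None:
        self.failures_needed = failures_needed
        self.consecutive_failures = 0

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result; return True once failover should start."""
        if probe_ok:
            self.consecutive_failures = 0  # a good probe resets the count
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failures_needed


def worst_case_detection_seconds(interval_s: float, timeout_s: float,
                                 failures_needed: int) -> float:
    """Worst case: the primary dies right after a good probe, so every
    failed probe costs one wait interval plus one full timeout."""
    return failures_needed * (interval_s + timeout_s)


# Hypothetical settings: probe every 10 s, 5 s timeout, 3 failures in a row.
monitor = FailoverMonitor(failures_needed=3)
probes = [True, False, False, True, False, False, False]
decisions = [monitor.record(ok) for ok in probes]
print(decisions.index(True))                    # 6
print(worst_case_detection_seconds(10, 5, 3))   # 45
```

Requiring several failures in a row avoids failing over on a single dropped probe, at the cost of a longer detection time that has to fit inside the 5-minute budget together with the standby's start-up.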
"Darryl Pollock" wrote:

Hi
Just remember that neither clustering nor disk/volume-level replication protects
you against database or disk corruption. With replication, the corruption is
faithfully propagated to the other drives; with a cluster, it sits on the single
copy on the SAN.
For SQL 2000, Microsoft Windows Clustering is the way to get the least downtime,
and no human intervention is required. Clusters typically fail over in <20
seconds, depending on how many transactions need to be rolled forward/back.
Regards
Mike
"christophe" wrote:

I would argue that the time required to fail over is a function of how long
it takes for the transaction log to be applied from the failed node to the
node taking over.
For most production systems with VLDBs you should be dumping the log very
frequently, i.e. at intervals of under 5 minutes.
While 20 s is an often-quoted statistic, I think 1-2 minutes might be more
realistic.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
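
The disagreement between "under 20 seconds" and "1-2 minutes" can be pictured with a simple additive model: detection time, plus the time to start the service on the other node, plus the time to roll the outstanding log forward or back. The sketch below is only that model; the function name, the parameters and every number in it are made up for illustration and are not measurements.

```python
def estimated_failover_seconds(detect_s: float, start_s: float,
                               log_mb_to_recover: float,
                               recovery_mb_per_s: float) -> float:
    """Detection + service start + time to roll the pending log forward/back."""
    if recovery_mb_per_s <= 0:
        raise ValueError("recovery_mb_per_s must be positive")
    return detect_s + start_s + log_mb_to_recover / recovery_mb_per_s


# Hypothetical figures: the same cluster with little vs. much log to recover.
print(estimated_failover_seconds(10, 5, 50, 10))    # 20.0
print(estimated_failover_seconds(10, 5, 1000, 10))  # 115.0
```

Under this model the fixed costs are the same in both cases, and the amount of log still to be recovered decides whether the failover lands near 20 seconds or closer to two minutes.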
"Mike Epprecht (SQL MVP)" <mike@.epprecht.net> wrote in message
news:0B6AD0FE-AFC4-46F3-AAAD-031C1F179600@.microsoft.com...