NFS problems in failover – MC Service Guard. Applicable to other Linux HA clusters

By Etzion 07/08/2006

Problem: Two Linux servers (RHEL4) run an NFS server in high-availability (failover) mode. When the resources fail over to the second node, an NFS client continues to work. When they fail back to the original node, the NFS client hangs and times out for 5+ minutes.

Further problem information: On RHEL3, the exact same configuration worked flawlessly.

Solution: mount NFS over UDP instead of TCP, using the ‘udp’ mount option.

Explanation: RHEL3 used NFS3 with UDP by default. RHEL4 uses NFS4 with TCP by default, which is a significant difference between the two.

After searching the web for a while to better understand the cause of the problem, I found an article on linux-ha (which looks like a very good place to visit if you’re into HA in Linux environments) that recommends using UDP instead of TCP. Quote:

"If your kernel defaults to using TCP for NFS (as is the case in 2.6
kernels), switch to UDP instead by using the ‘udp’ mount option. If you
don’t do this, you won’t be able to quickly switch from server "A" to
"B" and back to "A" because "A" will hold the TCP connection in
TIME_WAIT state for 15-20 minutes and refuse to reconnect." (quoted from the "Hints" section).

So, although I did not expect this cause (I had a hunch it was the portmapper), the suggested solution worked fine, and only later did we come to understand why. Good.
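To check which clients are still exposed to this, one could scan the mount table for NFS mounts that use TCP. The snippet below is only a rough sketch, not something from the original setup: the function name is made up, it assumes the mount table is in the usual /proc/mounts layout (device, mount point, type, options), and it assumes the transport shows up in the options either as a bare "tcp" or as "proto=tcp", which can differ between kernel versions.

```python
def nfs_mounts_using_tcp(mounts_text):
    """Return mount points of NFS mounts whose options select TCP.

    mounts_text is expected to look like the contents of /proc/mounts:
    device, mount point, filesystem type and a comma-separated options
    field, separated by whitespace.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fstype, options = fields[1], fields[2], fields[3]
        if fstype not in ("nfs", "nfs4"):
            continue  # not an NFS mount
        opts = options.split(",")
        # Assumption: the transport is reported as "tcp" or "proto=tcp".
        if "tcp" in opts or "proto=tcp" in opts:
            found.append(mount_point)
    return found


# Hypothetical sample data; on a real client, read /proc/mounts instead.
SAMPLE = """\
nfsha:/export /mnt/a nfs rw,vers=3,proto=tcp,addr=10.0.0.1 0 0
nfsha:/export2 /mnt/b nfs rw,udp,addr=10.0.0.1 0 0
/dev/sda1 / ext3 rw 0 0
nfsha:/export3 /mnt/c nfs rw,tcp 0 0
"""

print(nfs_mounts_using_tcp(SAMPLE))
```

Any mount point it reports is a candidate for remounting with the ‘udp’ option mentioned in the quote.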
