Archive for August, 2006

HP-UX and Software Raid1

Tuesday, August 8th, 2006

Today I installed HP-UX 11i v2 on a PA-RISC server using the “Technical Environment” DVDs, and the installation went quite smoothly. I was unable, however, to find the RAID1 (mirror) tools for LVM.

Symptoms: “lvextend” has no “-m” parameter. According to the documentation (or, even better, HP Forum1 and HP Forum2), mirroring is plain simple with lvextend. Only at this point did I figure out that mirroring is part of the LVM package for Enterprise Servers.

I finally found it on the DVD called “Mission Critical Operating Environment DVD 1”, inside a bundle called “HPUX11i-OE-Ent”. I selected “LVM” from the list there, installed it, let the system recompile the kernel, and rebooted. After that, lvextend started accepting the “-m” flag.

Following the posts mentioned above, I ran:

for LVOL in /dev/vg00/l* ; do
    lvextend -m 1 "$LVOL"
done

Took a while, but at least it worked.
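
To confirm that each logical volume actually ended up mirrored, the output of lvdisplay can be summarized. The following is a minimal sketch, not something from the original procedure: the function name summarize_mirrors is hypothetical, and it assumes lvdisplay prints the usual “LV Name”, “LV Status” and “Mirror copies” lines, in that order.

```shell
#!/bin/sh
# Summarize mirror state from `lvdisplay` output read on stdin.
# Assumption: each volume is reported with "LV Name", "LV Status" and
# "Mirror copies" lines, in that order.
summarize_mirrors() {
    awk '
        /^[ \t]*LV Name/       { name = $3 }
        /^[ \t]*LV Status/     { status = $3 }
        /^[ \t]*Mirror copies/ { print name, "copies=" $3, "status=" status }
    '
}
```

It could be used like this: for LVOL in /dev/vg00/l* ; do lvdisplay "$LVOL" ; done | summarize_mirrors. A volume whose mirror has finished synchronizing should show copies=1 and a status ending in “syncd”.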

NFS problems in failover – MC Service Guard. Applicable to other Linux HA clusters

Monday, August 7th, 2006

Problem: Two Linux servers (RHEL4) run an NFS server in High-Availability (failover) mode. When the resources fail over, an NFS client continues to work. When they fail back, the NFS client times out for 5+ minutes.

Further problem information: Under RHEL3, the exact same configuration worked flawlessly.

Solution: set the NFS mount options to use UDP instead of TCP.

Explanation: RHEL3 used NFS3 with UDP by default. RHEL4 uses NFS4 with TCP by default, which is a significant difference between the two.

After searching the web for a while to better understand the cause of the problem, I discovered an article on linux-ha (which looks like a very good place to visit if you’re into HA in Linux environments) that recommended using UDP instead of TCP. Quote:

"If your kernel defaults to using TCP for NFS (as is the case in 2.6
kernels), switch to UDP instead by using the ‘udp’ mount option. If you
don’t do this, you won’t be able to quickly switch from server "A" to
"B" and back to "A" because "A" will hold the TCP connection in
TIME_WAIT state for 15-20 minutes and refuse to reconnect.
" (quoted from the "Hints" section).

So, although I did not expect this cause (I had a hunch about the portmapper), the suggested solution worked fine, and only later did we come to understand why. Good.
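
As a rough illustration of the ‘udp’ mount option from the quote, the sketch below shows how a client might mount with it and then check which transport each NFS mount uses. The hostname nfs-ha, the paths /export and /mnt/data, and the function name nfs_transport are all hypothetical. The check assumes that /proc/mounts lists the transport either as a bare udp/tcp option or as proto=udp/proto=tcp, which depends on the kernel version.

```shell
#!/bin/sh
# Hypothetical example values: nfs-ha is the cluster's floating hostname,
# /export the exported directory, /mnt/data the client mount point.
#
# Mount on the client (as root) with the udp option:
#   mount -t nfs -o udp nfs-ha:/export /mnt/data
#
# Equivalent /etc/fstab entry:
#   nfs-ha:/export  /mnt/data  nfs  udp  0 0

# Print "<mount point> <transport>" for every NFS mount described on stdin
# in /proc/mounts format. Assumption: the transport appears in the options
# field as udp/tcp or proto=udp/proto=tcp; otherwise "unknown" is printed.
nfs_transport() {
    awk '$3 == "nfs" {
        proto = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "udp" || opts[i] == "proto=udp") proto = "udp"
            if (opts[i] == "tcp" || opts[i] == "proto=tcp") proto = "tcp"
        }
        print $2, proto
    }'
}
```

Running nfs_transport < /proc/mounts on the client after remounting should then report udp for the failover export.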