Archive for October, 2005

SunCluster, VxVM, and a system image. Sounds nice, right? No.

Tuesday, October 11th, 2005

Due to a customer’s problem, and due to the expensive investment of sending a person over, it was decided at my job to ask the customer to send us a ufsdump of one of his SunCluster nodes, and we’d just try to imitate his environment in our lab (a rough sketch of the restore itself follows the list below). Well, it is hardly as simple as that. The machine’s setup is as follows:

1) Veritas Foundation Suite (VxVM in particular) in use for "/" (encapsulated), as well as for swap and /var.

2) Single node of a whole SunCluster.
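
Just to illustrate the "imitate his environment" part, restoring such a dump onto a lab disk goes roughly like this (the slice and file names here are placeholders for the example, not the real ones):

    # create a fresh filesystem on the lab slice, restore the customer's
    # dump onto it, then make it bootable (SPARC):
    newfs /dev/rdsk/c0t0d0s0
    mount /dev/dsk/c0t0d0s0 /mnt
    cd /mnt && ufsrestore rf /path/to/root.ufsdump
    installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0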

I’ve tried to make it work. First, I’ve noticed there is no guide in the world called "SunCluster Troubleshooting". You can work with SunCluster from within, but you cannot (officially, at least) work on it from the outside. Every document in the world uses the sc* commands for actions on a cluster node; however, when SunCluster malfunctions, the machine doesn’t boot up completely. If, like me, you have to boot the machine (Sun SPARC) using the "boot -x" flag, you won’t be able to maintain the cluster. The only documents I could find containing the combination "SunCluster Troubleshooting" were people’s online CVs.
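
For what it’s worth, booting the node outside of cluster control is done from the OpenBoot prompt:

    ok boot -x

This brings Solaris up without the cluster framework, so none of the sc* tools are of any use at that point.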

The first part was to boot the encapsulated root slice. I’ve had to boot from CD (I use a purposely broken JumpStart, designed to leave me with a shell on the machine), edit /etc/vfstab, edit /etc/system (so it won’t map the root slice into VxVM), edit /etc/hosts (for the machine’s IP), rename /etc/hostname.<something> to /etc/hostname.hme0 (due to the hardware layout), change /etc/defaultrouter to point to my own router, and remap the devices: I’ve had to manually relink /dev/rdsk/c0t0d0s* to the matching /devices/…/…@…:a entries, and so on. A dirty job, but the system finally was able to boot (using the -x flag), and left me with a crippled system, yelling about VxVM and remapped disk devices. Great. Now I had to clear the VxVM settings somehow, recreate (and then re-encapsulate) the root slice, and get the machine to boot up and work. It wasn’t simple, and it took me a while to figure out how to do it, especially since vxconfigd was screaming about RPC errors and a stale configuration, and was unable to do anything at all. That part will be added to the blog later.
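
For the curious, here is a rough sketch of the kind of edits involved, done from the JumpStart shell with the dumped root mounted under /a (the device names and slice numbers are placeholders, not the customer’s real layout):

    # /a/etc/vfstab - point /, /var and swap back at plain slices instead
    # of the /dev/vx/dsk/rootvol style entries:
    /dev/dsk/c0t0d0s0  /dev/rdsk/c0t0d0s0  /     ufs   1  no  -
    /dev/dsk/c0t0d0s3  /dev/rdsk/c0t0d0s3  /var  ufs   1  no  -
    /dev/dsk/c0t0d0s1  -                   -     swap  -  no  -

    # /a/etc/system - comment out (with a leading *) the lines that map
    # the root device into VxVM:
    * rootdev:/pseudo/vxio@0:0
    * set vxio:vol_rootdev_is_volume=1

If I remember correctly, touching /etc/vx/reconfig.d/state.d/install-db on the mounted root also keeps vxconfigd from starting at boot, which helps later when clearing the stale configuration.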

Cheers.

Customer’s site goes down

Saturday, October 1st, 2005

I’m not too happy about it, but we had tried to convince him to migrate his data to another server, or at least to let us rebuild the server from scratch.

It’s one of those described in this link, holding a few virtual machines (VServers), and it appears that the person who built it decided that the system would get a software mirror, while the data would get something even better: a stripe! Nice going… 🙂

So this customer, although we tried putting him on the right track, lost tons of data (he had no regular backup policy, so it’s around two weeks’ worth of 400 hosted sites. Nice…), and for some reason he still can’t understand that this person, who decided that the system volume was more important than the data volume, is a total jerk. He also can’t seem to understand why, now that he has a new RAID5 array, he still needs backups. Nice.

So, without saying "told you so", I just keep a hidden smug grin. It will go away in a day or two.

Dell PowerEdge 1800 and Linux – Part 2

Saturday, October 1st, 2005

In this part: me installing CentOS 4.1 64-bit, the x86_64 version.

Installation and booting from the network (another chapter, soon to come) all work great. I’m using the same partition table / RAID / LVM layout I created in my previous install on this server. All looks great, and I expect to hit the same GRUB problem. I’m not disappointed.

However, here comes a tricky part: this time there is no LILO in the yum repositories. I have to think of something, and finally I reach a simple decision: if I can’t get a 64-bit LILO, I’ll use the 32-bit version. I grab it from my CentOS 4.1 32-bit installation media, install it, reconfigure it, and run it. It works like a charm, and the system is up and running now.
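
For completeness, the whole trick boils down to something like this (the exact package version is whatever ships on the CentOS 4.1 i386 media, and the entries in lilo.conf are obviously system-specific):

    # install the 32bit package on the x86_64 system
    rpm -ivh lilo-*.i386.rpm
    # point /etc/lilo.conf at the right disk, kernel and initrd,
    # then write the boot sector
    vi /etc/lilo.conf
    /sbin/lilo -v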

Running "dd if=/dev/sda of=/dev/null bs=1M" gave me results of around 70MB/sec. Nice… 🙂

Next to do: install ISPMan on this system, and migrate a whole lot of user accounts directly into it.