Archive for October, 2005

Why people use Linux

Sunday, October 30th, 2005

Tom Adelstein has posted an article about why people use Linux. Although I tend to stick to the tech stuff, it was rather interesting. One could never imagine that MS bashing wasn’t one of the major reasons… :-)

In the meanwhile, I have had fun with Cacti. It is one mean graphing machine. Details, as well as other much sought-after information, will follow when I get the time to arrange some of it into writing and into public scripts.

Server online – finished migration!

Tuesday, October 25th, 2005

It was a tough one. A real tough one. However, our little Dell server, the PE1800 (described below), is finally up and running, replacing its five-year-old predecessor. It was a living hell, but it allowed me vast experiments with a wider range of technologies, better combined solutions, and a newer, more mature approach to system management.

I can proudly say I’ve been pushing towards the "no shell" solution, where everything is either completely automatic or handled through a web interface which allows even a non-tech user to manage things, so that no other tool is required for managing the server.

It’s not that I’m abandoning the CLI. On the contrary. I can now choose whether to use it or not, unlike before, when it was not a matter of choice, but of necessity.

I wish ISPMAN would give better permission and authorization granularity. I have developed a small wish-list of my own, but I don’t think it will ever come true. I will describe it here, for the sport of it:

1) Allow domain users to manage their own passwords and GECOS-like information

2) Allow setting up a "master domain", whose login names get no extras appended. I wish I could log in using "username" and not "username_my_domain_com". Allow me to set a single default at init time, and you’d make me happy. I’ve had to do so myself, for many scripts.

3) Allow me to define a few levels of domain manager. I want "administrative" and "technical" domain managers, whose permissions may or may not overlap.

4) I want to select, using the web interface, what each of the above-mentioned roles is able to do. Let me state that the "administrative" domain manager cannot change DNS settings, but can control users. Let me select what each of them can and cannot do using the web GUI.

5) Better documentation. I might contribute some to it, and I try, when I remember, and have time, to share some of my knowledge with the readers of this specific blog.
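
As a rough illustration of item 2, here is a minimal sketch of the kind of login-name mapping I’ve had to put into my scripts. The function name and the master_domain parameter are hypothetical; the "master domain" is the wished-for feature, not an existing ISPMan option.

```python
def ispman_login(username, domain, master_domain=None):
    """Build the login name for a user of the given domain.

    Users of the (hypothetical) master domain keep the bare username;
    everyone else gets the domain appended, with dots turned into underscores.
    """
    if master_domain is not None and domain == master_domain:
        return username
    return username + "_" + domain.replace(".", "_")

# Example calls (no master domain set, then with one):
# ispman_login("username", "my.domain.com")
# ispman_login("username", "my.domain.com", master_domain="my.domain.com")
```

With no master domain set, every domain behaves as it does today; setting one changes the result for that domain only.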

Well… It’s up and running, dealing with real DNS requests, and tagging, queuing and delivering real mail messages, being under real attacks. My Baby :-)

Over the next few days I’ll probably have a better perspective on the success of the migration.

SunCluster and VxVM – post installation

Sunday, October 23rd, 2005

When I installed my lab’s SunCluster, I did not install Veritas Volume Manager (VxVM) at first. I believed my development dept. wouldn’t need it, but I was proved wrong. Only a short while afterwards I was asked to add VxVM to the configuration. I added it, installing it with the Veritas install scripts. I used vxsetup to configure only disks which were not the bootable disks, as I know the consequences of early development stages with hard-to-access volumes.

Now, a while later, I’ve been asked to encapsulate the bootable disk, and add it into VxVM.

After I manually configured one node (and discovered that the SC global devices were encapsulated, but the /etc/vfstab entries for their DID devices were not updated correctly), I read /usr/cluster/bin/scvxinstall and deduced how Sun do it. It appears they change the entry in /etc/vfstab to point to the real physical dev entry of the global device (in my case, for example, /dev/dsk/c0t0d0s3 instead of /dev/did/dsk/d1s3). When set that way, encapsulation (using vxdiskadm, for example) can easily be done. Afterwards, the system can boot correctly. To be on the safe side, after setting up the encapsulation I rebooted the machine using the command "reboot -- -x", so SunCluster and Veritas wouldn’t have any possible collision. It worked, to my full surprise, and it’s now up and running!
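
The vfstab change described above can be sketched roughly as follows. This is only an illustration, not what scvxinstall actually runs: the DID-to-disk mapping holds just the single example from my case, and the sample line is an assumed global-devices entry.

```python
import re

# Assumed mapping from DID device names to physical disks. The single
# entry mirrors the example in the text and will differ on every node.
DID_TO_PHYSICAL = {"d1": "c0t0d0"}

# Matches both the block (/dev/did/dsk) and raw (/dev/did/rdsk) paths.
DID_PATH = re.compile(r"/dev/did/(r?dsk)/(d\d+)(s\d+)")

def rewrite_vfstab(text, mapping=DID_TO_PHYSICAL):
    """Replace /dev/did/{dsk,rdsk}/dNsM paths with their physical equivalents."""
    def swap(match):
        kind, did, disk_slice = match.groups()
        if did not in mapping:
            return match.group(0)  # leave unknown DID devices untouched
        return "/dev/" + kind + "/" + mapping[did] + disk_slice
    return DID_PATH.sub(swap, text)

# Illustrative vfstab line for the node's global devices file system.
sample = "/dev/did/dsk/d1s3 /dev/did/rdsk/d1s3 /global/.devices/node@1 ufs 2 no global\n"
print(rewrite_vfstab(sample), end="")
```

Working on the text of the file, rather than editing it in place, makes it easy to review the result before copying it over /etc/vfstab.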

Process monitoring, Keepalive, etc

Sunday, October 23rd, 2005

My new Linux server-to-be will require some remote monitoring and process keepalive. I’ve noticed that nscd (which is required when dealing with hundreds of LDAP-based accounts) tends to die once in a while. I’ve also made a mistake once, and managed to kill all SSH daemons, including the running ones. I am happy to say it was solved by going down one floor, connecting a screen to the machine, and restarting the service. However, it would have been nasty had it happened in the relocation room, inside our ISP’s server farm…

Since I’m trying to solve problems *before* they appear, I’ve decided to search for a process KeepAlive daemon, or something else which will ease my life and make sure I don’t get any phone calls.

At first, searching for "process keepalive" led me to some pages about HA servers, aka High Availability clusters. I don’t need multi-node keepalive, so I didn’t bother with it. Installing Centos’ or Dag’s keepalived proved to be exactly the thing I was not looking for. So I’ve removed it, and kept on searching.

In the process, I found this link, a script which should be put into cron. Nice going for one or two processes, but maintaining a full load of about 10 processes, which I must keep alive at all times, is a bit too much for this one. Without being able to code perl, I needed something else, more scalable.
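
To show what a cron-driven check of this kind boils down to, here is a minimal sketch. It is not the linked script; the watch list and the restart commands are assumptions for illustration.

```python
import subprocess

# Hypothetical watch list: process name -> command that restarts it.
WATCHED = {
    "nscd": ["/sbin/service", "nscd", "restart"],
    "sshd": ["/sbin/service", "sshd", "restart"],
}

def is_running(name):
    """True if pgrep finds at least one process with exactly this name."""
    return subprocess.call(["pgrep", "-x", name], stdout=subprocess.DEVNULL) == 0

def check_all(watched, is_running=is_running, restart=subprocess.call):
    """Restart every watched process that is not running; return their names."""
    restarted = []
    for name, command in watched.items():
        if not is_running(name):
            restart(command)
            restarted.append(name)
    return restarted

# From cron, one would call check_all(WATCHED) every few minutes
# and log whatever names it returns.
```

Every additional daemon means another entry to maintain, plus logging and retry limits to write by hand, which is exactly where this approach stops scaling.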

I’ve seen lots of things, and some of them looked like they could interest me, but I wanted it as part of my package tree. I wanted it to be an RPM, so I would be able to upgrade it if there are updates. All this without actually tracking each package in person (which is a good enough reason to have a package management system in the first place).

In Dag Wieers’ RPM repository I was able to find just the thing for me. It’s called "monit". It took me about 10 minutes to set it up and get it working, tested, for most of my more important daemons.

An example configuration file is here: monit.conf

It works, and it has made my life a lot easier. I can easily recover from both human mistakes and machine errors now. I might add some mail notification, but for now I will settle for logs only.

ISPMan – Towards the finish line

Friday, October 14th, 2005

Although, due to lack of time, I have not shared most of my work on ISPMan in this blog, I wish to share a script I’ve taken part in creating, which deals with the migration of a flat passwd file based authentication server to LDAP, used through ISPMan. We’ve taken many things into consideration, such as first/last name, the crypt method (we were sure it was MD5, but it appears to be crypt), etc.

So, we have a running script which can either add (correctly!) or remove users and assign passwords for them, assuming your ISPMan is set up correctly.

Since I’ve been looking for something like that myself, I’ve decided to post it here under the GPL license, so everyone migrating to ISPMan, whether porting a single server/domain or a whole lot of them, can use it and alter it to his/her needs. I wish you all better luck than mine in finding one of those :-)

import_passwd.sh
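
The script itself is the one linked above. As a rough illustration of the two considerations mentioned (names and crypt method), here is a minimal sketch; it is not the script’s code. The function names are hypothetical, and it assumes the hash sits in the second field of a standard seven-field passwd line.

```python
def parse_passwd_line(line):
    """Split one passwd(5) line into the fields a directory entry needs."""
    login, pw_hash, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    # The GECOS field is comma separated; the first part is the full name.
    full_name = gecos.split(",")[0].strip()
    first, _, last = full_name.partition(" ")
    return {
        "login": login,
        "hash": pw_hash,
        "uid": int(uid),
        "gid": int(gid),
        "first_name": first,
        # LDAP person entries require a surname, so fall back to the
        # first name when GECOS holds a single word (an assumption).
        "last_name": last or first,
        "home": home,
        "shell": shell,
    }

def hash_scheme(pw_hash):
    """Guess the crypt(3) method from the shape of the hash."""
    if pw_hash.startswith("$1$"):
        return "md5-crypt"
    if len(pw_hash) == 13:
        return "des-crypt"  # traditional crypt: 2 salt chars + 11 hash chars
    return "unknown"

def ldap_user_password(pw_hash):
    """Both methods can be stored as-is under the {CRYPT} scheme."""
    return "{CRYPT}" + pw_hash
```

Checking the hash format first is what would have saved us the MD5 surprise: a 13-character hash with no "$1$" prefix is traditional crypt.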

SunCluster, VxVM, and a system image. Sounds nice, right? No.

Tuesday, October 11th, 2005

Due to a customer’s problem, and due to the expense of sending a person over, they’ve decided at my job to ask the customer to send us a ufsdump of one of his SunCluster nodes, so we could try to imitate his environment in our labs. Well, it is hardly as simple as that. The computer’s settings are as follows:

1) Veritas Foundation Suite (VxVM, especially) in use for the "/", encapsulated, as well as swap and /var.

2) Single node of a whole SunCluster.

I’ve tried to make it work. First, I’ve noted there’s no guide in the world called "SunCluster Troubleshooting". You can work with SunCluster from within, but you cannot (officially, at least) work on it from outside of it. Every document in the world uses the sc* commands for actions on the cluster node. However, when SunCluster is malfunctioning, the machine doesn’t boot up completely. If, like me, you have to boot the machine (Sun Sparc) using the "boot -x" flag, you won’t be able to maintain the cluster. The only docs I was able to find containing the combination "SunCluster Troubleshooting" were people’s online C.Vs.

The first part was to boot the encapsulated root slice. I’ve had to boot from CD (I use a purposely broken JumpStart, which is designed to leave me with a shell on the machine), edit /etc/vfstab, edit /etc/system (so it won’t map the root slice into VxVM), edit /etc/hosts (for the machine’s IP), change /etc/hostname.<something> to /etc/hostname.hme0 (due to the hardware layout), change /etc/defaultrouter to point to my own router, and remap the devices – I’ve had to manually relink /dev/rdsk/c0t0d0s* to /devices/pci@……/…./…@disk:a etc, etc. Dirty job, but it finally was able to boot (using the -x flag), and left me with a crippled system, yelling about VxVM and remapped disk devices. Great. Now I’ve had to clear the VxVM settings somehow, recreate (and then re-encapsulate) the root slice, and get the machine towards booting up and working. It wasn’t simple, and it took me a while to understand how to get to it, especially since vxconfigd was screaming about RPC errors and stale configuration, and was unable to perform at all. That will be added to the blog later.
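
For the /etc/system step, a minimal sketch of the edit is below. It assumes the two entries usually added by root encapsulation (rootdev:/pseudo/vxio@0:0 and set vxio:vol_rootdev_is_volume=1); check what your own file actually contains before relying on it. Comments in /etc/system start with an asterisk.

```python
# Assumed prefixes of the /etc/system entries that make the kernel
# mount the root slice through VxVM.
VXVM_ROOT_PREFIXES = ("rootdev:/pseudo/vxio", "set vxio:vol_rootdev_is_volume")

def disable_vxvm_root(system_text):
    """Comment out the VxVM root entries, leaving every other line untouched."""
    out = []
    for line in system_text.splitlines(keepends=True):
        if line.lstrip().startswith(VXVM_ROOT_PREFIXES):
            line = "* " + line  # '*' begins a comment in /etc/system
        out.append(line)
    return "".join(out)
```

Commenting the lines out instead of deleting them keeps the original values at hand for the later re-encapsulation.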

Cheers.

Customer’s site goes down

Saturday, October 1st, 2005

I’m not too happy about it, but we’ve tried to convince him to migrate his data to another server, or at least let us rebuild the server from scratch.

It’s one of those in this link, holding a few virtual machines (VServers), and it appears that the person who built it decided that the system would get a software mirror, and the data would get even better – a stripe! Nice going… :-)

So, this customer, although we’ve tried putting him on track, lost tons of data (he had no regular backup policy, so it’s around two weeks of 400 hosted sites. Nice… ), and for some reason, he still can’t understand that this person, who decided that the system volume is more important than the data volume, is a total jerk. He can’t seem to understand why now, after he has a new Raid5 array, he needs this backup. Nice.

So, without saying "told you so", I just keep a hidden smug smile. It will go away in a day or two.

Dell PowerEdge 1800 and Linux – Part 2

Saturday, October 1st, 2005

In this part – Me installing Centos 4.1 64bit, the x86_64 version.

Installation and boot from net (another chapter, soon to come) all work great. I’m using the same partition table / Raid / LVM I’ve created in my previous install on this server. All looks great, and I expect to find the same GRUB problem. I’m not disappointed.

However – a tricky part! This time, there is no Lilo in the yum repositories! This time I have to think of something, and finally I reach a simple decision – if I can’t get a 64bit Lilo version, I will use the 32bit version. I obtain it from my Centos 4.1 32bit installation storage, install it, and reconf / run it. It works like a charm. The system is up and running now.

Running "dd if=/dev/sda of=/dev/null bs=1M" gave me results of around 70MB/sec. Nice… :-)
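
For those who want the same measurement without dd, here is a minimal sketch of a sequential read test with the same 1MB block size. The function names are mine, and results on a regular file will be inflated by the page cache, so this is only a rough equivalent.

```python
import time

def read_throughput(path, block_size=1024 * 1024):
    """Read the whole file sequentially; return (bytes read, seconds taken)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 makes every read() a direct request of block_size bytes.
    with open(path, "rb", buffering=0) as source:
        while True:
            chunk = source.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start

def mb_per_sec(total_bytes, seconds):
    """Convert a byte count and a duration into MB/sec (1MB = 1048576 bytes)."""
    return total_bytes / (1024 * 1024) / seconds

# Usage (reading a raw disk such as /dev/sda requires root):
# total, seconds = read_throughput("/dev/sda")
# print(mb_per_sec(total, seconds))
```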

Next to do: install ISPMan on this system, and migrate a whole lot of user accounts directly into it.