Posts Tagged ‘web server’

Expanding ks.cfg tweaks

Monday, July 9th, 2007

For the latest (and currently complete) ks.cfg I use, check this link. I have extended the logic in it, and ended up with the following. Only the %pre section is shown:

%pre
# By Ez-Aton http://www.tournament.org.il/run
for i in `cat /proc/cmdline`; do
echo $i >> /tmp/vars.tmp
done
grep "=" /tmp/vars.tmp > /tmp/vars
# Parse the command line. Using only vars of the form var=value (it doesn't
# matter what the actual value is)
KS=/tmp/ks.cfg
update=""
name=""
pkg=""
reboot=""
vmware=""
. /tmp/vars
# Shall we update the system during the %post section?
if [ ! -z "$update" ]; then
echo "yum update -y" >> $KS
fi
# Shall we reboot the system after the installation?
if [ ! -z "$reboot" ]; then
echo "reboot" > $KS.tmp
cat $KS >> $KS.tmp
cat $KS.tmp > $KS
fi
# What is the machine's hostname?
if [ ! -z "$name" ]; then
value="dhcp --hostname $name"
cat $KS | sed "s/dhcp/$value/" > $KS.tmp
cat $KS.tmp > $KS
fi
# Shall we add another package to the installation preset?
if [ ! -z "$pkg" ]; then
pkg_line=`grep -n ^%packages $KS | cut -f 1 -d :`
max_line=`wc -l $KS | awk '{print $1}'`
head -n $pkg_line $KS > $KS.tmp
# pkg is a comma-separated list of additional packages
for i in `echo $pkg | sed 's/,/ /g'`; do
echo $i >> $KS.tmp
done
let tail_line=$max_line-$pkg_line
tail -n $tail_line $KS >> $KS.tmp
cat $KS.tmp > $KS
fi
# Is it a virtual machine running on VMware? If so, we'll install vmware-tools
if [ ! -z "$vmware" ]; then
# We need the vmhttp value for the server. It can be the name and path
# of the web server
if [ -z "$vmhttp" ]; then
# my defaults
vmhttp="centos4-01"
fi
# The name of the rpm is always vmware.rpm
echo "wget http://$vmhttp/vmware.rpm" >> $KS
echo "rpm -i vmware.rpm" >> $KS
fi
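
To illustrate how these variables get in, the extra parameters are simply appended to the installer's boot line. For example (the server, hostname, and package names below are placeholders, not my actual values):

linux ks=http://my-install-server/ks.cfg update=1 reboot=1 name=web01 pkg=screen,lynx vmware=1

With such a line, the %pre script above appends the yum update for %post, prepends "reboot" to the generated kickstart, adds the hostname web01 to the dhcp line, and appends screen and lynx to the %packages section.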

Web server behind a web server

Friday, May 26th, 2006

I’ve acquired a new server which is to supply services to a certain group. In most cases, I would have used the PREROUTING chain of the nat table in iptables on my router, based on a rule such as this:

iptables -t nat -I PREROUTING -i <external_Interface_name> -p tcp -s <Some_IP_address> --dport 80 -j DNAT --to-destination <New_server_internal_IP>:80

I could do this trick for any other port just as well; however, I already have one web server inside my network, and I cannot know the source IPs of my special visitors in advance. Tough luck.

Falling back to a more application-level solution, I can use my existing Apache server, which already listens on port 80 and already gets the requests, with the mod_proxy directives and name-based virtual hosts.

Assuming the name of the server should be domain.com, and that the DNS entries are correct, I would add a directive such as the following to my vhosts.conf (or whatever other file contains your Apache2 virtual hosts configuration):

<VirtualHost *:80>
ServerName domain.com
ErrorLog logs/domain.com-error_log
CustomLog logs/domain.com-access_log common
ProxyRequests Off
<Proxy *>
Order deny,allow
Allow from all
</Proxy>

ProxyPass / http://<Internal_Server_IP_or_Name>/
ProxyPassReverse / http://<Internal_Server_IP_or_Name>/
</VirtualHost>

I’m not absolutely sure about the need for the logs, but I was able to spot a few issues by using them, such as the internal server being down, etc. I can see that the internal server is being accessed, and that it’s working just fine.

A note: if it’s the first name-based virtual host you’ve added, you will need to "readjust" your entire configuration to name-based virtual hosts. Name-agnostic (default) and name-based virtual hosts cannot reside on the same IP and port configuration. It just won’t work.
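
For example, a minimal sketch of that readjustment for Apache 2.0/2.2 (the server name and DocumentRoot below are placeholders for your previously-default site): enable name-based hosting on the address, and wrap the old default site in its own VirtualHost block so it becomes the first (catch-all) host:

NameVirtualHost *:80

<VirtualHost *:80>
ServerName www.example.com
DocumentRoot /var/www/html
</VirtualHost>

Any additional name-based hosts, like the proxy one above, are then simply added after it.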

RedHat Cluster, and some more

Sunday, February 12th, 2006

It’s been a long while since I’ve written. Once in a while, I get a period of time dedicated to laziness. I’ve had just one of those for the last few weeks, in which I’ve been almost completely idle. Usually, waking up from such idle time leads to a time of self-study and hard work, so I don’t fight my idle periods too hard. This time, I’ve had the pleasure of testing and playing, for personal reasons, both with VMware GSX, in a "Cluster-In-a-Box" setup, based on a recommendation regarding MSCS, altered for Linux (and later, Veritas Cluster Server), and with RedHat Cluster Server, with the notion of playing with RedHat’s GFS as well, but, regrettably, without getting to the last.

First, VMware. In their latest rivalry with Microsoft over the virtualization of servers and desktops, MS has gained an advantage lately. Due to the lower price of "Virtual Server 2005" compared with "VMware GSX Server", and due to their excellent marketing machine (from which we should all learn, if I may say!), not a few servers and virtual server farms, especially the ones running Windows-on-Windows setups, have moved to this MS solution, which is about as capable as VMware GSX Server. Judging by the history of such rivalries, MS would have won. They always have. However, VMware, in an excellent move, has announced that the next generation of their GSX, simply called "Server", will be free. Free for everyone. By this they probably mean to invest more in their more robust ESX server, and give GSX away as a taste of their abilities. Since MS does not have any product more advanced than their Virtual Server, it could mean a death blow to their effort in this direction. It could even mean they will give their product away as well! Meanwhile, we, the customers, gain a selection of free, advanced and reliable products designed for virtualization. Could it be any better than that?

One more advantage of this "Virtualization for the People" is that community-based virtual images, even of the most complicated-to-install setups, can and will become widely available, shortening installation time and giving everyone a quick working system. It will require, however, better knowledge and understanding of the products themselves, as merely installing them will not be enough. To survive the future market, you won’t be able to just sell an installation of a product; you will have to support an out-of-the-box setup of it. That’s for the freelancers, and the partial freelancers, among us…

So, I’ve reinstalled my GSX and started playing with it. The original goal was to actually run a working setup of RHEL, VCS and Oracle 10g. Unfortunately, VCS supports only RH3 (update 2?), and not RH4, which was a shame. At that point, I considered using RH Cluster Server for the task at hand. It grew into the task of learning this cluster server, and nothing more, which I did, and I can and will share my impressions of it here.

First, names. I’ve had the pleasure of working with numerous cluster solutions. Every time I play with another cluster solution, I get to enjoy the naming conventions and name changes vendors introduce just to keep themselves unique. I hate it. So here’s a little explanation:
All clusters contain a group of resources (a Resource Group, as most vendors call it). This group contains a set of resources and, in some cases, relations between them (order of startup, dependencies, etc.). Each resource can be any single element required for an application. For example, a resource can be an IP address, without which you won’t be able to contact the application. A resource can be a disk device containing the application’s data. It can be an application start/stop script, and it can be a sub-application: an application required for the whole group to be up, such as a DB for a DB-driven web server. The order you would ask them to start in would be IP, disk, DB, web server (in our case). You’d ask the IP to be brought up first because some cluster servers can trick IP-based clients into a short delay, so the client hardly feels the brief downtime of an application failover. But that is for later.

So, in a resource group, we have resources. If we can separate resources into different groups, when they have no required dependency between them, it is always better to do so. In our previous example, let’s say our web server uses the DB, but contacts it by IP address or by hostname. In this case, we don’t need the DB to run on the same physical machine the web server is running on, and so, assuming the physical disk holding the DB and the one holding the rest of the web application are not the same disk, we could separate them.

The idea, if I can try to sum it up, is to split your application into the smallest self-maintained structures. Each structure is called a resource group, and each component in this structure is a resource. On some cluster servers, one can group resource groups and set dependencies between them, which allows for even more scalability, but that is not our case.

So we have resource groups containing resources. Each computer that is a member of the cluster is called a node. Now, let’s assume our cluster contains three nodes, but we want our application (our resource group) to be able to run on only two specific ones. In this case, we need to define, for our resource group, which nodes are to be associated with it. In RH Cluster Server, a thing called a "Domain" (a failover domain) is designed for this. The domain contains a list of nodes. It can be associated with a resource group, and thus sets the failover priority and the set of nodes allowed to handle the resource group.
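
As a rough illustration only, such a domain is defined in cluster.conf along these lines (the domain and node names below are placeholders, and attribute names may vary between versions):

<failoverdomains>
<failoverdomain name="web_domain" ordered="1" restricted="1">
<failoverdomainnode name="node1" priority="1"/>
<failoverdomainnode name="node2" priority="2"/>
</failoverdomain>
</failoverdomains>

Here "ordered" sets a failover priority between the listed nodes, and "restricted" keeps the associated service from running anywhere else.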

All clusters have a single point of error (as opposed to failure). The whole purpose of the cluster is to give a non-cluster-aware application the high availability you could expect for a (relatively) low price. We’re great: we know how to bring an application up, and we know how to bring it down. We can assume when the other node(s) is/are down, but we cannot be sure of it. We try. We require a few separate means of communication, so that a single link failure won’t cause us to corrupt our shared volumes (by multiple nodes accessing them at once). We set up a whole system of logic, a heartbeat, you name it, to avoid, at almost all costs, a split-brain situation: two cluster nodes each believing they are the only one up. You can guess what that means, right?

In RH, there is a heartbeat, sure. However, it is based on bonding when there is more than one NIC, and not on separate infrastructures. It is a simple network-based heartbeat, with nothing special about it. In case of loss of connection, it will reset the unresponsive node, if it sees fit, using a mechanism they call a "fence". A fence is a means by which the cluster can *know* for sure (or almost for sure) that a node is down, or can physically take a node down (power it off if needed): control of the UPS the node is connected to, or of its power switch, or an alternate monitoring infrastructure such as the Fibre Channel switch, etc. In such an event, the cluster can know for sure, or at least assume, that the hung node has been reset, or it can force a reset, to release a hung application.

Naming: a resource group is called a Service. A resource remains a resource, but an application resource *must* be defined by an rc-like script which accepts start/stop (and restart?). Nothing complicated about it, really. The service contains all the required resources.
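
For completeness, such an rc-like wrapper can be as small as the sketch below ("myapp" and its path are placeholders, and a status action is usually expected as well):

#!/bin/bash
# Minimal rc-like wrapper for a cluster-managed application (sketch only)
case "$1" in
  start)   /usr/local/bin/myapp & ;;
  stop)    killall myapp ;;
  restart) $0 stop; sleep 1; $0 start ;;
  status)  pgrep myapp > /dev/null ;;
  *)       echo "Usage: $0 {start|stop|restart|status}"; exit 1 ;;
esac
exit $?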

I was not happy with the cluster, if I can sum up my issues with it. Monitoring machines (nodes) it did correctly, but in the simple enough example I chose to set up, using Apache as a resource (only afterwards did I notice it is the example RedHat uses in their documentation), it failed miserably to take the correct action when an application failed (as opposed to a failure of a node). I defined my "Service" to contain the following three items (a rough configuration sketch follows the list):

1) IP Address – Unique for my testing purposes.

2) Shared partition (in my case, and thanks to VMware, /dev/sdb1, mounted at /var/www/html)

3) The Apache application – “/etc/init.d/httpd”
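
In cluster.conf terms, that service looked roughly like the sketch below (the names, IP address, and filesystem details here are placeholders rather than my actual values, and attributes may differ between versions):

<rm>
<service name="web_service" domain="web_domain" autostart="1">
<ip address="192.168.10.50" monitor_link="1"/>
<fs name="web_data" device="/dev/sdb1" mountpoint="/var/www/html" fstype="ext3"/>
<script name="httpd" file="/etc/init.d/httpd"/>
</service>
</rm>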

All in all, it was brought up correctly, and switch-over went just fine, including in cases of both a clean and an unclean reset of the active/passive node. However, when I killed my Apache (killall httpd), the cluster detected the failure in the application, but was helpless with it. I was unable to bring down the "Service", as it failed to turn off Apache (duh!), so it released neither the IP address nor the shared volume. In the end, I had to restart the rgmanager service on both nodes, after manually removing the remains of the "Service". I didn’t like it. I expect the cluster to notice a failure in the application, which it did, but I expect it to either try to restart the application (/etc/init.d/httpd stop && /etc/init.d/httpd start) before failing it completely, or to set a flag saying it is down, remove the remains of the "Service" from the node in question (release the IP address and the shared storage), and try to bring it up on the other node(s). It did nothing of the sort. It just failed, completely, and required manual intervention.

I expect an HA cluster to be able to react to an application or resource failure, and not just to a node failure. Since HA clusters are meant for the non-ideal world, a place where computers crash, where hardware failures occur, and where applications just die while servers keep running, I expect the cluster server to be able to handle the full variety of problems, but maybe I was expecting too much. I believe it will be better in future versions, and I believe it could have been done quite easily right now, since detection of the failed application did occur, but it’s not for me to define the cluster’s abilities. This cluster is not mature enough for real-life production sites, if only because of its failure to react correctly to a resource failure without demanding manual intervention. A year from now, I’ll probably recommend it as a cheap and reliable solution for most common HA-related tasks, but not today.

That leaves me with VCS and Oracle, which I’ll deal with in the future, whether I like it or not 🙂

Finished customer’s project

Sunday, July 31st, 2005

It was long, it was tiresome, and it was nasty. We’ve been to a hosting farm at one of Israel’s largest ISPs, where their (and our) customer needed to relocate servers and change his servers’ IPs, settings, etc.

 

I don’t know why, but we tried to come as prepared as possible. One of the things you learn doing such projects in an uncontrolled environment, far away from your own personal lab, is this: "Trust no one". Just like X-Files, but for real.

 

If it’s not obvious, here’s an example. Assume you get there and find out you need some drivers for one of the machines. In a controlled environment, you would get these drivers from the Internet, but in an uncontrolled environment, you must make sure you bring them with you beforehand, and make sure the CD, floppy, USB port, or whatever is used there is actually functioning and in good condition. Not only that, you must make sure you either arrive with a whole pack of methods for getting the files/info/drivers/data onto the machine in question, or with a way of transferring between media types, like CD -> Disk on Key, or DoK -> floppy.

So, trying to be as prepared as possible for the machine transfer and change (plus an extra ~400 domains), we came with the following inventory:

  • 1 IBM 1U server, preinstalled with Linux, preconfigured as a DNS server and a web server saying "The server is under maintenance. It will be solved soon" or something like that.
  • 2 Laptops running Linux/Windows, including backup of all configurations of the Virtual servers, and the root servers.
  • Cables
    (We discovered only at the last minute that we wouldn’t get anything from the hosting farm. We had to bring it all with us. It was night, and we just grabbed anything we could, hoping it would do. It did.)
  • Tools
  • extras
  • Exact written procedure of which files to change, where, and into what. New IPs pre-assigned, passwords, etc.

 

We were only half prepared. Half prepared, because the one thing we didn’t predict was the ill-tempered and lazy SoB who was our contact at the farm. I have no idea why, and I do not care why, but he has some grudge against our (and his!) customer, and he did everything he could to "not help us". Meaning he didn’t deliberately hinder us, but he did the least he could to help, down to nothing.

 

Example? Sure. We needed a network link for the new rack, and he said we had one. I asked him to activate it, and soon he claimed he did. Not long after, when I reconfigured the router and moved it into the new location, I needed to connect it to this link. Not working. I started debugging the problem (maybe a bad cable, maybe an interface in "shut" mode, maybe we needed a laplink cable. Don’t know). Soon I had the obvious idea and asked him if the link was up. He said "No. I was just waiting for you". I asked him to bring the link up, keeping my temper down as much as possible. It took him 15-30 minutes, while we just stood and waited (it was a show stopper; you can’t start moving servers before you know where to connect them, right?). Finally, and after lots of intervention on our side (like testing and seeing the link was still down, changing cables, etc.), the link was brought up, and we could continue.

Things like this piss me off. You expect the man to do anything and everything he can to assist, so all of you can go home already (the job started at midnight), and this lazy SoB was supposed to hand us the cable link, everything predefined per our demands, and wait for us to finish. Not to start setting it up during our work and "wait" for us. We had to wait for him, that’s for sure, but he had no reason to wait for us.

 

So that’s a hostile and uncontrolled environment.

 

Don’t get me wrong. We had tons of laughs and enjoyed the job (and the A/C), but the lack of cooperation and the stinking attitude of our contact person were, to say the least, a problem. Another example: when we asked for coffee (to remind you: midnight, no coffee shops open for kilometers around us), he showed us into their "kitchen", and pointed out how nice he was being, because of the special time and all, otherwise we wouldn’t have been allowed to use this "kitchen". Man, it’s only a cup of coffee, and it’s not yours, nor your mom’s! Stinking attitude.

And we had our share of technical difficulties. The person who set up our client’s servers was, how to say, an amateur. He defined the machine’s IP address in around a dozen different locations: three times in the firewall settings (for each machine’s IP, virtual or real), twice in each network configuration file (per machine), and once for every major service each machine (again, virtual or real) was running, such as the sshd listen address, the FTPD listen address, the httpd Listen address, etc. It was a major hell. The hosted domains’ zone files were not using CNAME records pointing at a single, defined-once name per IP address (each Vserver had only one), but carried full A records with the literal IP address. We had to "sed" them all to the new addresses, decrease the TTLs for each domain (again, "sed" or a friend), and so on.
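
For illustration only, the bulk replacement boiled down to something like the following sketch (the zone file path, the addresses, and the TTL value here are placeholders, not the real ones):

# Replace the old address with the new one in every zone file
sed -i 's/192.0.2.10/203.0.113.10/g' /var/named/*.zone
# Lower the TTL ahead of the move so the change propagates quickly
sed -i 's/^$TTL.*/$TTL 300/' /var/named/*.zone
# Remember to bump each zone's serial and reload named afterwards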

It wasn’t easy, but it went rather well, summing it all up. Why did we do it? For the money, of course. And besides, the hosting farm had better A/C than I have 🙂

 

Well, that sums up a night without sleep, filled with work, before I started traveling around, doing all kinds of chores I had accumulated in this area of Israel. It went quite well, after all, and I managed to keep my eyes open while driving, which was good, generally speaking.

 

So, here’s me, back home, about to go to sleep, behind me a very, very long day.

 

*Addition*

I have managed to take pictures of the place, attached as thumbnails. Sorry for the choppy quality; they were taken with a cell phone camera, not a real camera.

Front of the rack

Front of the rack, #2

The rear of the rack

 

The rack was a bit shorter than we expected, so our power cables have to be pressed in to allow closing the doors. Tomorrow night, we are to add a router into the system and change the firewall’s settings accordingly. Will be fun. Not.