Posts Tagged ‘performance impact’

New version of Cacti, and using spine

Monday, January 21st, 2008

A while ago, a newer version of Cacti became available through Dag’s RPM repository. The upgrade went without incident and was nothing to write home about.

A failure in one of my customers’ Cacti systems led me to test the system using “spine” – the next generation of “cactid”.

It felt faster and more responsive, but I had no measurable results (as the broken Cacti system did not work at all). I decided to propagate the change to a local system of mine which runs Cacti – a virtual machine dedicated to this task alone.

Almost a day later I can see the results. Not only are the measurements continuous, but the load on the system has dropped, and the load on the VM server dropped accordingly. Check the graphs below!

MySQL CPU load reduces at around midnight
as well as the amount of MySQL locks
and innoDB I/O
A small increase in the amount of table locks
A graph which didn’t function starts working
System load average reduces dramatically
Also comparing to a longer period of time
And the virtual host (the carrier), which runs several other guests in addition to this one, without any other change, shows a great improvement in CPU consumption

These measurements speak for themselves. From now on (unless it’s really vital), spine is my preferred engine.
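For anyone wanting to make the same switch: besides selecting “spine” as the poller type in the Cacti web interface (under Settings → Poller, where you also point Cacti at the spine binary), spine needs its own small configuration file so it can reach the Cacti database. A minimal sketch of ‘/etc/spine.conf’ – the database name and credentials below are the common defaults, not taken from my setup, so adjust them to your installation:

```
DB_Host       localhost
DB_Database   cacti
DB_User       cactiuser
DB_Pass       cactiuser
DB_Port       3306
```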

Misconfigured Amavisd and its impact

Tuesday, June 19th, 2007

As an administrator, I am responsible for many setups and configurations, sometimes hand tailored to supply an answer to a set of given demands.

As a human, I err, and the common method of verifying that you have avoided error is by answering this simple rule: “Does it work after these changes?”

In the world of computers there is hardly ever a simple true or false. We would expect it to be a boolean world – either it works or it doesn’t – but we are not there. The world of computers is filled with “works better” and “works worse”, and sometimes we forget that.

This long prologue was meant to bring up the subject of monitoring and evaluating your actions. While the simplest method of evaluation remains “Does it work?”, there are some additional, more subtle methods of verifying that things work according to your specifications.

One of the tools which helps me see, in the mirror of time, the effect of changes I have made is a graphing tool called Cacti. It graphs a set of predefined parameters chosen by me. It has no special AI and cannot guess anything, and I am quite happy with that, as it lets me understand the course of events for myself.

This post is about a misconfigured Amavisd daemon. Amavis is a wrapper which takes mail supplied by the local MTA and scans it using both SpamAssassin and a selected antivirus (ClamAV, in my case, as it has proven itself to me as a good AV).

Its configuration contained a directive looking like this:

['ClamAV-clamscan', 'clamscan',
  "--stdout --disable-summary -r --tempdir=$TEMPBASE {}", [0], [1],
  qr/^.*?: (?!Infected Archive)(.*) FOUND$/ ],

It worked; however, this server, as it turned out, had been heavily loaded for a while. Since it is a rather strong server, this was not really visible unless you looked at the server’s Cacti graphs: about 80% of the time the CPUs were at 100%, busy with the ‘clamscan’ process. Yesterday I decided to solve the heavy load, and modified ‘/etc/amavisd.conf’ so that the primary ClamAV section reads as follows:

['ClamAV-clamd',
  \&ask_daemon, ["CONTSCAN {}\n", "/tmp/clamd"],
  qr/\bOK$/, qr/\bFOUND$/,
  qr/^.*?: (?!Infected Archive)(.*) FOUND$/ ],

This uses clamd, a resident daemon queried over a socket, instead of spawning a full clamscan process for every message. The result was a drastic decrease in CPU consumption and system load average, as can be seen in the Cacti graph (around 4 AM):
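One thing worth checking after such a change is that the socket path handed to ask_daemon really exists, since a wrong path means clamd will never be asked anything. A small sanity-check sketch of my own (the function name is mine, not part of Amavis; ‘/tmp/clamd’ matches the path in the config section above):

```shell
# Sanity check: is the clamd socket amavisd was pointed at a live UNIX socket?
check_clamd_socket() {
    if [ -S "$1" ]; then
        echo "clamd socket found at $1"
    else
        echo "clamd socket missing at $1"
        return 1
    fi
}

# /tmp/clamd is the path used in the amavisd.conf section above.
check_clamd_socket "${CLAMD_SOCKET:-/tmp/clamd}" || true
```

If the socket is missing, check that clamd is running and that its LocalSocket setting matches what amavisd.conf expects.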

Cacti load average graph

The point is that while both configurations worked, I had the tools to see that the earlier one was not good enough. By tracking system parameters over time, I could evaluate my configuration changes from a wider perspective and reach better conclusions.

Linux LVM performance measurement

Sunday, June 10th, 2007

Modern Linux LVM offers great abilities to maintain snapshots of existing logical volumes. Unlike NetApp’s “Write Anywhere File Layout” (WAFL), Linux LVM uses “Copy-on-Write” (COW) to implement snapshots: the first write to any block of the origin volume copies the original data into the snapshot volume before the write proceeds. The process, in general, is described in this PDF document.

I have run several small tests, just to get a real-life estimate of the actual performance impact this COW method can cause.

Server details:

1. CPU: 2x Xeon 2.8GHz

2. Disks: /dev/sda – system disk. Did not touch it; /dev/sdb – used for the LVM; /dev/sdc – used for the LVM

3. Mount: LV is mounted (and remains mounted) on /vmware
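For reference, a layout like the one above could be recreated with something like the following sketch. The VG/LV names, sizes, and filesystem are my assumptions, not recorded from the actual server; with DRY_RUN=1 (the default here) the script only prints each command instead of executing it:

```shell
#!/bin/sh
# Sketch of the LVM layout used in these tests (names/sizes are assumptions).
# DRY_RUN=1 (default) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi
}

run pvcreate /dev/sdb /dev/sdc
run vgcreate vg_test /dev/sdb /dev/sdc
# Place the LV's extents on /dev/sdb only, so test 1 touches a single disk:
run lvcreate -L 20G -n lv_vmware vg_test /dev/sdb
run mkfs.ext3 /dev/vg_test/lv_vmware
run mount /dev/vg_test/lv_vmware /vmware
# Test 2: snapshot allocated on the same disk as the origin LV:
run lvcreate -s -L 4G -n lv_snap /dev/vg_test/lv_vmware /dev/sdb
# Test 3: the same snapshot allocated on the second disk instead:
# run lvcreate -s -L 4G -n lv_snap /dev/vg_test/lv_vmware /dev/sdc
```

Listing physical volumes at the end of an lvcreate command restricts which disk the extents are allocated from, which is how the snapshot can be placed on either /dev/sdb or /dev/sdc.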

Results:

1. No snapshot, Using VG on /dev/sdb only:

# time dd if=/dev/zero of=/vmware/test.2GB bs=1M count=2048
2048+0 records in
2048+0 records out

real 0m16.088s
user 0m0.009s
sys 0m8.756s

2. With snapshot on the same disk (/dev/sdb):

# time dd if=/dev/zero of=/vmware/test.2GB bs=1M count=2048
2048+0 records in
2048+0 records out

real 6m5.185s
user 0m0.008s
sys 0m11.754s

3. With snapshot on 2nd disk (/dev/sdc):

# time dd if=/dev/zero of=/vmware/test.2GB bs=1M count=2048
2048+0 records in
2048+0 records out

real 5m17.604s
user 0m0.004s
sys 0m11.265s

4. Same as before, creating a new empty file on the disk:

# time dd if=/dev/zero of=/vmware/test2.2GB bs=1M count=2048
2048+0 records in
2048+0 records out

real 3m24.804s
user 0m0.006s
sys 0m11.907s

5. Removed the snapshot. Created a 3rd file:
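Dividing the 2048 MB written in each of the four timed runs above by its elapsed time turns the timings into throughput: roughly 127 MB/s with no snapshot, collapsing to about 5.6–10 MB/s while a snapshot is active. A quick way to reproduce the arithmetic:

```shell
# Each run wrote 2048 MB; divide by elapsed seconds to get MB/s.
# 16.088 s  - no snapshot
# 365.185 s - snapshot on the same disk
# 317.604 s - snapshot on the 2nd disk
# 204.804 s - snapshot on the 2nd disk, writing a new file
for secs in 16.088 365.185 317.604 204.804; do
    awk -v s="$secs" 'BEGIN { printf "%.1f MB/s\n", 2048 / s }'
done
# prints 127.3, 5.6, 6.4 and 10.0 MB/s
```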