Posts Tagged ‘performance measurement’

New version of Cacti, and using spine

Monday, January 21st, 2008

A while ago, a newer version of Cacti became available through Dag’s RPM repository. The upgrade went without any special events, and was nothing to write home about.

A failure in one of my customers’ Cacti systems led me to test the system using “spine” – the new generation of “cactid”.

It felt faster and better, but I had no measurable results (as the broken Cacti system did not work at all). I decided to propagate the change to a local system of mine which runs Cacti locally – a virtual machine dedicated only to this task.
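For reference, the switch itself is mostly a matter of installing spine and pointing Cacti at it. A rough sketch, assuming the RPMforge package name and default paths (both may differ on your system):

# install spine (the package name in Dag's repository may differ)
yum install cacti-spine

# spine has its own config file with the Cacti DB credentials
vi /etc/spine.conf        # DB_Host, DB_Database, DB_User, DB_Pass

# then, in the Cacti web console:
#   Settings -> Paths  -> Spine Poller File Path = /usr/bin/spine
#   Settings -> Poller -> Poller Type            = spine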

Almost a day later I can see the results. Not only are the measurements continuous, but the load on the system has dropped, and the load on the VM server has dropped accordingly. Check the graphs below!

MySQL CPU load reduces at around midnight
as well as the amount of MySQL locks
and innoDB I/O
A small increase in the amount of table locks
A graph which didn’t function starts working
System load average reduces dramatically
The same comparison over a longer period of time
And the virtual host (the carrier), which runs several other guests in addition to this one, shows a great improvement in CPU consumption without any other change

These measurements speak for themselves. From now on (unless there is a really vital reason not to), spine is my preferred engine.

Misconfigured Amavisd and its impact

Tuesday, June 19th, 2007

As an administrator, I am responsible for many setups and configurations, sometimes hand-tailored to answer a given set of demands.

As a human, I err, and the common method of verifying that you have avoided error is answering one simple question: “Does it work after these changes?”

In the world of computers there is hardly ever a simple true or false. We would expect it to be a boolean world – either it works or it doesn’t – but we are not there. The world of computers is filled with “works better” and “works worse”, and sometimes we forget that.

This long prologue was meant to bring up the subject of monitoring and evaluating your actions. While the simplest method of evaluation remains “Does it work?”, there are some additional, more subtle methods of verifying that things work according to your specifications.

One of the tools which helps me see, in the mirror of time, the effect of changes I have made is a graphing tool called Cacti. It graphs a set of predefined parameters chosen by me; it has no special AI and cannot guess anything, and I am quite happy with that, as it lets me understand the course of events for myself.

This post is about a misconfigured Amavisd daemon. Amavis is a wrapper which scans mail supplied by the local MTA using both SpamAssassin and a selected antivirus (ClamAV, in my case, as it has proven itself to me as a good AV).

I had a directive looking like this in it:

['ClamAV-clamscan', 'clamscan',
  "--stdout --disable-summary -r --tempdir=$TEMPBASE {}", [0], [1],
  qr/^.*?: (?!Infected Archive)(.*) FOUND$/ ],

It worked; however, this server, as it turns out, had been heavily loaded for quite a while. Since it’s a rather strong server, this was not really visible unless you took a look at the server’s Cacti. About 80% of the time the CPUs were at 100%, busy with the ‘clamscan‘ process. Yesterday I decided to solve the heavy load, and for that modified the file ‘/etc/amavisd.conf‘ to include the primary ClamAV section as follows:

['ClamAV-clamd',
  \&ask_daemon, ["CONTSCAN {}\n", "/tmp/clamd"],
  qr/\bOK$/, qr/\bFOUND$/,
  qr/^.*?: (?!Infected Archive)(.*) FOUND$/ ],

This uses clamd instead of clamscan. The result was a drastic decrease in CPU consumption and system load average, as can be seen in the Cacti graph (around 4 AM):

Cacti load average graph
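As a side note, before making such a switch it is worth verifying that clamd actually answers on its socket. A quick check, assuming the /tmp/clamd socket path from the configuration above and that socat is installed:

# clamd should answer PONG over its unix socket
echo PING | socat - UNIX-CONNECT:/tmp/clamd

# or simply scan a known file through the daemon
clamdscan /etc/hosts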

The point is that while both configurations worked, I had the tools to understand that the earlier one was not good enough. By tracking parameters on the system for a while, I could view my configuration modifications from a wider perspective and reach better conclusions.

net-snmp broken in RHEL (and CentOS, of course) – diskio

Saturday, June 9th, 2007

For quite a while now I’ve believed that Linux, unlike other types of systems, was unable to produce any I/O information over SNMP. I only recently found out that this is partially true – the net-snmp shipped by production-level distros such as RedHat (and CentOS, for that matter) cannot produce any output for SNMP diskio queries.
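For reference, the query that comes back empty on such a build looks like this (hostname and community are placeholders; the OID is the standard UCD-DISKIO diskIOTable):

# returns no diskio data at all on an affected net-snmp build
snmpwalk -c COMMUNITY -v2c some-linux-host 1.3.6.1.4.1.2021.13.15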

I have found a Bugzilla entry about it, so I throw down the gauntlet and ask any maintainer of an RH-compatible repository to recompile (and maintain, of course) an alternate net-snmp package which supports diskio.

Meanwhile, I have found this blog post, which offers an alternative (quite clumsy, yet working) solution to the disk performance measurement issue in Linux. I haven’t tried it yet, but I will, rather soon.
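The general idea behind this kind of workaround is to have snmpd run an external script that reads /proc/diskstats and publishes the numbers itself. A rough sketch of my own, not necessarily identical to the linked post (the script name and chosen fields are just an example):

# in /etc/snmp/snmpd.conf - run a helper and expose its output
# under the UCD "extensible" tree
exec diskio /usr/local/bin/diskio-stats.sh

# /usr/local/bin/diskio-stats.sh - hypothetical helper printing
# device name, sectors read and sectors written from /proc/diskstats
#!/bin/bash
awk '{ print $3, $6, $10 }' /proc/diskstats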

—Update—

I have used the script from the blog post mentioned above, and it works.

Speed could be an issue, though. Comparing two servers, the speed difference was amazing.

Both servers are connected to the same switch as the server running the query. Server1 has a Pentium II 233MHz CPU, while Server2 has dual 2.8GHz Xeon CPUs.

~$ time snmpwalk -c COMMUNITY -v2c Server1 1.3.6.1.4.1.2021.13.15 > /dev/null

real 0m0.311s
user 0m0.024s
sys 0m0.020s

~$ time snmpwalk -c COMMUNITY -v2c Server2 1.3.6.1.4.1.2021.13.15 > /dev/null

real 0m8.303s
user 0m0.044s
sys 0m0.012s

Looks like a huge difference. However, I believe it’s currently good enough for me.