Posts Tagged ‘cacti’

Work around ISP QoS limitations

Friday, May 29th, 2009

ISPs which enforce QoS limitations suddenly, without alerting the customer, are abusing their force. QoS limitation is not a bad thing, from the ISP’s point of view, but changing the customer deal without notifying him seems to me to be unfair.

This is a recipe for a QoS workaround.

Ingredients:

  • One fast Internet connection which is not used to its full capacity
  • Defined target service provider. I use Giganews as an NNTP, which is the fastest method of obtaining content today. You should have the service list of IPs. Luckily, Giganews use only two IP addresses
  • One “evil” ISP which enforces QoS for external targets
  • One server in the ISP’s hosting farm, which has no speed or transfer limitations, and is probably not bound by the ISP’s QoS
  • For a better looking dish – some graphing solution, such as Cacti or MRTG

Directions:

  • Setup OpenVPN Server on the hosted server
  • Setup OpenVPN Client on your NNTP/Other service client (your desktop, your Linux router, etc) – This can also be a Windows machine, but configuration varies a bit.
  • Define, in your OpenVPN client.conf line(s) which look like this:

route <SERVICE_IP>

route <SERVICE_IP2>

  • If this is a router machine, activate NAT on it. Of course – remember to set this rule to work after reboot too!

iptables -t nat -A POSTROUTING -o tun+ -j MASQUERADE

  • For your good feeling, try to pickup data from before and after, and compare.
  • Start the OpenVPN Service on the server, on the client, and restart your NNTP/Other service downloads.
  • Serve with a smile :-)

The result dish is both tasty and good looking! see below:

QoS_override.png

A word of warning – OpenVPN is a VPN tool. As such, it uses encryption and varios methods which are very secure. This means that for high througput, such as mine (about 10Mb/s) you will see the impact on the router/workstation’s CPU. Under virtualization, I get about 2% additional system CPU utilization from a 2×3GHz Xeon CPU. For older router devices this could result in an overworked router. I am so glad I got rid of my old P2 350MHz router in favor of the virtualized one.

Centreon and batch-adding hosts

Monday, April 27th, 2009

Centreon is a nice GUI wrapper for Nagios. It is using MySQL as its configuration engine, and it functions quite well. One thing Cacti can do but Centreon can’t is mass automatic addition of servers. I have had a new site with an installed Centreon, and I wanted to add about 40 servers to be monitored. This is a tedious work, and I was searching for some semi-automatic method of doing it.
This is not perfect, but it worked for me.
In this case I do not replicate service-group relationship, but only add a mass of servers.

First – create a text file containing a list of servers and IPs. It should look like this:
serv1:1.2.3.4
serv2:10.2.1.3
new_srv:2.3.4.1

I have placed in in /tmp/machines

Second – find the last host entry. In my case the DB name is Centreon, so I run the following command:

mysql -u root -p centreon -e’select host_id from host’

This should return a colum with numbers. Find the largest one and increment it by one. In my example the last one was 19, so my initial host_id will be 20.

You should now find the host_template_model_html_id you are to use. There are few methods for that, but the easiest way is to find another host information which matches to some level your desired information. In my case it was called “DB1″, so this looks like this:

mysql -u root -p centreon -e”select host_template_model_htm_id from host where host_name=’DB1′”

Please note that my blog formatting might change the quote character. You might not want to copy/paste it, but type it yourselves.

The result of the above query should give us a template ID. In my case it was “2″, which is fine by me.

If you want a better reference for the values entered, you can do a whole select for a single host to verify your values match mine:

mysql -u root -p centreon -e”select * from host where host_name=’DB1′\G”

This should give you long listing and information of the host, as a reference.

My script goes like this, based on the assumptions made above:

#!/bin/bash
HOST=20
for i in `cat /tmp/machines`
do
   NAME=`echo $i | cut -f1 -d:`
   IP=`echo $2 | cut -f2 -d:`
   echo "insert into host values ('$HOST',2,NULL,NULL,1,NULL,NULL,NULL,NULL,'$NAME','$NAME','$IP',NULL,NULL,'2','2','2','2','2',NULL,'2',NULL,NULL,'2','2','2','2',NULL,NULL,'2',NULL,NULL,'0',NULL,'1','1');" >> /tmp/insert_sql.sql
   echo "insert into extended_host_information values('',$HOST,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);" >> /tmp/insert_sql.sql
   let HOST++
done

This should create a file called /tmp/insert_sql.sql which then should be first reviewed, and then inserted into your database.

Needless to say – back up your database first, just in case:

mysqldump -u root -p –opt -B centreon > /tmp/centreon_backup.sql

and then insert the newly created data:

mysql -u root -p centreon < /tmp/insert_sql.sql

Notice – at this point, no service relationship is created. I think it is quite a chore only to create the nodes. Adding the service relationships complicates things a bit, and I did not want to go there at this specific stage. However, for few tenths of monitored hosts, this is quite a lifesaver.

Notice that this is only Centreon configuration, and you will be required to apply it (through the GUI) to Nagios.

Xen VMs performance collection

Saturday, October 18th, 2008

Unlike VMware Server, Xen’s HyperVisor does not allow an easy collection of performance information. The management machine, called “Domain-0″ is actually a privileged virtual machine, and thus – get its own small share of CPUs and RAM. Collecting performance information on it will lead to, well, collecting performance information for a single VM, and not the whole bunch.

Local tools, such as “xentop” allows collection of information, however, combining this with Cacti, or any other SNMP-based collection tool is a bit tricky.

A great solution is provided by Ian P. Christian in his blog post about Xen monitoring. He has created a Perl script to collect information. I have taken the liberty to fix several minor things with his permission. The modified scripts are presented below. Name the script (according to your version of Xen) “/usr/local/bin/xen_stats.pl” and set it to be executable:

For Xen 3.1

?Download xen_stats.pl
#!/usr/bin/perl -w
 
use strict;
 
# declare...
sub trim($);
#<a href="/blog/files/xen_cloud.tar.gz" title="xen_cloud.tar.gz" target="_blank">xen_cloud.tar.gz</a>
# we need to run 2 iterations because CPU stats show 0% on the first, and I'm putting .1 second betwen them to speed it up
my @result = split(/\n/, `xentop -b -i 2 -d.1`);
 
# remove the first line
shift(@result);
 
shift(@result) while @result &amp;&amp; $result[0] !~ /^xentop - /;
 
# the next 3 lines are headings..
shift(@result);
shift(@result);
shift(@result);
shift(@result);
 
foreach my $line (@result)
{
  my @xenInfo = split(/[\t ]+/, trim($line));
  printf("name: %s, cpu_sec: %d, cpu_percent: %.2f, vbd_rd: %d, vbd_wr: %d\n",
    $xenInfo[0],
    $xenInfo[2],
    $xenInfo[3],
    $xenInfo[14],
    $xenInfo[15]
    );
}
 
# trims leading and trailing whitespace
sub trim($)
{
  my $string = shift;
  $string =~ s/^\s+//;
  $string =~ s/\s+$//;
  return $string;
}

For Xen 3.2 and Xen 3.3

?Download xen_stats.pl
#!/usr/bin/perl -w
 
use strict;
 
# declare…
sub trim($);
 
# we need to run 2 iterations because CPU stats show 0% on the first, and I’m putting .1 second between them to speed it up
my @result = split(/\n/, `/usr/sbin/xentop -b -i 2 -d.1`);
 
# remove the first line
shift(@result);
shift(@result) while @result &amp;&amp; $result[0] !~ /^[\t ]+NAME/;
shift(@result);
 
foreach my $line (@result)
{
        my @xenInfo = split(/[\t ]+/, trim($line));
        printf(“name: %s, cpu_sec: %d, cpu_percent: %.2f, vbd_rd: %d, vbd_wr: %d\n,
        $xenInfo[0],
        $xenInfo[2],
        $xenInfo[3],
        $xenInfo[14],
        $xenInfo[15]
        );
}
# trims leading and trailing whitespace
sub trim($)
{
        my $string = shift;
        $string =~ s/^\s+//;
        $string =~ s/\s+$//;
        return $string;
}

Cron settings for Domain-0

Create a file “/etc/cron.d/xenstat” with the following contents:

# This will run xen_stats.pl every minute
*/1 * * * * root /usr/local/bin/xen_stats.pl > /tmp/xen-stats.new && cat /tmp/xen-stats.new > /var/run/xen-stats

SNMP settings for Domain-0

Add the line below to “/etc/snmp/snmpd.conf” and then restart the snmpd service

extend xen-stats   /bin/cat /var/run/xen-stats

Cacti

I reduced Ian Cacti script to be based on a per-server setup, meaning this script gets the host (dom-0) name from Cacti, but cannot support live migrations. I will try to deal with combining live migrations with Cacti in the future.

Download and extract my modified xen_cloud.tar.gz file. Extract it, place the script and config in its relevant location, and import the template into Cacti. It should work like charm.

A note – the PHP script will work only on PHP5 and above. Works flawlessly on Centos5.2 for me.

New version of Cacti, and using spine

Monday, January 21st, 2008

A while ago, a newer version of Cacti became available through Dag’s RPM repository. An upgrade went without any special events, and was nothing to write home about.

A failure in one of my customer’s Cacti system lead me to test the system using “spine” – the “cactid” new generation.

I felt as if it acts faster and better, but had no measurable results (as the broken Cacti system did not work at all). I have decided to propagate the change to a local system I have, which is running Cacti locally. This is a virtual machine, dedicated only to this task.

Almost a day later I can see the results. Not only the measurements are continuous, but the load on the system dropped, and the load on the VM server dropped in accordance. Check the graphs below!

MySQL CPU load reduces at around midnight
as well as the amount of MySQL locks
and innoDB I/O
A small increase in the amount of table locks
A graph which didn’t function starts working
System load average reduces dramatically
Also comparing to a longer period of time
And the virtual host (the carrier), which runs several other guests in addition to this one, without any other change, shows a great improvement in CPU consumption

These measures talk for themselves. From now on (unless it’s realy vital), spine is my perfered engine.

Misconfigured Amavisd and its impact

Tuesday, June 19th, 2007

As an administrator, I am responsible for many setups and configurations, sometimes hand tailored to supply an answer to a set of given demands.

As a human, I err, and the common method of verifying that you have avoided error is by answering this simple rule: “Does it work after these changes?”

In the world of computers there is hardly ever simple true or false. We would have expected it to be boolean world – either it works or it doesn’t, but we are not there. The world of computers is filled with “works better” and “works worse”, and sometimes we forget that.

This long prologue was meant to bring up the subject of monitoring and evaluating your actions. While the simplest method of evaluation remains “Does it work?”, there are some additional, more subtle methods of verifying that things work according to your specifications.

One of the tools which helps me see, in the mirror of time, the effect of changes I have done is a graphical tool called Cacti. This tool graphs a set of predefined parameters which were chosen by me. It has no special AI, it cannot guess anything, and I am quite happy with it, as I can understand for myself the course of events better.

This post is about a mis configured Amavisd daemon. Amavis is a wrapper which scans using both Spamassassin and a selected Antivirus (ClamAV, in my case, as it has proven itself to me as a good AV) mail supplied by the local MTA.

I had a directive looking like this in it:

['ClamAV-clamscan', 'clamscan',
"--stdout --disable-summary -r --tempdir=$TEMPBASE {}", [0], [1],
qr/^.*?: (?!Infected Archive)(.*) FOUND$/ ],

It worked, however, this server, as it appears, was heavily loaded for a while now. Since it’s a rather strong server, it was not really visible unless you take a look at the server’s Cacti. On about 80%+ of the time the CPUs were on 100% with the process ‘clamscan‘. I have decided yesterday to solve the heavy load, and for that modified the file ‘/etc/amavisd.conf‘ to include the primary ClamAV section as follows:

['ClamAV-clamd',
\&ask_daemon, ["CONTSCAN {}\n", "/tmp/clamd"],
qr/\bOK$/, qr/\bFOUND$/,
qr/^.*?: (?!Infected Archive)(.*) FOUND$/ ],

This uses clamd instead of clamscan. The results were a drastic decrease on the CPU consumption and system average load, as can be seen in the Cacti graph (around 4 AM):

Cacti load average graph

The point is that while both configuration worked, I had the tools to understand that the earlier configuration was not good enough. Through tracking parameters on the system for a while, I could monitor my configuration modifications using a wider perspective, and reach better conclusions.