Well I just jumped through the hoops again installing a new tool, and as it took me quite a while, I thought I'd help the Internet at Large through it. Or at least make a few notes, as most of my own searches for information on this drew blanks. I even went to the lengths of translating a few obscure German posts in case they could help.
Anyway, munin, once you get it going, is actually quite cool. It provides you with a graphical look at your server performance, and you can customise which data you collect quite simply. I'm installing it on an Ubuntu server 11.04, with nginx and mysql. I'm expecting a big traffic spike in the near future, so I want to see how the machine is handling it, and which bits, if any, are struggling.
Installation
I'm using a single server, which will act as both client (munin-node) and collection / display server (munin). Installation is as simple as this:
apt-get install munin munin-node
If you have a lot of servers, you'll probably want to install munin on one server and munin-node on the rest of them. I'm assuming you're root; if not, add sudo to the front of every command here.
Now to configure it: open up /etc/munin/munin.conf with your favourite editor. We have to tell it that there is a new client node (itself). Add this to the bottom of the file, where domain.com is the name you want the report to appear as. In this case it's what I get with 'hostname -f'.
# Only client is this machine.
[domain.com]
    address 127.0.0.1
    use_node_name yes
OK, so now we need to alter the client part of the setup. Open /etc/munin/munin-node.conf and do the following: Check that the only access line says something like this.
# A list of addresses that are allowed to connect.
allow ^127\.0\.0\.1$
# Which address to bind to;
host 127.0.0.1
# And which port
port 4949
The server gathers data from each of the nodes in turn (using port 4949), so this allows the server to gather data from itself. OK, all done here.
Now we need to tell nginx to serve up the relevant munin reporting directory so that we can access it over HTTP. Because you probably don't want the world and his dog accessing that, we're also going to put an htaccess-style password on there. Create a file /etc/nginx/sites-available/munin and paste the following into it.
# Turn on Nginx status reporting.
server {
    listen 127.0.0.1;
    server_name localhost;
    location /nginx_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}

server {
    listen 80;
    server_name reporting.domain.com;
    location / {
        # Host Based auth
        #allow 11.12.22.55/32;
        #deny all;
        # Passwd auth
        auth_basic "Restricted";
        auth_basic_user_file munin_auth_pass;
        root /var/www/munin;
    }
}
The first server block tells nginx to turn on its status reporting (how many connections etc.). It makes it available at the /nginx_status location (which is where munin expects to find it) and then limits access to localhost only. You can test it from the server itself with 'links http://localhost/nginx_status' (or with 'telnet localhost 80' followed by a hand-typed GET request for /nginx_status), and you should see some numbers.
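If you'd rather check the status page from a script, something like the following might do. It's only a sketch: active_connections is a name I've made up, and it assumes the usual stub_status layout where the first line reads "Active connections: N".

```shell
#!/bin/sh
# Hypothetical helper: extract the "Active connections" count from
# stub_status output read on standard input.
active_connections() {
    awk '/^Active connections:/ { print $3 }'
}

# Example use against the status URL configured above (assumes curl is installed):
#   curl -s http://localhost/nginx_status | active_connections
```

If this prints nothing, the status page probably isn't being served, so check the first server block again.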
The second server block sets the reporting portion of munin to run on a separate domain name (set this up in your DNS …), and limits access. The commented example can be used to limit access to a certain IP address range, but I've chosen to use a password auth file. Create the password auth file at /etc/nginx/munin_auth_pass, using htpasswd or an online htpasswd generator.
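If htpasswd isn't installed, one possible alternative is to build the line with openssl. This is a sketch under assumptions: htpasswd_line is a hypothetical helper name, the user name and password are placeholders, and it relies on the openssl command line tool being available. Be aware the password is visible in the process list while the command runs.

```shell
#!/bin/sh
# Hypothetical helper: print one "user:hash" line in the format that
# nginx's auth_basic_user_file accepts, using an APR1 (Apache MD5) hash.
htpasswd_line() {
    user=$1
    password=$2
    printf '%s:%s\n' "$user" "$(openssl passwd -apr1 "$password")"
}

# Example (user name and password are placeholders):
#   htpasswd_line admin 'choose-a-password' > /etc/nginx/munin_auth_pass
```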
We're serving all this out of the /var/www/munin directory. One crucial step which I couldn't find was to link the munin files to that directory. i.e.
ln -s /var/cache/munin/www/ /var/www/munin
We also need to link that munin nginx file so nginx knows to use it, so:
ln -s /etc/nginx/sites-available/munin /etc/nginx/sites-enabled/munin
Finally, for this section, we need to add the nginx plugins, which aren't available out of the box. You can get them from the website, and then you'll need to link them into the right place. It goes something like this:
cd /usr/share/munin/plugins/
wget -O nginx_request http://exchange.munin-monitoring.org/plugins/nginx_request/version/2/raw
wget -O nginx_status http://exchange.munin-monitoring.org/plugins/nginx_status/version/3/raw
wget -O nginx_memory http://exchange.munin-monitoring.org/plugins/nginx_memory/version/1/raw
chmod +x nginx*
ln -s /usr/share/munin/plugins/nginx_request /etc/munin/plugins/nginx_request
ln -s /usr/share/munin/plugins/nginx_status /etc/munin/plugins/nginx_status
ln -s /usr/share/munin/plugins/nginx_memory /etc/munin/plugins/nginx_memory
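A failed download or a mistyped link is easy to miss here, so a quick sanity check may save some head-scratching. The function below is a rough sketch (check_plugins is my own name for it, not part of munin): it lists any entry in a directory that is a dangling symlink or isn't executable.

```shell
#!/bin/sh
# Hypothetical helper: list entries in a plugin directory that are broken
# symlinks or are not executable. Prints nothing if everything looks fine.
check_plugins() {
    dir=$1
    for p in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$p" ] || [ -L "$p" ] || continue
        if [ -L "$p" ] && [ ! -e "$p" ]; then
            echo "broken link: $p"
        elif [ ! -x "$p" ]; then
            echo "not executable: $p"
        fi
    done
}

# Example: check_plugins /etc/munin/plugins
```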
Now edit /etc/munin/plugin-conf.d/munin-node, and add the following.
[nginx*]
env.url http://localhost/nginx_status
OK, we're probably ready for a test. Restart the nginx and munin-node services and send your browser to reporting.domain.com. You should see some graphs being generated, but you'll have to wait 10 minutes to generate some data. Go and get a coffee. If nothing is happening at this point, check the logs in /var/log/munin to see any interesting errors.
Adding mysql reporting
So we should now have a working install, password protected, and getting a lot of system information, and hopefully nginx information too. But we want to add mysql info as well. This shouldn't be difficult, but in fact was a bit more complicated than it should have been. First we need to enable the plugins.
ln -s /usr/share/munin/plugins/mysql_queries /etc/munin/plugins/mysql_queries
Link more modules if you need them; remove links for modules you don't need.
Now the problem I had was that the Ubuntu install was trying to use a mysql account called debian_sys_maint or something, which didn't exist in my database. I also thought that account probably had a bit too much power, so I created a new user which munin could use and then told it to use it. In mysql (command line or phpmyadmin), run the following commands:
CREATE USER 'muninmonitor'@'localhost' IDENTIFIED BY 'XXXXXXXXXX';
GRANT PROCESS ON *.* TO 'muninmonitor'@'localhost';
FLUSH PRIVILEGES;
Then in /etc/munin/plugin-conf.d/munin-node we need to make sure the [mysql] section agrees with us.
[mysql*]
user root
env.mysqlopts -umuninmonitor -pXXXXXXX
env.mysqlconnection DBI:mysql:mysql;mysql_read_default_file=/etc/mysql/debian.cnf
env.mysqladmin /usr/bin/mysqladmin
Restart your services again (service nginx restart ; service munin-node restart) and after a few minutes you should start to get mysql reports. If mysql reports are not working, you might try checking that you have the right Perl modules installed. In Ubuntu this is apt-get install libdbd-mysql-perl
You may also like to troubleshoot using telnet. If you 'telnet localhost 4949' and quickly type 'fetch mysql_queries' (or whatever module name you want to test) then you'll see data. Or not. Also look in the logs at /var/log/munin/munin-update.log
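If typing quickly into telnet isn't your thing, you could pipe the commands in instead. A minimal sketch: munin_request is a made-up helper name, and the example assumes a netcat binary called nc is installed.

```shell
#!/bin/sh
# Hypothetical helper: print the commands for a single-plugin query,
# one per line, followed by "quit" so the node closes the connection.
munin_request() {
    printf 'fetch %s\nquit\n' "$1"
}

# Example (assumes a netcat binary named nc is installed):
#   munin_request mysql_queries | nc localhost 4949
```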
Getting rid of that annoying error
All was working well now. Except for one thing. I got an error every time munin-cron ran. This error was logged in the logfile, it appeared in my root mailbox every five minutes, and also in my logwatch reports 288 times a day. Irritating. The error said.
2012/01/02 09:25:01 Opened log file
2012/01/02 09:25:01 [INFO]: Starting munin-update
2012/01/02 09:25:01 [FATAL ERROR] Lock already exists: /tmp/munin-update.lock. Dying.
2012/01/02 09:25:01 at /usr/share/perl5/Munin/Master/Update.pm line 128
I couldn't understand why. The job only took 5 seconds, and only if it took longer than 5 minutes would it encounter a lock file. I watched the lock files appearing and being deleted in the /tmp directory, exactly as expected, every 5 minutes. People around the internet were complaining about the same error and offering fixes. I traced the command from /etc/cron.d/munin. It tries to run /usr/bin/munin-cron, which tries to run munin-update. I tried running the command by hand and it ran fine (sudo -u munin /usr/share/munin/munin-update --nofork --debug). I tried without the --debug option and without the --nofork option and both of these worked as well! So basically it ran fine, except when it was a cron job. Eventually I just turned off the error reporting by diverting the output to /dev/null within the cron job.
But that's not a solution, only a band-aid. Or maybe a blindfold. I eventually found the solution. There is a cron job set up in /etc/cron.d/munin which runs every five minutes. EXACTLY THE SAME job is set up to run under the munin user's crontab (crontab -u munin -l). So one would start first, then the other one would start and there would be a lock file, exactly as reported. So: comment out the cron job in /etc/cron.d/munin and all is good.
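To check whether you have the same duplicate, you could count the active munin-cron entries across both places. This is only a sketch: count_munin_cron is a hypothetical name, and the temporary file path in the example is just a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: count active (non-commented) lines that mention
# munin-cron across the files given as arguments.
count_munin_cron() {
    cat "$@" 2>/dev/null | grep -v '^[[:space:]]*#' | grep -c 'munin-cron'
}

# Example:
#   crontab -u munin -l > /tmp/munin-user-crontab
#   count_munin_cron /etc/cron.d/munin /tmp/munin-user-crontab
# A result greater than 1 suggests the job is scheduled twice.
```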
This took me over a day to figure out. I hope it takes you considerably less.
Remember to open your firewall if your node is on a different machine to the collector:
ufw allow from 192.168.0.0/24 to any port 4949
You saved my day. Thank you
No problem. That's exactly why I blog these things. Nice to know someone else is actually reading them …
Thanks for the duplicate cron job tip!
Yeah, pretty irritating wasn't it … 🙂
THANK YOU!!!!!! I was getting the lock file error, and your great blog helped me fix it. Keep up the good work!
Thanks for the cron tip! took me a while to figure this out