Ganglia and ganglia-web RPMs built from source tarballs
Dependencies:
RPMforge has the latest rrdtool packages for RHEL 5. Read up on how to install the RPMforge repository. We also use it for libconfuse.
### Name: RPMforge RPM Repository for RHEL 5 - dag
### URL: http://rpmforge.net/
[rpmforge]
name = RHEL $releasever - RPMforge.net - dag
baseurl = http://apt.sw.be/redhat/el5/en/$basearch/rpmforge
mirrorlist = http://apt.sw.be/redhat/el5/en/mirrors-rpmforge
#mirrorlist = file:///etc/yum.repos.d/mirrors-rpmforge
enabled = 1
protect = 0
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-rpmforge-dag
gpgcheck = 1
includepkgs = rrdtool.x86_64 rrdtool-devel.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 perl-rrdtool.x86_64
priority = 50
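A minimal sketch of putting this repository in place by hand, assuming the block above is saved as rpmforge.repo (the GPG key URL is the standard DAG location and is an assumption here; installing the rpmforge-release package instead does all of this automatically):

# Fetch the DAG GPG key to the path named by gpgkey above (key URL is an assumption)
wget -O /etc/pki/rpm-gpg/RPM-GPG-KEY-rpmforge-dag http://apt.sw.be/RPM-GPG-KEY.dag.txt
cp rpmforge.repo /etc/yum.repos.d/rpmforge.repo
yum --enablerepo=rpmforge list rrdtool rrdtool-devel libconfuse libconfuse-devel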
yum install rpm-build
yum install libdbi.x86_64 lua.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 rrdtool.x86_64 rrdtool-devel.x86_64 perl-rrdtool.x86_64 php.x86_64 php-gd.x86_64
yum install libpng-devel libart_lgpl-devel python-devel pcre-devel freetype-devel apr-devel libconfuse-devel expat-devel
Alternatively, the dependencies can be satisfied from the EPEL repository:
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-7.noarch.rpm
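A quick sanity check that the repository was actually added:

yum repolist | grep -i epel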
yum install rpm-build
yum install libdbi.x86_64 lua.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 rrdtool.x86_64 rrdtool-devel.x86_64 php53.x86_64 php53-gd.x86_64
yum install libpng-devel libart_lgpl-devel python-devel pcre-devel freetype-devel apr-devel libconfuse-devel expat-devel
Note - these steps are valid for SL 5 or SL 6.
Starting from a clean shell and a clean build area:
mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
echo '%_topdir %(echo $HOME)/rpmbuild' > ~/.rpmmacros
cd ~/rpmbuild/SOURCES
wget -O ganglia-3.4.0.tar.gz 'http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.4.0/ganglia-3.4.0.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fganglia%2Ffiles%2Fganglia%2520monitoring%2520core%2F3.4.0%2F&ts=1342491498&use_mirror=superb-sea2'
tar -zxvf ganglia-3.4.0.tar.gz
cd ganglia-3.4.0
cp ganglia.spec ../../SPECS/
cd ../../SPECS
rpmbuild -bb ganglia.spec
ls ../RPMS/x86_64/
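If the build succeeds, the listing should include package files like the ones installed later in this document:

libganglia-3.4.0-1.x86_64.rpm
ganglia-gmond-3.4.0-1.x86_64.rpm
ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm
ganglia-gmetad-3.4.0-1.x86_64.rpm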
cd $HOME/rpmbuild/SOURCES
wget -O ganglia-web-3.4.2.tar.gz 'http://downloads.sourceforge.net/project/ganglia/ganglia-web/3.4.2/ganglia-web-3.4.2.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fganglia%2Ffiles%2Fganglia-web%2F3.4.2%2F&ts=1342531342&use_mirror=voxel'
tar xzvf ganglia-web-3.4.2.tar.gz
cd ganglia-web-3.4.2
cp gweb.spec ../../SPECS/
cd ../../SPECS/
rpmbuild -bb gweb.spec
ls ../RPMS/noarch/
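The noarch listing should contain the web package installed below:

ganglia-web-3.4.2-1.noarch.rpm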
Note - this work is done as the root account.
Scientific Linux 5 instructions
yum install libdbi.x86_64 lua.x86_64 perl-rrdtool.x86_64 rrdtool.x86_64 libconfuse.x86_64
Scientific Linux 6 instructions
yum install apr.x86_64 libconfuse.x86_64
Use packages from http://iuscommunity.org/, specifically yum-plugin-replace and the php53u* packages.
rpm -Uvh http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/x86_64/ius-release-1.0-10.ius.el5.noarch.rpm
rpm --import /etc/pki/rpm-gpg/IUS-COMMUNITY-GPG-KEY
yum install yum-plugin-replace
yum remove php\*
yum replace php --replace-with php53u
yum install php53u php53u-cli php53u-common php53u-gd
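To verify that the replacement took (PHP 5.3.x should now be the system PHP):

php -v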
rpm -iv ~dbenjamin/rpmbuild/RPMS/noarch/ganglia-web-3.4.2-1.noarch.rpm
/sbin/service gmetad restart
yum install php53.x86_64 php53-gd.x86_64 httpd.x86_64
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/libganglia-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmetad-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/noarch/ganglia-web-3.4.2-1.noarch.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/libganglia-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm
The gmond client needs to be configured to report to the gmetad collector. Since we break the cluster up into worker nodes, interactive nodes, and server nodes, there will be three "clusters". In addition, we have redundant gmetad collectors on each of the head nodes. We are using unicast, for the reasons explained below.
In this cluster, machines are either interactive nodes, worker nodes or service machines. The clusters are called InteractiveNodes, WorkerNodes and ServiceMachines.
Due to the nature of the network equipment at ANL, the multicast configuration of ganglia will not work. Instead, the ganglia unicast configuration must be used.
Information required prior to configuration:
Cluster name | Port number
---|---
ServiceMachines | 8661
WorkerNodes | 8662
InteractiveNodes | 8663
Open the proper iptables port for the given cluster type: add the proper lines to /etc/sysconfig/iptables and restart iptables.
ServiceMachines cluster
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8661 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8661 -j ACCEPT
WorkerNodes cluster
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8662 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8662 -j ACCEPT
InteractiveNodes cluster
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8663 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8663 -j ACCEPT
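After restarting iptables, the new rules can be verified (a quick check; the chain name matches the rules above):

/sbin/iptables -L RH-Firewall-1-INPUT -n | grep 866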
The relevant sections of the gmond.conf file for the head nodes, GridFTP server, and file servers:
In this example, the nodes receiving the gmond information are atlas66.hep.anl.gov, atlas67.hep.anl.gov, and atlashn.hep.anl.gov.
/* This configuration is as close to 2.5.x default behavior as possible.
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /* secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /* secs */
  cleanup_threshold = 300 /* secs */
  gexec = no
  send_metadata_interval = 30 /* secs */
}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "ServiceMachines"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.
   Gmond used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas66.hep.anl.gov
  port = 8661
  ttl = 1
}
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas67.hep.anl.gov
  port = 8661
  ttl = 1
}
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlashn.hep.anl.gov
  port = 8661
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8661
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8661
}
The relevant sections of the gmond.conf file for the worker nodes. In this example, atlas68 and atlas69 are used as the data sources and receive the unicast information.
/* This configuration is as close to 2.5.x default behavior as possible.
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /* secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /* secs */
  cleanup_threshold = 300 /* secs */
  gexec = no
  send_metadata_interval = 30 /* secs */
}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "WorkerNodes"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.
   Gmond used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes
  host = atlas68.hep.anl.gov
  port = 8662
  ttl = 1
}
udp_send_channel {
  bind_hostname = yes
  host = atlas69.hep.anl.gov
  port = 8662
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8662
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8662
}
The relevant sections of the gmond.conf file for the interactive nodes.
In this example, atlas28 and atlas29 will receive the unicast gmond updates.
/* This configuration is as close to 2.5.x default behavior as possible.
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /* secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /* secs */
  cleanup_threshold = 300 /* secs */
  gexec = no
  send_metadata_interval = 30 /* secs */
}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "InteractiveNodes"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.
   Gmond used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas28.hep.anl.gov
  port = 8663
  ttl = 1
}
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas29.hep.anl.gov
  port = 8663
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8663
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8663
}
Steps after installation and configuration of gmond and iptables
/sbin/service iptables restart
/sbin/service gmond restart
/sbin/chkconfig gmond on
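To confirm gmond is running and listening on its cluster port (a quick check; netstat comes from the net-tools package):

/sbin/service gmond status
netstat -nlp | grep gmond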
To start, stop, or restart gmond:
/sbin/service gmond start
/sbin/service gmond stop
/sbin/service gmond restart
We run the gmetad collector on both machines that could act as the head node. Each machine also needs the Apache web server (httpd) running.
Add these lines to the /etc/ganglia/gmetad.conf file. Each data_source line names one cluster and the gmond hosts to poll for it; where more than one host is listed, gmetad tries them in order, which provides failover.
data_source "ServiceMachines" atlashn.hep.anl.gov:8661 atlas67.hep.anl.gov:8661 data_source "WorkerNodes" atlas68.hep.anl.gov:8662 atlas69.hep.anl.gov:8662 data_source "InteractiveNodes" atlas28.hep.anl.gov:8663
Open these ports on the gmetad/gweb servers:
# ganglia and web ports
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8661 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8662 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8663 -j ACCEPT
Make certain that the httpd package is installed on both machines.
Steps after installation and configuration of gmond, gmetad, httpd, and iptables
/sbin/service iptables restart
/sbin/service gmond restart
/sbin/chkconfig gmond on
/sbin/service gmetad restart
/sbin/chkconfig gmetad on
/sbin/service httpd restart
/sbin/chkconfig httpd on
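To confirm gmetad is polling the clusters and writing round-robin databases (this assumes the default rrd_rootdir of /var/lib/ganglia/rrds):

ls /var/lib/ganglia/rrds/
# expect one directory per cluster: InteractiveNodes, ServiceMachines, WorkerNodes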
To start, stop, or restart gmond:
/sbin/service gmond start
/sbin/service gmond stop
/sbin/service gmond restart
To start, stop, or restart gmetad:
/sbin/service gmetad start
/sbin/service gmetad stop
/sbin/service gmetad restart
To start, stop, or restart httpd:
/sbin/service httpd start
/sbin/service httpd stop
/sbin/service httpd restart
This section describes a few tips and tricks for troubleshooting and for viewing the ganglia web servers from offsite.
nc <Node_name> <port>
where <port> would be 8661, 8662, or 8663 based on the configuration above. If gmond is running and accepting TCP connections, you should get XML back.
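For example, to spot-check the ServiceMachines aggregator (hostname taken from the configuration above), look at just the first few lines; a healthy gmond answers with an XML declaration followed by a GANGLIA_XML document:

nc atlas66.hep.anl.gov 8661 | head -5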
Other troubleshooting tips can be found here: http://sourceforge.net/apps/trac/ganglia/wiki/FAQ
To view the ganglia plots from outside of ANL, ssh tunneling can be used.
ssh -D 8888 <user name>@<interactive node>
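With the tunnel up, point the browser's SOCKS proxy setting at localhost:8888 and browse to the ganglia web server as usual. An alternative sketch (the /ganglia path is the ganglia-web default, and the head-node name is a placeholder) forwards a single local port straight to the web server:

ssh -L 8080:<head node>:80 <user name>@<interactive node>
# then browse to http://localhost:8080/ganglia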