Building ganglia (v3.4.0) from source rpm
ganglia and ganglia-web rpms built from source tarballs
Dependencies and compilation
Steps done as root
Scientific Linux 5 steps
Install rpm forge repo
RPMforge provides the latest rrdtool for RHEL 5; we also use it for libconfuse. Read the RPMforge documentation for how to install the repository.
- Modify /etc/yum.repos.d/rpmforge.repo to be:
### Name: RPMforge RPM Repository for RHEL 5 - dag
### URL: http://rpmforge.net/
[rpmforge]
name = RHEL $releasever - RPMforge.net - dag
baseurl = http://apt.sw.be/redhat/el5/en/$basearch/rpmforge
mirrorlist = http://apt.sw.be/redhat/el5/en/mirrors-rpmforge
#mirrorlist = file:///etc/yum.repos.d/mirrors-rpmforge
enabled = 1
protect = 0
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-rpmforge-dag
gpgcheck = 1
includepkgs = rrdtool.x86_64 rrdtool-devel.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 perl-rrdtool.x86_64
priority = 50
Fetching of packages needed to build ganglia rpms
- Get rpm-build
yum install rpm-build
- Install libconfuse and rrdtool
yum install libdbi.x86_64 lua.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 rrdtool.x86_64 rrdtool-devel.x86_64 perl-rrdtool.x86_64 php.x86_64 php-gd.x86_64
- Install other needed packages
yum install libpng-devel libart_lgpl-devel python-devel pcre-devel freetype-devel apr-devel libconfuse-devel expat-devel
Scientific Linux 6 steps
Install EPEL repo
- Install the EPEL release rpm
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-7.noarch.rpm
Fetching of packages needed to build ganglia rpms
- Get rpm-build
yum install rpm-build
- Install libconfuse and rrdtool
yum install libdbi.x86_64 lua.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 rrdtool.x86_64 rrdtool-devel.x86_64 php53.x86_64 php53-gd.x86_64
- Install other needed packages
yum install libpng-devel libart_lgpl-devel python-devel pcre-devel freetype-devel apr-devel libconfuse-devel expat-devel
Steps done as normal user
Note - these steps are valid for SL 5 or SL 6
Starting from a clean shell and clean area
- Create the rpm build areas
mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
echo '%_topdir %(echo $HOME)/rpmbuild' > ~/.rpmmacros
- Get the source code tarball
cd ~/rpmbuild/SOURCES
wget -O ganglia-3.4.0.tar.gz 'http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.4.0/ganglia-3.4.0.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fganglia%2Ffiles%2Fganglia%2520monitoring%2520core%2F3.4.0%2F&ts=1342491498&use_mirror=superb-sea2'
Note: the URL must be quoted or the shell will treat the & characters as job separators; -O saves the download under its proper file name.
- Extract it and copy the spec file to the proper place
tar -zxvf ganglia-3.4.0.tar.gz
cd ganglia-3.4.0
cp ganglia.spec ../../SPECS/
- Go to SPECS directory and build the rpms
cd ../../SPECS
rpmbuild -bb ganglia.spec
- Check your work
ls ../RPMS/x86_64/
- Go to the SOURCES area and fetch the gweb code
cd $HOME/rpmbuild/SOURCES
wget -O ganglia-web-3.4.2.tar.gz 'http://downloads.sourceforge.net/project/ganglia/ganglia-web/3.4.2/ganglia-web-3.4.2.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fganglia%2Ffiles%2Fganglia-web%2F3.4.2%2F&ts=1342531342&use_mirror=voxel'
tar xzvf ganglia-web-3.4.2.tar.gz
cd ganglia-web-3.4.2
cp gweb.spec ../../SPECS/
- Now build the rpm
cd ../../SPECS/
rpmbuild -bb gweb.spec
ls ../RPMS/noarch/
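The build-area setup at the start of these steps (the mkdir and .rpmmacros commands) can be wrapped as a small idempotent helper, so rerunning it is harmless. This is just a sketch; the function name `setup_rpmbuild` is our own, and the default path matches the ~/rpmbuild layout used on this page:

```shell
#!/bin/bash
# Sketch: create the rpmbuild tree and point %_topdir at it.
# Safe to rerun; directories that already exist are left alone.
setup_rpmbuild() {
  local top=${1:-$HOME/rpmbuild}
  mkdir -p "$top"/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
  echo "%_topdir $top" > "$HOME/.rpmmacros"
}
```

After running it, `rpm --eval '%_topdir'` should print the new top directory.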
Installing new rpms
Note - work done as root account
Scientific Linux 5 instructions
- Install rpmforge yum repository.
- Install the rrdtool and libconfuse packages
yum install libdbi.x86_64 lua.x86_64 perl-rrdtool.x86_64 rrdtool.x86_64 libconfuse.x86_64
Scientific Linux 6 instructions
- Install EPEL yum repository
- Install apr and libconfuse packages
yum install apr.x86_64 libconfuse.x86_64
Steps needed on the machine with the web server when Tier 3 monitoring is used (it uses php53)
We use packages from http://iuscommunity.org/, specifically yum-plugin-replace and the php53u* packages.
- Install the yum-plugin-replace package
rpm -Uvh http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/x86_64/ius-release-1.0-10.ius.el5.noarch.rpm
rpm --import /etc/pki/rpm-gpg/IUS-COMMUNITY-GPG-KEY
yum install yum-plugin-replace
- Remove any existing php and php53 packages
yum remove php\*
- Install the php53u replacements
yum replace php --replace-with php53u
yum install php53u php53u-cli php53u-common php53u-gd
- Install the ganglia-web rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/noarch/ganglia-web-3.4.2-1.noarch.rpm
/sbin/service gmetad restart
On machine with web server (installing gmond, gmetad and gweb):
yum install php53.x86_64 php53-gd.x86_64 httpd.x86_64
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/libganglia-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmetad-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/noarch/ganglia-web-3.4.2-1.noarch.rpm
On machine w/o web server (gmond only):
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/libganglia-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm
Configure gmond client
The gmond client needs to be configured to report to the gmetad collector. Since we break the cluster up into worker nodes, interactive nodes and server nodes, there will be three “clusters”. In addition, we have redundant gmetad collectors on each of the head nodes. We use unicast (see below).
In this cluster, machines are either interactive nodes, worker nodes or service machines. The clusters are called InteractiveNodes, WorkerNodes and ServiceMachines.
Due to the nature of the network equipment at ANL, the multicast configuration of ganglia will not work. Instead, the ganglia unicast configuration must be used.
Information required prior to configuration:
- Determine which cluster (InteractiveNodes, WorkerNodes or ServiceMachines) a given node will be a member of.
- Determine the port that will be used for each cluster. Each cluster must use a different port number. For example:
Cluster Name | Port number
---|---
ServiceMachines | 8661
WorkerNodes | 8662
InteractiveNodes | 8663
- Within a given cluster determine the two or three nodes that will receive the unicast information
- For each cluster, determine the data sources; note these are the same nodes that receive the unicast information
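The cluster-to-port mapping above can be captured in a small shell helper to keep later scripts consistent. This is only a sketch; `cluster_port` is a hypothetical name and the port numbers are the examples from the table:

```shell
#!/bin/bash
# Sketch: map a cluster name to its gmond port.
# Port numbers are the examples chosen in the table above.
cluster_port() {
  case "$1" in
    ServiceMachines)  echo 8661 ;;
    WorkerNodes)      echo 8662 ;;
    InteractiveNodes) echo 8663 ;;
    *) echo "unknown cluster: $1" >&2; return 1 ;;
  esac
}
```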
Open the proper iptables port for the given cluster type. Add the proper line to /etc/sysconfig/iptables and restart iptables.
ServiceMachines cluster
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8661 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8661 -j ACCEPT
WorkerNodes cluster
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8662 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8662 -j ACCEPT
InteractiveNodes cluster
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8663 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8663 -j ACCEPT
gmond.conf Service machines
The relevant sections of the gmond.conf file for the head nodes, gridftp server and file servers are shown below. In this example the nodes receiving the gmond information are atlas66.hep.anl.gov, atlas67.hep.anl.gov and atlashn.hep.anl.gov.
/* This configuration is as close to 2.5.x default behavior as possible.
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /* secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /* secs */
  cleanup_threshold = 300 /* secs */
  gexec = no
  send_metadata_interval = 30 /* secs */
}

/* The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance. */
cluster {
  name = "ServiceMachines"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.
   Gmond used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas66.hep.anl.gov
  port = 8661
  ttl = 1
}

udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas67.hep.anl.gov
  port = 8661
  ttl = 1
}

udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlashn.hep.anl.gov
  port = 8661
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8661
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8661
}
gmond.conf Worker Nodes
The relevant sections of the gmond.conf file for the worker nodes are shown below. In this example atlas68 and atlas69 are used as the data sources and receive the unicast information.
/* This configuration is as close to 2.5.x default behavior as possible.
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /* secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /* secs */
  cleanup_threshold = 300 /* secs */
  gexec = no
  send_metadata_interval = 30 /* secs */
}

/* The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance. */
cluster {
  name = "WorkerNodes"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.
   Gmond used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes
  host = atlas68.hep.anl.gov
  port = 8662
  ttl = 1
}

udp_send_channel {
  bind_hostname = yes
  host = atlas69.hep.anl.gov
  port = 8662
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8662
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8662
}
gmond.conf Interactive Nodes
The relevant sections of the gmond.conf file for the interactive nodes are shown below. In this example atlas28 and atlas29 will receive the unicast gmond updates.
/* This configuration is as close to 2.5.x default behavior as possible.
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /* secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /* secs */
  cleanup_threshold = 300 /* secs */
  gexec = no
  send_metadata_interval = 30 /* secs */
}

/* The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance. */
cluster {
  name = "InteractiveNodes"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.
   Gmond used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas28.hep.anl.gov
  port = 8663
  ttl = 1
}

udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas29.hep.anl.gov
  port = 8663
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8663
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8663
}
Starting and Stopping gmond services
Steps after installation and configuration of gmond and iptables
/sbin/service iptables restart
/sbin/service gmond restart
/sbin/chkconfig gmond on
To start, stop, restart gmond :
/sbin/service gmond start
/sbin/service gmond stop
/sbin/service gmond restart
Configure gmetad
We run the gmetad daemon on both machines that could act as the head node. Each machine also needs the Apache web server (httpd) running.
Add these lines to the /etc/ganglia/gmetad.conf file:
data_source "ServiceMachines" atlashn.hep.anl.gov:8661 atlas67.hep.anl.gov:8661
data_source "WorkerNodes" atlas68.hep.anl.gov:8662 atlas69.hep.anl.gov:8662
data_source "InteractiveNodes" atlas28.hep.anl.gov:8663
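gmetad serves its aggregated XML on the xml_port (8651 by default), so after a restart you can confirm that all three data sources are being polled by listing the cluster names in that XML. A sketch; `list_clusters` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: list the cluster names gmetad is currently aggregating.
# 8651 is gmetad's default xml_port.
list_clusters() {
  nc "${1:-localhost}" 8651 | grep -o 'CLUSTER NAME="[^"]*"' | sort -u
}
```

With the data sources above configured, the output should include ServiceMachines, WorkerNodes and InteractiveNodes.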
Configure iptables
Open these ports on the gmetad/gweb servers:
# ganglia and web ports
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8661 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8662 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8663 -j ACCEPT
Configure httpd server
Make certain that the httpd package is installed on both machines.
Starting and Stopping gmond, gmetad and httpd services
Steps after installation and configuration of gmond, gmetad, httpd and iptables
/sbin/service iptables restart
/sbin/service gmond restart
/sbin/chkconfig gmond on
/sbin/service gmetad restart
/sbin/chkconfig gmetad on
/sbin/service httpd restart
/sbin/chkconfig httpd on
To start, stop, restart gmond :
/sbin/service gmond start
/sbin/service gmond stop
/sbin/service gmond restart
To start, stop, restart gmetad :
/sbin/service gmetad start
/sbin/service gmetad stop
/sbin/service gmetad restart
To start, stop, restart httpd :
/sbin/service httpd start
/sbin/service httpd stop
/sbin/service httpd restart
Troubleshooting and other tips
This section describes a few tips and tricks for troubleshooting and for viewing the ganglia web pages from offsite.
Troubleshooting
- Use the nc command to test the gmond servers from a machine on the yellow or green ANL networks:
nc <Node_name> <port>
where port is 8661, 8662 or 8663, depending on the configuration above. If gmond is running and accepting TCP connections, you should get XML back.
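The nc check can be wrapped so it fails loudly when no XML comes back. A sketch; `probe_gmond` is a hypothetical name and the 3-second timeout is an arbitrary choice:

```shell
#!/bin/bash
# Sketch: probe a gmond tcp_accept_channel and confirm it answers with XML.
probe_gmond() {
  local host=$1 port=$2
  # -w 3: give up after 3 seconds if the far end never answers
  if nc -w 3 "$host" "$port" | grep -q '<GANGLIA_XML'; then
    echo "gmond on $host:$port is answering"
  else
    echo "no XML from $host:$port" >&2
    return 1
  fi
}
```

For example, `probe_gmond atlas68.hep.anl.gov 8662` checks one of the WorkerNodes data sources.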
Other troubleshooting tips can be found here: http://sourceforge.net/apps/trac/ganglia/wiki/FAQ
View the ganglia plots from outside ANL
To view the ganglia plots from outside of ANL, ssh tunneling can be used.
- Log into an interactive node using the ssh -D flag. Use an unprivileged port, for example:
ssh -D 8888 <user name>@<interactive node>
- Set your web browser to use a SOCKS 5 proxy server: SOCKS Proxy Server = localhost, port = 8888
- Point your web browser to the IP address of a server running gmetad and the Apache web server: 146.139.33.66/gweb or 146.139.33.67/gweb
- Stay logged in through the ssh connection, as the web traffic is routed through it.
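As an alternative to the browser setup, the same tunnel can be exercised from the command line with curl's SOCKS support. A sketch, assuming the tunnel from the step above is listening on localhost:8888; `gweb_via_tunnel` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: fetch a gweb page through the ssh SOCKS tunnel.
# --socks5-hostname makes curl resolve the host name via the proxy too.
gweb_via_tunnel() {
  curl --silent --socks5-hostname localhost:8888 "http://$1/gweb/"
}
```

For example, `gweb_via_tunnel 146.139.33.66` should return the gweb front page HTML if the tunnel is up.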