====== Building ganglia (v3.4.0) from source rpm ======
The ganglia and ganglia-web rpms are built from the source tarballs.
Dependencies:
* [[http://apr.apache.org/ | APR]]
* [[http://www.nongnu.org/confuse/ | libConfuse]]
* [[http://expat.sourceforge.net/ | expat]]
* [[http://www.freedesktop.org/wiki/Software/pkg-config | pkg-config]]
* [[http://www.python.org/ | python]]
* [[http://www.pcre.org/ | PCRE]]
* [[http://oss.oetiker.ch/rrdtool/ | RRDtool]]
===== compilation =====
==== Steps done as root ====
=== Scientific Linux 5 steps ===
== Install rpm forge repo ==
RPMforge has the latest rrdtool for RHEL5; we also use it for libconfuse. Read up on [[http://wiki.centos.org/AdditionalResources/Repositories/RPMForge/#head-5aabf02717d5b6b12d47edbc5811404998926a1b | how to install the RPMforge repository]].
* Modify /etc/yum.repos.d/rpmforge.repo to be:
### Name: RPMforge RPM Repository for RHEL 5 - dag
### URL: http://rpmforge.net/
[rpmforge]
name = RHEL $releasever - RPMforge.net - dag
baseurl = http://apt.sw.be/redhat/el5/en/$basearch/rpmforge
mirrorlist = http://apt.sw.be/redhat/el5/en/mirrors-rpmforge
#mirrorlist = file:///etc/yum.repos.d/mirrors-rpmforge
enabled = 1
protect = 0
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-rpmforge-dag
gpgcheck = 1
includepkgs= rrdtool.x86_64 rrdtool-devel.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 perl-rrdtool.x86_64
Priority=50
== Fetching of packages needed to build ganglia rpms ==
- Get rpm-build:
yum install rpm-build
- Install libconfuse and rrdtool:
yum install libdbi.x86_64 lua.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 rrdtool.x86_64 rrdtool-devel.x86_64 perl-rrdtool.x86_64 php.x86_64 php-gd.x86_64
- Install other needed packages:
yum install libpng-devel libart_lgpl-devel python-devel pcre-devel freetype-devel apr-devel libconfuse-devel expat-devel
=== Scientific Linux 6 steps ===
== Install EPEL repo ==
- rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-7.noarch.rpm
== Fetching of packages needed to build ganglia rpms ==
- Get rpm-build:
yum install rpm-build
- Install libconfuse and rrdtool:
yum install libdbi.x86_64 lua.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 rrdtool.x86_64 rrdtool-devel.x86_64 php53.x86_64 php53-gd.x86_64
- Install other needed packages:
yum install libpng-devel libart_lgpl-devel python-devel pcre-devel freetype-devel apr-devel libconfuse-devel expat-devel
==== Steps done as normal user ====
Note - these steps are valid for SL 5 or SL 6
Starting from a clean shell and clean area
- Create the rpm build areas:
mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
echo '%_topdir %(echo $HOME)/rpmbuild' > ~/.rpmmacros
- Get the source code tarball (the URL is quoted so the shell does not interpret the & characters, and -O sets the name of the saved file):
cd ~/rpmbuild/SOURCES
wget -O ganglia-3.4.0.tar.gz 'http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.4.0/ganglia-3.4.0.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fganglia%2Ffiles%2Fganglia%2520monitoring%2520core%2F3.4.0%2F&ts=1342491498&use_mirror=superb-sea2'
- Extract it and copy the spec file to the proper place:
tar -zxvf ganglia-3.4.0.tar.gz
cd ganglia-3.4.0
cp ganglia.spec ../../SPECS/
- Go to the SPECS directory and build the rpms:
cd ../../SPECS
rpmbuild -bb ganglia.spec
- Check your work:
ls ../RPMS/x86_64/
- Go to the SOURCES area and fetch the gweb code (again with the URL quoted and the output file named):
cd $HOME/rpmbuild/SOURCES
wget -O ganglia-web-3.4.2.tar.gz 'http://downloads.sourceforge.net/project/ganglia/ganglia-web/3.4.2/ganglia-web-3.4.2.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fganglia%2Ffiles%2Fganglia-web%2F3.4.2%2F&ts=1342531342&use_mirror=voxel'
tar xzvf ganglia-web-3.4.2.tar.gz
cd ganglia-web-3.4.2
cp gweb.spec ../../SPECS/
- Now build the rpm:
cd ../../SPECS/
rpmbuild -bb gweb.spec
ls ../RPMS/noarch/
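As a complement to the ls checks above, the following is a minimal sketch that reports which of the rpms installed later on this page are not yet present in the build area. The function name missing_ganglia_rpms is hypothetical, and the file names are assumed to match the versions used in the install commands on this page (3.4.0-1 and 3.4.2-1).

```shell
# Sketch only: the function name is made up for this page, and the file
# names are the ones used by the "rpm -iv" install commands on this page.
# Prints the path of every expected rpm that is missing under the given
# rpmbuild top directory; prints nothing if all of them are present.
missing_ganglia_rpms() (
    topdir="${1:-$HOME/rpmbuild}"
    for f in \
        x86_64/libganglia-3.4.0-1.x86_64.rpm \
        x86_64/ganglia-gmond-3.4.0-1.x86_64.rpm \
        x86_64/ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm \
        x86_64/ganglia-gmetad-3.4.0-1.x86_64.rpm \
        noarch/ganglia-web-3.4.2-1.noarch.rpm
    do
        [ -f "$topdir/RPMS/$f" ] || echo "$topdir/RPMS/$f"
    done
)
```

Running missing_ganglia_rpms with no argument checks ~/rpmbuild; an empty result suggests both builds produced the expected files.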
==== Installing new rpms ====
**Note - work done as root account**
** Scientific Linux 5 instructions **
- Install rpmforge yum repository.
- Install the rrdtool and libconfuse packages:
yum install libdbi.x86_64 lua.x86_64 perl-rrdtool.x86_64 rrdtool.x86_64 libconfuse.x86_64
** Scientific Linux 6 instructions **
- Install EPEL yum repository
- Install the apr and libconfuse packages:
yum install apr.x86_64 libconfuse.x86_64
=== Steps needed on machine with web server when Tier 3 monitoring is used (it uses php53) ===
Use packages from http://iuscommunity.org/, specifically yum-plugin-replace and the php53u* packages.
- Install the yum-plugin-replace package
rpm -Uvh http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/x86_64/ius-release-1.0-10.ius.el5.noarch.rpm
rpm --import /etc/pki/rpm-gpg/IUS-COMMUNITY-GPG-KEY
yum install yum-plugin-replace
- Remove any existing php and php53 packages:
yum remove php\*
- Install the php53u replacements
yum replace php --replace-with php53u
yum install php53u php53u-cli php53u-common php53u-gd
- Install the ganglia-web rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/noarch/ganglia-web-3.4.2-1.noarch.rpm
/sbin/service gmetad restart
=== On machine with web server (installing gmond, gmetad and gweb): ===
yum install php53.x86_64 php53-gd.x86_64 httpd.x86_64
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/libganglia-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmetad-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/noarch/ganglia-web-3.4.2-1.noarch.rpm
=== On machine w/o web server (gmond only): ===
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/libganglia-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm
====== Configure gmond client ======
The gmond client needs to be configured to report to the gmetad collector. Machines in this cluster are either interactive nodes, worker nodes or service machines, so there are three ganglia "clusters", called InteractiveNodes, WorkerNodes and ServiceMachines. In addition, we run redundant gmetad collectors, one on each of the head nodes.
Due to the nature of the network equipment at ANL, the multicast configuration of ganglia will not work. Instead, the ganglia unicast configuration must be used.
Information required prior to configuration:
* Determine which cluster (InteractiveNodes, WorkerNodes or ServiceMachines) a given node will be a member of.
* Determine the port that will be used for each cluster. Each cluster must use a different port number. For example:
^ Cluster Name ^ port number ^
| ServiceMachines | 8661 |
| WorkerNodes | 8662 |
| InteractiveNodes | 8663 |
* Within a given cluster, determine the two or three nodes that will receive the unicast information.
* For each cluster, determine the data sources. Note that these are the same nodes that receive the unicast information.
Open the proper iptables port for the given cluster type. Add the proper line to /etc/sysconfig/iptables and restart iptables.
//ServiceMachines cluster//
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8661 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8661 -j ACCEPT
//WorkerNodes cluster//
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8662 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8662 -j ACCEPT
//InteractiveNodes cluster//
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8663 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8663 -j ACCEPT
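The rules above follow one pattern per cluster, so they can be generated rather than typed by hand. The sketch below assumes the port assignments from the table above; the function names ganglia_port and ganglia_iptables_rules are hypothetical helpers, not part of ganglia or iptables.

```shell
# Sketch only: helper names are made up for this page.
# Map a cluster name to its gmond port (values from the table above).
ganglia_port() {
    case "$1" in
        ServiceMachines)  echo 8661 ;;
        WorkerNodes)      echo 8662 ;;
        InteractiveNodes) echo 8663 ;;
        *) echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Print the two iptables rules (tcp and udp) for one cluster, in the
# same form as the lines to be added to /etc/sysconfig/iptables.
ganglia_iptables_rules() (
    port=$(ganglia_port "$1") || exit 1
    for proto in tcp udp; do
        echo "-A RH-Firewall-1-INPUT -m state --state NEW -m $proto -p $proto --dport $port -j ACCEPT"
    done
)
```

For example, ganglia_iptables_rules WorkerNodes prints the two port 8662 lines; the output still has to be placed by hand in the right position in /etc/sysconfig/iptables.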
===== gmond.conf Service machines =====
The relevant sections of the gmond.conf file for the head nodes, gridftp server and file servers are shown below.
In this example the two nodes receiving the gmond information are atlas66.hep.anl.gov and atlas67.hep.anl.gov; the configuration also sends to atlashn.hep.anl.gov.
/* This configuration is as close to 2.5.x default behavior as possible
The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
daemonize = yes
setuid = yes
user = nobody
debug_level = 0
max_udp_msg_len = 1472
mute = no
deaf = no
allow_extra_data = yes
host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
host_tmax = 20 /*secs */
cleanup_threshold = 300 /*secs */
gexec = no
send_metadata_interval = 30 /*secs */
}
/*
* The cluster attributes specified will be used as part of the
* tag that will wrap all hosts collected by this instance.
*/
cluster {
name = "ServiceMachines"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
/* The host section describes attributes of the host, like the location */
host {
location = "unspecified"
}
/* Feel free to specify as many udp_send_channels as you like. Gmond
used to only support having a single channel */
udp_send_channel {
bind_hostname = yes # Highly recommended, soon to be default.
host = atlas66.hep.anl.gov
port = 8661
ttl = 1
}
udp_send_channel {
bind_hostname = yes # Highly recommended, soon to be default.
host = atlas67.hep.anl.gov
port = 8661
ttl = 1
}
udp_send_channel {
bind_hostname = yes # Highly recommended, soon to be default.
host = atlashn.hep.anl.gov
port = 8661
ttl = 1
}
/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
port = 8661
}
/* You can specify as many tcp_accept_channels as you like to share
an xml description of the state of the cluster */
tcp_accept_channel {
port = 8661
}
===== gmond.conf Worker Nodes =====
The relevant sections of the gmond.conf file for the worker nodes are shown below. In this example atlas68 and atlas69 are used as the data sources and receive the unicast information.
/* This configuration is as close to 2.5.x default behavior as possible
The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
daemonize = yes
setuid = yes
user = nobody
debug_level = 0
max_udp_msg_len = 1472
mute = no
deaf = no
allow_extra_data = yes
host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
host_tmax = 20 /*secs */
cleanup_threshold = 300 /*secs */
gexec = no
send_metadata_interval = 30 /*secs */
}
/*
* The cluster attributes specified will be used as part of the
* tag that will wrap all hosts collected by this instance.
*/
cluster {
name = "WorkerNodes"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
/* The host section describes attributes of the host, like the location */
host {
location = "unspecified"
}
/* Feel free to specify as many udp_send_channels as you like. Gmond
used to only support having a single channel */
udp_send_channel {
bind_hostname = yes
host = atlas68.hep.anl.gov
port = 8662
ttl = 1
}
udp_send_channel {
bind_hostname = yes
host = atlas69.hep.anl.gov
port = 8662
ttl = 1
}
/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
port = 8662
}
/* You can specify as many tcp_accept_channels as you like to share
an xml description of the state of the cluster */
tcp_accept_channel {
port = 8662
}
===== gmond.conf Interactive Nodes =====
The relevant sections of the gmond.conf file for the interactive nodes are shown below.
In this example atlas28 and atlas29 will receive the unicast gmond updates.
/* This configuration is as close to 2.5.x default behavior as possible
The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
daemonize = yes
setuid = yes
user = nobody
debug_level = 0
max_udp_msg_len = 1472
mute = no
deaf = no
allow_extra_data = yes
host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
host_tmax = 20 /*secs */
cleanup_threshold = 300 /*secs */
gexec = no
send_metadata_interval = 30 /*secs */
}
/*
* The cluster attributes specified will be used as part of the
* tag that will wrap all hosts collected by this instance.
*/
cluster {
name = "InteractiveNodes"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
/* The host section describes attributes of the host, like the location */
host {
location = "unspecified"
}
/* Feel free to specify as many udp_send_channels as you like. Gmond
used to only support having a single channel */
udp_send_channel {
bind_hostname = yes # Highly recommended, soon to be default.
host = atlas28.hep.anl.gov
port = 8663
ttl = 1
}
udp_send_channel {
bind_hostname = yes # Highly recommended, soon to be default.
host = atlas29.hep.anl.gov
port = 8663
ttl = 1
}
/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
port = 8663
}
/* You can specify as many tcp_accept_channels as you like to share
an xml description of the state of the cluster */
tcp_accept_channel {
port = 8663
}
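The three gmond.conf files above differ mainly in the cluster name, the port, and the hosts named in the udp_send_channel sections. As an illustration, the sketch below prints the udp_send_channel sections for a given port and list of receiving hosts. The function name gmond_send_channels is hypothetical, and the output is meant to be pasted into gmond.conf by hand, not written to the file automatically.

```shell
# Sketch only: the function name is made up for this page.
# Print one udp_send_channel section per receiving host.
# Usage: gmond_send_channels PORT HOST [HOST ...]
gmond_send_channels() (
    port="$1"; shift
    for host in "$@"; do
        printf 'udp_send_channel {\n'
        printf '  bind_hostname = yes\n'
        printf '  host = %s\n' "$host"
        printf '  port = %s\n' "$port"
        printf '  ttl = 1\n'
        printf '}\n'
    done
)
```

For the worker nodes this would be called as gmond_send_channels 8662 atlas68.hep.anl.gov atlas69.hep.anl.gov.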
===== Starting and Stopping gmond services =====
Steps after installation and configuration of gmond and iptables
/sbin/service iptables restart
/sbin/service gmond restart
/sbin/chkconfig gmond on
To start, stop, restart gmond :
/sbin/service gmond start
/sbin/service gmond stop
/sbin/service gmond restart
====== Configure gmetad ======
We run gmetad on both machines that could act as the head node. Each machine also needs the Apache web server (httpd) running.
Add these lines to /etc/ganglia/gmetad.conf file
data_source "ServiceMachines" atlashn.hep.anl.gov:8661 atlas67.hep.anl.gov:8661
data_source "WorkerNodes" atlas68.hep.anl.gov:8662 atlas69.hep.anl.gov:8662
data_source "InteractiveNodes" atlas28.hep.anl.gov:8663
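Each data_source line pairs a cluster name with host:port entries, and the port must match the one used in that cluster's gmond.conf. The sketch below builds such a line from a cluster name, a port and a list of hosts, which may help keep the two files consistent. The function name data_source_line is a hypothetical helper.

```shell
# Sketch only: the function name is made up for this page.
# Print one gmetad data_source line.
# Usage: data_source_line CLUSTER PORT HOST [HOST ...]
data_source_line() (
    name="$1"; port="$2"; shift 2
    line="data_source \"$name\""
    for host in "$@"; do
        line="$line $host:$port"
    done
    echo "$line"
)
```

For example, data_source_line InteractiveNodes 8663 atlas28.hep.anl.gov reproduces the third line above.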
===== Configure iptables =====
Open these ports on the gmetad/gweb servers:
# ganglia and web ports
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8661 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8662 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8663 -j ACCEPT
===== Configure httpd server =====
Make certain that the httpd package is installed on both machines.
===== Starting and Stopping gmond, gmetad and httpd services =====
Steps after installation and configuration of gmond, gmetad, httpd and iptables:
/sbin/service iptables restart
/sbin/service gmond restart
/sbin/chkconfig gmond on
/sbin/service gmetad restart
/sbin/chkconfig gmetad on
/sbin/service httpd restart
/sbin/chkconfig httpd on
To start, stop, restart gmond :
/sbin/service gmond start
/sbin/service gmond stop
/sbin/service gmond restart
To start, stop, restart gmetad :
/sbin/service gmetad start
/sbin/service gmetad stop
/sbin/service gmetad restart
To start, stop, restart httpd :
/sbin/service httpd start
/sbin/service httpd stop
/sbin/service httpd restart
====== Troubleshooting and other tips ======
This section describes a few tips and tricks for troubleshooting and for viewing the ganglia web servers from offsite.
===== Troubleshooting =====
* Use the nc command to test the gmond servers. From a machine on the yellow or green ANL networks:
nc <hostname> <port>
where <port> would be 8661, 8662 or 8663 based on the configuration above. If gmond is running and open to tcp, you should get xml back.
Other troubleshooting tips can be found here: http://sourceforge.net/apps/trac/ganglia/wiki/FAQ
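The nc test above can be repeated over all three ports with a small script. This is a sketch under a few assumptions: the function names is_gmond_xml and check_gmond_host are hypothetical, the XML check relies on gmond's report using GANGLIA_XML as its root element, and nc is assumed to exit once gmond closes the connection.

```shell
# Sketch only: helper names are made up for this page.
# Succeeds if standard input contains a gmond XML report
# (recognised by its GANGLIA_XML root element).
is_gmond_xml() {
    grep -q '<GANGLIA_XML'
}

# Query each of the three ports on one host and report which ones
# answer with XML. Needs nc and network access to the gmond host.
# Usage: check_gmond_host HOSTNAME
check_gmond_host() (
    host="$1"
    for port in 8661 8662 8663; do
        if nc "$host" "$port" | is_gmond_xml; then
            echo "$host:$port ok"
        else
            echo "$host:$port no XML"
        fi
    done
)
```

A given host normally listens on only one of the three ports, so one "ok" line per host is the expected outcome.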
===== View the ganglia plots from outside ANL =====
To view the ganglia plots from outside of ANL, ssh tunneling can be used.
* Log into an interactive node with the -D flag. Use an unprivileged port, for example:
ssh -D 8888 <username>@<interactive node>
* Set your web browser to use a SOCKS 5 proxy server: SOCKS proxy server = localhost, port = 8888.
* Point your web browser to the IP address of the servers running gmetad and the Apache web server: 146.139.33.66/gweb or 146.139.33.67/gweb
* Stay logged in through the ssh connection, as the web traffic is routed through it.
====== Other links for configuration of ganglia ======
http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_quick_start