Building ganglia (v3.4.0) from source rpm

ganglia and ganglia-web rpms built from source tarballs

Dependencies:

compilation

Steps done as root

Scientific Linux 5 steps

Install rpm forge repo

RPMforge provides the latest rrdtool for RHEL5, and we also use it for libconfuse. See the RPMforge documentation for how to install the repository.

Fetching of packages needed to build ganglia rpms
  1. Get rpm-build
    yum install rpm-build
  2. Install libconfuse and rrdtool
    yum install libdbi.x86_64 lua.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 rrdtool.x86_64 rrdtool-devel.x86_64 perl-rrdtool.x86_64 php.x86_64 php-gd.x86_64
  3. Install other needed packages
    yum install libpng-devel libart_lgpl-devel python-devel pcre-devel freetype-devel apr-devel libconfuse-devel expat-devel

Scientific Linux 6 steps

Install EPEL repo

    rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-7.noarch.rpm

Fetching of packages needed to build ganglia rpms
  1. Get rpm-build
    yum install rpm-build
  2. Install libconfuse and rrdtool
    yum install libdbi.x86_64 lua.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 rrdtool.x86_64 rrdtool-devel.x86_64  php53.x86_64 php53-gd.x86_64
  3. Install other needed packages
    yum install libpng-devel libart_lgpl-devel python-devel pcre-devel freetype-devel apr-devel libconfuse-devel expat-devel

Steps done as normal user

Note - these steps are valid for SL 5 or SL 6

Starting from a clean shell and clean area

  1. Create the rpm build areas
    mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
    echo '%_topdir %(echo $HOME)/rpmbuild' > ~/.rpmmacros
  2. Get the source code tarball
    cd ~/rpmbuild/SOURCES
    wget -O ganglia-3.4.0.tar.gz 'http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.4.0/ganglia-3.4.0.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fganglia%2Ffiles%2Fganglia%2520monitoring%2520core%2F3.4.0%2F&ts=1342491498&use_mirror=superb-sea2'
  3. Extract it and copy the spec file to the proper place
    tar -zxvf ganglia-3.4.0.tar.gz
    cd ganglia-3.4.0
    cp ganglia.spec ../../SPECS/
  4. Go to SPECS directory and build the rpms
    cd ../../SPECS
    rpmbuild -bb ganglia.spec
  5. Check your work
    ls ../RPMS/x86_64/
  6. Go to the SOURCES area and fetch the gweb code
    cd $HOME/rpmbuild/SOURCES
    wget -O ganglia-web-3.4.2.tar.gz 'http://downloads.sourceforge.net/project/ganglia/ganglia-web/3.4.2/ganglia-web-3.4.2.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fganglia%2Ffiles%2Fganglia-web%2F3.4.2%2F&ts=1342531342&use_mirror=voxel'
    tar xzvf ganglia-web-3.4.2.tar.gz 
    cd ganglia-web-3.4.2
    cp gweb.spec ../../SPECS/
  7. Now build the rpm
    cd ../../SPECS/
    rpmbuild -bb gweb.spec
    ls ../RPMS/noarch/
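
As a quick sanity check after both builds, the sketch below lists any expected package that is missing from the rpmbuild tree. The function name missing_rpms is hypothetical; the package file names are the ones used in the install steps later on this page and would differ for another version or architecture.

```shell
# Print every expected ganglia rpm that is NOT present under the given
# rpmbuild top directory (defaults to ~/rpmbuild).
# Prints nothing when all five packages exist.
# NOTE: missing_rpms is an illustrative helper, not part of ganglia or rpm.
missing_rpms() {
    local topdir="${1:-$HOME/rpmbuild}"
    local rpm
    for rpm in \
        x86_64/libganglia-3.4.0-1.x86_64.rpm \
        x86_64/ganglia-gmond-3.4.0-1.x86_64.rpm \
        x86_64/ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm \
        x86_64/ganglia-gmetad-3.4.0-1.x86_64.rpm \
        noarch/ganglia-web-3.4.2-1.noarch.rpm
    do
        # Report the full path of each file that does not exist.
        [ -f "$topdir/RPMS/$rpm" ] || printf '%s\n' "$topdir/RPMS/$rpm"
    done
}
```

Running missing_rpms ~/rpmbuild after step 7 should print nothing if both builds succeeded.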

Installing new rpms

Note - these steps are done as root

Scientific Linux 5 instructions

  1. Install rpmforge yum repository.
  2. Install the rrdtool and libconfuse packages
    yum install libdbi.x86_64 lua.x86_64  perl-rrdtool.x86_64 rrdtool.x86_64 libconfuse.x86_64

Scientific Linux 6 instructions

  1. Install EPEL yum repository
  2. Install apr and libconfuse packages
     yum install apr.x86_64 libconfuse.x86_64 

Steps needed on machine with web server when Tier 3 monitoring is used (it uses php53)

Use packages from http://iuscommunity.org/, specifically yum-plugin-replace and the php53u* packages.

  1. Install the yum-plugin-replace package
    rpm -Uvh http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/x86_64/ius-release-1.0-10.ius.el5.noarch.rpm
    rpm --import /etc/pki/rpm-gpg/IUS-COMMUNITY-GPG-KEY
    yum install yum-plugin-replace
  2. Remove any existing php and php53 packages
     yum remove php\* 
  3. Install the php53u replacements
    yum replace php --replace-with php53u
    yum install php53u php53u-cli php53u-common php53u-gd
  4. Install the ganglia-web rpm
    rpm -iv ~dbenjamin/rpmbuild/RPMS/noarch/ganglia-web-3.4.2-1.noarch.rpm
    /sbin/service gmetad restart

On machine with web server (installing gmond, gmetad and gweb):

yum install php53.x86_64 php53-gd.x86_64 httpd.x86_64
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/libganglia-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmetad-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/noarch/ganglia-web-3.4.2-1.noarch.rpm

On machine w/o web server (gmond only):

rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/libganglia-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm

Configure gmond client

The gmond client needs to be configured to report to the gmetad collector. In this cluster, machines are either interactive nodes, worker nodes or service machines, so there are three ganglia “clusters”, called InteractiveNodes, WorkerNodes and ServiceMachines. In addition, we run redundant gmetad collectors, one on each of the head nodes.

Due to the nature of the network equipment at ANL, a multicast configuration of ganglia will not work. A ganglia unicast configuration must be used instead.

Information required prior to configuration:

Cluster Name       Port number
ServiceMachines    8661
WorkerNodes        8662
InteractiveNodes   8663

Open the proper iptables port for the given cluster type. Add the proper line to /etc/sysconfig/iptables and restart iptables.

ServiceMachines cluster

-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8661 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8661 -j ACCEPT

WorkerNodes cluster

-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8662 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8662 -j ACCEPT

InteractiveNodes cluster

-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8663 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8663 -j ACCEPT
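
The rule pairs above differ only in the port number, so they can be generated rather than typed. A minimal sketch, assuming the RH-Firewall-1-INPUT chain name used on this page; the function name gmond_iptables_rules is hypothetical.

```shell
# Print the TCP and UDP iptables rules for one gmond port, in the same
# form as the lines listed above.
# NOTE: gmond_iptables_rules is an illustrative helper; it only prints
# text and does not modify /etc/sysconfig/iptables.
gmond_iptables_rules() {
    local port="$1" proto
    for proto in tcp udp; do
        printf '%s\n' "-A RH-Firewall-1-INPUT -m state --state NEW -m $proto -p $proto --dport $port -j ACCEPT"
    done
}

# Example: the two rules for the WorkerNodes cluster
gmond_iptables_rules 8662
```

The printed lines still have to be pasted into /etc/sysconfig/iptables by hand, above any final REJECT rule in the chain, since iptables evaluates rules in order.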

gmond.conf Service machines

The relevant sections of the gmond.conf file for the head nodes, gridftp server and file servers.

In this example the two nodes receiving the gmond information are atlas66.hep.anl.gov and atlas67.hep.anl.gov; a third send channel targets atlashn.hep.anl.gov.

/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs */
}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "ServiceMachines"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas66.hep.anl.gov
  port = 8661
  ttl = 1
}

udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas67.hep.anl.gov
  port = 8661
  ttl = 1
}
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlashn.hep.anl.gov
  port = 8661
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8661
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8661
}

gmond.conf Worker Nodes

The relevant sections of the gmond.conf file for the worker nodes. In this example atlas68 and atlas69 are used as the data sources and receive the unicast information.

/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs */
}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "WorkerNodes"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes 
  host = atlas68.hep.anl.gov
  port = 8662
  ttl = 1
}

udp_send_channel {
  bind_hostname = yes 
  host = atlas69.hep.anl.gov
  port = 8662
  ttl = 1
}



/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8662
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8662
}

gmond.conf Interactive Nodes

The relevant sections of the gmond.conf file for the interactive nodes.

In this example atlas28 and atlas29 will receive the unicast gmond updates.

/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs */
}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "InteractiveNodes"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas28.hep.anl.gov
  port = 8663
  ttl = 1
}
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas29.hep.anl.gov
  port = 8663
  ttl = 1
}



/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8663
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8663
}
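
The three gmond.conf examples above differ only in the cluster name, the port and the hosts in the udp_send_channel sections. The sketch below prints just those cluster-specific sections for a given name, port and list of collector hosts. The function name gmond_cluster_stanzas is hypothetical, and its output still has to be combined with the unchanged globals and host sections shown above.

```shell
# Usage: gmond_cluster_stanzas CLUSTER_NAME PORT COLLECTOR_HOST...
# Prints the cluster, udp_send_channel, udp_recv_channel and
# tcp_accept_channel sections for one cluster.
# NOTE: illustrative helper only; it writes to stdout and never touches
# /etc/ganglia/gmond.conf.
gmond_cluster_stanzas() {
    local name="$1" port="$2" host
    shift 2
    printf 'cluster {\n  name = "%s"\n  owner = "unspecified"\n  latlong = "unspecified"\n  url = "unspecified"\n}\n\n' "$name"
    # One unicast send channel per collector host.
    for host in "$@"; do
        printf 'udp_send_channel {\n  bind_hostname = yes\n  host = %s\n  port = %s\n  ttl = 1\n}\n\n' "$host" "$port"
    done
    printf 'udp_recv_channel {\n  port = %s\n}\n\n' "$port"
    printf 'tcp_accept_channel {\n  port = %s\n}\n' "$port"
}

# Example: the WorkerNodes sections from this page
gmond_cluster_stanzas WorkerNodes 8662 atlas68.hep.anl.gov atlas69.hep.anl.gov
```

Generating the sections this way keeps the port consistent across the send, receive and TCP accept channels, which is the easiest thing to get wrong when editing three files by hand.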

Starting and Stopping gmond services

Steps after installation and configuration of gmond and iptables

/sbin/service iptables restart 
/sbin/service gmond restart
/sbin/chkconfig gmond on

To start, stop, restart gmond :

/sbin/service gmond start
/sbin/service gmond stop
/sbin/service gmond restart

Configure gmetad

We run gmetad on both machines that could act as the head node. Each machine also needs the Apache web server (httpd) running.

Add these lines to the /etc/ganglia/gmetad.conf file:

data_source "ServiceMachines" atlashn.hep.anl.gov:8661 atlas67.hep.anl.gov:8661
data_source "WorkerNodes" atlas68.hep.anl.gov:8662 atlas69.hep.anl.gov:8662
data_source "InteractiveNodes" atlas28.hep.anl.gov:8663
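
Each data_source line is a cluster name followed by one or more host:port entries for gmond daemons that can answer for that cluster; as far as we understand, gmetad falls back to the later hosts when an earlier one does not respond. A minimal sketch that builds such a line from the cluster name, port and hosts (the function name data_source_line is hypothetical):

```shell
# Usage: data_source_line CLUSTER_NAME PORT HOST...
# Prints one gmetad.conf data_source line with every host on the same port.
# NOTE: illustrative helper only; it does not edit gmetad.conf.
data_source_line() {
    local name="$1" port="$2" host line
    shift 2
    line="data_source \"$name\""
    for host in "$@"; do
        line="$line $host:$port"
    done
    printf '%s\n' "$line"
}

# Example: reproduces the WorkerNodes line above
data_source_line WorkerNodes 8662 atlas68.hep.anl.gov atlas69.hep.anl.gov
```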

Configure iptables

Open these ports on the gmetad/gweb servers:

# ganglia and web ports
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8661 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8662 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8663 -j ACCEPT

Configure httpd server

Make certain that the httpd package is installed on both machines.

Starting and Stopping gmond, gmetad and httpd services

Steps after installation and configuration of gmond, gmetad, httpd and iptables

/sbin/service iptables restart 
/sbin/service gmond restart
/sbin/chkconfig gmond on
/sbin/service gmetad restart
/sbin/chkconfig gmetad on
/sbin/service  httpd restart
/sbin/chkconfig httpd on

To start, stop, restart gmond :

/sbin/service gmond start
/sbin/service gmond stop
/sbin/service gmond restart

To start, stop, restart gmetad :

/sbin/service gmetad start
/sbin/service gmetad stop
/sbin/service gmetad restart

To start, stop, restart httpd :

/sbin/service httpd start
/sbin/service httpd stop
/sbin/service httpd restart

Troubleshooting and other tips

This section describes a few tips and tricks for troubleshooting and for viewing the ganglia web servers from offsite.

Troubleshooting

nc <Node_name> <port>

where <port> is 8661, 8662 or 8663, depending on the cluster (see the configuration above). If gmond is running and accepting TCP connections, you should get XML back.
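
To turn the XML check into a number, the sketch below counts the hosts reported in the gmond output. It assumes each reporting host appears as a <HOST ...> element in the XML; the function name count_gmond_hosts is hypothetical.

```shell
# Count the <HOST ...> elements in gmond XML read from standard input.
# The trailing space in the pattern avoids matching other tag names that
# merely start with HOST.
# NOTE: illustrative helper; assumes one <HOST ...> element per host.
count_gmond_hosts() {
    grep -o '<HOST ' | wc -l | tr -d ' '
}

# Example (node name and port are placeholders, as in the nc command above):
#   nc <Node_name> <port> | count_gmond_hosts
```

A count of 0 from a collector node suggests that no gmond is sending to it on that port, which usually points at the udp_send_channel settings or the iptables rules.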

Other troubleshooting tips can be found here: http://sourceforge.net/apps/trac/ganglia/wiki/FAQ

View the ganglia plots from outside ANL

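To view the ganglia plots from outside of ANL, ssh tunneling can be used. The command below opens a SOCKS proxy on local port 8888; set the browser to use localhost:8888 as its SOCKS proxy and then browse to the ganglia web server.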

ssh -D 8888 <user name>@<interactive node>

Other links for configuration of ganglia

http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_quick_start