====== Building ganglia (v3.4.0) from source rpm ======

The ganglia and ganglia-web rpms are built from the source tarballs.

Dependencies:
  * [[http://apr.apache.org/ | APR]]
  * [[http://www.nongnu.org/confuse/ | libConfuse]]
  * [[http://expat.sourceforge.net/ | expat]]
  * [[http://www.freedesktop.org/wiki/Software/pkg-config | pkg-config]]
  * [[http://www.python.org/ | python]]
  * [[http://www.pcre.org/ | PCRE]]
  * [[http://oss.oetiker.ch/rrdtool/ | RRDtool]]

===== compilation =====

==== Steps done as root ====

=== Scientific Linux 5 steps ===

== Install rpm forge repo ==

RPMforge has the latest rrdtool for RHEL5; we also use it for libconfuse. Read up on [[http://wiki.centos.org/AdditionalResources/Repositories/RPMForge/#head-5aabf02717d5b6b12d47edbc5811404998926a1b | how to install the RPMforge repository]].

  * Modify /etc/yum.repos.d/rpmforge.repo to be: <code>
### Name: RPMforge RPM Repository for RHEL 5 - dag
### URL: http://rpmforge.net/
[rpmforge]
name = RHEL $releasever - RPMforge.net - dag
baseurl = http://apt.sw.be/redhat/el5/en/$basearch/rpmforge
mirrorlist = http://apt.sw.be/redhat/el5/en/mirrors-rpmforge
#mirrorlist = file:///etc/yum.repos.d/mirrors-rpmforge
enabled = 1
protect = 0
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-rpmforge-dag
gpgcheck = 1
includepkgs= rrdtool.x86_64 rrdtool-devel.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 perl-rrdtool.x86_64
Priority=50
</code>

== Fetching of packages needed to build ganglia rpms ==

  - Get rpm-build: <code>yum install rpm-build</code>
  - Install libconfuse and rrdtool: <code>yum install libdbi.x86_64 lua.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 rrdtool.x86_64 rrdtool-devel.x86_64 perl-rrdtool.x86_64 php.x86_64 php-gd.x86_64</code>
  - Install other needed packages: <code>yum install libpng-devel libart_lgpl-devel python-devel pcre-devel freetype-devel apr-devel libconfuse-devel expat-devel</code>

=== Scientific Linux 6 steps ===

== Install EPEL repo ==

  - <code>rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-7.noarch.rpm</code>

== Fetching of packages needed to build ganglia rpms ==

  - Get rpm-build: <code>yum install rpm-build</code>
  - Install libconfuse and rrdtool: <code>yum install libdbi.x86_64 lua.x86_64 libconfuse.x86_64 libconfuse-devel.x86_64 rrdtool.x86_64 rrdtool-devel.x86_64 php53.x86_64 php53-gd.x86_64</code>
  - Install other needed packages: <code>yum install libpng-devel libart_lgpl-devel python-devel pcre-devel freetype-devel apr-devel libconfuse-devel expat-devel</code>

==== Steps done as normal user ====

Note - these steps are valid for SL 5 or SL 6.

Start from a clean shell and a clean area.

  - Create the rpm build areas: <code>
mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
echo '%_topdir %(echo $HOME)/rpmbuild' > ~/.rpmmacros
</code>
  - Get the source code tarball. The URL is quoted because it contains & characters, and -O sets the name of the saved file: <code>
cd ~/rpmbuild/SOURCES
wget -O ganglia-3.4.0.tar.gz 'http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.4.0/ganglia-3.4.0.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fganglia%2Ffiles%2Fganglia%2520monitoring%2520core%2F3.4.0%2F&ts=1342491498&use_mirror=superb-sea2'
</code>
  - Extract it and copy the spec file to the proper place: <code>
tar -zxvf ganglia-3.4.0.tar.gz
cd ganglia-3.4.0
cp ganglia.spec ../../SPECS/
</code>
  - Go to the SPECS directory and build the rpms: <code>
cd ../../SPECS
rpmbuild -bb ganglia.spec
</code>
  - Check your work: <code>ls ../RPMS/x86_64/</code>
  - Go to the SOURCES area and fetch the gweb code: <code>
cd $HOME/rpmbuild/SOURCES
wget -O ganglia-web-3.4.2.tar.gz 'http://downloads.sourceforge.net/project/ganglia/ganglia-web/3.4.2/ganglia-web-3.4.2.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fganglia%2Ffiles%2Fganglia-web%2F3.4.2%2F&ts=1342531342&use_mirror=voxel'
tar xzvf ganglia-web-3.4.2.tar.gz
cd ganglia-web-3.4.2
cp gweb.spec ../../SPECS/
</code>
  - Now build the rpm: <code>
cd ../../SPECS/
rpmbuild -bb gweb.spec
ls ../RPMS/noarch/
</code>

==== Installing new rpms ====

**Note - work done as root account**

**Scientific Linux 5 instructions**

  - Install the RPMforge yum repository.
  - Install the rrdtool and libconfuse packages: <code>yum install libdbi.x86_64 lua.x86_64 perl-rrdtool.x86_64 rrdtool.x86_64 libconfuse.x86_64</code>

**Scientific Linux 6 instructions**

  - Install the EPEL yum repository.
  - Install the apr and libconfuse packages: <code>yum install apr.x86_64 libconfuse.x86_64</code>

=== Steps needed on machine with web server when Tier 3 monitoring is used (it uses php53) ===

Use packages from http://iuscommunity.org/, specifically yum-plugin-replace and the php53u* packages.

  - Install the yum-plugin-replace package: <code>
rpm -Uvh http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/x86_64/ius-release-1.0-10.ius.el5.noarch.rpm
rpm --import /etc/pki/rpm-gpg/IUS-COMMUNITY-GPG-KEY
yum install yum-plugin-replace
</code>
  - Remove any existing php and php53 packages: <code>yum remove php\*</code>
  - Install the php53u replacements: <code>
yum replace php --replace-with php53u
yum install php53u php53u-cli php53u-common php53u-gd
</code>
  - Install the ganglia-web rpm and restart gmetad: <code>
rpm -iv ~dbenjamin/rpmbuild/RPMS/noarch/ganglia-web-3.4.2-1.noarch.rpm
/sbin/service gmetad restart
</code>

=== On machine with web server (installing gmond, gmetad and gweb): ===

<code>
yum install php53.x86_64 php53-gd.x86_64 httpd.x86_64
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/libganglia-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmetad-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/noarch/ganglia-web-3.4.2-1.noarch.rpm
</code>

=== On machine w/o web server (gmond only): ===

<code>
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/libganglia-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-3.4.0-1.x86_64.rpm
rpm -iv ~dbenjamin/rpmbuild/RPMS/x86_64/ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm
</code>

====== Configure gmond client ======

The gmond client needs to be configured to report to the gmetad collector.
In this cluster, machines are either interactive nodes, worker nodes or service machines, so there are three ganglia "clusters", called InteractiveNodes, WorkerNodes and ServiceMachines. In addition, we run redundant gmetad collectors, one on each of the head nodes.

Due to the nature of the network equipment at ANL, the multicast configuration of ganglia will not work. The ganglia unicast configuration must be used instead.

Information required prior to configuration:

  * Determine which cluster (InteractiveNodes, WorkerNodes or ServiceMachines) a given node will be a member of.
  * Determine the port that will be used for each cluster. Each cluster must use a different port number. For example:

^ Cluster Name ^ Port number ^
| ServiceMachines | 8661 |
| WorkerNodes | 8662 |
| InteractiveNodes | 8663 |

  * Within a given cluster, determine the two or three nodes that will receive the unicast information.
  * For each cluster, determine the data sources (these are the same nodes that receive the unicast information).

Open the proper iptables port for the given cluster type: add the matching lines to /etc/sysconfig/iptables and restart iptables.

//ServiceMachines cluster//
<code>
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8661 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8661 -j ACCEPT
</code>

//WorkerNodes cluster//
<code>
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8662 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8662 -j ACCEPT
</code>

//InteractiveNodes cluster//
<code>
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8663 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8663 -j ACCEPT
</code>

===== gmond.conf Service machines =====

These are the relevant sections of the gmond.conf file for the head nodes, gridftp server and file servers.
In this example the nodes receiving the gmond information are atlas66.hep.anl.gov and atlas67.hep.anl.gov; the configuration also defines a send channel to atlashn.hep.anl.gov.

<code>
/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs */
}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "ServiceMachines"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
   used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas66.hep.anl.gov
  port = 8661
  ttl = 1
}

udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas67.hep.anl.gov
  port = 8661
  ttl = 1
}

udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlashn.hep.anl.gov
  port = 8661
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8661
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8661
}
</code>

===== gmond.conf Worker Nodes =====

These are the relevant sections of the gmond.conf file for the worker nodes. In this example atlas68 and atlas69 are used as the data sources and receive the unicast information.
<code>
/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs */
}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "WorkerNodes"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
   used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes
  host = atlas68.hep.anl.gov
  port = 8662
  ttl = 1
}

udp_send_channel {
  bind_hostname = yes
  host = atlas69.hep.anl.gov
  port = 8662
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8662
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8662
}
</code>

===== gmond.conf Interactive Nodes =====

These are the relevant sections of the gmond.conf file for the interactive nodes. In this example atlas28 and atlas29 will receive the unicast gmond updates.

<code>
/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs */
}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "InteractiveNodes"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
   used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas28.hep.anl.gov
  port = 8663
  ttl = 1
}

udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
  host = atlas29.hep.anl.gov
  port = 8663
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8663
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8663
}
</code>

===== Starting and Stopping gmond services =====

Steps after installation and configuration of gmond and iptables:
<code>
/sbin/service iptables restart
/sbin/service gmond restart
/sbin/chkconfig gmond on
</code>

To start, stop or restart gmond:
<code>
/sbin/service gmond start
/sbin/service gmond stop
/sbin/service gmond restart
</code>

====== Configure gmetad collector ======

We run gmetad on both machines that could act as the head node. Each machine also needs the Apache web server (httpd) running.
Add these lines to the /etc/ganglia/gmetad.conf file:
<code>
data_source "ServiceMachines" atlashn.hep.anl.gov:8661 atlas67.hep.anl.gov:8661
data_source "WorkerNodes" atlas68.hep.anl.gov:8662 atlas69.hep.anl.gov:8662
data_source "InteractiveNodes" atlas28.hep.anl.gov:8663
</code>

===== Configure iptables =====

Open these ports on the gmetad/gweb servers:
<code>
# ganglia and web ports
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8661 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8662 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 8663 -j ACCEPT
</code>

===== Configure httpd server =====

Make certain that the httpd package is installed on both machines.

===== Starting and Stopping gmond, gmetad and httpd services =====

Steps after installation and configuration of gmond, gmetad, httpd and iptables:
<code>
/sbin/service iptables restart
/sbin/service gmond restart
/sbin/chkconfig gmond on
/sbin/service gmetad restart
/sbin/chkconfig gmetad on
/sbin/service httpd restart
/sbin/chkconfig httpd on
</code>

To start, stop or restart gmond:
<code>
/sbin/service gmond start
/sbin/service gmond stop
/sbin/service gmond restart
</code>

To start, stop or restart gmetad:
<code>
/sbin/service gmetad start
/sbin/service gmetad stop
/sbin/service gmetad restart
</code>

To start, stop or restart httpd:
<code>
/sbin/service httpd start
/sbin/service httpd stop
/sbin/service httpd restart
</code>

====== Troubleshooting and other tips ======

This section describes a few tips and tricks for troubleshooting and for viewing the ganglia web servers from offsite.

===== Troubleshooting =====

  * Use the nc command to test the gmond servers. From a machine on the yellow or green ANL networks, run <code>nc <hostname> <port></code> where port is 8661, 8662 or 8663, based on the configuration above.
If gmond is running and open to TCP, you should get XML back.

Other troubleshooting tips can be found here: http://sourceforge.net/apps/trac/ganglia/wiki/FAQ

===== View the ganglia plots from outside ANL =====

To view the ganglia plots from outside of ANL, ssh tunneling can be used.

  * Log into an interactive node with the -D flag, using an unprivileged port, for example: <code>ssh -D 8888 <username>@<interactive node></code>
  * Set your web browser to use a SOCKS 5 proxy server: <code>
SOCKS Proxy Server = localhost
port = 8888
</code>
  * Point your web browser to the IP address of one of the servers running gmetad and the Apache web server: 146.139.33.66/gweb or 146.139.33.67/gweb
  * Stay logged in through the ssh connection, as the web traffic is routed through it.

====== Other links for configuration of ganglia ======

http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_quick_start
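
====== Example: per-cluster helper functions (sketch) ======

The per-cluster port table, the iptables lines and the nc check described on this page can be tied together in a few shell functions. This is only a sketch under stated assumptions: the function names (cluster_port, iptables_rules, check_gmond) are hypothetical and not part of ganglia, the ports are the example values from the table in the gmond section, and the check assumes that nc is installed and that the XML returned by gmond contains the string GANGLIA_XML.

```shell
#!/bin/sh
# Hypothetical helpers for the three-cluster unicast layout on this page.
# Nothing here ships with ganglia; adjust names and ports to your site.

# Map a cluster name to its gmond port (values from the example table).
cluster_port() {
    case "$1" in
        ServiceMachines)  echo 8661 ;;
        WorkerNodes)      echo 8662 ;;
        InteractiveNodes) echo 8663 ;;
        *) echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Print the tcp and udp lines to add to /etc/sysconfig/iptables for a port.
iptables_rules() {
    for proto in tcp udp; do
        echo "-A RH-Firewall-1-INPUT -m state --state NEW -m $proto -p $proto --dport $1 -j ACCEPT"
    done
}

# Connect to gmond's tcp_accept_channel and report whether XML came back.
# Assumption: the reply contains the string GANGLIA_XML.
check_gmond() {
    host=$1
    port=$(cluster_port "$2") || return 1
    if nc "$host" "$port" | grep -q 'GANGLIA_XML'; then
        echo "gmond on $host:$port answered with XML"
    else
        echo "no XML from $host:$port" >&2
        return 1
    fi
}
```

After sourcing the file, ''iptables_rules "$(cluster_port WorkerNodes)"'' prints the two WorkerNodes lines shown in the gmond section, and ''check_gmond atlas68.hep.anl.gov WorkerNodes'' runs the nc test against port 8662.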