Tracking VMware ESX 3.x or ESXi Host resources with Cacti

For several years I’ve been using Cacti, RRDTool, SNMP and custom scripts running on Linux to collect and display historical data on Network, System, and FlexLM license resource usage. I recently began tracking the Host resources used by ESX. I was interested in tracking the following:

  • CPU utilization
  • Memory Free
  • Total Memory
  • IO activity
  • Network traffic

I found that the IO activity and Network traffic could be obtained via standard SNMP. The network traffic can be obtained by summing the results from an snmpwalk of VMWARE-RESOURCES-MIB::netHCKbRx and VMWARE-RESOURCES-MIB::netHCKbTx.  Similarly, the IO can be obtained by summing the results from an snmpwalk of VMWARE-RESOURCES-MIB::kbRead and VMWARE-RESOURCES-MIB::kbWritten.
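
The summing step can be sketched roughly as follows. This is a minimal illustration rather than the script used in this post: the function name sum_snmp_values and the sample rows are made up, and it assumes snmpwalk prints each row as "OID = TYPE: value".

```shell
#!/bin/sh
# Sum the values from snmpwalk-style output read on stdin.
# Assumes each row looks like "OID = TYPE: value"; the first token
# after the final ": " is taken as the number.
sum_snmp_values() {
    awk -F': ' 'NF >= 2 { split($NF, tok, " "); total += tok[1] }
                END { printf "%.0f\n", total }'
}

# Made-up sample rows standing in for a real walk of kbRead.
printf '%s\n' \
    'VMWARE-RESOURCES-MIB::kbRead.1 = INTEGER: 100' \
    'VMWARE-RESOURCES-MIB::kbRead.2 = INTEGER: 250' | sum_snmp_values

# Against a live host (hostname and community are placeholders):
#   snmpwalk -m ALL -c public -v 2c esx01 VMWARE-RESOURCES-MIB::kbRead | sum_snmp_values
```

Running the same function over kbWritten and adding the two results gives the IO total; netHCKbRx and netHCKbTx can be handled the same way.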

I could not find objects in the VMware ESX MIB that expose CPU utilization or Free Memory via SNMP. I did find that these items are available by running esxtop or resxtop in batch mode. Run in batch mode on the ESX service console, esxtop writes comma-separated (CSV) output. For example, running the command

    esxtop -b -n 2 -d 10 > esxtop-output.csv

will run esxtop in batch mode, take 2 samples 10 seconds apart, and write a 3-row CSV file (one header row plus two sample rows) named “esxtop-output.csv”. The 1st line of the .csv file contains the column headers. The ESX Host CPU utilization will be in a column named:

“\\[your-esx-hostname]\Physical Cpu(_Total)\% Processor Time”

The ESX Host free memory will be in a column named:

“\\[your-esx-hostname]\Memory\Free MBytes”.

The ESX Host total memory will be in a column named:

“\\[your-esx-hostname]\Memory\Machine MBytes”.
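
As a rough sketch of how a value can be pulled out of such a file by header name: the function esxtop_last_value and the sample file contents below are made up for illustration, and the helper scripts later in this post take a slightly different route. It assumes no field contains an embedded comma.

```shell
#!/bin/sh
# Print the most recent sample from the first column whose header
# contains the substring $2. $1 is the esxtop batch CSV file.
esxtop_last_value() {
    awk -F',' -v str="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if (index($i, str) > 0) { col = i; break }
            next
        }
        col { val = $col }
        END { gsub(/"/, "", val); print val }
    ' "$1"
}

# Made-up sample file: one header row and two samples.
csv=$(mktemp)
cat > "$csv" <<'EOF'
"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Memory\Machine MBytes","\\esx01\Memory\Free MBytes"
"01/01/2009 00:00:00","16383","9000"
"01/01/2009 00:00:10","16383","8950"
EOF

esxtop_last_value "$csv" "Free MBytes"      # prints 8950
esxtop_last_value "$csv" "Machine MBytes"   # prints 16383
rm -f "$csv"
```

Matching on a substring of the header avoids having to hard-code the host name that esxtop embeds in every column name.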

Today, I run 2 cron jobs: one on my ESX service console, which runs esxtop in batch mode and saves the data to a .csv file, and a second on my Cacti monitoring server. On the Cacti monitoring server, a “driver” script collects the data items into an RRDTool database every 5 minutes. The Cacti monitoring server uses identity-based ssh/scp to fetch the .csv file produced by the cron job on the ESX service console. The RRD database is mapped as a data source and graphed via Cacti. I’m currently looking at using resxtop, the VMware remote CLI version of esxtop. Using resxtop would eliminate the need for a cron job on the ESX service console and would allow my utilities to run against both ESX and ESXi.


Scripts


ESX RRDTool data update script. Call it with the arguments DBName, hostname and community. This driver script calls the data capture script (listed after it) to obtain the individual items.

#!/bin/bash
. $HOME/setrrdtool_vars
PERF_BASE=$HOME/perfmon-esx
PATH=${PERF_BASE}/bin:${PATH}
#
# arg 1 - DBNAME
# arg 2 - monitor host
# arg 3 - community
#
if [ "${1}" = "" ]; then
   echo "must supply rrd dbname"
   call_error="yes"
fi
if [ "${2}" = "" ]; then
   echo "must supply host"
   call_error="yes"
fi
if [ "${3}" = "" ]; then
   echo "must supply community"
   call_error="yes"
fi
 
# Stop if any required argument is missing
if [ "${call_error}" = "yes" ]; then
   exit 1
fi
 
DBNAME=${1}
MHOST=${2}
COMMUNITY=${3}
 
RRD_DB=${PERF_BASE}/db/${DBNAME}
RRD_LOG=${PERF_BASE}/log/${DBNAME}.log
 
TOD=`date`
 
# ${T} is expected to hold the output line of the data capture script:
# space-separated name:value pairs such as total_cpu:<value>
for tline in ${T}; do
    itemname=`echo ${tline}|awk -F":" '{print $1}'`
    itemval=`echo ${tline}|awk -F":" '{print $2}'`
 
    case "${itemname}" in
      "total_cpu")
         CPU=${itemval}
         ;;
      "available_memory")
         MEMAVAIL=${itemval}
         ;;
      "total_memory")
         TOTMEM=${itemval}
         ;;
      "iokb_total")
         IOKB=${itemval}
         ;;
      "netkb_total")
         NETKB=${itemval}
         ;;
      *)
         # ignore unknown items
         ;;
    esac
done
 
rrdtool update ${RRD_DB} --template \
CPU_utilization:Memory_available:IO_KBytes:Net_KBytes \
N:${CPU}:${MEMAVAIL}:${IOKB}:${NETKB}
 
exit
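
The update above expects an RRD that already defines the four data sources named in the --template. A minimal sketch of how such a database might be created follows. The data source types, heartbeat and archive sizes are assumptions, not the settings used for this post: GAUGE for CPU and memory, DERIVE for the cumulative IO and network counters, and a 300 second step to match the 5 minute polling interval. The wrapper function rrd_create is made up so that the command can be shown as a dry run.

```shell
#!/bin/sh
# $1 = program to run ("rrdtool", or "echo" for a dry run)
# $2 = path of the RRD file to create
rrd_create() {
    # 600 s heartbeat tolerates one missed 5-minute poll.
    # RRAs: 2016 x 5 min = 7 days, 1488 x 1 hour = 62 days.
    "$1" create "$2" --step 300 \
        DS:CPU_utilization:GAUGE:600:0:U \
        DS:Memory_available:GAUGE:600:0:U \
        DS:IO_KBytes:DERIVE:600:0:U \
        DS:Net_KBytes:DERIVE:600:0:U \
        RRA:AVERAGE:0.5:1:2016 \
        RRA:AVERAGE:0.5:12:1488
}

# Dry run: print the arguments instead of creating a file.
rrd_create echo /tmp/esx01.rrd
```

Replacing echo with rrdtool in the last line would create the file for real. The data source names must match the --template used by rrdtool update exactly.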

ESX data capture script. Call it with the arguments hostname and community. It uses SSH/SCP, SNMP and some additional helper scripts to parse the CSV data returned by esxtop.

#!/bin/bash
. $HOME/setrrdtool_vars
export MIBS=ALL
# arg 1 - monitor host
# arg 2 - community
 
if [ "${1}" = "" ]; then
   echo "must supply host to monitor"
   call_error="yes"
else
   MHOST=${1}
fi
if [ "${2}" = "" ]; then
   echo "must supply community"
   call_error="yes"
else
   COMMUNITY=${2}
fi
 
# Stop if any required argument is missing
if [ "${call_error}" = "yes" ]; then
   exit 1
fi
 
PERFDIR=$HOME/perfmon-esx
PATH=${PATH}:${PERFDIR}/bin:${PERFDIR}/bin/esxstats
WORKDIR=${PERFDIR}/tmp
ESXTOPCSV=mrtg-esxtop.csv
CSVfile=${WORKDIR}/${MHOST}-${ESXTOPCSV}
export PATH
 
if [ ! -d ${WORKDIR} ]; then
   mkdir ${WORKDIR}
fi
 
# use identity based ssh to copy latest csv file from ${MHOST}
# could replace with resxtop and write its output to ${CSVfile}
scp -Bq ${MHOST}:${ESXTOPCSV} ${CSVfile}
 
# Begin CPU
get_cpustats ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-cpustat.tmp
CPUTOT=`cat ${WORKDIR}/${MHOST}-cpustat.tmp | grep -v ${MHOST}`
 
linenum=1
for line in ${CPUTOT}; do
  if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
     skipping=1
     let "linenum = linenum + 1"
  else
     cpupct_total=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
     let "linenum = linenum + 1"
  fi
done
#End CPU
 
# Host Total (Machine) Memory
get_machine_mem ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-machinemem.tmp
MACHINEMEM=`cat ${WORKDIR}/${MHOST}-machinemem.tmp | grep -v ${MHOST}`
 
linenum=1
for line in ${MACHINEMEM}; do
  if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
     skipping=1
     let "linenum = linenum + 1"
  else
     tMachineMem=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
     # Convert MBytes to KBytes
     let "MachineMem = tMachineMem * 1024"
     let "linenum = linenum + 1"
  fi
done
let "TotalMem = MachineMem"
 
# Begin Used Memory
 
# Kernel Used Memory
linecnt=0
get_kern_used_mem ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-kernused.tmp
KERNMEM=`cat ${WORKDIR}/${MHOST}-kernused.tmp | grep -v ${MHOST}`
 
linenum=1
for line in ${KERNMEM}; do
  if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
     skipping=1
     let "linenum = linenum + 1"
  else
     tKernMem=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
     # Convert MBytes to KBytes
     let "KernMem = tKernMem * 1024"
     let "linenum = linenum + 1"
  fi
done
 
# Non-Kernel Used Memory
get_nonkern_used_mem ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-nonkernused.tmp
NONKERNMEM=`cat ${WORKDIR}/${MHOST}-nonkernused.tmp | grep -v ${MHOST}`
 
linenum=1
for line in ${NONKERNMEM}; do
  if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
     skipping=1
     let "linenum = linenum + 1"
  else
     tNonKernMem=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
     # Convert MBytes to KBytes
     let "NonKernMem = tNonKernMem * 1024"
     let "linenum = linenum + 1"
  fi
done
 
# End Used Memory
 
# Begin Free Memory
get_free_mem ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-freemem.tmp
FREET=`cat ${WORKDIR}/${MHOST}-freemem.tmp | grep -v ${MHOST}`
 
linenum=1
for line in ${FREET}; do
  if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
     skipping=1
     let "linenum = linenum + 1"
  else
     tAvailMem=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
     # Convert MBytes to KBytes
     let "AvailMem = tAvailMem * 1024"
     let "linenum = linenum + 1"
  fi
 
done
# End Free Memory
 
#IO 
 
#Get IO READ Write each VM
ALLREAD=`snmpwalk -m ALL -c ${COMMUNITY} -v 2c ${MHOST} VMWARE-RESOURCES-MIB::kbRead |awk -F":" '{print $4}'|awk -F" " '{print $1}'`
ALLWRITE=`snmpwalk -m ALL -c ${COMMUNITY} -v 2c ${MHOST} VMWARE-RESOURCES-MIB::kbWritten|awk -F":" '{print $4}'|awk -F" " '{print $1}'`
#
IORead=0
for iobr in ${ALLREAD}; do
    if [ "${iobr}" = "" -o "${iobr}" = " " ]; then
        skipping=1
    else
       let "IORead = IORead + iobr"
    fi
done
IOWritten=0
for iobw in ${ALLWRITE}; do
    if [ "${iobw}" = "" -o "${iobw}" = " " ]; then
        skipping=1
    else
       let "IOWritten = IOWritten + iobw"
    fi
done
 
let "IOTotal=IORead + IOWritten"
 
#Get the received and transmitted KB counters from each interface and calculate the total
ALLNET_RX=`snmpwalk -m ALL -c ${COMMUNITY} -v 2c ${MHOST} VMWARE-RESOURCES-MIB::netHCKbRx |awk -F":" '{print $4}'|awk -F" " '{print $1}'`
ALLNET_TX=`snmpwalk -m ALL -c ${COMMUNITY} -v 2c ${MHOST} VMWARE-RESOURCES-MIB::netHCKbTx |awk -F":" '{print $4}'|awk -F" " '{print $1}'`
#
NetRX=0
for netr in ${ALLNET_RX}; do
    if [ "${netr}" = "" -o "${netr}" = " " ]; then
        skipping=1
    else
       let "NetRX = NetRX + netr"
    fi
done
#
NetTX=0
for nett in ${ALLNET_TX}; do
    if [ "${nett}" = "" -o "${nett}" = " " ]; then
        skipping=1
    else
       let "NetTX = NetTX + nett"
    fi
done
let "NetTotal=NetRX + NetTX"
 
echo total_cpu:${cpupct_total} total_memory:${TotalMem} available_memory:${AvailMem} iokb_total:${IOTotal} netkb_total:${NetTotal}
 
exit
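
The single line echoed at the end of this script is what the update script iterates over in its "for tline in ${T}" loop, although the listing of the update script does not show where ${T} is assigned. A hedged sketch of that hand-off follows. The function capture_stats is a made-up stand-in for this capture script, and the numbers are invented.

```shell
#!/bin/sh
# Stand-in for the data capture script; prints one line of
# space-separated name:value pairs (values are invented).
capture_stats() {
    echo "total_cpu:12.5 total_memory:16776192 available_memory:9164800 iokb_total:123456 netkb_total:654321"
}

# Capture the line, then split each pair at its first colon.
T=$(capture_stats)
for pair in ${T}; do
    name=${pair%%:*}
    value=${pair#*:}
    case "${name}" in
        total_cpu)        CPU=${value} ;;
        available_memory) MEMAVAIL=${value} ;;
        iokb_total)       IOKB=${value} ;;
        netkb_total)      NETKB=${value} ;;
    esac
done

# The value string that would be handed to rrdtool update.
echo "N:${CPU}:${MEMAVAIL}:${IOKB}:${NETKB}"
```

With the invented values above, the last line prints N:12.5:9164800:123456:654321, which matches the field order of the --template in the update script.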

ESX data parse scripts. Call each one with the esxtop CSV file and the directory that contains get_index_awk, the awk script listed after them. Each script is a separate file.

# get_cpustats
CSVfile=$1
NDX=`cat ${CSVfile} | awk -v str="Physical Cpu(_Total)" -f ${2}/get_index_awk`
cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'
 
# get_free_mem
CSVfile=$1
NDX=`cat ${CSVfile} | awk -v str="Free MBytes" -f ${2}/get_index_awk`
cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'
 
# get_machine_mem
CSVfile=$1
NDX=`cat ${CSVfile} | awk -v str="Machine MBytes" -f ${2}/get_index_awk`
cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'
 
#get_kern_used_mem
CSVfile=$1
NDX=`cat ${CSVfile} | awk -v str="Kernel MBytes" -f ${2}/get_index_awk`
cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'
 
#get_managed_mem
CSVfile=$1
NDX=`cat ${CSVfile} | awk -v str="Kernel Managed MBytes" -f ${2}/get_index_awk`
cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'
 
#get_nonkern_used_mem
CSVfile=$1
NDX=`cat ${CSVfile} | awk -v str="NonKernel MBytes" -f ${2}/get_index_awk`
cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'

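awk script (get_index_awk) used by get_cpustats and the other parse scripts. It prints the number of the first column whose header contains str.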

BEGIN {FS=","}
      {colndx=0;
       for (i=1; i<=NF; i++) {
           if (index($i,str) >0 ) {
               colndx=i;
               print colndx;
               i=NF;
               break;
               }
           }
      }

This entry was posted in virtualization, vmware.
  • pen

    Hello where can I download your scripts?

  • Mike

    Like the prior comment, it would be great if you made your script available for download. I have a 3 host cluster I am trying to monitor.

  • david

    Mike and Pen, I’ve updated the post to include my scripts with minimal documentation.

    David

  • Nicolai Rasmussen

    http://www.unnoc.org/ utilizes the VI Perl Toolkit (viperformance.pl) to graph performance stats for ESX(i) hosts.
    You don’t need snmp :)

    It would be really neat with a direct implementation to Cacti though.

  • Pingback: Natesbox.com » Blog Archive » VMWare ESX Monitoring With Cacti