David's technobabble

Tracking VMware ESX 3.x or ESXi Host resources with Cacti

I’ve been using Cacti, RRDTool, SNMP and custom scripts running on Linux for several years to collect and display historical data on network, system, and FlexLM license resource usage. I recently began tracking the host resources used by ESX. I was interested in tracking the following:

  • CPU utilization
  • Memory Free
  • Total Memory
  • IO activity
  • Network traffic

I found that the IO activity and Network traffic could be obtained via standard SNMP. The network traffic can be obtained by summing the results from an snmpwalk of VMWARE-RESOURCES-MIB::netHCKbRx and VMWARE-RESOURCES-MIB::netHCKbTx.  Similarly, the IO can be obtained by summing the results from an snmpwalk of VMWARE-RESOURCES-MIB::kbRead and VMWARE-RESOURCES-MIB::kbWritten.
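The summing step can be sketched as a small shell function. This is a minimal sketch, not the script used in this post: `sum_values` is a hypothetical helper name, and the usage lines assume net-snmp's `snmpwalk` with its `-Oqv` output option, which prints only the values.

```shell
#!/bin/bash
# Sum one integer per line from stdin; blank or non-numeric lines are ignored.
# sum_values is a hypothetical helper name, not part of the original scripts.
sum_values() {
    awk '/^[0-9]+$/ { total += $1 } END { printf "%.0f\n", total }'
}

# Hypothetical usage (needs net-snmp and a reachable ESX host):
#   rx=$(snmpwalk -Oqv -c "$COMMUNITY" -v 2c "$MHOST" VMWARE-RESOURCES-MIB::netHCKbRx | sum_values)
#   tx=$(snmpwalk -Oqv -c "$COMMUNITY" -v 2c "$MHOST" VMWARE-RESOURCES-MIB::netHCKbTx | sum_values)
#   echo "netkb_total:$((rx + tx))"
```

Reading from stdin keeps the summing logic separate from the SNMP call, so the same function works for the kbRead and kbWritten walks.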

I did not find any objects in the VMware ESX MIB that I could use via SNMP for CPU utilization or free memory. I did find that these items are available by running esxtop or resxtop in batch mode. Running esxtop in batch mode on the ESX service console produces comma-separated (CSV) output. For example, running the command

    esxtop -b -n 2 -d 10 > esxtop-output.csv

will run esxtop in batch mode, take 2 samples with a delay of 10 seconds, and write a 3-row CSV file (one header row plus two samples) named “esxtop-output.csv”. The first line of the CSV file contains the column headers. The ESX host CPU utilization will be in a column named:

“\\[your-esx-hostname]\Physical Cpu(_Total)\% Processor Time”

The ESX Host free memory will be in a column named:

“\\[your-esx-hostname]\Memory\Free MBytes”.

The ESX Host total memory will be in a column named:

“\\[your-esx-hostname]\Memory\Machine MBytes”.
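Picking one of these values out of the CSV by column name can be sketched in a single awk call. This is an illustration only: `csv_last_value` is a hypothetical name, and it assumes that no field contains an embedded comma and that the last row is the sample you want.

```shell
#!/bin/bash
# Print the last sample of the first column whose header contains the given text.
# csv_last_value is a hypothetical name; fields are assumed to contain no commas.
csv_last_value() {
    local needle=$1 csvfile=$2
    # The search text goes through the environment so that backslashes in
    # column names are not treated as escape sequences by awk -v.
    NEEDLE="$needle" awk -F',' '
        NR == 1 {                       # header row: locate the column
            for (i = 1; i <= NF; i++)
                if (index($i, ENVIRON["NEEDLE"]) > 0) { col = i; break }
            next
        }
        col && NF >= col { val = $col } # data rows: remember the newest value
        END {
            gsub(/"/, "", val)          # strip the double quotes around the field
            if (col) print val
        }
    ' "$csvfile"
}

# Hypothetical usage:
#   csv_last_value 'Memory\Free MBytes' esxtop-output.csv
```

If no header matches, the function prints nothing, which the caller can treat as "no data".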

Today, I run two cron jobs: one on my ESX service console to run esxtop in batch mode and save the data to a .csv file, and a second on my Cacti monitoring server. On the Cacti monitoring server, I run a “driver” script to collect data items into an RRDTool database every 5 minutes. My Cacti monitoring server uses identity-based (key-based) ssh/scp to fetch the .csv file produced by the cron job on the ESX service console. The RRD database is mapped as a data source and graphed via Cacti. I’m currently looking at using the VMware remote CLI version of esxtop, resxtop. Using resxtop would eliminate the need for a cron job on the ESX service console and would allow my utilities to run against both ESX and ESXi.
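The cron job on the ESX service console could look roughly like the sketch below. It is an assumption about one reasonable way to do it, not the job actually in use here: `collect_esxtop`, the `ESXTOP` override variable and the script path in the crontab comment are all hypothetical. Writing to a temporary file and renaming it means a concurrent scp should never pick up a half-written CSV.

```shell
#!/bin/bash
# Hypothetical wrapper for the ESX service console cron job.
# Writes to a temporary file first, then renames it into place.
collect_esxtop() {
    local out=$1
    local tmp="${out}.tmp.$$"
    # ESXTOP may be set to a different command name or full path.
    if "${ESXTOP:-esxtop}" -b -n 2 -d 10 > "$tmp"; then
        mv -f "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example crontab entry (every 5 minutes; the script path is an assumption):
#   */5 * * * * $HOME/bin/collect_esxtop.sh
# where collect_esxtop.sh defines this function and then runs:
#   collect_esxtop "$HOME/mrtg-esxtop.csv"
```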


Scripts


ESX RRDTool data update script. Call with arguments DBName, hostname and community. This driver script calls the data capture script to obtain the individual items.

    #!/bin/bash
    . $HOME/setrrdtool_vars
    PERF_BASE=$HOME/perfmon-esx
    PATH=${PERF_BASE}/bin:${PATH}
    #
    # arg 1 - DBNAME
    # arg 2 - monitor host
    # arg 3 - community
    #
    if [ "${1}" = "" ]; then
       echo "must supply rrd dbname"
       call_error="yes"
    fi
    if [ "${2}" = "" ]; then
       echo "must supply host"
       call_error="yes"
    fi
    if [ "${3}" = "" ]; then
       echo "must supply community"
       call_error="yes"
    fi

    if [ "${call_error}" = "yes" ]; then
       exit 1
    fi

    DBNAME=${1}
    MHOST=${2}
    COMMUNITY=${3}

    RRD_DB=${PERF_BASE}/db/${DBNAME}
    RRD_LOG=${PERF_BASE}/log/${DBNAME}.log

    TOD=`date`

    # NOTE: the published listing never sets T. It is assumed here to hold the
    # output of the data capture script; "get_esx_data" is only a placeholder
    # name for that script.
    T=`get_esx_data ${MHOST} ${COMMUNITY}`

    # T holds space-separated name:value pairs; pick out the ones we store
    for tline in ${T}; do
        itemname=`echo ${tline}|awk -F":" '{print $1}'`
        itemval=`echo ${tline}|awk -F":" '{print $2}'`

        case "${itemname}" in
            "total_cpu")
               CPU=${itemval}
               ;;
            "available_memory")
               MEMAVAIL=${itemval}
               ;;
            "total_memory")
               TOTMEM=${itemval}
               ;;
            "iokb_total")
               IOKB=${itemval}
               ;;
            "netkb_total")
               NETKB=${itemval}
               ;;
            *)
               ;;
        esac
    done

    rrdtool update ${RRD_DB} --template \
    CPU_utilization:Memory_available:IO_KBytes:Net_KBytes \
    N:${CPU}:${MEMAVAIL}:${IOKB}:${NETKB}

    exit
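The update script expects an existing RRD whose data source names match its template. Creating one could be sketched as below. The data source types, heartbeat and archive sizes are assumptions, not the settings actually used here: DERIVE is chosen on the assumption that the SNMP KB values are cumulative counters, and GAUGE would be the choice if they were stored as plain values. `create_esx_rrd` and the `RRDTOOL` override are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helper: create an RRD with the four data sources named in the
# update script's template. Types, heartbeat and RRA sizes are assumptions.
create_esx_rrd() {
    local rrd_file=$1
    # --step 300 matches the 5 minute collection interval.
    # First RRA: 2016 rows of 5 minute averages (7 days).
    # Second RRA: 1488 rows of 1 hour averages (62 days).
    "${RRDTOOL:-rrdtool}" create "$rrd_file" --step 300 \
        DS:CPU_utilization:GAUGE:600:0:U \
        DS:Memory_available:GAUGE:600:0:U \
        DS:IO_KBytes:DERIVE:600:0:U \
        DS:Net_KBytes:DERIVE:600:0:U \
        RRA:AVERAGE:0.5:1:2016 \
        RRA:AVERAGE:0.5:12:1488
}

# Hypothetical usage:
#   create_esx_rrd "$HOME/perfmon-esx/db/esx1.rrd"
```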

ESX data capture script. Call with arguments hostname and community. It uses SSH/SCP, SNMP and some additional helper scripts to parse the CSV data returned by esxtop.

    #!/bin/bash
    . $HOME/setrrdtool_vars
    export MIBS=ALL
    # arg 1 - monitor host
    # arg 2 - community

    if [ "${1}" = "" ]; then
       echo "must supply host to monitor"
       call_error="yes"
    else
       MHOST=${1}
    fi
    if [ "${2}" = "" ]; then
       echo "must supply community"
       call_error="yes"
    else
       COMMUNITY=${2}
    fi

    if [ "${call_error}" = "yes" ]; then
       exit 1
    fi

    PERFDIR=$HOME/perfmon-esx
    PATH=${PATH}:${PERFDIR}/bin:${PERFDIR}/bin/esxstats
    WORKDIR=${PERFDIR}/tmp
    ESXTOPCSV=mrtg-esxtop.csv
    CSVfile=${WORKDIR}/${MHOST}-${ESXTOPCSV}
    export PATH

    if [ ! -d ${WORKDIR} ]; then
       mkdir -p ${WORKDIR}
    fi

    # use identity based ssh to copy latest csv file from ${MHOST}
    # could replace with resxtop and write its output to ${CSVfile}
    scp -Bq ${MHOST}:${ESXTOPCSV} ${CSVfile}

    # In each block below, grep -v ${MHOST} drops the header row (it contains
    # the host name), the first sample is skipped and the last one is kept.

    # Begin CPU
    get_cpustats ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-cpustat.tmp
    CPUTOT=`cat ${WORKDIR}/${MHOST}-cpustat.tmp | grep -v ${MHOST}`

    linenum=1
    for line in ${CPUTOT}; do
      if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
         skipping=1
         let "linenum = linenum + 1"
      else
         cpupct_total=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
         let "linenum = linenum + 1"
      fi
    done
    # End CPU

    # Host Total (Machine) Memory
    get_machine_mem ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-machinemem.tmp
    MACHINEMEM=`cat ${WORKDIR}/${MHOST}-machinemem.tmp | grep -v ${MHOST}`

    linenum=1
    for line in ${MACHINEMEM}; do
      if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
         skipping=1
         let "linenum = linenum + 1"
      else
         tMachineMem=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
         # esxtop reports MBytes; multiplying by 1024 gives KBytes
         let "MachineMem = tMachineMem * 1024"
         let "linenum = linenum + 1"
      fi
    done
    let "TotalMem = MachineMem"

    # Begin Used Memory

    # Kernel Used Memory
    linecnt=0
    get_kern_used_mem ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-kernused.tmp
    KERNMEM=`cat ${WORKDIR}/${MHOST}-kernused.tmp | grep -v ${MHOST}`

    linenum=1
    for line in ${KERNMEM}; do
      if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
         skipping=1
         let "linenum = linenum + 1"
      else
         tKernMem=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
         # esxtop reports MBytes; multiplying by 1024 gives KBytes
         let "KernMem = tKernMem * 1024"
         let "linenum = linenum + 1"
      fi
    done

    # Non-Kernel Used Memory
    get_nonkern_used_mem ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-nonkernused.tmp
    NONKERNMEM=`cat ${WORKDIR}/${MHOST}-nonkernused.tmp | grep -v ${MHOST}`

    linenum=1
    for line in ${NONKERNMEM}; do
      if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
         skipping=1
         let "linenum = linenum + 1"
      else
         tNonKernMem=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
         # esxtop reports MBytes; multiplying by 1024 gives KBytes
         let "NonKernMem = tNonKernMem * 1024"
         let "linenum = linenum + 1"
      fi
    done

    # End Used Memory

    # Begin Free Memory
    get_free_mem ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-freemem.tmp
    FREET=`cat ${WORKDIR}/${MHOST}-freemem.tmp | grep -v ${MHOST}`

    linenum=1
    for line in ${FREET}; do
      if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
         skipping=1
         let "linenum = linenum + 1"
      else
         tAvailMem=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
         # esxtop reports MBytes; multiplying by 1024 gives KBytes
         let "AvailMem = tAvailMem * 1024"
         let "linenum = linenum + 1"
      fi

    done
    # End Free Memory

    # IO

    # Get IO read/write for each VM
    ALLREAD=`snmpwalk -m ALL -c ${COMMUNITY} -v 2c ${MHOST} VMWARE-RESOURCES-MIB::kbRead |awk -F":" '{print $4}'|awk -F" " '{print $1}'`
    ALLWRITE=`snmpwalk -m ALL -c ${COMMUNITY} -v 2c ${MHOST} VMWARE-RESOURCES-MIB::kbWritten|awk -F":" '{print $4}'|awk -F" " '{print $1}'`
    #
    IORead=0
    for iobr in ${ALLREAD}; do
        if [ "${iobr}" = "" -o "${iobr}" = " " ]; then
            skipping=1
        else
           let "IORead = IORead + iobr"
        fi
    done
    IOWritten=0
    for iobw in ${ALLWRITE}; do
        if [ "${iobw}" = "" -o "${iobw}" = " " ]; then
            skipping=1
        else
           let "IOWritten = IOWritten + iobw"
        fi
    done

    let "IOTotal=IORead + IOWritten"

    # Get KB received/transmitted from each interface and calculate the total
    ALLNET_RX=`snmpwalk -m ALL -c ${COMMUNITY} -v 2c ${MHOST} VMWARE-RESOURCES-MIB::netHCKbRx |awk -F":" '{print $4}'|awk -F" " '{print $1}'`
    ALLNET_TX=`snmpwalk -m ALL -c ${COMMUNITY} -v 2c ${MHOST} VMWARE-RESOURCES-MIB::netHCKbTx |awk -F":" '{print $4}'|awk -F" " '{print $1}'`
    #
    NetRX=0
    for netr in ${ALLNET_RX}; do
        if [ "${netr}" = "" -o "${netr}" = " " ]; then
            skipping=1
        else
           let "NetRX = NetRX + netr"
        fi
    done
    #
    NetTX=0
    for nett in ${ALLNET_TX}; do
        if [ "${nett}" = "" -o "${nett}" = " " ]; then
            skipping=1
        else
           let "NetTX = NetTX + nett"
        fi
    done
    let "NetTotal=NetRX + NetTX"

    # Emit name:value pairs for the driver script to parse
    echo total_cpu:${cpupct_total} total_memory:${TotalMem} available_memory:${AvailMem} iokb_total:${IOTotal} netkb_total:${NetTotal}

    exit
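The capture script repeats the same "skip the header, keep the last sample, strip the quotes" loop for each metric. That pattern could be factored into one small function, sketched below. `last_sample` is a hypothetical name and not one of the original helper scripts. Unlike the loops in the listing, it would also return a value when the CSV holds only a single sample.

```shell
#!/bin/bash
# Read single-column helper output on stdin, skip the header row and any blank
# lines, strip the double quotes, and print the last sample.
# last_sample is a hypothetical name, not one of the original helper scripts.
last_sample() {
    awk 'NR > 1 && NF { gsub(/"/, ""); val = $0 } END { if (val != "") print val }'
}

# Hypothetical usage with the helper scripts from the listing:
#   free_mb=$(get_free_mem "$CSVfile" "$PERFDIR/bin/esxstats" | last_sample)
#   AvailMem=$((free_mb * 1024))
```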

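ESX data parse script(s). Call each one with the esxtop CSV file as the first argument and the directory containing the awk script get_index_awk (shown below) as the second. Each script prints the matching column, header row included.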

    # Each script is a separate file.

    # get_cpustats
    CSVfile=$1
    NDX=`cat ${CSVfile} | awk -v str="Physical Cpu(_Total)" -f ${2}/get_index_awk`
    cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'

    # get_free_mem
    CSVfile=$1
    NDX=`cat ${CSVfile} | awk -v str="Free MBytes" -f ${2}/get_index_awk`
    cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'

    # get_machine_mem
    CSVfile=$1
    NDX=`cat ${CSVfile} | awk -v str="Machine MBytes" -f ${2}/get_index_awk`
    cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'

    # get_kern_used_mem
    CSVfile=$1
    NDX=`cat ${CSVfile} | awk -v str="Kernel MBytes" -f ${2}/get_index_awk`
    cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'

    # get_managed_mem
    CSVfile=$1
    NDX=`cat ${CSVfile} | awk -v str="Kernel Managed MBytes" -f ${2}/get_index_awk`
    cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'

    # get_nonkern_used_mem
    CSVfile=$1
    NDX=`cat ${CSVfile} | awk -v str="NonKernel MBytes" -f ${2}/get_index_awk`
    cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'

awk script used by get_cpustats, etc

    # Print the index of the first header column whose name contains str
    # (str is passed in with awk -v). Only the header row (line 1) is examined.
    BEGIN {FS=","}
    NR == 1 {
        for (i=1; i<=NF; i++) {
            if (index($i,str) > 0) {
                print i;
                break;
            }
        }
        exit;
    }

5 Comments

  1. pen says:

    Hello where can I download your scripts?

  2. Mike says:

    Like the prior comment, it would be great if you made your script available for download. I have a 3 host cluster I am trying to monitor.

  3. david says:

    Mike and Pen, I’ve updated the post to include my scripts with a minimum of documentation.

    David

  4. Nicolai Rasmussen says:

    http://www.unnoc.org/ utilizes the VI Perl Toolkit (viperformance.pl) to graph performance stats for ESX(i) hosts.
    You don’t need snmp :)

    It would be really neat with a direct implementation to Cacti though.
