I’ve been using Cacti, RRDTool, SNMP and custom scripts running on Linux for several years to collect and display historical data regarding Network, System, and FlexLM license resource usage. I recently began tracking the Host resources used by ESX. I was interested in tracking the following:
- CPU utilization
- Memory Free
- Total Memory
- IO activity
- Network traffic
I found that the IO activity and Network traffic could be obtained via standard SNMP. The network traffic can be obtained by summing the results from an snmpwalk of VMWARE-RESOURCES-MIB::netHCKbRx and VMWARE-RESOURCES-MIB::netHCKbTx. Similarly, the IO can be obtained by summing the results from an snmpwalk of VMWARE-RESOURCES-MIB::kbRead and VMWARE-RESOURCES-MIB::kbWritten.
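The summing step can be sketched as a small bash function. This is a minimal sketch, not the script I run: the function name `sum_snmp_values` is made up for illustration, and it assumes each snmpwalk line ends in `: <integer>`, optionally followed by a unit.

```shell
#!/bin/bash
# sum_snmp_values (hypothetical helper): read snmpwalk-style lines on stdin
# and print the sum of the integer that follows the last ": " on each line.
sum_snmp_values() {
    local total=0 line value
    while IFS= read -r line; do
        value=${line##*: }      # keep the text after the last ": "
        value=${value%% *}      # drop any trailing unit text
        # skip lines whose value is not a plain non-negative integer
        [[ ${value} =~ ^[0-9]+$ ]] || continue
        total=$(( total + 10#${value} ))   # 10# forces base 10
    done
    echo "${total}"
}

# Example use (needs a reachable host and a valid community string):
#   snmpwalk -m ALL -c "${COMMUNITY}" -v 2c "${MHOST}" \
#       VMWARE-RESOURCES-MIB::netHCKbRx | sum_snmp_values
```

Calling it once for netHCKbRx and once for netHCKbTx and adding the two results gives the network total; the same applies to kbRead and kbWritten for IO.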
I did not find items in the VMware ESX MIB that I could query via SNMP for CPU utilization or free memory. I did find that these items are available by running esxtop or resxtop in batch mode. Running the esxtop utility on the ESX service console in batch mode outputs a comma-separated (CSV) file. For example, running the command
- esxtop -b -n 2 -d 10 > esxtop-output.csv
will run esxtop in batch mode, take 2 samples with a delay of 10 seconds, and write a 3-row CSV file (1 header row plus 2 sample rows) named “esxtop-output.csv”. The 1st line of the .csv file contains the column headers. The ESX Host CPU utilization will be in a column named:
“\\[your-esx-hostname]\Physical Cpu(_Total)\% Processor Time”
The ESX Host free memory will be in a column named:
“\\[your-esx-hostname]\Memory\Free MBytes”.
The ESX Host total memory will be in a column named:
“\\[your-esx-hostname]\Memory\Machine MBytes”.
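To illustrate how a value can be pulled out of that CSV file by column name, here is a minimal sketch. The function name `csv_last_value` is hypothetical, and the sketch assumes that no header or value contains an embedded comma and that the search string contains no backslash (awk's `-v` would interpret it as an escape). It takes the first column whose header contains the string and prints that column's value from the last row.

```shell
#!/bin/bash
# csv_last_value (hypothetical helper): print the value from the last data
# row of the column whose header contains the given substring.
#   arg 1 - esxtop csv file
#   arg 2 - substring of the column header, e.g. "Free MBytes"
csv_last_value() {
    local csv=$1 needle=$2
    awk -F',' -v str="${needle}" '
        NR == 1 {
            # header row: remember the first column that contains str
            for (i = 1; i <= NF; i++)
                if (index($i, str) > 0) { col = i; break }
            next
        }
        col && NF >= col { val = $col }   # keep overwriting; last row wins
        END {
            gsub(/"/, "", val)            # esxtop quotes every field
            print val
        }
    ' "${csv}"
}

# Example use:
#   csv_last_value esxtop-output.csv "Free MBytes"
```

Taking the last row means the most recent sample is used, which is one reasonable choice when esxtop is run with more than one sample.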
Today, I run 2 cronjobs: the 1st on my ESX service console, to run esxtop in batch mode and save the data to a .csv file, and the 2nd on my Cacti monitoring server. On the Cacti monitoring server, I run a “driver” script that collects the data items into an RRDTool database every 5 minutes. My Cacti monitoring server uses identity-based ssh/scp to fetch the .csv file produced by the cronjob that runs on the ESX service console. The RRD database is mapped as a data source and graphed via Cacti. I’m currently looking at using resxtop, the VMware remote CLI version of esxtop. Using resxtop would eliminate the need for a cronjob on the ESX service console and would allow my utilities to run against both ESX and ESXi.
Scripts
ESX RRDTool data update script. Call it with the arguments DBName, hostname and community. This driver script calls the data capture script to obtain the individual items.
#!/bin/bash
. $HOME/setrrdtool_vars
PERF_BASE=$HOME/perfmon-esx
PATH=${PERF_BASE}/bin:${PATH}
#
# arg 1 - DBNAME
# arg 2 - monitor host
# arg 3 - community
#
if [ "${1}" = "" ]; then
    echo "must supply rrd dbname"
    call_error="yes"
fi
if [ "${2}" = "" ]; then
    echo "must supply host"
    call_error="yes"
fi
if [ "${3}" = "" ]; then
    echo "must supply community"
    call_error="yes"
fi
if [ "${call_error}" = "yes" ]; then
    exit 1
fi
DBNAME=${1}
MHOST=${2}
COMMUNITY=${3}
RRD_DB=${PERF_BASE}/db/${DBNAME}
RRD_LOG=${PERF_BASE}/log/${DBNAME}.log
TOD=`date`
# NOTE: ${T} must hold the name:value pairs printed by the data capture
# script; the line that sets it does not appear in the script as posted.
for tline in ${T}; do
    # each item has the form name:value
    itemname=`echo ${tline}|awk -F":" '{print $1}'`
    itemval=`echo ${tline}|awk -F":" '{print $2}'`
    case "${itemname}" in
        "total_cpu")
            CPU=${itemval}
            ;;
        "available_memory")
            MEMAVAIL=${itemval}
            ;;
        "total_memory")
            TOTMEM=${itemval}
            ;;
        "iokb_total")
            IOKB=${itemval}
            ;;
        "netkb_total")
            NETKB=${itemval}
            ;;
        *)
            ;;
    esac
done
rrdtool update ${RRD_DB} --template \
    CPU_utilization:Memory_available:IO_KBytes:Net_KBytes \
    N:${CPU}:${MEMAVAIL}:${IOKB}:${NETKB}
exit
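The update script expects an RRD database that already has data sources named as in its --template. A minimal sketch of a matching create command follows. It only prints the command so it can be reviewed first. The function name `rrd_create_cmd`, the heartbeat, the GAUGE/COUNTER types and the RRA size are all assumptions, not the settings I use; in particular, COUNTER assumes the summed SNMP values are ever-increasing counters.

```shell
#!/bin/bash
# rrd_create_cmd (hypothetical helper): print, but do not run, an
# rrdtool create command whose DS names match the update --template.
#   arg 1 - path of the rrd database file
rrd_create_cmd() {
    local rrd_db=$1
    local step=300                    # one update every 5 minutes
    local heartbeat=$(( step * 2 ))   # assumption: tolerate one missed update
    local args=(
        create "${rrd_db}" --step "${step}"
        "DS:CPU_utilization:GAUGE:${heartbeat}:0:U"
        "DS:Memory_available:GAUGE:${heartbeat}:0:U"
        "DS:IO_KBytes:COUNTER:${heartbeat}:0:U"    # assumption: cumulative
        "DS:Net_KBytes:COUNTER:${heartbeat}:0:U"   # assumption: cumulative
        "RRA:AVERAGE:0.5:1:2016"      # 2016 rows = 7 days of 5-minute averages
    )
    echo "rrdtool ${args[*]}"
}

# Example use:
#   rrd_create_cmd "${HOME}/perfmon-esx/db/esx01.rrd"
```

If the summed values turn out to be rates rather than counters, GAUGE would be the type to use for the last two data sources.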
ESX Data capture Script. Call with Arguments Hostname and community. This uses SSH/SCP, SNMP and some additional helper scripts to parse the csv data returned by esxtop
#!/bin/bash
. $HOME/setrrdtool_vars
export MIBS=ALL
# arg 1 - monitor host
# arg 2 - community
if [ "${1}" = "" ]; then
    echo "must supply host to monitor"
    call_error="yes"
else
    MHOST=${1}
fi
if [ "${2}" = "" ]; then
    echo "must supply community"
    call_error="yes"
else
    COMMUNITY=${2}
fi
if [ "${call_error}" = "yes" ]; then
    exit 1
fi
PERFDIR=$HOME/perfmon-esx
PATH=${PATH}:${PERFDIR}/bin:${PERFDIR}/bin/esxstats
WORKDIR=${PERFDIR}/tmp
ESXTOPCSV=mrtg-esxtop.csv
CSVfile=${WORKDIR}/${MHOST}-${ESXTOPCSV}
export PATH
if [ ! -d ${WORKDIR} ]; then
    mkdir ${WORKDIR}
fi
# use identity based ssh to copy latest csv file from ${MHOST}
# could replace with resxtop and write its output to ${CSVfile}
scp -Bq ${MHOST}:${ESXTOPCSV} ${CSVfile}

# Begin CPU
get_cpustats ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-cpustat.tmp
CPUTOT=`cat ${WORKDIR}/${MHOST}-cpustat.tmp | grep -v ${MHOST}`
linenum=1
for line in ${CPUTOT}; do
    if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
        skipping=1
        let "linenum = linenum + 1"
    else
        # strip the surrounding double quotes from the value
        cpupct_total=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
        let "linenum = linenum + 1"
    fi
done
# End CPU

# Host Available Memory
get_machine_mem ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-machinemem.tmp
MACHINEMEM=`cat ${WORKDIR}/${MHOST}-machinemem.tmp | grep -v ${MHOST}`
linenum=1
for line in ${MACHINEMEM}; do
    if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
        skipping=1
        let "linenum = linenum + 1"
    else
        tMachineMem=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
        # Adjust answer from MBytes to KBytes (MBytes * 1024)
        let "MachineMem = tMachineMem * 1024"
        let "linenum = linenum + 1"
    fi
done
let "TotalMem = MachineMem"

# Begin Used Memory
# Kernel Used Memory
linecnt=0
get_kern_used_mem ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-kernused.tmp
KERNMEM=`cat ${WORKDIR}/${MHOST}-kernused.tmp | grep -v ${MHOST}`
linenum=1
for line in ${KERNMEM}; do
    if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
        skipping=1
        let "linenum = linenum + 1"
    else
        tKernMem=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
        # Adjust answer from MBytes to KBytes (MBytes * 1024)
        let "KernMem = tKernMem * 1024"
        let "linenum = linenum + 1"
    fi
done
# Non-Kernel Used Memory
get_nonkern_used_mem ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-nonkernused.tmp
NONKERNMEM=`cat ${WORKDIR}/${MHOST}-nonkernused.tmp | grep -v ${MHOST}`
linenum=1
for line in ${NONKERNMEM}; do
    if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
        skipping=1
        let "linenum = linenum + 1"
    else
        tNonKernMem=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
        # Adjust answer from MBytes to KBytes (MBytes * 1024)
        let "NonKernMem = tNonKernMem * 1024"
        let "linenum = linenum + 1"
    fi
done
# End Used Memory

# Begin Free Memory
get_free_mem ${CSVfile} ${PERFDIR}/bin/esxstats >${WORKDIR}/${MHOST}-freemem.tmp
FREET=`cat ${WORKDIR}/${MHOST}-freemem.tmp | grep -v ${MHOST}`
linenum=1
for line in ${FREET}; do
    if [ "${linenum}" = "1" -o "${line}" = "" -o "${line}" = " " ]; then
        skipping=1
        let "linenum = linenum + 1"
    else
        tAvailMem=`echo ${line} | awk -F'"' '{print $2}' | awk -F'"' '{print $1}'`
        # Adjust answer from MBytes to KBytes (MBytes * 1024)
        let "AvailMem = tAvailMem * 1024"
        let "linenum = linenum + 1"
    fi
done
# End Free Memory

# IO
# Get IO Read/Write for each VM
ALLREAD=`snmpwalk -m ALL -c ${COMMUNITY} -v 2c ${MHOST} VMWARE-RESOURCES-MIB::kbRead |awk -F":" '{print $4}'|awk -F" " '{print $1}'`
ALLWRITE=`snmpwalk -m ALL -c ${COMMUNITY} -v 2c ${MHOST} VMWARE-RESOURCES-MIB::kbWritten|awk -F":" '{print $4}'|awk -F" " '{print $1}'`
#
IORead=0
for iobr in ${ALLREAD}; do
    if [ "${iobr}" = "" -o "${iobr}" = " " ]; then
        skipping=1
    else
        let "IORead = IORead + iobr"
    fi
done
IOWritten=0
for iobw in ${ALLWRITE}; do
    if [ "${iobw}" = "" -o "${iobw}" = " " ]; then
        skipping=1
    else
        let "IOWritten = IOWritten + iobw"
    fi
done
let "IOTotal=IORead + IOWritten"

# Get NetworkBytesTotalPerSec from each interface and calculate the total
ALLNET_RX=`snmpwalk -m ALL -c ${COMMUNITY} -v 2c ${MHOST} VMWARE-RESOURCES-MIB::netHCKbRx |awk -F":" '{print $4}'|awk -F" " '{print $1}'`
ALLNET_TX=`snmpwalk -m ALL -c ${COMMUNITY} -v 2c ${MHOST} VMWARE-RESOURCES-MIB::netHCKbTx |awk -F":" '{print $4}'|awk -F" " '{print $1}'`
#
NetRX=0
for netr in ${ALLNET_RX}; do
    if [ "${netr}" = "" -o "${netr}" = " " ]; then
        skipping=1
    else
        let "NetRX = NetRX + netr"
    fi
done
#
NetTX=0
for nett in ${ALLNET_TX}; do
    if [ "${nett}" = "" -o "${nett}" = " " ]; then
        skipping=1
    else
        let "NetTX = NetTX + nett"
    fi
done
let "NetTotal=NetRX + NetTX"

# print the collected items as name:value pairs for the driver script
echo total_cpu:${cpupct_total} total_memory:${TotalMem} available_memory:${AvailMem} iokb_total:${IOTotal} netkb_total:${NetTotal}
exit
ESX data parse script(s). Call each one with the esxtop csv file and the directory that holds the awk script shown after them; each prints the matching column of the csv file.
Each script is a separate file.

# get_cpustats
CSVfile=$1
NDX=`cat ${CSVfile} | awk -v str="Physical Cpu(_Total)" -f ${2}/get_index_awk`
cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'

# get_free_mem
CSVfile=$1
NDX=`cat ${CSVfile} | awk -v str="Free MBytes" -f ${2}/get_index_awk`
cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'

# get_machine_mem
CSVfile=$1
NDX=`cat ${CSVfile} | awk -v str="Machine MBytes" -f ${2}/get_index_awk`
cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'

# get_kern_used_mem
CSVfile=$1
NDX=`cat ${CSVfile} | awk -v str="Kernel MBytes" -f ${2}/get_index_awk`
cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'

# get_managed_mem
CSVfile=$1
NDX=`cat ${CSVfile} | awk -v str="Kernel Managed MBytes" -f ${2}/get_index_awk`
cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'

# get_nonkern_used_mem
CSVfile=$1
NDX=`cat ${CSVfile} | awk -v str="NonKernel MBytes" -f ${2}/get_index_awk`
cat ${CSVfile} | awk -v i=$NDX -F"," '{print $i}'
awk script (get_index_awk) used by get_cpustats, etc.
BEGIN {FS=","}
{
    colndx=0;
    for (i=1; i<=NF; i++) {
        if (index($i,str) >0 ) {
            colndx=i;
            print colndx;
            i=NF;
            break;
        }
    }
}
Hello where can I download your scripts?
Like the prior comment, it would be great if you made your script available for download. I have a 3 host cluster I am trying to monitor.
Mike and Pen, I’ve updated the post to include my scripts with a minimum of documentation.
David
http://www.unnoc.org/ utilizes the VI Perl Toolkit (viperformance.pl) to graph performance stats for ESX(i) hosts.
You don’t need snmp
It would be really neat with a direct implementation to Cacti though.
[...] -ESX Scripts for monitoring VM Host: http://bable.cybermarshall.com/2008/12/14/tracking-vmware-esx-3x-or-esxi-host-resources-with-cacti/ [...]