Performance “stats” without PerfStat or Ops Mgr
PerfStat is a great way to get some quite detailed performance information out of the filer when you have a performance or other issue that you can’t quite put your finger on. You need to have access to the PerfStat Viewer, or get someone to process this output for you, and then you need to trawl through it.
Operations Manager, and more specifically Performance Advisor, is brilliant and 99% of the time gives you the counters you need to diagnose the problem. Once you’ve found your way round it, it is completely indispensable!
But what if you don’t have Operations Manager, or you just want to quickly pull out information on one area of the system?
The first thing you want to look at is sysstat: everyone’s best friend and a great way of seeing “Is my system busy?”. Whenever you run sysstat, make sure to throw it the “-s” modifier so that you get a summary at the end of the output. If you don’t define a number of iterations (-c <num>), press ctrl+c to break the output. “-x” is great for giving all areas of output, but it can be a little wide sometimes. “-u” is my favourite as it gives you utilisation readings, and these are usually the most useful when troubleshooting.
Most of the columns are fairly self-explanatory. CPU is % busy; NFS, CIFS, HTTP, FCP and iSCSI are all protocol operations counters. Net kB/s in and out are obvious (for reference, a single gigabit interface will happily sustain around 80MB/s, but can stretch to 110/120MB/s). Disk and Tape are likewise kB/s in and out. Keep an eye on cache age when it gets really low, although there are better counters for that. Cache hit is a counter you want as close to 100% as possible. The more data that is read from cache the better! CP Type refers to Consistency Points. I won’t go into detail as to what these are, as there is a very good KB article on this already (https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb23471). And finally Disk Utilisation, which seems to cause some confusion. This is the reading from the single busiest disk in the system, and not an average. This reading can interestingly go above 100% (much like CPU can too), and this simply means the disks are doing more than they should!
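To put those network figures in context, here is a quick back-of-the-envelope sketch in Python. It assumes decimal megabytes and ignores all protocol overhead, so the ceiling it prints is the raw line rate rather than anything you would see in sysstat; the function name is just for illustration.

```python
# Raw line rate of a gigabit interface, ignoring Ethernet/IP/TCP overhead.
LINK_BITS_PER_SEC = 1_000_000_000


def link_utilisation(observed_mb_per_sec: float,
                     link_bits_per_sec: int = LINK_BITS_PER_SEC) -> float:
    """Return observed throughput as a fraction of the raw line rate."""
    ceiling_mb_per_sec = link_bits_per_sec / 8 / 1_000_000  # 125.0 for gigabit
    return observed_mb_per_sec / ceiling_mb_per_sec


for rate in (80, 110, 120):
    print(f"{rate} MB/s is {link_utilisation(rate):.0%} of line rate")
```

So 80MB/s is roughly two thirds of what the wire can carry, and 110/120MB/s is very close to the limit.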
So sysstat is a great way to get a high level view of “Is my system busy” and also gives you a rough idea of where the bottleneck is. If the CPU is really high, but nothing else, then this is what is holding back the system. If the disk utilisation is very high, then again, here is the problem. But these aren’t conclusive figures, and don’t point directly at a culprit. For instance if disk utilisation is very high, you may need to run a wafl reallocate as you have added some new disks and these aren’t holding any data yet. If your CPU is very high, it may be that you are doing a lot of other processing like A-SIS and SnapVault, or it could be very random IO so the CPU is working harder at trying to make calculations around this.
The next step may be to look at statit. It is a “priv set advanced” command and not for the faint-hearted, but it is a great way to get a snapshot of details over a period. Simply run “statit -b” at the start of the monitoring period, and then “statit -e” at the end. Make sure to log your output window, as you’ll get a lot from statit (more than the standard Windows and PuTTY buffer will show). There is a lot of statit output, and I won’t go into too much detail on it all here (but maybe another day). Most of it is pretty self-explanatory really.
This brings me onto the real reason for this article in the first place. One of my favourite commands, and certainly a largely overlooked one, is “stats”. This has a lot of information at its fingertips: pretty much anything you can see in Performance Advisor and anything you can report on in PerfStat is available in the stats command. And possibly a lot more! “stats” works very similarly to sysstat in that it reports counters based on the iterations. If you simply run it, it’ll report what the system is doing at that exact time. If you tell it to run every 5 seconds, it’ll report what happened over those 5 seconds.
So first up, don’t just jump in and run “stats show” without having a few minutes to spare. The output is very complete! First you want to see what counters are available. Stats is split into “Objects”, “Instances” and “Counters”. To show each, we can use “stats list …”
filer01> stats list objects
Objects:
dump
logical_replication_source
logical_replication_destination
vfiler
qtree
aggregate
iscsi
fcp
cifs
volume
lun
target
nfsv3
ifnet
processor
disk
system
filer01> stats list instances ifnet
Instances for object name: ifnet
B2net
Storage-101
filer01> stats list counters ifnet
Counters for object name: ifnet
recv_packets
recv_errors
send_packets
send_errors
collisions
recv_data
send_data
recv_mcasts
send_mcasts
recv_drop_packets
In the example above, I can show all the objects available to me, query all the networking instances I have set up (2 VIFs, 1 with a VLAN), and see which counters I can report on. So putting this together…
filer01> stats show ifnet:Storage-101:collisions
ifnet:Storage-101:collisions:0/s
Great, my storage interface doesn’t have any network collisions for the period this has run! That’s good news for me!
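If you want to feed that output into a script, the object:instance:counter:value layout is easy to split apart. The following is a minimal Python sketch, not anything NetApp ships: the `Sample` type and `parse_stats_line` function are names I have made up, and it assumes instance names never contain a colon.

```python
import re
from typing import NamedTuple


class Sample(NamedTuple):
    obj: str
    instance: str
    counter: str
    value: float
    unit: str


# A number (integer or decimal) followed by whatever unit text remains.
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)(.*)$")


def parse_stats_line(line: str) -> Sample:
    """Split one 'object:instance:counter:value' line into its parts."""
    obj, instance, counter, raw = line.strip().split(":", 3)
    match = _VALUE_RE.match(raw)
    if match is None:
        raise ValueError(f"no numeric value in {line!r}")
    return Sample(obj, instance, counter, float(match.group(1)), match.group(2))


print(parse_stats_line("ifnet:Storage-101:collisions:0/s"))
```

The unit is kept as a plain string (“/s”, “ms”, “b/s”) so nothing is lost if a counter uses a unit the script has not seen before.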
If I want to run this over several iterations, I can feed it some more options. Note: The options must go before the counter information!
filer01> stats show -n 5 -i 1 ifnet:Storage-101:collisions
Instance collisions
/s
Storage-101 0
Storage-101 0
Storage-101 0
Storage-101 0
Storage-101 0
Great, so over a period of 5 seconds I’m still not getting collisions!
You’ll notice from above that there are a lot of performance counters available, and not all of them have the most verbose names. You can query any of these by running “stats explain counters”.
filer01> stats explain counters ifnet collisions
Counters for object name: ifnet
Name: collisions
Description: Collisions per second on CSMA interfaces
Properties: rate
Unit: per_sec
So let’s take another example: I want to look at latency readings on my Exchange system…
filer01> stats show -n 5 -i 1 volume:exch01_db:read_latency volume:exch01_db:write_latency volume:exch01_logs:read_latency volume:exch01_logs:write_latency
Instance read_latency write_latenc
ms ms
exch01_db 0 0
exch01_logs 0 0
exch01_db 0 0
exch01_logs 0 0
exch01_db 0 0
exch01_logs 0 0
exch01_db 0 0
exch01_logs 0 0
exch01_db 0 0
exch01_logs 0 0
It’s 8 in the morning, none of the sales team is awake yet! The column headings get a bit skewed, but we can see read latency in the first column, and write latency in the second.
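The multi-iteration layout is a little more awkward to script against, because each iteration repeats every instance. Here is a rough Python sketch that regroups the rows into one list of readings per instance and counter. It assumes the layout shown above (a header row starting with “Instance”, a units row, then data rows) and that instance names contain no spaces; `parse_stats_table` is a made-up name. Note that the keys are the column headings exactly as printed, so a truncated heading such as “write_latenc” stays truncated.

```python
from collections import defaultdict


def parse_stats_table(text: str) -> dict[str, dict[str, list[float]]]:
    """Group tabular 'stats show -n ... -i ...' rows by instance and column."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    headers = lines[0].split()[1:]  # drop the leading "Instance" heading
    # lines[1] is the units row (e.g. "ms ms"); it is skipped here.
    series = defaultdict(lambda: {h: [] for h in headers})
    for row in lines[2:]:
        instance, *values = row.split()
        for header, value in zip(headers, values):
            series[instance][header].append(float(value))
    return dict(series)


# First two iterations of the output shown above.
sample = """\
Instance read_latency write_latenc
ms ms
exch01_db 0 0
exch01_logs 0 0
exch01_db 0 0
exch01_logs 0 0
"""
print(parse_stats_table(sample))
```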
One of my biggest complaints about sysstat is what happens if I want to keep this running over a period of time and log the output? Well, I can change “options autologout” and leave my laptop plugged in, but that’s never a good idea. “stats” gives you the ability to pipe all stats output direct to a file. Brilliant news!
filer01> stats show -n 5 -i 1 -o /etc/stats.txt volume:exch01_db:read_latency volume:exch01_db:write_latency volume:exch01_logs:read_latency volume:exch01_logs:write_latency
filer01> rdfile /etc/stats.txt
Instance read_latency write_latenc
ms ms
exch01_db 0 16.00
exch01_logs 0 0
exch01_db 0 0
exch01_logs 0 0
exch01_db 0 8.00
exch01_logs 0 0
exch01_db 0 0
exch01_logs 0 0
exch01_db 0 1.00
exch01_logs 0 0
Unfortunately this doesn’t free up the console, so scripting this from RSH or SSH may be the best bet, but be careful how long you run the iterations for!
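As a starting point for the SSH approach, here is a minimal Python sketch. It assumes key-based SSH access to the filer is already set up and that the login user is root; both are assumptions about your environment, and the function names are mine. The timeout is derived from the interval and iteration count, so a run that hangs does not tie up the script forever.

```python
import subprocess


def build_stats_command(host, counters, interval=1, iterations=5, user="root"):
    """Build the local ssh argument list for a remote 'stats show' run."""
    stats_cmd = ["stats", "show", "-n", str(iterations), "-i", str(interval), *counters]
    # Options go before the counters, as noted earlier in the article.
    return ["ssh", f"{user}@{host}", " ".join(stats_cmd)]


def run_stats(host, counters, interval=1, iterations=5, user="root"):
    """Run the command and return its output as text (needs working SSH access)."""
    cmd = build_stats_command(host, counters, interval, iterations, user)
    # Allow the full sampling period plus 30 seconds of slack.
    timeout = interval * iterations + 30
    result = subprocess.run(cmd, capture_output=True, text=True,
                            check=True, timeout=timeout)
    return result.stdout


# Only prints the command that would be run; nothing is contacted here.
print(build_stats_command("filer01", ["volume:exch01_db:read_latency",
                                      "volume:exch01_db:write_latency"]))
```

Calling `run_stats("filer01", [...])` from a scheduled job and appending the result to a local file gives you the logging without leaving a console session open.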
Another nice feature is that you can have some presets. So if you have 4 Exchange servers each with 3 databases, then you can load all the volume:<vol_name>:read/write_latency commands into a file and issue this direct from the stats command. The presets files are XML files, so they take a little thought in the writing, but if you have seen XML before, then it’s not that tricky.
My XML file looks like this…
<?xml version="1.0" ?>
<preset>
  <object name="volume">
    <instance name="exch01_db">
      <counter name="read_latency">
      </counter>
      <counter name="write_latency">
      </counter>
    </instance>
    <instance name="exch01_logs">
      <counter name="read_latency">
      </counter>
      <counter name="write_latency">
      </counter>
    </instance>
  </object>
</preset>
Once saved within /etc/stats/presets as an “.xml” file, I can call it directly from the stats command.
filer01> stats show -p exchange -i 1 -n 5
Instance read_latency write_latenc
ms ms
exch01_db 0 0
exch01_logs 0 0
exch01_db 0 0
exch01_logs 0 0
exch01_db 0 0.13
exch01_logs 0 0.12
exch01_db 0 0.00
exch01_logs 0 0.00
exch01_db 0 0
exch01_logs 0 0
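Writing these preset files by hand gets tedious once you have more than a couple of volumes, so here is a small Python sketch that generates one with the same element layout as my file above. It is only a sketch: `build_preset` is a made-up helper, it needs Python 3.9 or later for `ET.indent`, and it writes explicit open and close tags for each counter because I have not checked whether the filer accepts self-closing ones.

```python
import xml.etree.ElementTree as ET


def build_preset(object_name, instances, counters):
    """Return preset XML covering every counter for every instance given."""
    preset = ET.Element("preset")
    obj = ET.SubElement(preset, "object", name=object_name)
    for instance_name in instances:
        inst = ET.SubElement(obj, "instance", name=instance_name)
        for counter_name in counters:
            ET.SubElement(inst, "counter", name=counter_name)
    ET.indent(preset)  # pretty-print in place (Python 3.9+)
    body = ET.tostring(preset, encoding="unicode", short_empty_elements=False)
    return '<?xml version="1.0" ?>\n' + body + "\n"


xml_text = build_preset("volume",
                        ["exch01_db", "exch01_logs"],
                        ["read_latency", "write_latency"])
print(xml_text)
```

Save the result as an “.xml” file in /etc/stats/presets and it can be called with “-p” in the same way as the hand-written one.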
The possibilities are huge for this, but this opens up something even better. We can now use “stats start” and “stats stop” to trigger this reporting and I get my console back!
filer01> stats start -p exchange
Stats identifier name is 'Ind0x6920b2f0'
filer01> stats show -I Ind0x6920b2f0
StatisticsID: Ind0x6920b2f0
volume:exch01_db:read_latency:0ms
volume:exch01_db:write_latency:5.14ms
volume:exch01_logs:read_latency:0ms
volume:exch01_logs:write_latency:0.00ms
filer01> stats stop -I Ind0x6920b2f0
StatisticsID: Ind0x6920b2f0
volume:exch01_db:read_latency:0ms
volume:exch01_db:write_latency:5.36ms
volume:exch01_logs:read_latency:0ms
volume:exch01_logs:write_latency:0.00ms
Hopefully you are starting to realise why I like this command, and why the possibilities for using this are huge, and that it is very powerful indeed!
One final thing to add: there are a lot of counters available by default in the normal privilege mode, but try switching to advanced, or even diag, and see how many counters are available then! This is overwhelming, but with a bit of digging, very powerful.
One last thing, you can use wildcards in the “stats show” command, so to pull out all counters for my exchange database…
filer01> stats show volume:exch01_db:*
volume:exch01_db:avg_latency:0.00ms
volume:exch01_db:total_ops:3/s
volume:exch01_db:read_data:0b/s
volume:exch01_db:read_latency:0ms
volume:exch01_db:read_ops:0/s
volume:exch01_db:write_data:12288b/s
volume:exch01_db:write_latency:0.00ms
volume:exch01_db:write_ops:3/s
volume:exch01_db:other_latency:0ms
volume:exch01_db:other_ops:0/s
Or to show all the read_latency for all my volumes…
filer01> stats show volume:*:read_latency
volume:vol0:read_latency:0ms
volume:exch01_db:read_latency:0ms
volume:home:read_latency:0ms
volume:backup:read_latency:0ms
volume:share:read_latency:0ms
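Wildcard output can get long, and most of it is usually zeros. A quick way to cut it down is to keep only the counters that are actually doing something. This is a rough Python sketch under the same assumptions as before (colon-separated lines, no colons inside instance names); `non_zero_counters` is a made-up name.

```python
import re

# A number (integer or decimal) followed by whatever unit text remains.
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)(.*)$")


def non_zero_counters(output: str) -> dict[str, tuple[float, str]]:
    """Return {'instance:counter': (value, unit)} for every non-zero reading."""
    busy = {}
    for line in output.splitlines():
        line = line.strip()
        if line.count(":") < 3:
            continue  # prompts, blank lines, anything not object:instance:counter:value
        obj, instance, counter, raw = line.split(":", 3)
        match = _VALUE_RE.match(raw)
        if match and float(match.group(1)) != 0:
            busy[f"{instance}:{counter}"] = (float(match.group(1)), match.group(2))
    return busy


# A few lines taken from the exch01_db wildcard output above.
sample = """\
volume:exch01_db:avg_latency:0.00ms
volume:exch01_db:total_ops:3/s
volume:exch01_db:read_ops:0/s
volume:exch01_db:write_data:12288b/s
volume:exch01_db:write_ops:3/s
"""
for name, (value, unit) in non_zero_counters(sample).items():
    print(f"{name} = {value}{unit}")
```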
If you have any specific questions, or you want to query how to get specific counter information from the system, feel free to send me over a question. Hope this is useful for everyone!

stats is a great command to use for collecting data for long-term trending too (if the same data is not exposed in the SNMP MIB) — for example, per volume performance data is not available via the SNMP MIB — I have a blog entry about how I collect that and provide some example graphs at http://aditya.grot.org/2009/02/netapp-ontap-per-volume-statistics.html
Some great tools available for translating the output from “stats” available on the NetApp Communities – http://communities.netapp.com/docs/DOC-2092
great post Chris, very informative and certainly not something that’s covered in any of the ‘fundamentals’ docs – or even the technical reports I have read so far.