
NetApp AutoSupport

January 18th, 2009

This is a big topic for me personally, as I have been developing a system to extract the information in the AutoSupport emails and make it visually easy to understand. This makes a technical engineer’s life a lot easier, as we can easily display the important information we commonly need. The traditional call-home AutoSupport system built into the NetApp and N-Series systems has all the good information I need.

Let’s have a quick look at what’s in the AutoSupport, and what might be missing.

Software Versions – quite clear at the top, the ONTAP version

Firmware Versions – a bit unwieldy to search through, but yes, the system, RLM, disk and shelf firmware is all there. Very important given some of the past exploits of some firmware!

Space Usage – now we get into the nitty gritty. And yes, there it is, we have full “df -s”, “df -r” and “df” output, so we can see savings, reservations and space usage. These are coupled with their -A equivalents to show aggregate usage as well.

Snapshot Usage – again, full details here with “snap list -n” and other details to show our full snapshot listings.

Options – both system options and volume options are included. These aren’t the easiest things to search through, but they are a great way of comparing the setup on two systems, a cluster for instance.

etc. etc. – It’s all in here, all the details I need to report on my filers

What’s missing? Not much, I don’t think. One thing that the filer simply doesn’t report on properly (obviously) is host space usage. How much space is used in a LUN, for instance? But we can pull this out to a certain extent. Seeing as Fractional Reservation is based on LUN usage, we can work backwards from the reserved space (using the fractional_reservation setting as the multiplier). I should add this won’t be 100% accurate. For that you’d need to have all the functionality of SnapDrive enabled, punching free blocks back through to the filer when files are deleted.
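A minimal sketch of that arithmetic in Python, assuming the reserved figure reported for the volume is simply the LUN's used space multiplied by the fractional reserve percentage. The function name and the sample numbers are made up for illustration, and the result is only an estimate for the reasons given above.

```python
def estimate_lun_used_kb(reserved_kb, fractional_reserve_pct):
    """Estimate LUN usage by dividing the reserved space by the reserve percentage.

    ASSUMPTION: reserved_kb == lun_used_kb * fractional_reserve_pct / 100.
    """
    if fractional_reserve_pct <= 0:
        # With a reserve of 0% the reserved figure carries no usage information.
        raise ValueError("fractional reserve must be greater than 0")
    return reserved_kb * 100.0 / fractional_reserve_pct


# Hypothetical example: 2.5 GB reserved with fractional_reserve set to 50.
print(estimate_lun_used_kb(2621440, 50))
```

With a reserve of 100 the reserved figure and the estimated usage are the same number; at lower settings the estimate scales up accordingly.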

Where this gets complex is with thin provisioning and deduplication. We can combine “df -s” and “df -r” to calculate reservations and space savings, but if the data is thin provisioned, or worse, flex-cloned, then we need to do some extra calculations to work out the real usage. We’ll need to cross reference the volume/LUN size in the full status output with the “df” output. Then we can work out what size the volume was set to, and how much of it you are currently using.
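The cross-reference can be sketched roughly as follows. This is an illustration only: the sample text imitates the usual column layout of “df” output, the configured sizes are passed in as a plain dictionary (in practice they would have to be pulled from the status output), and the function names are hypothetical.

```python
def parse_df(text):
    """Parse df-style text into {filesystem: {"kbytes", "used", "avail"}}."""
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a /vol/ path followed by three numeric columns.
        if len(fields) < 4 or not fields[0].startswith("/vol/"):
            continue
        try:
            kbytes, used, avail = (int(f) for f in fields[1:4])
        except ValueError:
            continue  # not a data row after all
        usage[fields[0]] = {"kbytes": kbytes, "used": used, "avail": avail}
    return usage


def percent_of_configured(df_usage, configured_kb):
    """Return used space as a percentage of each volume's configured size."""
    report = {}
    for filesystem, size_kb in configured_kb.items():
        row = df_usage.get(filesystem)
        if row is None or size_kb <= 0:
            continue
        report[filesystem] = round(100.0 * row["used"] / size_kb, 1)
    return report


# Illustrative sample, not taken from a real AutoSupport.
SAMPLE_DF = """\
Filesystem              kbytes       used      avail capacity  Mounted on
/vol/vol1/            83886080   20971520   62914560      25%  /vol/vol1/
/vol/vol1/.snapshot   20971520    1048576   19922944       5%  /vol/vol1/.snapshot
"""

# ASSUMPTION: configured sizes (in KB) already extracted from the status output.
configured = {"/vol/vol1/": 104857600}
print(percent_of_configured(parse_df(SAMPLE_DF), configured))
```

Here the volume was set to 100 GB, of which 20 GB is currently used, so the report shows 20.0 for /vol/vol1/.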

What about performance? Well, the autosupport has a hidden ace or two. Firstly, a normal autosupport has a file attachment called “cm_stats.gz”. This is a technically incomplete XML file. It is not in standard XML format, and needs a little adaptation to get into any standard XML reader. I guess that makes it a bespoke performance tool? But the good news is that it is broken up in a relatively standard XML type format. The first part of the file <perf-info>….</perf-info> includes detailed descriptions of all the performance stats, what they mean and what the counters are. The second section of the file, the bulk of the info, is <perf-data>…</perf-data>. As you would expect, this is where all our stats are actually kept.
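One possible adaptation, sketched in Python, is to wrap the decompressed content in a synthetic root element before handing it to a standard parser. This assumes the only thing stopping the file from parsing is that the two sections sit side by side without a single enclosing element; the real file may need other fixes. The wrapper name cm-stats, the function name and the miniature payload (including its inner element and attribute names) are all hypothetical.

```python
import gzip
import re
import xml.etree.ElementTree as ET


def load_cm_stats(raw_bytes):
    """Decompress a cm_stats-style attachment and parse it under a synthetic root."""
    text = gzip.decompress(raw_bytes).decode("utf-8", errors="replace")
    # An XML declaration is only legal at the very start of a document,
    # so drop it before wrapping the content in our own root element.
    text = re.sub(r"^\s*<\?xml[^>]*\?>", "", text)
    # ASSUMPTION: the only defect is the missing single root element.
    return ET.fromstring("<cm-stats>" + text + "</cm-stats>")


# Hypothetical miniature payload; the real attachment is far larger.
SAMPLE_GZ = gzip.compress(
    b"<perf-info><counter name='read_ops' desc='Reads per second'/></perf-info>"
    b"<perf-data><object name='volume'>"
    b"<counter name='read_ops'>42</counter></object></perf-data>"
)

root = load_cm_stats(SAMPLE_GZ)
print([child.tag for child in root])
```

Once the content parses, the perf-info section can be used as a lookup table for the counters found in perf-data.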

If you’ve ever looked at the stats command on the filer, then the objects, instances and counters will all look familiar. The only issue is that this is just one single iteration. However, this is still quite useful, as it is generated at the point the autosupport was triggered. So if there was an issue, we get to see what the stats were at that time.

If you are lucky enough not to have this disabled (which I fear is now the default), then you will also be triggering a weekly “Performance Data” email. This is very useful, as we get a stats package in the same format as “cm_stats.gz”, but usefully labelled “cm_hourly_stats.gz”. As you would guess, we now have an hourly iteration of a week of stats! So we have all the information we may need to trend across a period of time and see what the system is doing (or not, as the case may be).

So the autosupport message does have a whole host of information. The trouble is that it’s a bit overwhelming. The cm_hourly_stats file can decompress to 50 MB or more, and the basic AutoSupport information isn’t the easiest to navigate (especially when Outlook insists that CTRL+F is Forward email, not find text!).

For those of us that aren’t afraid of a little coding and regular expressions, the information is fairly easy to extract and manipulate. Just be aware that a lot of the info in the email is compressed. Once you index it in a database it won’t be compressed any more, so you’ll need a lot of space to report on all this info.
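As a rough idea of what that extraction could look like, here is a small Python sketch that splits an AutoSupport body into named sections with a regular expression. It assumes each section starts with a header line of the form “===== NAME =====”; check this against your own messages, as the layout may vary between versions. The function name and sample body are made up for illustration.

```python
import re

# ASSUMPTION: section headers look like "===== DF =====" on a line of their own.
SECTION_HEADER = re.compile(
    r"^=+[ \t]*(?P<name>[A-Z0-9][A-Z0-9 _.-]*?)[ \t]*=+[ \t]*$",
    re.MULTILINE,
)


def split_sections(body):
    """Return {section name: section text} for every header found in body."""
    sections = {}
    matches = list(SECTION_HEADER.finditer(body))
    for i, match in enumerate(matches):
        # A section runs from the end of its header to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        sections[match.group("name")] = body[match.end():end].strip()
    return sections


# Illustrative sample, not taken from a real AutoSupport.
SAMPLE_BODY = """\
===== DF =====
Filesystem              kbytes       used      avail capacity  Mounted on
/vol/vol0/            83886080   20971520   62914560      25%  /vol/vol0/
===== SNAP LIST -N =====
Volume vol0
working...
"""

sections = split_sections(SAMPLE_BODY)
print(list(sections))
```

Each section's text can then be handed to a more specific parser, or stored as-is in a database keyed by filer, date and section name.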

For those of us that are afraid of a little coding, or simply haven’t the time (you need time not just to decode the email, but also to then display this again in a readable format), then watch this space. Unfortunately as a full-time employee, I am bound by a contract, so I have some work to do before I can present my findings and results back out again.


  1. Dinesh murani
    | #1

    Hi Chris,

    I need your help with a sample BASH script showing how to create a call-home feature for a Linux box.

    Any help or guidance will be appreciated.

    I am a newbie in BASH scripting/programming. I am a TSE and call home feature will help me a lot.

    Thanks in advance.
    Best Regards,
    Dinesh Murani

