
VMware CPU Ready Time

March 11th, 2014

I have been surprised that this has recently come back to haunt me, and as a major issue at that.

So what’s the issue? Well, long story short, if you starve your virtual estate of CPU resources you’ll get CPU ready-state issues. Broadly there are two causes: either you’ve over-committed your CPU resources (your consolidation ratio is too high), or your virtual machines are oversized (and their workload is too high).

VMware vSphere is very clever with its CPU virtualisation. To allow multiple virtual machines to share the same CPU space, it schedules them in and out. Needless to say this happens very quickly, and generally the only thing you’ll notice is that each VM consumes very little CPU and you can run a very high consolidation ratio.

The problem really occurs with large VMs (4+ vCPUs). vSphere has to be a lot more intelligent about these, as all of a VM’s vCPUs need to be scheduled at the same time, or skewed only slightly (part of the relaxed co-scheduling in 5.0+). The window of opportunity to schedule them narrows the more vCPUs you assign: a 4-vCPU machine needs to wait for 4 logical cores to be available (hyper-threaded cores count as individual logical cores), and an 8-vCPU machine needs to wait for 8. The busier a vSphere host is, the longer the queue for CPU resources and the harder it is to schedule all the vCPUs. While a machine is waiting for CPU resources to become available, it is in a ready state (meaning it has CPU work to process, but can’t run as no resources are available). Relaxed co-scheduling means it doesn’t always have to wait for all vCPUs to be scheduled simultaneously on logical cores, but the strict model is a useful rule of thumb when sizing.
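A quick aside on the numbers: vCenter reports CPU ready as a summation in milliseconds per sample, and the real-time charts use 20-second samples, so you have to convert it to a percentage yourself. A minimal sketch of that conversion (the function name and the per-vCPU averaging are my own convention):

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a vCenter CPU ready summation (milliseconds accumulated
    over one sample interval) into a per-vCPU ready percentage."""
    total_ms = interval_s * 1000.0
    return (ready_ms / total_ms) * 100.0 / vcpus

# A 4-vCPU VM reporting 8,000 ms of ready time in a 20 s real-time sample:
# 8000 / 20000 * 100 = 40% aggregate, i.e. 10% per vCPU -- well into trouble.
print(cpu_ready_percent(8000, 20, 4))  # 10.0
```

Anything sustained above roughly 5% per vCPU is worth investigating; the 30% figure mentioned below is firmly in the "bad" zone.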

So if you see this, what can you do about it? Well, there are a few options, each with different viability.

Set CPU affinity

Not necessarily ideal, but I recently saw this issue occur on a Citrix farm where the vCPUs added up to fewer than the logical cores, yet the vCPUs were still on average 30% in ready state (bad!). It turns out that VMware was still trying to schedule the vCPUs intelligently and to maintain NUMA affinity (the locality of memory pages to physical CPUs, as not all memory has a direct path to all CPUs), so in this estate VMware was constantly re-scheduling these vCPUs, causing the issue. Once we’d set CPU affinity on the Citrix VMs, the ready state dropped to less than 5% at peak. As the Citrix servers didn’t need any of the advanced features of VMware clustering and the scaling approach was always the same (need more Citrix capacity, add more physical hosts), this was safe to do and a relatively simple and easy fix. In other environments it really isn’t ideal at all.

Reduce the vCPU count

Definitely one of my top recommendations, although getting buy-in from application owners can be tough. If you see CPU scheduling issues, I would almost guarantee that reducing your virtual machine sizes (in vCPU terms) would improve performance. Look to scale out instead: more, smaller VMs. Single-vCPU servers don’t need to be co-scheduled (they’ll run on any free logical core), so they rarely (if ever) suffer from CPU ready-state issues. I would choose this option every time if I could, and if everything were single-vCPU you could ramp up your consolidation ratios.
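To see why smaller VMs schedule so much more easily, here is a deliberately simplified toy model (strict co-scheduling only; real ESXi relaxes this, and the free-core trace is made up): count how many scheduler ticks offer enough free logical cores to run the whole VM at once.

```python
def schedulable_ticks(free_cores_per_tick, vcpus):
    """Count ticks in which a VM with `vcpus` vCPUs could be fully
    co-scheduled (strict model: every vCPU needs a free logical core
    in the same tick)."""
    return sum(1 for free in free_cores_per_tick if free >= vcpus)

# Hypothetical trace of free logical cores on a busy host, per scheduler tick.
free = [1, 3, 0, 5, 2, 4, 1, 6, 2, 0]
print(schedulable_ticks(free, 1))  # 8 ticks -- a 1-vCPU VM runs almost any time
print(schedulable_ticks(free, 4))  # 3 ticks -- a 4-vCPU VM waits far more often
```

The gap between those two numbers is exactly the extra ready time the bigger VM accrues.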

Grow your estate

More logical cores means less scheduling overhead and less CPU contention. An expensive choice maybe, but definitely a viable one. Look to remove or upgrade older hosts and put in servers with more cores. There is a caveat: some workloads prefer the higher clock speeds of lower-core-count CPUs, but this is a rarity. Most applications will be fine, and CPU is generally a resource you have spare (ready state excepted).

Upgrade vSphere

VMware are constantly improving the ESXi CPU scheduler, and each release brings improvements that will benefit these issues. People with CPU ready-state issues on 4.0 saw them completely disappear with an upgrade to 5.x. You’ll not get an exact figure on the improvement, as it really depends on the make-up of your hosts and VMs, but ESXi upgrades are easy these days and it’s a low-risk fix.

Separate clusters

Have a high-consolidation cluster for your low-vCPU machines (2 vCPUs and fewer, say) on which you ramp up the consolidation ratio, then a separate cluster for high-performance systems where the consolidation ratio is low, maybe as low as 1:1 (vCPU:pCPU). But do this within reason: DRS performs best with a good-sized cluster, so if you only have 3 hosts in total, don’t do this! If you have fewer than 5 or 6 hosts, look at achieving the split with DRS affinity rules instead, although that comes with some management overhead.

The bottom line is that this issue is caused by over-committed resources or an under-sized estate. Make sure your reporting is tuned so that you can pick up on CPU ready state (vCenter Operations Manager can); if your tooling doesn’t, look at alternatives or script (r)esxtop to gather the stats for you. If you’re a hosting provider, make sure you aren’t adversely affecting your customers with this issue: they won’t be able to see the high CPU ready state, but their VMs will perform very badly.
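If you do go the scripted (r)esxtop route, batch mode dumps everything to CSV (e.g. `resxtop --server esx01 -b -d 5 -n 60 > stats.csv`), and you can scan that for offenders. A rough sketch, assuming the usual `\\host\Group Cpu(<id>:<vm>)\% Ready` column layout — verify the header names against your own export, as the format can vary by version:

```python
import csv
import io

def high_ready_vms(csv_text, threshold=10.0):
    """Scan an esxtop batch-mode CSV and return {vm: worst '% Ready'}
    for VMs whose '% Ready' exceeds `threshold` in any sample."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    offenders = {}
    for i, col in enumerate(header):
        # Per-VM ready time lives in the 'Group Cpu(...)' counters.
        if "Group Cpu(" in col and col.endswith("% Ready"):
            vm = col.split("Group Cpu(")[1].split(")")[0].split(":", 1)[-1]
            worst = max(float(r[i]) for r in samples)
            if worst > threshold:
                offenders[vm] = worst
    return offenders

# Tiny synthetic extract in the assumed layout:
sample = (
    '"Time","\\\\esx01\\Group Cpu(101:citrix01)\\% Ready",'
    '"\\\\esx01\\Group Cpu(102:web01)\\% Ready"\n'
    '"10:00:05","31.20","2.10"\n'
    '"10:00:10","28.75","1.95"\n'
)
print(high_ready_vms(sample))  # {'citrix01': 31.2}
```

Run it on a night’s worth of samples and you have a ready-made shortlist of VMs to right-size.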



