I’ll throw it out there – raw utilization measures are useless in determining the efficiency or effectiveness of a virtualization initiative or the ongoing management of virtualized infrastructure.
As long as we have had distributed computing, there has been an interest in CPU, memory and I/O utilization. In a largely physical world, these measures were fundamentals of the capacity management and performance tuning disciplines. Understanding utilization levels revealed much about how an application might be performing, where there might be issues, or how long a given server would be sufficient to handle a particular workload. Of course, much of the server sprawl of the '90s and early 2000s was created by the need to service many workloads on an individual basis. The phrase "one app / one box" was accurate. Capacity managers still had to scrutinize the stats on larger mission-critical apps, but the average workload was left to take a small percentage of the capacity of the server it sat on.
Virtualization technologies have driven the revolution of consolidation and more efficient use of server infrastructure. The dream is greater utilization on a smaller number of servers. It follows logically, then, to assess how well you are doing toward that goal by looking at server utilization post-virtualization – maybe even to set goals around it? The term capacity management sounds the same, but the discipline is now very different. Rather than using analytics to determine how to optimize performance or solve a problem on a single, important application, the challenge is now much broader – potentially dealing with the efficient use of capacity for thousands of workloads on hundreds of hosts.
A recent survey of 100 large enterprises showed that 70 percent of companies are using raw utilization measures, in particular CPU and memory utilization, to determine efficiency in virtual infrastructure. So while the challenge has changed, organizations still rely on the same approaches and measures to determine requirements and "how well they are doing." Raw utilization might be a useful metric for comparing, on average, how well-utilized infrastructure is before and after moving from physical to virtual – but even then it's emotionally satisfying and practically useless.
The first question to ask is "what is the optimal target?" Of course, that will vary depending on the types of applications, the service-level requirements, workload personalities, HA and DR strategies – the list goes on. There is no one answer that fits across the board. The second question, and perhaps the most impactful, is "which constraints dictate which workloads can go where?" In horizontally scaled environments like web farms, this is less of an issue, but the majority of applications that support a business are bound by multiple constraints relating to compliance, physical location, maintenance windows, political lines, legal jurisdictions and many, many others. When these constraints are factored in, raw utilization goes out the window. Instead, looking at infrastructure requirements and utilization when burdened by constraints gives a clear picture of how many servers or how much capacity you actually require and, given those considerations, how much you are actually using. A "fully-burdened" measure may reveal servers that only ever see five percent of their capacity used – and that may be as good as it gets given the constraints at play. Does that mean you are doing poorly from an efficiency perspective? No! It means that to remain compliant with those constraints, you need to run servers at that level.
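The difference between raw and "fully-burdened" utilization can be made concrete with a minimal sketch. The model below is illustrative only – the host tags, workload names and numbers are invented for the example – but it shows the core idea: measure each workload's demand against the capacity of the hosts it is actually *eligible* to run on, not the whole estate.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    cpu_capacity: float                      # e.g. total GHz available
    tags: set = field(default_factory=set)   # e.g. {"pci", "dc-east"}

@dataclass
class Workload:
    name: str
    cpu_demand: float
    required_tags: set = field(default_factory=set)  # placement constraints

def eligible_hosts(workload, hosts):
    """Hosts that satisfy every placement constraint of the workload."""
    return [h for h in hosts if workload.required_tags <= h.tags]

def burdened_utilization(workloads, hosts):
    """Utilization measured against constrained (eligible) capacity,
    not raw capacity across the whole estate."""
    results = {}
    for w in workloads:
        pool = eligible_hosts(w, hosts)
        capacity = sum(h.cpu_capacity for h in pool)
        results[w.name] = w.cpu_demand / capacity if capacity else None
    return results

hosts = [
    Host("h1", 32.0, {"pci", "dc-east"}),
    Host("h2", 32.0, {"dc-east"}),
]
workloads = [
    Workload("billing", 4.0, {"pci"}),  # may only run on PCI-scoped hosts
    Workload("webcache", 4.0),          # unconstrained
]
print(burdened_utilization(workloads, hosts))
# billing uses 12.5% of its one eligible host; webcache 6.25% of both
```

Against raw capacity both workloads look identical, but the constrained "billing" workload is consuming twice as much of the capacity actually available to it – exactly the distinction a raw utilization number hides.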
What's the point of measuring the impact of all the constraints and considerations accurately? Without these measures, capacity requirements are left to guesswork. My own team's experience in the field shows that IT staffs guess high to avoid risk – in most cases, very high. So while you think you have 20 percent allocated for growth and risk protection, you might actually have upwards of 100 percent. That is great from a risk-avoidance perspective, but the larger the environment, the greater the waste and cost. Including all the policies and constraints in your efficiency measures and capacity-requirement determination is critical to reducing infrastructure costs in virtual and internal cloud environments. The second critical step is to optimize workload placements according to these same considerations in order to make the best possible use of infrastructure.
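One common way the gap between "20 percent" and "upwards of 100 percent" opens up is that safety buffers compound: when several planning layers each guess high on top of the last, the margins multiply rather than add. The buffer values below are hypothetical, chosen only to illustrate the arithmetic.

```python
def compounded_headroom(buffers):
    """Effective over-provisioning when each planning layer adds its own
    safety buffer on top of the previous one (multiplicative, not additive)."""
    factor = 1.0
    for b in buffers:
        factor *= (1.0 + b)
    return factor - 1.0

# Hypothetical: app owner, VM admin and capacity planner each pad by ~25%.
layers = [0.25, 0.25, 0.25]
print(f"{compounded_headroom(layers):.0%}")  # ≈ 95%, not the intended 25%
```

Three modest-looking 25 percent buffers stack to nearly double the actual requirement, which is how an environment ends up carrying 100 percent headroom while everyone believes they only added 20.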
The most efficient, well-managed environments we have found in the field understand the impact of policies and workload requirements and incorporate them into both capacity and workload-placement decisions. These organizations look at environments from a macro perspective while also ensuring that individual workloads and hosts are well cared for and aligned. It's a shift in perspective and approach, but one that pays off handsomely in the end.