Intel pinpoints thousands of unproductive servers
Using a homegrown application for measuring server utilization, chipmaker is able to reassign or retire 5,000 machines
Imagine running a company with a staff of 2,000 full-time employees who spent around 80 percent of their time doing nothing beyond waiting for some work to do. Odds are, you'd make some staffing changes pretty darn quickly to address such egregious waste. Yet in data centers around the world, servers are permitted to run 24/7, wasting power and adding to organizations' carbon footprints while operating at average utilization levels of 20 percent, 10 percent, or even less.
There are several reasons data center operators tolerate this level of waste. One is that companies lack the necessary tools to gain full visibility into the hardware they're running, such as how much work a machine is doing or whether it's powering a business-critical application. Thus, it's generally easier (and safer) to simply add new racks of servers when computing demands increase, rather than performing a time-consuming inventory of all the machines and pulling the plug on systems that appear to be performing unnecessary work.
Intel last year developed an innovative application for determining which servers were earning their keep and which ones were slacking off. Called iSHARP (Interactive System Health and Resource Productivity), the application is capable of accurately measuring and tracking utilization on the company's large distributed pools of computers. These particular machines are part of an interactive environment, used to process design and development simulations and related tasks for microprocessors.
"This was in effect an effort to drive down the cost of capital expenditures within the batch and interactive services and the ever-growing operational expenses, including data center power, cooling, and space," said Richard Meneely, Interactive Computing Product Owner for Intel's Engineering Computing group. "We would prefer to not add the expense of building and operating any additional data centers."
In developing iSHARP, Intel first had to define algorithms to correctly identify underutilized machines. Specifically, the app measures CPU and memory utilization on a frequent basis for each system within the interactive computing environment. Those measurements are written to a back-end database for reporting and analysis. The algorithms take into account the individual system's architecture, hardware configuration, and category of application when determining thresholds for identifying underutilization.
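To make the idea concrete, here is a minimal sketch of threshold-based underutilization detection in Python. It is not Intel's code: the article does not publish iSHARP's algorithms, so the category names, threshold values, sample data, and the `find_underutilized` helper are all assumptions chosen for illustration. The sketch averages periodic CPU and memory samples per host and flags a host only when both averages fall below the limits for its application category.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    """One periodic measurement for one host (percent values, 0-100)."""
    host: str
    cpu_pct: float
    mem_pct: float


# Hypothetical (cpu_limit, mem_limit) pairs per application category.
# A real system would also vary these by architecture and hardware config.
THRESHOLDS = {
    "simulation": (20.0, 30.0),
    "interactive": (10.0, 20.0),
}


def find_underutilized(samples, categories, thresholds=THRESHOLDS):
    """Return sorted host names whose average CPU and memory use
    are both below the thresholds for that host's category."""
    by_host = {}
    for s in samples:
        by_host.setdefault(s.host, []).append(s)

    flagged = []
    for host, rows in by_host.items():
        cpu_limit, mem_limit = thresholds[categories[host]]
        avg_cpu = mean(r.cpu_pct for r in rows)
        avg_mem = mean(r.mem_pct for r in rows)
        if avg_cpu < cpu_limit and avg_mem < mem_limit:
            flagged.append(host)
    return sorted(flagged)


# Made-up measurements standing in for rows read from the back-end database.
samples = [
    Sample("hostA", 5.0, 10.0),
    Sample("hostA", 7.0, 12.0),
    Sample("hostB", 50.0, 40.0),
    Sample("hostB", 60.0, 45.0),
]
categories = {"hostA": "simulation", "hostB": "interactive"}

candidates = find_underutilized(samples, categories)
```

Requiring both metrics to be low is a deliberately conservative choice in this sketch: a machine with an idle CPU but heavy memory use may still be holding a large job in memory.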
Beyond the challenge of developing this application itself, Intel's engineers also had to convince end-users that they could relinquish their computing resources without fear. "Design teams were often initially reluctant to give up resources they already had and believed doing so would impact their productivity. iSHARP allowed us to communicate the same information our IT engineers saw directly with the customer," said Meneely.
"We often offered to keep the targeted systems available offline for a period of time in case the customer determined they really did need it. After a period of time, confidence grew with our customers that we could accurately measure and remove systems without impacting their productivity," Meneely concluded.
The effort proved remarkably successful. In the span of about 12 months, Intel reduced the size of its targeted server pools from 14,000 machines to under 9,000, a reduction of more than 35 percent. Of the roughly 5,000 machines taken out of those pools, about 2,700 were reallocated to more productive purposes, and 2,300 were removed entirely. The removal of those machines helped Intel shed over 8 million kWh and save $645,000 on energy costs. From a business perspective, the project helped Intel boost the efficiency and capacity of its IT environment -- without hurting productivity.
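For readers who want to check the numbers, the short Python sketch below reruns the arithmetic using only the rounded figures quoted in this article. The derived values (implied electricity rate and energy per removed machine) are back-of-the-envelope estimates, not figures reported by Intel; they assume the energy savings come entirely from the 2,300 removed machines and treat "under 9,000" and "over 8 million kWh" as exact.

```python
# Figures as quoted in the article (all rounded by the source).
start_pool = 14_000            # machines in the targeted pools at the start
end_pool = 9_000               # "under 9,000", so treated as an upper bound
reallocated = 2_700            # machines moved to more productive purposes
removed = 2_300                # machines removed entirely
energy_saved_kwh = 8_000_000   # "over 8 million kWh", treated as a lower bound
cost_saved_usd = 645_000

# Pool reduction, absolute and as a share of the starting pool.
reduction = start_pool - end_pool
reduction_pct = 100 * reduction / start_pool

# Derived estimates (assumptions, not reported numbers).
implied_usd_per_kwh = cost_saved_usd / energy_saved_kwh
kwh_per_removed_machine = energy_saved_kwh / removed

print(f"Machines out of the pools: {reduction}")
print(f"Reduction: {reduction_pct:.1f}%")
print(f"Implied energy price: ${implied_usd_per_kwh:.3f} per kWh")
print(f"Energy per removed machine: {kwh_per_removed_machine:.0f} kWh")
```

The reallocated and removed counts sum to the full 5,000-machine reduction, which is why the two groups are best read as a breakdown of that total.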
Meneely said he is now involved in an effort called LEAF, which will build on the lessons learned from iSHARP to provide detailed data for each application within Intel's interactive environment. That, in turn, will help Intel further optimize its resource allocation.