Hyperscale for Enterprise - hype or scale?

Facebook Prineville, Oregon Hyperscale Datacenter


Google, Facebook, Amazon, Alibaba, Baidu, Microsoft, and Tencent are known as the "Big 7" hyperscalers, according to Intel. Buying servers in quantities of 10,000 or more, hyperscalers get preferential treatment from Intel and buy their hardware directly from ODMs, saving over 20%. But who buys 10,000 servers at a time? Not many, but that's OK, and minimum order quantities are coming down. Fortunately, most of the cost of running a server is not the cost of the server itself. The TCO of running a server consists of three major components: 1) the server itself, 2) administration and management, and 3) space, power, and cooling. According to IDC, the server represents 20% of the total, administration and management 70%, and space, power, and cooling the remaining 10%. So if you are looking to reduce costs, look closely at the whopping 70%: administration and management. This is exactly what the hyperscalers have done. Their investments in software, machine learning, and automation drive utilization rates to 4x that of the average enterprise, creating world-class TCO and programmability across their datacenter infrastructure. For example, a system administrator in a hyperscale datacenter can manage 20,000+ servers, whereas the average enterprise system administrator manages 500 servers at the high end.
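To see why the 70% slice matters so much, here is a minimal back-of-the-envelope sketch in Python. Only the 20/70/10 split comes from the IDC figures above; the $10,000 per-server TCO is a hypothetical number used purely for illustration:

```python
def tco_breakdown(total_tco):
    """Split a per-server TCO into the three IDC components:
    hardware 20%, administration/management 70%, space/power/cooling 10%."""
    return {
        "hardware": 0.20 * total_tco,
        "admin_and_management": 0.70 * total_tco,
        "space_power_cooling": 0.10 * total_tco,
    }

# Hypothetical $10,000 lifetime TCO per server.
breakdown = tco_breakdown(10_000)

# Automating away even half of the admin/management slice saves more
# than the entire hardware cost of the server.
admin_savings = 0.5 * breakdown["admin_and_management"]
hardware_cost = breakdown["hardware"]
```

The point of the sketch: a 50% improvement in the largest slice beats a 100% discount on the hardware slice, which is why the hyperscalers invested in automation first.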

Enterprises will benefit from hyperscale techniques as they become more mainstream

Enterprises are fighting back, however, and with the help of open source initiatives and progressive technology vendors, they are deploying infrastructure solutions that are closing the gap with the hyperscalers. What are some practical things you can do to become more like the hyperscalers? Let's start by looking at the basics, and then I'll make some suggestions for where you can mimic them.

If you google "hyperscale", you'll find a lot of information about the companies and the size and scale of their datacenters (square feet, number of servers, power consumption, etc.; try this one from Facebook: https://www.facebook.com/PrinevilleDataCenter/app/399244020173259/ ) and other interesting "ooh and ah" information, but it's not very useful. What is useful is to understand what goes on inside these datacenters and to suggest some ways you can benefit from their innovation. In brief, hyperscale datacenters are highly automated environments with software-defined resource pools for compute, storage, and networking that can rapidly scale up and scale out in software using commodity white-box hardware. The hyperscalers' core business is their IT infrastructure, and traditional methods did not scale cost-effectively for them, so they were forced to think differently about how to build it. The good news for enterprises is that the techniques and technologies they use have evolved over the past few years, and many have become mainstream, helping enterprises benefit from the reduced capex and opex they enable. OCP, the Open Compute Project founded by Facebook in 2011, was designed to lower hardware costs by open-sourcing the designs. Open source software has also dramatically reduced the cost of software-defining your infrastructure, a key underpinning of hyperscale.

Reducing both capital and operating expenses in your infrastructure is a strategic imperative for any enterprise infrastructure executive, but challenges remain. For example, the easiest way to reduce capex is to increase infrastructure utilization: if your average utilization is 15% and you're able to increase it to 30%, you can buy half the hardware you normally would to serve the same workload demand. Virtualization has helped tremendously here, but there is more to be gained. For opex, it means doing more with fewer people or reducing power costs, for example.
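The utilization arithmetic above can be sketched directly. The workload size and per-server capacity below are hypothetical placeholders; only the 15%-to-30% utilization jump comes from the text:

```python
import math

def servers_needed(workload_units, units_per_server, utilization):
    """Servers required to serve a fixed workload at a given average utilization."""
    effective_capacity = units_per_server * utilization
    return math.ceil(workload_units / effective_capacity)

# Hypothetical: 600 units of steady workload, 100 units of raw capacity per server.
at_15_pct = servers_needed(600, 100, 0.15)  # 40 servers
at_30_pct = servers_needed(600, 100, 0.30)  # 20 servers: half the hardware
```

Doubling average utilization halves the server count for the same demand, which is exactly the capex lever the hyperscalers pull with their automation.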

What can enterprises do now to enjoy the economic benefits of hyperscale technologies?

1.  Use Open Source Software and Hardware. Open source software leverages a community for innovation and it's free; what is better than that? OCP is trying to do the same thing with hardware designs. Combine the two and you will save money. Take storage, for example: it's expensive, and most storage purchased today remains proprietary. Using open source software-defined storage such as Ceph in combination with JBODs ("just a bunch of disks") lets you buy the least expensive commodity hardware and pair it with free software, which is a really good start. And if you're still using technologies like RAID, you can reduce the amount of storage hardware you need by combining Ceph with erasure coding. Erasure coding is a method of data protection in which data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or storage media instead of keeping a completely redundant copy of every bit. OCP hardware is also more readily available than ever. The classic OCP buyers are hyperscalers who buy directly from ODMs (original design manufacturers, as opposed to OEMs, original equipment manufacturers) like Quanta and Inventec, which can save 20-30% by eliminating the OEM middleman. The main challenges in buying from an ODM are minimum order quantities in the thousands (although I've heard this has dropped recently) and the lack of support you get from a traditional OEM. There are alternatives, however: Quanta Computer, for example, is leveraging its expertise in serving the hyperscalers and now offers products as an OEM as well. There are also "OCP-inspired" designs like HP's Cloudline, which offer OCP-like pricing with OEM support. Even mentioning to your current OEM that you'd like to buy OCP hardware may shake loose a few more discount points.
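A quick way to see the erasure-coding saving is to compare raw-to-usable storage ratios. The sketch below uses 3x replication and an 8+3 coding scheme as illustrative choices; both are common configurations, but the specific numbers are my assumptions, not a recommendation:

```python
def replication_overhead(copies):
    """Raw bytes stored per usable byte under full replication."""
    return float(copies)

def erasure_overhead(k, m):
    """Raw bytes stored per usable byte under a k+m erasure code:
    data is split into k fragments plus m parity fragments, and any
    k of the k+m fragments can reconstruct the original data."""
    return (k + m) / k

triple_replica = replication_overhead(3)  # 3.0x raw storage per usable byte
ec_8_plus_3 = erasure_overhead(8, 3)      # 1.375x raw storage per usable byte
# Triple replication tolerates the loss of 2 copies; 8+3 erasure coding
# tolerates the loss of any 3 fragments, yet needs less than half the raw
# capacity for the same usable data.
```

That difference in raw capacity is hardware you don't have to buy, power, or cool.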

2.  Leverage Hyperconverged and/or Composable Infrastructure Technologies. Hyperconverged solutions reduce opex (remember the IDC stat mentioned earlier) because they take less time to configure and manage than traditional systems, but they sometimes come at a higher price. If your environment is highly dynamic, though, the increased capex will be easily offset by opex savings. Hyperconverged systems from Nutanix, SimpliVity, and others are very simple to set up and operate compared to buying traditional servers with virtualization software. Then there are composable systems like those from HP, Dell, and a newcomer, Liqid. These are the next evolution, combining the best of hyperconverged automation with "disaggregated" hardware. They essentially use techniques popularized by Intel's Rack Scale Design, which disaggregates datacenter components such as compute, networking, and storage into pools that can be dimensioned on the fly to accommodate any type of workload requirement. Traditionally, workload dimensioning was accomplished in hardware in the form of a customized bill of materials (BOM). Ask any IT executive for a Hadoop BOM and they have one; for SAP, a different one; for a VNF, yet another one, proliferating silos in the datacenter that increase cost and complexity. Solving workload dimensioning in software, as composable systems do, means you can scale up, down, in, or out on the fly, which dramatically improves resource utilization and reduces the amount of hardware you have to buy!

3.  Deploy ITOA Software. IT Operations Analytics (ITOA) software helps IT infrastructure executives in many ways by enabling improvements in SLAs, capacity planning, and resource utilization. Large amounts of data need to be collected and analyzed to enable ITOA, which can quickly become overwhelming, but the payoffs are real. In a recent conversation, an IT executive colleague who had just deployed ITOA software told me he saved $2M by simply turning off unused ports and controllers on his servers! Google cut 40% off its datacenter cooling bill with its DeepMind analytics software using similar techniques, and now gets 3.5x the computing power from the same amount of energy as five years ago (https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/ ). ITOA software can also identify "ghost" or "zombie" servers: servers sitting in your datacenter using energy and doing nothing, at 0% utilization! Intel estimates that at least 10-15% of servers in the typical enterprise datacenter fall into this category; some estimates run as high as 30%. Good ITOA software can identify these servers and put them back into active inventory. That's found money.
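A zombie-server hunt is conceptually simple once the telemetry is collected. The sketch below is hypothetical: the record format, field names, and thresholds are my assumptions for illustration, not the output of any particular ITOA product:

```python
def find_zombies(servers, cpu_threshold_pct=1.0, net_threshold_mbps=0.1):
    """Return hostnames whose average CPU and network activity are near zero,
    i.e. servers burning power while doing no useful work."""
    return [
        s["host"]
        for s in servers
        if s["avg_cpu_pct"] < cpu_threshold_pct
        and s["avg_net_mbps"] < net_threshold_mbps
    ]

# Hypothetical fleet telemetry, averaged over (say) 30 days.
fleet = [
    {"host": "app-01",    "avg_cpu_pct": 42.0, "avg_net_mbps": 120.0},
    {"host": "old-db-07", "avg_cpu_pct": 0.2,  "avg_net_mbps": 0.01},
    {"host": "batch-03",  "avg_cpu_pct": 15.0, "avg_net_mbps": 4.0},
]
zombies = find_zombies(fleet)  # ['old-db-07']
```

Real ITOA tools layer much richer signals (storage I/O, login history, change records) on top of this idea, but the principle of flagging near-zero activity is the same.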

To thrive in the new digital economy, businesses will need to seize fast-emerging digital business opportunities that require immediate deployments of compute, storage and networking resources. Demand can also ebb and flow to new projects just as quickly, so enterprise data centers need to be able to turn capacity on and off at short notice. This means data centers will need to provide business stakeholders with the same “infrastructure on tap” capabilities that modern hyperscalers like Amazon Web Services have pioneered.

These suggestions are by no means comprehensive, and the impact on your business will vary, but are meant to be a short list of possibilities for you to explore and enable your company to become more competitive in the digital economy. I hope you find them useful and welcome any comments to scott@cloudwirx.com.

About the Author

Scott A. Walker is an advisory board member of Cloudwirx. He is best known for launching the world's first public cloud direct connect solution with AWS in 2011 while at Equinix, driving early adoption of hybrid cloud. Scott joined Ericsson Digital Services in 2015 to spearhead the company's foray into hyperscale systems and software-defined infrastructure, earning numerous innovation awards. You can follow Scott on Twitter: https://twitter.com/scottawalker