How should I match the applications in my portfolio with the most appropriate cloud? This question is becoming increasingly common in enterprise IT organizations today, and it can be difficult to answer. Often the decision depends on the sensitivity of the data within the application. At other times, public versus private cloud considerations are paramount. Other factors influencing the decision include business goals and whether speed or price must be optimized.
Of course, performance and cost are difficult to measure, and comparing across clouds is hardly straightforward. This article illustrates a methodology and test that CliQr uses to help customers weigh these considerations and decide which of the most popular clouds -- Amazon Web Services, Microsoft Azure, and Google Cloud Platform -- and which instance types will be best for a set of sample applications.
Enterprise cloud management platform CliQr CloudCenter was used to conduct this set of black-box tests. Each application described below was modeled using CliQr’s Application Profile mechanism, which configures the various application tiers in a consistent manner across different cloud platforms. In addition to providing governance (that is, who is allowed to deploy what applications where) and metering (how much they spent) capabilities, CliQr CloudCenter includes a black-box benchmarking capability. It deploys each application on a target cloud, imparts load on it using JMeter, and graphs the throughput (transactions per second) against the hourly cost of the configuration on each cloud.
The results should not necessarily be interpreted as praise or criticism of an individual cloud. Rather, the results should serve as an example of a methodology that can be used to answer the “Which cloud for application X?” question. Mileage will vary greatly depending upon the nuances of individual applications, and the results presented here cannot automatically be extrapolated to other situations.
For this set of tests, the following applications were used.
- Pet Clinic: The Spring Framework Java sample application was modeled as a three-tier Web app using a single Nginx virtual machine as a load balancer, two Tomcat VMs as application servers, and a MySQL VM as the database. All VMs for this application used CentOS 6. The database server had a 2GB block storage volume attached to it.
- OpenCart: The popular open source LAMP stack storefront package was modeled using a single Apache VM as the Web server and a MySQL VM as the database. Both VMs were configured to run Ubuntu 12.04. As with Pet Clinic, a 2GB block storage volume was mounted to the database server.
- BlogEngine: A single VM was used to implement this .NET blog platform built on IIS and Microsoft SQL Server.
Within this mix, we have three different operating systems, three different programming languages, and three different sets of application tiers, giving us a good variety to observe.
The instance types
Benchmarking different clouds can be challenging because there are not always apples-to-apples comparisons among different instance types. Any mix of instance types for a set of tests like this is arguable. For this experiment, we used the following configurations.
The intent here was to get a variety of different CPU and memory sizes. While Google and Amazon instance types offer a closer 1:1 comparison, Azure instance types were chosen to align on CPU count.
For each test, the CliQr benchmarking tool deployed the entire application on the cloud in question, created an additional VM to house the JMeter client, executed the JMeter script provided, measured the transactional throughput, then turned off all VMs. A JMeter script imparted 5,000 transactions for Pet Clinic, 6,000 transactions for OpenCart, and 7,000 transactions for BlogEngine.
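Transactional throughput of the kind described above can be derived from JMeter's results log. As a rough illustration of the measurement, here is a minimal sketch that computes transactions per second from a JMeter CSV-format .jtl file; it assumes JMeter's default CSV output with `timeStamp` (epoch milliseconds) and `success` columns, and is not the CliQr tool's actual implementation:

```python
import csv

def throughput_from_jtl(path):
    """Compute successful transactions per second from a JMeter CSV .jtl file.

    Assumes the default CSV result log, which includes 'timeStamp'
    (epoch milliseconds) and 'success' ("true"/"false") columns.
    """
    timestamps, successes = [], 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            timestamps.append(int(row["timeStamp"]))
            if row["success"] == "true":
                successes += 1
    # Elapsed wall-clock time of the test, in seconds.
    elapsed_s = (max(timestamps) - min(timestamps)) / 1000.0
    return successes / elapsed_s if elapsed_s > 0 else float(successes)
```

A per-cloud, per-instance-type run of the benchmark would feed its own .jtl file through a calculation like this before the results are averaged and graphed.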
All VMs involved in a particular test were set to the same instance type. For example, the Google n1-standard-4 test for Pet Clinic involved an n1-standard-4 instance type for the load generator, the load balancer, both Tomcat servers, and the database server. This was done to simplify the testing, but in a real-world scenario, one would typically introduce permutations in the testing to benchmark a range of instance sizes within the tiers of a particular application.
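The per-tier permutations mentioned above grow quickly: with three tiers and three candidate sizes, there are 27 distinct assignments. A short sketch of enumerating them (the tier names reflect the Pet Clinic layout from the article; the size list uses Google instance types as an example):

```python
from itertools import product

tiers = ["load_balancer", "app_server", "database"]
sizes = ["n1-standard-2", "n1-standard-4", "n1-standard-8"]

# Every assignment of an instance size to each tier: 3^3 = 27 test runs.
permutations = [dict(zip(tiers, combo))
                for combo in product(sizes, repeat=len(tiers))]
print(len(permutations))  # 27
```

Uniform sizing, as used in these tests, keeps the run count at one per instance type instead.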
Each test was run on five different days within one week. The results in the graphs below show the average transactional throughput for each permutation.
Pet Clinic results
Given that there are more VMs involved in handling load, we see a higher transactional throughput for Pet Clinic than for the other test applications in our sample. In these tests, Amazon consistently delivered better performance, followed by Google, then Azure. A closer look at the data shows that Amazon is also slightly cheaper for each set of instance types.
Within the Amazon results, which is the best instance type to use for this application? That somewhat depends on whether the business priority is low cost or high speed. That said, the graph clearly shows that the increase in performance above the m4.xlarge instance type is smaller than the subsequent increase in cost. This means the best combination of price/performance can be found in the m4.large or the m4.xlarge (Amazon’s two- and four-CPU instances).
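This price/performance trade-off can be made explicit by dividing measured throughput by hourly cost and picking the instance type with the highest ratio. A minimal sketch follows; note that the throughput and price figures in it are purely illustrative placeholders, not the measured data behind the graphs:

```python
# Hypothetical (throughput_tps, hourly_cost_usd) per instance type.
# Illustrative placeholders only -- not measured benchmark results.
results = {
    "m4.large":   (120.0, 0.10),
    "m4.xlarge":  (200.0, 0.20),
    "m4.2xlarge": (220.0, 0.40),
}

def best_value(results):
    """Return the instance type with the highest throughput per dollar."""
    return max(results, key=lambda name: results[name][0] / results[name][1])

print(best_value(results))  # m4.large, in this illustrative data
```

Where the knee of the curve sits (here, throughput roughly doubling from m4.large to m4.xlarge but barely improving beyond it) determines whether the cost-optimized or the speed-optimized choice wins.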
OpenCart results
You’ll notice that the OpenCart tests generated far fewer transactions per second compared to the Pet Clinic tests, which is likely due to the simpler application architecture. When comparing clouds, the OpenCart results show a much better picture for Google. Is that because a two-tier application has fewer networking needs, demonstrating that Amazon has a better network? Is it because Google does better with PHP applications, or because Google is more finely tuned to Ubuntu? Or perhaps there are other reasons? Further detailed testing would reveal the answers, but this test shows how differently applications run on different clouds.
BlogEngine results
Throughput on BlogEngine is similar to what we saw for OpenCart, but this set of tests used Microsoft technologies, so it is not surprising to see Azure do better here compared to the tests of the Java and LAMP apps. A similar knee in the price-performance curve is seen between four and eight CPUs, with the performance benefits leveling off after four CPUs, as we saw in some of the other results.
Determining which application should run on which cloud is a complicated task. In this set of tests, we have seen how black-box testing can help you compare price and performance of different instance types both across and within public clouds. Had we included private clouds like those based on VMware, OpenStack, or CloudStack, we could have drawn more extensive price/performance comparisons. In addition, we could have extended the testing by using monitoring tools like Nagios, AppDynamics, or New Relic, which could tell us whether the Azure instances were constrained by their lower memory sizes.
For the purposes of public cloud comparisons, the CliQr CloudCenter black-box approach provides a good start. Ultimately, each organization has different key indicators to optimize, and benchmarking tools can help generate apples-to-apples comparisons for better business decisions.
Pete Johnson is senior director of product evangelism at CliQr.
New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to firstname.lastname@example.org.