Meeting of the Minds, 2013

Performance and Cost Analysis of MapReduce Applications on Public Clouds

Author Fan Zhang, Ph.D.

Faculty Advisor Majd Sakr, Ph.D.

Category Postgraduate

Abstract: The MapReduce programming model is a widely accepted answer to the rapid growth of big-data processing demands. MapReduce applications with very large volumes of input data can run on an elastic compute cloud composed of many distributed computing instances. A public cloud provider, such as Amazon EC2, offers a spectrum of cloud resources at varying costs, and cloud users typically rent these resources as virtual machines (VMs) under a pay-as-you-go model to gain access to large-scale capacity. However, applications scale differently depending on their type, their behavior, and how effectively they use the available resources. In this work, we characterize how MapReduce performance is affected by increased compute resources for a variety of application types. Because resources on public clouds are rented, we carry out a performance and cost analysis to assess how efficiently a suite of MapReduce applications, spanning data- and compute-intensive benchmarks, uses a range of compute resources. Empirically, we observe wide variation across the applications in both speedup (5.2X to 36.7X) and cost (3.6X to 9.7X) when the cluster size is increased to 64 VMs. Map-intensive applications, such as TermVector and Grep, show higher speedup as the number of VMs increases, without a significant increase in cost. Reduce-intensive applications such as Sort, by contrast, exhibit limited speedup and therefore cost considerably more, since more resources are held for a longer period. Given this wide variation, we measure how efficiently each application uses compute resources as the number of VMs is scaled from 1 to 64. Efficiency declines for every application as the number of VMs increases; at 64 VMs it ranges from 57% down to 8%. Some applications, such as Sort, show a steep drop in efficiency when the number of VMs is increased from 2 to 4. WordCount maintains high efficiency at 4 VMs but drops steeply from 8 VMs onward. Grep, on the other hand, shows a slight but steady decline from 2 to 64 VMs. An application's efficiency can guide cloud users in choosing appropriate computing resources based on compute resource budgets and deadlines.
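The abstract does not define efficiency. A minimal sketch, assuming the standard parallel-efficiency definition (speedup divided by the number of VMs) and assuming speedup is measured against a single-VM run, gives results that match the reported figures: the speedups of 36.7X and 5.2X at 64 VMs correspond to roughly 57% and 8%. The function names below are illustrative, not taken from the work.

```python
def speedup(t_baseline: float, t_scaled: float) -> float:
    """Speedup of a scaled run relative to a baseline run (assumed: 1 VM)."""
    if t_baseline <= 0 or t_scaled <= 0:
        raise ValueError("runtimes must be positive")
    return t_baseline / t_scaled


def efficiency(speedup_value: float, num_vms: int) -> float:
    """Parallel efficiency: the fraction of ideal linear speedup achieved.

    Assumed definition; the abstract only reports the resulting percentages.
    """
    if num_vms < 1:
        raise ValueError("num_vms must be at least 1")
    return speedup_value / num_vms


if __name__ == "__main__":
    # Speedup range at 64 VMs as reported in the abstract.
    for s in (36.7, 5.2):
        print(f"speedup {s}X on 64 VMs -> efficiency {efficiency(s, 64):.0%}")
```

Under this definition, an application with perfectly linear scaling would stay at 100% efficiency, so the distance below 100% indicates how much of the rented capacity goes unused as the cluster grows.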

