An I/O-Efficient Distributed Approximation Framework Using Cluster Sampling
Abstract: In this paper, we present an I/O-efficient distributed approximation framework that supports approximations over arbitrary sub-datasets of a large dataset. Due to the prohibitive storage overhead of caching offline samples for each sub-dataset, existing offline sample-based systems provide high-accuracy results for only a limited number of sub-datasets, such as the popular ones. On the other hand, current online sample-based approximation systems, which generate samples at runtime, do not take into account the uneven storage distribution of a sub-dataset. They work well when a sub-dataset is uniformly distributed, but suffer from low I/O efficiency and poor estimation accuracy on unevenly distributed sub-datasets. To address this problem, we develop a distribution-aware method called CLAP (cluster-sampling-based approximation). Our idea is to collect the occurrences of a sub-dataset at each logical partition of a dataset (its storage distribution) in the distributed system, and to make good use of such information to enable I/O-efficient online sampling. There are three thrusts in CLAP. First, we develop a probabilistic map to reduce the exponential number of recorded sub-datasets to a linear one. Second, we apply cluster sampling with unequal probability theory to implement a
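To make the first thrust concrete, the probabilistic map can be sketched as follows: instead of recording an occurrence count for every attribute-value combination (exponential in the number of attributes), record per-partition counts for each single attribute value (linear) and estimate a combined sub-dataset's occurrence under an independence assumption. The Python sketch below illustrates this idea; the class and method names are ours for illustration, not CLAP's actual API.

from collections import defaultdict

class ProbabilisticMap:
    """Per logical partition, records occurrence counts of single
    attribute values; the occurrence of a multi-attribute sub-dataset
    is estimated as a product of marginal ratios (independence
    assumption), keeping storage linear in the number of values."""

    def __init__(self):
        # partition id -> (attribute, value) -> occurrence count
        self.counts = defaultdict(lambda: defaultdict(int))
        # partition id -> total number of records seen
        self.sizes = defaultdict(int)

    def record(self, partition, rec):
        """Ingest one record, given as a dict of attribute -> value."""
        self.sizes[partition] += 1
        for attr_val in rec.items():
            self.counts[partition][attr_val] += 1

    def occurrence(self, partition, predicate):
        """Estimated fraction of records in `partition` matching all
        attribute -> value pairs in `predicate`."""
        n = self.sizes[partition]
        if n == 0:
            return 0.0
        ratio = 1.0
        for attr_val in predicate.items():
            ratio *= self.counts[partition][attr_val] / n
        return ratio

# Usage: pmap.record(0, {"state": "NY", "year": 2016});
# pmap.occurrence(0, {"state": "NY", "year": 2016}) then estimates how
# much of the NY/2016 sub-dataset partition 0 holds.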
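For the second thrust, the per-partition occurrences can drive unequal inclusion probabilities, so that partitions holding more of the target sub-dataset are sampled more often. One standard formulation from unequal-probability cluster sampling theory is a Hansen-Hurwitz style estimator; the notation below is ours, chosen to match the sketch above:

\hat{\tau} = \frac{1}{n} \sum_{i=1}^{n} \frac{t_i}{p_i},
\qquad
p_i = \frac{o_i}{\sum_{j=1}^{N} o_j},

where N is the number of logical partitions, o_i is the recorded occurrence of the sub-dataset in partition i, n is the number of partitions sampled with replacement, and t_i is the aggregate computed from sampled partition i. Sampling partitions in proportion to o_i concentrates I/O on partitions that actually contain the sub-dataset, which is precisely what uniform online sampling misses on unevenly distributed sub-datasets.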