Apache Spark: Excellent Data Processing, but Needs a Few Improvements

Apache Spark is a data processing toolkit that lets you work with large data sets without having to think about the underlying infrastructure. It provides the abstractions needed to process huge volumes of data either in real time (streaming) or offline (batch). Other data processing frameworks are available, such as Apache Samza or Apache Storm, but Apache Spark is often preferred for its speed in data analysis. It gets that speed from an execution model that improves on MapReduce by keeping intermediate data in memory rather than writing it to disk between processing steps. This article takes an objective look at Apache Spark's strongest features and at those that could be improved.

The best implementation scenarios of Apache Spark
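The introduction's central technical claim is that Spark gains speed by keeping intermediate data in memory instead of on disk. The sketch below is a deliberately simplified, pure-Python toy model of that difference. It is not Spark code and does not reflect how Spark or Hadoop MapReduce are actually implemented. `disk_pipeline` (a hypothetical name) writes each stage's output to a JSON file and reads it back, roughly as chained MapReduce jobs materialize results between steps. `memory_pipeline` (also hypothetical) passes the intermediate word list directly to the next step. Both compute the same word counts; only the amount of disk I/O differs.

```python
import json
import os
import tempfile
from collections import Counter


def disk_pipeline(lines, workdir):
    """Toy MapReduce-style word count (hypothetical): each stage writes its
    output to disk and the next stage reads it back before continuing."""
    # Stage 1 (map): split lines into words and persist them to a file.
    words_path = os.path.join(workdir, "stage1_words.json")
    with open(words_path, "w") as f:
        json.dump([word for line in lines for word in line.split()], f)

    # Stage 2 (reduce): reload the words from disk, count them, persist again.
    with open(words_path) as f:
        words = json.load(f)
    counts_path = os.path.join(workdir, "stage2_counts.json")
    with open(counts_path, "w") as f:
        json.dump(Counter(words), f)

    # Even the final result is read back from disk.
    with open(counts_path) as f:
        return json.load(f)


def memory_pipeline(lines):
    """Toy in-memory word count (hypothetical): the intermediate word list
    stays in RAM and is handed straight to the counting step."""
    words = [word for line in lines for word in line.split()]
    return dict(Counter(words))


if __name__ == "__main__":
    sample = ["spark keeps data in memory", "mapreduce writes data to disk"]
    with tempfile.TemporaryDirectory() as workdir:
        on_disk = disk_pipeline(sample, workdir)
    in_memory = memory_pipeline(sample)
    print(on_disk == in_memory)  # True: same result, different I/O cost
    print(in_memory["data"])     # 2
```

In real Spark code, the corresponding idea is to mark a dataset that will be reused for in-memory storage, for example with `cache()` or `persist()` on an RDD or DataFrame, so that repeated actions do not recompute it from the source files.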

Powerful real-time data processing

