Apache Spark: Excellent Data Processing, but Needs a Few Improvements
Apache Spark is a data processing toolkit that lets you work with large data sets without having to worry about the underlying infrastructure. It provides the abstractions needed to process huge volumes of data, both in real time and in batch mode. Although there are other data processing frameworks to choose from (say, Apache Samza or Apache Storm), Apache Spark is often preferred for its speed in data analysis, which it achieves by improving on the MapReduce model and keeping intermediate data in memory rather than writing it to disk. This article takes an objective look at the best features of Apache Spark, as well as those that could be improved.
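As a rough illustration of the in-memory model mentioned above, the following Scala sketch caches a dataset so that repeated queries are served from memory instead of being re-read from disk. The file name events.csv and the status column are hypothetical and used only for the example; a real job would also point the master setting at a cluster manager rather than local mode.

```scala
import org.apache.spark.sql.SparkSession

object InMemoryExample {
  def main(args: Array[String]): Unit = {
    // Local SparkSession for illustration only; production jobs would
    // target a cluster manager such as YARN or Kubernetes.
    val spark = SparkSession.builder()
      .appName("InMemoryExample")
      .master("local[*]")
      .getOrCreate()

    // events.csv is a hypothetical input file used for this sketch.
    val events = spark.read
      .option("header", "true")
      .csv("events.csv")

    // cache() keeps the dataset in memory after the first action,
    // so later queries avoid re-reading the file from disk. This is
    // the in-memory behaviour contrasted with classic MapReduce.
    events.cache()

    // First action: reads from disk and populates the cache.
    println(events.count())

    // Second action: served from the in-memory cache.
    println(events.filter("status = 'error'").count())

    spark.stop()
  }
}
```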
The best implementation scenarios of Apache Spark

Powerful real-time data processing