International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020
p-ISSN: 2395-0072
www.irjet.net
An Overview of Machine Learning Algorithms for Data Science
Rupesh Deshmukh1, Milind Kubal2
1,2Student, Dept. of Computer Engineering, Terna Engineering College, Nerul, Navi Mumbai, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract – Data Science has become a huge trend over the past few years. Organizations all over the world have realized the intrinsic value of their data, and the demand for data scientists has risen tremendously. Setting up Business Intelligence departments and making data-driven decisions has gained popularity. Uncovering knowledge and hidden patterns in large volumes of data can prove highly beneficial to an organization, in terms of profit or otherwise. But inspecting the data in spreadsheets with the naked eye to find this information is time-consuming and highly inefficient. Various machine learning algorithms have been designed over the past decade to make data classification and information extraction far less laborious. In this paper, we describe some of the basic machine learning algorithms that every data science enthusiast should be familiar with.
Key Words: Data Science, Machine Learning, Supervised, Unsupervised, Algorithms
1. INTRODUCTION

Fig -1: Data Science Lifecycle
Data Science is the art of uncovering patterns and information from large volumes of data that can prove beneficial to a business. The first step is to understand the business model and to comprehend its goals and its vision for its customers, because it is possible to find something in data only if we know what we are looking for. Most datasets are not clean enough in their raw form for information to be extracted from them. The data needs to be cleaned, sorted and even scaled [1]. Missing values have to be handled, and inconsistencies and redundancies have to be resolved. After the data cleaning is complete, the data is examined visually. If processing were performed on all the available data, the analysis might take months to complete. To make it feasible, some important features are selected from the complete dataset and the other features are discarded. This saves a lot of unnecessary computation and processing time. Finally, when the targeted dataset is ready, it can be processed using various machine learning algorithms.
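The preparation steps described above (removing redundancies, handling missing values, scaling, and discarding uninformative features) can be illustrated with a minimal sketch. The records, column names and helper functions below are hypothetical and are not taken from any particular library. Mean imputation, min-max scaling and a zero-variance filter are only one possible choice for each step, assumed here for illustration.

```python
from statistics import mean, pvariance

# Hypothetical raw records: None marks a missing value, and one row is repeated.
raw_rows = [
    {"age": 25, "income": 30000, "country_code": 1},
    {"age": 40, "income": None, "country_code": 1},
    {"age": 25, "income": 30000, "country_code": 1},  # redundant duplicate
    {"age": 55, "income": 90000, "country_code": 1},
]

def drop_duplicates(rows):
    """Remove redundant rows, keeping the first occurrence of each."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # sorted by column name
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def fill_missing_with_mean(rows, column):
    """Replace missing values in a column with the mean of the known values."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def min_max_scale(rows, column):
    """Rescale a numeric column to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, span = min(values), max(values) - min(values)
    return [{**r, column: (r[column] - lo) / span if span else 0.0} for r in rows]

def select_features(rows, min_variance=0.0):
    """Keep only columns whose variance exceeds the threshold."""
    keep = [c for c in rows[0] if pvariance([r[c] for r in rows]) > min_variance]
    return [{c: r[c] for c in keep} for r in rows]

rows = drop_duplicates(raw_rows)
rows = fill_missing_with_mean(rows, "income")
for column in ("age", "income"):
    rows = min_max_scale(rows, column)
cleaned = select_features(rows)

for row in cleaned:
    print(row)
```

In this toy dataset the constant `country_code` column carries no information and is therefore discarded, which mirrors the feature selection step. On real data, the choice of imputation and selection strategy would depend on the business question.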
Machine Learning has a huge variety of applications, and their number is growing every day. Machine learning algorithms learn from the training data and tune their parameters accordingly so that they can accurately understand and classify real-world data. The part of the dataset selected for training always contains the same features as the data to be classified. Machine Learning can be considered one of the most important tools in a data science professional’s toolset. Nowadays, a huge number of organizations rely on decisions backed by these algorithmic findings from the data. As not all the data can be considered useful, it is up to the data scientist to deal with redundant or incomplete data, select the appropriate algorithms and work with them to get the best insights out of the datasets.
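The idea of learning parameters from training data and then classifying unseen data with the same features can be sketched with a nearest-centroid classifier. This is a deliberately simple stand-in chosen for illustration, not an algorithm prescribed by this paper. The dataset, labels and function names are hypothetical.

```python
from statistics import mean

# Hypothetical labeled data: (feature vector, class label).
# Training and test samples share the same two features.
training_set = [
    ((1.0, 1.2), "low"),
    ((0.8, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((4.0, 4.2), "high"),
    ((4.2, 3.8), "high"),
    ((3.8, 4.0), "high"),
]
test_set = [((1.1, 0.9), "low"), ((3.9, 4.1), "high")]

def fit_centroids(samples):
    """Learn the model parameters: one per-feature mean vector for each class."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, features) == label for features, label in samples)
    return hits / len(samples)

centroids = fit_centroids(training_set)
test_accuracy = accuracy(centroids, test_set)
print(centroids)
print(test_accuracy)
```

The held-out `test_set` plays the role of real-world data: it is never used to compute the centroids, so the accuracy measured on it gives a rough indication of how well the learned parameters generalize.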
The insights derived from the data can take more than one form. They can be patterns, outliers and even predictions for future data. These insights are easy for data experts to comprehend, but might not be understandable to ordinary business people. Hence, they need to be presented visually in a way that can be understood by anyone. This is the final step of the data science process, and the presented information can be used for further business operations.
Fig -2: The Machine Learning Process

© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1149