Data needs cleaning before machine learning can find meaning

Enterprises large and small need every means at their disposal to gain a strategic advantage. To remain competitive, they must use contemporary machine learning tools to unlock value and meaning from the wealth of data they hold about their customers, products, employees, work processes, and even competitors. Whether a customer service department wants to understand how customers feel about the company's products, or sales and product design teams need to analyze which products would sell when, to whom, with what features, and at what price point, advances in machine learning can enhance existing data models and take this decision making to the next level.

So what do businesses need to unlock the value in their data? After all, most machine learning tools are open source, and plenty of commercial machine learning platforms promise to run a number of models on your data. Why, then, is it so hard to adopt machine learning?

It’s a numbers game

As is so often the case, the answer lies in numbers. Machine learning algorithms crunch numbers, so the transactional and analytical data that enterprises have in their databases and data warehouses needs to be selected and prepared to be fed into these algorithms. There are several data preprocessing steps, from data imputation that addresses data sparseness to techniques for normalizing and standardizing data, allowing the appropriate machine learning model to be applied.
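To make these preprocessing steps concrete, here is a minimal sketch in plain Python. The function names, the sample `order_totals` column, and the choice of mean imputation are illustrative assumptions, not anything prescribed by a particular platform; real projects would more likely rely on a library such as pandas or scikit-learn.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Fill missing entries (None) with the mean of the observed values.

    Mean imputation is just one assumed strategy; a constant default
    or the median are common alternatives.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical column with one missing entry.
order_totals = [20.0, None, 40.0, 60.0]

filled = impute_mean(order_totals)
scaled = min_max_normalize(filled)
zscores = standardize(filled)
```

Normalization keeps values within a fixed range, while standardization centers them around zero; which one suits a dataset depends on the model that will consume it.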

Data Imputation and Normalization

Common techniques for data imputation provide default values, such as 0, where none exist, or correct for erroneous and outlier data, thereby reducing “data noise”. Standardization and normalization cover a number of data preparation techniques, chosen according to the type of machine learning model needed for a given business problem or question. Label encoding and one-hot encoding, for example, transform textual data values into numeric values; one-hot encoding in particular does so without introducing artificial correlation within the data.
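The difference between the two encodings is easier to see in a small example. The sketch below uses plain Python with hypothetical helper names and a made-up `colors` column; it assumes categories are ordered alphabetically, which is an arbitrary choice.

```python
def label_encode(values):
    """Map each distinct category to an integer code.

    Codes follow alphabetical order here (an assumption), so the numbers
    imply an ordering that the categories may not actually have.
    """
    categories = sorted(set(values))
    index = {category: code for code, category in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Map each value to a 0/1 vector with a single 1 for its category."""
    categories = sorted(set(values))
    rows = [[1 if v == category else 0 for category in categories]
            for v in values]
    return rows, categories


# Hypothetical textual column.
colors = ["red", "green", "blue", "green"]

labels, categories = label_encode(colors)
one_hot, _ = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
print(labels)      # [2, 1, 0, 1]
print(one_hot)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

With label encoding, a model could read "red" (2) as somehow greater than "blue" (0). One-hot encoding avoids that by giving each category its own column, at the cost of a wider dataset.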

