DATA SCIENCE: Defining the Pieces of the Data Puzzle
Top Data Science Terms Defined
MACHINE LEARNING
Machine Learning involves training computers, through repeated presentation of observations and outcomes, to make predictions that are not obvious to a person.
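As a rough illustration of "repeated presentation of observations and outcomes", the sketch below trains a tiny perceptron in plain Python. The data set (hours studied relative to a class average, and whether the student passed) is invented for this example, and the function names `train` and `predict` are hypothetical; real projects would normally use an established library rather than hand-written code like this.

```python
# Made-up observations: hours studied above (+) or below (-) the class average.
observations = [(-3.0,), (-2.0,), (-1.0,), (1.0,), (2.0,), (3.0,)]
# Made-up outcomes: 1 = passed, 0 = failed.
outcomes = [0, 0, 0, 1, 1, 1]


def predict(weights, bias, x):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train(observations, outcomes, epochs=20, learning_rate=0.1):
    """Show every (observation, outcome) pair to the model, epoch after epoch."""
    weights = [0.0] * len(observations[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(observations, outcomes):
            # error is 0 when the prediction is right, +1 or -1 when it is wrong
            error = y - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


weights, bias = train(observations, outcomes)
print(predict(weights, bias, (2.5,)))   # prediction for an unseen student
print(predict(weights, bias, (-2.5,)))  # prediction for another unseen student
```

The model only changes when it gets an example wrong, so repeating the same examples gradually pushes it toward a rule that fits all of them.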
Data analytics is the most common use of data in organizations, typically to produce reports. Data analysts often extract data from relational databases (like SQL Server and Oracle) and present it as reports and corporate dashboards.
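A minimal sketch of that extract-and-report workflow is shown here. It uses Python's built-in `sqlite3` module as a stand-in for a server database such as SQL Server or Oracle, and the `sales` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3


def sales_by_region(rows):
    """Load rows into an in-memory database and total the sales per region."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region ORDER BY total DESC"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Invented sample data for illustration only.
rows = [("North", 120.0), ("South", 80.0), ("North", 30.0), ("West", 200.0)]
report = sales_by_region(rows)

# Present the result as a simple text report.
for region, total in report:
    print(f"{region:<6} {total:>8.2f}")
```

In practice the same query result would usually feed a dashboard tool rather than a printed table.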
DATA SCIENCE
• Advanced usage of statistical tools and methods
• Programming in at least one data science language (e.g. Python, R)
• Extract and manipulate data from diverse data sources
• Use machine learning methods, such as clustering and random forests
DATA ANALYTICS
• Manipulate databases using SQL
• Use dashboard tools and design effective dashboards
• Utilize statistical tools to maintain data integrity
• Produce effective, clear charts that inform, rather than confuse, decision-makers
DATA SCIENCE
This umbrella term encompasses the other data science disciplines. The data scientist often attempts to create new knowledge from existing data, e.g. by producing predictions.
ARTIFICIAL INTELLIGENCE (AI)
Artificial Intelligence (AI) has been around since the 1950s, but today’s AI researchers use cutting-edge technologies such as deep learning (which builds on neural networks), natural language processing, or NLP (used in conversational user interfaces), and image processing (as used in products such as self-driving cars).
MACHINE LEARNING
• Expertise in one or more modeling techniques/tools
• Understand the statistical basis of algorithms
• Programming in popular machine learning languages, such as Python
• Work closely with data scientists to ensure machine learning technology delivers results for the organization
ARTIFICIAL INTELLIGENCE (AI)
• Highly specialized; skill requirements are determined by area of research and niche expertise
Big data describes working with data that is too large to be processed using standard (e.g. workstation, single server) tools. The two most common big data platforms are Hadoop and Spark. Platforms like Tableau are popular with those looking to do data analytics with big data.
BIG DATA
• Operate and manage clusters of networked computers
• Maintain high availability of the cluster
• Understand how cyber security issues can affect big data
• Programming using enterprise languages, such as Java and Scala
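To give a feel for how big data platforms cope with data that will not fit on one machine, the sketch below imitates the split, process, and combine pattern in plain Python on a single computer. It is not Hadoop or Spark code; the function names (`map_chunk`, `reduce_counts`, `word_count`) and the sample lines are hypothetical, and on a real cluster each chunk would be handled by a different machine.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Count the words in one chunk of the data (the 'map' step)."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Merge two partial word counts (the 'reduce' step)."""
    return left + right


def word_count(lines, chunk_size=2):
    """Split the data into chunks, count each chunk, then combine the results."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Assumption for illustration: chunks are processed one after another here;
    # a cluster would process them in parallel on separate machines.
    partial_counts = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partial_counts, Counter())


# Invented sample data for illustration only.
lines = ["big data tools", "data science", "big clusters"]
totals = word_count(lines)
print(totals.most_common(2))
```

Because each chunk is counted independently, adding more machines lets the same approach scale to much larger inputs.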
Learn More at LearningTree.com/DataScience