

Introduction to Hadoop: Architecture and Components
What is Hadoop?
Hadoop is an open-source framework for distributed storage and processing of large datasets.
Developed by the Apache Software Foundation.
Based on the MapReduce programming model.
Handles big data across multiple nodes in a cluster.
Scalable, fault-tolerant, and cost-effective.

Why Use Hadoop?
✅ Scalability – Handles petabytes of data across many machines.
✅ Fault Tolerance – Automatically recovers from failures.
✅ Cost-Effective – Uses commodity hardware.
✅ Parallel Processing – Processes data across multiple nodes simultaneously.
✅ Supports Various Data Types – Structured, semi-structured, and unstructured data.
Hadoop Architecture Overview
1. Master-Slave Architecture:
Master Node – Manages and coordinates the cluster.
Slave Nodes – Store data and perform computations.
2. Core Components:
HDFS (Hadoop Distributed File System) – Storage layer (a client-side sketch follows this list).
MapReduce – Data processing engine.
YARN (Yet Another Resource Negotiator) – Manages resources.
Common Utilities – Shared libraries for Hadoop modules.
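
To make the architecture concrete, the sketch below uses Hadoop's standard FileSystem Java API to write and read a small file on HDFS. The client contacts the master (NameNode) for metadata, while the data blocks are stored on and streamed from the slave nodes (DataNodes). The file path is a hypothetical placeholder, and the code assumes the cluster configuration (core-site.xml / hdfs-site.xml) is available on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath;
        // fs.defaultFS (e.g. hdfs://namenode:8020) points at the master node.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a small file; HDFS splits it into blocks and replicates
        // them across DataNodes (the slave nodes).
        Path file = new Path("/tmp/hello.txt");   // hypothetical path
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back through the same API.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }

        fs.close();
    }
}

The same API also works against a local filesystem for testing, which is why the example does not hard-code a NameNode address.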

Key Components of Hadoop
HDFS (Storage Layer) – Stores data in a distributed manner using blocks.
MapReduce (Processing Layer) – Processes data in parallel using map & reduce tasks (a WordCount sketch follows this list).
YARN (Resource Management) – Allocates and manages resources dynamically.
Hadoop Common – Provides utilities for all Hadoop modules.
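
To show how the processing layer works, here is a minimal sketch of the classic WordCount job written against Hadoop's MapReduce Java API: the mapper emits (word, 1) pairs for each token, the reducer sums the counts per word, and YARN schedules the resulting map and reduce tasks across the cluster. Input and output paths are passed on the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: split each input line into tokens and emit (word, 1).
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: sum all the 1s emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configures the job and submits it to the cluster (YARN).
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A typical (hypothetical) invocation: hadoop jar wordcount.jar WordCount /user/demo/input /user/demo/output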
Hadoop Ecosystem (Additional Tools)
Hive – SQL-like querying for big data.
Pig – High-level scripting language for data transformation.
HBase – NoSQL database on top of HDFS.
Spark – Fast in-memory processing engine (see the sketch after this list).
Oozie – Workflow scheduling for Hadoop jobs.
Flume & Sqoop – Data ingestion from external sources.

Conclusion

Hadoop is a powerful big data framework for distributed storage & processing.
Highly scalable, fault-tolerant, and cost-effective for large-scale data.
Key components: HDFS (storage), MapReduce (processing), and YARN (resource management).
Rich ecosystem with tools like Hive, Pig, Spark, and HBase.