HADOOP:

–>Big Data and Hadoop Introduction

What is Big Data and Hadoop?
Challenges of Big Data
Traditional Approach vs. Hadoop
Hadoop Architecture
Distributed Model
Block-Structured File System
Technologies supporting Big Data
Replication
Fault Tolerance
Why Hadoop?
Hadoop Eco-System
Use cases of Hadoop
Fundamental Design Principles of Hadoop
Comparison of Hadoop vs. RDBMS

–>Understand Hadoop Cluster Architecture

Hadoop Cluster and Architecture
The 5 Hadoop Daemons (NameNode, Secondary NameNode, DataNode, JobTracker, TaskTracker)
Hands-On Exercise
Typical Workflow
Hands-On Exercise
Writing Files to HDFS
Hands-On Exercise
Reading Files from HDFS
Hands-On Exercise
Rack Awareness
Before MapReduce

–>MapReduce Concepts

MapReduce Concepts
What is MapReduce?
Why MapReduce?
MapReduce in the Real World and MapReduce Flow
What are the Mapper, Reducer, and Shuffling?
Word Count Problem
Hands-On Exercise
Distributed Word Count Flow and Solution
Log Processing and MapReduce
Hands-On Exercise

–>Advanced MapReduce Concepts

What is a Combiner?
Hands-On Exercise
What is a Partitioner?
Hands-On Exercise
What is a Counter?
Hands-On Exercise
InputFormats/OutputFormats
Hands-On Exercise
Map-Side Join using MR
Hands-On Exercise
Reduce-Side Join using MR
Hands-On Exercise
MR Distributed Cache
Hands-On Exercise
Using Sequence Files & Images with MR
Hands-On Exercise
Cluster Planning & Hadoop 2.0 YARN
Configuration of Hadoop
Choosing the Right Hadoop Hardware and Software
Hadoop Log Files

–>Hadoop 2.0 and YARN

Hadoop 1.0 Challenges
NameNode (NN) Scalability, Single Point of Failure (SPOF), and High Availability (HA)
JobTracker Challenges
Hadoop 2.0 New Features
Hadoop 2.0 Cluster Architecture & Federation
Hadoop 2.0 HA
YARN & Hadoop Ecosystem
YARN MR Application Flow

–>PIG

Introduction to Pig
What Is Pig?
Pig’s Features & Pig Use Cases
Interacting with Pig
Basic Data Analysis with Pig
Hands-On Exercise
Pig Latin Syntax
Loading Data
Hands-On Exercise
Simple Data Types
Field Definitions
Data Output
Viewing the Schema
Hands-On Exercise
Filtering and Sorting Data
Hands-On Exercise
Commonly-Used Functions
Hands-On Exercise: Pig for ETL Processing
Processing Complex Data with Pig
Hands-On Exercise
Storage Formats
Complex/Nested Data Types
Hands-On Exercise
Grouping
Hands-On Exercise
Built-in Functions for Complex Data
Hands-On Exercise
Iterating Grouped Data
Hands-On Exercises
Multi-Dataset Operations with Pig
Hands-On Exercise
Techniques for Combining Data Sets
Joining Data Sets in Pig
Hands-On Exercise
Splitting Data Sets
Hands-On Exercise

–>HIVE

Hive Fundamentals and Architecture
Loading and Querying Data in Hive
Hands-On Exercise
Hive Architecture and Installation
Comparison with Traditional Database
HiveQL: Data Types, Operators and Functions
Hands-On Exercise
Hive Tables, Managed Tables and External Tables
Hands-On Exercise
Partitions and Buckets
Hands-On Exercise
Storage Formats, Importing Data, Altering Tables, Dropping Tables
Hands-On Exercise
Querying Data, Sorting and Aggregating, MapReduce Scripts
Hands-On Exercise
Joins, Subqueries & Views
Hands-On Exercise
Integration and Data Manipulation with Hive
Hands-On Exercise
User Defined Functions
Hands-On Exercise
Appending Data to an Existing Hive Table
Hands-On Exercise
Static Partitioning vs. Dynamic Partitioning
Hands-On Exercise

–>HBASE

CAP Theorem
HBase Architecture and concepts
Introduction to HBase
Client APIs and Their Features
HBase Tables
The ZooKeeper Service
Data Model and Operations

–>SQOOP

Introduction to Sqoop
MySQL Client & Server
Connecting to a Relational Database using Sqoop
Importing Data using Sqoop from MySQL
Exporting Data using Sqoop to MySQL
Incremental Append
Importing Data using Sqoop from MySQL to Hive
Exporting Data using Sqoop to MySQL from Hive
Importing Data using Sqoop from MySQL to HBase
Using Queries with Sqoop

–>Flume and Oozie

What is Flume?
Why Use Flume? Architecture and Configuration
Master, Collector, Agent
Twitter Data Sentiment Analysis Project
Oozie
What is Oozie? Architecture and Configuration
Oozie Job Submission
Oozie Properties
Hands-On Exercises

–>Projects

Social Media Final Project
Hadoop Project
Objective
Problem Definition
Solution
Discuss datasets and specifications of the project

1) Project in Healthcare Domain

Hadoop Project in Healthcare
Objective
Problem Definition
Solution
Discuss datasets and specifications of the project

2) Project in Finance/Banking Domain

Hadoop Project in Banking Domain
Objective
Problem Definition
Solution
Discuss datasets and specifications of the project
