Advanced Big Data Analytics

Learning Outcomes: 
After successful completion of the course, students should be able to:
Understand several key big data technologies used for the storage, analysis, and manipulation of data.
Recognize the key concepts of the Hadoop framework, MapReduce, Pig, Hive, and NoSQL.
Prepare a sample project using the Hadoop API.
Syllabus: 
Unit No. / Topics
1

Introduction:

Introduction to Big Data; the four dimensions of Big Data (volume, velocity, variety, veracity); drivers for Big Data; introducing the storage and query stack; revisiting useful technologies and concepts; real-time Big Data analytics.

2

Distributed File Systems:

Hadoop Distributed File System, Google File System, Data Consistency.

3

Big Data Storage Models:

Distributed Hash Table, Key-Value Storage Model (Amazon's Dynamo), Document Storage Model (Facebook's Cassandra), Graph Storage Models.

4

Scalable Algorithms:

Mining large graphs, with a focus on social networks and web graphs. Centrality, similarity, all-distances sketches, community detection, link analysis, spectral techniques. MapReduce, Pig Latin, and NoSQL. Algorithms for detecting similar items, recommendation systems, data stream analysis algorithms, clustering algorithms, detecting frequent items.

5

Employing Hadoop MapReduce:

Creating the components of Hadoop MapReduce jobs - Distributing data processing across server farms - Executing Hadoop MapReduce jobs - Monitoring the progress of job flows - The building blocks of Hadoop MapReduce - Distinguishing Hadoop daemons - Investigating the Hadoop Distributed File System - Selecting appropriate execution modes: local, pseudo-distributed, fully distributed.

6

Big Data Issues:

Privacy, Visualization, Compliance and Security, Structured vs Unstructured Data

Text Books: 
Name: 
Mining of Massive Datasets
Author: 
Anand Rajaraman
Jure Leskovec
Jeffrey Ullman
Reference Books: 
Name: 
An Introduction to Information Retrieval
Author: 
Christopher D. Manning
Prabhakar Raghavan
Hinrich Schütze
Name: 
Data-Intensive Text Processing with MapReduce
Author: 
Jimmy Lin
Chris Dyer
Branch: 
BDA
Course: 
2018
Stream: 
B.Tech