Advanced Big Data Analytics

Learning Outcomes: 
After successful completion of the course, students should be able to:
Understand several key big data technologies used for storage, analysis, and manipulation of data.
Recognize the key concepts of the Hadoop framework, MapReduce, Pig, Hive, and NoSQL.
Prepare a sample project using the Hadoop API.

Practical Content:

  • Configure Hive with MySQL and perform queries for Create, Alter, and Drop Table (for both managed and external tables)
  • Perform advanced Hive queries (index, view, order by, group by, joins, subqueries, cluster by)
  • Configure Pig and implement various Pig commands; implement the same programs using Pig scripts
  • Perform import and export of databases/tables from/to Hadoop/RDBMS using Sqoop (use various options such as a custom number of mappers, delimiters, changing the default directory, etc.)
  • Implement advanced MapReduce programs using joins, counters, and sorting
  • Implement various tasks with Apache Spark (verify installation, create an RDD, execute a word count transformation, cache transformations, and check output)
  • Perform data visualization using various Tableau features
  • Prepare a case study/survey presentation on Big Data security and visualization
Syllabus: 
Unit No.    Topics
1

Advanced MapReduce:

MapReduce Joins, Sorting, Counters in MapReduce, Real Time MapReduce
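The join, sorting, and counter topics above can be illustrated together with a minimal sketch of a reduce-side join. It is written in plain Python rather than against the Hadoop API, so it only imitates the map, shuffle/sort, and reduce phases. The sample data, the counter names, and every function name here are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical sample inputs: (customer_id, name) and (order_id, customer_id, amount).
CUSTOMERS = [(1, "Asha"), (2, "Ravi"), (3, "Meera")]
ORDERS = [(101, 2, 250.0), (102, 1, 99.5), (103, 2, 40.0), (104, 9, 10.0)]

# Stands in for Hadoop job counters (assumed names, not real Hadoop counters).
counters = defaultdict(int)


def map_phase():
    """Emit (join_key, tagged_record) pairs from both inputs."""
    for cust_id, name in CUSTOMERS:
        counters["CUSTOMER_RECORDS"] += 1
        yield cust_id, ("C", name)
    for order_id, cust_id, amount in ORDERS:
        counters["ORDER_RECORDS"] += 1
        yield cust_id, ("O", order_id, amount)


def shuffle_and_sort(pairs):
    """Sort by key, then by tag so customer records precede orders, and group per key."""
    ordered = sorted(pairs, key=lambda kv: (kv[0], kv[1][0]))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_phase(key, values):
    """Inner join: pair the customer name with each order that shares the key."""
    names = [v[1] for v in values if v[0] == "C"]
    orders = [v for v in values if v[0] == "O"]
    if not names:
        counters["UNMATCHED_ORDERS"] += len(orders)
        return
    for _, order_id, amount in orders:
        counters["JOINED_RECORDS"] += 1
        yield key, names[0], order_id, amount


def run_job():
    """Run the three phases in order and collect the joined rows."""
    counters.clear()
    results = []
    for key, values in shuffle_and_sort(map_phase()):
        results.extend(reduce_phase(key, values))
    return results


if __name__ == "__main__":
    for row in run_job():
        print(row)
    print(dict(counters))
```

Tagging each record with its source ("C" or "O") and sorting on the tag is a simplified stand-in for the secondary sort a real job would configure; the counters show how a job can report unmatched records without failing.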

2

PIG:

Introduction, Execution Modes, Pig Latin Basics, Pig Operators, Joining Data Sets, User-Defined Functions

3

Hive:

Hive overview and concepts, Comparison with traditional Databases, HiveQL, Hive tables, Partitioning, Bucketing, Joins
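As a rough illustration of the partitioning and bucketing topics, the sketch below shows how a row might be placed: one subdirectory per partition value, and within it a bucket chosen by hashing the bucketing column modulo the bucket count. It is plain Python, not Hive. The table path, the column names, and the bucket count are hypothetical, and the sketch assumes a non-negative integer bucketing key whose hash equals its own value.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, as in CLUSTERED BY (user_id) INTO 4 BUCKETS


def partition_dir(table_dir, column, value):
    """Partitioning: one subdirectory per distinct partition-column value."""
    return f"{table_dir}/{column}={value}"


def bucket_index(user_id, num_buckets=NUM_BUCKETS):
    """Bucketing: hash of the bucketing column modulo the number of buckets.

    Assumption: user_id is a non-negative integer whose hash is its own value.
    """
    return user_id % num_buckets


def layout(rows, table_dir="/warehouse/users"):
    """Group (user_id, country) rows by (partition directory, bucket index)."""
    placed = defaultdict(list)
    for user_id, country in rows:
        key = (partition_dir(table_dir, "country", country), bucket_index(user_id))
        placed[key].append(user_id)
    return dict(placed)


if __name__ == "__main__":
    for location, ids in layout([(7, "IN"), (11, "IN"), (8, "US")]).items():
        print(location, ids)
```

Partitioning lets a query skip whole directories when it filters on the partition column, while bucketing spreads rows with the same key into the same bucket, which can help with sampling and joins.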

4

SQOOP:

Introduction, SQOOP Connectors, Import and Export using SQOOP

5

SCALA and SPARK: 

SCALA: 

What is Scala? Basic operations, variable types, control structures, foreach loop, functions, procedures, arrays, higher-order functions, classes in Scala, getters and setters, constructors, singletons, traits

SPARK:

Spark Components and Architecture, Spark Deployment Modes, Spark RDDs, RDD operations, transformations and actions, data loading and saving, Key-Value Pair RDDs, RDD Persistence, Spark SQL, DataFrames and Datasets, JSON and Parquet file formats
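The word count exercise is a common first illustration of key-value pair RDDs. The sketch below imitates its shape in plain Python, without Spark: the helper names `flat_map` and `reduce_by_key` are hypothetical and only mirror the roles of the RDD operations `flatMap` and `reduceByKey`. The generators stand in for lazy transformations, and building the final dictionary plays the part of the action that triggers the work.

```python
def flat_map(func, records):
    """Mirror of flatMap: apply func to each record and flatten the results lazily."""
    for record in records:
        yield from func(record)


def reduce_by_key(func, pairs):
    """Mirror of reduceByKey: merge all values that share a key using func.

    In Spark, reduceByKey is itself lazy and an action such as collect()
    triggers execution; here the dictionary is built eagerly for simplicity.
    """
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def word_count(lines):
    """Count words case-insensitively across all lines."""
    words = flat_map(lambda line: line.lower().split(), lines)  # lazy
    pairs = ((word, 1) for word in words)                       # lazy
    return reduce_by_key(lambda a, b: a + b, pairs)             # forces evaluation


if __name__ == "__main__":
    counts = word_count(["to be or not to be", "To be is to do"])
    for word, count in sorted(counts.items()):
        print(word, count)
```

Nothing is computed until the final step consumes the generators, which is the same idea behind Spark deferring transformations until an action runs.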

6

Tableau:

Tableau installation, data types, file types, tool types, Show Me menu, types of data sources supported, connecting to different data sources, editing metadata, filtering fields, filtering data sources, chart types, filtering data, data joining, data blending, extracting data, adding filters, applying filters to charts and data, number functions, string functions.

7

Big Data Issues:

Privacy, Visualization, Compliance and Security

Text Books: 
Name: 
Hadoop: The Definitive Guide
Author: 
Tom White
Publication: 
O'Reilly Media
Reference Books: 
Name: 
Hadoop for Dummies
Author: 
Dirk deRoos
Name: 
Hadoop in Action
Author: 
Chuck Lam
Syllabus PDF: 
ABDA.pdf (222.32 KB)
Branch: 
BDA
Course: 
2018
Stream: 
B.Tech