Big Data and Data Analytics

Paper Code: MBB 422
Credits: 4
Contact Hours: 90
Max. Marks: 100
Course Objectives: This course enables the students to:

  • Understand Big Data, the types of data, and the applications of Big Data in business.
  • Learn to work with Big Data tools.

Course Outcomes:

Course outcomes (at course level):

  • CO 136. Describe the concepts of Big Data and its applications and tools in the business domain.
  • CLO 137. Explain Big Data tools, their platforms, and their utility in business analytics.
  • CLO 138. Examine and apply one or more tools on Big Data from the business domain.
  • CLO 139. Analyze Big Data using one or more tools.
  • CLO 140. Interpret and communicate the results produced by the tools.

Learning and teaching strategies:

  • Approach in teaching: Interactive lectures, group discussion, tutorials, case study
  • Learning activities for the students: Self-learning assignments, presentations

Assessment strategies:

  • Class test, semester-end examinations, quiz, assignments, presentation

Unit I (18 contact hours)

Understanding Big Data

Digital data and its classification, characteristics of data, evolution and definition of Big Data, challenges with Big Data, why Big Data, traditional business intelligence versus Big Data.

Big Data Analytics

What is Big Data analytics, why the sudden hype around Big Data analytics, classification of analytics, top challenges facing Big Data, terminology used in the Big Data environment, top analytics tools.

Unit II (18 contact hours)

Big Data Technology Landscape

Apache Hadoop, why Hadoop, comparison with other systems (RDBMS, grid computing), Hadoop overview, HDFS and its ecosystem, Hadoop architecture and Hadoop 2.x core components. Managing resources and applications with Hadoop YARN (Yet Another Resource Negotiator), understanding MapReduce programming, running a sample MapReduce program, executing MapReduce applications: Word Count, Tera Sort, Radix Sort.

Introduction to Hadoop Ecosystem, Pig, Hive, Sqoop, HBase.  
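
As an illustration of the Word Count MapReduce application listed in this unit, the following is a minimal sketch in plain Python that imitates the three stages (map, shuffle, reduce) in a single process. It does not use Hadoop itself; the function names map_phase, shuffle_phase, reduce_phase and word_count, and the sample sentences, are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as a framework would do
    between the map and reduce stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the course material.
    sample = ["big data needs big tools", "data tools for big data"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions run on different nodes and the shuffle is handled by the framework; the sketch only shows how the data flows between the stages.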

Unit III (18 contact hours)

Pig: Introduction to Pig, execution modes of Pig, comparison of Pig with databases, Pig on Hadoop.

Hive: Hive Shell, Architecture, data types, Comparison with Traditional Databases, HiveQL, Tables, User Defined Functions. 

Unit IV (18 contact hours)

NoSQL: Use of NoSQL, types of NoSQL, advantages of NoSQL, use of NoSQL in industry, NoSQL vendors, SQL versus NoSQL, NewSQL.

HBase: HBase basics, concepts, clients, example, HBase versus RDBMS.

Unit V (18 contact hours)

Machine Learning using Python: Python installation (Windows and Ubuntu), execution modes of Python, executing Python programs on Hadoop, Python libraries and tools - Pandas for data analysis, Matplotlib for data visualization, NumPy for matrix processing, SciPy for image manipulation. Applications of Machine Learning, implementation of machine learning in a Hadoop environment.
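
One common way to execute Python programs on Hadoop is Hadoop Streaming, where the mapper and reducer exchange tab-separated key/value lines and the reducer receives its input sorted by key. The following is a minimal sketch of that pattern, run locally on an in-memory list instead of a cluster. The record layout ("region,amount"), the function names mapper, reducer and run_local, and the sample records are assumptions made for illustration only.

```python
from itertools import groupby


def mapper(lines):
    """Turn 'region,amount' records into tab-separated key/value lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank records
        region, amount = line.split(",")
        yield f"{region}\t{amount}"


def reducer(sorted_lines):
    """Sum the amounts per region. The input must already be sorted by key,
    which is what the framework guarantees for reducer input."""
    parsed = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for region, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(float(amount) for _, amount in group)
        yield f"{region}\t{total:.2f}"


def run_local(records):
    """Emulate map -> sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(records))))


if __name__ == "__main__":
    # Hypothetical sample records, not taken from the course material.
    sample = ["north,120.50", "south,80.00", "north,30.25"]
    for output_line in run_local(sample):
        print(output_line)
```

In an actual streaming job the mapper and reducer would be separate scripts reading from standard input and writing to standard output; here they are kept as generator functions so the data flow can be checked without a cluster.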

Essential Readings: 

*Case studies related to all topics are to be taught.

Suggested Readings:

  • Seema Acharya and Subhashini Chellappan, "Big Data Analytics", Wiley, 2015.
  • Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013.
  • P. J. Sadalage and M. Fowler, "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence", Addison-Wesley Professional, 2012.
  • Tom White, "Hadoop: The Definitive Guide", Third Edition, O'Reilly, 2012.
  • Eric Sammer, "Hadoop Operations", O'Reilly, 2012.
  • E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive", O'Reilly, 2012.
  • Lars George, "HBase: The Definitive Guide", O'Reilly, 2011.
  • Andreas C. Müller and Sarah Guido, "Introduction to Machine Learning with Python: A Guide for Data Scientists", O'Reilly, 2016.
