INTRODUCTION TO DATA MINING

Paper Code: 
BAC 331
Credits: 
2
Contact Hours: 
1.00
Max. Marks: 
100.00
Objective: 

Course Objectives:

The course will enable the students to

1. Study and learn the concepts of data mining.

2. Develop their skills in data mining techniques for data analysis.

3. Understand qualitative and quantitative data analysis using data mining approaches.

Course Outcomes:

Learning outcome (at course level)

Learning and teaching strategies

Assessment Strategies

Students will be able to:

CO1: Explain basic applications, concepts, and techniques of data mining.

CO2: Apply data mining approaches to solve practical problems in a variety of disciplines.

CO3: Analyze large sets of data to gain useful business understanding.

CO4: Describe and demonstrate basic data mining algorithms, methods, and tools.

CO5: Generate quantitative analysis reports and perform comparative analysis of algorithms for decision making.

Approach in teaching:

Interactive lectures, discussions, reading assignments, demonstrations, group activities, teaching using advanced IT audio-video tools.

 

Learning activities for the students:

Self-learning assignments, Effective questions, Seminar presentation, Giving tasks.

 

Assessment Strategies

Class test, Semester end examinations, Quiz, Solving problems in tutorials, Assignments, Presentation

 

6.00

Introduction to Data Warehousing: Architecture of Data Warehouse, Data Preprocessing – Need, Data Cleaning, Data Integration & Transformation, Data Reduction, Machine Learning, Pattern Matching. Introduction to Data Mining: Basic Data Mining Tasks, Data Mining versus Knowledge Discovery in Databases, Data Mining Metrics, Data Mining Query Language, Applications of Data Mining.

6.00

Data Mining Techniques: Market Basket Analysis, Data Stores, Customers, Orders, Items, Order Characteristics, Product Popularity, Tracking Marketing Interventions.
Association rules, Frequent item-sets and Frequent pattern mining.
Apriori algorithm, Confidence, Support, use of sampling for frequent item-set, FP tree algorithm, Correlation analysis. Graph Mining, Frequent sub-graph mining.
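
As a small illustration of the support and confidence measures listed in this unit, the sketch below computes both for one candidate rule over a toy set of market baskets. The transactions and the function names `support` and `confidence` are assumptions made for illustration only; they are not part of the prescribed course material.

```python
# Toy market-basket data; the items are made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Rule under test: {bread} -> {milk}
print("support:", support({"bread", "milk"}, transactions))
print("confidence:", confidence({"bread"}, {"milk"}, transactions))
```

Here three of the five baskets contain both bread and milk, and four contain bread, so the rule has support 3/5 and confidence 3/4. Apriori uses the same support count to prune candidate item-sets level by level.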

6.00

Classification & Prediction: Introduction to classification and prediction, Concept of learning, types of learning: unsupervised, semi-supervised and supervised.
Classification by Decision tree: construction, performance, attribute selection. Issues: missing values, over-fitting, tree pruning methods, split algorithm based on Information Gain, Gain Ratio, Gini Index. Classification and Regression Trees (CART) and C5.0.
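
To make the split criteria named in this unit concrete, the following sketch computes entropy, the Gini index, and information gain for a candidate split of class labels. The label lists and function names are hypothetical; Gain Ratio is left out to keep the example short.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, partitions):
    """Entropy of the parent minus the size-weighted entropy of its partitions."""
    n = len(parent)
    weighted = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(parent) - weighted


# Hypothetical node with 4 "yes" and 4 "no" records, split into two branches.
parent = ["yes"] * 4 + ["no"] * 4
left = ["yes", "yes", "yes", "no"]
right = ["yes", "no", "no", "no"]

print("entropy(parent):", entropy(parent))
print("gini(parent):", gini(parent))
print("information gain:", information_gain(parent, [left, right]))
```

A decision tree learner would evaluate such a score for every candidate attribute and split on the one with the highest gain (or the lowest weighted Gini index, as in CART).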

6.00

Bayesian Classification: Bayes Theorem, Naïve Bayes classification, Bayesian Belief Networks. Classification using Artificial Neural Network: feedforward neural network, backpropagation. Classification using SVM. Prediction: Linear regression, Non-linear regression, logistic regression.
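
As a brief worked illustration of Bayes Theorem, which underlies the Bayesian classifiers listed in this unit, the sketch below computes a posterior probability for a binary hypothesis. The function name `posterior` and the numeric values are assumptions chosen only for demonstration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) for a binary hypothesis H given evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of records belong to the class, the evidence
# appears in 90% of those records and in 5% of all other records.
print(posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05))
```

With these numbers the posterior is 2/13, roughly 0.15: even strong evidence leaves the class unlikely when the prior is small. Naïve Bayes applies the same rule with the likelihood factored into one term per attribute.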

6.00

Clustering: types of clustering, k-means, Expectation Maximization (EM) algorithm, Hierarchical clustering, DBSCAN. Accuracy Measures: Precision, recall, F-measure, confusion matrix, cross-validation, bootstrap.
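
The accuracy measures named in this unit can be illustrated with a short sketch that derives precision, recall, and the F-measure (F1) from binary confusion-matrix counts. The function name and the counts are hypothetical and serve only as an example.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts of a binary confusion matrix."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
p, r, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print("precision:", p)
print("recall:", r)
print("F1:", f1)
```

Precision here is 40/50 and recall is 40/60; F1 is their harmonic mean, 8/11. In practice these scores would be averaged over cross-validation folds or bootstrap samples rather than taken from a single split.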

References: 
  1. Jiawei Han & Micheline Kamber, “Data Mining: Concepts & Techniques”, Morgan Kaufmann Publishers, Third Edition.
  2. Mohanty, Soumendra, “Data Warehousing: Design, Development and Best Practices”, Tata McGraw Hill, 2006.
  3. G.K. Gupta, “Introduction to Data Mining with Case Studies”, PHI, Third Edition, 2014.
Academic Year: