Course:        STAT 4310  Data Mining
Instructor:    Dr. Xuelei (Sherry) Ni
Office:        Science 462
Office Hours:  By appointment
Email:         xni2@kennesaw.edu

 

 

 

Course Prerequisites:

Basic knowledge of algebra, discrete math, college-level calculus, and statistics. Under the semester system, you should have completed STAT 3120 and STAT 3130. If you are uncertain about your prerequisite knowledge for this class, please review the appendices in the course textbook.

 

 

Course Text (Suggested, not Required):

 

·         Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (2005). Introduction to Data Mining.  Addison Wesley. ISBN: 0-321-32136-7.  Website:  http://www-users.cs.umn.edu/~kumar/dmbook/index.php

·         Jiawei Han and Micheline Kamber (2000). Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers. ISBN 1-55860-489-8

·         Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag.  ISBN 0-387-95284-5

·         David Hand, Heikki Mannila, and Padhraic Smyth (2001).  Principles of Data Mining. The MIT Press.  ISBN: 0-262-08290-X

 

 

Course Software:

 

This course will use SAS frequently, and some MATLAB applications will also be shown. Students are encouraged to use any other data mining software package as well. The class focuses on the methods, not the software, so students are expected to become familiar with the software on their own.

 

 

Course Description:

 

Data Mining is an information extraction activity whose goal is to discover hidden facts contained in databases, perform prediction and forecasting, and generally improve performance through interaction with data. The process includes data selection, cleaning, and coding; the use of statistical, pattern recognition, and machine learning techniques; and the reporting and visualization of the generated structures. The course covers all of these steps and illustrates the whole process with examples of practical applications. Students are encouraged to use recent Data Mining software.

 

 

Course Objectives:

 

  1. To introduce students to the basic concepts and techniques of Data Mining;
  2. To provide hands-on experience in applying the concepts to real-world applications;
  3. To gain experience in independent study and research.

 

 

 



 

 

Content:

 

 

 

Learning Objectives:

 

Upon completion of the course, students will be able to

1.       Fully appreciate the concept of data as a strategic resource;

2.       Understand how and when data mining can be used as a problem-solving technique;

3.       Describe different methods of data mining;

4.       Select an appropriate data mining technique for a specific problem;

5.       Use existing data mining software to mine a prepared data set;

6.       Describe the preprocessing, the analysis, and the results clearly in writing and orally;

7.       Assess data analyses performed by others.

 

 

Grading:

                         

1.       Class Attendance + Discussion        10%
2.       Homework                             20%
3.       Take-home Midterm                    30%
4.       Final Project                        40%  (Presentation 15% + Report 25%)
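
The weights above combine into a single course score as a weighted sum. The following is a minimal sketch of that arithmetic, not an official grade calculator: it assumes every component is scored on a 0-100 scale (the syllabus does not state the scale), and the component names and sample scores are hypothetical.

```python
# Weights taken from the Grading section, as fractions of the final grade.
# The dictionary keys are hypothetical labels chosen for this illustration.
WEIGHTS = {
    "attendance_discussion": 0.10,
    "homework": 0.20,
    "midterm": 0.30,
    "project_presentation": 0.15,
    "project_report": 0.25,
}


def course_score(scores):
    """Return the weighted sum of component scores (assumed 0-100 each)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student.
example = {
    "attendance_discussion": 100,
    "homework": 90,
    "midterm": 80,
    "project_presentation": 85,
    "project_report": 88,
}
print(round(course_score(example), 2))  # 86.75
```

The presentation and report are kept as separate entries so that the 40% Final Project weight is split exactly as listed (15% + 25%).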

 

 


 

 

 

Each student who chooses this project will be expected to read, review and present a paper from the research literature.  Candidate papers will be provided on the class website. 

For the presentation, treat it as if you were presenting the paper at a conference, and be prepared to answer detailed technical questions. If you feel the work has problems, feel free to critique it. Guidelines for the review report are available on the class website.

Students who choose this project may work in teams of up to 3 persons. You will choose your own area of interest and data set, select an appropriate method learned in this class, apply it to the data, and show the results (benefits, improvements, or drawbacks).

Each group should submit a project report. The project report should roughly include:
   1. Motivation of the project.
   2. Existing approaches.
   3. The method you chose and the reason for choosing it, or the model you created for the specific problem and its framework.
   4. Experimental studies and conclusions.

The presentation requirements are the same as above. Each group member should demonstrate his or her contribution to the project.

 

 

POLICIES:

 

Attendance & Assignment Policies:  You are expected to attend all classes and to turn in homework sets, the take-home exam, and the final report by their due dates.  Late submissions will NOT BE ACCEPTED.  While discussion/study groups are encouraged, you are expected to do your own work on homework problems that are turned in.

 

Withdrawal Policy:  The last day to withdraw from the course and possibly receive a "W" is __________________.

 

Students who find that they cannot continue in college for the entire semester after being enrolled, because of illness or any other reason, need to complete an online form. To completely or partially withdraw from classes at KSU, a student must withdraw online at www.kennesaw.edu, under Owl Express, Student Services.

 

The date the withdrawal is submitted online will be considered the official KSU withdrawal date, which will be used in the calculation of any tuition refund or refund to Federal student aid and/or HOPE scholarship programs. It is advisable to print the final page of the withdrawal for your records. Withdrawals submitted online prior to midnight on the last day to withdraw without academic penalty will receive a “W” grade. Withdrawals after midnight will receive a “WF”. Failure to complete the online withdrawal process will produce no withdrawal from classes. Call the Registrar’s Office at 770-423-6200 during business hours if assistance is needed.

 

Students may, by means of the same online withdrawal and with the approval of the university Dean, withdraw from individual courses while retaining other courses on their schedules. This option may be exercised up until _______________ .

 

This is the date to withdraw without academic penalty for Fall Term, 2006 classes. Failure to withdraw by the date above will mean that the student has elected to receive the final grade(s) earned in the course(s). The only exception to those withdrawal regulations will be for those instances that involve unusual and fully documented circumstances.

 

Academic Integrity:  Each student is responsible for upholding the provisions of the Student Code of Conduct, as published in the Undergraduate and Graduate Catalogs.  For any questions involving these or any other Academic Honor Code issues, please consult http://www.kennesaw.edu/judiciary/code.conduct.shtml

 

 

 

WEEKLY SCHEDULE (Subject to change)

 

Week     Lect #  Topic                                                                          Notes
week 1   1       Topic 1. Introduction                                                          Syllabus
         2       Topic 2. Data (Sampling & Cleaning)
week 2   3       Topic 2. Data (Descriptive Statistics)
         4       Topic 2. Data (Curse of Dimensionality)                                        Final Project Initiation Due
week 3   5       Topic 3. Regression (Multiple Linear Regression)
         6       Topic 3. Regression (Generalized)
week 4   7       Discussion
         8       Topic 4. Classification (Linear Method: LDA)
week 5   9       Topic 4. Classification (Linear Method: Logistic Regression)
         10      Topic 4. Classification (Nearest Neighbor)
week 6   11      Topic 4. Classification (Bayesian classifiers)
         12      Topic 4. Classification (Overfitting & Model Evaluation)
week 7   13      Topic 4. Classification (Ensemble Method and Class Imbalance Problem)          Midterm assigned
         14      Discussion
week 8   15      Topic 5. Cluster Analysis (Center-based)
         16      Topic 5. Cluster Analysis (Hierarchical)
week 9   17      Topic 5. Cluster Analysis (Density-based)                                      Midterm report due
         18      Topic 5. Cluster Analysis (Cluster Validation)
week 10  19      Discussion
         20      Topic 6. Association Analysis (Apriori)
week 11  21      Topic 6. Association Analysis (Maximal, Closed & FP-growth)
         22      Topic 6. Association Analysis (Pattern Evaluation)
week 12  23      Topic 6. Association Analysis (Continuous, Categorical, Concept Hierarchies)
         24      Discussion
week 13  25      Discussion
         26      Project Presentation                                                           Final project report due
week 14          Project Presentation
                 Project Presentation
week 15          Project Presentation
                 Project Presentation