Course:
STAT 4310 Data Mining
Instructor: Dr. Xuelei (Sherry) Ni
Office: Science 462
Office Hours: By appointment
Email: xni2@kennesaw.edu
Course Prerequisite:
Basic knowledge of algebra, discrete math, college-level calculus, and statistics. Under the semester system, you should have taken STAT 3120 and STAT 3130. If you are uncertain about your prerequisite knowledge for this class, please review the appendices in the course textbook.
Course Text (Suggested, not Required):
· Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (2005). Introduction to Data Mining. Addison Wesley. ISBN: 0-321-32136-7. Website: http://www-users.cs.umn.edu/~kumar/dmbook/index.php
· Jiawei Han and Micheline Kamber (2000). Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers. ISBN: 1-55860-489-8
· Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag. ISBN: 0-387-95284-5
· David Hand, Heikki Mannila, and Padhraic Smyth (2001). Principles of Data Mining. The MIT Press. ISBN: 0-262-08290-X
Course Software:
This course will use SAS frequently. Some Matlab applications will also be shown. Students are encouraged to use any data-mining software package. This class will focus on the methods, not the software; students are expected to become familiar with the software on their own.
Course Description:
Data Mining is an information extraction activity whose goal is to
discover hidden facts contained in databases, perform prediction and forecasting,
and generally improve their performance through interaction with data. The
process includes data selection, cleaning, coding, using different statistical,
pattern recognition and machine learning techniques, and reporting and
visualization of the generated structures. The course will cover all these
issues and will illustrate the whole process by examples of practical
applications. The students will be encouraged to use recent Data Mining
software.
Course Objectives:

Content:
Learning Objectives:
Upon completion of the course, students will be able to:
1. Fully appreciate the concept of data as a strategic resource;
2. Understand how and when data mining can be used as a problem-solving technique;
3. Describe different methods of data mining;
4. Select an appropriate data mining technique for a specific problem;
5. Use existing data mining software to mine a prepared data set;
6. Describe the preprocessing, the analysis, and the results clearly in writing and orally;
7. Assess data analyses performed by others.
Grading:
1. Class Attendance + Discussion 10%
2. Homework 20%
3. Take-home Midterm 30%
4. Final Project 40% (Presentation 15% + Report 25%)
Each student who chooses this project will be expected to read, review, and present a paper from the research literature. Candidate papers will be provided on the class website.
For the presentation, treat it as if you were presenting the paper at a conference: be prepared to answer detailed technical questions. If you feel the work has problems, feel free to critique it. For the review report, guidelines are available on the class website.
Students who choose this project may work in teams of up to 3 people. You will choose your own area of interest and data set, select an appropriate method learned in this class, apply it to the data, and show the results (benefits, improvements, or drawbacks).
Each group should submit a project report. The project report should roughly include:
1. Motivation of the project.
2. Existing approaches.
3. The method you chose and the reason for choosing it, or the model you created for the specific problem and its framework.
4. Experimental studies and conclusions.
The presentation requirement is the same as above. Each group member should show his/her contribution to the project.
POLICIES:
Attendance & Assignment Policies: You are expected to attend all classes and to turn in homework sets, the take-home exam, and the final report by the due dates. Late submissions will NOT BE ACCEPTED. While discussion/study groups are encouraged, you are expected to do your own work on the homework problems that you turn in.
Withdrawal Policy…The last day to
withdraw from the course and possibly receive a "W" is
__________________.
Students who find that they cannot continue in college for the entire semester after being enrolled, because of illness or any other reason, need to complete an online form. To completely or partially withdraw from classes at KSU, a student must withdraw online at www.kennesaw.edu, under Owl Express, Student Services.
The date the withdrawal is submitted online will be considered the official KSU withdrawal date which will be used in the calculation of any tuition refund or refund to Federal student aid and/or HOPE scholarship programs. It is advisable to print the final page of the withdrawal for your
records. Withdrawals submitted online prior to
to
complete the online withdrawal process will produce no withdrawal from classes.
Call the Registrar’s Office at 770-423-6200 during business hours if assistance
is needed.
Students may, by means of the same online withdrawal and with the approval of the university Dean, withdraw from individual courses while retaining other courses on their schedules. This option may be exercised up until _______________ .
This is the date to withdraw without academic penalty for Fall Term, 2006 classes. Failure to withdraw by the date above will mean that the student has elected to receive the final grade(s) earned in the course(s). The only exception to those withdrawal regulations will be for those instances that involve unusual and fully documented circumstances.
Academic Integrity: Each student is responsible for upholding the provisions of the Student Code of Conduct, as published in the Undergraduate and Graduate Catalogs. For any questions involving these or any other Academic Honor Code issues, please consult http://www.kennesaw.edu/judiciary/code.conduct.shtml
WEEKLY SCHEDULE (Subject to change)
| Week    | Lect # | Topic                                                                   | Notes                        |
|---------|--------|-------------------------------------------------------------------------|------------------------------|
| week 1  | 1      | Topic 1. Introduction                                                   | Syllabus                     |
|         | 2      | Topic 2. Data (Sampling & Cleaning)                                     |                              |
| week 2  | 3      | Topic 2. Data (Descriptive Statistics)                                  |                              |
|         | 4      | Topic 2. Data (Curse of Dimensionality)                                 | Final Project Initiation Due |
| week 3  | 5      | Topic 3. Regression (Multiple Linear Regression)                        |                              |
|         | 6      | Topic 3. Regression (Generalized)                                       |                              |
| week 4  | 7      | Discussion                                                              |                              |
|         | 8      | Topic 4. Classification (Linear Method: LDA)                            |                              |
| week 5  | 9      | Topic 4. Classification (Linear Method: Logistic Regression)            |                              |
|         | 10     | Topic 4. Classification (Nearest Neighbor)                              |                              |
| week 6  | 11     | Topic 4. Classification (Bayesian classifiers)                          |                              |
|         | 12     | Topic 4. Classification (Overfitting & Model Evaluation)                |                              |
| week 7  | 13     | Topic 4. Classification (Ensemble Method and Class Imbalance Problem)   | Midterm assigned             |
|         | 14     | Discussion                                                              |                              |
| week 8  | 15     | Topic 5. Cluster Analysis (Center-based)                                |                              |
|         | 16     | Topic 5. Cluster Analysis (Hierarchical)                                |                              |
| week 9  | 17     | Topic 5. Cluster Analysis (Density-based)                               | Midterm report due           |
|         | 18     | Topic 5. Cluster Analysis (Cluster Validation)                          |                              |
| week 10 | 19     | Discussion                                                              |                              |
|         | 20     | Topic 6. Association Analysis (Apriori)                                 |                              |
| week 11 | 21     | Topic 6. Association Analysis (Maximal, Closed & FP-growth)             |                              |
|         | 22     | Topic 6. Association Analysis (Pattern Evaluation)                      |                              |
| week 12 | 23     | Topic 6. Association Analysis (Continuous, Categorical, Concept Hierarchies) |                         |
|         | 24     | Discussion                                                              |                              |
| week 13 | 25     | Discussion                                                              |                              |
|         | 26     | Project Presentation                                                    | Final project report due     |
| week 14 |        | Project Presentation                                                    |                              |
|         |        | Project Presentation                                                    |                              |
| week 15 |        | Project Presentation                                                    |                              |
|         |        | Project Presentation                                                    |                              |