Computational Data Analysis:
FOUNDATIONS OF MACHINE LEARNING & DATA MINING

Spring 2007 graduate course
CS 8803-MDM

Alexander Gray
College of Computing, Georgia Tech
pdf
What's this course about?
This course covers the fundamental theoretical concepts needed to answer the practical questions which quickly arise in real data analysis and is a PhD-level course which prepares students to do research in machine learning. Machine learning, or pattern recognition, or computational statistics, or data mining, is a huge field with thousands of methods and mountains of theoretical ideas - in fact statistics, the mathematics of data analysis, is by far the largest area of mathematics. I will attempt to organize the many ideas in ways that reveal the true underlying repeating themes and separate issues which are truly separate. This course will pull together many of the ideas I feel are most helpful in practice based on my experience, a number of which lie outside the current culture of "machine learning" and are not described in any existing machine learning course, to my knowledge.

The course is designed to answer the most fundamental questions about machine learning, such as: How can we conceptually organize the zoo of available methods? How can we sometimes answer the question 'is this method better than this one' using asymptotic theory? How can we sometimes answer the question 'is this method better than this one' using finite datasets? What can we say about the errors our method will make on future data? What's so special about maximum likelihood? What's the 'right' objective function? What's the most reliable way to perform model selection? What does it mean to be statistically rigorous? What are ML people missing by not knowing much about statistics? Should I be a Bayesian? What is the source of certain 'religious' divides? What computer science ideas can make ML methods tractable on particularly large or complex datasets? What are some questions that need answering in the field?

How is the course graded?
1. Quizzes on theory: 30%. There will be a 10-minute multiple-choice quiz at the end of class every Thursday on the previous week's material. Your lowest 5 quiz scores (including zeros for not attending class on quiz days) will be thrown out.

2. Final exam on theory: 20%. This will cover the entire course and contain short-answer and multiple-choice questions. The course will not require that you prove theorems, though the questions may get at whether you understand the proofs of some of the key theorems in the class.

3. Implementations/experiments: 50%. Beginning somewhere around the 8th week of class (once you have the necesssary background), the students in the class will act as a team to experimentally answer several basic questions in the field of the form "What are the main approaches for X and how do they compare?", where X could be anomaly detection, comparing two distributions, robust mixture estimation, etc. You will get various amounts of points for finding, implementing, and comparing methods. All the work done by the class, including source code, will be organized on an internal wiki, and will finally be put on a big webpage which will be useful to other researchers and PhD students in the field, with links from the appropriate wikipedia pages. If enough completeness on a question is achieved, a tech report will be written and possibly submitted for publication in a conference or journal. To give you an idea of how much work will be involved, expect to implement a machine learning method from a description in a book or paper every 2 weeks, in the last ten weeks of class. Exact details regarding the logistics will be announced later in the class.

Should you take this course?
What is the intended audience of this course? Anyone who wants to competently apply or design statistical and machine learning methods on real-world datasets can benefit from understanding the tools I will describe. In this course I'll present the theoretical foundations of machine learning and cover actual methods in a succinct manner, as special cases of more general frameworks and theoretical approaches. For anyone who wants to do research in machine learning, these are the foundations you need to understand.

How does this course relate to other course offerings? This course can be thought of as extending the CoC undergraduate/graduate Intro to Machine Learning course, although it is not strictly necessary to take that course first. It will be very helpful though - machine learning is a big subject with many concepts, so seeing some of the same material twice will not hurt at all. That course goes into certain machine learning methods in more depth and at a slower pace. In ISYE there are courses which also cover a subset of the methods covered in this course in more depth, from the point of view of the culture of statistics. In CoC there is a course called pattern recognition, which also covers a subset of the methods covered in this course in more depth, from the point of view of the culture of computer vision and electrical engineering. Again, seeing some of the same material more than once will not hurt you. There is a course on text mining -- this will be useful for applying machine learning to text data. This course, by contrast, is of a more general nature and does not discuss the special aspects of text data. In the Math department there is a class on Learning Theory, a sub-topic of machine learning which focuses mainly on theoretical bounds for the classification problem. In my class I will briefly cover learning theory but will only hit the highlights which inform practice - if you want to really master proving those kinds of elegant theorems (and you have a background in Real Analysis) check out that course.

How hard is this course? This course is designed for the PhD level. MS students and advanced undergraduate students (who generally have a heavy courseload) are warned of the time commitment required by this course, due to the mathematical nature of the material as well as the time it takes to read about and implement many machine learning methods in the experimental part of the course. Otherwise, if you have the time to put in and the true desire to gain expertise in this topic, students of any level and in any discipline are welcome. Auditing is fine with me but I don't recommend it, due to the pace of the technical concepts.

What background is needed for this course? I will assume very little specific prior knowledge, aside from basic math familiarity (calculus, matrices, probability, random variables) and general principles of computer science (algorithms, data structures); however since this is a PhD-level course I'll assume a certain level of general technical maturity (in other words don't expect the material to be trivial). Familiarity with machine learning will be helpful but is not strictly necessary.
Where and when
TuTh 1:30-3pm, 316 Sustainable Education building. First class: Tuesday 1/9/06. My office hours: Tuesdays 3-5pm, 1312 Klaus Advanced Computing Building. Course mailing list: http://www2.isye.gatech.edu/mailman/listinfo/isye8803e.

Books
Required: All of Statistics by Larry Wasserman ("AOS") and The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman ("ESL"). Recommended: Pattern Recognition and Machine Learning by Bishop.

Syllabus
Topic Reading
Statistical principles
1 Machine learning overview, course logistics
2 Probability and distributions AOS 1.1-1.7, 2.1-2.10, 3.1-3.5.
3 Estimation and learning theory AOS 4.1, 5.1-5.4.
4 Confidence bands and hypothesis testing AOS 6.3, 10.1, 10.2.
5 Maximum likelihood estimation AOS 9.1, 9.3-9.10, 14.1-14.4. ESL 8.2.
6 Bayesian estimation AOS 11.1-11.7, 11.9, 12.3. ESL 8.3.
7 Loss functions and decision theory AOS 6.3, 12.1-12.6, ESL 10.6.
8 Model selection and generalization ESL 7.1-7.10, AOS 13.6, 22.8.
9 Resampling and model combination ESL 7.11, 8.2, 8.7, 8.8. AOS 7.1, 8.1-8.3.
10 Nonparametric estimation AOS 6.2, 7.2, 20.1-20.3.
Problem types and methods/models
11 Density estimation and regression methods I AOS 13.1-13.5, 20.4. ESL 3.1-3.5.
12 Density estimation and regression methods II, classification methods I ESL 2.1-2.9, 6.1-6.8.
13 Classification methods II AOS 22.1-22.6, 22.9. ESL 4.1-4.5.
14 Classification methods III AOS 22.7-22.8. ESL 9.2, 11.3-11.4, 12.1-12.3 13.1, 13.3.
15 Clustering methods ESL 13.3, 14.3.
16 The Georgia Tech Machine Learning (GTML) Project coding, Hamming on research
17 Dimensionality reduction methods I ESL 14.5, 14.6.
18 Dimensionality reduction methods II ESL 14.2, 14.7.
19 Dimensionality reduction methods III, kernelization AOS 22.10, ESL 12.3
20 Graphical models AOS 17.1-6, 18.1-4
21 Temporal methods no reading from class texts
Computational principles
22 Parameter optimization I: general ESL 10.10
23 Parameter optimization II: EM ESL 8.5, AOS 9.13.4
24 Parameter optimization III: convex no reading from class texts
- Group project presentations
25 Parameter optimization IV: linear algebraic and online no reading from class texts
26 Integration and Sampling AOS 24.1-5
27 Generalized N-body problems no reading from class texts
- Final exam