What's this course about?
This course covers the fundamental theoretical concepts needed to
answer the practical questions which quickly arise in real data
analysis and is a PhD-level course which prepares students to do
research in machine learning. Machine learning, or pattern
recognition, or computational statistics, or data mining, is a huge
field with thousands of methods and mountains of theoretical ideas -
in fact statistics, the mathematics of data analysis, is by far the
largest area of mathematics. I will attempt to organize the many
ideas in ways that reveal the true underlying repeating themes and
separate issues which are truly separate. This course will pull
together many of the ideas I feel are most helpful in practice based
on my experience, a number of which lie outside the current culture of
"machine learning" and are not described in any existing machine
learning course, to my knowledge.
The course is designed to answer the most fundamental questions about
machine learning, such as: How can we conceptually organize the zoo of
available methods? How can we sometimes answer the question 'is this
method better than this one' using asymptotic theory? How can we
sometimes answer the question 'is this method better than this one'
using finite datasets? What can we say about the errors our method
will make on future data? What's so special about maximum likelihood?
What's the 'right' objective function? What's the most reliable way
to perform model selection? What does it mean to be statistically
rigorous? What are ML people missing by not knowing much about
statistics? Should I be a Bayesian? What is the source of certain
'religious' divides? What computer science ideas can make ML methods
tractable on particularly large or complex datasets? What are some
questions that need answering in the field?
How is the course graded?
1. Quizzes on theory: 30%. There will be a 10-minute multiple-choice
quiz at the end of
class every Thursday on the previous week's material.
Your lowest 5 quiz scores (including zeros for not attending class on quiz
days) will be thrown out.
2. Final exam on theory: 20%. This will cover the entire course and
contain short-answer and multiple-choice questions. The course will not
require that you prove theorems, though the questions may get at whether
you understand the proofs of some of the key theorems in the class.
3. Implementations/experiments: 50%. Beginning somewhere around the 8th
week of class (once you have the necesssary background), the students
in the class will act as a team to experimentally answer several basic
questions in the field of the form "What are the main approaches for X
and how do they compare?", where X could be anomaly detection,
comparing two distributions, robust mixture estimation, etc. You will
get various amounts of points for finding, implementing, and comparing
methods. All the work done by the class, including source code, will
be organized on an internal wiki, and will finally be put on a big
webpage which will be useful to other researchers and PhD students in
the field, with links from the appropriate wikipedia pages. If enough
completeness on a question is achieved, a tech report will be written
and possibly submitted for publication in a conference or journal. To
give you an idea of how much work will be involved, expect to
implement a machine learning method from a description in a book or
paper every 2 weeks, in the last ten weeks of class. Exact details
regarding the logistics will be announced later in the class.
Should you take this course?
What is the intended audience of this course?
Anyone who wants to competently apply or design statistical and
machine learning methods on real-world datasets can benefit from
understanding the tools I will describe. In this course I'll present
the theoretical foundations of machine learning and cover actual
methods in a succinct manner, as special cases of more general frameworks and
theoretical approaches. For anyone who wants to do research in
machine learning, these are the foundations you need to understand.
How does this course relate to other course offerings? This course
can be thought of as extending the CoC undergraduate/graduate Intro to
Machine Learning course, although it is not strictly necessary to take
that course first. It will be very helpful though - machine learning
is a big subject with many concepts, so seeing some of the same
material twice will not hurt at all. That course goes into certain
machine learning methods in more depth and at a slower pace. In ISYE
there are courses which also cover a subset of the methods covered in
this course in more depth, from the point of view of the culture of
statistics. In CoC there is a course called pattern recognition,
which also covers a subset of the methods covered in this course in
more depth, from the point of view of the culture of computer vision
and electrical engineering. Again, seeing some of the same material
more than once will not hurt you. There is a course on text mining --
this will be useful for applying machine learning to text data. This
course, by contrast, is of a more general nature and does not discuss
the special aspects of text data. In the Math department there is a
class on Learning Theory, a sub-topic of machine learning which
focuses mainly on theoretical bounds for the classification problem.
In my class I will briefly cover learning theory but will only hit the
highlights which inform practice - if you want to really master
proving those kinds of elegant theorems (and you have a background in
Real Analysis) check out that course.
How hard is this course?
This course is designed for the PhD level. MS students and advanced
undergraduate students (who generally
have a heavy courseload) are warned of the time commitment required
by this course, due to the mathematical nature of
the material as well as the time it takes to read about and implement
many machine learning methods in the experimental part of the course.
Otherwise, if you have the time to put in and the true desire to gain
expertise in this topic, students of any level and in any discipline are
welcome. Auditing is fine with me but I don't recommend it, due to the pace
of the technical concepts.
What background is needed for this course?
I will assume very little specific prior knowledge, aside from basic
math familiarity (calculus, matrices, probability, random variables)
and general principles of computer science (algorithms, data
structures); however since this is a PhD-level course I'll assume a
certain level of general technical maturity (in other words don't
expect the material to be trivial). Familiarity with machine learning
will be helpful but is not strictly necessary.
|
Where and when
TuTh 1:30-3pm, 316 Sustainable Education building.
First class: Tuesday 1/9/06.
My office hours: Tuesdays 3-5pm, 1312 Klaus Advanced Computing Building.
Course mailing list:
http://www2.isye.gatech.edu/mailman/listinfo/isye8803e.
Books
Required: All of Statistics by Larry Wasserman ("AOS") and
The Elements of Statistical
Learning by Hastie, Tibshirani, and Friedman ("ESL"). Recommended:
Pattern Recognition and Machine Learning by Bishop.
Syllabus
| |
Topic |
Reading |
|
Statistical principles | |
| 1 |
Machine learning overview, course logistics |
|
| 2 |
Probability and distributions |
AOS 1.1-1.7, 2.1-2.10, 3.1-3.5. |
| 3 |
Estimation and learning theory |
AOS 4.1, 5.1-5.4. |
| 4 |
Confidence bands and hypothesis testing |
AOS 6.3, 10.1, 10.2. |
| 5 |
Maximum likelihood estimation |
AOS 9.1, 9.3-9.10, 14.1-14.4.
ESL 8.2. |
| 6 |
Bayesian estimation |
AOS 11.1-11.7, 11.9, 12.3. ESL 8.3. |
| 7 |
Loss functions and decision theory |
AOS 6.3, 12.1-12.6, ESL 10.6. |
| 8 |
Model selection and generalization |
ESL 7.1-7.10, AOS 13.6, 22.8. |
| 9 |
Resampling and model combination |
ESL 7.11, 8.2, 8.7, 8.8. AOS 7.1,
8.1-8.3. |
| 10 |
Nonparametric estimation |
AOS 6.2, 7.2, 20.1-20.3. |
|
Problem types and methods/models |
|
| 11 |
Density estimation and regression methods I |
AOS 13.1-13.5, 20.4. ESL 3.1-3.5. |
| 12 |
Density estimation and regression methods II,
classification methods I |
ESL 2.1-2.9, 6.1-6.8. |
| 13 |
Classification methods II |
AOS 22.1-22.6, 22.9. ESL 4.1-4.5. |
| 14 |
Classification methods III |
AOS 22.7-22.8. ESL 9.2, 11.3-11.4, 12.1-12.3
13.1, 13.3. |
| 15 |
Clustering methods |
ESL 13.3, 14.3. |
| 16 |
The Georgia Tech Machine Learning (GTML)
Project |
coding,
Hamming on research |
| 17 |
Dimensionality reduction methods I |
ESL 14.5, 14.6. |
| 18 |
Dimensionality reduction methods II |
ESL 14.2, 14.7. |
| 19 |
Dimensionality reduction methods III,
kernelization |
AOS 22.10, ESL 12.3 |
| 20 |
Graphical models |
AOS 17.1-6, 18.1-4 |
| 21 |
Temporal methods |
no reading from class texts |
|
Computational principles |
|
| 22 |
Parameter optimization I: general |
ESL 10.10 |
| 23 |
Parameter optimization II: EM |
ESL 8.5, AOS 9.13.4 |
| 24 |
Parameter optimization III: convex |
no reading from class texts |
| - |
Group project presentations |
|
| 25 |
Parameter optimization IV: linear algebraic and
online |
no reading from class texts |
| 26 |
Integration and Sampling |
AOS 24.1-5 |
| 27 |
Generalized N-body problems |
no reading from class texts |
| - |
Final exam |
|
|
|