Recent Announcements
- November 21: I will hold special office hours this Tuesday, 11/23, 3:00-4:30pm for anyone who would like to chat about problem set 2 or final projects.
- November 10: Problem set 2 has been posted.
- October 20: Don't forget that problem set 1 and your project proposals are due next week! I will hold special office hours this Friday, 10/22, 3-4pm in my office, Boelter 4532H.
- October 11: Problem set 1 and a write-up on project guidelines were both distributed in class today. These have been posted on the website below, as have lecture notes for lectures 2 and 3.
- October 4: Scribe assignments were made in class today. If you are taking this class for credit and missed today's lecture, be sure to email me to be added to the alternates list.
- September 30: Good news! We are officially moving to a luxuriously spacious new room, Boelter 2760. Chairs for everyone! Since we've got more space, I will happily assign a PTE to anyone who attended both the first and second lectures. If you have been coming to class this week and still need a PTE, please send me an email with your UID. Any and all auditors are also now officially welcome.
- September 29: A couple of students have requested additional (optional) background reading. I've posted some links to notes that you might find useful. You can find these in the class schedule below.
Course Overview
This course will provide a broad overview of the theoretical foundations underlying common machine learning algorithms. We will cover the following topics, though not necessarily in this order:
- An introduction to the Probably Approximately Correct (PAC) model of learning
- Uniform convergence bounds
- The growth function and VC dimension
- Online learning, mistake bounds, the Perceptron algorithm, and Winnow
- Learning from expert advice, no-regret learning, and the randomized Weighted Majority algorithm
- Online convex optimization
- Ensemble methods and boosting
- Support vector machines and kernels
- Connections to game theory including learning in repeated games
- Current research in learning theory
This course has no official prerequisites, but basic knowledge of probability and some level of comfort reading and writing mathematical proofs will be extremely useful. This course counts toward the AI major and minor field requirements for computer science Ph.D. students.
There is no required textbook for this course. Lecture notes and papers of interest will be made available online. Some optional books that you might find useful are also listed below.
Office hours for this course are by appointment only. Email Jenn to schedule an appointment.
Schedule & Lecture Notes
This schedule is tentative and likely to change as we get further into the material. Scribe notes for each lecture will be posted here as soon as they have been approved.
DISCLAIMER: While I have tried to carefully read over all of the scribe notes for this class, it is entirely possible that some of them contain errors. If you think you see an error, please email me and I will check it out.
Part 1: Classification and the PAC Model
- September 27: Introduction
(lecture notes, slides)
Overview of course logistics. General introduction to machine learning, goals of learning theory, basic definitions, the consistency model, generalizability, and the start of the PAC model.
See also lecture 1 and lecture 2 from Rob Schapire's course at Princeton, or Chapter 1 of Kearns and Vazirani.
- September 29: The PAC Model
(these lecture notes were buggy; try the updated version for 2011)
The PAC learning model, learning thresholds, Occam's razor, and some general learning bounds.
See also lecture 3 and lecture 4 from Rob Schapire's course, the first half of lecture 2 from Mehryar Mohri's course at NYU, or Chapter 2 of Kearns and Vazirani. For a general probability review, try the probability review from David Blei's course at Princeton, or Appendix C.2 and C.3 of Cormen, Leiserson, Rivest, and Stein's Introduction to Algorithms.
- October 4: The Unrealizable Case
(lecture notes)
Continuation of Occam's razor bounds, relationship between PAC and Consistency models. Learning when there is no perfect target function, uniform convergence, Hoeffding's inequality, and bounds for the unrealizable setting.
See also the second half of lecture 2 from Mehryar Mohri's course. For a full proof of Hoeffding's Inequality, and an explanation of how it cleverly makes use of the Markov Inequality, see lecture 8 of Rob Schapire's course, the appendix of Kearns and Vazirani, or the introduction to this survey paper on concentration inequalities by Boucheron, Bousquet, and Lugosi. (We'll also talk about this a little more next time...)
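If you'd like to see Hoeffding's inequality in action, here is a small optional Python sketch (mine, not part of the course materials) that compares the empirical deviation of a sample mean from its expectation against the bound 2*exp(-2*eps^2*m); the parameter choices are arbitrary.

```python
# Illustrative only: empirically compare the deviation of a sample mean
# from its expectation against the Hoeffding bound 2*exp(-2*eps^2*m).
import numpy as np

rng = np.random.default_rng(0)
p, m, eps, trials = 0.6, 200, 0.05, 20000

# Draw `trials` independent samples of m Bernoulli(p) variables each.
samples = rng.binomial(1, p, size=(trials, m))
deviations = np.abs(samples.mean(axis=1) - p)

empirical = np.mean(deviations > eps)      # observed failure probability
hoeffding = 2 * np.exp(-2 * eps**2 * m)    # Hoeffding's upper bound

print(f"empirical P(|mean - p| > {eps}): {empirical:.4f}")
print(f"Hoeffding bound:                 {hoeffding:.4f}")
```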
- October 6: Infinite Function Classes
(lecture notes)
The growth function, VC dimension, Sauer's lemma.
See also Andrew Ng's lecture notes on learning theory, lecture 5 and lecture 6 of Rob Schapire's course, or Chapter 3 of Kearns and Vazirani.
- October 11: Infinite Function Classes, Part 2
(lecture notes)
VC dimension lower bound. Overview of course projects.
See also lecture 7 of Rob Schapire's course, or Chapter 3 of Kearns and Vazirani. To see all of the details we skipped over in class, including the complete proofs of the realizable and unrealizable VC bounds, see Chapter 4 of Anthony and Bartlett.
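As an optional, concrete companion to the growth function and VC dimension lectures, here is a small Python sketch (mine, not from the scribe notes) that brute-forces which labelings of a point set are realizable by 1-D threshold classifiers; the helper names realizable and shattered are just illustrative.

```python
# Illustrative only: brute-force check of which labelings of a point set are
# realizable by 1-D threshold classifiers h_a(x) = 1 if x >= a, else 0.
# A set is shattered iff all 2^n labelings are realizable.
from itertools import product

def realizable(points, labels):
    # It suffices to try a = each point value (that point and everything
    # above it gets label 1) and a = max(points) + 1 (everything gets 0).
    for a in list(points) + [max(points) + 1.0]:
        if all((1 if x >= a else 0) == y for x, y in zip(points, labels)):
            return True
    return False

def shattered(points):
    return all(realizable(points, labels)
               for labels in product([0, 1], repeat=len(points)))

print(shattered([3.0]))        # True:  one point can be shattered
print(shattered([1.0, 2.0]))   # False: the labeling (1, 0) is not realizable,
                               # so the VC dimension of thresholds is 1
```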
Part 2: Online Learning in Adversarial Settings
- October 13: Online Learning
(lecture notes)
Mistake bounds, the halving algorithm, a lower bound, the Perceptron algorithm.
See also lecture 1 of Sham Kakade and Ambuj Tewari's course at TTI-C.
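For anyone who wants to play with the mistake-bound model, here is an optional Python sketch of the halving algorithm over a small finite class of thresholds; the function names and the toy example are mine, and any run should make at most log2|H| mistakes.

```python
# Illustrative only: the halving algorithm over a small finite class.
# Hypotheses here are 1-D thresholds; the names are mine, not from the course.

def predict(a, x):
    return 1 if x >= a else 0

def halving_run(hypotheses, stream, target):
    version_space = list(hypotheses)
    mistakes = 0
    for x in stream:
        # Predict with the majority vote of the current version space.
        votes = sum(predict(a, x) for a in version_space)
        guess = 1 if 2 * votes >= len(version_space) else 0
        y = predict(target, x)
        if guess != y:
            mistakes += 1
        # Keep only hypotheses consistent with the revealed label.
        version_space = [a for a in version_space if predict(a, x) == y]
    return mistakes

H = list(range(11))   # thresholds 0..10
print(halving_run(H, [5, 2, 8, 3, 7, 4, 6], target=6))   # at most log2(11) ~ 3.46 mistakes
```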
- October 18: Perceptron Algorithm
(lecture notes)
Finishing up the Perceptron, comparison of normalized and unnormalized versions, dependence on number of features.
See also the really nice proof of the Perceptron mistake bound from Andrew Ng's class. He also has a nice description of margins (Sections 1-3 only... we'll come back to the rest).
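Here is an optional Python sketch of the (unnormalized) Perceptron update on a separable toy data set, just to make the mistake-bound discussion concrete; the data and the appended bias feature are illustrative choices, not something from the lecture.

```python
# Illustrative only: the classic Perceptron update on a linearly separable toy set.
import numpy as np

def perceptron(X, y, max_passes=100):
    """X: (n, d) array, y: labels in {-1, +1}. Returns weights and mistake count."""
    w = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(max_passes):
        made_mistake = False
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:   # mistake (or on the boundary): update
                w += y_i * x_i
                mistakes += 1
                made_mistake = True
        if not made_mistake:                # converged: the data is separated
            break
    return w, mistakes

# Toy data with an appended constant feature acting as a bias term.
X = np.array([[2.0, 1.0, 1.0], [1.0, 3.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])
w, m = perceptron(X, y)
print(w, m)   # mistakes are bounded by (R / margin)^2 for separable data
```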
- October 20: Multiplicative Weights
(lecture notes)
Winnow. Start regret bounds and the expert advice model.
See also Avrim Blum's online learning survey.
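An optional Python sketch of Winnow learning a monotone disjunction, in case you want to compare its multiplicative updates with the Perceptron's additive ones; the promotion/demotion factors of 2 and the toy target are illustrative choices.

```python
# Illustrative only: Winnow (the multiplicative-weights cousin of the Perceptron)
# learning a monotone disjunction over n Boolean features.
import numpy as np

def winnow(stream, n):
    """stream yields (x, y) with x a 0/1 vector of length n and y in {0, 1}."""
    w = np.ones(n)
    mistakes = 0
    for x, y in stream:
        pred = 1 if np.dot(w, x) >= n else 0
        if pred != y:
            mistakes += 1
            if y == 1:
                w[x == 1] *= 2.0   # promotion: double weights of active features
            else:
                w[x == 1] /= 2.0   # demotion: halve weights of active features
    return w, mistakes

# Target: the disjunction x1 OR x3 over n = 20 features.
rng = np.random.default_rng(0)
n = 20
X = rng.integers(0, 2, size=(200, n))
y = np.maximum(X[:, 0], X[:, 2])
w, m = winnow(zip(X, y), n)
print(m)   # the mistake bound is O(r log n) for an r-literal disjunction
```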
- October 25: Regret Minimization and Weighted Majority
(lecture notes)
Guest lecture by Jake Abernethy from UC Berkeley.
Regret minimization, the expert advice or hedge setting, proof of the minimax theorem, Randomized Weighted Majority.
For the proof of the minimax theorem, see Chapter 4.4.2 of Algorithmic Game Theory. For more about no-regret learning in general, check out Prediction, Learning, and Games by Nicolò Cesa-Bianchi and Gábor Lugosi.
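Here is an optional Python sketch of Randomized Weighted Majority (in its Hedge form) run against a sequence of expert losses; the learning-rate tuning and the synthetic losses are illustrative choices, not prescribed by the lecture.

```python
# Illustrative only: Randomized Weighted Majority (Hedge) with n experts.
import numpy as np

def hedge(losses, eta):
    """losses: (T, n) array of expert losses in [0, 1]. Returns the algorithm's expected loss."""
    T, n = losses.shape
    w = np.ones(n)
    total = 0.0
    for t in range(T):
        p = w / w.sum()                   # sample an expert with these probabilities
        total += np.dot(p, losses[t])     # expected loss this round
        w *= np.exp(-eta * losses[t])     # multiplicative update
    return total

rng = np.random.default_rng(0)
T, n = 1000, 10
losses = rng.random((T, n))
losses[:, 3] *= 0.3                       # expert 3 is noticeably better than the rest
eta = np.sqrt(2 * np.log(n) / T)          # a standard tuning of the learning rate
alg = hedge(losses, eta)
best = losses.sum(axis=0).min()
print(alg, best, alg - best)              # regret grows only like sqrt(T log n)
```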
- October 27: Online Convex Optimization
(lecture notes)
Guest lecture by Jake Abernethy from UC Berkeley.
Online gradient descent, relationship to the Perceptron algorithm, online convex optimization.
See also the original paper by Martin Zinkevich or these lecture notes from Peter Bartlett's class.
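If you want to experiment with online gradient descent, here is an optional Python sketch in the spirit of Zinkevich's algorithm, with step size 1/sqrt(t) and projection onto a ball; the quadratic losses and drifting targets are made-up examples.

```python
# Illustrative only: online gradient descent with projection onto a ball,
# run on a sequence of simple quadratic losses.
import numpy as np

def ogd(grad_fns, radius, eta_fn, d):
    """grad_fns: list of functions x -> gradient of the round-t loss at x."""
    x = np.zeros(d)
    iterates = []
    for t, grad in enumerate(grad_fns, start=1):
        iterates.append(x.copy())
        x = x - eta_fn(t) * grad(x)        # gradient step
        norm = np.linalg.norm(x)
        if norm > radius:                  # project back onto the feasible ball
            x *= radius / norm
    return iterates

# Round-t loss f_t(x) = ||x - z_t||^2 with drifting targets z_t.
rng = np.random.default_rng(0)
d, T = 2, 500
targets = 0.5 + 0.1 * rng.standard_normal((T, d))
grads = [(lambda x, z=z: 2 * (x - z)) for z in targets]
iterates = ogd(grads, radius=2.0, eta_fn=lambda t: 1.0 / np.sqrt(t), d=d)
print(iterates[-1])   # hovers near the average target; regret is O(sqrt(T))
```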
- November 1: Follow the Regularized Leader
(lecture notes)
Regularization, alternate proof of the Randomized Weighted Majority regret bound.
See also these notes from Adam Kalai's course on Game Theory and Computer Science at Georgia Tech or Sasha Rakhlin's lecture notes on online learning.
- November 3: Online Learning Loose Ends
(lecture notes)
Finishing the alternate RWM regret bound, alternate goals.
See also our paper Regret to the Best Vs. Regret to the Average.
Part 3: Some Practical Successes of Learning Theory
- November 8: Boosting
(lecture notes)
Weak vs. strong learning, Adaboost, bounds on training error.
See also lecture 9 of Rob Schapire's course, or Rob's overview of boosting (from 2003).
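Here is an optional Python sketch of Adaboost with decision stumps on a toy 1-D data set, to complement the training-error bound from class; the stump search and the toy labels are my own illustrative choices.

```python
# Illustrative only: AdaBoost with decision stumps on a toy 1-D data set.
import numpy as np

def stump_predict(threshold, sign, x):
    return sign * np.where(x >= threshold, 1, -1)

def best_stump(x, y, D):
    """Return (error, threshold, sign) minimizing weighted error under distribution D."""
    best = None
    for threshold in np.concatenate(([x.min() - 1], x)):
        for sign in (+1, -1):
            err = np.sum(D[stump_predict(threshold, sign, x) != y])
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def adaboost(x, y, rounds=20):
    n = len(x)
    D = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        err, threshold, sign = best_stump(x, y, D)
        err = max(err, 1e-12)                  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # weak learner's weight
        pred = stump_predict(threshold, sign, x)
        D *= np.exp(-alpha * y * pred)         # upweight misclassified points
        D /= D.sum()
        ensemble.append((alpha, threshold, sign))
    return ensemble

def ensemble_predict(ensemble, x):
    score = sum(alpha * stump_predict(t, s, x) for alpha, t, s in ensemble)
    return np.sign(score)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1, 1, -1, -1, 1, 1])             # not separable by a single stump
model = adaboost(x, y)
print(ensemble_predict(model, x))              # training error drops to zero
```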
- November 10: Generalization Error of Adaboost
(lecture notes)
Existence of large margin thresholds via the minimax theorem, boosting the margin.
See also lecture 10 and lecture 11 from Rob Schapire's course. For more on the relationship between boosting and the minimax theorem (and regret minimization too), see Game Theory, On-line Prediction, and Boosting by Freund and Schapire.
- November 15: Support Vector Machines
(lecture notes)
Maximizing the margin via optimization, Lagrange duality, KKT conditions.
See also Andrew Ng's lecture notes. For more on Lagrange duality and convex optimization, check out Convex Optimization by Stephen Boyd and UCLA's Lieven Vandenberghe, available online for download.
- November 17: SVMs and Kernels
(lecture notes)
Deriving the dual, the non-separable case, kernel functions.
See also Andrew Ng's lecture notes.
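To make the kernel trick concrete, here is an optional Python sketch (mine, not from the lecture) showing that the quadratic kernel K(x, z) = (x . z)^2 computes the same number as an ordinary inner product after an explicit degree-2 feature map; dual formulations of the Perceptron and the SVM access the data only through such inner products, which is why the substitution works.

```python
# Illustrative only: the "kernel trick" in one line. For the quadratic kernel
# K(x, z) = (x . z)^2, the kernel value equals an ordinary inner product after
# mapping inputs to the explicit feature space of all degree-2 monomials.
import numpy as np

def quadratic_kernel(x, z):
    return np.dot(x, z) ** 2

def phi(x):
    # Explicit feature map for the quadratic kernel: all products x_i * x_j.
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])
print(quadratic_kernel(x, z))     # 20.25, computed in the original 3 dimensions
print(np.dot(phi(x), phi(z)))     # 20.25, computed in d^2 = 9 dimensions
```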
Part 4: New Topics and Project Presentations
- November 22: Learning from Labeled and Unlabeled Data
(lecture notes)
Semi-supervised learning (brief overview only), active learning.
See also Sanjoy Dasgupta's survey paper on the theory behind active learning, or for a more applied point of view, the very comprehensive active learning survey by Burr Settles. If you're extra curious, you could also check out the notes from Steve Hanneke's active learning mini-course at CMU.
- November 24: NO CLASS
Please spend the extra time working on your final projects!
- November 29: Project Presentations
- December 1: Project Presentations
Final projects will be due the second week of December.
Additional Reading
Students may find the following (completely optional) textbooks useful for exploring some of the topics that we cover in more depth:
- An Introduction to Computational Learning Theory by Michael Kearns and Umesh Vazirani
- Neural Network Learning: Theoretical Foundations by Martin Anthony and Peter Bartlett
- Prediction, Learning, and Games by Nicolò Cesa-Bianchi and Gábor Lugosi
You might also find it helpful to look through lecture notes and slides from similar courses that have been offered at other universities such as Avrim Blum's course at CMU or Rob Schapire's course at Princeton. Links to specific notes from other courses that are especially relevant to particular lectures are included in the schedule above.
Homework Policies and Grading
Coursework will involve two written homework assignments and one open-ended final project. Course grades will be based 40% on homeworks (20% each), 20% on class participation (including preparation of scribe notes), and 40% on the final project (which will include a written report and short presentation in class). There will be no in-class exams.
Homework assignments are available here:
- Problem Set 1 - due October 27
- Problem Set 2 - due November 29
Collaboration on the homework assignments is encouraged! Students are free to discuss the homework problems with anyone in the class under the following conditions:
- Each student must write down his or her solutions independently, and must understand the solutions he or she writes down. Talking over solutions is fine, but reading or copying another student's answers is not acceptable!
- Each student must write a list of all of his or her collaborators at the top of each assignment.
Start your assignments early so that they will be completed on time! Assignments submitted up to 24 hours late will be penalized 25%. No assignments will be accepted for credit more than 24 hours after the deadline. Final projects and scribe notes must be submitted on time.
Instructions for Preparing Scribe Notes
Since there is no single ideal textbook for this course, students will take turns preparing "scribe notes" on each lecture that will be made available online. Your notes should be written in complete sentences, and should contain enough detail to be understandable even to students who were unable to attend class. Please be clear! If you have questions about the material that was covered, get in touch with me and we can arrange a time to chat.
If you are preparing notes for a Monday lecture, your first draft is due on Wednesday (two days later) at 2pm. I will then give you feedback and you will have until the following Wednesday to revise. If you are preparing notes for a Wednesday lecture, your first draft is due on Friday (two days later) at 2pm. I will give you feedback and you will have until the following Friday to revise. This schedule is tight, but the goal is to make the notes available as quickly as possible.
Notes must be prepared using this LaTeX template. Here is the source file for the September 27 lecture notes to use as an example. If you are new to LaTeX, many good tutorials can be found online. A good reference for many common symbols is available here, or try Detexify, a cool machine learning tool that recognizes hand-drawn symbols and suggests the LaTeX commands you need. If you are a Mac user and would like to install LaTeX on your own machine, I highly recommend the bundled version of TeXShop and MacTeX available here. Windows users can try MiKTeX. LaTeX is also installed on department-run Linux machines.
Both the first draft and the revisions should be submitted to me by email. Please submit both a PDF and your tex source. There should be one submission per team.
Final Projects
These final project guidelines were distributed in class on October 11. The guidelines provide details on the types of projects you can pursue, as well as a list of key dates and milestones. Be sure to read the guidelines carefully and make a note of important deadlines!
Below are some suggestions for topics that could be explored in more detail for the final project. All of these would make good literature synthesis topics, and some might lead to ideas for research projects. Of course you are welcome to choose a topic that is not on the list.
The Multi-Armed Bandit Problem:
- Abernethy, Hazan, and Rakhlin. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization. COLT, 2008.
- Even-Dar, Mannor, and Mansour. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems. JMLR, 2006.
- Auer, Cesa-Bianchi, Freund, and Schapire. The nonstochastic multiarmed bandit problem. SICOMP, 2002.
- Auer, Cesa-Bianchi, and Fischer. Finite time analysis of the multiarmed bandit problem. Machine Learning, 2002.
Domain Adaptation:
- Ben-David, Blitzer, Crammer, Kulesza, Pereira, and Vaughan. A Theory of Learning from Different Domains. Machine Learning, 2010.
- Mansour, Mohri, and Rostamizadeh. Domain adaptation: Learning bounds and algorithms. COLT, 2009.
Privacy-Preserving Machine Learning:
- Chaudhuri, Monteleoni, Sarwate. Differentially Private Empirical Risk Minimization. In submission, 2010.
- Kasiviswanathan, Lee, Nissim, Raskhodnikova, and Smith. What Can We Learn Privately? SICOMP, to appear.
- Blum, Ligett, and Roth. A Learning Theory Approach to Non-Interactive Database Privacy. STOC, 2008.
Learning from a Crowd:
- Dekel and Shamir. Good Learners for Evil Teachers. ICML, 2009.
- Dekel and Shamir. Vox Populi: Collecting High-Quality Labels from a Crowd. COLT, 2009.
Reinforcement Learning:
- Li, Littman, and Walsh. Knows What It Knows: A Framework for Self-Aware Learning. ICML, 2008.
- Kearns and Singh. Near-Optimal Reinforcement Learning in Polynomial Time. Machine Learning, 2002.
Machine Learning for Finance:
- Hazan and Kale. On Stochastic and Worst-case Models for Investing. NIPS, 2009.
- Blum and Kalai. Universal Portfolios With and Without Transaction Costs. Machine Learning, 1999.
- Helmbold, Schapire, Singer, and Warmuth. On-line portfolio selection using multiplicative updates. Mathematical Finance, 1998.
A Different Twist on Learning for Finance:
- Ganchev, Kearns, Nevmyvaka, and Vaughan. Censored Exploration and the Dark Pool Problem. UAI, 2009.
- Agarwal, Bartlett, and Dama. Optimal Allocation Strategies for the Dark Pool Problem. AISTATS, 2009.
New Twists on Clustering:
- Balcan, Blum, and Gupta. Approximate Clustering without the Approximation. SODA, 2009.
- Balcan, Blum, and Vempala. A Discriminative Framework for Clustering via Similarity Functions. STOC, 2008.
Try checking out recent publications from COLT (2010, 2009) or NIPS (2010, 2009) for additional ideas.