Blogging the machine ... the machine learning course CS401 NUIM Computer Science Fall 2010, that is.
2010-11-16
2010-10-12
Assignment One: Multiple Choices
This assignment involved multiple choices.
- R or whatever other language you want: your choice.
- Work alone or in pairs, choose a partner: your choice.
- Use k-means clustering on the digits in the MNIST dataset, label the clusters according to their most common member, and measure classification performance on the training and testing sets. (Also show the cluster centres when not using the kernel trick.)
- Train an SVM to classify the training set (ten SVMs, one SVM for each digit class, with one of them outputting, e.g., +1 for "is a 3" and -1 for "not a 3"), and measure performance on the training and testing sets.
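The two bookkeeping steps in the list above (labelling each cluster by its most common member, and building the +1/-1 targets for one per-digit SVM) can be sketched as follows. This is a minimal illustration in plain Python, since the assignment leaves the language open. The function names are hypothetical, and the tiny hand-written arrays stand in for real k-means output on MNIST.

```python
from collections import Counter

def label_clusters(assignments, labels):
    """Map each cluster id to the most common true label among its members."""
    members = {}
    for cluster, label in zip(assignments, labels):
        members.setdefault(cluster, Counter())[label] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in members.items()}

def accuracy(assignments, labels, cluster_to_label):
    """Fraction of points whose cluster's label equals their true label."""
    correct = sum(cluster_to_label.get(c) == y
                  for c, y in zip(assignments, labels))
    return correct / len(labels)

def one_vs_rest_targets(labels, digit):
    """+1 for "is this digit", -1 for "is not this digit"."""
    return [1 if y == digit else -1 for y in labels]

# Assumed toy data: the cluster index k-means gave each training image,
# and the true digit of each image.
train_assignments = [0, 0, 0, 1, 1, 2, 2, 2]
train_labels      = [3, 3, 8, 7, 7, 1, 1, 1]

cluster_to_label = label_clusters(train_assignments, train_labels)
train_accuracy = accuracy(train_assignments, train_labels, cluster_to_label)
targets_for_3 = one_vs_rest_targets(train_labels, 3)
```

The same cluster_to_label mapping, built from the training set only, would then be reused to score the testing set.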
For both the SVM and the k-means above, you should do it twice: once with no kernel trick, and once with the kernel trick. When using the kernel trick, what kernel(s) should you try? Your choice!
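As one way to picture the kernel trick for k-means: the squared distance from a point to a cluster mean in feature space can be computed from kernel evaluations alone, without ever forming the feature vectors. The sketch below is a plain-Python illustration under that assumption. The function names, the RBF kernel with gamma=0.5, and the toy points are all choices made here, not part of the assignment. With the linear kernel the result should match the ordinary distance to the centroid.

```python
import math

def linear_kernel(a, b):
    """Plain dot product: the 'no kernel trick' case."""
    return sum(x * y for x, y in zip(a, b))

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed, untuned value."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq)

def kernel_distance_sq(x, cluster, kernel):
    """Squared feature-space distance from x to the mean of `cluster`,
    using only kernel evaluations:
    k(x,x) - (2/n) sum_i k(x,p_i) + (1/n^2) sum_ij k(p_i,p_j)."""
    n = len(cluster)
    cross = sum(kernel(x, p) for p in cluster) / n
    within = sum(kernel(p, q) for p in cluster for q in cluster) / (n * n)
    return kernel(x, x) - 2.0 * cross + within

# Toy 2-D points standing in for digit images.
cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
x = (1.0, 0.0)

# Ordinary squared distance to the explicit centroid, for comparison.
centroid = tuple(sum(c) / len(cluster) for c in zip(*cluster))
plain = sum((a - b) ** 2 for a, b in zip(x, centroid))

via_linear = kernel_distance_sq(x, cluster, linear_kernel)
via_rbf = kernel_distance_sq(x, cluster, rbf_kernel)
```

Note that with a non-linear kernel there is no explicit centre in pixel space to display, which is presumably why the cluster centres are only asked for in the no-kernel case.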
Each team should turn in their report either via email to barak+cs401@cs.nuim.ie or on paper: your choice.
Due: before class, Mon 1-Nov-2010.
R Reference Materials
This post contains pointers to R reference materials, and will be updated.
- Google search for R Tutorial
- An Introduction to R (pdf), from the central R web site, something between a tutorial and a manual
- Jo Hardin's R Tutorial (pdf), shorter
- Notes from Thomas Lumley's two-day short course in R
- Video R Tutorial
- Short html R Tutorial with implausibly poor colour choice
- Short well-organised html R Tutorial with that National Geographic look
- Yet another html R Tutorial, one big page with nice graphics and examples
2010-10-10
Learn R Because it is Hot
R is so hot right now.
(Update: machine learning is so hot.)
2010-10-08
Max vs Min
Oops! In the lecture of 5-Oct-2010 on finding the maximum margin hyperplane, I wrote max ||w||^2 where I should have written min ||w||^2. (Thanks to Thomas Whelan for spotting it.)
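For reference, the standard hard-margin formulation is sketched below. The notation is assumed here rather than taken from the lecture: training points x_i, labels y_i in {+1, -1}, and offset b. The margin width is 2/||w||, so maximising the margin is the same as minimising ||w||^2.

```latex
\min_{w,\,b} \; \tfrac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1 \quad \text{for all } i,
\qquad
\text{margin} = \frac{2}{\lVert w \rVert}.
```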
2010-10-05
Support Vector Machines
Wikipedia has a reasonably good entry on Support Vector Machines. The original paper proposing the technique is also a good resource, quite readable with good motivation: Corinna Cortes and Vladimir N. Vapnik (1995, Support-Vector Networks, Machine Learning 20).
2010-10-04
Lecture Notes
If you volunteer to take good notes in class and send them to me for posting on this blog, you will be rewarded with excellent karma and be bathed in warm feelings of good fellowship from the tips of your toes to the top of your head.
(And also extra credit.)
2010-10-01
Machine Learning Competition: the Hearst Challenge
New Machine Learning competition with a prize of $25,000 for the system best able to predict magazine sales: the Hearst Challenge. (If you win, I'll also give you extra credit for this course. In fact, if you are part of a team that puts together a serious entry, I'll give you extra credit for this course.)
2010-09-29
Machine Learning Textbooks: Excellent and Online
This is a list of textbooks about machine learning which are (a) really good, and (b) free on the web.
(Updated 13-Oct-2010)
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (deep coverage of an important subset of the field)
- Information Theory, Inference, and Learning Algorithms, by David MacKay (very mathematical approach, excellent but hard core)
- Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto (dated, but the bible of reinforcement learning)
- Bayesian Reasoning and Machine Learning, by David Barber (incomplete coverage of the field, but solid and accessible and includes Octave/Matlab code)
- Machine Learning by Simon Rogers (I will be using some of the chapters as lecture notes)
- Introduction to Machine Learning: Draft of Incomplete Notes by Nils J. Nilsson (excellent writer; his earlier book on AI saved my bacon; slightly dated)
2010-09-28
Digits Dataset
The MNIST dataset of labelled handwritten digits: http://yann.lecun.com/exdb/mnist/. (Thanks to Yann LeCun and Corinna Cortes for cleaning the dataset and making it publicly available.)
Welcome to NUIM CS 401: The Blog!
Please comment copiously. Questions, answers, wild speculations and ideas: all welcome.
Guest posts upon request.