Assignment One: Multiple Choices

This assignment involves multiple choices.
  1. R or whatever other language you want: your choice.
  2. Work alone or with a partner: your choice.
And what should you do?  One of the following:
  • Use k-means clustering on the digits in the MNIST dataset, label each cluster according to its most common member, and measure classification performance on the training and testing sets. (When not using the kernel trick, also show the cluster centres.)
  • Train an SVM to classify the training set (ten SVMs, one per digit class; the SVM for digit 3, say, outputs +1 for "is a 3" and -1 for "not a 3"), and measure performance on the training and testing sets.
Your choice!
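The cluster-labelling step in the k-means option sometimes trips people up, so here is a minimal sketch in plain Python on toy 1-D data standing in for MNIST (the function and variable names are my own, not part of the assignment):

```python
# Sketch of the k-means option: cluster, give each cluster the majority
# class label of its members, then score the resulting classifier.
from collections import Counter
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on 1-D points; returns centres and assignments."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: (p - centres[j]) ** 2)
                  for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centres[j] = sum(members) / len(members)
    return centres, assign

def label_clusters(assign, labels, k):
    """Map each cluster index to the most common class label among its members."""
    return {j: Counter(l for a, l in zip(assign, labels) if a == j)
               .most_common(1)[0][0]
            for j in range(k)}

# Two well-separated "digit classes" standing in for MNIST.
points = [0.0, 0.2, 0.1, 5.0, 5.2, 4.9]
labels = [3, 3, 3, 7, 7, 7]
centres, assign = kmeans(points, k=2)
cluster_label = label_clusters(assign, labels, k=2)
predictions = [cluster_label[a] for a in assign]
accuracy = sum(p == l for p, l in zip(predictions, labels)) / len(labels)
print(accuracy)  # → 1.0 on this toy data
```

For the real assignment you would do the same thing with 784-dimensional MNIST vectors and more clusters, and report accuracy separately on the training and testing sets.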

For both the SVM and the k-means above, do the experiment twice: once without the kernel trick, and once with it.  When using the kernel trick, which kernel(s) should you try?  Your choice!
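One kernel worth trying is the Gaussian (RBF) kernel, k(x, z) = exp(-||x - z||² / (2σ²)).  A minimal sketch in plain Python (names are my own) of building the Gram matrix that a kernelised k-means or kernel SVM works from:

```python
# Build the Gram (kernel) matrix K[i][j] = k(x_i, x_j) for an RBF kernel.
import math

def rbf(x, z, sigma=1.0):
    """Gaussian kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def gram(points, kernel):
    """Kernel matrix over all pairs of points."""
    return [[kernel(x, z) for z in points] for x in points]

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
K = gram(X, rbf)
# K is symmetric with ones on the diagonal, since k(x, x) = exp(0) = 1.
```

A quick sanity check on any kernel you try: the Gram matrix should be symmetric and (for a valid kernel) positive semi-definite.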

Each team should turn in their report either via email to barak+cs401@cs.nuim.ie or on paper: your choice.

Due: before class, Mon 1-Nov-2010.

R Reference Materials

This post contains pointers to R reference materials, and will be updated.

Lecture Notes

More lecture notes, thanks to Paul Murray (!)
notes3 (w/ annotations)


Learn R Because it is Hot

R is so hot right now.
Functional programming Schemers like myself have spent years whining that just because something like Cobol, Fortran, C, C++, Java, or Perl is "hot" or "standard" or "used by highly profitable companies" or "easy to get a job if you know" does not mean it is actually good or worth learning. Well, now the shoe's on the other foot, suckers: R is Hot!
(Update: machine learning is so hot.)


Max vs Min

Oops!  In the lecture of 5-Oct-2010 on finding the maximum-margin hyperplane, I wrote max ||w||² where I should have written min ||w||². (Thanks to Thomas Whelan for spotting it.)
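For the record, the standard formulation: maximising the margin 2/||w|| is equivalent to minimising ||w||² subject to the classification constraints,

```latex
\min_{w,\,b} \; \tfrac{1}{2}\|w\|^2
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n.
```

So the norm of w is minimised, not maximised.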


Support Vector Machines

Wikipedia has a reasonably good entry on Support Vector Machines. The original paper proposing the technique is also a good resource, quite readable with good motivation: Corinna Cortes and Vladimir N. Vapnik (1995), "Support-Vector Networks", Machine Learning 20.


Lecture Notes

Here are some lecture notes, thanks to Paul Murray.
notes1 (w/ annotations)
notes2 (w/ annotations)

Lecture Notes

If you volunteer to take good notes in class and send them to me for posting on this blog, you will be rewarded with excellent karma and be bathed in warm feelings of good fellowship from the tips of your toes to the top of your head.

(And also extra credit.)


Machine Learning Competition: the Hearst Challenge

New Machine Learning competition with a prize of $25,000 for the system best able to predict magazine sales: the Hearst Challenge.  (If you win, I'll also give you extra credit for this course.  In fact, if you are part of a team that puts together a serious entry, I'll give you extra credit for this course.)


Machine Learning Textbooks: Excellent and Online

This is a list of textbooks about machine learning which are (a) really good, and (b) free on the web.
If you know of others (criteria: of general Machine Learning interest, not something highly specific like Gaussian Processes), post a comment and I'll add them above.

(Updated 13-Oct-2010)


Digits Dataset

The MNIST dataset of labelled handwritten digits: http://yann.lecun.com/exdb/mnist/.  (Thanks to Yann LeCun and Corinna Cortes for cleaning the dataset and making it publicly available.)
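The files on that page use the simple IDX binary format described there: a big-endian magic number, the dimension sizes, then raw bytes.  A minimal sketch of a label-file parser, assuming that layout (the function name is my own):

```python
# Parse an IDX1 label file as used by MNIST: 4-byte big-endian magic
# (2049 for labels), 4-byte item count, then one unsigned byte per label.
import struct

def parse_idx_labels(data):
    """Return the list of labels stored in an IDX1 byte string."""
    magic, count = struct.unpack(">II", data[:8])
    assert magic == 2049, "not an IDX1 label file"
    return list(data[8:8 + count])

# Toy example: a hand-built two-label file.
fake = struct.pack(">II", 2049, 2) + bytes([3, 7])
print(parse_idx_labels(fake))  # → [3, 7]
```

The image files work the same way, with magic 2051 and two extra dimension fields (rows and columns) before the pixel bytes.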


Install and play with (i.e., learn) R: http://www.r-project.org

Welcome to NUIM CS 401: The Blog!

Please comment copiously.  Questions, answers, wild speculations and ideas: all welcome.

Guest posts upon request.