2010-10-12

Assignment One: Multiple Choices

This assignment involves multiple choices.
  1. R or whatever other language you want: your choice.
  2. Work alone or in pairs, choose a partner: your choice.
And what should you do?  One of the following:
  • Use k-means clustering on the digits in the MNIST dataset, label each cluster according to its most common member, and measure classification performance on the training and testing sets. (Also show the cluster centres when not using the kernel trick.)
  • Train SVMs to classify the training set (ten SVMs, one for each digit class, with the one for the digit 3, e.g., outputting +1 for "is a 3" and -1 for "not a 3"), and measure performance on the training and testing sets.
Your choice!
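
To make the k-means option concrete, here is a minimal sketch of the cluster-then-label idea on a made-up 2-D toy set rather than MNIST. The data, the helper names (`kmeans`, `label_clusters`, `accuracy`), and the choice of plain Python with random data points as initial centres are all assumptions for illustration, not a prescribed solution.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centres):
    """Index of the centre closest to point p."""
    return min(range(len(centres)), key=lambda c: sq_dist(p, centres[c]))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd iterations; initial centres are k distinct data points."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        assign = [nearest(p, centres) for p in points]
        new_centres = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[c])  # keep the centre of an empty cluster
        if new_centres == centres:  # no centre moved: converged
            break
        centres = new_centres
    return centres

def label_clusters(points, labels, centres):
    """Give each cluster the most common training label among its members."""
    votes = [Counter() for _ in centres]
    for p, y in zip(points, labels):
        votes[nearest(p, centres)][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def accuracy(points, labels, centres, cluster_labels):
    """Fraction of points whose nearest cluster carries the right label."""
    hits = sum(cluster_labels[nearest(p, centres)] == y for p, y in zip(points, labels))
    return hits / len(points)

# Hypothetical toy data: two well-separated blobs standing in for two classes.
train_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_x = [[0.5, 0.2], [10.2, 10.9]]
test_y = [0, 1]

centres = kmeans(train_x, k=2)
cluster_labels = label_clusters(train_x, train_y, centres)
train_acc = accuracy(train_x, train_y, centres, cluster_labels)
test_acc = accuracy(test_x, test_y, centres, cluster_labels)
print(centres, cluster_labels, train_acc, test_acc)
```

On MNIST each point would be a 784-dimensional pixel vector, you would probably want more clusters than digit classes, and each centre can be reshaped to 28×28 and displayed as an image.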


For both the SVM and the k-means above, you should do it twice: once with no kernel trick, and once with the kernel trick.  When using the kernel trick, what kernel(s) should you try?  Your choice!
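
As a rough illustration of what "using the kernel trick" means for k-means: the squared distance from a point to a cluster mean in feature space can be written using kernel evaluations only, as K(x,x) - (2/n) Σ_j K(x,x_j) + (1/n²) Σ_i Σ_j K(x_i,x_j), so the mean itself is never formed (which is why there is no cluster centre to show in that case). The sketch below assumes a linear kernel and a Gaussian kernel as two example choices; the function names, the `gamma` value, and the toy points are made up for illustration.

```python
import math

def linear_kernel(x, y):
    """K(x, y) = x . y, i.e. the feature map is the identity."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian kernel; gamma is an arbitrary illustrative value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_sq_dist_to_mean(x, members, kernel):
    """Squared feature-space distance from x to the mean of `members`,
    computed from kernel evaluations only."""
    n = len(members)
    k_xx = kernel(x, x)
    k_xm = sum(kernel(x, m) for m in members) / n
    k_mm = sum(kernel(a, b) for a in members for b in members) / (n * n)
    return k_xx - 2.0 * k_xm + k_mm

def explicit_sq_dist_to_mean(x, members):
    """Ordinary squared Euclidean distance from x to the mean of `members`."""
    n = len(members)
    mu = [sum(col) / n for col in zip(*members)]
    return sum((a - b) ** 2 for a, b in zip(x, mu))

# Hypothetical toy cluster and query point.
cluster = [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]]
x = [3.0, 1.0]

d_linear = kernel_sq_dist_to_mean(x, cluster, linear_kernel)
d_explicit = explicit_sq_dist_to_mean(x, cluster)
d_rbf = kernel_sq_dist_to_mean(x, cluster, rbf_kernel)
print(d_linear, d_explicit, d_rbf)
```

With the linear kernel the kernelised distance agrees with the ordinary one, so that case reduces to plain k-means; swapping in another kernel changes only the `kernel` argument.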


Each team should turn in their report either via email to barak+cs401@cs.nuim.ie or on paper: your choice.


Due: before class, Mon 1-Nov-2010.

7 comments:

  1. Is the linear kernel ⟨x, x'⟩ the equivalent of using no kernel?

  2. Correct: by "linear kernel" I meant the kernel that results from the mapping φ from input space to feature space being the identity, φ(x)=x, so K(x,x')=φ(x)·φ(x')=x·x'. (I suppose technically that is a bi-linear kernel function, as it is linear in each argument individually but not in both. But "linear kernel" is the terminology in the field.)

  3. Post edited to include due date.

  4. A few people have asked what your report should contain.

    (a) Since I have to read them, please don't make them 600 pages long.

    (b) It should show me what you did and how you did it, and provide enough information for me to figure out whether you did something wrong and, if so, what.

    (c) Writing reports is extremely common in industry, and your boss doesn't tell you exactly what they should contain. But you're still judged by whether it is a good report. So enjoy!

    (d) Seriously: write what you'd want to read if you were reading someone else's report and trying to understand what they did and how well it worked.

  5. Yes, you can use a language other than R, for example Python, or NumPy specifically. But please don't use Java or Fortran or C++ and then come complain about how you didn't get much done because you had to write so much boilerplate and worry about memory management and there were no appropriate libraries and visualising anything was a lot of work so you didn't make any plots and blah blah blah.

  6. Bit late to be posting this, but it's pretty cool: http://www.cmsoft.com.br/index.php?option=com_content&view=category&layout=blog&id=117&Itemid=173
