- R or whatever other language you want: your choice.
- Work alone or in pairs, choose a partner: your choice.
- Use k-means clustering on the digits in the MNIST dataset, label each cluster according to its most common member, and measure classification performance on the training and testing sets. (Also show the cluster centres when not using the kernel trick.)
- Train an SVM to classify the training set (ten SVMs, one SVM for each digit class, with one of them outputting, e.g., +1 for "is a 3" and -1 for "not a 3"), and measure performance on the training and testing sets.
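The k-means item in the list above can be sketched roughly as follows. This is only an illustration in numpy, not a prescribed solution: the function names (`kmeans`, `label_clusters`, `accuracy`) are made up for this sketch, the data is assumed to be a float array `X` of flattened images with an integer label array `y`, and loading MNIST is left out.

```python
import numpy as np

def squared_distances(X, centres):
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, shape (n_points, k)
    return ((X ** 2).sum(axis=1)[:, None]
            - 2.0 * X @ centres.T
            + (centres ** 2).sum(axis=1)[None, :])

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd iterations; returns (centres, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialise with k distinct data points (one simple choice among many).
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        assign = squared_distances(X, centres).argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):              # an empty cluster keeps its old centre
                centres[j] = members.mean(axis=0)
    return centres, squared_distances(X, centres).argmin(axis=1)

def label_clusters(assign, y, k):
    """Most common true label in each cluster (-1 for an empty cluster)."""
    labels = np.full(k, -1)
    for j in range(k):
        members = y[assign == j]          # y must hold non-negative integers
        if len(members):
            labels[j] = np.bincount(members).argmax()
    return labels

def accuracy(X, y, centres, cluster_labels):
    """Classify each point by the label of its nearest centre."""
    pred = cluster_labels[squared_distances(X, centres).argmin(axis=1)]
    return float((pred == y).mean())
```

With this sketch, the cluster labels would be fitted on the training set only and then `accuracy` called once with the training data and once with the testing data. Each row of `centres` can be reshaped to 28×28 to display it as an image.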
For both the SVM and the k-means above, you should do it twice: once with no kernel trick, and once with the kernel trick. When using the kernel trick, what kernel(s) should you try? Your choice!
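One way to picture the "one SVM per digit, with and without a kernel" setup is the sketch below. It makes several assumptions that are not part of the assignment: the SVM is trained in its dual form by simple projected gradient ascent, the bias term is omitted so that only box constraints remain, and `gamma`, `C` and `n_iter` are illustrative values. All function names are hypothetical; in practice a library solver would be the usual choice. Because it builds the full Gram matrix, it is only practical on a subsample of the training set.

```python
import numpy as np

def linear_kernel(A, B):
    # K(x, x') = x . x'  (the "no kernel trick" case)
    return A @ B.T

def rbf_kernel(A, B, gamma=0.5):
    # K(x, x') = exp(-gamma * ||x - x'||^2); gamma is an illustrative value
    d2 = ((A ** 2).sum(axis=1)[:, None]
          - 2.0 * A @ B.T
          + (B ** 2).sum(axis=1)[None, :])
    return np.exp(-gamma * np.maximum(d2, 0.0))

def train_binary_svm(K, t, C=1.0, n_iter=500):
    """Dual soft-margin SVM without a bias term (a simplification).

    K is the training Gram matrix, t holds targets in {-1, +1}.
    Maximises sum(alpha) - 0.5 * alpha^T Q alpha subject to 0 <= alpha <= C.
    """
    Q = (t[:, None] * t[None, :]) * K
    step = 1.0 / max(np.linalg.norm(Q, 2), 1e-12)   # 1 / largest eigenvalue of Q
    alpha = np.zeros(len(t))
    for _ in range(n_iter):
        grad = 1.0 - Q @ alpha                      # gradient of the dual objective
        alpha = np.clip(alpha + step * grad, 0.0, C)
    return alpha

def train_one_vs_rest(X, y, classes, kernel, C=1.0):
    K = kernel(X, X)
    models = []
    for c in classes:
        t = np.where(y == c, 1.0, -1.0)             # +1 for "is a c", -1 for "not a c"
        models.append(train_binary_svm(K, t, C) * t)  # store alpha_i * t_i
    return np.array(models)                         # shape (n_classes, n_train)

def predict_one_vs_rest(models, classes, X_train, X_new, kernel):
    # Decision values of every per-class SVM, shape (n_classes, n_new);
    # the predicted digit is the class whose SVM is most confident.
    scores = models @ kernel(X_train, X_new)
    return np.asarray(classes)[scores.argmax(axis=0)]
```

Passing `linear_kernel` gives the run without the kernel trick, and passing `rbf_kernel` (or any other kernel function of the same shape) gives the run with it; the rest of the code does not change.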
Each team should turn in their report either via email to barak+cs401@cs.nuim.ie or on paper: your choice.
Due: before class, Mon 1-Nov-2010.
Is the linear kernel ⟨x, x'⟩ the equivalent of using no kernel?
Correct: by "linear kernel" I meant the kernel that results from the mapping φ from input space to feature space being the identity, φ(x)=x, so K(x,x')=φ(x)·φ(x')=x·x'. (I suppose technically that is a bilinear kernel function, as it is linear in each argument individually but not jointly in both. But "linear kernel" is the terminology in the field.)
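To make the equivalence concrete for k-means, here is a small numpy sketch (the function name `kernel_distances_to_means` is invented for illustration). It computes the squared feature-space distance from each point to each cluster mean using only the Gram matrix K, via ‖φ(x_i) − μ_j‖² = K_ii − (2/m) Σ_{a∈C_j} K_ia + (1/m²) Σ_{a,b∈C_j} K_ab, where m is the size of cluster C_j. With K(x,x')=x·x' this should agree with the ordinary squared Euclidean distance to the cluster centres.

```python
import numpy as np

def kernel_distances_to_means(K, assign, k):
    """Squared feature-space distances, shape (n_points, k), from K alone.

    K is the (n, n) Gram matrix and assign holds a cluster index per point.
    Empty clusters get distance +inf.
    """
    n = K.shape[0]
    d2 = np.full((n, k), np.inf)
    diag = np.diag(K)                                   # K_ii
    for j in range(k):
        mask = assign == j
        m = mask.sum()
        if m == 0:
            continue
        cross = K[:, mask].sum(axis=1) / m              # (1/m) sum_a K_ia
        within = K[np.ix_(mask, mask)].sum() / m ** 2   # (1/m^2) sum_ab K_ab
        d2[:, j] = diag - 2.0 * cross + within
    return d2

# Check with the linear kernel: same numbers as plain Euclidean k-means.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0], [6.0, 8.0]])
assign = np.array([0, 0, 1, 1])
d2_kernel = kernel_distances_to_means(X @ X.T, assign, 2)
means = np.array([X[assign == j].mean(axis=0) for j in range(2)])
d2_plain = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
print(np.allclose(d2_kernel, d2_plain))
```

Note that the cluster mean itself never appears in the kernel version, only distances to it, which is presumably why the cluster centres are asked for only when the kernel trick is not used.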
When is this due?
Post edited to include due date.
A few people have asked what your report should contain.
(a) Since I have to read them, please don't make them 600 pages long.
(b) It should show me what you did and how you did it, and provide enough information for me to figure out whether, and what, you did wrong, if indeed you did something wrong.
(c) Writing reports is extremely common in industry, and your boss doesn't tell you exactly what they should contain. But you're still judged by whether it is a good report. So enjoy!
(d) Seriously: write what you'd want to read if you were reading someone else's report and trying to understand what they did and how well it worked.
Yes, you can use a language other than R, for example Python, or numpy specifically. But please don't use Java or Fortran or C++ and then come complain about how you didn't get much done because you had to write so much boilerplate and worry about memory management and there were no appropriate libraries and visualising anything was a lot of work so you didn't make any plots and blah blah blah.
Bit late to be posting this, but it's pretty cool: http://www.cmsoft.com.br/index.php?option=com_content&view=category&layout=blog&id=117&Itemid=173