Weekly log: December 12, 2009

This is a “weekly report” for the lab I study in, mostly intended for other lab members. See the first one for further explanations.

Lab-related projects

  • As I wrote in an earlier post that I might, I reimplemented the convolutional net for COIL-100 based on Torch, by “filling the easy parts/blanks” in code Hossein Mobahi had sent me. I learned some Lua along the way. I’m quite surprised (but then should I be?) to see that it seems to work for 30 objects: it reached ~30% before I killed it to try on 100 objects. ~9 hours later, though, error is still around 100%… I’ll need to relaunch that test for a longer period.
  • Last week, I implemented a very (very) lilliputian version of the similarity-based cost, with a custom one-neuron-followed-by-tanh “net”. Following Hossein’s suggestion, I used two classes based on overlapping Gaussians. I first used only two “incorrect” (off-center and very misleading) points for supervised training, then “corrected” using the similarity-based cost. It indeed reduced the error by ~3% (35% vs 38%, mean over many, many tests).
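
For reference, here is a minimal NumPy sketch of that kind of toy setup. It is an illustration under assumptions, not the code I actually ran: the squared-error supervised cost, the margin-based form of the similarity cost, the rule that "similar" means "close in input space", the finite-difference gradient, and all function names and constants are hypothetical choices.

```python
import numpy as np


def make_data(rng, n_per_class, sep=1.0):
    """Two overlapping 2-D Gaussian classes, labelled -1 and +1."""
    x_neg = rng.normal(loc=-sep / 2, scale=1.0, size=(n_per_class, 2))
    x_pos = rng.normal(loc=+sep / 2, scale=1.0, size=(n_per_class, 2))
    x = np.vstack([x_neg, x_pos])
    y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    return x, y


def forward(params, x):
    """One linear neuron followed by tanh; params = [w1, w2, b]."""
    return np.tanh(x @ params[:-1] + params[-1])


def supervised_cost(params, x, y):
    """Mean squared error against the +/-1 targets (assumed cost)."""
    return np.mean((forward(params, x) - y) ** 2)


def similarity_cost(params, xa, xb, similar, margin=1.0):
    """Pull outputs of similar pairs together, push other pairs apart."""
    dist = np.abs(forward(params, xa) - forward(params, xb))
    return np.mean(np.where(similar, dist, np.maximum(0.0, margin - dist)))


def make_pairs(rng, x, n_pairs, radius=1.0):
    """Random pairs; 'similar' means close in input space (an assumption)."""
    i = rng.integers(0, len(x), size=n_pairs)
    j = rng.integers(0, len(x), size=n_pairs)
    similar = np.linalg.norm(x[i] - x[j], axis=1) < radius
    return x[i], x[j], similar


def numerical_grad(cost, params, eps=1e-5):
    """Central finite differences; good enough for three parameters."""
    grad = np.zeros_like(params)
    for k in range(params.size):
        step = np.zeros_like(params)
        step[k] = eps
        grad[k] = (cost(params + step) - cost(params - step)) / (2 * eps)
    return grad


def train(cost, params, lr=0.1, n_steps=200):
    """Plain gradient descent on the given cost."""
    for _ in range(n_steps):
        params = params - lr * numerical_grad(cost, params)
    return params


def error_rate(params, x, y):
    """Fraction of points whose output sign disagrees with the label."""
    return np.mean(np.sign(forward(params, x)) != y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = make_data(rng, 200)
    # Two deliberately misleading labelled points (hypothetical coordinates).
    x_lab = np.array([[1.5, -1.0], [-1.5, 1.0]])
    y_lab = np.array([-1.0, 1.0])
    params = train(lambda p: supervised_cost(p, x_lab, y_lab), np.zeros(3))
    print("error after supervised step:", error_rate(params, x, y))
    xa, xb, similar = make_pairs(rng, x, 500)
    params = train(lambda p: similarity_cost(p, xa, xb, similar), params, lr=0.05)
    print("error after similarity step:", error_rate(params, x, y))
```

Whether the second step helps depends on the random draw and on the constants above, which is why the figure quoted in the bullet is a mean over many runs.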

Misc

  • I went to a Montreal Python meeting on Wednesday, and someone named Jeremy Barnes happened to be (very quickly) presenting a library he’s coding for machine learning tasks, which is using a Python-C bridge (boost::python).

Readings

I’m trying to read more on recurrent networks. To start off, I’ve finished reading:

  • D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” in Parallel Distributed Processing, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge MA: MIT Press, Bradford Books, vol. 1, 1986, pp. 318-362

and I’ve read about the first half of this paper:

  • J. L. Elman, “Finding structure in time,” Cognitive Science, vol. 14, pp. 179-211, 1990

For a class project, I’ve read a bit of:

  • P. Simard, D. Steinkraus, J. C. Platt, “Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis”. ICDAR 2003: 958-962

Course work

  • I’ve implemented the elastic distortion method proposed in Simard et al. 2003 (see above) for the 4th assignment for the ML class. Using their settings (800 hidden units, about the same learning rate schedule) I seem to obtain ~1.1% validation error on MNIST. I still haven’t reached the minimum, so I’ll maybe let it continue (it requires very little effort on my part, at this point!).
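
The general recipe for elastic distortion is: draw two random displacement fields, smooth them with a Gaussian, scale them, and resample the image with bilinear interpolation. Below is a NumPy-only sketch of that recipe. It is not my assignment code; the border handling (row-normalised blur, zero background outside the image), the function names, and the values used for `alpha` and `sigma` in the example are assumptions for illustration.

```python
import numpy as np


def blur_matrix(n, sigma):
    """Row-normalised Gaussian smoothing matrix for an axis of length n."""
    idx = np.arange(n)
    g = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    return g / g.sum(axis=1, keepdims=True)


def smooth(field, sigma):
    """Separable Gaussian blur applied along both axes of a 2-D field."""
    h, w = field.shape
    return blur_matrix(h, sigma) @ field @ blur_matrix(w, sigma).T


def bilinear_sample(image, rows, cols):
    """Sample image at fractional (rows, cols); outside pixels count as 0."""
    h, w = image.shape
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    fr = rows - r0
    fc = cols - c0

    def at(r, c):
        inside = (r >= 0) & (r < h) & (c >= 0) & (c < w)
        values = image[np.clip(r, 0, h - 1), np.clip(c, 0, w - 1)]
        return np.where(inside, values, 0.0)

    return ((1 - fr) * (1 - fc) * at(r0, c0)
            + (1 - fr) * fc * at(r0, c0 + 1)
            + fr * (1 - fc) * at(r0 + 1, c0)
            + fr * fc * at(r0 + 1, c0 + 1))


def elastic_distort(image, alpha, sigma, rng):
    """Warp a 2-D image with a smoothed random displacement field."""
    h, w = image.shape
    # Uniform noise in [-1, 1], smoothed, then scaled by alpha (in pixels).
    d_rows = alpha * smooth(rng.uniform(-1.0, 1.0, size=(h, w)), sigma)
    d_cols = alpha * smooth(rng.uniform(-1.0, 1.0, size=(h, w)), sigma)
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return bilinear_sample(image, rows + d_rows, cols + d_cols)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    digit = rng.uniform(0.0, 1.0, size=(28, 28))  # stand-in for an MNIST digit
    warped = elastic_distort(digit, alpha=34.0, sigma=4.0, rng=rng)
    print(warped.shape)
```

A fresh distortion is drawn each time an example is presented, so the network rarely sees the exact same image twice.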

Plan for next few weeks

  • I need to do (none of it done yet) the last assignment for my NLP class, which involves classifying documents according to author (should be short, which is the intention of the teacher).
  • I’ll continue reading on recurrent nets for a while, when I have time.
  • I need to relaunch the Torch-based test for 100 objects, make sure what I’m doing is all right, and if it indeed works, figure out what’s the difference with my Python/PyLearn implementation.

Log for last few weeks: November 27, 2009

Lab-related projects

  • Over the last few weeks, I’ve been quite absorbed in trying to get the embedding cost to work. I’ve tested numerous variations of hyperparameters, optimization loop configuration, I tried different costs (the one they used is linear, I tried quadratic), etc.
    • Through Yoshua, I asked the authors for help with the parameters, but even after their replies it won’t work. This seems to point to a bug in my code, obviously. Code which I’ve reviewed quite a few times, though…
  • Together with James, I’ve tried debugging an implementation I did for a toy problem, with a similar architecture. Even in such a simple case, we found the learning rate to be quite hard to tune correctly, and in no case did the additional cost help. That’s quite strange, as conceptually it should help.
    • Still, James gave me interesting ideas to reuse when debugging neural nets in the future.
  • One thing I might try when I get the time is to reimplement the toy problem with Torch. After taking a quick look at the docs, it seems quite easy to set up this kind of architecture/cost, so given the time this is ending up taking, it might be worth a shot (to see if the result is the same).
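
To make the "linear vs. quadratic" distinction concrete, here is a sketch of a generic margin-based pairwise embedding cost in both variants. This is my shorthand for the family of costs involved, not a transcription of the authors' formula: the exact form, the margin value, and the name `embedding_cost` are assumptions.

```python
import numpy as np


def embedding_cost(za, zb, similar, margin=1.0, kind="linear"):
    """Pairwise embedding cost over batches of embeddings za, zb.

    Similar pairs are pulled together; other pairs are pushed apart
    until they are at least `margin` away from each other.
    """
    diff = za - zb
    if kind == "linear":
        dist = np.abs(diff).sum(axis=1)           # L1 distance
        pull = dist
        push = np.maximum(0.0, margin - dist)
    elif kind == "quadratic":
        dist = np.sqrt((diff ** 2).sum(axis=1))   # Euclidean distance
        pull = dist ** 2
        push = np.maximum(0.0, margin - dist) ** 2
    else:
        raise ValueError("kind must be 'linear' or 'quadratic'")
    return np.mean(np.where(similar, pull, push))


if __name__ == "__main__":
    za = np.array([[0.0, 0.0], [0.0, 0.0]])
    zb = np.array([[3.0, 4.0], [0.5, 0.0]])
    similar = np.array([True, False])
    print(embedding_cost(za, zb, similar, kind="linear"))
    print(embedding_cost(za, zb, similar, kind="quadratic"))
```

In training, a cost like this would be added to the supervised cost with some weighting, which introduces one more hyperparameter to tune alongside the learning rate.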

Course work

  • We’ve handed in our second assignment in the NLP class, the one where we needed to translate terms based on relevant Wikipedia pages.

Plan for next few weeks

  • This weekend and the best part of next week I’ll devote to studying for my NLP exam and the presentation/report for my project in ML.
    • If some time remains, I’ll try to get as far as I can in the last assignment for the ML class, to get this off the TODO list as soon as possible.
  • After that point, I’ll resume working on trying to get the embedding cost to work. I need to resume readings, too, which have been suffering for the last two or so weeks.

Weekly log: November 6, 2009

Lab-related projects

  • Building on code for configuring a convolutional deep net I made last week (and code to load COIL-100), I implemented a network I think matches the one described in Mobahi et al. 2009 (Deep learning from temporal coherence in video), up to a few details (notably, subsampling layers work differently: no weight/bias, just max pooling, as that’s what’s readily available).
  • After having removed the most obvious bugs (made it run without errors, basically), I’ve tried running it, but training is pretty slow and my metrics (average of supervised training cost, average of the additional unsupervised cost, classification error) don’t seem to be showing signs of progress (after 15 epochs, whether or not I use the additional unsupervised cost).
    • Yet my base code (stack without unsupervised cost) seemed to work with MNIST (I tested very briefly to see where average cost was going; I think I’ll do a more thorough run).
    • I need to dig deeper, tweak hyperparameters (notably learning rate — although I tried) and try other methods of “debugging” the network.
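
The subsampling difference mentioned above can be sketched in a few lines of NumPy. The first function is plain max pooling with no parameters; the second is a LeNet-style subsampling layer as I understand it (sum over each block, one trainable weight and bias per feature map, then a squashing function). Function names, the tanh squashing and the array layout are assumptions for illustration, not my Theano code.

```python
import numpy as np


def _blocks(feature_maps, size):
    """Reshape (n_maps, h, w) into non-overlapping size x size blocks."""
    n, h, w = feature_maps.shape
    h_out, w_out = h // size, w // size
    trimmed = feature_maps[:, :h_out * size, :w_out * size]
    return trimmed.reshape(n, h_out, size, w_out, size)


def max_pool(feature_maps, size=2):
    """Parameter-free subsampling: keep the maximum of each block."""
    return _blocks(feature_maps, size).max(axis=(2, 4))


def trainable_subsample(feature_maps, weight, bias, size=2):
    """LeNet-style subsampling: per-map weight and bias, then tanh."""
    summed = _blocks(feature_maps, size).sum(axis=(2, 4))
    weight = np.asarray(weight, dtype=float)[:, None, None]
    bias = np.asarray(bias, dtype=float)[:, None, None]
    return np.tanh(weight * summed + bias)


if __name__ == "__main__":
    maps = np.arange(16, dtype=float).reshape(1, 4, 4)
    print(max_pool(maps))
    print(trainable_subsample(maps, weight=[0.01], bias=[0.0]))
```

With max pooling there is nothing to learn in the subsampling layers, so any lack of progress has to come from the convolution layers, the classifier on top, or the optimization settings.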

Readings

My readings this week were entirely conditioned by my project. I wanted to have an idea of what had been done to build convolutional networks with RBM layers.

  • Yann LeCun, Bernhard E. Boser, John S. Denker, Donnie Henderson, R. E. Howard, Wayne E. Hubbard, Lawrence D. Jackel: Handwritten Digit Recognition with a Back-Propagation Network. NIPS 1989: 396-404
  • Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng: Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. ICML 2009: 77
  • Desjardins, G., & Bengio, Y. (2008).  Empirical evaluation of convolutional RBMs for vision (Technical Report).

The Lee et al. paper made me realize that there’s much more to using a convolutional architecture with RBMs than simply removing/sharing weights, especially with regard to the fact that they’re generative (I guess it’d be possible to downsample non-probabilistically (vs. their method) and train greedily/layer-wise anyway, but then we lose the nice generative/“content-addressable” abilities it provides (?)).

Course work

  • I began work with my partner on a second assignment in the NLP course, in which we have to translate terms based on Wikipedia input, with some preprocessing already done for us.

Plan for next week (outside course work)

  • Debug my network.
  • If that works, then maybe test a few other variations on this setup and start working on the convolutional DAA part.

Weekly study log: October 30, 2009

Lab-related projects

  • Asking questions to James, Frédéric and Pascal Lamblin, I managed to implement a stack of convolution/subsampling layers, based on lecun98.py (code on Assembla). I configured it based on similar parameters found in LeNet5 (see “Readings”) and fed it MNIST data. I think it works (training error is going down), but I need to make further tests, of course.
  • I continued work on code by James to load the COIL-100 dataset, to make it suit my needs.
  • I experimented with matplotlib to explore options to visualize weight matrices.
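
One common way to look at a weight matrix is to reshape each row into a small image and tile the images into a grid. The helper below is a sketch of that idea in NumPy; the name `tile_filters`, the per-filter normalisation to [0, 1] and the one-pixel padding are my own assumptions, not something taken from the lab code.

```python
import numpy as np


def tile_filters(weights, filter_shape, pad=1):
    """Arrange each row of `weights` as an image in a near-square grid.

    weights: array of shape (n_filters, fh * fw)
    filter_shape: (fh, fw)
    Returns a 2-D array with every filter rescaled to [0, 1].
    """
    n = weights.shape[0]
    fh, fw = filter_shape
    n_cols = int(np.ceil(np.sqrt(n)))
    n_rows = int(np.ceil(n / n_cols))
    canvas = np.zeros((n_rows * (fh + pad) - pad, n_cols * (fw + pad) - pad))
    for i, row in enumerate(weights):
        f = row.reshape(fh, fw).astype(float)
        span = f.max() - f.min()
        # Rescale each filter independently so its structure is visible.
        f = (f - f.min()) / span if span > 0 else np.zeros_like(f)
        r, c = divmod(i, n_cols)
        top, left = r * (fh + pad), c * (fw + pad)
        canvas[top:top + fh, left:left + fw] = f
    return canvas


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_weights = rng.normal(size=(6, 25))  # six hypothetical 5x5 filters
    print(tile_filters(fake_weights, (5, 5)).shape)
```

The returned array can be displayed with matplotlib, for example `plt.imshow(canvas, cmap="gray")` followed by `plt.show()`.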

Readings

  • Ruslan Salakhutdinov, Geoffrey E. Hinton: Semantic hashing. Int. J. Approx. Reasoning 50(7): 969-978 (2009)
  • Larochelle, Hugo and Erhan, Dumitru and Vincent, Pascal, Deep Learning using Robust Interdependent Codes (2009)
  • I’ve read a few bytes of this paper, mostly to get sensible parameters to configure my layers:
    • Yann LeCun, Patrick Haffner, Léon Bottou, Yoshua Bengio: Object Recognition with Gradient-Based Learning. Shape, Contour and Grouping in Computer Vision 1999: 319-

Plan for next week (outside course work)

  • I need to implement concrete tests for my convolutional stack, see if I can make it classify test data correctly, and actually integrate the Weston2008-like cost function.
  • Among other things, I want to experiment with visualizing the filters, weights (and maybe the result of convolutions) as I’ve seen in a couple of papers.

Weekly study log (October 9, 2009)

Readings

  • As I was telling James earlier today, I’ve read the basic Theano tutorial, parts of the advanced one (just to get a feeling of how it works) and I’ve been trying to read some actual PyLearn code (and code suggested by James) to get a feeling of how it’s used.
  • I’ve been re-reading the 3 main papers I mentioned last week (semi-supervised embedding, sparse encoding symmetric machines).
  • I had read a bit on generalities concerning Echo State Networks, mostly the Scholarpedia description, so I could follow Razvan’s talk more easily.
  • I’ve started reading this paper suggested by Yoshua and James, but haven’t got much further than the second page. It’s in the plans for next week.
    • Koray Kavukcuoglu, Marc’Aurelio Ranzato, Rob Fergus and Yann LeCun: Learning Invariant Features through Topographic Filter Maps, Proc. International Conference on Computer Vision and Pattern Recognition (CVPR’09), IEEE, 2009

General course/curriculum work

  • Finalized and posted my project proposal (and homework) for the ML class
  • Finished all the procedures related to scholarship requests (the whole thing takes an eternity, and eternity is kind of long towards the end).
  • Continued work on my NLP class assignment. The bulk of the algorithmic novelty we wanted to try is there, we now need to perform actual clean experiments and fine-tune how we combine different models.

Plan for next week

  • Try to write some actual code with Theano (hopefully useful for my term project), and probably ask Guillaume (?) about convolutional networks code so I can see how it’s done.
  • Continue work on NLP assignment.
  • Read that Kavukcuoglu et al. paper.
  • Do some research on dynamical systems. (I haven’t had much time to actually dig really deep in anything but the papers mentioned here (and then again) since the term began, so I’ll be shooting for a good overview of concepts that come back often in what I’ve heard so far.)
    • Razvan pointed me towards (Herbert Jaeger)’s notes from a course he took on ML, wherein are to be found relevant nuggets of dynamical systems knowledge