Weekly log: November 6, 2009
This is a “weekly report” for the lab I study in, mostly intended for other lab members. See the first one for further explanations.
Lab-related projects
- Building on the code for configuring a convolutional deep net that I wrote last week (and the code to load COIL-100), I implemented a network that I think matches the one described in Mobahi et al. 2009 (Deep learning from temporal coherence in video), up to a few details. Notably, the subsampling layers work differently: they have no weights or biases, just max pooling, as that’s what’s readily available.
- After having removed the most obvious bugs (made it run without errors, basically), I’ve tried running it, but training is pretty slow and my metrics (average of supervised training cost, average of the additional unsupervised cost, classification error) don’t seem to be showing signs of progress (after 15 epochs, whether or not I use the additional unsupervised cost).
- Yet my base code (the stack without the unsupervised cost) seemed to work with MNIST (I tested very briefly to see where the average cost was going; I think I’ll do a more thorough run).
- I need to dig deeper, tweak hyperparameters (notably learning rate — although I tried) and try other methods of “debugging” the network.
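For reference, here is a minimal NumPy sketch of the additional unsupervised cost, as I understand it from the Mobahi et al. paper: representations of consecutive video frames are pulled together (L1 distance), and representations of non-consecutive frames are pushed apart up to a margin. This is not the code of my actual network; the function name, the `margin` parameter and its default value are assumptions made for illustration.

```python
import numpy as np

def temporal_coherence_cost(z1, z2, consecutive, margin=1.0):
    """Unsupervised cost for one pair of frame representations.

    z1, z2      -- 1-D arrays, the representations of two frames
    consecutive -- True if the two frames are adjacent in the video
    margin      -- hypothetical hyperparameter (the value is an assumption)
    """
    dist = np.abs(z1 - z2).sum()  # L1 distance between the representations
    if consecutive:
        # Adjacent frames: penalize any difference between representations.
        return float(dist)
    # Non-adjacent frames: penalize only if they are closer than the margin.
    return float(max(0.0, margin - dist))

z_a = np.array([0.0, 1.0])
z_b = np.array([0.5, 1.0])
z_c = np.array([3.0, 1.0])

print(temporal_coherence_cost(z_a, z_b, consecutive=True))   # 0.5
print(temporal_coherence_cost(z_a, z_b, consecutive=False))  # 0.5
print(temporal_coherence_cost(z_a, z_c, consecutive=False))  # 0.0
```

Evaluating a cost like this by hand on a couple of pairs is one cheap way to check that the unsupervised term behaves as expected before looking at the rest of the network.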
Readings
My readings this week were entirely conditioned by my project. I wanted to have an idea of what had been done to build convolutional networks with RBM layers.
- Yann LeCun, Bernhard E. Boser, John S. Denker, Donnie Henderson, R. E. Howard, Wayne E. Hubbard, Lawrence D. Jackel: Handwritten Digit Recognition with a Back-Propagation Network. NIPS 1989: 396-404
- Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng: Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. ICML 2009: 77
- Guillaume Desjardins, Yoshua Bengio: Empirical evaluation of convolutional RBMs for vision. Technical Report, 2008
The Lee et al. paper made me realize that there’s much more to using a convolutional architecture with RBMs than simply removing/sharing weights, especially with regard to the fact that they’re generative. I guess it’d be possible to downsample non-probabilistically (instead of using their method) and train greedily/layer-wise anyway, but then we would lose the nice generative/“content-addressable” abilities it provides (?).
Course work
- I began work with my partner on a second assignment in the NLP course, in which we have to translate terms based on Wikipedia input, with some preprocessing already done for us.
Plan for next week (outside course work)
- Debug my network.
- If that works, then maybe test a few other variations on this setup and start working on the convolutional DAA part.
