Log for the last few weeks: November 27, 2009
This is a “weekly report” for the lab I study in, mostly intended for other lab members. See the first one for further explanations.
Lab-related projects
- Over the last few weeks, I’ve been quite absorbed in trying to get the embedding cost to work. I’ve tested numerous variations of the hyperparameters and of the optimization loop configuration, and I’ve tried different costs (the one they used is linear; I tried a quadratic one), etc.
- Through Yoshua, I asked the authors for help with the parameters, but even with their answers it still won’t work. This seems to point to a bug in my code, obviously, though I’ve reviewed that code quite a few times…
- Together with James, I’ve tried debugging an implementation I did for a toy problem, with a similar architecture. Even in such a simple case, we found the learning rate to be quite hard to tune correctly, and in no case did the additional cost help. That’s quite strange, as conceptually it should help.
- Still, James gave me interesting ideas to reuse when debugging neural nets in the future.
- One thing I might try when I get the time is to reimplement the toy problem in Torch. From a quick look at the docs, it seems quite easy to set up this kind of architecture/cost, so given how much time this is ending up taking, it might be worth a shot (to see whether the result is the same).
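The cost variations described above can be made concrete with a small sketch. It assumes the embedding cost is a margin-based pairwise cost (neighboring pairs pulled together, non-neighbors pushed apart up to a margin), and it reads "linear" as an L1 distance and "quadratic" as a squared Euclidean distance; the actual cost from the paper may well differ. All names (`pair_cost`, `pair_cost_grad`, `max_grad_error`) are hypothetical. The sketch also includes a finite-difference gradient check, a standard way to catch a wrongly derived gradient when a bug is suspected; it is not necessarily one of the ideas James suggested.

```python
import random


def pair_cost(zi, zj, is_neighbor, margin=1.0, quadratic=False):
    """Hypothetical margin-based pairwise embedding cost (assumed form).

    Neighbors are pulled together (cost = distance); non-neighbors are
    pushed apart until their distance reaches `margin` (hinge).
    quadratic=False: L1 distance ("linear"); True: squared Euclidean distance.
    """
    diff = [a - b for a, b in zip(zi, zj)]
    if quadratic:
        d = sum(x * x for x in diff)
    else:
        d = sum(abs(x) for x in diff)
    return d if is_neighbor else max(0.0, margin - d)


def pair_cost_grad(zi, zj, is_neighbor, margin=1.0, quadratic=False):
    """Hand-derived gradient of pair_cost with respect to zi."""
    diff = [a - b for a, b in zip(zi, zj)]
    if quadratic:
        d = sum(x * x for x in diff)
        dd = [2.0 * x for x in diff]  # d/dzi of sum(diff^2)
    else:
        d = sum(abs(x) for x in diff)
        dd = [float((x > 0) - (x < 0)) for x in diff]  # sign(diff): d/dzi of sum|diff|
    if is_neighbor:
        return dd
    if margin - d > 0:  # hinge active: cost = margin - d
        return [-g for g in dd]
    return [0.0] * len(dd)  # hinge inactive: zero gradient


def max_grad_error(zi, zj, is_neighbor, margin=1.0, quadratic=False, eps=1e-6):
    """Largest gap between the hand-derived gradient and a central finite difference."""
    analytic = pair_cost_grad(zi, zj, is_neighbor, margin, quadratic)
    worst = 0.0
    for k in range(len(zi)):
        plus, minus = list(zi), list(zi)
        plus[k] += eps
        minus[k] -= eps
        numeric = (pair_cost(plus, zj, is_neighbor, margin, quadratic)
                   - pair_cost(minus, zj, is_neighbor, margin, quadratic)) / (2 * eps)
        worst = max(worst, abs(numeric - analytic[k]))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    zi = [rng.uniform(-1, 1) for _ in range(5)]
    zj = [rng.uniform(-1, 1) for _ in range(5)]
    for quadratic in (False, True):
        for is_neighbor in (True, False):
            # margin=10 is large enough that the non-neighbor hinge is active for these points
            err = max_grad_error(zi, zj, is_neighbor, margin=10.0, quadratic=quadratic)
            print(f"quadratic={quadratic} neighbor={is_neighbor} max error={err:.2e}")
```

A gradient check only covers the hand-written gradient; if it passes for every variant, a remaining bug is more likely to sit in the optimization loop or in how pairs are sampled.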
Course work
- We’ve handed in our second assignment in the NLP class, the one where we had to translate terms using relevant Wikipedia pages.
Plan for next few weeks
- I’ll devote this weekend and most of next week to studying for my NLP exam and to the presentation/report for my ML project.
- If some time remains, I’ll try to get as far as I can in the last assignment for the ML class, to get this off the TODO list as soon as possible.
- After that, I’ll resume working on getting the embedding cost to work. I also need to get back to my readings, which have suffered over the last two weeks or so.

Hossein Mobahi:
I am sorry that bug is not resolved yet. Maybe you want to use a single perceptron instead of a CNN for the toy problem. The toy problem is linearly separable, so using a single perceptron neuron is enough and it should simplify monitoring the weights and hopefully point you to the problem.
28 November 2009, 3:33 am
Hossein Mobahi:
By linearly separable, I did not mean perfect separation… I meant that by minimizing the squared error you should end up getting a good solution.
28 November 2009, 7:42 am
Francois:
Hi Hossein, I’m surprised you found out about my blog so quickly!
The perceptron/really scaled-down problem is another good idea, and I think I’ll try it too when I get time to resume working on this.
There were two factors motivating the use of a CNN for the toy problem. The first is that I wasn’t really sure how far the architecture could be simplified before the embedding cost became irrelevant/inapplicable. (Yet if the approach is supposed to work at such a small scale, then why not? My goal is also to boil it down to its simplest elements, to understand what’s going on.)
(The second factor is that, in any case, I needed some toy problem to demonstrate that the approach works, for the term project this work is part of.)
Thanks again for the suggestions.
28 November 2009, 9:04 am
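Hossein’s suggestion of a single neuron trained by minimizing the squared error can be illustrated with a short sketch. Strictly speaking, gradient descent on the squared error of a linear unit is the LMS (Widrow-Hoff) rule rather than the classic perceptron update, which fits his point that the problem need not be perfectly separable. This is a minimal sketch under stated assumptions: the 2-D data generator, the 10% label noise, the learning rate, and all function names are hypothetical stand-ins, not the actual toy problem or CNN from the post. With only two weights and a bias, the per-epoch history is small enough to print and check by hand, which is the monitoring benefit he mentions.

```python
import random


def make_toy_data(n=200, noise=0.1, seed=0):
    """Hypothetical 2-D toy problem: the label is the side of the line x0 + x1 = 0,
    with a fraction `noise` of labels flipped so it is not perfectly separable."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        y = 1.0 if x[0] + x[1] > 0 else -1.0
        if rng.random() < noise:
            y = -y  # label noise
        data.append((x, y))
    return data


def train_linear_neuron(data, lr=0.05, epochs=50, seed=0):
    """One linear unit trained by SGD on 0.5 * (w.x + b - y)^2 (LMS / Widrow-Hoff rule).

    Returns the final weights, the bias, and a per-epoch history of
    (epoch, weights, bias, average squared error seen during the epoch).
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]
    b = 0.0
    order = list(range(len(data)))
    history = []
    for epoch in range(epochs):
        rng.shuffle(order)
        sq_err = 0.0
        for i in order:
            x, y = data[i]
            err = w[0] * x[0] + w[1] * x[1] + b - y  # residual of the linear output
            sq_err += 0.5 * err * err
            # gradient of 0.5 * err^2 is err * x for w and err for b
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
        history.append((epoch, list(w), b, sq_err / len(data)))
    return w, b, history


def accuracy(data, w, b):
    """Fraction of points whose label matches the sign of the linear output."""
    hits = sum((w[0] * x[0] + w[1] * x[1] + b > 0) == (y > 0) for x, y in data)
    return hits / len(data)


if __name__ == "__main__":
    data = make_toy_data()
    w, b, history = train_linear_neuron(data)
    for epoch, w_e, b_e, loss in history[::10]:
        print(f"epoch {epoch:2d}  w=({w_e[0]:+.3f}, {w_e[1]:+.3f})  b={b_e:+.3f}  loss={loss:.3f}")
    print("training accuracy:", accuracy(data, w, b))
```

Because the model has only three parameters, a learning rate that is too large shows up directly as oscillating or growing weights in the printed history.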