Small updates to Javascript speed reading app

Just a note concerning two small new features I added to my Javascript speed reading (RSVP) app:

  • You can now change the speed using your keyboard’s up/down arrow keys
  • Text and background colors may now be selected using a color picker (based on JsColor)

These were features some users asked for either on the blog or in the comment form in the app. Thanks for the feedback!

Simple Javascript memory game

Here’s a little memory game I just finished, using jQuery. It’s very bare bones, and I might add features to it, but it works, doesn’t have a bunch of ads floating around (like most do on the Web), and the board size can be changed (up to 60 total cards for the moment).

For context: when we launched Clusterify, one of the early projects I proposed was a simple “matching pairs” game. Some almost-complete code I wrote has been sitting on my computer ever since, needing only a few last fixes and the addition of actual pictures. So I made those last fixes, adapted stock photos for it, and now here’s the game.

Changelog

  • 2010.02.22: as per a commenter’s (Jebadiah’s) suggestion, added a score and a timer. Also, images are now shuffled so the last ones (cats and birds) show up in the smaller grid.

Tests with basic multilayer perceptron

I’ve recently begun an MSc in computer science at Université de Montréal. I’ll be working in the LISA lab, which focuses on machine learning. I’ll be producing weekly reports and using this blog as a conduit for them, as I did a few years ago for UPIR (see the first posts of the blog). Since these are reports, the audience I have in mind is people already acquainted with the machine learning concepts involved.

*  *  *

Over the summer I read a bit of the main reference book we’re using for our machine learning course (Pattern Recognition and Machine Learning, by Christopher Bishop), yet I didn’t actually implement anything. Yoshua Bengio, my thesis supervisor, therefore suggested I do some experiments, starting off with multilayer perceptrons (MLP) and backpropagation, a very common approach in machine learning.

I’ve implemented the basics of the algorithm described in chapter 5 of Bishop for a two-layer (one hidden layer) MLP. Everything was pretty straightforward, except perhaps for the handling of bias weights. I’ve been using Python, Numpy and Matplotlib, which are used in the lab and courses here.
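To make the setup concrete, here is a minimal sketch of a one-hidden-layer MLP trained by backpropagation on a sine wave. It is an illustration under assumptions, not the code from these experiments: it uses plain Python rather than Numpy so that it runs on its own, and the function name train_sine_mlp, the default values, the input range of -pi to pi, and the per-sample (online) updates are all choices made for this example.

```python
import math
import random


def train_sine_mlp(num_hidden=10, learning_rate=0.01, epochs=200,
                   init_std=0.5, seed=0):
    """Train a 1-input, 1-output MLP (tanh hidden units, linear output).

    Returns the summed squared error recorded during each epoch.
    All names and defaults here are assumptions made for illustration.
    """
    rng = random.Random(seed)
    # 50 points covering one period of the sine wave.
    xs = [-math.pi + 2 * math.pi * i / 49 for i in range(50)]
    data = [(x, math.sin(x)) for x in xs]

    # Biases are kept as separate parameters next to the weights.
    w_in = [rng.gauss(0, init_std) for _ in range(num_hidden)]
    b_in = [rng.gauss(0, init_std) for _ in range(num_hidden)]
    w_out = [rng.gauss(0, init_std) for _ in range(num_hidden)]
    b_out = 0.0

    history = []
    for _ in range(epochs):
        total_error = 0.0
        for x, target in data:
            # Forward pass.
            z = [math.tanh(w * x + b) for w, b in zip(w_in, b_in)]
            y = sum(w * zj for w, zj in zip(w_out, z)) + b_out

            # Output error for the loss 0.5 * (y - target)^2.
            delta_out = y - target
            total_error += 0.5 * delta_out ** 2

            # Backward pass: tanh'(a) = 1 - tanh(a)^2.
            for j in range(num_hidden):
                delta_hidden = (1 - z[j] ** 2) * w_out[j] * delta_out
                w_out[j] -= learning_rate * delta_out * z[j]
                w_in[j] -= learning_rate * delta_hidden * x
                b_in[j] -= learning_rate * delta_hidden
            b_out -= learning_rate * delta_out
        history.append(total_error)
    return history
```

Calling train_sine_mlp() returns one error value per epoch, which is enough to check that the error goes down with training.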

I first trained the network to reproduce a sine wave over about one period, a regression problem. The network therefore had one input and one output. As Yoshua predicted, performance was pretty poor at first, even though I could see it somehow worked (the error went down with training), because some hyperparameters needed tweaking, notably:

  • the number of hidden units,
  • the learning rate,
  • the number of training steps,
  • the type of activation function to use for hidden units,
  • distribution for initial random network weights

At first I tweaked the hyperparameters by hand, but I quickly realized this would take eons, as performance for a given set of hyperparameters varied from one training run to the next. So I wrote a class named HyperparameterVariator which, in conjunction with a simple loop, generated new sets of random hyperparameters given “Hyperparameters parameters” objects (e.g. the NUM_HIDDENS hyperparameter can vary from 5 to 30 etc.). It ran for about 40 minutes; I made it generate an HTML report with graphs to be able to see what predictions looked like for a given set of hyperparameters.
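The idea can be sketched roughly as follows. This is not the actual HyperparameterVariator class: the RANGES table, the function names, and every range except NUM_HIDDENS (5 to 30) are hypothetical, and evaluate stands for whatever function trains a network once and returns its error.

```python
import random

# Hypothetical search ranges; only the NUM_HIDDENS range comes from the post.
RANGES = {
    "LEARNING_RATE_NEG_EXPONENT": ("float", 1.0, 4.0),
    "TRAINING_STEPS": ("int", 100, 500),
    "NUM_HIDDENS": ("int", 5, 30),
    "WEIGHTS_ORIGINAL_DISTR_STD": ("float", 0.1, 1.0),
    "ACTIVATION_FUNCTION": ("choice", ["TANH", "LOGISTIC"]),
}


def sample_hyperparameters(ranges, rng, fixed=None):
    """Draw one random set; names listed in `fixed` keep their given value."""
    fixed = fixed or {}
    sample = {}
    for name, spec in ranges.items():
        if name in fixed:
            sample[name] = fixed[name]
        elif spec[0] == "int":
            sample[name] = rng.randint(spec[1], spec[2])
        elif spec[0] == "float":
            sample[name] = rng.uniform(spec[1], spec[2])
        else:
            sample[name] = rng.choice(spec[1])
    return sample


def random_search(evaluate, ranges, num_sets=30, runs_per_set=5, seed=0):
    """Try `num_sets` random sets, averaging the error over several runs."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_sets):
        params = sample_hyperparameters(ranges, rng)
        # Average over several runs because a single training run is noisy.
        mean_error = sum(evaluate(params) for _ in range(runs_per_set)) / runs_per_set
        if best is None or mean_error < best[0]:
            best = (mean_error, params)
    return best
```

The fixed argument is there so that a single hyperparameter can be randomized while the others keep known values.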

I ran the loop 30 times, each time trying a set of hyperparameters for 5 training runs. I was lucky: I found a set of hyperparameters which consistently produced very little error on regression for the sine wave. Here’s the curve (non probabilistic output) for the best set of weights generated, for example (blue is the output, green is the original wave):

Output vs original sine wave

Of course, for other hyperparameters, there were hundreds of curves which bore an extremely dubious resemblance to the original sine wave! At least I had one set which gave good results:

LEARNING_RATE_NEG_EXPONENT: 2.86065622478
TRAINING_STEPS: 349.0
NUM_HIDDENS: 22.0
WEIGHTS_ORIGINAL_DISTR_STD: 0.9608633875
ACTIVATION_FUNCTION: TANH

I therefore modified HyperparameterVariator to allow for randomization of one parameter at a time, others keeping the “safe” value found above. Yet, even after running this a few times, the parameters above remained the best. Probably means I was lucky in that first totally random run!

The sine problem having produced some satisfying results, I moved on to a classification problem. Pierre-Antoine Manzagol, another grad student in the lab, suggested I use the Iris dataset, which is a classic in the field.

This problem was less straightforward, since I needed to choose how to represent the output. I chose a 1-in-k scheme (e.g. (0,0,1)), with 3 outputs activated by a logistic function (with the corresponding gradient). After training, I simply chose the maximum output as the winning one.
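For readers unfamiliar with the encoding, here is a small sketch of 1-in-k targets and of picking the winning output. The helper names one_hot and predict_class are made up for this example.

```python
def one_hot(class_index, num_classes):
    """1-in-k target vector: 1.0 at the class position, 0.0 elsewhere."""
    return [1.0 if k == class_index else 0.0 for k in range(num_classes)]


def predict_class(outputs):
    """Index of the largest network output, taken as the predicted class."""
    return max(range(len(outputs)), key=lambda k: outputs[k])
```

So the third of three classes is encoded as one_hot(2, 3), and outputs such as (0.1, 0.7, 0.6) are read as a vote for the second class.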

Once again, performance varied a lot during training depending on the hyperparameters, but here too I was able to reduce the error count using HyperparameterVariator. In the end it ranged from 4 to ~25 errors on the training set of 150 data points.

Yet I hit a problem with confidence: too often the next-best choice of class was too close in probability (say 0.7 and 0.6). I tried normalizing the inputs, but although it made confidence a bit better, it also increased my error count (I didn’t try reoptimizing hyperparameters, though).

I asked Pierre-Antoine for some hints, and he suggested I use the softmax activation function, saying it’s very common to do this for classification problems. It makes a lot of sense since it involves the notion of maximum output in the training itself. That’s what I’ll be trying next.
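For reference, a minimal sketch of the softmax function in plain Python (the function name is mine, and this is the textbook definition, not code from the experiments). Unlike independent logistic outputs, the results always sum to 1, so two classes cannot both get a high probability.

```python
import math


def softmax(activations):
    """Turn raw output activations into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow and does not change the result.
    shift = max(activations)
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]
```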

Plyn: Yet another textfile-and-scripts based ToDo system

Warning: this post is

  1. out of the normal scope of this blog (it’s about personal _information_ management, not personal _knowledge_ management).
  2. mostly for geeks/programmers who will never be fully satisfied by any planning system, ever.

Over the years I’ve tried different ways of handling my ToDo, planning and work logging. This is my Xth iteration. I wonder if anyone but a programmer could use this, but hey, programmers are a non negligible fraction of society (which I happen to be part of)!

I’ve long wanted to create a program which would do precisely what I want in terms of planning, but was always put off by the “*knocks head on wall* the GUI is so long to code!” aspect. Well to hell with the GUI! Let’s deal with raw information, rarrr.

Err, sooo… Plyn (i.e. this system) is inspired by the todo.txt scripts of Gina Trapani (LifeHacker author). Basically it allows you to have a very simple yet powerful todo.txt file, and the file is meant to be read directly (in contrast to other programs which use databases only the program can read). The difference is that my version:

  1. is written in Python (in contrast to Bash for Gina’s todo.txt)
  2. allows for hierarchy, empty lines, comment lines, etc. in the file, so the file can really be structured and read by itself, and a good deal of everyday tasks can be done without ever using the scripts
  3. includes a work log aspect, i.e. you can record how much time you spent on tasks to keep stats.
  4. includes time estimates, but for the moment it’s not very developed.

So I’d say it trades some of the original’s simplicity for being expandable and supporting other dimensions of planning and logging.

Google Code link for the project & code: http://plyn.googlecode.com

The todo.txt format is pretty simple. Here’s an example of content:

12 Elephant in refrigerator project ||| Yeah, I shouldn't try myself at humor.
	# Open refrigerator door
	# Put elephant in refrigerator
	# Close refrigerator door

	-- This line is just a comment

A few observations:

  • You see a task may be nested in another one (which you can see as a project), simply using tabs.
  • Each line begins either with an ID (number) or with #. The # is replaced by a proper ID by cleanchanges.py (more on this later).
  • The ID is followed by a title, then |||, which indicates the start of parameters/comments.
  • You can have blank lines, and comment lines (starting with --).

After the ‘|||’ characters, you can place different parameters. In more detail, the format of a line is:

(INDENT) ID TITLE ||| {PRIORITY} <MINUTES_DONE/MINUTES_TODO> [START_DATETIME-END_DATETIME] COMMENTS

As you can see, many more options may be specified (see the “format.txt” file for detailed information about each of these parameters), and of course this can be expanded (it all relies on a huge regex). But everything following ||| is optional.
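To give an idea of how such a line can be parsed, here is a simplified sketch. It is not the regex Plyn actually uses (see format.txt and the code for that): the pattern, the group names, and the parse_line helper are assumptions, and the contents of each field are captured as raw text without validation.

```python
import re

# Simplified, hypothetical pattern for:
# (INDENT) ID TITLE ||| {PRIORITY} <DONE/TODO> [DATES] COMMENTS
LINE_RE = re.compile(
    r"^(?P<indent>\t*)"                            # nesting level, one tab each
    r"(?P<id>\d+|#)\s+"                            # numeric ID, or # if not assigned yet
    r"(?P<title>.*?)"                              # title, up to ||| or end of line
    r"(?:\s*\|\|\|\s*"                             # everything after ||| is optional
    r"(?:\{(?P<priority>[^}]*)\}\s*)?"
    r"(?:<(?P<done>\d*)/(?P<todo>\d*)>\s*)?"
    r"(?:\[(?P<dates>[^\]]*)\]\s*)?"
    r"(?P<comments>.*))?$"
)


def parse_line(line):
    """Return a dict of fields, or None for blank and comment lines."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

Fields that are absent come back as None, and lines that start with neither an ID nor # (blank lines, -- comments) are simply skipped.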

So you can edit the todo.txt file manually, but there are, of course, helper scripts to automate certain tasks. The one you’d use the most is today.py. It gives you a list of all high-priority tasks, late/coming up tasks, and tasks awaiting feedback (“+feedback” tag in title). By editing the script you could add whatever other list you need.

You can also, of course, filter tasks by text using grep. So you could have tags or contexts, for example, if you’re into GTD.

The cleanchanges.py script will replace the # at the beginning of the line by an ID which can then be used to refer to the ToDo item in other scripts. cleanchanges.py will also transform dates, so you can write:

-- Today is 2009/03/14
# Clean refrigerator ||| [-+15]

and the item will be changed into

15 Clean refrigerator ||| [-2009/03/29]

i.e. the date can be specified as the number of days in the future, which saves finger mana.
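The date transformation itself is simple. Here is a sketch of the idea, not the actual cleanchanges.py code: the function name expand_relative_dates is hypothetical, and it assumes every +N inside the date field means "N days from today".

```python
import re
from datetime import date, timedelta

# Matches a relative offset such as +15 inside the [...] date field.
OFFSET_RE = re.compile(r"\+(\d+)")


def expand_relative_dates(field, today):
    """Replace each +N with the date N days after `today`, as YYYY/MM/DD."""
    def replace(match):
        target = today + timedelta(days=int(match.group(1)))
        return target.strftime("%Y/%m/%d")
    return OFFSET_RE.sub(replace, field)


print(expand_relative_dates("-+15", date(2009, 3, 14)))  # prints -2009/03/29
```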

The work log is also simple. To say you’ve just spent the last 3 hours cleaning the refrigerator, you would do:

./log.py 15 180 "Some comment to add to the log"

(where 15 is the task ID and 180 is 3 hours expressed in minutes). This will add a line to the log.txt file, and will change the MINUTES_DONE field of the item in todo.txt.

Scripts are meant to be called from a command line you keep open somewhere in the scripts directory, so you can use autocompletion. Paths for todo.txt and other files are configured in cfg.py.

And, of course, the whole thing can be extended as you please. My ultimate goal is to have a script with which I can truly estimate the free time I have, i.e. to determine if I can engage in a new task or not.

If anyone ever uses this, be sure to let me know! I’m especially interested in hearing of other must-script-the-procrastination-away coders who expand this thing in whichever direction their urges take them.

Altering a Django project to migrate from a simple ManyToMany relation to one with extra information

There are still quite a few features which could be useful on Clusterify. One I thought could help in structuring projects is the concept of Roles: a way to let users joining a project specify which role they want to take in it.

Now, most common tasks are made amazingly easy with Django. Changing models, while probably common, seems very tricky to automate right, though (and anyway would you trust the automation logic to make the right changes?). Yet there’s dmigrations which I need to try one day.

Back to the original problem: Roles. Formerly we had a ManyToManyField in the Project model linked to the User model for users joining a project. In the background, Django creates a table to hold the relationship named after the field (projects_project_joined_users).

The way to add extra information to a relationship, in the Django ORM, is with the “through” argument to the ManyToManyField, which lets you specify a model which will hold the relationship. In this case, we’d get:

class Membership(models.Model):
    user = models.ForeignKey(User)
    project = models.ForeignKey(Project)
    role = models.CharField(max_length=120)
    approved = models.BooleanField(default=False)

and in Project we now add the field:

    members = models.ManyToManyField(User, through='Membership')

This would create a table named projects_membership. It still doesn’t hold any data, though. Now I see two ways of making this change in the DB:

  • Renaming or copying the old table holding the relationship, and ALTERing it until it looks like the new one.
  • Creating the new table (by looking at the output of sqlall) and filling it by copying over the data with a mapping of fields.

I chose the latter, which seemed more straightforward (and anyway I have other manual copying operations to perform). There’s an easy way to copy over the data, the INSERT … SELECT syntax in SQL. The old table was named projects_project_joined_users, so the syntax becomes:

INSERT INTO projects_membership (user_id, project_id, role, approved)
    SELECT projects_project_joined_users.user_id, projects_project_joined_users.project_id, '', 1
    FROM projects_project_joined_users;

Running this copied over the old data, and now the code may be updated. Note that I had first created the new table by checking the output of “sqlall” and running the relevant statements (CREATE TABLE, ALTERs for foreign key constraints, and index creation).
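If you want to try the copy on throwaway data before touching the real database, here is a small self-contained sketch using Python’s sqlite3 module. The table definitions are simplified guesses rather than the real sqlall output, the sample rows are made up, and the function name copy_memberships is hypothetical.

```python
import sqlite3


def copy_memberships(conn):
    """Copy the old join-table rows into the new table with default values."""
    conn.execute("""
        INSERT INTO projects_membership (user_id, project_id, role, approved)
        SELECT user_id, project_id, '', 1
        FROM projects_project_joined_users
    """)


conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the old and new tables (assumed column layout).
conn.execute("""CREATE TABLE projects_project_joined_users (
    id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL, user_id INTEGER NOT NULL)""")
conn.execute("""CREATE TABLE projects_membership (
    id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, project_id INTEGER NOT NULL,
    role VARCHAR(120) NOT NULL, approved BOOL NOT NULL)""")
# Made-up sample data: (project_id, user_id) pairs.
conn.executemany(
    "INSERT INTO projects_project_joined_users (project_id, user_id) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10)],
)

copy_memberships(conn)
rows = conn.execute(
    "SELECT user_id, project_id, role, approved FROM projects_membership "
    "ORDER BY project_id, user_id"
).fetchall()
```

Every old row ends up in the new table with an empty role and approved set to 1, which is what the statement above does on the real tables.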

It’s now more complicated to handle the relationship this way, though, as one can’t use add() or remove() as before. Take a look at the doc for more info on this.