Diff revision: specific benefits (personal examples)

As a follow-up to the post “Diff revision: diff-based revision of text notes, using spaced repetition”, and as suggested by gwern in the comments, I’m adding a list of very specific cases where diff revision has been useful to me. Warning: some of these examples are super-specific, programming-oriented. But they make good examples. Also, in some cases flashcards could have worked too, as of course there’s some overlap between the two memorization methods.

In all cases, keep in mind it’s impossible to say with 100% certainty what I would have remembered without the repetition, but these cases are examples where I’m pretty sure I’d have forgotten.

  • An example dear to my heart: neuroscience was a side subject I wanted to study for a while, at least some basics.
    • In the past (~2003) I tried reading the first few chapters out of Purves’ Neuroscience textbook, but forgot almost all of what I had read after beginning my baccalaureate (2004), for lack of time to read further. This was actually the main frustration that sparked the development for this tool.
    • Thanks to that system (I think), I now remember most of what I wanted to get out of these chapters (which I had to re-read a few years later when I started my master’s degree, ~2009), without having spent too much time encoding notes (I started with flashcards, then quickly realized this was way too much work for what I wanted).
  • There are some programming concepts, functions, or bash and vim commands, etc. I remember, or started using, thanks to this.
    • It’s very easy to reinvent the wheel, or do things the wrong way, if you don’t remember some specific things exist. It’s easy to know you need a screwdriver when you see a screw, but it’s not immediately obvious you need a screw anchor if you don’t know about them (yeah why not simply screw into the wall? you’ll know when things fall apart weeks later)
    • Very specific examples:
      • some much less frequently used vi commands/features, e.g. mode lines
      • very specific: remembered to use “nohup” to start background processes because I reviewed it, otherwise I would probably have ended with something more hackish (using a GNU screen)
      • extended attributes for files, notably ACLs, specifically about 4 months ago even though I hadn’t used them in ~3 years I remembered about specific details of them (exact command) and could propose them as a solution quickly
    • Also, it helps getting into the habit of using a command.
      • E.g. if I read that I can use “sudo !!” to redo the last command as super-user, by taking note of it, and reviewing it, I think I’m more likely to start using it, and then remembering comes even more easily, from using it.
      • That’s how I started using the “#”, “*” and “Ctrl+P” shortcuts in vim, as a specific example.
  • Of course there are higher-level notions I remember in more detail thanks to this. This is not as directly useful, but it comes in handy when discussing software design ideas away from the computer. Yes, you can do a lookup on Google, but often that’s just not quick enough when brainstorming with others.
    • General example: recently, I learned lots of high-scalability architecture concepts, such as details of Amazon Dynamo or Hadoop
      • to give a very specific example: the way data is stored per version in Google BigTable.
  • I have a much better idea of where to store (take note of) something new, and what older entries to link with completely new entries/files, and where I might find something I’m looking for. It complements wiki search, but it’s better since I have a better mental model of my notes, which also becomes a working mental model of my knowledge (I wrote about that in a previous blog post, see bit on “Mirror of your knowledge”).
    • Prior to using that system, I tended to forget where I had put something, and would create duplicates. I have ~1700 wiki entries, and I often won’t add anything for months at a time in a given entry. It’s hard, then, to think of the right place to put something if
  • I feel much more confident that I won’t forget something when I write it down in my wiki, and that I’m not wasting my time.
    • Yes one could try to review things by hand, or rely on re-reading when editing their notes, but this formalizes the review with proper intervals. No need to worry about re-reading.

Diff revision: diff-based revision of text notes, using spaced repetition

A few years ago I started getting frustrated with forgetting material I studied for side projects and interests. I knew of flashcard spaced repetition, but it didn’t really suit my need. So I built a tool to help review text notes, to better remember their content. It serves many goals, notably studying a large number of subjects over longer periods of time, by taking advantage of text notes one would normally write anyway. The core idea is to review differences between versions of note files using spaced repetition intervals. I’ve been using this for about 3 years myself.

Warning: the concept is very general, and how you actually use the tool may change your experience entirely. (See “usage tips”)

Here’s a presentation (as a video) showing a typical use of the tool. (Note that “drev” and “diffrevision.py” are both aliases for “python path/to/diffrevision.py” on my machine)

What it is

Here’s the typical usage scenario:

  • Say you write notes in a bunch of text (.txt) files in a directory, or in a text-based wiki program (WikidPad, vimwiki, orgmode, etc.)
  • Every time you make a “significant” change in some files, you run the tool’s “add” and “diff” commands and it will know which files are new, which parts were modified in existing files, and which parts didn’t change.
  • You then review the parts that changed today in 2 days, then 6 days, then even further in the future, etc.
    • Each time, you can go longer before the next review is needed. That’s the spaced repetition idea, here applied to the differences in notes.
    • The idea is to highlight (in color) the parts that changed. You still see the whole text, including the old parts, when you review the new parts.
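The fixed-interval schedule described above can be sketched in a few lines of Python. This is only an illustration, not the tool’s actual code: the 2- and 6-day intervals come from the description, while the later values (16, 40, 100 days), the choice to count every interval from the day of the change, and the name `review_dates` are assumptions made up for the example.

```python
from datetime import date, timedelta

# Days after the change at which a diff comes up for review.
# 2 and 6 come from the description; the later values are illustrative guesses.
INTERVALS_DAYS = [2, 6, 16, 40, 100]


def review_dates(changed_on, intervals=INTERVALS_DAYS):
    """Return the dates on which a diff recorded on `changed_on` is reviewed.

    Assumption: each interval is counted from the day of the change.
    """
    return [changed_on + timedelta(days=d) for d in intervals]


if __name__ == "__main__":
    for review_day in review_dates(date(2011, 1, 1)):
        print(review_day.isoformat())
```

Since there is no grading, the schedule depends only on the date of the change, which is why a plain list of offsets is enough here.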

I made a command-line tool to support doing this. It’s a rewrite of something I had been using for two years, and I’ve been using the rewritten version for about a year.

Warning to spaced repetition fans:

  • there’s no quantification for the moment, no “grading yourself” from, e.g., 0 to 5 upon review. Intervals are fixed. I’m looking at ways of adding this, but it’s not obvious to do right.
  • there’s no “hiding the answer” (as with flashcards, or with cloze deletion) either. Again it might be an interesting feature to add.

Trying the tool

The tool is command-line based. The review is done in a web browser, by producing a set of HTML files, one for every “diff” to review. It works with plain text files, and if anything changed on a line, the whole line is highlighted when reviewing.
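To give an idea of what “the whole line is highlighted” can look like in code, here is a minimal sketch using Python’s standard `difflib` and `html` modules. It is not the tool’s implementation (the tool targets Python 2.x, this sketch is Python 3), and the function name `render_review` and the `changed` CSS class are hypothetical.

```python
import difflib
import html


def render_review(old_text, new_text):
    """Return an HTML fragment of the new text with changed lines highlighted.

    Unchanged lines are kept as context; deleted lines are simply not shown.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    out = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        for line in new_lines[j1:j2]:
            escaped = html.escape(line)
            if tag == "equal":
                out.append("<div>%s</div>" % escaped)
            else:
                # 'replace' or 'insert': the line is new or was modified
                out.append('<div class="changed">%s</div>' % escaped)
    return "\n".join(out)


if __name__ == "__main__":
    print(render_review("a\nb\nc", "a\nB & b\nc\nd"))
```

A stylesheet rule such as `.changed { background: yellow; }` (again, just an example) would then provide the color.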

The code is on GitHub, or here’s a .zip of a hopefully stable version. See below for usage instructions, or just run the program without arguments (“python diffrevision.py”). It’s Python (no dependency on anything but core Python packages), so you’ll need the Python interpreter. I know it to work on Ubuntu Linux with Python 2.6 and 2.7, and I tested it partially under Cygwin (Windows, Python 2.5).

By the way I have other versions written, notably a web app version, but I have a feeling that this will appeal mostly to programmers or power users, so I’m releasing the open source version. To the business-minded readers who think one could charge for this: first read this, this and this, on the marketing of spaced repetition products.



Unix ‘diff’

The Unix ‘diff’ tool is a program which takes as input two files, and tells you what changed from one to the other. This is used extensively by programmers to know what parts of a source code changed from one version to the next.
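For readers who have never seen a diff, here is a small illustration using Python’s standard `difflib` module, which produces output in the same “unified” format as `diff -u`. The helper name `unified` and the sample note lines are made up for the example.

```python
import difflib


def unified(old_lines, new_lines):
    """Return the unified diff between two lists of lines, as a list of strings."""
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="old", tofile="new", lineterm=""
        )
    )


if __name__ == "__main__":
    old = ["First note.", "Second note."]
    new = ["First note.", "Second note, reworded.", "Third note."]
    # Lines starting with '-' were removed, lines starting with '+' were added.
    print("\n".join(unified(old, new)))
```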

Spaced repetition (with flashcards)

Spaced repetition programs such as Supermemo, Anki and Mnemosyne help you remember facts. I’ve described them in more detail here. But in essence you create a question and answer flashcard, say Q: what does DNA stand for? A: Deoxyribonucleic acid. The program will ask you to review it in a day or two. Then, depending on how well you remembered it, you give yourself a score. That score is then used to adjust the next interval, so maybe the next review will be in a week. This repeats itself, each time stretching the interval more and more, as you remember better every time (hopefully).
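To make the “score adjusts the next interval” idea concrete, here is a deliberately simplified sketch. It is not the algorithm used by Supermemo, Anki or Mnemosyne; the 0 to 5 scale matches the grading mentioned earlier, but the multipliers, the reset-to-one-day rule and the name `next_interval` are all assumptions chosen for illustration.

```python
# Hypothetical multipliers: the better the recall, the more the interval grows.
GROWTH_FACTORS = {3: 1.5, 4: 2.0, 5: 2.5}


def next_interval(current_days, score):
    """Return the next review interval in days for an integer score from 0 to 5."""
    if score not in range(6):
        raise ValueError("score must be an integer between 0 and 5")
    if score < 3:
        return 1  # poorly remembered: start over with a short interval
    grown = round(current_days * GROWTH_FACTORS[score])
    # Always move forward by at least one day after a successful review.
    return max(current_days + 1, grown)


if __name__ == "__main__":
    interval = 2
    for score in (4, 5, 2):
        interval = next_interval(interval, score)
        print(score, interval)
```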

The benefits (why use this?)

Why would you want to do this kind of review, in addition to, or instead of, using a flashcard spaced repetition program? Succinctly, and cutting corners:

  • Putting knowledge in the form of flashcards is a lot of work (a lot of time for a relatively small quantity of knowledge).
  • Also, the result is not an easily “browsable”, readable knowledge store.
  • On the other hand, taking continuous-text notes is very natural, easily done, and helps making ideas clearer.
    • And you’re free to organize pieces of knowledge the way you want.
    • Also, personally, I just do it anyway, whether I review or not. In itself it helps understanding, and remembering (generation effect).
  • But, contrary to flashcards, reviewing text that changes over time as we learn more on a topic, is a challenge: which parts should be reviewed, if old parts are well remembered, but novel bits, less familiar, are here and there all over the document?
    • Hence reviewing the changes helps being efficient in your review of what is also a useful reference text.

Also, sometimes you don’t mind forgetting some parts. You just want the main ideas to be “fresh in your memory”. I find this form of review especially well suited for this.

Going further, sometimes you simply don’t want to forget something even exists, or that you have notes somewhere on X or Y topic. I’m often confounded when someone tells me we discussed this or that and I’ve absolutely no recollection of the episode… I’m sure it also happens with some things I read, but I simply have no one to remind me of those.

Another benefit is that you can review “in context”: you see the whole text, with new parts being highlighted. So you see the old parts as well, the “context”.

Using the tool

I programmed a command-line utility in Python to support this. It builds a database of versions of note files.

In essence, you first configure the program in “core/config.py” (see explanations at the top of this file). You then write your notes in text files in a single directory (hierarchy could in principle be supported, but it’s not done yet). Say you’ve added your first two .txt files. You then run

python diffrevision.py show_new

and you’ll see a list of the new files in the directory of your notes (specified in the config). You then add them with:

python diffrevision.py add

which will simply add them to the list of files to be watched for changes.

When you’ve added new files or you’ve made some changes and want to construct reviews from differences, you run:

python diffrevision.py diff

This will look for changes in files that the system is tracking, and record new versions for files that changed since the last time that command was run. There are also commands to see new files, add them for tracking changes, or ignore some files.
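As a rough idea of how such a check can be made cheap, here is a sketch that compares file modification times against the times recorded at the previous run. It is not taken from diffrevision.py: the function names and the `last_seen` dictionary are hypothetical, and a real implementation would also need to store the new versions.

```python
import os


def new_files(notes_dir, last_seen):
    """Return the .txt files in `notes_dir` that are not tracked yet."""
    return sorted(
        name
        for name in os.listdir(notes_dir)
        if name.endswith(".txt") and name not in last_seen
    )


def changed_files(notes_dir, last_seen):
    """Return tracked files modified since their recorded modification time.

    `last_seen` maps a file name to the modification time stored at the last run.
    """
    changed = []
    for name in sorted(last_seen):
        path = os.path.join(notes_dir, name)
        if os.path.exists(path) and os.path.getmtime(path) > last_seen[name]:
            changed.append(name)
    return changed
```

Only files reported by `changed_files` would then need an actual line-by-line comparison, which keeps the common “nothing changed” case fast.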

When you want to review changes that are to be reviewed at this point in time, you use:

python diffrevision.py today

This will create a temporary directory and populate it with one HTML file for each “diff” to be reviewed that day. You then load the “index.html” file in a Web browser, and open tabs for each file. When finished, you run

python diffrevision.py finished 1234 2345 3456 …

where “1234 2345 3456” are the space-separated IDs of each of the diffs. To simplify this, the command is given to you in “index.html” based on the reviews (diffs) you clicked on.

Suggestion for Linux users: in your $HOME/.bashrc, write an alias such as “alias drev='python path/to/diffrevision.py'” to make it easier to run the program.


  • This is limited to plain text files, with almost no support for formatting.
    • Formatting of notes only supports WikidPad syntax for the moment. Writing another formatter is a bit complicated, but doable provided you’re a Python programmer and you have well-isolated Python code to export wiki syntax to HTML. See wikidpad_formatter.py and surrounding files for an example.
  • Don’t use non-ASCII characters (accents etc.) in filenames; at first it’ll smile at you and things will work, but when you least expect it it’ll stab you in the back. For the content of the files, just configure the character encoding properly in configuration.py.

Usage tips

As said in the intro, how you use the tool may change your experience from “bleh” to “how did I ever live without this?” (that’s my case anyway).

  • The most important point: make every change small, 5-10 lines or less before you run “diff”.
    • Try to make changes be coherent, a single “idea”. Otherwise I find myself botching the review.
    • The tool is made for “diff” to be fast to run (checking file modification time), to allow doing exactly that.
  • Use with a personal wiki that stores notes in plain text (WikidPad, vimwiki, orgmode…). I wrote about why this is a good idea.
  • Don’t delay review too much, as reviews accumulate fast, and it can get daunting. Ideally it’s done every day, just like with other spaced repetition programs.


Main use cases (who might benefit from this?)

With time, I might write more use cases here. See also my follow-up post on very specific personal benefits I see with this. The main use cases I see are:

  • Studying a large diversity of topics over an extended period of time. This is typical of knowledge work, such as programming. To take an example:
    • Say a programming language is of interest to me, but I hear about it piece by piece (news about Erlang, say)
    • At first I’ll read about it on Wikipedia, take 10 minutes or so and write down what’s different about it in some bullet points (I like bullet points, if you hadn’t figured that out 😛 ).
    • That’ll be in a file named “Programming — Erlang.txt”
    • But then a month later I might see an interesting article about a project that benefited from it
    • I’ll add notes about this to the .txt file.
    • But in the meantime if I hadn’t reviewed my notes, as I’m pretty busy and studying Erlang is low in my priorities, I might very well have forgotten.
  • Studying topics that change over time, such as the state of the world, or any academic field.
    • For topics such as politics, I’ll usually have a “trends” and “news” sections to my pages.
  • Generally having a better mental model of the structure of your notes (no need to review the details if this is the only goal)

Cases where flashcards are better

This point is important: this is not a flashcard-review replacement. For factual information you want to remember very precisely, flashcards are much better. This includes vocabulary review, dates, precise technical terms, etc. I’ll probably expand on this list at some point.

The other very important point is that there’s no way of scoring your performance on a review, and even if I wanted to implement this, it’s just not clear what would be the best way to do it. The absence of score makes it hard to measure/quantify things such as retention rate, forgetting curves, etc. There’s a lot of room for improvement on scheduling, and it’s the next feature I want to add (quantification and better scheduling).

About “memorization”

Memorizing is a very, er, touchy subject. When someone comes up with the topic, many people go up in arms with “but memorizing is useless; creativity, usage and comprehension are much more important!” etc. Yet I still think that for some types of tasks memorizing, to varying degrees of precision, is useful. Hence this tool.

I’ll later present the way I roughly divide studying tasks and the memorization/study method I think appropriate for each category. I’m no pedagogy or spaced repetition expert, by the way, so all this is just the opinions of a programmer.

Personal history with this

I’ve been using this for 3 years, since the start of my master’s degree (which I finished a year ago). I’ve accumulated some (qualitative) experience with the process, some of which I’ll gradually share here.

I have no proof that this works (hard data, graphs). My qualitative experience is that it does help me remember better. To be specific I’ve written a follow-up post on some benefits I’ve seen personally, using very (very) specific examples, as suggested by gwern in the comments.

I hope this may be useful to others. I won’t have much time to work on improving the tool, but as I use it every day, my natural itch-scratch cycle will probably lead to some new features. Also, feel free to improve on the tool. If your extensions are clean, useful and modular (i.e. they do not change behavior unless a user’s config specifies it) I’ll gladly merge them with the main Git branch.

Wishlist / TODO for the tool

  • Support for telling the program how well you remembered each diff, like with spaced repetition systems: grading yourself from 1 to 5 and adjusting the interval accordingly. Also, that will help experimenting and keeping statistics. However it’s not as simple as with flashcards, as there may be more than one single piece of information in a review, and you may remember some parts better than others. I’m open to ideas here.
  • No support for file hierarchy for the moment.
  • Adding other formatters would be very useful, notably for vimwiki, ReStructured Text and markdown.
  • For the moment, each version is stored in its entirety, which will sound stupid to anyone knowing anything about version control. But it was faster/simpler. The Storage object is well isolated: you want to make it more efficient, have a go at it 😛

There are many more tasks to be done, of course, but these are the main features I’d like to add.

Google Reader feed maker

UPDATE July 29, 2011: Google seems to have disabled this feature (a few months after it came out, actually). In any case, I find locally-installed extensions such as Firefox SiteDelta to be more reliable than online tools for this task, as you control the verification schedule (e.g. I tried Page2RSS, but never got any updates in the provided feed).

Just a quick post to underline a Google Reader feature which, though simple, may come in handy: creating a feed for “feed-less” sites. It basically tracks updates on pages by periodically checking the pages you choose.

I need to mention that for table-based pages (or any page with a recurring pattern but no feed) there are existing services such as Dapper which will allow you to create a more sophisticated and precise feed by creating a page scraper on-the-fly.

It’s the kind of feature for which I tend to find more and more uses as time goes on. One significant example I’m thinking of is personal homepages of friends and people who haven’t yet integrated a feed: it’d be nice to be alerted when they change.

(Via this LifeHacker article)

Update the next day: there seems to be plenty other similar services. ChangeDetection.com is an old one, sending updates via email. For others, just Google for “monitor page changes”.

(To be perfectly honest, from a programmer’s point of view, I guess you could do the same by having a list of URLs and setting up a script to periodically check whether significant changes have been made (i.e. using a “diff”). Yet I never took the time to do it, and now that it’s easily available…)
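For the curious, such a script could look roughly like the sketch below. To keep it short it compares a hash of each page rather than running a real “diff”, so it reports any change, significant or not; the function names and the `known_hashes` dictionary are made up for the example, and scheduling (e.g. a cron job) and saving the hashes between runs are left out.

```python
import hashlib
import urllib.request


def fetch(url):
    """Download a page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def check_pages(urls, known_hashes, fetch_page=fetch):
    """Return the URLs whose content changed since the previous check.

    `known_hashes` maps URL -> hex digest from the previous run and is
    updated in place. A URL seen for the first time is recorded but not
    reported as changed.
    """
    changed = []
    for url in urls:
        digest = hashlib.sha256(fetch_page(url)).hexdigest()
        if known_hashes.get(url) != digest:
            if url in known_hashes:
                changed.append(url)
            known_hashes[url] = digest
    return changed
```

Passing the download function as a parameter makes it easy to try the logic without network access, for instance with a dictionary of fake pages.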

Simple Javascript memory game

Here’s a little memory game I just finished, using jQuery. It’s very bare bones, and I might add features to it, but it works, doesn’t have a bunch of ads floating around (like most do on the Web), and the board size can be changed (up to 60 total cards for the moment).


A few useful augmented reality apps

Augmented reality is the concept of adding information to the stream your senses already provide about the surrounding scene. Concretely, these last few months, a lot of software has appeared for smartphones, taking advantage of the integration of a camera with a good-enough screen. Here are a few examples:

  • The recent Google Goggles and Google Shopper. Goggles adds information to objects you take a picture of, or uses GPS to retrieve information about shops you walk by and add it to the picture (most AR apps I’ve seen focus on this). Shopper adds information about the current product.
  • There are augmented reality “browsers” which provide a platform to add features to. For example, Layar lets you select “Layers” of information to add to the scene.
  • Wikitude uses augmented reality to add traveller’s guide type information to the scene.
  • TAT augmented ID: use the cam to get a good image of someone to identify, and this uses an online face recognition service to provide public information they want to share if they’ve set up their “public ID card” (Twitter profile etc.).

Augmented reality appears a lot in science fiction. For the most part, though, it involves directly augmenting the field of view of a person. If you’ve ever seen Ghost in the Shell (the movie, especially the second one), you’ll know what I mean. I remember being quite excited when I read about the possibility of added information through semi-transparent head-mounted displays (this video demonstrates, though in this case it’s not transparent at all, and obviously not something you’d walk with in your everyday life 😛 ). Cam-and-screen is more reachable for the moment, I guess, and a lot less cumbersome.