Diff revision: diff-based revision of text notes, using spaced repetition

A few years ago I started getting frustrated with forgetting material I studied for side projects and interests. I knew of flashcard spaced repetition, but it didn’t really suit my need. So I built a tool to help review text notes, to better remember their content. It serves many goals, notably studying a large number of subjects over longer periods of time, by taking advantage of text notes one would normally write anyway. The core idea is to review differences between versions of note files using spaced repetition intervals. I’ve been using this for about 3 years myself.

Warning: the concept is very general, and how you actually use the tool may change your experience entirely. (See “usage tips”)

Here’s a presentation (as a video) showing a typical use of the tool. (Note that “drev” and “diffrevision.py” are both aliases for “python path/to/diffrevision.py” on my machine)

What it is

Here’s the typical usage scenario:

  • Say you write notes in a bunch of text (.txt) files in a directory, or in a text-based wiki program (WikidPad, vimwiki, orgmode, etc.)
  • Every time you make a “significant” change in some files, you run the tool’s “add” and “diff” commands and it will know which files are new, which parts were modified in existing files, and which parts didn’t change.
  • You then review today’s changes in 2 days, then in 6 days, then at increasingly longer intervals.
    • After each review, you can go longer before the next one is needed. That’s the spaced repetition idea, here applied to the differences in notes.
    • The idea is to highlight (in color) the parts that changed. You still see the whole text, including the old parts, when you review the new parts.
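
The scenario above can be sketched in a few lines of Python 3. This is a minimal illustration, not the tool’s actual code: the function names are hypothetical, only the 2-day and 6-day intervals come from this post (the later values are made up), and I assume intervals are counted from the day the diff was recorded.

```python
import difflib
from datetime import date, timedelta

# Fixed intervals in days. Only 2 and 6 are mentioned in the post;
# 18 and 54 are hypothetical values for illustration.
REVIEW_INTERVALS_DAYS = [2, 6, 18, 54]


def changed_line_numbers(old_text, new_text):
    """Return 0-based indices of lines in new_text that were added or modified."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" are the opcodes that contribute lines to the new text.
        if tag in ("replace", "insert"):
            changed.extend(range(j1, j2))
    return changed


def review_dates(diff_date, intervals=REVIEW_INTERVALS_DAYS):
    """Return the dates on which a diff recorded on diff_date comes up for review."""
    return [diff_date + timedelta(days=days) for days in intervals]
```

For example, `review_dates(date(2024, 1, 1))` starts with January 3 and January 7, and the lines returned by `changed_line_numbers` are the ones that would be highlighted.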

I made a command-line tool to support doing this. It’s a rewrite of something I had been using for two years, and I’ve been using the rewritten version for about a year.

Warning to spaced repetition fans:

  • there’s no quantification for the moment, no “grading yourself” from, e.g., 0 to 5 upon review. Intervals are fixed. I’m looking at ways of adding this, but it’s not obvious to do right.
  • there’s no “hiding the answer” (as with flashcards, or with cloze deletion) either. Again it might be an interesting feature to add.

Trying the tool

The tool is command-line based. The review is done in a web browser, by producing a set of HTML files, one for every “diff” to review. It works with plain text files, and if anything changed on a line, the whole line is highlighted when reviewing.
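
To give an idea of what “the whole line is highlighted” means, here is a rough Python 3 sketch that renders the new version of a note as HTML, marking every added or modified line. The function name, CSS class names and color are my assumptions for illustration; the real tool’s output looks different.

```python
import difflib
import html


def render_review_html(old_text, new_text, title="review"):
    """Render new_text as HTML, highlighting whole lines that differ from old_text."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    body = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            continue  # deleted lines do not appear in the new text
        css_class = "same" if tag == "equal" else "changed"
        for line in new_lines[j1:j2]:
            # Escape the note content so that characters like < and & display literally.
            body.append('<div class="%s">%s</div>' % (css_class, html.escape(line)))
    return (
        "<html><head><title>%s</title>"
        "<style>.changed { background: #ffef9e; } div { white-space: pre-wrap; }</style>"
        "</head><body>\n%s\n</body></html>" % (html.escape(title), "\n".join(body))
    )
```

Unchanged lines are kept in the output on purpose, so the changed parts are read in context.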

The code is on GitHub, or here’s a .zip of a hopefully stable version. See below for usage instructions, or just run the program without arguments (“python diffrevision.py”). It’s Python (no dependency on anything but core Python packages), so you’ll need the Python interpreter. I know it to work on Ubuntu Linux with Python 2.6 and 2.7, and I tested it partially under Cygwin (Windows, Python 2.5).

By the way I have other versions written, notably a web app version, but I have a feeling that this will appeal mostly to programmers or power users, so I’m releasing the open source version. To the business-minded readers who think one could charge for this: first read this, this and this, on the marketing of spaced repetition products.



Unix ‘diff’

The Unix ‘diff’ tool is a program which takes two files as input and tells you what changed from one to the other. Programmers use it extensively to see which parts of the source code changed from one version to the next.
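
Python’s standard library can produce the same kind of output as “diff -u”. Here is a small sketch (the function name and default file names are just placeholders):

```python
import difflib


def unified_diff_text(old_text, new_text, old_name="old.txt", new_name="new.txt"):
    """Return a unified diff between two strings, similar to `diff -u`."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # the input lines carry no newline, so the output lines shouldn't either
    )
    return "\n".join(diff_lines)
```

Lines starting with “-” were removed from the old version, lines starting with “+” were added in the new one, and identical inputs give an empty string.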

Spaced repetition (with flashcards)

Spaced repetition programs such as Supermemo, Anki and Mnemosyne help you remember facts. I’ve described them in more detail here. But in essence you create a question and answer flashcard, say Q: what does DNA stand for? A: Deoxyribonucleic acid. The program will ask you to review it in a day or two. Then, depending on how well you remembered it, you give yourself a score. That score is then used to adjust the next interval, so maybe the next review will be in a week. This repeats itself, each time stretching the interval more and more, as you remember better every time (hopefully).
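
To make the “score adjusts the interval” idea concrete, here is a deliberately simplified sketch. It is not the algorithm used by Supermemo, Anki or Mnemosyne; the multipliers and the “reset to 1 day on a low score” rule are assumptions chosen only to illustrate the principle.

```python
def next_interval_days(previous_interval_days, score):
    """Return the next review interval, given the last interval and a 0-5 self-grade.

    Hypothetical rule: low scores reset the interval to one day,
    higher scores stretch it by a larger factor.
    """
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    if score < 3:
        return 1  # poorly remembered: see the card again tomorrow
    multiplier = {3: 1.5, 4: 2.0, 5: 2.5}[score]
    stretched = int(round(previous_interval_days * multiplier))
    # Always grow by at least one day after a successful review.
    return max(previous_interval_days + 1, stretched)
```

With these made-up numbers, a card last seen 4 days ago and graded 4 would come back in 8 days, while a grade of 1 would bring it back the next day.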

The benefits (why use this?)

Why would you want to do this kind of review, in addition to, or instead of, using a flashcard spaced repetition program? Succinctly, and cutting corners:

  • Putting knowledge in the form of flashcards is a lot of work (a lot of time for a relatively small quantity of knowledge).
  • Also, the result is not an easily “browsable”, readable knowledge store.
  • On the other hand, taking continuous-text notes is very natural, easily done, and helps make ideas clearer.
    • And you’re free to organize pieces of knowledge the way you want.
    • Also, personally, I just do it anyway, whether I review or not. In itself it helps understanding, and remembering (generation effect).
  • But, contrary to flashcards, reviewing text that changes over time as we learn more about a topic is a challenge: which parts should be reviewed if the old parts are well remembered, but the novel, less familiar bits are scattered all over the document?
    • Hence reviewing only the changes keeps the review efficient, while the notes themselves remain a useful reference text.

Also, sometimes you don’t mind forgetting some parts. You just want the main ideas to be “fresh in your memory”. I find this form of review especially well suited for this.

Going further, sometimes you simply don’t want to forget something even exists, or that you have notes somewhere on X or Y topic. I’m often confounded when someone tells me we discussed this or that and I’ve absolutely no recollection of the episode… I’m sure it also happens with some things I read, but I simply have no one to remind me of those.

Another benefit is that you can review “in context”: you see the whole text, with new parts being highlighted. So you see the old parts as well, the “context”.

Using the tool

I programmed a command-line utility in Python to support this. It builds a database of versions of note files.

In essence, you first configure the program in “core/config.py” (see explanations at the top of this file). You then write your notes in text files in a single directory (hierarchy could in principle be supported, but it’s not done yet). Say you’ve added your first two .txt files. You then run

python diffrevision.py show_new

and you’ll see a list of the new files in the directory of your notes (specified in the config). You then add them with:

python diffrevision.py add

which will simply add them to the list of files to be watched for changes.

When you’ve added new files or you’ve made some changes and want to construct reviews from differences, you run:

python diffrevision.py diff

This will look for changes in files that the system is tracking, and record new versions for files that changed since the last time that command was run. There are also commands to see new files, add them for tracking changes, or ignore some files.
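
As a rough idea of how such a change check can stay fast, here is a Python 3 sketch that compares file modification times against the ones recorded at the previous run. The function name and the shape of the “last seen” dictionary are hypothetical; the real tool keeps its own database.

```python
import os


def find_changed_files(notes_dir, last_seen_mtimes):
    """Split the .txt files of notes_dir into (new_files, modified_files).

    last_seen_mtimes maps a tracked file name to the modification time
    recorded the last time the check was run (hypothetical structure).
    """
    new_files, modified_files = [], []
    for name in sorted(os.listdir(notes_dir)):
        path = os.path.join(notes_dir, name)
        if not (name.endswith(".txt") and os.path.isfile(path)):
            continue  # single flat directory of .txt notes, as described above
        if name not in last_seen_mtimes:
            new_files.append(name)
        elif os.path.getmtime(path) > last_seen_mtimes[name]:
            modified_files.append(name)
    return new_files, modified_files
```

Only the files reported as modified need to be read and diffed, which is what keeps frequent runs cheap.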

When you want to review changes that are to be reviewed at this point in time, you use:

python diffrevision.py today

This will create a temporary directory and populate it with one HTML file for each “diff” to be reviewed that day. You then load the “index.html” file in a Web browser, and open tabs for each file. When finished, you run

python diffrevision.py finished 1234 2345 3456 …

where “1234 2345 3456” are the space-separated IDs of each of the diffs. To simplify this, the command is given to you in “index.html” based on the reviews (diffs) you clicked on.
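
The “today” step (a temporary directory with one HTML file per diff, plus an index) could be sketched like this in Python 3. The function name, the file naming scheme and the directory prefix are assumptions for illustration, and the part of “index.html” that builds the “finished” command from your clicks is left out.

```python
import os
import tempfile


def write_review_pages(reviews):
    """Write one HTML file per diff plus an index.html; return the index path.

    reviews maps a diff ID (int) to the HTML page content for that diff.
    """
    out_dir = tempfile.mkdtemp(prefix="diffrevision_")  # hypothetical prefix
    links = []
    for diff_id in sorted(reviews):
        filename = "%d.html" % diff_id
        with open(os.path.join(out_dir, filename), "w", encoding="utf-8") as f:
            f.write(reviews[diff_id])
        # target="_blank" makes each review open in its own tab.
        links.append('<li><a href="%s" target="_blank">diff %d</a></li>' % (filename, diff_id))
    index_path = os.path.join(out_dir, "index.html")
    with open(index_path, "w", encoding="utf-8") as f:
        f.write("<html><body><ul>\n%s\n</ul></body></html>" % "\n".join(links))
    return index_path
```

The returned path is the file you would open in the browser.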

Suggestion for Linux users: in your $HOME/.bashrc, write an alias such as “alias drev='python path/to/diffrevision.py'” to make it easier to run the program.


  • This is limited to plain text files, with almost no support for formatting.
    • Formatting of notes only supports WikidPad syntax for the moment. Writing another formatter is a bit complicated, but doable provided you’re a Python programmer and you have well-isolated Python code to export wiki syntax to HTML. See wikidpad_formatter.py and surrounding files for an example.
  • Don’t use non-ASCII characters (accents etc.) in filenames; at first it’ll smile at you and things will work, but when you least expect it it’ll stab you in the back. For the content of the files, just configure the character encoding properly in configuration.py.

Usage tips

As said in the intro, how you use the tool may change your experience from “bleh” to “how did I ever live without this?” (that’s my case anyway).

  • The most important point: make every change small, 5-10 lines or less before you run “diff”.
    • Try to keep each change coherent, a single “idea”. Otherwise I find myself botching the review.
    • The tool is made for “diff” to be fast to run (checking file modification time), to allow doing exactly that.
  • Use with a personal wiki that stores notes in plain text (WikidPad, vimwiki, orgmode…). I wrote about why this is a good idea.
  • Don’t delay review too much, as reviews accumulate fast, and it can get daunting. Ideally it’s done every day, just like with other spaced repetition programs.


Main use cases (who might benefit from this?)

With time, I might write more use cases here. See also my follow-up post on very specific personal benefits I see with this. The main use cases I see are:

  • Studying a large diversity of topics over an extended period of time. This is typical of knowledge work, such as programming. To take an example:
    • Say a programming language is of interest to me, but I hear about it piece by piece (news about Erlang, say)
    • At first I’ll read about it on Wikipedia, take 10 minutes or so and write down what’s different about it in some bullet points (I like bullet points, if you hadn’t figured that out 😛 ).
    • That’ll be in a file named “Programming — Erlang.txt”
    • But then a month later I might see an interesting article about a project that benefited from it
    • I’ll add notes about this to the .txt file.
    • But in the meantime if I hadn’t reviewed my notes, as I’m pretty busy and studying Erlang is low in my priorities, I might very well have forgotten.
  • Studying topics that change over time, such as the state of the world, or any academic field.
    • For topics such as politics, I’ll usually have “trends” and “news” sections on my pages.
  • Generally having a better mental model of the structure of your notes (no need to review the details if this is the only goal)

Cases where flashcards are better

This point is important: this is not a flashcard-review replacement. For factual information you want to remember very precisely, flashcards are much better. This includes vocabulary review, dates, precise technical terms, etc. I’ll probably expand on this list at some point.

The other very important point is that there’s no way of scoring your performance on a review, and even if I wanted to implement this, it’s just not clear what would be the best way to do it. The absence of score makes it hard to measure/quantify things such as retention rate, forgetting curves, etc. There’s a lot of room for improvement on scheduling, and it’s the next feature I want to add (quantification and better scheduling).

About “memorization”

Memorizing is a very, er, touchy subject. When someone brings up the topic, many people are up in arms with “but memorizing is useless; creativity, usage and comprehension are much more important!” etc. Yet I still think that for some types of tasks memorizing, to varying degrees of precision, is useful. Hence this tool.

I’ll later present the way I roughly divide studying tasks and the memorization/study method I think appropriate for each category. I’m no pedagogy or spaced repetition expert, by the way, so all this is just the opinions of a programmer.

Personal history with this

I’ve been using this for 3 years, since the start of my Master’s degree (which I finished a year ago). I’ve accumulated some (qualitative) experience with the process, some of which I’ll gradually share here.

I have no proof that this works (hard data, graphs). My qualitative experience is that it does help me remember better. To be specific I’ve written a follow-up post on some benefits I’ve seen personally, using very (very) specific examples, as suggested by gwern in the comments.

I hope this may be useful to others. I won’t have much time to work on improving the tool, but as I use it every day, my natural itch-scratch cycle will probably lead to some new features. Also, feel free to improve on the tool. If your extensions are clean, useful and modular (i.e. they do not change behavior unless a user’s config specifies it) I’ll gladly merge them with the main Git branch.

Wishlist / TODO for the tool

  • Support for telling the program how well you remembered each diff, like with spaced repetition systems: grading yourself from 1 to 5 and adjusting the interval accordingly. Also, that will help experimenting and keeping statistics. However it’s not as simple as with flashcards, as there may be more than one single piece of information in a review, and you may remember some parts better than others. I’m open to ideas here.
  • No support for file hierarchy for the moment.
  • Adding other formatters would be very useful, notably for vimwiki, ReStructured Text and markdown.
  • For the moment, each version is stored in its entirety, which will sound stupid to anyone knowing anything about version control. But it was faster/simpler. The Storage object is well isolated: you want to make it more efficient, have a go at it 😛

There are many more tasks to be done, of course, but these are the main features I’d like to add.

Small Android self-survey app

For some time I had this annoying itch of wanting to be able to quantify some basic life stuff: time spent sleeping, sports, etc. This is in the spirit of the Quantified Self movement. However I hadn’t found a simple app that would allow me to customize the Q&A format I wanted.

A few months ago I switched to an Android smartphone, and I figured it’d be a great opportunity to play with Android development. I spent two days or so coding this “self-survey” app to fill the basic need. Based on a flat-file question specification format I defined, I can create new surveys quite simply, but of course it’s not that user-friendly 🙂 However it’s very efficient to use and the data is then easily usable in a sort-of-CSV format I can simply grab from the internal storage.

The format allows for 4 types of questions: checkbox, multiple choice answers, free text, and something I call “button array” which is handy for entering time spent on something like sleep at various points in the day.

I don’t want to take the time to polish & publish it on the appstore. Anyway at this point it’s not that user-friendly; it’s best used by programmers. It would take much more time to add the survey-creation, graphs etc. that a full app of this kind needs. As I like to deal with raw data and use existing tools for graphing, I personally don’t mind. Don’t hesitate to fork and add that kind of functionality, though!

Code on GitHub.

(EDIT: as I mention in a comment below, to run the app you need to compile it. My logic here was that the app wouldn’t be of much use to non-programmers anyway, as so many bits are missing)

I still need to add comments and documentation. This is a work in progress.

Simple Javascript memory game

Here’s a little memory game I just finished, using jQuery. It’s very bare bones, and I might add features to it, but it works, doesn’t have a bunch of ads floating around (like most do on the Web), and the board size can be changed (up to 60 total cards for the moment).


Plyn: Yet another textfile-and-scripts based ToDo system

Warning: this post is mostly for geeks/programmers who will never be fully satisfied by any planning system, ever.

Over the years I’ve tried different ways of handling my ToDo, planning and work logging. This is my Xth iteration. I wonder if anyone but a programmer could use this, but hey, programmers are a non-negligible fraction of society (which I happen to be part of)!

I’ve long wanted to create a program which would do precisely what I want in terms of planning, but was always put off by the “*knocks head on wall* the GUI is so long to code!” aspect. Well to hell with the GUI!


Just launched Clusterify.com — small project meetups site for programmers

Here’s the project I’ve been working on for the past 3 weeks with another programmer named Aneesh: Clusterify.com .

It’s a site where you post project proposals, usually ones that take very little time (2 hours is suggested), and through them you meet other programmers.

I can’t spend too much time writing a two-volume novel about it for the moment, as the launch is still ongoing, but here’s the launch thread on Hacker News.