Django and full-text search

Lately I’ve been searching for a simple solution for full-text Model search using Django. Every task up to this point just seemed so easy, so I was a bit surprised to discover there’s no quick, clean and preferred way to go about adding site search functionality in the framework.

So far, the information I read seems to suggest existing solutions are:

  • Based on a dedicated full-text search module
    • djangosearch
      • Supposed to become the official search contrib. Rather recent history (during 2008).
      • It’s a framework over existing, dedicated full-text indexing engines
    • django-sphinx
      • Wrapper around Sphinx full-text search engine
  • Based on a database engine’s full-text capability (i.e. you must create full-text indexes with appropriate DB commands)
    • For the MySQL backend, there’s a “fieldname__search” syntax already supported in the framework, translating into a MATCH AGAINST query in SQL.
      • Supports basic boolean operators
      • Reference (look at the conclusion of the article)
    • For PostgreSQL, depending on the version of the engine, there are solutions, but they seem complex, relative to the MySQL approach
  • Simplest, but very inefficient: based on a simple LIKE %keyword% query
    • Uses the “fieldname__icontains” filter syntax
    • That’s what I used temporarily to get the feature going in my prototype
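
To make the last option concrete, here is a minimal sketch of the kind of SQL a LIKE %keyword% search boils down to. It uses Python’s built-in sqlite3 module rather than Django, and the “article” table, its columns and the helper names are made up for illustration. In Django itself the equivalent would be a filter on “fieldname__icontains”.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory table (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO article (title, body) VALUES (?, ?)",
        [
            ("Search in Django", "Notes on full-text search options."),
            ("Template filters", "Writing a custom timesince wrapper."),
            ("Sphinx", "A dedicated FULL-TEXT indexing engine."),
        ],
    )
    return conn


def search_like(conn, keyword):
    """Return titles of articles whose body contains the keyword.

    The database has to scan every row, which is why this approach is slow
    on large tables compared to a real full-text index.
    """
    # Escape LIKE wildcards so a literal % or _ in the keyword matches as-is.
    escaped = keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = "%" + escaped + "%"
    rows = conn.execute(
        "SELECT title FROM article WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        (pattern,),
    )
    return [title for (title,) in rows]
```

SQLite’s LIKE ignores case for ASCII letters by default, so this behaves roughly like a case-insensitive “contains” check; other database engines may need different handling.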

Other approaches are mentioned in this thread on StackOverflow.

Custom Django filters: displaying a date as “[time delta] ago”

I’m working on a project based on Django, these days. This framework simplifies developing web applications by a great measure, and is very flexible.

One example of this flexibility is the ability to add new functionality to template syntax. For example, I needed a way to display a date/time as a difference to the current one (“19 days ago”, “10 hours ago”).

Turns out there’s a way to do this in Django: as soon as a need can be generalized, it’s probably in there somewhere, or on djangosnippets. It’s the template filter “timesince”. Basically, in the template, you do:

{{ mydate|timesince }} ago

and it will display your date as said above. Problem is, the string is too long for my needs (it will display “19 days, 10 hours” when I need only “19 days”). So I wrapped the function in another one, in a file placed under an app of my project, as explained here. Here’s the content:

from django import template
from django.utils.timesince import timesince

register = template.Library()


@register.filter(name='ago')
def ago(date):
    """Render a date as e.g. "19 days ago", keeping only the largest unit."""
    # timesince() returns something like "19 days, 10 hours";
    # keep only the part before the first comma.
    return timesince(date).split(",")[0] + " ago"

I placed that under “myproject/myapp/templatetags/my_date_filter.py” and created an empty __init__.py file in the same templatetags directory. Then, in the template, I can do:

{% load my_date_filter %}
...
{{ mydate|ago }}

and I get the shortened string.
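
The string handling can be tried without a Django project. Below is a small sketch of just that step; the helper name shorten_timesince is made up for illustration, and it assumes the input looks like what timesince returns (parts separated by commas).

```python
def shorten_timesince(text, suffix=" ago"):
    """Keep only the first, largest unit of a "19 days, 10 hours" style string."""
    # Everything after the first comma is the finer-grained remainder.
    first_part = text.split(",")[0].strip()
    return first_part + suffix
```

For instance, shorten_timesince("19 days, 10 hours") gives "19 days ago", and a string with no comma such as "10 hours" just gets the suffix appended.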

Django is full of “extension points” like this. It’s a lot of fun to work with.

CUSEC 2009

The last three days (January 22-24) were the days of CUSEC 2009, the Canadian Universities Software Engineering Conference. I graduated last year, but the event makes for loads of geek fun, so I went anyway. Quick highlights of the non-technical talks I went to:

  • Leah Culver, cofounder of Pownce, now at Six Apart, opened the conference by speaking of pursuing passions, which is a recurrent theme at CUSEC. To her, that should involve lots of creativity, say being imaginative in repurposing her tiny apartment in California for parties. In fact, it turns out that a big part of her strategy revolves around partying, as she met Kevin Rose (founder at Digg and Pownce), Jimmy Wales (founder of Wikipedia) and other big names in such contexts. (Not to downplay her technical ability, though, as she has also contributed to the Django framework and wrote an OAuth library for Python.)
  • Avi Bryant, founder of Dabble DB, suggested we (in his terms) steal ideas from the academic world to bring them to market by actually making them usable. He showed a demo of his recent (quite magical indeed) Magic/Replace web app, which he said was based on research done at MIT.
  • Giles Bowkett had this wild presentation about his unusual career, showing over 400 slides in about an hour. The slides weren’t exactly packed with precise information, but it’s certainly the most entertaining use of PowerPoint/Keynote/whatever I’ve ever witnessed (though Avi Bryant’s live editing of Venn diagrams deserves mention too). To illustrate, let’s say it involved quite a few FAIL pictures. He very quickly demoed his Archaeopteryx random music generator program, a Lisp-inspired piece of Ruby that generates MIDI notes later sent to a sound synthesizer/program (he used Reason, IIRC, in his presentation). The result was a pretty good beat, even if we didn’t get to hear much of it.
  • Joey deVilla, “the accordion guy”, gave this talk on the job of a tech evangelist, since that’s what he’s now doing for Microsoft. Curiously enough, he barely mentioned Microsoft, but he did play Nine Inch Nails’ “Head Like a Hole” on his accordion (entirely justified by earlier stories, btw), which resulted in one of those precious WTF moments that punctuate one’s life.
  • Francis Hwang made a presentation on the nature of software engineering, comparing it to other spheres of human activity. It was a very balanced talk, very interesting, if only for the fact that it departed from the way these comparisons are usually made. For example, it’s often said that software engineering is similar to art, and to him the link is pretty weak, whereas closer fields, intellectually, would be politics, law or economics (e.g. politics is close due to the balance of interests involved in the design of systems, within and between organizations).
  • Last but clearly not least, Richard Stallman presented his ideas on copyright and, of course, on Free Software, ideas which are available on the FSF site btw. Another local maximum of LOLs in the conference happened when he decided to auction a cute doll of a Gnu. Seeing him auction the thing was fun enough, but seeing it go to Joey deVilla, who paid for it with his Microsoft credit card at the suggestion of the heckling crowd, was, err, priceless. deVilla inviting Stallman to join him in bringing order to the galaxy in a Darth Vader-esque voice was just icing on the cake.

Of course there was also a more technical side to the conference. I won’t elaborate too much on this, but I’ll mention the IBM programming challenge which was a bit weird for a competition: we were asked to make a spell checker / spelling suggestions engine. In 3 hours. No restrictions, be creative. So a few teams ended up writing a wrapper around Google spelling suggestions, as a joke. I didn’t submit anything, but a friend’s quick exploration did bring up the page on Peter Norvig’s explanation of the basic principle behind Google suggestions. The algorithm is interesting, if only for being so short and sweet.
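
To give an idea of how short it is, here is a condensed sketch in the spirit of Norvig’s essay, not a copy of his code. It only looks at candidates one edit away from the input, and the tiny WORD_COUNTS table is a made-up stand-in for word frequencies that would normally be counted from a large text corpus.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Toy frequency table (an assumption for illustration only).
WORD_COUNTS = Counter(
    {"spelling": 50, "spewing": 2, "checker": 30, "engine": 40, "the": 500}
)


def edits1(word):
    """All strings that are one delete, transpose, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words, counts):
    """Keep only the candidates that appear in the frequency table."""
    return {w for w in words if w in counts}


def correct(word, counts=WORD_COUNTS):
    """Pick the most frequent known word among the closest candidates."""
    # Prefer the word itself, then words one edit away, else give up.
    candidates = known([word], counts) or known(edits1(word), counts) or {word}
    return max(candidates, key=lambda w: (counts[w], w))
```

With this table, “speling” has two known neighbours, “spelling” and “spewing”, and the more frequent one wins. Norvig’s version also considers words two edits away, which is left out here for brevity.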

Organizing code snippets and programming knowledge

(This post is geared towards programmers.)

This blog is about structuring your personal knowledge. Code snippets and, more generally, programming language information, are interesting in that everyone and their cubicle neighbor seem to have their own approach to organizing them. Here I survey some interesting software and approaches I’ve read about, their features, and present my own method based on my personal wiki.

UPDATE July 29, 2011: there’s a good discussion about the wide range of options people use over at StackOverflow.

This post is an example where wiki features come in handy (as opposed to a thorough survey of Code Snippet Management as, err, an academic field of study).

Software and approaches

A code snippet manager is a piece of software which allows you to organize short pieces of code to reuse later. Yet I’m also seeking the ability to integrate general information about the language (explanations, elements of theory, etc.): in my experience, snippets are often examples of a notion I’m learning.

In researching a bit on existing systems, I’ve found a few feature families:

  • Code features
    • Syntax highlighting
    • Management of multiple files (a plus if you want to add entire libs to your snippet database)
    • More specific:
      • automatic indentation on insertion
      • dependency management
      • IDE integration
      • (other noteworthy?)
  • Organization and retrieval features
    • Hierarchical: by language, by functionality/algorithm
    • Tags
      • Tags are particularly useful here (vs pure hierarchical) because I’ll often stumble on situations like:
        • I need a snippet in whatever language for a quick sort algorithm.
        • I need a C++ snippet with an iterator loop.
    • Full-text/regular expression search
      • Regular expressions are especially useful since you often seek specific constructs and regular text indexing won’t cut it.
    • Hyperlinks (well, hallmark of wikis here)
    • Date and other general fields
  • Sharing

There are lots of different approaches and systems. Specialized software exists that allows you to organize your snippet library in a standalone and dedicated manner. Google Snippely is an example:


Screenshot of Google Snippely

A whole bunch of shareware programs exist that do similar jobs. Some IDEs come with a snippet manager integrated, as is the case with Visual Studio. Most of these local programs offer a basic outline for organization with more or less search capabilities. If you’re looking for an online version with tagging, check out Snipplr, which, being online, also allows you to share and search others’ submissions.


Snipplr homepage screenshot

In the homebrew solution department, this thread is interesting. Some people talk of filesystem-based solutions. A few even use a custom database. Personal wikis (as I use, see below) and general outliner software clearly need mention too. For example, this blogger says she uses Microsoft OneNote to organize her snippets.

Getting a bit less personal, it should be noted that quite a few bloggers describe their blog as being a “repository for them to search later”. Therefore blogs and websites somehow count as personal snippet libraries (I did a bit of this with my old me-me-me blog over yonder). These score high on integrating other information (i.e. free-form formatted text) with the snippets, and of course on the sharing aspect. Community wikis (i.e. not personal) are also a great way to organize and share snippets and knowledge (examples here, here).

As a side note, it’s pretty clear we won’t only rely on our own snippets when coding. “The Web + Google” describes my most often used “system” when searching for coding solutions. Yet there are specialized search engines for this job: Google Code Search (you can use regexps on the whole DB!), Koders, and quite a few more.

My approach

Given earlier posts, this doesn’t need a drum roll introduction: I use my personal wiki to organize my snippets and my programming language learning. Of course, this solution allows for inclusion of formatted text. I admit I have a strong tendency to use my snippets for learning more than for reuse, so that factor might weigh more than usual in my choice.

A wiki will allow for many different types of retrieval. For example, using the right combination of plugins, with WikidPad I have hierarchical organization, tags/keywords, full text and regular expression search, and, of course, linking. Most popular wiki systems will have plugins to allow for syntax highlighting, and WikidPad is no exception.


Code snippet screenshot in WikidPad
(using the PrettyCode extension)

Where that solution might be lacking is in the IDE integration department, and in the management of multiple files. In the latter case, I have a separate personal code (file system) directory to which I may refer using file:// links.

Repetition and my WikidPad dynamic search extension

Digression on repetition

Information overload has numerous causes, and one of them is plain old repetition, e.g.: two sources delivering the same information, with superficial differences. It’s natural to repeat information for various reasons.

As an example, when students take notes on a teacher’s lecture, they all duplicate basically the same information. If they all decide to put their notes online, bam, 30 new versions of “Notes on Heisenberg uncertainty principle”. Same goes for journals and bloggers reporting on a given event.

Of course there might be additional value to each version, different points being made, but someone doing research on recent events still gets to read the same basic facts again and again.

Clearly there’s no simple solution. In fact I might mention here that discussion in the blogosphere does create repetition, but makes that information evolve. Something similar happens for students exchanging notes. In this light, repetition appears as a necessary evil.

If we really want to get philosophical, let’s just say repetition is unavoidable from the very start, as production of repetitive information is just the consequence of information flowing in the social graph and of different human beings going through similar experiences and trains of thought. And clearly it’s not because one of them has eaten apple pie that humanity can move on and experience other stuff.


Gratuitous picture of humanity’s bane (source)

(Ah, of course, the irony here is that this very article is just some remix of ideas told a zillion times over).

My WikidPad extension

Yet, being aware of the problem, you can at least work on making your own set of notes as repetition-free as possible. That’s another core reason why I love personal wikis. Instead of rewriting information on two pages, as you’d do in paper notes because you don’t have your old notebooks handy, you simply link to the other page and voilà! you just avoided adding a little more repetition to this world (why not add some grandiose here? :) ).

Yet there are cases where linking is not enough. Say I’m taking notes on the differences between two programming languages, C# and Java. I have a page on C#, a page on Java. Where do I put the notes? I could create a page dedicated to that topic, but I don’t have enough material for the moment to justify that. So say I put them in the page about Java. Consequence: when on the C# page I have to navigate to the other page to read the info.


Diagram explaining the extension

What my extension does is grab the info on the Java page (and any other page) and dynamically bring the relevant sections into the C# page. Technically, you give the extension a keyword, and it will search your whole wiki to find pages that contain it. Then, in those pages, it searches for precisely the lines that contain your keyword and some context around them (“sections”). It then prints a list of those sections.
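
The idea can be sketched in plain Python. This is not the extension’s actual code: it assumes the wiki is available as a dictionary mapping page names to their text, treats “context” as a fixed number of lines around each match, and the function name find_sections is made up.

```python
def find_sections(pages, keyword, context=1):
    """Return (page_name, section_text) pairs for every mention of keyword.

    A section is the matching line plus `context` lines before and after it.
    Overlapping or adjacent sections on the same page are merged into one.
    """
    needle = keyword.lower()
    results = []
    for name, text in pages.items():
        lines = text.splitlines()
        windows = []  # [start, end) line ranges, in page order
        for i, line in enumerate(lines):
            if needle not in line.lower():
                continue
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            if windows and start <= windows[-1][1]:
                # Extend the previous window instead of repeating lines.
                windows[-1][1] = max(windows[-1][1], end)
            else:
                windows.append([start, end])
        for start, end in windows:
            results.append((name, "\n".join(lines[start:end])))
    return results
```

Calling find_sections(pages, "C#") from the C# page would then collect the relevant lines written on the Java page, or anywhere else in the wiki.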

Now it doesn’t matter as much where I put the notes. As long as I label the sections correctly, I can centralize them in the relevant pages when needed, and I no longer need to copy them manually.

Grab the code & read details here: http://www.fsavard.com/flow/wikidpad-dynamic-search-results/