Django and full-text search
Lately I’ve been searching for a simple solution for full-text Model search using Django. Every task up to this point just seemed so easy, so I was a bit surprised to discover there’s no quick, clean and preferred way to go about adding site search functionality in the framework.
So far, the information I read seems to suggest existing solutions are:
- Based on a dedicated full-text search module
  - djangosearch
    - Supposed to become the official search contrib. Rather recent history (during 2008).
    - It’s a framework over existing, dedicated full-text indexing engines:
      - Lucene (Java version)
      - Solr (still Java, and also based on Lucene)
      - Xapian (C++)
      - HyperEstraier
  - django-sphinx
    - Wrapper around the Sphinx full-text search engine
- Based on a database engine’s full-text capability (i.e. you must create full-text indexes with the appropriate DB commands)
  - For the MySQL backend, the framework already supports a “fieldname__search” syntax, which translates into a MATCH AGAINST query in SQL.
    - Supports basic boolean operators
    - Reference (look at the conclusion of the article)
  - For PostgreSQL, depending on the version of the engine, there are solutions, but they seem complex relative to the MySQL approach
- Simplest, but very inefficient: based on a plain LIKE %keyword% query
  - Uses the “fieldname__icontains” filter syntax
  - That’s what I used temporarily to get the feature going in my prototype
Other approaches are mentioned in this thread on StackOverflow.
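To make the LIKE-based approach a bit more concrete, here is a minimal sketch of what a keyword search of that kind boils down to: split the search string into terms (keeping quoted phrases together), then require every term to appear in at least one of the searched columns. It uses the standard library's sqlite3 module instead of Django so that it runs on its own; in Django, the same conditions would be expressed with “fieldname__icontains” filters. The function names (normalize_query, escape_like, build_where, search) and the table and column names in the usage note are assumptions made up for this illustration, not part of any library.

```python
import re
import sqlite3


def normalize_query(query_string):
    """Split a search string into terms, keeping "quoted phrases" together."""
    # Each match is a (phrase, word) pair; exactly one of the two is non-empty.
    matches = re.findall(r'"([^"]+)"|(\S+)', query_string)
    # Collapse runs of whitespace inside quoted phrases, drop empty terms.
    terms = [" ".join((phrase or word).split()) for phrase, word in matches]
    return [term for term in terms if term]


def escape_like(term):
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_where(terms, fields):
    """Return (sql, params): every term must occur in at least one field."""
    clauses, params = [], []
    for term in terms:
        # `fields` must be trusted column names: they are interpolated as-is.
        ors = " OR ".join(f"{field} LIKE ? ESCAPE '\\'" for field in fields)
        clauses.append(f"({ors})")
        params.extend([f"%{escape_like(term)}%"] * len(fields))
    # With no terms at all, match every row.
    return (" AND ".join(clauses) or "1 = 1"), params


def search(conn, table, fields, query_string):
    """Run the keyword search and return the matching rows."""
    where, params = build_where(normalize_query(query_string), fields)
    # `table` must also be a trusted identifier, for the same reason.
    return conn.execute(f"SELECT * FROM {table} WHERE {where}", params).fetchall()
```

For example, with a hypothetical “project” table, search(conn, "project", ["title", "description"], 'django "full text"') would return the rows containing both “django” and the phrase “full text” somewhere in those two columns. Every term costs a scan of each column, which is why this only holds up on small tables.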

piramida:
Maybe it’s time to publish a project description on your new site for a Django module that would do good, indexed, relevance-sorted FT searching in Python (maybe embedding C if needed, but I think Python would work)? Have an indexer task running, listening for changes and updating the index for all models marked for indexing.
The problem is real, we’re using Lucene but having to keep Tomcat up just for that and having on save and on delete signals syncing the index through POST requests is a completely bad django-fu, so I’d be glad to get rid of it.
19 February 2009, 3:56 am

francois:
Actually there are pure Python indexing engines, but all I’ve seen have a “slower than C counterparts” mention somewhere. See for example this thread:
http://stackoverflow.com/questions/438315/is-there-a-pure-python-lucene
which seems to point to Whoosh (http://whoosh.ca/) as the best alternative right now.
More quick searching brought this: it seems a week ago someone started a project on Github to make a Django-Whoosh module:
http://github.com/ericflo/djan.....ree/master
Maybe the solution _is_ on the way.
19 February 2009, 10:18 am

francois:
Quickly looking at the source, though, I think the part of your comment concerning signals is not addressed: I might be wrong, but it seems indexing will happen directly while the HTTP request is processed (ie. synchronously, in the same thread).
19 February 2009, 10:35 am

piramida:
Thanks for the links, I had not heard about Whoosh and will follow its development. If that happens in a Django module, there would be many more ways to manipulate the process in case it starts slowing the system down.
Running a separate indexer thread that watches the DB, or having a queue, is one idea. Direct ORM access to data creates many easy solutions.
19 February 2009, 2:58 pm

Joseph Turian:
Francois, I have also heard about haystack. It looks very django-y, with Whoosh and Solr backends. However, I have been having some difficulty getting it working just yet.
What solution are you currently using?
7 June 2009, 7:25 pm

Francois:
After what’s probably the longest moderation time in recorded history (err, so sorry about that…), I’ll try an answer, which is the least I can do: after all this search, I was still using the LIKE statement (well, the “__icontains” filter in Django). IIRC the idea was that at some point I could create a MySQL index, if the need ever arose for faster search (but it didn’t).
Here’s the source as used in Clusterify, now that it’s on Github:
http://github.com/clusterify/clusterify/blob/master/utils.py
And an example of use (search for “get_query”):
http://github.com/clusterify/c.....s/views.py
12 December 2009, 11:26 am