One of the features we plan to add before the 1.0 release of 4Suite is full-text search for documents in the repository. We actually had this support before (in 4Suite 0.11.1), using Swish++. It was implemented with quite a kludge: we’d temporarily copy files to disk and use os.system calls to invoke the indexer on them. Even this hack was undone by the confusion over the forks and future of the various Swish code-bases. It looks as if things have finally settled down under the Swish-E umbrella, but now we’re looking at all options.

Our preference is for a search engine with a clean C API to which one can pass text and get indexes back in a nice data structure. Another preference is for XML indexing features. Since we’d ask full-text search users to install that engine separately, a simple, clean install is important. And if it already came with a Python API, it would save a good deal of work.
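To make "pass text and get indexes back in a nice data structure" a bit more concrete, here is a minimal pure-Python sketch of the kind of interface I have in mind. It is not any existing engine's API: the names index_text and search, and the shape of the index (word mapped to document IDs mapped to word positions), are assumptions for illustration only. A real engine would add stemming, stop words, ranking and persistent storage.

```python
import re
from collections import defaultdict


def index_text(doc_id, text, index=None):
    """Add the words of `text` to an inverted index and return the index.

    Hypothetical structure: {word: {doc_id: [word positions]}}.
    """
    if index is None:
        index = defaultdict(dict)
    # Tokenize naively on runs of word characters, ignoring case.
    for position, match in enumerate(re.finditer(r"\w+", text.lower())):
        index[match.group()].setdefault(doc_id, []).append(position)
    return index


def search(index, word):
    """Return the sorted IDs of the documents that contain `word`."""
    return sorted(index.get(word.lower(), {}))


# Text goes in directly; no temporary files or external processes.
index = index_text("a.xml", "Full-text search for XML")
index_text("b.xml", "Search engines", index)
print(search(index, "search"))
```

The point of the sketch is the calling convention: the repository hands over text it already holds in memory and gets back a structure it can store and query however it likes.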

Bill Ellridge suggested mnoGoSearch (which has a horrible name from a PHB’s point of view), and he even tried his hand at a Python port of the Perl/C module for it. But I am not encouraged by the fact that I cannot even get mnoGoSearch working from an end-user POV. I tried to set it up as the search engine for the 4Suite mailing list, and no matter how much I tinker with the config file, the Indexer dies with an error.

So we’re still looking. The Open Source Search Engines page is a great resource, but its summaries don’t give the sort of in-depth information one needs to evaluate a search engine for such an intimate use.

This is a wheel I’d hate to reinvent, so I’d be grateful for any suggestions.

Do you have a favorite full-text search engine you would recommend for Python users? Or do you know of one with a well-designed C API?