by Martin Eliasson
2008-10-23 19:14:12

TurboLucene on Ubuntu Linux 8.04 LTS

Here's another victory: managing to get TurboLucene to work on Ubuntu Linux 8.04 LTS.

TurboLucene is an extension to the TurboGears mega-web-framework. It is a little tricky because it is based on PyLucene, which in turn uses Lucene (written in Java). The problem now solved is getting TurboLucene to work with PyLucene release 3.2 on Ubuntu 8.04 LTS.

It all boils down to first getting PyLucene to build on Ubuntu 8.04, which required a lot of tweaking. I think I had to install setuptools from trunk to get around some bug. Anyway, I got PyLucene to compile by checking out release-3.2 from the PyLucene trunk, building JCC inside it, and then editing the Makefile and so on. Basically, follow INSTALL.txt and you will make it. If setuptools complains about some 'log', google that problem: your setuptools needs to be upgraded.

The really hard part is that TurboLucene was created when PyLucene was compiled with GCJ, whereas release-3.2 compiles with JCC. This has implications for threading: as the patch shows, every thread that touches Lucene first has to attach itself to the Java VM. There are a few other minor issues as well. Also, the JCC version of PyLucene is now imported as 'lucene' and not 'PyLucene'. All the information is available in the References section below and in the patch at the bottom of the page.
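To make the threading point concrete, here is a minimal sketch of the pattern the patch applies to TurboLucene's worker threads. It is not TurboLucene code: the class name `AttachedWorker`, the `handler` callable and the queue layout are made up for illustration. The only PyLucene calls assumed are the ones used in the patch, `getVMEnv()` and `attachCurrentThread()`, and the sketch takes `getVMEnv` as a parameter so it can be read and run without PyLucene installed. It is written for Python 3; under Python 2 the `queue` module is called `Queue`.

```python
import queue
import threading


class AttachedWorker(threading.Thread):
    """Hypothetical worker thread that attaches itself to the JVM before working."""

    def __init__(self, get_vm_env, handler):
        threading.Thread.__init__(self)
        self.daemon = True
        self._get_vm_env = get_vm_env  # assumed to be lucene.getVMEnv in real use
        self._handler = handler        # illustrative callable that would use Lucene
        self._tasks = queue.Queue()
        self.results = queue.Queue()

    def submit(self, task):
        self._tasks.put(task)

    def stop(self):
        # None is used as a sentinel that tells the loop in run() to exit.
        self._tasks.put(None)

    def run(self):
        # The attach call has to be made here, inside the new thread,
        # not in __init__, which still executes in the creating thread.
        self._get_vm_env().attachCurrentThread()
        while True:
            task = self._tasks.get()
            if task is None:
                break
            self.results.put(self._handler(task))
```

With PyLucene available, the main thread would presumably call `lucene.initVM(lucene.CLASSPATH)` once, as in the patch, and then create the worker with `AttachedWorker(lucene.getVMEnv, some_search_function)`.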

References

Here are the articles that helped me patch the TurboLucene code:
And this is the tutorial that helped me get started:

Patch/Diff

All of the following changes are made to __init__.py in the turbolucene folder, which you get by unzipping the latest egg.

Index: __init__.py
===================================================================
--- __init__.py       (revision 167)
+++ __init__.py       (working copy)
@@ -148,9 +148,10 @@

 #---  Third-party imports
 import lucene as  PyLucene
-PyLucene.initVM(PyLucene.CLASSPATH)
-from lucene import (IndexModifier, JavaError, Term,
-  IndexSearcher, MultiFieldQueryParser, FSDirectory)
+Env = PyLucene.initVM(PyLucene.CLASSPATH)
+Env.attachCurrentThread()
+
+from lucene import (IndexModifier, JavaError, Term, IndexSearcher, MultiFieldQueryParser, FSDirectory)
 # For use in make_document
 from lucene import Document, Field

@@ -665,6 +666,7 @@

         """
         threading.Thread.__init__(self) # PythonThread is an old-style class
+
         self._make_document = make_document
         self._task_queue = Queue()
         self._indexes = {}
@@ -749,6 +751,9 @@
           - `turbolucene.start` for details about ``make_document``.

         """
+        Env=PyLucene.getVMEnv()
+        Env.attachCurrentThread()
+
         while True:
             task, object_, language = self._task_queue.get()
             method = getattr(self, '_' + task)
@@ -898,6 +903,7 @@

         """
         threading.Thread.__init__(self) # PythonThread is an old-style class
+
         self._results_formatter = results_formatter
         self._query_queue = Queue()
         self._results_queue = Queue()
@@ -972,19 +978,28 @@
             details about Lucene's query syntax.

         """
+        Env = PyLucene.getVMEnv()
+        Env.attachCurrentThread()
+
         query, language = self._query_queue.get()
-        searcher = IndexSearcher(_get_index_path(language))
+        searcher = PyLucene.IndexSearcher(_get_index_path(language))
         search_fields = config.get('turbolucene.search_fields', ['id'])
-        parser = MultiFieldQueryParser(search_fields, _analyzer_factory(
-          language))
+        parser = MultiFieldQueryParser(search_fields, _analyzer_factory(language))
         default_operator = getattr(parser.Operator, config.get(
           'turbolucene.default_operator', 'AND').upper())
         parser.setDefaultOperator(default_operator)
         try:
-            hits = searcher.search(parser.parse(query))
-            results = [document['id'] for _, document in hits]
+            hits = searcher.search(parser.parse(parser,  query))
+            results = []
+            numHits = hits.length()
+            for i in range(0, numHits):
+                # do something with the data
+                doc = hits.doc(i)
+                field1 = doc.get("id")
+                results.append(field1)
         except JavaError:
             results = []
+
         self._results_queue.put(results)
         searcher.close()

@@ -1046,6 +1061,9 @@
         :Note: This method is run in the thread.

         """
+        Env = PyLucene.getVMEnv()
+        Env.attachCurrentThread()
+
         while True:
             request = self._request_queue.get()
             if request == 'stop':
