During our Zarafa Tour, customers had advanced search features high on their wish list. Indeed, Zarafa-search is missing some features that users have come to expect from other search engines. People are interacting with email data more heavily than ever before. We have made it our goal to support users in all their advanced search adventures. That is why I would like to show you what we are working on in development.
So, where do you start such a project, which is very challenging from both an engineering and a user perspective? We decided to start by looking at feature requests from customers. One very useful feature that we currently lack, for example, is the ability to offer search suggestions ("did you mean..?"). Another problem with our current search is that the code that powers it has grown quite large and complex: it consists of thousands of lines of C++. We have essentially implemented our own search engine, comparable to Solr or Xapian, except with fewer features.
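To make the "did you mean..?" idea concrete, here is a minimal sketch using only the Python standard library. It is not how Zarafa-search or Xapian implements suggestions; the `suggest` function, the sample vocabulary and the similarity cutoff are all made up for illustration. The idea is simply: when a query term is not in the index, propose the indexed terms that look most like it.

```python
import difflib


def suggest(query, vocabulary, max_suggestions=3, cutoff=0.7):
    """Return indexed terms that closely resemble an unknown query term.

    `vocabulary` stands in for the set of terms known to the index
    (a hypothetical, hand-written set in this sketch).
    """
    query = query.lower()
    if query in vocabulary:
        return []  # exact hit, nothing to suggest
    # get_close_matches ranks candidates by similarity ratio, best first,
    # and drops everything scoring below `cutoff`.
    return difflib.get_close_matches(
        query, sorted(vocabulary), n=max_suggestions, cutoff=cutoff
    )


vocabulary = {"invoice", "meeting", "calendar", "attachment"}
print(suggest("meating", vocabulary))
```

A real search engine would draw its candidates from the index itself and weigh them by how often each term occurs, but the user-facing behaviour is the same: a misspelled term leads to a suggestion instead of an empty result list.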
Historically, there have been good reasons for developing Zarafa-search as a custom search engine. In part this was due to compatibility constraints with Outlook (prefix searching, most importantly), and in part due to performance constraints - during initial indexing, for example, we essentially have to scan and index the whole history of all users, including attachments. Even on a server with many users, this process should take days, not weeks or even months. The end result was that there was no other choice but to build Zarafa-search atop its own custom search engine.
Ideally, of course, we would prefer to make full use of an existing search engine, to reduce the maintenance burden of our own code and to add modern features more easily. So a few months ago we decided to re-evaluate the situation and see whether, in the meantime, new or existing search engines had matured to the point that they could satisfy our specific needs. Since then we have built a working prototype which uses Xapian and Python, and initial results look promising.
Why did we choose Xapian as the default? Xapian is a C++ and GPL project, so we think it is the best fit for Zarafa. Dependency-wise, most distros already ship Xapian out of the box and provide packages for its Python bindings. Initial benchmarking shows that both indexing speed and query performance are more than acceptable, especially considering the Outlook-compatibility constraint of doing prefix searches by default. The Python bindings to Xapian (python-xapian) are a bit low-level, but that's just a small nitpick.
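For readers wondering what "prefix searches by default" means in practice, the following is a small, standard-library-only sketch. It deliberately does not use python-xapian, and it is not our prototype; the `PrefixIndex` class and the sample documents are hypothetical. It keeps the distinct terms in a sorted list, so that all terms starting with a given prefix sit next to each other and can be found with a binary search.

```python
import bisect
from collections import defaultdict


class PrefixIndex:
    """Toy inverted index that answers prefix queries (illustrative only)."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document ids
        self._terms = []                   # sorted list of distinct terms

    def add(self, doc_id, text):
        """Index every whitespace-separated, lowercased word of `text`."""
        for term in text.lower().split():
            if term not in self._postings:
                bisect.insort(self._terms, term)  # keep the term list sorted
            self._postings[term].add(doc_id)

    def search_prefix(self, prefix):
        """Return ids of documents containing a term that starts with `prefix`."""
        prefix = prefix.lower()
        # Binary search for the first term that is >= prefix.
        start = bisect.bisect_left(self._terms, prefix)
        hits = set()
        for term in self._terms[start:]:
            if not term.startswith(prefix):
                break  # sorted order: no later term can match either
            hits |= self._postings[term]
        return hits


index = PrefixIndex()
index.add(1, "Quarterly meeting notes")
index.add(2, "Meetings moved to Friday")
index.add(3, "Invoice attached")
print(index.search_prefix("meet"))
```

The sketch also hints at why prefix searching is a performance concern: a short prefix can expand to a very large number of terms, and every one of them contributes its own list of documents to the result.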
As you may have noticed by now, we are using Xapian and Python together. But why did we choose Python? Well, we like to think we are just following in the footsteps of the wider open source community, where more and more code is being written in high-level languages. With computers being so fast these days, programs are usually blocked on I/O, so a high-level language will often be just as fast as a low-level one. In many cases that takes away the main advantage of using a low-level language (which C++ definitely is when compared to a language such as Python). Other current examples of Python usage at Zarafa are:
Next to these software components, we also use Python intensively for testing and for admin-type scripting (more about this in a later blog posting.. :-)).
Do you have experiences with Zarafa search to share? Please drop us a note below. We would like to know how you would like to take advantage of the search functionality.