Microsoft has extended its open source activity to include projects derived from work done on its Bing search engine.
Over the last few days, engineers at Microsoft have started to provide code and details about the BitFunnel full-text search system used to power Bing. The pieces available so far are minimal, but one of them -- an open source just-in-time compiler -- hints at applications beyond search systems.
A little bit of JIT
The GitHub site for BitFunnel lists three major projects so far. One is the BitFunnel text search/retrieval system itself, which has been posted there "in the spirit of doing development out in the open," even though "the documentation is non-existent and the code is in an incomplete state," according to the repository's README file. A second project, WorkBench, is a tool for preparing text for use in BitFunnel.
The third project, NativeJIT, is the most interesting of the bunch, even in its current early state. Written in C++, it takes expressions that use C data structures and transforms them into highly optimized assembly code.
According to Microsoft, NativeJIT provides the best results when the work you're performing with it meets three conditions:
- The expression isn't known in advance.
- The cost of compiling the expression to assembly will be more than offset by the number of times the expression will be run.
- "Latency and throughput demands require low cost for compilation," meaning there are so many queries being processed in this manner that compilation can't afford to be a bottleneck.
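The second condition is a break-even calculation, which can be sketched in a few lines of plain C++. This is an illustration only: the CostModel struct, the BreakEvenRuns function, and any figures fed into them are hypothetical and do not come from Microsoft or from NativeJIT.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical cost model (all times in microseconds). None of these
// names or numbers are taken from NativeJIT or from Bing.
struct CostModel {
    double compile_cost_us;     // one-time cost of compiling the expression
    double interpreted_run_us;  // cost per evaluation without compiling
    double compiled_run_us;     // cost per evaluation of the compiled code
};

// Returns the smallest number of evaluations at which compiling first
// becomes strictly cheaper overall than not compiling, or -1 if the
// compiled code is not faster per evaluation.
std::int64_t BreakEvenRuns(const CostModel& m) {
    const double saving_per_run = m.interpreted_run_us - m.compiled_run_us;
    if (saving_per_run <= 0.0) {
        return -1;  // compiling never pays for itself
    }
    return static_cast<std::int64_t>(
               std::floor(m.compile_cost_us / saving_per_run)) + 1;
}
```

With an assumed compile cost of 100 microseconds and a saving of 0.75 microseconds per evaluation, BreakEvenRuns returns 134: compilation pays off only if the expression runs at least that many times.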
Bing's original scenario for NativeJIT involved scoring documents in Bing's database based on keyword matches from queries. Each query would have a custom expression created for it, and the scoring process would then be partitioned across a cluster.
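The shape of that workload, one expression built per query and then run against many documents, can be sketched in ordinary C++. The sketch below does not use NativeJIT: a std::function closure stands in for the compiled expression, and the Document type, the MakeScorer function, and the count-the-matching-terms scoring rule are all assumptions made for illustration, not Bing's actual scoring logic.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical document representation: just a bag of terms.
struct Document {
    std::vector<std::string> terms;
};

// A scorer is built once per query and then applied to many documents.
using Scorer = std::function<int(const Document&)>;

// Builds a query-specific scorer. The scoring rule here (one point per
// query term found in the document) is an assumption for illustration.
Scorer MakeScorer(std::vector<std::string> query_terms) {
    return [query_terms](const Document& doc) {
        int score = 0;
        for (const std::string& term : query_terms) {
            if (std::find(doc.terms.begin(), doc.terms.end(), term) !=
                doc.terms.end()) {
                ++score;
            }
        }
        return score;
    };
}
```

The scorer is created once per query and called once per document, which is the pattern where a one-time compilation cost is spread over many evaluations.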
As it currently stands, NativeJIT is still rudimentary. Functions optimized with it are assumed to have no side effects outside the function itself. If those functions call other code via NativeJIT's CallNode method, it's the responsibility of the programmer to track and deal with any changes that occur.
Many future optimizations are planned but haven't been implemented yet. Conditionals, or the equivalent in NativeJIT of an if-then-else statement, currently have all their paths evaluated. "Down the road we intend to rework the code generator to restrict execution to either the true or the false path [of such a statement]," said BitFunnel engineer Dan Luu.
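The difference between evaluating every path and evaluating only the chosen one can be shown in plain C++. The sketch below is not NativeJIT code and says nothing about how its code generator is built; EvalCounter, Branch, SelectEager, and SelectLazy are hypothetical names used only to make the two behaviors visible.

```cpp
#include <cassert>
#include <functional>

// Counts how many branch expressions were actually evaluated.
struct EvalCounter {
    int count = 0;
};

// Hypothetical stand-in for a branch expression that is costly to compute.
inline int Branch(EvalCounter& counter, int value) {
    ++counter.count;
    return value;
}

// Eager selection: the caller computes both branch values before the
// call, and the condition only picks which result to keep.
inline int SelectEager(bool condition, int if_true, int if_false) {
    return condition ? if_true : if_false;
}

// Lazy selection: each branch is passed as a callable, and only the
// branch chosen by the condition is ever run.
inline int SelectLazy(bool condition,
                      const std::function<int()>& if_true,
                      const std::function<int()>& if_false) {
    return condition ? if_true() : if_false();
}
```

Calling SelectEager(true, Branch(c, 10), Branch(c, 20)) increments the counter twice, while the lazy form with each Branch call wrapped in a lambda increments it once. Both return 10, so the difference is in the work done rather than in the result.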
Some assembly required
NativeJIT isn't meant to be a full-blown JIT for C++, in the manner of projects like Cling. That project uses the LLVM compiler infrastructure to evaluate C++ code line by line from a command line, in effect allowing for interactive C++ programming.
NativeJIT, by contrast, focuses on creating highly optimized instances of specific functions, with C++ as the programming framework for assembling those functions. In theory, NativeJIT's API could be made available to other languages (Python, for instance), but that remains a possibility for the future.
Open source projects have been emerging gradually from different sectors of Microsoft. Originally those projects were entirely self-serving, such as releasing patches for the Linux kernel to ensure it ran well on Hyper-V. But over time, those projects have become more open-ended and outward-facing, such as the CNTK machine learning framework. NativeJIT continues that process. Let's hope it's becoming a tradition at Microsoft, not a passing phase.