Tuesday, October 21, 2014

MITIE v0.3 Released: Now with Java and R APIs

We just released the next version of MITIE, a new DARPA-funded information extraction tool being created by our team at MIT. This release is relatively minor and just adds APIs for Java and R. The project page on GitHub explains how to get started with either of these APIs.

I want to take some time to explain how the Java API is implemented since, as I discovered while making MITIE's Java API, there aren't clear instructions for doing this anywhere on the internet. Hopefully this little tutorial will help if you decide to make a similar Java binding to a C++ library. To begin, let's think about the requirements for a good Java binding:
  • You should be able to compile it from source with a simple command
  • A user of your library should not need to edit or configure anything to compile the API
  • The compilation process should work on any platform
  • Writing JNI code by hand is awful, so you shouldn't have to do that
These requirements pretty much lead you to Swig and CMake, which are both great tools. However, figuring out how to get CMake to work with Swig was painful, and that is what this blog post is about. Happily, it's possible, and the result is a clean, easy-to-use mechanism for creating Java APIs. In particular, you can compile MITIE's Swig/CMake-based Java API using the usual CMake commands:
mkdir build
cd build
cmake ..
cmake --build . --config Release --target install
That creates a jar file and a shared library file, which together form the MITIE Java API. Let's run through a small example to see how you can define new Java APIs. Imagine you have created a simple C++ API that looks like this:
void printSomeString (const std::string& message);

class MyClass {
public:
    std::vector<std::string> getSomeStrings() const;
};
and you want to be able to use it from Java. You just need to put this C++ API in a header file called swig_api.h and include some Swig commands that tell Swig what to name std::vector<std::string> in the generated Java API. So the contents of swig_api.h would look like this:
// Define some swig type maps that tell swig what to call various instantiations of
// std::vector.
#ifdef SWIG
%include "std_string.i"
%include "std_vector.i"
%template(StringVector)         std::vector<std::string>;
#endif

#include <string>
#include <vector>

void printSomeString (const std::string& message);

class MyClass {
public:
    std::vector<std::string> getSomeStrings() const;
};
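Each std::vector instantiation that crosses the C++/Java boundary needs its own %template line. As a sketch, suppose the API also had a hypothetical function getSomeNumbers() returning std::vector<double>; the header would then gain a second %template line giving that type a Java name (DoubleVector is an arbitrary, assumed name). Because Swig defines the SWIG macro when it processes the file and an ordinary C++ compiler does not, the compiler skips the %-directives and still sees a valid header.

```cpp
// Hypothetical extended swig_api.h: getSomeNumbers() and the DoubleVector name
// are illustrative assumptions, not part of the original example.
#ifdef SWIG
%include "std_string.i"
%include "std_vector.i"
// One %template line per std::vector instantiation exposed to Java.
%template(StringVector)         std::vector<std::string>;
%template(DoubleVector)         std::vector<double>;
#endif

#include <string>
#include <vector>

void printSomeString (const std::string& message);

// Without the DoubleVector line above, Swig would still wrap this function,
// but its return type would become an opaque, generated wrapper class.
std::vector<double> getSomeNumbers();

class MyClass {
public:
    std::vector<std::string> getSomeStrings() const;
};
```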
The next step is to create a CMakeLists.txt file that tells CMake how to compile your API.  In our case, it would look like:

cmake_minimum_required (VERSION 2.8.4)
project(example)

set(java_package_name  edu.mit.ll.example)

# List the source files you want to compile into the Java API.  These contain
# things like implementations of printSomeString() and whatever else you need.
set(source_files my_source.cpp another_source_file.cpp )

# List the folders that contain your header files
include_directories( . )

# List of libraries to link to.  For example, you might need to link to pthread
set(additional_link_libraries pthread)

# Tell CMake to put the compiled shared library and example.jar file into the
# same folder as this CMakeLists.txt file when the --target install option is
# executed. You can put any folder here, just give a path that is relative to
# the CMakeLists.txt file.
set(install_target_output_folder .)

# Pull in the helper that does the actual Swig/JNI work. This assumes the
# cmake_swig_jni file sits where include() can find it, e.g. next to this file.
include(cmake_swig_jni)
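The source_files listed above are where the API declared in swig_api.h actually gets defined. Below is a minimal sketch of a hypothetical my_source.cpp; the printed format and the returned strings are placeholder choices, and the declarations are repeated only so the snippet compiles on its own (a real file would #include "swig_api.h" instead).

```cpp
// my_source.cpp (hypothetical): defines the functions declared in swig_api.h.
#include <iostream>
#include <string>
#include <vector>

// Repeated from swig_api.h so this snippet is self-contained.
void printSomeString (const std::string& message);

class MyClass {
public:
    std::vector<std::string> getSomeStrings() const;
};

// Writes the message to standard output on its own line.
void printSomeString (const std::string& message)
{
    std::cout << message << std::endl;
}

// Returns a fixed list of placeholder strings.
std::vector<std::string> MyClass::getSomeStrings() const
{
    return {"one", "two", "three"};
}
```

Nothing in this file is Java-specific; Swig generates the JNI glue that forwards the Java calls to these ordinary C++ functions.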

That's it. Now you can compile your Java API using CMake, and you will get example.jar plus either example.dll or libexample.so, depending on your platform. Then, to use it, you can write Java code like this:
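import edu.mit.ll.example.*;
public class Example {
    public static void main(String args[]) {
        // printSomeString() is a free function, so it is reached through the generated global class.
        global.printSomeString("hello world!");

        MyClass obj = new MyClass();
        StringVector temp = obj.getSomeStrings();
        // Print each string returned by the C++ method.
        for (int i = 0; i < temp.size(); ++i)
            System.out.println(temp.get(i));
    }
}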
and compile and run it with:
javac -classpath example.jar  Example.java
java -classpath example.jar;. -Djava.library.path=. Example

This assumes example.jar and the shared library are in your current folder. Note that on Linux or OS X you need to use : as the classpath separator instead of the ; required on Windows (e.g. -classpath example.jar:.). But that's it! You just made a Java interface to your C++ library. You might have noticed the include(cmake_swig_jni) statement in the CMakeLists.txt file, though. That file is a bunch of CMake magic I had to write to make all this work, and it does work, on different platforms, without trouble. You can see a larger example of a Java to C++ binding that uses this same setup in MITIE's GitHub repo.