Getting Started: Apache UIMA C++ Framework

The "Getting Started: Apache UIMA™ C++ Framework" guide explains how UIMA C++ enables the creation of UIMA-compliant Analysis Engines (AEs) in C++ and other programming languages, and how these AEs interact with UIMA's Java framework.

UIMA C++ Overview

The UIMA C++ framework is designed to facilitate the creation of UIMA compliant Analysis Engines (AE) from analytics written in C++, or written in languages that can utilize C++ libraries. The UIMACPP SDK directly supports C++, and indirectly supports Perl, Python and Tcl languages via SWIG (http://www.swig.org/). Existing analytic programs in any of these languages can be wrapped with a UIMACPP annotator and integrated with other UIMA compliant analytics or UIMA-based applications.

Figure 1 - UIMA Framework Core

Figure 1 illustrates the core functions of the UIMA framework which define a primitive AE: the CAS interface methods used to access the artifact being analyzed and any pre-existing analysis results or other metadata, and to create new analysis results; context APIs used to access configuration and logging; and finally, methods used to provide a standardized interface to a UIMA compliant AE. For Perl, Python and Tcl, SWIG is used to expose the CAS and context methods to user code in each language.

A UIMA C++ AE can be used anywhere a UIMA Java AE can be used, for example, as a delegate in an aggregate AE, or as a UIMA service (using JMS, Vinci or SOAP protocols). When used in the Java framework, by default a C++ AE is instantiated and called via the JNI, running as part of the JVM process. This is also true for Vinci and SOAP services. For JMS services, the UIMACPP SDK includes a native service wrapper compatible with UIMA-AS.

The UIMA C++ framework supports testing and embedding UIMA components into native processes. A UIMA C++ test driver, runAECpp, is available so that UIMA C++ components can be fully developed and tested in the native environment without any use of Java.

UIMA C++ includes APIs to parse component descriptors and to instantiate and call analysis engines, so that UIMA C++ AEs can be used in native applications. However, UIMA C++ components are primarily intended to be integrated into applications using UIMA's Java-based interfaces.

The UIMA Download Page contains UIMACPP source and 32-bit binary SDK packages for both Linux and Windows platforms. The Linux source package has been used to build and test successfully on MacOSX and on 64-bit Linux platforms.

UIMACPP Sample Code

The UIMACPP package includes several sample UIMA C++ annotators and a sample C++ application that instantiates and uses a C++ annotator. Please go to the UIMA Download Page and get the "UIMACPP Framework" package for Linux or Windows as appropriate. For best interoperability with the Java version of UIMA, unpack it into the $UIMA_HOME directory. See the README file in the top-level directory for instructions on testing the package, and follow the links there to the sample code in C++, Perl, Python and Tcl.

A UIMA C++ annotator descriptor differs from a Java descriptor in its frameworkImplementation element, which specifies

<frameworkImplementation>org.apache.uima.cpp</frameworkImplementation>
For a C++ annotator, the annotatorImplementationName specifies the name of a dynamic link library. UIMACPP adds the OS-appropriate suffix and searches the active dynamic library path: LD_LIBRARY_PATH on Linux, PATH on Windows, and DYLD_LIBRARY_PATH on MacOSX. The suffix is not added automatically when the annotatorImplementationName includes a path.
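
The lookup rule above can be sketched in a few lines of C++. This is only an illustration and not UIMACPP source code: the Platform enum, the function names, the suffix strings (.so, .dll, .dylib) and the sample name MyAnnotator are assumptions made for the example. Only the environment variable names are taken from the text above.

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration of the library lookup rule; NOT UIMACPP code.
enum class Platform { Linux, Windows, MacOSX };

// Conventional shared-library suffix per platform (assumed values).
inline std::string librarySuffix(Platform p) {
    switch (p) {
        case Platform::Linux:   return ".so";
        case Platform::Windows: return ".dll";
        case Platform::MacOSX:  return ".dylib";
    }
    return "";
}

// Environment variable that holds the dynamic library search path.
inline std::string searchPathVariable(Platform p) {
    switch (p) {
        case Platform::Linux:   return "LD_LIBRARY_PATH";
        case Platform::Windows: return "PATH";
        case Platform::MacOSX:  return "DYLD_LIBRARY_PATH";
    }
    return "";
}

// A value that contains a path separator is used verbatim; a bare name
// gets the platform suffix appended and is then looked up on the search path.
inline std::string resolveAnnotatorLibrary(const std::string& implName,
                                           Platform p) {
    assert(!implName.empty());
    const bool hasPath = implName.find('/') != std::string::npos ||
                         implName.find('\\') != std::string::npos;
    return hasPath ? implName : implName + librarySuffix(p);
}
```

Under these assumptions, a bare name such as MyAnnotator is completed with the platform suffix, while a value that already contains a path is passed through unchanged, so in that case the descriptor has to spell out the full file name.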

An annotator library contains a class derived from the UIMACPP class "Annotator", which must implement the basic annotator methods. Annotators in Perl, Python and Tcl each use a C++ annotator to instantiate the appropriate interpreter, load the specified annotator source and call the annotator methods.

UIMACPP Example - Running a C++ analytic in a Native Process

As in UIMA, UIMACPP includes application level methods to instantiate an Analysis Engine from a UIMA annotator descriptor, create a CAS using the AE type system, and call AE methods.

examples/src/ExampleApplication.cpp is a simple program that instantiates the specified annotator, reads a directory of txt files, and for each file sets the document text in a CAS and calls the AE process method. For annotator development, this program can be modified to create arbitrary CAS content to drive the annotator. Because the entire application is C++, standard tools such as gdb or devenv can be easily used for debugging.

runAECpp is a UIMA C++ application driver modeled closely after the Java tool runAE. Like ExampleApplication, this tool can read a directory of text files and exercise the given annotator. In addition, runAECpp can take input from XML format CAS files, call the annotator's process() method, and output the resultant CAS in XML format files. XML format CAS input files can be created from upstream UIMA components, or created manually with the content needed to develop and unit test an annotator.

Figure 2 - runAECpp - Native C++ Test Driver.

UIMACPP Example - Running a C++ analytic in a JVM Process

Using the UIMA or UIMA AS packages, a UIMA C++ Analysis Engine can be used anywhere a UIMA Java AE can be used, for example, as a delegate in an aggregate AE, or as a UIMA service (using JMS, Vinci or SOAP protocols). When used in the Java framework, by default a C++ AE is instantiated and called via the JNI, running as part of the JVM process.

When a UIMA component descriptor specifies the frameworkImplementation as org.apache.uima.cpp, UIMA's Java framework instantiates a proxy annotator that transparently creates the UIMACPP component through the JNI. When the process(cas) method is called on the proxy, the CAS is binary serialized through the JNI into the native environment. The UIMA C++ annotator operates on the native copy of the CAS, and then the CAS is serialized back to the Java environment.
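
The copy-in, copy-out behaviour described above can be illustrated with a toy model. The sketch below is not the UIMA binary CAS format and does not use the JNI or any UIMACPP API: ToyCas, the byte layout and all function names are hypothetical stand-ins chosen only to show that the native side works on its own copy and that its results reach the caller only after being serialized back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a CAS: document text plus annotations stored as
// [begin, end) character offsets. Hypothetical, not a UIMA type.
struct ToyCas {
    std::string text;
    std::vector<std::pair<std::uint32_t, std::uint32_t>> spans;
};

using Bytes = std::vector<std::uint8_t>;

// Little-endian 32-bit helpers for the made-up wire format.
static void putU32(Bytes& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

static std::uint32_t getU32(const Bytes& in, std::size_t& pos) {
    std::uint32_t v = 0;
    for (int shift = 0; shift < 32; shift += 8)
        v |= static_cast<std::uint32_t>(in.at(pos++)) << shift;
    return v;
}

// Layout: text length, text bytes, span count, then begin/end pairs.
Bytes serialize(const ToyCas& cas) {
    Bytes out;
    putU32(out, static_cast<std::uint32_t>(cas.text.size()));
    out.insert(out.end(), cas.text.begin(), cas.text.end());
    putU32(out, static_cast<std::uint32_t>(cas.spans.size()));
    for (const auto& s : cas.spans) {
        putU32(out, s.first);
        putU32(out, s.second);
    }
    return out;
}

ToyCas deserialize(const Bytes& in) {
    std::size_t pos = 0;
    ToyCas cas;
    const std::uint32_t len = getU32(in, pos);
    assert(pos + len <= in.size());
    cas.text.assign(in.begin() + static_cast<std::ptrdiff_t>(pos),
                    in.begin() + static_cast<std::ptrdiff_t>(pos + len));
    pos += len;
    const std::uint32_t count = getU32(in, pos);
    for (std::uint32_t i = 0; i < count; ++i) {
        const std::uint32_t begin = getU32(in, pos);
        const std::uint32_t end = getU32(in, pos);
        cas.spans.emplace_back(begin, end);
    }
    return cas;
}

// "Native side": rebuilds a private copy, annotates the whole document,
// and returns the serialized result.
Bytes nativeProcess(const Bytes& request) {
    ToyCas local = deserialize(request);
    local.spans.emplace_back(
        0u, static_cast<std::uint32_t>(local.text.size()));
    return serialize(local);
}

// "Proxy side": the caller's CAS changes only once the reply is read back.
void proxyProcess(ToyCas& cas) {
    cas = deserialize(nativeProcess(serialize(cas)));
}
```

The design point the toy makes is that every process call pays for two serialization passes, which is one reason the size of the CAS matters when a C++ component is colocated with the Java framework.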

Figure 3 - UIMA C++ Component Colocated with Java Framework.

There are some limitations to this configuration:

  1. When more than one UIMA C++ component is colocated in the JVM, all must share identical versions of the UIMACPP framework.
  2. Runtime problems in the C++ code can crash the entire JVM process.
  3. Standard OS parameters for a process, such as program stack size, differ between a JVM process and a native process.
  4. Debugging native code running in a JVM process can be problematic.

UIMACPP Example - Running a C++ analytic as a Native UIMA AS Service

With the UIMA AS package, a UIMA C++ component can be run as a UIMA AS service using the UIMA C++ application deployCppService. This application instantiates a UIMA C++ AE from the specified annotator descriptor, and then connects to the specified ActiveMQ broker and input queue. In order to take advantage of multi-core hardware, deployCppService supports instantiating multiple copies of the C++ analytic, each in a different thread; this option requires the analytic to be designed for multithreaded operation.
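
The threading model mentioned above, one private copy of the analytic per thread fed from a shared input queue, can be sketched with the standard library alone. This is not how deployCppService is implemented and uses no UIMACPP or ActiveMQ API: ToyAnalytic, runWorkers and the in-memory queue are hypothetical stand-ins for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Stand-in for one analytic instance (hypothetical, not a UIMACPP type).
// It holds no shared or global state, so instances cannot interfere.
class ToyAnalytic {
public:
    // "Analyzes" a document by counting whitespace-separated tokens.
    std::size_t process(const std::string& text) const {
        std::size_t tokens = 0;
        bool inToken = false;
        for (char c : text) {
            const bool space = (c == ' ' || c == '\n' || c == '\t');
            if (!space && !inToken) ++tokens;
            inToken = !space;
        }
        return tokens;
    }
};

// Starts `instances` worker threads. Each worker owns its own ToyAnalytic
// and pulls documents from a mutex-protected queue that stands in for the
// service's input queue. Returns the total token count over all documents.
std::size_t runWorkers(std::vector<std::string> documents,
                       unsigned instances) {
    assert(instances > 0);
    std::queue<std::string> input;
    for (auto& d : documents) input.push(std::move(d));

    std::mutex queueMutex;
    std::atomic<std::size_t> totalTokens{0};

    auto worker = [&]() {
        ToyAnalytic analytic;  // one private instance per thread
        for (;;) {
            std::string doc;
            {
                std::lock_guard<std::mutex> lock(queueMutex);
                if (input.empty()) return;
                doc = std::move(input.front());
                input.pop();
            }
            // The analysis itself runs outside the lock, in parallel.
            totalTokens += analytic.process(doc);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < instances; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
    return totalTokens.load();
}
```

The sketch works only because ToyAnalytic keeps no shared mutable state. An analytic that relies on global variables or non-reentrant libraries would need its own synchronization, which is the requirement the paragraph above refers to.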

Once deployed, the service can be utilized from UIMA applications and aggregate analysis engines in exactly the same way as other UIMA AS services written in Java.

Figure 4 - UIMA C++ Component Running as Native UIMA AS Service.


Using Java to provide lifecycle management and monitoring

UIMA AS services written in Java are deployed using UIMA Deployment Descriptors. These descriptors, which specify the UIMA component descriptor to instantiate and the connectivity and error handling options, are used by the UIMA utility deployAsyncService to launch a Java service. Deployment Descriptors have special support for UIMA C++ services, with the ability to provide lifecycle management, JMX monitoring and integrated logging of C++ native services. This support is enabled when the UIMA AS Deployment Descriptor specifies

<custom name="run_top_level_CPP_service_as_separate_process"/>
in which case Java will launch deployCppService as a separate process on the same machine and establish socket connections for logging and monitoring.

Note that in this case the Deployment Descriptor can also specify the environment for the native process using entries such as

<environmentVariable name="LD_LIBRARY_PATH">/home/user/apache-uima-as/uimacpp/lib</environmentVariable>
This feature enables multiple UIMA C++ components built against different versions of UIMACPP to be managed by the same JVM.

Figure 5 - Java Managed UIMA C++ Service.