Version 2.4.0
Copyright © 2006, 2012 The Apache Software Foundation
Copyright © 2004, 2006 International Business Machines Corporation
License and Disclaimer. The ASF licenses this documentation to you under the Apache License, Version 2.0 (the "License"); you may not use this documentation except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, this documentation and its contents are distributed under the License on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Trademarks. All terms mentioned in the text that are known to be trademarks or service marks have been appropriately capitalized. Use of such terms in this book should not be regarded as affecting the validity of the trademark or service mark.
January, 2012
The Apache UIMA™ C++ framework allows the creation of UIMA-compliant analysis engines from analytics written in C++ and in several scripting languages that can use C++ libraries. A rich set of standard UIMA interface methods minimizes the effort needed to extract input data from a CAS and then update the CAS with the analytic results. The UIMA framework transparently moves the CAS between Java and C++ components and between UIMA components running in different processes.
A UIMA C++ component is identified as such in its component descriptor:
<frameworkImplementation>org.apache.uima.cpp</frameworkImplementation>
UIMA C++ annotators can be utilized from C++ applications, from Java applications, and can be aggregated with other UIMA-compliant annotators. For C++ applications the UIMA C++ framework has APIs to parse component descriptors, then instantiate and call analysis engines. A C++ test driver is available so that a UIMA C++ analytic can be developed and tested with standard native programming tools; no programming in Java is required. On the other hand, for a more consistent development environment, Eclipse can provide a single IDE for both Java and C++ components using the CDT.
For Java applications, there are two approaches to integrating UIMA C++ analytics: using the Java Native Interface (JNI), and using a C++ service wrapper to create a UIMA AS compatible service. Using the JNI, a C++ analysis engine can be used anywhere a Java analysis engine is used; in this case a Java proxy will instantiate the uimacpp framework through the JNI. Note that if more than one C++ component is used in the same JVM, they must share the same native environment. Using UIMA AS, a C++ component can be started as a separate process, and therefore each component can have a different native environment, if desired. When C++ is launched automatically from Java, logging and JMX monitoring of the annotator are done via the JVM.
The UIMA C++ framework implements a subset of the functionality of the Java framework. Major functionality consists of:
Major UIMA functionality missing in the C++ framework:
UIMA compliant annotators can be written in Perl, Python and Tcl using C++ annotators included in this package. For further details see Perl, Python and Tcl.
The UIMA C++ framework depends on Unicode support from the ICU (see http://www.ibm.com/software/globalization/icu), XML parsing support from xerces (see http://xml.apache.org/xerces-c/) and platform portability from APR (see http://apr.apache.org/).
API documentation for the C++ framework is available here.
Linux® Intel® 32 and 64-bit platforms, MacOSX and Windows® 2000/XP.
Binary distributions are in compressed tarfiles for Linux and zipfiles for Windows.
Set UIMACPP_HOME to the installed location of the UIMA C++ SDK.
Both the UIMA C++ framework and the users' C++ components are implemented as shared libraries and must be available to the native library loader. Their directories must be listed in LD_LIBRARY_PATH on Linux, in DYLD_LIBRARY_PATH on MacOSX, and in the system PATH on Windows. UIMA C++ executables should be added to the system PATH.
export LD_LIBRARY_PATH=$UIMACPP_HOME/lib:$LD_LIBRARY_PATH
export PATH=$UIMACPP_HOME/bin:$PATH
set PATH=%UIMACPP_HOME%\bin;%PATH%
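Before running the examples on Linux, it can help to confirm that the settings above took effect. The following is a minimal sketch, not part of the UIMA C++ SDK: the helper names path_contains and check_uimacpp_env are hypothetical, and it checks only that the lib and bin directories under UIMACPP_HOME appear in LD_LIBRARY_PATH and PATH.

```shell
# Hypothetical helper: succeed if directory $2 is an entry of the
# colon-separated list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical helper: print one line per check and return non-zero
# if any of the settings described above is missing.
check_uimacpp_env() {
    if [ -z "${UIMACPP_HOME:-}" ]; then
        echo "UIMACPP_HOME is not set"
        return 1
    fi
    status=0
    if path_contains "${LD_LIBRARY_PATH:-}" "$UIMACPP_HOME/lib"; then
        echo "lib: ok"
    else
        echo "lib: $UIMACPP_HOME/lib missing from LD_LIBRARY_PATH"
        status=1
    fi
    if path_contains "${PATH:-}" "$UIMACPP_HOME/bin"; then
        echo "bin: ok"
    else
        echo "bin: $UIMACPP_HOME/bin missing from PATH"
        status=1
    fi
    return $status
}
```

Calling check_uimacpp_env after the export commands should print "lib: ok" and "bin: ok"; any other output names the variable that still needs to be set.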
To test the installation, set the environment variables as described above and follow these directions:
cd $UIMACPP_HOME/examples
make -C src -f DaveDetector.mak
The build should create a shared library, DaveDetector.so, which must be placed in the LD_LIBRARY_PATH.
LD_LIBRARY_PATH=`pwd`/src:$LD_LIBRARY_PATH
runAECpp descriptors/DaveDetector.xml data/example.txt
The runAECpp driver will process the input text file and DaveDetector should find a Dave in it.
cd %UIMACPP_HOME%\examples
devenv src\DaveDetector.vcproj /build release
The build should create a shared library, DaveDetector.dll, which must be placed in the PATH.
set PATH=%CD%\src;%PATH%
runAECpp descriptors\DaveDetector.xml data\example.txt
The runAECpp driver will process the input text file and DaveDetector should find a Dave in it.
For further details about how to build and run other examples see C++ Examples
runAE.sh descriptors/DaveDetector.xml data
runAE descriptors\DaveDetector.xml data
start the ActiveMQ broker following directions in $UIMA_HOME/README
cd $UIMACPP_HOME/examples/tutorial
runRemoteAsyncAE.sh tcp://localhost:61616 MeetingAnnotator \
-d descriptors/Deploy_MeetingAnnotator.xml
start the ActiveMQ broker following directions in %UIMA_HOME%\README
cd %UIMACPP_HOME%\examples\tutorial
runRemoteAsyncAE tcp://localhost:61616 MeetingAnnotator ^
-d descriptors\Deploy_MeetingAnnotator.xml
NOTE: On Windows, the case of the environment variable name 'Path' in the deployment descriptor must match the case of the Path environment variable name in the system.
It is advantageous to develop C++ components as stand-alone C++ applications.
The program runAECpp, found in $UIMACPP_HOME/bin, is a native utility that instantiates the specified C++ annotator, imports input files into CAS objects, and for each input calls the annotator's process method. runAECpp supports input files in plain text (the default), as well as files in XMI and XCAS format. The output CASes are optionally saved.
runAECpp UimaCppDescriptor InputFileOrDirectory [OutputDirectory] [-x | -xmi] [-lenient] [-s ViewName] [-l loglevel] [-n numInstances] [-r numRuns] [-rand] [-rdelay Max]
The options -r, -rand and -rdelay are quite useful for detecting threading problems with annotators intended for multi-threaded deployments.
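As an illustration of how these options might be combined, the sketch below assembles a runAECpp command line from the flags in the usage line above. The helper name stress_cmd and the numeric values are hypothetical, chosen only for illustration; the helper prints the command rather than running it, so it can be inspected first.

```shell
# Hypothetical helper: print a runAECpp command line that combines the
# -n, -r, -rand and -rdelay options from the usage line.
stress_cmd() {
    descriptor=$1   # UimaCppDescriptor
    input=$2        # InputFileOrDirectory
    instances=$3    # value passed to -n numInstances
    runs=$4         # value passed to -r numRuns
    max_delay=$5    # value passed to -rdelay Max
    echo "runAECpp $descriptor $input -n $instances -r $runs -rand -rdelay $max_delay"
}

# Illustrative values only; adjust them to the deployment being tested.
stress_cmd descriptors/DaveDetector.xml data 4 10 100
```

The printed line can be pasted into a shell in which the environment variables described earlier are set.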
Sample XMI and XCAS format CAS files are included with the UIMA C++ examples. After building the SofaExampleAnnotator example as described above for DaveDetector, try:
runAECpp -xmi descriptors/DaveDetector.xml data/tcas.xmi <yourOutputDir>
runAECpp -xmi descriptors/DaveDetector.xml data/sofa.xmi <yourOutputDir> -s EnglishDocument
runAECpp -x descriptors/SofaExampleAnnotator.xml data/sofa.xcas <yourOutputDir>
For further details about these and other examples see C++ Examples
The component driver, runAECpp, simplifies running the C++ component under a native debugger.
The UIMA C++ framework has special provisions for debugging on Windows. UIMA C++ components built in debug mode should link to the debug version of the framework, uimaD.dll. The debug framework automatically appends "D" to the name of C++ components before trying to load them. This applies to annotators and URI scheme handlers. All UIMA C++ example code follows this convention.
Note also that the runAECppD version of the component driver should be used with debug components.
Native components running under Java may operate differently than when run from a native application. This is because the JVM uses different default process limits than those used for a native application. For example, the maximum stack size for a thread running under a JVM may be 100KB versus 1MB for a native command line application. Use "java -X" to get more information on non-standard JVM options.
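To compare the two environments on Linux, the native stack limit can be read from the shell. This is a minimal sketch assuming a bash-compatible shell; the helper name native_stack_kb is hypothetical, and the -Xss value in the comment is an illustrative assumption, not a recommendation from this guide.

```shell
# Hypothetical helper: print the soft stack size limit that a native
# command-line program inherits from this shell.
native_stack_kb() {
    # In bash, ulimit -s reports the limit in kilobytes, or "unlimited".
    ulimit -s
}

echo "native stack limit: $(native_stack_kb)"

# The JVM thread stack size is controlled separately by the non-standard
# -Xss option, for example (value is illustrative only):
#   java -Xss1m ...
```

If a component works natively but fails under Java with deep recursion, a difference between these two limits is one thing to check.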
In order to run UIMA C++ components built in debug mode, Java must load the debug version of the framework. Define the Java system property DEBUG_UIMACPP to specify use of the debug framework. A convenient way to pass JVM properties to UIMA's Java command-line utilities, such as runAE.sh, is to define them in the environment variable UIMA_JVM_OPTS. For example, to run a debug version of DaveDetector from Java:
export UIMA_JVM_OPTS=-DDEBUG_UIMACPP
runAE.sh descriptors/DaveDetector.xml data
For formal integration with UIMA applications, a logfile interface is available. When a C++ annotator is called from Java, logging messages are integrated into the Java log. If the C++ annotator is called from a native C++ application, such as runAECpp, a local logfile may be created. The name of the logfile is taken from the environment variable UIMACPP_LOGFILE, and the file is opened in append mode. The default is to disable logging.
Three levels of message logging can be used: Message, Warning and Error. When called from Java, the UIMA log level is used to control output. When called from a C++ application, an API is available to set the log level; the default level is Error. When called from runAECpp, the values of these levels are 0, 1, and 2, respectively.
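The mapping from level names to runAECpp values can be sketched as a small shell helper. The helper name loglevel_value and the logfile path in the comments are hypothetical; the -l option and the UIMACPP_LOGFILE variable are those described above.

```shell
# Hypothetical helper: translate a log level name into the numeric value
# that runAECpp expects for its -l option.
loglevel_value() {
    case "$1" in
        Message) echo 0 ;;
        Warning) echo 1 ;;
        Error)   echo 2 ;;
        *) echo "unknown log level: $1" >&2; return 1 ;;
    esac
}

# Possible usage for a native run (the logfile path is illustrative only):
#   export UIMACPP_LOGFILE=/tmp/uimacpp.log
#   runAECpp descriptors/DaveDetector.xml data/example.txt -l "$(loglevel_value Warning)"
```

Because the logfile is opened in append mode, repeated runs add to the same file unless it is removed between runs.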