Package org.apache.uima.cas.impl


Implementation and Low-Level API for the CAS Interfaces.

These are Internal APIs. Use these APIs at your own risk. APIs in this package are subject to change without notice, even in minor releases. Use of this package is not supported. If you think you have found a bug in this package, please try to reproduce it with the officially supported APIs before reporting it.


Internals documentation

NOTE: This documentation is plain HTML, generated with the WYSIWYG editor "tinymce". To work on it, set up a small web page that runs tinymce from a local file, then use the editor's Tools > Source code view to cut and paste between this file's source and the editor.

Java Cover Objects for version 3

The Java Cover Objects are no longer cover objects; instead, these objects are the Feature Structures themselves. The Java classes for these objects form a hierarchy that corresponds to the UIMA type hierarchy. JCasGen still generates specific Java classes for specific UIMA types (for user-defined types, not for built-in types). As before, JCasGen'd classes are optional: if there is no JCasGen'd class for "MyType" (assume a subtype of "Annotation"), then the Java class of the most specific supertype of "MyType" that has a corresponding Java class is used. (This is also how it works in V2.)
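The supertype fallback can be pictured as a walk up the type hierarchy until a type with a Java class is found. The following is a minimal sketch under simplifying assumptions: types are plain strings with a single supertype, and JCasClassLookup, TopFs and AnnotationFs are hypothetical names, not UIMA classes. The real type system code is more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins for generated JCas classes (hypothetical names, not the real UIMA classes).
class TopFs {}
class AnnotationFs extends TopFs {}

// Simplified, hypothetical model of the lookup: each UIMA type name maps to its
// supertype name, and only some type names have a registered Java class.
final class JCasClassLookup {
  private final Map<String, String> supertypeOf = new HashMap<>();
  private final Map<String, Class<?>> javaClassOf = new HashMap<>();

  void addType(String typeName, String supertypeName) {
    supertypeOf.put(typeName, supertypeName);  // a null supertype marks the root
  }

  void registerJavaClass(String typeName, Class<?> javaClass) {
    javaClassOf.put(typeName, javaClass);
  }

  /** Walks up the supertype chain until it reaches a type that has a Java class. */
  Class<?> javaClassFor(String typeName) {
    for (String t = typeName; t != null; t = supertypeOf.get(t)) {
      Class<?> c = javaClassOf.get(t);
      if (c != null) {
        return c;
      }
    }
    throw new IllegalStateException("no Java class found for " + typeName);
  }
}
```

With "MyType" registered as a subtype of an annotation type that has a Java class, javaClassFor("MyType") returns that annotation class, so instances of "MyType" are created as that class.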

There is one definition of these objects per UIMA Type System. PEARs having different "customizations" of the same JCas class name are not supported in v3.

  • This loss of capability is mitigated by the addition of more kinds of Java types as built-in values.
  • This is not supported because no solution has been found for sharing types between the outer and PEAR pipelines without encountering class-cast exceptions.
  • The PEAR can still define customizations for types only it defines (that is, not used by the outer pipeline).

Much of the infrastructure is kept as-is in version 3 to support backwards compatibility.

Format of a JCas class version 3

The _Type class is not used. This may be revisited if users depend on the low-level access that _Type made possible.

There is one definition of the class per type system.  Type systems are often shared among multiple CASes.  Each definition is loaded under a specific loader for that type system.  

(Not implemented) The class loader would delegate to its parent for all classes except the JCas types; for those, it would generate the classes using ASM bytecode generation from the fully merged TypeSystem information and any existing "customizations".
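The (not implemented) loader design amounts to a class loader that bypasses normal parent-first delegation for one set of class names. A minimal sketch, assuming the bytecode comes from a caller-supplied generator function, which is a hypothetical stand-in for the ASM-based generation mentioned above; JCasTypeClassLoader is an illustrative name only.

```java
import java.util.Set;
import java.util.function.Function;

// Sketch of the (not implemented) idea: delegate to the parent for everything except
// a known set of JCas class names, whose bytecode is produced by a generator.
// The generator is a hypothetical stand-in for ASM-based generation.
final class JCasTypeClassLoader extends ClassLoader {
  private final Set<String> jcasClassNames;
  private final Function<String, byte[]> bytecodeGenerator;

  JCasTypeClassLoader(ClassLoader parent, Set<String> jcasClassNames,
                      Function<String, byte[]> bytecodeGenerator) {
    super(parent);
    this.jcasClassNames = jcasClassNames;
    this.bytecodeGenerator = bytecodeGenerator;
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    if (!jcasClassNames.contains(name)) {
      return super.loadClass(name, resolve);  // ordinary parent-first delegation
    }
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);  // define each JCas class only once per loader
      if (c == null) {
        byte[] bytes = bytecodeGenerator.apply(name);
        c = defineClass(name, bytes, 0, bytes.length);
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }
}
```

Because non-JCas classes still resolve through the parent, shared infrastructure classes keep a single identity, while each type system's loader gets its own copy of the JCas classes.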

Each feature is stored in one of two arrays kept per Java object Feature Structure instance: an "int" array holding boolean/byte/short/int/long/float/double values, and an "Object" array holding strings and references to other FSs. Longs and doubles take 2 int slots.
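To make the two-slot layout concrete, here is a minimal sketch of packing 64-bit values into an int array. The helper name IntSlots and the low-word-first order are assumptions for illustration, not taken from the UIMA sources; floats and doubles are stored by their raw IEEE 754 bit patterns.

```java
// Hypothetical helper: stores 32- and 64-bit values in an int[] feature array.
// A long or double occupies two adjacent slots (low 32 bits first, by assumption).
final class IntSlots {
  private IntSlots() {}

  static void setLong(int[] ints, int offset, long v) {
    ints[offset] = (int) v;               // low 32 bits
    ints[offset + 1] = (int) (v >>> 32);  // high 32 bits
  }

  static long getLong(int[] ints, int offset) {
    // Mask the low word to avoid sign extension, then OR in the shifted high word.
    return (ints[offset] & 0xFFFFFFFFL) | ((long) ints[offset + 1] << 32);
  }

  static void setDouble(int[] ints, int offset, double v) {
    setLong(ints, offset, Double.doubleToRawLongBits(v));
  }

  static double getDouble(int[] ints, int offset) {
    return Double.longBitsToDouble(getLong(ints, offset));
  }

  static void setFloat(int[] ints, int offset, float v) {
    ints[offset] = Float.floatToRawIntBits(v);  // a float fits in one slot
  }

  static float getFloat(int[] ints, int offset) {
    return Float.intBitsToFloat(ints[offset]);
  }
}
```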

Built-in arrays have their array parts represented by native Java Arrays.  Getters and Setters are provided as before.  Constructors are provided as before.

Extra fields in the Feature Structure include both instance and class fields:

  • (static class fields) a set of fields representing the int offset in the "int" and "object" arrays for all the features
  • (instance field) a reference to the TypeImpl for this class, initialized at load time from a TypeSystemImpl thread-local value. This reference is updatable to handle two edge cases.
  • (instance field) a reference to the CAS View used when this feature structure was created
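Put together, the two storage arrays and the extra fields give a class shape roughly like the following. This is a hand-written sketch, not generated code: FsSketch, TypeRef and ViewRef are hypothetical stand-ins for a JCas class, TypeImpl and the CAS view, the "label" feature is invented, and the offsets are hard-coded only for illustration.

```java
// Hypothetical stand-ins for TypeImpl and the CAS view.
final class TypeRef {
  final String name;
  TypeRef(String name) { this.name = name; }
}

final class ViewRef {
  final String viewName;
  ViewRef(String viewName) { this.viewName = viewName; }
}

// Simplified shape of a v3-style Feature Structure class (illustrative names only).
class FsSketch {
  // (static class fields) offsets of each feature in the int and Object arrays
  static final int INT_OFFSET_BEGIN = 0;
  static final int INT_OFFSET_END = 1;
  static final int REF_OFFSET_LABEL = 0;

  // (instance fields)
  TypeRef type;        // not final: the type reference is kept updatable
  final ViewRef view;  // the CAS view in which this FS was created
  private final int[] intData = new int[2];
  private final Object[] refData = new Object[1];

  FsSketch(TypeRef type, ViewRef view) {
    this.type = type;
    this.view = view;
  }

  int getBegin() { return intData[INT_OFFSET_BEGIN]; }
  void setBegin(int v) { intData[INT_OFFSET_BEGIN] = v; }
  int getEnd() { return intData[INT_OFFSET_END]; }
  void setEnd(int v) { intData[INT_OFFSET_END] = v; }

  // Strings live in the Object array, alongside references to other FSs.
  String getLabel() { return (String) refData[REF_OFFSET_LABEL]; }
  void setLabel(String v) { refData[REF_OFFSET_LABEL] = v; }
}
```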

Extra methods in the FeatureStructure

  • a set of generic getters and setters, one per incompatible value type.
    • All non-primitive Feature Structure reference values are collapsed into a single TOP reference kind, so one generic getter/setter pair handles them.
    • These are used for generic access, including serialization/deserialization
    • For more details, see package.html for uimaj-tools jcasgen (the link only works if all sources are checked out).
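Generic access can be sketched as a feature descriptor that records its value kind and slot offset, with one getter/setter pair per storage kind. This is a minimal sketch under assumptions: FeatureDesc and GenericAccessFs are hypothetical, Object stands in for TOP, and only int and FS-reference kinds are shown. The method names echo getIntValue/getFeatureValue from UIMA's FeatureStructure interface, but the implementation is not UIMA's.

```java
// Hypothetical feature descriptor: value kind plus slot offset.
// A fuller version would also have boolean, long, double, String, ... kinds.
final class FeatureDesc {
  enum Kind { INT, FS_REF }  // all FS references share one kind

  final String name;
  final Kind kind;
  final int offset;

  FeatureDesc(String name, Kind kind, int offset) {
    this.name = name;
    this.kind = kind;
    this.offset = offset;
  }
}

class GenericAccessFs {
  private final int[] intData;
  private final Object[] refData;  // Object stands in for TOP

  GenericAccessFs(int intSlots, int refSlots) {
    intData = new int[intSlots];
    refData = new Object[refSlots];
  }

  int getIntValue(FeatureDesc f) {
    requireKind(f, FeatureDesc.Kind.INT);
    return intData[f.offset];
  }

  void setIntValue(FeatureDesc f, int v) {
    requireKind(f, FeatureDesc.Kind.INT);
    intData[f.offset] = v;
  }

  Object getFeatureValue(FeatureDesc f) {
    requireKind(f, FeatureDesc.Kind.FS_REF);
    return refData[f.offset];
  }

  void setFeatureValue(FeatureDesc f, Object fs) {
    requireKind(f, FeatureDesc.Kind.FS_REF);
    refData[f.offset] = fs;
  }

  // Rejects using an accessor that does not match the feature's value kind.
  private static void requireKind(FeatureDesc f, FeatureDesc.Kind expected) {
    if (f.kind != expected) {
      throw new IllegalArgumentException(f.name + " is " + f.kind + ", not " + expected);
    }
  }
}
```

A serializer can then loop over a type's features and call the accessor matching each feature's kind, without knowing the concrete Java class.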

UIMA Indexes

Indexes are defined for a pipeline, and are kept as part of the general CAS definition.

Each CAS View has its own instantiation of the defined indexes (there is one definition for all views). As a result, a particular FS may be added to the indexes, and thus indexed, in some views and not in others.

There are 3 kinds of indexes: Sorted, Set, and Bag. The basic object type for an index is FsIndex_singleType. It has 3 subtypes; Set and Sorted indexes share one implementation, and the third subtype supports flattened indexes:

  • FsIndex_bag
  • FsIndex_set_sorted (used for both Sets and Sorted indexes)
  • FsIndex_flat (used for flattened indexes, for instance, with snapshot iterators)

The FsIndex_singleType index is just for one type (and doesn't include entries for any subtypes).

The Set and Sorted implementations are combined; the only difference is the comparator used. For Sets, the comparator is the one the index definition specifies. For Sorted indexes, the specified comparator is augmented with an extra, least significant key: the Feature Structure id.
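The effect of the extra id key can be shown with java.util.TreeSet, whose comparator decides what counts as a duplicate. This is a sketch under assumptions: AnnotSketch and IndexComparators are hypothetical, the begin-ascending/end-descending keys are only an example index definition, and TreeSet stands in for UIMA's own data structure.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical FS stand-in: an id plus two key features.
final class AnnotSketch {
  final int id;
  final int begin;
  final int end;

  AnnotSketch(int id, int begin, int end) {
    this.id = id;
    this.begin = begin;
    this.end = end;
  }
}

final class IndexComparators {
  private IndexComparators() {}

  // Example of a comparator an index definition might specify:
  // begin ascending, then end descending.
  static final Comparator<AnnotSketch> DEFINED =
      Comparator.comparingInt((AnnotSketch a) -> a.begin)
          .thenComparing(Comparator.comparingInt((AnnotSketch a) -> a.end).reversed());

  // Set index: the defined comparator as-is, so key-equal FSs occupy one entry.
  static TreeSet<AnnotSketch> newSetIndex() {
    return new TreeSet<>(DEFINED);
  }

  // Sorted index: the FS id is appended as the least significant key, so key-equal
  // FSs are all kept, ordered by id.
  static TreeSet<AnnotSketch> newSortedIndex() {
    return new TreeSet<>(DEFINED.thenComparingInt(a -> a.id));
  }
}
```

Adding two FSs with equal begin and end keeps only the first in the set index, but keeps both, in id order, in the sorted index.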

Indexes are connected to specific index definitions; these definitions include a type which is the top type for elements of this index. The index definition logically includes that type and all of its subtypes.

An additional data structure, the IndexIteratorCachePair, is associated with each index definition. It holds references to the FsIndex_singleType implementations for all subtypes of an index. This list is created lazily, only when an iterator is created over this index at a particular type level (which can be the type the index was defined for, or any subtype). Laziness matters because UIMA is often used with a giant type system containing many subtypes, only a few of which are used in a particular pipeline instance.
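The lazy part can be sketched as a map that is filled only on first request for a given type level. This is a sketch under assumptions: LazySubtypeIndexCache, the string-keyed subtype map and the index-factory function are hypothetical and not meant to mirror the real IndexIteratorCachePair; it is also not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: for a requested type level, the list of single-type indexes
// covering that type and all of its subtypes is built on first use, then reused.
final class LazySubtypeIndexCache<I> {
  private final Map<String, List<String>> directSubtypes;
  private final Function<String, I> singleTypeIndexFor;
  private final Map<String, List<I>> cache = new HashMap<>();

  LazySubtypeIndexCache(Map<String, List<String>> directSubtypes,
                        Function<String, I> singleTypeIndexFor) {
    this.directSubtypes = directSubtypes;
    this.singleTypeIndexFor = singleTypeIndexFor;
  }

  /** Returns the per-type indexes for type and its subtypes, computing them only once. */
  List<I> indexesFor(String type) {
    List<I> cached = cache.get(type);
    if (cached == null) {
      List<I> result = new ArrayList<>();
      collect(type, result);
      cached = Collections.unmodifiableList(result);
      cache.put(type, cached);
    }
    return cached;
  }

  // Depth-first walk: the type itself, then each subtype subtree.
  private void collect(String type, List<I> out) {
    out.add(singleTypeIndexFor.apply(type));
    for (String sub : directSubtypes.getOrDefault(type, Collections.emptyList())) {
      collect(sub, out);
    }
  }

  int cachedLevels() {
    return cache.size();
  }
}
```

Types that are never iterated at any level never get a list built, which is what keeps large, mostly unused type systems cheap.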

There are two tasks that indexes accomplish:

  • updating the index with adds and removes of FSs.  This update operation is optimized by
    • keeping each type indexed separately, so only the data structure for that particular type needs to be updated (this design choice has a cost in iteration, though)
    • handling the more common use cases efficiently, the main one being adding something "to the end" of the items in the index.
  • iterating over an index for a type and its subtypes.
    • For a type with no subtypes, this is done by iterating over the FSLeafIndexImpl for that index and type.
    • For a type with subtypes, this is done by creating individual iterators for the type and each of its subtypes, each iterating over the FSLeafIndexImpl for that type. These iterators are then logically combined into one iterator.
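Both tasks can be illustrated with a small sketch. It assumes a sorted (not bag) index and uses hypothetical names; it is not the FSLeafIndexImpl code. Each type has its own sorted list, an add that sorts at or after the last element is a plain append, and iteration over a type plus its subtypes is a k-way merge of the per-type lists using java.util.PriorityQueue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical sketch: one sorted list per type, so an add touches only that type's list.
final class PerTypeSortedIndex<T> {
  private final List<T> items = new ArrayList<>();
  private final Comparator<? super T> cmp;

  PerTypeSortedIndex(Comparator<? super T> cmp) {
    this.cmp = cmp;
  }

  void add(T item) {
    int n = items.size();
    if (n == 0 || cmp.compare(items.get(n - 1), item) <= 0) {
      items.add(item);  // common case: the new item sorts at the end, so just append
      return;
    }
    int pos = Collections.binarySearch(items, item, cmp);
    items.add(pos >= 0 ? pos : -pos - 1, item);  // otherwise insert at its sorted position
  }

  List<T> snapshot() {
    return Collections.unmodifiableList(items);
  }

  /** Logically combines the per-type lists into one ordered iterator (k-way merge). */
  static <T> Iterator<T> mergedIterator(List<PerTypeSortedIndex<T>> indexes,
                                        Comparator<? super T> cmp) {
    // Each heap entry is {index number, position within that index}.
    PriorityQueue<int[]> heap = new PriorityQueue<>((x, y) -> cmp.compare(
        indexes.get(x[0]).items.get(x[1]), indexes.get(y[0]).items.get(y[1])));
    for (int i = 0; i < indexes.size(); i++) {
      if (!indexes.get(i).items.isEmpty()) {
        heap.add(new int[] {i, 0});
      }
    }
    return new Iterator<T>() {
      @Override
      public boolean hasNext() {
        return !heap.isEmpty();
      }

      @Override
      public T next() {
        if (heap.isEmpty()) {
          throw new NoSuchElementException();
        }
        int[] top = heap.poll();
        List<T> source = indexes.get(top[0]).items;
        if (top[1] + 1 < source.size()) {
          heap.add(new int[] {top[0], top[1] + 1});  // advance within that type's list
        }
        return source.get(top[1]);
      }
    };
  }
}
```

The merge step is where the per-type design pays its iteration cost: each next() performs a heap operation whose cost grows with the number of non-empty subtype lists.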

Iterators

There are two main kinds of iterators:

  • Iterators over UIMA Indexes
  • Iterators over other UIMA objects, such as Views, or internal structures.

Iterators over UIMA indexes

There are two main kinds of iterators over UIMA indexes:

  • those returning Java cover objects representing the FS.
  • those returning int values representing the location of the FS in the heap. These are the so-called low-level iterators; they are less efficient in V3.

The basic iterator over a single type is implemented by FsIterator_singletype.  This has subtypes FsIterator_bag and FsIterator_set_sorted.
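A single-type sorted iterator is essentially a movable cursor over one ordered collection. The following sketch assumes the elements are already in a sorted array. SortedSingleTypeIterator is a hypothetical name, and although its method names echo UIMA's FSIterator (isValid, get, moveToNext, moveToPrevious, moveToFirst, moveToLast, moveTo), it does not reproduce FsIterator_set_sorted.

```java
import java.util.Comparator;
import java.util.NoSuchElementException;

// Hypothetical minimal iterator over one type's sorted elements.
final class SortedSingleTypeIterator<T> {
  private final T[] items;  // assumed already sorted according to cmp
  private final Comparator<? super T> cmp;
  private int pos;

  SortedSingleTypeIterator(T[] sortedItems, Comparator<? super T> cmp) {
    this.items = sortedItems;
    this.cmp = cmp;
    moveToFirst();
  }

  boolean isValid() {
    return pos >= 0 && pos < items.length;
  }

  T get() {
    if (!isValid()) {
      throw new NoSuchElementException();
    }
    return items[pos];
  }

  void moveToNext() {
    if (isValid()) {
      pos++;
    }
  }

  void moveToPrevious() {
    if (isValid()) {
      pos--;
    }
  }

  void moveToFirst() {
    pos = 0;  // invalid immediately if there are no elements
  }

  void moveToLast() {
    pos = items.length - 1;
  }

  /** Positions at the leftmost element not less than key (a lower-bound binary search). */
  void moveTo(T key) {
    int lo = 0;
    int hi = items.length;  // search range is [lo, hi)
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (cmp.compare(items[mid], key) < 0) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    pos = lo;  // equals items.length (invalid) when every element is less than key
  }
}
```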