org.apache.uima.cas.impl (Apache UIMA Java SDK 3.6.1-SNAPSHOT User-Level API Documentation)

package org.apache.uima.cas.impl

Implementation and Low-Level API for the CAS Interfaces.

These are Internal APIs. Use these APIs at your own risk. APIs in this package are subject to change without notice, even in minor releases. Use of this package is not supported. If you think you have found a bug in this package, please try to reproduce it with the officially supported APIs before reporting it.

Internals documentation

NOTE: This documentation is plain HTML, generated from a WYSIWYG editor, "tinymce". To work on it, set up a small web page with tinymce (running from a local file), then use Tools - Source code to cut and paste between this file's source and that editor.

Java Cover Objects for version 3

In version 3, the Java Cover Objects are no longer cover objects; these objects are the Feature Structures themselves. The Java classes for these objects form a hierarchy that corresponds to the UIMA type hierarchy. JCasGen still generates particular Java classes for particular UIMA types (for user-defined types, not for built-in types). As before, JCasGen'd classes are optional. If there is no JCasGen'd class for "MyType" (assume it is a subtype of "Annotation"), then the most specific supertype of "MyType" that has a corresponding Java cover class is used. (This is also how it works in V2.)
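
The supertype fallback described above can be pictured as a walk up the type hierarchy. The following is only an illustrative sketch, not the UIMA implementation: the class CoverClassLookupSketch, its method, and the use of plain maps of type names are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not UIMA code): choose the Java class used for a UIMA type.
class CoverClassLookupSketch {

    /**
     * Walks from the given type up through its supertypes and returns the
     * first type name that has a Java class of its own.
     */
    static String mostSpecificTypeWithClass(String typeName,
                                            Map<String, String> parentOf,
                                            Set<String> typesWithJavaClass) {
        for (String t = typeName; t != null; t = parentOf.get(t)) {
            if (typesWithJavaClass.contains(t)) {
                return t;
            }
        }
        throw new IllegalArgumentException("No Java class found for " + typeName);
    }

    public static void main(String[] args) {
        // Child type name -> parent type name; "MyType" has no JCasGen'd class.
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("MyType", "uima.tcas.Annotation");
        parentOf.put("uima.tcas.Annotation", "uima.cas.AnnotationBase");
        parentOf.put("uima.cas.AnnotationBase", "uima.cas.TOP");

        Set<String> withClass = new HashSet<>();
        withClass.add("uima.cas.TOP");
        withClass.add("uima.cas.AnnotationBase");
        withClass.add("uima.tcas.Annotation");

        // Prints "uima.tcas.Annotation": the nearest supertype with a class.
        System.out.println(mostSpecificTypeWithClass("MyType", parentOf, withClass));
    }
}
```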

There is one definition of these objects per UIMA Type System. PEARs having different "customizations" of the same JCas class name are not supported in v3.

This loss of capability is mitigated by the addition of more kinds of Java types as built-in values.
The reason for this not being supported is that there's no solution figured out for sharing types between the outer and PEAR pipelines, without encountering class-cast exceptions.
The PEAR can still define customizations for types only it defines (that is, not used by the outer pipeline).

Much of the infrastructure is kept as-is in version 3 to support backwards compatibility.

Format of a JCas class version 3

The _Type is not used. May revisit this if users are using the low-level access made possible by _Type.

There is one definition of the class per type system. Type systems are often shared among multiple CASes. Each definition is loaded under a specific loader for that type system.

(Not implemented) The loader is set up to delegate to the parent for all classes except the JCas types, and for those, it generates them using ASM byte code generation from the fully merged TypeSystem information and existing "customizations".

Each feature is stored in one of two arrays, kept per Java Object Feature Structure instance: an "int" array, holding boolean/byte/short/int/long/float/double values, and an "Object" array, holding strings and refs to other FSs. Longs and doubles take 2 int slots.
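
A minimal sketch of this two-array layout follows. It is not the UIMA code: the class FeatureSlotsSketch and its accessors are hypothetical, and the choice to store the high word of a long in the first of its two slots is an assumption made only for illustration.

```java
// Hypothetical sketch (not UIMA code): the two per-instance arrays.
class FeatureSlotsSketch {
    // "int" array: boolean/byte/short/int/float use 1 slot; long/double use 2.
    private final int[] intData;
    // "Object" array: strings and references to other feature structures.
    private final Object[] refData;

    FeatureSlotsSketch(int intSlots, int refSlots) {
        intData = new int[intSlots];
        refData = new Object[refSlots];
    }

    void setInt(int offset, int v) { intData[offset] = v; }
    int getInt(int offset) { return intData[offset]; }

    // A float is stored as its raw bit pattern in one int slot.
    void setFloat(int offset, float v) { intData[offset] = Float.floatToRawIntBits(v); }
    float getFloat(int offset) { return Float.intBitsToFloat(intData[offset]); }

    // A long occupies two adjacent int slots (high word first: an assumption).
    void setLong(int offset, long v) {
        intData[offset] = (int) (v >>> 32);
        intData[offset + 1] = (int) v;
    }
    long getLong(int offset) {
        return ((long) intData[offset] << 32) | (intData[offset + 1] & 0xFFFFFFFFL);
    }

    // A double reuses the long encoding via its raw bit pattern.
    void setDouble(int offset, double v) { setLong(offset, Double.doubleToRawLongBits(v)); }
    double getDouble(int offset) { return Double.longBitsToDouble(getLong(offset)); }

    void setRef(int offset, Object v) { refData[offset] = v; }
    Object getRef(int offset) { return refData[offset]; }

    public static void main(String[] args) {
        // Layout for this example: slot 0 = int, slots 1-2 = long, slots 3-4 = double.
        FeatureSlotsSketch fs = new FeatureSlotsSketch(5, 1);
        fs.setInt(0, 7);
        fs.setLong(1, -2L);
        fs.setDouble(3, 2.5);
        fs.setRef(0, "a string feature");
        System.out.println(fs.getInt(0) + " " + fs.getLong(1) + " " + fs.getDouble(3) + " " + fs.getRef(0));
    }
}
```

The per-feature offsets passed to these accessors play the role of the static offset fields described in the list of extra fields.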

Built-in arrays have their array parts represented by native Java Arrays. Getters and Setters are provided as before. Constructors are provided as before.

Extra fields in the Feature Structure include both instance and class fields:

(static class fields) a set of fields representing the int offset in the "int" and "object" arrays for all the features
(instance field) a reference to the TypeImpl for this class - initialized by a reference to a TypeSystemImpl thread local value, at load time. This is updatable to handle two edge cases.
(instance field) a reference to the CAS View used when this feature structure was created

Extra methods in the FeatureStructure

a set of generic getters and setters, one per incompatible value type.
- All references to non-primitive FeatureStructures values are collapsed into a single TOP ref.
- These are used for generic access, including serialization/deserialization
- more: see package.html for uimaj-tools jcasgen (link only works if all sources checked out)

UIMA Indexes

Indexes are defined for a pipeline, and are kept as part of the general CAS definition.

Each CAS View has its own instantiation of the defined indexes (there's one definition for all views), and as a result, a particular FS may be added-to-indexes and indexed in some views, and not in others.
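
The split between one shared definition and per-view contents can be sketched as follows. This is a simplified model under stated assumptions, not UIMA code: the class PerViewIndexSketch, its methods, and the use of plain sets keyed by index label are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not UIMA code): one index definition, many view instantiations.
class PerViewIndexSketch {
    // The shared definition: just the index labels in this sketch.
    private final Set<String> indexLabels;
    // View name -> (index label -> FSs indexed in that view).
    private final Map<String, Map<String, Set<Object>>> contentsByView = new HashMap<>();

    PerViewIndexSketch(Set<String> indexLabels) {
        this.indexLabels = indexLabels;
    }

    // Each view lazily gets its own empty instantiation of every defined index.
    private Map<String, Set<Object>> viewIndexes(String viewName) {
        Map<String, Set<Object>> indexes = contentsByView.get(viewName);
        if (indexes == null) {
            indexes = new HashMap<>();
            for (String label : indexLabels) {
                indexes.put(label, new HashSet<>());
            }
            contentsByView.put(viewName, indexes);
        }
        return indexes;
    }

    // Adds the FS to every index of the named view only.
    void addToIndexes(String viewName, Object fs) {
        for (Set<Object> index : viewIndexes(viewName).values()) {
            index.add(fs);
        }
    }

    boolean isIndexed(String viewName, String label, Object fs) {
        return viewIndexes(viewName).get(label).contains(fs);
    }

    public static void main(String[] args) {
        Set<String> labels = new HashSet<>();
        labels.add("AnnotationIndex");
        PerViewIndexSketch repo = new PerViewIndexSketch(labels);
        Object fs = new Object();
        repo.addToIndexes("view1", fs);
        // Prints "true false": indexed in view1, not in view2.
        System.out.println(repo.isIndexed("view1", "AnnotationIndex", fs)
            + " " + repo.isIndexed("view2", "AnnotationIndex", fs));
    }
}
```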

There are 3 kinds of indexes: Sorted, Set, and Bag. The basic object type for an index is FsIndex_singletype. This has 3 subtypes:

FsIndex_bag
FsIndex_set_sorted (used for both Set and Sorted indexes)
FsIndex_flat (used for flattened indexes, for instance, with snapshot iterators)

The FsIndex_singletype index is just for one type (and doesn't include entries for any subtypes).

The Set and Sorted implementations are combined; the only difference is in the comparator used. For sets, the comparator is what the index definition specifies. For sorted indexes, the specified comparator is augmented with an extra, least significant key, which is the Feature Structure id.
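
The effect of the extra id key can be sketched with standard Java comparators. This is an illustration only, not the UIMA implementation: the Fs class, its begin key, and the use of TreeSet are assumptions made for the example.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sketch (not UIMA code): set vs. sorted comparators.
class SetVsSortedSketch {

    // Stand-in for a feature structure: an id plus one key feature.
    static final class Fs {
        final int id;
        final int begin;
        Fs(int id, int begin) { this.id = id; this.begin = begin; }
    }

    // Set index: the comparator is exactly what the index definition specifies.
    static final Comparator<Fs> SET_CMP = Comparator.comparingInt((Fs fs) -> fs.begin);

    // Sorted index: the same comparator plus the FS id as least significant key.
    static final Comparator<Fs> SORTED_CMP = SET_CMP.thenComparingInt(fs -> fs.id);

    // Counts how many of the given items survive in an ordered set using cmp.
    static int countKept(Comparator<Fs> cmp, Fs... items) {
        TreeSet<Fs> index = new TreeSet<>(cmp);
        for (Fs fs : items) {
            index.add(fs);
        }
        return index.size();
    }

    public static void main(String[] args) {
        Fs a = new Fs(1, 10);
        Fs b = new Fs(2, 10); // same key value as a, different id
        // Prints "1 2": the set keeps one of the equal-key FSs, the sorted index keeps both.
        System.out.println(countKept(SET_CMP, a, b) + " " + countKept(SORTED_CMP, a, b));
    }
}
```

Because ids are unique, the augmented comparator never reports two distinct FSs as equal, which is what lets a sorted index hold several FSs with identical key values.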

Indexes are connected to specific index definitions; these definitions include a type which is the top type for elements of this index. The index definition logically includes that type and all of its subtypes.

An additional data structure, the IndexIteratorCachePair, is associated with each index definition. It holds references to the FsIndex_singletype implementations for all subtypes of an index; this list is created lazily, only when an iterator is created over this index at a particular type level (which can be the type the index was defined for, or any subtype). This lazy aspect is important, because UIMA is often used in cases where there's a giant type system, with lots of subtypes, only a few of which are used in a particular pipeline instance.
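
The lazy aspect can be sketched as a cache that is filled only on first request. This is a simplified model, not the UIMA data structure: the class LazySubtypeCacheSketch, its fields, and the representation of types as strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not UIMA code): build the per-type subtype list only on demand.
class LazySubtypeCacheSketch {
    // Type name -> direct subtypes (a stand-in for the type system).
    private final Map<String, List<String>> directSubtypes;
    // Type name -> that type plus all of its subtypes, filled on first use.
    private final Map<String, List<String>> cache = new HashMap<>();
    // Counts how many lists were actually built, to make the laziness visible.
    int builds = 0;

    LazySubtypeCacheSketch(Map<String, List<String>> directSubtypes) {
        this.directSubtypes = directSubtypes;
    }

    // Called when an iterator is requested at the given type level.
    List<String> typesToIterate(String type) {
        List<String> cached = cache.get(type);
        if (cached == null) {
            cached = new ArrayList<>();
            collect(type, cached);
            cache.put(type, cached);
            builds++;
        }
        return cached;
    }

    private void collect(String type, List<String> out) {
        out.add(type);
        for (String sub : directSubtypes.getOrDefault(type, Collections.<String>emptyList())) {
            collect(sub, out);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> subtypes = new HashMap<>();
        subtypes.put("Annotation", Arrays.asList("Token", "Sentence"));
        subtypes.put("Token", Arrays.asList("Number"));
        LazySubtypeCacheSketch cache = new LazySubtypeCacheSketch(subtypes);
        cache.typesToIterate("Token");
        cache.typesToIterate("Token");
        // Prints "[Token, Number] builds=1": nothing was built for Annotation or Sentence.
        System.out.println(cache.typesToIterate("Token") + " builds=" + cache.builds);
    }
}
```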

There are two tasks that indexes accomplish:

updating the index with adds and removes of FSs. This update operation is optimized by
- keeping each type indexed separately, so only that data structure for the particular type need be updated (this design choice has a cost in iteration, though)
- treating more common use cases efficiently - the main one being that of adding something "to the end" of the items in the index.
iterating over an index for a type and its subtypes.
- For indexes having no subtypes, this is done by iterating over the FSLeafIndexImpl for that index and type.
- For indexing with subtypes, this is done by creating individual iterators for the type and all of its subtypes, each iterating over the FSLeafIndexImpl for that type. These iterators are then logically combined into one iterator.
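
Logically combining the per-type iterators of a sorted index amounts to a k-way merge. The sketch below shows that idea with a priority queue; it is an assumption-laden illustration rather than the UIMA iterator code, and the class SubtypeMergeSketch and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch (not UIMA code): merge per-type sorted sequences into one.
class SubtypeMergeSketch {

    /**
     * Merges one already-sorted list per type (the type and each subtype)
     * into a single sorted list, always taking the smallest current head.
     */
    static <T> List<T> mergeSorted(List<List<T>> perType, Comparator<? super T> cmp) {
        // Each heap entry is {listIndex, positionInThatList}, ordered by the element it points at.
        Comparator<int[]> byHead = (x, y) ->
            cmp.compare(perType.get(x[0]).get(x[1]), perType.get(y[0]).get(y[1]));
        PriorityQueue<int[]> heap = new PriorityQueue<>(byHead);

        // Only non-empty per-type lists take part in the merge.
        for (int i = 0; i < perType.size(); i++) {
            if (!perType.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }

        List<T> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<T> source = perType.get(top[0]);
            merged.add(source.get(top[1]));
            if (top[1] + 1 < source.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> perType = new ArrayList<>();
        perType.add(Arrays.asList(1, 4, 9));      // entries for the type itself
        perType.add(Arrays.asList(2, 3));         // entries for one subtype
        perType.add(new ArrayList<Integer>());    // a subtype with no entries
        Comparator<Integer> cmp = Comparator.naturalOrder();
        // Prints "[1, 2, 3, 4, 9]".
        System.out.println(mergeSorted(perType, cmp));
    }
}
```

This also shows the iteration cost mentioned above: updates touch only one per-type structure, but every step of a combined iteration has to compare the heads of several of them.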

Iterators

There are two main kinds of iterators:

Iterators over UIMA Indexes
Iterators over other UIMA objects, such as Views, or internal structures.

Iterators over UIMA indexes

There are two main kinds of iterators over UIMA indexes:

those returning Java cover objects representing the FS.
those returning int values representing the location of the FS in the heap. These are the so-called low level iterators; they are less efficient in V3.

The basic iterator over a single type is implemented by FsIterator_singletype. This has subtypes FsIterator_bag and FsIterator_set_sorted.

Related Packages

Package

Description

org.apache.uima.cas

Common Analysis System(CAS) Interfaces

org.apache.uima.cas.admin

org.apache.uima.cas.text

Text Common Annotation System (TCAS) Interfaces.

Class

Description

AllowPreexistingFS

AnnotationBaseImpl

Deprecated.
use AnnotationBase instead

AnnotationImpl

Deprecated.
use Annotation instead

AnnotationImplException

Exception class for package org.apache.uima.cas.impl.

AnnotationTreeImpl<T extends AnnotationFS>

Implementation of annotation tree.

AnnotationTreeNodeImpl<T extends AnnotationFS>

BinaryCasSerDes

Binary (mostly non-compressed) CAS deserialization. The methods in this class were originally part of CASImpl and were moved to this class for v3. Binary non-compressed CAS serialization is in class CASSerializer, but that class uses routines and data structures in this class.

BinaryCasSerDes4

User-callable serialization and deserialization of the CAS in a compressed binary format. This serializes/deserializes the state of the CAS, assuming that the type information remains constant.

BinaryCasSerDes4.Compression

BinaryCasSerDes4.CompressLevel

Compression alternatives

BinaryCasSerDes4.CompressStrat

BinaryCasSerDes6

User-callable serialization and deserialization of the CAS in a compressed binary format. This serializes/deserializes the state of the CAS.

BinaryCasSerDes6.CompressLevel

Compression alternatives

BinaryCasSerDes6.CompressStrat

BinaryCasSerDes6.ReuseInfo

Info reused for 1) multiple serializations of the same CAS to multiple targets (a speedup), 2) delta CAS serialization, where it represents the fsStartIndex info before any modifications were made that could change that info, or 3) deserializing with a delta CAS, where it represents the fsStartIndex info at the time the CAS was serialized out.

BooleanArrayFSImpl

Deprecated.
use BooleanArray instead

BooleanConstraint

Implementation of boolean match constraint.

BuiltinTypeKinds

Constants representing built-in type collections. String sets: creatableArrays; primitiveTypeNames (== non-creatable primitives); creatableBuiltinJcas (e.g. empty/non-empty FloatList); non-creatable primitives (e.g. can't do createFS for primitive int); non-creatable and built-in arrays.

ByteArrayFSImpl

Deprecated.
use ByteArray instead

CasCompare

Used by tests for Binary Compressed de/serialization code.

CASCompleteSerializer

This is a small object which contains a CASMgrSerializer instance (a Java-serializable form of the type system + index definitions) and a CASSerializer instance (a Java-serializable form of the CAS, including lists of which FSs are indexed).

CASImpl

Implements the CAS interfaces.

CASImpl.FsChange

Journaling changes for computing delta cas.

CASMgrSerializer

Container for serialized CAS typing information.

CasSeqAddrMaps

Used by binary serialization forms 4 and 6. Manages the conversion of FSs to relative sequential index numbers, and back. Manages the differences between two type systems, both in the size of the FSs and in the handling of excluded types. During serialization, these maps are constructed before serialization.

CASSerializer

This object has 2 purposes

CasSerializerSupport

CAS serializer support for XMI and JSON formats.

CasSerializerSupport.CasSerializerSupportSerialize

CasState

states the CAS can be in

CasTypeSystemMapper

This class gets initialized with two type systems, and then provides resources to map type and feature codes between them.

CommonArrayFSImpl

Deprecated.

CommonSerDes

Common de/serialization

CommonSerDes.Header

Headers: serialization versioning. There are 1 or 2 words used for versioning.

CommonSerDes.Reading

byte swapping reads of integer forms

CommonSerDesSequential

Common de/serialization for plain binary and compressed binary form 4, which both walk the CAS using the sequential, incrementing id approach. Lifecycle: there is 0 or 1 instance per CAS, representing the FSs at some point in time in that CAS.

ConstraintFactoryImpl

Implementation of the ConstraintFactory interface.

CopyOnWriteIndexPart<T extends FeatureStructure>

common APIs supporting the copy on write aspect of index parts

DebugFSLogicalStructure

DebugFSLogicalStructure.IndexInfo

Class holding information about an FSIndex. Includes the "label" of the index, and a ref to the CAS this index's contents are in.

DebugFSLogicalStructure.ViewInfo

Class holding info about a View/Sofa.

DebugNameValuePair

DoubleArrayFSImpl

Deprecated.
use DoubleArray instead

FeatureImpl

The implementation of features in the type system.

FeatureImpl_jcas_only

The implementation of jcas-only features in the type system.

FeatureStructureImpl

Deprecated.
use TOP instead

FeatureStructureImplC

Feature structure implementation (for non-JCas and JCas). Each FS has:
- int data - used for boolean, byte, short, int, long, float, double data
-- long and double use 2 int slots
- may be null if all slots are in JCas cover objects as fields
- ref data - used for references to other Java objects, such as
-- strings
-- other feature structures
-- arbitrary Java Objects
- may be null if all slots are in JCas cover objects as fields
- an id: an incrementing integer, starting at 1, per CAS, of all FSs created for that CAS
- a ref to the casView where this FS was created
- a ref to the TypeImpl for this class
-- can't be static - may be multiple type systems in use

FeatureStructureImplC.PrintReferences

FeatureValuePathImpl

Contains CAS Type and Feature objects to represent a feature path of the form feature1/...

FloatArrayFSImpl

Deprecated.
use FloatArray instead

FSBooleanConstraintImpl

FSClassRegistry

There is one **class** instance of this per UIMA core class loader.

FSClassRegistry.JCasClassInfo

One instance per JCas class defined for it, per class loader - per class loader, because different JCas class definitions for the same name are possible, per class loader Kept in maps, per class loader.

FSComparator

UNUSED in V3; backwards compatibility only. Delete, or replace with Comparator<FeatureStructure> or the like.

FSGenerator<T extends FeatureStructure>

Deprecated.
unused in v3, only present to avoid compile errors in unused v2 classes

FsGenerator3

A Functional Interface for generating V3 Java Feature Structures

FsGeneratorArray

A Functional Interface for generating Java Feature Structures. NO LONGER USED.

FSImplComparator

UNUSED in V3; backwards compatibility only. Interface to compare two feature structures, represented by their addresses.

FsIndex_annotation<T extends AnnotationFS>

Implementation of annotation indexes.

FsIndex_bag<T extends FeatureStructure>

Used for UIMA FS Bag indexes. Uses ObjHashSet to hold instances of FeatureStructures.

FsIndex_flat<T extends FeatureStructure>

Common part of flattened indexes, used for both snapshot iterators and flattened sorted indexes; built from a passed-in instance of FsIndex_iicp.

FsIndex_set_sorted<T extends FeatureStructure>

Common index impl for set and sorted indexes.

FsIndex_singletype<T extends FeatureStructure>

The common (among all index kinds: set, sorted, bag) info for an index over 1 type (excluding subtypes). The subclasses FsIndex_bag, FsIndex_flat, and FsIndex_set_sorted define the actual index repository for each kind.

FsIndex_snapshot<T extends FeatureStructure>

Implementation of a light-weight wrapper of normal indexes, which supports special kinds of iterators based on the setting of IteratorExtraFunction.

FSIndexComparatorImpl

Specifies the comparison to be used for an index, in terms of the keys and the typeorder (in an order) and the standard/reverse ordering.

FSIndexRepositoryImpl

There is one instance of this class per CAS View.

FsIterator_multiple_indexes<T extends FeatureStructure>

Common code for both
- aggregation of indexes (e.g. select, iterating over multiple views)
- aggregation of indexes in type/subtype hierarchy
Supports creating corresponding iterators just for the non-empty ones.
Supports reinit: evaluating when one or more formerly empty indexes is no longer empty, and recalculating the iterator set.
Supports move-to-leftmost when typeOrdering is to be ignored
-- when no typeorder key
-- when typeorder key, but select framework requests no typeordering for move to leftmost

FsIterator_singletype<T extends FeatureStructure>

FsIterator_subtypes_ordered<T extends FeatureStructure>

Performs an ordered iteration among a set of iterators, each one corresponding to the type or subtype of the uppermost type.

FsIterator_subtypes_snapshot<T extends FeatureStructure>

FSIteratorImplBase<T extends FeatureStructure>

Version 2 compatibility only; not used internally in version 3. Base class for FSIterator implementations.

FSRefIterator

Heap

the v2 CAS heap - used in modeling some binary (de)serialization

Id2FS

A map from ints representing FS ids (or "addresses") to those FSs. There is one map instance per CAS (all views).

IntArrayFSImpl

Deprecated.
use IntegerArray instead

LinearTypeOrderBuilderImpl

Implementation of the LinearTypeOrderBuilder interface.

LinearTypeOrderBuilderImpl.TotalTypeOrder

An implementation of the LinearTypeOrder interface.

LLUnambiguousIteratorImpl<T extends FeatureStructure>

Implements a low level ambiguous or unambiguous iterator over some type T which doesn't need to be a subtype of Annotation. - This iterator skips types which are not Annotation or a subtype of Annotation.

LongArrayFSImpl

Deprecated.
use LongArray instead

LowLevelCAS

Defines the low-level CAS APIs.

LowLevelException

Exception class for package org.apache.uima.cas.impl.

LowLevelIndex<T extends FeatureStructure>

Low-level FS index object.

LowLevelIndexRepository

Low-level index repository access.

LowLevelIterator<T extends FeatureStructure>

Low-level FS iterator.

LowLevelIterator_empty<T extends FeatureStructure>

An empty Low-level FS iterator

LowLevelTypeSystem

Low-level version of the type system APIs.

MarkerImpl

A MarkerImpl holds a high-water "mark" in the CAS, for all views.

MethodHandlesLookup

OutOfTypeSystemData

This class is used by the XCASDeserializer to store feature structures that do not fit into the type system of the CAS it is deserializing into.

SelectFSs_impl<T extends FeatureStructure>

Collection of builder-style methods to specify selection of FSs from indexes. Shift handled in this routine. Comment codes: AI = implies AnnotationIndex. Iterator varieties and impl bounded?

Serialization

This class has no fields or instance methods, but instead has only static methods.

ShortArrayFSImpl

Deprecated.
use ShortArray instead

SlotKinds

NOTE: adding or altering slots breaks backward compatibility and the ability to deserialize previously serialized things. This definition is shared with BinaryCasSerDes4. Defines all the slot kinds.

SlotKinds.SlotKind

SlotKindsConstants

Users "implement" this interface to get access to these constants in their code

SofaFSImpl

Deprecated.
use Sofa instead

StringArrayFSImpl

Deprecated.
use StringArray instead

StringHeapDeserializationHelper

Support for legacy string heap format.

StringMap

Appears to be unused, 1-2015 schor

Subiterator<T extends AnnotationFS>

Subiterator implementation.

Subiterator.BoundsUse

TypeImpl

The implementation of types in the type system.

TypeImpl_annot

A version of TypeImpl for Annotations and subtypes of Annotations

TypeImpl_annotBase

A version of TypeImpl for the AnnotationBase type and its subtypes

TypeImpl_array

TypeImpl_list

TypeImpl_primitive

TypeImpl_string

String or String Subtype

TypeImpl_stringSubtype

TypeNameSpaceImpl

TypeSystem2Xml

Dumps a Type System object to XML.

TypeSystemConstants

This interface defines static final constants for Type Systems. For the built-in types and features: the type and feature codes, and the adjOffsets.

TypeSystemImpl

Type system implementation.

TypeSystemUtils

Type Utilities: all static, so the class is abstract to prevent creation. Used by Feature Path.

TypeSystemUtils.PathValid

XCASDeserializer

XCAS Deserializer.

XCASParsingException

Exception class for package org.apache.uima.cas.impl.

XCASSerializer

XCAS serializer.

XmiCasDeserializer

XMI CAS deserializer.

XmiCasSerializer

CAS serializer for XMI format; writes a CAS in the XML Metadata Interchange (XMI) format.

XmiSerializationSharedData

A container for data that is shared between the XmiCasSerializer and the XmiCasDeserializer.

XmiSerializationSharedData.NameMultiValue

XmiSerializationSharedData.OotsElementData

Data structure holding all information about an XMI element containing an out-of-typesystem FS.

XmiSerializationSharedData.XmiArrayElement

Data structure holding the index and the xmi:id of an array or list element that is a reference to an out-of-typesystem FS.

XMLTypeSystemConsts

Class comment for XMLTypeSystemConsts.java goes here.
