Package org.apache.uima.cas.impl
These are Internal APIs. Use these APIs at your own risk. APIs in this package are subject to change without notice, even in minor releases. Use of this package is not supported. If you think you have found a bug in this package, please try to reproduce it with the officially supported APIs before reporting it.
Internals documentation
NOTE: This documentation is plain HTML, generated from a WYSIWYG editor, "tinymce". To work on it, set up a small web page with tinymce (running from a local file), then use Tools - source code to cut/paste between this file's source and that editor.
Java Cover Objects for version 3
The Java Cover Objects are no longer cover objects; instead, these objects are the Feature Structures. The Java classes for these objects are in a hierarchy that corresponds to the UIMA type hierarchy. JCasGen continues to generate particular Java classes for particular UIMA types (for user-defined types, not for built-in types). As before, JCasGen'd classes are optional. If there is no JCasGen'd class for "MyType" (assume a subtype of "Annotation"), then the most specific supertype of "MyType" that has a corresponding Java cover class is used. (This is how it works in V2 also.)
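The fallback to the most specific supertype can be pictured as a walk up the type hierarchy. The following is a minimal sketch only, not the UIMA implementation: the class `CoverClassLookup`, its methods, and the string-based type model are all hypothetical names invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not the UIMA API: each type name maps to its parent type name,
// and only some types have their own generated (JCasGen'd) Java class.
class CoverClassLookup {
    private final Map<String, String> parentOf = new HashMap<>();
    private final Map<String, String> javaClassOf = new HashMap<>();

    // javaClassOrNull is null when no Java class was generated for this type.
    void addType(String type, String parent, String javaClassOrNull) {
        parentOf.put(type, parent);
        if (javaClassOrNull != null) {
            javaClassOf.put(type, javaClassOrNull);
        }
    }

    // Walk from the type towards the root until a type with its own Java class is found.
    String javaClassFor(String type) {
        for (String t = type; t != null; t = parentOf.get(t)) {
            String cls = javaClassOf.get(t);
            if (cls != null) {
                return cls;
            }
        }
        throw new IllegalStateException("no Java class found for " + type);
    }

    public static void main(String[] args) {
        CoverClassLookup lookup = new CoverClassLookup();
        lookup.addType("uima.cas.TOP", null, "TOP");
        lookup.addType("uima.tcas.Annotation", "uima.cas.TOP", "Annotation");
        lookup.addType("MyType", "uima.tcas.Annotation", null); // no JCasGen'd class
        System.out.println(lookup.javaClassFor("MyType"));
    }
}
```

In this sketch "MyType" has no class of its own, so the lookup resolves to the class registered for its nearest ancestor.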
There is one definition of these objects per UIMA Type System. PEARs having different "customizations" of the same JCas classname are not supported in v3.
- This loss of capability is mitigated by the addition of more kinds of Java types as built-in values.
- The reason for this not being supported is that there's no solution figured out for sharing types between the outer and PEAR pipelines, without encountering class-cast exceptions.
- The PEAR can still define customizations for types only it defines (that is, not used by the outer pipeline).
Much of the infrastructure is kept as-is in version 3 to support backwards compatibility.
Format of a JCas class version 3
The _Type is not used. May revisit this if users are using the low-level access made possible by _Type.
There is one definition of the class per type system. Type systems are often shared among multiple CASes. Each definition is loaded under a specific loader for that type system.
(Not implemented) The loader is set up to delegate to the parent for all classes except the JCas types, and for those, it generates them using ASM byte code generation from the fully merged TypeSystem information and existing "customizations".
Each feature is stored in one of two arrays, kept per Java Object Feature Structure Instance: an "int" array, holding boolean/byte/short/int/long/float/double values, and an "Object" array holding strings/refs-to-other-FSs. Longs and Doubles take 2 int slots.
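As an illustration of the two-array layout, here is a minimal sketch of how values might be packed into int slots. It is not the UIMA source: the class `SlotStorage` and its method names are hypothetical, and the choice to store the high word of a long in the first of its two slots is an assumption made for the example.

```java
// Hypothetical sketch of per-instance feature storage; not the UIMA implementation.
class SlotStorage {
    private final int[] intData;    // boolean/byte/short/int/long/float/double values
    private final Object[] refData; // strings and references to other feature structures

    SlotStorage(int intSlots, int refSlots) {
        this.intData = new int[intSlots];
        this.refData = new Object[refSlots];
    }

    void setInt(int offset, int v) { intData[offset] = v; }
    int getInt(int offset) { return intData[offset]; }

    // A float fits in one int slot via its raw bit pattern.
    void setFloat(int offset, float v) { intData[offset] = Float.floatToRawIntBits(v); }
    float getFloat(int offset) { return Float.intBitsToFloat(intData[offset]); }

    // A long takes two adjacent int slots (high word first: an assumption of this sketch).
    void setLong(int offset, long v) {
        intData[offset] = (int) (v >>> 32);
        intData[offset + 1] = (int) v;
    }
    long getLong(int offset) {
        return ((long) intData[offset] << 32) | (intData[offset + 1] & 0xFFFFFFFFL);
    }

    // A double reuses the long encoding via its raw bit pattern.
    void setDouble(int offset, double v) { setLong(offset, Double.doubleToRawLongBits(v)); }
    double getDouble(int offset) { return Double.longBitsToDouble(getLong(offset)); }

    void setRef(int offset, Object v) { refData[offset] = v; }
    Object getRef(int offset) { return refData[offset]; }

    public static void main(String[] args) {
        SlotStorage fs = new SlotStorage(4, 1);
        fs.setInt(0, 42);
        fs.setLong(1, 1234567890123L); // occupies slots 1 and 2
        fs.setRef(0, "some string feature");
        System.out.println(fs.getInt(0) + " " + fs.getLong(1) + " " + fs.getRef(0));
    }
}
```

The static offset fields mentioned in the text would play the role of the `offset` arguments here.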
Built-in arrays have their array parts represented by native Java Arrays. Getters and Setters are provided as before. Constructors are provided as before.
Extra fields in the Feature Structure include both instance and class fields:
- (static class fields) a set of fields representing the int offset in the "int" and "object" arrays for all the features
- (instance field) a reference to the TypeImpl for this class - initialized by a reference to a TypeSystemImpl thread local value, at load time. This is updatable to handle two edge cases.
- (instance field) a reference to the CAS View used when this feature structure was created
Extra methods in the FeatureStructure
- a set of generic getters and setters, one per incompatible value type.
- All references to non-primitive FeatureStructures values are collapsed into a single TOP ref.
- These are used for generic access, including serialization/deserialization
- more: see package.html for uimaj-tools jcasgen (link only works if all sources checked out)
UIMA Indexes
Indexes are defined for a pipeline, and are kept as part of the general CAS definition.
Each CAS View has its own instantiation of the defined indexes (there's one definition for all views), and as a result, a particular FS may be added-to-indexes and indexed in some views, and not in others.
There are 3 kinds of indexes: Sorted, Set, and Bag. The basic object type for an index is FsIndex_singletype. This has 3 subtypes, one for each of the index types:
- FsIndex_bag
- FsIndex_set_sorted (used for both Set and Sorted indexes)
- FsIndex_flat (used for flattened indexes, for instance, with snapshot iterators)
The FsIndex_singletype index is just for one type (and doesn't include entries for any subtypes).
The Set and Sorted implementations are combined; the only difference is in the comparator used. For sets, the comparator is what the index definition specifies. For sorted, the specified comparator is augmented with a least-significant extra key, which is the Feature Structure id.
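The effect of the extra id key can be seen with plain Java collections. This is a sketch under stated assumptions, not the FsIndex_set_sorted code: the `Fs` class, its `begin` key, and the use of `TreeSet` are stand-ins chosen for illustration.

```java
import java.util.Comparator;
import java.util.TreeSet;

class SetVsSortedDemo {
    // Hypothetical stand-in for a feature structure: a unique id plus one key feature.
    static final class Fs {
        final int id;
        final int begin;
        Fs(int id, int begin) {
            this.id = id;
            this.begin = begin;
        }
    }

    // "Set" index: only the keys named in the index definition are compared.
    static final Comparator<Fs> SET_COMPARATOR = Comparator.comparingInt(fs -> fs.begin);

    // "Sorted" index: same keys, then the id as the least significant key,
    // so two distinct feature structures never compare as equal.
    static final Comparator<Fs> SORTED_COMPARATOR = SET_COMPARATOR.thenComparingInt(fs -> fs.id);

    // Adds all items to an ordered collection and reports how many were kept.
    static int countKept(Comparator<Fs> order, Fs... items) {
        TreeSet<Fs> index = new TreeSet<>(order);
        for (Fs fs : items) {
            index.add(fs);
        }
        return index.size();
    }

    public static void main(String[] args) {
        Fs a = new Fs(1, 10);
        Fs b = new Fs(2, 10); // same key value as a, different id
        System.out.println("set keeps " + countKept(SET_COMPARATOR, a, b));
        System.out.println("sorted keeps " + countKept(SORTED_COMPARATOR, a, b));
    }
}
```

With the set comparator the second item is treated as a duplicate of the first; with the augmented comparator both are retained in key order.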
Indexes are connected to specific index definitions; these definitions include a type which is the top type for elements of this index. The index definition logically includes that type and all of its subtypes.
An additional data structure, the IndexIteratorCachePair, is associated with each index definition. It holds references to the FsIndex_singletype implementations for all subtypes of an index; this list is created lazily, only when an iterator is created over this index at a particular type level (which can be the type the index was defined for, or any subtype). This lazy aspect is important, because UIMA is often used in cases where there's a giant type system, with lots of subtypes, only a few of which are used in a particular pipeline instance.
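The lazy construction described above might look roughly like the following. This is an illustrative sketch, not the IndexIteratorCachePair source: the class `LazySubtypeIndexes`, the string type names, and the `buildCount` counter are hypothetical and exist only to show that the subtype list is built on first use and then cached.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the list of per-type indexes for a type and all of its
// subtypes is built only when an iterator first asks for it, then reused.
class LazySubtypeIndexes {
    private final String topType;
    private final Map<String, List<String>> directSubtypes;
    private List<String> cachedTypes; // stays null until first requested
    int buildCount = 0;               // counts how often the list was actually built

    LazySubtypeIndexes(String topType, Map<String, List<String>> directSubtypes) {
        this.topType = topType;
        this.directSubtypes = directSubtypes;
    }

    // Called when an iterator is created; builds the list on first use only.
    List<String> typesToIterate() {
        if (cachedTypes == null) {
            List<String> types = new ArrayList<>();
            collect(topType, types);
            cachedTypes = types;
            buildCount++;
        }
        return cachedTypes;
    }

    // Depth-first walk: the type itself, then each subtype recursively.
    private void collect(String type, List<String> out) {
        out.add(type);
        for (String sub : directSubtypes.getOrDefault(type, Collections.<String>emptyList())) {
            collect(sub, out);
        }
    }
}
```

A type system with many unused subtypes pays nothing for them in this sketch until an iterator over the corresponding index is requested.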
There are two tasks that indexes accomplish:
- updating the index with adds and removes of FSs. This update operation is optimized by
- keeping each type indexed separately, so only that data structure for the particular type need be updated (this design choice has a cost in iteration, though)
- treating more common use cases efficiently - the main one being that of adding something "to the end" of the items in the index.
- iterating over an index for a type and its subtypes.
- For indexes having no subtypes, this is done by iterating over the FSLeafIndexImpl for that index and type.
- For indexing with subtypes, this is done by creating individual iterators for the type and all of its subtypes, each iterating over the FSLeafIndexImpl for that type. These iterators are then logically combined into one iterator.
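Logically combining the per-type iterators into one ordered iterator amounts to a k-way merge. The sketch below shows that general technique with a priority queue; it is not the FsIterator_subtypes_ordered code, and the class `MergedIterator` and its constructor parameters are hypothetical names. It assumes each per-type input is already sorted by the same comparator.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical sketch: merges several individually sorted per-type sequences
// into one sorted sequence, always emitting the smallest current head.
class MergedIterator<T> implements Iterator<T> {
    // The current element of one underlying iterator, plus the rest of that iterator.
    private static final class Head<T> {
        T value;
        final Iterator<T> rest;
        Head(T value, Iterator<T> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    private final PriorityQueue<Head<T>> heads;

    MergedIterator(List<? extends Iterable<T>> perTypeIndexes, Comparator<? super T> order) {
        heads = new PriorityQueue<Head<T>>((a, b) -> order.compare(a.value, b.value));
        for (Iterable<T> index : perTypeIndexes) {
            Iterator<T> it = index.iterator();
            if (it.hasNext()) {          // empty per-type indexes are simply skipped
                heads.add(new Head<>(it.next(), it));
            }
        }
    }

    @Override
    public boolean hasNext() {
        return !heads.isEmpty();
    }

    @Override
    public T next() {
        Head<T> smallest = heads.poll();
        if (smallest == null) {
            throw new NoSuchElementException();
        }
        T result = smallest.value;
        if (smallest.rest.hasNext()) {   // advance that iterator and re-queue it
            smallest.value = smallest.rest.next();
            heads.add(smallest);
        }
        return result;
    }
}
```

This also illustrates the iteration cost mentioned earlier: keeping each type in its own structure makes updates cheap, but every step of a combined iteration has to choose among the heads of all per-type iterators.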
Iterators
There are two main kinds of iterators:
- Iterators over UIMA Indexes
- Iterators over other UIMA objects, such as Views, or internal structures.
Iterators over UIMA indexes
There are two main kinds of iterators over UIMA indexes:
- those returning Java cover objects representing the FS.
- those returning int values representing the location of the FS in the heap. These are the so-called low level iterators; they are less efficient in V3.
The basic iterator over a single type is implemented by FsIterator_singletype. This has subtypes FsIterator_bag and FsIterator_set_sorted.
Class: Description
- Deprecated. Use AnnotationBase instead.
- Deprecated. Use Annotation instead.
- Exception class for package org.apache.uima.cas.impl.
- AnnotationTreeImpl<T extends AnnotationFS>: Implementation of annotation tree.
- AnnotationTreeNodeImpl<T extends AnnotationFS>
- Binary (mostly non compressed) CAS deserialization. The methods in this class were originally part of the CASImpl, and were moved here to this class for v3. Binary non compressed CAS serialization is in class CASSerializer, but that class uses routines and data structures in this class.
- User callable serialization and deserialization of the CAS in a compressed Binary Format. This serializes/deserializes the state of the CAS, assuming that the type information remains constant.
- Compression alternatives
- User callable serialization and deserialization of the CAS in a compressed Binary Format. This serializes/deserializes the state of the CAS.
- Compression alternatives
- Info reused for 1) multiple serializations of same cas to multiple targets (a speedup), or 2) for delta cas serialization, where it represents the fsStartIndex info before any mods were done which could change that info, or 3) for deserializing with a delta cas, where it represents the fsStartIndex info at the time the CAS was serialized out.
- Deprecated. Use BooleanArray instead.
- Implementation of boolean match constraint.
- Constants representing Built in type collections String Sets: creatableArrays primitiveTypeNames == noncreatable primitives creatableBuiltinJcas (e.g.
- Deprecated. Use ByteArray instead.
- Used by tests for Binary Compressed de/serialization code.
- This is a small object which contains - CASMgrSerializer instance - a Java serializable form of the type system + index definitions - CASSerializer instance - a Java serializable form of the CAS including lists of which FSs are indexed
- Implements the CAS interfaces.
- Journaling changes for computing delta cas.
- Container for serialized CAS typing information.
- Used by Binary serialization form 4 and 6. Manage the conversion of FSs to relative sequential index number, and back. Manage the difference in two type systems, both size of the FSs and handling excluded types. During serialization, these maps are constructed before serialization.
- This object has 2 purposes.
- CAS serializer support for XMI and JSON formats.
- states the CAS can be in
- This class gets initialized with two type systems, and then provides resources to map type and feature codes between them.
- Deprecated.
- Common de/serialization
- HEADERS Serialization versioning. There are 1 or 2 words used for versioning.
- byte swapping reads of integer forms
- Common de/serialization for plain binary and compressed binary form 4 which both used to walk the cas using the sequential, incrementing id approach. Lifecycle: There is 0/1 instance per CAS, representing the FSs at some point in time in that CAS.
- Implementation of the ConstraintFactory interface.
- CopyOnWriteIndexPart<T extends FeatureStructure>: common APIs supporting the copy on write aspect of index parts
- Class holding information about an FSIndex. Includes the "label" of the index, and a ref to the CAS this index contents are in.
- Class holding info about a View/Sofa.
- Deprecated. Use DoubleArray instead.
- The implementation of features in the type system.
- The implementation of jcas-only features in the type system.
- Deprecated. Use TOP instead.
- Feature structure implementation (for non JCas and JCas). Each FS has - int data - used for boolean, byte, short, int, long, float, double data -- long and double use 2 int slots - may be null if all slots are in JCas cover objects as fields - ref data - used for references to other Java objects, such as -- strings -- other feature structures -- arbitrary Java Objects - may be null if all slots are in JCas cover objects as fields - an id: an incrementing integer, starting at 1, per CAS, of all FSs created for that CAS - a ref to the casView where this FS was created - a ref to the TypeImpl for this class -- can't be static - may be multiple type systems in use
- Contains CAS Type and Feature objects to represent a feature path of the form feature1/.../featureN.
- Deprecated. Use FloatArray instead.
- See interface for documentation.
- There is one **class** instance of this per UIMA core class loader.
- One instance per JCas class defined for it, per class loader - per class loader, because different JCas class definitions for the same name are possible, per class loader. Kept in maps, per class loader.
- UNUSED V3 backwards compat only. Delete. REplace with Comparator<FeatureStructure> or the like.
- FSGenerator<T extends FeatureStructure>: Deprecated. Unused in v3, only present to avoid compile errors in unused v2 classes.
- A Functional Interface for generating V3 Java Feature Structures
- A Functional Interface for generating Java Feature Structures. NO LONGER USED
- UNUSED V3, backwards compat only. Interface to compare two feature structures, represented by their addresses.
- FsIndex_annotation<T extends AnnotationFS>: Implementation of annotation indexes.
- FsIndex_bag<T extends FeatureStructure>: Used for UIMA FS Bag Indexes. Uses ObjHashSet to hold instances of FeatureStructures
- FsIndex_flat<T extends FeatureStructure>: Common part of flattened indexes, used for both snapshot iterators and flattened sorted indexes built from passed in instance of FsIndex_iicp
- FsIndex_set_sorted<T extends FeatureStructure>: Common index impl for set and sorted indexes.
- FsIndex_singletype<T extends FeatureStructure>: The common (among all index kinds - set, sorted, bag) info for an index over 1 type (excluding subtypes). SubClasses FsIndex_bag, FsIndex_flat, FsIndex_set_sorted, define the actual index repository for each kind.
- FsIndex_snapshot<T extends FeatureStructure>: Implementation of light-weight wrapper of normal indexes, which support special kinds of iterators base on the setting of IteratorExtraFunction
- Specifies the comparison to be used for an index, in terms of - the keys and the typeorder, in an order - the standard/reverse ordering
- There is one instance of this class per CAS View.
- FsIterator_multiple_indexes<T extends FeatureStructure>: Common code for both aggregation of indexes (e.g.
- FsIterator_singletype<T extends FeatureStructure>
- FsIterator_subtypes_ordered<T extends FeatureStructure>: Performs an ordered iteration among a set of iterators, each one corresponding to the type or subtype of the uppermost type.
- FsIterator_subtypes_snapshot<T extends FeatureStructure>
- FSIteratorImplBase<T extends FeatureStructure>: Version 2 compatibility only, not used internally in version 3. Base class for FSIterator implementations.
- the v2 CAS heap - used in modeling some binary (de)serialization
- A map from ints representing FS id's (or "addresses") to those FSs. There is one map instance per CAS (all views).
- Deprecated. Use IntegerArray instead.
- Implementation of the LinearTypeOrderBuilder interface.
- An implementation of the LinearTypeOrder interface.
- LLUnambiguousIteratorImpl<T extends FeatureStructure>: Implements a low level ambiguous or unambiguous iterator over some type T which doesn't need to be a subtype of Annotation.
- Deprecated. Use LongArray instead.
- Defines the low-level CAS APIs.
- Exception class for package org.apache.uima.cas.impl.
- LowLevelIndex<T extends FeatureStructure>: Low-level FS index object.
- Low-level index repository access.
- LowLevelIterator<T extends FeatureStructure>: Low-level FS iterator.
- LowLevelIterator_empty<T extends FeatureStructure>: An empty Low-level FS iterator
- Low-level version of the type system APIs.
- A MarkerImpl holds a high-water "mark" in the CAS, for all views.
- This class is used by the XCASDeserializer to store feature structures that do not fit into the type system of the CAS it is deserializing into.
- SelectFSs_impl<T extends FeatureStructure>: Collection of builder style methods to specify selection of FSs from indexes shift handled in this routine Comment codes: AI = implies AnnotationIndex Iterator varieties and impl bounded? type order not unambig? strict? skipEq Priority? Needed? no coveredBy covering sameas for not-bounded, - ignore strict and skipEq -- except: preceding implies skipping annotations whose end > positioning begin - order-not-needed only applies if iicp size > 1 - unambig ==> use Subiterator -- subiterator wraps: according to typePriority and order-not-needed - no Type Priority - need to pass in as arg to fsIterator_multiple_indexes == if no type priority, need to prevent rattling off the == type while compare is equal == affects both FsIterator_aggregation_common and FsIterator_subtypes_ordered for 3 other boundings: - use subiterator, pass in strict and skipeq finish this javadoc comment edit T extends FeatureStructure, not TOP, because of ref from FSIndex which uses FeatureStructure for backwards compatibility
- This class has no fields or instance methods, but instead has only static methods.
- Deprecated. Use ShortArray instead.
- NOTE: adding or altering slots breaks backward compatability and the ability do deserialize previously serialized things. This definition shared with BinaryCasSerDes4. Define all the slot kinds.
- Users "implement" this interface to get access to these constants in their code
- Deprecated. Use Sofa instead.
- Deprecated. Use StringArray instead.
- Support for legacy string heap format.
- Appears to be unused, 1-2015 schor
- Subiterator<T extends AnnotationFS>: Subiterator implementation.
- The implementation of types in the type system.
- A version of TypeImpl for Annotations and subtypes of Annotations
- A version of TypeImpl for the AnnotationBase type and its subtypes
- String or String Subtype
- Dumps a Type System object to XML.
- This interface defines static final constants for Type Systems. For the built-in types and features: - the type and feature codes - the adjOffsets
- Type system implementation.
- Type Utilities - all static, so class is abstract to prevent creation. Used by Feature Path
- XCAS Deserializer.
- Exception class for package org.apache.uima.cas.impl.
- XCAS serializer.
- XMI CAS deserializer.
- CAS serializer for XMI format; writes a CAS in the XML Metadata Interchange (XMI) format.
- A container for data that is shared between the XmiCasSerializer and the XmiCasDeserializer.
- Data structure holding all information about an XMI element containing an out-of-typesystem FS.
- Data structure holding the index and the xmi:id of an array or list element that is a reference to an out-of-typesystem FS.
- Class comment for XMLTypeSystemConsts.java goes here.