Package org.apache.uima.cas.impl
Class BinaryCasSerDes6
java.lang.Object
org.apache.uima.cas.impl.BinaryCasSerDes6
All Implemented Interfaces:
SlotKindsConstants
User-callable serialization and deserialization of the CAS in a compressed binary format.
This serializes/deserializes the state of the CAS. It has the capability to map type systems,
so the sending and receiving type systems do not have to be the same.
- types and features are matched by name, and features must have the same range (slot kind)
- types and/or features in one type system not in the other are skipped over
A header tells the reader the format and the compression level.
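The name-matching rule above can be sketched with a toy model in which a type system is just a map from type name to (feature name -> slot kind). The class `TsMatchSketch`, its `SlotKind` enum, and `commonFeatures` are hypothetical names used only for illustration. UIMA's real mapper works on `TypeSystemImpl` objects and reports incompatibility as a `ResourceInitializationException`; this sketch throws `IllegalArgumentException` instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TsMatchSketch {
    private TsMatchSketch() {}

    /** Hypothetical, simplified slot kinds (UIMA's own are the *_i constants in SlotKindsConstants). */
    enum SlotKind { INT, FLOAT, STRING, REF }

    /**
     * For each type present in both type systems, returns the features present in both.
     * Types or features found on only one side are skipped. A same-named feature whose
     * slot kind (range) differs between the two systems is treated as incompatible.
     *
     * @param srcTs type name -> (feature name -> slot kind) for the source
     * @param tgtTs the same shape, for the target
     */
    static Map<String, Map<String, SlotKind>> commonFeatures(
            Map<String, Map<String, SlotKind>> srcTs,
            Map<String, Map<String, SlotKind>> tgtTs) {
        Map<String, Map<String, SlotKind>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, SlotKind>> type : srcTs.entrySet()) {
            Map<String, SlotKind> tgtFeats = tgtTs.get(type.getKey());
            if (tgtFeats == null) {
                continue; // type not in the target: skipped
            }
            Map<String, SlotKind> shared = new LinkedHashMap<>();
            for (Map.Entry<String, SlotKind> feat : type.getValue().entrySet()) {
                SlotKind tgtKind = tgtFeats.get(feat.getKey());
                if (tgtKind == null) {
                    continue; // feature not in the target: skipped
                }
                if (tgtKind != feat.getValue()) {
                    throw new IllegalArgumentException("Feature " + type.getKey() + ":" + feat.getKey()
                            + " has slot kind " + feat.getValue() + " in source but " + tgtKind + " in target");
                }
                shared.put(feat.getKey(), tgtKind);
            }
            result.put(type.getKey(), shared);
        }
        return result;
    }
}
```

Only types present on both sides appear in the result, and within them only the shared features survive.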
How to Serialize:
1) create an instance of this class
a) if doing a delta serialization, pass in the mark and a ReuseInfo object that was created
after deserializing this CAS initially.
b) if serializing to a target with a different type system, pass the target's TypeSystemImpl object
so the serialization can filter the types for the target.
2) call serialize() to serialize the CAS
3) If serializing to a target from which you expect to receive back a delta CAS,
get the ReuseInfo object from this object (getReuseInfo()) and reuse it for deserializing the delta CAS.
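A minimal sketch of what a mark means for delta serialization, assuming (for this toy only) that feature structures get integer ids in creation order, with 0 reserved for null. `DeltaSketch`, `Mark`, and `deltaIds` are hypothetical names; the ReuseInfo bookkeeping described above is not modeled.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class DeltaSketch {
    private DeltaSketch() {}

    /** Hypothetical stand-in for a CAS mark: FS ids >= nextId were created after the mark. */
    static final class Mark {
        final int nextId;
        Mark(int nextId) { this.nextId = nextId; }
        boolean isNew(int fsId) { return fsId >= nextId; }
    }

    /**
     * Selects the FS ids a delta would carry: every FS created after the mark, plus any
     * preexisting FS (below the mark) that was modified after the mark was set.
     * Ids run from 1 to highestId; 0 is reserved for null.
     */
    static List<Integer> deltaIds(Mark mark, int highestId, BitSet modifiedBelowMark) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= highestId; id++) {
            if (mark.isNew(id) || modifiedBelowMark.get(id)) {
                ids.add(id);
            }
        }
        return ids;
    }
}
```

With a mark at id 5, seven FSs, and FS 2 modified, the delta would carry ids 2, 5, 6, and 7.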
TypeSystemImpl objects are lazily augmented by customized TypeInfo instances for each type encountered in
serializing or deserializing. These are preserved for future calls, so their setup / initialization is only
needed the first time.
TypeSystemImpl objects are also lazily augmented by type mappers for each distinct target type system;
these too are preserved and reused on future calls.
Compressed binary CASes are designed to be "self-describing":
the format of the compressed binary CAS, including version info,
is written at the beginning so that the proper deserialization method can be chosen automatically.
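To illustrate the self-describing idea, here is a sketch that writes a small header before any payload and checks it on read. The magic bytes `DEMO`, the field order, and the class `HeaderSketch` are all hypothetical; they are not UIMA's actual header layout, which this page does not specify.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class HeaderSketch {
    private HeaderSketch() {}

    // Hypothetical magic bytes, chosen only for this illustration.
    static final byte[] MAGIC = {'D', 'E', 'M', 'O'};

    static final class Header {
        final int version;
        final int compressLevel;
        Header(int version, int compressLevel) {
            this.version = version;
            this.compressLevel = compressLevel;
        }
    }

    /** Writes magic, version, and compression level ahead of any payload. */
    static byte[] write(Header h) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.write(MAGIC);
            out.writeInt(h.version);
            out.writeByte(h.compressLevel);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the header first, so a caller could pick a matching deserializer. */
    static Header read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] magic = new byte[MAGIC.length];
            in.readFully(magic);
            if (!Arrays.equals(magic, MAGIC)) {
                throw new IllegalArgumentException("not a recognized compressed binary CAS");
            }
            return new Header(in.readInt(), in.readUnsignedByte());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```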
Compressed Binary format implemented by this class supports type system mapping.
Types in the source which are not in the target
(or vice versa) are omitted.
Types with "extra" features have their extra features omitted
(or on deserialization, they are set to their default value - null, or 0, etc.).
Feature slots which hold references to types not in the target type system are replaced with 0 (null).
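A sketch of these filtering rules for one feature structure serialized to a smaller target type system, assuming feature values are held in a map and reference features hold integer FS ids (0 meaning null). `SlotFilterSketch.filter` and its parameters are hypothetical. Only the serialization direction is shown; on deserialization, features absent from the incoming data would instead keep their default values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class SlotFilterSketch {
    private SlotFilterSketch() {}

    /**
     * Filters one feature structure's slots for a target type system.
     *
     * @param slots       feature name -> value (Integer FS ids for reference features)
     * @param tgtFeatures the features the target type system defines for this type
     * @param refFeatures which features are references to other FSs
     * @param fsTypeOf    FS id -> type name of the referenced FS
     * @param tgtTypes    type names present in the target type system
     */
    static Map<String, Object> filter(Map<String, Object> slots, Set<String> tgtFeatures,
                                      Set<String> refFeatures, Map<Integer, String> fsTypeOf,
                                      Set<String> tgtTypes) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> slot : slots.entrySet()) {
            String feat = slot.getKey();
            if (!tgtFeatures.contains(feat)) {
                continue; // "extra" feature not in the target: omitted
            }
            Object value = slot.getValue();
            if (refFeatures.contains(feat)) {
                int id = (Integer) value;
                String refType = fsTypeOf.get(id);
                if (id != 0 && (refType == null || !tgtTypes.contains(refType))) {
                    value = 0; // reference to a type the target lacks: replaced with 0 (null)
                }
            }
            out.put(feat, value);
        }
        return out;
    }
}
```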
How to Deserialize:
1) Get an appropriate CAS to deserialize into. For a delta CAS, it does not have to be empty, but it must
be the originating CAS from which the delta was produced.
2) If the target type system is the same as the CAS's and the serialized form is not a delta,
call aCAS.reinit(source). Otherwise, create an instance of this class -> xxx
a) Assuming the object being deserialized has a different type system,
set the "target" type system to the TypeSystemImpl instance of the
object being deserialized.
b) If delta deserializing, pass in the ReuseInfo object created when the CAS was serialized.
3) Call xxx.deserialize(inputStream).
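The decision in step 2 can be written as a small routing function. `DeserializeChoiceSketch` and its `Route` enum are hypothetical. The ReuseInfo check reflects step 2b. The comments name the UIMA calls described above, but the sketch itself does not depend on UIMA.

```java
final class DeserializeChoiceSketch {
    private DeserializeChoiceSketch() {}

    enum Route {
        CAS_REINIT,         // aCAS.reinit(source)
        BINARY_CAS_SERDES6  // new BinaryCasSerDes6(cas, mark, tgtTs, reuseInfo).deserialize(inputStream)
    }

    /**
     * Use reinit only when the serialized data's type system is the CAS's own and the data
     * is not a delta; otherwise go through a BinaryCasSerDes6 instance.
     */
    static Route choose(boolean sameTypeSystem, boolean isDelta, boolean haveReuseInfo) {
        if (isDelta && !haveReuseInfo) {
            throw new IllegalStateException(
                    "delta deserialization needs the ReuseInfo saved when the CAS was serialized");
        }
        return (sameTypeSystem && !isDelta) ? Route.CAS_REINIT : Route.BINARY_CAS_SERDES6;
    }
}
```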
Compression/Decompression
Works in two stages:
- application of Zip/Unzip to particular sub-collections of CAS data,
grouped according to similar data distribution
- collection of like kinds of data (to make the zipping more effective)
There can be up to ~20 of these collections, such as
control info, float exponents, and string chars.
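The following sketch shows why collecting like kinds of data helps: float values are split into an exponent stream and a mantissa+sign stream, and each stream is deflated separately with `java.util.zip.DeflaterOutputStream`. The bit split follows IEEE 754 single precision. The class name and this exact encoding are assumptions for illustration; UIMA's own stream encodings may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;

final class SplitZipSketch {
    private SplitZipSketch() {}

    /** The 8-bit exponent field of an IEEE 754 float. */
    static int exponent(float f) {
        return (Float.floatToRawIntBits(f) >>> 23) & 0xFF;
    }

    /** Sign bit plus 23-bit mantissa, with the exponent bits cleared. */
    static int mantissaAndSign(float f) {
        return Float.floatToRawIntBits(f) & 0x807F_FFFF;
    }

    /** Reassembles a float from the two parts. */
    static float join(int exponent, int mantissaAndSign) {
        return Float.intBitsToFloat((exponent << 23) | mantissaAndSign);
    }

    /** Writes exponents and mantissa/sign values to two separately deflated byte arrays. */
    static byte[][] compress(float[] values) {
        ByteArrayOutputStream expBytes = new ByteArrayOutputStream();
        ByteArrayOutputStream manBytes = new ByteArrayOutputStream();
        try (DataOutputStream exp = new DataOutputStream(new DeflaterOutputStream(expBytes));
             DataOutputStream man = new DataOutputStream(new DeflaterOutputStream(manBytes))) {
            for (float v : values) {
                exp.writeByte(exponent(v));     // similar values cluster, so this stream zips well
                man.writeInt(mantissaAndSign(v));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the streams above finished both deflaters, so the arrays are complete.
        return new byte[][] {expBytes.toByteArray(), manBytes.toByteArray()};
    }
}
```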
Deserialization:
- read all bytes
- create a separate ByteArrayInputStream for each segment
- create appropriate unzip data input streams for these
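A sketch of these three steps, assuming a hypothetical framing: an int segment count, then one length-prefixed deflated segment per stream. `SegmentReadSketch.openSegments` is an illustrative name, not UIMA API, and the framing is not UIMA's actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.InflaterInputStream;

final class SegmentReadSketch {
    private SegmentReadSketch() {}

    /**
     * Hypothetical layout: an int segment count, then for each segment an int byte length
     * followed by that many deflated bytes. Returns one unzipping DataInputStream per segment.
     */
    static List<DataInputStream> openSegments(InputStream source) {
        try {
            byte[] all = source.readAllBytes(); // read all bytes
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(all));
            int count = in.readInt();
            List<DataInputStream> segments = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                byte[] segment = new byte[in.readInt()];
                in.readFully(segment);
                // a separate ByteArrayInputStream per segment, wrapped in an unzip stream
                segments.add(new DataInputStream(
                        new InflaterInputStream(new ByteArrayInputStream(segment))));
            }
            return segments;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```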
Data that is slow (expensive) to create:
- extra type system info: lazily created and added to the shared TypeSystemImpl object;
set up per type actually referenced
- mapper for the type system: lazily created and added to the shared TypeSystemImpl object,
kept in an identity-map cache (size limit = 10 per source type system?) whose key is the target TypeSystemImpl.
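A minimal sketch of a lazily filled, identity-keyed mapper cache. The generic `K` stands in for the target `TypeSystemImpl` and `V` for the mapper. The size limit is a constructor argument because the limit of 10 above is itself tentative, and the policy of building, but not retaining, mappers once the limit is reached is an assumption of this sketch.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-source cache of type mappers, keyed by target identity (not equals()). */
final class MapperCacheSketch<K, V> {
    private final Map<K, V> cache = new IdentityHashMap<>();
    private final int maxSize;
    private final Function<K, V> factory;

    MapperCacheSketch(int maxSize, Function<K, V> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
    }

    /** Returns the cached mapper for this exact target object, creating it on first use. */
    synchronized V get(K target) {
        V mapper = cache.get(target);
        if (mapper == null) {
            mapper = factory.apply(target);
            if (cache.size() < maxSize) {
                cache.put(target, mapper); // beyond the limit, mappers are built but not retained
            }
        }
        return mapper;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

Identity keying means two equal-but-distinct target objects get separate mappers, which matches the "identity-map" wording above.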
Defaulting:
- flags: doMeasurements, compressLevel, compressStrategy
- per serialize call: cas, output, [target ts], [mark for delta]
- per deserialize call: cas, input, [target ts], whether to save info for delta serialization
CASImpl has an instance method with defaulting args for serialization.
CASImpl has reinit, which works with compressed binary serialization objects if there is no type mapping.
If there is type mapping, use:
new BinaryCasSerDes6(cas, markerOrNull, targetTypeSystem /* of the stream being deserialized */, reuseInfoOrNull).deserialize(inStream)
Use Cases, filtering and delta
(de)serialize | filter? | delta? | Use case
--------------+---------+--------+------------------------------------------------------------------
serialize     | N       | N      | saving a CAS; sending a CAS to a service with an identical TS
serialize     | Y       | N      | sending a CAS to a service with a different TS (a guaranteed subset)
serialize     | N       | Y      | returning a CAS to the client; uses info saved when deserializing
              |         |        | (?? saving just a delta to disk ??)
serialize     | Y       | Y      | NOT SUPPORTED (not needed)
deserialize   | N       | N      | reading/receiving a CAS, identical TS
deserialize   | Y       | N      | reading/receiving a CAS, different TS; the TS is not guaranteed
              |         |        | to be a superset for the "reading" case
deserialize   | N       | Y      | receiving a CAS, identical TS; uses info saved when serializing
deserialize   | Y       | Y      | receiving a CAS, different TS (target is a feature subset);
              |         |        | uses info saved when serializing
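The table can also be read as a lookup. The hypothetical `UseCaseSketch.useCase` below encodes each row and throws for the one unsupported combination (serialize with both filtering and delta). It illustrates the table only and is not a UIMA API.

```java
final class UseCaseSketch {
    private UseCaseSketch() {}

    /** Encodes the use-case table: returns a description, or throws for the unsupported row. */
    static String useCase(boolean serialize, boolean filter, boolean delta) {
        if (serialize) {
            if (filter && delta) {
                throw new UnsupportedOperationException("filtered delta serialization is not supported");
            }
            if (filter) return "send a CAS to a service with a different (subset) type system";
            if (delta) return "return a delta CAS to the client, using info saved when deserializing";
            return "save a CAS, or send it to a service with an identical type system";
        }
        if (filter && delta) return "receive a delta CAS, different type system, using info saved when serializing";
        if (filter) return "read/receive a CAS with a different type system";
        if (delta) return "receive a delta CAS, identical type system, using info saved when serializing";
        return "read/receive a CAS with an identical type system";
    }
}
```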
Nested Class Summary
static enum BinaryCasSerDes6.CompressLevel
    Compression alternatives
static enum BinaryCasSerDes6.CompressStrat
static class BinaryCasSerDes6.ReuseInfo
    Info reused for 1) multiple serializations of the same CAS to multiple targets (a speedup), 2) delta CAS serialization, where it represents the fsStartIndex info before any modifications were made that could change that info, or 3) deserializing a delta CAS, where it represents the fsStartIndex info at the time the CAS was serialized out.
Field Summary
Fields inherited from interface org.apache.uima.cas.impl.SlotKindsConstants
arrayLength_i, byte_i, CAN_BE_NEGATIVE, control_i, double_Exponent_i, double_Mantissa_Sign_i, float_Exponent_i, float_Mantissa_Sign_i, fsIndexes_i, heapRef_i, IGNORED, IN_MAIN_HEAP, int_i, long_High_i, long_Low_i, NBR_SLOT_KIND_ZIP_STREAMS, short_i, strChars_i, strLength_i, strOffset_i, strSeg_i, typeCode_i
Constructor Summary
BinaryCasSerDes6(AbstractCas cas)
    Setup to serialize (not delta) or deserialize (not delta) using binary compression, no type mapping but only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs)
    Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping and only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs, boolean storeTS, boolean storeTSI)
    Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping, optionally storing TSI, and only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs)
    Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements)
    Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures, output measurements
BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy)
    Setup to serialize or deserialize using binary compression, with (optional) type mapping and only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, TypeSystemImpl tgtTs)
    Setup to serialize (not delta) or deserialize (not delta) using binary compression, with type mapping and only processing reachable Feature Structures
Method Summary
boolean compareCASes(CASImpl c1, CASImpl c2)
    Compare 2 CASes, with perhaps different type systems.
void deserialize(InputStream istream)
void deserialize(InputStream istream, AllowPreexistingFS aAllowPreexistingFS)
    Version used by uima-as to read a delta CAS from remote parallel steps
void deserializeAfterVersion(DataInputStream istream, boolean aIsDelta, AllowPreexistingFS aAllowPreexistingFS)
getReuseInfo()
serialize(out)
    S E R I A L I Z E
Constructor Details
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy) throws ResourceInitializationException
Setup to serialize or deserialize using binary compression, with (optional) type mapping and only processing reachable Feature Structures.
Parameters:
aCas - required; refers to the CAS being serialized or deserialized into
mark - if not null, the serialization mark for delta serialization. Unused for deserialization.
tgtTs - if not null, the target type system. For serialization, this is a subset of the CAS's type system; for deserialization, it is the type system of the serialized data being read.
rfs - For delta serialization: must not be null, and is the value saved after deserializing the original, before any modifications or additions were made. For normal serialization: can be null, but if not, it is used in place of recalculating, as a speedup. For delta deserialization: must not be null, and is the value saved after serializing to the service. For normal deserialization: must be null.
doMeasurements - if true, measurements are done (on serialization)
compressLevel - if not null, specifies the enum instance for the compress level
compressStrategy - if not null, specifies the enum instance for the compress strategy
Throws:
ResourceInitializationException - if the target type system is incompatible with the source type system
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas) throws ResourceInitializationException
Setup to serialize (not delta) or deserialize (not delta) using binary compression, no type mapping but only processing reachable Feature Structures
Parameters:
cas
Throws:
ResourceInitializationException - never thrown
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, TypeSystemImpl tgtTs) throws ResourceInitializationException
Setup to serialize (not delta) or deserialize (not delta) using binary compression, with type mapping and only processing reachable Feature Structures
Parameters:
cas
tgtTs
Throws:
ResourceInitializationException - if the target type system is incompatible with the source type system
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs) throws ResourceInitializationException
Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures
Parameters:
cas
mark
tgtTs - for deserialization, the type system of the serialized data being read
rfs - Reused Feature Structure information; required for both delta serialization and delta deserialization
Throws:
ResourceInitializationException - if the target type system is incompatible with the source type system
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements) throws ResourceInitializationException
Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures, output measurements
Parameters:
cas
mark
tgtTs - for deserialization, the type system of the serialized data being read
rfs - Reused Feature Structure information; a speedup on serialization, required on delta deserialization
doMeasurements
Throws:
ResourceInitializationException - if the target type system is incompatible with the source type system
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs) throws ResourceInitializationException
Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping and only processing reachable Feature Structures
Parameters:
cas
rfs
Throws:
ResourceInitializationException - never thrown
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs, boolean storeTS, boolean storeTSI) throws ResourceInitializationException
Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping, optionally storing TSI, and only processing reachable Feature Structures
Parameters:
cas
rfs
storeTS
storeTSI
Throws:
ResourceInitializationException - never thrown
Method Details
getReuseInfo
serialize
S E R I A L I Z E
Parameters:
out
Returns:
null or serialization measurements (depending on the setting of doMeasurements)
Throws:
IOException - passthru
deserialize
public void deserialize(InputStream istream) throws IOException
Parameters:
istream
Throws:
IOException
deserialize
public void deserialize(InputStream istream, AllowPreexistingFS aAllowPreexistingFS) throws IOException
Version used by uima-as to read a delta CAS from remote parallel steps.
Parameters:
istream - input stream
aAllowPreexistingFS - what to do if an item already exists below the mark
Throws:
IOException - passthru
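A sketch of the choice `aAllowPreexistingFS` controls, assuming a toy heap keyed by integer FS ids in which ids below the mark are preexisting. The `Policy` enum is a hypothetical stand-in whose three values are assumed to mirror UIMA's `AllowPreexistingFS` options (allow, ignore, disallow); check the UIMA sources for their exact semantics.

```java
import java.util.Map;

final class PreexistingPolicySketch {
    private PreexistingPolicySketch() {}

    /** Hypothetical mirror of the choices an AllowPreexistingFS-style setting offers. */
    enum Policy { ALLOW, IGNORE, DISALLOW }

    /**
     * Applies an incoming update for FS id to a toy heap. Ids below markId existed before
     * the mark ("preexisting"); ids at or above it are new and always accepted.
     */
    static void apply(Map<Integer, String> heap, int markId, int id, String value, Policy policy) {
        if (id < markId) {
            switch (policy) {
                case IGNORE:
                    return; // silently drop changes to preexisting FSs
                case DISALLOW:
                    throw new IllegalStateException("update to preexisting FS " + id + " not allowed");
                case ALLOW:
                    break; // accept: continue to the update below
            }
        }
        heap.put(id, value);
    }
}
```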
deserializeAfterVersion
public void deserializeAfterVersion(DataInputStream istream, boolean aIsDelta, AllowPreexistingFS aAllowPreexistingFS) throws IOException
Throws:
IOException
compareCASes
public boolean compareCASes(CASImpl c1, CASImpl c2)
Compare 2 CASes, with perhaps different type systems. If the type systems are different, construct a type mapper and use it to selectively ignore types or features not in the other type system. The mapper is from CAS1 -> CAS2. When computing the things to compare from CAS1, filter out feature structures not reachable via indexes or refs.
Parameters:
c1 - CAS to compare
c2 - CAS to compare
Returns:
true if equal (for types / features in both)
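A simplified sketch of comparing only the common part of two type systems. It assumes each CAS has already been reduced to an ordered list of reachable feature structures, and that the caller supplies, for each type present in both systems, the set of features defined in both (the information a type mapper would provide). `CompareSketch`, `Fs`, and `equalOnCommon` are hypothetical; the real method also computes reachability itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class CompareSketch {
    private CompareSketch() {}

    /** Toy feature structure: a type name plus feature values. */
    static final class Fs {
        final String type;
        final Map<String, Object> features;
        Fs(String type, Map<String, Object> features) {
            this.type = type;
            this.features = features;
        }
    }

    /** Keeps only FSs whose type exists in both systems, and only the features defined in both. */
    private static List<Map<String, Object>> project(List<Fs> fss, Map<String, Set<String>> common) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Fs fs : fss) {
            Set<String> feats = common.get(fs.type);
            if (feats == null) {
                continue; // type missing from the other type system: ignored
            }
            Map<String, Object> kept = new TreeMap<>();
            kept.put("<type>", fs.type);
            for (String f : feats) {
                kept.put(f, fs.features.get(f));
            }
            out.add(kept);
        }
        return out;
    }

    /**
     * True if both FS sequences agree on every type and feature present in both type systems.
     * common maps type name -> features defined for that type in both systems.
     */
    static boolean equalOnCommon(List<Fs> c1, List<Fs> c2, Map<String, Set<String>> common) {
        return project(c1, common).equals(project(c2, common));
    }
}
```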