Class BinaryCasSerDes6

java.lang.Object
org.apache.uima.cas.impl.BinaryCasSerDes6
All Implemented Interfaces:
SlotKindsConstants

public class BinaryCasSerDes6 extends Object implements SlotKindsConstants
User-callable serialization and deserialization of the CAS in a compressed binary format.

This serializes/deserializes the state of the CAS. It can map type systems, so the sending and receiving type systems do not have to be the same:
  - types and features are matched by name, and features must have the same range (slot kind)
  - types and/or features in one type system that are not in the other are skipped over

The header tells the reader the format and the compression level.

How to Serialize:
  1) Create an instance of this class.
     a) If doing a delta serialization, pass in the mark and a ReuseInfo object that was created after deserializing this CAS initially.
     b) If serializing to a target with a different type system, pass the target's TypeSystemImpl object so the serialization can filter the types for the target.
  2) Call serialize() to serialize the CAS.
  3) If serializing to a target from which you expect to receive back a delta CAS, create a ReuseInfo object from this object and reuse it for deserializing the delta CAS.

TypeSystemImpl objects are lazily augmented by customized TypeInfo instances for each type encountered in serializing or deserializing. These are preserved for future calls, so their setup / initialization is only needed the first time.

TypeSystemImpl objects are also lazily augmented by typeMappers for individual different target type systems; these too are preserved and reused on future calls.

Compressed binary CASes are designed to be "self-describing": the format of the compressed binary CAS, including version info, is inserted at the beginning so that a proper deserialization method can be chosen automatically.

The compressed binary format implemented by this class supports type system mapping. Types in the source which are not in the target (or vice versa) are omitted. Types with "extra" features have their extra features omitted (or, on deserialization, set to their default value: null, 0, etc.). Feature slots which hold references to types not in the target type system are replaced with 0 (null).

How to Deserialize:
  1) Get an appropriate CAS to deserialize into. For a delta CAS, it does not have to be empty, but it must be the originating CAS from which the delta was produced.
  2) If the target type system is the same as the CAS's and the serialized form is not delta, call aCAS.reinit(source). Otherwise, create an instance of this class (called xxx below).
     a) If the data being deserialized has a different type system, set the "target" type system to the TypeSystemImpl instance of the data being deserialized.
     b) If delta deserializing, pass in the ReuseInfo object created when the CAS was serialized.
  3) Call xxx.deserialize(inputStream).

Compression/Decompression works in two stages:
  - application of Zip/Unzip to particular sub-collections of CAS data, grouped according to similar data distribution
  - collection of like kinds of data (to make the zipping more effective)
There can be up to ~20 of these collections, such as control info, float exponents, and string chars.

Deserialization:
  - read all bytes
  - create separate ByteArrayInputStreams for each segment
  - create appropriate unzip data input streams for these

Slow, expensive-to-create data:
  - extra type system info: lazily created and added to the shared TypeSystemImpl object; set up per type actually referenced
  - mapper for type system: lazily created and added to the shared TypeSystemImpl object in an identity-map cache (size limit = 10 per source type system?); the key is the target TypeSystemImpl

Defaulting:
  - flags: doMeasurements, compressLevel, CompressStrategy
  - per serialize call: cas, output, [target ts], [mark for delta]
  - per deserialize call: cas, input, [target ts], whether-to-save-info-for-delta-serialization

CASImpl has an instance method with defaulting args for serialization. CASImpl has reinit, which works with compressed binary serialization objects if there is no type mapping. If there is type mapping, use:
  new BinaryCasSerDes6(cas, marker-or-null, targetTypeSystem (for stream being deserialized), reuseInfo-or-null).deserialize(in-stream)

Use Cases, filtering and delta

**************************************************************************
* (de)serialize * filter? * delta? * Use case
**************************************************************************
* serialize     *   N     *   N    * Saving a CAS,
*               *         *        * sending CAS to service with identical ts
**************************************************************************
* serialize     *   Y     *   N    * sending CAS to service with
*               *         *        * different ts (a guaranteed subset)
**************************************************************************
* serialize     *   N     *   Y    * returning CAS to client
*               *         *        * uses info saved when deserializing
*               *         *        * (?? saving just a delta to disk??)
**************************************************************************
* serialize     *   Y     *   Y    * NOT SUPPORTED (not needed)
**************************************************************************
* deserialize   *   N     *   N    * reading/(receiving) CAS, identical TS
**************************************************************************
* deserialize   *   Y     *   N    * reading/receiving CAS, different TS
*               *         *        * ts not guaranteed to be superset
*               *         *        * for "reading" case.
**************************************************************************
* deserialize   *   N     *   Y    * receiving CAS, identical TS
*               *         *        * uses info saved when serializing
**************************************************************************
* deserialize   *   Y     *   Y    * receiving CAS, different TS (tgt a feature subset)
*               *         *        * uses info saved when serializing
**************************************************************************
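The two-stage compression idea described above (collect like kinds of data into separate segments, then zip each segment on its own) can be illustrated with a minimal sketch that uses only java.util.zip. It is not the encoding this class uses: the class name SegmentedZipSketch, the choice of exactly two segments, and the bit layout are assumptions made for illustration. It splits each float into an exponent byte and a sign-plus-mantissa word, so that the highly repetitive exponents end up in their own deflated segment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical illustration only; not the format written by BinaryCasSerDes6. */
class SegmentedZipSketch {

    /** Returns two independently deflated segments: [0] exponents, [1] sign + mantissa. */
    static byte[][] compress(float[] values) {
        try {
            ByteArrayOutputStream expBytes = new ByteArrayOutputStream();
            ByteArrayOutputStream mantBytes = new ByteArrayOutputStream();
            // Closing the streams finishes each deflater, so both segments are complete.
            try (DataOutputStream exp = new DataOutputStream(new DeflaterOutputStream(expBytes));
                 DataOutputStream mant = new DataOutputStream(new DeflaterOutputStream(mantBytes))) {
                for (float v : values) {
                    int bits = Float.floatToRawIntBits(v);
                    exp.writeByte((bits >>> 23) & 0xFF); // 8-bit exponent
                    mant.writeInt(bits & 0x807FFFFF);    // sign bit + 23-bit mantissa
                }
            }
            return new byte[][] { expBytes.toByteArray(), mantBytes.toByteArray() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads both segments in step and reassembles the original floats. */
    static float[] decompress(byte[][] segments, int count) {
        try (DataInputStream exp = new DataInputStream(
                     new InflaterInputStream(new ByteArrayInputStream(segments[0])));
             DataInputStream mant = new DataInputStream(
                     new InflaterInputStream(new ByteArrayInputStream(segments[1])))) {
            float[] out = new float[count];
            for (int i = 0; i < count; i++) {
                int expBits = exp.readUnsignedByte();
                int rest = mant.readInt();
                out[i] = Float.intBitsToFloat(rest | (expBits << 23));
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading mirrors the deserialization outline above: one ByteArrayInputStream per segment, each wrapped in an unzip stream. Whether a given split actually improves the compression ratio depends on the data, so it is worth measuring.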
  • Constructor Details

    • BinaryCasSerDes6

      public BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy) throws ResourceInitializationException
      Set up to serialize or deserialize using binary compression, with (optional) type mapping, processing only reachable Feature Structures.
      Parameters:
      aCas - required; the CAS being serialized or deserialized into
      mark - if not null, the serialization mark for delta serialization. Unused for deserialization.
      tgtTs - if not null, the target type system. For serialization, this is a subset of the CAS's type system; for deserialization, it is the type system of the serialized data being read.
      rfs - Reused Feature Structure information.
        - For delta serialization: must not be null; it is the value saved after deserializing the original, before any modifications or additions were made.
        - For normal serialization: can be null; if not null, it is used in place of recalculating, for speed.
        - For delta deserialization: must not be null; it is the value saved after serializing to the service.
        - For normal deserialization: must be null.
      doMeasurements - if true, measurements are done (on serialization)
      compressLevel - if not null, specifies enum instance for compress level
      compressStrategy - if not null, specifies enum instance for compress strategy
      Throws:
      ResourceInitializationException - if the target type system is incompatible with the source type system
    • BinaryCasSerDes6

      public BinaryCasSerDes6(AbstractCas cas) throws ResourceInitializationException
      Set up to serialize (not delta) or deserialize (not delta) using binary compression, with no type mapping, processing only reachable Feature Structures.
      Parameters:
      cas - the CAS being serialized or deserialized into
      Throws:
      ResourceInitializationException - never thrown
    • BinaryCasSerDes6

      public BinaryCasSerDes6(AbstractCas cas, TypeSystemImpl tgtTs) throws ResourceInitializationException
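      Set up to serialize (not delta) or deserialize (not delta) using binary compression, with type mapping, processing only reachable Feature Structures.
      Parameters:
      cas - the CAS being serialized or deserialized into
      tgtTs - the target type system. For serialization, this is a subset of the CAS's type system; for deserialization, it is the type system of the serialized data being read.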
      Throws:
      ResourceInitializationException - if the target type system is incompatible with the source type system
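The name-based matching that type mapping relies on can be sketched with plain maps. This is a hedged illustration and not the mapper built inside TypeSystemImpl: the class TypeMapSketch, the representation of a type system as "type name -> (feature name -> range kind)", and the use of IllegalArgumentException in place of ResourceInitializationException are all assumptions. Treating a same-named feature with a different range as an incompatibility is likewise an assumed reading of the rule that matched features must have the same range.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a type mapper; type systems are modeled as nested maps. */
class TypeMapSketch {

    /**
     * For every source type that also exists in the target, lists the features
     * that exist in both. Types and features missing on either side are skipped.
     */
    static Map<String, List<String>> sharedFeatures(
            Map<String, Map<String, String>> source,
            Map<String, Map<String, String>> target) {
        Map<String, List<String>> mapping = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> type : source.entrySet()) {
            Map<String, String> tgtFeatures = target.get(type.getKey());
            if (tgtFeatures == null) {
                continue; // type not in target: omitted
            }
            List<String> kept = new ArrayList<>();
            for (Map.Entry<String, String> feat : type.getValue().entrySet()) {
                String tgtRange = tgtFeatures.get(feat.getKey());
                if (tgtRange == null) {
                    continue; // feature not in target: omitted
                }
                if (!tgtRange.equals(feat.getValue())) {
                    // Assumption: a range (slot kind) mismatch makes the type systems incompatible.
                    throw new IllegalArgumentException("Feature " + type.getKey() + ":"
                            + feat.getKey() + " has range " + feat.getValue()
                            + " in source but " + tgtRange + " in target");
                }
                kept.add(feat.getKey());
            }
            mapping.put(type.getKey(), kept);
        }
        return mapping;
    }
}
```

With a source holding Token{begin:int, pos:string} and Sentence{begin:int}, and a target holding Token{begin:int, lemma:string}, the result contains only Token with the single feature begin.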
    • BinaryCasSerDes6

      public BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs) throws ResourceInitializationException
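      Set up to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping, processing only reachable Feature Structures.
      Parameters:
      cas - the CAS being serialized or deserialized into
      mark - if not null, the serialization mark for delta serialization. Unused for deserialization.
      tgtTs - the target type system; for deserialization, this is the type system of the serialized data being read.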
      rfs - Reused Feature Structure information - required for both delta serialization and delta deserialization
      Throws:
      ResourceInitializationException - if the target type system is incompatible with the source type system
    • BinaryCasSerDes6

      public BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements) throws ResourceInitializationException
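      Set up to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping, processing only reachable Feature Structures, and optionally producing measurements.
      Parameters:
      cas - the CAS being serialized or deserialized into
      mark - if not null, the serialization mark for delta serialization. Unused for deserialization.
      tgtTs - the target type system; for deserialization, this is the type system of the serialized data being read.
      rfs - Reused Feature Structure information - speed up on serialization, required on delta deserialization
      doMeasurements - if true, measurements are done (on serialization)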
      Throws:
      ResourceInitializationException - if the target type system is incompatible with the source type system
    • BinaryCasSerDes6

      public BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs) throws ResourceInitializationException
      Set up to serialize (not delta) or deserialize (maybe delta) using binary compression, with no type mapping, processing only reachable Feature Structures.
      Parameters:
      cas - the CAS being serialized or deserialized into
      rfs - Reused Feature Structure information
      Throws:
      ResourceInitializationException - never thrown
    • BinaryCasSerDes6

      public BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs, boolean storeTS, boolean storeTSI) throws ResourceInitializationException
      Set up to serialize (not delta) or deserialize (maybe delta) using binary compression, with no type mapping, optionally storing TSI, processing only reachable Feature Structures.
      Parameters:
      cas - the CAS being serialized or deserialized into
      rfs - Reused Feature Structure information
      storeTS - -
      storeTSI - -
      Throws:
      ResourceInitializationException - never thrown
  • Method Details

    • getReuseInfo

      public BinaryCasSerDes6.ReuseInfo getReuseInfo()
    • serialize

      public SerializationMeasures serialize(Object out) throws IOException
      Serializes the CAS in the compressed binary format.
      Parameters:
      out - -
      Returns:
      null or serialization measurements (depending on setting of doMeasurements)
      Throws:
      IOException - passthru
    • deserialize

      public void deserialize(InputStream istream) throws IOException
      Parameters:
      istream - -
      Throws:
      IOException - -
    • deserialize

      public void deserialize(InputStream istream, AllowPreexistingFS aAllowPreexistingFS) throws IOException
      Version used by UIMA-AS to read a delta CAS from remote parallel steps
      Parameters:
      istream - input stream
      aAllowPreexistingFS - what to do if item already exists below the mark
      Throws:
      IOException - passthru
    • deserializeAfterVersion

      public void deserializeAfterVersion(DataInputStream istream, boolean aIsDelta, AllowPreexistingFS aAllowPreexistingFS) throws IOException
      Throws:
      IOException
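The method name suggests that the caller has already consumed the leading format and version information and passes on what it learned (for example, whether the stream is a delta). The sketch below shows that general "self-describing header, then dispatch" pattern. The layout is invented for illustration: HeaderSketch, the MAGIC value, and both flag bits are hypothetical and do not reflect the actual header written by this class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical header layout: 4-byte magic, 4-byte flags, then the body. */
class HeaderSketch {
    static final int MAGIC = 0x534B4348;   // ASCII "SKCH", made up for this sketch
    static final int FLAG_DELTA = 1;       // assumed bit: stream holds a delta
    static final int FLAG_COMPRESSED = 2;  // assumed bit: body is compressed

    /** Prefixes the body with the header so a reader can decide how to decode it. */
    static byte[] write(boolean delta, boolean compressed, byte[] body) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(MAGIC);
            out.writeInt((delta ? FLAG_DELTA : 0) | (compressed ? FLAG_COMPRESSED : 0));
            out.write(body);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads only the header and reports which decoding path a reader would take. */
    static String describe(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            if (in.readInt() != MAGIC) {
                throw new IllegalArgumentException("not a sketch stream");
            }
            int flags = in.readInt();
            boolean delta = (flags & FLAG_DELTA) != 0;
            boolean compressed = (flags & FLAG_COMPRESSED) != 0;
            // A real reader would hand the remaining stream to the matching deserializer here.
            return (compressed ? "compressed" : "plain") + (delta ? " delta" : " full")
                    + ", " + in.available() + " body bytes";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```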
    • compareCASes

      public boolean compareCASes(CASImpl c1, CASImpl c2)
      Compare 2 CASes, with perhaps different type systems. If the type systems are different, construct a type mapper and use that to selectively ignore types or features not in the other type system. The mapper is from CAS1 -> CAS2. When computing the things to compare from CAS1, filter to remove feature structures not reachable via indexes or refs.
      Parameters:
      c1 - CAS to compare
      c2 - CAS to compare
      Returns:
      true if equal (for types / features in both)
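"Reachable via indexes or refs" is a graph reachability condition: start from the indexed feature structures and follow references. A minimal sketch of such a filter follows. Feature structures are modeled as integer ids and references as an adjacency map; both the modeling and the name ReachableSketch are assumptions for illustration, not the traversal used by this class.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical reachability filter over feature structures identified by int ids. */
class ReachableSketch {

    /**
     * @param indexed ids of feature structures that are in some index (the roots)
     * @param refs    for each id, the ids it references through its features
     * @return every id reachable from an indexed feature structure, in ascending order
     */
    static SortedSet<Integer> reachable(Collection<Integer> indexed,
                                        Map<Integer, List<Integer>> refs) {
        SortedSet<Integer> seen = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>(indexed);
        while (!work.isEmpty()) {
            Integer fs = work.pop();
            if (!seen.add(fs)) {
                continue; // already visited; also guards against reference cycles
            }
            for (Integer target : refs.getOrDefault(fs, List.of())) {
                if (!seen.contains(target)) {
                    work.push(target);
                }
            }
        }
        return seen;
    }
}
```

In this model, a feature structure that refers to an indexed one but is neither indexed nor referenced itself is left out, which matches the idea of filtering unreachable feature structures before comparing.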