Class CasCompare

java.lang.Object
org.apache.uima.cas.impl.CasCompare

public class CasCompare extends Object
Used by tests for the Binary Compressed de/serialization code. Used by the test app XmiCompare.

Compares 2 CASes, with perhaps different type systems. If the type systems are different, constructs a type mapper and uses it to selectively ignore types or features not in the other type system. The mapper is from CAS1 -> CAS2.

When computing the things to compare from CAS1, filters to remove feature structures not reachable via indexes or refs.

The index definitions are not compared. The indexes are used to locate the FSs to be compared.

Reports are produced to System.out and System.err as a side effect:
  System.out: status messages, type system comparison
  System.err: mismatch comparison information

Usage:
  Use the static compareCASes method for default comparisons.
  Use the multi-step approach for more complex comparisons:
  - Make an instance of this class, passing in the two CASes.
  - Set any additional configuration:
      cc.compareAll(true) - continue comparing if a mismatch is found
      cc.compareIds(true) - compare ids (require ids to be ==)
  - Do any transformations needed on the CASes to account for known but allowed differences:
    -- These are transformations done on the CAS Feature Structures outside of this routine
    -- example: for certain type:feature string values, normalize to the same canonical value
    -- example: for certain type:feature string arrays, where the order is not important, sort them
    -- example: for certain type:feature FSArrays, where the order is not important, sort them using the sortFSArray method
  - Do any configuration to specify congruence sets for String values:
    -- example: addStringCongruenceSet(type, feature, set-of-strings, -1 or int index if array)
    -- these are specific to type / feature specs
    -- the range can be string or string array; if string array, the spec includes the index, or -1 to indicate all indexes

How it works:
  Prepare arrays of all the FSs in the CAS
  - for each of the 2 CASes to be compared
  - 2 arrays:
    -- all FSs in any index in any view
    -- the above, plus all FSs reachable via references
       --- but omit some types: only of interest when reached via ref, e.g. String/Int/Float/Boolean arrays

  The comparison of FSs is done one FS at a time.
  - In order to determine the right FSs to compare with each other, the FSs for each CAS are sorted.

  The sort and the CAS compare both use a Compare method.
  - Sorting skips items not in the other type system, including features (only possible if comparing two CASes with different type systems, of course).

  Compare - used for two purposes:
    a) sorting FSs belonging to one CAS
       - can be used by the caller to pre-sort any array values where the compare should be for set equality (in other words, ignore the order)
    b) comparing a FS in one CAS with a FS in the other CAS

    sort keys, in order:
      1) type
      2) if primitive array: sort based on
         - size
         - iterating thru all array items
      3) all the features, considered in an order where non-refs are sorted before refs

    comparing values:
      primitives - value comparison
      refs - compare the ref'd FS, while recording reference paths
        - stop when reaching a compare point where the pair being compared has been seen
        - stop at any point if the two FSs compare unequal
        - at the stop point, if the compare is equal, check the reference paths, and report unequal reference paths (different cycle lengths, or different total lengths; see the Prev data structure)

  Context information, reused across compares:
    prevCompare - if a particular pair of FSs compared equal
      -- used to speed up comparison
      -- used to stop recursive loops of references
    prev1, prev2 - reset for each top-level FS compare (not reset for inner FS compares of fs-reference values); hold information about the reference path for chains of FS references
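
The reference-following part of the compare can be illustrated with a small stand-alone sketch. It does not use the UIMA API: FsNode and RefCompareSketch are hypothetical stand-ins, each node has a single int value and a single reference, and the reference-path bookkeeping (the Prev data structure) is left out. It only shows the key order (type, then non-ref values, then refs) and how remembering already-seen pairs stops recursion on cyclic references.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a feature structure: a type name, one primitive
// value, and one reference that may be null or may form a cycle.
final class FsNode {
    final String type;
    final int value;
    FsNode ref;

    FsNode(String type, int value) {
        this.type = type;
        this.value = value;
    }
}

// Assumption: one instance is used per top-level comparison.
final class RefCompareSketch {
    // Pairs already visited. FsNode does not override equals/hashCode,
    // so these collections compare nodes by identity.
    private final Map<FsNode, Set<FsNode>> seenPairs = new HashMap<>();

    int compare(FsNode a, FsNode b) {
        if (a == null || b == null) {
            return (a == null ? 0 : 1) - (b == null ? 0 : 1);
        }
        // Stop when this pair has been seen before; this ends reference loops.
        if (!seenPairs.computeIfAbsent(a, k -> new HashSet<>()).add(b)) {
            return 0;
        }
        int r = a.type.compareTo(b.type);        // key 1: type
        if (r != 0) {
            return r;
        }
        r = Integer.compare(a.value, b.value);   // non-ref values before refs
        if (r != 0) {
            return r;
        }
        return compare(a.ref, b.ref);            // then follow the reference
    }

    public static void main(String[] args) {
        FsNode a1 = new FsNode("T", 1);
        FsNode a2 = new FsNode("T", 2);
        a1.ref = a2;
        a2.ref = a1; // cycle of length 2
        FsNode b1 = new FsNode("T", 1);
        FsNode b2 = new FsNode("T", 2);
        b1.ref = b2;
        b2.ref = b1;
        System.out.println(new RefCompareSketch().compare(a1, b1));
    }
}
```

Because the sketch treats any already-seen pair as equal, it cannot tell cycles of different lengths apart; that is the case the description above handles by checking the recorded reference paths.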
  • Constructor Details

    • CasCompare

      public CasCompare(CASImpl c1, CASImpl c2)
      Make an instance of this class to set up a compare operation, and optionally use to configure the compare.
      Parameters:
      c1 - one CAS to compare
      c2 - the other CAS to compare
  • Method Details

    • compareCASes

      public static boolean compareCASes(CASImpl c1, CASImpl c2)
      Compares 2 CASes, with perhaps different type systems, using the default configuration.
      Parameters:
      c1 - CAS to compare
      c2 - CAS to compare
      Returns:
      true if equal (for types / features in both)
    • compareAll

      public void compareAll(boolean v)
      Continues the comparison after a miscompare (or not). This is useful when you want to see all of the miscompares.
      Parameters:
      v - defaults to false, set to true to continue the comparison after a miscompare
    • compareIds

      public void compareIds(boolean v)
      Normally, the Feature Structure ID is ignored when comparing.
      Parameters:
      v - defaults to false, set to true to include the Feature Structure ID in the compare.
    • applyToBoth

      public void applyToBoth(Consumer<CASImpl> c)
      Many times some customization needs to be applied to both CASs being compared. This routine does that.
      Parameters:
      c - the customization to be applied to both CASs
    • applyToTypeFeature

      public void applyToTypeFeature(String typeName, String featureBaseName, org.apache.uima.internal.util.function.Consumer2<TOP,Feature> c)
      Before comparing, you can adjust specific features of specific types, arbitrarily. This routine applies the adjustments to both CASs.
      Parameters:
      typeName - the fully qualified name of the type
      featureBaseName - the short feature name to adjust
      c - a function to do the adjustment
    • type_feature_to_runnable

      public List<Runnable> type_feature_to_runnable(String typeName, String featureBaseName, BiFunction<TOP,Feature,Runnable> c)
      Before comparing, you can create pending values for specific types / features, and return a list of runnables, which when run, plug in those pending values.
      Parameters:
      typeName - the type
      featureBaseName - the feature of the type
      c - the code to run for this type and feature
      Returns:
      a list of runnables, for both CASs
    • canonicalizeString

      public void canonicalizeString(String typeName, String featureBaseName, String[] items_to_change, String canonical_value)
      Before comparing, for a selected type and feature, you can replace any string value that belongs to a given set of strings with another (fixed) string, so that the values compare equal. Use this to ignore selected string-valued features having particular values.
      Parameters:
      typeName - the fully qualified type name
      featureBaseName - the feature
      items_to_change - an array of strings; a value is changed if it matches one of these
      canonical_value - the new value
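
The replacement rule described for canonicalizeString can be sketched on plain strings. This is not the UIMA implementation: CanonicalizeSketch and its canonicalize helper are hypothetical names, and the sketch works on a single value instead of walking the feature structures of a type.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper showing only the value-replacement rule.
final class CanonicalizeSketch {
    /** Returns canonicalValue when value is one of itemsToChange, otherwise value unchanged. */
    static String canonicalize(String value, String[] itemsToChange, String canonicalValue) {
        Set<String> targets = new HashSet<>(Arrays.asList(itemsToChange));
        return targets.contains(value) ? canonicalValue : value;
    }

    public static void main(String[] args) {
        String[] itemsToChange = {"", "n/a", "unknown"};
        // Values in the set collapse to one canonical string; others are kept.
        System.out.println(canonicalize("n/a", itemsToChange, "none"));
        System.out.println(canonicalize("noun", itemsToChange, "none"));
    }
}
```

Applying the same rule to the feature in both CASs makes every value from the set compare equal, while values outside the set are still compared as they are.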
    • sortFSArray

      public List<Runnable> sortFSArray(String typeName, String featureBaseName)
    • sort_dedup_FSArray

      public List<Runnable> sort_dedup_FSArray(String typeName, String featureBaseName)
    • sortStringArray

      public List<Runnable> sortStringArray(String typeName, String featureBaseName)
    • excludeRootTypesFromIndexes

      public void excludeRootTypesFromIndexes(Set<String> excluded_typeNames)
      The compare can find FeatureStructures to compare either by their being in some index in some view, or by their being referenced through some chain which starts with such an indexed FeatureStructure. It sometimes helps to exclude miscompares of FeatureStructures like StringArrays which (for some reason) are indexed, in favor of finding these only via refs. You can exclude these from being found via indexes by setting types here. They could still be found via refs from other Feature Structures. Calling this disables any includeOnlyTheseTypesFromIndexes call.
      Parameters:
      excluded_typeNames - type names to exclude
    • excludeCollectionsTypesFromIndexes

      public void excludeCollectionsTypesFromIndexes()
      The compare can find FeatureStructures to compare either by their being in some index in some view, or by their being referenced through some chain which starts with such an indexed FeatureStructure. It sometimes helps to exclude miscompares of FeatureStructures like StringArrays which (for some reason) are indexed, in favor of finding these only via refs. Call this to exclude the array types (boolean, byte, short, integer, long, float, double, string and fs arrays) from being found via indexes. They could still be found via refs from other Feature Structures. Calling this disables any includeOnlyTheseTypesFromIndexes call.
    • excludeListTypesFromIndexes

      public void excludeListTypesFromIndexes()
      The compare can find FeatureStructures to compare either by their being in some index in some view, or by their being referenced through some chain which starts with such an indexed FeatureStructure. It sometimes helps to exclude miscompares of List FeatureStructures like StringLists which (for some reason) are indexed, in favor of finding these only via refs. Call this to exclude the list types (non-empty Float/Integer/String list elements) from being found via indexes. They could still be found via refs from other Feature Structures. Calling this disables any includeOnlyTheseTypesFromIndexes call.
    • includeOnlyTheseTypesFromIndexes

      public void includeOnlyTheseTypesFromIndexes(List<String> includedTypeNames)
      The compare can find FeatureStructures to compare either by their being in some index in some view, or by their being referenced through some chain which starts with such an indexed FeatureStructure. It sometimes helps to exclude all indexed types except for a few selected ones, in favor of finding the excluded ones only via refs. Calling this disables any excludeXXXTypesFromIndexes calls.
      Parameters:
      includedTypeNames - fully qualified type names to include when finding Feature Structures to compare via the indexes.
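
The interaction between the exclude and include-only settings described above can be sketched as a small filter. This is not UIMA code: IndexTypeFilterSketch and its methods are hypothetical, and the choice that repeated exclude calls accumulate is an assumption made for the sketch. It only shows each kind of call disabling the other.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical filter deciding which type names are picked up from indexes.
final class IndexTypeFilterSketch {
    private final Set<String> excluded = new HashSet<>();
    private Set<String> includedOnly = null; // null: no include-only restriction

    void exclude(Collection<String> typeNames) {
        includedOnly = null;            // an exclude call disables include-only
        excluded.addAll(typeNames);     // assumption: exclusions accumulate
    }

    void includeOnly(Collection<String> typeNames) {
        excluded.clear();               // an include-only call disables excludes
        includedOnly = new HashSet<>(typeNames);
    }

    boolean acceptFromIndex(String typeName) {
        return includedOnly != null
            ? includedOnly.contains(typeName)
            : !excluded.contains(typeName);
    }

    public static void main(String[] args) {
        IndexTypeFilterSketch filter = new IndexTypeFilterSketch();
        filter.exclude(Arrays.asList("uima.cas.StringArray"));
        System.out.println(filter.acceptFromIndex("uima.cas.StringArray"));
        filter.includeOnly(Arrays.asList("uima.tcas.Annotation"));
        System.out.println(filter.acceptFromIndex("uima.tcas.Annotation"));
    }
}
```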
    • addStringCongruenceSet

      public void addStringCongruenceSet(String typeName, String featureBaseName, String[] set_of_strings_that_are_equivalent, int index)
      Add a set of strings that should be considered equal when doing string comparisons. This is conditioned on the typename and feature name
      Parameters:
      typeName - the fully qualified type name
      featureBaseName - the feature short name
      set_of_strings_that_are_equivalent - a set of strings that should compare equal, if testing the type / feature
      index - if the item being compared is a reference to a string array, which index should be compared. Use -1 if not applicable.
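
The idea of a congruence set can be sketched on plain strings. This is not the UIMA implementation: CongruenceSketch, addCongruenceSet and stringsMatch are hypothetical names, and the sketch leaves out the conditioning on type name, feature name and array index that the real method applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical string comparer: two strings match if they are equal,
// or if some registered congruence set contains both of them.
final class CongruenceSketch {
    private final List<Set<String>> congruenceSets = new ArrayList<>();

    void addCongruenceSet(String... equivalentStrings) {
        congruenceSets.add(new HashSet<>(Arrays.asList(equivalentStrings)));
    }

    boolean stringsMatch(String a, String b) {
        if (Objects.equals(a, b)) {
            return true;
        }
        for (Set<String> set : congruenceSets) {
            if (set.contains(a) && set.contains(b)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CongruenceSketch cmp = new CongruenceSketch();
        cmp.addCongruenceSet("NN", "NNS", "noun");
        System.out.println(cmp.stringsMatch("NN", "noun"));
        System.out.println(cmp.stringsMatch("NN", "VB"));
    }
}
```

Unlike canonicalizing values beforehand, this approach leaves the compared data untouched and only relaxes the equality test.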
    • showProgress

      public static void showProgress()
      Call this to show the progress of the compare; useful for long compares.
    • compareCASes

      public boolean compareCASes()
      This does the actual comparison operation of the previously specified CASes
      Returns:
      true if compare is OK
    • sortFSArray

      public Runnable sortFSArray(FSArray<?> fsArray)
      This is an optional pre-compare operation. Sometimes, when comparing FSArrays, the order of the elements is not significant, and the compare should be done ignoring order differences. This is accomplished by sorting the elements, before the compare is done, using this method. The sort order is not significant; it just needs to be the same order for otherwise equal FSArrays. Use this routine to accomplish the sort on particular FSArrays you designate. Call it for each one you want to sort. During the sort, links are followed. The sorting is done in a clone of the array, and the original array is not updated. Instead, a Runnable is returned, which may be invoked later to update the original array with the sorted copy. This allows sorting to be done on the original item values (in case the links refer back to the originals).
      Parameters:
      fsArray - the array to be sorted
      Returns:
      a runnable, which (when invoked) updates the original array with the sorted result.
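
The sort-a-clone, update-later pattern described above can be sketched with plain Java arrays. This is not the UIMA implementation: DeferredSortSketch and sortDeferred are hypothetical names, and an ordinary Comparator stands in for the feature structure compare.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper: sorts a copy now, writes it back only when asked.
final class DeferredSortSketch {
    static <T> Runnable sortDeferred(T[] original, Comparator<? super T> order) {
        T[] sorted = original.clone();   // the original stays untouched for now
        Arrays.sort(sorted, order);
        // Running the returned Runnable copies the sorted values back.
        return () -> System.arraycopy(sorted, 0, original, 0, sorted.length);
    }

    public static void main(String[] args) {
        String[] names = {"c", "a", "b"};
        Runnable update = sortDeferred(names, Comparator.naturalOrder());
        System.out.println(Arrays.toString(names)); // still in the original order
        update.run();
        System.out.println(Arrays.toString(names)); // now sorted
    }
}
```

Deferring the write-back means that several arrays can be sorted against unmodified data first, with all the Runnables run afterwards.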
    • sort_dedup_FSArray

      public Runnable sort_dedup_FSArray(TOP fs, Feature feat)
      This is an optional pre-compare operation. It is identical to the method above, except that after sorting, it removes duplicates.
      Parameters:
      fs - the feature structure having the fsarray feature
      feat - the feature having the fsarray
      Returns:
      a runnable, which (when invoked) updates the original array with the sorted result.
    • sortStringArray

      public Runnable sortStringArray(StringArray stringArray)
      This is an optional pre-compare operation. Sometimes, when comparing StringArrays, the order of the elements is not significant, and the compare should be done ignoring order differences. This is accomplished by sorting the elements, before the compare is done, using this method. Use this routine to accomplish the sort on particular StringArrays you designate. Call it for each one you want to sort. The sorting is done in a clone of the array, and the original array is not updated. Instead, a Runnable is returned, which may be invoked later to update the original array with the sorted copy. This allows sorting to be done while keeping the original values until a later time.
      Parameters:
      stringArray - the array to be sorted
      Returns:
      null or a runnable, which (when invoked) updates the original array with the sorted result. Callers should ensure the runnable is garbage collected after use.
    • compareNumberOfFSsByType

      public static StringBuilder compareNumberOfFSsByType(CAS cas1, CAS cas2)
      Counts and compares the number of Feature Structures, by type, and generates a report
      Parameters:
      cas1 - first CAS to compare
      cas2 - second CAS to compare
      Returns:
      a StringBuilder with a report
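
Counting by type and reporting the differences can be sketched on lists of type names. This is not the UIMA implementation: CountByTypeSketch is a hypothetical class, the lists of names stand in for the feature structures of each CAS, and the report format shown is invented for the sketch, since the format of the real report is not described here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: counts items per type name and reports mismatches.
final class CountByTypeSketch {
    static Map<String, Integer> countByType(List<String> typeNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String typeName : typeNames) {
            counts.merge(typeName, 1, Integer::sum);
        }
        return counts;
    }

    /** One line per type whose counts differ; the format is an assumption. */
    static StringBuilder report(Map<String, Integer> c1, Map<String, Integer> c2) {
        StringBuilder sb = new StringBuilder();
        TreeSet<String> allTypes = new TreeSet<>(c1.keySet());
        allTypes.addAll(c2.keySet());
        for (String typeName : allTypes) {
            int n1 = c1.getOrDefault(typeName, 0);
            int n2 = c2.getOrDefault(typeName, 0);
            if (n1 != n2) {
                sb.append(typeName).append(": ").append(n1)
                  .append(" vs ").append(n2).append('\n');
            }
        }
        return sb;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = countByType(Arrays.asList("Token", "Token", "Sentence"));
        Map<String, Integer> second = countByType(Arrays.asList("Token", "Sentence", "Sentence"));
        System.out.print(report(first, second));
    }
}
```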