Interface CollectionProcessingManager


public interface CollectionProcessingManager
A CollectionProcessingManager (CPM) manages the application of an AnalysisEngine to a collection of artifacts. For text analysis applications, this will be a collection of documents. The analysis results will then be delivered to one or more CasConsumers.

The CPM is configured with an Analysis Engine and CAS Consumers by calling its setAnalysisEngine(AnalysisEngine) and addCasConsumer(CasConsumer) methods. Collection processing is then initiated by calling the process(CollectionReader) or process(CollectionReader,int) methods.

The process methods take a CollectionReader object as an argument. The Collection Reader retrieves each artifact from the collection as a CAS object.

Listeners can register with the CPM by calling the addStatusCallbackListener(StatusCallbackListener) method. These listeners receive status callbacks during the processing. At any time, performance and progress reports are available from the getPerformanceReport() and getProgress() methods.
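
Because processing runs in another thread, a caller that needs to wait for the end of a job typically does so from a listener callback. The following is a minimal, self-contained sketch of that pattern. It does not use the UIMA classes: StatusListenerSketch, AsyncManagerSketch, and their method names are hypothetical stand-ins invented for illustration, and the real StatusCallbackListener interface may define different callbacks.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a status listener; NOT the UIMA StatusCallbackListener interface.
interface StatusListenerSketch {
    void itemProcessed(String item);
    void finished();
}

// Hypothetical stand-in for a manager whose process() call returns immediately.
class AsyncManagerSketch {
    private final List<StatusListenerSketch> listeners = new CopyOnWriteArrayList<>();

    void addListener(StatusListenerSketch listener) {
        listeners.add(listener);
    }

    // Starts the work on a background thread and returns without waiting for it.
    void process(List<String> items) {
        Thread worker = new Thread(() -> {
            for (String item : items) {
                for (StatusListenerSketch l : listeners) {
                    l.itemProcessed(item);
                }
            }
            for (StatusListenerSketch l : listeners) {
                l.finished();
            }
        });
        worker.start();
    }
}

class ListenerWaitSketch {
    // Registers a listener, starts processing, and blocks until completion is reported.
    // Returns the number of per-item callbacks that were received.
    static int runAndCount(List<String> items) {
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);

        AsyncManagerSketch manager = new AsyncManagerSketch();
        manager.addListener(new StatusListenerSketch() {
            @Override public void itemProcessed(String item) { processed.incrementAndGet(); }
            @Override public void finished() { done.countDown(); }
        });
        manager.process(items); // returns immediately

        try {
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("processing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndCount(Arrays.asList("doc1", "doc2", "doc3")));
    }
}
```

The latch is counted down only in the completion callback, so the caller never has to poll the manager to find out when the job has ended.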

A CPM implementation may choose to implement parallelization of the processing, but this is not a requirement of the architecture.

Note that a CPM only supports processing one collection at a time. Attempting to reconfigure a CPM or start a new processing job while a previous processing job is occurring will result in a UIMA_IllegalStateException. Processing multiple collections simultaneously is done by instantiating and configuring multiple instances of the CPM.
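
The one-collection-per-instance rule can be illustrated with a small model. This sketch is an assumption-laden stand-in, not UIMA code: SingleJobManagerSketch and its finish() method are hypothetical, collections are represented by plain names, and java.lang.IllegalStateException takes the place of UIMA_IllegalStateException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in that models only the single-job rule; NOT the UIMA CPM.
class SingleJobManagerSketch {
    private String currentCollection; // null while idle

    boolean isProcessing() {
        return currentCollection != null;
    }

    // Rejects a second job while one is still running.
    void process(String collectionName) {
        if (isProcessing()) {
            throw new IllegalStateException("busy with " + currentCollection);
        }
        currentCollection = collectionName;
    }

    // Hypothetical hook that marks the current job as completed.
    void finish() {
        currentCollection = null;
    }
}

class OneManagerPerCollectionSketch {
    // Creates one manager per collection so that no instance is asked to run two jobs at once.
    static List<SingleJobManagerSketch> startAll(List<String> collections) {
        List<SingleJobManagerSketch> managers = new ArrayList<>();
        for (String name : collections) {
            SingleJobManagerSketch manager = new SingleJobManagerSketch();
            manager.process(name);
            managers.add(manager);
        }
        return managers;
    }

    public static void main(String[] args) {
        List<SingleJobManagerSketch> managers = startAll(Arrays.asList("news", "patents"));
        System.out.println(managers.size() + " managers running");
        try {
            managers.get(0).process("emails"); // second job on a busy instance
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```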

A CollectionProcessingManager instance can be obtained by calling UIMAFramework.newCollectionProcessingManager().

  • Method Details

    • getAnalysisEngine

      AnalysisEngine getAnalysisEngine()
      Gets the AnalysisEngine that is assigned to this CPM.
      Returns:
      the AnalysisEngine that this CPM will use to analyze each CAS in the collection.
    • setAnalysisEngine

      void setAnalysisEngine(AnalysisEngine aAnalysisEngine) throws ResourceConfigurationException
      Sets the AnalysisEngine that is assigned to this CPM.
      Parameters:
      aAnalysisEngine - the AnalysisEngine that this CPM will use to analyze each CAS in the collection.
      Throws:
      ResourceConfigurationException - if this CPM is currently processing
    • getCasConsumers

      CasConsumer[] getCasConsumers()
      Gets the CasConsumers assigned to this CPM.
      Returns:
      an array of CasConsumers
    • addCasConsumer

      void addCasConsumer(CasConsumer aCasConsumer) throws ResourceConfigurationException
      Adds a CasConsumer to this CPM.
      Parameters:
      aCasConsumer - a CasConsumer to add
      Throws:
      ResourceConfigurationException - if this CPM is currently processing
    • removeCasConsumer

      void removeCasConsumer(CasConsumer aCasConsumer)
      Removes a CasConsumer from this CPM.
      Parameters:
      aCasConsumer - the CasConsumer to remove
      Throws:
      UIMA_IllegalStateException - if this CPM is currently processing
    • isSerialProcessingRequired

      boolean isSerialProcessingRequired()
      Gets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). Note that a value of false does not guarantee that parallelization is performed; this is left up to the CPM implementation.
      Returns:
      true if and only if serial processing is required
    • setSerialProcessingRequired

      void setSerialProcessingRequired(boolean aRequired)
      Sets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). If this method is not called, the default is false. Note that a value of false does not guarantee that parallelization is performed; this is left up to the CPM implementation.
      Parameters:
      aRequired - true if and only if serial processing is required
      Throws:
      UIMA_IllegalStateException - if this CPM is currently processing
    • isPauseOnException

      boolean isPauseOnException()
      Gets whether this CPM will automatically pause processing if an exception occurs. If processing is paused it can be resumed by calling the resume(boolean) method.
      Returns:
      true if and only if this CPM will pause on exception
    • setPauseOnException

      void setPauseOnException(boolean aPause)
      Sets whether this CPM will automatically pause processing if an exception occurs. If processing is paused it can be resumed by calling the resume(boolean) method.
      Parameters:
      aPause - true if and only if this CPM should pause on exception
      Throws:
      UIMA_IllegalStateException - if this CPM is currently processing
    • addStatusCallbackListener

      void addStatusCallbackListener(StatusCallbackListener aListener)
      Registers a listener to receive status callbacks.
      Parameters:
      aListener - the listener to add
    • removeStatusCallbackListener

      void removeStatusCallbackListener(StatusCallbackListener aListener)
      Unregisters a status callback listener.
      Parameters:
      aListener - the listener to remove
    • process

      void process(CollectionReader aCollectionReader) throws ResourceInitializationException
      Initiates processing of a collection. The CollectionReader initializes each CAS with a document from the collection. This method starts the processing in another thread and returns immediately. Status of the processing can be obtained by registering a listener with the addStatusCallbackListener(StatusCallbackListener) method.

      A CPM can only process one collection at a time. If this method is called while a previous processing request has not yet completed, a UIMA_IllegalStateException will result. To find out whether a CPM is free to begin another processing request, call the isProcessing() method.

      Parameters:
      aCollectionReader - the CollectionReader from which to obtain the Entities to be processed
      Throws:
      ResourceInitializationException - if an error occurs during initialization
      UIMA_IllegalStateException - if this CPM is currently processing
    • process

      void process(CollectionReader aCollectionReader, int aBatchSize) throws ResourceInitializationException
      Initiates processing of a collection. This method works in the same way as process(CollectionReader), but it breaks the processing up into batches of a size determined by the aBatchSize parameter. Each CasConsumer will be notified at the end of each batch.
      Parameters:
      aCollectionReader - the CollectionReader from which to obtain the Entities to be processed
      aBatchSize - the size of the batch.
      Throws:
      ResourceInitializationException - if an error occurs during initialization
      UIMA_IllegalStateException - if this CPM is currently processing
    • isProcessing

      boolean isProcessing()
      Determines whether this CPM is currently processing. This means that a processing request has been submitted and has neither completed nor been stopped by a call to stop(). If processing is paused, this method will still return true.
      Returns:
      true if and only if this CPM is currently processing.
    • pause

      void pause()
      Pauses processing. Processing can later be resumed by calling the resume(boolean) method.
      Throws:
      UIMA_IllegalStateException - if no processing is currently occurring
    • isPaused

      boolean isPaused()
      Determines whether this CPM's processing is currently paused.
      Returns:
      true if and only if this CPM's processing is currently paused.
    • resume

      void resume(boolean aRetryFailed)
      Resumes processing that has been paused.
      Parameters:
      aRetryFailed - if processing was paused because an exception occurred (see setPauseOnException(boolean)), setting a value of true for this parameter will cause the failed entity to be retried. A value of false (the default) will cause processing to continue with the next entity after the failure.
      Throws:
      UIMA_IllegalStateException - if processing is not currently paused
    • resume

      void resume()
      Resumes processing that has been paused.
      Throws:
      UIMA_IllegalStateException - if processing is not currently paused
    • stop

      void stop()
      Stops processing.
      Throws:
      UIMA_IllegalStateException - if no processing is currently occurring
    • getPerformanceReport

      ProcessTrace getPerformanceReport()
      Gets a performance report for the processing that is currently occurring or has just completed.
      Returns:
      an object containing performance statistics
    • getProgress

      Progress[] getProgress()
      Gets a progress report for the processing that is currently occurring or has just completed.
      Returns:
      an array of Progress objects, each of which represents the progress in a different set of units (for example number of entities or bytes)
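
The state rules documented for process, pause, resume, stop, isProcessing, and isPaused can be summarized as a small state machine. The sketch below is a hypothetical model written only from the descriptions above, not the UIMA implementation: the class name and the State enum are invented, java.lang.IllegalStateException stands in for UIMA_IllegalStateException, and treating pause() on an already paused job as a no-op is an assumption that this reference does not confirm.

```java
// Hypothetical model of the documented state rules; NOT the UIMA implementation.
// java.lang.IllegalStateException stands in for UIMA_IllegalStateException.
class CpmLifecycleSketch {
    enum State { IDLE, PROCESSING, PAUSED }

    private State state = State.IDLE;

    // Only one job at a time: starting while busy is rejected.
    void process() {
        if (state != State.IDLE) {
            throw new IllegalStateException("already processing");
        }
        state = State.PROCESSING;
    }

    // A paused job still counts as processing.
    boolean isProcessing() {
        return state != State.IDLE;
    }

    boolean isPaused() {
        return state == State.PAUSED;
    }

    void pause() {
        if (state == State.IDLE) {
            throw new IllegalStateException("no processing is currently occurring");
        }
        state = State.PAUSED; // assumption: pausing an already paused job is a no-op
    }

    void resume() {
        if (state != State.PAUSED) {
            throw new IllegalStateException("processing is not currently paused");
        }
        state = State.PROCESSING;
    }

    void stop() {
        if (state == State.IDLE) {
            throw new IllegalStateException("no processing is currently occurring");
        }
        state = State.IDLE;
    }

    public static void main(String[] args) {
        CpmLifecycleSketch cpm = new CpmLifecycleSketch();
        cpm.process();
        cpm.pause();
        System.out.println(cpm.isProcessing() + " " + cpm.isPaused());
        cpm.resume();
        cpm.stop();
        System.out.println(cpm.isProcessing());
    }
}
```

A caller that follows this model would check isProcessing() before reconfiguring or starting a job, and isPaused() before calling resume, rather than relying on the exception.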