Interface CollectionProcessingEngine

All Known Implementing Classes:
CollectionProcessingEngine_impl

public interface CollectionProcessingEngine
A CollectionProcessingEngine (CPE) processes a collection of artifacts (for text analysis applications, this will be a collection of documents) and produces collection-level results.

A CPE consists of a CollectionReader, zero or more AnalysisEngines and zero or more CasConsumers. The Collection Reader is responsible for reading artifacts from a collection and setting up the CAS. The AnalysisEngines analyze each CAS and the results are passed on to the CAS Consumers. CAS Consumers perform analysis over multiple CASes and generally produce collection-level results in some application-specific data structure.
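
The data flow described above can be illustrated with a small, self-contained model. This is only a sketch: `CpePipelineSketch`, `MiniCas`, `run`, and `countTokens` are hypothetical names invented for illustration and are not UIMA classes. A plain list of strings stands in for the Collection Reader's collection, a tokenizer stands in for an AnalysisEngine, and a frequency counter stands in for a CAS Consumer that builds a collection-level result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical model of the reader -> analysis engines -> consumers flow.
// None of these types are UIMA classes.
class CpePipelineSketch {

    /** Stand-in for a CAS: one artifact's text plus the tokens an engine adds. */
    static final class MiniCas {
        final String text;
        final List<String> tokens = new ArrayList<>();

        MiniCas(String text) {
            this.text = text;
        }
    }

    /** Passes every artifact through each engine, then hands it to each consumer. */
    static void run(Iterable<String> reader,
                    List<Consumer<MiniCas>> engines,
                    List<Consumer<MiniCas>> consumers) {
        for (String artifact : reader) {
            MiniCas cas = new MiniCas(artifact); // the reader's job: set up the CAS
            for (Consumer<MiniCas> engine : engines) {
                engine.accept(cas);              // per-artifact analysis
            }
            for (Consumer<MiniCas> consumer : consumers) {
                consumer.accept(cas);            // aggregation across artifacts
            }
        }
    }

    /** Example wiring: one tokenizing engine, one consumer counting tokens collection-wide. */
    static Map<String, Integer> countTokens(List<String> collection) {
        Consumer<MiniCas> tokenizer = cas -> {
            for (String t : cas.text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!t.isEmpty()) {
                    cas.tokens.add(t);
                }
            }
        };
        Map<String, Integer> frequencies = new TreeMap<>();
        Consumer<MiniCas> counter = cas -> {
            for (String t : cas.tokens) {
                frequencies.merge(t, 1, Integer::sum);
            }
        };
        run(collection, Collections.singletonList(tokenizer), Collections.singletonList(counter));
        return frequencies;
    }
}
```

In this model the per-artifact state lives in `MiniCas`, while the collection-level result lives in the consumer, which mirrors the division of responsibilities described above.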

Processing is started by calling the process() method. Processing can be controlled via the pause(), resume(), and stop() methods.

Listeners can register with the CPE by calling the addStatusCallbackListener(StatusCallbackListener) method. These listeners receive status callbacks during the processing. At any time, performance and progress reports are available from the getPerformanceReport() and getProgress() methods.
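
The listener registration described above follows the usual observer pattern. The sketch below models that pattern with hypothetical stand-in types (`SketchStatusListener`, `ListenerRegistrySketch`, `RecordingListener`); the real StatusCallbackListener interface differs and is not reproduced here. The method names `addStatusCallbackListener` and `removeStatusCallbackListener` are borrowed from this interface only to make the correspondence clear.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical, simplified callback interface; not the real StatusCallbackListener. */
interface SketchStatusListener {
    void entityProcessed(String entityId);

    void collectionProcessComplete();
}

/** Hypothetical registry that notifies listeners while a collection is processed. */
class ListenerRegistrySketch {
    // Copy-on-write so listeners can be added or removed while callbacks are firing.
    private final List<SketchStatusListener> listeners = new CopyOnWriteArrayList<>();

    void addStatusCallbackListener(SketchStatusListener listener) {
        listeners.add(listener);
    }

    void removeStatusCallbackListener(SketchStatusListener listener) {
        listeners.remove(listener);
    }

    /** Simulates processing: one callback per entity, then a completion callback. */
    void runCollection(List<String> entityIds) {
        for (String id : entityIds) {
            for (SketchStatusListener listener : listeners) {
                listener.entityProcessed(id);
            }
        }
        for (SketchStatusListener listener : listeners) {
            listener.collectionProcessComplete();
        }
    }
}

/** Listener that records every callback it receives, in order. */
class RecordingListener implements SketchStatusListener {
    final List<String> events = new ArrayList<>();

    @Override
    public void entityProcessed(String entityId) {
        events.add("entity:" + entityId);
    }

    @Override
    public void collectionProcessComplete() {
        events.add("complete");
    }
}
```

A listener that has been removed receives no further callbacks, so an application can stop observing a CPE without stopping the processing itself.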

A CPE implementation may choose to implement parallelization of the processing, but this is not a requirement of the architecture.

Note that a CPE only supports processing one collection at a time. Attempting to start a new processing job while a previous processing job is running will result in an exception. Processing multiple collections simultaneously is done by instantiating and configuring multiple instances of the CPE.
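
The lifecycle rules above (one job at a time, with pause, resume, and stop) amount to a small state machine. The following sketch models those rules only; `CpeLifecycleSketch` is a hypothetical class, not the UIMA implementation, and it throws the standard `java.lang.IllegalStateException` as a stand-in for UIMA_IllegalStateException. The one-millisecond sleep is an assumed placeholder for analyzing one artifact.

```java
/** Hypothetical model of the CPE lifecycle rules; it is not UIMA code. */
class CpeLifecycleSketch {
    private enum State { IDLE, RUNNING, PAUSED }

    private final Object lock = new Object();
    private final long totalItems;
    private State state = State.IDLE;
    private int generation; // identifies the current job so a stopped worker exits

    CpeLifecycleSketch(long totalItems) {
        this.totalItems = totalItems;
    }

    /** Starts a job on a background thread and returns immediately. */
    void process() {
        synchronized (lock) {
            if (state != State.IDLE) {
                throw new IllegalStateException("a processing job is already running");
            }
            state = State.RUNNING;
            final int myGen = ++generation;
            Thread worker = new Thread(() -> run(myGen), "cpe-sketch-worker");
            worker.setDaemon(true);
            worker.start();
        }
    }

    private void run(int gen) {
        try {
            for (long i = 0; i < totalItems; i++) {
                synchronized (lock) {
                    while (generation == gen && state == State.PAUSED) {
                        lock.wait(); // block here until resume() or stop()
                    }
                    if (generation != gen) {
                        return; // stop() was called for this job
                    }
                }
                Thread.sleep(1); // placeholder for analyzing one artifact
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            synchronized (lock) {
                if (generation == gen) {
                    state = State.IDLE; // job finished on its own
                }
            }
        }
    }

    boolean isProcessing() {
        synchronized (lock) {
            return state != State.IDLE; // a paused job still counts as processing
        }
    }

    boolean isPaused() {
        synchronized (lock) {
            return state == State.PAUSED;
        }
    }

    void pause() {
        synchronized (lock) {
            if (state == State.IDLE) {
                throw new IllegalStateException("no processing is occurring");
            }
            state = State.PAUSED;
        }
    }

    void resume() {
        synchronized (lock) {
            if (state != State.PAUSED) {
                throw new IllegalStateException("processing is not paused");
            }
            state = State.RUNNING;
            lock.notifyAll();
        }
    }

    void stop() {
        synchronized (lock) {
            if (state == State.IDLE) {
                throw new IllegalStateException("no processing is occurring");
            }
            generation++; // invalidates the running worker
            state = State.IDLE;
            lock.notifyAll();
        }
    }
}
```

Under this model, handling several collections at once means creating several independent instances, each with its own state, which matches the guidance above.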

A CollectionProcessingEngine instance can be obtained by calling UIMAFramework.produceCollectionProcessingEngine(CpeDescription).

  • Method Details

    • initialize

      void initialize(CpeDescription aCpeDescription, Map<String,Object> aAdditionalParams) throws ResourceInitializationException
      Initializes this CPE from a CpeDescription. Applications do not need to call this method. It is called automatically by the framework and cannot be called a second time.
      Parameters:
      aCpeDescription - CPE description, generally parsed from an XML file
      aAdditionalParams - a Map containing additional parameters. May be null if there are no parameters. Each class that implements this interface can decide what additional parameters it supports.
      Throws:
      ResourceInitializationException - if a failure occurs during initialization.
      UIMA_IllegalStateException - if this method is called more than once on a single instance.
    • addStatusCallbackListener

      void addStatusCallbackListener(StatusCallbackListener aListener)
      Registers a listener to receive status callbacks.
      Parameters:
      aListener - the listener to add
    • removeStatusCallbackListener

      void removeStatusCallbackListener(StatusCallbackListener aListener)
      Unregisters a status callback listener.
      Parameters:
      aListener - the listener to remove
    • process

      void process() throws ResourceInitializationException
      Initiates processing of a collection. This method starts the processing in another thread and returns immediately. Status of the processing can be obtained by registering a listener with the addStatusCallbackListener(StatusCallbackListener) method.

      A CPE can only process one collection at a time. If this method is called while a previous processing request has not yet completed, a UIMA_IllegalStateException will result. To find out whether a CPE is free to begin another processing request, call the isProcessing() method.

      Throws:
      ResourceInitializationException - if an error occurs during initialization
      UIMA_IllegalStateException - if this CPE is currently processing
    • isProcessing

      boolean isProcessing()
      Determines whether this CPE is currently processing. This means that a processing request has been submitted and has neither completed nor been stopped by a call to stop(). If processing is paused, this method will still return true.
      Returns:
      true if and only if this CPE is currently processing.
    • pause

      void pause()
      Pauses processing. Processing can later be resumed by calling the resume() method.
      Throws:
      UIMA_IllegalStateException - if no processing is currently occurring
    • isPaused

      boolean isPaused()
      Determines whether this CPE's processing is currently paused.
      Returns:
      true if and only if this CPE's processing is currently paused.
    • resume

      void resume()
      Resumes processing that has been paused.
      Throws:
      UIMA_IllegalStateException - if processing is not currently paused
    • stop

      void stop()
      Stops processing.
      Throws:
      UIMA_IllegalStateException - if no processing is currently occurring
    • getPerformanceReport

      ProcessTrace getPerformanceReport()
      Gets a performance report for the processing that is currently occurring or has just completed.
      Returns:
      an object containing performance statistics
    • getProgress

      Progress[] getProgress()
      Gets a progress report for the processing that is currently occurring or has just completed.
      Returns:
      an array of Progress objects, each of which represents the progress in a different set of units (for example number of entities or bytes)
    • getCollectionReader

      BaseCollectionReader getCollectionReader()
      Gets the Collection Reader for this CPE.
      Returns:
      the collection reader
    • getCasProcessors

      CasProcessor[] getCasProcessors()
      Gets the CasProcessors in this CPE, in the order in which they will be executed.
      Returns:
      an array of CasProcessors
    • kill

      void kill()
      Forcibly kills the CPM (Collection Processing Manager).