org.apache.uima.collection
Interface CollectionProcessingManager


public interface CollectionProcessingManager

A CollectionProcessingManager (CPM) manages the application of an AnalysisEngine to a collection of artifacts. For text analysis applications, this will be a collection of documents. The analysis results are then delivered to one or more CasConsumers.

The CPM is configured with an Analysis Engine and CAS Consumers by calling its setAnalysisEngine(AnalysisEngine) and addCasConsumer(CasConsumer) methods. Collection processing is then initiated by calling the process(CollectionReader) or process(CollectionReader,int) methods.

The process methods take a CollectionReader object as an argument. The CollectionReader retrieves each artifact from the collection as a CAS object.

Listeners can register with the CPM by calling the addStatusCallbackListener(StatusCallbackListener) method. These listeners receive status callbacks during the processing. At any time, performance and progress reports are available from the getPerformanceReport() and getProgress() methods.
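The listener mechanism described above is an ordinary observer pattern. As a minimal sketch of that pattern only, the code below uses hypothetical types (ToyListener, ToyManager, CountingListener) with simplified callbacks. It is not the UIMA StatusCallbackListener interface, and it processes entities synchronously for brevity, whereas a real CPM processes in another thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback interface, loosely modeled on a status callback listener.
interface ToyListener {
    void entityProcessComplete(String entityId);
    void collectionProcessComplete();
}

// Hypothetical manager that fans status events out to all registered listeners.
class ToyManager {
    // Copy-on-write list so listeners can be added or removed while events are delivered.
    private final List<ToyListener> listeners = new CopyOnWriteArrayList<>();

    void addStatusCallbackListener(ToyListener l) { listeners.add(l); }

    void removeStatusCallbackListener(ToyListener l) { listeners.remove(l); }

    // Processes entities synchronously (a simplification) and notifies every listener.
    void run(List<String> entityIds) {
        for (String id : entityIds) {
            for (ToyListener l : listeners) l.entityProcessComplete(id);
        }
        for (ToyListener l : listeners) l.collectionProcessComplete();
    }
}

// Hypothetical example listener that counts processed entities.
class CountingListener implements ToyListener {
    int processed = 0;
    boolean complete = false;

    public void entityProcessComplete(String entityId) { processed++; }

    public void collectionProcessComplete() { complete = true; }
}
```

A listener removed with removeStatusCallbackListener stops receiving events from the next run onward.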

A CPM implementation may choose to implement parallelization of the processing, but this is not a requirement of the architecture.

Note that a CPM only supports processing one collection at a time. Attempting to reconfigure a CPM or start a new processing job while a previous processing job is occurring will result in a UIMA_IllegalStateException. Processing multiple collections simultaneously is done by instantiating and configuring multiple instances of the CPM.
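The one-job-at-a-time rule can be pictured as a guard at the top of every configuration and start method. The following is a minimal sketch under stated assumptions: ToyCpm is a hypothetical class, engines and consumers are plain strings, finish() stands in for job completion, and the JDK's IllegalStateException stands in for UIMA_IllegalStateException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the CPM's "one collection at a time" rule.
class ToyCpm {
    private String analysisEngine;
    private final List<String> casConsumers = new ArrayList<>();
    private boolean processing = false;

    // Every mutator calls this first, mirroring the documented UIMA_IllegalStateException.
    private void requireIdle() {
        if (processing) throw new IllegalStateException("CPM is currently processing");
    }

    void setAnalysisEngine(String ae) { requireIdle(); analysisEngine = ae; }

    String getAnalysisEngine() { return analysisEngine; }

    void addCasConsumer(String consumer) { requireIdle(); casConsumers.add(consumer); }

    // Starting a job is also guarded; the collection itself is not modeled here.
    void process(String collection) { requireIdle(); processing = true; }

    // Hypothetical stand-in for the end of a job (completion or stop()).
    void finish() { processing = false; }

    boolean isProcessing() { return processing; }
}
```

In this model, running two jobs at once means creating two ToyCpm instances, which matches the advice above.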

A CollectionProcessingManager instance can be obtained by calling UIMAFramework.newCollectionProcessingManager().


Method Summary
 void addCasConsumer(CasConsumer aCasConsumer)
          Adds a CasConsumer to this CPM.
 void addStatusCallbackListener(StatusCallbackListener aListener)
          Registers a listener to receive status callbacks.
 AnalysisEngine getAnalysisEngine()
          Gets the AnalysisEngine that is assigned to this CPM.
 CasConsumer[] getCasConsumers()
          Gets the CasConsumers assigned to this CPM.
 ProcessTrace getPerformanceReport()
          Gets a performance report for the processing that is currently occurring or has just completed.
 Progress[] getProgress()
          Gets a progress report for the processing that is currently occurring or has just completed.
 boolean isPaused()
          Determines whether this CPM's processing is currently paused.
 boolean isPauseOnException()
          Gets whether this CPM will automatically pause processing if an exception occurs.
 boolean isProcessing()
          Determines whether this CPM is currently processing.
 boolean isSerialProcessingRequired()
          Gets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization).
 void pause()
          Pauses processing.
 void process(CollectionReader aCollectionReader)
          Initiates processing of a collection.
 void process(CollectionReader aCollectionReader, int aBatchSize)
          Initiates processing of a collection.
 void removeCasConsumer(CasConsumer aCasConsumer)
          Removes a CasConsumer from this CPM.
 void removeStatusCallbackListener(StatusCallbackListener aListener)
          Unregisters a status callback listener.
 void resume()
          Resumes processing that has been paused.
 void resume(boolean aRetryFailed)
          Resumes processing that has been paused.
 void setAnalysisEngine(AnalysisEngine aAnalysisEngine)
          Sets the AnalysisEngine that is assigned to this CPM.
 void setPauseOnException(boolean aPause)
          Sets whether this CPM will automatically pause processing if an exception occurs.
 void setSerialProcessingRequired(boolean aRequired)
          Sets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization).
 void stop()
          Stops processing.
 

Method Detail

getAnalysisEngine

AnalysisEngine getAnalysisEngine()
Gets the AnalysisEngine that is assigned to this CPM.

Returns:
the AnalysisEngine that this CPM will use to analyze each CAS in the collection.

setAnalysisEngine

void setAnalysisEngine(AnalysisEngine aAnalysisEngine)
                       throws ResourceConfigurationException
Sets the AnalysisEngine that is assigned to this CPM.

Parameters:
aAnalysisEngine - the AnalysisEngine that this CPM will use to analyze each CAS in the collection.
Throws:
UIMA_IllegalStateException - if this CPM is currently processing
ResourceConfigurationException

getCasConsumers

CasConsumer[] getCasConsumers()
Gets the CasConsumers assigned to this CPM.

Returns:
an array of CasConsumers

addCasConsumer

void addCasConsumer(CasConsumer aCasConsumer)
                    throws ResourceConfigurationException
Adds a CasConsumer to this CPM.

Parameters:
aCasConsumer - a CasConsumer to add
Throws:
UIMA_IllegalStateException - if this CPM is currently processing
ResourceConfigurationException

removeCasConsumer

void removeCasConsumer(CasConsumer aCasConsumer)
Removes a CasConsumer from this CPM.

Parameters:
aCasConsumer - the CasConsumer to remove
Throws:
UIMA_IllegalStateException - if this CPM is currently processing

isSerialProcessingRequired

boolean isSerialProcessingRequired()
Gets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). Note that a value of false does not guarantee that parallelization is performed; this is left up to the CPM implementation.

Returns:
true if and only if serial processing is required

setSerialProcessingRequired

void setSerialProcessingRequired(boolean aRequired)
Sets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). If this method is not called, the default is false. Note that a value of false does not guarantee that parallelization is performed; this is left up to the CPM implementation.

Parameters:
aRequired - true if and only if serial processing is required
Throws:
UIMA_IllegalStateException - if this CPM is currently processing

isPauseOnException

boolean isPauseOnException()
Gets whether this CPM will automatically pause processing if an exception occurs. If processing is paused it can be resumed by calling the resume(boolean) method.

Returns:
true if and only if this CPM will pause on exception

setPauseOnException

void setPauseOnException(boolean aPause)
Sets whether this CPM will automatically pause processing if an exception occurs. If processing is paused it can be resumed by calling the resume(boolean) method.

Parameters:
aPause - true if and only if this CPM should pause on exception
Throws:
UIMA_IllegalStateException - if this CPM is currently processing

addStatusCallbackListener

void addStatusCallbackListener(StatusCallbackListener aListener)
Registers a listener to receive status callbacks.

Parameters:
aListener - the listener to add

removeStatusCallbackListener

void removeStatusCallbackListener(StatusCallbackListener aListener)
Unregisters a status callback listener.

Parameters:
aListener - the listener to remove

process

void process(CollectionReader aCollectionReader)
             throws ResourceInitializationException
Initiates processing of a collection. The CollectionReader initializes each CAS with a document from the collection. This method starts the processing in another thread and returns immediately. Status of the processing can be obtained by registering a listener with the addStatusCallbackListener(StatusCallbackListener) method.

A CPM can only process one collection at a time. If this method is called while a previous processing request has not yet completed, a UIMA_IllegalStateException will result. To find out whether a CPM is free to begin another processing request, call the isProcessing() method.

Parameters:
aCollectionReader - the CollectionReader from which to obtain the entities to be processed
Throws:
ResourceInitializationException - if an error occurs during initialization
UIMA_IllegalStateException - if this CPM is currently processing
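Because process returns immediately, a caller without a listener has to poll isProcessing() to learn when the job has finished. The sketch below illustrates that asynchronous pattern with a plain JDK thread; AsyncToyCpm, processedCount() and waitUntilIdle() are hypothetical, and in practice waiting for a listener's completion callback avoids busy polling.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: process() starts work on another thread and returns at once.
class AsyncToyCpm {
    private final AtomicBoolean processing = new AtomicBoolean(false);
    private final AtomicInteger processed = new AtomicInteger();

    void process(int entityCount) {
        // compareAndSet enforces "one collection at a time" atomically.
        if (!processing.compareAndSet(false, true)) {
            throw new IllegalStateException("CPM is currently processing");
        }
        Thread worker = new Thread(() -> {
            try {
                for (int i = 0; i < entityCount; i++) processed.incrementAndGet();
            } finally {
                processing.set(false); // completion clears the flag
            }
        });
        worker.start(); // returns immediately; work continues in the background
    }

    boolean isProcessing() { return processing.get(); }

    int processedCount() { return processed.get(); }

    // Hypothetical helper: polls isProcessing() the way a caller without a listener might.
    void waitUntilIdle() {
        while (isProcessing()) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up waiting
                return;
            }
        }
    }
}
```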

process

void process(CollectionReader aCollectionReader,
             int aBatchSize)
             throws ResourceInitializationException
Initiates processing of a collection. This method works in the same way as process(CollectionReader), but it breaks the processing up into batches of a size determined by the aBatchSize parameter. Each CasConsumer will be notified at the end of each batch.

Parameters:
aCollectionReader - the CollectionReader from which to obtain the entities to be processed
aBatchSize - the number of entities in each batch.
Throws:
ResourceInitializationException - if an error occurs during initialization
UIMA_IllegalStateException - if this CPM is currently processing
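As a worked example of batching, assume 10 entities and an aBatchSize of 4: full batches end after the 4th and 8th entities, and 2 entities remain. The sketch below computes the points at which a batch-end notification would fire. BatchSketch and batchEndPoints are hypothetical, and reporting the trailing partial batch as its own batch is an assumption of this sketch; this page does not say how a final partial batch is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitting a collection into batches.
class BatchSketch {
    // Returns the running entity counts at which a batch-end notification fires.
    // Assumption: a trailing partial batch is also reported as a batch.
    static List<Integer> batchEndPoints(int entityCount, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<Integer> ends = new ArrayList<>();
        for (int processed = 1; processed <= entityCount; processed++) {
            boolean fullBatch = processed % batchSize == 0;
            boolean lastEntity = processed == entityCount;
            if (fullBatch || lastEntity) ends.add(processed);
        }
        return ends;
    }
}
```

Under this assumption the number of notifications is the ceiling of entityCount / batchSize: 10 entities in batches of 4 give notifications after entities 4, 8 and 10.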

isProcessing

boolean isProcessing()
Determines whether this CPM is currently processing. This means that a processing request has been submitted and has not yet completed or been stopped by calling stop(). If processing is paused, this method still returns true.

Returns:
true if and only if this CPM is currently processing.

pause

void pause()
Pauses processing. Processing can later be resumed by calling the resume(boolean) method.

Throws:
UIMA_IllegalStateException - if no processing is currently occurring

isPaused

boolean isPaused()
Determines whether this CPM's processing is currently paused.

Returns:
true if and only if this CPM's processing is currently paused.

resume

void resume(boolean aRetryFailed)
Resumes processing that has been paused.

Parameters:
aRetryFailed - if processing was paused because an exception occurred (see setPauseOnException(boolean)), setting a value of true for this parameter will cause the failed entity to be retried. A value of false (the default) will cause processing to continue with the next entity after the failure.
Throws:
UIMA_IllegalStateException - if processing is not currently paused
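The aRetryFailed flag only changes where processing picks up after a pause caused by an exception. Minimal sketch assuming a hypothetical RetryToyCpm that processes entities by index, pauses at the first failure (as if setPauseOnException(true) were in effect), and on resume either retries the failed entity or skips past it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of pause-on-exception followed by resume(boolean).
class RetryToyCpm {
    private final int entityCount;
    private final IntPredicate fails; // returns true if processing entity i throws
    private final List<Integer> completed = new ArrayList<>();
    private int next = 0;
    private boolean paused = false;

    RetryToyCpm(int entityCount, IntPredicate fails) {
        this.entityCount = entityCount;
        this.fails = fails;
    }

    // Runs until the collection is exhausted or an entity fails (pause on exception).
    void run() {
        while (next < entityCount) {
            if (fails.test(next)) {
                paused = true;
                return;
            }
            completed.add(next++);
        }
    }

    void resume(boolean retryFailed) {
        if (!paused) throw new IllegalStateException("processing is not paused");
        paused = false;
        if (!retryFailed) next++; // false: skip the failed entity
        run();                    // true: the failed entity is attempted again
    }

    boolean isPaused() { return paused; }

    List<Integer> completed() { return completed; }
}
```

With an entity that always fails, resume(false) skips it; with a transient failure, resume(true) lets it succeed on the second attempt.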

resume

void resume()
Resumes processing that has been paused.

Throws:
UIMA_IllegalStateException - if processing is not currently paused

stop

void stop()
Stops processing.

Throws:
UIMA_IllegalStateException - if no processing is currently occurring
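Taken together, the rules documented for process, pause, resume, stop, isProcessing and isPaused describe a small three-state machine. The sketch below encodes only those documented rules; RunState and StateToyCpm are hypothetical names, and IllegalStateException stands in for UIMA_IllegalStateException. A paused job still counts as processing, so stop() is permitted while paused.

```java
// Hypothetical three-state model of the documented pause/resume/stop rules.
enum RunState { IDLE, RUNNING, PAUSED }

class StateToyCpm {
    private RunState state = RunState.IDLE;

    void process() {
        if (state != RunState.IDLE) throw new IllegalStateException("CPM is currently processing");
        state = RunState.RUNNING;
    }

    // Documented: throws only if no processing is occurring.
    void pause() {
        if (state == RunState.IDLE) throw new IllegalStateException("no processing is occurring");
        state = RunState.PAUSED;
    }

    // Documented: throws if processing is not currently paused.
    void resume() {
        if (state != RunState.PAUSED) throw new IllegalStateException("processing is not paused");
        state = RunState.RUNNING;
    }

    // Documented: throws only if no processing is occurring (paused still counts).
    void stop() {
        if (state == RunState.IDLE) throw new IllegalStateException("no processing is occurring");
        state = RunState.IDLE;
    }

    // A paused job still counts as processing.
    boolean isProcessing() { return state != RunState.IDLE; }

    boolean isPaused() { return state == RunState.PAUSED; }
}
```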

getPerformanceReport

ProcessTrace getPerformanceReport()
Gets a performance report for the processing that is currently occurring or has just completed.

Returns:
an object containing performance statistics

getProgress

Progress[] getProgress()
Gets a progress report for the processing that is currently occurring or has just completed.

Returns:
an array of Progress objects, each of which represents the progress in a different set of units (for example number of entities or bytes)
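Because each Progress object counts a different unit, a caller usually reports each one separately rather than combining them. Minimal sketch with hypothetical ProgressEntry and ProgressSummary classes standing in for Progress; treating a negative total as "unknown" is an assumption of this sketch, not a statement about the Progress interface.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for one Progress entry: completed/total in a named unit.
final class ProgressEntry {
    final String unit;
    final long completed;
    final long total; // assumption: a negative value means the total is unknown

    ProgressEntry(String unit, long completed, long total) {
        this.unit = unit;
        this.completed = completed;
        this.total = total;
    }

    // Returns the completed fraction, or -1 if the total is unknown.
    double fraction() {
        if (total < 0) return -1.0; // unknown total
        if (total == 0) return 1.0; // nothing to process
        return (double) completed / total;
    }
}

// Hypothetical helper that formats one line per unit.
class ProgressSummary {
    static String summarize(List<ProgressEntry> entries) {
        StringBuilder sb = new StringBuilder();
        for (ProgressEntry p : entries) {
            double f = p.fraction();
            String pct = f < 0 ? "unknown" : String.format(Locale.ROOT, "%.1f%%", f * 100);
            sb.append(p.unit).append(": ").append(pct).append('\n');
        }
        return sb.toString();
    }
}
```

For example, 50 of 200 entities with an unknown byte total summarizes as "entities: 25.0%" and "bytes: unknown".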


Copyright © 2010 The Apache Software Foundation. All Rights Reserved.