Interface CollectionProcessingManager
A CollectionProcessingManager (CPM) manages the application of an AnalysisEngine to a collection of artifacts. For text analysis applications, this will be a collection of documents. The analysis results will then be delivered to one or more CasConsumers.
The CPM is configured with an Analysis Engine and CAS Consumers by calling its setAnalysisEngine(AnalysisEngine) and addCasConsumer(CasConsumer) methods. Collection processing is then initiated by calling the process(CollectionReader) or process(CollectionReader,int) methods.
The process methods take a CollectionReader object as an argument. The Collection Reader retrieves each artifact from the collection as a CAS object.
Listeners can register with the CPM by calling the addStatusCallbackListener(StatusCallbackListener) method. These listeners receive status callbacks during the processing. At any time, performance and progress reports are available from the getPerformanceReport() and getProgress() methods.
A CPM implementation may choose to implement parallelization of the processing, but this is not a requirement of the architecture.
Note that a CPM only supports processing one collection at a time. Attempting to reconfigure a CPM or start a new processing job while a previous processing job is occurring will result in a UIMA_IllegalStateException. Processing multiple collections simultaneously is done by instantiating and configuring multiple instances of the CPM.
A CollectionProcessingManager instance can be obtained by calling UIMAFramework.newCollectionProcessingManager().
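The single-job rule and the return-immediately behaviour described above can be illustrated with a small, self-contained sketch. Everything in it (the class SingleJobGuardSketch, its Runnable-based process method, and awaitCompletion) is a hypothetical stand-in written for illustration. It does not use or reproduce the UIMA classes, and it throws the JDK's IllegalStateException where this page names UIMA_IllegalStateException.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for illustration only; not the UIMA implementation. */
class SingleJobGuardSketch {
    private final AtomicBoolean processing = new AtomicBoolean(false);
    private Thread worker;

    /** Starts the job on another thread and returns immediately. */
    void process(Runnable job) {
        // compareAndSet checks "is a job running?" and claims the slot atomically.
        if (!processing.compareAndSet(false, true)) {
            throw new IllegalStateException("already processing a collection");
        }
        worker = new Thread(() -> {
            try {
                job.run();
            } finally {
                processing.set(false); // the manager becomes free again
            }
        });
        worker.start();
    }

    boolean isProcessing() {
        return processing.get();
    }

    /** Blocks until the most recently started job (if any) has finished. */
    void awaitCompletion() {
        if (worker == null) {
            return;
        }
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SingleJobGuardSketch cpm = new SingleJobGuardSketch();
        CountDownLatch release = new CountDownLatch(1);

        // First request: the job blocks until we release it, so it is still running below.
        cpm.process(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        boolean rejected = false;
        try {
            cpm.process(() -> { }); // second request while the first is running
        } catch (IllegalStateException e) {
            rejected = true;
        }
        System.out.println("second request rejected: " + rejected);

        release.countDown();
        cpm.awaitCompletion();
        System.out.println("still processing: " + cpm.isProcessing());
    }
}
```

In this sketch a caller that needs to handle several collections at once would create one SingleJobGuardSketch per collection, mirroring the advice above to instantiate multiple CPM instances.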
-
Method Summary
Modifier and Type    Method    Description
void    addCasConsumer(CasConsumer aCasConsumer)    Adds a CasConsumer to this CPM.
void    addStatusCallbackListener(StatusCallbackListener aListener)    Registers a listener to receive status callbacks.
AnalysisEngine    getAnalysisEngine()    Gets the AnalysisEngine that is assigned to this CPM.
CasConsumer[]    getCasConsumers()    Gets the CasConsumers assigned to this CPM.
ProcessTrace    getPerformanceReport()    Gets a performance report for the processing that is currently occurring or has just completed.
Progress[]    getProgress()    Gets a progress report for the processing that is currently occurring or has just completed.
boolean    isPaused()    Determines whether this CPM's processing is currently paused.
boolean    isPauseOnException()    Gets whether this CPM will automatically pause processing if an exception occurs.
boolean    isProcessing()    Determines whether this CPM is currently processing.
boolean    isSerialProcessingRequired()    Gets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization).
void    pause()    Pauses processing.
void    process(CollectionReader aCollectionReader)    Initiates processing of a collection.
void    process(CollectionReader aCollectionReader, int aBatchSize)    Initiates processing of a collection.
void    removeCasConsumer(CasConsumer aCasConsumer)    Removes a CasConsumer from this CPM.
void    removeStatusCallbackListener(StatusCallbackListener aListener)    Unregisters a status callback listener.
void    resume()    Resumes processing that has been paused.
void    resume(boolean aRetryFailed)    Resumes processing that has been paused.
void    setAnalysisEngine(AnalysisEngine aAnalysisEngine)    Sets the AnalysisEngine that is assigned to this CPM.
void    setPauseOnException(boolean aPause)    Sets whether this CPM will automatically pause processing if an exception occurs.
void    setSerialProcessingRequired(boolean aRequired)    Sets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization).
void    stop()    Stops processing.
-
Method Details
-
getAnalysisEngine
AnalysisEngine getAnalysisEngine()
Gets the AnalysisEngine that is assigned to this CPM.
Returns:
the AnalysisEngine that this CPM will use to analyze each CAS in the collection.
-
setAnalysisEngine
void setAnalysisEngine(AnalysisEngine aAnalysisEngine) throws ResourceConfigurationException
Sets the AnalysisEngine that is assigned to this CPM.
Parameters:
aAnalysisEngine - the AnalysisEngine that this CPM will use to analyze each CAS in the collection.
Throws:
ResourceConfigurationException - if this CPM is currently processing
-
getCasConsumers
CasConsumer[] getCasConsumers()
Gets the CasConsumers assigned to this CPM.
Returns:
an array of CasConsumers
-
addCasConsumer
void addCasConsumer(CasConsumer aCasConsumer) throws ResourceConfigurationException
Adds a CasConsumer to this CPM.
Parameters:
aCasConsumer - a CasConsumer to add
Throws:
ResourceConfigurationException - if this CPM is currently processing
-
removeCasConsumer
void removeCasConsumer(CasConsumer aCasConsumer)
Removes a CasConsumer from this CPM.
Parameters:
aCasConsumer - the CasConsumer to remove
Throws:
UIMA_IllegalStateException - if this CPM is currently processing
-
isSerialProcessingRequired
boolean isSerialProcessingRequired()
Gets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). Note that a value of false does not guarantee that parallelization is performed; this is left up to the CPM implementation.
Returns:
true if and only if serial processing is required
-
setSerialProcessingRequired
void setSerialProcessingRequired(boolean aRequired)
Sets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). If this method is not called, the default is false. Note that a value of false does not guarantee that parallelization is performed; this is left up to the CPM implementation.
Parameters:
aRequired - true if and only if serial processing is required
Throws:
UIMA_IllegalStateException - if this CPM is currently processing
-
isPauseOnException
boolean isPauseOnException()
Gets whether this CPM will automatically pause processing if an exception occurs. If processing is paused it can be resumed by calling the resume(boolean) method.
Returns:
true if and only if this CPM will pause on exception
-
setPauseOnException
void setPauseOnException(boolean aPause)
Sets whether this CPM will automatically pause processing if an exception occurs. If processing is paused it can be resumed by calling the resume(boolean) method.
Parameters:
aPause - true if and only if this CPM should pause on exception
Throws:
UIMA_IllegalStateException - if this CPM is currently processing
-
addStatusCallbackListener
void addStatusCallbackListener(StatusCallbackListener aListener)
Registers a listener to receive status callbacks.
Parameters:
aListener - the listener to add
-
removeStatusCallbackListener
void removeStatusCallbackListener(StatusCallbackListener aListener)
Unregisters a status callback listener.
Parameters:
aListener - the listener to remove
-
process
void process(CollectionReader aCollectionReader) throws ResourceInitializationException
Initiates processing of a collection. The CollectionReader initializes the CAS with documents from the collection. This method starts the processing in another thread and returns immediately. Status of the processing can be obtained by registering a listener with the addStatusCallbackListener(StatusCallbackListener) method.
A CPM can only process one collection at a time. If this method is called while a previous processing request has not yet completed, a UIMA_IllegalStateException will result. To find out whether a CPM is free to begin another processing request, call the isProcessing() method.
Parameters:
aCollectionReader - the CollectionReader from which to obtain the entities to be processed
Throws:
ResourceInitializationException - if an error occurs during initialization
UIMA_IllegalStateException - if this CPM is currently processing
-
process
void process(CollectionReader aCollectionReader, int aBatchSize) throws ResourceInitializationException
Initiates processing of a collection. This method works in the same way as process(CollectionReader), but it breaks the processing up into batches of a size determined by the aBatchSize parameter. Each CasConsumer will be notified at the end of each batch.
Parameters:
aCollectionReader - the CollectionReader from which to obtain the entities to be processed
aBatchSize - the size of the batch.
Throws:
ResourceInitializationException - if an error occurs during initialization
UIMA_IllegalStateException - if this CPM is currently processing
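The batch behaviour described for process(CollectionReader, int) can be pictured with the following minimal sketch. The types BatchAwareConsumer and RecordingConsumer and the method processInBatches are hypothetical names invented for this illustration, not UIMA API. The sketch also assumes that a trailing, incomplete batch triggers a notification; this page does not say how the CPM treats that case.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of per-batch notification; not the UIMA implementation. */
class BatchNotificationSketch {

    /** Stand-in for a consumer that is told when a batch ends. */
    interface BatchAwareConsumer {
        void consume(String artifact);
        void batchComplete();
    }

    /** Records every call so the order of events can be inspected. */
    static final class RecordingConsumer implements BatchAwareConsumer {
        final List<String> events = new ArrayList<>();

        @Override
        public void consume(String artifact) {
            events.add(artifact);
        }

        @Override
        public void batchComplete() {
            events.add("<batch>");
        }
    }

    static void processInBatches(List<String> artifacts, int batchSize,
                                 List<? extends BatchAwareConsumer> consumers) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int inCurrentBatch = 0;
        for (String artifact : artifacts) {
            for (BatchAwareConsumer consumer : consumers) {
                consumer.consume(artifact);
            }
            inCurrentBatch++;
            if (inCurrentBatch == batchSize) {
                // A full batch has been delivered: notify every consumer.
                for (BatchAwareConsumer consumer : consumers) {
                    consumer.batchComplete();
                }
                inCurrentBatch = 0;
            }
        }
        // Assumption of this sketch: a trailing partial batch is also reported.
        if (inCurrentBatch > 0) {
            for (BatchAwareConsumer consumer : consumers) {
                consumer.batchComplete();
            }
        }
    }

    public static void main(String[] args) {
        RecordingConsumer consumer = new RecordingConsumer();
        processInBatches(List.of("d1", "d2", "d3", "d4", "d5"), 2, List.of(consumer));
        System.out.println(consumer.events);
    }
}
```

A consumer might use such a notification to flush buffered results, for example committing an index or a database transaction once per batch rather than once per artifact.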
-
isProcessing
boolean isProcessing()
Determines whether this CPM is currently processing. This means that a processing request has been submitted and has not yet completed or been stopped by a call to stop(). If processing is paused, this method will still return true.
Returns:
true if and only if this CPM is currently processing.
-
pause
void pause()
Pauses processing. Processing can later be resumed by calling the resume(boolean) method.
Throws:
UIMA_IllegalStateException - if no processing is currently occurring
-
isPaused
boolean isPaused()
Determines whether this CPM's processing is currently paused.
Returns:
true if and only if this CPM's processing is currently paused.
-
resume
void resume(boolean aRetryFailed)
Resumes processing that has been paused.
Parameters:
aRetryFailed - if processing was paused because an exception occurred (see setPauseOnException(boolean)), setting a value of true for this parameter will cause the failed entity to be retried. A value of false (the default) will cause processing to continue with the next entity after the failure.
Throws:
UIMA_IllegalStateException - if processing is not currently paused
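The difference between resuming with aRetryFailed set to true and to false can be seen in a small synchronous model. This is a sketch under stated assumptions: ResumeRetrySketch, run, failOnceOn and the attempts list are hypothetical names, processing happens on the calling thread rather than in a background thread, and a failure is signalled by a predicate returning false instead of by an exception.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical model of pause-on-failure and resume; not the UIMA implementation. */
class ResumeRetrySketch {
    private final List<String> entities;
    private final Predicate<String> step; // returns false to signal a failure
    private int next = 0;                 // index of the entity to process next
    private boolean paused = false;
    final List<String> attempts = new ArrayList<>();

    ResumeRetrySketch(List<String> entities, Predicate<String> step) {
        this.entities = entities;
        this.step = step;
    }

    /** Processes entities until the end is reached or a failure pauses the run. */
    void run() {
        while (!paused && next < entities.size()) {
            String entity = entities.get(next);
            attempts.add(entity);
            if (step.test(entity)) {
                next++;
            } else {
                paused = true; // 'next' still points at the failed entity
            }
        }
    }

    void resume(boolean retryFailed) {
        if (!paused) {
            throw new IllegalStateException("processing is not currently paused");
        }
        paused = false;
        if (!retryFailed) {
            next++; // skip the failed entity and continue with the one after it
        }
        run();
    }

    boolean isPaused() {
        return paused;
    }

    /** A paused run still counts as processing, as described for isProcessing(). */
    boolean isProcessing() {
        return next < entities.size();
    }

    /** Builds a step that fails the first time it sees 'target' and succeeds afterwards. */
    static Predicate<String> failOnceOn(String target) {
        Set<String> alreadyFailed = new HashSet<>();
        return entity -> !(entity.equals(target) && alreadyFailed.add(entity));
    }

    public static void main(String[] args) {
        ResumeRetrySketch retry = new ResumeRetrySketch(List.of("a", "b", "c"), failOnceOn("b"));
        retry.run();
        retry.resume(true);
        System.out.println("retry: " + retry.attempts);

        ResumeRetrySketch skip = new ResumeRetrySketch(List.of("a", "b", "c"), failOnceOn("b"));
        skip.run();
        skip.resume(false);
        System.out.println("skip:  " + skip.attempts);
    }
}
```

With retry, the failed entity appears twice in the attempts list; with skip, it appears once and processing moves on to the entity that follows it.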
-
resume
void resume()
Resumes processing that has been paused.
Throws:
UIMA_IllegalStateException - if processing is not currently paused
-
stop
void stop()
Stops processing.
Throws:
UIMA_IllegalStateException - if no processing is currently occurring
-
getPerformanceReport
ProcessTrace getPerformanceReport()
Gets a performance report for the processing that is currently occurring or has just completed.
Returns:
an object containing performance statistics
-
getProgress
Progress[] getProgress()
Gets a progress report for the processing that is currently occurring or has just completed.
Returns:
an array of Progress objects, each of which represents the progress in a different set of units (for example number of entities or bytes)
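Because progress is reported once per unit, a caller typically walks the array and formats each entry separately. The sketch below shows one way to do that. UnitProgress is a hypothetical stand-in with completed, total and unit fields; it is not the UIMA Progress type, and the convention that a negative total means "unknown" is an assumption of this sketch.

```java
/** Hypothetical illustration of reading a multi-unit progress report. */
class ProgressReportSketch {

    /** Stand-in for one progress entry; not the UIMA Progress interface. */
    static final class UnitProgress {
        final long completed;
        final long total; // assumption: a negative value means the total is unknown
        final String unit;

        UnitProgress(long completed, long total, String unit) {
            this.completed = completed;
            this.total = total;
            this.unit = unit;
        }
    }

    /** Formats every entry of the report, one clause per unit. */
    static String describe(UnitProgress[] report) {
        StringBuilder text = new StringBuilder();
        for (UnitProgress p : report) {
            if (text.length() > 0) {
                text.append("; ");
            }
            if (p.total > 0) {
                long percent = 100 * p.completed / p.total;
                text.append(p.completed).append('/').append(p.total)
                    .append(' ').append(p.unit)
                    .append(" (").append(percent).append("%)");
            } else {
                text.append(p.completed).append(' ').append(p.unit)
                    .append(" (total unknown)");
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        UnitProgress[] report = {
            new UnitProgress(5, 20, "entities"),
            new UnitProgress(1024, -1, "bytes")
        };
        System.out.println(describe(report));
    }
}
```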
-