Apache UIMA-DUCC (Unstructured Information Management Architecture - Distributed UIMA Cluster Computing) v.2.0.0 Release Notes
1. What is UIMA-DUCC?
2. Major Changes in this Release
1. What is UIMA-DUCC?
DUCC stands for Distributed UIMA Cluster Computing. DUCC is a cluster management system providing tooling,
management, and scheduling facilities to automate the scale-out of applications written to the UIMA framework.
Core UIMA provides a generalized framework for applications that process unstructured information such as human
language, but does not provide a scale-out mechanism. UIMA-AS provides a scale-out mechanism to distribute UIMA
pipelines over a cluster of computing resources, but does not provide job or cluster management of the resources.
DUCC defines a formal job model that closely maps to a standard UIMA pipeline. Around this job model DUCC
provides cluster management services to automate the scale-out of UIMA pipelines over computing clusters.
2. Major Changes in this Release
Apache UIMA-DUCC 2.0.0 is a major release containing new features and bug fixes. What's new:
2.1 Non-preemptive (NP) workloads
To prevent the cluster from being filled with non-preemptable (NP) allocations, it is possible to place a
limit on the total NP allocations for each user. The limit applies globally and can be overridden on a per-user
basis by the DUCC administrator. Additionally, all NP allocations are now limited to a single instance per request.
Please refer to sections "13.4 Allotment", "12.8 Ducc User Definitions", and "12.4.6 Resource Manager Properties" of
the DUCC Administrative Guide for more details.
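The global-limit-with-override rule can be pictured with a minimal sketch. It is illustrative only: the class, method names, user names, and numbers below are hypothetical and are not DUCC classes or properties. It also assumes a request is acceptable when it keeps the user's NP total within the limit; how DUCC treats an over-limit request is described in the Administrative Guide, not here.

```java
import java.util.Map;

// Hypothetical sketch; none of these names are DUCC classes or properties.
final class NpAllotmentSketch {

    // A per-user override wins; otherwise the global default applies.
    static int effectiveLimit(String user, int globalLimit, Map<String, Integer> overrides) {
        return overrides.getOrDefault(user, globalLimit);
    }

    // True if granting the request keeps the user's NP total within the limit.
    static boolean withinLimit(int alreadyAllocated, int requested, int limit) {
        return alreadyAllocated + requested <= limit;
    }

    public static void main(String[] args) {
        Map<String, Integer> overrides = Map.of("alice", 200); // assumed per-user override
        int globalLimit = 100;                                  // assumed global default
        // "bob" has no override, so the global limit applies to him.
        System.out.println(withinLimit(90, 20, effectiveLimit("bob", globalLimit, overrides)));
        // "alice" has a higher, administrator-assigned limit.
        System.out.println(withinLimit(90, 20, effectiveLimit("alice", globalLimit, overrides)));
    }
}
```

In this sketch the same request is evaluated against a different limit depending on whether the administrator has defined an override for the user.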
2.2 Classpath isolation
User's code now runs with only the classpath it supplies. The user's classpath specification for jobs must now include
Any jobs calling UIMA-AS services, "DD jobs", and UIMA-AS services themselves will need to include all UIMA jars and
any additional third-party jars that are required.
2.3 DUCC error handler
The interface to this optional capability has changed.
2.4 Job Processes (JPs) now pull Work Items (WIs) from their Job Driver (JD) via HTTP
JDs no longer use ActiveMQ to push WIs to JPs for processing. Instead, JPs use HTTP to
pull WIs from their associated JD.
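The difference between push and pull can be shown with a small sketch. It is not DUCC code: the Driver class and drain method are hypothetical, and a plain method call stands in for the HTTP request a JP would make. The point is only that the worker asks for the next item when it is ready, rather than having items sent to it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

// Hypothetical sketch of a pull model; the method call stands in for an HTTP request.
final class PullModelSketch {

    // Stand-in for a Job Driver: holds pending work items and hands out one per request.
    static final class Driver {
        private final Queue<String> pending = new ArrayDeque<>();

        Driver(List<String> items) {
            pending.addAll(items);
        }

        // synchronized so that concurrent callers never receive the same item
        synchronized Optional<String> next() {
            return Optional.ofNullable(pending.poll());
        }
    }

    // Stand-in for a Job Process: keeps asking for work until the driver has none left.
    static List<String> drain(Driver driver) {
        List<String> processed = new ArrayList<>();
        Optional<String> item;
        while ((item = driver.next()).isPresent()) {
            processed.add("done:" + item.get()); // placeholder for real processing
        }
        return processed;
    }
}
```

With pull, a slow worker simply requests items less often, so the driver does not need to track how busy each worker is.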
2.5 DUCC flow controller typesystem
The original name of the flow controller typesystem file has been deprecated. The old
version will remain available for now.
Going forward, please make the following change to CR/CM/CC components using this typesystem:
2.6 CGROUPS now control CPU share as well as memory share
CPU shares are set proportionally to memory shares when CGROUPS are enabled.
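As a rough illustration of proportional CPU shares, the sketch below scales a CPU weight linearly with the number of memory shares. The exact formula DUCC uses is not given in these notes. The class, the method names, and the base value of 1024 (the Linux cgroup v1 default for cpu.shares) are assumptions made for the example.

```java
// Hypothetical sketch; the linear formula and the base value are assumptions.
final class CpuShareSketch {

    // Assumed base weight per memory share; 1024 is the Linux cgroup v1 cpu.shares default.
    static final int BASE_CPU_SHARES = 1024;

    // CPU weight grows linearly with the number of memory shares.
    static long cpuShares(int memoryShares) {
        if (memoryShares < 1) {
            throw new IllegalArgumentException("memoryShares must be >= 1");
        }
        return (long) memoryShares * BASE_CPU_SHARES;
    }

    // Fraction of CPU a process would receive when all listed processes are busy.
    // The caller's own memory shares must also appear in allMemoryShares.
    static double contendedFraction(int memoryShares, int... allMemoryShares) {
        long total = 0;
        for (int m : allMemoryShares) {
            total += cpuShares(m);
        }
        return (double) cpuShares(memoryShares) / total;
    }
}
```

CPU shares in cgroups are relative weights that matter only under contention, so in this sketch a process holding two memory shares would receive twice the CPU of a one-share process when both are busy.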
2.7 Queue resource requests that were previously unfulfillable
Requests for resources are held pending if they can't be fulfilled for any reason
other than the scheduling class being missing. Shares may be made available when
other work exits, or if resources are dynamically added to the cluster. For work
that is enqueued, the WebServer shows the reason: WaitingForResources.
2.8 Queue service requests that were previously unfulfillable
Work that is dependent on a service is held pending even if the service can't be started successfully.
The work will continue when the service becomes available.
2.9 Service Manager instances
A unique instance ID is assigned to each instance of a service.
This ID is made available to the running instance so that it can reason about its own
role (for example, how to partition a data set). If a service instance terminates unexpectedly,
a new instance will be started with the appropriate ID.
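One way an instance might use its ID to partition a data set is sketched below. The sketch assumes IDs are contiguous integers starting at 0 and that each instance knows the total instance count. Neither assumption is stated in these notes, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; assumes instance IDs run from 0 to instanceCount - 1.
final class InstancePartitionSketch {

    // Returns every instanceCount-th item, starting at the position equal to instanceId.
    static <T> List<T> partitionFor(List<T> items, int instanceId, int instanceCount) {
        if (instanceCount < 1 || instanceId < 0 || instanceId >= instanceCount) {
            throw new IllegalArgumentException("instanceId must be in [0, instanceCount)");
        }
        List<T> mine = new ArrayList<>();
        for (int i = instanceId; i < items.size(); i += instanceCount) {
            mine.add(items.get(i));
        }
        return mine;
    }
}
```

Because a replacement instance is started with the appropriate ID, a scheme like this would let the new instance take over the same slice of the data as the one that terminated.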
For a complete list of issues fixed and up-to-date information on UIMA-DUCC issues, see our issue tracker: