Apache UIMA-DUCC (Unstructured Information Management Architecture - Distributed UIMA Cluster Computing) v1.1.0 Release Notes


1. What is UIMA-DUCC?
2. Major Changes in this Release
3. Limitations in this Release

1. What is UIMA-DUCC?

DUCC stands for Distributed UIMA Cluster Computing. DUCC is a cluster management system providing tooling, management, and scheduling facilities to automate the scale-out of applications written to the UIMA framework. Core UIMA provides a generalized framework for applications that process unstructured information such as human language, but does not provide a scale-out mechanism. UIMA-AS provides a scale-out mechanism to distribute UIMA pipelines over a cluster of computing resources, but does not provide job or cluster management of the resources. DUCC defines a formal job model that closely maps to a standard UIMA pipeline. Around this job model DUCC provides cluster management services to automate the scale-out of UIMA pipelines over computing clusters.

2. Major Changes in this Release

Apache UIMA-DUCC 1.1.0 is a maintenance release containing bug fixes and a few new features. What's new:

2.1 Service Manager Changes

- Advanced ping support - a pinger is able to microschedule service instances.
  - The Ping API is enhanced to support this microscheduling.
  - A sample microscheduling pinger is supplied to illustrate the new API.
  - Multiple pingers may be registered by the admin to run internally as threads in the SM instead of as external processes.
- CLI support to enable / disable instance startup.
- CLI support to seamlessly transition the service startup mode among autostarted, reference-started, and manually-started.
- Enhanced query to provide more information to the CLI and web server:
  - Service last use
  - Registration date
  - Explicit denotation of start mode: autostart, reference start, and manual start
- The CLI now does user authentication and administrator authorization on all actions.
- Most registration parameters can be dynamically modified without re-registering the service.
- Dynamic modification of pinger properties automatically restarts the pinger.
- Debug support - a service may be registered to connect back to a debug port when it is started.
- Multiple per-service admins - a service may register a list of ids which are allowed to perform administrative functions for that service.

2.2 ducc_ling Changes

- All registered groups are set for processes.
- A user may set DUCC_UMASK to establish the umask for a process.
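As a reminder of what a umask does, the sketch below computes the permission bits a newly created file or directory would receive under a given mask. It is a generic illustration of standard POSIX umask arithmetic, not DUCC code: the helper names are hypothetical, and the assumption that the value is written as an octal string such as "027" is ours and is not confirmed by these notes.

```python
def parse_umask(value: str) -> int:
    """Parse a umask written as an octal string, e.g. "027" (assumed format)."""
    return int(value, 8)


def effective_mode(requested_mode: int, umask: int) -> int:
    """Return the permission bits left after the umask clears its bits."""
    return requested_mode & ~umask & 0o777


umask = parse_umask("027")                # hypothetical value for illustration
file_mode = effective_mode(0o666, umask)  # typical default request for files
dir_mode = effective_mode(0o777, umask)   # typical default request for directories
print(oct(file_mode), oct(dir_mode))      # prints: 0o640 0o750
```

With this example mask, group members lose write access and others lose all access to files the process creates.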

2.3 Resource Manager Changes

- Administrative CLI interface
  - Vary-off a node to temporarily exclude it from scheduling
  - Vary-on a node to return it to the scheduling pool
  - Query occupancy - for each node, shows what is scheduled there
  - Query load - summary of scheduling tables to allow external entities such as LSF to collaborate with the DUCC scheduler
- Misc enhancements
  - Better handling of failed nodes, purges all work other than reservations
  - Improved de-fragmentation logic
  - Improved handling of small clusters
  - Improved eviction, takes into account the amount of work that would be lost before scheduling a process for eviction

2.4 Web Server Changes

- Added node visualization

For a complete list of issues fixed and up-to-date information on UIMA-DUCC issues, see our issue tracker: https://issues.apache.org/jira/issues/?jql=project%20%3D%20UIMA%20AND%20fixVersion%20%3D%20%221.1.0-Ducc%22%20ORDER%20BY%20key%20ASC
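The link above carries a URL-encoded JIRA query. For readers who want to see or adapt the filter, this short sketch decodes it using only the Python standard library; nothing here is specific to DUCC.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "https://issues.apache.org/jira/issues/"
    "?jql=project%20%3D%20UIMA%20AND%20fixVersion%20%3D%20%221.1.0-Ducc%22"
    "%20ORDER%20BY%20key%20ASC"
)

# parse_qs percent-decodes each query parameter; "jql" holds the search text.
jql = parse_qs(urlsplit(url).query)["jql"][0]
print(jql)  # prints: project = UIMA AND fixVersion = "1.1.0-Ducc" ORDER BY key ASC
```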

3. Limitations in this Release

3.1 Firefox memory bloat

DUCC's Web Server includes JavaScript that lets users monitor various aspects of the DUCC system from a browser. It has occasionally been observed that when several browser tabs are open simultaneously, each containing an "Automatic" monitor of one aspect of the DUCC system, the browser process may consume a large amount of memory (on the order of several GB) over a relatively long period of time (on the order of days). At the time of this writing, the problem cannot be reliably reproduced. It has not been observed in "Manual" monitoring mode, and it has only been observed with the Firefox browser.