Running UIMA DUCC in a Cloud environment
Apache UIMA

 DUCC in Clouds

Preliminary

This page collects information on running DUCC in Cloud environments.

This page is preliminary and is missing details, but it may still offer useful references and ideas. More will be added as it becomes available.

External Links

Considerations

NOTE: This information is from an email to the uima-dev list by one of the DUCC committers. The email refers to "glusterfs", a shared file system that is cross-mounted on the various machines. This shared-file-system approach is the conventional way to configure DUCC, but DUCC can also be configured without a shared file system; see the DUCC docs.

We've been running DUCC in a cloud environment for almost a year now. The DUCC master and the glusterfs servers run on bare metal, and all of the workstations and worker machines run on VMs. Cluster users add VMs to the cluster as needed. A job can be started on one or more workers, and when additional VMs are added dynamically, the job automatically scales out to use them. A common system image is maintained on all VM machines via an LDAP server and shared filesystem data. Users belong to groups and share machines allocated by members of the group.

A DUCC VM-image is used to automatically connect new VMs to the DUCC master and glusterfs. The DUCC master configuration may be updated at any time, for example to add new groups or even to update the master software. VMs automatically sync DUCC software and configuration each time they start their DUCC agent. The VM image supports three different machine types: a graphical workstation, a CPU worker and a GPU worker (typically used for deep learning training). DUCC can spawn work on specified worker machine types, and even on specific machines. Workstations are optional, as DUCC requests can be submitted from worker machines. Docker images are supported using Podman. Podman runs rootless and allows access to the mounted file systems only with the user's credentials.
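As a rough illustration of the rootless Podman usage mentioned above, the sketch below builds a `podman run` command line that bind-mounts a shared directory into a container. It only assembles and prints the command; it does not run it. The image name, the directory path and the helper name `podman_command` are hypothetical and are not taken from the DUCC VM image. Because Podman is invoked without root, the files under the mounted directory remain subject to the invoking user's own permissions.

```python
import shlex


def podman_command(image, shared_dir, args):
    """Build (but do not run) a `podman run` command line.

    `image` and `shared_dir` are hypothetical placeholders chosen for
    this example; they do not come from the DUCC VM image.
    """
    return [
        "podman", "run", "--rm",
        # Bind-mount the shared directory at the same path inside the container.
        "--volume", f"{shared_dir}:{shared_dir}",
        # Start the container process in that directory.
        "--workdir", shared_dir,
        image,
        *args,
    ]


cmd = podman_command(
    "example.org/analytics:latest",  # hypothetical image name
    "/shared/groups/nlp",            # hypothetical group directory
    ["ls", "-l"],
)
print(shlex.join(cmd))
```

Building the command as a list, rather than a single string, avoids quoting problems if it is later passed to `subprocess.run`.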

In order to keep some level of data security, a group directory is only mounted on the VMs created by members of the group. Individual users maintain file permissions as desired, but, as anyone who creates a VM has root access, they could become any other user and access data from other group members. There is a self-service glusterfs webapp that is used to export group data to new VMs and manage quotas.
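The mounting rule described above can be modelled in a few lines. This is a minimal sketch, not the logic of the glusterfs webapp (which is not part of Apache DUCC): the function name `group_dirs_to_mount`, the `/shared/groups` export root and the sample users and groups are all assumptions made for the example.

```python
def group_dirs_to_mount(creator, members_by_group, export_root="/shared/groups"):
    """Return the group directories that a VM created by `creator` should mount.

    A directory is included only if the VM's creator belongs to that group.
    `export_root` is a hypothetical path used for illustration.
    """
    return sorted(
        f"{export_root}/{group}"
        for group, members in members_by_group.items()
        if creator in members
    )


# Hypothetical group membership, e.g. as it might be looked up from LDAP.
members_by_group = {
    "nlp": {"alice", "bob"},
    "vision": {"carol"},
    "speech": {"alice"},
}

print(group_dirs_to_mount("alice", members_by_group))
# ['/shared/groups/nlp', '/shared/groups/speech']
print(group_dirs_to_mount("carol", members_by_group))
# ['/shared/groups/vision']
```

This limits exposure rather than eliminating it: a VM created by alice never sees the vision directory, but root on that VM can still act as bob within the nlp data.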

The VM-image builder and glusterfs webapp are not yet part of Apache DUCC.