How we use Maven for building

 Maven use for Apache UIMA builds

This is developer information, mostly. It documents how we are using Maven for building the various parts of UIMA.

We follow Apache conventions for releasing Maven-based projects (all of ours except the UIMA C++ enablement layer), using Nexus, as documented here: https://www.apache.org/dev/publishing-maven-artifacts.html.

This document covers how we use Maven for building version 2.3.1 and onwards.

Nexus

Apache runs an instance of Nexus, a repository "manager", which has been set up to support releasing Maven artifacts. It lets Maven's release plugin behave as if it were "releasing" and deploying to a Maven repository, when in fact the release configuration deploys to the "staging" part of the Apache Nexus instance. Once the release is verified and voted on, it can be promoted to the Maven central repository with a few clicks on the website.

We are currently using the Apache common parent POM which supports deployment to this Apache Nexus repository, for staging and release. It also supplies many of the standard items needed for all Apache projects.
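
As a rough sketch, a top-most project POM picks up the Apache common parent POM with a <parent> element like the one below. The coordinates org.apache:apache are those of the Apache parent POM; the version number shown is only a placeholder, not necessarily the one UIMA uses.

```xml
<!-- Sketch: inherit from the Apache common parent POM. -->
<parent>
  <groupId>org.apache</groupId>
  <artifactId>apache</artifactId>
  <!-- Placeholder version; use whichever release the project has settled on. -->
  <version>23</version>
</parent>
```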

POM Conventions

We follow the conventions regarding the layout of the POM as specified here.

Our POM hierarchy

POMs have two kinds of hierarchy:

  1. The main one is used for factoring out common things and arranges POMs in a parent-child hierarchy, by having the child identify the parent POM.
  2. The other kind is aggregation - POMs which specify sub-modules for purposes of building multi-module things, and releasing them all at once.
For clarity, we mostly keep these two uses in separate POMs: parent POMs do not do aggregation, and aggregation POMs do not do common factoring.
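
To illustrate the second kind of hierarchy, here is a minimal sketch of an aggregation-only POM. All coordinates and module paths are hypothetical; the "../" paths merely assume that the modules sit next to the aggregator in a flat directory layout.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates, for illustration only. -->
  <groupId>org.example</groupId>
  <artifactId>example-aggregate</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>pom</packaging>

  <!-- Aggregation only: lists the modules to build and release together.
       No shared plugin or dependency configuration is factored out here. -->
  <modules>
    <module>../example-core</module>
    <module>../example-tools</module>
  </modules>
</project>
```

The first kind of hierarchy is expressed the other way around: each child POM names its parent in a <parent> element, and the parent does not list its children.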

The UIMA Project has one common parent pom, called "parent-pom", that contains common things for all the sub-projects. Each multi-module sub-project, for instance, the UIMA Java SDK (uimaj), contains, in turn, a sub-project common parent pom for all the modules in that project.

The Project-wide parent pom and related build artifacts are kept in a separate part of our svn tree, in a top level directory named "build".

Selecting build alternatives based on the project

The shared UIMA-wide common parent POM supports many different kinds of builds. For instance, some projects use Docbook, some projects are "distribution" projects that serve to build our binary distribution, some projects build PEAR artifacts (e.g., many of our Addons projects), etc.

Common support for each of these kinds of project builds is kept in the common parent POM, in a profile that is activated only for that kind of project. Activation is done by detecting the presence of a file, named by convention marker-file-identifying-xxxxxxxx, where xxxxxxxx is a particular name. These are 0-length files placed at the top level of projects that need a particular profile activated in the common parent POM.

An exception to this is the profile for processing Docbooks, which instead is activated by the existence of the directory src/docbook in the project.

These are the current marker-file kinds:

  • marker-file-identifying-parent-pom
  • marker-file-identifying-eclipse-plugin
  • marker-file-identifying-eclipse-feature
  • marker-file-identifying-single-project
  • marker-file-identifying-standard-pear
  • marker-file-identifying-osgi-project
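
Marker-file activation relies on Maven's standard file-based profile activation. The following is a sketch of what such profiles could look like in a parent POM; the profile ids are hypothetical and the real profiles contain plugin configuration that is omitted here.

```xml
<profiles>
  <!-- Active only in projects that have the 0-length marker file
       at their top level. The profile id is hypothetical. -->
  <profile>
    <id>example-standard-pear</id>
    <activation>
      <file>
        <exists>marker-file-identifying-standard-pear</exists>
      </file>
    </activation>
    <build>
      <!-- plugin configuration specific to PEAR projects would go here -->
    </build>
  </profile>

  <!-- The Docbook exception: activated by the presence of a directory
       rather than a marker file. The profile id is hypothetical. -->
  <profile>
    <id>example-docbook</id>
    <activation>
      <file>
        <exists>src/docbook</exists>
      </file>
    </activation>
  </profile>
</profiles>
```

Because the paths are resolved per project, a single profile definition in the shared parent turns on or off independently for every project that inherits it.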

POM style

When writing a new POM, it is best to start with an existing POM for a similar kind of project, and derive the new POM from that. Some points:

  • POMs contain a <url> element, which is supposed to point to the UIMA website page for this artifact (or the main UIMA page). If a POM doesn't have this element, it will inherit one from its parent, but Maven will assume (usually incorrectly in our case) that the url is the parent url value followed by "/" and this POM's artifactId.

    So, if this does not correspond to how the website is set up, please specify the url that is correct for the new project.

  • The SCM connection is needed for doing releases - and needs to be accurate for this component. The same defaulting is applied as above, which is incorrect for our flat layout, so each POM needs to have an explicit SCM element.
  • The POM's "version" is stated literally, not via a property. Maven 3 will complain if you use properties here, and the Maven release mechanism manipulates these values (removing SNAPSHOT, incrementing things, etc.).
  • The POM's groupId is omitted if it's the same as the parent pom's (which is true for all of our projects except our top-most parent pom); if omitted, it inherits from the parent.
  • The packaging is omitted if it is "jar".
  • In the build section, the <groupId> for standard maven plugins is omitted.
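
Taken together, the points above suggest a POM shaped roughly like the sketch below. The artifactId, versions, and SCM locations are hypothetical placeholders (the example.org URLs in particular are not real), and the parent's groupId is assumed to be org.apache.uima.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>org.apache.uima</groupId>      <!-- assumed groupId -->
    <artifactId>parent-pom</artifactId>
    <version>1</version>                     <!-- placeholder version -->
  </parent>

  <!-- groupId omitted: inherited from the parent.
       packaging omitted: "jar" is the default. -->
  <artifactId>example-component</artifactId> <!-- hypothetical -->
  <version>1.0.0-SNAPSHOT</version>          <!-- literal, not a property -->

  <!-- Explicit url, so Maven does not derive parent-url/artifactId. -->
  <url>https://uima.apache.org</url>

  <!-- Explicit SCM, because the inherited default does not fit a flat layout.
       These locations are placeholders. -->
  <scm>
    <connection>scm:svn:https://svn.example.org/repos/example-component/trunk</connection>
    <developerConnection>scm:svn:https://svn.example.org/repos/example-component/trunk</developerConnection>
    <url>https://svn.example.org/repos/example-component/trunk</url>
  </scm>

  <build>
    <plugins>
      <plugin>
        <!-- groupId omitted: org.apache.maven.plugins is the default -->
        <artifactId>maven-compiler-plugin</artifactId>
      </plugin>
    </plugins>
  </build>
</project>
```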

Release artifacts, signing, checksums

We follow the standard release process for Maven-based artifacts at Apache, documented here.

There are two places artifacts go to:

  1. Apache's mirror distribution system
  2. Maven "central" for maven-findable artifacts (typically JARs)

Apache's mirror distribution system is used to distribute the source-release.zip and any binary convenience packages.

For each artifact, the release process may build additional artifacts, and attach them to the main one, so they will "go along with" the main artifact during Maven deployment to repositories. Some of these are:

  • sources.jar - holds the source files - this is for IDEs that want to refer to the source
  • javadocs.jar

The source-release.zip file is a special artifact, like the sources.jar, but includes all other files (such as the pom.xml) not under the /src directory, needed for building. The intent is that this is the same as the SVN checked-in files, and once unzipped, this should be "buildable" by doing "mvn install", etc. in the unzipped directory.

source-release.zip files for multi-project aggregates (such as the main UIMA SDK) are built only at the top (root) level. They are not "attached", so they are not uploaded to Maven for distribution; instead, they are distributed via Apache's mirror system.

The release process happens when the commands mvn release:prepare followed by mvn release:perform are executed. The release plugin is set up by the common Apache parent Pom to specify the apache-release profile. It is this profile being selected that causes (among many other things) the sources.jar and source-release.zip artifacts to be built. You can debug this process without doing a release, by adding the parameter -Papache-release to the non-release Maven build commands.

For Apache releases (triggered with the apache-release profile), all artifacts get gpg signatures as well as .sha512 checksums. These are created during the build for the top level project, when the apache-release profile is specified, or when the Maven release plugin is run. Most of the signatures and checksums are produced by the gpg plugin and the checksum-maven-plugin, but the source-release.zip artifact has its own special antrun task since it's not attached.
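
For orientation, the signing part of a release profile can be sketched as follows. This is a simplified illustration of how the maven-gpg-plugin is typically bound in a profile, not a copy of the actual apache-release profile in the Apache parent POM, which does considerably more; the execution id is hypothetical.

```xml
<profiles>
  <profile>
    <id>apache-release</id>
    <build>
      <plugins>
        <plugin>
          <artifactId>maven-gpg-plugin</artifactId>
          <executions>
            <execution>
              <id>sign-artifacts</id>   <!-- hypothetical execution id -->
              <phase>verify</phase>     <!-- sign once the artifacts exist -->
              <goals>
                <goal>sign</goal>
              </goals>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```

The sign goal covers the main artifact and anything attached to it, which is why an unattached file such as source-release.zip needs separate handling.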

LICENSE and NOTICE files

Things that are distributed from Apache need LICENSE and NOTICE files. We have several kinds of distributions:

  • Releases
    • source-release.zip files
    • binary (built) packagings
      • Binary assemblies
        • Multi-project assemblies, like our main UIMA SDK
        • Single-project assemblies, like our individual add-on projects
      • uploaded to main Maven Repository as part of releasing:
        • Individual Jars
        • PEAR packages of some individual projects
        • OSGi packaging of some individual projects
  • SVN
  • Eclipse Features
    • Features have their own special Eclipse-style form for licenses
    • Boilerplate for these is kept in a common spot in the uima-build-resources project, in the subdirectory licenses-eclipse-plugs-features. There are two files here. One is the boilerplate license, in uima-eclipse-user-agreement.html (a text copy is also embedded in the features.properties). The second file is a boilerplate features.properties.
    • Both of these files should be copied to the top level of feature project(s), and the feature.properties should be edited to have the right values for the particular feature.

Three kinds of LICENSE/NOTICE files

There are typically three versions of these files, corresponding to source (only) distributions, other distributions which include "dependent" artifacts, which may have their own separate license and notice information, and a special one for Eclipse Features. The source (only) versions of these files are in SVN at the top level of various project hierarchies.

For some packagings of source artifacts, such as JARs for projects that are part of bigger assemblies, the projects do not contain individual license and notice files (see, for instance, .../uimaj/trunk/uimaj-core). For these, a standard LICENSE and NOTICE is computed using a template, augmented if needed by additions from the project's pom.

All major releasable things (other than projects for Jars which are always released as part of an assembly - such as the Jars which make up the UIMA SDK), have top level license and notice files in their top most project; these are for the source, only, and do not cover the dependencies (if any) that might be included with a binary-style release.

Many of our packagings include dependencies, including OSGi and PEAR packagings, as well as normal binary assemblies. The license and notice for these packagings are made by merging the license and notice files from the source artifact with those from all the dependent artifacts (removing duplications).

Standard Artifacts

The release process includes standard boiler-plate things in standard places. The Maven remote-resources-plugin is used to get these resources from a special UIMA build artifact (uima-build-resources), and customizes them for the particular project:

  • The DEPENDENCIES file is generated from the transitive closure of the dependencies in the POM.
  • The NOTICE file is sometimes augmented with additional text (we use this to add the IBM Copyright formerly in the files, per the Apache practice for moving these to the NOTICE file).

These resources, after customization, are placed in the project's target/maven-shared-archive-resources/META-INF/ directory. Later steps in the build use this directory for two purposes:

  • adding this information to any JAR that might be built
  • adding this information as part of generated assemblies - these files are copied to the top level (above any project).

The remote-resources-plugin adds a <resource> entry to the maven in-memory model <resources> element that specifies that the files in target/maven-shared-archive-resources/ be copied to target/classes. (You can see this by running a mvn package step with the -X parameter.) This is what the remote-resources documentation means when it says: "... the resources are injected into the current (in-memory) Maven project, making them available to the process-resources phase."
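
A minimal sketch of this kind of remote-resources configuration is shown below. The process goal and the resourceBundles parameter are standard for the maven-remote-resources-plugin; the groupId and version given for uima-build-resources are assumptions made for illustration.

```xml
<build>
  <plugins>
    <plugin>
      <artifactId>maven-remote-resources-plugin</artifactId>
      <executions>
        <execution>
          <goals>
            <goal>process</goal>
          </goals>
          <configuration>
            <resourceBundles>
              <!-- groupId:artifactId:version; groupId and version are assumed -->
              <resourceBundle>org.apache.uima:uima-build-resources:1</resourceBundle>
            </resourceBundles>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
<!-- Effect: the bundle's files are written under
     target/maven-shared-archive-resources/ and that directory is added
     to the in-memory <resources> list, so they end up in target/classes. -->
```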

Overriding on a per-project basis: Files that the remote-resources-plugin obtains and places in the target/maven-shared-archive-resources/META-INF/ directory can be overridden by identically named files at the top level of the project.

Note that there are two sets of LICENSE / NOTICE files for distributable entities: one for the source distribution, and one for the binary distribution. The source distribution rarely needs anything other than the standard LICENSE/NOTICE files, because it is (usually) only distributing Apache-authored source. The binary distribution often distributes additional components that are licensed under other licenses, with perhaps additional NOTICE requirements; in that case, the LICENSE and NOTICE files contain all of the required licenses and notices for everything being distributed.

For binary distributions, the LICENSE and NOTICE files are taken from src/main/readme/ directory.

Some addon projects have just a single instance of LICENSE and NOTICE at the top level. In this case, these are used for both the source and binary distribution, and they therefore need to cover everything distributed with the binary distribution (even if these are not delivered with the source distribution).

Eclipse Features and Plugins

The license and notice section for Eclipse plugins follows the conventions used for other Jars. For Eclipse Features, there is a boilerplate license used for all Features, which, in turn, refers to specific other embedded Licenses and Notices. The boilerplate becomes part of the Feature jar.

When setting up a new Eclipse Feature, developers need to manually copy the latest boilerplate features.properties and uima-eclipse-user-agreement.html files from the uima-build-resources project into their feature project top-level directory. They need to then modify the values of the following properties for their feature:

  • featureName
  • description

The build adds these files to the resource set when building the jar.
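
One way to add the two files to the resource set is a <resource> entry that picks them up from the project's top level, as sketched below. This is an assumption about how such a configuration could be written, not a copy of the actual UIMA build configuration.

```xml
<build>
  <resources>
    <resource>
      <!-- The feature project's top-level directory -->
      <directory>${basedir}</directory>
      <!-- Only the two boilerplate files, so nothing else at the
           top level leaks into the feature jar -->
      <includes>
        <include>features.properties</include>
        <include>uima-eclipse-user-agreement.html</include>
      </includes>
    </resource>
  </resources>
</build>
```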

Summary: License and Notices

This next table summarizes the packaging artifacts and how and where they are located and added during the build process.

For Eclipse features, a build <resource> element adds the two files.

Each artifact is described by its Variants, Sources, Targets, and Methods.

Artifacts: LICENSE, NOTICE, DEPENDENCIES

  • Variants:
    • Standard, for source distribution.
    • Alternate: has extra Notice element used for copyrights moved to Notice file.
    • Alternate2: for binary assemblies, the LICENSE and NOTICE are customized for each binary assembly.
    • Eclipse Features.
  • Sources:
    • For source distributions: uima-build-resources (in the build tooling). Additional text for NOTICE (if needed) comes from the property <postNoticeText> in the build POM. For many cases, the additional copyright notice is for IBM contributed code, which can be included using <postNoticeText>${ibmNoticeText}</postNoticeText>.
    • For binary distributions: comes from src/main/readme/.
    • For Eclipse Features: developer manually copies features.properties and uima-eclipse-user-agreement.html from uima-build-resources project's folder licenses-eclipse-plugs-features, to the Eclipse feature top level. Developer manually edits two properties in features.properties - the feature name and description.
  • Targets:
    • Jars: goes into META-INF.
    • Source-Release zips, source assemblies, binary assemblies, PEAR files, OSGi artifacts: goes into the zip/tar as top level files.
    • Eclipse Features: features.properties and uima-eclipse-user-agreement.html get included in Jar at top level.
  • Methods:
    • org.apache.uima:parent-pom configures the remote-resources plugin to copy the standard LICENSE/NOTICE/DEPENDENCIES into target/maven-shared-archive-resources/META-INF/ directory. The remote-resources plugin adds this dir to the list of standard resources the resources:resources goal copies into target/classes. This info is then included in any Jars that are built, in META-INF.
    • During release (only) (apache-release profile activated) the information in target/maven-shared-archive-resources/META-INF/ is copied to the top level of the source-release archive. Any versions of these files at the project's top level will subsequently override these, at the top level of the archive.
    • For PEAR, OSGi, and binary assemblies, these files come from src/main/readme/; the maven-resources-plugin uses the copy-resources goal to copy these.

Artifacts: README.txt

  • Sources: For PEAR, OSGi, binary assemblies, and source zip distributions, these come from same-named files at the top level of the project doing the creating of the release artifact.
  • Targets: Included at the top level of all packagings, except project Jars.
  • Methods: For source release building, during release (only) (apache-release profile activated) parent-pom-top configures the assembly plugin for source assemblies to use the multimodule-source-release in uima-build-resources. This copies the README file from the top level into the top level of the archive.

Artifacts: RELEASE_NOTES.html

  • Variants: Each release can include release notes describing the main changes for the release.
  • Sources: At the top level.
  • Targets: Included at the top level of all packagings, except project Jars.
  • Methods: During release builds the assembly or maven-resources-plugin is configured to copy these from the top level of the project to the top level of the archive.

Artifacts: issuesFixed

  • Variants: Each release can include a top-level directory called issuesFixed, which has a file jira-report.html. This is generated automatically when a release is built. It contains the set of fixed/resolved Jiras that correspond to this release.
  • Sources: none. This is a computed resource.
  • Targets: This is generated into the top level of the project, and then goes into all packagings except OSGi, at the top level.
  • Methods: The build process runs the maven changes:jira-report plugin to generate this. The release manager must ensure the property specifying the Jira version(s) included in this release is properly specified, to enable the right issues to be included in the report.
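
Two of the mechanisms named in the summary above can be sketched in POM form: setting the <postNoticeText> property, and copying LICENSE and NOTICE from src/main/readme/ with the copy-resources goal. The execution id, phase, and output directory are hypothetical choices made for this illustration; the real build may bind these differently.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <!-- other elements omitted -->

  <properties>
    <!-- Extra text appended to the generated NOTICE file -->
    <postNoticeText>${ibmNoticeText}</postNoticeText>
  </properties>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-resources-plugin</artifactId>
        <executions>
          <execution>
            <id>copy-binary-license-notice</id>   <!-- hypothetical id -->
            <phase>prepare-package</phase>         <!-- assumed phase -->
            <goals>
              <goal>copy-resources</goal>
            </goals>
            <configuration>
              <!-- Hypothetical staging directory for the binary packaging -->
              <outputDirectory>${project.build.directory}/example-staging</outputDirectory>
              <resources>
                <resource>
                  <directory>src/main/readme</directory>
                  <includes>
                    <include>LICENSE*</include>
                    <include>NOTICE*</include>
                  </includes>
                </resource>
              </resources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```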

Handling Documentation

We have several kinds of documentation:

  • Javadocs
  • Docbook
  • UIMA Website (Anakia)

We use Docbook style for much of our documentation, the main exception being our website, which uses Anakia. See the how to use docbook page for information about how to write Docbook style documentation, and how it is processed during building.

Docbooks are built, if present, during the normal lifecycle.

Javadocs are built during the release process for Java sources packaged as a Jar. In addition, some distribution projects (e.g., uimaj-distr) collect things from multiple projects and explicitly build larger Javadoc sets.

For the web site, documentation is kept in the SVN "site" top level directory. Within that project, the sources are in the xdocs directory; anything there is expected to be written using Anakia markup, and the build.xml ant script is used to transform these into corresponding html files in the docs directory.

Normally, all website-ready content (not needing Anakia processing) is kept in SVN in the uima/site/trunk/uima-website/docs directory; this directory is manually checked out into the right spot on people.apache.org after the sources are updated. Large generated documents, including the javadocs and the generated docbook html and pdf files, are put directly onto people.apache.org in the directory /www/uima.apache.org/d/, and are not checked into SVN. This avoids using SVN for large generated files while still satisfying the requirements for Apache websites.

Packaging Individual Projects

The UIMA Project, in addition to the main frameworks (the UIMA SDK), has Annotators and other components and tools (such as the SimpleServer) that it releases. In the past these have been released as one big assembly that we called the Addons or Sandbox, but they are now also supported as individual items.

Two kinds of packaging for these are available, chosen by specifying one of the following parent-poms:

  • parent-pom-annotator - this packages things as a PEAR
  • parent-pom-single-project - this packages things as a tar / zip file having a lib/ directory with the generated Jar and dependencies (see below).

The individual Annotators are mostly packaged as PEAR files, which are the UIMA standard for annotator component packaging and distribution. The PEAR is generated using the PearPackagingMavenPlugin, which automatically generates a conventional PEAR Installation Descriptor from the information in the project. The PEAR artifact is generated in addition to the Jar of the code, and includes other items such as the generated documentation, data, and a lib/ directory with other Jars needed for the PEAR to operate. This PEAR is the packaged equivalent of a binary distribution artifact produced for the main framework, and comes with a license and notice that cover any included libraries. Generated PEARs are included in the artifacts that are managed by Maven, and are available in the Maven repository system.

The PEAR internal folder "src" is not used by components here; the source is instead available via the standard Maven source directory conventions.

PEARs must have a "main" descriptor, which is the one normally used to configure and run the annotator. When using the parent-pom-annotator to indicate PEAR packaging, you must include a property called <pearMainDescriptor> in the Annotator's POM, whose value is the path from the project base directory identifying the main descriptor. For example, if the folder "desc" was at the root of your project, the value might be something like "desc/my-main-descriptor.xml".
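
For example, an annotator POM that selects PEAR packaging might look like the sketch below. The <pearMainDescriptor> property and the parent-pom-annotator artifactId come from the text above; the parent's groupId and version and the annotator's own coordinates are placeholders assumed for illustration.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>org.apache.uima</groupId>        <!-- assumed groupId -->
    <artifactId>parent-pom-annotator</artifactId>
    <version>1</version>                       <!-- placeholder version -->
  </parent>

  <artifactId>example-annotator</artifactId>   <!-- hypothetical -->
  <version>1.0.0-SNAPSHOT</version>

  <properties>
    <!-- Path from the project base directory to the main descriptor -->
    <pearMainDescriptor>desc/my-main-descriptor.xml</pearMainDescriptor>
  </properties>
</project>
```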

If an annotator isn't suitable for PEAR packaging, perhaps because it is inappropriate to have one pre-done main descriptor, then the annotator can be packaged as a single-project instead, by setting the POM's parent to parent-pom-single-project.

Common conventions for structuring individually releasable projects

Use these conventions to get your annotator properly packaged when you use the parent-pom-annotator or the parent-pom-single-project as your parent pom:

  • Use <packaging>jar</packaging> (which is the default, so it should be omitted) if there is a Jar that should be built (as is usually the case). This will cause the main annotator Jar to be built, including any resources (plus the top-level desc folder, if it exists), as normal.
  • Create the required License, Notice, and optional Readme, and Release Notes files in the project directory at the top level; these will be included in the PEAR or simple project binary assembly at the top level. The License and Notice files should be for the entire project, including all third-party Jars etc. being distributed.

    The generated Jar, which only has Apache-developed code, will have the normal Apache License and Notice files.

  • Dependencies: those marked with scope "compile" (the default) or "runtime" are copied to the PEAR or single-project's lib/ directory. To keep a dependency needed for compiling from being included in the lib/, give it a scope of "provided".

    You may use the dependencies <exclusions> element to control transitive dependencies.
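
The scope and exclusion rules above use standard Maven dependency syntax. In the sketch below, marking uimaj-core as "provided" is only an illustration of keeping a compile-time dependency out of lib/, and the org.example artifacts are hypothetical.

```xml
<dependencies>
  <!-- Needed to compile, but kept out of lib/ by the "provided" scope.
       The version is illustrative. -->
  <dependency>
    <groupId>org.apache.uima</groupId>
    <artifactId>uimaj-core</artifactId>
    <version>2.3.1</version>
    <scope>provided</scope>
  </dependency>

  <!-- Default "compile" scope: copied into lib/.
       Hypothetical third-party library. -->
  <dependency>
    <groupId>org.example</groupId>
    <artifactId>example-library</artifactId>
    <version>1.0</version>
    <exclusions>
      <!-- Keep an unwanted transitive dependency out of lib/ -->
      <exclusion>
        <groupId>org.example</groupId>
        <artifactId>example-unwanted-transitive</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- "runtime" scope: not needed to compile, but still copied into lib/ -->
  <dependency>
    <groupId>org.example</groupId>
    <artifactId>example-runtime-only</artifactId>
    <version>1.0</version>
    <scope>runtime</scope>
  </dependency>
</dependencies>
```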

Conventions for structuring individual projects released as PEARs

  • Using the following folders at the top level of your project (which are the folders used in the "conventional" PEAR layout) causes the contents of these folders to be added to the corresponding folders in the PEAR or binary zip/tar:
    • desc - for descriptors, including the main descriptor
    • bin - (optional) for extra executables
    • lib - (optional) for libraries (not being included via the Maven dependencies mechanism)
    • doc - (optional) not normally used, but can have additional user-ready documentation
    • data - (optional) for arbitrary data
    • resources - (optional) not normally used, but can have additional data which should be on the classpath
    • conf - (optional) for extra configuration files
    • src - not-used - Maven conventions for src directories are used instead

    The following Maven conventional information will be added in the generated PEAR:

    • lib - dependencies with scope "compile" or "runtime" are resolved from maven repositories and added to the lib/
    • doc - docbook sources under src/docbook are processed and the html and pdf forms are added
    • Maven resources - (not the top level resources directory described above) are by convention included in the Jar, and are not included in this folder

Building Assemblies for Distribution

The normal operation of Maven is concerned with the building of individual modules. Each module when built produces maven artifacts in repositories (your local repository, or perhaps uploaded to a snapshot or staging repository).

When releases are done, additional artifacts are constructed using the distribution assemblies. We have these for

  • UIMA - the base Java framework
  • UIMA Add-ons - These are released as a collection. Future updates will likely release these individually.
  • UIMACPP - the C++ framework

Special resources for build

During the build process, several resources are used, and are built into Maven Artifact Jars with Maven coordinates. These projects are in the build section of SVN.

  • uima-build-resources - this contains various resources used in the build process
    • the standard License, Notice, and Dependencies from the Apache common parent pom, except that it allows adding post-Notice text. We use this when we need to include the IBM Copyright notice, moved here from code donated by IBM.
    • an override for the multi-module source release assembly
    • the binary assembly descriptor for single-projects.
    • common specification for the titlepage of docbooks
  • uima-docbook-olink - this is the shared olink data. This project's source contains the information about the UIMA Bookshelf - the set of books that can cross-reference each other, and how they're laid out (current layout - they all are subitems of one common directory). This artifact is not released, it always stays at 1-SNAPSHOT level, but can be deployed to the snapshot repository for sharing with others.

There is also one custom Maven plugin (uima-build-helper-maven-plugin) which is used to get the build month and build year into properties. This is used, for instance, in the common docbook frontmatter to indicate when the book was built.

Using the Release Audit Tool (RAT)

From parent-pom version 2 and onwards, RAT is run automatically, but only when the Maven profile "apache-release" is activated. Projects that override the default RAT exclusions must include all things that need to be excluded, and put this configuration into their POM in the "pluginManagement" section; see the project uimaj-core for an example.
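
A project-level override could be sketched as follows. The plugin coordinates org.apache.rat:apache-rat-plugin and the <excludes> parameter are standard; the excluded paths are hypothetical examples. Because a list given here replaces the inherited list rather than adding to it, every exclusion that is still needed has to be repeated.

```xml
<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.rat</groupId>
        <artifactId>apache-rat-plugin</artifactId>
        <configuration>
          <!-- This list replaces the inherited exclusions, so it must be complete. -->
          <excludes>
            <exclude>release.properties</exclude>              <!-- hypothetical -->
            <exclude>src/test/resources/data/**</exclude>      <!-- hypothetical -->
            <exclude>marker-file-identifying-*</exclude>       <!-- hypothetical -->
          </excludes>
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
```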

Lifecycle for building addons

Addon projects have multiple binary build artifacts:

  • binary assembly for the individual addon
  • binary assembly for aggregate of all the addons
  • PEAR packaging
  • OSGi packaging

Since these have many common parts, we use the maven lifecycle to order building of shared common things ahead of their use:

  • package phase: build JARs, docbooks, etc.
  • pre-integration-test phase: this is the first phase after package, and is used to copy the Jar and docbook processing results to a common directory structure (base-bin).
  • integration-test phase: this is the second phase after package, and is used to copy the common things from base-bin to places for building the PEAR and OSGi artifacts. It is also used for running the packaging steps sequentially, after the copy, for the PEAR, OSGi, and single project assemblies.
  • post-integration-test phase: this is used to mark the generated PEAR file for "attachment" using the maven-build-helper, so it gets deployed to the maven repositories.
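
The last step in the list above can be sketched with the build-helper-maven-plugin's attach-artifact goal bound to the post-integration-test phase. The execution id and the PEAR file location are hypothetical; the actual configuration lives in the UIMA parent POMs and may differ.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>build-helper-maven-plugin</artifactId>
      <executions>
        <execution>
          <id>attach-pear</id>                      <!-- hypothetical id -->
          <!-- Runs after the PEAR has been built in integration-test -->
          <phase>post-integration-test</phase>
          <goals>
            <goal>attach-artifact</goal>
          </goals>
          <configuration>
            <artifacts>
              <artifact>
                <!-- Hypothetical location of the generated PEAR -->
                <file>${project.build.directory}/${project.artifactId}.pear</file>
                <type>pear</type>
              </artifact>
            </artifacts>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Once attached, the PEAR is installed and deployed alongside the project's main Jar by the later install and deploy phases.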