Maven use for Apache UIMA builds
This is mostly developer information. It documents how we use Maven to build
the various parts of UIMA.
We follow Apache conventions for releasing Maven-based projects
(all of ours except the
UIMA C++ enablement layer), using Nexus, as documented here:
https://www.apache.org/dev/publishing-maven-artifacts.html.
This document covers how we use Maven to build version 2.3.1 and later.
Nexus
Apache runs an instance of Nexus, a
repository "manager", which has been set up to support releasing Maven artifacts. It lets
Maven's release plugin behave as if it were "releasing" and deploying to a Maven repository, when
in fact the release configuration deploys to the "staging" part of the Apache
Nexus instance. Once the release is verified and voted on, it can be transferred to the
Maven Central repository with a few clicks on the website.
We currently use the
Apache common parent POM,
which supports deployment to
the Apache Nexus repository for staging and release.
It also supplies many of the standard settings needed by all Apache projects.
Our POM hierarchy
POMs have two kinds of hierarchy:
- Inheritance: the main one, used for factoring out common
things, arranges POMs in a parent-child hierarchy, with each child
identifying its parent POM.
- Aggregation: POMs that specify sub-modules
for building multi-module things and releasing them all at once.
For clarity,
we mostly keep these two uses in separate POMs: parent POMs do not do aggregation,
and aggregation POMs do not do common factoring.
The UIMA project has one common parent POM, called "parent-pom", that contains
things common to all the sub-projects. Each multi-module sub-project, for instance the UIMA Java SDK (uimaj),
in turn contains a common parent POM for all the modules in that sub-project.
The project-wide parent POM and related build artifacts are kept in a separate part of
our SVN tree, in a top-level directory named "build".
Selecting build alternatives based on the project
The shared UIMA-wide common parent POM supports many different
kinds of builds. For instance, some projects use Docbook,
some projects are "distribution" projects that build
our binary distribution, some projects build PEAR artifacts
(e.g., many of our Addons projects), etc.
Common support for each of these kinds of build is
kept in the common parent POM, in a profile that is
activated only for that kind of project. Activation
works by detecting the presence of a file named,
by convention, marker-file-identifying-xxxxxxxx,
where xxxxxxxx identifies the kind of project. These are zero-length
files placed at the top level of each project that needs a particular
profile in the common parent POM activated.
The one exception is the profile
for processing Docbooks, which is instead activated by the
existence of the directory src/docbook in the project.
These are the current marker-file kinds:
- marker-file-identifying-parent-pom
- marker-file-identifying-eclipse-plugin
- marker-file-identifying-eclipse-feature
- marker-file-identifying-single-project
- marker-file-identifying-standard-pear
- marker-file-identifying-osgi-project
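As an illustration only, the activation rule can be mimicked in Java. In a POM, file-based activation is normally declared with an <activation><file><exists> element. The class MarkerFileProfiles below is hypothetical. It reports which marker-file kinds a project directory would trigger, plus the src/docbook directory case.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: mimics the parent POM's file-based profile activation.
class MarkerFileProfiles {

    // The marker-file kinds listed on this page.
    static final List<String> KINDS = List.of(
            "parent-pom", "eclipse-plugin", "eclipse-feature",
            "single-project", "standard-pear", "osgi-project");

    // Returns the kinds whose zero-length marker file exists at the top level,
    // plus "docbook" when the project has a src/docbook directory.
    static List<String> activeKinds(Path projectDir) {
        List<String> active = new ArrayList<>();
        for (String kind : KINDS) {
            Path marker = projectDir.resolve("marker-file-identifying-" + kind);
            if (Files.isRegularFile(marker)) {
                active.add(kind);
            }
        }
        // Docbook processing is keyed on a directory, not a marker file.
        if (Files.isDirectory(projectDir.resolve("src").resolve("docbook"))) {
            active.add("docbook");
        }
        return active;
    }
}
```

For example, a PEAR-producing addon that also has docs would contain marker-file-identifying-standard-pear and src/docbook, and would activate both profiles.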
POM style
When writing a new POM, it is best to start with an existing POM for a similar
kind of project and derive the new POM from that. Some points:
- POMs contain a <url> element, which is supposed to
point to the UIMA website page for this artifact (or the main UIMA page).
If a POM doesn't have this element, it inherits one from its parent,
but Maven assumes (usually incorrectly, in our case)
that the url is the parent's url value followed by "/" and
this POM's artifactId.
If that does not correspond to how the website is set up,
specify the correct url for the new project.
- The SCM connection is needed for doing releases, and must be accurate for this
component. The same defaulting applies as above, which is incorrect for our
flat layout, so each POM needs an explicit SCM element.
- The POM's "version" is stated literally, not via a property.
Maven 3
complains if you use properties here, and the Maven release mechanism
manipulates these values (removing SNAPSHOT, incrementing versions, etc.).
- The POM's groupId is omitted if it's the same as the parent POM's
(which is true for all of our projects except our top-most parent POM);
if omitted, it is inherited from the parent.
- The packaging is omitted if it is "jar".
- In the build section, the <groupId> for
standard Maven plugins is omitted.
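The following hypothetical Java helper, InheritedUrl.effectiveUrl, shows why an explicit <url> (and SCM element) matters. It models only the defaulting rule described above, not Maven's implementation: an explicit value wins, and otherwise the parent's value is used, followed by "/" and the child's artifactId. The URLs in the example are illustrative.

```java
// Hypothetical model of the defaulting rule described above for <url>
// (the same idea applies to SCM URLs); not Maven's actual implementation.
class InheritedUrl {

    // childUrl is null when the child POM omits the element.
    static String effectiveUrl(String parentUrl, String childArtifactId, String childUrl) {
        if (childUrl != null) {
            return childUrl;  // an explicit value in the child POM always wins
        }
        // Default: parent value + "/" + the child's artifactId.
        return parentUrl + "/" + childArtifactId;
    }
}
```

With a parent url of https://uima.apache.org and an artifactId of uimaj-core, the default would be https://uima.apache.org/uimaj-core. That page need not exist on the website.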
Release artifacts, signing, checksums
We follow the standard release process for Maven-based artifacts at Apache,
documented here.
Artifacts go to two places:
- Apache's mirror distribution system
- Maven "central", for Maven-findable artifacts (typically JARs)
Apache's mirror distribution system is used to distribute the source-release.zip and
any binary convenience packages.
For each artifact, the release process may build additional artifacts and
attach them to the main one, so they "go along with" the main artifact during
Maven deployment to repositories. Some of these are:
- sources.jar - holds the source files, for IDEs that want to refer to the source
- javadoc.jar - holds the generated Javadocs
The source-release.zip file is a special artifact. Like sources.jar, it contains the sources, but it also includes all the other files
needed for building that are not under the src directory (such as pom.xml).
The intent is that it matches the files checked into SVN and that, once unzipped, it
can be built by running "mvn install", etc., in the unzipped directory.
For multi-project aggregates (such as the main
UIMA SDK), the source-release.zip is built only at the top (root) level and is not "attached", so it is
not uploaded to Maven for distribution but is instead distributed via Apache's mirror system.
The release process runs when the commands mvn release:prepare and then
mvn release:perform are executed. The common Apache parent POM configures the release
plugin to activate the apache-release profile. Selecting this profile is what causes (among many other things) the sources.jar and
source-release.zip artifacts to be built.
You can debug this process without doing a release by adding the parameter
-Papache-release to the normal (non-release) Maven build commands.
For Apache releases (triggered by the apache-release profile), all artifacts
are signed with GPG signatures and also get .sha512 checksums.
These are created during the build of the top-level project, when
the apache-release profile is specified or when the Maven release plugin is run.
Most of the signatures and checksums come from the gpg plugin and the
checksum-maven-plugin, but the source-release.zip artifact has its own
special antrun task, since it is not attached.
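A .sha512 checksum lets a downloader recompute the artifact's SHA-512 digest and compare the two values. The hypothetical Sha512Checksum class below uses only the JDK's MessageDigest to show that computation, for example when checking a downloaded source-release.zip. It is not the checksum-maven-plugin's code, and the .sha512 files the build produces may contain more than the bare hex digest.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes the lowercase hex SHA-512 digest that a
// .sha512 checksum records, so a downloaded artifact can be verified.
class Sha512Checksum {

    static String sha512Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-512").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Standard JDK providers include SHA-512, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    static String sha512Hex(String text) {
        return sha512Hex(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads a whole artifact (e.g., a source-release.zip) and digests it.
    static String sha512Hex(Path artifact) throws IOException {
        return sha512Hex(Files.readAllBytes(artifact));
    }
}
```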
LICENSE and NOTICE files
Things that are distributed from Apache need LICENSE and NOTICE files. We have several kinds
of distributions:
- Releases
  - source-release.zip files
  - binary (built) packagings
    - Binary assemblies
      - Multi-project assemblies, like our main UIMA SDK
      - Single-project assemblies, like our individual add-on projects
    - uploaded to the main Maven repository as part of releasing:
      - Individual Jars
      - PEAR packages of some individual projects
      - OSGi packaging of some individual projects
- SVN
- Eclipse Features
  - Features have their own special Eclipse-style form for licenses.
  - Boilerplate for these is kept in a common spot in the uima-build-resources project,
    in the subdirectory licenses-eclipse-plugs-features.
    There are two files here. One is the boilerplate license, in
    uima-eclipse-user-agreement.html (a text copy is also embedded in features.properties).
    The second file is a boilerplate features.properties.
  - Both of these files should be copied to the top level of the feature project(s), and
    features.properties should be edited to have the right values for the particular feature.
Three kinds of LICENSE/NOTICE files
There are typically three versions of these files: one for source-only
distributions; one for other distributions, which include "dependent" artifacts that
may have their own separate license and notice information; and a special one for
Eclipse Features. The source-only
versions of these files are in SVN at the top level of various project hierarchies.
For some packagings of source artifacts, such as JARs for
projects that are part of bigger assemblies, the projects do not contain individual
license and notice files (see, for instance, .../uimaj/trunk/uimaj-core).
For these, a standard LICENSE and NOTICE is computed using a template, augmented
if needed by additions from the project's POM.
All major releasable things (other than projects for Jars that are always released
as part of an assembly, such as the Jars that make up the UIMA SDK) have top-level
license and notice files in their top-most project. These cover the
source only. They do not cover any dependencies that might be included with
a binary-style release.
Many of our packagings include dependencies, including OSGi and PEAR packagings,
as well as normal binary assemblies.
The license and notice for these packagings are made by merging the license and notice
files from the source artifact with those from all the dependent artifacts
(removing duplicates).
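The following Java sketch shows the idea of the merge. The class LicenseMerger is hypothetical. It treats blank-line-separated paragraphs as the unit of duplication, which is an assumption made for illustration; the build's own merging may work differently.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: merge the source artifact's LICENSE (or NOTICE) text
// with those of its dependencies, dropping duplicated paragraphs.
class LicenseMerger {

    static String merge(List<String> texts) {
        // LinkedHashSet keeps first-seen order and silently ignores repeats.
        Set<String> paragraphs = new LinkedHashSet<>();
        for (String text : texts) {
            // Assumption: paragraphs are separated by one or more blank lines.
            for (String paragraph : text.split("\\R\\s*\\R")) {
                String trimmed = paragraph.trim();
                if (!trimmed.isEmpty()) {
                    paragraphs.add(trimmed);
                }
            }
        }
        return String.join("\n\n", paragraphs);
    }
}
```

The source artifact's text is passed first, so the shared Apache license text appears once, at the top, followed by each dependency's additional terms.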
Standard Artifacts
The release process puts standard boilerplate files in standard places.
The Maven remote-resources-plugin gets these resources from a special
UIMA build artifact (uima-build-resources) and customizes them for the particular project:
- The DEPENDENCIES file is generated from the
transitive closure of the dependencies in the POM.
- The NOTICE file is sometimes augmented with additional text.
We use this to add the IBM copyright notices that were formerly
in the source files, following the Apache practice of moving such notices to the NOTICE file.
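The NOTICE augmentation amounts to appending text. In the hypothetical NoticeBuilder below, the value of a project's <postNoticeText> property is appended after the standard NOTICE text. That property is the one named in the summary later on this page; for IBM-contributed code its value is typically ${ibmNoticeText}. The blank-line separator is an assumption.

```java
// Hypothetical sketch of NOTICE customization: standard NOTICE text from
// uima-build-resources, optionally followed by the project's <postNoticeText>.
class NoticeBuilder {

    static String buildNotice(String standardNotice, String postNoticeText) {
        if (postNoticeText == null || postNoticeText.trim().isEmpty()) {
            return standardNotice;  // nothing to add for this project
        }
        // Drop trailing whitespace, then separate the extra text with a blank line.
        String base = standardNotice.replaceAll("\\s+$", "");
        return base + "\n\n" + postNoticeText.trim() + "\n";
    }
}
```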
These resources, after customization, are placed in the project's
target/maven-shared-archive-resources/META-INF/
directory. Later steps in the build use this directory for two purposes:
- adding this information to any JAR that might be built
- adding this information as part of generated assemblies - these files are copied to the top
level (above any project).
The remote-resources-plugin adds a <resource> entry to the <resources> element of Maven's in-memory model,
specifying that the files in target/maven-shared-archive-resources/ be copied to
target/classes. (You can see this by running mvn package with the -X parameter.)
This is what the remote-resources documentation means when it says:
"... the resources are injected into the current (in-memory) Maven project,
making them available to the process-resources phase."
Overriding on a per-project basis:
Files that the remote-resources-plugin obtains and places
in the target/maven-shared-archive-resources/META-INF/ directory can be overridden
by identically named files at the top level of the project.
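The override rule is a simple lookup order. The hypothetical SharedResourceResolver below returns the top-level copy of a file such as NOTICE when the project has one. Otherwise it returns the copy the remote-resources-plugin generated. It only illustrates the precedence and is not how the plugin is implemented.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the per-project override rule for shared resources.
class SharedResourceResolver {

    static Path resolve(Path projectDir, String fileName) {
        // An identically named file at the project's top level wins...
        Path override = projectDir.resolve(fileName);
        if (Files.isRegularFile(override)) {
            return override;
        }
        // ...otherwise the generated copy is used.
        return projectDir.resolve("target")
                .resolve("maven-shared-archive-resources")
                .resolve("META-INF")
                .resolve(fileName);
    }
}
```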
Note that there are two sets of LICENSE / NOTICE files for distributable entities:
one for the source distribution
and one for the binary distribution. The source distribution rarely needs anything other
than the standard LICENSE/NOTICE files, because it
(usually) distributes only Apache-authored source. The
binary distribution, by contrast, often includes additional components licensed under other
licenses, possibly with additional NOTICE requirements. In that case, the binary LICENSE and NOTICE files
must contain all of the required licenses and notices for everything
being distributed.
For binary distributions, the LICENSE and NOTICE files
are taken from the src/main/readme/ directory.
Some addon projects have just a single instance of LICENSE and NOTICE at the top level.
In this case, these are used for both the source and binary distribution, and they
therefore need to cover everything distributed with the binary distribution (even
if these are not delivered with the source distribution).
Eclipse Features and Plugins
Licenses and notices for Eclipse plugins follow the conventions used for
other Jars. For Eclipse Features, a boilerplate license is used for all Features; it in turn refers to specific
other embedded licenses and notices. The boilerplate becomes part of the Feature jar.
When setting up a new Eclipse Feature, developers need to manually copy the latest boilerplate
features.properties and uima-eclipse-user-agreement.html files from the uima-build-resources project
into their feature project's top-level directory.
They then need to edit the values of two properties in features.properties for their feature:
the feature name and the feature description.
The build adds these files to the resource set when building the jar.
Summary: License and Notices
This next table summarizes the packaging artifacts, where they come from, where they go, and how they are
added during the build process. Each entry lists the artifact's variants, sources, targets, and methods.

LICENSE, NOTICE, DEPENDENCIES
- Variants:
  - Standard, for source distributions.
  - Alternate: has an extra NOTICE element, used for copyrights moved to the NOTICE file.
  - Alternate2: for binary assemblies, the LICENSE and NOTICE
    are customized for each binary assembly.
  - Eclipse Features.
- Sources:
  - For source distributions: uima-build-resources (in the build tooling).
    Additional text for NOTICE (if needed) comes from
    the property <postNoticeText> in the build POM.
    In many cases, the additional copyright notice is for
    IBM-contributed code, which can be included using
    <postNoticeText>${ibmNoticeText}</postNoticeText>.
  - For binary distributions: src/main/readme/.
  - For Eclipse Features: the developer manually copies features.properties and
    uima-eclipse-user-agreement.html from the uima-build-resources project's folder
    licenses-eclipse-plugs-features to the Eclipse feature's top level, then
    manually edits two properties in features.properties: the feature name and description.
- Targets:
  - Jars: META-INF.
  - Source-release zips, source assemblies, binary assemblies, PEAR files, OSGi artifacts:
    top-level files in the zip/tar.
  - Eclipse Features: features.properties and uima-eclipse-user-agreement.html are included at the top level of the Jar.
- Methods:
  - org.apache.uima:parent-pom configures the remote-resources plugin
    to copy the standard LICENSE/NOTICE/DEPENDENCIES into the
    target/maven-shared-archive-resources/META-INF/
    directory. The remote-resources plugin adds this directory to
    the list of standard resources that the resources:resources goal copies into
    target/classes.
    This information is then included, in META-INF, in any Jars that are built.
  - During release only (apache-release profile activated),
    the contents of target/maven-shared-archive-resources/META-INF/
    are copied to the top level of the source-release archive.
    Any versions of these files at the project's top level
    subsequently override these at the top level of the archive.
  - For PEAR, OSGi, and binary assemblies,
    these files come from src/main/readme/; the maven-resources-plugin
    copies them using its copy-resources goal.
  - For Eclipse features, a build <resource> element adds the two files.

README.txt
- Sources: For PEAR, OSGi, binary assemblies, and source zip distributions, these come from
  same-named files at the top level of the project that creates the release artifact.
- Targets: Included at the top level of all packagings, except project Jars.
- Methods: For source release building, during release only (apache-release profile activated),
  parent-pom-top configures the assembly plugin for source assemblies to use
  the multimodule-source-release descriptor in uima-build-resources. This copies
  the README file from the top level of the project into the top level of the archive.

RELEASE_NOTES.html
Each release can include release notes describing the main changes for the release.
- Sources: the top level of the project.
- Targets: Included at the top level of all packagings, except project Jars.
- Methods: During release builds, the assembly plugin or the maven-resources-plugin
  is configured to copy these from the top
  level of the project to the top level of the archive.

issuesFixed
Each release can include a top-level directory called issuesFixed,
containing the file jira-report.html, which is generated
automatically when a release is built.
It lists the fixed/resolved Jiras that correspond to this release.
- Variants: none.
- Sources: This is a computed resource.
- Targets: It is generated into the top level of the project, and then goes
  into all packagings except OSGi, at the top level.
- Methods: The build process runs the Maven changes:jira-report goal to generate this.
  The release manager must ensure that the property specifying the Jira
  version(s) included in this release is set correctly, so that the right
  issues are included in the report.
Handling Documentation
We have several kinds of documentation:
- Javadocs
- Docbook
- UIMA Website (Anakia)
We use Docbook for much of our documentation,
the main exception being our website, which uses Anakia.
See the "how to use Docbook" page for information
about how to write Docbook-style documentation and how it is processed during building.
Docbooks, if present, are built during the normal lifecycle.
Javadocs are built during the release process for Java sources packaged as a Jar.
In addition, some distribution projects (e.g., uimaj-distr) collect things from multiple projects and
explicitly build larger Javadoc sets.
For the website, documentation is kept in the SVN "site" top-level directory. Within that project,
the sources are in the xdocs directory; anything there is expected to be written using Anakia
markup, and the build.xml Ant script transforms these into corresponding HTML
files in the docs directory.
Normally, all website-ready content (not needing Anakia processing)
is kept in SVN in the uima/site/trunk/uima-website/docs directory.
This directory is manually checked out into the right spot on people.apache.org and is updated there when
the sources are updated.
Large generated documents are put directly onto people.apache.org in the directory
/www/uima.apache.org/d/, and are not checked into SVN. The files kept here include
the Javadocs and the generated Docbook HTML and PDF files.
This avoids using SVN for large generated files and satisfies the
requirements
for Apache websites.
Packaging Individual Projects
In addition to the main frameworks (the UIMA SDK), the UIMA project
releases annotators and other components and tools
(such as the SimpleServer). In the past, these were released as one big
assembly we called the Addons or Sandbox, but now they are also supported as individual items.
Two kinds of packaging are available for these, chosen by specifying one of the following parent POMs:
- parent-pom-annotator - packages things as a PEAR
- parent-pom-single-project - packages things as a tar / zip file with a lib/ directory
containing the generated Jar and its dependencies (see below).
The individual annotators are mostly packaged as
PEAR files, which are the UIMA standard for annotator component packaging and distribution.
The PEAR is generated using the PearPackagingMavenPlugin, which automatically generates a
conventional PEAR installation descriptor from the information in the project.
The PEAR artifact is generated in addition to the Jar of the code, and includes
other items such as the generated documentation, data, and a lib/ directory with other Jars
needed for the PEAR to operate. This PEAR is the packaged equivalent of the binary distribution
artifact produced for the main framework, and comes with a license and notice that
cover any included libraries. Generated PEARs are included in the artifacts that are managed by
Maven, and are available in the Maven repository system.
The PEAR internal folder "src" is not used by components here; instead, the source is available
via the standard Maven source directory conventions.
PEARs must have a "main" descriptor, which is the one normally used to configure and run
the annotator. When using parent-pom-annotator to get PEAR packaging,
you must include a property called
<pearMainDescriptor> in the annotator's POM,
whose value is the path of the main descriptor, relative to the project base directory.
For example, if the folder "desc" is at the root of your project, the
value might be something like "desc/my-main-descriptor.xml".
If an annotator isn't suitable for PEAR packaging, perhaps because it is inappropriate to have
one predefined main descriptor, then the annotator can be packaged as a single project instead, by
making parent-pom-single-project the POM's parent.
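Because <pearMainDescriptor> is a path relative to the project base directory, a quick pre-flight check can be written as follows. The class PearMainDescriptorCheck is hypothetical and not part of the PearPackagingMavenPlugin. It resolves the property value against the base directory and fails early if the descriptor is missing.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical pre-flight check for the <pearMainDescriptor> property value.
class PearMainDescriptorCheck {

    static Path resolve(Path baseDir, String pearMainDescriptor) {
        // The value is a path relative to the project base directory,
        // e.g. "desc/my-main-descriptor.xml".
        Path descriptor = baseDir.resolve(pearMainDescriptor).normalize();
        if (!Files.isRegularFile(descriptor)) {
            throw new IllegalArgumentException("main descriptor not found: " + descriptor);
        }
        return descriptor;
    }
}
```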
Common conventions for structuring individually releasable projects
Use these conventions to get your annotator properly packaged when you use parent-pom-annotator or
parent-pom-single-project as your parent POM:
- Use <packaging>jar</packaging> (the default, so it should be omitted) if there
is a Jar to be built (as is usually the case). This causes the main
annotator Jar to be built as normal, including any resources (plus the top-level desc folder, if it exists).
- Create the required LICENSE and NOTICE files, and the optional README and RELEASE_NOTES files,
at the top level of the project directory; these are included at the top level of the
PEAR or single-project binary assembly.
The LICENSE and NOTICE files should cover the entire project, including
all third-party Jars etc. being distributed.
The generated Jar, which contains only Apache-developed code,
has the normal Apache LICENSE and NOTICE files.
- Dependencies: those with scope "compile" (the default) or "runtime"
are copied to the PEAR's or single project's lib/ directory. To keep a dependency that is needed for compiling
out of lib/, give it the scope "provided".
You may use the dependency <exclusions> element to control
transitive dependencies.
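The scope rule for lib/ can be modeled in a few lines. The Dependency type and LibDirSelector class below are hypothetical. Excluding scopes other than compile, runtime, and provided (such as test) is a choice made for this sketch; the page itself describes only those three.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of which dependencies are copied into lib/.
class LibDirSelector {

    static final class Dependency {
        final String artifactId;
        final String scope;  // null means the Maven default scope, "compile"

        Dependency(String artifactId, String scope) {
            this.artifactId = artifactId;
            this.scope = scope;
        }

        String effectiveScope() {
            return scope == null ? "compile" : scope;
        }
    }

    // Only "compile" and "runtime" dependencies end up in lib/; "provided"
    // (and, in this sketch, any other scope) is left out.
    static List<String> libContents(List<Dependency> deps) {
        return deps.stream()
                .filter(d -> d.effectiveScope().equals("compile")
                        || d.effectiveScope().equals("runtime"))
                .map(d -> d.artifactId)
                .collect(Collectors.toList());
    }
}
```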
Conventions for structuring individual projects released as PEARs
Building Assemblies for Distribution
The normal operation of Maven is concerned with building individual modules. When built, each module
produces Maven artifacts in repositories (your local repository, or perhaps a snapshot or staging
repository they are uploaded to).
When releases are done, additional artifacts are constructed using the distribution assemblies.
We have these for:
- UIMA - the base Java framework
- UIMA Add-ons - these are released as a collection. Future updates will
likely release these individually.
- UIMACPP - the C++ framework
Special resources for build
Several resources used during the build process are themselves built into Maven artifact Jars with
their own Maven coordinates. These projects are in the build section of SVN.
- uima-build-resources - contains various resources used in the build process:
  - the standard LICENSE, NOTICE, and DEPENDENCIES resources from the
    Apache common parent POM, except that they allow adding post-NOTICE text. We use this when
    we need to include the IBM copyright notice moved here from code donated by IBM.
  - an override for the multi-module source release assembly
  - the binary assembly descriptor for single projects
  - a common specification for the title page of Docbooks
- uima-docbook-olink - the shared
  olink data.
  This project's source contains the
  information about the UIMA Bookshelf (the set of books that can cross-reference each other)
  and how the books are laid out (currently, they are all subitems of one common directory).
  This artifact is not released; it always stays at the 1-SNAPSHOT level, but can be deployed
  to the snapshot repository for sharing with others.
There is also one custom Maven plugin (uima-build-helper-maven-plugin)
which is used to get the build month and build year into properties.
This is used, for instance, in the common docbook frontmatter to indicate when the book was
built.
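The computation such a plugin performs is small. The sketch below is hypothetical and is not the uima-build-helper-maven-plugin's code. The property names buildMonth and buildYear are placeholders, not necessarily the names the plugin sets.

```java
import java.time.LocalDate;
import java.time.format.TextStyle;
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch: derive build-month and build-year properties from a date.
class BuildDateProperties {

    static Properties compute(LocalDate buildDate) {
        Properties props = new Properties();
        // Full English month name, e.g. "March", suitable for book front matter.
        props.setProperty("buildMonth",
                buildDate.getMonth().getDisplayName(TextStyle.FULL, Locale.ENGLISH));
        props.setProperty("buildYear", Integer.toString(buildDate.getYear()));
        return props;
    }
}
```

At build time this would be called with LocalDate.now(); a fixed date makes the result easy to check.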
Using the Release Audit Tool (RAT)
From parent-pom version 2 onwards,
RAT runs automatically, but only when the Maven profile "apache-release" is activated.
Projects that override the default RAT exclusions must list everything that needs to be excluded,
and must put this configuration in the "pluginManagement" section of their POM; see the project uimaj-core
for an example.
Lifecycle for building addons
Addon projects have multiple binary build artifacts:
- binary assembly for the individual addon
- binary assembly for aggregate of all the addons
- PEAR packaging
- OSGi packaging
Since these have many common parts, we use the Maven lifecycle to build shared
things before they are used:
- package phase: build JARs, docbooks, etc.
- pre-integration-test phase: the first phase after
package; it is
used to copy the Jar and docbook processing results to a common directory structure (base-bin).
- integration-test phase: the second phase after
package; it is
used to copy the common things from base-bin to the places where the PEAR and OSGi artifacts are built.
It is also used to run the packaging steps for the PEAR, OSGi,
and single-project assemblies sequentially, after the copy.
- post-integration-test phase: used to mark the generated PEAR file for "attachment",
using the build-helper-maven-plugin, so that it gets deployed to the Maven repositories.
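The phase ordering that the addon builds rely on can be summarized in Java. The phase names are Maven's standard default-lifecycle phases. The enum AddonBuildPhase and its stepsRunFor method are hypothetical and model only the lifecycle from package onwards. They rely on the fact that Maven runs every phase up to and including the one requested.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical summary of how addon builds use the Maven lifecycle.
enum AddonBuildPhase {
    PACKAGE("package", "build JARs, docbooks, etc."),
    PRE_INTEGRATION_TEST("pre-integration-test", "copy Jar and docbook results into base-bin"),
    INTEGRATION_TEST("integration-test", "copy from base-bin, then build PEAR, OSGi, single-project assemblies"),
    POST_INTEGRATION_TEST("post-integration-test", "attach the PEAR so it is deployed");

    // Default-lifecycle phases from package onwards, in Maven's order.
    static final List<String> LIFECYCLE_FROM_PACKAGE = List.of(
            "package", "pre-integration-test", "integration-test",
            "post-integration-test", "verify", "install", "deploy");

    final String mavenPhase;
    final String purpose;

    AddonBuildPhase(String mavenPhase, String purpose) {
        this.mavenPhase = mavenPhase;
        this.purpose = purpose;
    }

    // Maven runs all phases up to and including the requested one, so this
    // returns the addon steps that a command like "mvn install" would execute.
    static List<AddonBuildPhase> stepsRunFor(String requestedPhase) {
        int limit = LIFECYCLE_FROM_PACKAGE.indexOf(requestedPhase);
        if (limit < 0) {
            throw new IllegalArgumentException("not modeled: " + requestedPhase);
        }
        List<AddonBuildPhase> steps = new ArrayList<>();
        for (AddonBuildPhase step : values()) {
            if (LIFECYCLE_FROM_PACKAGE.indexOf(step.mavenPhase) <= limit) {
                steps.add(step);
            }
        }
        return steps;
    }
}
```

For example, "mvn package" runs only the Jar and docbook step, while "mvn install" runs all four addon steps, so the PEAR is built and attached before installation.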