Updating the Apache UIMA Website

Committers on the project update the Apache UIMA™ website from time to time. Besides routine edits to individual pages, the download page must be updated with every release. There are also special procedures for handling large generated documentation efficiently.

Kinds of Web Content

There are many kinds of data in the uima-website component. Here's a summary:

  • Normal web pages. These are kept in source form in /xdocs/....xml and converted to published ....html form (formatting applied, required headers and footers added, etc.). The output is kept in identically named files (except for the .html suffix) under /docs/.
  • Generated files, such as the current Javadocs. These are often large, with thousands of files, and are kept in the /docs/d/ directory.
  • xdocs/stylesheets - the repository for the main formatting macros and data for the website. It includes project.xml, which holds the left-hand menu details and the release details, as well as the transformation macros. Note: we use Anakia and Velocity (vsl) macros; we do not use XSLT transformation. The file site.xsl is out of date and not maintained; see the file site.vsl instead.
  • downloads - needs improvement. It has some old generated files, which should be cleaned up (deleted or moved to /docs/d/). There should also not be two copies, one in docs/ and one in xdocs/. A reasonable design would be for this to exist only in the /docs/ directory and to contain non-generated files for download. Be careful when making changes, because external websites could link to files here.

    This directory holds generated docs for the sandbox projects and for old versions of UIMA from the incubation period (these probably should be removed), plus charts and PDFs for the UIMA track of the 2007 GLDV conference.

  • ip-clearances - this should be moved from /docs/ to a non-published directory. It is an output directory that doesn't really need to be checked into SVN; it's there only so that ip-clearance documentation can be checked before it is committed to the incubator web site (see the Apache IP clearance process).
  • The DOAP file for UIMA. See http://projects.apache.org/doap.html. The file at this location is registered with the Apache infrastructure, so do not move it without updating that registration.

How to SVN checkout the web-site

You can use this trick when checking out the uima-website project to reduce the footprint of what's checked out. The basic idea is to do a full checkout and then discard the /docs/d/ content (the generated Javadocs, etc.) and other large generated directories from the working copy.

Change to the top of the uima-website working copy, and then run:
svn update --set-depth exclude docs/d
svn update --set-depth exclude docs/downloads/gldv
svn update --set-depth exclude docs/downloads/releaseDocs
This will run for a while, but then the directories mentioned and their contents will be deleted from your working copy; this will speed up various SVN operations that have to scan the files for changes, etc. For more details, see http://svnbook.red-bean.com/en/1.6/svn.advanced.sparsedirs.html. If this doesn't work for you, check that your svn version is at least 1.6.
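
The commands above can be wrapped in a small helper. The following is only a sketch under stated assumptions: the function names exclude_generated and restore_path are made up for illustration, and the script is assumed to be run from the top of the uima-website working copy. The restore step relies on Subversion's --set-depth infinity option, which should bring an excluded directory back if you later need it.

```shell
#!/bin/sh
# Sketch: trim large generated directories from a uima-website working copy.
# Run from the top of the working copy. Function names are illustrative only.

# Directories named in the section above as candidates for local exclusion.
GENERATED_DIRS="docs/d docs/downloads/gldv docs/downloads/releaseDocs"

exclude_generated() {
    for dir in $GENERATED_DIRS; do
        # Affects the working copy only; nothing is deleted in the repository.
        svn update --set-depth exclude "$dir" || return 1
    done
}

restore_path() {
    # Brings a previously excluded directory back at full depth.
    svn update --set-depth infinity "$1"
}
```

A typical use would be to call exclude_generated once after the initial checkout, and restore_path docs/d only when you actually need to work on the generated content.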

How to Generate and Publish the web-site

The SVN location uima/site/trunk/uima-website holds the currently published version of the website. The subtree starting at the docs directory is automatically copied to the Apache web server whenever an SVN update is done.

After updating any pages in xdocs/, you must run the build.xml Ant script to convert these changes to the corresponding pages in the docs/ directory. The build.xml script uses the timestamps on the source files to determine which files to process, and regenerates only those files that have changed (or been added).
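
To make the timestamp idea concrete, here is a minimal sketch that previews which source pages look newer than their generated counterparts and then runs the build. It only approximates the check; build.xml remains the authority on what actually gets regenerated. The function names list_stale_pages and regenerate_site are hypothetical, and the sketch assumes that build.xml sits in the current directory and that its default target performs the conversion.

```shell
#!/bin/sh
# Sketch: preview which xdocs/ pages look newer than their docs/ counterparts,
# then run the Ant build. Function names are illustrative only.

list_stale_pages() {
    # $1: source root (e.g. xdocs), $2: output root (e.g. docs)
    src_root=$1
    out_root=$2
    # The stylesheets directory holds macros and project.xml, not pages, so skip it.
    find "$src_root" -name '*.xml' ! -path "$src_root/stylesheets/*" |
    while IFS= read -r src; do
        rel=${src#"$src_root"/}
        out="$out_root/${rel%.xml}.html"
        # Report pages whose HTML is missing or older than the XML source.
        if [ ! -e "$out" ] || [ "$src" -nt "$out" ]; then
            echo "$src"
        fi
    done
}

regenerate_site() {
    # Assumption: the default target of build.xml converts xdocs/ to docs/.
    ant -f build.xml
}
```

Running list_stale_pages xdocs docs before regenerate_site gives a rough idea of which files in docs/ should show up as modified in svn status afterwards.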

Using the stylesheets

There are two kinds of stylesheets. One is in /docs/stylesheets and holds the CSS styling of the website. Change this to tweak the look and feel of the website.

The other is the stylesheets directory in xdocs/. Change the project.xml file there to update the left-hand-side navigation menu and to add details about new releases (these are used to generate the bulk of the download page).

Generated Content Management

Generated content can be large and can contain thousands of files. It needs to be managed with some care to conserve Apache infrastructure resources.

With the new style of web-site management, the only way to get information published is to put it into SVN in our docs/ directory. By convention, generated information goes under docs/d/.

Since most generated information is associated with release versions, it can change with each release. Our website often keeps links to the current release, and maybe one previous release. To keep the consumption of SVN resources minimal, we suggest updating this content as follows:

  • Arrange for files that don't change from release to release to be shared within SVN.
  • Arrange for files that change minimally, and that are "text", to take advantage of SVN's storage of differences between versions.

These two principles mean we should not simply generate a new set of Javadocs, for instance, and commit them as xyz-version-2.4.0/, since this would add all of these files as new ones. Instead, we should do an SVN copy (within SVN) of the previous files, check out that copy, overwrite it with the newly generated files, and commit the result.

A web page with detailed recommendations on how to do this is at http://www.apache.org/dev/project-site.html#generated.
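
The copy-then-overwrite approach described above might look roughly like the following sketch. It is not an official procedure: the function name publish_generated, the commit message, and every URL and directory passed in are placeholders. Note that the server-side copy commits immediately, so the copied files exist in the repository before the new content is committed.

```shell
#!/bin/sh
# Sketch: base a new release's generated docs on a copy of the previous
# release, so unchanged files are shared in SVN instead of being re-added.
# The function name and all arguments are placeholders.

publish_generated() {
    prev_url=$1      # repository URL of the previous release's generated docs
    new_url=$2       # repository URL to create for the new release
    generated=$3     # local directory holding the freshly generated output
    workdir=$4       # scratch directory for the checkout

    # Server-side copy: cheap in SVN, and preserves history for every file.
    svn copy "$prev_url" "$new_url" -m "Copy previous generated docs as base" || return 1

    # Check out only the new copy.
    svn checkout "$new_url" "$workdir" || return 1

    # Overwrite the copied files with the new generation.
    cp -R "$generated"/. "$workdir"/ || return 1

    # Schedule files that are new in this release for addition.
    svn add --force "$workdir" || return 1

    # Review before committing. Files that no longer exist in the new
    # generation still need an explicit svn delete.
    svn status "$workdir"
}
```

After reviewing the status output, an ordinary svn commit in the scratch directory would complete the update; only files that actually differ are stored as changes.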

Download CGI scripting

The main download page, downloads.cgi in docs/, is specially crafted to interface with Apache's mirror system for downloading. The actual download page is a "template" that the CGI script uses, substituting a randomly picked mirror site. Users who want to link to the download page need to link to the .cgi version to get a proper display of the web page.

IP-Clearance documentation

We keep ip-clearance documentation for our project in /xdocs/ip-clearances, so they can be iteratively developed.