Class FileSystemCollectionReader

All Implemented Interfaces:
BaseCollectionReader, CollectionReader, ConfigurableResource, Resource

public class FileSystemCollectionReader extends CollectionReader_ImplBase
A simple collection reader that reads documents from a directory in the filesystem. It can be configured with the following parameters:
  • InputDirectory - path to directory containing files
  • Encoding (optional) - character encoding of the input files
  • Language (optional) - language of the input documents
  • Field Details

    • PARAM_INPUTDIR

      public static final String PARAM_INPUTDIR
      Name of configuration parameter that must be set to the path of a directory containing input files.
    • PARAM_ENCODING

      public static final String PARAM_ENCODING
      Name of configuration parameter that contains the character encoding used by the input files. If not specified, the default system encoding will be used.
    • PARAM_LANGUAGE

      public static final String PARAM_LANGUAGE
      Name of the optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS.
    • PARAM_XCAS

      public static final String PARAM_XCAS
      Name of the optional configuration parameter that specifies XCAS input files.
    • PARAM_LENIENT

      public static final String PARAM_LENIENT
      Name of the configuration parameter that must be set to indicate whether execution proceeds when an unknown type is encountered.
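The PARAM_ENCODING description above says that the default system encoding is used when no encoding is configured. A minimal sketch of that fallback follows, using only the JDK. The class EncodingFallbackSketch and the method resolveEncoding are hypothetical names chosen for illustration and are not part of this class.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of FileSystemCollectionReader.
class EncodingFallbackSketch {

    /**
     * Returns the charset named by the configured value, or the platform
     * default charset when no value was configured (null or empty).
     */
    static Charset resolveEncoding(String configuredName) {
        if (configuredName == null || configuredName.isEmpty()) {
            // No Encoding parameter given: fall back to the system default.
            return Charset.defaultCharset();
        }
        // Throws an unchecked exception if the name is illegal or unsupported.
        return Charset.forName(configuredName);
    }

    public static void main(String[] args) {
        System.out.println(resolveEncoding("UTF-8"));
        System.out.println(resolveEncoding(null));
    }
}
```

Relying on the platform default makes results depend on the machine running the reader, so setting the encoding explicitly is usually the safer choice for shared collections.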
  • Constructor Details

    • FileSystemCollectionReader

      public FileSystemCollectionReader()
  • Method Details

    • initialize

      public void initialize() throws ResourceInitializationException
      Description copied from class: CollectionReader_ImplBase
      This method is called during initialization, and does nothing by default. Subclasses should override it to perform one-time startup logic.
      Overrides:
      initialize in class CollectionReader_ImplBase
      Throws:
      ResourceInitializationException - if a failure occurs during initialization.
    • hasNext

      public boolean hasNext()
      Description copied from interface: BaseCollectionReader
      Gets whether there are any elements remaining to be read from this CollectionReader.
      Returns:
      true if and only if there are more elements available from this CollectionReader.
    • getNext

      public void getNext(CAS aCAS) throws IOException, CollectionException
      Description copied from interface: CollectionReader
      Gets the next element of the collection. The element will be stored in the provided CAS object. If this is a consuming CollectionReader (see BaseCollectionReader.isConsuming()), this element will also be removed from the collection.
      Parameters:
      aCAS - the CAS to populate with the next element of the collection
      Throws:
      IOException - if an I/O failure occurs
      CollectionException - if there is some other problem with reading from the Collection
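The hasNext/getNext contract can be illustrated with a stand-in that uses only the JDK. This is a sketch under stated assumptions, not the implementation of this class: DirectoryReaderSketch and its demo method are hypothetical, getNext returns the document text as a String instead of populating a CAS, and the non-recursive, sorted file listing is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Stream;

// Hypothetical stand-in that mimics the reader's iteration contract.
class DirectoryReaderSketch {
    private final List<Path> files = new ArrayList<>();
    private final Charset encoding;
    private int index = 0;

    DirectoryReaderSketch(Path inputDir, Charset encoding) throws IOException {
        this.encoding = encoding;
        // Assumption: only regular files directly inside the directory are read.
        try (Stream<Path> entries = Files.list(inputDir)) {
            entries.filter(Files::isRegularFile).sorted().forEach(files::add);
        }
    }

    /** True if and only if at least one document has not been returned yet. */
    boolean hasNext() {
        return index < files.size();
    }

    /** Returns the text of the next document and advances the cursor. */
    String getNext() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException("no documents left");
        }
        Path file = files.get(index++);
        return new String(Files.readAllBytes(file), encoding);
    }

    /** Total number of documents found at construction time. */
    int getNumberOfDocuments() {
        return files.size();
    }

    /** Creates two small files in a temporary directory and reads them back. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("docs");
            Files.write(dir.resolve("a.txt"), "first".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("b.txt"), "second".getBytes(StandardCharsets.UTF_8));
            DirectoryReaderSketch reader = new DirectoryReaderSketch(dir, StandardCharsets.UTF_8);
            List<String> documents = new ArrayList<>();
            while (reader.hasNext()) {
                documents.add(reader.getNext());
            }
            return documents;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Callers are expected to check hasNext() before each getNext() call, as the loop in demo does.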
    • close

      public void close() throws IOException
      Description copied from interface: BaseCollectionReader
      Closes this CollectionReader, after which it may no longer be used.
      Throws:
      IOException - if an I/O failure occurs
    • getProgress

      public Progress[] getProgress()
      Description copied from interface: BaseCollectionReader
      Gets information about the number of entities and/or amount of data that has been read from this CollectionReader, and the total amount that remains (if that information is available).

      This method returns an array of Progress objects so that results can be reported using different units. For example, the CollectionReader could report progress in terms of the number of documents that have been read and also in terms of the number of bytes that have been read. In many cases, it will be sufficient to return just one Progress object.

      Returns:
      an array of Progress objects. Each object may have different units (for example number of entities or bytes).
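Reporting progress in more than one unit, as described above, might look like the following sketch. ProgressEntry is a hypothetical stand-in for the Progress type and ProgressSketch.report is an illustrative helper; neither is part of this class, and the unit strings are assumptions.

```java
// Hypothetical stand-ins; the real Progress type is not used here.
class ProgressSketch {

    /** One progress measurement: amount completed, total amount, and unit. */
    static final class ProgressEntry {
        final long completed;
        final long total;
        final String unit;

        ProgressEntry(long completed, long total, String unit) {
            this.completed = completed;
            this.total = total;
            this.unit = unit;
        }

        @Override
        public String toString() {
            return completed + " of " + total + " " + unit;
        }
    }

    /** Reports the same reading position twice: in documents and in bytes. */
    static ProgressEntry[] report(int docsRead, int docsTotal, long bytesRead, long bytesTotal) {
        return new ProgressEntry[] {
            new ProgressEntry(docsRead, docsTotal, "entities"),
            new ProgressEntry(bytesRead, bytesTotal, "bytes")
        };
    }

    public static void main(String[] args) {
        for (ProgressEntry entry : report(3, 10, 2048L, 8192L)) {
            System.out.println(entry);
        }
    }
}
```

A reader that only tracks the number of documents would return a one-element array instead.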
    • getNumberOfDocuments

      public int getNumberOfDocuments()
      Gets the total number of documents that will be returned by this collection reader. This is not part of the general collection reader interface.
      Returns:
      the number of documents in the collection
    • getDescription

      public static CollectionReaderDescription getDescription() throws InvalidXMLException
      Parses and returns the descriptor for this collection reader. The descriptor is stored in the uima.jar file and located using the ClassLoader.
      Returns:
      an object containing all of the information parsed from the descriptor.
      Throws:
      InvalidXMLException - if the descriptor is invalid or missing
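Locating a file through the ClassLoader, as getDescription() is described as doing, generally follows the pattern sketched below. This uses only the JDK; DescriptorLookupSketch and findResource are hypothetical names, and the resource names in main are placeholders rather than the actual descriptor file name, which is not given here.

```java
import java.net.URL;
import java.util.Optional;

// Hypothetical helper showing a class-loader resource lookup.
class DescriptorLookupSketch {

    /**
     * Looks up a resource relative to the package of the anchor class.
     * Returns an empty Optional when the resource cannot be found.
     */
    static Optional<URL> findResource(Class<?> anchor, String name) {
        // Class.getResource returns null for a missing resource.
        return Optional.ofNullable(anchor.getResource(name));
    }

    public static void main(String[] args) {
        // A resource that exists in every JDK: the class file of java.lang.Object.
        System.out.println(findResource(Object.class, "Object.class").isPresent());
        // A placeholder name that does not exist.
        System.out.println(findResource(Object.class, "no-such-descriptor.xml").isPresent());
    }
}
```

A caller would typically turn the empty case into an exception, which matches the documented behavior of reporting a missing descriptor through InvalidXMLException.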
    • getDescriptorURL

      public static URL getDescriptorURL()