
uima::TokenProperties Class Reference


Detailed Description

The class TokenProperties encapsulates information about the kinds of characters occurring in a token (for example, whether it contains upper- or lowercase letters).

At its core it is a bitset, with inline member functions for convenient access. Every compliant tokenizer must fill in a TokenProperties object for each token and store it with that token.
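As a rough illustration of the "bitset with inline accessors" design, the sketch below models the five basic properties with `std::bitset`. It is a hypothetical stand-in (the class name `TokenPropsSketch`, the `Bit` enum and the bit positions are invented here), not the UIMACPP implementation. It assumes, as the member descriptions state, that hasUpper() is the OR of the leading- and trailing-uppercase bits.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical stand-in for uima::TokenProperties; the bit positions are assumptions.
class TokenPropsSketch {
public:
  enum Bit : std::size_t { LeadingUpper, TrailingUpper, Lower, Numeric, Special, Count };

  // Inline getters/setters over single bits; setters default to "on", like bSetOn = true.
  bool hasLeadingUpper() const { return bits_.test(LeadingUpper); }
  void setLeadingUpper(bool on = true) { bits_.set(LeadingUpper, on); }
  bool hasTrailingUpper() const { return bits_.test(TrailingUpper); }
  void setTrailingUpper(bool on = true) { bits_.set(TrailingUpper, on); }
  bool hasLower() const { return bits_.test(Lower); }
  void setLower(bool on = true) { bits_.set(Lower, on); }
  bool hasNumeric() const { return bits_.test(Numeric); }
  void setNumeric(bool on = true) { bits_.set(Numeric, on); }
  bool hasSpecial() const { return bits_.test(Special); }
  void setSpecial(bool on = true) { bits_.set(Special, on); }

  // Derived property with no bit of its own: upper case anywhere in the token.
  bool hasUpper() const { return hasLeadingUpper() || hasTrailingUpper(); }

  // Clears every bit.
  void reset() { bits_.reset(); }

private:
  std::bitset<Count> bits_;
};
```

Keeping hasUpper() derived rather than stored means the leading and trailing bits can never disagree with it.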



Public Member Functions

Constructors
 TokenProperties (void)
 Constructs an object, initializing all bit values to zero.
 TokenProperties (const icu::UnicodeString &ustrInputString)
 Constructs an object from an icu::UnicodeString, computing the bit values for the string.
 TokenProperties (const UnicodeStringRef &ulstrInputString)
 Constructs an object from a UnicodeStringRef, computing the bit values for the string.
 TokenProperties (const UChar *cpucCurrent, const UChar *cpucEnd)
 Constructs an object from two pointers delimiting a string, computing the bit values for the string.
 TokenProperties (WORD32 w32Val)
 Initializes the bits to the value of w32Val.
Properties
bool hasLeadingUpper (void) const
 true if the first char in the token is upper case
void setLeadingUpper (bool bSetOn=true)
 sets the hasLeadingUpper() property to bSetOn
bool hasTrailingUpper (void) const
 true if some char after the first char in the token is upper case
void setTrailingUpper (bool bSetOn=true)
 sets the hasTrailingUpper() property to bSetOn
bool hasUpper (void) const
 true if the token has upper case chars (leading or trailing)
bool hasLower (void) const
 true if the token has lower case chars
void setLower (bool bSetOn=true)
 sets the hasLower() property to bSetOn
bool hasNumeric (void) const
 true if the token has numeric chars
void setNumeric (bool bSetOn=true)
 sets the hasNumeric() property to bSetOn
bool hasSpecial (void) const
 true if the token has special chars (e.g. hyphen, period etc.)
void setSpecial (bool bSetOn=true)
 sets the hasSpecial() property to bSetOn
Miscellaneous
bool isPlainWord () const
 true if not hasSpecial() and not hasNumeric()
bool isAllUppercaseWord (void) const
 true if the token contains only upper case chars (hasUpper() and no other property)
bool isAllLowercaseWord (void) const
 true if the token contains only lower case chars (hasLower() and no other property)
bool isInitialUppercaseWord (void) const
 true if only hasLeadingUpper() and hasTrailingUpper()
bool isPlainNumber () const
 true if hasNumeric() && !(hasLower() || hasUpper()). Note: the token may also contain a decimal point and a sign.
bool isPureNumber () const
 Unlike isPlainNumber(), this allows only digits (no sign or decimal point).
bool isPureSpecial () const
 true if hasSpecial() && !(hasLower() || hasUpper() || hasNumeric()). Note: the token may include a decimal point or a sign.
void reset (void)
 Resets all bits in *this to zero.
void initFromString (const UChar *cpucCurrent, const UChar *cpucEnd)
 Resets all bits and reinitializes from the string.
std::string to_string (void) const
 Returns a string with one '0' or '1' character per bit.
unsigned long to_ulong (void) const
 Returns the integral value corresponding to the bits in *this.


Constructor & Destructor Documentation

uima::TokenProperties::TokenProperties(void) [inline]
 

Constructs an object, initializing all bit values to zero.

uima::TokenProperties::TokenProperties(const icu::UnicodeString & ustrInputString)
 

Constructs an object from an icu::UnicodeString, computing the bit values for the string.

uima::TokenProperties::TokenProperties(const UnicodeStringRef ulstrInputString)
 

Constructs an object from a UnicodeStringRef, computing the bit values for the string.

uima::TokenProperties::TokenProperties(const UChar * cpucCurrent, const UChar * cpucEnd)
 

Constructs an object from two pointers delimiting a string, computing the bit values for the string.

Note: cpucEnd points one past the last character of the string (a half-open range).
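The two-pointer form follows the usual STL convention: cpucCurrent addresses the first character and cpucEnd the position just after the last one, so an empty string has cpucCurrent == cpucEnd. The hypothetical helper `classifyRange` below sketches how such a range could be scanned into the five flags. It is an illustration only: it uses `char16_t` in place of ICU's `UChar`, invents the bit positions, and classifies ASCII characters only, whereas the real class presumably relies on Unicode character properties.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical bit positions; not the UIMACPP layout.
enum PropBit : std::size_t { kLeadingUpper, kTrailingUpper, kLower, kNumeric, kSpecial, kNumBits };

// Scans the half-open range [begin, end); end is one past the last character.
// ASCII-only classification, a simplification of Unicode character classes.
std::bitset<kNumBits> classifyRange(const char16_t* begin, const char16_t* end) {
  std::bitset<kNumBits> bits;
  for (const char16_t* p = begin; p != end; ++p) {
    const char16_t c = *p;
    if (c >= u'A' && c <= u'Z') {
      // Only the first character counts as "leading".
      bits.set(p == begin ? kLeadingUpper : kTrailingUpper);
    } else if (c >= u'a' && c <= u'z') {
      bits.set(kLower);
    } else if (c >= u'0' && c <= u'9') {
      bits.set(kNumeric);
    } else {
      bits.set(kSpecial);  // hyphen, period, etc. (and, in this sketch, any non-ASCII char)
    }
  }
  return bits;
}
```

For `u"McDonald"` (8 characters) the call `classifyRange(w, w + 8)` sets the leading-upper, trailing-upper and lower bits; passing the same pointer twice yields an empty set.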

uima::TokenProperties::TokenProperties(WORD32 w32Val) [inline]
 

Initializes the bits to the value of w32Val.


Member Function Documentation

bool uima::TokenProperties::hasLeadingUpper(void) const [inline]
 

true if the first char in the token is upper case

void uima::TokenProperties::setLeadingUpper(bool bSetOn = true) [inline]
 

sets the hasLeadingUpper() property to bSetOn

bool uima::TokenProperties::hasTrailingUpper(void) const [inline]
 

true if some char after the first char in the token is upper case

void uima::TokenProperties::setTrailingUpper(bool bSetOn = true) [inline]
 

sets the hasTrailingUpper() property to bSetOn

bool uima::TokenProperties::hasUpper(void) const [inline]
 

true if the token has upper case chars (leading or trailing)

bool uima::TokenProperties::hasLower(void) const [inline]
 

true if the token has lower case chars

void uima::TokenProperties::setLower(bool bSetOn = true) [inline]
 

sets the hasLower() property to bSetOn

bool uima::TokenProperties::hasNumeric(void) const [inline]
 

true if the token has numeric chars

void uima::TokenProperties::setNumeric(bool bSetOn = true) [inline]
 

sets the hasNumeric() property to bSetOn

bool uima::TokenProperties::hasSpecial(void) const [inline]
 

true if the token has special chars (e.g. hyphen, period etc.)

void uima::TokenProperties::setSpecial(bool bSetOn = true) [inline]
 

sets the hasSpecial() property to bSetOn

bool uima::TokenProperties::isPlainWord() const [inline]
 

true if not hasSpecial() and not hasNumeric()

bool uima::TokenProperties::isAllUppercaseWord(void) const [inline]
 

true if the token contains only upper case chars (hasUpper() and no other property)

bool uima::TokenProperties::isAllLowercaseWord(void) const [inline]
 

true if the token contains only lower case chars (hasLower() and no other property)
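The word predicates can be read as Boolean combinations of the basic properties. The sketch below spells out one reading of "only": an all-uppercase word has upper case chars and no lower case, numeric or special chars. The `Props` struct and the free functions are hypothetical; the inline implementations in UIMACPP may differ in detail.

```cpp
// Hypothetical plain-struct view of the five basic properties.
struct Props {
  bool leadingUpper = false;
  bool trailingUpper = false;
  bool lower = false;
  bool numeric = false;
  bool special = false;
  bool hasUpper() const { return leadingUpper || trailingUpper; }
};

// true if not special and not numeric
bool isPlainWord(const Props& p) { return !p.special && !p.numeric; }

// Assumed reading of "only hasUpper()": upper case and nothing else.
bool isAllUppercaseWord(const Props& p) {
  return p.hasUpper() && !p.lower && !p.numeric && !p.special;
}

// Assumed reading of "only hasLower()": lower case and nothing else.
bool isAllLowercaseWord(const Props& p) {
  return p.lower && !p.hasUpper() && !p.numeric && !p.special;
}
```

Under this reading a token such as "IBM" (leading and trailing upper only) is both a plain word and an all-uppercase word, while "abc1" is neither a plain word nor an all-lowercase word.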

bool uima::TokenProperties::isInitialUppercaseWord(void) const [inline]
 

true if only hasLeadingUpper() and hasTrailingUpper()

bool uima::TokenProperties::isPlainNumber() const [inline]
 

true if hasNumeric() && !(hasLower() || hasUpper()). Note: the token may also contain a decimal point and a sign.

bool uima::TokenProperties::isPureNumber() const [inline]
 

Unlike isPlainNumber(), this allows only digits (no sign or decimal point).
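Read together, the descriptions of isPlainNumber() and isPureNumber() suggest that a sign or decimal point is recorded as a special character. Under that assumption (not confirmed by this page), the two predicates differ only in whether hasSpecial() is allowed. The `NumFlags` struct and both functions below are a hypothetical sketch, shown for tokens like "42", "-3.5" and "4x4".

```cpp
// Hypothetical flags for a token, mirroring hasUpper()/hasLower()/hasNumeric()/hasSpecial().
struct NumFlags {
  bool upper = false;
  bool lower = false;
  bool numeric = false;
  bool special = false;
};

// Digits, possibly with a sign or decimal point (recorded as special), but no letters.
bool isPlainNumber(const NumFlags& f) { return f.numeric && !(f.lower || f.upper); }

// Assumption: "only digits" means no special characters either.
bool isPureNumber(const NumFlags& f) { return isPlainNumber(f) && !f.special; }
```

With these definitions "42" satisfies both predicates, "-3.5" is a plain but not a pure number, and "4x4" is neither.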

bool uima::TokenProperties::isPureSpecial() const [inline]
 

true if hasSpecial() && !(hasLower() || hasUpper() || hasNumeric()). Note: the token may include a decimal point or a sign.

void uima::TokenProperties::reset(void) [inline]
 

Resets all bits in *this to zero.

void uima::TokenProperties::initFromString(const UChar * cpucCurrent, const UChar * cpucEnd)
 

Resets all bits and reinitializes from the string.
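Because initFromString() resets all bits before rescanning, properties of a previously analyzed string cannot leak into the new result, so one object can be reused across tokens. The hypothetical `ReusableProps` below is a deliberately reduced sketch (two flags, ASCII-only, `char16_t` instead of `UChar`, all names invented) of that reset-then-scan order over the half-open range [begin, end).

```cpp
#include <bitset>

// Hypothetical two-flag version: bit 0 = has lower case, bit 1 = has digit.
struct ReusableProps {
  std::bitset<2> bits;

  // Mirrors the documented behaviour: clear everything, then scan [begin, end).
  void initFromString(const char16_t* begin, const char16_t* end) {
    bits.reset();
    for (const char16_t* p = begin; p != end; ++p) {
      if (*p >= u'a' && *p <= u'z') bits.set(0);
      if (*p >= u'0' && *p <= u'9') bits.set(1);
    }
  }
};
```

Reinitializing an object that first saw "abc" with "42" leaves only the digit bit set; without the leading reset(), the stale lower-case bit would remain.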

std::string uima::TokenProperties::to_string(void) const
 

Returns a std::string with one character per bit, that is, N characters long for N bits.

Each position in the new string holds a character ('0' for zero and '1' for one) representing the value stored in the corresponding bit position of *this. Character position N - 1 corresponds to bit position 0, and each preceding character position corresponds to the next higher bit position.
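This is the same ordering that `std::bitset::to_string()` uses: the highest bit comes first and bit 0 is the last character. The snippet below demonstrates the convention with a plain `std::bitset<8>`; the width 8 is arbitrary, and the actual N for TokenProperties depends on its internal bit count, which this page does not state.

```cpp
#include <bitset>
#include <string>

// Demonstrates the '0'/'1' layout: bit 0 maps to the last character (position N - 1).
std::string bitsToString() {
  std::bitset<8> bits;
  bits.set(0);              // bit 0 -> character position 8 - 1 - 0 = 7
  bits.set(3);              // bit 3 -> character position 8 - 1 - 3 = 4
  return bits.to_string();  // "00001001"
}
```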

unsigned long uima::TokenProperties::to_ulong(void) const [inline]
 

Returns the integral value corresponding to the bits in *this.
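Together with the WORD32 constructor, to_ulong() allows a property set to be stored as a plain integer and restored later. The sketch below shows that round trip with `std::bitset<32>` standing in for the class and `std::uint32_t` standing in for WORD32; both substitutions are assumptions made for illustration.

```cpp
#include <bitset>
#include <cstdint>

// Stand-in for constructing TokenProperties(WORD32) and then calling to_ulong().
unsigned long roundTrip(std::uint32_t w32Val) {
  std::bitset<32> bits(w32Val);  // initialize the bits from the integer value
  return bits.to_ulong();        // integral value of the same bits
}
```

Any 32-bit value passes through unchanged, which is what makes the integer form usable for storing token properties compactly.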


The documentation for this class was generated from the following file:
Generated on Mon Oct 1 16:04:14 2012 for UIMACPP API by  doxygen 1.3.9.1