TokenProperties
is used to encapsulate information about the characters occurring in a token (for example, upper and lower case).
At its core it is a bitset, with inline member functions for convenient access. Each compliant tokenizer has to fill it in and store it with each token.
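To illustrate the idea of a bitset filled from the characters of a token, here is a minimal sketch. It is not the library's implementation: the class name `SketchTokenProperties`, the helper `classify`, the bit layout, the use of `char16_t` in place of `UChar`, and the ASCII-only character classification are all assumptions made for this example.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Hypothetical stand-in for TokenProperties; names and bit layout are assumptions.
class SketchTokenProperties {
public:
    SketchTokenProperties() = default;  // all bits start at zero

    // Computes the bits for the half-open range [cur, end):
    // end points one past the last character of the token.
    SketchTokenProperties(const char16_t* cur, const char16_t* end) {
        initFromString(cur, end);
    }

    void initFromString(const char16_t* cur, const char16_t* end) {
        bits_.reset();
        for (const char16_t* p = cur; p != end; ++p) {
            const char16_t c = *p;
            if (c >= u'A' && c <= u'Z') {
                bits_.set(p == cur ? kLeadingUpper : kTrailingUpper);
            } else if (c >= u'a' && c <= u'z') {
                bits_.set(kLower);
            } else if (c >= u'0' && c <= u'9') {
                bits_.set(kNumeric);
            } else {
                bits_.set(kSpecial);  // ASCII-only simplification
            }
        }
    }

    bool hasLeadingUpper() const { return bits_.test(kLeadingUpper); }
    bool hasTrailingUpper() const { return bits_.test(kTrailingUpper); }
    bool hasUpper() const { return hasLeadingUpper() || hasTrailingUpper(); }
    bool hasLower() const { return bits_.test(kLower); }
    bool hasNumeric() const { return bits_.test(kNumeric); }
    bool hasSpecial() const { return bits_.test(kSpecial); }

private:
    enum Bit { kLeadingUpper, kTrailingUpper, kLower, kNumeric, kSpecial, kBitCount };
    std::bitset<kBitCount> bits_;
};

// Hypothetical convenience wrapper around the two-pointer constructor.
inline SketchTokenProperties classify(const std::u16string& s) {
    return SketchTokenProperties(s.data(), s.data() + s.size());
}
```

With this sketch, `classify(u"Hello")` reports a leading upper case char and lower case chars, while `classify(u"x-1")` reports lower case, numeric and special chars.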
Public Member Functions | |
Constructors | |
TokenProperties (void) | |
Constructs an object, initializing all bit values to zero. | |
TokenProperties (const icu::UnicodeString &ustrInputString) | |
Constructs an object from an icu::UnicodeString, computing the bit values for the string. | |
TokenProperties (const UnicodeStringRef &ulstrInputString) | |
Constructs an object from a UnicodeStringRef, computing the bit values for the string. | |
TokenProperties (const UChar *cpucCurrent, const UChar *cpucEnd) | |
Constructs an object from two pointers, computing the bit values for the string. | |
TokenProperties (WORD32 w32Val) | |
Initializes the bits to the value of w32Val. | |
Properties | |
bool | hasLeadingUpper (void) const |
true if the first char in the token is upper case | |
void | setLeadingUpper (bool bSetOn=true) |
sets the hasLeadingUpper() property to bSetOn | |
bool | hasTrailingUpper (void) const |
true if some char after the first char in the token is upper case | |
void | setTrailingUpper (bool bSetOn=true) |
sets the hasTrailingUpper() property to bSetOn | |
bool | hasUpper (void) const |
true if the token has upper case chars (leading or trailing) | |
bool | hasLower (void) const |
true if the token has lower case chars | |
void | setLower (bool bSetOn=true) |
sets the hasLower() property to bSetOn | |
bool | hasNumeric (void) const |
true if the token has numeric chars | |
void | setNumeric (bool bSetOn=true) |
sets the hasNumeric() property to bSetOn | |
bool | hasSpecial (void) const |
true if the token has special chars (e.g. hyphen, period etc.) | |
void | setSpecial (bool bSetOn=true) |
sets the hasSpecial() property to bSetOn | |
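The has/set pairs listed above can be pictured as thin wrappers over single bits, with hasUpper() derived from the two upper case bits. The following is a sketch under assumptions: `SketchFlags` and its bit positions are hypothetical, and only the leading and trailing upper case properties are shown.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical sketch of the has/set accessor pattern; not the library's code.
class SketchFlags {
public:
    bool hasLeadingUpper() const { return bits_.test(kLeadingUpper); }
    void setLeadingUpper(bool bSetOn = true) { bits_.set(kLeadingUpper, bSetOn); }

    bool hasTrailingUpper() const { return bits_.test(kTrailingUpper); }
    void setTrailingUpper(bool bSetOn = true) { bits_.set(kTrailingUpper, bSetOn); }

    // Derived property: true if either upper case bit is set.
    bool hasUpper() const { return hasLeadingUpper() || hasTrailingUpper(); }

private:
    enum Bit { kLeadingUpper, kTrailingUpper, kBitCount };  // assumed layout
    std::bitset<kBitCount> bits_;
};
```

Calling a setter without an argument turns the property on; passing `false` clears it again.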
Miscellaneous | |
bool | isPlainWord () const |
true if not hasSpecial() and not hasNumeric() | |
bool | isAllUppercaseWord (void) const |
true if only hasUpper() | |
bool | isAllLowercaseWord (void) const |
true if only hasLower() | |
bool | isInitialUppercaseWord (void) const |
true if only hasLeadingUpper() and hasTrailingUpper() | |
bool | isPlainNumber () const |
true if hasNumeric() && !(hasLower() || hasUpper()) Note: this might have decimal point and sign | |
bool | isPureNumber () const |
unlike isPlainNumber() this only allows for digits (no sign and point) | |
bool | isPureSpecial () const |
true if hasSpecial() && !(hasLower() || hasUpper() || hasNumeric()) Note: this might have decimal point and sign | |
void | reset (void) |
Resets all bits in *this. | |
void | initFromString (const UChar *cpucCurrent, const UChar *cpucEnd) |
Resets all bits and reinitializes from the string. | |
std::string | to_string (void) const |
Returns an object of type string, N characters long. | |
unsigned long | to_ulong (void) const |
Returns the integral value corresponding to the bits in *this. |
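The three predicates whose conditions are spelled out in the table (isPlainWord(), isPlainNumber(), isPureSpecial()) can be written directly from those conditions. The sketch below does so over a plain struct of booleans; the struct `Flags` and the free-function form are assumptions chosen to keep the example short, not the class's real layout.

```cpp
#include <cassert>

// Hypothetical flag record; the real class stores these as bits.
struct Flags {
    bool leadingUpper;
    bool trailingUpper;
    bool lower;
    bool numeric;
    bool special;
};

inline bool hasUpper(const Flags& f) { return f.leadingUpper || f.trailingUpper; }

// true if not hasSpecial() and not hasNumeric()
inline bool isPlainWord(const Flags& f) { return !f.special && !f.numeric; }

// true if hasNumeric() && !(hasLower() || hasUpper());
// special chars such as a sign or decimal point are not excluded.
inline bool isPlainNumber(const Flags& f) {
    return f.numeric && !(f.lower || hasUpper(f));
}

// true if hasSpecial() && !(hasLower() || hasUpper() || hasNumeric())
inline bool isPureSpecial(const Flags& f) {
    return f.special && !(f.lower || hasUpper(f) || f.numeric);
}
```

Taken literally, the isPlainWord() condition also holds when no flag is set at all, so callers that care about empty tokens may want a separate check.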
Member Function Documentation | |
TokenProperties (const UChar *cpucCurrent, const UChar *cpucEnd) | |
Constructs an object from two pointers, computing the bit values for the string. Note: cpucEnd points beyond the end of the string. | |
std::string | to_string (void) const |
Returns an object of type string, N characters long. Each position in the new string is initialized with a character ('0' for zero and '1' for one) representing the value stored in the corresponding bit position of *this. Character position N - 1 corresponds to bit position 0. Subsequent decreasing character positions correspond to increasing bit positions. | |
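The character ordering described for to_string() matches the behaviour of std::bitset, so it can be illustrated with the standard class. Whether TokenProperties forwards to std::bitset internally is an assumption here, and the width of 5 bits and the helper `makeBits` are chosen only for the example.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Builds a 5-bit set with bit positions 0 and 3 turned on (hypothetical values).
inline std::bitset<5> makeBits() {
    std::bitset<5> bits;
    bits.set(0);
    bits.set(3);
    return bits;
}

// Returns the character that represents bit position `pos` in to_string():
// character position N - 1 - pos, where N is the number of bits.
inline char charForBit(const std::bitset<5>& bits, std::size_t pos) {
    const std::string s = bits.to_string();
    return s[s.size() - 1 - pos];
}
```

For the value built by `makeBits()`, `to_string()` yields "01001" (bit 0 is the last character) and `to_ulong()` yields 9.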