
uima::UnicodeStringRef Class Reference

Detailed Description

The class UnicodeStringRef provides support for non-zero-terminated strings that are represented as pointers to Unicode character arrays with an associated length.

Because this type of string is meant to be used only as a reference into read-only buffers, the string pointer is constant. The member functions are named after the icu::UnicodeString interface, but only const member functions are provided. This class is a quick, lightweight, shallow string (internally it consists only of a pointer and a length) that can be copied by value without performance penalty. It allows references into other string buffers to be treated like real string objects. Since it does not own its string memory, care must be taken to ensure that the lifetime of a UnicodeStringRef object does not exceed the lifetime of the Unicode character buffer it references.
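The pointer-plus-length idea described above can be illustrated with a minimal sketch. The type `Utf16Ref` and the helper `makeRef` are hypothetical stand-ins written for this illustration; they are not uima::UnicodeStringRef, and `char16_t` is assumed here in place of ICU's `UChar`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for illustration only; this is NOT uima::UnicodeStringRef.
struct Utf16Ref {
    const char16_t* data;    // not owned, not necessarily zero-terminated
    std::size_t     length;  // number of UTF-16 code units

    Utf16Ref(const char16_t* p, std::size_t n) : data(p), length(n) {}
};

// Builds a reference to the sub-range [start, start + count) of an existing
// buffer. No characters are copied, so the result is only valid while
// `buffer` is alive and unmodified.
inline Utf16Ref makeRef(const std::u16string& buffer,
                        std::size_t start, std::size_t count) {
    assert(start <= buffer.size() && count <= buffer.size() - start);
    return Utf16Ref(buffer.data() + start, count);
}
```

Copying such an object copies only the pointer and the length, which is why passing it by value is cheap. It is also why the referenced buffer must outlive every copy.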

Public Member Functions

UnicodeStringRef (void)

Default Constructor.

UnicodeStringRef (const icu::UnicodeString &crUniString)

Constructor from icu::UnicodeString.

UnicodeStringRef (UChar const *cpacString)

Constructor from zero terminated string.

UnicodeStringRef (UChar const *cpacString, int32_t uiLength)

Constructor from string and length.

UnicodeStringRef (UChar const *paucStringBegin, UChar const *paucStringEnd)

Constructor from two pointers (begin/end).

int32_t getSizeInBytes (void) const

Accessor for the number of bytes occupied by this string.

UChar const * getBuffer (void) const

CONST Accessor for the string content (NOT ZERO DELIMITED!).

UnicodeStringRef & operator= (UnicodeStringRef const &crclRHS)

Assignment operator.

int operator== (const UnicodeStringRef &crclRHS) const

Equality operator.

int operator!= (const UnicodeStringRef &crclRHS) const

Inequality operator.

bool operator< (UnicodeStringRef const &text) const

Less-than operator.

bool operator<= (UnicodeStringRef const &text) const

Less-than-or-equal operator.

bool operator> (UnicodeStringRef const &text) const

Greater-than operator.

bool operator>= (UnicodeStringRef const &text) const

Greater-than-or-equal operator.

int8_t compare (const UnicodeStringRef &text) const

Compare the characters bitwise in this UnicodeStringRef to the characters in text.

int8_t compare (const icu::UnicodeString &text) const

Compare the characters bitwise in this UnicodeStringRef to the characters in text.

int8_t compare (int32_t start, int32_t length, const UnicodeStringRef &srcText) const

Compare the characters bitwise in the range [start, start + length) with the characters in srcText.

int8_t compare (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const

Compare the characters bitwise in the range [start, start + length) with the characters in srcText in the range [srcStart, srcStart + srcLength).

int8_t compare (UChar const *srcChars, int32_t srcLength) const

Compare the characters bitwise in this UnicodeStringRef with the first srcLength characters in srcChars.

int8_t compare (int32_t start, int32_t length, UChar const *srcChars) const

Compare the characters bitwise in the range [start, start + length) with the first length characters in srcChars.

int8_t compare (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength) const

Compare the characters bitwise in the range [start, start + length) with the characters in srcChars in the range [srcStart, srcStart + srcLength).

int8_t compareBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit) const

Compare the characters bitwise in the range [start, limit) with the characters in srcText in the range [srcStart, srcLimit).

int8_t compareCodePointOrder (const UnicodeStringRef &text) const

Compare two Unicode strings in code point order.

int8_t compareCodePointOrder (int32_t start, int32_t length, const UnicodeStringRef &srcText) const

Compare two Unicode strings in code point order.

int8_t compareCodePointOrder (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const

Compare two Unicode strings in code point order.

int8_t compareCodePointOrder (UChar const *srcChars, int32_t srcLength) const

Compare two Unicode strings in code point order.

int8_t compareCodePointOrder (int32_t start, int32_t length, UChar const *srcChars) const

Compare two Unicode strings in code point order.

int8_t compareCodePointOrder (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength) const

Compare two Unicode strings in code point order.

int8_t compareCodePointOrderBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit) const

Compare two Unicode strings in code point order.

int8_t caseCompare (const UnicodeStringRef &text, uint32_t options) const

Compare two strings case-insensitively using full case folding.

int8_t caseCompare (int32_t start, int32_t length, const UnicodeStringRef &srcText, uint32_t options) const

Compare two strings case-insensitively using full case folding.

int8_t caseCompare (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, uint32_t options) const

Compare two strings case-insensitively using full case folding.

int8_t caseCompare (UChar const *srcChars, int32_t srcLength, uint32_t options) const

Compare two strings case-insensitively using full case folding.

int8_t caseCompare (int32_t start, int32_t length, UChar const *srcChars, uint32_t options) const

Compare two strings case-insensitively using full case folding.

int8_t caseCompare (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength, uint32_t options) const

Compare two strings case-insensitively using full case folding.

int8_t caseCompareBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit, uint32_t options) const

Compare two strings case-insensitively using full case folding.

bool startsWith (const UnicodeStringRef &text) const

Determine if this starts with the characters in text.

bool startsWith (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const

Determine if this starts with the characters in srcText in the range [srcStart, srcStart + srcLength).

bool startsWith (UChar const *srcChars, int32_t srcLength) const

Determine if this starts with the characters in srcChars.

bool startsWith (UChar const *srcChars, int32_t srcStart, int32_t srcLength) const

Determine if this starts with the characters in srcChars in the range [srcStart, srcStart + srcLength).

bool endsWith (const UnicodeStringRef &text) const

Determine if this ends with the characters in text.

bool endsWith (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const

Determine if this ends with the characters in srcText in the range [srcStart, srcStart + srcLength).

bool endsWith (UChar const *srcChars, int32_t srcLength) const

Determine if this ends with the characters in srcChars.

bool endsWith (UChar const *srcChars, int32_t srcStart, int32_t srcLength) const

Determine if this ends with the characters in srcChars in the range [srcStart, srcStart + srcLength).

int32_t indexOf (const UnicodeStringRef &text) const

Locate in this the first occurrence of the characters in text, using bitwise comparison.

int32_t indexOf (const UnicodeStringRef &text, int32_t start) const

Locate in this the first occurrence of the characters in text starting at offset start, using bitwise comparison.

int32_t indexOf (const UnicodeStringRef &text, int32_t start, int32_t length) const

Locate in this the first occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.

int32_t indexOf (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const

Locate in this the first occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.

int32_t indexOf (UChar const *srcChars, int32_t srcLength, int32_t start) const

Locate in this the first occurrence of the characters in srcChars starting at offset start, using bitwise comparison.

int32_t indexOf (UChar const *srcChars, int32_t srcLength, int32_t start, int32_t length) const

Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.

int32_t indexOf (UChar const *srcChars, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const

Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.

int32_t indexOf (UChar c) const

Locate in this the first occurrence of the code unit c, using bitwise comparison.

int32_t indexOf (UChar32 c) const

Locate in this the first occurrence of the code point c, using bitwise comparison.

int32_t indexOf (UChar c, int32_t start) const

Locate in this the first occurrence of the code unit c starting at offset start, using bitwise comparison.

int32_t indexOf (UChar32 c, int32_t start) const

Locate in this the first occurrence of the code point c starting at offset start, using bitwise comparison.

int32_t indexOf (UChar c, int32_t start, int32_t length) const

Locate in this the first occurrence of the code unit c in the range [start, start + length), using bitwise comparison.

int32_t indexOf (UChar32 c, int32_t start, int32_t length) const

Locate in this the first occurrence of the code point c in the range [start, start + length), using bitwise comparison.

int32_t lastIndexOf (const UnicodeStringRef &text) const

Locate in this the last occurrence of the characters in text, using bitwise comparison.

int32_t lastIndexOf (const UnicodeStringRef &text, int32_t start) const

Locate in this the last occurrence of the characters in text starting at offset start, using bitwise comparison.

int32_t lastIndexOf (const UnicodeStringRef &text, int32_t start, int32_t length) const

Locate in this the last occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.

int32_t lastIndexOf (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const

Locate in this the last occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.

int32_t lastIndexOf (UChar const *srcChars, int32_t srcLength, int32_t start) const

Locate in this the last occurrence of the characters in srcChars starting at offset start, using bitwise comparison.

int32_t lastIndexOf (UChar const *srcChars, int32_t srcLength, int32_t start, int32_t length) const

Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.

int32_t lastIndexOf (UChar const *srcChars, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const

Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.

int32_t lastIndexOf (UChar c) const

Locate in this the last occurrence of the code unit c, using bitwise comparison.

int32_t lastIndexOf (UChar32 c) const

Locate in this the last occurrence of the code point c, using bitwise comparison.

int32_t lastIndexOf (UChar c, int32_t start) const

Locate in this the last occurrence of the code unit c starting at offset start, using bitwise comparison.

int32_t lastIndexOf (UChar32 c, int32_t start) const

Locate in this the last occurrence of the code point c starting at offset start, using bitwise comparison.

int32_t lastIndexOf (UChar c, int32_t start, int32_t length) const

Locate in this the last occurrence of the code unit c in the range [start, start + length), using bitwise comparison.

int32_t lastIndexOf (UChar32 c, int32_t start, int32_t length) const

Locate in this the last occurrence of the code point c in the range [start, start + length), using bitwise comparison.

UChar charAt (int32_t offset) const

Return the code unit at offset offset.

UChar operator[] (int32_t offset) const

Return the code unit at offset offset.

UChar32 char32At (int32_t offset) const

Return the code point that contains the code unit at offset offset.

int32_t getChar32Start (int32_t offset) const

Adjust a random-access offset so that it points to the beginning of a Unicode character.

int32_t getChar32Limit (int32_t offset) const

Adjust a random-access offset so that it points behind a Unicode character.

int32_t moveIndex32 (int32_t index, int32_t delta) const

Move the code unit index along the string by delta code points.

void extract (int32_t start, int32_t length, UChar *dst, int32_t dstStart=0) const

Copy the characters in the range [start, start + length) into the array dst, beginning at dstStart.

void extractBetween (int32_t start, int32_t limit, UChar *dst, int32_t dstStart=0) const

Copy the characters in the range [start, limit) into the array dst, beginning at dstStart.

int32_t extract (UChar *dst, int32_t dstCapacity, UErrorCode &errorCode) const

Copy the contents of the string into dst.

void extract (int32_t start, int32_t length, UnicodeString &dst) const

Copy the characters in the range [start, start + length) into the UnicodeString dst.

void extractBetween (int32_t start, int32_t limit, UnicodeString &dst) const

Copy the characters in the range [start, limit) into the UnicodeString dst.

int32_t extract (int32_t start, int32_t startLength, char *target, const char *codepage=0) const

Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.

int32_t extract (int32_t start, int32_t startLength, char *target, uint32_t targetLength, const char *codepage=0) const

Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.

int32_t extract (char *target, int32_t targetCapacity, UConverter *cnv, UErrorCode &errorCode) const

Convert the UnicodeStringRef into a codepage string using an existing UConverter.

int32_t extract (int32_t start, int32_t startLength, std::string &target, const char *codepage=0) const

Copy the characters in the range [start, start + startLength) into a std::string object in a specified codepage.

int32_t extract (std::string &target, const char *codepage=0) const

Copy all the characters in the string into an std::string object in a specified codepage.

int32_t extractUTF8 (std::string &target) const

Copy all the characters in the string into an std::string object in UTF-8.

std::string asUTF8 (void) const

Convert to a UTF8 string.

int32_t length (void) const

Return the length of the UnicodeStringRef object.

int32_t countChar32 (int32_t start=0, int32_t length=0x7fffffff) const

Count Unicode code points in the length UChar code units of the string.

bool isEmpty (void) const

Determine if this string is empty.

UnicodeStringRef & setTo (const UnicodeStringRef &srcText)

Set the text in the UnicodeStringRef object to the characters in srcText.

UnicodeStringRef & setTo (const UnicodeString &srcText)

Set the text in the UnicodeStringRef object to the characters in srcText.

UnicodeStringRef & setTo (const UChar *srcChars, int32_t srcLength)

Set the characters in the UnicodeStringRef object to the characters in srcChars.

void toSingleByteStream (std::ostream &outStream) const

Print a single byte version to outStream.

Static Public Member Functions

void release (std::string &target)

Release the contents of a string container allocated by the extract methods. Useful when caller and callee use different heaps, e.g.

Constructor & Destructor Documentation

uima::UnicodeStringRef::UnicodeStringRef ( void ) [inline]

Default Constructor.

uima::UnicodeStringRef::UnicodeStringRef ( const icu::UnicodeString & crUniString ) [inline]

Constructor from icu::UnicodeString.

uima::UnicodeStringRef::UnicodeStringRef ( UChar const * cpacString ) [inline, explicit]

Constructor from zero terminated string.

uima::UnicodeStringRef::UnicodeStringRef ( UChar const * cpacString,

int32_t uiLength

) [inline]

Constructor from string and length.

uima::UnicodeStringRef::UnicodeStringRef ( UChar const * paucStringBegin,

UChar const * paucStringEnd

) [inline]

Constructor from two pointers (begin/end).
Note: end points to the first character past the end of the string.

Deprecated:
Replace with UnicodeStringRef(paucStringBegin,paucStringEnd-paucStringBegin).

Member Function Documentation

int32_t uima::UnicodeStringRef::getSizeInBytes ( void ) const [inline]

Accessor for the number of bytes occupied by this string.

UChar const * uima::UnicodeStringRef::getBuffer ( void ) const [inline]

CONST Accessor for the string content (NOT ZERO DELIMITED!).

UnicodeStringRef & uima::UnicodeStringRef::operator= ( UnicodeStringRef const & crclRHS ) [inline]

Assignment operator.

int uima::UnicodeStringRef::operator== ( const UnicodeStringRef & crclRHS ) const [inline]

Equality operator.

int uima::UnicodeStringRef::operator!= ( const UnicodeStringRef & crclRHS ) const [inline]

Inequality operator.

bool uima::UnicodeStringRef::operator< ( UnicodeStringRef const & text ) const [inline]

Less-than operator.

bool uima::UnicodeStringRef::operator<= ( UnicodeStringRef const & text ) const [inline]

Less-than-or-equal operator.

bool uima::UnicodeStringRef::operator> ( UnicodeStringRef const & text ) const [inline]

Greater-than operator.

bool uima::UnicodeStringRef::operator>= ( UnicodeStringRef const & text ) const [inline]

Greater-than-or-equal operator.

int8_t uima::UnicodeStringRef::compare ( const UnicodeStringRef & text ) const [inline]

Compare the characters bitwise in this UnicodeStringRef to the characters in text.

Parameters:

text The UnicodeStringRef to compare to this one.

Returns:
The result of bitwise character comparison: 0 if this contains the same characters as text, -1 if the characters in this are bitwise less than the characters in text, +1 if the characters in this are bitwise greater than the characters in text.
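As a rough illustration of what a bitwise (code-unit-wise) three-way comparison does, here is a minimal sketch. The free function `compareCodeUnits` is hypothetical and is not part of UIMA or ICU. It assumes the convention used by icu::UnicodeString::compare, in which a negative result means the left-hand string sorts before the right-hand one.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical free function for illustration; not part of UIMA or ICU.
// Compares two UTF-16 strings code unit by code unit and returns -1, 0 or +1.
inline std::int8_t compareCodeUnits(const std::u16string& lhs,
                                    const std::u16string& rhs) {
    const std::size_t common = lhs.size() < rhs.size() ? lhs.size() : rhs.size();
    for (std::size_t i = 0; i < common; ++i) {
        if (lhs[i] != rhs[i]) {
            // The first differing code unit decides the result.
            return lhs[i] < rhs[i] ? -1 : 1;
        }
    }
    // One string is a prefix of the other: the shorter one sorts first.
    if (lhs.size() == rhs.size()) return 0;
    return lhs.size() < rhs.size() ? -1 : 1;
}
```

No normalization or locale-aware collation is involved; two strings that look identical to a reader but use different code unit sequences compare as different.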

int8_t uima::UnicodeStringRef::compare ( const icu::UnicodeString & text ) const [inline]

Compare the characters bitwise in this UnicodeStringRef to the characters in text.

Parameters:

text The UnicodeString to compare to this one.

Returns:
The result of bitwise character comparison: 0 if this contains the same characters as text, -1 if the characters in this are bitwise less than the characters in text, +1 if the characters in this are bitwise greater than the characters in text.

int8_t uima::UnicodeStringRef::compare ( int32_t start,

int32_t length,

const UnicodeStringRef & srcText

) const [inline]

Compare the characters bitwise in the range [start, start + length) with the characters in srcText.

Parameters:

start the offset at which the compare operation begins

length the number of characters of text to compare.

srcText the text to be compared

Returns:
The result of bitwise character comparison: 0 if this contains the same characters as srcText, -1 if the characters in this are bitwise less than the characters in srcText, +1 if the characters in this are bitwise greater than the characters in srcText.

int8_t uima::UnicodeStringRef::compare ( int32_t start,

int32_t length,

const UnicodeStringRef & srcText,

int32_t srcStart,

int32_t srcLength

) const [inline]

Compare the characters bitwise in the range [start, start + length) with the characters in srcText in the range [srcStart, srcStart + srcLength).

Parameters:

start the offset at which the compare operation begins

length the number of characters in this to compare.

srcText the text to be compared

srcStart the offset into srcText to start comparison

srcLength the number of characters in srcText to compare

Returns:
The result of bitwise character comparison: 0 if this contains the same characters as srcText, -1 if the characters in this are bitwise less than the characters in srcText, +1 if the characters in this are bitwise greater than the characters in srcText.

int8_t uima::UnicodeStringRef::compare ( UChar const * srcChars,

int32_t srcLength

) const [inline]

Compare the characters bitwise in this UnicodeStringRef with the first srcLength characters in srcChars.

Parameters:

srcChars The characters to compare to this UnicodeStringRef.

srcLength the number of characters in srcChars to compare

Returns:
The result of bitwise character comparison: 0 if this contains the same characters as srcChars, -1 if the characters in this are bitwise less than the characters in srcChars, +1 if the characters in this are bitwise greater than the characters in srcChars.

int8_t uima::UnicodeStringRef::compare ( int32_t start,

int32_t length,

UChar const * srcChars

) const [inline]

Compare the characters bitwise in the range [start, start + length) with the first length characters in srcChars.

Parameters:

start the offset at which the compare operation begins

length the number of characters to compare.

srcChars the characters to be compared

Returns:
The result of bitwise character comparison: 0 if this contains the same characters as srcChars, -1 if the characters in this are bitwise less than the characters in srcChars, +1 if the characters in this are bitwise greater than the characters in srcChars.

int8_t uima::UnicodeStringRef::compare ( int32_t start,

int32_t length,

UChar const * srcChars,

int32_t srcStart,

int32_t srcLength

) const [inline]

Compare the characters bitwise in the range [start, start + length) with the characters in srcChars in the range [srcStart, srcStart + srcLength).

Parameters:

start the offset at which the compare operation begins

length the number of characters in this to compare

srcChars the characters to be compared

srcStart the offset into srcChars to start comparison

srcLength the number of characters in srcChars to compare

Returns:
The result of bitwise character comparison: 0 if this contains the same characters as srcChars, -1 if the characters in this are bitwise less than the characters in srcChars, +1 if the characters in this are bitwise greater than the characters in srcChars.

int8_t uima::UnicodeStringRef::compareBetween ( int32_t start,

int32_t limit,

const UnicodeStringRef & srcText,

int32_t srcStart,

int32_t srcLimit

) const [inline]

Compare the characters bitwise in the range [start, limit) with the characters in srcText in the range [srcStart, srcLimit).

Parameters:

start the offset at which the compare operation begins

limit the offset immediately following the compare operation

srcText the text to be compared

srcStart the offset into srcText to start comparison

srcLimit the offset into srcText to limit comparison

Returns:
The result of bitwise character comparison: 0 if this contains the same characters as srcText, -1 if the characters in this are bitwise less than the characters in srcText, +1 if the characters in this are bitwise greater than the characters in srcText.
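The difference between the [start, limit) convention of compareBetween and the (start, length) convention of compare comes down to length = limit - start. The sketch below shows that conversion on top of std::u16string::compare. The helper `compareBetweenSketch` is hypothetical and only mirrors the parameter order documented above; it is not the UIMA implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of UIMA or ICU: compares a[start, limit) with
// b[srcStart, srcLimit) by converting each limit into a length.
inline int compareBetweenSketch(const std::u16string& a,
                                std::size_t start, std::size_t limit,
                                const std::u16string& b,
                                std::size_t srcStart, std::size_t srcLimit) {
    assert(start <= limit && limit <= a.size());
    assert(srcStart <= srcLimit && srcLimit <= b.size());
    const int r = a.compare(start, limit - start, b, srcStart, srcLimit - srcStart);
    return (r > 0) - (r < 0);  // normalise to -1, 0 or +1
}
```

With a = u"xxabcxx" and b = u"abc", comparing a[2, 5) against b[0, 3) looks at "abc" on both sides and yields 0.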

int8_t uima::UnicodeStringRef::compareCodePointOrder ( const UnicodeStringRef & text ) const [inline]

Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:

text Another string to compare this one to.

Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order
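To make the difference between code unit order and code point order concrete, the following sketch compares U+FEFF with the supplementary character U+10000 (stored as the surrogate pair D800 DC00). The functions `fixupForCodePointOrder` and `compareCodePointOrderSketch` are hypothetical names for this illustration. The sketch uses a well-known fix-up of the first differing code unit and assumes well-formed UTF-16; it is not claimed to be how UIMA or ICU implement the function.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Remaps a UTF-16 code unit so that surrogates (D800..DFFF) sort above all
// other BMP code units, which yields code point order for well-formed input.
inline std::uint16_t fixupForCodePointOrder(char16_t unit) {
    const std::uint16_t u = static_cast<std::uint16_t>(unit);
    if (u >= 0xE000) return static_cast<std::uint16_t>(u - 0x800);   // E000..FFFF -> D800..F7FF
    if (u >= 0xD800) return static_cast<std::uint16_t>(u + 0x2000);  // D800..DFFF -> F800..FFFF
    return u;
}

// Hypothetical sketch: three-way comparison in code point order.
inline int compareCodePointOrderSketch(const std::u16string& lhs,
                                       const std::u16string& rhs) {
    const std::size_t common = lhs.size() < rhs.size() ? lhs.size() : rhs.size();
    for (std::size_t i = 0; i < common; ++i) {
        if (lhs[i] != rhs[i]) {
            // Only the first differing code unit needs the fix-up.
            const std::uint16_t a = fixupForCodePointOrder(lhs[i]);
            const std::uint16_t b = fixupForCodePointOrder(rhs[i]);
            return a < b ? -1 : 1;
        }
    }
    if (lhs.size() == rhs.size()) return 0;
    return lhs.size() < rhs.size() ? -1 : 1;
}
```

In plain code unit order, u"\U00010000" sorts before u"\uFEFF" because 0xD800 is less than 0xFEFF. In code point order the result is reversed, since U+10000 is greater than U+FEFF.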

int8_t uima::UnicodeStringRef::compareCodePointOrder ( int32_t start,

int32_t length,

const UnicodeStringRef & srcText

) const [inline]

Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:

start The start offset in this string at which the compare operation begins.

length The number of code units from this string to compare.

srcText Another string to compare this one to.

Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrder ( int32_t start,

int32_t length,

const UnicodeStringRef & srcText,

int32_t srcStart,

int32_t srcLength

) const [inline]

Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:

start The start offset in this string at which the compare operation begins.

length The number of code units from this string to compare.

srcText Another string to compare this one to.

srcStart The start offset in that string at which the compare operation begins.

srcLength The number of code units from that string to compare.

Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrder ( UChar const * srcChars,

int32_t srcLength

) const [inline]

Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:

srcChars A pointer to another string to compare this one to.

srcLength The number of code units from that string to compare.

Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrder ( int32_t start,

int32_t length,

UChar const * srcChars

) const [inline]

Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:

start The start offset in this string at which the compare operation begins.

length The number of code units from this string to compare.

srcChars A pointer to another string to compare this one to.

Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrder ( int32_t start,

int32_t length,

UChar const * srcChars,

int32_t srcStart,

int32_t srcLength

) const [inline]

Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:

start The start offset in this string at which the compare operation begins.

length The number of code units from this string to compare.

srcChars A pointer to another string to compare this one to.

srcStart The start offset in that string at which the compare operation begins.

srcLength The number of code units from that string to compare.

Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrderBetween ( int32_t start,

int32_t limit,

const UnicodeStringRef & srcText,

int32_t srcStart,

int32_t srcLimit

) const [inline]

Compare two Unicode strings in code point order.
This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:

start The start offset in this string at which the compare operation begins.

limit The offset after the last code unit from this string to compare.

srcText Another string to compare this one to.

srcStart The start offset in that string at which the compare operation begins.

srcLimit The offset after the last code unit from that string to compare.

Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::caseCompare ( const UnicodeStringRef & text,

uint32_t options

) const [inline]

Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(text.foldCase(options)).

Parameters:

text Another string to compare this one to.

options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I

Returns:
A negative, zero, or positive integer indicating the comparison result.
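The "fold both sides, then compare" equivalence can be sketched as follows. This is a deliberately simplified illustration: `foldAsciiCase` and `caseCompareAscii` are hypothetical helpers that fold only ASCII letters. Full Unicode case folding, as selected by options such as U_FOLD_CASE_DEFAULT, covers far more characters and can change the length of a string.

```cpp
#include <string>

// Simplified stand-in for case folding: maps only ASCII 'A'..'Z' to 'a'..'z'.
// This is an assumption made for illustration, not full Unicode case folding.
inline std::u16string foldAsciiCase(const std::u16string& s) {
    std::u16string out(s);
    for (char16_t& c : out) {
        if (c >= u'A' && c <= u'Z') {
            c = static_cast<char16_t>(c - u'A' + u'a');
        }
    }
    return out;
}

// Hypothetical sketch: fold both strings, then compare code unit by code unit.
inline int caseCompareAscii(const std::u16string& lhs, const std::u16string& rhs) {
    const int r = foldAsciiCase(lhs).compare(foldAsciiCase(rhs));
    return (r > 0) - (r < 0);  // normalise to -1, 0 or +1
}
```

A production implementation would typically fold on the fly instead of allocating folded copies; the copies are used here only to keep the equivalence visible.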

int8_t uima::UnicodeStringRef::caseCompare ( int32_t start,

int32_t length,

const UnicodeStringRef & srcText,

uint32_t options

) const [inline]

Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcText.foldCase(options)).

Parameters:

start The start offset in this string at which the compare operation begins.

length The number of code units from this string to compare.

srcText Another string to compare this one to.

options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I

Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompare ( int32_t start,

int32_t length,

const UnicodeStringRef & srcText,

int32_t srcStart,

int32_t srcLength,

uint32_t options

) const [inline]

Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcText.foldCase(options)).

Parameters:

start The start offset in this string at which the compare operation begins.

length The number of code units from this string to compare.

srcText Another string to compare this one to.

srcStart The start offset in that string at which the compare operation begins.

srcLength The number of code units from that string to compare.

options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I

Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompare ( UChar const * srcChars,

int32_t srcLength,

uint32_t options

) const [inline]

Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcChars.foldCase(options)).

Parameters:

srcChars A pointer to another string to compare this one to.

srcLength The number of code units from that string to compare.

options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I

Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompare ( int32_t start,

int32_t length,

UChar const * srcChars,

uint32_t options

) const [inline]

Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcChars.foldCase(options)).

Parameters:

start The start offset in this string at which the compare operation begins.

length The number of code units from this string to compare.

srcChars A pointer to another string to compare this one to.

options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I

Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompare ( int32_t start,

int32_t length,

UChar const * srcChars,

int32_t srcStart,

int32_t srcLength,

uint32_t options

) const [inline]

Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcChars.foldCase(options)).

Parameters:

start The start offset in this string at which the compare operation begins.

length The number of code units from this string to compare.

srcChars A pointer to another string to compare this one to.

srcStart The start offset in that string at which the compare operation begins.

srcLength The number of code units from that string to compare.

options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I

Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompareBetween ( int32_t start,

int32_t limit,

const UnicodeStringRef & srcText,

int32_t srcStart,

int32_t srcLimit,

uint32_t options

) const [inline]

Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compareBetween(srcText.foldCase(options)).

Parameters:

start The start offset in this string at which the compare operation begins.

limit The offset after the last code unit from this string to compare.

srcText Another string to compare this one to.

srcStart The start offset in that string at which the compare operation begins.

srcLimit The offset after the last code unit from that string to compare.

options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I

Returns:
A negative, zero, or positive integer indicating the comparison result.
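
The relationship between the start/length overloads and the start/limit ("Between") form can be sketched as follows. This is only an illustration: foldAscii folds ASCII letters and is a deliberately simplified stand-in for full case folding, and all function names here are hypothetical rather than taken from the UIMA or ICU API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

using std::int32_t;
using std::int8_t;

// ASCII-only folding: a simplified stand-in for full Unicode case folding.
char16_t foldAscii(char16_t c) {
    return (c >= u'A' && c <= u'Z') ? static_cast<char16_t>(c + (u'a' - u'A')) : c;
}

// Compare [start, start + length) of a with [srcStart, srcStart + srcLength) of b.
int8_t caseCompareSketch(const std::u16string& a, int32_t start, int32_t length,
                         const std::u16string& b, int32_t srcStart, int32_t srcLength) {
    assert(start >= 0 && length >= 0 &&
           start + length <= static_cast<int32_t>(a.size()));
    assert(srcStart >= 0 && srcLength >= 0 &&
           srcStart + srcLength <= static_cast<int32_t>(b.size()));
    const int32_t common = std::min(length, srcLength);
    for (int32_t i = 0; i < common; ++i) {
        const char16_t x = foldAscii(a[start + i]);
        const char16_t y = foldAscii(b[srcStart + i]);
        if (x != y) return x < y ? -1 : 1;
    }
    // All shared code units match: the shorter range sorts first.
    if (length == srcLength) return 0;
    return length < srcLength ? -1 : 1;
}

// The "Between" form takes limits instead of lengths; a limit is start + length.
int8_t caseCompareBetweenSketch(const std::u16string& a, int32_t start, int32_t limit,
                                const std::u16string& b, int32_t srcStart, int32_t srcLimit) {
    return caseCompareSketch(a, start, limit - start, b, srcStart, srcLimit - srcStart);
}
```

For example, comparing [6, 11) of "Hello World" with [4, 9) of "say WORLD" yields zero under this sketch.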

bool uima::UnicodeStringRef::startsWith ( const UnicodeStringRef & text ) const [inline]

Determine if this starts with the characters in text.

Parameters:

text The text to match.

Returns:
TRUE if this starts with the characters in text, FALSE otherwise

bool uima::UnicodeStringRef::startsWith ( const UnicodeStringRef & srcText,

int32_t srcStart,

int32_t srcLength

) const [inline]

Determine if this starts with the characters in srcText in the range [srcStart, srcStart + srcLength).

Parameters:

srcText The text to match.

srcStart the offset into srcText to start matching

srcLength the number of characters in srcText to match

Returns:
TRUE if this starts with the characters in the given range of srcText, FALSE otherwise

bool uima::UnicodeStringRef::startsWith ( UChar const * srcChars,

int32_t srcLength

) const [inline]

Determine if this starts with the characters in srcChars.

Parameters:

srcChars The characters to match.

srcLength the number of characters in srcChars

Returns:
TRUE if this starts with the characters in srcChars, FALSE otherwise

bool uima::UnicodeStringRef::startsWith ( UChar const * srcChars,

int32_t srcStart,

int32_t srcLength

) const [inline]

Determine if this starts with the characters in srcChars in the range [srcStart, srcStart + srcLength).

Parameters:

srcChars The characters to match.

srcStart the offset into srcChars to start matching

srcLength the number of characters in srcChars to match

Returns:
TRUE if this starts with the characters in srcChars, FALSE otherwise

bool uima::UnicodeStringRef::endsWith ( const UnicodeStringRef & text ) const [inline]

Determine if this ends with the characters in text.

Parameters:

text The text to match.

Returns:
TRUE if this ends with the characters in text, FALSE otherwise

bool uima::UnicodeStringRef::endsWith ( const UnicodeStringRef & srcText,

int32_t srcStart,

int32_t srcLength

) const [inline]

Determine if this ends with the characters in srcText in the range [srcStart, srcStart + srcLength).

Parameters:

srcText The text to match.

srcStart the offset into srcText to start matching

srcLength the number of characters in srcText to match

Returns:
TRUE if this ends with the characters in the given range of srcText, FALSE otherwise

bool uima::UnicodeStringRef::endsWith ( UChar const * srcChars,

int32_t srcLength

) const [inline]

Determine if this ends with the characters in srcChars.

Parameters:

srcChars The characters to match.

srcLength the number of characters in srcChars

Returns:
TRUE if this ends with the characters in srcChars, FALSE otherwise

bool uima::UnicodeStringRef::endsWith ( UChar const * srcChars,

int32_t srcStart,

int32_t srcLength

) const [inline]

Determine if this ends with the characters in srcChars in the range [srcStart, srcStart + srcLength).

Parameters:

srcChars The characters to match.

srcStart the offset into srcChars to start matching

srcLength the number of characters in srcChars to match

Returns:
TRUE if this ends with the characters in srcChars, FALSE otherwise
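
The ranged startsWith/endsWith overloads match only a slice of the source text against the beginning or end of this string. A minimal sketch of that behavior on std::u16string is shown below; startsWithSketch and endsWithSketch are hypothetical names, and the code unit comparison is assumed to be bitwise.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if self begins with src[srcStart, srcStart + srcLength).
bool startsWithSketch(const std::u16string& self, const std::u16string& src,
                      std::size_t srcStart, std::size_t srcLength) {
    assert(srcStart <= src.size() && srcLength <= src.size() - srcStart);
    return self.size() >= srcLength &&
           self.compare(0, srcLength, src, srcStart, srcLength) == 0;
}

// True if self ends with src[srcStart, srcStart + srcLength).
bool endsWithSketch(const std::u16string& self, const std::u16string& src,
                    std::size_t srcStart, std::size_t srcLength) {
    assert(srcStart <= src.size() && srcLength <= src.size() - srcStart);
    return self.size() >= srcLength &&
           self.compare(self.size() - srcLength, srcLength, src, srcStart, srcLength) == 0;
}
```

Here the slice [0, 7) of "ref.hpp.bak" is "ref.hpp", so endsWithSketch reports true for "unistrref.hpp".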

int32_t uima::UnicodeStringRef::indexOf ( const UnicodeStringRef & text ) const [inline]

Locate in this the first occurrence of the characters in text, using bitwise comparison.

Parameters:

text The text to search for.

Returns:
The offset into this of the start of text, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( const UnicodeStringRef & text,

int32_t start

) const [inline]

Locate in this the first occurrence of the characters in text starting at offset start, using bitwise comparison.

Parameters:

text The text to search for.

start The offset at which searching will start.

Returns:
The offset into this of the start of text, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( const UnicodeStringRef & text,

int32_t start,

int32_t length

) const [inline]

Locate in this the first occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.

Parameters:

text The text to search for.

start The offset at which searching will start.

length The number of characters to search

Returns:
The offset into this of the start of text, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( const UnicodeStringRef & srcText,

int32_t srcStart,

int32_t srcLength,

int32_t start,

int32_t length

) const [inline]

Locate in this the first occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.

Parameters:

srcText The text to search for.

srcStart the offset into srcText at which to start matching

srcLength the number of characters in srcText to match

start the offset into this at which to start matching

length the number of characters in this to search

Returns:
The offset into this of the start of srcText, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( UChar const * srcChars,

int32_t srcLength,

int32_t start

) const [inline]

Locate in this the first occurrence of the characters in srcChars starting at offset start, using bitwise comparison.

Parameters:

srcChars The text to search for.

srcLength the number of characters in srcChars to match

start the offset into this at which to start matching

Returns:
The offset into this of the start of srcChars, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( UChar const * srcChars,

int32_t srcLength,

int32_t start,

int32_t length

) const [inline]

Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.

Parameters:

srcChars The text to search for.

srcLength the number of characters in srcChars

start The offset at which searching will start.

length The number of characters to search

Returns:
The offset into this of the start of srcChars, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( UChar const * srcChars,

int32_t srcStart,

int32_t srcLength,

int32_t start,

int32_t length

) const

Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.

Parameters:

srcChars The text to search for.

srcStart the offset into srcChars at which to start matching

srcLength the number of characters in srcChars to match

start the offset into this at which to start matching

length the number of characters in this to search

Returns:
The offset into this of the start of srcChars, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( UChar c ) const [inline]

Locate in this the first occurrence of the code unit c, using bitwise comparison.

Parameters:

c The code unit to search for.

Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( UChar32 c ) const [inline]

Locate in this the first occurrence of the code point c, using bitwise comparison.

Parameters:

c The code point to search for.

Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( UChar c,

int32_t start

) const [inline]

Locate in this the first occurrence of the code unit c starting at offset start, using bitwise comparison.

Parameters:

c The code unit to search for.

start The offset at which searching will start.

Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( UChar32 c,

int32_t start

) const [inline]

Locate in this the first occurrence of the code point c starting at offset start, using bitwise comparison.

Parameters:

c The code point to search for.

start The offset at which searching will start.

Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( UChar c,

int32_t start,

int32_t length

) const [inline]

Locate in this the first occurrence of the code unit c in the range [start, start + length), using bitwise comparison.

Parameters:

c The code unit to search for.

start the offset into this at which to start matching

length the number of characters in this to search

Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf ( UChar32 c,

int32_t start,

int32_t length

) const [inline]

Locate in this the first occurrence of the code point c in the range [start, start + length), using bitwise comparison.

Parameters:

c The code point to search for.

start the offset into this at which to start matching

length the number of characters in this to search

Returns:
The offset into this of c, or -1 if not found.
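
Searching for a code point differs from searching for a code unit because a supplementary code point occupies two code units. The sketch below, written against std::u16string, encodes the code point to UTF-16 and searches only inside the window [start, start + length); encodeUtf16 and indexOfCodePointSketch are hypothetical names, and the clamping of start and length is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using std::int32_t;

// Encode one code point as one or two UTF-16 code units.
std::u16string encodeUtf16(char32_t c) {
    std::u16string units;
    if (c < 0x10000) {
        units.push_back(static_cast<char16_t>(c));
    } else {
        c -= 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (c >> 10)));
        units.push_back(static_cast<char16_t>(0xDC00 + (c & 0x3FF)));
    }
    return units;
}

// Offset of the first occurrence of c inside [start, start + length), or -1.
int32_t indexOfCodePointSketch(const std::u16string& s, char32_t c,
                               int32_t start, int32_t length) {
    const int32_t size = static_cast<int32_t>(s.size());
    // Clamp the window to the string (assumed behavior for this sketch).
    if (start < 0) start = 0;
    if (start > size) start = size;
    if (length < 0 || length > size - start) length = size - start;
    // Searching a copy of the window keeps a match from extending past it.
    const std::size_t pos = s.substr(start, length).find(encodeUtf16(c));
    return pos == std::u16string::npos ? -1 : start + static_cast<int32_t>(pos);
}
```

The returned offset is always a code unit offset into the whole string, not into the window.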

int32_t uima::UnicodeStringRef::lastIndexOf ( const UnicodeStringRef & text ) const [inline]

Locate in this the last occurrence of the characters in text, using bitwise comparison.

Parameters:

text The text to search for.

Returns:
The offset into this of the start of text, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( const UnicodeStringRef & text,

int32_t start

) const [inline]

Locate in this the last occurrence of the characters in text starting at offset start, using bitwise comparison.

Parameters:

text The text to search for.

start The offset at which searching will start.

Returns:
The offset into this of the start of text, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( const UnicodeStringRef & text,

int32_t start,

int32_t length

) const [inline]

Locate in this the last occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.

Parameters:

text The text to search for.

start The offset at which searching will start.

length The number of characters to search

Returns:
The offset into this of the start of text, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( const UnicodeStringRef & srcText,

int32_t srcStart,

int32_t srcLength,

int32_t start,

int32_t length

) const [inline]

Locate in this the last occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.

Parameters:

srcText The text to search for.

srcStart the offset into srcText at which to start matching

srcLength the number of characters in srcText to match

start the offset into this at which to start matching

length the number of characters in this to search

Returns:
The offset into this of the start of srcText, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar const * srcChars,

int32_t srcLength,

int32_t start

) const [inline]

Locate in this the last occurrence of the characters in srcChars starting at offset start, using bitwise comparison.

Parameters:

srcChars The text to search for.

srcLength the number of characters in srcChars to match

start the offset into this at which to start matching

Returns:
The offset into this of the start of srcChars, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar const * srcChars,

int32_t srcLength,

int32_t start,

int32_t length

) const [inline]

Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.

Parameters:

srcChars The text to search for.

srcLength the number of characters in srcChars

start The offset at which searching will start.

length The number of characters to search

Returns:
The offset into this of the start of srcChars, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar const * srcChars,

int32_t srcStart,

int32_t srcLength,

int32_t start,

int32_t length

) const

Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.

Parameters:

srcChars The text to search for.

srcStart the offset into srcChars at which to start matching

srcLength the number of characters in srcChars to match

start the offset into this at which to start matching

length the number of characters in this to search

Returns:
The offset into this of the start of srcChars, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar c ) const [inline]

Locate in this the last occurrence of the code unit c, using bitwise comparison.

Parameters:

c The code unit to search for.

Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar32 c ) const [inline]

Locate in this the last occurrence of the code point c, using bitwise comparison.

Parameters:

c The code point to search for.

Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar c,

int32_t start

) const [inline]

Locate in this the last occurrence of the code unit c starting at offset start, using bitwise comparison.

Parameters:

c The code unit to search for.

start The offset at which searching will start.

Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar32 c,

int32_t start

) const [inline]

Locate in this the last occurrence of the code point c starting at offset start, using bitwise comparison.

Parameters:

c The code point to search for.

start The offset at which searching will start.

Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar c,

int32_t start,

int32_t length

) const [inline]

Locate in this the last occurrence of the code unit c in the range [start, start + length), using bitwise comparison.

Parameters:

c The code unit to search for.

start the offset into this at which to start matching

length the number of characters in this to search

Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf ( UChar32 c,

int32_t start,

int32_t length

) const [inline]

Locate in this the last occurrence of the code point c in the range [start, start + length), using bitwise comparison.

Parameters:

c The code point to search for.

start the offset into this at which to start matching

length the number of characters in this to search

Returns:
The offset into this of c, or -1 if not found.
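
The windowed lastIndexOf overloads can be pictured as a reverse search restricted to [start, start + length). The following sketch on std::u16string assumes that a match must lie entirely inside the window; lastIndexOfSketch is a hypothetical name, not the UIMA implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using std::int32_t;

// Offset of the last occurrence of text inside [start, start + length), or -1.
int32_t lastIndexOfSketch(const std::u16string& s, const std::u16string& text,
                          int32_t start, int32_t length) {
    const int32_t size = static_cast<int32_t>(s.size());
    // Clamp the window to the string (assumed behavior for this sketch).
    if (start < 0) start = 0;
    if (start > size) start = size;
    if (length < 0 || length > size - start) length = size - start;
    // rfind on a copy of the window finds the right-most complete match in it.
    const std::size_t pos = s.substr(start, length).rfind(text);
    return pos == std::u16string::npos ? -1 : start + static_cast<int32_t>(pos);
}
```

Shrinking the window from 9 to 8 code units on "abcabcabc" moves the last match of "abc" from offset 6 to offset 3.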

UChar uima::UnicodeStringRef::charAt ( int32_t offset ) const [inline]

Return the code unit at offset offset.

Parameters:

offset a valid offset into the text

Returns:
the code unit at offset offset

UChar uima::UnicodeStringRef::operator[] ( int32_t offset ) const [inline]

Return the code unit at offset offset.

Parameters:

offset a valid offset into the text

Returns:
the code unit at offset offset

UChar32 uima::UnicodeStringRef::char32At ( int32_t offset ) const [inline]

Return the code point that contains the code unit at offset offset.

Parameters:

offset a valid offset into the text that indicates the text offset of any of the code units that will be assembled into a code point (21-bit value) and returned

Returns:
the code point of text at offset

int32_t uima::UnicodeStringRef::getChar32Start ( int32_t offset ) const [inline]

Adjust a random-access offset so that it points to the beginning of a Unicode character.
The offset that is passed in points to any code unit of a code point, while the returned offset will point to the first code unit of the same code point. In UTF-16, if the input offset points to the second surrogate of a surrogate pair, then the returned offset will point to the first surrogate.
Parameters:

offset a valid offset into one code point of the text

Returns:
offset of the first code unit of the same code point

int32_t uima::UnicodeStringRef::getChar32Limit ( int32_t offset ) const [inline]

Adjust a random-access offset so that it points behind a Unicode character.
The offset that is passed in points behind any code unit of a code point, while the returned offset will point behind the last code unit of the same code point. In UTF-16, if the input offset points behind the first surrogate (i.e., to the second surrogate) of a surrogate pair, then the returned offset will point behind the second surrogate (i.e., to the first code unit after the pair).
Parameters:

offset a valid offset after any code unit of a code point of the text

Returns:
offset of the first code unit after the same code point
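
The three code point accessors above can be sketched on a plain std::u16string as follows. The function names are hypothetical stand-ins for char32At, getChar32Start and getChar32Limit, and the handling of unpaired surrogates (returned unchanged) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using std::int32_t;

bool isLead(char16_t c)  { return c >= 0xD800 && c <= 0xDBFF; }
bool isTrail(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Move an offset that points at a trail surrogate back to its lead surrogate.
int32_t char32StartSketch(const std::u16string& s, int32_t offset) {
    assert(offset >= 0 && offset < static_cast<int32_t>(s.size()));
    if (offset > 0 && isTrail(s[offset]) && isLead(s[offset - 1])) return offset - 1;
    return offset;
}

// Move an offset that splits a surrogate pair forward to just behind the pair.
int32_t char32LimitSketch(const std::u16string& s, int32_t offset) {
    const int32_t size = static_cast<int32_t>(s.size());
    assert(offset >= 0 && offset <= size);
    if (offset > 0 && offset < size && isLead(s[offset - 1]) && isTrail(s[offset])) {
        return offset + 1;
    }
    return offset;
}

// Assemble the code point that contains the code unit at offset.
char32_t char32AtSketch(const std::u16string& s, int32_t offset) {
    const int32_t first = char32StartSketch(s, offset);
    const char16_t c = s[first];
    if (isLead(c) && first + 1 < static_cast<int32_t>(s.size()) && isTrail(s[first + 1])) {
        return 0x10000 + ((c - 0xD800) << 10) + (s[first + 1] - 0xDC00);
    }
    return c;
}
```

For a string holding 'a', U+10000 and 'b', offsets 1 and 2 both belong to U+10000: the start sketch maps them to 1 and the limit sketch maps offset 2 to 3.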

int32_t uima::UnicodeStringRef::moveIndex32 ( int32_t index,

int32_t delta

) const

Move the code unit index along the string by delta code points.
Interpret the input index as a code unit-based offset into the string, move the index forward or backward by delta code points, and return the resulting index. The input index should point to the first code unit of a code point, if there is more than one.
Both input and output indexes are code unit-based as for all string indexes/offsets in ICU (and other libraries, like MBCS char*). If delta<0 then the index is moved backward (toward the start of the string). If delta>0 then the index is moved forward (toward the end of the string).
This behaves like CharacterIterator::move32(delta, kCurrent).
Examples:
  // s has code points 'a' U+10000 'b' U+10ffff U+2029
  // (the UnicodeString owns the buffer; the UnicodeStringRef only refers to it)
  UnicodeString buf=UNICODE_STRING("a\\U00010000b\\U0010ffff\\u2029", 31).unescape();
  UnicodeStringRef s(buf);
  // initial index: position of U+10000
  int32_t index=1;
  // the following examples will all result in index==4, position of U+10ffff
  // skip 2 code points from some position in the string
  index=s.moveIndex32(index, 2); // skips U+10000 and 'b'
  // go to the 3rd code point from the start of s (0-based)
  index=s.moveIndex32(0, 3); // skips 'a', U+10000, and 'b'
  // go to the next-to-last code point of s
  index=s.moveIndex32(s.length(), -2); // backward-skips U+2029 and U+10ffff

Parameters:

index input code unit index

delta (signed) code point count to move the index forward or backward in the string

Returns:
the resulting code unit index

void uima::UnicodeStringRef::extract ( int32_t start,

int32_t length,

UChar * dst,

int32_t dstStart = 0

) const [inline]

Copy the characters in the range [start, start + length) into the array dst, beginning at dstStart.
If the string aliases to dst itself as an external buffer, then extract() will not copy the contents.

Parameters:

start offset of first character which will be copied into the array

length the number of characters to extract

dst array in which to copy characters. The length of dst must be at least (dstStart + length).

dstStart the offset in dst where the first character will be extracted

void uima::UnicodeStringRef::extractBetween ( int32_t start,

int32_t limit,

UChar * dst,

int32_t dstStart = 0

) const [inline]

Copy the characters in the range [start, limit) into the array dst, beginning at dstStart.

Parameters:

start offset of first character which will be copied into the array

limit offset immediately following the last character to be copied

dst array in which to copy characters. The length of dst must be at least (dstStart + (limit - start)).

dstStart the offset in dst where the first character will be extracted

int32_t uima::UnicodeStringRef::extract ( UChar * dst,

int32_t dstCapacity,

UErrorCode & errorCode

) const

Copy the contents of the string into dst.
This is a convenience function that checks if there is enough space in dst, extracts the entire string if possible, and NUL-terminates dst if possible.
If the string fits into dst but cannot be NUL-terminated (length()==dstCapacity) then the error code is set to U_STRING_NOT_TERMINATED_WARNING. If the string itself does not fit into dst (length()>dstCapacity) then the error code is set to U_BUFFER_OVERFLOW_ERROR.
If the string aliases to dst itself as an external buffer, then extract() will not copy the contents.

Parameters:

dst Destination string buffer.

dstCapacity Number of UChars available at dst.

errorCode ICU error code.

Returns:
length()
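
The three outcomes described above (terminated, not terminated, overflow) can be sketched without ICU. ExtractStatus is a hypothetical stand-in for the relevant UErrorCode values and extractSketch is a hypothetical name; that nothing is copied on overflow is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

using std::int32_t;

// Stand-in for U_ZERO_ERROR, U_STRING_NOT_TERMINATED_WARNING, U_BUFFER_OVERFLOW_ERROR.
enum class ExtractStatus { Ok, NotTerminatedWarning, BufferOverflow };

// Copy s into dst if it fits, NUL-terminate if there is room, and always return the length.
int32_t extractSketch(const std::u16string& s, char16_t* dst, int32_t dstCapacity,
                      ExtractStatus& status) {
    const int32_t len = static_cast<int32_t>(s.size());
    if (len > dstCapacity) {
        status = ExtractStatus::BufferOverflow;        // string does not fit at all
        return len;
    }
    std::copy(s.begin(), s.end(), dst);
    if (len < dstCapacity) {
        dst[len] = 0;                                  // room for the terminator
        status = ExtractStatus::Ok;
    } else {
        status = ExtractStatus::NotTerminatedWarning;  // fits exactly, no terminator
    }
    return len;
}
```

Because the return value is always the full length, a caller can detect overflow and retry with a buffer of at least that many code units plus one for the terminator.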

void uima::UnicodeStringRef::extract ( int32_t start,

int32_t length,

UnicodeString & dst

) const [inline]

Copy the characters in the range [start, start + length) into the UnicodeString dst.

Parameters:

start offset of first character which will be copied

length the number of characters to extract

dst UnicodeString into which to copy characters.

Returns:
Nothing; the extracted characters are delivered through dst.

void uima::UnicodeStringRef::extractBetween ( int32_t start,

int32_t limit,

UnicodeString & dst

) const [inline]

Copy the characters in the range [start, limit) into the UnicodeString dst.

Parameters:

start offset of first character which will be copied

limit offset immediately following the last character to be copied

dst UnicodeString into which to copy characters.

Returns:
Nothing; the extracted characters are delivered through dst.

int32_t uima::UnicodeStringRef::extract ( int32_t start,

int32_t startLength,

char * target,

const char * codepage = 0

) const [inline]

Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.
The output string is NUL-terminated.

Parameters:

start offset of first character which will be copied

startLength the number of characters to extract

target the target buffer for extraction

codepage the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. If target is NULL, then the number of bytes required for target is returned. NOTE: It is assumed that the target is big enough to fit all of the characters.

Returns:
the output string length, not including the terminating NUL

int32_t uima::UnicodeStringRef::extract ( int32_t start,

int32_t startLength,

char * target,

uint32_t targetLength,

const char * codepage = 0

) const

Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.
This function does not write any more than targetLength characters but returns the length of the entire output string so that one can allocate a larger buffer and call the function again if necessary. The output string is NUL-terminated if possible.

Parameters:

start offset of first character which will be copied

startLength the number of characters to extract

target the target buffer for extraction

targetLength the length of the target buffer

codepage the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. If target is NULL, then the number of bytes required for target is returned.

Returns:
the output string length, not including the terminating NUL
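
Because this overload returns the full output length even when the buffer is too small, callers can retry with a larger buffer. The sketch below shows that pattern with a stand-in converter: extractLatin1Sketch is hypothetical, maps code units above 0xFF to '?', and only mimics the "write at most targetLength, return the needed length" contract. It is not the codepage conversion performed by the real function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using std::int32_t;
using std::uint32_t;

// Stand-in converter: writes at most targetLength chars, returns the full output length.
int32_t extractLatin1Sketch(const std::u16string& s, int32_t start, int32_t startLength,
                            char* target, uint32_t targetLength) {
    assert(start >= 0 && startLength >= 0 &&
           start + startLength <= static_cast<int32_t>(s.size()));
    for (int32_t i = 0; i < startLength && static_cast<uint32_t>(i) < targetLength; ++i) {
        const char16_t c = s[start + i];
        target[i] = (c <= 0xFF) ? static_cast<char>(static_cast<unsigned char>(c)) : '?';
    }
    // NUL-terminate only if there is room.
    if (static_cast<uint32_t>(startLength) < targetLength) target[startLength] = '\0';
    return startLength;
}

// Try a small buffer first; if the reported length does not fit, grow and call again.
std::string extractWithRetry(const std::u16string& s, int32_t start, int32_t startLength) {
    std::vector<char> buffer(8);
    int32_t needed = extractLatin1Sketch(s, start, startLength, buffer.data(),
                                         static_cast<uint32_t>(buffer.size()));
    if (static_cast<std::size_t>(needed) >= buffer.size()) {
        buffer.resize(static_cast<std::size_t>(needed) + 1);
        needed = extractLatin1Sketch(s, start, startLength, buffer.data(),
                                     static_cast<uint32_t>(buffer.size()));
    }
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```

The initial buffer size of 8 is arbitrary; it is chosen small here so that the retry path is exercised.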

int32_t uima::UnicodeStringRef::extract ( char * target,

int32_t targetCapacity,

UConverter * cnv,

UErrorCode & errorCode

) const

Convert the UnicodeStringRef into a codepage string using an existing UConverter.
The output string is NUL-terminated if possible.
This function avoids the overhead of opening and closing a converter if multiple strings are extracted.

Parameters:

target destination string buffer, can be NULL if targetCapacity==0

targetCapacity the number of chars available at target

cnv the converter object to be used (ucnv_resetFromUnicode() will be called), or NULL for the default converter

errorCode normal ICU error code

Returns:
the length of the output string, not counting the terminating NUL; if the length is greater than targetCapacity, then the string will not fit and a buffer of the indicated length would need to be passed in

int32_t uima::UnicodeStringRef::extract ( int32_t start,

int32_t startLength,

std::string & target,

const char * codepage = 0

) const

Copy the characters in the range [start, start + startLength) into a std::string object in a specified codepage.
The output string is NUL-terminated.

Parameters:

start offset of first character which will be copied

startLength the number of characters to extract

target the target string for extraction

codepage the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h.

Returns:
the output string length, not including the terminating NUL

int32_t uima::UnicodeStringRef::extract ( std::string & target,

const char * codepage = 0

) const [inline]

Copy all the characters in the string into an std::string object in a specified codepage.
Equivalent to extract(0, length(), target, codepage)

Parameters:

target the target string for extraction

codepage the desired codepage for the characters.

Returns:
the output string length, not including the terminating NUL

int32_t uima::UnicodeStringRef::extractUTF8 ( std::string & target ) const

Copy all the characters in the string into an std::string object in UTF-8.
Slightly more efficient than asUTF8() because it avoids one copy.

Parameters:

target the target string for extraction

Returns:
the output string length, not including the terminating NUL
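
For readers unfamiliar with the conversion itself, the following sketch shows a UTF-16 to UTF-8 transcoding into a caller-supplied std::string, which is the shape of extractUTF8(). It is a hand-written illustration: extractUtf8Sketch is a hypothetical name, and replacing unpaired surrogates with U+FFFD is an assumption, not documented behavior of the UIMA function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using std::int32_t;

// Transcode UTF-16 code units to UTF-8 bytes appended to a cleared target.
int32_t extractUtf8Sketch(const std::u16string& s, std::string& target) {
    target.clear();
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;  // unpaired surrogate (assumed handling)
        }
        if (c < 0x80) {
            target.push_back(static_cast<char>(c));
        } else if (c < 0x800) {
            target.push_back(static_cast<char>(0xC0 | (c >> 6)));
            target.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else if (c < 0x10000) {
            target.push_back(static_cast<char>(0xE0 | (c >> 12)));
            target.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            target.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else {
            target.push_back(static_cast<char>(0xF0 | (c >> 18)));
            target.push_back(static_cast<char>(0x80 | ((c >> 12) & 0x3F)));
            target.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            target.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return static_cast<int32_t>(target.size());
}
```

Writing into an existing std::string, rather than returning a new one, is what saves the extra copy mentioned above.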

std::string uima::UnicodeStringRef::asUTF8 ( void ) const [inline]

Convert to a UTF8 string.

Returns:
a std::string

void uima::UnicodeStringRef::release ( std::string & target ) [static]

Release the contents of a string container allocated by the extract methods.
Useful when caller and callee use different heaps, e.g. when debug code uses a release library. This function is static, so it can be called on the UnicodeStringRef class directly.

int32_t uima::UnicodeStringRef::length ( void ) const [inline]

Return the length of the UnicodeStringRef object.
The length is the number of UChar code units in the text.
Returns:
the length of the UnicodeStringRef object

int32_t uima::UnicodeStringRef::countChar32 ( int32_t start = 0,

int32_t length = 0x7fffffff

) const

Count Unicode code points in the length UChar code units of the string.
A code point may occupy either one or two UChar code units. Counting code points involves reading all code units.
This function is basically the inverse of moveIndex32().

Parameters:

start the index of the first code unit to check

length the number of UChar code units to check

Returns:
the number of code points in the specified code units
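
The inverse relationship between counting code points and moving an index by code points can be sketched on std::u16string as follows. Both function names are hypothetical stand-ins for countChar32 and moveIndex32, and stopping at the string boundaries is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using std::int32_t;

bool isLead(char16_t c)  { return c >= 0xD800 && c <= 0xDBFF; }
bool isTrail(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Count code points in the code units [start, start + length), clamped to the string.
int32_t countChar32Sketch(const std::u16string& s, int32_t start, int32_t length) {
    const int32_t size = static_cast<int32_t>(s.size());
    assert(start >= 0 && start <= size && length >= 0);
    const int32_t end = (length > size - start) ? size : start + length;
    int32_t count = 0;
    for (int32_t i = start; i < end; ++i) {
        ++count;
        if (isLead(s[i]) && i + 1 < end && isTrail(s[i + 1])) ++i;  // pair counts once
    }
    return count;
}

// Move a code unit index forward or backward by delta code points.
int32_t moveIndex32Sketch(const std::u16string& s, int32_t index, int32_t delta) {
    const int32_t size = static_cast<int32_t>(s.size());
    assert(index >= 0 && index <= size);
    for (; delta > 0 && index < size; --delta) {
        if (isLead(s[index]) && index + 1 < size && isTrail(s[index + 1])) ++index;
        ++index;
    }
    for (; delta < 0 && index > 0; ++delta) {
        --index;
        if (isTrail(s[index]) && index > 0 && isLead(s[index - 1])) --index;
    }
    return index;
}
```

On a string holding 'a', U+10000, 'b', U+10ffff and U+2029, moving 2 code points forward from index 1 lands on index 4, and counting the code points in the 3 code units starting at index 1 gives 2 again.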

bool uima::UnicodeStringRef::isEmpty ( void ) const [inline]

Determine if this string is empty.

Returns:
TRUE if this string contains 0 characters, FALSE otherwise.

UnicodeStringRef & uima::UnicodeStringRef::setTo ( const UnicodeStringRef & srcText ) [inline]

Set the text in the UnicodeStringRef object to the characters in srcText.
srcText is not modified.
Parameters:

srcText the source for the new characters

Returns:
a reference to this

UnicodeStringRef & uima::UnicodeStringRef::setTo ( const UnicodeString & srcText ) [inline]

Set the text in the UnicodeStringRef object to the characters in srcText.
srcText is not modified.
Parameters:

srcText the source for the new characters

Returns:
a reference to this

UnicodeStringRef & uima::UnicodeStringRef::setTo ( const UChar * srcChars,

int32_t srcLength

) [inline]

Set the characters in the UnicodeStringRef object to the characters in srcChars.
srcChars is not modified.
Parameters:

srcChars the source for the new characters

srcLength the number of Unicode characters in srcChars.

Returns:
a reference to this
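
Since a UnicodeStringRef consists only of a pointer and a length, setTo() presumably re-points the reference rather than copying characters. The sketch below illustrates that idea with a hypothetical StringRefSketch type; it is not the UIMA class, and the member names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using std::int32_t;

// Hypothetical stand-in for a non-owning string reference: a pointer plus a length.
struct StringRefSketch {
    const char16_t* buffer = nullptr;
    int32_t length = 0;

    // Re-point the reference at srcChars; no characters are copied.
    StringRefSketch& setTo(const char16_t* srcChars, int32_t srcLength) {
        buffer = srcChars;
        length = srcLength;
        return *this;
    }

    // Refer to the buffer owned by srcText, which must outlive this reference.
    StringRefSketch& setTo(const std::u16string& srcText) {
        return setTo(srcText.data(), static_cast<int32_t>(srcText.size()));
    }
};
```

After setTo(), the reference is only valid for as long as the source buffer stays alive and unmodified, which is the lifetime caveat stated in the class description.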

void uima::UnicodeStringRef::toSingleByteStream ( std::ostream & outStream ) const

Print a single byte version to outStream.
The encoding is UTF-8 if outStream is directed to disk; if outStream is cout or cerr, the encoding is a console CCSID that allows most characters to be readable in a shell/command window.

The documentation for this class was generated from the following file:

unistrref.hpp

Generated on Mon Oct 1 16:04:14 2012 for UIMACPP API by Doxygen 1.3.9.1


Public Member Functions
	UnicodeStringRef (void)
	Default Constructor.
	UnicodeStringRef (const icu::UnicodeString &crUniString)
	Constructor from icu::UnicodeString.
	UnicodeStringRef (UChar const *cpacString)
	Constructor from zero terminated string.
	UnicodeStringRef (UChar const *cpacString, int32_t uiLength)
	Constructor from string and length.
	UnicodeStringRef (UChar const *paucStringBegin, UChar const *paucStringEnd)
	Constructor from two pointers (begin/end).
int32_t	getSizeInBytes (void) const
	Accessor for the number of bytes occupied by this string.
UChar const *	getBuffer (void) const
	CONST Accessor for the string content (NOT ZERO DELIMITED!).
UnicodeStringRef &	operator= (UnicodeStringRef const &crclRHS)
	Assignment operator.
int	operator== (const UnicodeStringRef &crclRHS) const
	Equality operator.
int	operator!= (const UnicodeStringRef &crclRHS) const
	Inequality operator.
bool	operator< (UnicodeStringRef const &text) const
	Less-than operator.
bool	operator<= (UnicodeStringRef const &text) const
	Less-than-or-equal operator.
bool	operator> (UnicodeStringRef const &text) const
	Greater-than operator.
bool	operator>= (UnicodeStringRef const &text) const
	Greater-than-or-equal operator.
int8_t	compare (const UnicodeStringRef &text) const
	Compare the characters bitwise in this UnicodeStringRef to the characters in `text`.
int8_t	compare (const icu::UnicodeString &text) const
	Compare the characters bitwise in this UnicodeStringRef to the characters in `text`.
int8_t	compare (int32_t start, int32_t length, const UnicodeStringRef &srcText) const
	Compare the characters bitwise in the range [`start`, `start + length`) with the characters in `srcText`.
int8_t	compare (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const
	Compare the characters bitwise in the range [`start`, `start + length`) with the characters in `srcText` in the range [`srcStart`, `srcStart + srcLength`).
int8_t	compare (UChar const *srcChars, int32_t srcLength) const
	Compare the characters bitwise in this UnicodeStringRef with the first `srcLength` characters in `srcChars`.
int8_t	compare (int32_t start, int32_t length, UChar const *srcChars) const
	Compare the characters bitwise in the range [`start`, `start + length`) with the first `length` characters in `srcChars`.
int8_t	compare (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength) const
	Compare the characters bitwise in the range [`start`, `start + length`) with the characters in `srcChars` in the range [`srcStart`, `srcStart + srcLength`).
int8_t	compareBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit) const
	Compare the characters bitwise in the range [`start`, `limit`) with the characters in `srcText` in the range [`srcStart`, `srcLimit`).
int8_t	compareCodePointOrder (const UnicodeStringRef &text) const
	Compare two Unicode strings in code point order.
int8_t	compareCodePointOrder (int32_t start, int32_t length, const UnicodeStringRef &srcText) const
	Compare two Unicode strings in code point order.
int8_t	compareCodePointOrder (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const
	Compare two Unicode strings in code point order.
int8_t	compareCodePointOrder (UChar const *srcChars, int32_t srcLength) const
	Compare two Unicode strings in code point order.
int8_t	compareCodePointOrder (int32_t start, int32_t length, UChar const *srcChars) const
	Compare two Unicode strings in code point order.
int8_t	compareCodePointOrder (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength) const
	Compare two Unicode strings in code point order.
int8_t	compareCodePointOrderBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit) const
	Compare two Unicode strings in code point order.
int8_t	caseCompare (const UnicodeStringRef &text, uint32_t options) const
	Compare two strings case-insensitively using full case folding.
int8_t	caseCompare (int32_t start, int32_t length, const UnicodeStringRef &srcText, uint32_t options) const
	Compare two strings case-insensitively using full case folding.
int8_t	caseCompare (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, uint32_t options) const
	Compare two strings case-insensitively using full case folding.
int8_t	caseCompare (UChar const *srcChars, int32_t srcLength, uint32_t options) const
	Compare two strings case-insensitively using full case folding.
int8_t	caseCompare (int32_t start, int32_t length, UChar const *srcChars, uint32_t options) const
	Compare two strings case-insensitively using full case folding.
int8_t	caseCompare (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength, uint32_t options) const
	Compare two strings case-insensitively using full case folding.
int8_t	caseCompareBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit, uint32_t options) const
	Compare two strings case-insensitively using full case folding.
bool	startsWith (const UnicodeStringRef &text) const
	Determine if this starts with the characters in `text`.
bool	startsWith (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const
	Determine if this starts with the characters in `srcText` in the range [`srcStart`, `srcStart + srcLength`).
bool	startsWith (UChar const *srcChars, int32_t srcLength) const
	Determine if this starts with the characters in `srcChars`.
bool	startsWith (UChar const *srcChars, int32_t srcStart, int32_t srcLength) const
	Determine if this starts with the characters in `srcChars` in the range [`srcStart`, `srcStart + srcLength`).
bool	endsWith (const UnicodeStringRef &text) const
	Determine if this ends with the characters in `text`.
bool	endsWith (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const
	Determine if this ends with the characters in `srcText` in the range [`srcStart`, `srcStart + srcLength`).
bool	endsWith (UChar const *srcChars, int32_t srcLength) const
	Determine if this ends with the characters in `srcChars`.
bool	endsWith (UChar const *srcChars, int32_t srcStart, int32_t srcLength) const
	Determine if this ends with the characters in `srcChars` in the range [`srcStart`, `srcStart + srcLength`).
int32_t	indexOf (const UnicodeStringRef &text) const
	Locate in this the first occurrence of the characters in `text`, using bitwise comparison.
int32_t	indexOf (const UnicodeStringRef &text, int32_t start) const
	Locate in this the first occurrence of the characters in `text` starting at offset `start`, using bitwise comparison.
int32_t	indexOf (const UnicodeStringRef &text, int32_t start, int32_t length) const
	Locate in this the first occurrence in the range [`start`, `start + length`) of the characters in `text`, using bitwise comparison.
int32_t	indexOf (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const
	Locate in this the first occurrence in the range [`start`, `start + length`) of the characters in `srcText` in the range [`srcStart`, `srcStart + srcLength`), using bitwise comparison.
int32_t	indexOf (UChar const *srcChars, int32_t srcLength, int32_t start) const
	Locate in this the first occurrence of the characters in `srcChars` starting at offset `start`, using bitwise comparison.
int32_t	indexOf (UChar const *srcChars, int32_t srcLength, int32_t start, int32_t length) const
	Locate in this the first occurrence in the range [`start`, `start + length`) of the characters in `srcChars`, using bitwise comparison.
int32_t	indexOf (UChar const *srcChars, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const
	Locate in this the first occurrence in the range [`start`, `start + length`) of the characters in `srcChars` in the range [`srcStart`, `srcStart + srcLength`), using bitwise comparison.
int32_t	indexOf (UChar c) const
	Locate in this the first occurrence of the code unit `c`, using bitwise comparison.
int32_t	indexOf (UChar32 c) const
	Locate in this the first occurrence of the code point `c`, using bitwise comparison.
int32_t	indexOf (UChar c, int32_t start) const
	Locate in this the first occurrence of the code unit `c` starting at offset `start`, using bitwise comparison.
int32_t	indexOf (UChar32 c, int32_t start) const
	Locate in this the first occurrence of the code point `c` starting at offset `start`, using bitwise comparison.
int32_t	indexOf (UChar c, int32_t start, int32_t length) const
	Locate in this the first occurrence of the code unit `c` in the range [`start`, `start + length`), using bitwise comparison.
int32_t	indexOf (UChar32 c, int32_t start, int32_t length) const
	Locate in this the first occurrence of the code point `c` in the range [`start`, `start + length`), using bitwise comparison.
int32_t	lastIndexOf (const UnicodeStringRef &text) const
	Locate in this the last occurrence of the characters in `text`, using bitwise comparison.
int32_t	lastIndexOf (const UnicodeStringRef &text, int32_t start) const
	Locate in this the last occurrence of the characters in `text` starting at offset `start`, using bitwise comparison.
int32_t	lastIndexOf (const UnicodeStringRef &text, int32_t start, int32_t length) const
	Locate in this the last occurrence in the range [`start`, `start + length`) of the characters in `text`, using bitwise comparison.
int32_t	lastIndexOf (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const
	Locate in this the last occurrence in the range [`start`, `start + length`) of the characters in `srcText` in the range [`srcStart`, `srcStart + srcLength`), using bitwise comparison.
int32_t	lastIndexOf (UChar const *srcChars, int32_t srcLength, int32_t start) const
	Locate in this the last occurrence of the characters in `srcChars` starting at offset `start`, using bitwise comparison.
int32_t	lastIndexOf (UChar const *srcChars, int32_t srcLength, int32_t start, int32_t length) const
	Locate in this the last occurrence in the range [`start`, `start + length`) of the characters in `srcChars`, using bitwise comparison.
int32_t	lastIndexOf (UChar const *srcChars, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const
	Locate in this the last occurrence in the range [`start`, `start + length`) of the characters in `srcChars` in the range [`srcStart`, `srcStart + srcLength`), using bitwise comparison.
int32_t	lastIndexOf (UChar c) const
	Locate in this the last occurrence of the code unit `c`, using bitwise comparison.
int32_t	lastIndexOf (UChar32 c) const
	Locate in this the last occurrence of the code point `c`, using bitwise comparison.
int32_t	lastIndexOf (UChar c, int32_t start) const
	Locate in this the last occurrence of the code unit `c` starting at offset `start`, using bitwise comparison.
int32_t	lastIndexOf (UChar32 c, int32_t start) const
	Locate in this the last occurrence of the code point `c` starting at offset `start`, using bitwise comparison.
int32_t	lastIndexOf (UChar c, int32_t start, int32_t length) const
	Locate in this the last occurrence of the code unit `c` in the range [`start`, `start + length`), using bitwise comparison.
int32_t	lastIndexOf (UChar32 c, int32_t start, int32_t length) const
	Locate in this the last occurrence of the code point `c` in the range [`start`, `start + length`), using bitwise comparison.
UChar	charAt (int32_t offset) const
	Return the code unit at offset `offset`.
UChar	operator[] (int32_t offset) const
	Return the code unit at offset `offset`.
UChar32	char32At (int32_t offset) const
	Return the code point that contains the code unit at offset `offset`.
int32_t	getChar32Start (int32_t offset) const
	Adjust a random-access offset so that it points to the beginning of a Unicode character.
int32_t	getChar32Limit (int32_t offset) const
	Adjust a random-access offset so that it points behind a Unicode character.
int32_t	moveIndex32 (int32_t index, int32_t delta) const
	Move the code unit index along the string by delta code points.
void	extract (int32_t start, int32_t length, UChar *dst, int32_t dstStart=0) const
	Copy the characters in the range [`start`, `start + length`) into the array `dst`, beginning at `dstStart`.
void	extractBetween (int32_t start, int32_t limit, UChar *dst, int32_t dstStart=0) const
	Copy the characters in the range [`start`, `limit`) into the array `dst`, beginning at `dstStart`.
int32_t	extract (UChar *dst, int32_t dstCapacity, UErrorCode &errorCode) const
	Copy the contents of the string into dst.
void	extract (int32_t start, int32_t length, UnicodeString &dst) const
	Copy the characters in the range [`start`, `start + length`) into the UnicodeString `dst`.
void	extractBetween (int32_t start, int32_t limit, UnicodeString &dst) const
	Copy the characters in the range [`start`, `limit`) into the UnicodeString `dst`.
int32_t	extract (int32_t start, int32_t startLength, char *target, const char *codepage=0) const
	Copy the characters in the range [`start`, `start + startLength`) into an array of characters in a specified codepage.
int32_t	extract (int32_t start, int32_t startLength, char *target, uint32_t targetLength, const char *codepage=0) const
	Copy the characters in the range [`start`, `start + startLength`) into an array of characters in a specified codepage.
int32_t	extract (char *target, int32_t targetCapacity, UConverter *cnv, UErrorCode &errorCode) const
	Convert the UnicodeStringRef into a codepage string using an existing UConverter.
int32_t	extract (int32_t start, int32_t startLength, std::string &target, const char *codepage=0) const
	Copy the characters in the range [`start`, `start + length`) into a std::string object in a specified codepage.
int32_t	extract (std::string &target, const char *codepage=0) const
	Copy all the characters in the string into an std::string object in a specified codepage.
int32_t	extractUTF8 (std::string &target) const
	Copy all the characters in the string into an std::string object in UTF-8.
std::string	asUTF8 (void) const
	Convert to a UTF8 string.
int32_t	length (void) const
	Return the length of the UnicodeStringRef object.
int32_t	countChar32 (int32_t start=0, int32_t length=0x7fffffff) const
	Count Unicode code points in the length UChar code units of the string.
bool	isEmpty (void) const
	Determine if this string is empty.
UnicodeStringRef &	setTo (const UnicodeStringRef &srcText)
	Set the text in the UnicodeStringRef object to the characters in `srcText`.
UnicodeStringRef &	setTo (const UnicodeString &srcText)
	Set the text in the UnicodeStringRef object to the characters in `srcText`.
UnicodeStringRef &	setTo (const UChar *srcChars, int32_t srcLength)
	Set the characters in the UnicodeStringRef object to the characters in `srcChars`.
void	toSingleByteStream (std::ostream &outStream) const
	Print a single byte version to outStream.
Static Public Member Functions
void	release (std::string &target)
	Release the contents of a string container allocated by the extract methods. Useful when caller and callee use different heaps, e.g.