
uima::UnicodeStringRef Class Reference


Detailed Description

The class UnicodeStringRef provides support for non zero-terminated strings that are presented as pointers to Unicode character arrays with an associated length.

As this type of string is meant to be used only as a reference into read-only buffers, the string pointer is constant. The member functions are named to match the icu::UnicodeString interface, but only const member functions are provided. This class is a quick, light-weight, shallow string (internally it consists only of a pointer and a length) which can be copied by value without performance penalty. It allows references into other string buffers to be treated like real string objects. Since it does not own its string memory, care must be taken to make sure the lifetime of a UnicodeStringRef object does not exceed the lifetime of the Unicode character buffer it references.
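The pointer-plus-length idea described above can be illustrated with a small, self-contained sketch. The names `Utf16View` and `makeView` are hypothetical and are not part of the UIMA API; `char16_t` stands in for `UChar`. The sketch only mirrors the concept of a shallow, non-owning reference.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a pointer-plus-length string reference.
// This is NOT uima::UnicodeStringRef; it only mirrors the idea.
struct Utf16View {
    const char16_t* data;    // not zero-terminated, not owned
    std::int32_t    length;  // number of UTF-16 code units

    // Number of bytes occupied by the referenced code units.
    std::int32_t sizeInBytes() const {
        return length * static_cast<std::int32_t>(sizeof(char16_t));
    }
};

// Return a view of `count` code units of `buffer`, starting at `start`.
// The caller must keep `buffer` alive (and unmodified) while the view is used.
inline Utf16View makeView(const std::u16string& buffer,
                          std::int32_t start, std::int32_t count) {
    return Utf16View{buffer.data() + start, count};
}
```

Copying such a view copies only the pointer and the length, never the characters, which is why it is cheap to pass by value and why it dangles if the underlying buffer is destroyed first.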


Public Member Functions

 UnicodeStringRef (void)
 Default Constructor.
 UnicodeStringRef (const icu::UnicodeString &crUniString)
 Constructor from icu::UnicodeString.
 UnicodeStringRef (UChar const *cpacString)
 Constructor from zero terminated string.
 UnicodeStringRef (UChar const *cpacString, int32_t uiLength)
 Constructor from string and length.
 UnicodeStringRef (UChar const *paucStringBegin, UChar const *paucStringEnd)
 Constructor from two pointers (begin/end).
int32_t getSizeInBytes (void) const
 Accessor for the number of bytes occupied by this string.
UChar const * getBuffer (void) const
 CONST Accessor for the string content (NOT ZERO DELIMITED!).
UnicodeStringRef & operator= (UnicodeStringRef const &crclRHS)
 Assignment operator.
int operator== (const UnicodeStringRef &crclRHS) const
 Equality operator.
int operator!= (const UnicodeStringRef &crclRHS) const
 Inequality operator.
bool operator< (UnicodeStringRef const &text) const
 Less-than operator.
bool operator<= (UnicodeStringRef const &text) const
 Less-than-or-equal operator.
bool operator> (UnicodeStringRef const &text) const
 Greater-than operator.
bool operator>= (UnicodeStringRef const &text) const
 Greater-than-or-equal operator.
int8_t compare (const UnicodeStringRef &text) const
 Compare the characters bitwise in this UnicodeStringRef to the characters in text.
int8_t compare (const icu::UnicodeString &text) const
 Compare the characters bitwise in this UnicodeStringRef to the characters in text.
int8_t compare (int32_t start, int32_t length, const UnicodeStringRef &srcText) const
 Compare the characters bitwise in the range [start, start + length) with the characters in srcText.
int8_t compare (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const
 Compare the characters bitwise in the range [start, start + length) with the characters in srcText in the range [srcStart, srcStart + srcLength).
int8_t compare (UChar const *srcChars, int32_t srcLength) const
 Compare the characters bitwise in this UnicodeStringRef with the first srcLength characters in srcChars.
int8_t compare (int32_t start, int32_t length, UChar const *srcChars) const
 Compare the characters bitwise in the range [start, start + length) with the first length characters in srcChars.
int8_t compare (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength) const
 Compare the characters bitwise in the range [start, start + length) with the characters in srcChars in the range [srcStart, srcStart + srcLength).
int8_t compareBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit) const
 Compare the characters bitwise in the range [start, limit) with the characters in srcText in the range [srcStart, srcLimit).
int8_t compareCodePointOrder (const UnicodeStringRef &text) const
 Compare two Unicode strings in code point order.
int8_t compareCodePointOrder (int32_t start, int32_t length, const UnicodeStringRef &srcText) const
 Compare two Unicode strings in code point order.
int8_t compareCodePointOrder (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const
 Compare two Unicode strings in code point order.
int8_t compareCodePointOrder (UChar const *srcChars, int32_t srcLength) const
 Compare two Unicode strings in code point order.
int8_t compareCodePointOrder (int32_t start, int32_t length, UChar const *srcChars) const
 Compare two Unicode strings in code point order.
int8_t compareCodePointOrder (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength) const
 Compare two Unicode strings in code point order.
int8_t compareCodePointOrderBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit) const
 Compare two Unicode strings in code point order.
int8_t caseCompare (const UnicodeStringRef &text, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
int8_t caseCompare (int32_t start, int32_t length, const UnicodeStringRef &srcText, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
int8_t caseCompare (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
int8_t caseCompare (UChar const *srcChars, int32_t srcLength, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
int8_t caseCompare (int32_t start, int32_t length, UChar const *srcChars, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
int8_t caseCompare (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
int8_t caseCompareBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit, uint32_t options) const
 Compare two strings case-insensitively using full case folding.
bool startsWith (const UnicodeStringRef &text) const
 Determine if this starts with the characters in text.
bool startsWith (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const
 Determine if this starts with the characters in srcText in the range [srcStart, srcStart + srcLength).
bool startsWith (UChar const *srcChars, int32_t srcLength) const
 Determine if this starts with the characters in srcChars.
bool startsWith (UChar const *srcChars, int32_t srcStart, int32_t srcLength) const
 Determine if this starts with the characters in srcChars in the range [srcStart, srcStart + srcLength).
bool endsWith (const UnicodeStringRef &text) const
 Determine if this ends with the characters in text.
bool endsWith (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const
 Determine if this ends with the characters in srcText in the range [srcStart, srcStart + srcLength).
bool endsWith (UChar const *srcChars, int32_t srcLength) const
 Determine if this ends with the characters in srcChars.
bool endsWith (UChar const *srcChars, int32_t srcStart, int32_t srcLength) const
 Determine if this ends with the characters in srcChars in the range [srcStart, srcStart + srcLength).
int32_t indexOf (const UnicodeStringRef &text) const
 Locate in this the first occurrence of the characters in text, using bitwise comparison.
int32_t indexOf (const UnicodeStringRef &text, int32_t start) const
 Locate in this the first occurrence of the characters in text starting at offset start, using bitwise comparison.
int32_t indexOf (const UnicodeStringRef &text, int32_t start, int32_t length) const
 Locate in this the first occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.
int32_t indexOf (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const
 Locate in this the first occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.
int32_t indexOf (UChar const *srcChars, int32_t srcLength, int32_t start) const
 Locate in this the first occurrence of the characters in srcChars starting at offset start, using bitwise comparison.
int32_t indexOf (UChar const *srcChars, int32_t srcLength, int32_t start, int32_t length) const
 Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.
int32_t indexOf (UChar const *srcChars, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const
 Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.
int32_t indexOf (UChar c) const
 Locate in this the first occurrence of the code unit c, using bitwise comparison.
int32_t indexOf (UChar32 c) const
 Locate in this the first occurrence of the code point c, using bitwise comparison.
int32_t indexOf (UChar c, int32_t start) const
 Locate in this the first occurrence of the code unit c starting at offset start, using bitwise comparison.
int32_t indexOf (UChar32 c, int32_t start) const
 Locate in this the first occurrence of the code point c starting at offset start, using bitwise comparison.
int32_t indexOf (UChar c, int32_t start, int32_t length) const
 Locate in this the first occurrence of the code unit c in the range [start, start + length), using bitwise comparison.
int32_t indexOf (UChar32 c, int32_t start, int32_t length) const
 Locate in this the first occurrence of the code point c in the range [start, start + length), using bitwise comparison.
int32_t lastIndexOf (const UnicodeStringRef &text) const
 Locate in this the last occurrence of the characters in text, using bitwise comparison.
int32_t lastIndexOf (const UnicodeStringRef &text, int32_t start) const
 Locate in this the last occurrence of the characters in text starting at offset start, using bitwise comparison.
int32_t lastIndexOf (const UnicodeStringRef &text, int32_t start, int32_t length) const
 Locate in this the last occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.
int32_t lastIndexOf (const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const
 Locate in this the last occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.
int32_t lastIndexOf (UChar const *srcChars, int32_t srcLength, int32_t start) const
 Locate in this the last occurrence of the characters in srcChars starting at offset start, using bitwise comparison.
int32_t lastIndexOf (UChar const *srcChars, int32_t srcLength, int32_t start, int32_t length) const
 Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.
int32_t lastIndexOf (UChar const *srcChars, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length) const
 Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.
int32_t lastIndexOf (UChar c) const
 Locate in this the last occurrence of the code unit c, using bitwise comparison.
int32_t lastIndexOf (UChar32 c) const
 Locate in this the last occurrence of the code point c, using bitwise comparison.
int32_t lastIndexOf (UChar c, int32_t start) const
 Locate in this the last occurrence of the code unit c starting at offset start, using bitwise comparison.
int32_t lastIndexOf (UChar32 c, int32_t start) const
 Locate in this the last occurrence of the code point c starting at offset start, using bitwise comparison.
int32_t lastIndexOf (UChar c, int32_t start, int32_t length) const
 Locate in this the last occurrence of the code unit c in the range [start, start + length), using bitwise comparison.
int32_t lastIndexOf (UChar32 c, int32_t start, int32_t length) const
 Locate in this the last occurrence of the code point c in the range [start, start + length), using bitwise comparison.
UChar charAt (int32_t offset) const
 Return the code unit at offset offset.
UChar operator[] (int32_t offset) const
 Return the code unit at offset offset.
UChar32 char32At (int32_t offset) const
 Return the code point that contains the code unit at offset offset.
int32_t getChar32Start (int32_t offset) const
 Adjust a random-access offset so that it points to the beginning of a Unicode character.
int32_t getChar32Limit (int32_t offset) const
 Adjust a random-access offset so that it points behind a Unicode character.
int32_t moveIndex32 (int32_t index, int32_t delta) const
 Move the code unit index along the string by delta code points.
void extract (int32_t start, int32_t length, UChar *dst, int32_t dstStart=0) const
 Copy the characters in the range [start, start + length) into the array dst, beginning at dstStart.
void extractBetween (int32_t start, int32_t limit, UChar *dst, int32_t dstStart=0) const
 Copy the characters in the range [start, limit) into the array dst, beginning at dstStart.
int32_t extract (UChar *dst, int32_t dstCapacity, UErrorCode &errorCode) const
 Copy the contents of the string into dst.
void extract (int32_t start, int32_t length, UnicodeString &dst) const
 Copy the characters in the range [start, start + length) into the UnicodeString dst.
void extractBetween (int32_t start, int32_t limit, UnicodeString &dst) const
 Copy the characters in the range [start, limit) into the UnicodeString dst.
int32_t extract (int32_t start, int32_t startLength, char *target, const char *codepage=0) const
 Copy the characters in the range [start, start + length) into an array of characters in a specified codepage.
int32_t extract (int32_t start, int32_t startLength, char *target, uint32_t targetLength, const char *codepage=0) const
 Copy the characters in the range [start, start + length) into an array of characters in a specified codepage.
int32_t extract (char *target, int32_t targetCapacity, UConverter *cnv, UErrorCode &errorCode) const
 Convert the UnicodeStringRef into a codepage string using an existing UConverter.
int32_t extract (int32_t start, int32_t startLength, std::string &target, const char *codepage=0) const
 Copy the characters in the range [start, start + length) into a std::string object in a specified codepage.
int32_t extract (std::string &target, const char *codepage=0) const
 Copy all the characters in the string into an std::string object in a specified codepage.
int32_t extractUTF8 (std::string &target) const
 Copy all the characters in the string into an std::string object in UTF-8.
std::string asUTF8 (void) const
 Convert to a UTF8 string.
int32_t length (void) const
 Return the length of the UnicodeStringRef object.
int32_t countChar32 (int32_t start=0, int32_t length=0x7fffffff) const
 Count Unicode code points in the length UChar code units of the string.
bool isEmpty (void) const
 Determine if this string is empty.
UnicodeStringRef & setTo (const UnicodeStringRef &srcText)
 Set the text in the UnicodeString object to the characters in srcText.
UnicodeStringRef & setTo (const UnicodeString &srcText)
 Set the text in the UnicodeString object to the characters in srcText.
UnicodeStringRef & setTo (const UChar *srcChars, int32_t srcLength)
 Set the characters in the UnicodeString object to the characters in srcChars.
void toSingleByteStream (std::ostream &outStream) const
 Print a single byte version to outStream.

Static Public Member Functions

void release (std::string &target)
 Release contents of string container allocated by extract methods. Useful when caller and callee use different heaps, e.g.


Constructor & Destructor Documentation

uima::UnicodeStringRef::UnicodeStringRef (void) [inline]
 

Default Constructor.

uima::UnicodeStringRef::UnicodeStringRef (const icu::UnicodeString &crUniString) [inline]
 

Constructor from icu::UnicodeString.

uima::UnicodeStringRef::UnicodeStringRef (UChar const *cpacString) [inline, explicit]
 

Constructor from zero terminated string.

uima::UnicodeStringRef::UnicodeStringRef (UChar const *cpacString, int32_t uiLength) [inline]
 

Constructor from string and length.

uima::UnicodeStringRef::UnicodeStringRef (UChar const *paucStringBegin, UChar const *paucStringEnd) [inline]
 

Constructor from two pointers (begin/end).

Note: paucStringEnd points to the first character past the end of the string.

Deprecated:
Replace with UnicodeStringRef(paucStringBegin,paucStringEnd-paucStringBegin).
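The replacement suggested in the deprecation note relies on pointer subtraction, which yields a count of code units rather than bytes. A minimal sketch of that computation follows; `lengthFromRange` is a hypothetical helper and `char16_t` stands in for `UChar`.

```cpp
#include <cstdint>

// Hypothetical helper: number of code units in the half-open range
// [begin, end), where `end` points one past the last code unit.
inline std::int32_t lengthFromRange(const char16_t* begin, const char16_t* end) {
    // Subtracting two pointers into the same array counts elements, not bytes.
    return static_cast<std::int32_t>(end - begin);
}
```

The result is the value that would be passed as the length argument of the pointer-and-length constructor.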


Member Function Documentation

int32_t uima::UnicodeStringRef::getSizeInBytes (void) const [inline]
 

Accessor for the number of bytes occupied by this string.

UChar const * uima::UnicodeStringRef::getBuffer (void) const [inline]
 

CONST Accessor for the string content (NOT ZERO DELIMITED!).

UnicodeStringRef & uima::UnicodeStringRef::operator= (UnicodeStringRef const &crclRHS) [inline]
 

Assignment operator.

int uima::UnicodeStringRef::operator== (const UnicodeStringRef &crclRHS) const [inline]
 

Equality operator.

int uima::UnicodeStringRef::operator!= (const UnicodeStringRef &crclRHS) const [inline]
 

Inequality operator.

bool uima::UnicodeStringRef::operator< (UnicodeStringRef const &text) const [inline]


Less-than operator.

bool uima::UnicodeStringRef::operator<= (UnicodeStringRef const &text) const [inline]


Less-than-or-equal operator.

bool uima::UnicodeStringRef::operator> (UnicodeStringRef const &text) const [inline]


Greater-than operator.

bool uima::UnicodeStringRef::operator>= (UnicodeStringRef const &text) const [inline]


Greater-than-or-equal operator.

int8_t uima::UnicodeStringRef::compare (const UnicodeStringRef &text) const [inline]
 

Compare the characters bitwise in this UnicodeStringRef to the characters in text.

Parameters:
text The UnicodeStringRef to compare to this one.
Returns:
The result of bitwise character comparison: 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.

int8_t uima::UnicodeStringRef::compare (const icu::UnicodeString &text) const [inline]
 

Compare the characters bitwise in this UnicodeStringRef to the characters in text.

Parameters:
text The UnicodeString to compare to this one.
Returns:
The result of bitwise character comparison: 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.

int8_t uima::UnicodeStringRef::compare (int32_t start, int32_t length, const UnicodeStringRef &srcText) const [inline]
 

Compare the characters bitwise in the range [start, start + length) with the characters in srcText.

Parameters:
start the offset at which the compare operation begins
length the number of characters of text to compare.
srcText the text to be compared
Returns:
The result of bitwise character comparison: 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.

int8_t uima::UnicodeStringRef::compare (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const [inline]
 

Compare the characters bitwise in the range [start, start + length) with the characters in srcText in the range [srcStart, srcStart + srcLength).

Parameters:
start the offset at which the compare operation begins
length the number of characters in this to compare.
srcText the text to be compared
srcStart the offset into srcText to start comparison
srcLength the number of characters in src to compare
Returns:
The result of bitwise character comparison: 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.

int8_t uima::UnicodeStringRef::compare (UChar const *srcChars, int32_t srcLength) const [inline]
 

Compare the characters bitwise in this UnicodeStringRef with the first srcLength characters in srcChars.

Parameters:
srcChars The characters to compare to this UnicodeStringRef.
srcLength the number of characters in srcChars to compare
Returns:
The result of bitwise character comparison: 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.

int8_t uima::UnicodeStringRef::compare (int32_t start, int32_t length, UChar const *srcChars) const [inline]
 

Compare the characters bitwise in the range [start, start + length) with the first length characters in srcChars.

Parameters:
start the offset at which the compare operation begins
length the number of characters to compare.
srcChars the characters to be compared
Returns:
The result of bitwise character comparison: 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.

int8_t uima::UnicodeStringRef::compare (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength) const [inline]
 

Compare the characters bitwise in the range [start, start + length) with the characters in srcChars in the range [srcStart, srcStart + srcLength).

Parameters:
start the offset at which the compare operation begins
length the number of characters in this to compare
srcChars the characters to be compared
srcStart the offset into srcChars to start comparison
srcLength the number of characters in srcChars to compare
Returns:
The result of bitwise character comparison: 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.

int8_t uima::UnicodeStringRef::compareBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit) const [inline]
 

Compare the characters bitwise in the range [start, limit) with the characters in srcText in the range [srcStart, srcLimit).

Parameters:
start the offset at which the compare operation begins
limit the offset immediately following the compare operation
srcText the text to be compared
srcStart the offset into srcText to start comparison
srcLimit the offset into srcText to limit comparison
Returns:
The result of bitwise character comparison: 0 if text contains the same characters as this, -1 if the characters in text are bitwise less than the characters in this, +1 if the characters in text are bitwise greater than the characters in this.
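The compare() overloads describe a range as a start offset plus a length, whereas compareBetween() describes it as a start offset plus an exclusive limit. The relationship between the two conventions can be sketched as follows; `Range`, `toLimitRange`, and `lengthOf` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical half-open range [start, limit) of code unit offsets.
struct Range {
    std::int32_t start;
    std::int32_t limit;  // offset immediately following the last code unit
};

// Convert the (start, length) form into the [start, limit) form.
inline Range toLimitRange(std::int32_t start, std::int32_t length) {
    return Range{start, start + length};
}

// Recover the length from the [start, limit) form.
inline std::int32_t lengthOf(Range r) {
    return r.limit - r.start;
}
```

Under this reading, a call with start 3 and length 4 addresses the same code units as a call with start 3 and limit 7.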

int8_t uima::UnicodeStringRef::compareCodePointOrder (const UnicodeStringRef &text) const [inline]
 

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
text Another string to compare this one to.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order
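The difference between code unit order and code point order can be reproduced without ICU. The following is a minimal sketch under stated assumptions: `nextCodePoint`, `compareCodeUnits`, and `compareCodePoints` are hypothetical helpers, `std::u16string` stands in for the referenced buffer, the input is assumed to be well-formed UTF-16, and the sign convention (negative when the first argument orders first) is the sketch's own.

```cpp
#include <cstddef>
#include <string>

// Decode the code point starting at index i and advance i past it.
// Assumes well-formed UTF-16; an unpaired surrogate is returned as-is.
inline char32_t nextCodePoint(const std::u16string& s, std::size_t& i) {
    char16_t lead = s[i++];
    if (lead >= 0xD800 && lead <= 0xDBFF && i < s.size()) {
        char16_t trail = s[i];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++i;
            return 0x10000 + ((static_cast<char32_t>(lead) - 0xD800) << 10)
                           + (static_cast<char32_t>(trail) - 0xDC00);
        }
    }
    return lead;
}

// Three-way comparison by raw UTF-16 code units (bitwise order).
inline int compareCodeUnits(const std::u16string& a, const std::u16string& b) {
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}

// Three-way comparison by decoded code points.
inline int compareCodePoints(const std::u16string& a, const std::u16string& b) {
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        char32_t ca = nextCodePoint(a, i);
        char32_t cb = nextCodePoint(b, j);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    if (i == a.size() && j == b.size()) return 0;
    return i == a.size() ? -1 : 1;
}
```

With these helpers, U+10000 (stored as the surrogate pair 0xD800 0xDC00) orders before U+FEFF by code unit, because 0xD800 is less than 0xFEFF, but after it by code point, because 0x10000 is greater than 0xFEFF.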

int8_t uima::UnicodeStringRef::compareCodePointOrder (int32_t start, int32_t length, const UnicodeStringRef &srcText) const [inline]
 

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcText Another string to compare this one to.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrder (int32_t start, int32_t length, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLength) const [inline]
 

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcText Another string to compare this one to.
srcStart The start offset in that string at which the compare operation begins.
srcLength The number of code units from that string to compare.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrder (UChar const *srcChars, int32_t srcLength) const [inline]
 

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
srcChars A pointer to another string to compare this one to.
srcLength The number of code units from that string to compare.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrder (int32_t start, int32_t length, UChar const *srcChars) const [inline]
 

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcChars A pointer to another string to compare this one to.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrder (int32_t start, int32_t length, UChar const *srcChars, int32_t srcStart, int32_t srcLength) const [inline]
 

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcChars A pointer to another string to compare this one to.
srcStart The start offset in that string at which the compare operation begins.
srcLength The number of code units from that string to compare.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::compareCodePointOrderBetween (int32_t start, int32_t limit, const UnicodeStringRef &srcText, int32_t srcStart, int32_t srcLimit) const [inline]
 

Compare two Unicode strings in code point order.

This is different in UTF-16 from how compare(), operator==, startsWith() etc. work if supplementary characters are present:

In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.

Parameters:
start The start offset in this string at which the compare operation begins.
limit The offset after the last code unit from this string to compare.
srcText Another string to compare this one to.
srcStart The start offset in that string at which the compare operation begins.
srcLimit The offset after the last code unit from that string to compare.
Returns:
a negative/zero/positive integer corresponding to whether this string is less than/equal to/greater than the second one in code point order

int8_t uima::UnicodeStringRef::caseCompare (const UnicodeStringRef &text, uint32_t options) const [inline]
 

Compare two strings case-insensitively using full case folding.

This is equivalent to this->foldCase(options).compare(text.foldCase(options)).

Parameters:
text Another string to compare this one to.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompare (int32_t start, int32_t length, const UnicodeStringRef &srcText, uint32_t options) const [inline]
 

Compare two strings case-insensitively using full case folding.

This is equivalent to this->foldCase(options).compare(srcText.foldCase(options)).

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcText Another string to compare this one to.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompare int32_t  start,
int32_t  length,
const UnicodeStringRef srcText,
int32_t  srcStart,
int32_t  srcLength,
uint32_t  options
const [inline]
 

Compare two strings case-insensitively using full case folding.

This is equivalent to this->foldCase(options).compare(srcText.foldCase(options)).

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcText Another string to compare this one to.
srcStart The start offset in that string at which the compare operation begins.
srcLength The number of code units from that string to compare.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompare UChar const *  srcChars,
int32_t  srcLength,
uint32_t  options
const [inline]
 

Compare two strings case-insensitively using full case folding.

This is equivalent to this->foldCase(options).compare(srcChars.foldCase(options)).

Parameters:
srcChars A pointer to another string to compare this one to.
srcLength The number of code units from that string to compare.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompare int32_t  start,
int32_t  length,
UChar const *  srcChars,
uint32_t  options
const [inline]
 

Compare two strings case-insensitively using full case folding.

This is equivalent to this->foldCase(options).compare(srcChars.foldCase(options)).

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcChars A pointer to another string to compare this one to.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompare int32_t  start,
int32_t  length,
UChar const *  srcChars,
int32_t  srcStart,
int32_t  srcLength,
uint32_t  options
const [inline]
 

Compare two strings case-insensitively using full case folding.

This is equivalent to this->foldCase(options).compare(srcChars.foldCase(options)).

Parameters:
start The start offset in this string at which the compare operation begins.
length The number of code units from this string to compare.
srcChars A pointer to another string to compare this one to.
srcStart The start offset in that string at which the compare operation begins.
srcLength The number of code units from that string to compare.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

int8_t uima::UnicodeStringRef::caseCompareBetween int32_t  start,
int32_t  limit,
const UnicodeStringRef srcText,
int32_t  srcStart,
int32_t  srcLimit,
uint32_t  options
const [inline]
 

Compare two strings case-insensitively using full case folding.

This is equivalent to this->foldCase(options).compareBetween(srcText.foldCase(options)).

Parameters:
start The start offset in this string at which the compare operation begins.
limit The offset after the last code unit from this string to compare.
srcText Another string to compare this one to.
srcStart The start offset in that string at which the compare operation begins.
srcLimit The offset after the last code unit from that string to compare.
options Either U_FOLD_CASE_DEFAULT or U_FOLD_CASE_EXCLUDE_SPECIAL_I
Returns:
A negative, zero, or positive integer indicating the comparison result.

bool uima::UnicodeStringRef::startsWith const UnicodeStringRef text  )  const [inline]
 

Determine if this starts with the characters in text.

Parameters:
text The text to match.
Returns:
TRUE if this starts with the characters in text, FALSE otherwise

bool uima::UnicodeStringRef::startsWith const UnicodeStringRef srcText,
int32_t  srcStart,
int32_t  srcLength
const [inline]
 

Determine if this starts with the characters in srcText in the range [srcStart, srcStart + srcLength).

Parameters:
srcText The text to match.
srcStart the offset into srcText to start matching
srcLength the number of characters in srcText to match
Returns:
TRUE if this starts with the characters in srcText, FALSE otherwise

bool uima::UnicodeStringRef::startsWith UChar const *  srcChars,
int32_t  srcLength
const [inline]
 

Determine if this starts with the characters in srcChars.

Parameters:
srcChars The characters to match.
srcLength the number of characters in srcChars
Returns:
TRUE if this starts with the characters in srcChars, FALSE otherwise

bool uima::UnicodeStringRef::startsWith UChar const *  srcChars,
int32_t  srcStart,
int32_t  srcLength
const [inline]
 

Determine if this starts with the characters in srcChars in the range [srcStart, srcStart + srcLength).

Parameters:
srcChars The characters to match.
srcStart the offset into srcChars to start matching
srcLength the number of characters in srcChars to match
Returns:
TRUE if this starts with the characters in srcChars, FALSE otherwise

bool uima::UnicodeStringRef::endsWith const UnicodeStringRef text  )  const [inline]
 

Determine if this ends with the characters in text.

Parameters:
text The text to match.
Returns:
TRUE if this ends with the characters in text, FALSE otherwise

bool uima::UnicodeStringRef::endsWith const UnicodeStringRef srcText,
int32_t  srcStart,
int32_t  srcLength
const [inline]
 

Determine if this ends with the characters in srcText in the range [srcStart, srcStart + srcLength).

Parameters:
srcText The text to match.
srcStart the offset into srcText to start matching
srcLength the number of characters in srcText to match
Returns:
TRUE if this ends with the characters in srcText, FALSE otherwise

bool uima::UnicodeStringRef::endsWith UChar const *  srcChars,
int32_t  srcLength
const [inline]
 

Determine if this ends with the characters in srcChars.

Parameters:
srcChars The characters to match.
srcLength the number of characters in srcChars
Returns:
TRUE if this ends with the characters in srcChars, FALSE otherwise

bool uima::UnicodeStringRef::endsWith UChar const *  srcChars,
int32_t  srcStart,
int32_t  srcLength
const [inline]
 

Determine if this ends with the characters in srcChars in the range [srcStart, srcStart + srcLength).

Parameters:
srcChars The characters to match.
srcStart the offset into srcChars to start matching
srcLength the number of characters in srcChars to match
Returns:
TRUE if this ends with the characters in srcChars, FALSE otherwise
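
The range-based startsWith() and endsWith() overloads can be pictured with the following standalone sketch. It uses std::u16string instead of UnicodeStringRef, the function names are hypothetical, and it simply asserts that the source range is valid; how the real class treats out-of-range offsets is not stated here and is left as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if `self` begins with src[srcStart, srcStart + srcLength).
// Offsets and lengths are counted in UTF-16 code units.
bool startsWithRange(const std::u16string& self, const std::u16string& src,
                     std::size_t srcStart, std::size_t srcLength) {
    assert(srcStart <= src.size() && srcLength <= src.size() - srcStart);
    if (srcLength > self.size()) return false;
    return self.compare(0, srcLength, src, srcStart, srcLength) == 0;
}

// True if `self` ends with src[srcStart, srcStart + srcLength).
bool endsWithRange(const std::u16string& self, const std::u16string& src,
                   std::size_t srcStart, std::size_t srcLength) {
    assert(srcStart <= src.size() && srcLength <= src.size() - srcStart);
    if (srcLength > self.size()) return false;
    return self.compare(self.size() - srcLength, srcLength,
                        src, srcStart, srcLength) == 0;
}
```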

int32_t uima::UnicodeStringRef::indexOf const UnicodeStringRef text  )  const [inline]
 

Locate in this the first occurrence of the characters in text, using bitwise comparison.

Parameters:
text The text to search for.
Returns:
The offset into this of the start of text, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf const UnicodeStringRef text,
int32_t  start
const [inline]
 

Locate in this the first occurrence of the characters in text starting at offset start, using bitwise comparison.

Parameters:
text The text to search for.
start The offset at which searching will start.
Returns:
The offset into this of the start of text, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf const UnicodeStringRef text,
int32_t  start,
int32_t  length
const [inline]
 

Locate in this the first occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.

Parameters:
text The text to search for.
start The offset at which searching will start.
length The number of characters to search
Returns:
The offset into this of the start of text, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf const UnicodeStringRef srcText,
int32_t  srcStart,
int32_t  srcLength,
int32_t  start,
int32_t  length
const [inline]
 

Locate in this the first occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.

Parameters:
srcText The text to search for.
srcStart the offset into srcText at which to start matching
srcLength the number of characters in srcText to match
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of the start of srcText, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf UChar const *  srcChars,
int32_t  srcLength,
int32_t  start
const [inline]
 

Locate in this the first occurrence of the characters in srcChars starting at offset start, using bitwise comparison.

Parameters:
srcChars The text to search for.
srcLength the number of characters in srcChars to match
start the offset into this at which to start matching
Returns:
The offset into this of the start of srcChars, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf UChar const *  srcChars,
int32_t  srcLength,
int32_t  start,
int32_t  length
const [inline]
 

Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.

Parameters:
srcChars The text to search for.
srcLength the number of characters in srcChars
start The offset at which searching will start.
length The number of characters to search
Returns:
The offset into this of the start of srcChars, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf UChar const *  srcChars,
int32_t  srcStart,
int32_t  srcLength,
int32_t  start,
int32_t  length
const
 

Locate in this the first occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.

Parameters:
srcChars The text to search for.
srcStart the offset into srcChars at which to start matching
srcLength the number of characters in srcChars to match
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of the start of srcChars, or -1 if not found.
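
How the two ranges interact in the five-argument indexOf() overloads can be sketched as follows. This is a standalone illustration on std::u16string with a hypothetical name (indexOfRange). It assumes that a match must lie entirely inside [start, start + length) and that an empty pattern is reported as not found; both points are assumptions about the intended semantics rather than statements from this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Search self[start, start + length) for src[srcStart, srcStart + srcLength)
// by comparing code units. Returns the offset into `self`, or -1.
std::int32_t indexOfRange(const std::u16string& self, const std::u16string& src,
                          std::size_t srcStart, std::size_t srcLength,
                          std::size_t start, std::size_t length) {
    assert(srcStart <= src.size() && srcLength <= src.size() - srcStart);
    assert(start <= self.size() && length <= self.size() - start);
    if (srcLength == 0 || srcLength > length) return -1;
    // Last offset at which the pattern still fits inside the search range.
    const std::size_t lastCandidate = start + length - srcLength;
    for (std::size_t i = start; i <= lastCandidate; ++i) {
        if (self.compare(i, srcLength, src, srcStart, srcLength) == 0) {
            return static_cast<std::int32_t>(i);
        }
    }
    return -1;
}
```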

int32_t uima::UnicodeStringRef::indexOf UChar  c  )  const [inline]
 

Locate in this the first occurrence of the code unit c, using bitwise comparison.

Parameters:
c The code unit to search for.
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf UChar32  c  )  const [inline]
 

Locate in this the first occurrence of the code point c, using bitwise comparison.

Parameters:
c The code point to search for.
Returns:
The offset into this of c, or -1 if not found.
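
The UChar and UChar32 overloads differ for supplementary characters: a code point above U+ffff occupies two code units, so it has to be searched for as a surrogate pair. A minimal standalone sketch of that idea, using std::u16string and the hypothetical name indexOfCodePoint:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Find the first occurrence of code point `c`; the returned offset is
// counted in UTF-16 code units, or -1 if `c` does not occur.
std::int32_t indexOfCodePoint(const std::u16string& s, char32_t c) {
    assert(c <= 0x10FFFF);
    std::u16string needle;
    if (c <= 0xFFFF) {
        needle.push_back(static_cast<char16_t>(c));  // single code unit
    } else {
        const char32_t v = c - 0x10000;              // 20 payload bits
        needle.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // lead
        needle.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // trail
    }
    const std::u16string::size_type pos = s.find(needle);
    return pos == std::u16string::npos ? -1 : static_cast<std::int32_t>(pos);
}
```

Note that the result is still a code unit offset, so characters after a supplementary character appear one position later than their code point index.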

int32_t uima::UnicodeStringRef::indexOf UChar  c,
int32_t  start
const [inline]
 

Locate in this the first occurrence of the code unit c starting at offset start, using bitwise comparison.

Parameters:
c The code unit to search for.
start The offset at which searching will start.
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf UChar32  c,
int32_t  start
const [inline]
 

Locate in this the first occurrence of the code point c starting at offset start, using bitwise comparison.

Parameters:
c The code point to search for.
start The offset at which searching will start.
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf UChar  c,
int32_t  start,
int32_t  length
const [inline]
 

Locate in this the first occurrence of the code unit c in the range [start, start + length), using bitwise comparison.

Parameters:
c The code unit to search for.
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::indexOf UChar32  c,
int32_t  start,
int32_t  length
const [inline]
 

Locate in this the first occurrence of the code point c in the range [start, start + length), using bitwise comparison.

Parameters:
c The code point to search for.
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf const UnicodeStringRef text  )  const [inline]
 

Locate in this the last occurrence of the characters in text, using bitwise comparison.

Parameters:
text The text to search for.
Returns:
The offset into this of the start of text, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf const UnicodeStringRef text,
int32_t  start
const [inline]
 

Locate in this the last occurrence of the characters in text starting at offset start, using bitwise comparison.

Parameters:
text The text to search for.
start The offset at which searching will start.
Returns:
The offset into this of the start of text, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf const UnicodeStringRef text,
int32_t  start,
int32_t  length
const [inline]
 

Locate in this the last occurrence in the range [start, start + length) of the characters in text, using bitwise comparison.

Parameters:
text The text to search for.
start The offset at which searching will start.
length The number of characters to search
Returns:
The offset into this of the start of text, or -1 if not found.
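
A backward search restricted to a range can be sketched in the same standalone style. The name lastIndexOfInRange is hypothetical, and the sketch assumes that the match must lie entirely inside [start, start + length) and that an empty pattern is not found.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Search self[start, start + length) backwards for `text` by comparing
// code units. Returns the offset into `self`, or -1.
std::int32_t lastIndexOfInRange(const std::u16string& self,
                                const std::u16string& text,
                                std::size_t start, std::size_t length) {
    assert(start <= self.size() && length <= self.size() - start);
    if (text.empty() || text.size() > length) return -1;
    // Begin one past the last offset where `text` still fits, then walk down
    // to `start` inclusive.
    for (std::size_t i = start + length - text.size() + 1; i-- > start;) {
        if (self.compare(i, text.size(), text) == 0) {
            return static_cast<std::int32_t>(i);
        }
    }
    return -1;
}
```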

int32_t uima::UnicodeStringRef::lastIndexOf const UnicodeStringRef srcText,
int32_t  srcStart,
int32_t  srcLength,
int32_t  start,
int32_t  length
const [inline]
 

Locate in this the last occurrence in the range [start, start + length) of the characters in srcText in the range [srcStart, srcStart + srcLength), using bitwise comparison.

Parameters:
srcText The text to search for.
srcStart the offset into srcText at which to start matching
srcLength the number of characters in srcText to match
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of the start of srcText, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf UChar const *  srcChars,
int32_t  srcLength,
int32_t  start
const [inline]
 

Locate in this the last occurrence of the characters in srcChars starting at offset start, using bitwise comparison.

Parameters:
srcChars The text to search for.
srcLength the number of characters in srcChars to match
start the offset into this at which to start matching
Returns:
The offset into this of the start of srcChars, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf UChar const *  srcChars,
int32_t  srcLength,
int32_t  start,
int32_t  length
const [inline]
 

Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars, using bitwise comparison.

Parameters:
srcChars The text to search for.
srcLength the number of characters in srcChars
start The offset at which searching will start.
length The number of characters to search
Returns:
The offset into this of the start of srcChars, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf UChar const *  srcChars,
int32_t  srcStart,
int32_t  srcLength,
int32_t  start,
int32_t  length
const
 

Locate in this the last occurrence in the range [start, start + length) of the characters in srcChars in the range [srcStart, srcStart + srcLength), using bitwise comparison.

Parameters:
srcChars The text to search for.
srcStart the offset into srcChars at which to start matching
srcLength the number of characters in srcChars to match
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of the start of srcChars, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf UChar  c  )  const [inline]
 

Locate in this the last occurrence of the code unit c, using bitwise comparison.

Parameters:
c The code unit to search for.
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf UChar32  c  )  const [inline]
 

Locate in this the last occurrence of the code point c, using bitwise comparison.

Parameters:
c The code point to search for.
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf UChar  c,
int32_t  start
const [inline]
 

Locate in this the last occurrence of the code unit c starting at offset start, using bitwise comparison.

Parameters:
c The code unit to search for.
start The offset at which searching will start.
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf UChar32  c,
int32_t  start
const [inline]
 

Locate in this the last occurrence of the code point c starting at offset start, using bitwise comparison.

Parameters:
c The code point to search for.
start The offset at which searching will start.
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf UChar  c,
int32_t  start,
int32_t  length
const [inline]
 

Locate in this the last occurrence of the code unit c in the range [start, start + length), using bitwise comparison.

Parameters:
c The code unit to search for.
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of c, or -1 if not found.

int32_t uima::UnicodeStringRef::lastIndexOf UChar32  c,
int32_t  start,
int32_t  length
const [inline]
 

Locate in this the last occurrence of the code point c in the range [start, start + length), using bitwise comparison.

Parameters:
c The code point to search for.
start the offset into this at which to start matching
length the number of characters in this to search
Returns:
The offset into this of c, or -1 if not found.

UChar uima::UnicodeStringRef::charAt int32_t  offset  )  const [inline]
 

Return the code unit at offset offset.

Parameters:
offset a valid offset into the text
Returns:
the code unit at offset offset

UChar uima::UnicodeStringRef::operator[] int32_t  offset  )  const [inline]
 

Return the code unit at offset offset.

Parameters:
offset a valid offset into the text
Returns:
the code unit at offset offset

UChar32 uima::UnicodeStringRef::char32At int32_t  offset  )  const [inline]
 

Return the code point that contains the code unit at offset offset.

Parameters:
offset a valid offset into the text that indicates the text offset of any of the code units that will be assembled into a code point (21-bit value) and returned
Returns:
the code point of text at offset
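
The way a code point is assembled from either half of a surrogate pair can be sketched as below. This is a standalone illustration on std::u16string; the name codePointAt and the helpers are hypothetical, and returning an unpaired surrogate unchanged is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static bool isLead(char16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
static bool isTrail(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

static char32_t combine(char16_t lead, char16_t trail) {
    return 0x10000 + ((static_cast<char32_t>(lead) - 0xD800) << 10)
                   + (static_cast<char32_t>(trail) - 0xDC00);
}

// Return the code point that contains the code unit at `offset`.
char32_t codePointAt(const std::u16string& s, std::size_t offset) {
    assert(offset < s.size());
    const char16_t u = s[offset];
    if (isLead(u) && offset + 1 < s.size() && isTrail(s[offset + 1])) {
        return combine(u, s[offset + 1]);   // offset is on the first half
    }
    if (isTrail(u) && offset > 0 && isLead(s[offset - 1])) {
        return combine(s[offset - 1], u);   // offset is on the second half
    }
    return u;  // BMP code unit (or unpaired surrogate, returned as-is)
}
```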

int32_t uima::UnicodeStringRef::getChar32Start int32_t  offset  )  const [inline]
 

Adjust a random-access offset so that it points to the beginning of a Unicode character.

The offset that is passed in points to any code unit of a code point, while the returned offset will point to the first code unit of the same code point. In UTF-16, if the input offset points to the second surrogate of a surrogate pair, then the returned offset will point to the first surrogate.

Parameters:
offset a valid offset into one code point of the text
Returns:
offset of the first code unit of the same code point

int32_t uima::UnicodeStringRef::getChar32Limit int32_t  offset  )  const [inline]
 

Adjust a random-access offset so that it points behind a Unicode character.

The offset that is passed in points behind any code unit of a code point, while the returned offset will point behind the last code unit of the same code point. In UTF-16, if the input offset points behind the first surrogate (i.e., to the second surrogate) of a surrogate pair, then the returned offset will point behind the second surrogate (i.e., to the first code unit after the pair).

Parameters:
offset a valid offset after any code unit of a code point of the text
Returns:
offset of the first code unit after the same code point
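
The adjustments made by getChar32Start() and getChar32Limit() can be summarised in a standalone sketch. It operates on std::u16string, the names char32Start and char32Limit are hypothetical, and offsets that do not fall inside a surrogate pair are returned unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static bool isLead(char16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
static bool isTrail(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Move `offset` back to the first code unit of the code point it points into.
std::size_t char32Start(const std::u16string& s, std::size_t offset) {
    assert(offset < s.size());
    if (offset > 0 && isTrail(s[offset]) && isLead(s[offset - 1])) {
        return offset - 1;
    }
    return offset;
}

// Move `offset` forward so that it points behind a complete code point.
std::size_t char32Limit(const std::u16string& s, std::size_t offset) {
    assert(offset <= s.size());
    if (offset > 0 && offset < s.size() &&
        isLead(s[offset - 1]) && isTrail(s[offset])) {
        return offset + 1;
    }
    return offset;
}
```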

int32_t uima::UnicodeStringRef::moveIndex32 int32_t  index,
int32_t  delta
const
 

Move the code unit index along the string by delta code points.

Interpret the input index as a code unit-based offset into the string, move the index forward or backward by delta code points, and return the resulting index. The input index should point to the first code unit of a code point, if there is more than one.

Both input and output indexes are code unit-based as for all string indexes/offsets in ICU (and other libraries, like MBCS char*). If delta<0 then the index is moved backward (toward the start of the string). If delta>0 then the index is moved forward (toward the end of the string).

This behaves like CharacterIterator::move32(delta, kCurrent).

Examples:

    // s has code points 'a' U+10000 'b' U+10ffff U+2029
    // (the UnicodeString owns the buffer and must outlive the reference s)
    icu::UnicodeString buffer=UNICODE_STRING("a\\U00010000b\\U0010ffff\\u2029", 31).unescape();
    UnicodeStringRef s(buffer);

    // initial index: position of U+10000
    int32_t index=1;

    // the following examples will all result in index==4, position of U+10ffff

    // skip 2 code points from some position in the string
    index=s.moveIndex32(index, 2); // skips U+10000 and 'b'

    // go to the 3rd code point from the start of s (0-based)
    index=s.moveIndex32(0, 3); // skips 'a', U+10000, and 'b'

    // go to the next-to-last code point of s
    index=s.moveIndex32(s.length(), -2); // backward-skips U+2029 and U+10ffff

Parameters:
index input code unit index
delta (signed) code point count to move the index forward or backward in the string
Returns:
the resulting code unit index
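
The index arithmetic in the examples can be checked with a standalone sketch that steps over surrogate pairs. It uses std::u16string instead of UnicodeStringRef, the name moveIndexByCodePoints is hypothetical, and stopping at the ends of the string when delta is too large is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static bool isLead(char16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
static bool isTrail(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Move a code unit index by `delta` code points (forward if positive,
// backward if negative) and return the resulting code unit index.
std::size_t moveIndexByCodePoints(const std::u16string& s, std::size_t index,
                                  long delta) {
    assert(index <= s.size());
    while (delta > 0 && index < s.size()) {
        // A lead surrogate followed by a trail surrogate counts as one step.
        const bool pair = isLead(s[index]) && index + 1 < s.size() &&
                          isTrail(s[index + 1]);
        index += pair ? 2 : 1;
        --delta;
    }
    while (delta < 0 && index > 0) {
        const bool pair = index >= 2 && isTrail(s[index - 1]) &&
                          isLead(s[index - 2]);
        index -= pair ? 2 : 1;
        ++delta;
    }
    return index;
}
```

For the string 'a' U+10000 'b' U+10ffff U+2029 (seven code units), all three moves from the examples end at code unit index 4.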

void uima::UnicodeStringRef::extract int32_t  start,
int32_t  length,
UChar *  dst,
int32_t  dstStart = 0
const [inline]
 

Copy the characters in the range [start, start + length) into the array dst, beginning at dstStart.

If the string aliases to dst itself as an external buffer, then extract() will not copy the contents.

Parameters:
start offset of first character which will be copied into the array
length the number of characters to extract
dst array in which to copy characters. The length of dst must be at least (dstStart + length).
dstStart the offset in dst where the first character will be extracted

void uima::UnicodeStringRef::extractBetween int32_t  start,
int32_t  limit,
UChar *  dst,
int32_t  dstStart = 0
const [inline]
 

Copy the characters in the range [start, limit) into the array dst, beginning at dstStart.

Parameters:
start offset of first character which will be copied into the array
limit offset immediately following the last character to be copied
dst array in which to copy characters. The length of dst must be at least (dstStart + (limit - start)).
dstStart the offset in dst where the first character will be extracted

int32_t uima::UnicodeStringRef::extract UChar *  dst,
int32_t  dstCapacity,
UErrorCode &  errorCode
const
 

Copy the contents of the string into dst.

This is a convenience function that checks if there is enough space in dst, extracts the entire string if possible, and NUL-terminates dst if possible.

If the string fits into dst but cannot be NUL-terminated (length()==dstCapacity) then the error code is set to U_STRING_NOT_TERMINATED_WARNING. If the string itself does not fit into dst (length()>dstCapacity) then the error code is set to U_BUFFER_OVERFLOW_ERROR.

If the string aliases to dst itself as an external buffer, then extract() will not copy the contents.

Parameters:
dst Destination string buffer.
dstCapacity Number of UChars available at dst.
errorCode ICU error code.
Returns:
length()
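
The three outcomes described above (fits and terminated, fits but not terminated, does not fit) can be sketched as a small decision function. This is a standalone illustration: ExtractStatus is a hypothetical stand-in for the ICU error code, extractChecked is a hypothetical name, and copying nothing on overflow is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the relevant UErrorCode values.
enum class ExtractStatus { Ok, NotTerminatedWarning, BufferOverflowError };

// Copy `s` into dst[0, dstCapacity) if it fits; always return s.size().
std::size_t extractChecked(const std::u16string& s, char16_t* dst,
                           std::size_t dstCapacity, ExtractStatus& status) {
    const std::size_t len = s.size();
    if (len > dstCapacity) {
        status = ExtractStatus::BufferOverflowError;   // string does not fit
        return len;
    }
    std::copy(s.begin(), s.end(), dst);
    if (len < dstCapacity) {
        dst[len] = 0;                                  // room for the NUL
        status = ExtractStatus::Ok;
    } else {
        status = ExtractStatus::NotTerminatedWarning;  // exact fit, no NUL
    }
    return len;
}
```

Because the return value is always the full length, a caller that sees the overflow status knows how large a buffer to allocate for a second call.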

void uima::UnicodeStringRef::extract int32_t  start,
int32_t  length,
UnicodeString &  dst
const [inline]
 

Copy the characters in the range [start, start + length) into the UnicodeString dst.

Parameters:
start offset of first character which will be copied
length the number of characters to extract
dst UnicodeString into which to copy characters.

void uima::UnicodeStringRef::extractBetween int32_t  start,
int32_t  limit,
UnicodeString &  dst
const [inline]
 

Copy the characters in the range [start, limit) into the UnicodeString dst.

Parameters:
start offset of first character which will be copied
limit offset immediately following the last character to be copied
dst UnicodeString into which to copy characters.

int32_t uima::UnicodeStringRef::extract int32_t  start,
int32_t  startLength,
char *  target,
const char *  codepage = 0
const [inline]
 

Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.

The output string is NUL-terminated.

Parameters:
start offset of first character which will be copied
startLength the number of characters to extract
target the target buffer for extraction
codepage the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. If target is NULL, then the number of bytes required for target is returned. NOTE: It is assumed that the target is big enough to fit all of the characters.
Returns:
the output string length, not including the terminating NUL

int32_t uima::UnicodeStringRef::extract int32_t  start,
int32_t  startLength,
char *  target,
uint32_t  targetLength,
const char *  codepage = 0
const
 

Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.

This function does not write any more than targetLength characters but returns the length of the entire output string so that one can allocate a larger buffer and call the function again if necessary. The output string is NUL-terminated if possible.

Parameters:
start offset of first character which will be copied
startLength the number of characters to extract
target the target buffer for extraction
targetLength the length of the target buffer
codepage the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. If target is NULL, then the number of bytes required for target is returned.
Returns:
the output string length, not including the terminating NUL
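
The "call, check the returned length, enlarge the buffer, call again" pattern mentioned above can be sketched without a real converter. In the standalone sketch below, extractAsciiSketch is a hypothetical stand-in for the codepage extract: it copies ASCII code units and substitutes '?' for everything else, which is an assumption made only to keep the example self-contained.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: writes at most targetLength bytes, NUL-terminates
// if there is room, and returns the full output length (excluding the NUL).
std::size_t extractAsciiSketch(const std::u16string& s, char* target,
                               std::size_t targetLength) {
    for (std::size_t i = 0; i < s.size() && i < targetLength; ++i) {
        target[i] = s[i] < 0x80 ? static_cast<char>(s[i]) : '?';
    }
    if (s.size() < targetLength) target[s.size()] = '\0';
    return s.size();
}

// Try a small buffer first; if the result was truncated or could not be
// terminated, grow the buffer to the reported size and extract again.
std::string extractWithRetry(const std::u16string& s) {
    std::vector<char> buf(4);
    std::size_t needed = extractAsciiSketch(s, buf.data(), buf.size());
    if (needed >= buf.size()) {
        buf.resize(needed + 1);  // one extra byte for the terminating NUL
        needed = extractAsciiSketch(s, buf.data(), buf.size());
    }
    return std::string(buf.data(), needed);
}
```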

int32_t uima::UnicodeStringRef::extract char *  target,
int32_t  targetCapacity,
UConverter *  cnv,
UErrorCode &  errorCode
const
 

Convert the UnicodeStringRef into a codepage string using an existing UConverter.

The output string is NUL-terminated if possible.

This function avoids the overhead of opening and closing a converter if multiple strings are extracted.

Parameters:
target destination string buffer, can be NULL if targetCapacity==0
targetCapacity the number of chars available at target
cnv the converter object to be used (ucnv_resetFromUnicode() will be called), or NULL for the default converter
errorCode normal ICU error code
Returns:
the length of the output string, not counting the terminating NUL; if the length is greater than targetCapacity, then the string will not fit and a buffer of the indicated length would need to be passed in

int32_t uima::UnicodeStringRef::extract int32_t  start,
int32_t  startLength,
std::string &  target,
const char *  codepage = 0
const
 

Copy the characters in the range [start, start + startLength) into a std::string object in a specified codepage.

The output string is NUL-terminated.

Parameters:
start offset of first character which will be copied
startLength the number of characters to extract
target the target string for extraction
codepage the desired codepage for the characters. 0 has the special meaning of the default codepage. If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h.
Returns:
the output string length, not including the terminating NUL

int32_t uima::UnicodeStringRef::extract std::string &  target,
const char *  codepage = 0
const [inline]
 

Copy all the characters in the string into an std::string object in a specified codepage.

Equivalent to extract(0, length(), target, codepage)

Parameters:
target the target string for extraction
codepage the desired codepage for the characters.
Returns:
the output string length, not including the terminating NUL

int32_t uima::UnicodeStringRef::extractUTF8 std::string &  target  )  const
 

Copy all the characters in the string into an std::string object in UTF-8.

Slightly more efficient than asUTF8() as it avoids one copy.

Parameters:
target the target string for extraction
Returns:
the output string length, not including the terminating NUL
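
What a UTF-16 to UTF-8 extraction has to do can be sketched in standalone form. The function below works on std::u16string, the name toUtf8 is hypothetical, and replacing unpaired surrogates with U+fffd is an assumption; the real method's handling of malformed input is not described on this page.

```cpp
#include <cstddef>
#include <string>

// Convert UTF-16 to UTF-8 into `target` and return the number of bytes.
std::size_t toUtf8(const std::u16string& s, std::string& target) {
    target.clear();
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // ASSUMPTION: unpaired surrogate -> U+FFFD
        }
        if (cp < 0x80) {
            target.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            target.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            target.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            target.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            target.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            target.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            target.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            target.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            target.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            target.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return target.size();
}
```

Writing into a caller-supplied std::string, as extractUTF8() does, lets the caller reuse one string object across many extractions.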

std::string uima::UnicodeStringRef::asUTF8 void   )  const [inline]
 

Convert to a UTF8 string.

Returns:
a std::string

void uima::UnicodeStringRef::release std::string &  target  )  [static]
 

Release the contents of a string container allocated by the extract methods.

Useful when caller and callee use different heaps, e.g. when debug code uses a release library. The function is static, so it can be called on the UnicodeStringRef class directly.

int32_t uima::UnicodeStringRef::length void   )  const [inline]
 

Return the length of the UnicodeStringRef object.

The length is the number of UChar code units in the text; a supplementary character stored as a surrogate pair counts as two.

Returns:
the length of the UnicodeStringRef object

int32_t uima::UnicodeStringRef::countChar32 int32_t  start = 0,
int32_t  length = 0x7fffffff
const
 

Count Unicode code points in the length UChar code units of the string.

A code point may occupy either one or two UChar code units. Counting code points involves reading all code units.

This function is basically the inverse of moveIndex32().

Parameters:
start the index of the first code unit to check
length the number of UChar code units to check
Returns:
the number of code points in the specified code units
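
Counting code points in a range of code units can be sketched as follows. The sketch is standalone (std::u16string, hypothetical name countCodePoints) and assumes that a length reaching past the end of the string is clamped to the end, which is what the large default value suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count the code points in s[start, start + length), clamped to the string.
std::size_t countCodePoints(const std::u16string& s, std::size_t start = 0,
                            std::size_t length = 0x7fffffff) {
    assert(start <= s.size());
    const std::size_t remaining = s.size() - start;
    const std::size_t limit = start + (length < remaining ? length : remaining);
    std::size_t count = 0;
    for (std::size_t i = start; i < limit; ++count) {
        const bool pair = s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < limit &&
                          s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        i += pair ? 2 : 1;  // a surrogate pair is one code point
    }
    return count;
}
```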

bool uima::UnicodeStringRef::isEmpty void   )  const [inline]
 

Determine if this string is empty.

Returns:
TRUE if this string contains 0 characters, FALSE otherwise.

UnicodeStringRef & uima::UnicodeStringRef::setTo const UnicodeStringRef srcText  )  [inline]
 

Set the text in the UnicodeStringRef object to the characters in srcText.

srcText is not modified.

Parameters:
srcText the source for the new characters
Returns:
a reference to this

UnicodeStringRef & uima::UnicodeStringRef::setTo const UnicodeString &  srcText  )  [inline]
 

Set the text in the UnicodeStringRef object to the characters in srcText.

srcText is not modified.

Parameters:
srcText the source for the new characters
Returns:
a reference to this

UnicodeStringRef & uima::UnicodeStringRef::setTo const UChar *  srcChars,
int32_t  srcLength
[inline]
 

Set the characters in the UnicodeStringRef object to the characters in srcChars.

srcChars is not modified.

Parameters:
srcChars the source for the new characters
srcLength the number of Unicode characters in srcChars.
Returns:
a reference to this

void uima::UnicodeStringRef::toSingleByteStream std::ostream &  outStream  )  const
 

Print a single byte version to outStream.

The encoding is UTF-8 if outStream is directed to disk. If outStream is cout or cerr, the encoding is a Console-CCSID that allows most characters to be readable in a shell/command window.


The documentation for this class was generated from the following file:
Generated on Mon Oct 1 16:04:14 2012 for UIMACPP API by  doxygen 1.3.9.1