carb::extras::Utf8Iterator
Defined in carb/extras/Utf8Parser.h
-
class Utf8Iterator
A simple iterator class for walking a UTF-8 string.
This is built on top of the Utf8Parser static class and uses its functionality. Strings can only be walked forward; random access to codepoints in the string is not possible. If needed, the pointer to the start of the next codepoint or the codepoint index can be retrieved.
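To make the forward-only model concrete, the following is a minimal, self-contained sketch of a walker over a null terminated UTF-8 string. It is not the Carbonite implementation: `MiniWalker`, `sequenceLength`, `valid`, `current`, and `next` are hypothetical names that only loosely mirror `operator bool`, `operator*`, `operator++`, and `getIndex()`. The sketch assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of Carbonite: byte length of the sequence
// that starts with lead byte `b` (1 for ASCII or an unexpected byte).
inline size_t sequenceLength(unsigned char b)
{
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1;
}

// Hypothetical forward-only walker over a null terminated UTF-8 string.
struct MiniWalker
{
    const char* cur;  // start of the current codepoint
    size_t index = 0; // number of codepoints walked so far

    explicit MiniWalker(const char* s) : cur(s) { assert(s != nullptr); }

    // True while there is at least one more codepoint to walk.
    bool valid() const { return *cur != '\0'; }

    // Decodes the current codepoint; assumes well-formed input.
    uint32_t current() const
    {
        const unsigned char b0 = static_cast<unsigned char>(*cur);
        const size_t n = sequenceLength(b0);
        uint32_t cp = (n == 1) ? b0 : (b0 & (0xFFu >> (n + 1)));
        for (size_t i = 1; i < n; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(cur[i]) & 0x3Fu);
        return cp;
    }

    // Steps to the next codepoint. There is deliberately no way to step
    // backward or to jump straight to a given codepoint index.
    void next()
    {
        if (valid())
        {
            cur += sequenceLength(static_cast<unsigned char>(*cur));
            ++index;
        }
    }
};
```

Because each codepoint occupies one to four bytes, the only way to reach codepoint N is to walk the N codepoints before it, which is why random access is not offered.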
Unnamed Group
-
using CodeByte = Utf8Parser::CodeByte
Reference the types used in Utf8Parser for more convenient use locally.
The base type for a point in a UTF-8 string.
Ideally these values should point to the start of an encoded codepoint in a string.
-
using CodePoint = Utf8Parser::CodePoint
The base type for a single Unicode codepoint value.
This represents a decoded UTF-8 codepoint.
-
using Flags = Utf8Parser::Flags
Base type for flags to various encoding and decoding functions.
Public Functions
-
inline Utf8Iterator()
-
inline Utf8Iterator(const CodeByte *string, size_t lengthInBytes = kNullTerminated, Flags flags = 0)
Constructor: initializes a new iterator for a given string.
- Parameters
string – [in] The string to walk. This should be a UTF-8 encoded string. This can be nullptr, but the iterator will not be valid if so.
lengthInBytes – [in] The maximum number of bytes to walk in the string. This may be kNullTerminated if the string is null terminated. If the string is unterminated or only a portion of it needs to be iterated over, this may be the size of the buffer in bytes.
flags – [in] Flags to control the behavior of the UTF-8 parser. This may be zero or more of the Utf8Parser::fDecode* flags.
- Returns
No return value.
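The difference between a null terminated walk and a length limited walk can be sketched without the library. In the block below, `kDemoNullTerminated` and `countCodepoints` are hypothetical stand-ins: the sentinel value is an assumption chosen for the demo and is not claimed to match `Utf8Parser::kNullTerminated`. The sketch assumes well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sentinel standing in for kNullTerminated. The real value is
// defined by Utf8Parser and is not assumed here.
constexpr size_t kDemoNullTerminated = static_cast<size_t>(-1);

// Counts the codepoints that start within the first `lengthInBytes` bytes
// of `s`, stopping early at a null terminator.
inline size_t countCodepoints(const char* s, size_t lengthInBytes = kDemoNullTerminated)
{
    if (s == nullptr)
        return 0; // loosely mirrors "the iterator will not be valid"
    size_t count = 0;
    for (size_t i = 0; i < lengthInBytes && s[i] != '\0'; ++i)
    {
        // Continuation bytes look like 10xxxxxx; any other byte starts a codepoint.
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Usage: the same buffer walked in full and with a byte limit.
inline void demoBoundedWalk()
{
    const char text[] = "na\xC3\xAFve"; // 6 bytes, 5 codepoints
    assert(countCodepoints(text) == 5);
    assert(countCodepoints(text, 2) == 2); // only the first two bytes
    assert(countCodepoints(nullptr) == 0);
}
```

Note that the limit is expressed in bytes, not codepoints, so a limit that falls inside a multi-byte sequence needs care in real code.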
-
inline Utf8Iterator(const Utf8Iterator &it)
Copy constructor: copies another iterator into this one.
- Parameters
it – [in] The iterator to be copied. Note that if it is invalid, this iterator will also become invalid.
- Returns
No return value.
-
inline operator bool() const
Checks if this iterator is still valid.
- Returns
true if this iterator still has at least one more codepoint to walk.
- Returns
false if there is no more string data to walk and decode.
-
inline bool operator!() const
Checks if this iterator is invalid.
- Returns
true if there is no more string data to walk and decode.
- Returns
false if this iterator still has at least one more codepoint to walk.
-
inline CodePoint operator*() const
Retrieves the codepoint at this iterator’s current location.
- Returns
The codepoint at the current location in the string. Calling this multiple times does not cause the decoding work to be done multiple times. The decoded codepoint is cached once decoded.
- Returns
0 if there are no more codepoints to walk in the string.
-
inline const CodeByte *operator&() const
Retrieves the address of the start of the current codepoint.
- Returns
The address of the start of the current codepoint for this iterator. This can be used as a way of copying, editing, or reworking the string during iteration. It is the caller’s responsibility to ensure the string is still properly encoded after any change.
- Returns
nullptr if there is no more string data to walk.
-
inline Utf8Iterator &operator++()
Pre increment operator: walk to the next codepoint in the string.
- Returns
A reference to this iterator. Note that if the end of the string is reached, the new state of this iterator will first point to the null terminator in the string (for null terminated strings), then after another increment will return the address nullptr from the ‘&’ operator. For length limited strings, reaching the end will immediately return nullptr from the ‘&’ operator.
-
inline Utf8Iterator operator++(int32_t)
Post increment operator: walk to the next codepoint in the string.
- Returns
A new iterator object representing the state of this object before the increment operation.
-
template<typename T>
inline Utf8Iterator &operator+=(T count)
Increment operator: skip over zero or more codepoints in the string.
- Parameters
count – [in] The number of codepoints to skip over. This may be zero or larger. Negative values will be ignored and the iterator will not advance.
- Returns
A reference to this iterator.
-
template<typename T>
inline Utf8Iterator operator+(T count) const
Addition operator: create a new iterator that skips zero or more codepoints.
- Parameters
count – [in] The number of codepoints to skip over. This may be zero or larger. Negative values will be ignored and the iterator will not advance.
- Returns
A new iterator that has skipped over the next count codepoints in the string starting from the location of this iterator.
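The skipping behavior described for `+=` and `+` (advance by a number of codepoints, ignore negative counts) can be illustrated with a small standalone function. `skipCodepoints` is a hypothetical helper written for this page, not a Carbonite function. It assumes a null terminated, well-formed string and stops at the terminator.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns a pointer `count` codepoints past `p`,
// stopping at the null terminator. A negative `count` leaves `p` unchanged,
// which matches the documented "negative values will be ignored" rule.
template <typename T>
inline const char* skipCodepoints(const char* p, T count)
{
    assert(p != nullptr);
    for (T i = 0; i < count && *p != '\0'; ++i)
    {
        ++p; // step over the lead byte
        while ((static_cast<unsigned char>(*p) & 0xC0) == 0x80)
            ++p; // step over continuation bytes (10xxxxxx)
    }
    return p;
}
```

Skipping is linear in `count` because every intermediate codepoint has to be stepped over; there is no constant-time jump in a variable-width encoding.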
-
inline bool operator==(const Utf8Iterator &it) const
Comparison operators.
Remark
This object is treated as the left side of the comparison. Only the offset into the string contributes to this result. It is the caller’s responsibility to ensure both iterators refer to the same string; otherwise the results are undefined.
- Parameters
it – [in] The iterator to compare this one to.
- Returns
true if the string position represented by it satisfies the requested comparison versus this object.
- Returns
false if the string position represented by it does not satisfy the requested comparison versus this object.
-
inline bool operator!=(const Utf8Iterator &it) const
Comparison operators.
Remark
This object is treated as the left side of the comparison. Only the offset into the string contributes to this result. It is the caller’s responsibility to ensure both iterators refer to the same string; otherwise the results are undefined.
- Parameters
it – [in] The iterator to compare this one to.
- Returns
true if the string position represented by it satisfies the requested comparison versus this object.
- Returns
false if the string position represented by it does not satisfy the requested comparison versus this object.
-
inline bool operator<(const Utf8Iterator &it) const
Comparison operators.
Remark
This object is treated as the left side of the comparison. Only the offset into the string contributes to this result. It is the caller’s responsibility to ensure both iterators refer to the same string; otherwise the results are undefined.
- Parameters
it – [in] The iterator to compare this one to.
- Returns
true if the string position represented by it satisfies the requested comparison versus this object.
- Returns
false if the string position represented by it does not satisfy the requested comparison versus this object.
-
inline bool operator<=(const Utf8Iterator &it) const
Comparison operators.
Remark
This object is treated as the left side of the comparison. Only the offset into the string contributes to this result. It is the caller’s responsibility to ensure both iterators refer to the same string; otherwise the results are undefined.
- Parameters
it – [in] The iterator to compare this one to.
- Returns
true if the string position represented by it satisfies the requested comparison versus this object.
- Returns
false if the string position represented by it does not satisfy the requested comparison versus this object.
-
inline bool operator>(const Utf8Iterator &it) const
Comparison operators.
Remark
This object is treated as the left side of the comparison. Only the offset into the string contributes to this result. It is the caller’s responsibility to ensure both iterators refer to the same string; otherwise the results are undefined.
- Parameters
it – [in] The iterator to compare this one to.
- Returns
true if the string position represented by it satisfies the requested comparison versus this object.
- Returns
false if the string position represented by it does not satisfy the requested comparison versus this object.
-
inline bool operator>=(const Utf8Iterator &it) const
Comparison operators.
Remark
This object is treated as the left side of the comparison. Only the offset into the string contributes to this result. It is the caller’s responsibility to ensure both iterators refer to the same string; otherwise the results are undefined.
- Parameters
it – [in] The iterator to compare this one to.
- Returns
true if the string position represented by it satisfies the requested comparison versus this object.
- Returns
false if the string position represented by it does not satisfy the requested comparison versus this object.
-
inline Utf8Iterator &operator=(const Utf8Iterator &it)
Copy assignment operator: copies another iterator into this one.
- Parameters
it – [in] The iterator to copy.
- Returns
A reference to this object.
-
inline Utf8Iterator &operator=(const CodeByte *str)
String assignment operator: resets this iterator to the start of a new string.
- Parameters
str – [in] The new string to start walking. This must be a null terminated string. If this is nullptr, the iterator will become invalid. Any previous flags and length limits on this iterator will be cleared out.
- Returns
A reference to this object.
-
inline size_t getIndex() const
Retrieves the current codepoint index of the iterator.
- Returns
The number of codepoints that have been walked so far by this iterator in the current string. This will always start at 0 and will only increase when a new codepoint is successfully decoded.
-
inline size_t getCodepointSize() const
Retrieves the size of the current codepoint in bytes.
- Returns
The size of the current codepoint (i.e., the one returned with the ‘*’ operator) in bytes. This can be used along with the results of the ‘&’ operator to copy this encoded codepoint into another buffer or modify the string in place.
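As an illustration of pairing a codepoint's start address with its encoded size, the sketch below copies selected codepoints into another buffer one encoded sequence at a time. `encodedSize` and `keepNonAscii` are hypothetical helpers that stand in for `getCodepointSize()` and the ‘&’ operator; they are not Carbonite functions, and the input is assumed to be null terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for getCodepointSize(): the lead byte plus every
// continuation byte (10xxxxxx) that follows it. `p` must not point at the
// null terminator.
inline size_t encodedSize(const char* p)
{
    size_t n = 1;
    while ((static_cast<unsigned char>(p[n]) & 0xC0) == 0x80)
        ++n;
    return n;
}

// Copies only the multi-byte (non-ASCII) codepoints of `s` into a new string.
inline std::string keepNonAscii(const char* s)
{
    assert(s != nullptr);
    std::string out;
    while (*s != '\0')
    {
        const size_t size = encodedSize(s); // size of the current codepoint
        if (size > 1)
            out.append(s, size); // `s` plays the role of the '&' result
        s += size;
    }
    return out;
}
```

Copying by address and size keeps each sequence intact, so the output stays validly encoded as long as the input was.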
Public Static Attributes
-
static constexpr size_t kNullTerminated = Utf8Parser::kNullTerminated
The string buffer is effectively null terminated.
This allows the various decoding functions to bypass some range checking with the assumption that there is a null terminating character at some point in the buffer.