Utility class for working with UTF-8 encoded strings.
#include "donner/base/Utf8.h"
|
|
static bool | IsSurrogateCodepoint (char32_t ch) |
| | Returns true if the codepoint is a surrogate, per https://infra.spec.whatwg.org/#surrogate.
|
|
static bool | IsValidCodepoint (char32_t ch) |
| | Returns true if the codepoint is a valid Unicode codepoint that can be encoded as UTF-8.
|
| static int | SequenceLength (char leadingCh) |
| | Determines the length in bytes of a UTF-8 encoded character based on its leading byte.
|
| static std::tuple< char32_t, int > | NextCodepointLenient (std::string_view str) |
| | Decodes the next UTF-8 codepoint from the input string, without validating the decoded result.
|
| static std::tuple< char32_t, int > | NextCodepoint (std::string_view str) |
| | Decodes the next UTF-8 codepoint from the input string, while strictly validating continuation bytes and sequence lengths.
|
| template<std::output_iterator< char > OutputIterator> |
| static OutputIterator | Append (char32_t ch, OutputIterator it) |
| | Appends the UTF-8 encoding of the given Unicode codepoint to the output iterator.
|
◆ Append()
template<std::output_iterator< char > OutputIterator>
| OutputIterator donner::Utf8::Append |
( |
char32_t | ch, |
|
|
OutputIterator | it ) |
|
inline static |
Appends the UTF-8 encoding of the given Unicode codepoint to the output iterator.
- Template Parameters
-
| OutputIterator | An output iterator that accepts char elements. |
- Parameters
-
| ch | The Unicode codepoint to encode and append. |
| it | The output iterator to which the encoded bytes are appended. |
- Returns
- An iterator pointing to the element past the last inserted element.
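To make the encoding step concrete, the following is a minimal sketch of how a codepoint can be written to an output iterator as one to four UTF-8 bytes. `appendUtf8Sketch` and `encodeEuroExample` are hypothetical names used only for illustration; this is not Donner's implementation, and it assumes the caller passes a valid codepoint.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical stand-in, not Donner's implementation. The documented signature
// constrains OutputIterator with std::output_iterator<char>; a plain typename
// is used here so the sketch also builds before C++20.
// Assumes ch is a valid codepoint (at most 0x10FFFF and not a surrogate).
template <typename OutputIterator>
OutputIterator appendUtf8Sketch(char32_t ch, OutputIterator it) {
  if (ch < 0x80) {
    // One byte: 0xxxxxxx
    *it++ = static_cast<char>(ch);
  } else if (ch < 0x800) {
    // Two bytes: 110xxxxx 10xxxxxx
    *it++ = static_cast<char>(0xC0 | (ch >> 6));
    *it++ = static_cast<char>(0x80 | (ch & 0x3F));
  } else if (ch < 0x10000) {
    // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    *it++ = static_cast<char>(0xE0 | (ch >> 12));
    *it++ = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    *it++ = static_cast<char>(0x80 | (ch & 0x3F));
  } else {
    // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    *it++ = static_cast<char>(0xF0 | (ch >> 18));
    *it++ = static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
    *it++ = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    *it++ = static_cast<char>(0x80 | (ch & 0x3F));
  }
  return it;  // Points past the last byte written.
}

// Usage: encode U+20AC (euro sign) into a std::string via back_inserter.
inline std::string encodeEuroExample() {
  std::string out;
  appendUtf8Sketch(char32_t{0x20AC}, std::back_inserter(out));
  assert(out.size() == 3);
  return out;
}
```

Returning the advanced iterator lets calls be chained when several codepoints are appended to the same destination.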
◆ NextCodepoint()
| std::tuple< char32_t, int > donner::Utf8::NextCodepoint |
( |
std::string_view | str | ) |
|
|
inline static |
Decodes the next UTF-8 codepoint from the input string, while strictly validating continuation bytes and sequence lengths.
If an invalid codepoint is encountered, the function returns the Unicode replacement character (U+FFFD) and consumes the invalid codepoint.
- Parameters
-
| str | The input string_view from which to read the codepoint. |
- Returns
- A tuple containing the decoded Unicode codepoint and the number of bytes consumed.
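The returned pair is typically consumed in a loop that advances the view by the reported byte count. The sketch below illustrates that pattern with a self-contained strict decoder. `nextCodepointSketch` and `decodeAll` are hypothetical names, and the exact number of bytes consumed on error is an assumption of this sketch, not documented Donner behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <tuple>
#include <vector>

// Hypothetical strict decoder, not Donner's implementation.
// Assumption: on error it returns U+FFFD and consumes the bytes examined so far.
inline std::tuple<char32_t, int> nextCodepointSketch(std::string_view str) {
  constexpr char32_t kReplacement = 0xFFFD;
  if (str.empty()) return {kReplacement, 0};

  const auto lead = static_cast<unsigned char>(str[0]);
  if (lead < 0x80) return {static_cast<char32_t>(lead), 1};

  int len = 0;
  char32_t cp = 0;
  if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
  else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
  else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
  else return {kReplacement, 1};  // Stray continuation byte or invalid lead.

  // Truncated sequence: consume what is left.
  if (str.size() < static_cast<std::size_t>(len)) {
    return {kReplacement, static_cast<int>(str.size())};
  }

  for (int i = 1; i < len; ++i) {
    const auto b = static_cast<unsigned char>(str[i]);
    if ((b & 0xC0) != 0x80) return {kReplacement, i};  // Not 10xxxxxx.
    cp = (cp << 6) | (b & 0x3F);
  }

  // Reject overlong encodings, surrogates, and values above U+10FFFF.
  static constexpr char32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};
  if (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
    return {kReplacement, len};
  }
  return {cp, len};
}

// Usage: decode a whole string by repeatedly advancing past consumed bytes.
inline std::vector<char32_t> decodeAll(std::string_view str) {
  std::vector<char32_t> out;
  while (!str.empty()) {
    const auto [cp, consumed] = nextCodepointSketch(str);
    assert(consumed > 0);  // The sketch always consumes from non-empty input.
    out.push_back(cp);
    str.remove_prefix(static_cast<std::size_t>(consumed));
  }
  return out;
}
```

Because invalid input still consumes at least one byte in this sketch, the loop always terminates, and malformed data surfaces as U+FFFD entries in the output rather than as an error.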
◆ NextCodepointLenient()
| std::tuple< char32_t, int > donner::Utf8::NextCodepointLenient |
( |
std::string_view | str | ) |
|
|
inline static |
Decodes the next UTF-8 codepoint from the input string, without validating the decoded result.
If the string is empty or contains insufficient bytes, returns a replacement codepoint.
- Parameters
-
| str | The input string_view from which to read the codepoint. |
- Returns
- A tuple containing the decoded Unicode codepoint and the number of bytes consumed.
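As an illustration of what skipping validation can mean in practice, the sketch below decodes the payload bits of a sequence without checking continuation bytes, overlong forms, or surrogates. `nextCodepointLenientSketch` is a hypothetical name; which checks Donner actually skips, and how many bytes it reports on short input, are assumptions made here.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <tuple>

// Hypothetical lenient decoder, not Donner's implementation.
// Assumptions: empty input consumes 0 bytes, truncated input consumes the
// remaining bytes, and both yield U+FFFD.
inline std::tuple<char32_t, int> nextCodepointLenientSketch(std::string_view str) {
  constexpr char32_t kReplacement = 0xFFFD;
  if (str.empty()) return {kReplacement, 0};

  const auto lead = static_cast<unsigned char>(str[0]);
  if (lead < 0x80) return {static_cast<char32_t>(lead), 1};

  int len = 0;
  char32_t cp = 0;
  if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
  else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
  else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
  else return {kReplacement, 1};

  if (str.size() < static_cast<std::size_t>(len)) {
    return {kReplacement, static_cast<int>(str.size())};
  }

  // No validation: take the low six bits of each trailing byte as-is.
  for (int i = 1; i < len; ++i) {
    cp = (cp << 6) | (static_cast<unsigned char>(str[i]) & 0x3F);
  }
  return {cp, len};
}

// Usage: an overlong two-byte encoding of U+0000 is accepted by this sketch,
// whereas a strict decoder would reject it.
inline void lenientExample() {
  const auto [cp, consumed] = nextCodepointLenientSketch("\xC0\x80");
  assert(cp == char32_t{0});
  assert(consumed == 2);
}
```

A lenient decoder of this kind is best reserved for input that is already known to be well-formed, since it trades safety checks for speed.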
◆ SequenceLength()
| int donner::Utf8::SequenceLength |
( |
char | leadingCh | ) |
|
|
inline static |
Determines the length in bytes of a UTF-8 encoded character based on its leading byte.
- Parameters
-
| leadingCh | The leading byte of the UTF-8 character. |
- Returns
- The number of bytes in the UTF-8 character, or 0 if invalid.
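The mapping from leading byte to sequence length follows from the high bits of that byte. A minimal sketch of the classification is shown below. `sequenceLengthSketch` is a hypothetical name, and how Donner treats edge cases such as the lead bytes 0xC0, 0xC1, or 0xF5 and above is not stated in the source, so this sketch only checks the bit pattern.

```cpp
#include <cassert>

// Hypothetical stand-in, not Donner's implementation.
// Classifies a leading byte by its high bits; returns 0 for bytes that cannot
// start a sequence (continuation bytes 10xxxxxx and 11111xxx).
inline int sequenceLengthSketch(char leadingCh) {
  const auto b = static_cast<unsigned char>(leadingCh);
  if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;
}

// Usage: 'a' is one byte, 0xE2 starts a three-byte sequence, and 0x80 is a
// continuation byte, so it is not a valid leading byte.
inline void sequenceLengthExample() {
  assert(sequenceLengthSketch('a') == 1);
  assert(sequenceLengthSketch('\xE2') == 3);
  assert(sequenceLengthSketch('\x80') == 0);
}
```

Casting to `unsigned char` first avoids sign-extension surprises on platforms where `char` is signed.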
The documentation for this class was generated from the following file: