Utility class for working with UTF-8 encoded strings.
#include "donner/base/Utf8.h"
- static bool IsSurrogateCodepoint(char32_t ch)
  Returns true if the codepoint is a surrogate, per https://infra.spec.whatwg.org/#surrogate.
- static bool IsValidCodepoint(char32_t ch)
  Returns true if the codepoint is a valid Unicode codepoint.
- static int SequenceLength(char leadingCh)
  Determines the length in bytes of a UTF-8 encoded character based on its leading byte.
- static std::tuple<char32_t, int> NextCodepointLenient(std::string_view str)
  Decodes the next UTF-8 codepoint from the input string without validating it.
- static std::tuple<char32_t, int> NextCodepoint(std::string_view str)
  Decodes the next UTF-8 codepoint from the input string, strictly validating continuation bytes and sequence lengths.
- template<std::output_iterator<char> OutputIterator>
  static OutputIterator Append(char32_t ch, OutputIterator it)
  Appends the UTF-8 encoding of the given Unicode codepoint to the output iterator.
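For reference, the two codepoint predicates can be sketched as standalone functions. The surrogate range U+D800 to U+DFFF follows the WHATWG definition linked above. Treating "valid" as "a Unicode scalar value" (at most U+10FFFF and not a surrogate) is an assumption about IsValidCodepoint, and the function names below are hypothetical, not part of donner.

```cpp
// Hypothetical stand-in for Utf8::IsSurrogateCodepoint.
// Surrogates occupy U+D800..U+DFFF (https://infra.spec.whatwg.org/#surrogate).
constexpr bool SketchIsSurrogate(char32_t ch) {
  return ch >= 0xD800 && ch <= 0xDFFF;
}

// Hypothetical stand-in for Utf8::IsValidCodepoint.
// Assumption: "valid" means a Unicode scalar value, i.e. in the range
// U+0000..U+10FFFF and not a surrogate.
constexpr bool SketchIsValidCodepoint(char32_t ch) {
  return ch <= 0x10FFFF && !SketchIsSurrogate(ch);
}
```

Surrogates are excluded because they are reserved for UTF-16 pairs and have no well-formed UTF-8 encoding.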
◆ Append()
template<std::output_iterator<char> OutputIterator>
OutputIterator donner::Utf8::Append(char32_t ch, OutputIterator it)
inline, static
Appends the UTF-8 encoding of the given Unicode codepoint to the output iterator.
- Template Parameters
  - OutputIterator: An output iterator that accepts char elements.
- Parameters
  - ch: The Unicode codepoint to encode and append.
  - it: The output iterator to which the encoded bytes are appended.
- Returns
  - An iterator pointing to the element past the last inserted element.
◆ NextCodepoint()
std::tuple<char32_t, int> donner::Utf8::NextCodepoint(std::string_view str)
inline, static
Decodes the next UTF-8 codepoint from the input string, while strictly validating continuation bytes and sequence lengths.
If an invalid sequence is encountered, the function returns the Unicode replacement character (U+FFFD �) and consumes the invalid sequence.
- Parameters
  - str: The input string_view from which to read the codepoint.
- Returns
  - A tuple containing the decoded Unicode codepoint and the number of bytes consumed.
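A strict decoder of the kind described above might look like the following sketch. DecodeNextStrictSketch and CountCodepointsSketch are hypothetical names, and the error-handling choices (one byte consumed per error, zero bytes for empty input) are assumptions; this page does not say exactly how many bytes NextCodepoint consumes for an invalid sequence.

```cpp
#include <cstddef>
#include <string_view>
#include <tuple>

// Hypothetical strict decoder (not donner's source). Assumptions: empty input
// returns U+FFFD and consumes 0 bytes; any other error returns U+FFFD and
// consumes exactly 1 byte so the caller can resynchronize.
constexpr std::tuple<char32_t, int> DecodeNextStrictSketch(std::string_view str) {
  constexpr char32_t kReplacement = 0xFFFD;
  if (str.empty()) {
    return {kReplacement, 0};
  }
  const auto lead = static_cast<unsigned char>(str[0]);
  if (lead < 0x80) {
    return {lead, 1};  // ASCII
  }
  int length = 0;
  char32_t cp = 0;
  char32_t minValue = 0;  // smallest codepoint that needs `length` bytes
  if ((lead & 0xE0) == 0xC0) {
    length = 2; cp = lead & 0x1F; minValue = 0x80;
  } else if ((lead & 0xF0) == 0xE0) {
    length = 3; cp = lead & 0x0F; minValue = 0x800;
  } else if ((lead & 0xF8) == 0xF0) {
    length = 4; cp = lead & 0x07; minValue = 0x10000;
  } else {
    return {kReplacement, 1};  // stray continuation byte or 0xF8..0xFF
  }
  if (str.size() < static_cast<std::size_t>(length)) {
    return {kReplacement, 1};  // truncated sequence
  }
  for (std::size_t i = 1; i < static_cast<std::size_t>(length); ++i) {
    const auto byte = static_cast<unsigned char>(str[i]);
    if ((byte & 0xC0) != 0x80) {
      return {kReplacement, 1};  // expected a 10xxxxxx continuation byte
    }
    cp = (cp << 6) | (byte & 0x3F);
  }
  // Reject overlong encodings, surrogates, and values above U+10FFFF.
  if (cp < minValue || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
    return {kReplacement, 1};
  }
  return {cp, length};
}

// Typical calling pattern: advance by the returned byte count until the
// input is exhausted. Every call on non-empty input consumes at least 1 byte.
constexpr int CountCodepointsSketch(std::string_view str) {
  int count = 0;
  while (!str.empty()) {
    const int consumed = std::get<1>(DecodeNextStrictSketch(str));
    str.remove_prefix(static_cast<std::size_t>(consumed));
    ++count;
  }
  return count;
}
```

Because the byte count is always positive for non-empty input, a loop like CountCodepointsSketch is guaranteed to terminate even on malformed data.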
◆ NextCodepointLenient()
std::tuple<char32_t, int> donner::Utf8::NextCodepointLenient(std::string_view str)
inline, static
Decodes the next UTF-8 codepoint from the input string without validating it.
If the string is empty or contains too few bytes to complete the sequence, returns a replacement codepoint.
- Parameters
  - str: The input string_view from which to read the codepoint.
- Returns
  - A tuple containing the decoded Unicode codepoint and the number of bytes consumed.
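The difference from the strict variant is easiest to see in code. In this hypothetical sketch the leading byte alone decides the sequence length, and each following byte contributes its low six bits without being checked. As a result, malformed input such as "\xE2\x28\xA1" still decodes to a codepoint (U+2A21), and the overlong "\xC0\x80" decodes to U+0000. The name DecodeNextLenientSketch, the use of U+FFFD as the replacement value, and the byte counts for empty or truncated input are all assumptions.

```cpp
#include <cstddef>
#include <string_view>
#include <tuple>

// Hypothetical lenient decoder (not donner's source). It trusts the leading
// byte's length and does not check continuation bytes, overlong forms, or
// surrogates. Assumptions: empty input consumes 0 bytes; an invalid leading
// byte or a truncated sequence consumes 1 byte; both return U+FFFD.
constexpr std::tuple<char32_t, int> DecodeNextLenientSketch(std::string_view str) {
  constexpr char32_t kReplacement = 0xFFFD;
  if (str.empty()) {
    return {kReplacement, 0};
  }
  const auto lead = static_cast<unsigned char>(str[0]);
  if (lead < 0x80) {
    return {lead, 1};  // ASCII
  }
  int length = 0;
  char32_t cp = 0;
  if ((lead & 0xE0) == 0xC0) {
    length = 2; cp = lead & 0x1F;
  } else if ((lead & 0xF0) == 0xE0) {
    length = 3; cp = lead & 0x0F;
  } else if ((lead & 0xF8) == 0xF0) {
    length = 4; cp = lead & 0x07;
  } else {
    return {kReplacement, 1};  // byte cannot start a sequence
  }
  if (str.size() < static_cast<std::size_t>(length)) {
    return {kReplacement, 1};  // not enough bytes left
  }
  for (std::size_t i = 1; i < static_cast<std::size_t>(length); ++i) {
    // No check that the byte is 10xxxxxx: just take its low 6 bits.
    cp = (cp << 6) | (static_cast<unsigned char>(str[i]) & 0x3F);
  }
  return {cp, length};
}
```

Skipping the checks makes the lenient path cheaper, which is only safe when the input is already known to be well-formed UTF-8.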
◆ SequenceLength()
int donner::Utf8::SequenceLength(char leadingCh)
inline, static
Determines the length in bytes of a UTF-8 encoded character based on its leading byte.
- Parameters
  - leadingCh: The leading byte of the UTF-8 character.
- Returns
  - The number of bytes in the UTF-8 character, or 0 if invalid.
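A minimal sketch of the leading-byte classification, assuming SequenceLength inspects only the high bits of the byte (the RFC 3629 patterns). SequenceLengthSketch is a hypothetical name. Whether the real function also returns 0 for 0xC0, 0xC1, and 0xF5 to 0xF7, which can never start a valid sequence, is not stated here, so this sketch leaves those checks to the decoder.

```cpp
// Hypothetical stand-in for Utf8::SequenceLength. Classifies the leading byte
// by its high bits and returns 0 for bytes that cannot start a sequence.
constexpr int SequenceLengthSketch(char leadingCh) {
  // Cast first: char may be signed, which would break the bit masks.
  const auto b = static_cast<unsigned char>(leadingCh);
  if ((b & 0x80) == 0x00) return 1;  // 0xxxxxxx
  if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                          // 10xxxxxx (continuation) or 11111xxx
}
```

A caller can compare this length against the remaining input size before decoding to detect truncated sequences early.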