Donner
C++20 SVG rendering library
donner::Utf8 Class Reference

Utility class for working with UTF-8 encoded strings.

#include "donner/base/Utf8.h"

Static Public Member Functions

static bool IsSurrogateCodepoint (char32_t ch)
 Returns true if the codepoint is a surrogate, per https://infra.spec.whatwg.org/#surrogate.
 
static bool IsValidCodepoint (char32_t ch)
 Returns true if the codepoint is a valid Unicode codepoint that can be encoded as UTF-8.
 
static int SequenceLength (char leadingCh)
 Determines the length in bytes of a UTF-8 encoded character based on its leading byte.
 
static std::tuple< char32_t, int > NextCodepointLenient (std::string_view str)
 Decodes the next UTF-8 codepoint from the input string, without validating it.
 
static std::tuple< char32_t, int > NextCodepoint (std::string_view str)
 Decodes the next UTF-8 codepoint from the input string, while strictly validating continuation bytes and sequence lengths.
 
template<std::output_iterator< char > OutputIterator>
static OutputIterator Append (char32_t ch, OutputIterator it)
 Appends the UTF-8 encoding of the given Unicode codepoint to the output iterator.
 

Static Public Attributes

static constexpr char32_t kUnicodeReplacementCharacter = 0xFFFD
 U+FFFD REPLACEMENT CHARACTER (�)
 
static constexpr char32_t kUnicodeMaximumAllowedCodepoint = 0x10FFFF
 The greatest codepoint defined by Unicode, per https://www.w3.org/TR/css-syntax-3/#maximum-allowed-code-point.
 

Detailed Description

Utility class for working with UTF-8 encoded strings.

Member Function Documentation

◆ Append()

template<std::output_iterator< char > OutputIterator>
static OutputIterator donner::Utf8::Append ( char32_t ch, OutputIterator it )
inline static

Appends the UTF-8 encoding of the given Unicode codepoint to the output iterator.

Template Parameters
OutputIterator    An output iterator that accepts char elements.
Parameters
ch    The Unicode codepoint to encode and append.
it    The output iterator to which the encoded bytes are appended.
Returns
An iterator pointing to the element past the last inserted element.
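
To illustrate the output-iterator pattern described above, here is a minimal standalone sketch of UTF-8 encoding. It is not Donner's implementation: the name appendUtf8Sketch is hypothetical, the template uses a plain typename instead of the std::output_iterator constraint so that it also compiles before C++20, and the handling of surrogates and out-of-range values (encoding U+FFFD instead) is an assumption.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical standalone encoder for illustration; not Donner's implementation.
// Assumption: surrogates and values above U+10FFFF are encoded as U+FFFD.
template <typename OutputIterator>
OutputIterator appendUtf8Sketch(char32_t ch, OutputIterator it) {
  if (ch > 0x10FFFF || (ch >= 0xD800 && ch <= 0xDFFF)) {
    ch = 0xFFFD;
  }

  if (ch < 0x80) {
    // One byte: 0xxxxxxx
    *it++ = static_cast<char>(ch);
  } else if (ch < 0x800) {
    // Two bytes: 110xxxxx 10xxxxxx
    *it++ = static_cast<char>(0xC0 | (ch >> 6));
    *it++ = static_cast<char>(0x80 | (ch & 0x3F));
  } else if (ch < 0x10000) {
    // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    *it++ = static_cast<char>(0xE0 | (ch >> 12));
    *it++ = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    *it++ = static_cast<char>(0x80 | (ch & 0x3F));
  } else {
    // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    *it++ = static_cast<char>(0xF0 | (ch >> 18));
    *it++ = static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
    *it++ = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    *it++ = static_cast<char>(0x80 | (ch & 0x3F));
  }
  return it;
}

// Usage: encode two codepoints into a std::string through std::back_inserter.
inline std::string encodeExample() {
  std::string out;
  auto it = std::back_inserter(out);
  it = appendUtf8Sketch(U'\u20AC', it);      // EURO SIGN, three bytes
  it = appendUtf8Sketch(U'\U0001F600', it);  // U+1F600, four bytes
  assert(out.size() == 7);
  return out;
}
```

Returning the advanced iterator lets calls be chained, as encodeExample does, which matches the return value documented for Append.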

◆ NextCodepoint()

static std::tuple< char32_t, int > donner::Utf8::NextCodepoint ( std::string_view str)
inline static

Decodes the next UTF-8 codepoint from the input string, while strictly validating continuation bytes and sequence lengths.

If an invalid sequence is encountered, the function returns the Unicode replacement character (U+FFFD) and consumes the invalid sequence.

Parameters
str    The input string_view from which to read the codepoint.
Returns
A tuple containing the decoded Unicode codepoint and the number of bytes consumed.
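
The (codepoint, bytes consumed) return shape lends itself to a simple decode loop. The sketch below is a standalone approximation, not Donner's code: nextCodepointStrictSketch and decodeAllSketch are hypothetical names, and consuming exactly one byte on error (zero for an empty input) is an assumption, since the documentation above does not state the count. The sketch checks lead bytes, sequence length, and continuation bytes only; it does not reject overlong encodings or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <tuple>

// Hypothetical strict decoder for illustration; not Donner's implementation.
// Assumption: an invalid sequence yields U+FFFD and consumes one byte.
std::tuple<char32_t, int> nextCodepointStrictSketch(std::string_view str) {
  constexpr char32_t kReplacement = 0xFFFD;
  if (str.empty()) return {kReplacement, 0};  // Assumption: nothing to consume.

  const auto lead = static_cast<unsigned char>(str[0]);
  if (lead < 0x80) return {static_cast<char32_t>(lead), 1};

  int length = 0;
  char32_t cp = 0;
  if ((lead & 0xE0) == 0xC0) {
    length = 2;
    cp = lead & 0x1F;
  } else if ((lead & 0xF0) == 0xE0) {
    length = 3;
    cp = lead & 0x0F;
  } else if ((lead & 0xF8) == 0xF0) {
    length = 4;
    cp = lead & 0x07;
  } else {
    return {kReplacement, 1};  // Stray continuation byte or invalid lead byte.
  }

  // Reject truncated sequences.
  if (str.size() < static_cast<std::size_t>(length)) return {kReplacement, 1};

  for (int i = 1; i < length; ++i) {
    const auto c = static_cast<unsigned char>(str[i]);
    if ((c & 0xC0) != 0x80) return {kReplacement, 1};  // Not of the form 10xxxxxx.
    cp = (cp << 6) | (c & 0x3F);
  }
  return {cp, length};
}

// Usage: walk a string one codepoint at a time using the consumed-byte count.
std::u32string decodeAllSketch(std::string_view str) {
  std::u32string out;
  while (!str.empty()) {
    const auto [cp, consumed] = nextCodepointStrictSketch(str);
    assert(consumed > 0);  // Guarantees the loop makes progress.
    out.push_back(cp);
    str.remove_prefix(static_cast<std::size_t>(consumed));
  }
  return out;
}
```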

◆ NextCodepointLenient()

static std::tuple< char32_t, int > donner::Utf8::NextCodepointLenient ( std::string_view str)
inline static

Decodes the next UTF-8 codepoint from the input string, without validating it.

If the string is empty or contains insufficient bytes, returns the Unicode replacement character (U+FFFD).

Parameters
str    The input string_view from which to read the codepoint.
Returns
A tuple containing the decoded Unicode codepoint and the number of bytes consumed.
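
To show what skipping validation can mean in practice, the following standalone sketch assembles the payload bits of each trailing byte without checking that it is a continuation byte. It is an approximation under stated assumptions, not Donner's implementation: nextCodepointLenientSketch is a hypothetical name, and the byte counts returned for empty input, truncated input, and stray lead bytes are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <tuple>

// Hypothetical lenient decoder for illustration; not Donner's implementation.
std::tuple<char32_t, int> nextCodepointLenientSketch(std::string_view str) {
  constexpr char32_t kReplacement = 0xFFFD;
  if (str.empty()) return {kReplacement, 0};  // Assumption: nothing consumed.

  const auto lead = static_cast<unsigned char>(str[0]);
  int length = 1;
  char32_t cp = lead;
  if ((lead & 0xE0) == 0xC0) {
    length = 2;
    cp = lead & 0x1F;
  } else if ((lead & 0xF0) == 0xE0) {
    length = 3;
    cp = lead & 0x0F;
  } else if ((lead & 0xF8) == 0xF0) {
    length = 4;
    cp = lead & 0x07;
  } else if (lead >= 0x80) {
    return {kReplacement, 1};  // Assumption: a stray byte consumes one byte.
  }

  if (str.size() < static_cast<std::size_t>(length)) {
    // Assumption: a truncated sequence consumes the remaining bytes.
    return {kReplacement, static_cast<int>(str.size())};
  }

  for (int i = 1; i < length; ++i) {
    // No check that the byte has the form 10xxxxxx; its low six bits are used as-is.
    cp = (cp << 6) | (static_cast<unsigned char>(str[i]) & 0x3F);
  }
  return {cp, length};
}

inline void lenientExample() {
  // Well-formed input: C3 A9 decodes to U+00E9.
  assert(std::get<0>(nextCodepointLenientSketch("\xC3\xA9")) == 0xE9);
  // Malformed input: 0x28 is not a continuation byte, yet a codepoint is still produced.
  assert(std::get<0>(nextCodepointLenientSketch("\xC3\x28")) == 0xE8);
}
```

In this sketch the malformed pair C3 28 yields U+00E8 rather than U+FFFD, which is the kind of result a caller accepts in exchange for skipping the checks.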

◆ SequenceLength()

static int donner::Utf8::SequenceLength ( char leadingCh)
inline static

Determines the length in bytes of a UTF-8 encoded character based on its leading byte.

Parameters
leadingCh    The leading byte of the UTF-8 character.
Returns
The number of bytes in the UTF-8 character, or 0 if the leading byte is invalid.
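
The mapping from leading byte to sequence length follows from the UTF-8 bit patterns. A minimal standalone sketch, assuming the standard patterns and using the hypothetical name sequenceLengthSketch (this is not Donner's implementation, and it does not exclude lead bytes such as 0xC0 or 0xF5 that match a pattern but never occur in well-formed UTF-8):

```cpp
#include <cassert>

// Hypothetical helper for illustration; not Donner's implementation.
// Returns the sequence length implied by the leading byte, or 0 if the byte
// cannot start a sequence (for example, a continuation byte).
int sequenceLengthSketch(char leadingCh) {
  // Convert to unsigned so that bytes >= 0x80 compare correctly when char is signed.
  const auto b = static_cast<unsigned char>(leadingCh);
  if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                          // 10xxxxxx or 11111xxx
}

inline void sequenceLengthExample() {
  assert(sequenceLengthSketch('A') == 1);
  assert(sequenceLengthSketch('\xE2') == 3);  // First byte of U+20AC.
  assert(sequenceLengthSketch('\x80') == 0);  // Continuation byte, not a lead byte.
}
```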

The documentation for this class was generated from the following file: