Donner 0.5.1
Embeddable browser-grade SVG2 engine
donner::xml Namespace Reference

XML parsing and document model support. The top-level objects are donner::xml::XMLParser and donner::xml::XMLDocument.

Classes

class  XMLDocument
 Represents an XML document, which holds a collection of XMLNode as the document tree.
class  XMLNode
 Represents an XML element belonging to a donner::xml::XMLDocument.
class  XMLParser
 Parses an XML document from a string.
struct  XMLQualifiedName
 Represents an XML attribute name with an optional namespace.
struct  XMLQualifiedNameRef
 Reference type for XMLQualifiedName, to pass the value to APIs without needing to allocate an RcString.
struct  XMLToken
 A single token emitted by the XML tokenizer.
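The XMLQualifiedName / XMLQualifiedNameRef split follows the familiar owning/non-owning pair pattern. The sketch below illustrates that pattern in standard C++ only: `QualifiedName`, `QualifiedNameRef`, `NameLess`, and `AttributeMap` are hypothetical stand-ins that use `std::string` and `std::string_view` in place of RcString, not Donner's actual definitions. A transparent comparator lets a map keyed by owning names be searched with a reference, so the lookup itself allocates nothing.

```cpp
#include <map>
#include <string>
#include <string_view>
#include <tuple>

// Hypothetical owning name: holds its own storage (stand-in for RcString fields).
struct QualifiedName {
  std::string namespacePrefix;  // Empty when the name has no prefix.
  std::string name;
};

// Hypothetical non-owning view of a name; cheap to build from string literals.
struct QualifiedNameRef {
  std::string_view namespacePrefix;
  std::string_view name;
};

// Transparent comparator: orders owning names and references identically, so
// std::map::find accepts a QualifiedNameRef without constructing a QualifiedName.
struct NameLess {
  using is_transparent = void;

  static QualifiedNameRef View(const QualifiedName& n) { return {n.namespacePrefix, n.name}; }
  static QualifiedNameRef View(const QualifiedNameRef& r) { return r; }

  template <typename A, typename B>
  bool operator()(const A& a, const B& b) const {
    const QualifiedNameRef lhs = View(a);
    const QualifiedNameRef rhs = View(b);
    // Order by namespace prefix first, then by local name.
    return std::tie(lhs.namespacePrefix, lhs.name) < std::tie(rhs.namespacePrefix, rhs.name);
  }
};

// Example container: attribute values keyed by owning names.
using AttributeMap = std::map<QualifiedName, std::string, NameLess>;
```

Storing owning keys while querying with views is the usual reason for such a pair: the map must own its keys, but a caller that only wants to look up `xlink:href` should not have to allocate a string to do so.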

Enumerations

enum class XMLTokenType : std::uint8_t {
  TagOpen,
  TagName,
  TagClose,
  TagSelfClose,
  AttributeName,
  AttributeValue,
  Comment,
  CData,
  TextContent,
  XmlDeclaration,
  Doctype,
  EntityRef,
  ProcessingInstruction,
  Whitespace,
  ErrorRecovery
}
 Token types emitted by the XML tokenizer (Tokenize).

Functions

std::optional< RcString > EscapeAttributeValue (std::string_view value, char quoteChar='"')
 Escape a string for use as an XML attribute value, producing text that round-trips through donner::xml::XMLParser::Parse to recover the original bytes.
template<typename TokenSink>
void Tokenize (std::string_view source, TokenSink &&sink)
 Tokenize an XML source string, emitting XMLToken values to sink.
std::ostream & operator<< (std::ostream &os, XMLTokenType type)
 Ostream output operator for XMLTokenType.

Detailed Description

XML parsing and document model support. The top-level objects are donner::xml::XMLParser and donner::xml::XMLDocument.

Enumeration Type Documentation

◆ XMLTokenType

enum class donner::xml::XMLTokenType : std::uint8_t

Token types emitted by the XML tokenizer (Tokenize).

The token stream is gap-free: concatenating every token's source range, in emission order, recovers the original input byte-for-byte. No byte is covered by two tokens and no byte is left uncovered; in particular, trailing whitespace after the last element is emitted as a TextContent token.
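The gap-free property can be checked mechanically: walk the tokens in emission order, confirm each one starts exactly where the previous one ended, and confirm the last one ends at the end of the input. This page does not describe XMLToken's fields, so the sketch assumes a hypothetical `Token` holding half-open `[begin, end)` byte offsets; `Token`, `IsGapFree`, and `Reassemble` are illustrative only.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token: a half-open byte range [begin, end) into the source.
// (Donner's XMLToken may represent its range differently.)
struct Token {
  std::size_t begin = 0;
  std::size_t end = 0;
};

// Returns true if `tokens`, in emission order, tile `source` exactly:
// no gaps, no overlaps, and every byte covered.
bool IsGapFree(std::string_view source, const std::vector<Token>& tokens) {
  std::size_t expected = 0;
  for (const Token& t : tokens) {
    if (t.begin != expected || t.end < t.begin || t.end > source.size()) {
      return false;  // Gap, overlap, inverted range, or out-of-bounds token.
    }
    expected = t.end;
  }
  return expected == source.size();  // Nothing left uncovered at the end.
}

// Concatenates each token's bytes; for a gap-free stream the result equals
// the original input. Only call this on ranges that pass IsGapFree.
std::string Reassemble(std::string_view source, const std::vector<Token>& tokens) {
  std::string out;
  for (const Token& t : tokens) {
    out.append(source.substr(t.begin, t.end - t.begin));
  }
  return out;
}
```

For `<rect/>`, a gap-free stream would be TagOpen `[0,1)`, TagName `[1,5)`, TagSelfClose `[5,7)`.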

Enumerator
TagOpen 

< (element open) or </ (closing tag).

TagName 

Element name, e.g. rect, svg.

TagClose 

> (end of opening/closing tag).

TagSelfClose 

/> (self-closing element).

AttributeName 

Attribute name, e.g. fill, xmlns:xlink.

AttributeValue 

Quoted attribute value including delimiters, e.g. "red".

Comment 

<!-- ... --> (entire comment including delimiters).

CData 

<![CDATA[ ... ]]> (entire CDATA section).

TextContent 

Raw text between tags.

XmlDeclaration 

<?xml ... ?> (entire declaration).

Doctype 

<!DOCTYPE ...> (entire doctype).

EntityRef 

&amp;, &#x20;, etc. (within text content).

ProcessingInstruction 

<?name ...?> (entire PI).

Whitespace 

Whitespace inside a tag (between attributes, around =).

ErrorRecovery 

Emitted for regions the tokenizer cannot parse; error recovery skips to the next < or > and continues.
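Most of the markup token types above are distinguished by the bytes that open the construct. The sketch below shows one way a tokenizer could choose a type for input beginning at `<`. `MarkupKind` and `ClassifyMarkupStart` are hypothetical and only mirror the delimiters documented above; they are not Donner's dispatch logic. In particular, mapping unknown `<!` forms to ErrorRecovery is this sketch's own assumption.

```cpp
#include <optional>
#include <string_view>

// Hypothetical local mirror of the markup-related enumerators above; this is
// not donner::xml::XMLTokenType.
enum class MarkupKind {
  TagOpen,
  Comment,
  CData,
  XmlDeclaration,
  Doctype,
  ProcessingInstruction,
  ErrorRecovery,
};

// Picks the token type for a construct at the front of `s`, or returns
// std::nullopt if `s` does not start with '<' (i.e. it is text).
// Longer delimiters are checked before shorter ones sharing a prefix.
std::optional<MarkupKind> ClassifyMarkupStart(std::string_view s) {
  const auto startsWith = [s](std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
  };
  const auto isXmlSpace = [](char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
  };

  if (!startsWith("<")) return std::nullopt;
  if (startsWith("<!--")) return MarkupKind::Comment;
  if (startsWith("<![CDATA[")) return MarkupKind::CData;
  if (startsWith("<!DOCTYPE")) return MarkupKind::Doctype;
  // Assumption of this sketch: any other "<!" construct is unparseable.
  if (startsWith("<!")) return MarkupKind::ErrorRecovery;
  if (startsWith("<?")) {
    // Only a target of exactly "xml" opens the declaration; a PI such as
    // "<?xml-stylesheet ...?>" merely shares the prefix.
    if (startsWith("<?xml") && s.size() > 5 && (isXmlSpace(s[5]) || s[5] == '?')) {
      return MarkupKind::XmlDeclaration;
    }
    return MarkupKind::ProcessingInstruction;
  }
  return MarkupKind::TagOpen;  // "<" or "</" opens an element tag.
}
```

The `<?xml` check shows why ordering and lookahead matter: a plain prefix test would misclassify `<?xml-stylesheet` as a declaration.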

Function Documentation

◆ EscapeAttributeValue()

std::optional< RcString > donner::xml::EscapeAttributeValue ( std::string_view value,
char quoteChar = '"' )

Escape a string for use as an XML attribute value, producing text that round-trips through donner::xml::XMLParser::Parse to recover the original bytes.

The output is suitable for splicing between two delimiter characters of the requested quoteChar: the returned text contains only the escaped value, not the surrounding quote characters. The caller is responsible for emitting the delimiters.

Escape rules:

  • < → &lt;, & → &amp;, > → &gt;
  • " → &quot; when quoteChar is ", otherwise passed through unchanged
  • ' → &apos; when quoteChar is ', otherwise passed through unchanged
  • \t, \n, \r → numeric character references (&#9;, &#10;, &#13;), so the parser's attribute-value whitespace normalization does not collapse them into plain spaces on round-trip.
  • Valid multi-byte UTF-8 sequences pass through unchanged: non-ASCII bytes are not percent-encoded, because XML attribute values carry UTF-8 natively.
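The escape rules above amount to a single pass over the bytes. The sketch below is a hypothetical `EscapeSketch` written against `std::string` rather than RcString, and it deliberately omits validation: it assumes the input has already passed the reject list that follows. The hypothetical `FormatAttribute` helper shows the caller adding its own delimiters, since the escaped text excludes them.

```cpp
#include <string>
#include <string_view>

// Hypothetical sketch of the escape rules only. Assumes the input contains no
// NUL, no forbidden C0 controls, and no invalid UTF-8 (not checked here).
std::string EscapeSketch(std::string_view value, char quoteChar = '"') {
  // Any quote character other than '\'' is treated as '"'.
  const char quote = quoteChar == '\'' ? '\'' : '"';
  std::string out;
  out.reserve(value.size());
  for (const char c : value) {
    switch (c) {
      case '<': out += "&lt;"; break;
      case '&': out += "&amp;"; break;
      case '>': out += "&gt;"; break;
      // Escape only the quote that matches the delimiter.
      case '"': out += quote == '"' ? "&quot;" : "\""; break;
      case '\'': out += quote == '\'' ? "&apos;" : "'"; break;
      // Numeric references survive attribute-value whitespace normalization.
      case '\t': out += "&#9;"; break;
      case '\n': out += "&#10;"; break;
      case '\r': out += "&#13;"; break;
      // Everything else, including UTF-8 lead/continuation bytes, is copied.
      default: out += c; break;
    }
  }
  return out;
}

// The escaped text has no delimiters, so the caller wraps it in quotes.
std::string FormatAttribute(std::string_view name, std::string_view value, char quoteChar = '"') {
  const char quote = quoteChar == '\'' ? '\'' : '"';
  std::string out(name);
  out += '=';
  out += quote;
  out += EscapeSketch(value, quote);
  out += quote;
  return out;
}
```

Escaping `>` is not strictly required inside an XML attribute value, but the sketch follows the rule list as documented.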

Returns std::nullopt for input that cannot be represented in a well-formed XML attribute value at all:

  • The NUL byte (\0).
  • C0 control characters other than \t, \n, \r (i.e. U+0001 to U+0008, U+000B, U+000C, and U+000E to U+001F); these are forbidden in XML 1.0.
  • Lone surrogates (U+D800 to U+DFFF) encoded in UTF-8.
  • The non-characters U+FFFE and U+FFFF.
  • Overlong UTF-8 sequences or truncated multi-byte starts.

This function is total on the inputs it accepts: any input that is not rejected by the list above produces a valid escaped string.
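The reject list is essentially a strict UTF-8 decoder that also screens each code point against the XML 1.0 character range. The sketch below is a hypothetical `IsRepresentableInAttribute` check, not Donner's implementation. It covers the categories listed above and, as an assumption of this sketch, also treats any other malformed UTF-8 (stray continuation bytes, invalid lead bytes, code points above U+10FFFF) as unrepresentable.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical check mirroring the reject list: returns false for input that
// cannot appear in a well-formed XML 1.0 attribute value.
bool IsRepresentableInAttribute(std::string_view value) {
  std::size_t i = 0;
  while (i < value.size()) {
    const auto lead = static_cast<unsigned char>(value[i]);
    std::uint32_t cp = 0;
    std::size_t len = 0;
    std::uint32_t minCp = 0;  // Smallest code point legal for this length (rejects overlongs).
    if (lead < 0x80) {
      cp = lead; len = 1; minCp = 0;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F; len = 2; minCp = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F; len = 3; minCp = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07; len = 4; minCp = 0x10000;
    } else {
      return false;  // Stray continuation byte or invalid lead byte.
    }
    if (i + len > value.size()) return false;  // Truncated at end of input.
    for (std::size_t k = 1; k < len; ++k) {
      const auto cont = static_cast<unsigned char>(value[i + k]);
      if ((cont & 0xC0) != 0x80) return false;  // Truncated: continuation missing.
      cp = (cp << 6) | (cont & 0x3F);
    }
    if (cp < minCp || cp > 0x10FFFF) return false;   // Overlong or out of range.
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // Surrogate code point.
    if (cp == 0xFFFE || cp == 0xFFFF) return false;  // Non-characters.
    if (cp < 0x20 && cp != 0x09 && cp != 0x0A && cp != 0x0D) {
      return false;  // NUL and forbidden C0 controls.
    }
    i += len;
  }
  return true;
}
```

Running this check first and an escaping pass that follows the rules above second gives the documented contract: std::nullopt when the check fails, otherwise the escaped text.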

Parameters
value: The raw attribute value bytes.
quoteChar: The quote delimiter the caller will surround the escaped value with. Must be '"' (double quote) or '\'' (single quote); any other value is treated as '"'.
Returns
The escaped value, or std::nullopt if value contains characters that cannot be represented in a well-formed XML attribute value.

◆ Tokenize()

template<typename TokenSink>
void donner::xml::Tokenize ( std::string_view source,
TokenSink && sink )

Tokenize an XML source string, emitting XMLToken values to sink.

The sink must be callable as sink(XMLToken); typically it is a lambda, a functor, or a wrapper around std::vector<XMLToken>::push_back.

Template Parameters
TokenSink: Callable with signature void(XMLToken).
Parameters
source: The XML source text.
sink: The token consumer.
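To illustrate the sink contract, the sketch below pairs a toy stand-in for Tokenize with the kinds of sink mentioned above. `ToyTokenize`, `ToyToken`, `ToyKind`, `MarkupCounter`, and `CollectTokens` are hypothetical and far coarser than the real tokenizer: they split input only into `<...>` markup runs and text runs, but they keep the same calling convention of one `sink(token)` call per token, in source order, with every byte covered once.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical, much-simplified token kinds and token type.
enum class ToyKind { Markup, Text };

struct ToyToken {
  ToyKind kind;
  std::size_t begin;  // Offset of the first byte.
  std::size_t end;    // One past the last byte.
};

// Toy tokenizer using the same sink convention as Tokenize.
template <typename TokenSink>
void ToyTokenize(std::string_view source, TokenSink&& sink) {
  std::size_t i = 0;
  while (i < source.size()) {
    if (source[i] == '<') {
      // Markup runs through the next '>' (or to end of input if unterminated).
      const std::size_t close = source.find('>', i);
      const std::size_t end = close == std::string_view::npos ? source.size() : close + 1;
      sink(ToyToken{ToyKind::Markup, i, end});
      i = end;
    } else {
      // Text runs up to the next '<' (or to end of input).
      const std::size_t open = source.find('<', i);
      const std::size_t end = open == std::string_view::npos ? source.size() : open;
      sink(ToyToken{ToyKind::Text, i, end});
      i = end;
    }
  }
}

// Functor sink: counts markup tokens. Passed as an lvalue, it keeps its state.
struct MarkupCounter {
  int count = 0;
  void operator()(ToyToken token) {
    if (token.kind == ToyKind::Markup) ++count;
  }
};

// Lambda sink wrapping std::vector::push_back to collect every token.
std::vector<ToyToken> CollectTokens(std::string_view source) {
  std::vector<ToyToken> tokens;
  ToyTokenize(source, [&tokens](ToyToken token) { tokens.push_back(token); });
  return tokens;
}
```

Because the sink is a forwarding-reference template parameter rather than, say, a `std::function`, a lambda or functor can presumably be called directly without type erasure; in this sketch, an lvalue functor such as `MarkupCounter` is bound by reference, so its count is visible after the call returns.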