PostHole
Compose Login
You are browsing eu.zone1 in read-only mode. Log in to participate.
rss-bridge 2026-02-28T23:44:41+00:00

Show HN: Xmloxide – an agent made rust replacement for libxml2

Comments


xmloxide

A pure Rust reimplementation of libxml2 — the de facto standard XML/HTML parsing library in the open-source world.

libxml2 became officially unmaintained in December 2025 with known security issues. xmloxide aims to be a memory-safe, high-performance replacement that passes the same conformance test suites.

Features

  • Memory-safe — arena-based tree with zero unsafe in the public API
  • Conformant — 100% pass rate on the W3C XML Conformance Test Suite (1727/1727 applicable tests)
  • Error recovery — parse malformed XML and still produce a usable tree, just like libxml2
  • Multiple parsing APIs — DOM tree, SAX2 streaming, XmlReader pull, push/incremental
  • HTML parser — error-tolerant HTML 4.01 parsing with auto-closing and void elements
  • XPath 1.0 — full expression parser and evaluator with all core functions
  • Validation — DTD, RelaxNG, and XML Schema (XSD) validation
  • Canonical XML — C14N 1.0 and Exclusive C14N serialization
  • XInclude — document inclusion processing
  • XML Catalogs — OASIS XML Catalogs for URI resolution
  • xmllint CLI — command-line tool for parsing, validating, and querying XML
  • Zero-copy where possible — string interning for fast comparisons
  • No global state — each Document is self-contained and Send + Sync
  • C/C++ FFI — full C API with header file (include/xmloxide.h) for embedding in C/C++ projects
  • Minimal dependencies — only encoding_rs (library has zero other deps; clap is CLI-only)

Quick Start

use xmloxide::Document;

let doc = Document::parse_str("<root><child>Hello</child></root>").unwrap();
let root = doc.root_element().unwrap();
assert_eq!(doc.node_name(root), Some("root"));
assert_eq!(doc.text_content(root), "Hello");

Serialization

use xmloxide::Document;
use xmloxide::serial::serialize;

let doc = Document::parse_str("<root><child>Hello</child></root>").unwrap();
let xml = serialize(&doc);
assert_eq!(xml, "<root><child>Hello</child></root>");

XPath Queries

use xmloxide::Document;
use xmloxide::xpath::{evaluate, XPathValue};

let doc = Document::parse_str("<library><book><title>Rust</title></book></library>").unwrap();
let root = doc.root_element().unwrap();
let result = evaluate(&doc, root, "count(book)").unwrap();
assert_eq!(result.to_number(), 1.0);

SAX2 Streaming

use xmloxide::sax::{parse_sax, SaxHandler, DefaultHandler};
use xmloxide::parser::ParseOptions;

struct MyHandler;
impl SaxHandler for MyHandler {
fn start_element(&mut self, name: &str, _: Option<&str>, _: Option<&str>,
_: &[(String, String, Option<String>, Option<String>)]) {
println!("Element: {name}");

parse_sax("<root><child/></root>", &ParseOptions::default(), &mut MyHandler).unwrap();

HTML Parsing

use xmloxide::html::parse_html;

let doc = parse_html("<p>Hello <br> World").unwrap();
let root = doc.root_element().unwrap();
assert_eq!(doc.node_name(root), Some("html"));

Error Recovery

use xmloxide::parser::{parse_str_with_options, ParseOptions};

let opts = ParseOptions::default().recover(true);
let doc = parse_str_with_options("<root><unclosed>", &opts).unwrap();
for diag in &doc.diagnostics {
eprintln!("{}", diag);

CLI Tool

# Parse and pretty-print
xmllint --format document.xml

# Validate against a schema
xmllint --schema schema.xsd document.xml
xmllint --relaxng schema.rng document.xml
xmllint --dtdvalid schema.dtd document.xml

# XPath query
xmllint --xpath "//title" document.xml

# Canonical XML
xmllint --c14n document.xml

# Parse HTML
xmllint --html page.html

Module Overview


| Module | Description |
| --- | --- |
| tree | Arena-based DOM tree (Document, NodeId, NodeKind) |
| parser | XML 1.0 recursive descent parser with error recovery |
| parser::push | Push/incremental parser for chunked input |
| html | Error-tolerant HTML 4.01 parser |
| sax | SAX2 streaming event-driven parser |
| reader | XmlReader pull-based parsing API |
| serial | XML serializer and Canonical XML (C14N) |
| xpath | XPath 1.0 expression parser and evaluator |
| validation::dtd | DTD parsing and validation |
| validation::relaxng | RelaxNG schema validation |
| validation::xsd | XML Schema (XSD) validation |
| xinclude | XInclude 1.0 document inclusion |
| catalog | OASIS XML Catalogs for URI resolution |
| encoding | Character encoding detection and transcoding |
| ffi | C/C++ FFI bindings (include/xmloxide.h) |

### Performance

Parsing throughput is competitive with libxml2 — within 3-4% on most documents, and **12% faster** on SVG. Serialization is **1.5-2.4x faster** thanks to the arena-based tree design. XPath is **1.1-2.7x faster** across all benchmarks.

**Parsing:**

****

| Document | Size | xmloxide | libxml2 | Result |
| --- | --- | --- | --- | --- |
| Atom feed | 4.9 KB | 26.7 µs (176 MiB/s) | 25.5 µs (184 MiB/s) | ~4% slower |
| SVG drawing | 6.3 KB | 58.5 µs (103 MiB/s) | 65.6 µs (92 MiB/s) | 12% faster |
| Maven POM | 11.5 KB | 76.9 µs (142 MiB/s) | 74.2 µs (148 MiB/s) | ~4% slower |
| XHTML page | 10.2 KB | 69.5 µs (139 MiB/s) | 61.5 µs (157 MiB/s) | ~13% slower |
| Large (374 KB) | 374 KB | 2.15 ms (169 MiB/s) | 2.08 ms (175 MiB/s) | ~3% slower |

**Serialization:**

************

| Document | Size | xmloxide | libxml2 | Result |
| --- | --- | --- | --- | --- |
| Atom feed | 4.9 KB | 11.3 µs | 17.5 µs | 1.5x faster |
| Maven POM | 11.5 KB | 20.1 µs | 47.5 µs | 2.4x faster |
| Large (374 KB) | 374 KB | 614 µs | 1397 µs | 2.3x faster |

**XPath:**

``****``****``****``****

| Expression | xmloxide | libxml2 | Result |
| --- | --- | --- | --- |
| Simple path (//entry/title) | 1.51 µs | 1.63 µs | 8% faster |
| Attribute predicate (//book[@id]) | 5.91 µs | 15.99 µs | 2.7x faster |
| count() function | 1.09 µs | 1.67 µs | 1.5x faster |
| string() function | 1.32 µs | 1.77 µs | 1.3x faster |

Key optimizations: arena-based tree for fast serialization, byte-level pre-checks for character validation, bulk text scanning, ASCII fast paths for name parsing, zero-copy element name splitting, inline entity resolution, XPath `//` step fusion with fused axis expansion, inlined tree accessors, and name-test fast paths for child/descendant axes.

Run benchmarks (requires libxml2 system library)

cargo bench --features bench-libxml2 --bench comparison_bench


### Testing

- **785 unit tests** across all modules

- **112 FFI tests** covering the full C API surface (including SAX streaming)

- **libxml2 compatibility suite** — 119/119 tests passing (100%) covering XML parsing, namespaces, error detection, and HTML parsing

- **W3C XML Conformance Test Suite** — 1727/1727 applicable tests passing (100%)

- **Integration tests** covering real-world XML documents, edge cases, and error recovery

cargo test --all-features


### C/C++ FFI

xmloxide provides a C-compatible API for embedding in C/C++ projects (like Chromium, game engines, or any codebase that currently uses libxml2).

Build shared + static libraries (uses the included Makefile)

make

Or build individually:

make shared # .so / .dylib / .dll
make static # .a / .lib

Build and run the C example

make example


#include "xmloxide.h"

xmloxide_document *doc = xmloxide_parse_str("<root>Hello</root>");
uint32_t root = xmloxide_doc_root_element(doc);
char *name = xmloxide_node_name(doc, root); // "root"
char *text = xmloxide_node_text_content(doc, root); // "Hello"

xmloxide_free_string(name);
xmloxide_free_string(text);
xmloxide_free_doc(doc);


The full API — including tree navigation and mutation, XPath evaluation, serialization (plain and pretty-printed), HTML parsing, DTD/RelaxNG/XSD validation, C14N, and XML Catalogs — is declared in `include/xmloxide.h`.

### Migrating from libxml2

[...]

---

*[Original source](https://github.com/jonwiggins/xmloxide)*
Reply