pierrelf.com / we

web engine - experimental web browser

HTML tree builder: build DOM from tokens #34

Open. Opened by pierrelf.com 3 weeks ago

Phase 3, Issue 4: HTML Tree Builder

Implement a basic HTML tree builder that constructs a DOM tree from tokenizer output.

Requirements

Build a tree builder that processes tokens and constructs a DOM tree, handling the subset of elements needed for Phase 3.

Supported elements:

<html>, <head>, <body> — structural
<title> — document title
<p>, <h1> through <h6> — block-level text
<div> — generic block container
<span> — generic inline container
<a> — hyperlink (inline)
<br> — line break (void element)
<pre> — preformatted text
Text nodes and comment nodes
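
The supported subset above could be captured in a small lookup, sketched below. The `ElementKind` enum, its category names, and the `classify` function are hypothetical and exist only for illustration; the issue lists the elements but does not prescribe how they are represented.

```rust
/// How a tree builder might treat a supported tag.
/// These categories are an assumption made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ElementKind {
    Structural, // html, head, body
    Metadata,   // title
    Block,      // p, h1..h6, div, pre
    Inline,     // span, a
    Void,       // br
}

/// Classifies a lowercase tag name.
/// Returns `None` for tags outside the Phase 3 subset.
fn classify(tag: &str) -> Option<ElementKind> {
    match tag {
        "html" | "head" | "body" => Some(ElementKind::Structural),
        "title" => Some(ElementKind::Metadata),
        "p" | "div" | "pre" => Some(ElementKind::Block),
        "h1" | "h2" | "h3" | "h4" | "h5" | "h6" => Some(ElementKind::Block),
        "span" | "a" => Some(ElementKind::Inline),
        "br" => Some(ElementKind::Void),
        _ => None,
    }
}
```

Returning `Option` leaves the policy for unsupported tags (ignore, or treat as a generic element) to the caller, since the issue does not specify it.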

Tree builder behavior:

Maintains a stack of open elements
Implicit element insertion (e.g., missing <html>, <head>, <body>)
Void elements (<br>) are immediately popped
Handles basic misnesting (e.g., a <p> start tag arriving while another <p> is open closes the open <p> first)
Foster parenting not required for Phase 3
<title> captures text content
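
The behaviors listed above can be sketched with an arena of nodes plus a stack of indices. This is a minimal illustration under stated assumptions, not the project's implementation: `Token`, `Node`, `Builder`, and every method name here are hypothetical, tokens carry whole strings rather than single characters, and comments, attributes, and whitespace rules are left out.

```rust
// Simplified token and node shapes, assumed for this sketch only.
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

struct Node {
    name: String, // tag name, "#document", or "#text"
    text: String, // non-empty only for "#text" nodes
    children: Vec<usize>,
}

struct Builder {
    nodes: Vec<Node>, // arena; index 0 is the document node
    open: Vec<usize>, // stack of open elements; the document stays at the bottom
    head_seen: bool,
}

impl Builder {
    fn new() -> Self {
        let doc = Node { name: "#document".into(), text: String::new(), children: Vec::new() };
        Builder { nodes: vec![doc], open: vec![0], head_seen: false }
    }

    // Creates a node as the last child of the current open element.
    fn append(&mut self, name: &str, text: &str) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { name: name.into(), text: text.into(), children: Vec::new() });
        let parent = *self.open.last().expect("document node is never popped");
        self.nodes[parent].children.push(id);
        id
    }

    fn push_new(&mut self, name: &str) {
        let id = self.append(name, "");
        self.open.push(id);
    }

    // Skips index 0 so the document node can never be matched or popped.
    fn is_open(&self, name: &str) -> bool {
        self.open.iter().skip(1).any(|&i| self.nodes[i].name == name)
    }

    // Pops the stack up to and including the nearest open `name`, if any.
    fn close(&mut self, name: &str) {
        if !self.is_open(name) {
            return;
        }
        while let Some(id) = self.open.pop() {
            if self.nodes[id].name == name {
                break;
            }
        }
    }

    // Implicit insertion: create missing <html>, <head>, <body> on demand.
    fn ensure_html(&mut self) {
        if !self.is_open("html") {
            self.push_new("html");
        }
    }

    fn ensure_head(&mut self) {
        self.ensure_html();
        if !self.head_seen {
            self.push_new("head");
            self.head_seen = true;
        }
    }

    fn ensure_body(&mut self) {
        self.ensure_head();
        self.close("head");
        if !self.is_open("body") {
            self.push_new("body");
        }
    }

    fn process(&mut self, token: Token) {
        match token {
            Token::StartTag(name) => match name.as_str() {
                "html" => self.ensure_html(),
                "head" => self.ensure_head(),
                "body" => self.ensure_body(),
                "title" => { self.ensure_head(); self.push_new("title"); }
                // Void element: appended but never left on the stack.
                "br" => { self.ensure_body(); self.append("br", ""); }
                // A new <p> closes any <p> that is still open.
                "p" => { self.ensure_body(); self.close("p"); self.push_new("p"); }
                other => { self.ensure_body(); self.push_new(other); }
            },
            Token::EndTag(name) => self.close(&name),
            Token::Text(text) => {
                if !self.is_open("title") {
                    self.ensure_body();
                }
                self.append("#text", &text);
            }
        }
    }

    // Renders the subtree at `id` as a compact outline, useful in tests.
    fn outline(&self, id: usize) -> String {
        let node = &self.nodes[id];
        if node.name == "#text" {
            return format!("\"{}\"", node.text);
        }
        if node.children.is_empty() {
            return node.name.clone();
        }
        let kids: Vec<String> = node.children.iter().map(|&c| self.outline(c)).collect();
        format!("{}({})", node.name, kids.join(","))
    }
}
```

Feeding a `p` start tag followed by the text `Hello` into this sketch and calling `outline(0)` gives `#document(html(head,body(p("Hello"))))`. Storing indices instead of references keeps the stack free of borrow-checker friction and avoids `unsafe`, which fits the "No unsafe code" criterion.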

API:

parse_html(input: &str) -> Document — convenience function
TreeBuilder::new() -> TreeBuilder
TreeBuilder::process_token(token: Token) — feed tokens one at a time
TreeBuilder::finish() -> Document — return the built DOM
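
A possible shape for this API is sketched below. The issue gives only names and return types, so the receivers (`&mut self` for `process_token`, a consuming `self` for `finish`), the `Token` variants, the `Document::title` field, and the `build_from_tokens` helper are all assumptions. The builder is a stub that only captures `<title>` text; `parse_html` would presumably wrap the same loop around the project's tokenizer, which is not shown here.

```rust
// Stand-in token type; the real tokenizer output may differ.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

// Stand-in document that only records the title (hypothetical field).
#[derive(Debug, Default, PartialEq)]
struct Document {
    title: Option<String>,
}

#[derive(Default)]
struct TreeBuilder {
    doc: Document,
    in_title: bool,
}

impl TreeBuilder {
    fn new() -> TreeBuilder {
        TreeBuilder::default()
    }

    // Assumed receiver: the builder mutates its state one token at a time.
    fn process_token(&mut self, token: Token) {
        match token {
            Token::StartTag(name) if name == "title" => {
                self.in_title = true;
                self.doc.title.get_or_insert_with(String::new);
            }
            Token::EndTag(name) if name == "title" => self.in_title = false,
            Token::Text(text) if self.in_title => {
                if let Some(title) = self.doc.title.as_mut() {
                    title.push_str(&text);
                }
            }
            _ => {} // everything else is ignored by this stub
        }
    }

    // Assumed receiver: consuming `self` prevents feeding tokens after finishing.
    fn finish(self) -> Document {
        self.doc
    }
}

// Hypothetical helper showing the intended call sequence.
fn build_from_tokens(tokens: impl IntoIterator<Item = Token>) -> Document {
    let mut builder = TreeBuilder::new();
    for token in tokens {
        builder.process_token(token);
    }
    builder.finish()
}
```

Having `finish` take `self` by value is one design option: the compiler then rejects any `process_token` call made after the document has been handed out.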

Acceptance criteria

Parses <!DOCTYPE html><html><head><title>Test</title></head><body><p>Hello</p></body></html>
Handles implicit element insertion for minimal documents like <p>Hello
<br> is handled as a void element
Nested elements create proper parent-child relationships
Text nodes are created for character tokens
cargo clippy -p we-html -- -D warnings passes
cargo test -p we-html passes with tree builder tests
No unsafe code
No external dependencies

