web engine - experimental web browser

HTML tree builder: build DOM from tokens #34

Status: open, opened by pierrelf.com

Phase 3, Issue 4: HTML Tree Builder

Implement a basic HTML tree builder that constructs a DOM tree from tokenizer output.

Requirements

Build a tree builder that processes tokens and constructs a DOM tree, handling the subset of elements needed for Phase 3.

Supported elements:

  • <html>, <head>, <body> — structural
  • <title> — document title
  • <p>, <h1> through <h6> — block-level text
  • <div> — generic block container
  • <span> — generic inline container
  • <a> — hyperlink (inline)
  • <br> — line break (void element)
  • <pre> — preformatted text
  • Text nodes and comment nodes
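The supported subset can be captured in one lookup that the tree builder consults for every start tag. The sketch below is hypothetical: `ElementKind`, `classify` and `is_void` are illustrative names that are not part of the issue. It also assumes the tokenizer has already lowercased tag names.

```rust
/// Rough categories for the Phase 3 element subset (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElementKind {
    Structural,  // <html>, <head>, <body>
    Title,       // <title>
    Block,       // <p>, <h1>..<h6>, <div>, <pre>
    Inline,      // <span>, <a>
    Void,        // <br>: no children and no end tag
    Unsupported, // anything outside the Phase 3 subset
}

/// Classify a tag name; assumes the tokenizer already lowercased it.
pub fn classify(tag: &str) -> ElementKind {
    match tag {
        "html" | "head" | "body" => ElementKind::Structural,
        "title" => ElementKind::Title,
        "p" | "div" | "pre" | "h1" | "h2" | "h3" | "h4" | "h5" | "h6" => ElementKind::Block,
        "span" | "a" => ElementKind::Inline,
        "br" => ElementKind::Void,
        _ => ElementKind::Unsupported,
    }
}

/// Void elements are inserted into the DOM but never stay on the open-element stack.
pub fn is_void(tag: &str) -> bool {
    classify(tag) == ElementKind::Void
}
```

Keeping the table in one function means a later phase can grow the subset in a single place.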

Tree builder behavior:

  • Maintains a stack of open elements
  • Implicit element insertion (e.g., missing <html>, <head>, <body>)
  • Void elements (<br>) are popped off the stack immediately after insertion
  • Handles basic misnesting (e.g., a <p> start tag inside an open <p> closes the outer <p>)
  • Foster parenting is not required for Phase 3
  • <title> captures its text content
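The stack behavior above can be illustrated with tag names alone. This is a minimal sketch under several assumptions:

- `OpenElements` is a hypothetical type.
- It models body content only, with no <head> or <title> logic.
- Its scope check stops only at <html>, because the other scope boundaries in the HTML specification (tables, buttons and the like) are outside the Phase 3 subset.

Following the specification's "in body" rules, every block-level start tag in the subset closes an open <p>, not only <p> itself.

```rust
/// Sketch of the stack of open elements, holding tag names (innermost last).
/// A real tree builder would store node handles; names keep the sketch small.
#[derive(Debug, Default)]
pub struct OpenElements {
    stack: Vec<String>,
}

/// Start tags in the Phase 3 subset that first close an open <p>.
fn closes_p(tag: &str) -> bool {
    matches!(tag, "p" | "div" | "pre" | "h1" | "h2" | "h3" | "h4" | "h5" | "h6")
}

impl OpenElements {
    pub fn names(&self) -> Vec<&str> {
        self.stack.iter().map(String::as_str).collect()
    }

    fn contains(&self, tag: &str) -> bool {
        self.stack.iter().any(|t| t == tag)
    }

    /// Simplified "has an element in scope": search innermost-first, stopping at <html>.
    pub fn has_in_scope(&self, tag: &str) -> bool {
        for open in self.stack.iter().rev() {
            if open == tag {
                return true;
            }
            if open == "html" {
                return false;
            }
        }
        false
    }

    /// Pop elements until one named `tag` has been popped.
    pub fn pop_until(&mut self, tag: &str) {
        while let Some(top) = self.stack.pop() {
            if top == tag {
                break;
            }
        }
    }

    /// Start tag for body content: implicit <html>/<body>, <p> misnesting, void <br>.
    pub fn start_tag(&mut self, tag: &str) {
        if !self.contains("html") {
            self.stack.push("html".to_string());
        }
        if tag == "html" {
            return;
        }
        if !self.contains("body") {
            self.stack.push("body".to_string());
        }
        if tag == "body" {
            return;
        }
        if closes_p(tag) && self.has_in_scope("p") {
            self.pop_until("p");
        }
        // Pushing a void element and popping it at once leaves the stack
        // unchanged, so <br> is simply never pushed.
        if tag != "br" {
            self.stack.push(tag.to_string());
        }
    }

    /// End tag: pop through the matching element if it is in scope, else ignore it.
    /// </html> and </body> are not popped; they stay open until end of input.
    pub fn end_tag(&mut self, tag: &str) {
        if tag != "html" && tag != "body" && self.has_in_scope(tag) {
            self.pop_until(tag);
        }
    }
}
```

For input like `<p>a<p>b`, the second `start_tag("p")` pops the first <p> before pushing the new one. The outer paragraph therefore never contains the inner one.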

API:

  • parse_html(input: &str) -> Document — convenience function
  • TreeBuilder::new() -> TreeBuilder
  • TreeBuilder::process_token(token: Token) — feed tokens one at a time
  • TreeBuilder::finish() -> Document — return the built DOM
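One possible shape for this API, as a minimal sketch rather than the intended implementation. Everything here is an assumption:

- The `Token` variants are hypothetical; the real tokenizer's type may differ.
- `Document` is arena-style, with nodes referencing their children by index.
- `</html>` and `</body>` are ignored instead of switching insertion modes.

`parse_html` is left out because it depends on the tokenizer. It would pass each token to `process_token` and return `finish()`.

```rust
// Hypothetical token and DOM shapes; the real we-html types may differ.
#[derive(Debug, Clone)]
pub enum Token { Doctype, StartTag(String), EndTag(String), Character(char), Comment(String), Eof }

#[derive(Debug, Clone, PartialEq)]
pub enum NodeData { Document, Element(String), Text(String), Comment(String) }

/// Arena-style DOM: `nodes[0]` is the document and `children` holds arena indices.
#[derive(Debug)]
pub struct Node { pub data: NodeData, pub children: Vec<usize> }
#[derive(Debug)]
pub struct Document { pub nodes: Vec<Node> }

/// `open` is the stack of open elements, stored as arena indices.
pub struct TreeBuilder { doc: Document, open: Vec<usize>, head: Option<usize>, body: Option<usize> }

fn el(name: &str) -> NodeData {
    NodeData::Element(name.to_string())
}

impl TreeBuilder {
    pub fn new() -> Self {
        let root = Node { data: NodeData::Document, children: vec![] };
        Self { doc: Document { nodes: vec![root] }, open: vec![], head: None, body: None }
    }
    pub fn process_token(&mut self, token: Token) {
        match token {
            Token::Doctype | Token::Eof => {}
            Token::Comment(text) => { self.insert(NodeData::Comment(text), false); }
            Token::Character(c) => self.insert_char(c),
            Token::StartTag(name) => self.start_tag(&name),
            Token::EndTag(name) => self.end_tag(&name),
        }
    }
    pub fn finish(self) -> Document { self.doc }
    /// Append to the current node (or the document), optionally pushing it as open.
    fn insert(&mut self, data: NodeData, push: bool) -> usize {
        let (id, parent) = (self.doc.nodes.len(), self.open.last().copied().unwrap_or(0));
        self.doc.nodes.push(Node { data, children: vec![] });
        self.doc.nodes[parent].children.push(id);
        if push { self.open.push(id); }
        id
    }
    /// Position in `open` of the innermost open element named `name`.
    fn find_open(&self, name: &str) -> Option<usize> {
        self.open.iter().rposition(|&id| {
            matches!(&self.doc.nodes[id].data, NodeData::Element(n) if n == name)
        })
    }
    /// Pop up to and including the innermost open `name`; no-op if it is not open.
    fn close(&mut self, name: &str) {
        if let Some(i) = self.find_open(name) { self.open.truncate(i); }
    }
    /// Implicitly create <html>, <head> and <body> so content always has a parent.
    fn ensure_body(&mut self) {
        if self.open.is_empty() { self.insert(el("html"), true); }
        if self.body.is_some() { return; }
        if self.head.is_none() { self.head = Some(self.insert(el("head"), false)); }
        self.close("head");
        self.body = Some(self.insert(el("body"), true));
    }
    fn start_tag(&mut self, name: &str) {
        if self.open.is_empty() { self.insert(el("html"), true); }
        match name {
            "html" => {}
            "head" | "title" if self.body.is_none() => {
                if self.head.is_none() { self.head = Some(self.insert(el("head"), true)); }
                if name == "title" { self.insert(el("title"), true); }
            }
            "head" | "body" => self.ensure_body(), // a stray <head> after <body> is ignored
            _ => {
                self.ensure_body();
                // A block start tag closes an open <p>, so <p> inside <p> closes the outer one.
                if matches!(name, "p" | "div" | "pre" | "h1" | "h2" | "h3" | "h4" | "h5" | "h6") {
                    self.close("p");
                }
                self.insert(el(name), name != "br"); // void <br> is inserted but never pushed
            }
        }
    }
    fn insert_char(&mut self, c: char) {
        if self.find_open("title").is_none() {
            // Whitespace before <body> is dropped; any other text implies <body>.
            if self.body.is_none() && c.is_ascii_whitespace() { return; }
            self.ensure_body();
        }
        // Merge consecutive characters into the preceding text node.
        let parent = self.open.last().copied().unwrap_or(0);
        if let Some(last) = self.doc.nodes[parent].children.last().copied() {
            if let NodeData::Text(s) = &mut self.doc.nodes[last].data { s.push(c); return; }
        }
        self.insert(NodeData::Text(c.to_string()), false);
    }
    fn end_tag(&mut self, name: &str) {
        // </html> and </body> leave their elements open until EOF; </br> is ignored here.
        if !matches!(name, "html" | "body" | "br") { self.close(name); }
    }
}
```

End-tag handling is approximate. `close` pops through the innermost matching element wherever it sits on the stack, whereas the specification ignores some mismatched end tags (for example `</span>` while a `<div>` opened inside the span is still open).

To satisfy the `cargo clippy -p we-html -- -D warnings` criterion, a `Default` impl delegating to `new()` would also be needed. Clippy's `new_without_default` lint fires on a public no-argument `new()`.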

Acceptance criteria

  • Parses <!DOCTYPE html><html><head><title>Test</title></head><body><p>Hello</p></body></html>
  • Handles implicit element insertion for minimal documents like <p>Hello
  • <br> is handled as void element
  • Nested elements create proper parent-child relationships
  • Text nodes are created for character tokens
  • cargo clippy -p we-html -- -D warnings passes
  • cargo test -p we-html passes with tree builder tests
  • No unsafe code
  • No external dependencies
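The structural acceptance criteria (parent-child relationships, text nodes) are easiest to assert by comparing a textual dump of the tree against an expected string. A hypothetical helper is sketched below. Its output is modeled on the html5lib-tests tree-construction format: a `| ` prefix and two spaces of indentation per level. `Node` is a stand-in for whatever DOM type we-html ends up with.

```rust
use std::fmt::Write;

/// Hypothetical owned DOM node, standing in for the real we-html type.
pub enum Node {
    Element(String, Vec<Node>),
    Text(String),
    Comment(String),
}

/// Render nodes one per line, each prefixed with "| " and indented two
/// spaces per nesting level (modeled on html5lib-tests tree dumps).
pub fn dump(nodes: &[Node]) -> String {
    let mut out = String::new();
    for node in nodes {
        dump_node(node, 0, &mut out);
    }
    out
}

fn dump_node(node: &Node, depth: usize, out: &mut String) {
    let indent = "  ".repeat(depth);
    // Writing to a String cannot fail, so unwrapping the fmt::Result is safe.
    match node {
        Node::Element(name, children) => {
            writeln!(out, "| {indent}<{name}>").unwrap();
            for child in children {
                dump_node(child, depth + 1, out);
            }
        }
        Node::Text(text) => writeln!(out, "| {indent}\"{text}\"").unwrap(),
        Node::Comment(text) => writeln!(out, "| {indent}<!-- {text} -->").unwrap(),
    }
}
```

A test for `<p>Hello` would then assert that the dump reads `| <html>`, `|   <head>`, `|   <body>`, `|     <p>`, `|       "Hello"`, one per line.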
AT URI
at://did:plc:meotu43t6usg4qdwzenk4s2t/sh.tangled.repo.issue/3mfubtb7rzf2k