web engine - experimental web browser

html5lib tokenizer test harness #33

open opened by pierrelf.com

Phase 3, Issue 3: html5lib Tokenizer Test Harness#

Wire up the html5lib-tests tokenizer test suite to validate our HTML5 tokenizer.

Requirements#

The tests/html5lib-tests/ submodule contains JSON test files for HTML tokenizer conformance. Implement a test harness that:

  1. Reads JSON test files from tests/html5lib-tests/tokenizer/
  2. Parses each test case (input, expected output tokens, optional initial state, optional errors)
  3. Runs our tokenizer against the input
  4. Compares output tokens to expected results
  5. Reports pass/fail per test case

Test file format (JSON):

{
  "tests": [
    {
      "description": "test name",
      "input": "<html>",
      "output": [["StartTag", "html", {}]],
      "errors": [{"code": "...", "line": 1, "col": 1}]
    }
  ]
}

Token mapping:

  • ["DOCTYPE", name, public_id, system_id, correctness]Token::Doctype
  • ["StartTag", name, attrs, self_closing?]Token::StartTag
  • ["EndTag", name]Token::EndTag
  • ["Character", data]Token::Character
  • ["Comment", data]Token::Comment

Location: tests/html5lib_tokenizer.rs (already exists with a stub JSON parser)

Acceptance criteria#

  • Test harness loads and parses all html5lib tokenizer JSON files
  • Each test case runs our tokenizer and compares output
  • Test results are reported clearly (which tests pass/fail)
  • Initial pass rate is tracked (doesn't need to be 100% yet)
  • cargo test -p we-html --test html5lib_tokenizer runs successfully
  • No unsafe code
  • No external dependencies (use the existing JSON parser or extend it)
sign up or login to add to the discussion
Labels

None yet.

assignee

None yet.

Participants 1
AT URI
at://did:plc:meotu43t6usg4qdwzenk4s2t/sh.tangled.repo.issue/3mfubstvig52k