HTML5 tokenizer state machine #32

Phase 3, Issue 2: HTML5 Tokenizer

Implement a spec-compliant HTML5 tokenizer in the html crate, replacing the current stub.

Implement the HTML5 tokenizer as a state machine per the WHATWG HTML spec (§13.2.5). The tokenizer reads input characters and emits Token values.
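To make the state-machine shape concrete, here is a minimal sketch covering only four of the spec's states (data, tag open, end tag open, tag name). The `Token` and `State` enums and the `tokenize` function are hypothetical names invented for this illustration; the real `Token` type is the one already defined in lib.rs, and its variants may differ. Attributes, comments, DOCTYPE, character references, and parse-error reporting are deliberately left out.

```rust
// Hypothetical token type for illustration only; the real one lives in lib.rs.
#[derive(Debug, PartialEq)]
pub enum Token {
    StartTag(String),
    EndTag(String),
    Character(char),
    Eof,
}

// A small subset of the tokenizer states from WHATWG HTML §13.2.5.
enum State {
    Data,
    TagOpen,
    EndTagOpen,
    TagName,
}

// Assumed entry point: consumes the whole input and returns all tokens.
pub fn tokenize(input: &str) -> Vec<Token> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut state = State::Data;
    let mut name = String::new();
    let mut is_end_tag = false;
    let mut i = 0;

    while i < chars.len() {
        let c = chars[i];
        i += 1; // consume; `i -= 1` below means "reconsume in the new state"
        match state {
            State::Data => match c {
                '<' => state = State::TagOpen,
                _ => tokens.push(Token::Character(c)),
            },
            State::TagOpen => match c {
                '/' => state = State::EndTagOpen,
                _ if c.is_ascii_alphabetic() => {
                    is_end_tag = false;
                    name.clear();
                    i -= 1;
                    state = State::TagName;
                }
                _ => {
                    // Not a tag: emit '<' as text and reconsume c as data.
                    tokens.push(Token::Character('<'));
                    i -= 1;
                    state = State::Data;
                }
            },
            State::EndTagOpen => match c {
                _ if c.is_ascii_alphabetic() => {
                    is_end_tag = true;
                    name.clear();
                    i -= 1;
                    state = State::TagName;
                }
                // Simplification: the spec reports a parse error here and may
                // enter the bogus comment state; this sketch drops the input.
                _ => state = State::Data,
            },
            State::TagName => match c {
                '>' => {
                    let tag = std::mem::take(&mut name);
                    tokens.push(if is_end_tag {
                        Token::EndTag(tag)
                    } else {
                        Token::StartTag(tag)
                    });
                    state = State::Data;
                }
                // Simplification: whitespace and '/' are not treated specially,
                // so attributes and self-closing tags are not handled.
                _ => name.push(c.to_ascii_lowercase()),
            },
        }
    }
    // Simplification: an unfinished tag at end of input is silently dropped.
    tokens.push(Token::Eof);
    tokens
}
```

The reconsume step (`i -= 1`) mirrors the spec's "reconsume in the X state" wording, which keeps each state's logic close to the corresponding spec paragraph as more states are added.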

States to implement (minimum for Phase 3 subset):

Token types (already defined in lib.rs):

API:

Error handling:

Tokenizes basic HTML: <html><head><title>Test</title></head><body><p>Hello</p></body></html>
Handles self-closing tags: <br/>, <img/>
Handles attributes: <a href="url" class="link">
Handles comments: <!-- comment -->
Handles DOCTYPE: <!DOCTYPE html>
Handles character references: &amp;, &lt;, &gt;, &#65;, &#x41;
Emits Character tokens for text content
cargo clippy -p we-html -- -D warnings passes
cargo test -p we-html passes with comprehensive unit tests
No unsafe code
No external dependencies
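
As an illustration of the character-reference criterion, the following sketch decodes the text between `&` and `;` for a handful of named references plus decimal and hexadecimal numeric references. The function name `decode_char_ref` is hypothetical and not part of the existing crate. It assumes the caller has already isolated the reference body, supports only four named entities, and omits the spec's Windows-1252 remapping for code points 0x80 to 0x9F.

```rust
/// Hypothetical helper: decodes the body of a character reference, i.e. the
/// text between '&' and ';' (for example "amp", "#65", or "#x41").
/// Returns None when the body is not a reference this sketch recognizes.
pub fn decode_char_ref(body: &str) -> Option<char> {
    if let Some(num) = body.strip_prefix('#') {
        // Numeric reference: hexadecimal after 'x' or 'X', decimal otherwise.
        let (digits, radix) = match num.strip_prefix('x').or_else(|| num.strip_prefix('X')) {
            Some(hex) => (hex, 16),
            None => (num, 10),
        };
        if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
            return None;
        }
        // After the digit check, parsing can only fail on overflow; treat
        // that as an out-of-range code point.
        let code = u32::from_str_radix(digits, radix).unwrap_or(u32::MAX);
        // Null, surrogates, and values above U+10FFFF become U+FFFD.
        return Some(match code {
            0 => '\u{FFFD}',
            _ => char::from_u32(code).unwrap_or('\u{FFFD}'),
        });
    }
    // Named references: only a tiny illustrative subset of the spec's table.
    match body {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        "quot" => Some('"'),
        _ => None,
    }
}
```

Keeping the decoder as a pure function from `&str` to `Option<char>` makes it easy to unit-test in isolation, without unsafe code or external dependencies, before wiring it into the tokenizer's character reference state.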