web engine - experimental web browser

CSS tokenizer (per spec) #38

Status: open (opened by pierrelf.com)

Description

Implement a CSS tokenizer in the css crate following the CSS Syntax Module Level 3 specification (§4 Tokenization).
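As a starting point, the tokenizer's output could be modeled as a single Rust enum with one variant per token type in the acceptance criteria. This is a minimal sketch: the variant names, the `single_code_point_token` helper, and the use of `f64` for numeric values are assumptions for illustration, not an existing API in the css crate. The `HashType` and `NumericType` flags mirror the type flags the spec attaches to hash, number, and dimension tokens.

```rust
/// Type flag on <hash-token>: "id" when the name would also start an ident sequence.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum HashType {
    Id,
    Unrestricted,
}

/// Type flag on <number-token> and <dimension-token>: "integer" or "number".
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NumericType {
    Integer,
    Number,
}

/// One variant per token type (hypothetical names, not from the crate).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),                       // color, div, --custom
    Function(String),                    // name only: "rgb" for `rgb(`
    AtKeyword(String),                   // name without the '@'
    Hash(String, HashType),              // name without the '#'
    String(String),                      // contents without the quotes
    Number(f64, NumericType),
    Percentage(f64),
    Dimension(f64, NumericType, String), // value, type flag, unit
    Whitespace,
    Colon,
    Semicolon,
    Comma,
    LeftBrace,
    RightBrace,
    LeftParen,
    RightParen,
    LeftBracket,
    RightBracket,
    Delim(char),
    Cdo, // <!--
    Cdc, // -->
    Eof,
}

/// Code points that always form a token on their own, with no lookahead.
pub fn single_code_point_token(c: char) -> Option<Token> {
    let token = match c {
        ':' => Token::Colon,
        ';' => Token::Semicolon,
        ',' => Token::Comma,
        '{' => Token::LeftBrace,
        '}' => Token::RightBrace,
        '(' => Token::LeftParen,
        ')' => Token::RightParen,
        '[' => Token::LeftBracket,
        ']' => Token::RightBracket,
        _ => return None,
    };
    Some(token)
}
```

Code points such as `.`, `+`, and `-` are deliberately absent from the helper: they may begin a number (or, for `-`, an ident or `-->`), so they need lookahead before falling back to `Delim`.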

Acceptance Criteria

  • Implement the CSS tokenizer state machine that consumes a stream of code points and produces tokens
  • Token types to support:
    • <ident-token>: identifiers (e.g., color, div, --custom)
    • <function-token>: identifier immediately followed by "(" (e.g., "rgb(", "calc(")
    • <at-keyword-token>: @ followed by identifier (e.g., @media, @import)
    • <hash-token>: # followed by name (e.g., #id, #fff)
    • <string-token>: quoted strings (single or double quotes)
    • <number-token>: integers and floats
    • <percentage-token>: number followed by %
    • <dimension-token>: number followed by unit (e.g., 10px, 2em, 1.5rem)
    • <whitespace-token>
    • <colon-token>, <semicolon-token>, <comma-token>
    • <{-token>, <}-token>, <(-token>, <)-token>, <[-token>, <]-token>
    • <delim-token>: single code points not matched by anything else (e.g., ., >, +, ~, *)
    • <CDC-token> (-->) and <CDO-token> (<!--)
    • <EOF-token>
  • Handle CSS escape sequences: \ followed by 1 to 6 hex digits (plus one optional trailing whitespace), or \ followed by any code point other than a newline
  • Handle CSS comments (/* ... */)
  • Consume numbers correctly: optional sign, fractional part, exponent, and the integer vs. number type flag (e.g., 10, -0.5, .5, 1e3)
  • Write comprehensive unit tests covering all token types, edge cases, escape sequences, and comments
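The comment, escape, and number requirements correspond to small sub-algorithms in the spec ("consume comments", "consume an escaped code point", "consume a number"). Below is a minimal sketch of all three over a hypothetical `Cursor` type. It assumes the input has already been preprocessed (CR, CRLF, and FF normalized to a single newline) and that callers have already run the spec's lookahead checks (a valid escape before `consume_escaped`, "would start a number" before `consume_number`). Parsing the collected text with Rust's `f64` parser stands in for the spec's own string-to-number steps.

```rust
/// Type flag carried by number and dimension tokens.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NumericType {
    Integer,
    Number,
}

/// Hypothetical cursor over already-preprocessed code points.
pub struct Cursor {
    chars: Vec<char>,
    pos: usize,
}

impl Cursor {
    pub fn new(input: &str) -> Self {
        Cursor { chars: input.chars().collect(), pos: 0 }
    }

    /// Look ahead without consuming; `None` stands for EOF.
    pub fn peek(&self, offset: usize) -> Option<char> {
        self.chars.get(self.pos + offset).copied()
    }

    /// Skip any run of `/* ... */`; an unterminated comment runs to EOF.
    pub fn consume_comments(&mut self) {
        while self.peek(0) == Some('/') && self.peek(1) == Some('*') {
            self.pos += 2;
            loop {
                match self.peek(0) {
                    None => return, // parse error: unterminated comment
                    Some('*') if self.peek(1) == Some('/') => {
                        self.pos += 2;
                        break;
                    }
                    Some(_) => self.pos += 1,
                }
            }
        }
    }

    /// Assumes the '\' was already consumed and the escape was checked as valid.
    pub fn consume_escaped(&mut self) -> char {
        match self.peek(0) {
            None => '\u{FFFD}', // parse error: escape at EOF
            Some(c) if c.is_ascii_hexdigit() => {
                let mut value: u32 = 0;
                let mut count = 0;
                // Up to 6 hex digits.
                while count < 6 {
                    match self.peek(0).and_then(|h| h.to_digit(16)) {
                        Some(d) => {
                            value = value * 16 + d;
                            self.pos += 1;
                            count += 1;
                        }
                        None => break,
                    }
                }
                // A single whitespace after the hex digits belongs to the escape.
                if matches!(self.peek(0), Some(' ' | '\t' | '\n')) {
                    self.pos += 1;
                }
                // Zero, surrogates, and values above U+10FFFF become U+FFFD.
                if value == 0 { '\u{FFFD}' } else { char::from_u32(value).unwrap_or('\u{FFFD}') }
            }
            Some(c) => {
                self.pos += 1;
                c
            }
        }
    }

    fn consume_digits(&mut self, repr: &mut String) {
        while let Some(d) = self.peek(0).filter(|c| c.is_ascii_digit()) {
            repr.push(d);
            self.pos += 1;
        }
    }

    /// Assumes the input "would start a number" (checked by the caller).
    pub fn consume_number(&mut self) -> (f64, NumericType) {
        let mut repr = String::new();
        let mut kind = NumericType::Integer;
        if matches!(self.peek(0), Some('+' | '-')) {
            repr.push(self.chars[self.pos]);
            self.pos += 1;
        }
        self.consume_digits(&mut repr);
        // Fractional part only if '.' is followed by a digit.
        if self.peek(0) == Some('.') && self.peek(1).map_or(false, |c| c.is_ascii_digit()) {
            repr.push('.');
            self.pos += 1;
            kind = NumericType::Number;
            self.consume_digits(&mut repr);
        }
        // Exponent: 'e'/'E', optional sign, then at least one digit.
        let exponent_prefix = match (self.peek(0), self.peek(1), self.peek(2)) {
            (Some('e' | 'E'), Some(d), _) if d.is_ascii_digit() => 1,
            (Some('e' | 'E'), Some('+' | '-'), Some(d)) if d.is_ascii_digit() => 2,
            _ => 0,
        };
        if exponent_prefix > 0 {
            for _ in 0..exponent_prefix {
                repr.push(self.chars[self.pos]);
                self.pos += 1;
            }
            kind = NumericType::Number;
            self.consume_digits(&mut repr);
        }
        let value = repr.parse::<f64>().expect("caller checked that a number starts here");
        (value, kind)
    }
}
```

Note that `1e` stops after `1`: an `e` without a following digit is not an exponent, so it is left in the stream where it can become a dimension unit.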

Implementation Notes

  • No external dependencies
  • All code in crates/css/src/
  • This is the foundation for the CSS parser, so token boundaries must be exactly right
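Token boundaries in the spec hinge on a few lookahead checks that decide whether a code point like `-`, `+`, `.`, or `\` begins a number, an ident sequence, or just a delim. A minimal std-only sketch of those checks follows. The function names loosely follow the spec's algorithm titles; `HyphenStart` and `classify_hyphen` are hypothetical and only reproduce the order the spec uses for `-` in "consume a token" (number first, then CDC, then ident-like, otherwise delim).

```rust
/// Hypothetical result of deciding what a '-' begins.
#[derive(Debug, PartialEq)]
pub enum HyphenStart {
    Numeric,
    Cdc,
    IdentLike,
    Delim,
}

/// Ident-start code point: ASCII letter, '_', or any non-ASCII code point.
fn is_ident_start(c: Option<char>) -> bool {
    matches!(c, Some(ch) if ch.is_ascii_alphabetic() || ch == '_' || !ch.is_ascii())
}

/// "Check if two code points are a valid escape": '\' not followed by a newline.
pub fn is_valid_escape(first: Option<char>, second: Option<char>) -> bool {
    first == Some('\\') && second != Some('\n')
}

/// "Check if three code points would start an ident sequence".
pub fn would_start_ident(first: Option<char>, second: Option<char>, third: Option<char>) -> bool {
    match first {
        Some('-') => is_ident_start(second) || second == Some('-') || is_valid_escape(second, third),
        Some('\\') => is_valid_escape(first, second),
        c => is_ident_start(c),
    }
}

/// "Check if three code points would start a number".
pub fn would_start_number(first: Option<char>, second: Option<char>, third: Option<char>) -> bool {
    let digit = |c: Option<char>| c.map_or(false, |ch| ch.is_ascii_digit());
    match first {
        Some('+' | '-') => digit(second) || (second == Some('.') && digit(third)),
        Some('.') => digit(second),
        c => digit(c),
    }
}

/// Decide what a '-' at the start of `input` begins, in the spec's order.
pub fn classify_hyphen(input: &[char]) -> HyphenStart {
    let at = |i: usize| input.get(i).copied();
    if would_start_number(at(0), at(1), at(2)) {
        HyphenStart::Numeric
    } else if at(1) == Some('-') && at(2) == Some('>') {
        HyphenStart::Cdc
    } else if would_start_ident(at(0), at(1), at(2)) {
        HyphenStart::IdentLike
    } else {
        HyphenStart::Delim
    }
}
```

The ordering matters: `-->` also passes the ident-sequence check (a `-` followed by `-`), so testing for CDC before ident-like is what keeps it from being split into an ident `--` and a delim `>`.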
Labels: none yet
Assignee: none yet
Participants: 1
AT URI: at://did:plc:meotu43t6usg4qdwzenk4s2t/sh.tangled.repo.issue/3mgavt7gfpu2x