Description
Implement a CSS tokenizer in the css crate following the CSS Syntax Module Level 3 specification (§4 Tokenization).
Acceptance Criteria
- Implement the CSS tokenizer state machine that consumes a stream of code points and produces tokens
- Token types to support:
  - `<ident-token>`: identifiers (e.g., `color`, `div`, `--custom`)
  - `<function-token>`: identifier followed by `(` (e.g., `rgb(`, `calc(`)
  - `<at-keyword-token>`: `@` followed by identifier (e.g., `@media`, `@import`)
  - `<hash-token>`: `#` followed by name (e.g., `#id`, `#fff`)
  - `<string-token>`: quoted strings (single or double quotes)
  - `<number-token>`: integers and floats
  - `<percentage-token>`: number followed by `%`
  - `<dimension-token>`: number followed by unit (e.g., `10px`, `2em`, `1.5rem`)
  - `<whitespace-token>`
  - `<colon-token>`, `<semicolon-token>`, `<comma-token>`
  - `<{-token>`, `<}-token>`, `<(-token>`, `<)-token>`, `<[-token>`, `<]-token>`
  - `<delim-token>`: single code points not matched by anything else (e.g., `.`, `>`, `+`, `~`, `*`)
  - `<CDC-token>` (`-->`) and `<CDO-token>` (`<!--`)
  - `<EOF-token>`
- Handle CSS escape sequences (`\` followed by hex digits or a non-newline code point)
- Handle CSS comments (`/* ... */`)
- Properly consume numbers (integer vs. float, sign)
- Write comprehensive unit tests covering all token types, edge cases, escape sequences, and comments
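
The token list above could map onto a single Rust enum. The following is a minimal sketch, not a prescribed design: the names `Token`, `NumericType`, `HashType`, and `punctuation` are hypothetical, and the enum covers only the token types listed in this issue (the specification defines a few more, such as `<url-token>` and the bad-string/bad-url tokens). The type flags on hash, number, and dimension tokens follow the flags described in the specification.

```rust
/// Type flag carried by number and dimension tokens.
#[derive(Debug, Clone, PartialEq)]
pub enum NumericType {
    Integer,
    Number,
}

/// Type flag carried by hash tokens.
#[derive(Debug, Clone, PartialEq)]
pub enum HashType {
    Id,
    Unrestricted,
}

/// Hypothetical token representation covering the types listed in this issue.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    Function(String),
    AtKeyword(String),
    Hash { value: String, kind: HashType },
    String(String),
    Number { value: f64, kind: NumericType },
    Percentage(f64),
    Dimension { value: f64, kind: NumericType, unit: String },
    Whitespace,
    Colon,
    Semicolon,
    Comma,
    LeftBrace,
    RightBrace,
    LeftParen,
    RightParen,
    LeftBracket,
    RightBracket,
    Delim(char),
    Cdc,
    Cdo,
    Eof,
}

/// Maps a code point that always forms a token on its own to that token.
/// Returns `None` for everything else, which the tokenizer must examine further.
pub fn punctuation(c: char) -> Option<Token> {
    Some(match c {
        ':' => Token::Colon,
        ';' => Token::Semicolon,
        ',' => Token::Comma,
        '{' => Token::LeftBrace,
        '}' => Token::RightBrace,
        '(' => Token::LeftParen,
        ')' => Token::RightParen,
        '[' => Token::LeftBracket,
        ']' => Token::RightBracket,
        _ => return None,
    })
}
```

Deriving `PartialEq` keeps unit tests short, since expected and actual token sequences can be compared directly with `assert_eq!`.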
Implementation Notes
- No external dependencies
- All code in `crates/css/src/`
- This is the foundation for the CSS parser, so get token boundaries right
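
Escape sequences are one place where token boundaries are easy to get wrong, because a hex escape also swallows a single trailing whitespace code point. Below is a minimal sketch of the "consume an escaped code point" step as described in the specification. The function name and the `&[char]` input representation are assumptions made for illustration; the sketch also assumes the input has already been preprocessed (so only space, tab, and `\n` count as whitespace) and that the caller has already consumed the backslash and verified that the escape is valid.

```rust
/// Consumes an escaped code point.
///
/// Assumes the backslash was already consumed and `pos` indexes the code
/// point right after it. Returns the decoded code point and the position
/// of the first code point after the escape.
pub fn consume_escaped_code_point(input: &[char], mut pos: usize) -> (char, usize) {
    const REPLACEMENT: char = '\u{FFFD}';

    // EOF directly after the backslash is a parse error; yield U+FFFD.
    let Some(&first) = input.get(pos) else {
        return (REPLACEMENT, pos);
    };
    pos += 1;

    // Any non-hex code point stands for itself.
    if !first.is_ascii_hexdigit() {
        return (first, pos);
    }

    // Consume up to six hex digits in total, including `first`.
    let mut value = first.to_digit(16).unwrap();
    let mut digits = 1;
    while digits < 6 {
        match input.get(pos).and_then(|c| c.to_digit(16)) {
            Some(d) => {
                value = value * 16 + d;
                pos += 1;
                digits += 1;
            }
            None => break,
        }
    }

    // A single whitespace code point after the hex digits belongs to the escape.
    if matches!(input.get(pos).copied(), Some(' ' | '\t' | '\n')) {
        pos += 1;
    }

    // Zero, surrogates, and out-of-range values become U+FFFD.
    // `char::from_u32` already rejects surrogates and values above U+10FFFF.
    let decoded = match char::from_u32(value) {
        Some(c) if value != 0 => c,
        _ => REPLACEMENT,
    };
    (decoded, pos)
}
```

With this sketch, the input `41 B` (the text following a backslash) decodes to `A` and leaves the position at `B`, so `\41 B` continues as one identifier rather than ending at the space.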