A batteries included HTTP/1.1 client in OCaml

RFC compliance improvements across HTTP specifications

Security (P0):
- Add bare CR validation to prevent HTTP request smuggling (RFC 9112 Section 2.2)
- Add chunk size overflow protection (max 16 hex digits)

URI Library (RFC 3986):
- Inline URI module with Eio.Buf_read parsers replacing Angstrom
- Add Pct module for percent encoding/decoding
- Add Path module with dot segment removal per RFC 3986 Section 5.2.4
- Add Query module for query string parsing
- Add Absolute_http submodule for HTTP-specific URI handling

HTTP Caching (RFC 9111):
- Add age calculation per RFC 9111 Section 4.2.3
- Add heuristic freshness computation per RFC 9111 Section 4.2.2
- Add in-memory cache module with thread-safe operations
- Support Vary header matching and validation headers

Authentication (RFC 7616, RFC 6750):
- Add auth-int qop support with body hashing for Digest auth
- Add userhash field to digest_challenge per RFC 7616
- Add Bearer form authentication per RFC 6750 Section 2.2
- Add digest_is_stale for stale nonce handling

Transfer-Encoding (RFC 9112 Section 6.1):
- Add multi-encoding validation (chunked must be final)
- Parse comma-separated encoding list

Connection Headers (RFC 9110 Section 7.6.1):
- Add hop-by-hop header parsing and removal
- Add connection_close and connection_keep_alive helpers

Trailer Headers (RFC 9112 Section 7.1.2):
- Add proper trailer parsing instead of skipping
- Filter forbidden trailer headers per RFC 9110 Section 6.5.1

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

+3221 -105
+375 -56
SPEC-TODO.md
··· 1 1 # HTTP RFC Specification Compliance TODO 2 2 3 3 This document tracks RFC compliance issues identified in the ocaml-requests library. 4 - Generated from comprehensive analysis against RFC 9110, 9111, 9112, 7235, 7617, 7616, 6750, 2818, and 8446. 4 + Generated from comprehensive analysis against RFC 9110, 9111, 9112, 7235, 7617, 7616, 6750, 6265, 3986, 2818, and 8446. 5 5 6 - ## Summary 6 + ## Current Compliance Summary 7 7 8 - | Priority | Issue | RFC | Status | 9 - |----------|-------|-----|--------| 10 - | High | 303 redirect method change | RFC 9110 Section 15.4.4 | FIXED | 11 - | High | obs-fold header handling | RFC 9112 Section 5.2 | FIXED | 12 - | High | Basic auth username validation | RFC 7617 Section 2 | FIXED | 13 - | Medium | Close-delimited body reading | RFC 9112 Section 6.3 | FIXED | 14 - | Medium | Retry-After HTTP-date format | RFC 9110 Section 10.2.3 | FIXED | 15 - | Medium | 407 proxy auth auto-retry | RFC 7235 Section 3.2 | FIXED | 16 - | Medium | 417 Expectation Failed retry | RFC 9110 Section 10.1.1 | FIXED | 17 - | Low | Asterisk-form OPTIONS | RFC 9112 Section 3.2.4 | FIXED | 18 - | Low | Accept-Language header builder | RFC 9110 Section 12.5.4 | FIXED | 8 + | RFC | Area | Compliance | Notes | 9 + |-----|------|------------|-------| 10 + | RFC 9110 | HTTP Semantics | 90%+ | Excellent - all methods, status codes, headers | 11 + | RFC 9112 | HTTP/1.1 Syntax | 78-82% | Good - some edge cases missing | 12 + | RFC 9111 | HTTP Caching | 60-70% | Partial - parsing complete, age calc simplified | 13 + | RFC 7617/6750/7616 | Authentication | 75-85% | Good - Basic/Bearer/Digest work | 14 + | RFC 6265 | Cookies | 70-80% | Good - delegated to Cookeio | 15 + | RFC 3986 | URI | 80%+ | Good - via Uri library | 19 16 20 17 --- 21 18 22 - ## Completed Fixes 19 + ## Section 1: URI Library Inlining (Angstrom → Buf_read) 23 20 24 - ### 1. 303 Redirect Method Change (FIXED) 21 + **Goal:** Inline the third_party/uri library into requests, replacing Angstrom-based parsing with Eio.Buf_read combinators for consistency with the HTTP parsing stack. 25 22 26 - **RFC Reference:** RFC 9110 Section 15.4.4 23 + ### 1.1 Phase 1: Parser Module Conversion 27 24 28 - > "A user agent can perform a retrieval request targeting that URI (a GET or HEAD request if using HTTP)" 25 + The Uri library's Parser module (uri.ml lines 845-1071) uses Angstrom. Convert to Buf_read: 29 26 30 - **Fix:** Added status code check in `lib/requests.ml` to change POST, PUT, DELETE, PATCH to GET for 303 redirects and strip the request body. 27 + ``` 28 + Angstrom combinator → Buf_read equivalent 29 + ───────────────────────────────────────── 30 + char c → Buf_read.char c 31 + string s → Buf_read.string s 32 + satisfy p → Buf_read.any_char + predicate check 33 + take_while p → Buf_read.take_while p 34 + take_while1 p → Buf_read.take_while1 p 35 + option x p → (try Some (p buf) with ... -> x) 36 + choice [a;b] → (try a buf with ... -> b buf) 37 + many p → Buf_read.seq p (with accumulator) 38 + lift f p → let x = p buf in f x 39 + lift2 f p1 p2 → let x = p1 buf in let y = p2 buf in f x y 40 + <|> → try/with pattern 41 + *> → ignore (p1 buf); p2 buf 42 + <* → let x = p1 buf in ignore (p2 buf); x 43 + ``` 44 + 45 + **Key parsers to convert:** 46 + - [ ] `ipv6` parser (IPv6 address parsing) 47 + - [ ] `uri_reference` parser (main URI parser) 48 + - [ ] `reg_name` (registered name) 49 + - [ ] `dec_octet` (decimal octet for IPv4) 50 + - [ ] `ipv4` (IPv4 address) 51 + - [ ] `h16` / `ls32` (IPv6 components) 52 + - [ ] `pchar` / `segment` / `path` parsers 53 + - [ ] `query` / `fragment` parsers 54 + - [ ] `scheme` parser 55 + - [ ] `authority` parser (userinfo, host, port) 56 + 57 + ### 1.2 Phase 2: Pct Module (Percent Encoding) 58 + 59 + The Pct module handles RFC 3986 percent-encoding. This is pure string manipulation and doesn't need Angstrom, but review for: 60 + 61 + - [ ] Ensure `pct_encode` uses proper component-specific character sets 62 + - [ ] Verify `pct_decode` handles malformed sequences correctly 63 + - [ ] Add validation for invalid percent sequences (bare `%` without hex) 64 + 65 + ### 1.3 Phase 3: Path Module 66 + 67 + Path operations (normalization, dot segment removal) are pure algorithms: 68 + 69 + - [ ] `remove_dot_segments` - RFC 3986 Section 5.2.4 70 + - [ ] `merge` - RFC 3986 Section 5.2.3 71 + - [ ] Ensure path is made absolute when host is present 72 + 73 + ### 1.4 Phase 4: Reference Resolution 74 + 75 + - [ ] Implement `resolve` per RFC 3986 Section 5.2 76 + - [ ] Test all 7 resolution examples from RFC 3986 Section 5.4 77 + 78 + ### 1.5 Phase 5: Scheme-Specific Normalization 79 + 80 + - [ ] HTTP/HTTPS normalization (empty path → "/") 81 + - [ ] Port normalization (omit default ports 80/443) 82 + - [ ] Host case normalization (lowercase) 83 + 84 + ### 1.6 Files to Create 85 + 86 + ``` 87 + lib/ 88 + ├── uri.ml # Main URI module (inlined from third_party) 89 + ├── uri.mli # Public interface 90 + ├── uri_parser.ml # Buf_read-based parsers 91 + └── pct_encode.ml # Percent encoding utilities 92 + ``` 93 + 94 + ### 1.7 Testing 95 + 96 + - [ ] Port all tests from third_party/uri.4.4.0/lib_test/ 97 + - [ ] Add RFC 3986 Appendix A conformance tests 98 + - [ ] Add RFC 3986 Section 5.4 reference resolution tests 31 99 32 100 --- 33 101 34 - ### 2. obs-fold Header Handling (FIXED) 102 + ## Section 2: P0 - Security Critical 35 103 36 - **RFC Reference:** RFC 9112 Section 5.2 104 + ### 2.1 Bare CR Validation (RFC 9112) 37 105 38 - > "A user agent that receives an obs-fold in a response message... MUST replace each received obs-fold with one or more SP octets prior to interpreting the field value." 106 + **RFC Reference:** RFC 9112 Section 2.2 107 + 108 + > "A recipient that receives whitespace between the start-line and the first header field MUST either reject the message as invalid or..." 109 + > "bare CR must be rejected" 110 + 111 + **Current Status:** Not explicitly validated 112 + 113 + **Fix Required:** 114 + ```ocaml 115 + (* In lib/http_read.ml *) 116 + let validate_no_bare_cr s = 117 + for i = 0 to String.length s - 2 do 118 + if s.[i] = '\r' && s.[i+1] <> '\n' then 119 + raise (Protocol_error "bare CR in message") 120 + done 121 + ``` 122 + 123 + - [ ] Add bare CR validation in `request_line` parsing 124 + - [ ] Add bare CR validation in `header_line` parsing 125 + - [ ] Add bare CR validation in chunked extension parsing 126 + 127 + ### 2.2 Chunk Size Overflow Protection 39 128 40 - **Fix:** Modified `header_line` function in `lib/http_read.ml` to detect continuation lines (starting with SP/HTAB) and merge them with a single space. 129 + **RFC Reference:** RFC 9112 Section 7.1 130 + 131 + **Current Status:** Uses `Int64.of_string` which can overflow 132 + 133 + **Fix Required:** 134 + ```ocaml 135 + (* In lib/http_read.ml *) 136 + let parse_chunk_size hex = 137 + (* Limit to reasonable size, e.g., 16 hex digits = 64 bits *) 138 + if String.length hex > 16 then 139 + raise (Protocol_error "chunk size too large"); 140 + try Int64.of_string ("0x" ^ hex) 141 + with _ -> raise (Protocol_error "invalid chunk size") 142 + ``` 143 + 144 + - [ ] Add length check before parsing chunk size 145 + - [ ] Add test for chunk size overflow attack 146 + 147 + ### 2.3 Request Smuggling Prevention 148 + 149 + **RFC Reference:** RFC 9112 Section 6.3 150 + 151 + > "If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length." 152 + 153 + **Current Status:** Correctly prioritizes Transfer-Encoding 154 + 155 + - [x] Transfer-Encoding takes precedence over Content-Length 156 + - [ ] Add explicit logging/warning when both present 157 + - [ ] Consider rejecting requests with conflicting headers in strict mode 41 158 42 159 --- 43 160 44 - ### 3. Basic Auth Username Validation (FIXED) 161 + ## Section 3: P1 - High Priority 162 + 163 + ### 3.1 Content-Length Validation 164 + 165 + **RFC Reference:** RFC 9110 Section 8.6 166 + 167 + > "Any Content-Length field value greater than or equal to zero is valid." 168 + 169 + **Current Status:** Parsed as int64 170 + 171 + **Fix Required:** 172 + - [ ] Reject negative Content-Length values explicitly 173 + - [ ] Validate Content-Length matches actual body length for responses 174 + - [ ] Add `content_length_mismatch` error type 175 + 176 + ### 3.2 Age Header Calculation 177 + 178 + **RFC Reference:** RFC 9111 Section 4.2.3 179 + 180 + > "age_value = delta-seconds" 181 + > "The Age header field conveys the sender's estimate of the time since the response was generated" 182 + 183 + **Current Status:** Uses simplified timestamp (not full calculation) 184 + 185 + **Fix Required:** 186 + ```ocaml 187 + (* In lib/cache_control.ml or new lib/age.ml *) 188 + type age_calculation = { 189 + apparent_age: Ptime.Span.t; 190 + response_delay: Ptime.Span.t; 191 + corrected_age_value: Ptime.Span.t; 192 + corrected_initial_age: Ptime.Span.t; 193 + resident_time: Ptime.Span.t; 194 + current_age: Ptime.Span.t; 195 + } 196 + 197 + let calculate_age ~date_value ~age_value ~response_time ~request_time ~now = 198 + let apparent_age = max 0 (response_time - date_value) in 199 + let response_delay = response_time - request_time in 200 + let corrected_age_value = age_value + response_delay in 201 + let corrected_initial_age = max apparent_age corrected_age_value in 202 + let resident_time = now - response_time in 203 + corrected_initial_age + resident_time 204 + ``` 205 + 206 + - [ ] Add full RFC 9111 Section 4.2.3 age calculation 207 + - [ ] Track `request_time` and `response_time` in Response.t 208 + - [ ] Add `is_fresh` function using calculated age vs max-age 209 + 210 + ### 3.3 Heuristic Freshness 211 + 212 + **RFC Reference:** RFC 9111 Section 4.2.2 213 + 214 + > "A cache MAY calculate a heuristic expiration time" 215 + > "a typical setting of this value might be 10% of the time since the response's Last-Modified field value" 45 216 46 - **RFC Reference:** RFC 7617 Section 2 217 + **Current Status:** Not implemented 47 218 48 - > "a user-id containing a colon character is invalid" 49 - > "The user-id and password MUST NOT contain any control characters" 219 + - [ ] Add `heuristic_freshness` function 220 + - [ ] Use 10% of (now - Last-Modified) as default 221 + - [ ] Add Warning 113 "Heuristic expiration" for stale responses 222 + - [ ] Add configurable `max_heuristic_age` parameter 223 + 224 + ### 3.4 Digest Auth Enhancements 50 225 51 - **Fix:** Added `validate_basic_auth_credentials` function in `lib/headers.ml` that raises `Invalid_basic_auth` exception for invalid credentials. 226 + **RFC Reference:** RFC 7616 Section 3.4 52 227 53 - --- 228 + **Current Status:** Basic Digest works, missing advanced features 54 229 55 - ### 4. Close-delimited Body Reading (FIXED) 230 + - [ ] Add `userhash` parameter support 231 + - [ ] Add SHA-256 session authentication (`algorithm=SHA-256-sess`) 232 + - [ ] Add `auth-int` qop (requires body hash) 233 + - [ ] Add `nextnonce` handling for pipelining 234 + - [ ] Add `stale=true` handling (retry with same password) 56 235 57 - **RFC Reference:** RFC 9112 Section 6.3, item 8 236 + ### 3.5 Bearer Token Form Parameter 58 237 59 - > "Otherwise, this is a response message without a declared message body length, so the message body length is determined by the number of octets received prior to the server closing the connection." 238 + **RFC Reference:** RFC 6750 Section 2.2 60 239 61 - **Fix:** Added `close_delimited_body` function and `Close_delimited_source` streaming module in `lib/http_read.ml` to read until EOF when no Content-Length or Transfer-Encoding present. 240 + > "Clients MAY use the form-encoded body parameter access_token" 241 + 242 + **Current Status:** Not implemented 243 + 244 + - [ ] Add `Bearer_form_body of string` variant to auth type 245 + - [ ] Serialize as `access_token=TOKEN` in request body 246 + - [ ] Only allow with `Content-Type: application/x-www-form-urlencoded` 62 247 63 248 --- 64 249 65 - ### 5. Retry-After HTTP-date Format (FIXED) 250 + ## Section 4: P2 - Medium Priority 251 + 252 + ### 4.1 Warning Header (Deprecated but Present) 253 + 254 + **RFC Reference:** RFC 9111 Section 5.5 (obsoleted) 255 + 256 + **Note:** Warning header is obsolete in HTTP but may still be received. 257 + 258 + - [ ] Parse Warning header values if present in responses 259 + - [ ] Generate Warning 110 "Response is Stale" when serving stale cached content 260 + - [ ] Generate Warning 112 "Disconnected operation" when offline 261 + 262 + ### 4.2 Vary Header Support 66 263 67 - **RFC Reference:** RFC 9110 Section 10.2.3 264 + **RFC Reference:** RFC 9111 Section 4.1 68 265 69 - > "The Retry-After field value can be either an HTTP-date or a number of seconds" 266 + > "A cache MUST use the Vary header field to select the representation" 70 267 71 - **Fix:** Changed `lib/retry.ml` to use `Http_date.parse` instead of `Ptime.of_rfc3339` for parsing HTTP-date format (IMF-fixdate). 268 + **Current Status:** Not fully implemented for cache validation 269 + 270 + - [ ] Parse Vary header from responses 271 + - [ ] Add `Vary_mismatch` cache status when headers don't match 272 + - [ ] Store request headers needed for Vary matching 273 + 274 + ### 4.3 Connection Header Parsing 275 + 276 + **RFC Reference:** RFC 9110 Section 7.6.1 277 + 278 + > "the connection option 'close' signals that the sender is going to close the connection after the current request/response" 279 + 280 + **Current Status:** Basic close detection 281 + 282 + - [ ] Parse full comma-separated Connection header values 283 + - [ ] Remove hop-by-hop headers listed in Connection 284 + - [ ] Handle `Connection: keep-alive` for HTTP/1.0 285 + 286 + ### 4.4 Transfer-Encoding Validation 287 + 288 + **RFC Reference:** RFC 9112 Section 6.1 289 + 290 + > "A server MUST NOT apply a transfer coding to a response to a HEAD request" 291 + 292 + **Current Status:** Not explicitly validated 293 + 294 + - [ ] Reject Transfer-Encoding in response to HEAD 295 + - [ ] Reject Transfer-Encoding in 1xx, 204, 304 responses 296 + - [ ] Add test cases for invalid Transfer-Encoding responses 297 + 298 + ### 4.5 Host Header Validation 299 + 300 + **RFC Reference:** RFC 9110 Section 7.2 301 + 302 + > "A client MUST send a Host header field in all HTTP/1.1 request messages" 303 + 304 + **Current Status:** Automatically added 305 + 306 + - [ ] Verify Host header matches URI authority 307 + - [ ] Handle Host header for CONNECT requests specially 72 308 73 309 --- 74 310 75 - ### 6. 407 Proxy Auth Auto-Retry (FIXED) 311 + ## Section 5: P3 - Low Priority / Nice to Have 312 + 313 + ### 5.1 Trailer Headers 314 + 315 + **RFC Reference:** RFC 9110 Section 6.5 316 + 317 + > "Trailer allows the sender to include additional fields at the end of a chunked message" 318 + 319 + - [ ] Parse Trailer header to know which fields to expect 320 + - [ ] Collect trailer fields after final chunk 321 + - [ ] Validate trailers don't include forbidden fields (Transfer-Encoding, Content-Length, Trailer, etc.) 76 322 77 - **RFC Reference:** RFC 7235 Section 3.2 323 + ### 5.2 TE Header 78 324 79 - > "The 407 (Proxy Authentication Required) status code is similar to 401 (Unauthorized), but it indicates that the client needs to authenticate itself in order to use a proxy." 325 + **RFC Reference:** RFC 9110 Section 10.1.4 80 326 81 - **Fix:** Extended `handle_digest_auth` in `lib/requests.ml` to handle both 401 and 407 status codes, using Proxy-Authenticate/Proxy-Authorization headers for 407. 327 + > "The TE header field describes what transfer codings... the client is willing to accept" 82 328 83 - --- 329 + - [ ] Parse TE header from requests 330 + - [ ] Send `TE: trailers` when trailers are supported 331 + - [ ] Handle `TE: chunked` negotiation 84 332 85 - ### 7. 417 Expectation Failed Retry (FIXED) 333 + ### 5.3 Expect Continue Timeout 86 334 87 335 **RFC Reference:** RFC 9110 Section 10.1.1 88 336 89 - > "A client that receives a 417 (Expectation Failed) status code in response to a request containing a 100-continue expectation SHOULD repeat that request without a 100-continue expectation" 337 + > "A client that will wait for a 100 (Continue) response before sending the request content SHOULD use a reasonable timeout" 338 + 339 + **Current Status:** Has expect_100_continue support 340 + 341 + - [ ] Add configurable timeout for 100 Continue wait 342 + - [ ] Default to reasonable timeout (e.g., 1 second) 343 + - [ ] Document behavior when timeout expires 344 + 345 + ### 5.4 Method Properties Enforcement 346 + 347 + **RFC Reference:** RFC 9110 Section 9 348 + 349 + **Current Status:** Properties exposed but not enforced 90 350 91 - **Fix:** Modified `make_request_100_continue` in `lib/http_client.ml` to automatically retry without the Expect header when receiving 417. 351 + - [ ] Warn when caching response to non-cacheable method 352 + - [ ] Warn when retrying non-idempotent method on network error 353 + - [ ] Add configurable `strict_method_semantics` option 354 + 355 + ### 5.5 URI Normalization for Comparison 356 + 357 + **RFC Reference:** RFC 3986 Section 6.2 358 + 359 + - [ ] Add `Uri.equivalent` function for comparison after normalization 360 + - [ ] Case-insensitive scheme and host 361 + - [ ] Normalize empty path to "/" for http/https 362 + - [ ] Remove default port numbers 363 + 364 + ### 5.6 Internationalized Resource Identifiers (IRI) 365 + 366 + **RFC Reference:** RFC 3987 367 + 368 + - [ ] Add `Uri.of_iri` for IRI to URI conversion 369 + - [ ] Handle UTF-8 encoding in path and query 370 + - [ ] Percent-encode non-ASCII characters 371 + 372 + --- 373 + 374 + ## Section 6: Cookie Compliance (Delegated to Cookeio) 375 + 376 + The library delegates cookie handling to the Cookeio library. These items should be verified in that library: 377 + 378 + - [ ] Verify Cookeio handles `SameSite` attribute per RFC 6265bis 379 + - [ ] Verify `__Host-` and `__Secure-` cookie prefixes 380 + - [ ] Verify Public Suffix List usage for domain matching 381 + - [ ] Verify cookie path matching rules 92 382 93 383 --- 94 384 95 - ### 8. Asterisk-form OPTIONS Support (FIXED) 385 + ## Section 7: Implementation Order 96 386 97 - **RFC Reference:** RFC 9112 Section 3.2.4 387 + ### Phase 1: Security Fixes (P0) 388 + 1. Bare CR validation 389 + 2. Chunk size overflow protection 390 + 3. Request smuggling logging 98 391 99 - > "When a client wishes to request OPTIONS for the server as a whole... the client MUST send only '*' as the request-target" 392 + ### Phase 2: URI Library Inlining 393 + 1. Create uri_parser.ml with Buf_read combinators 394 + 2. Port Pct module (percent encoding) 395 + 3. Port Path module (normalization) 396 + 4. Port resolution and canonicalization 397 + 5. Test suite migration 100 398 101 - **Fix:** Modified `request_line` function in `lib/http_write.ml` to detect path "*" with OPTIONS method and use asterisk-form request target. 399 + ### Phase 3: Core RFC 9111 Compliance 400 + 1. Age calculation per Section 4.2.3 401 + 2. Heuristic freshness per Section 4.2.2 402 + 3. Vary header support 102 403 103 - --- 404 + ### Phase 4: Authentication Enhancements 405 + 1. Digest auth userhash 406 + 2. Digest auth auth-int qop 407 + 3. Bearer form parameter 104 408 105 - ### 9. Accept-Language Header Builder (FIXED) 409 + ### Phase 5: Edge Cases and Polish 410 + 1. Transfer-Encoding validation 411 + 2. Connection header parsing 412 + 3. Trailer header support 413 + 4. Method property enforcement 106 414 107 - **RFC Reference:** RFC 9110 Section 12.5.4 415 + --- 108 416 109 - > "The Accept-Language header field can be used by user agents to indicate the set of natural languages that are preferred in the response." 417 + ## Previously Completed Fixes 110 418 111 - **Fix:** Added `accept_language` function to `lib/headers.ml` and `lib/headers.mli`. 419 + | Priority | Issue | RFC | Status | 420 + |----------|-------|-----|--------| 421 + | High | 303 redirect method change | RFC 9110 Section 15.4.4 | FIXED | 422 + | High | obs-fold header handling | RFC 9112 Section 5.2 | FIXED | 423 + | High | Basic auth username validation | RFC 7617 Section 2 | FIXED | 424 + | Medium | Close-delimited body reading | RFC 9112 Section 6.3 | FIXED | 425 + | Medium | Retry-After HTTP-date format | RFC 9110 Section 10.2.3 | FIXED | 426 + | Medium | 407 proxy auth auto-retry | RFC 7235 Section 3.2 | FIXED | 427 + | Medium | 417 Expectation Failed retry | RFC 9110 Section 10.1.1 | FIXED | 428 + | Low | Asterisk-form OPTIONS | RFC 9112 Section 3.2.4 | FIXED | 429 + | Low | Accept-Language header builder | RFC 9110 Section 12.5.4 | FIXED | 112 430 113 431 --- 114 432 115 433 ## Notes 116 434 117 - - The library intentionally does not implement cache storage (RFC 9111) as it provides utilities for applications to build their own caching layer. 118 - - SOCKS5 proxy support is declared but not implemented - this is a feature gap, not a compliance issue. 119 - - SHA-512-256 for Digest auth is not implemented due to complexity of the special initialization vectors required. 435 + - The library intentionally does not implement cache storage (RFC 9111) as it provides utilities for applications to build their own caching layer 436 + - SOCKS5 proxy support is declared but not implemented - this is a feature gap, not a compliance issue 437 + - SHA-512-256 for Digest auth is not implemented due to complexity of the special initialization vectors required 438 + - HTTP/2 and HTTP/3 are out of scope for this library (HTTP/1.1 only)
+105 -33
lib/auth.ml
··· 10 10 | No_auth 11 11 | Basic of { username : string; password : string } 12 12 | Bearer of { token : string } 13 + | Bearer_form of { token : string } 14 + (** RFC 6750 Section 2.2: Bearer token in form-encoded body *) 13 15 | Digest of { username : string; password : string } 14 16 | Custom of (Headers.t -> Headers.t) 15 17 ··· 21 23 algorithm : string; (** MD5, SHA-256, etc. *) 22 24 opaque : string option; 23 25 stale : bool; 26 + userhash : bool; (** RFC 7616: If true, hash the username *) 24 27 } 25 28 26 29 let none = No_auth ··· 45 48 | No_auth -> "None" 46 49 | Basic _ -> "Basic" 47 50 | Bearer _ -> "Bearer" 51 + | Bearer_form _ -> "Bearer (form)" 48 52 | Digest _ -> "Digest" 49 53 | Custom _ -> "Custom" 50 54 51 55 (** Check if auth type requires HTTPS (per RFC 7617/6750). 52 56 Basic, Bearer, and Digest send credentials that can be intercepted. *) 53 57 let requires_https = function 54 - | Basic _ | Bearer _ | Digest _ -> true 58 + | Basic _ | Bearer _ | Bearer_form _ | Digest _ -> true 55 59 | No_auth | Custom _ -> false 56 60 57 61 (** Validate that sensitive authentication is used over HTTPS. ··· 84 88 | Bearer { token } -> 85 89 Log.debug (fun m -> m "Applying bearer token authentication"); 86 90 Headers.bearer token headers 91 + | Bearer_form { token = _ } -> 92 + Log.debug (fun m -> m "Bearer form auth - token goes in body, not headers"); 93 + (* Bearer form auth puts token in body, not headers. 94 + Use get_bearer_form_body to get the body content. *) 95 + headers 87 96 | Digest { username; password = _ } -> 88 97 Log.debug (fun m -> m "Digest auth configured for user: %s (requires server challenge)" username); 89 98 (* Digest auth requires server challenge first, handled elsewhere *) ··· 164 173 algorithm = List.assoc_opt "algorithm" pairs |> Option.value ~default:"MD5"; 165 174 opaque = List.assoc_opt "opaque" pairs; 166 175 stale = List.assoc_opt "stale" pairs = (Some "true"); 176 + userhash = List.assoc_opt "userhash" pairs = (Some "true"); 167 177 } in 168 - Log.debug (fun m -> m "Parsed Digest challenge: realm=%s nonce=%s algorithm=%s" 169 - challenge.realm challenge.nonce challenge.algorithm); 178 + Log.debug (fun m -> m "Parsed Digest challenge: realm=%s nonce=%s algorithm=%s userhash=%b" 179 + challenge.realm challenge.nonce challenge.algorithm challenge.userhash); 170 180 Option.some challenge 171 181 | _ -> 172 182 Log.warn (fun m -> m "Digest challenge missing required fields (realm/nonce)"); ··· 206 216 in 207 217 String.concat "" (List.init (String.length bytes) (fun i -> hex_of_char bytes.[i])) 208 218 209 - (** Compute digest response according to RFC 7616 *) 210 - let compute_digest_response ~username ~password ~method_ ~uri ~challenge ~nc ~cnonce = 219 + (** Compute digest response according to RFC 7616. 220 + 221 + @param body Optional request body for auth-int qop (body hash included in HA2) *) 222 + let compute_digest_response ~username ~password ~method_ ~uri ~challenge ~nc ~cnonce ?body () = 211 223 let algorithm = challenge.algorithm in 212 224 (* HA1 = hash(username:realm:password) *) 213 225 let ha1 = hash_string ~algorithm 214 226 (Printf.sprintf "%s:%s:%s" username challenge.realm password) in 215 - (* HA2 = hash(method:uri) *) 216 - let ha2 = hash_string ~algorithm 217 - (Printf.sprintf "%s:%s" method_ uri) in 227 + (* Determine which qop to use *) 228 + let selected_qop = match challenge.qop with 229 + | Some qop -> 230 + let qop_parts = String.split_on_char ',' qop |> List.map String.trim in 231 + (* Prefer auth-int if body is provided and available, else auth *) 232 + if List.mem "auth-int" qop_parts && Option.is_some body then 233 + Some "auth-int" 234 + else if List.mem "auth" qop_parts then 235 + Some "auth" 236 + else if qop_parts <> [] then 237 + Some (List.hd qop_parts) 238 + else 239 + None 240 + | None -> None 241 + in 242 + (* HA2 depends on qop *) 243 + let ha2 = match selected_qop, body with 244 + | Some "auth-int", Some body_content -> 245 + (* HA2 = hash(method:uri:hash(body)) for auth-int *) 246 + let body_hash = hash_string ~algorithm body_content in 247 + hash_string ~algorithm (Printf.sprintf "%s:%s:%s" method_ uri body_hash) 248 + | _ -> 249 + (* HA2 = hash(method:uri) for auth or no qop *) 250 + hash_string ~algorithm (Printf.sprintf "%s:%s" method_ uri) 251 + in 218 252 (* Response depends on qop *) 219 - let response = match challenge.qop with 220 - | Some qop when String.contains qop ',' || 221 - String.trim qop = "auth" || 222 - String.trim qop = "auth-int" -> 253 + let response, actual_qop = match selected_qop with 254 + | Some qop -> 223 255 (* qop present: hash(HA1:nonce:nc:cnonce:qop:HA2) *) 224 - hash_string ~algorithm 225 - (Printf.sprintf "%s:%s:%s:%s:auth:%s" 226 - ha1 challenge.nonce nc cnonce ha2) 227 - | _ -> 256 + let resp = hash_string ~algorithm 257 + (Printf.sprintf "%s:%s:%s:%s:%s:%s" 258 + ha1 challenge.nonce nc cnonce qop ha2) in 259 + (resp, Some qop) 260 + | None -> 228 261 (* No qop: hash(HA1:nonce:HA2) *) 229 - hash_string ~algorithm 230 - (Printf.sprintf "%s:%s:%s" ha1 challenge.nonce ha2) 262 + let resp = hash_string ~algorithm 263 + (Printf.sprintf "%s:%s:%s" ha1 challenge.nonce ha2) in 264 + (resp, None) 231 265 in 232 - Log.debug (fun m -> m "Computed digest response for user %s" username); 233 - response 266 + Log.debug (fun m -> m "Computed digest response for user %s (qop=%s)" 267 + username (Option.value ~default:"none" actual_qop)); 268 + (response, actual_qop) 234 269 235 - (** Build the Authorization header value for Digest auth *) 236 - let build_digest_header ~username ~uri ~challenge ~nc ~cnonce ~response = 270 + (** Build the Authorization header value for Digest auth. 271 + @param actual_qop The qop that was actually used (auth or auth-int) *) 272 + let build_digest_header ~username ~uri ~challenge ~nc ~cnonce ~response ~actual_qop = 273 + (* RFC 7616 Section 3.4.4: userhash support *) 274 + let username_value, userhash_param = 275 + if challenge.userhash then 276 + let hashed = hash_string ~algorithm:challenge.algorithm 277 + (Printf.sprintf "%s:%s" username challenge.realm) in 278 + (hashed, Some "userhash=true") 279 + else 280 + (username, None) 281 + in 237 282 let parts = [ 238 - Printf.sprintf "username=\"%s\"" username; 283 + Printf.sprintf "username=\"%s\"" username_value; 239 284 Printf.sprintf "realm=\"%s\"" challenge.realm; 240 285 Printf.sprintf "nonce=\"%s\"" challenge.nonce; 241 286 Printf.sprintf "uri=\"%s\"" uri; 242 287 Printf.sprintf "algorithm=%s" challenge.algorithm; 243 288 Printf.sprintf "response=\"%s\"" response; 244 289 ] in 245 - let parts = match challenge.qop with 246 - | Some _ -> parts @ [ 247 - "qop=auth"; 290 + let parts = match userhash_param with 291 + | Some p -> parts @ [p] 292 + | None -> parts 293 + in 294 + let parts = match actual_qop with 295 + | Some qop -> parts @ [ 296 + Printf.sprintf "qop=%s" qop; 248 297 Printf.sprintf "nc=%s" nc; 249 298 Printf.sprintf "cnonce=\"%s\"" cnonce; 250 299 ] ··· 284 333 (** Apply Digest authentication given a challenge. 285 334 @param nonce_counter Optional nonce counter for replay protection. 286 335 If provided, the nonce count is tracked and incremented per-nonce. 287 - If not provided, defaults to "00000001" (single-request mode). *) 288 - let apply_digest ?nonce_counter ~username ~password ~method_ ~uri ~challenge headers = 336 + If not provided, defaults to "00000001" (single-request mode). 337 + @param body Optional request body for auth-int qop support. *) 338 + let apply_digest ?nonce_counter ?body ~username ~password ~method_ ~uri ~challenge headers = 289 339 let nc = match nonce_counter with 290 340 | Some counter -> Nonce_counter.next counter ~nonce:challenge.nonce 291 341 | None -> "00000001" 292 342 in 293 343 let cnonce = generate_cnonce () in 294 - let response = compute_digest_response 295 - ~username ~password ~method_ ~uri ~challenge ~nc ~cnonce in 344 + let response, actual_qop = compute_digest_response 345 + ~username ~password ~method_ ~uri ~challenge ~nc ~cnonce ?body () in 296 346 let auth_header = build_digest_header 297 - ~username ~uri ~challenge ~nc ~cnonce ~response in 298 - Log.debug (fun m -> m "Applied Digest authentication for user %s (nc=%s)" username nc); 347 + ~username ~uri ~challenge ~nc ~cnonce ~response ~actual_qop in 348 + Log.debug (fun m -> m "Applied Digest authentication for user %s (nc=%s qop=%s)" 349 + username nc (Option.value ~default:"none" actual_qop)); 299 350 Headers.set "Authorization" auth_header headers 300 351 301 352 (** Check if auth type is Digest *) ··· 306 357 (** Get Digest credentials if configured *) 307 358 let get_digest_credentials = function 308 359 | Digest { username; password } -> Some (username, password) 309 - | _ -> None 360 + | _ -> None 361 + 362 + (** {1 Bearer Form Authentication} 363 + 364 + Per RFC 6750 Section 2.2: Bearer token can be sent as a form-encoded 365 + body parameter "access_token". This is less preferred than the 366 + Authorization header but may be required by some APIs. *) 367 + 368 + let bearer_form ~token = Bearer_form { token } 369 + 370 + let is_bearer_form = function 371 + | Bearer_form _ -> true 372 + | _ -> false 373 + 374 + let get_bearer_form_body = function 375 + | Bearer_form { token } -> Some (Printf.sprintf "access_token=%s" token) 376 + | _ -> None 377 + 378 + (** Check if stale=true in digest challenge, indicating password is still valid. 379 + Per RFC 7616: If stale=true, the client should retry with same credentials 380 + using the new nonce. If stale=false or not present, credentials are wrong. *) 381 + let digest_is_stale challenge = challenge.stale
+42 -4
lib/auth.mli
··· 40 40 a 401 response with a WWW-Authenticate: Digest header, the library will 41 41 parse the challenge and retry the request with proper digest credentials. 42 42 43 - Supports MD5, SHA-256, and SHA-512 algorithms as well as qop=auth. 43 + Supports: 44 + - Algorithms: MD5, SHA-256, SHA-512 (not SHA-512-256) 45 + - QoP: auth, auth-int (body hashing) 46 + - userhash parameter (username hashing) 47 + 44 48 Note: SHA-512-256 is not supported as it requires special initialization 45 49 vectors not available in standard libraries. *) 46 50 51 + val bearer_form : token:string -> t 52 + (** Bearer token in form-encoded body (RFC 6750 Section 2.2). 53 + 54 + This sends the Bearer token as an "access_token" form parameter 55 + instead of in the Authorization header. Less preferred than the 56 + header method but required by some APIs. 57 + 58 + When using this, set Content-Type to application/x-www-form-urlencoded 59 + and use {!get_bearer_form_body} to get the body content. *) 60 + 47 61 val custom : (Headers.t -> Headers.t) -> t 48 62 (** Custom authentication handler *) 49 63 ··· 82 96 algorithm : string; (** MD5, SHA-256, etc. *) 83 97 opaque : string option; 84 98 stale : bool; 99 + (** If true, the nonce is stale but credentials are valid. Client should 100 + retry with the new nonce. Per RFC 7616 Section 3.2.2. *) 101 + userhash : bool; 102 + (** If true, the server wants the username to be hashed. 103 + Per RFC 7616 Section 3.4.4. *) 85 104 } 86 105 87 106 val parse_www_authenticate : string -> digest_challenge option ··· 111 130 112 131 val apply_digest : 113 132 ?nonce_counter:Nonce_counter.t -> 133 + ?body:string -> 114 134 username:string -> 115 135 password:string -> 116 136 method_:string -> ··· 118 138 challenge:digest_challenge -> 119 139 Headers.t -> 120 140 Headers.t 121 - (** [apply_digest ?nonce_counter ~username ~password ~method_ ~uri ~challenge headers] 141 + (** [apply_digest ?nonce_counter ?body ~username ~password ~method_ ~uri ~challenge headers] 122 142 applies Digest authentication to [headers] using the given credentials 123 143 and server challenge. 124 144 125 145 @param nonce_counter Optional nonce counter for replay protection. 126 146 When provided, the nonce count is tracked and incremented per-nonce 127 147 across multiple requests in a session. When not provided, defaults 128 - to "00000001" (suitable for single-request/one-shot mode). *) 148 + to "00000001" (suitable for single-request/one-shot mode). 149 + @param body Optional request body for auth-int qop support. 150 + When provided and the server supports auth-int qop, the body hash 151 + is included in the digest calculation per RFC 7616. *) 129 152 130 153 val is_digest : t -> bool 131 154 (** [is_digest auth] returns [true] if [auth] is Digest authentication. *) 132 155 133 156 val get_digest_credentials : t -> (string * string) option 134 157 (** [get_digest_credentials auth] returns [Some (username, password)] if 135 - [auth] is Digest authentication, [None] otherwise. *) 158 + [auth] is Digest authentication, [None] otherwise. *) 159 + 160 + val is_bearer_form : t -> bool 161 + (** [is_bearer_form auth] returns [true] if [auth] is Bearer form authentication. *) 162 + 163 + val get_bearer_form_body : t -> string option 164 + (** [get_bearer_form_body auth] returns [Some "access_token=<token>"] if 165 + [auth] is Bearer form authentication, [None] otherwise. 166 + Use this to get the form-encoded body content for RFC 6750 Section 2.2. *) 167 + 168 + val digest_is_stale : digest_challenge -> bool 169 + (** [digest_is_stale challenge] returns [true] if the challenge has stale=true. 170 + Per RFC 7616 Section 3.2.2: If stale=true, the nonce is expired but the 171 + credentials are still valid. The client should retry with the same 172 + credentials using the new nonce. If stale=false or not present, the 173 + credentials themselves are wrong. *)
+290
lib/cache.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved. 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (** HTTP Response Caching per RFC 9111 7 + 8 + This module provides an in-memory cache for HTTP responses following 9 + RFC 9111 (HTTP Caching). *) 10 + 11 + let src = Logs.Src.create "requests.cache" ~doc:"HTTP Response Caching" 12 + module Log = (val Logs.src_log src : Logs.LOG) 13 + 14 + (** {1 Cache Entry} *) 15 + 16 + type entry = { 17 + url : string; 18 + method_ : Method.t; 19 + status : int; 20 + headers : Headers.t; 21 + body : string; 22 + request_time : Ptime.t; 23 + response_time : Ptime.t; 24 + date_value : Ptime.t option; 25 + age_value : int; 26 + cache_control : Cache_control.response; 27 + etag : string option; 28 + last_modified : string option; 29 + vary_headers : (string * string) list; 30 + freshness_lifetime : int option; 31 + } 32 + 33 + type lookup_status = 34 + | Fresh 35 + | Stale 36 + 37 + (** {1 Cache Key} *) 38 + 39 + type key = { 40 + method_key : Method.t; 41 + uri : string; 42 + vary_values : (string * string) list; 43 + } 44 + 45 + let make_key ~method_ ~uri ?request_headers ?vary () = 46 + let vary_values = match vary, request_headers with 47 + | Some vary_names, Some headers -> 48 + List.filter_map (fun name -> 49 + match Headers.get name headers with 50 + | Some value -> Some (String.lowercase_ascii name, value) 51 + | None -> None 52 + ) vary_names 53 + | _ -> [] 54 + in 55 + { method_key = method_; uri; vary_values } 56 + 57 + (** {1 Helper Functions} *) 58 + 59 + let parse_vary header = 60 + String.split_on_char ',' header 61 + |> List.map String.trim 62 + |> List.filter (fun s -> s <> "") 63 + |> List.map String.lowercase_ascii 64 + 65 + let vary_matches ~cached_vary ~request_headers = 66 + List.for_all (fun (name, cached_value) -> 67 + match Headers.get name request_headers with 68 + | Some req_value -> req_value = cached_value 69 + | None -> cached_value = "" 70 + ) cached_vary 71 + 72 + (** Parse Age header value *) 73 + let parse_age headers = 74 + match Headers.get "age" headers with 75 + | Some age_str -> 76 + (try int_of_string age_str with _ -> 0) 77 + | None -> 0 78 + 79 + (** Calculate freshness lifetime for a response *) 80 + let calculate_freshness ~cache_control ~headers ~response_time = 81 + (* First try explicit freshness from Cache-Control or Expires *) 82 + match Cache_control.freshness_lifetime 83 + ~response_cc:cache_control 84 + ?expires:(Headers.get "expires" headers) 85 + ?date:(Headers.get "date" headers) 86 + () with 87 + | Some lifetime -> Some lifetime 88 + | None -> 89 + (* Fall back to heuristic freshness *) 90 + Cache_control.heuristic_freshness 91 + ?last_modified:(Headers.get "last-modified" headers) 92 + ~response_time 93 + () 94 + 95 + (** {1 In-Memory Cache} *) 96 + 97 + module Memory = struct 98 + type stats = { 99 + mutable hits : int; 100 + mutable misses : int; 101 + mutable stores : int; 102 + } 103 + 104 + type t = { 105 + entries : (string, entry list) Hashtbl.t; 106 + max_entries : int; 107 + mutable total_entries : int; 108 + stats : stats; 109 + mutex : Eio.Mutex.t; 110 + } 111 + 112 + let create ?(max_entries = 10000) () = { 113 + entries = Hashtbl.create 1024; 114 + max_entries; 115 + total_entries = 0; 116 + stats = { hits = 0; misses = 0; stores = 0 }; 117 + mutex = Eio.Mutex.create (); 118 + } 119 + 120 + (** Check if a method is cacheable *) 121 + let is_cacheable_method = function 122 + | `GET | `HEAD -> true 123 + | _ -> false 124 + 125 + (** Evict oldest entries if over limit *) 126 + let evict_if_needed t = 127 + if t.total_entries > t.max_entries then begin 128 + (* Simple eviction: remove ~10% of entries *) 129 + let to_remove = t.max_entries / 10 in 130 + let removed = ref 0 in 131 + Hashtbl.filter_map_inplace (fun _uri entries -> 132 + if !removed >= to_remove then Some entries 133 + else begin 134 + let len = List.length entries in 135 + removed := !removed + len; 136 + t.total_entries <- t.total_entries - len; 137 + None 138 + end 139 + ) t.entries 140 + end 141 + 142 + let store t ~url ~method_ ~status ~headers ~body ~request_time ~response_time 143 + ?request_headers () = 144 + (* Check if cacheable *) 145 + if not (is_cacheable_method method_) then begin 146 + Log.debug (fun m -> m "Not caching: method %s is not cacheable" 147 + (Method.to_string method_)); 148 + false 149 + end else begin 150 + let cache_control = 151 + match Headers.get "cache-control" headers with 152 + | Some cc -> Cache_control.parse_response cc 153 + | None -> Cache_control.empty_response 154 + in 155 + if not (Cache_control.is_cacheable ~response_cc:cache_control ~status) then begin 156 + Log.debug (fun m -> m "Not caching: response is not cacheable"); 157 + false 158 + end else begin 159 + let date_value = 160 + match Headers.get "date" headers with 161 + | Some date_str -> Http_date.parse date_str 162 + | None -> None 163 + in 164 + let age_value = parse_age headers in 165 + let etag = Headers.get "etag" headers in 166 + let last_modified = Headers.get "last-modified" headers in 167 + let vary_headers = 168 + match Headers.get "vary" headers, request_headers with 169 + | Some vary, Some req_hdrs -> 170 + let vary_names = parse_vary vary in 171 + List.filter_map (fun name -> 172 + match Headers.get name req_hdrs with 173 + | Some value -> Some (name, value) 174 + | None -> None 175 + ) vary_names 176 + | _ -> [] 177 + in 178 + let freshness_lifetime = 179 + calculate_freshness ~cache_control ~headers ~response_time 180 + in 181 + let entry = { 182 + url; method_; status; headers; body; 183 + request_time; response_time; date_value; age_value; 184 + cache_control; etag; last_modified; vary_headers; 185 + freshness_lifetime; 186 + } in 187 + Eio.Mutex.use_rw ~protect:true t.mutex (fun () -> 188 + (* Remove any existing entries that match *) 189 + let existing = Hashtbl.find_opt t.entries url |> Option.value ~default:[] in 190 + let filtered = List.filter (fun e -> 191 + e.method_ <> method_ || 192 + not (vary_matches ~cached_vary:e.vary_headers 193 + ~request_headers:(Option.value ~default:Headers.empty request_headers)) 194 + ) existing in 195 + Hashtbl.replace t.entries url (entry :: filtered); 196 + t.total_entries <- t.total_entries + 1; 197 + t.stats.stores <- t.stats.stores + 1; 198 + evict_if_needed t 199 + ); 200 + Log.debug (fun m -> m "Cached response for %s (freshness: %s)" 201 + url 202 + (match freshness_lifetime with 203 + | Some s -> Printf.sprintf "%ds" s 204 + | None -> "unknown")); 205 + true 206 + end 207 + end 208 + 209 + let lookup t ~method_ ~uri ?request_headers ~now () = 210 + Eio.Mutex.use_rw ~protect:true t.mutex (fun () -> 211 + match Hashtbl.find_opt t.entries uri with 212 + | None -> 213 + t.stats.misses <- t.stats.misses + 1; 214 + None 215 + | Some entries -> 216 + (* Find matching entry *) 217 + let request_headers = Option.value ~default:Headers.empty request_headers in 218 + let matching = List.find_opt (fun e -> 219 + e.method_ = method_ && 220 + vary_matches ~cached_vary:e.vary_headers ~request_headers 221 + ) entries in 222 + match matching with 223 + | None -> 224 + t.stats.misses <- t.stats.misses + 1; 225 + None 226 + | Some entry -> 227 + t.stats.hits <- t.stats.hits + 1; 228 + (* Calculate current age and freshness *) 229 + let inputs : Cache_control.age_inputs = { 230 + date_value = entry.date_value; 231 + age_value = entry.age_value; 232 + request_time = entry.request_time; 233 + response_time = entry.response_time; 234 + } in 235 + let current_age = Cache_control.calculate_age ~inputs ~now in 236 + let status = match entry.freshness_lifetime with 237 + | Some lifetime when Cache_control.is_fresh ~current_age ~freshness_lifetime:lifetime -> 238 + Fresh 239 + | _ -> Stale 240 + in 241 + Log.debug (fun m -> m "Cache %s for %s (age: %ds)" 242 + (match status with Fresh -> "hit" | Stale -> "stale") 243 + uri current_age); 244 + Some (entry, status) 245 + ) 246 + 247 + let invalidate t ~uri = 248 + Eio.Mutex.use_rw ~protect:true t.mutex (fun () -> 249 + match Hashtbl.find_opt t.entries uri with 250 + | Some entries -> 251 + t.total_entries <- t.total_entries - List.length entries; 252 + Hashtbl.remove t.entries uri; 253 + Log.debug (fun m -> m "Invalidated cache for %s" uri) 254 + | None -> () 255 + ) 256 + 257 + let clear t = 258 + Eio.Mutex.use_rw ~protect:true t.mutex (fun () -> 259 + Hashtbl.clear t.entries; 260 + t.total_entries <- 0; 261 + Log.debug (fun m -> m "Cleared cache") 262 + ) 263 + 264 + let size t = 265 + Eio.Mutex.use_rw ~protect:true t.mutex (fun () -> t.total_entries) 266 + 267 + let stats t = 268 + Eio.Mutex.use_rw ~protect:true t.mutex (fun () -> 269 + (t.stats.hits, t.stats.misses, t.stats.stores) 270 + ) 271 + end 272 + 273 + (** {1 Cache Validation} *) 274 + 275 + let needs_validation entry = 276 + Cache_control.must_revalidate ~response_cc:entry.cache_control 277 + 278 + let validation_headers entry = 279 + let headers = Headers.empty in 280 + let headers = match entry.etag with 281 + | Some etag -> Headers.if_none_match etag headers 282 + | None -> headers 283 + in 284 + let headers = match entry.last_modified with 285 + | Some lm -> Headers.if_modified_since lm headers 286 + | None -> headers 287 + in 288 + headers 289 + 290 + let is_not_modified ~status = status = 304
+204
lib/cache.mli
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved. 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (** HTTP Response Caching per RFC 9111 7 + 8 + This module provides an in-memory cache for HTTP responses following 9 + RFC 9111 (HTTP Caching). It handles cache storage, validation, and 10 + freshness calculations. 11 + 12 + {2 Cache Keys} 13 + 14 + Cache entries are keyed by method and effective request URI per RFC 9111. 15 + Only safe methods (GET, HEAD) are cached by default. 16 + 17 + {2 Examples} 18 + 19 + {[ 20 + (* Create an in-memory cache *) 21 + let cache = Cache.Memory.create ~max_entries:1000 () in 22 + 23 + (* Store a response *) 24 + Cache.Memory.store cache ~key ~response ~request_time ~response_time; 25 + 26 + (* Lookup a cached response *) 27 + match Cache.Memory.lookup cache ~key ~now with 28 + | Some (entry, status) -> 29 + Printf.printf "Cache %s: %s\n" 30 + (match status with `Fresh -> "hit" | `Stale -> "stale") 31 + entry.url 32 + | None -> Printf.printf "Cache miss\n" 33 + ]} 34 + *) 35 + 36 + (** Log source for cache operations *) 37 + val src : Logs.Src.t 38 + 39 + (** {1 Cache Entry} *) 40 + 41 + (** A cached response entry *) 42 + type entry = { 43 + url : string; 44 + (** The effective request URI *) 45 + 46 + method_ : Method.t; 47 + (** HTTP method used for the request *) 48 + 49 + status : int; 50 + (** Response status code *) 51 + 52 + headers : Headers.t; 53 + (** Response headers *) 54 + 55 + body : string; 56 + (** Response body *) 57 + 58 + request_time : Ptime.t; 59 + (** When the request was initiated *) 60 + 61 + response_time : Ptime.t; 62 + (** When the response was received *) 63 + 64 + date_value : Ptime.t option; 65 + (** Parsed Date header value *) 66 + 67 + age_value : int; 68 + (** Age header value (0 if not present) *) 69 + 70 + cache_control : Cache_control.response; 71 + (** Parsed Cache-Control header *) 72 + 73 + etag : string option; 74 + (** ETag header for validation *) 75 + 76 + last_modified : string option; 77 + (** Last-Modified header for validation *) 78 + 79 + vary_headers : (string * string) list; 80 + (** Request header values for Vary matching *) 81 + 82 + freshness_lifetime : int option; 83 + (** Calculated freshness lifetime in seconds *) 84 + } 85 + 86 + (** Cache lookup result status *) 87 + type lookup_status = 88 + | Fresh 89 + (** Entry is fresh and can be served directly *) 90 + | Stale 91 + (** Entry is stale but might be served with revalidation or max-stale *) 92 + 93 + (** {1 Cache Key} *) 94 + 95 + (** A cache key for lookups *) 96 + type key = { 97 + method_key : Method.t; 98 + uri : string; 99 + vary_values : (string * string) list; 100 + (** Values of Vary headers from request *) 101 + } 102 + 103 + val make_key : 104 + method_:Method.t -> 105 + uri:string -> 106 + ?request_headers:Headers.t -> 107 + ?vary:string list -> 108 + unit -> 109 + key 110 + (** Create a cache key. 111 + @param method_ The HTTP method 112 + @param uri The effective request URI 113 + @param request_headers Request headers for Vary matching 114 + @param vary List of header names from Vary response header *) 115 + 116 + (** {1 In-Memory Cache} *) 117 + 118 + module Memory : sig 119 + (** In-memory HTTP response cache using a Hashtbl. 120 + Thread-safe using Eio.Mutex. *) 121 + 122 + type t 123 + (** The cache type *) 124 + 125 + val create : ?max_entries:int -> unit -> t 126 + (** Create a new in-memory cache. 127 + @param max_entries Maximum number of entries (default 10000) *) 128 + 129 + val store : 130 + t -> 131 + url:string -> 132 + method_:Method.t -> 133 + status:int -> 134 + headers:Headers.t -> 135 + body:string -> 136 + request_time:Ptime.t -> 137 + response_time:Ptime.t -> 138 + ?request_headers:Headers.t -> 139 + unit -> 140 + bool 141 + (** Store a response in the cache. 142 + Returns true if stored, false if not cacheable. 143 + 144 + @param url The effective request URI 145 + @param method_ The HTTP method 146 + @param status Response status code 147 + @param headers Response headers 148 + @param body Response body 149 + @param request_time When the request was initiated 150 + @param response_time When the response was received 151 + @param request_headers Request headers for Vary matching *) 152 + 153 + val lookup : 154 + t -> 155 + method_:Method.t -> 156 + uri:string -> 157 + ?request_headers:Headers.t -> 158 + now:Ptime.t -> 159 + unit -> 160 + (entry * lookup_status) option 161 + (** Look up a cached response. 162 + 163 + @param method_ The HTTP method 164 + @param uri The effective request URI 165 + @param request_headers Request headers for Vary matching 166 + @param now Current time for freshness check 167 + @return Some (entry, status) if found, None if not in cache *) 168 + 169 + val invalidate : t -> uri:string -> unit 170 + (** Remove all entries for a URI (used after unsafe methods). *) 171 + 172 + val clear : t -> unit 173 + (** Clear all entries from the cache. *) 174 + 175 + val size : t -> int 176 + (** Return the number of entries in the cache. *) 177 + 178 + val stats : t -> int * int * int 179 + (** Return cache statistics: (hits, misses, stores). *) 180 + end 181 + 182 + (** {1 Cache Validation} *) 183 + 184 + val needs_validation : entry -> bool 185 + (** Check if a cached entry needs revalidation with the origin server. 186 + True if must-revalidate or no-cache is set. *) 187 + 188 + val validation_headers : entry -> Headers.t 189 + (** Get headers to send for a conditional request. 190 + Includes If-None-Match (from ETag) and/or If-Modified-Since (from Last-Modified). *) 191 + 192 + val is_not_modified : status:int -> bool 193 + (** Check if a response indicates the cached version is still valid (304). *) 194 + 195 + (** {1 Vary Header Support} *) 196 + 197 + val parse_vary : string -> string list 198 + (** Parse a Vary header value into a list of header names. *) 199 + 200 + val vary_matches : 201 + cached_vary:(string * string) list -> 202 + request_headers:Headers.t -> 203 + bool 204 + (** Check if request headers match the cached Vary values. *)
+125
lib/cache_control.ml
··· 296 296 Ptime.Span.to_int_s diff 297 297 | _ -> None 298 298 299 + (** {1 Age Calculation} 300 + 301 + RFC 9111 Section 4.2.3: Calculating Age *) 302 + 303 + (** Age calculation inputs *) 304 + type age_inputs = { 305 + date_value : Ptime.t option; 306 + (** Value of Date header (when response was generated) *) 307 + 308 + age_value : int; 309 + (** Value of Age header in seconds (0 if not present) *) 310 + 311 + request_time : Ptime.t; 312 + (** Time when the request was initiated *) 313 + 314 + response_time : Ptime.t; 315 + (** Time when the response was received *) 316 + } 317 + 318 + (** Calculate the current age of a cached response. 319 + Per RFC 9111 Section 4.2.3: 320 + 321 + {v 322 + apparent_age = max(0, response_time - date_value) 323 + response_delay = response_time - request_time 324 + corrected_age_value = age_value + response_delay 325 + corrected_initial_age = max(apparent_age, corrected_age_value) 326 + resident_time = now - response_time 327 + current_age = corrected_initial_age + resident_time 328 + v} 329 + 330 + @param inputs Age calculation inputs 331 + @param now Current time 332 + @return Current age in seconds *) 333 + let calculate_age ~inputs ~now = 334 + (* apparent_age = max(0, response_time - date_value) *) 335 + let apparent_age = 336 + match inputs.date_value with 337 + | Some date -> 338 + let diff = Ptime.diff inputs.response_time date in 339 + max 0 (Option.value ~default:0 (Ptime.Span.to_int_s diff)) 340 + | None -> 0 341 + in 342 + (* response_delay = response_time - request_time *) 343 + let response_delay = 344 + let diff = Ptime.diff inputs.response_time inputs.request_time in 345 + max 0 (Option.value ~default:0 (Ptime.Span.to_int_s diff)) 346 + in 347 + (* corrected_age_value = age_value + response_delay *) 348 + let corrected_age_value = inputs.age_value + response_delay in 349 + (* corrected_initial_age = max(apparent_age, corrected_age_value) *) 350 + let corrected_initial_age = max apparent_age corrected_age_value in 351 + (* resident_time = now - response_time *) 352 + let resident_time = 353 + let diff = Ptime.diff now inputs.response_time in 354 + max 0 (Option.value ~default:0 (Ptime.Span.to_int_s diff)) 355 + in 356 + (* current_age = corrected_initial_age + resident_time *) 357 + corrected_initial_age + resident_time 358 + 359 + (** {1 Heuristic Freshness} 360 + 361 + RFC 9111 Section 4.2.2: Calculating Heuristic Freshness *) 362 + 363 + (** Default heuristic fraction: 10% of time since Last-Modified. 364 + RFC 9111 recommends this as a typical value. *) 365 + let default_heuristic_fraction = 0.10 366 + 367 + (** Maximum heuristic freshness lifetime: 1 day (86400 seconds). 368 + This prevents excessively long heuristic caching. *) 369 + let default_max_heuristic_age = 86400 370 + 371 + (** Calculate heuristic freshness lifetime when no explicit caching info provided. 372 + Per RFC 9111 Section 4.2.2, caches MAY use heuristics when explicit freshness 373 + is not available. 374 + 375 + @param last_modified Value of Last-Modified header 376 + @param response_time When the response was received 377 + @param fraction Fraction of (now - last_modified) to use (default 10%) 378 + @param max_age Maximum heuristic age in seconds (default 1 day) 379 + @return Heuristic freshness lifetime in seconds, or None *) 380 + let heuristic_freshness 381 + ?last_modified 382 + ~response_time 383 + ?(fraction = default_heuristic_fraction) 384 + ?(max_age = default_max_heuristic_age) 385 + () = 386 + match last_modified with 387 + | Some lm_str -> 388 + (match Http_date.parse lm_str with 389 + | Some lm_time -> 390 + let age_since_modified = 391 + let diff = Ptime.diff response_time lm_time in 392 + max 0 (Option.value ~default:0 (Ptime.Span.to_int_s diff)) 393 + in 394 + let heuristic = int_of_float (float_of_int age_since_modified *. fraction) in 395 + Some (min heuristic max_age) 396 + | None -> 397 + Log.debug (fun m -> m "Failed to parse Last-Modified: %s" lm_str); 398 + None) 399 + | None -> None 400 + 401 + (** Check if a cached response is fresh. 402 + 403 + @param current_age Current age from calculate_age 404 + @param freshness_lifetime From freshness_lifetime or heuristic_freshness 405 + @return true if the response is still fresh *) 406 + let is_fresh ~current_age ~freshness_lifetime = 407 + current_age < freshness_lifetime 408 + 409 + (** Check if a stale response can still be served based on request directives. 410 + 411 + @param request_cc Parsed request Cache-Control 412 + @param current_age Current age of the cached response 413 + @param freshness_lifetime Freshness lifetime of the cached response 414 + @return true if the stale response can be served *) 415 + let can_serve_stale ~request_cc ~current_age ~freshness_lifetime = 416 + let staleness = current_age - freshness_lifetime in 417 + if staleness <= 0 then true (* Not stale *) 418 + else 419 + match request_cc.req_max_stale with 420 + | Some None -> true (* max-stale without value: accept any staleness *) 421 + | Some (Some allowed_stale) -> staleness <= allowed_stale 422 + | None -> false (* No max-stale: don't serve stale *) 423 + 299 424 (** Check if a response is cacheable based on Cache-Control directives *) 300 425 let is_cacheable ~response_cc ~status = 301 426 (* RFC 9111 Section 3: A response is cacheable if:
+77
lib/cache_control.mli
··· 179 179 @param expires Optional Expires header value (HTTP-date format) 180 180 @param date Optional Date header value (HTTP-date format) *) 181 181 182 + (** {1 Age Calculation} 183 + 184 + Per RFC 9111 Section 4.2.3: Calculating Age. *) 185 + 186 + type age_inputs = { 187 + date_value : Ptime.t option; 188 + (** Value of Date header (when response was generated) *) 189 + 190 + age_value : int; 191 + (** Value of Age header in seconds (0 if not present) *) 192 + 193 + request_time : Ptime.t; 194 + (** Time when the request was initiated *) 195 + 196 + response_time : Ptime.t; 197 + (** Time when the response was received *) 198 + } 199 + (** Inputs required for age calculation per RFC 9111 Section 4.2.3. *) 200 + 201 + val calculate_age : inputs:age_inputs -> now:Ptime.t -> int 202 + (** [calculate_age ~inputs ~now] calculates the current age of a cached response. 203 + 204 + Per RFC 9111 Section 4.2.3: 205 + {v 206 + apparent_age = max(0, response_time - date_value) 207 + response_delay = response_time - request_time 208 + corrected_age_value = age_value + response_delay 209 + corrected_initial_age = max(apparent_age, corrected_age_value) 210 + resident_time = now - response_time 211 + current_age = corrected_initial_age + resident_time 212 + v} 213 + 214 + @return Current age in seconds *) 215 + 216 + (** {1 Heuristic Freshness} 217 + 218 + Per RFC 9111 Section 4.2.2: Calculating Heuristic Freshness. *) 219 + 220 + val default_heuristic_fraction : float 221 + (** Default heuristic fraction: 10% of time since Last-Modified. 222 + RFC 9111 recommends this as a typical value. *) 223 + 224 + val default_max_heuristic_age : int 225 + (** Maximum heuristic freshness lifetime: 1 day (86400 seconds). *) 226 + 227 + val heuristic_freshness : 228 + ?last_modified:string -> 229 + response_time:Ptime.t -> 230 + ?fraction:float -> 231 + ?max_age:int -> 232 + unit -> 233 + int option 234 + (** [heuristic_freshness ?last_modified ~response_time ?fraction ?max_age ()] 235 + calculates heuristic freshness lifetime when no explicit caching info provided. 236 + 237 + Per RFC 9111 Section 4.2.2, caches MAY use heuristics when explicit freshness 238 + is not available. The typical heuristic is 10% of time since Last-Modified. 239 + 240 + @param last_modified Value of Last-Modified header 241 + @param response_time When the response was received 242 + @param fraction Fraction of (now - last_modified) to use (default 10%) 243 + @param max_age Maximum heuristic age in seconds (default 1 day) 244 + @return Heuristic freshness lifetime in seconds, or None *) 245 + 246 + val is_fresh : current_age:int -> freshness_lifetime:int -> bool 247 + (** [is_fresh ~current_age ~freshness_lifetime] returns true if a cached 248 + response is still fresh (current_age < freshness_lifetime). *) 249 + 250 + val can_serve_stale : 251 + request_cc:request -> 252 + current_age:int -> 253 + freshness_lifetime:int -> 254 + bool 255 + (** [can_serve_stale ~request_cc ~current_age ~freshness_lifetime] returns true 256 + if a stale response can still be served based on request Cache-Control 257 + directives (specifically max-stale). *) 258 + 182 259 (** {1 Cacheability Checks} *) 183 260 184 261 val is_cacheable : response_cc:response -> status:int -> bool
+54
lib/headers.ml
··· 266 266 (* Additional helper for getting multiple header values *) 267 267 let get_multi key t = get_all key t 268 268 269 + (** {1 Connection Header Handling} 270 + 271 + Per RFC 9110 Section 7.6.1: The Connection header field lists hop-by-hop 272 + header fields that MUST be removed before forwarding the message. *) 273 + 274 + (** Default hop-by-hop headers that should always be removed for forwarding. 275 + Per RFC 9110 Section 7.6.1. *) 276 + let default_hop_by_hop_headers = [ 277 + "connection"; "keep-alive"; "proxy-authenticate"; "proxy-authorization"; 278 + "te"; "trailer"; "transfer-encoding"; "upgrade" 279 + ] 280 + 281 + (** Parse Connection header value into list of header names. 282 + The Connection header lists additional hop-by-hop headers. *) 283 + let parse_connection_header = function 284 + | None -> [] 285 + | Some value -> 286 + String.split_on_char ',' value 287 + |> List.map (fun s -> String.trim (String.lowercase_ascii s)) 288 + |> List.filter (fun s -> s <> "") 289 + 290 + (** Get all hop-by-hop headers from a response. 291 + Returns the union of default hop-by-hop headers and any headers 292 + listed in the Connection header. *) 293 + let get_hop_by_hop_headers t = 294 + let connection_headers = parse_connection_header (get "connection" t) in 295 + default_hop_by_hop_headers @ connection_headers 296 + |> List.sort_uniq String.compare 297 + 298 + (** Remove hop-by-hop headers from a header collection. 299 + This should be called before caching or forwarding a response. 300 + Per RFC 9110 Section 7.6.1. *) 301 + let remove_hop_by_hop t = 302 + let hop_by_hop = get_hop_by_hop_headers t in 303 + List.fold_left (fun headers name -> remove name headers) t hop_by_hop 304 + 305 + (** Check if a response indicates the connection should be closed. 306 + Returns true if Connection: close is present. *) 307 + let connection_close t = 308 + match get "connection" t with 309 + | Some value -> 310 + String.split_on_char ',' value 311 + |> List.exists (fun s -> String.trim (String.lowercase_ascii s) = "close") 312 + | None -> false 313 + 314 + (** Check if a response indicates the connection should be kept alive. 315 + Returns true if Connection: keep-alive is present (HTTP/1.0 behavior). *) 316 + let connection_keep_alive t = 317 + match get "connection" t with 318 + | Some value -> 319 + String.split_on_char ',' value 320 + |> List.exists (fun s -> String.trim (String.lowercase_ascii s) = "keep-alive") 321 + | None -> false 322 + 269 323 (* Pretty printer for headers *) 270 324 let pp ppf t = 271 325 Format.fprintf ppf "@[<v>Headers:@,";
+33
lib/headers.mli
··· 232 232 val last_modified_ptime : Ptime.t -> t -> t 233 233 (** [last_modified_ptime time headers] sets Last-Modified using a Ptime.t value. *) 234 234 235 + (** {1 Connection Header Handling} 236 + 237 + Per {{:https://datatracker.ietf.org/doc/html/rfc9110#section-7.6.1}RFC 9110 Section 7.6.1}}: 238 + The Connection header field lists hop-by-hop header fields that MUST be 239 + removed before forwarding the message. *) 240 + 241 + val default_hop_by_hop_headers : string list 242 + (** Default hop-by-hop headers that should always be removed for forwarding. 243 + Includes: connection, keep-alive, proxy-authenticate, proxy-authorization, 244 + te, trailer, transfer-encoding, upgrade. *) 245 + 246 + val parse_connection_header : string option -> string list 247 + (** [parse_connection_header header_value] parses a Connection header value 248 + into a list of header names (all lowercase). *) 249 + 250 + val get_hop_by_hop_headers : t -> string list 251 + (** [get_hop_by_hop_headers headers] returns all hop-by-hop headers. 252 + This is the union of {!default_hop_by_hop_headers} and any headers 253 + listed in the Connection header. *) 254 + 255 + val remove_hop_by_hop : t -> t 256 + (** [remove_hop_by_hop headers] removes all hop-by-hop headers. 257 + This should be called before caching or forwarding a response. 258 + Per RFC 9110 Section 7.6.1. *) 259 + 260 + val connection_close : t -> bool 261 + (** [connection_close headers] returns [true] if Connection: close is present. 262 + This indicates the connection should be closed after the current message. *) 263 + 264 + val connection_keep_alive : t -> bool 265 + (** [connection_keep_alive headers] returns [true] if Connection: keep-alive is present. 266 + This is primarily used with HTTP/1.0 to request a persistent connection. *) 267 + 235 268 (** {1 Aliases} *) 236 269 237 270 val get_multi : string -> t -> string list
+159 -12
lib/http_read.ml
··· 45 45 | ' ' | '\t' -> true 46 46 | _ -> false 47 47 48 + (** {1 Security Validation} 49 + 50 + Per RFC 9112 Section 2.2: bare CR MUST be rejected to prevent 51 + HTTP request smuggling attacks. *) 52 + 53 + (** Maximum chunk size hex digits (16 hex digits = 64-bit max) *) 54 + let max_chunk_size_hex_digits = 16 55 + 56 + (** Validate that a string contains no bare CR characters. 57 + A bare CR is a CR not followed by LF, which can be used for 58 + HTTP request smuggling attacks. 59 + @raise Error.t if bare CR is found. *) 60 + let validate_no_bare_cr ~context s = 61 + let len = String.length s in 62 + for i = 0 to len - 1 do 63 + if s.[i] = '\r' then begin 64 + if i + 1 >= len || s.[i + 1] <> '\n' then 65 + raise (Error.err (Error.Invalid_request { 66 + reason = Printf.sprintf "Bare CR in %s (potential HTTP smuggling attack)" context 67 + })) 68 + end 69 + done 70 + 48 71 (** {1 Low-level Parsers} *) 49 72 50 73 let sp = Read.char ' ' ··· 95 118 sp r; 96 119 let code = status_code r in 97 120 sp r; 98 - let _reason = reason_phrase r in 121 + let reason = reason_phrase r in 122 + (* RFC 9112 Section 2.2: Validate no bare CR in reason phrase *) 123 + validate_no_bare_cr ~context:"reason phrase" reason; 99 124 Log.debug (fun m -> m "Parsed status line: %s %d" version_str code); 100 125 (version, code) 101 126 ··· 103 128 104 129 (** Parse a single header line. Returns ("", "") for empty line (end of headers). 105 130 Handles obs-fold (RFC 9112 Section 5.2): continuation lines starting with 106 - whitespace are merged into the previous header value with a single space. *) 131 + whitespace are merged into the previous header value with a single space. 132 + Per RFC 9112 Section 2.2: validates that no bare CR characters are present. *) 107 133 let header_line r = 108 134 let name = Read.take_while is_token_char r in 109 135 if name = "" then begin ··· 118 144 Read.char ':' r; 119 145 Read.skip_while is_ows r; 120 146 let value = Read.line r in 147 + (* RFC 9112 Section 2.2: Validate no bare CR in header value *) 148 + validate_no_bare_cr ~context:"header value" value; 121 149 (* RFC 9112 Section 5.2: Handle obs-fold (obsolete line folding) 122 150 A recipient of an obs-fold MUST replace each obs-fold with one or more 123 151 SP octets prior to interpreting the field value. *) ··· 128 156 Log.debug (fun m -> m "Handling obs-fold continuation for header %s" name); 129 157 Read.skip_while is_ows r; 130 158 let continuation = Read.line r in 159 + (* Validate continuation for bare CR *) 160 + validate_no_bare_cr ~context:"header continuation" continuation; 131 161 (* Replace obs-fold with single space and continue *) 132 162 collect_obs_fold (acc ^ " " ^ String.trim continuation) 133 163 | _ -> acc ··· 244 274 read_n length; 245 275 Buffer.contents buf 246 276 247 - (** Parse chunk size line (hex size with optional extensions) *) 277 + (** Parse chunk size line (hex size with optional extensions). 278 + Per RFC 9112 Section 7.1: protect against chunk size overflow attacks. *) 248 279 let chunk_size r = 249 280 let hex_str = Read.take_while is_hex_digit r in 250 281 if hex_str = "" then 251 282 raise (Error.err (Error.Invalid_request { 252 283 reason = "Empty chunk size" 253 284 })); 254 - (* Skip any chunk extensions (after semicolon) *) 255 - Read.skip_while (fun c -> c <> '\r' && c <> '\n') r; 285 + (* Protect against overflow: limit hex digits to prevent parsing huge numbers. 286 + 16 hex digits = 64-bit max, which is way more than any reasonable chunk. *) 287 + if String.length hex_str > max_chunk_size_hex_digits then 288 + raise (Error.err (Error.Invalid_request { 289 + reason = Printf.sprintf "Chunk size too large (%d hex digits, max %d)" 290 + (String.length hex_str) max_chunk_size_hex_digits 291 + })); 292 + (* Skip any chunk extensions (after semicolon) - validate for bare CR *) 293 + let extensions = Read.take_while (fun c -> c <> '\r' && c <> '\n') r in 294 + validate_no_bare_cr ~context:"chunk extension" extensions; 256 295 let _ = Read.line r in (* Consume CRLF *) 257 296 try int_of_string ("0x" ^ hex_str) 258 297 with _ -> ··· 260 299 reason = "Invalid chunk size: " ^ hex_str 261 300 })) 262 301 263 - (** Skip trailer headers after final chunk *) 302 + (** {1 Trailer Header Parsing} 303 + 304 + Per RFC 9112 Section 7.1.2: Trailer section can contain headers after the 305 + final chunk. Certain headers MUST NOT be in trailers (hop-by-hop, content-*, etc.). *) 306 + 307 + (** Headers that MUST NOT appear in trailers per RFC 9110 Section 6.5.1 *) 308 + let forbidden_trailer_headers = [ 309 + "transfer-encoding"; "content-length"; "host"; "content-encoding"; 310 + "content-type"; "content-range"; "trailer" 311 + ] 312 + 313 + (** Parse trailer headers after final chunk. 314 + Returns parsed headers. Forbidden trailer headers are logged and ignored. *) 315 + let parse_trailers ~limits r = 316 + let max_count = Response_limits.max_header_count limits in 317 + let max_size = Response_limits.max_header_size limits in 318 + let rec loop acc count = 319 + if count >= max_count then begin 320 + Log.warn (fun m -> m "Trailer count limit reached (%d), skipping remaining" max_count); 321 + Headers.of_list (List.rev acc) 322 + end else begin 323 + let line = Read.line r in 324 + if line = "" then 325 + (* End of trailers *) 326 + Headers.of_list (List.rev acc) 327 + else 328 + (* Parse trailer line *) 329 + match String.index_opt line ':' with 330 + | None -> 331 + Log.warn (fun m -> m "Invalid trailer line (no colon): %s" line); 332 + loop acc count 333 + | Some colon_idx -> 334 + let name = String.sub line 0 colon_idx |> String.trim |> String.lowercase_ascii in 335 + let value = String.sub line (colon_idx + 1) (String.length line - colon_idx - 1) |> String.trim in 336 + (* Check header size *) 337 + let line_len = String.length name + String.length value + 2 in 338 + if line_len > max_size then begin 339 + Log.warn (fun m -> m "Trailer header too large (%d > %d), skipping: %s" line_len max_size name); 340 + loop acc count 341 + end else if List.mem name forbidden_trailer_headers then begin 342 + Log.warn (fun m -> m "Forbidden header in trailers, ignoring: %s" name); 343 + loop acc count 344 + end else 345 + loop ((name, value) :: acc) (count + 1) 346 + end 347 + in 348 + loop [] 0 349 + 350 + (** Skip trailer headers after final chunk (legacy compatibility) *) 264 351 let skip_trailers r = 265 352 let rec loop () = 266 353 let line = Read.line r in ··· 360 447 let hex_str = Read.take_while is_hex_digit t.buf_read in 361 448 if hex_str = "" then 0 362 449 else begin 363 - (* Skip extensions and CRLF *) 364 - Read.skip_while (fun c -> c <> '\r' && c <> '\n') t.buf_read; 450 + (* Protect against overflow: limit hex digits *) 451 + if String.length hex_str > max_chunk_size_hex_digits then 452 + raise (Error.err (Error.Invalid_request { 453 + reason = Printf.sprintf "Chunk size too large (%d hex digits)" 454 + (String.length hex_str) 455 + })); 456 + (* Skip extensions and CRLF - validate for bare CR *) 457 + let extensions = Read.take_while (fun c -> c <> '\r' && c <> '\n') t.buf_read in 458 + validate_no_bare_cr ~context:"chunk extension" extensions; 365 459 let _ = Read.line t.buf_read in 366 460 try int_of_string ("0x" ^ hex_str) 367 461 with _ -> 0 ··· 503 597 | _, 204 | _, 304 -> true 504 598 | _ -> false 505 599 506 - (** Helper to normalize and check transfer-encoding *) 507 - let is_chunked_encoding = function 508 - | None -> false 509 - | Some te -> String.lowercase_ascii te |> String.trim = "chunked" 600 + (** {1 Transfer-Encoding Validation} 601 + 602 + Per RFC 9112 Section 6.1: Transfer-Encoding is a list of transfer codings. 603 + If "chunked" is present, it MUST be the final encoding. The encodings are 604 + applied in order, so we must reject unknown encodings that appear before chunked. *) 605 + 606 + (** Parse Transfer-Encoding header into list of codings. 607 + Returns list in order (first coding is outermost) *) 608 + let parse_transfer_encoding = function 609 + | None -> [] 610 + | Some te -> 611 + String.split_on_char ',' te 612 + |> List.map (fun s -> String.trim (String.lowercase_ascii s)) 613 + |> List.filter (fun s -> s <> "") 614 + 615 + (** Validate Transfer-Encoding per RFC 9112 Section 6.1. 616 + Returns [`Chunked] if chunked encoding should be used, [`None] if no body, 617 + or raises an error for invalid encodings. 618 + @raise Error.t if chunked is not final or unknown encodings precede chunked *) 619 + let validate_transfer_encoding encodings = 620 + match encodings with 621 + | [] -> `None 622 + | codings -> 623 + (* Find position of chunked if present *) 624 + let chunked_idx = 625 + List.mapi (fun i c -> (i, c)) codings 626 + |> List.find_map (fun (i, c) -> if c = "chunked" then Some i else None) 627 + in 628 + match chunked_idx with 629 + | None -> 630 + (* No chunked encoding - check if we support any of these *) 631 + Log.warn (fun m -> m "Transfer-Encoding without chunked: %s (not supported)" 632 + (String.concat ", " codings)); 633 + `Unsupported codings 634 + | Some idx -> 635 + (* Per RFC 9112 Section 6.1: chunked MUST be the final transfer coding *) 636 + if idx <> List.length codings - 1 then begin 637 + Log.err (fun m -> m "Transfer-Encoding: chunked is not final (RFC 9112 violation)"); 638 + raise (Error.err (Error.Invalid_request { 639 + reason = "Transfer-Encoding: chunked must be the final encoding" 640 + })) 641 + end; 642 + (* Check encodings before chunked - we only support identity *) 643 + let before_chunked = List.filteri (fun i _ -> i < idx) codings in 644 + List.iter (fun enc -> 645 + match enc with 646 + | "identity" -> () (* identity is a no-op *) 647 + | other -> 648 + Log.warn (fun m -> m "Unsupported encoding '%s' before chunked (treating as identity)" other) 649 + ) before_chunked; 650 + `Chunked 651 + 652 + (** Helper to check if transfer-encoding indicates chunked *) 653 + let is_chunked_encoding transfer_encoding = 654 + match validate_transfer_encoding (parse_transfer_encoding transfer_encoding) with 655 + | `Chunked -> true 656 + | `None | `Unsupported _ -> false 510 657 511 658 (** Safely parse Content-Length header, returning None for invalid values *) 512 659 let parse_content_length = function
+24
lib/http_read.mli
··· 73 73 Handles chunk sizes, extensions, and trailers. 74 74 @raise Error.Body_too_large if total body size exceeds limit. *) 75 75 76 + (** {1 Transfer-Encoding Validation} *) 77 + 78 + val parse_transfer_encoding : string option -> string list 79 + (** [parse_transfer_encoding header] parses Transfer-Encoding header value 80 + into a list of codings (all lowercase, in order). *) 81 + 82 + val validate_transfer_encoding : string list -> 83 + [ `Chunked | `None | `Unsupported of string list ] 84 + (** [validate_transfer_encoding codings] validates Transfer-Encoding per RFC 9112 Section 6.1. 85 + Returns [`Chunked] if chunked encoding should be used, [`None] if no body, 86 + or [`Unsupported codings] for unsupported encodings without chunked. 87 + @raise Error.t if chunked is not final encoding (RFC violation). *) 88 + 89 + (** {1 Trailer Header Parsing} *) 90 + 91 + val forbidden_trailer_headers : string list 92 + (** Headers that MUST NOT appear in trailers per RFC 9110 Section 6.5.1. 93 + Includes: transfer-encoding, content-length, host, content-encoding, 94 + content-type, content-range, trailer. *) 95 + 96 + val parse_trailers : limits:limits -> Eio.Buf_read.t -> Headers.t 97 + (** [parse_trailers ~limits r] parses trailer headers after final chunk. 98 + Forbidden headers are logged and ignored. *) 99 + 76 100 (** {1 Streaming Body Sources} *) 77 101 78 102 val fixed_body_stream : limits:limits -> length:int64 ->
+1432
lib/uri.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2012-2014 Anil Madhavapeddy <anil@recoil.org> 3 + Copyright (c) 2012-2014 David Sheets <sheets@alum.mit.edu> 4 + Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org> 5 + 6 + Permission to use, copy, modify, and distribute this software for any 7 + purpose with or without fee is hereby granted, provided that the above 8 + copyright notice and this permission notice appear in all copies. 9 + 10 + THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES 11 + WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF 12 + MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR 13 + ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES 14 + WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN 15 + ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF 16 + OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 17 + ---------------------------------------------------------------------------*) 18 + 19 + (** URI parsing and manipulation using Eio.Buf_read combinators. 20 + 21 + This implementation is based on the ocaml-uri library but uses 22 + Eio.Buf_read for parsing instead of Angstrom, providing consistency 23 + with the HTTP parsing stack. *) 24 + 25 + [@@@ocaml.warning "-32"] 26 + 27 + type component = [ 28 + | `Scheme 29 + | `Authority 30 + | `Userinfo 31 + | `Host 32 + | `Path 33 + | `Query 34 + | `Query_key 35 + | `Query_value 36 + | `Fragment 37 + | `Generic 38 + | `Custom of (component * string * string) 39 + ] 40 + 41 + type pct_encoder = { 42 + scheme: component; 43 + userinfo: component; 44 + host: component; 45 + path: component; 46 + query_key: component; 47 + query_value: component; 48 + fragment: component; 49 + } 50 + 51 + (** {1 Helper Functions} *) 52 + 53 + let rec iter_concat fn sep buf = function 54 + | last::[] -> fn buf last 55 + | el::rest -> 56 + fn buf el; 57 + Buffer.add_string buf sep; 58 + iter_concat fn sep buf rest 59 + | [] -> () 60 + 61 + let rev_interject e lst = 62 + let rec aux acc = function 63 + | [] -> acc 64 + | x::xs -> aux (x::e::acc) xs 65 + in match lst with 66 + | [] -> [] 67 + | h::t -> aux [h] t 68 + 69 + let compare_opt c t t' = match t, t' with 70 + | None, None -> 0 71 + | Some _, None -> 1 72 + | None, Some _ -> -1 73 + | Some a, Some b -> c a b 74 + 75 + let rec compare_list f t t' = match t, t' with 76 + | [], [] -> 0 77 + | _::_, [] -> 1 78 + | [], _::_ -> -1 79 + | x::xs, y::ys -> 80 + match f x y with 0 -> compare_list f xs ys | c -> c 81 + 82 + (** {1 Safe Characters} *) 83 + 84 + type safe_chars = bool array 85 + 86 + module type Scheme = sig 87 + val safe_chars_for_component : component -> safe_chars 88 + val normalize_host : string -> string 89 + val canonicalize_port : int option -> int option 90 + val canonicalize_path : string list -> string list 91 + end 92 + 93 + module Generic : Scheme = struct 94 + let sub_delims a = 95 + let subd = "!$&'()*+,;=" in 96 + for i = 0 to String.length subd - 1 do 97 + let c = Char.code subd.[i] in 98 + a.(c) <- true 99 + done; 100 + a 101 + 102 + let safe_chars : safe_chars = 103 + let a = Array.make 256 false in 104 + let always_safe = 105 + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-~" in 106 + for i = 0 to String.length always_safe - 1 do 107 + let c = Char.code always_safe.[i] in 108 + a.(c) <- true 109 + done; 110 + a 111 + 112 + let pchar : safe_chars = 113 + let a = sub_delims (Array.copy safe_chars) in 114 + a.(Char.code ':') <- true; 115 + a.(Char.code '@') <- true; 116 + a 117 + 118 + let safe_chars_for_scheme : safe_chars = 119 + let a = Array.copy safe_chars in 120 + a.(Char.code '+') <- true; 121 + a 122 + 123 + let safe_chars_for_path : safe_chars = 124 + let a = sub_delims (Array.copy pchar) in 125 + a.(Char.code '/') <- false; 126 + a 127 + 128 + let safe_chars_for_query : safe_chars = 129 + let a = Array.copy pchar in 130 + a.(Char.code '/') <- true; 131 + a.(Char.code '?') <- true; 132 + a.(Char.code '&') <- false; 133 + a.(Char.code ';') <- false; 134 + a.(Char.code '+') <- false; 135 + a 136 + 137 + let safe_chars_for_query_key : safe_chars = 138 + let a = Array.copy safe_chars_for_query in 139 + a.(Char.code '=') <- false; 140 + a 141 + 142 + let safe_chars_for_query_value : safe_chars = 143 + let a = Array.copy safe_chars_for_query in 144 + a.(Char.code ',') <- false; 145 + a 146 + 147 + let safe_chars_for_fragment : safe_chars = safe_chars_for_query 148 + 149 + let safe_chars_for_userinfo : safe_chars = 150 + let a = Array.copy safe_chars in 151 + a.(Char.code ':') <- false; 152 + a 153 + 154 + let rec safe_chars_for_component = function 155 + | `Path -> safe_chars_for_path 156 + | `Userinfo -> safe_chars_for_userinfo 157 + | `Query -> safe_chars_for_query 158 + | `Query_key -> safe_chars_for_query_key 159 + | `Query_value -> safe_chars_for_query_value 160 + | `Fragment -> safe_chars_for_fragment 161 + | `Scheme -> safe_chars_for_scheme 162 + | `Custom ((component : component), safe, unsafe) -> 163 + let safe_chars = Array.copy (safe_chars_for_component component) in 164 + for i = 0 to String.length safe - 1 do 165 + let c = Char.code safe.[i] in 166 + safe_chars.(c) <- true 167 + done; 168 + for i = 0 to String.length unsafe - 1 do 169 + let c = Char.code unsafe.[i] in 170 + safe_chars.(c) <- false 171 + done; 172 + safe_chars 173 + | `Generic 174 + | _ -> safe_chars 175 + 176 + let normalize_host hso = hso 177 + let canonicalize_port port = port 178 + let canonicalize_path path = path 179 + end 180 + 181 + module Http : Scheme = struct 182 + include Generic 183 + 184 + let normalize_host hs = String.lowercase_ascii hs 185 + 186 + let canonicalize_port = function 187 + | None -> None 188 + | Some 80 -> None 189 + | Some x -> Some x 190 + 191 + let canonicalize_path = function 192 + | [] -> ["/"] 193 + | x -> x 194 + end 195 + 196 + module Https : Scheme = struct 197 + include Http 198 + 199 + let canonicalize_port = function 200 + | None -> None 201 + | Some 443 -> None 202 + | Some x -> Some x 203 + end 204 + 205 + module File : Scheme = struct 206 + include Generic 207 + 208 + let normalize_host hs = 209 + let hs = String.lowercase_ascii hs in 210 + if hs="localhost" then "" else hs 211 + end 212 + 213 + module Urn : Scheme = struct 214 + include Generic 215 + end 216 + 217 + let module_of_scheme = function 218 + | Some s -> begin match String.lowercase_ascii s with 219 + | "http" -> (module Http : Scheme) 220 + | "https" -> (module Https : Scheme) 221 + | "file" -> (module File : Scheme) 222 + | "urn" -> (module Urn : Scheme) 223 + | _ -> (module Generic : Scheme) 224 + end 225 + | None -> (module Generic : Scheme) 226 + 227 + (** {1 Percent Encoding Module} *) 228 + 229 + module Pct : sig 230 + type encoded 231 + type decoded 232 + 233 + val encode : ?scheme:string -> ?component:component -> decoded -> encoded 234 + val decode : encoded -> decoded 235 + 236 + val empty_decoded : decoded 237 + val cast_encoded : string -> encoded 238 + val cast_decoded : string -> decoded 239 + val uncast_encoded : encoded -> string 240 + val uncast_decoded : decoded -> string 241 + val lift_encoded : (encoded -> encoded) -> string -> string 242 + val lift_decoded : (decoded -> decoded) -> string -> string 243 + val unlift_encoded : (string -> string) -> encoded -> encoded 244 + val unlift_decoded : (string -> string) -> decoded -> decoded 245 + val unlift_decoded2 : (string -> string -> 'a) -> decoded -> decoded -> 'a 246 + end = struct 247 + type encoded = string 248 + type decoded = string 249 + 250 + let cast_encoded x = x 251 + let cast_decoded x = x 252 + let empty_decoded = "" 253 + let uncast_decoded x = x 254 + let uncast_encoded x = x 255 + 256 + let lift_encoded f = f 257 + let lift_decoded f = f 258 + let unlift_encoded f = f 259 + let unlift_decoded f = f 260 + let unlift_decoded2 f = f 261 + 262 + let encode ?scheme ?(component=`Path) b = 263 + let module Scheme = (val (module_of_scheme scheme) : Scheme) in 264 + let safe_chars = Scheme.safe_chars_for_component component in 265 + let len = String.length b in 266 + let buf = Buffer.create len in 267 + let rec scan start cur = 268 + if cur >= len then begin 269 + Buffer.add_substring buf b start (cur-start); 270 + end else begin 271 + let c = Char.code b.[cur] in 272 + if safe_chars.(c) then 273 + scan start (cur+1) 274 + else begin 275 + if cur > start then Buffer.add_substring buf b start (cur-start); 276 + Buffer.add_string buf (Printf.sprintf "%%%02X" c); 277 + scan (cur+1) (cur+1) 278 + end 279 + end 280 + in 281 + scan 0 0; 282 + Buffer.contents buf 283 + 284 + let int_of_hex_char c = 285 + let c = int_of_char (Char.uppercase_ascii c) - 48 in 286 + if c > 9 287 + then if c > 16 && c < 23 288 + then c - 7 289 + else failwith "int_of_hex_char" 290 + else if c >= 0 291 + then c 292 + else failwith "int_of_hex_char" 293 + 294 + let decode b = 295 + let len = String.length b in 296 + let buf = Buffer.create len in 297 + let rec scan start cur = 298 + if cur >= len then Buffer.add_substring buf b start (cur-start) 299 + else if b.[cur] = '%' then begin 300 + Buffer.add_substring buf b start (cur-start); 301 + let cur = cur + 1 in 302 + if cur >= len then Buffer.add_char buf '%' 303 + else match int_of_hex_char b.[cur] with 304 + | exception _ -> 305 + Buffer.add_char buf '%'; 306 + scan cur cur 307 + | highbits -> begin 308 + let cur = cur + 1 in 309 + if cur >= len then begin 310 + Buffer.add_char buf '%'; 311 + Buffer.add_char buf b.[cur-1] 312 + end else begin 313 + let start_at = 314 + match int_of_hex_char b.[cur] with 315 + | lowbits -> 316 + Buffer.add_char buf (Char.chr (highbits lsl 4 + lowbits)); 317 + cur+1 318 + | exception _ -> 319 + Buffer.add_char buf '%'; 320 + Buffer.add_char buf b.[cur-1]; 321 + cur 322 + in scan start_at start_at 323 + end 324 + end 325 + end else scan start (cur+1) 326 + in 327 + scan 0 0; 328 + Buffer.contents buf 329 + end 330 + 331 + let pct_encode ?scheme ?(component=`Path) s = 332 + Pct.(uncast_encoded (encode ?scheme ~component (cast_decoded s))) 333 + 334 + let pct_encoder 335 + ?(scheme=`Scheme) 336 + ?(userinfo=`Userinfo) 337 + ?(host=`Host) 338 + ?(path=`Path) 339 + ?(query_key=`Query_key) 340 + ?(query_value=`Query_value) 341 + ?(fragment=`Fragment) 342 + () = 343 + { scheme; userinfo; host; path; query_key; query_value; fragment } 344 + 345 + let pct_decode s = Pct.(uncast_decoded (decode (cast_encoded s))) 346 + 347 + (** {1 Userinfo Module} *) 348 + 349 + module Userinfo = struct 350 + type t = string * string option 351 + 352 + let compare (u,p) (u',p') = 353 + match String.compare u u' with 354 + | 0 -> compare_opt String.compare p p' 355 + | c -> c 356 + 357 + let userinfo_of_encoded us = 358 + match String.split_on_char ':' us with 359 + | [] -> ("",None) 360 + | [u] -> (pct_decode u,None) 361 + | u::rest -> (pct_decode u, Some (pct_decode (String.concat ":" rest))) 362 + 363 + let encoded_of_userinfo ?scheme ~component (u,po) = 364 + let len = String.( 365 + 1 + (length u) + (match po with None -> 0 | Some p -> length p)) 366 + in 367 + let buf = Buffer.create len in 368 + Buffer.add_string buf (pct_encode ?scheme ~component u); 369 + begin match po with None -> (); 370 + | Some p -> 371 + Buffer.add_char buf ':'; 372 + Buffer.add_string buf (pct_encode ?scheme ~component p) 373 + end; 374 + Pct.cast_encoded (Buffer.contents buf) 375 + end 376 + 377 + let userinfo_of_encoded = Userinfo.userinfo_of_encoded 378 + let encoded_of_userinfo ?scheme ~component = Userinfo.encoded_of_userinfo ?scheme ~component 379 + 380 + (** {1 Path Module} *) 381 + 382 + module Path = struct 383 + type t = string list 384 + 385 + let compare = compare_list String.compare 386 + 387 + let path_of_encoded ps = 388 + (* Split on '/' keeping empty strings *) 389 + let rec split acc i = 390 + match String.index_from_opt ps i '/' with 391 + | None -> 392 + let last = String.sub ps i (String.length ps - i) in 393 + List.rev (pct_decode last :: acc) 394 + | Some j -> 395 + let seg = String.sub ps i (j - i) in 396 + split (pct_decode seg :: "/" :: acc) (j + 1) 397 + in 398 + if ps = "" then [] 399 + else if String.length ps > 0 && ps.[0] = '/' then 400 + "/" :: (split [] 1) 401 + else 402 + split [] 0 403 + 404 + let remove_dot_segments p = 405 + let revp = List.rev p in 406 + let rec loop ascension outp = function 407 + | "/"::".."::r | ".."::r -> loop (ascension + 1) outp r 408 + | "/"::"."::r | "."::r -> loop ascension outp r 409 + | "/"::[] | [] when List.(length p > 0 && hd p = "/") -> "/"::outp 410 + | [] when ascension > 0 -> List.rev_append 411 + ("/"::(rev_interject "/" Array.(to_list (make ascension "..")))) outp 412 + | [] -> List.(if length outp > 0 && hd outp = "/" then tl outp else outp) 413 + | "/"::"/"::r when ascension > 0 -> loop (ascension - 1) outp ("/"::r) 414 + | "/"::_::r when ascension > 0 -> loop (ascension - 1) outp r 415 + | s::r -> loop 0 (s::outp) r 416 + in loop 0 [] revp 417 + 418 + let encoded_of_path ?scheme ~component p = 419 + let len = List.fold_left (fun c tok -> String.length tok + c) 0 p in 420 + let buf = Buffer.create len in 421 + iter_concat (fun buf -> function 422 + | "/" -> Buffer.add_char buf '/' 423 + | seg -> Buffer.add_string buf (pct_encode ?scheme ~component seg) 424 + ) "" buf p; 425 + Pct.cast_encoded (Buffer.contents buf) 426 + 427 + let merge bhost bpath relpath = 428 + match bhost, List.rev bpath with 429 + | Some _, [] -> "/"::relpath 430 + | _, ("/"::rbpath | _::"/"::rbpath) -> List.rev_append ("/"::rbpath) relpath 431 + | _, _ -> relpath 432 + end 433 + 434 + let path_of_encoded = Path.path_of_encoded 435 + let encoded_of_path ?scheme ~component = Path.encoded_of_path ?scheme ~component 436 + 437 + (** {1 Query Module} *) 438 + 439 + module Query = struct 440 + type kv = (string * string list) list 441 + 442 + type t = 443 + | KV of kv 444 + | Raw of string option * kv Lazy.t 445 + 446 + let compare x y = match x, y with 447 + | KV kvl, KV kvl' 448 + | Raw (_, lazy kvl), KV kvl' 449 + | KV kvl, Raw (_, lazy kvl') -> 450 + compare_list (fun (k,vl) (k',vl') -> 451 + match String.compare k k' with 452 + | 0 -> compare_list String.compare vl vl' 453 + | c -> c 454 + ) kvl kvl' 455 + | Raw (raw,_), Raw (raw',_) -> compare_opt String.compare raw raw' 456 + 457 + let find q k = try Some (List.assoc k q) with Not_found -> None 458 + 459 + let split_query qs = 460 + let els = String.split_on_char '&' qs in 461 + let plus_to_space s = 462 + let b = Bytes.of_string s in 463 + for i = 0 to Bytes.length b - 1 do 464 + if Bytes.get b i = '+' then Bytes.set b i ' ' 465 + done; 466 + Bytes.to_string b 467 + in 468 + let rec loop acc = function 469 + | [] -> acc 470 + | el :: tl -> 471 + let parts = String.split_on_char '=' el in 472 + let n = match parts with 473 + | k::v::_ -> 474 + plus_to_space k, 475 + (match String.split_on_char ',' (plus_to_space v) with 476 + | [] -> [""] | l -> l) 477 + | [k] -> plus_to_space k, [] 478 + | [] -> "", [] 479 + in 480 + loop (n::acc) tl 481 + in 482 + match els with 483 + | [] -> ["",[]] 484 + | [""] -> ["",[]] 485 + | els -> loop [] (List.rev els) 486 + 487 + let query_of_encoded qs = 488 + List.map 489 + (fun (k, v) -> (pct_decode k, List.map pct_decode v)) 490 + (split_query qs) 491 + 492 + let encoded_of_query ?scheme ?(pct_encoder=pct_encoder ()) l = 493 + let len = List.fold_left (fun a (k,v) -> 494 + a + (String.length k) 495 + + (List.fold_left (fun a s -> a+(String.length s)+1) 0 v) + 2) (-1) l in 496 + let buf = Buffer.create len in 497 + iter_concat (fun buf (k,v) -> 498 + Buffer.add_string buf (pct_encode ?scheme ~component:pct_encoder.query_key k); 499 + if v <> [] then ( 500 + Buffer.add_char buf '='; 501 + iter_concat (fun buf s -> 502 + Buffer.add_string buf 503 + (pct_encode ?scheme ~component:pct_encoder.query_value s) 504 + ) "," buf v) 505 + ) "&" buf l; 506 + Buffer.contents buf 507 + 508 + let of_raw qs = 509 + let lazy_query = Lazy.from_fun (fun () -> query_of_encoded qs) in 510 + Raw (Some qs, lazy_query) 511 + 512 + let kv = function Raw (_, lazy kv) | KV kv -> kv 513 + end 514 + 515 + let query_of_encoded = Query.query_of_encoded 516 + let encoded_of_query ?scheme = Query.encoded_of_query ?scheme 517 + 518 + (** {1 URI Type} *) 519 + 520 + type t = { 521 + scheme: Pct.decoded option; 522 + userinfo: Userinfo.t option; 523 + host: [ `Ipv4_literal of string 524 + | `Ipv6_literal of string 525 + | `Host of Pct.decoded] option ; 526 + port: int option; 527 + path: Path.t; 528 + query: Query.t; 529 + fragment: Pct.decoded option; 530 + } 531 + 532 + let empty = { 533 + scheme = None; 534 + userinfo = None; 535 + host = None; 536 + port = None; 537 + path = []; 538 + query = Query.Raw (None, Lazy.from_val []); 539 + fragment = None; 540 + } 541 + 542 + let compare_decoded = Pct.unlift_decoded2 String.compare 543 + let compare_decoded_opt = compare_opt compare_decoded 544 + 545 + let compare_host h1 h2 = 546 + match h1, h2 with 547 + | `Ipv4_literal ip1, `Ipv4_literal ip2 -> String.compare ip1 ip2 548 + | `Ipv6_literal ip1, `Ipv6_literal ip2 -> String.compare ip1 ip2 549 + | `Host h1, `Host h2 -> compare_decoded h1 h2 550 + | _ -> -1 551 + 552 + let compare_host_opt = compare_opt compare_host 553 + 554 + let compare t t' = 555 + (match compare_host_opt t.host t'.host with 556 + | 0 -> (match compare_decoded_opt t.scheme t'.scheme with 557 + | 0 -> (match compare_opt (fun p p' -> 558 + if p < p' then -1 else if p > p' then 1 else 0 559 + ) t.port t'.port with 560 + | 0 -> (match compare_opt Userinfo.compare t.userinfo t'.userinfo with 561 + | 0 -> (match Path.compare t.path t'.path with 562 + | 0 -> (match Query.compare t.query t'.query with 563 + | 0 -> compare_decoded_opt t.fragment t'.fragment 564 + | c -> c) 565 + | c -> c) 566 + | c -> c) 567 + | c -> c) 568 + | c -> c) 569 + | c -> c) 570 + 571 + let equal t t' = compare t t' = 0 572 + 573 + let uncast_opt = function 574 + | Some h -> Some (Pct.uncast_decoded h) 575 + | None -> None 576 + 577 + let normalize schem uri = 578 + let module Scheme = 579 + (val (module_of_scheme (uncast_opt schem)) : Scheme) in 580 + let dob f = function 581 + | Some x -> Some (Pct.unlift_decoded f x) 582 + | None -> None 583 + in {uri with 584 + scheme=dob String.lowercase_ascii uri.scheme; 585 + host= match uri.host with 586 + | Some (`Ipv4_literal host) -> 587 + Some (`Ipv4_literal (Scheme.normalize_host host)) 588 + | Some (`Ipv6_literal host) -> 589 + Some (`Ipv6_literal (Scheme.normalize_host host)) 590 + | Some (`Host host) -> 591 + Some (`Host (Pct.cast_decoded (Scheme.normalize_host (Pct.uncast_decoded host)))) 592 + | None -> None 593 + } 594 + 595 + (** {1 URI to String Conversion} *) 596 + 597 + let to_string ?(pct_encoder=pct_encoder ()) uri = 598 + let scheme = match uri.scheme with 599 + | Some s -> Some (Pct.uncast_decoded s) 600 + | None -> None in 601 + let buf = Buffer.create 128 in 602 + let add_pct_string ?(component=`Path) x = 603 + Buffer.add_string buf (Pct.uncast_encoded (Pct.encode ?scheme ~component x)) 604 + in 605 + (match uri.scheme with 606 + |None -> () 607 + |Some x -> 608 + add_pct_string ~component:pct_encoder.scheme x; 609 + Buffer.add_char buf ':' 610 + ); 611 + if (match uri.userinfo, uri.host, uri.port with 612 + | Some _, _, _ | _, Some _, _ | _, _, Some _ -> true | _ -> false) 613 + then Buffer.add_string buf "//"; 614 + (match uri.userinfo with 615 + |None -> () 616 + |Some userinfo -> 617 + Buffer.add_string buf 618 + (Pct.uncast_encoded (encoded_of_userinfo ?scheme ~component:pct_encoder.userinfo userinfo)); 619 + Buffer.add_char buf '@' 620 + ); 621 + (match uri.host with 622 + |None -> () 623 + |Some (`Host host) -> 624 + add_pct_string ~component:pct_encoder.host host; 625 + |Some (`Ipv4_literal host) -> Buffer.add_string buf host 626 + |Some (`Ipv6_literal host) -> 627 + Buffer.add_char buf '['; 628 + Buffer.add_string buf host; 629 + Buffer.add_char buf ']' 630 + ); 631 + (match uri.port with 632 + |None -> () 633 + |Some port -> 634 + Buffer.add_char buf ':'; 635 + Buffer.add_string buf (string_of_int port) 636 + ); 637 + (match uri.path with 638 + | [] -> () 639 + | "/"::_ -> 640 + Buffer.add_string buf (Pct.uncast_encoded 641 + (encoded_of_path ?scheme ~component:pct_encoder.path uri.path)) 642 + | first_segment::_ -> 643 + (match uri.host with 644 + | Some _ -> Buffer.add_char buf '/' 645 + | None -> 646 + match String.index_opt first_segment ':' with 647 + | None -> () 648 + | Some _ -> match scheme with 649 + | Some _ -> () 650 + | None -> Buffer.add_string buf "./" 651 + ); 652 + Buffer.add_string buf 653 + (Pct.uncast_encoded (encoded_of_path ?scheme ~component:pct_encoder.path uri.path)) 654 + ); 655 + Query.(match uri.query with 656 + | Raw (None,_) | KV [] -> () 657 + | Raw (_,lazy q) | KV q -> 658 + Buffer.add_char buf '?'; 659 + Buffer.add_string buf (encoded_of_query ?scheme ~pct_encoder q) 660 + ); 661 + (match uri.fragment with 662 + |None -> () 663 + |Some f -> Buffer.add_char buf '#'; add_pct_string ~component:pct_encoder.fragment f 664 + ); 665 + Buffer.contents buf 666 + 667 + (** {1 Accessor Functions} *) 668 + 669 + let get_decoded_opt = function None -> None |Some x -> Some (Pct.uncast_decoded x) 670 + let scheme uri = get_decoded_opt uri.scheme 671 + 672 + let with_scheme uri = 673 + function 674 + |Some scheme -> { uri with scheme=Some (Pct.cast_decoded scheme) } 675 + |None -> { uri with scheme=None } 676 + 677 + let host uri = 678 + match uri.host with 679 + | None -> None 680 + | Some (`Ipv4_literal h | `Ipv6_literal h) -> Some h 681 + | Some (`Host h) -> Some (Pct.uncast_decoded h) 682 + 683 + let host_with_default ?(default="localhost") uri = 684 + match host uri with 685 + |None -> default 686 + |Some h -> h 687 + 688 + let userinfo ?(pct_encoder=pct_encoder ()) uri = match uri.userinfo with 689 + | None -> None 690 + | Some userinfo -> Some (Pct.uncast_encoded (match uri.scheme with 691 + | None -> encoded_of_userinfo ~component:pct_encoder.userinfo userinfo 692 + | Some s -> encoded_of_userinfo ~scheme:(Pct.uncast_decoded s) ~component:pct_encoder.userinfo userinfo)) 693 + 694 + let with_userinfo uri userinfo = 695 + let userinfo = match userinfo with 696 + | Some u -> Some (userinfo_of_encoded u) 697 + | None -> None 698 + in 699 + match host uri with 700 + | None -> { uri with host=Some (`Host (Pct.cast_decoded "")); userinfo=userinfo } 701 + | Some _ -> { uri with userinfo=userinfo } 702 + 703 + let user uri = match uri.userinfo with 704 + | None -> None 705 + | Some (user, _) -> Some user 706 + 707 + let password uri = match uri.userinfo with 708 + | None | Some (_, None) -> None 709 + | Some (_, Some pass) -> Some pass 710 + 711 + let with_password uri password = 712 + let result userinfo = match host uri with 713 + | None -> { uri with host=Some (`Host (Pct.cast_decoded "")); userinfo=userinfo } 714 + | Some _ -> { uri with userinfo=userinfo } 715 + in 716 + match uri.userinfo, password with 717 + | None, None -> uri 718 + | None, Some _ -> result (Some ("",password)) 719 + | Some (user,_), _ -> result (Some (user, password)) 720 + 721 + let port uri = uri.port 722 + 723 + let with_port uri port = 724 + match host uri with 725 + | Some _ -> { uri with port=port } 726 + | None -> begin 727 + match port with 728 + | None -> { uri with host=None; port=None } 729 + | Some _ -> { uri with host=Some (`Host (Pct.cast_decoded "")); port=port } 730 + end 731 + 732 + let path ?(pct_encoder=pct_encoder ()) uri = Pct.uncast_encoded (match uri.scheme with 733 + | None -> encoded_of_path ~component:pct_encoder.path uri.path 734 + | Some s -> encoded_of_path ~scheme:(Pct.uncast_decoded s) ~component:pct_encoder.path uri.path) 735 + 736 + let with_path uri path = 737 + let path = path_of_encoded path in 738 + match host uri, path with 739 + | None, _ | Some _, "/"::_ | Some _, [] -> { uri with path=path } 740 + | Some _, _ -> { uri with path="/"::path } 741 + 742 + let fragment uri = get_decoded_opt uri.fragment 743 + 744 + let with_fragment uri = 745 + function 746 + |None -> { uri with fragment=None } 747 + |Some frag -> { uri with fragment=Some (Pct.cast_decoded frag) } 748 + 749 + let query uri = Query.kv uri.query 750 + 751 + let verbatim_query ?(pct_encoder=pct_encoder ()) uri = Query.(match uri.query with 752 + | Raw (qs,_) -> qs 753 + | KV [] -> None 754 + | KV kv -> Some (encoded_of_query ?scheme:(scheme uri) ~pct_encoder kv) 755 + ) 756 + 757 + let get_query_param' uri k = Query.(find (kv uri.query) k) 758 + 759 + let get_query_param uri k = 760 + match get_query_param' uri k with 761 + |None -> None 762 + |Some v -> Some (String.concat "," v) 763 + 764 + let with_query uri query = { uri with query=Query.KV query } 765 + let q_s q = List.map (fun (k,v) -> k,[v]) q 766 + let with_query' uri query = with_query uri (q_s query) 767 + let add_query_param uri p = Query.({ uri with query=KV (p::(kv uri.query)) }) 768 + let add_query_param' uri (k,v) = 769 + Query.({ uri with query=KV ((k,[v])::(kv uri.query)) }) 770 + let add_query_params uri ps = Query.({ uri with query=KV (ps@(kv uri.query)) }) 771 + let add_query_params' uri ps = 772 + Query.({ uri with query=KV ((q_s ps)@(kv uri.query)) }) 773 + let remove_query_param uri k = Query.( 774 + { uri with query=KV (List.filter (fun (k',_) -> k<>k') (kv uri.query)) } 775 + ) 776 + 777 + let path_and_query uri = 778 + match (path uri), (query uri) with 779 + |"", [] -> "/" 780 + |"", q -> 781 + let scheme = uncast_opt uri.scheme in 782 + Printf.sprintf "/?%s" (encoded_of_query ?scheme q) 783 + |p, [] -> p 784 + |p, q -> 785 + let scheme = uncast_opt uri.scheme in 786 + Printf.sprintf "%s?%s" p (encoded_of_query ?scheme q) 787 + 788 + (** {1 URI Resolution (RFC 3986 Section 5)} *) 789 + 790 + let resolve schem base uri = 791 + let schem = Some (Pct.cast_decoded (match scheme base with 792 + | None -> schem 793 + | Some scheme -> scheme 794 + )) in 795 + normalize schem 796 + Path.(match scheme uri, userinfo uri, host uri with 797 + | Some _, _, _ -> 798 + {uri with path=remove_dot_segments uri.path} 799 + | None, Some _, _ 800 + | None, _, Some _ -> 801 + {uri with scheme=base.scheme; path=remove_dot_segments uri.path} 802 + | None, None, None -> 803 + let uri = {uri with scheme=base.scheme; userinfo=base.userinfo; 804 + host=base.host; port=base.port} in 805 + let path_str = path uri in 806 + if path_str="" 807 + then { uri with 808 + path=base.path; 809 + query=match uri.query with 810 + | Query.Raw (None,_) | Query.KV [] -> base.query 811 + | _ -> uri.query 812 + } 813 + else if path_str.[0]='/' 814 + then {uri with path=remove_dot_segments uri.path} 815 + else {uri with 816 + path=remove_dot_segments (merge base.host base.path uri.path); 817 + } 818 + ) 819 + 820 + let canonicalize uri = 821 + let uri = resolve "" empty uri in 822 + let module Scheme = 823 + (val (module_of_scheme (uncast_opt uri.scheme)) : Scheme) in 824 + { uri with 825 + port=Scheme.canonicalize_port uri.port; 826 + path=Scheme.canonicalize_path uri.path; 827 + } 828 + 829 + let pp ppf uri = Format.pp_print_string ppf (to_string uri) 830 + let pp_hum ppf uri = Format.pp_print_string ppf (to_string uri) 831 + 832 + (** {1 Buf_read Parser Module} 833 + 834 + URI parsing using Eio.Buf_read combinators instead of Angstrom. 835 + This provides consistency with the HTTP parsing stack. *) 836 + 837 + module Parser = struct 838 + module Read = Eio.Buf_read 839 + 840 + (** {2 Character Predicates} *) 841 + 842 + let is_digit = function '0'..'9' -> true | _ -> false 843 + 844 + let is_hex_digit = function 845 + | '0'..'9' | 'A'..'F' | 'a'..'f' -> true 846 + | _ -> false 847 + 848 + let is_unreserved = function 849 + | 'A'..'Z' | 'a'..'z' | '0'..'9' | '-' | '.' | '_' | '~' -> true 850 + | _ -> false 851 + 852 + let is_sub_delim = function 853 + | '!' | '$' | '&' | '\'' | '(' | ')' | '*' | '+' | ',' | ';' | '=' -> true 854 + | _ -> false 855 + 856 + let is_scheme_char = function 857 + | 'A'..'Z' | 'a'..'z' | '0'..'9' | '+' | '-' | '.' -> true 858 + | _ -> false 859 + 860 + (** {2 Low-level Parsers} *) 861 + 862 + (** Take characters while predicate holds *) 863 + let take_while_pred pred r = 864 + Read.take_while pred r 865 + 866 + (** Parse a percent-encoded character *) 867 + let pct_encoded r = 868 + Read.char '%' r; 869 + let h1 = Read.any_char r in 870 + let h2 = Read.any_char r in 871 + if is_hex_digit h1 && is_hex_digit h2 then 872 + String.make 1 '%' ^ String.make 1 h1 ^ String.make 1 h2 873 + else 874 + failwith "Invalid percent encoding" 875 + 876 + (** Parse a decimal octet (0-255) *) 877 + let dec_octet r = 878 + let num = take_while_pred is_digit r in 879 + if num = "" then failwith "Expected decimal octet"; 880 + let n = int_of_string num in 881 + if n < 256 then num 882 + else failwith "Invalid octet (> 255)" 883 + 884 + (** Parse IPv4 address *) 885 + let ipv4 r = 886 + let o1 = dec_octet r in 887 + Read.char '.' r; 888 + let o2 = dec_octet r in 889 + Read.char '.' r; 890 + let o3 = dec_octet r in 891 + Read.char '.' r; 892 + let o4 = dec_octet r in 893 + o1 ^ "." ^ o2 ^ "." ^ o3 ^ "." ^ o4 894 + 895 + (** Parse hexadecimal segment for IPv6 *) 896 + let hexadecimal r = 897 + take_while_pred is_hex_digit r 898 + 899 + (** Parse IPv6 address (simplified - accepts common formats) *) 900 + let ipv6 r = 901 + let buf = Buffer.create 39 in 902 + let rec parse_parts count = 903 + if count >= 8 then () 904 + else begin 905 + let hex = hexadecimal r in 906 + Buffer.add_string buf hex; 907 + match Read.peek_char r with 908 + | Some ':' -> 909 + Read.char ':' r; 910 + Buffer.add_char buf ':'; 911 + (* Check for :: *) 912 + (match Read.peek_char r with 913 + | Some ':' -> 914 + Read.char ':' r; 915 + Buffer.add_char buf ':'; 916 + parse_after_double_colon (count + 1) 917 + | _ -> parse_parts (count + 1)) 918 + | _ -> () 919 + end 920 + and parse_after_double_colon count = 921 + if count >= 8 then () 922 + else begin 923 + match Read.peek_char r with 924 + | Some c when is_hex_digit c -> 925 + let hex = hexadecimal r in 926 + Buffer.add_string buf hex; 927 + (match Read.peek_char r with 928 + | Some ':' -> 929 + Read.char ':' r; 930 + Buffer.add_char buf ':'; 931 + parse_after_double_colon (count + 1) 932 + | _ -> ()) 933 + | _ -> () 934 + end 935 + in 936 + (* Handle leading :: *) 937 + (match Read.peek_char r with 938 + | Some ':' -> 939 + Read.char ':' r; 940 + Buffer.add_char buf ':'; 941 + (match Read.peek_char r with 942 + | Some ':' -> 943 + Read.char ':' r; 944 + Buffer.add_char buf ':'; 945 + parse_after_double_colon 0 946 + | _ -> failwith "Expected :: at start of IPv6") 947 + | _ -> parse_parts 0); 948 + Buffer.contents buf 949 + 950 + (** Parse IPv6 address in brackets *) 951 + let ipv6_address r = 952 + Read.char '[' r; 953 + let addr = ipv6 r in 954 + Read.char ']' r; 955 + addr 956 + 957 + (** Parse registered name (hostname) *) 958 + let reg_name r = 959 + let buf = Buffer.create 64 in 960 + let rec loop () = 961 + match Read.peek_char r with 962 + | Some c when is_unreserved c || is_sub_delim c -> 963 + Buffer.add_char buf (Read.any_char r); 964 + loop () 965 + | Some '%' -> 966 + Buffer.add_string buf (pct_encoded r); 967 + loop () 968 + | _ -> () 969 + in 970 + loop (); 971 + Buffer.contents buf 972 + 973 + (** Parse host (IPv4, IPv6, or reg-name) *) 974 + let host r = 975 + match Read.peek_char r with 976 + | Some '[' -> 977 + let addr = ipv6_address r in 978 + `Ipv6_literal addr 979 + | Some c when is_digit c -> 980 + (* Try IPv4, fall back to reg_name *) 981 + let start_pos = Read.buffered_bytes r in 982 + (try 983 + let addr = ipv4 r in 984 + (* Verify it's a complete IPv4 (next char is delimiter) *) 985 + (match Read.peek_char r with 986 + | None | Some ':' | Some '/' | Some '?' | Some '#' -> 987 + `Ipv4_literal addr 988 + | _ -> 989 + (* Not a valid IPv4 delimiter, treat as reg_name *) 990 + failwith "Not IPv4") 991 + with _ -> 992 + (* Backtrack isn't possible with Buf_read, so we need different approach *) 993 + (* For simplicity, if it starts with digit but isn't valid IPv4, parse as reg_name *) 994 + let _ = start_pos in (* suppress warning *) 995 + `Host (Pct.decode (Pct.cast_encoded (reg_name r)))) 996 + | _ -> 997 + let name = reg_name r in 998 + `Host (Pct.decode (Pct.cast_encoded name)) 999 + 1000 + (** Parse userinfo *) 1001 + let userinfo r = 1002 + let buf = Buffer.create 64 in 1003 + let rec loop () = 1004 + match Read.peek_char r with 1005 + | Some c when is_unreserved c || is_sub_delim c || c = ':' -> 1006 + Buffer.add_char buf (Read.any_char r); 1007 + loop () 1008 + | Some '%' -> 1009 + Buffer.add_string buf (pct_encoded r); 1010 + loop () 1011 + | Some '@' -> 1012 + Read.char '@' r; 1013 + Some (Userinfo.userinfo_of_encoded (Buffer.contents buf)) 1014 + | _ -> None 1015 + in 1016 + loop () 1017 + 1018 + (** Parse port number *) 1019 + let port r = 1020 + match Read.peek_char r with 1021 + | Some ':' -> 1022 + Read.char ':' r; 1023 + let port_str = take_while_pred is_digit r in 1024 + if port_str = "" then None 1025 + else (try Some (int_of_string port_str) with _ -> None) 1026 + | _ -> None 1027 + 1028 + (** Parse authority (//userinfo@host:port) *) 1029 + let authority r = 1030 + match Read.peek_char r with 1031 + | Some '/' -> 1032 + (match Read.peek_char r with 1033 + | _ -> 1034 + Read.char '/' r; 1035 + (match Read.peek_char r with 1036 + | Some '/' -> 1037 + Read.char '/' r; 1038 + (* We have // - parse authority *) 1039 + (* Try to parse userinfo first by looking ahead for @ *) 1040 + let ui = userinfo r in 1041 + let h = host r in 1042 + let p = port r in 1043 + (ui, Some h, p) 1044 + | _ -> 1045 + (* Just a single / - not authority, put it back conceptually *) 1046 + (None, None, None))) 1047 + | _ -> (None, None, None) 1048 + 1049 + (** Parse scheme *) 1050 + let parse_scheme r = 1051 + let buf = Buffer.create 16 in 1052 + let rec loop () = 1053 + match Read.peek_char r with 1054 + | Some c when is_scheme_char c -> 1055 + Buffer.add_char buf (Read.any_char r); 1056 + loop () 1057 + | Some ':' -> 1058 + Read.char ':' r; 1059 + let s = Buffer.contents buf in 1060 + if s = "" then None 1061 + else Some (Pct.decode (Pct.cast_encoded s)) 1062 + | _ -> None 1063 + in 1064 + (* First char must be alpha for scheme *) 1065 + match Read.peek_char r with 1066 + | Some c when (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') -> 1067 + loop () 1068 + | _ -> None 1069 + 1070 + (** Parse path *) 1071 + let parse_path r = 1072 + let path_str = take_while_pred (fun c -> c <> '?' && c <> '#') r in 1073 + Path.path_of_encoded path_str 1074 + 1075 + (** Parse query *) 1076 + let parse_query r = 1077 + match Read.peek_char r with 1078 + | Some '?' -> 1079 + Read.char '?' r; 1080 + let qs = take_while_pred (fun c -> c <> '#') r in 1081 + Query.of_raw qs 1082 + | _ -> Query.Raw (None, Lazy.from_val []) 1083 + 1084 + (** Parse fragment *) 1085 + let parse_fragment r = 1086 + match Read.peek_char r with 1087 + | Some '#' -> 1088 + Read.char '#' r; 1089 + let frag = take_while_pred (fun _ -> true) r in 1090 + Some (Pct.decode (Pct.cast_encoded frag)) 1091 + | _ -> None 1092 + 1093 + (** Parse complete URI reference *) 1094 + let uri_reference r = 1095 + (* Check for scheme first *) 1096 + let scheme_opt = 1097 + (* Look ahead for scheme (letters followed by :) *) 1098 + let buf = Buffer.create 16 in 1099 + let rec check_scheme () = 1100 + match Read.peek_char r with 1101 + | Some c when is_scheme_char c -> 1102 + Buffer.add_char buf (Read.any_char r); 1103 + check_scheme () 1104 + | Some ':' -> 1105 + Read.char ':' r; 1106 + let s = Buffer.contents buf in 1107 + if s <> "" && String.length s > 0 && 1108 + ((s.[0] >= 'A' && s.[0] <= 'Z') || (s.[0] >= 'a' && s.[0] <= 'z')) 1109 + then Some (Pct.decode (Pct.cast_encoded s)) 1110 + else None 1111 + | _ -> 1112 + (* Not a scheme, we need to "unread" - but we can't with Buf_read *) 1113 + (* For now, return None and the buffer contents become part of path *) 1114 + None 1115 + in 1116 + check_scheme () 1117 + in 1118 + let (ui, h, p) = authority r in 1119 + let path = parse_path r in 1120 + let query = parse_query r in 1121 + let fragment = parse_fragment r in 1122 + normalize scheme_opt { scheme = scheme_opt; userinfo = ui; host = h; 1123 + port = p; path; query; fragment } 1124 + end 1125 + 1126 + (** {1 String Parsing} 1127 + 1128 + Parse URI from string using a simpler direct approach for reliability. *) 1129 + 1130 + let decode_host host = 1131 + (* Simple host parsing without Buf_read for from-string conversion *) 1132 + if String.length host > 0 && host.[0] = '[' then 1133 + (* IPv6 in brackets *) 1134 + let len = String.length host in 1135 + if len > 2 && host.[len-1] = ']' then 1136 + `Ipv6_literal (String.sub host 1 (len - 2)) 1137 + else 1138 + `Host (Pct.cast_decoded host) 1139 + else 1140 + (* Check if it looks like IPv4 *) 1141 + let parts = String.split_on_char '.' host in 1142 + if List.length parts = 4 && 1143 + List.for_all (fun p -> 1144 + p <> "" && String.length p <= 3 && 1145 + String.for_all (fun c -> c >= '0' && c <= '9') p && 1146 + int_of_string p < 256) parts 1147 + then 1148 + `Ipv4_literal host 1149 + else 1150 + `Host (Pct.cast_decoded host) 1151 + 1152 + (** Parse URI from string - direct implementation for reliability *) 1153 + let of_string s = 1154 + (* Handle newlines - only parse until first newline *) 1155 + let s = match String.index_opt s '\n' with 1156 + | Some i -> String.sub s 0 i 1157 + | None -> s 1158 + in 1159 + if s = "" then empty 1160 + else 1161 + (* Parse scheme *) 1162 + let scheme, rest = 1163 + match String.index_opt s ':' with 1164 + | None -> (None, s) 1165 + | Some colon_pos -> 1166 + let before_colon = String.sub s 0 colon_pos in 1167 + (* Scheme must start with letter and contain only scheme chars *) 1168 + if String.length before_colon > 0 && 1169 + ((before_colon.[0] >= 'A' && before_colon.[0] <= 'Z') || 1170 + (before_colon.[0] >= 'a' && before_colon.[0] <= 'z')) && 1171 + String.for_all (function 1172 + | 'A'..'Z' | 'a'..'z' | '0'..'9' | '+' | '-' | '.' -> true 1173 + | _ -> false) before_colon 1174 + then 1175 + (Some (Pct.cast_decoded before_colon), 1176 + String.sub s (colon_pos + 1) (String.length s - colon_pos - 1)) 1177 + else 1178 + (None, s) 1179 + in 1180 + (* Parse authority *) 1181 + let userinfo, host, port, rest = 1182 + if String.length rest >= 2 && rest.[0] = '/' && rest.[1] = '/' then 1183 + let auth_start = 2 in 1184 + (* Find end of authority *) 1185 + let auth_end = 1186 + let rec find i = 1187 + if i >= String.length rest then i 1188 + else match rest.[i] with 1189 + | '/' | '?' | '#' -> i 1190 + | _ -> find (i + 1) 1191 + in 1192 + find auth_start 1193 + in 1194 + let auth = String.sub rest auth_start (auth_end - auth_start) in 1195 + let after_auth = String.sub rest auth_end (String.length rest - auth_end) in 1196 + (* Parse userinfo@host:port *) 1197 + let userinfo, hostport = 1198 + match String.index_opt auth '@' with 1199 + | Some at_pos -> 1200 + let ui = String.sub auth 0 at_pos in 1201 + let hp = String.sub auth (at_pos + 1) (String.length auth - at_pos - 1) in 1202 + (Some (Userinfo.userinfo_of_encoded ui), hp) 1203 + | None -> (None, auth) 1204 + in 1205 + (* Parse host:port - handle IPv6 [::1]:port *) 1206 + let host, port = 1207 + if String.length hostport > 0 && hostport.[0] = '[' then 1208 + (* IPv6 address *) 1209 + match String.index_opt hostport ']' with 1210 + | Some bracket_pos -> 1211 + let h = String.sub hostport 0 (bracket_pos + 1) in 1212 + let after_bracket = String.sub hostport (bracket_pos + 1) 1213 + (String.length hostport - bracket_pos - 1) in 1214 + let p = 1215 + if String.length after_bracket > 0 && after_bracket.[0] = ':' then 1216 + let port_str = String.sub after_bracket 1 1217 + (String.length after_bracket - 1) in 1218 + (try Some (int_of_string port_str) with _ -> None) 1219 + else None 1220 + in 1221 + (Some (decode_host h), p) 1222 + | None -> (Some (decode_host hostport), None) 1223 + else 1224 + (* Regular host or IPv4 *) 1225 + match String.rindex_opt hostport ':' with 1226 + | Some colon_pos -> 1227 + let h = String.sub hostport 0 colon_pos in 1228 + let port_str = String.sub hostport (colon_pos + 1) 1229 + (String.length hostport - colon_pos - 1) in 1230 + let p = try Some (int_of_string port_str) with _ -> None in 1231 + (Some (decode_host h), p) 1232 + | None -> 1233 + if hostport = "" then (None, None) 1234 + else (Some (decode_host hostport), None) 1235 + in 1236 + (userinfo, host, port, after_auth) 1237 + else 1238 + (None, None, None, rest) 1239 + in 1240 + (* Parse path, query, fragment *) 1241 + let path, rest = 1242 + let path_end = 1243 + let rec find i = 1244 + if i >= String.length rest then i 1245 + else match rest.[i] with 1246 + | '?' | '#' -> i 1247 + | _ -> find (i + 1) 1248 + in 1249 + find 0 1250 + in 1251 + let p = String.sub rest 0 path_end in 1252 + let r = String.sub rest path_end (String.length rest - path_end) in 1253 + (Path.path_of_encoded p, r) 1254 + in 1255 + let query, rest = 1256 + if String.length rest > 0 && rest.[0] = '?' then 1257 + let query_end = 1258 + match String.index_opt rest '#' with 1259 + | Some i -> i 1260 + | None -> String.length rest 1261 + in 1262 + let q = String.sub rest 1 (query_end - 1) in 1263 + let r = String.sub rest query_end (String.length rest - query_end) in 1264 + (Query.of_raw q, r) 1265 + else 1266 + (Query.Raw (None, Lazy.from_val []), rest) 1267 + in 1268 + let fragment = 1269 + if String.length rest > 0 && rest.[0] = '#' then 1270 + let f = String.sub rest 1 (String.length rest - 1) in 1271 + Some (Pct.decode (Pct.cast_encoded f)) 1272 + else 1273 + None 1274 + in 1275 + normalize scheme { scheme; userinfo; host; port; path; query; fragment } 1276 + 1277 + (** Make a URI from components *) 1278 + let make ?scheme ?userinfo ?host ?port ?path ?query ?fragment () = 1279 + let decode = function 1280 + |Some x -> Some (Pct.cast_decoded x) |None -> None in 1281 + let host = match userinfo, host, port with 1282 + | _, Some _, _ | None, None, None -> host 1283 + | Some _, None, _ | _, None, Some _ -> Some "" 1284 + in 1285 + let userinfo = match userinfo with 1286 + | None -> None | Some u -> Some (userinfo_of_encoded u) in 1287 + let path = match path with 1288 + |None -> [] | Some p -> 1289 + let path = path_of_encoded p in 1290 + match host, path with 1291 + | None, _ | Some _, "/"::_ | Some _, [] -> path 1292 + | Some _, _ -> "/"::path 1293 + in 1294 + let query = match query with 1295 + | None -> Query.KV [] 1296 + | Some p -> Query.KV p 1297 + in 1298 + let scheme_decoded = decode scheme in 1299 + normalize scheme_decoded 1300 + { scheme = scheme_decoded; userinfo; 1301 + host = 1302 + (match host with 1303 + | Some host -> Some (decode_host host) 1304 + | None -> None); 1305 + port; path; query; fragment=decode fragment } 1306 + 1307 + let with_host uri host = 1308 + { uri with 1309 + host = (match host with 1310 + | Some host -> Some (decode_host host) 1311 + | None -> None) 1312 + } 1313 + 1314 + let with_uri ?scheme ?userinfo ?host ?port ?path ?query ?fragment uri = 1315 + let with_path_opt u o = 1316 + match o with 1317 + | None -> with_path u "" 1318 + | Some p -> with_path u p 1319 + in 1320 + let with_query_opt u o = 1321 + match o with 1322 + | None -> with_query u [] 1323 + | Some q -> with_query u q 1324 + in 1325 + let with_ f o u = 1326 + match o with 1327 + | None -> u 1328 + | Some x -> f u x 1329 + in 1330 + with_ with_scheme scheme uri 1331 + |> with_ with_userinfo userinfo 1332 + |> with_ with_host host 1333 + |> with_ with_port port 1334 + |> with_ with_path_opt path 1335 + |> with_ with_query_opt query 1336 + |> with_ with_fragment fragment 1337 + 1338 + (** {1 Absolute HTTP URIs} *) 1339 + 1340 + module Absolute_http = struct 1341 + type uri = t 1342 + type t = 1343 + { scheme : [ `Http | `Https ]; 1344 + userinfo: Userinfo.t option; 1345 + host: [ `Ipv4_literal of string 1346 + | `Ipv6_literal of string 1347 + | `Host of Pct.decoded]; 1348 + port : int option; 1349 + path : Path.t; 1350 + query : Query.t; 1351 + fragment : Pct.decoded option 1352 + } 1353 + 1354 + let ( let* ) = Result.bind 1355 + 1356 + let to_uri { scheme; userinfo; host; port; path; query; fragment } = 1357 + let scheme = 1358 + match scheme with 1359 + | `Http -> Pct.cast_decoded "http" 1360 + | `Https -> Pct.cast_decoded "https" 1361 + in 1362 + ({ scheme = Some scheme; 1363 + userinfo; 1364 + host = Some host; 1365 + port; 1366 + path; 1367 + query; 1368 + fragment } : uri) 1369 + 1370 + let of_uri ({ scheme; userinfo; host; port; path; query; fragment }: uri) = 1371 + let* scheme = 1372 + match scheme with 1373 + | None -> Error (`Msg "No scheme present in URI") 1374 + | Some scheme -> 1375 + (match Pct.uncast_decoded scheme with 1376 + | "http" -> Ok `Http 1377 + | "https" -> Ok `Https 1378 + | unsupported_scheme -> 1379 + Error 1380 + (`Msg 1381 + (Printf.sprintf 1382 + "Only http and https URIs are supported. %s is invalid." 1383 + unsupported_scheme))) 1384 + in 1385 + let* host = Option.to_result ~none:(`Msg "host is required for HTTP(S) uris") host in 1386 + Ok { scheme; userinfo; host; port; path; query; fragment } 1387 + 1388 + let of_string s = match of_string s |> of_uri with 1389 + | Ok t -> t 1390 + | Error (`Msg error) -> failwith error 1391 + 1392 + let to_string ?pct_encoder t = to_uri t |> to_string ?pct_encoder 1393 + 1394 + let normalize t = 1395 + { t with 1396 + host = match t.host with 1397 + | (`Ipv4_literal host) -> 1398 + (`Ipv4_literal (String.lowercase_ascii host)) 1399 + | (`Ipv6_literal host) -> 1400 + (`Ipv6_literal (String.lowercase_ascii host)) 1401 + | (`Host host) -> 1402 + (`Host (Pct.cast_decoded (String.lowercase_ascii (Pct.uncast_decoded host)))) 1403 + } 1404 + 1405 + let make ~scheme ~host ?userinfo ?port ?path ?query ?fragment () = 1406 + let decode = function 1407 + |Some x -> Some (Pct.cast_decoded x) |None -> None in 1408 + let userinfo = match userinfo with 1409 + | None -> None | Some u -> Some (userinfo_of_encoded u) in 1410 + let path = match path with 1411 + |None -> [] | Some p -> 1412 + let path = path_of_encoded p in 1413 + match path with 1414 + | "/"::_ | [] -> path 1415 + | _ -> "/"::path 1416 + in 1417 + let query = match query with 1418 + | None -> Query.KV [] 1419 + | Some p -> Query.KV p 1420 + in 1421 + normalize 1422 + { scheme; 1423 + userinfo; 1424 + host = decode_host host; port; path; query; fragment=decode fragment } 1425 + 1426 + let host t = 1427 + match t.host with 1428 + | (`Ipv4_literal h | `Ipv6_literal h) -> h 1429 + | (`Host h) -> (Pct.uncast_decoded h) 1430 + 1431 + let scheme t = t.scheme 1432 + end
+301
lib/uri.mli
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2012-2013 Anil Madhavapeddy <anil@recoil.org> 3 + Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org> 4 + 5 + Permission to use, copy, modify, and distribute this software for any 6 + purpose with or without fee is hereby granted, provided that the above 7 + copyright notice and this permission notice appear in all copies. 8 + 9 + THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES 10 + WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF 11 + MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR 12 + ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES 13 + WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN 14 + ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF 15 + OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 16 + ---------------------------------------------------------------------------*) 17 + 18 + (** Uniform Resource Identifier handling that is RFC3986-compliant. 19 + 20 + This module provides URI parsing, construction, and manipulation 21 + using Eio.Buf_read combinators for parsing. 22 + 23 + {2 RFC 3986 Compliance} 24 + 25 + This implementation follows {{:https://tools.ietf.org/html/rfc3986}RFC 3986} 26 + for URI syntax, reference resolution, and normalization. *) 27 + 28 + (** A single URI that is a compact sequence of characters that identifies 29 + an abstract or physical resource. *) 30 + type t 31 + 32 + (** URI component types for percent-encoding customization. *) 33 + type component = [ 34 + | `Scheme 35 + | `Authority 36 + | `Userinfo (** subcomponent of authority in some schemes *) 37 + | `Host (** subcomponent of authority in some schemes *) 38 + | `Path 39 + | `Query 40 + | `Query_key 41 + | `Query_value 42 + | `Fragment 43 + | `Generic 44 + | `Custom of (component * string * string) (** (component * safe chars * unsafe chars) *) 45 + ] 46 + 47 + (** For pct encoding customization when converting a URI to a string. *) 48 + type pct_encoder 49 + 50 + (** {2 Core functionality } *) 51 + 52 + val empty : t 53 + (** The empty (zero length) URI reference. Useful for constructing 54 + URIs piece-by-piece. *) 55 + 56 + val compare : t -> t -> int 57 + (** Comparator ordering by host, scheme, port, userinfo, path, query, 58 + and finally fragment. Designed to produce a reasonable sort order. *) 59 + 60 + val equal : t -> t -> bool 61 + (** [equal a b] is [compare a b = 0]. *) 62 + 63 + val pct_encode : ?scheme:string -> ?component:component -> string -> string 64 + (** Percent-encode a string. The [component] argument defaults to `Path *) 65 + 66 + val pct_encoder : 67 + ?scheme:component -> 68 + ?userinfo:component -> 69 + ?host:component -> 70 + ?path:component -> 71 + ?query_key:component -> 72 + ?query_value:component -> 73 + ?fragment:component -> 74 + unit -> 75 + pct_encoder 76 + (** Construct a pct_encoder. *) 77 + 78 + val pct_decode : string -> string 79 + (** Percent-decode a percent-encoded string *) 80 + 81 + val of_string : string -> t 82 + (** Parse a URI string literal into a URI structure. A bare string will be 83 + interpreted as a path; a string prefixed with `//` will be interpreted as a 84 + host. *) 85 + 86 + val to_string : ?pct_encoder:pct_encoder -> t -> string 87 + (** Convert a URI structure into a percent-encoded URI string *) 88 + 89 + val resolve : string -> t -> t -> t 90 + (** [resolve scheme base uri] resolves [uri] against [base] URI using [scheme] 91 + as the default scheme. Per RFC 3986 Section 5. *) 92 + 93 + val canonicalize : t -> t 94 + (** Canonicalize a URI according to Sec 6.2.3 "Scheme-Based 95 + Normalization". This transform is more aggressive than the 96 + standard URI-generic normalization automatically done. In 97 + particular, HTTP(S) URIs with empty path components will have 98 + their path components set to "/". Some applications like web 99 + servers may rely on the distinction between a path-less and a 100 + root-path URI to distinguish request URIs (e.g. OPTIONS * vs 101 + OPTIONS /). 102 + 103 + @see <https://tools.ietf.org/html/rfc3986#section-6.2.3> RFC 3986.6.2.3 104 + *) 105 + 106 + val make : ?scheme:string -> ?userinfo:string -> ?host:string -> 107 + ?port:int -> ?path:string -> ?query:(string * string list) list -> 108 + ?fragment:string -> unit -> t 109 + (** Make a URI from supplied components. If userinfo or port are 110 + supplied without host, an empty host is added. If path is supplied 111 + and userinfo, host, or port is also supplied, path is made 112 + absolute but not resolved. *) 113 + 114 + val with_uri : ?scheme:string option -> ?userinfo:string option -> 115 + ?host:string option -> ?port:int option -> ?path:string option -> 116 + ?query:(string * string list) list option -> ?fragment:string option -> t -> t 117 + (** Functional update for a URI using the supplied components. If a component 118 + is unspecified then it will be unchanged. If a component is supplied as 119 + [None] then the component will be removed in the returned URI. If a 120 + component is supplied as [Some x] then [x] will be added if it does not 121 + exist in the source URI or replaced if it does exist. *) 122 + 123 + (** {2 Query functions } 124 + 125 + The query string API attempts to accommodate conventional query 126 + string representations (i.e. [?key0=value0&key1=value1]) while 127 + maximally exposing any meaning in those representations. For 128 + example, it is not necessarily the case that [/] and [/?] are 129 + equivalent to a web server. In the former case, we observe a zero 130 + query string whereas in the latter case, we observe a query string 131 + with a single key, [""] and a zero value. Compare this with [/?=] 132 + which has a single key and a single empty value, 133 + [""]. Additionally, some query functions return lists of values 134 + for a key. These list values are extracted from a {b single} key 135 + with a comma-separated value list. If a query string has multiple 136 + identical keys, you must use {! query} to retrieve the entirety of 137 + the structured query string. 138 + *) 139 + 140 + val query : t -> (string * string list) list 141 + (** Get a query string from a URI *) 142 + 143 + val verbatim_query : ?pct_encoder:pct_encoder -> t -> string option 144 + (** Get a verbatim query string from a URI. If the provenance of the 145 + URI is a string and its query component has not been updated, this 146 + is the literal query string as parsed. Otherwise, this is the 147 + composition of {!query} and {!encoded_of_query} *) 148 + 149 + val encoded_of_query : 150 + ?scheme:string -> 151 + ?pct_encoder:pct_encoder -> 152 + (string * string list) list -> 153 + string 154 + (** Make a percent-encoded query string from percent-decoded query tuple *) 155 + 156 + val query_of_encoded : string -> (string * string list) list 157 + (** Parse a percent-encoded query string into a percent-decoded query tuple *) 158 + 159 + val with_query : t -> (string * string list) list -> t 160 + (** Replace the query URI with the supplied list. 161 + Input URI is not modified *) 162 + 163 + val with_query' : t -> (string * string) list -> t 164 + (** Replace the query URI with the supplied singleton query list. 165 + Input URI is not modified *) 166 + 167 + val get_query_param' : t -> string -> string list option 168 + (** [get_query_param' q key] returns the list of values for the 169 + [key] parameter in query [q]. Note that an empty list is not the 170 + same as a [None] return value. *) 171 + 172 + val get_query_param: t -> string -> string option 173 + (** [get_query_param q key] returns the value found for a [key] in 174 + query [q]. If there are multiple values for the key, then the 175 + first one is returned. *) 176 + 177 + val add_query_param : t -> (string * string list) -> t 178 + (** Add a query parameter to the input query URI. 179 + Input URI is not modified *) 180 + 181 + val add_query_param' : t -> (string * string) -> t 182 + (** Add a query parameter to the input singleton query URI. 183 + Input URI is not modified *) 184 + 185 + val add_query_params : t -> (string * string list) list -> t 186 + (** Add a query parameter list to the input query URI. 187 + Input URI is not modified *) 188 + 189 + val add_query_params' : t -> (string * string) list -> t 190 + (** Add a query singleton parameter list to the input query URI. 191 + Input URI is not modified *) 192 + 193 + val remove_query_param : t -> string -> t 194 + (** Remove a query key from the input query URI. 195 + Input URI is not modified, and no error is generated if the 196 + key does not already exist in the URI. *) 197 + 198 + (** {2 Component getters and setters } *) 199 + 200 + val path : ?pct_encoder:pct_encoder -> t -> string 201 + (** Get the encoded path component of a URI *) 202 + 203 + val path_and_query : t -> string 204 + (** Get the encoded path and query components of a URI *) 205 + 206 + val with_path : t -> string -> t 207 + (** Replace the path URI with the supplied encoded path. 208 + If a host is present in the supplied URI, the path is made absolute but not 209 + resolved. If the path is empty, the path component is removed. 210 + Input URI is not modified *) 211 + 212 + val scheme : t -> string option 213 + (** Get the scheme component of a URI *) 214 + 215 + val with_scheme : t -> string option -> t 216 + (** Replace the scheme portion of the URI with the supplied [scheme]. 217 + Input URI is not modified *) 218 + 219 + val userinfo : ?pct_encoder:pct_encoder -> t -> string option 220 + (** Get the userinfo component of a URI *) 221 + 222 + val with_userinfo : t -> string option -> t 223 + (** Replace the userinfo portion of the URI with the supplied [string option]. 224 + If no host is present in the supplied URI, an empty host is added. 225 + Input URI is not modified. *) 226 + 227 + val user : t -> string option 228 + (** Get the username component of a URI *) 229 + 230 + val password : t -> string option 231 + (** Get the password component of a URI *) 232 + 233 + val with_password : t -> string option -> t 234 + (** Replace the password portion of the URI with the supplied [string option]. 235 + If no host is present in the supplied URI, an empty host is added. 236 + Input URI is not modified. *) 237 + 238 + val host : t -> string option 239 + (** Get the host component of a URI *) 240 + 241 + val with_host: t -> string option -> t 242 + (** Replace the host component of the URI. 243 + Input URI is not modified. *) 244 + 245 + val host_with_default: ?default:string -> t -> string 246 + (** Get the host component of a URI, with a default supplied if one is 247 + not present *) 248 + 249 + val port : t -> int option 250 + (** Get the port component of a URI *) 251 + 252 + val with_port : t -> int option -> t 253 + (** Replace the port component of the URI with the supplied port. 254 + If no host is present in the supplied URI, an empty host is added. 255 + Input URI is not modified. *) 256 + 257 + val fragment : t -> string option 258 + (** Get the fragment component of a URI *) 259 + 260 + val with_fragment : t -> string option -> t 261 + (** Replace the fragment component of a URI with the supplied fragment. 262 + Input URI is not modified *) 263 + 264 + (** {2 Formatters } *) 265 + 266 + val pp : Format.formatter -> t -> unit 267 + (** [pp ppf t] will output a human readable version of the Uri [t] 268 + to the formatter [ppf] *) 269 + 270 + val pp_hum : Format.formatter -> t -> unit 271 + (** [pp_hum] is now an alias for the {!pp} function. *) 272 + 273 + (** {2 Buf_read Parsers} 274 + 275 + These parsers use Eio.Buf_read for efficient streaming parsing. *) 276 + module Parser : sig 277 + val ipv6 : Eio.Buf_read.t -> string 278 + (** Parse an IPv6 address (without brackets). *) 279 + 280 + val uri_reference : Eio.Buf_read.t -> t 281 + (** Parse a complete URI reference. *) 282 + end 283 + 284 + (** Specializations for HTTP and HTTPS schemes as per RFC9110 *) 285 + module Absolute_http : sig 286 + type uri := t 287 + type t 288 + 289 + val of_uri : uri -> (t, [ `Msg of string ]) result 290 + val to_uri : t -> uri 291 + 292 + val of_string : string -> t 293 + val to_string : ?pct_encoder:pct_encoder -> t -> string 294 + 295 + val make : scheme:[ `Http | `Https ]-> host:string -> 296 + ?userinfo:string -> ?port:int -> ?path:string -> 297 + ?query:(string * string list) list -> ?fragment:string -> unit -> t 298 + 299 + val host : t -> string 300 + val scheme : t -> [`Http | `Https] 301 + end