 [package]
 name = "repo-stream"
-version = "0.2.2"
+version = "0.3.0"
 edition = "2024"
 license = "MIT OR Apache-2.0"
-description = "A robust CAR file -> MST walker for atproto"
+description = "Fast and robust atproto CAR file processing"
 repository = "https://tangled.org/@microcosm.blue/repo-stream"

 [dependencies]
+11
changelog.md
+# v0.3.0
+
+_2026-01-15_
+
+- drop sqlite, pick up fjall v3 for some speeeeeeed (and code simplification and easier build requirements and)
+- no more `Processable` trait, process functions are just `Vec<u8> -> Vec<u8>` now (bring your own ser/de). there's a potential small cost here where processors need to now actually go through serialization even for in-memory car walking, but i think zero-copy approaches (eg. rkyv) are low-cost enough
+- custom deserialize for MST nodes that does as much depth calculation and rkey validation as possible in-line. (not clear if it actually made anything faster)
+- check MST depth at every node properly (previously it could do some walking before being able to check and included some assumptions)
+- check MST for empty leaf nodes (which are not allowed)
+- shave 0.6 nanoseconds (really) from MST depth calculation (don't ask)
+- drop and swap some dependencies: `bincode`, `futures`, `futures-core`, `ipld-core` -> `cid`, `multibase`, `rusqlite` -> `fjall`. and add `hashbrown` bc it benchmarked a bit faster. (we hash on user-controlled CIDs -- is the lower DOS-resistance a risk to worry about?)
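
For readers unfamiliar with the MST depth calculation mentioned in the changelog above, here is a minimal sketch of the rule as described in the atproto repository spec: the depth of a key is the number of leading zero bits in its SHA-256 digest, counted in 2-bit chunks. This is not repo-stream's actual implementation. The function name `depth_from_digest` is hypothetical, and the digest is assumed to have been computed elsewhere (for example with the `sha2` crate) so the sketch needs only the standard library.

```rust
/// Hypothetical helper, not part of repo-stream's API.
/// Takes an already-computed SHA-256 digest of a key and returns its MST depth:
/// the count of leading zero bits, divided by two (2-bit chunks).
fn depth_from_digest(digest: &[u8]) -> u32 {
    let mut zero_bits = 0u32;
    for &byte in digest {
        if byte == 0 {
            // a fully-zero byte contributes 8 leading zero bits; keep scanning
            zero_bits += 8;
        } else {
            // first non-zero byte: add its leading zero bits and stop
            zero_bits += byte.leading_zeros();
            break;
        }
    }
    // only complete 2-bit chunks count toward the depth
    zero_bits / 2
}
```

A walker that checks depth at every node would compare this value for each key against the depth implied by the node's position in the tree and reject the repo on a mismatch.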
+14-9
readme.md
···5858```

 more recent todo
+- [ ] add a zero-copy rkyv process function example
 - [ ] repo car slices
 - [ ] lazy-value stream (rkey -> CID diffing for tap-like `#sync` handling)
 - [x] get an *empty* car for the test suite
 - [x] implement a max size on disk limit

+some ideas
+- [ ] since the disk k/v get/set interface is now so similar to HashMap (blocking, no transactions), it's probably possible to make a single `Driver` and move the thread stuff from the disk one to generic helper functions. (might create async footguns though)
+- [ ] fork iroh-car into a sync version so we can drop tokio as a hard requirement, and offer async via wrapper helper things
+- [ ] feature-flag the sha2 crate for hmac-sha256? if someone wanted fewer deps?? then maybe make `hashbrown` also optional vs builtin hashmap?

 -----

···
 - [x] car file test fixtures & validation tests
 - [x] make sure we can get the did and signature out for verification
  -> yeah the commit is returned from init
-- [ ] spec compliance todos
+- [x] spec compliance todos
  - [x] assert that keys are ordered and fail if not
  - [x] verify node mst depth from key (possibly pending [interop test fixes](https://github.com/bluesky-social/atproto-interop-tests/issues/5))
-- [ ] performance todos
+- [x] performance todos
  - [x] consume the serialized nodes into a mutable efficient format
- - [ ] maybe customize the deserialize impl to do that directly?
+ - [x] maybe customize the deserialize impl to do that directly?
  - [x] benchmark and profile
-- [ ] robustness todos
- - [ ] swap the blocks hashmap for a BlockStore trait that can be dumped to redb
- - [ ] maybe keep the redb function behind a feature flag?
- - [ ] can we assert a max size for node blocks?
+- [x] robustness todos
+ - [x] swap the blocks hashmap for a BlockStore trait that can be dumped to redb
+ - [x] maybe keep the redb function behind a feature flag?
+ - [ ] can we assert a max size of entries for node blocks?
  - [x] figure out why asserting the upper nibble of the fourth byte of a node fails fingerprinting
  -> because it's the upper 3 bytes, not upper 4 byte nibble, oops.
- - [ ] max mst depth (there is actually a hard limit but a malicious repo could do anything)
- - [ ] i don't *think* we need a max recursion depth for processing cbor contents since we leave records to the user to decode
+ - [x] max mst depth (too expensive to attack actually)
+ - [x] i don't *think* we need a max recursion depth for processing cbor contents since we leave records to the user to decode

 newer ideas
