Fast and robust atproto CAR file processing in rust

update timings etc

+29 -3
readme.md
@@ -71,7 +71,8 @@
 
 current car processing times (records processed into their length usize, phil's dev machine):
 
-- 128MiB CAR file: `350ms`
+- 450MiB CAR file (huge): `1.3s`
+- 128MiB (huge): `350ms`
 - 5.0MiB: `6.8ms`
 - 279KiB: `170us`
 - 3.4KiB: `5.2us`
@@ -85,14 +86,39 @@
 static GLOBAL: MiMalloc = MiMalloc;
 ```
 
-- 128MiB CAR file: `310ms` (-13%)
+- 450MiB CAR file: `1.1s` (-15%)
+- 128MiB: `310ms` (-13%)
 - 5.0MiB: `6.1ms` (-10%)
 - 279KiB: `160us` (-5%)
 - 3.4KiB: `5.7us` (-9%)
 - empty: `660ns` (-7%)
 
+processing CARs requires buffering blocks, so it can consume a lot of memory. repo-stream's in-memory driver has minimal memory overhead, but there are two ways to make it work with less mem (you can do either or both!)
 
-running the huge-car benchmark
+1. spill blocks to disk
+2. inline block processing
+
+#### spill blocks to disk
+
+this is a little slower but can greatly reduce the memory used. there's nothing special you need to do for this.
+
+
+#### inline block processing
+
+if you don't need to store the complete records, you can have repo-stream try to optimistically apply a processing function to the raw blocks as they are streamed in.
+
+
+#### constrained mem perf comparison
+
+sketchy benchmark but hey. mimalloc is enabled, and the processing spills to disk. inline processing reduces entire records to 8 bytes (usize of the raw record block size):
+
+- 450MiB CAR file: `5.0s` (4.5x slowdown for disk)
+- 128MiB: `1.27s` (4.1x slowdown)
+
+fortunately, most CARs in the ATmosphere are very small, so for eg. backfill purposes, the vast majority of inputs will not face this slowdown.
+
+
+#### running the huge-car benchmark
 
 - to avoid committing it to the repo, you have to pass it in through the env for now.
 
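The "inline block processing" idea described in the README text of this diff can be illustrated with a minimal, standard-library-only sketch. It is not repo-stream's API: `Cid`, `buffer_processed`, and the `(Cid, Vec<u8>)` block shape are hypothetical names assumed here for illustration. The point is only that buffering the output of a processing function (here, the block length as a `usize`, as in the benchmark) keeps far less in memory than buffering the raw blocks.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a block's content identifier (not a real CID type).
type Cid = u64;

/// Apply `process` to each raw block as it arrives and buffer only the result.
/// Assumption: blocks are delivered as (cid, raw bytes) pairs by some stream.
fn buffer_processed<T>(
    blocks: impl Iterator<Item = (Cid, Vec<u8>)>,
    process: impl Fn(&[u8]) -> T,
) -> HashMap<Cid, T> {
    let mut buffered = HashMap::new();
    for (cid, bytes) in blocks {
        // `bytes` is dropped at the end of each iteration, so only the
        // processed value stays in memory.
        buffered.insert(cid, process(&bytes));
    }
    buffered
}

fn main() {
    // Two fake blocks of 1 KiB and 4 KiB.
    let blocks = vec![(1u64, vec![0u8; 1024]), (2u64, vec![0u8; 4096])];

    // Reduce every block to its length, mirroring the benchmark's setup.
    let sizes = buffer_processed(blocks.into_iter(), |raw| raw.len());

    assert_eq!(sizes[&1u64], 1024);
    assert_eq!(sizes[&2u64], 4096);
    println!("buffered {} processed values", sizes.len());
}
```

In this sketch the processing function runs on every block unconditionally. The README describes the real behavior as optimistic, so a block that turns out not to be a record would presumably need different handling, which is left out here.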