AST caching
As I mentioned before, the produced AST is guaranteed to reference only other arena-allocated objects, which means that it's possible
- to write the contents of the arena straight to disk
- to read it later and "re-interpret" it back into an AST
The code could look roughly like this:
```rust
use std::fs::File;
use std::io::{Read, Write};
// `Blob`, `Node` and `Parser` come from the parser crate (imports omitted here)

fn main() {
    dump();

    // This code below is meant to be executed later
    let data = read();
    let node = load(&data);
}

// Parses code, dumps it to a temp file, returns nothing
fn dump() {
    let code = b"10 + 20";
    let mut mem = [0; 1000];
    let base_ptr = &mem as *const usize;
    let blob = Blob::from(&mut mem);

    let ast: &Node = Parser::new(code, &blob).parse().ast.unwrap();

    let mut f = File::create("/tmp/ast").unwrap();

    // Offset of the root node from the start of the arena, in bytes
    let root_node_offset = unsafe {
        let node_ptr = ast as *const Node;
        let offset = node_ptr.byte_offset_from(base_ptr);
        assert!(offset > 0);
        offset
    };
    // File layout: 8-byte root node offset, then the raw arena content
    f.write_all(&usize_to_bytes(root_node_offset as usize)).unwrap();
    f.write_all(blob.data()).unwrap();
}

fn read() -> Vec<u8> {
    // Must be the same path that `dump` wrote to
    let mut f = File::open("/tmp/ast").unwrap();
    let mut data = vec![];
    f.read_to_end(&mut data).unwrap();
    data
}

fn load(data: &[u8]) -> &Node {
    let root_node_offset = bytes_to_usize(*data.first_chunk().unwrap());
    // Skip the 8-byte header, the rest is the arena content
    let data = &data[8..];
    unsafe {
        data.as_ptr()
            .add(root_node_offset)
            .cast::<Node>()
            .as_ref()
            .unwrap()
    }
}

// Both helpers assume a 64-bit target, where usize is 8 bytes
fn usize_to_bytes(n: usize) -> [u8; 8] {
    n.to_ne_bytes()
}

fn bytes_to_usize(bytes: [u8; 8]) -> usize {
    usize::from_ne_bytes(bytes)
}
```
This could be a huge advantage for static analysis tools: caching becomes just
- read mtime of the source file
- check if cached AST is newer, read + return if so
- otherwise, parse it and write the result to disk
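The steps above could be sketched roughly like this. It's only an illustration: `cache_is_fresh`, `load_or_parse` and the `parse_to_bytes` callback are hypothetical names that are not part of any library, and the callback stands in for "parse the file and return the bytes that should go to disk". I also assume that a cache with the same mtime as the source counts as fresh.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true if the cache file exists and its mtime is not older
/// than the mtime of the source file.
/// Assumption: equal timestamps are treated as "fresh".
fn cache_is_fresh(source: &Path, cache: &Path) -> io::Result<bool> {
    let source_mtime = fs::metadata(source)?.modified()?;
    match fs::metadata(cache) {
        Ok(meta) => Ok(meta.modified()? >= source_mtime),
        // No cache file yet: not an error, just a cache miss
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}

/// Returns the cached bytes if the cache is fresh.
/// Otherwise runs `parse_to_bytes` (hypothetical stand-in for
/// "parse + dump the arena"), stores the result and returns it.
fn load_or_parse(
    source: &Path,
    cache: &Path,
    parse_to_bytes: impl FnOnce(&[u8]) -> Vec<u8>,
) -> io::Result<Vec<u8>> {
    if cache_is_fresh(source, cache)? {
        return fs::read(cache);
    }
    let code = fs::read(source)?;
    let bytes = parse_to_bytes(&code);
    fs::write(cache, &bytes)?;
    Ok(bytes)
}
```

The returned bytes would then be passed to something like `load` from the example above. On a cache hit the parser isn't invoked at all, which is the whole point.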
The ASTs of multiple source files could probably be packed together into a single binary file (e.g. following the file hierarchy, with one binary file per directory).
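To make the packing idea a bit more concrete, here's a minimal sketch of one possible layout. Everything in it is an assumption made for illustration: the `pack`, `read_chunk` and `find` helpers are hypothetical, and the format (a sequence of length-prefixed path + blob pairs) is just the simplest thing that works. A real format would also have to pad each blob so that it starts at an address suitably aligned for AST nodes, which this sketch ignores.

```rust
/// Packs (path, blob) pairs into a single buffer.
/// Layout of each entry (an assumption, not an existing format):
/// [path_len: u64 LE][path bytes][blob_len: u64 LE][blob bytes]
fn pack(entries: &[(&str, &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    for (path, blob) in entries {
        out.extend_from_slice(&(path.len() as u64).to_le_bytes());
        out.extend_from_slice(path.as_bytes());
        out.extend_from_slice(&(blob.len() as u64).to_le_bytes());
        out.extend_from_slice(blob);
    }
    out
}

/// Reads one length-prefixed chunk.
/// Returns (chunk, remaining bytes), or None if the data is truncated.
fn read_chunk(data: &[u8]) -> Option<(&[u8], &[u8])> {
    if data.len() < 8 {
        return None;
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&data[..8]);
    let len = u64::from_le_bytes(len_bytes) as usize;
    let rest = &data[8..];
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}

/// Linearly scans the pack and returns the blob stored under `path`.
fn find<'a>(mut pack: &'a [u8], path: &str) -> Option<&'a [u8]> {
    while !pack.is_empty() {
        let (name, rest) = read_chunk(pack)?;
        let (blob, rest) = read_chunk(rest)?;
        if name == path.as_bytes() {
            return Some(blob);
        }
        pack = rest;
    }
    None
}
```

With one pack per directory the linear scan in `find` is probably fine, since a directory rarely holds more than a few hundred files.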