## Allocator implementation
This is the heart of the whole story.
We want some data structure that:

- encapsulates a raw pointer + the size of the region behind it
- keeps track of where we have stopped writing
- has interior mutability for the write pointer
- allows allocating (via a const reference) a chunk of `N` bytes and moves the write pointer
We also need to be careful with alignment: we do support byte arrays that can reserve `2 ^ N` bytes, but most of our data structures have an alignment of 8 (because most of them encapsulate pointers of some kind).
Here, to simplify things, we can introduce a simple (but reasonable) requirement: the region of memory that we take control of must be 8-byte-aligned (i.e. it should be a `*mut usize`), and there should be no way for us to access it as a blob of bytes.
```rust
use core::cell::Cell;

struct Blob<'b> {
    ptr: *mut usize,
    len: Cell<usize>,
    capacity: usize,
    _marker: core::marker::PhantomData<&'b ()>,
}

impl<'b> From<&'b mut [usize]> for Blob<'b> {
    fn from(slice: &'b mut [usize]) -> Self {
        Self {
            ptr: slice.as_mut_ptr(),
            len: Cell::new(0),
            capacity: slice.len(),
            _marker: core::marker::PhantomData,
        }
    }
}
```
The blob can only be constructed from a slice of `usize` (e.g. from a stack-allocated array `[usize; N]` or from a heap-allocated `vec.as_mut_slice()`).
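For illustration, both construction paths look like this (a minimal sketch; the buffer sizes are arbitrary):

```rust
// Stack-backed blob: an array of words is 8-byte-aligned by construction.
let mut stack_mem = [0usize; 1024];
let blob = Blob::from(stack_mem.as_mut_slice());

// Heap-backed blob: a Vec<usize> gives the same alignment guarantee.
let mut vec_mem = vec![0usize; 1024];
let blob = Blob::from(vec_mem.as_mut_slice());
```

Allocation then becomes much simpler: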
```rust
use core::mem::{align_of, size_of};
use core::ptr::NonNull;

impl<'b> Blob<'b> {
    fn write_ptr(&self) -> *mut usize {
        unsafe { self.ptr.add(self.len.get()) }
    }

    fn alloc_uninitialized<T>(&self) -> NonNull<T> {
        assert_eq!(size_of::<T>() % size_of::<usize>(), 0);
        assert_eq!(align_of::<T>(), size_of::<usize>());

        let ptr = self.write_ptr();
        let count = size_of::<T>() / size_of::<usize>();
        self.assert_has_space_for_extra_words(count);
        self.len.set(self.len.get() + count);
        unsafe { NonNull::new_unchecked(ptr as *mut _) }
    }

    fn assert_has_space_for_extra_words(&self, required_words: usize) {
        let left = self.capacity - self.len.get();
        assert!(
            required_words <= left,
            "OOM error: can't allocate {} words, only {} left",
            required_words,
            left
        );
    }
}
```
So instead of operating on a `*const u8` we use a `*const usize`, and thus we can always be sure that our pointer is properly aligned.
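To make the size and alignment assertions concrete, here is a hypothetical node type that satisfies both (the `Node` type and its fields are made up for illustration, assuming a 64-bit target):

```rust
// Two pointer-sized fields: size 16 (a multiple of 8), alignment 8.
struct Node {
    tag: usize,
    child: *const Node,
}

let mut mem = [0usize; 16];
let blob = Blob::from(mem.as_mut_slice());

// Passes both assertions in `alloc_uninitialized`:
// size_of::<Node>() % size_of::<usize>() == 0 and
// align_of::<Node>() == size_of::<usize>().
let node = blob.alloc_uninitialized::<Node>();
```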
Byte allocation (for `ByteArray`) requires some rounding though:
```rust
impl<'b> Blob<'b> {
    pub fn push_bytes(&self, bytes: &[u8]) -> &'b [u8] {
        let len_in_words =
            bytes.len().next_multiple_of(size_of::<usize>()) / size_of::<usize>();
        self.assert_has_space_for_extra_words(len_in_words);
        let ptr = self.write_ptr();
        unsafe {
            core::ptr::copy_nonoverlapping::<u8>(bytes.as_ptr(), ptr.cast(), bytes.len())
        };
        self.len.set(self.len.get() + len_in_words);
        unsafe { core::slice::from_raw_parts(ptr.cast(), bytes.len()) }
    }
}
```
So if we need to write 10 bytes, we reserve 16. The pointer is 8-byte-aligned both before and after writing.
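As a quick sanity check of the rounding math (assuming a 64-bit target where `size_of::<usize>() == 8`):

```rust
// 10 bytes round up to 16 bytes, i.e. 2 full words.
assert_eq!(10usize.next_multiple_of(8), 16);
assert_eq!(10usize.next_multiple_of(8) / 8, 2);

// An exact multiple needs no padding: 24 bytes is exactly 3 words.
assert_eq!(24usize.next_multiple_of(8) / 8, 3);
```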
Once we have these primitives, we need to decide how our data structures must be initialized:
- we can require the underlying memory of the blob to always be zero-initialized (and `panic!` if it's not), so that data structures that support zero initialization can be allocated as-is, via `blob.alloc_uninitialized::<T>()` (see the sketch after this list)
- we can add an explicit constructor to each structure that allocates on a blob and initializes the acquired region with the default state
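For contrast, here is a minimal sketch of what the first option could look like; the `from_zeroed` constructor is hypothetical and not part of the actual implementation:

```rust
impl<'b> Blob<'b> {
    // Option 1 (not the one I went with): refuse memory that is
    // not already all zeros, so that zero-initializable types
    // could be handed out by `alloc_uninitialized` as-is.
    fn from_zeroed(slice: &'b mut [usize]) -> Self {
        assert!(
            slice.iter().all(|&word| word == 0),
            "Blob requires zero-initialized memory"
        );
        Blob::from(slice)
    }
}
```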
I went with the second option; this way the array can be re-used between parser runs with no extra computations (constructing a new `Blob` only resets the write pointer, and stale contents are simply overwritten):
```rust
let mut mem = [0; 1000];
for file in files {
    let blob = Blob::from(&mut mem);
    parse(file.content, &blob);
}
```
However, it requires some extra code in each struct, and `Blob::alloc_uninitialized` must be explicitly marked as a private function so that no code accidentally calls it:
```rust
impl ArenaAllocatedStruct {
    // Having a dedicated method allows other structs
    // to "embed" it and still have the ability to
    // initialize it with default fields.
    //
    // This could also be `impl Default` of course.
    fn new_in_place() -> Self {
        Self {
            // ...
        }
    }

    fn new_in<'b>(blob: &Blob<'b>) -> &'b Self {
        let ptr = blob.alloc_uninitialized::<Self>();
        unsafe {
            // `write` fills the uninitialized memory without
            // reading (or dropping) the garbage that was there.
            ptr.as_ptr().write(Self::new_in_place());
            &*ptr.as_ptr()
        }
    }
}
```
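To show why the embedding comment matters, here is a hedged sketch of a hypothetical `Wrapper` struct that embeds `ArenaAllocatedStruct` via `new_in_place` (the struct and its fields are made up for illustration):

```rust
struct Wrapper {
    inner: ArenaAllocatedStruct,
    extra: usize,
}

impl Wrapper {
    fn new_in<'b>(blob: &Blob<'b>) -> &'b Self {
        let ptr = blob.alloc_uninitialized::<Self>();
        unsafe {
            ptr.as_ptr().write(Self {
                // Embedding works because `new_in_place` builds the
                // value by itself, without touching the blob.
                inner: ArenaAllocatedStruct::new_in_place(),
                extra: 0,
            });
            &*ptr.as_ptr()
        }
    }
}
```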