I tried multiple ways and found one that is very (VERY) controversial, but I think it deserves its own article.
Let’s say you have a library in C. (It’s C just for simplicity; Rust is not special here.)
How can you use it in C++? A very simple solution is to wrap your header file with
#ifdef __cplusplus
extern "C"
{
#endif
// C bindings
#ifdef __cplusplus
}
#endif
then change your includes to be slightly more compatible with C++ (stdio.h -> cstdio etc.) and call it a day.
How can you use it in Ruby/Node.js? Again, let’s start with the “traditional” solution: for Ruby take your C library, for Node.js take a C++ library, and “wrap” it using a “native” extension.
It is possible to “attach” any arbitrary data to Ruby/Node.js objects:
- Ruby has the TypedData_Wrap_Struct function
- Node.js has Napi::External
In both cases it’s possible to get a pointer to the attached data at any moment, and both require you to specify a function that the GC calls to free the attached data.
Pros: still quite simple. Cons: really error-prone, libraries designed this way quite frequently have memory leaks and segfaults.
Of course it’d be unfair not to mention that it’s always possible to type-cast structures from the low-level language to the high-level one.
A small note on copying: it is possible to “move” some data in certain cases and languages:
- Ruby has rb_str_new_* functions, but literally all of them take const char *ptr, size_t len. However, it’s possible to create a “dummy” heap-allocated Ruby String and set ptr and len on it. This works only (and only) because the internals of all Ruby objects (including String) are fully exposed on the C layer.
- Node.js has a Napi::String class for that, but it takes either a C++ const std::string& or a const char *. As far as I know, the internals are not exposed, so copying is necessary.
- C++ has std::string, which does not have a “take-and-store-what’s-given” constructor that takes a char * (I would call it a “move” constructor, but this name has a different meaning in C++ :D). There’s a const char * constructor that performs copying. Of course, it’s possible to create a compiler-dependent type-cast to a class with the same set of fields, move pointer + len + capacity there, and convert it back to std::string. It’s ugly, but definitely doable.

Let’s start. I’d like to demonstrate a different solution on a tiny example. Let’s write a micro-library that uses several Rust structs: a function that takes a String and returns a Vec of all non-ASCII chars. It’s called foo.
pub fn foo(s: &str) -> Vec<char> {
s.chars().filter(|c| !c.is_ascii()).collect()
}
#[test]
fn test_foo() {
let chars = foo("abc😋中国def");
assert_eq!(chars, vec!['😋', '中', '国']);
}
There are two structs that belong to the Rust world exclusively: Vec and char.
Can we expose these two types in C? Vec is defined in the Rust standard library and it has #[repr(Rust)]; char is a 4-byte type with no public layout.
We could define our own repr(C) structs together with impl From<RustType> for CType and call .into(), but that’s not really the goal here.
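For reference, that rejected approach would look roughly like this (a sketch; CChar is a hypothetical name, not something from the code below):
#[repr(C)]
pub struct CChar {
    bytes: [u8; 4],
}

impl From<char> for CChar {
    fn from(c: char) -> Self {
        let mut bytes = [0; 4];
        // encode_utf8 writes 1-4 bytes, the rest stay zeroed
        c.encode_utf8(&mut bytes);
        CChar { bytes }
    }
}
It works, but it forces a conversion (and often a copy) on every boundary crossing, which is what the rest of the article tries to avoid.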
Let’s think for a moment about C++, Ruby and Node:
- C++ has std::vector for Vec and std::string for char.
- Ruby and Node.js have Array for Vec and String for char.
What if our library could depend on some contract that requires bindings to provide these primitives? The contract will be the same for all bindings, but the implementation will be different.
Rust does not know what a C++ std::vector or a Ruby String is, but we know it, our bindings know it, and by providing a set of foreign utility functions (implemented on the bindings side) we could work with it just like with a native Vec<T>.
Here “primitives” will be these foreign objects that represent our types (Char and CharList below).
“Functions to work with primitives” will be a set of extern "C" functions that take and return these “foreign” objects. Rust can call them without any knowledge of the objects’ internals.
By swapping implementations at link time we can get the same algorithm that works with a different set of structures from different languages. “Link-time polymorphism” seems to be a good name for this concept.
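In Rust terms the whole idea boils down to something like this (a tiny preview of what we’ll build below; the names here are mine):
// an opaque, fixed-size blob; only the bindings side knows what’s inside
#[repr(C)]
pub struct ForeignList {
    bytes: [u8; 16],
}

extern "C" {
    // declared here, implemented by whatever bindings library we link against
    fn foreign_list__new() -> ForeignList;
}

pub fn make_list() -> ForeignList {
    unsafe { foreign_list__new() }
}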
I think it makes sense to start with C, this is what I would like to get eventually:
#ifndef STRUCTS_H
#define STRUCTS_H
#include <stddef.h>
#include <stdbool.h>
typedef struct Char
{
char bytes[4];
} Char;
typedef struct CharList
{
Char *ptr;
size_t len;
} CharList;
#endif // STRUCTS_H
Now the question is: how can we return this CharList from our Rust code? We could use bindgen and something-something. No, no and no.
Instead, let’s make Rust think that CharList on its side is some struct of some (AOT-known) size without any meaningful fields. To do that we need to dump the sizes of our C structs and make them available to Rust:
#include <stdio.h>
#include "structs.h"
int main()
{
printf("CHAR_SIZE=%lu\n", sizeof(Char));
printf("CHAR_LIST_SIZE=%lu\n", sizeof(CharList));
return 0;
}
We compile it, we run it, and we save its output to a text file called sizes. Here’s what I have locally:
CHAR_SIZE=4
CHAR_LIST_SIZE=16
Now our Rust library has to be changed to work in 2 different modes:
- native - when Rust structs are used as fields.
- external - when only the size of structs is known, but fields and their positions are not.
We add a new feature to our Cargo.toml:
[features]
default = []
# enables "external" mode, when structs have only size but no fields
external = []
and we create a build script:
#[cfg(feature = "external")]
fn main() {
// read path of the `sizes` file from the ENV var
let sizes_filepath = env!("SIZES_FILEPATH");
// read `sizes` file
let sizes = std::fs::read_to_string(sizes_filepath)
.expect("SIZES_FILEPATH has to point to a file");
// parse it line by line and re-write to Rust
let sizes_rs = sizes
.lines()
.map(|line| {
let parts = line.split("=").collect::<Vec<_>>();
let name = parts[0];
let value = parts[1];
format!("pub(crate) const {}: usize = {};", name, value)
})
.collect::<Vec<_>>()
.join("\n");
// write it back to sizes.rs
std::fs::write("src/sizes.rs", sizes_rs).unwrap();
}
#[cfg(not(feature = "external"))]
fn main() {
// dummy main for "native" mode when no work is needed
}
Note: there should also be a rerun-if-changed line, but for the sake of simplicity I ignore dependency tracking here.
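For reference, such a line is just one more println! in the build script (a sketch reusing the sizes_filepath variable from above):
// ask Cargo to re-run this build script whenever the sizes file changes
println!("cargo:rerun-if-changed={}", sizes_filepath);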
After running SIZES_FILEPATH=path/to/sizes cargo build --features=external we get src/sizes.rs:
pub(crate) const CHAR_SIZE: usize = 4;
pub(crate) const CHAR_LIST_SIZE: usize = 16;
Time to use it! First we wrap the existing definitions of Char and CharList with
#[cfg(not(feature = "external"))]
mod native {
pub type Char = char;
pub type CharList = Vec<char>;
}
#[cfg(not(feature = "external"))]
use native::{Char, CharList};
(note that we use not(feature = "external") above, which means “native” mode; the two modes are opposites, and so one feature is enough) and then we add a conditionally included external module with all the structs that we use:
#[cfg(feature = "external")]
mod sizes;
#[cfg(feature = "external")]
mod external {
use crate::sizes::*;
#[repr(C)]
pub struct Char {
bytes: [u8; CHAR_SIZE],
}
#[repr(C)]
pub struct CharList {
bytes: [u8; CHAR_LIST_SIZE],
}
}
#[cfg(feature = "external")]
use external::{Char, CharList};
Now if we try to return CharList from our foo function we get compilation errors:
- there’s no way to construct our Char from a Rust char
- there’s no collect() that builds a CharList from an Iterator<char>
How can we implement it? Let’s take a look again at the APIs we use:
- iterating over the chars of a string (a const *u8 in the C API) - we have it
- c.is_ascii() - that’s also OK
- converting a char to a foreign Char - this requires a constructor
- collecting a CharList from chars - this requires methods like CharList::new() and CharList::push()
- tests additionally need CharList::len(), CharList::at(), impl PartialEq for Char (to compare individual chars) and impl std::fmt::Debug for Char (to print the left/right-hand side if an assertion fails)
So, here’s the list of “external” (i.e. in the mod external {} block) methods we want to have:
impl From<char> for Char {
fn from(_: char) -> Self { todo!() }
}
impl PartialEq for Char {
fn eq(&self, other: &Self) -> bool { todo!() }
}
impl std::fmt::Debug for Char {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { todo!() }
}
impl CharList {
pub fn new() -> Self { todo!() }
pub fn push(&mut self, item: Char) { todo!() }
pub fn len(&self) -> usize { todo!() }
pub fn at(&self, idx: usize) -> &Char { todo!() }
}
For now all of them can return todo!(), but we’ll implement them later. Time to rewrite our Rust implementation to use this new language-independent functionality:
pub fn foo(s: &str) -> CharList {
let mut char_list = CharList::new();
for char in s.chars() {
if !char.is_ascii() {
char_list.push(Char::from(char));
}
}
char_list
}
#[test]
fn test_foo() {
let s = "abc😋中国";
let chars = foo(s);
assert_eq!(chars.len(), 3);
assert_eq!(chars.at(0), &Char::from('😋'));
assert_eq!(chars.at(1), &Char::from('中'));
assert_eq!(chars.at(2), &Char::from('国'));
}
That was easy, right? Way less elegant but still easy.
At this point you might have a guess on what’s going to happen next. We are going to declare a bunch of external C-ABI functions and blindly call them in all todo!() places. Later these functions will be implemented by the bindings library on the C side:
extern "C" {
fn char__new(c1: u8, c2: u8, c3: u8, c4: u8) -> Char;
fn char__at(this: *const Char, idx: u8) -> u8;
}
impl Char {
fn byte_at(&self, idx: u8) -> u8 {
debug_assert!(idx <= 3);
unsafe { char__at(self, idx) }
}
}
impl From<char> for Char {
fn from(c: char) -> Self {
let mut buf = [0; 4];
c.encode_utf8(&mut buf);
unsafe { char__new(buf[0], buf[1], buf[2], buf[3]) }
}
}
impl From<&Char> for char {
fn from(c: &Char) -> Self {
let c1 = c.byte_at(0);
let c2 = c.byte_at(1);
let c3 = c.byte_at(2);
let c4 = c.byte_at(3);
let bytes = [c1, c2, c3, c4];
let mut zero_idx = 4;
for idx in 0..4 {
if bytes[idx] == 0 {
zero_idx = idx;
break;
}
}
let bytes = &bytes[0..zero_idx];
let s = std::str::from_utf8(bytes).unwrap();
let chars = s.chars().collect::<Vec<_>>();
debug_assert!(chars.len() == 1);
chars[0]
}
}
impl PartialEq for Char {
fn eq(&self, other: &Self) -> bool {
(0..4).all(|idx| self.byte_at(idx) == other.byte_at(idx))
}
}
impl std::fmt::Debug for Char {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", char::from(self))
}
}
extern "C" {
fn char_list__new() -> CharList;
fn char_list__push(this: *mut CharList, item: Char);
fn char_list__len(this: *const CharList) -> usize;
fn char_list__at(this: *const CharList, idx: usize) -> *const Char;
}
impl CharList {
pub fn new() -> Self {
unsafe { char_list__new() }
}
pub fn push(&mut self, item: Char) {
unsafe { char_list__push(self, item) }
}
pub fn len(&self) -> usize {
unsafe { char_list__len(self) }
}
pub fn at(&self, idx: usize) -> &Char {
unsafe { char_list__at(self, idx).as_ref().unwrap() }
}
}
OK, now we have a contract. Rust relies on it, but C so far does not provide anything. If we try to run tests with --features=external we get a bunch of linkage errors, which is 100% expected. Time to implement it on the C side.
This is a “shared” version of the header that we’ll use for C++/Ruby/Node.js bindings:
#ifndef BINDINGS_H
#define BINDINGS_H
#include DEFINITIONS_FILE
#ifdef __cplusplus
extern "C"
{
#endif
Char_BLOB char__new(uint8_t c1, uint8_t c2, uint8_t c3, uint8_t c4);
uint8_t char__at(const Char_BLOB *self, uint8_t idx);
CharList_BLOB char_list__new();
void char_list__push(CharList_BLOB *self, Char_BLOB item);
size_t char_list__len(const CharList_BLOB *self);
Char_BLOB char_list__at(const CharList_BLOB *self, size_t idx);
#ifdef __cplusplus
}
#endif
#endif // BINDINGS_H
It includes DEFINITIONS_FILE because we want it to be generic (we’ll pass the actual file as a dynamic define via the -D flag). Also you might notice that the functions take/return <type>_BLOB types; that’s because we want to pass C-compatible types around. C types are already C-compatible, and so we make another file bindings-support.h with
#ifndef BINDINGS_SUPPORT_H
#define BINDINGS_SUPPORT_H
#include "structs.h"
typedef Char Char_BLOB;
typedef CharList CharList_BLOB;
#endif // BINDINGS_SUPPORT_H
… to create aliases. For C++ we’ll have to convert our classes to something C understands (I’ll cover it later).
Implementation time!
#include <stdlib.h>
#include <string.h>
#include "bindings-support.h"

Char_BLOB char__new(uint8_t c1, uint8_t c2, uint8_t c3, uint8_t c4)
{
return (Char_BLOB){.bytes = {c1, c2, c3, c4}};
}
uint8_t char__at(const Char_BLOB *self, uint8_t idx)
{
return self->bytes[idx];
}
CharList_BLOB char_list__new()
{
return (CharList_BLOB){.ptr = NULL, .len = 0};
}
void char_list__push(CharList_BLOB *self, Char_BLOB item)
{
    // naive growth: a fresh allocation on every push; O(n^2) overall, but fine for a demo
    Char *prev = self->ptr;
    self->ptr = malloc(sizeof(Char) * (self->len + 1));
    if (self->len > 0)
    {
        memcpy(self->ptr, prev, self->len * sizeof(Char));
        free(prev);
    }
    self->ptr[self->len] = item;
    self->len = self->len + 1;
}
size_t char_list__len(const CharList_BLOB *self)
{
return self->len;
}
Char_BLOB char_list__at(const CharList_BLOB *self, size_t idx)
{
return self->ptr[idx];
}
We compile it
$ clang c-bindings/bindings.c -g -c -o c-bindings/all.o
$ ar rc c-bindings/libbindings.a c-bindings/all.o
And we change our build.rs script to link with it (the purpose of these environment variables is to make the external primitives implementation “pluggable”):
let external_lib_path = env!("EXTERNAL_LIB_PATH");
println!("cargo:rustc-link-search={}", external_lib_path);
let external_lib_name = env!("EXTERNAL_LIB_NAME");
println!("cargo:rustc-link-lib=static={}", external_lib_name);
And now we can run tests:
$ cd rust-lib
$ EXTERNAL_LIB_PATH="../c-bindings" \
EXTERNAL_LIB_NAME="bindings" \
SIZES_FILEPATH="../c-bindings/sizes" \
cargo test --features=external
and we get 1 passing test!
Now to C++. This is what I want:
#ifndef STRUCTS_HPP
#define STRUCTS_HPP
#include <string>
#include <vector>
class Char
{
public:
char bytes[4];
Char() : Char(0, 0, 0, 0) {}
explicit Char(char c1, char c2, char c3, char c4) : bytes{c1, c2, c3, c4} {}
size_t size() const
{
size_t size = 2;
if (bytes[2])
size++;
if (bytes[3])
size++;
return size;
}
std::string as_string() const
{
std::string s;
s.reserve(4);
for (size_t i = 0; i < size(); i++)
{
s.push_back(bytes[i]);
}
return s;
}
};
using CharList = std::vector<Char>;
#endif // STRUCTS_HPP
C functions are the same, but these classes are incompatible with C FFI, so we need to define our blob structs:
#ifndef BINDINGS_SUPPORT_HPP
#define BINDINGS_SUPPORT_HPP
#include <cstdint>
#include <new>     // for placement new used below
#include <utility> // for std::move
#include "structs.hpp"
#define DECLARE_BLOB(T) \
extern "C" \
{ \
struct T##_BLOB \
{ \
uint8_t bytes[sizeof(T)]; \
}; \
} \
union T##_UNION \
{ \
T value; \
T##_BLOB blob; \
\
~T##_UNION() {} \
T##_UNION() \
{ \
new (&value) T(); \
} \
}; \
T##_BLOB PACK_##T(T value) \
{ \
T##_UNION u; \
u.value = std::move(value); \
return u.blob; \
}; \
T UNPACK_##T(T##_BLOB blob) \
{ \
T##_UNION u; \
u.blob = blob; \
return std::move(u.value); \
}
DECLARE_BLOB(Char);
DECLARE_BLOB(CharList);
#endif // BINDINGS_SUPPORT_HPP
Here the DECLARE_BLOB macro for a given Type defines:
- a Type_BLOB struct that is C-compatible
- a Type_UNION union of Type and Type_BLOB
- a PACK_Type function that converts an instance of Type to Type_BLOB, so we can return it from an extern "C" function
- an UNPACK_Type function that converts Type_BLOB to Type, so we can “unpack” a blob that is passed to an extern "C" function
This conversion through a union is an equivalent of std::mem::transmute from Rust (C++20 has std::bit_cast for that, but the union shows better what happens under the hood). Also, we std::move the value to and from the union on conversion; this is important (otherwise, copyable types are copied).
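If the union trick feels magical, here’s a tiny Rust illustration of the same reinterpretation (u64 stands in for a movable value of a known size):
use std::mem::transmute;

#[repr(C)]
struct Blob {
    bytes: [u8; 8],
}

fn main() {
    let x: u64 = 42;
    // reinterpret the bytes of x as a Blob and back, like reading the other union member
    let blob: Blob = unsafe { transmute(x) };
    let back: u64 = unsafe { transmute(blob) };
    assert_eq!(back, 42);
}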
However, it has a requirement for T to be both movable and constructible with no arguments. We could do std::memset(this, 0, sizeof(T)) instead, but some move constructors/assignment operators swap the fields of this and the given other, and so sometimes it’s invalid to call a destructor on an object full of zeroes (of course, it totally depends on the structure of the object).
With these blobs implementing binding functions is trivial:
extern "C"
{
Char_BLOB char__new(uint8_t c1, uint8_t c2, uint8_t c3, uint8_t c4) noexcept
{
return PACK_Char(Char(c1, c2, c3, c4));
}
uint8_t char__at(const Char_BLOB *self, uint8_t idx) noexcept
{
const Char *s = (const Char *)self;
if (idx >= 4)
{
return 0;
}
return s->bytes[idx];
}
CharList_BLOB char_list__new() noexcept
{
return PACK_CharList(CharList());
}
void char_list__push(CharList_BLOB *self, Char_BLOB item) noexcept
{
((CharList *)self)->push_back(UNPACK_Char(item));
}
size_t char_list__len(const CharList_BLOB *self) noexcept
{
return ((CharList *)self)->size();
}
Char_BLOB char_list__at(const CharList_BLOB *self, size_t idx) noexcept
{
return PACK_Char(((CharList *)self)->at(idx));
}
}
We compile and build a static library:
$ clang++ -std=c++17 cpp-bindings/bindings.cpp -g -c -fPIE -o cpp-bindings/all.o
$ ar rc cpp-bindings/libbindings.a cpp-bindings/all.o
One extra step that is specific to C++ libraries: we need to link with the C++ runtime, so this extra code goes to Cargo.toml and build.rs respectively:
[features]
# enables linking with C++ runtime
link-with-cxx-runtime = []
if cfg!(feature = "link-with-cxx-runtime") {
if cfg!(target_os = "linux") {
println!("cargo:rustc-link-lib=dylib=stdc++");
} else {
println!("cargo:rustc-link-lib=dylib=c++");
}
}
And finally we can run our tests:
EXTERNAL_LIB_PATH="../cpp-bindings" \
EXTERNAL_LIB_NAME="bindings" \
SIZES_FILEPATH="../cpp-bindings/sizes" \
cargo test --features=external,link-with-cxx-runtime
We want our Char to be a Ruby String and our CharList to be an Array. In the Ruby C API both are represented as VALUE (which is technically a pointer to a tagged union, unless it’s a small number/true/false/nil, in which case it’s basically the value itself).
#ifndef STRUCTS_H
#define STRUCTS_H
#include <ruby.h>
typedef VALUE Char;
typedef VALUE CharList;
#endif // STRUCTS_H
and so sizes are these:
CHAR_SIZE=8
CHAR_LIST_SIZE=8
VALUE is simply an alias to unsigned long on x86_64, so the blobs are plain aliases too:
#ifndef BINDINGS_SUPPORT_H
#define BINDINGS_SUPPORT_H
#include "structs.h"
typedef Char Char_BLOB;
typedef CharList CharList_BLOB;
#endif // BINDINGS_SUPPORT_H
Bindings implementation (uses Ruby C API):
#include "bindings-support.h"
Char_BLOB char__new(uint8_t c1, uint8_t c2, uint8_t c3, uint8_t c4)
{
char c[4] = {(char)c1, (char)c2, (char)c3, (char)c4};
long len = 2;
if (c3) len++;
if (c4) len++;
return rb_utf8_str_new(c, len);
}
uint8_t char__at(const Char_BLOB *self, uint8_t idx)
{
VALUE this = *self;
return StringValuePtr(this)[idx];
}
CharList_BLOB char_list__new()
{
return rb_ary_new();
}
void char_list__push(CharList_BLOB *self, Char_BLOB item)
{
rb_ary_push(*self, item);
}
size_t char_list__len(const CharList_BLOB *self)
{
return rb_array_len(*self);
}
Char_BLOB char_list__at(const CharList_BLOB *self, size_t idx)
{
return rb_ary_entry(*self, idx);
}
Looks similar to the C++ bindings, right? OK, can we run Rust tests with Ruby primitives? Unfortunately, no. These functions expect the Ruby VM to be up and running, and embedding Ruby is not something many Ruby developers do.
Instead, we need to re-compile it to a dynamically-loaded library that (once loaded) registers a foo method that takes a string, passes its const char * pointer to our foo function defined in Rust, and returns a CharList back to Ruby space.
#include <ruby.h>
#include "structs.h"
CharList c_foo(const char *s);
VALUE rb_foo(VALUE self, VALUE s)
{
(void)self;
Check_Type(s, T_STRING);
CharList chars = c_foo(StringValueCStr(s));
return chars;
}
void Init_foo()
{
rb_define_global_function("foo", rb_foo, 1);
}
c_foo is defined on the Rust side with C linkage:
#[no_mangle]
pub extern "C" fn c_foo(s: *const i8) -> CharList {
let s = unsafe { std::ffi::CStr::from_ptr(s).to_str().unwrap() };
foo(s)
}
And now if we compile it to a .bundle (this is for Mac; on Linux and Windows it’s a .so extension)
$ clang -Ipath/to/ruby/includes ruby-bindings/init.c -c -o ruby-bindings/init.o
$ clang -Ipath/to/ruby/includes ruby-bindings/bindings.c -c -o ruby-bindings/bindings.o
$ ar rc ruby-bindings/libbindings.a ruby-bindings/bindings.o
$ cd rust-lib
$ EXTERNAL_LIB_PATH="../ruby-bindings" \
EXTERNAL_LIB_NAME=bindings \
SIZES_FILEPATH="../ruby-bindings/sizes" \
cargo build --features=external
$ cd ..
$ clang \
-dynamic \
-bundle \
-o ruby-bindings/foo.bundle \
ruby-bindings/init.o rust-foo/librust-foo-rust.a \
-Wl,-undefined,dynamic_lookup
… we get ruby-bindings/foo.bundle that can be required from Ruby:
$ ruby -r./ruby-bindings/foo -e 'p foo("abc😋中国")'
["😋", "中", "国"]
In fact there are many more options passed to clang above; check out the repository if you want to try it yourself.
All versions above have a memory leak that can be easily identified by compiling with
ASAN_OPTIONS=detect_leaks=1 CXXFLAGS="-fsanitize=address" clang++
or (to track it when running Rust tests)
RUSTFLAGS="-Z sanitizer=address" ... cargo test
(On Mac make sure to use clang from Homebrew; the version that ships with the OS does not support ASAN.)
Running tests with these options shows that we have a leak somewhere:
Direct leak of 96 byte(s) in 1 object(s) allocated from:
...
#9 0x105905990 in char_list__push bindings.cpp:34
#10 0x105904c04 in rust_foo::external::CharList::push::hf5bdccdb764b0c62 lib.rs:91
#11 0x1059043bb in rust_foo::foo::hf9387b4435c6f8b6 lib.rs:108
#12 0x10590442d in c_foo lib.rs:117
#13 0x1059006e8 in cpp_foo(char const*) test.cpp:14
#14 0x105900a10 in main test.cpp:24
And the reason is that our CharList is heap-allocated, but it has no destructor on the Rust side. To fix it we need to add 2 more functions to our bindings:
extern "C" {
fn char__drop(this: *mut Char);
fn char_list__drop(this: *mut CharList);
}
impl Drop for Char {
fn drop(&mut self) {
unsafe { char__drop(self) }
}
}
impl Drop for CharList {
fn drop(&mut self) {
unsafe { char_list__drop(self) }
}
}
Of course, we need to add them to bindings.h:
#ifdef __cplusplus
extern "C"
{
#endif
void char__drop(Char_BLOB *self);
void char_list__drop(CharList_BLOB *self);
#ifdef __cplusplus
}
#endif
Here’s the C implementation (depending on your implementation char__drop could do something):
void char__drop(Char_BLOB *self)
{
// noop, Char has no allocations
}
void char_list__drop(CharList_BLOB *self)
{
if (self->len > 0)
{
free(self->ptr);
}
}
C++ implementation:
void char__drop(Char_BLOB *self)
{
// noop, Char has no allocations
}
void char_list__drop(CharList_BLOB *self)
{
((CharList *)self)->~vector();
}
Ruby:
void char__drop(Char_BLOB *self)
{
// noop, Ruby has GC
}
void char_list__drop(CharList_BLOB *self)
{
// noop, Ruby has GC
}
How about performance? When we compile with Rust primitives we use LTO, and so things from the Rust standard library can be optimized together with our code. Luckily, there’s a way to do that for external primitives too.
I would like to demonstrate it on the low level, first let’s compile everything to LLVM IR:
$ clang-13 -S -emit-llvm c-bindings/bindings.c -o c-bindings/bindings.ll
$ grep -F "char__" c-bindings/bindings.ll
define dso_local i32 @char__new(i8 zeroext %0, i8 zeroext %1, i8 zeroext %2, i8 zeroext %3) #0 {
define dso_local zeroext i8 @char__at(%struct.Char* %0, i8 zeroext %1) #0 {
define dso_local void @char__drop(%struct.Char* %0) #0 {
^ that was the file with the bindings implementation, the C part. It defines all char__ functions.
$ cd rust-foo
$ rustc --crate-name rust_foo src/lib.rs --crate-type staticlib --emit=llvm-ir --cfg 'feature="default"' --cfg 'feature="external"'
$ cd ..
$ grep -F "char__" rust-foo/rust_foo.ll
%8 = call i32 @char__new(i8 zeroext %_8, i8 zeroext %_10, i8 zeroext %_12, i8 zeroext %_14)
declare i32 @char__new(i8 zeroext, i8 zeroext, i8 zeroext, i8 zeroext) unnamed_addr #1
^ that was the Rust part. It declares (as external) and calls some of our char__ functions.
Let’s link them together:
$ llvm-link-13 c-bindings/bindings.ll rust-foo/rust_foo.ll -S -o merged.ll
$ grep -F "char__" merged.ll
define dso_local i32 @char__new(i8 zeroext %0, i8 zeroext %1, i8 zeroext %2, i8 zeroext %3) #0 {
define dso_local zeroext i8 @char__at(%struct.Char* %0, i8 zeroext %1) #0 {
define dso_local void @char__drop(%struct.Char* %0) #0 {
%8 = call i32 @char__new(i8 zeroext %_8, i8 zeroext %_10, i8 zeroext %_12, i8 zeroext %_14)
These 2 modules have been merged, time to optimize them together:
$ opt-13 -O3 merged.ll -S -o optimized.ll
$ grep -F "char__" optimized.ll
define dso_local i32 @char__new(i8 zeroext %0, i8 zeroext %1, i8 zeroext %2, i8 zeroext %3) local_unnamed_addr #0 {
define dso_local zeroext i8 @char__at(%struct.Char* nocapture readonly %0, i8 zeroext %1) local_unnamed_addr #1 {
define dso_local void @char__drop(%struct.Char* nocapture %0) local_unnamed_addr #0 {
Now we have only definitions; the actual calls have been successfully inlined. We can compile it to an object file to see the final result:
$ llc-13 -O3 -filetype=obj optimized.ll -o optimized.o
$ clang-13 test.c optimized.o -O3 -o test-runner
$ objdump -D test-runner | grep "call" | grep "char__"
# no output
$ objdump -D test-runner | grep "call" | grep "char_list"
406bba: e8 01 01 00 00 callq 406cc0 <char_list__drop>
406d21: e8 fa fe ff ff callq 406c20 <char_list__new>
As you can see, all char__new and char__at calls have been inlined; char_list__drop and char_list__new haven’t, because of how LLVM decides what should or should not be inlined. Anyway, it works.
Of course, there’s an easier way to get the same result:
$ export RUSTFLAGS="-Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=lld"
$ export CFLAGS="-flto"
^ this should be enough to get the same result. By adding -Clinker-plugin-lto we ask Rust to emit LLVM IR instead of regular object files:
$ RUSTFLAGS="-Clinker-plugin-lto -Clinker=clang-13 -Clink-arg=-fuse-ld=lld" cargo build --features=external
$ mkdir objects
$ cd objects
$ ar x ../target/release/librust_foo.a
$ ls -l
# a ton of object files, that's what static library is about
$ file *.o
... snip ...
popcountti2.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
powixf2.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
rust_foo-0f1418f7365bf15b.2nf5cb5qlud0f6qs.rcgu.o: LLVM IR bitcode
rust_foo-0f1418f7365bf15b.rust_foo.f999abab-cgu.0.rcgu.o: LLVM IR bitcode
rust_foo-0f1418f7365bf15b.rust_foo.f999abab-cgu.1.rcgu.o: LLVM IR bitcode
rust_foo-0f1418f7365bf15b.rust_foo.f999abab-cgu.2.rcgu.o: LLVM IR bitcode
rust_foo-0f1418f7365bf15b.rust_foo.f999abab-cgu.3.rcgu.o: LLVM IR bitcode
rustc_demangle-7f98f837d3579544.rustc_demangle.5563b4d3-cgu.0.rcgu.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
... snip ...
The Rust standard library is compiled directly to object files, but our code is LLVM IR bitcode. This way we can get zero-cost bindings defined externally.
Of course, it’s not gonna work with Ruby or Node.js: both of them have a giant libruby.so (or libnode.so) that defines all functions and constants that your extension relies on. The extension itself is compiled with -Wl,-undefined,dynamic_lookup and symbol lookup is performed at runtime. I feel like technically it’s possible: the entire Ruby/Node.js runtime could be compiled to a static libruby.a/libnode.a that defines all VM objects as external (because those are singletons that we need to hook into), while all functions could ship their implementations (of course, in LLVM IR format), and so they could be inlined into our bindings. I haven’t experimented with it yet, and honestly I’m not going to :) If you know anything about existing discussions around it, please ping me on Twitter.
The demo repository is available here.
To sum up:
Pros:
- the same Rust algorithm works directly with native data structures of each host language
- with LTO, externally-defined primitives can be inlined, i.e. the bindings can be zero-cost
Cons:
- std::string from C++ does not fully match the contract (i.e. you can’t borrow a char * from std::string and expect it to live as long as the string lives). In some cases this can become really hard to track and fix.

So, I’m ready to announce that I have finished working on a new Ruby parser. It’s called lib-ruby-parser.
Key features:
- A typed AST: for example, there’s a CSend node that represents a “conditional send” like foo&.bar. Here’s a list of all defined nodes. Both Ripper and RubyVM::AST have no documentation of their AST format; whitequark/parser has great documentation, but its AST is not “static”.
- Precision: whitequark/parser does a great job, but the nature of a dynamic language does not allow it to provide such guarantees. I’ll show a few examples later.
- Just like whitequark/parser, its lexer (or tokenizer if it sounds better for you) is based on MRI’s parse.y. What does it mean? It means that I was not able to find any difference in tokenizing on 3 million lines of code that I got by pulling the sources of the top 300 gems (by total downloads). I’ll mention how I track it soon.
Current performance (in release mode, with jemalloc) is ~200000 LOC/s. I think it can even be used for syntax highlighting (and in the browser, too, haha).
I don’t want to dig too far, but some notes could be interesting.
Rust is a general purpose language that is based on LLVM (<3) and compiles directly to machine code. It does not need any VM and it can be compiled for a ton of targets (or platforms). My code does not use raw pointers, and there are none of those unsafe calls that you might hear about.
Rust does support ADT (algebraic data type) and it has generics, so you can build data structures like
enum Tree<T> {
Empty,
Some(T, Box<Tree<T>>, Box<Tree<T>>),
}
Beautiful, right?
I designed the Node struct in a very similar way:
enum Node {
Variant1(Variant1),
Variant2(Variant2),
Variant3(Variant3),
// ...
}
struct Variant1 {
// fields
}
struct Variant2 {
// fields
}
// ...
If you are familiar with C/C++ it might look similar to a tagged union, and as far as I know that’s exactly how it is implemented. A close equivalent in C:
struct Node
{
enum {
VARIANT1,
VARIANT2,
} variant_no;
union {
struct {
// variant1 data ...
} variant1_data;
struct {
// variant2 data ...
} variant2_data;
} variant_value;
};
As I said, my lexer is based on MRI’s parse.y. It’s just a set of procedures that I turned into a few structs and interfaces.
But as some of you know, MRI’s parser is compiled using bison (or yacc if you care about licenses). Bison is an LALR(1) parser generator; a short summary:
- you write a .y file using bison’s DSL, with some interpolations in your programming language
- bison converts it to a .{ext} file where ext is whatever your language uses. done
Unfortunately, bison supports only C/C++/Java/D.
First I looked at what’s available in the world of Rust. The most popular LALR parser generator is called lalrpop, and I was very excited about it at the very beginning; I think it has a very, very beautiful API:
pub Term: i32 = {
<n:Num> => n,
"(" <t:Term> ")" => t,
};
Num: i32 = <s:r"[0-9]+"> => i32::from_str(s).unwrap();
Plus, it’s written in Rust, so to compile a grammar based on it you don’t need anything except Rust (which you need anyway to compile something written in Rust, right?).
Unfortunately, I had a few reasons to abandon this idea:
- No mid-rules (that are used A LOT in MRI) like
foo: bar { /* do something */ } baz { /* reduce */ }
I guess it’s possible to emulate them by introducing rules like
bar_with_mid_rule: bar { /* do something */ }
foo: bar_with_mid_rule baz { /* reduce */ }
but then I have no idea how such a grammar can be maintained.
- No easy way to compare its behavior with what I get from running ruby -ye '42' locally.
- lalrpop has a different format compared to bison, and so maintaining it (like backporting new changes from MRI) seems to be a nightmare.
But then I realized that Bison has a feature called “custom skeleton”. It’s like a custom Bison template that you can use to convert .y to your format, and it “takes” all the data (like token/transition tables) as an argument when called.
So I wrote my own skeleton for Rust and wrapped it into a Rust library. It uses the m4 format, which is a macro language. Here’s the main file and an example of how it can be used.
And then it took me about a week to backport the entire parser. The parser stack is a wrapper around Vec<Value> where Value is an enum:
enum Value {
Stolen,
Uninitialized,
None,
Token(Token),
TokenList(TokenList),
Node(Node),
NodeList(NodeList),
Bool(bool),
MaybeStrTerm(MaybeStrTerm),
Num(Num),
/* For custom superclass rule */
Superclass(Superclass),
// ... variants for other custom rules, there are ~25 of them
}
Initially the result of each “reduce” action (which is stored in the yyval variable) is set to Value::Uninitialized. Reading $<Node>5 in your action is compiled into something like
match yystack.steal_value_at(5) {
Value::Node(node) => node,
other => panic!("not a node: {:?}", other)
}
Doing $$ = ... is compiled into yyval = ....
Why does reading “steal” the value from the stack? Because you can do $$ = $<Node>1 and push an element of the vector into the same vector. At the same time you can do something like $$ = combine($<Node>1, $<Node>1) where you want both arguments to be mutable. You can’t do that in Rust.
This is why when you read any stack value you actually steal it by replacing what’s in the stack with Value::Stolen:
impl Stack {
// ...
fn steal_value_at(&mut self, i: usize) -> Value {
let len = self.stack.len();
std::mem::replace(&mut self.stack[len - 1 - i], Value::Stolen)
}
}
Value::Stolen is just a placeholder value that indicates (when you see it) that your code has previously accessed the same stack entry. It’s necessary to have it (or, in general, some kind of default value that is set by std::mem::take/replace) to satisfy the ownership model.
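If the stealing part sounds obscure, here’s a minimal standalone illustration (String::new() plays the role of Value::Stolen):
fn main() {
    let mut stack = vec![String::from("a"), String::from("b")];
    // take ownership of stack[1], leaving a placeholder behind
    let stolen = std::mem::replace(&mut stack[1], String::new());
    assert_eq!(stolen, "b");
    assert_eq!(stack[1], "");
}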
So then it was done and I started profiling. At the very beginning it was incredibly slow, but I knew why: I had way too many .clone() calls in my code (in Rust that’s a deep clone that is quite expensive in some cases). I added jemalloc and started profiling (pprof-rs <3), removed most clones and got ~160-170 thousand LOC/s.
Great, but I wanted more. ~20% of the time in my benchmark was spent on that std::mem::replace call that swaps non-overlapping bytes. Initially I thought that I couldn’t improve it (that’s the fastest way to take the value AND to put a placeholder instead of it). At some point, while writing the C++ bindings, I noticed that sizeof(Node) is 120 bytes (Node here is the C++ Node class) and it literally opened my eyes.
I’ll write it in C, take a look at this structure:
struct Node
{
enum { /* variants */ } variant;
union {
struct Args args;
struct Def def;
struct IfTernary if_ternary;
// .. all other variant values
} variant_value;
};
struct IfTernary
{
Node *cond;
Node *if_true;
Node *if_false;
// .. a few more Range fields, snip
};
(Let’s pretend that it can be compiled without forward declarations.) This C Node is very, very similar to its Rust analogue. What’s the size of this Node?
The size of a struct is the sum of the sizes of its fields (let’s simplify and forget about memory alignment); the size of a union is the maximum of its variants’ sizes.
Some node structures have multiple pointer fields inside (8 bytes each on x86-64), and so the size of the generic Node is huge too.
Let’s “swap” pointers and unions:
struct Node
{
enum { /* variants */ } variant;
union {
struct Args *args;
struct Def *def;
struct IfTernary *if_ternary;
// .. all other variant values
} variant_value;
};
struct IfTernary
{
Node cond;
Node if_true;
Node if_false;
// .. a few more Range fields, snip
};
See? Now the size of the union is always sizeof(void*) and so Node is much smaller.
Why does it matter? Because Vec<T> in Rust is a wrapper around a heap-allocated array of T. It’s a contiguous and “flat” region of memory where all elements are “inlined”:
|item1|item2|item3|item4|...
and every item takes sizeof(T) memory. This is why doing std::mem::replace(t1, t2) has to swap up to sizeof(T) bytes, and this is why I want T to be as small as possible.
After turning my Rust model into
enum Node {
Args(Box<Args>),
IfTernary(Box<IfTernary>),
/// ...
}
struct IfTernary {
cond: Node,
if_true: Node,
if_false: Node,
// ...
}
// other variants
I have got the same performance as Ripper.
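The size difference is easy to check with a toy sketch (Big here stands in for a large node payload; exact sizes depend on the platform, so we only assert the inequality):
use std::mem::size_of;

struct Big([u64; 15]); // 120 bytes, like the old Node

enum Inline { A(Big), B(u8) }
enum Boxed { A(Box<Big>), B(u8) }

fn main() {
    // the inline variant forces every value to reserve room for Big
    println!("inline: {}, boxed: {}", size_of::<Inline>(), size_of::<Boxed>());
    assert!(size_of::<Boxed>() < size_of::<Inline>());
}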
I keep thinking about turning lib-ruby-parser into a zero-copy parser and I believe it’s very possible.
Currently all “values” that come from the source code (like numeric/string literals, token values) are copied into tokens and AST nodes:
"42"
tokens:
Token { name: "tINTEGER", value: "42", loc: 0...2 }
nodes:
Int { value: "42", expression_l: 0..2 }
lib-ruby-parser constructs these "42" strings twice by copying a part of the input. Of course, for this particular case it would be possible to store only ranges like start...end, but there are exceptions where values of tokens and AST nodes are not equal to input sub-strings (like escape sequences, "\n" or "\u1234").
Even this way it’s possible to introduce the following type system:
enum Value<'a> {
SubstringOfInput(&'a [u8]),
ManuallyConstructed(Vec<u8>)
}
The first variant is a reference, the latter is owned. The total value (of both a token and an AST node) could be just a vector of such enums, and if you parse a string "foo\n" you’ll get
tokens:
Token {
name: "tSTRING_CONTENT",
value: vec![
Value::SubstringOfInput(&input[1..3]),
Value::ManuallyConstructed(vec![b'\n'])
],
loc: 1..5
}
However, then input must live as long as the tokens and the AST, and that sounds a bit problematic.
One option that I see is adding an Rc<Input> to such values and storing a range in the SubstringOfInput enum variant. Rc is basically a shared_ptr from the C++ world: it wraps a raw pointer (like T*) plus (a pointer to) the number of existing “clones” of this pointer. Every time you copy it the shared counter is incremented; the destructor decreases it, and once it reaches zero it also deletes the T. It’s quite cheap in terms of performance (something like *n += 1 in the constructor and *n -= 1; drop(ptr) if n == 0; in the destructor).
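A sketch of what that could look like (Input and the field names here are hypothetical, not lib-ruby-parser’s actual API):
use std::rc::Rc;

// hypothetical input holder; the real parser’s type may differ
struct Input {
    bytes: Vec<u8>,
}

enum Value {
    // keeps the whole input alive via Rc and stores only a range into it
    SubstringOfInput { input: Rc<Input>, from: usize, to: usize },
    ManuallyConstructed(Vec<u8>),
}

impl Value {
    fn as_bytes(&self) -> &[u8] {
        match self {
            Value::SubstringOfInput { input, from, to } => &input.bytes[*from..*to],
            Value::ManuallyConstructed(bytes) => bytes,
        }
    }
}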
GitHub repository - https://github.com/lib-ruby-parser/c-bindings
It took me a while to fix all segfaults, memory leaks and invalid memory access patterns (I’m not a C developer). Valgrind and asan are incredibly useful tools; I can’t even imagine how much time it would take to write bindings without these guys.
The API is very similar; there’s an additional layer between C and Rust that converts the output of lib-ruby-parser into C structures.
It uses a combination of enum and union to represent a Node.
https://github.com/lib-ruby-parser/cpp-bindings
I personally like C++ much more than C: smart pointers, containers, generics. Still, for me it’s worse than Rust. Upcoming standards are going to introduce even more features like modules and concepts, but, no.
The same story again: valgrind/asan, plus an extra layer that converts Rust objects to C++ classes.
Also, my valgrind on Mac could not detect calling free on a C++ object (that’s invalid, it should be delete in C++), and so I had to set up a docker container locally to find and fix it.
It uses the modern std::variant<Nodes...> to represent a Node.
As a proof of concept I also created bindings for Node.js - https://github.com/lib-ruby-parser/node-bindings.
I was actually impressed (in a good way) by how elegant the API of node-addon-api is. I worked with the V8 C++ API about 10 months ago in Electron, and back in the day it was quite verbose and painful. Also, I remember an interesting feature of the TryCatch class:
void DoSomething() {
TryCatch trycatch;
// do something that potentially does `throw err`
if (trycatch.HasCaught()) {
// process trycatch.Exception() value
}
}
I personally think that this class abuses constructors/destructors in C++:
- the constructor of TryCatch registers this in some global context (like GetCurrentHandleScope().SetThrowHandler(this))
- throwing an exception invokes the registered TryCatch handler
- the destructor of TryCatch drops it
for example).
And I like that node-addon-api handles it even better by sticking to C++ exceptions.
Node.js bindings are based on C++ bindings from the previous section and they use a custom JavaScript class for each node type. Yes, there’s also an extra layer that converts C++ objects to JavaScript.
Rust can be compiled to WebAssembly, here’s a demo - https://lib-ruby-parser.github.io/wasm-bindings.
It worked out of the box with one minor change: I had to mark the oniguruma dependency as optional and disable it for the WASM build. Oh, I love how Rust can turn a dependency into a feature that is optional and can be enabled/disabled as you configure it.
This is one of the biggest open-source projects that I have ever made. It can be used from Rust/C/C++/Node.js/browser. It’s fast (but remember, it can get even faster), it’s precise and it’s very strongly typed.
Also, if you need a custom setup (like custom C++ classes or maybe a completely different API) there’s a meta-repository with all nodes information. You can use it to build your own bindings (or maybe new bindings for other languages); it has a nodes function that returns a Vec<Node> (Node here is just a “configuration” of a single Node variant from Rust - source).
Bindings still have room for improvements (for example, there’s no way to pass custom ParserOptions to the parser), and the current version of lib-ruby-parser is 0.7.0.
However, I think it’s ready. It’s stable and I’m not going to introduce any major changes anymore. I’m going to cut 3.0.0 once Ruby 3 is released. Don’t hesitate to drop me a message if it works well for you.
This article is about instruction sequences and evaluating them using pure Ruby.
The repository is available here.
Is it a Ruby implementation?
No. It’s just a runner of instructions. It is similar to MRI’s virtual machine, but it lacks many features and it’s 100 times slower.
Can I use it in my applications?
Of course, no. Well, if you want.
Does it work at all?
Yes, and it even passes most language specs from RubySpec test suite.
Well, I think I should start with explaining the basics. There are plenty of articles about it, so I’ll be short:
- your code is parsed into an AST (parse.y)
- the AST is compiled into instruction sequences (compile.c)
- the VM evaluates these instructions (insn.def, vm_eval.c, vm_insnhelper.c)

A long time ago there was no YARV and Ruby used to evaluate the AST directly.
1+2 is just a small syntax tree with a send node in the root and two children: int(1) and int(2). To evaluate it you need to “traverse” it by walking down recursively. Primitive nodes are substituted with values (integers 1 and 2 respectively); the send node calls + on the pre-reduced children and returns 3.
On the other side, YARV is a stack-based Virtual Machine (VM) that evaluates an assembly-like language consisting of more than 100 predefined instructions that store their input/output values on the VM’s stack.
These instructions have arguments; some of them are primitive values, some are inlined Ruby objects, some are flags.
You can view these instructions by running Ruby with the --dump=insns flag:
$ ruby --dump=insns -e 'puts 2 + 3'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)> (catch: FALSE)
0000 putself ( 1)[Li]
0001 putobject 2
0003 putobject 3
0005 opt_plus <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache>
0008 opt_send_without_block <callinfo!mid:puts, argc:1, FCALL|ARGS_SIMPLE>, <callcache>
0011 leave
As you can see, there are 5 instructions:
- putself - pushes self to the top of the stack
- putobject - pushes a given object (numbers 2 and 3)
- opt_plus - an optimized instruction for the + method
- opt_send_without_block - an optimized instruction for a generic method call without a block
- leave - an equivalent of return
Let’s start with the example above.
MRI has an API to compile arbitrary code into instructions:
> pp RubyVM::InstructionSequence.compile("2+3").to_a
["YARVInstructionSequence/SimpleDataFormat",
2,
6,
1,
{:arg_size=>0,
:local_size=>0,
:stack_max=>2,
:node_id=>4,
:code_location=>[1, 0, 1, 3]},
"<compiled>",
"<compiled>",
"<compiled>",
1,
:top,
[],
{},
[],
[1,
:RUBY_EVENT_LINE,
[:putobject, 2],
[:putobject, 3],
[:opt_plus, {:mid=>:+, :flag=>16, :orig_argc=>1}, false],
[:leave]]]
Our code gets compiled into an array where:
- the first item is the name of the serialization format
- then go the MAJOR/MINOR/PATCH parts of the Ruby version that was used to compile it (2, 6, 1 above)
- then a Hash with misc information about the code block
- then the name of the block and the file paths; these are the file/line arguments of {class,module}_eval (Object.module_eval("[__FILE__, __LINE__]", '(file)', 42))
- then the type of the block (:top here means the code was parsed and compiled as a top-level block of code)
- then local variable names, arguments information and a catch table
- and finally the list of instructions
Each instruction is either an array, a number, or a symbol. Only arrays are “real” instructions; numbers and symbols are special debug entries that are used internally by the VM. In our case a number followed by :RUBY_EVENT_LINE is a mark that MRI uses to know the number of the line that is currently being interpreted (for example, all backtrace entries include these numbers).
$stack = []
Then, we need to iterate over instructions (the last item of the iseq) and evaluate them one by one. We could write a giant case-when
, but I promise that it won’t fit on 10 screens. Let’s use some meta-programming and dynamic dispatching:
def execute_insn(insn)
  name, *payload = insn
  # splat the payload so that multi-argument handlers receive separate arguments
  send(:"execute_#{name}", *payload)
end
This way we need to write a method per instruction type. Let’s start with putobject:
def execute_putobject(object)
$stack.push(object)
end
def execute_opt_plus(options, flag)
rhs = $stack.pop
lhs = $stack.pop
$stack.push(lhs + rhs)
end
def execute_leave
# nothing here so far
end
This code is enough to get 5 in the stack once it’s executed. Here’s the runner:
iseq = RubyVM::InstructionSequence.compile("2+3").to_a
insns = iseq[13].grep(Array) # ignore numbers and symbols for now
insns.each { |insn| execute_insn(insn) }
pp $stack
You should see [5].
All instructions above simply pull some values from the stack, do some computations and push the result back.
Let’s think about self for a minute:
> pp RubyVM::InstructionSequence.compile("self").to_a[13]
[1, :RUBY_EVENT_LINE, [:putself], [:leave]]
self is stored somewhere internally, and even more, it’s dynamic:
puts self
class X
puts self
end
The first puts self prints main, while the second one prints X.
Here comes the concept of frames.
A frame is an object inside the virtual machine that represents a closure. Or a binding. It’s an isolated “world” with its own set of locals, its own self, its own file/line information, its own rescue and ensure handlers, etc.
A frame also has a type:
- a top frame wraps all the code in your file. All variables set in the global scope of your file belong to the top frame. Each parsed and evaluated file creates its own top frame.
- a method frame. All methods create it. One method frame per one method call.
- a block frame. All blocks and lambdas create it. One block frame per one block call.
- a class frame; however, it does not mean that instantiating a class creates it. The whole class X; ... end does. Later when you do X.new Ruby does not create any class frames. This frame represents a class definition.
- a rescue frame represents the code inside a rescue => e; <...>; end block
- an ensure frame (well, I’m sure you get it)
- and there are also module (for a module body), sclass (for a singleton class) and a very unique eval frame.
And they are stored internally as a stack.
When you invoke the caller method you see this stack (or some information based on it). Each entry in the error backtrace is based on the state of this stack at the moment the error is thrown.
OK, let’s write some code to extend our VM:
class FrameClass
COMMON_FRAME_ATTRIBUTES = %i[
_self
nesting
locals
file
line
name
].freeze
def self.new(*arguments, &block)
Struct.new(
*COMMON_FRAME_ATTRIBUTES,
*arguments,
keyword_init: true
) do
class_eval(&block)
def self.new(iseq:, **attributes)
instance = allocate
instance.file = iseq.file
instance.line = iseq.line
instance.name = iseq.name
instance.locals = {}
instance.send(:initialize, **attributes)
instance
end
end
end
end
I like builders and I don’t like inheritance (the fact that frames share some common attributes does not mean that they should inherit from an AbstractFrame class; also, I don’t like abstract classes, they are dead by definition).
Here’s the first version of the TopFrame:
TopFrame = FrameClass.new do
def initialize(**)
self._self = TOPLEVEL_BINDING.eval('self')
self.nesting = [Object]
end
def pretty_name
"TOP #{file}"
end
end
Let’s walk through the code:
- _self - what self returns inside the frame
- nesting - what Module.nesting returns (used for the relative constant lookup)
- locals - a set of local variables
- file and line - the currently running __FILE__:__LINE__
- name - a human-readable name of the frame, we will use it mostly for debugging
FrameClass is a builder that is capable of building a custom Frame class (similar to the Struct class). FrameClass.new takes a list of extra frame attributes and a block with methods to define on the built class.
So, the TopFrame class is a Struct-like class that:
- has all common frame attributes (coming from FrameClass.new)
- has a custom _self and nesting
- has a pretty_name instance method
We will create as many classes as we need to cover all kinds of frames (I will return to it later, I promise).
.
Let’s create a wrapper that knows the meaning of array items:
class ISeq
attr_reader :insns
def initialize(ruby_iseq)
@ruby_iseq = ruby_iseq
@insns = @ruby_iseq[13].dup
end
def file
@ruby_iseq[6]
end
def line
@ruby_iseq[8]
end
def kind
@ruby_iseq[9]
end
def name
@ruby_iseq[5]
end
def lvar_names
@ruby_iseq[10]
end
def args_info
@ruby_iseq[11]
end
end
Instance methods are self-descriptive, but just in case:
- file/line return the file/line where the iseq was created
- kind returns a Symbol that we will later use to distinguish frames (:top for a TopFrame)
- insns returns the list of instructions
- name is an internal name of the frame that is used in stacktraces (for class X; end it returns <class:X>)
- lvar_names is an array of all local variable names that are used in the frame
- args_info is a special Hash with meta-information about arguments (empty for all frames except methods)
Frames are organized as a stack internally: every time we enter a frame Ruby pushes it onto this stack. When the frame ends (i.e. when its list of instructions ends or there’s a special [:leave] instruction) Ruby pops it.
class FrameStack
attr_reader :stack
include Enumerable
def initialize
@stack = []
end
def each
return to_enum(:each) unless block_given?
@stack.each { |frame| yield frame }
end
def push(frame)
@stack << frame
frame
end
def push_top(**args)
push TopFrame.new(**args)
end
def pop
@stack.pop
end
def top
@stack.last
end
def size
@stack.size
end
def empty?
@stack.empty?
end
end
Each entry in the stack is a frame that we entered at some point, so we can quickly build a caller:
class BacktraceEntry < String
def initialize(frame)
super("#{frame.file}:#{frame.line}:in `#{frame.name}'")
end
end
stack = FrameStack.new
code = '2 + 3'
iseq = ISeq.new(RubyVM::InstructionSequence.compile(code, 'test.rb', '/path/to/test.rb', 42).to_a)
stack.push_top(iseq: iseq)
caller = stack.map { |frame| BacktraceEntry.new(frame) }.join("\n")
puts caller
# => "/path/to/test.rb:42: in `<compiled>'"
Let’s write it in a “script style” (it is simplified for a good reason, the real code is much more complicated):
iseq = ISeq.new(RubyVM::InstructionSequence.compile_file('path/to/file.rb').to_a)
frame_stack = FrameStack.new
stack = []
frame_stack.push_top(iseq: iseq)
until frame_stack.empty?
current_frame = frame_stack.top
if current_frame.nil?
break
end
if current_frame.insns.empty?
frame_stack.pop
next
end
current_insn = current_frame.insns.shift
execute_insn(current_insn)
end
Generally speaking, the code above is the core of the VM. Once it’s executed, both frame_stack and stack must be empty. I added a bunch of consistency checks in my implementation, but for the sake of simplicity I’m going to omit them here.
I’ll try to be short here, there are about 100 instructions in Ruby, and some of them look similar.
putself, putobject, putnil, putstring, putiseq - all of these guys push a simple object to the top of the stack. putnil pushes the well-known global nil object; the others have an argument that is used in stack.push(argument).

opt_plus - Ruby has a mode (turned on by default) that optimizes some frequently used method calls, like + or .size. It is possible to turn it off by manipulating RubyVM::InstructionSequence.compile_option (if you set :specialized_instruction to false you’ll get a normal method call instead of the specialized instruction).
All of them do one specific thing; here’s an example for opt_size:
def execute_opt_size(_)
push(pop.size)
end
Of course, we do it this way because we cannot optimize it. MRI does a different thing:
- it checks that String#size (or Array#size if it’s an array) is not redefined
- if so, it directly calls rb_str_length (or RARRAY_LEN if it’s an array)
- otherwise it falls back to a regular method call
We could do the same sequence of steps, but we can’t invoke a C function, and so doing the check plus calling .size afterwards is even slower. It’s better for us to fall to the slow branch from the beginning.
You can print all available specialized instructions by running
RubyVM::INSTRUCTION_NAMES.grep(/\Aopt_/)
On Ruby 2.6.4 there are 34 of them.
opt_send_without_block (or send if specialized instructions are disabled) is the instruction that is used to invoke methods. puts 123 looks like this:
[:opt_send_without_block, {:mid=>:puts, :flag=>20, :orig_argc=>1}, false]
It has 2 arguments:
- a Hash where mid is a method ID (the method name), flag is a bitmask with metadata about the invocation, and orig_argc is the number of arguments passed to the method call
- a second argument that becomes CALL_DATA in C. I have no idea what it does.
def execute_opt_send_without_block(options, _)
mid = options[:mid]
args = options[:orig_argc].times.map { pop }.reverse
recv = pop
result = recv.send(mid, *args)
push(result)
end
So here we:
- take the method name (mid) from the options hash
- pop the given arguments (args)
- pop the receiver of the method call (recv)
- perform recv.send(mid, *args) (in our case it’s self.send(:puts, *[123])) and push the result back

I intentionally started with method calls because Ruby defines methods via method calls. Yes.
Ruby has a special singleton object called the Frozen Core. When you define a method via def m; end Ruby invokes frozen_core.send("core#define_method", method_iseq):
$ ruby --dump=insns -e 'def m; end'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)> (catch: FALSE)
0000 putspecialobject 1 ( 1)[Li]
0002 putobject :m
0004 putiseq m
0006 opt_send_without_block <callinfo!mid:core#define_method, argc:2, ARGS_SIMPLE>, <callcache>
0009 leave
The object itself is defined here.
Of course, we don’t have access to the Frozen Core. But we have an instruction that pushes it to the top of the stack, so we can create our own FrozenCore = Object.new.freeze and check if recv is equal to it.
As you may notice there are also putobject :m and putiseq instructions. And the argc of the method call is 2. Hmmm.
core#define_method takes two arguments:
- a method name that is pushed via the putobject instruction
- a method body that is pushed via the putiseq instruction. Yes, an instruction sequence is an argument of another instruction.
Here’s the code:
if recv.equal?(FrozenCore) && mid == :'core#define_method'
method_name, body_iseq = *args
result = __define_method(method_name: method_name, body_iseq: body_iseq)
end
Here’s how __define_method looks:
def __define_method(method_name:, body_iseq:)
parent_nesting = current_frame.nesting
define_on = MethodDefinitionScope.new(current_frame)
define_on.define_method(method_name) do |*method_args, &block|
execute(body_iseq, _self: self, method_args: method_args, block: block, parent_nesting: parent_nesting)
end
method_name
end
When we enter a method in Ruby it inherits the Module.nesting of the frame that defined it. This is why we also copy current_frame.nesting to the method frame.
define_on = MethodDefinitionScope.new(current_frame) is also quite simple:
class MethodDefinitionScope
def self.new(frame)
case frame._self
when Class, Module
frame._self
when TOPLEVEL_BINDING.eval('self')
Object
else
frame._self.singleton_class
end
end
end
- when self is a Class or a Module, the method is defined directly on it
- at the top level, the method is defined on the Object class
- otherwise (inside instance_eval, for example) the method is defined on the singleton class of the object
These code constructions are equivalent:
def m; end
# and
Object.define_method(:m) {}
class X; def m; end; end
# and
X.define_method(:m) {}
o = Object.new
o.instance_eval { def m; end }
# and
o.singleton_class.define_method(:m) {}
Then comes this part:
define_on.define_method(method_name) do |*method_args, &block|
execute(body_iseq, _self: self, method_args: method_args, block: block, parent_nesting: parent_nesting)
end
We define a method that takes any arguments (it breaks Method#parameters, but let’s ignore that) and an optional block, and executes the ISeq of the method body in the context of self.
I admit that it’s a very hacky trick, but it allows us to dynamically assign self.
Plus, we pass all other things that can (and in most cases will) be used in a method body:
- method_args - what was given to this particular invocation of our method
- block - a block given to the method call
- parent_nesting - Module.nesting in the outer scope. We have to store it at the beginning of the method definition because it may change before the method gets called.
execute(iseq, **options) is a tiny wrapper that pushes a frame onto the frame_stack depending on the kind of the given iseq:
def execute(iseq, **payload)
iseq = ISeq.new(iseq)
push_frame(iseq, **payload)
evaluate_last_frame
pop_frame
end
def push_frame(iseq, **payload)
case iseq.kind
when :top
@frame_stack.push_top(
iseq: iseq
)
when :method
@frame_stack.push_method(
iseq: iseq,
parent_nesting: payload[:parent_nesting],
_self: payload[:_self],
arg_values: payload[:method_args],
block: payload[:block]
)
else
raise NotImplementedError, "Unknown iseq kind #{iseq.kind.inspect}"
end
end
There are two most commonly used instructions to get/set locals:
- getlocal
- setlocal
Both take two arguments: the ID of the local variable and the number of frames to skip upwards (you can see both in the handler signatures below).
Here’s an example:
> pp RubyVM::InstructionSequence.compile('a = 10; b = 20; a; b').to_a[13]
[[:putobject, 10],
[:setlocal_WC_0, 4],
[:putobject, 20],
[:setlocal_WC_0, 3],
[:getlocal_WC_0, 3],
[:leave]]
We push 10 to the stack, then we pop it and assign it to a variable with ID = 4 in the current frame (setlocal_WC_0 4 here is a specialized instruction that becomes setlocal 4, 0 when the optimization is turned off).
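If you want to see the generic form yourself, MRI's compiler options can disable this specialization (the option names are real compile options, though the exact dump varies between Ruby versions):
pp RubyVM::InstructionSequence.compile(
  'a = 10; a', nil, nil, 1, operands_unification: false
).to_a[13]
# roughly: [[:putobject, 10], [:setlocal, 3, 0], [:getlocal, 3, 0], [:leave]]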
Here’s the code to maintain locals:
require 'set'
# A simple struct that represents a single local variable;
# has a name, an ID and a value (or no value)
Local = Struct.new(:name, :id, :value, keyword_init: true) do
def get
value
end
def set(value)
self.value = value
value
end
end
# A wrapper around "Set" that holds all locals for some frame;
# Absolutely each frame has its own instance of "Locals"
class Locals
UNDEFINED = Object.new
def UNDEFINED.inspect; 'UNDEFINED'; end
def initialize(initial_names)
@set = Set.new
initial_names.reverse_each.with_index(3) do |arg_name, idx|
# implicit args (like a virtual attribute that holds mlhs value) have numeric names
arg_name += 1 if arg_name.is_a?(Integer)
declare(name: arg_name, id: idx).set(Locals::UNDEFINED)
end
end
def declared?(name: nil, id: nil)
!find_if_declared(name: name, id: id).nil?
end
def declare(name: nil, id: nil)
local = Local.new(name: name, id: id, value: nil)
@set << local
local
end
def find_if_declared(name: nil, id: nil)
if name
@set.detect { |var| var.name == name }
elsif id
@set.detect { |var| var.id == id }
else
raise NotImplementedError, "At least one of name:/id: is required"
end
end
def find(name: nil, id: nil)
result = find_if_declared(name: name, id: id)
if result.nil?
raise InternalError, "No local name=#{name.inspect}/id=#{id.inspect}"
end
result
end
def pretty
@set
.map { |local| ["#{local.name}(#{local.id})", local.value] }
.sort_by { |(name, value)| name }
.to_h
end
end
So locals inside a frame is just a set. It is possible to declare a local, to check whether it's declared, and to get or set its value.
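A quick sanity check of this container (a sketch that assumes the Local/Locals classes above are loaded):
locals = Locals.new([])
locals.declare(name: :a, id: 3).set(10)
locals.find(id: 3).get     # => 10
locals.declared?(name: :b) # => false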
Here’s the implementation of getlocal
:
def execute_getlocal(local_var_id, n)
frame = n.times.inject(current_frame) { |f| f.parent_frame }
local = frame.locals.find(id: local_var_id)
value = local.get
if value.equal?(Locals::UNDEFINED)
value = nil
end
push(value)
end
We jump out n times to get the Nth parent frame, find the local, return nil if it's undefined, and push the value back to the stack (so the result can be used by a subsequent instruction).
Here’s the implementation of setlocal
:
def execute_setlocal(local_var_id, n)
value = pop
frame = n.times.inject(current_frame) { |f| f.parent_frame }
local =
if (existing_local = frame.locals.find_if_declared(id: local_var_id))
existing_local
elsif frame.equal?(current_frame)
frame.locals.declare(id: local_var_id)
else
raise InternalError, 'locals are malformed'
end
local.set(value)
end
This one is a bit more complicated:
- pop the value from the stack (it was pushed by a previous instruction like putobject 10)
- find the local if it's already declared, or declare it in the current frame otherwise (referencing an undeclared local in a parent frame is an internal error)
- set the local to the popped value
Every method has a list of arguments. Yes, sometimes it's empty, but even in such a case we do an arity check. In general, arguments initialization is a part of every method call.
This part of Ruby is really complicated, because we have 12 argument types:
- a required positional argument - def m(x)
- an optional positional argument - def m(x = 42)
- a rest argument - def m(*x)
- a post argument - def m(*, x)
- an mlhs argument (can be used as a post argument too) - def m( (x, *y, z) )
- a required keyword argument - def m(x:)
- an optional keyword argument - def m(x: 42)
- a keyword rest argument - def m(**x)
- a block argument - def m(&x)
- a shadow argument - proc { |;x| } (I did not implement it because I never used it)
- a nil keyword argument (since 2.7) - def m(**nil)
- a forward-all argument - def m(...)
First, let’s take a look at the iseq to see what we have:
> pp RubyVM::InstructionSequence.compile('def m(a, b = 42, *c, d); end').to_a[13][4][1]
["YARVInstructionSequence/SimpleDataFormat",
2,
6,
1,
{:arg_size=>4,
:local_size=>4,
:stack_max=>1,
:node_id=>7,
:code_location=>[1, 0, 1, 28]},
"m",
"<compiled>",
"<compiled>",
1,
:method,
[:a, :b, :c, :d],
{:opt=>[:label_0, :label_4],
:lead_num=>1,
:post_num=>1,
:post_start=>3,
:rest_start=>2},
[],
[:label_0,
1,
[:putobject, 42],
[:setlocal_WC_0, 5],
:label_4,
[:putnil],
:RUBY_EVENT_RETURN,
[:leave]]]
There are two entries that we are interested in:
- the list of local variable names - [:a, :b, :c, :d]
- the hash with arguments information - {:opt=>[:label_0, :label_4], :lead_num=>1, :post_num=>1, :post_start=>3, :rest_start=>2}
Let's prepare and group it first, it's hard to work with such a format:
class CategorizedArguments
attr_reader :req, :opt, :rest, :post, :kw, :kwrest, :block
def initialize(arg_names, args_info)
@req = []
@opt = []
@rest = nil
@post = []
parse!(arg_names.dup, args_info.dup)
end
def parse!(arg_names, args_info)
(args_info[:lead_num] || 0).times do
req << take_arg(arg_names)
end
opt_info = args_info[:opt].dup || []
opt_info.shift
opt_info.each do |label|
opt << [take_arg(arg_names), label]
end
if args_info[:rest_start]
@rest = take_arg(arg_names)
end
(args_info[:post_num] || 0).times do
post << take_arg(arg_names)
end
end
def take_arg(arg_names)
arg_name_or_idx = arg_names.shift
if arg_name_or_idx.is_a?(Integer)
arg_name_or_idx += 1
end
arg_name_or_idx
end
end
I intentionally skip keyword arguments here, but they are not that much different from other types of arguments. The only noticeable difference is that optional keyword arguments have “inlined” default values if they are simple enough (like plain strings or numbers, but not expressions like 2+2
). If you are interested you can go to the repository and check this file.
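To double-check the grouping, here's what CategorizedArguments (defined above) produces for the iseq we dumped earlier; the expected values in the comments are my own reading of the args_info hash:
args = CategorizedArguments.new(
  [:a, :b, :c, :d],
  { opt: [:label_0, :label_4], lead_num: 1, post_num: 1, post_start: 3, rest_start: 2 }
)
args.req  # => [:a]
args.opt  # => [[:b, :label_4]]
args.rest # => :c
args.post # => [:d]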
Then, we should parse arguments and assign them into local variables when we push a method frame (so they are available once we start executing instructions of a method body):
class MethodArguments
attr_reader :args, :values, :locals
def initialize(iseq:, values:, locals:, block: nil)
  @values = values.dup
  @locals = locals
  @iseq = iseq
  # the block is accepted here because frames pass it (see MethodFrame below)
  @block = block
  @args = CategorizedArguments.new(
    iseq.lvar_names,
    iseq.args_info
  )
end
def extract(arity_check: false)
if arity_check && values.length < args.req.count + args.post.count
raise ArgumentError, 'wrong number of arguments (too few)'
end
# Required positional args
args.req.each do |name|
if arity_check && values.empty?
raise ArgumentError, 'wrong number of arguments (too few)'
end
value = values.shift
locals.find(name: name).set(value)
end
# Optional positional args
args.opt.each do |(name, label)|
break if values.length <= args.post.count
value = values.shift
locals.find(name: name).set(value)
VM.jump(label)
end
# Rest positional argument
if (name = args.rest)
value = values.first([values.length - args.post.length, 0].max)
@values = values.last(args.post.length)
locals.find(name: name).set(value)
end
# Required post positional arguments
args.post.each do |name|
if arity_check && values.empty?
raise ArgumentError, 'Broken arguments, cannot extract required argument'
end
value = values.shift
locals.find(name: name).set(value)
end
# Make sure there are no arguments left
if arity_check && values.any?
raise ArgumentError, 'wrong number of arguments (too many)'
end
end
end
Here values
is what we get in a method call in *method_args
, locals
is equal to MethodFrame#locals
that is set to Locals.new
by default.
Let’s write MethodFrame
class!
MethodFrame = FrameClass.new do
attr_reader :arg_values
attr_reader :block
def initialize(parent_nesting:, _self:, arg_values:, block:)
self._self = _self
self.nesting = parent_nesting
@block = block
MethodArguments.new(
iseq: iseq,
values: arg_values,
locals: locals,
block: iseq.args_info[:block_start] ? block : nil
).extract(arity_check: true)
end
def pretty_name
"#{_self.class}##{name}"
end
end
Method frame is just a regular frame that extracts arguments during its initialization.
A regular constant assignment (like A = 1
) is based on a scope (Module.nesting
for relative lookup) and two instructions:
setconstant
getconstant
Both have a single argument - a constant name. But how does Ruby distinguish relative and absolute constant lookup? I mean, what’s the difference between A
and ::A
?
Ruby uses a special instruction to set a “constant scope”:
- in the optimized mode it's opt_getinlinecache that goes before get/setconstant
- in the non-optimized mode it's putnil (that works as a flag)
Let’s take a look at the non-optimized mode (because we can’t optimize it anyway):
> pp RubyVM::InstructionSequence.compile('A; ::B; Kernel::D').to_a[13]
[[:putnil],
[:getconstant, :A],
[:pop],
[:putobject, Object],
[:getconstant, :B],
[:pop],
[:putnil],
[:getconstant, :Kernel],
[:getconstant, :D],
[:leave]]
A
constant performs a relative lookup, so putnil
is used.
::B
constant performs a global lookup on the Object
that is a known object, and so it’s inlined in the putobject
instruction.
Kernel::D
first searches for Kernel
constant locally, then it uses it as a “scope” for a constant D
.
Quite easy, right? Not so fast. Ruby uses Module.nesting to perform a bottom-to-top search. This is why it's so important to maintain the nesting value in frames. Thus, the lookup is performed on current_frame.nesting in reverse order:
def execute_getconstant(name)
scope = pop
search_in = scope.nil? ? current_frame.nesting.reverse : [scope]
search_in.each do |mod|
if mod.const_defined?(name)
const = mod.const_get(name)
push(const)
return
end
end
raise NameError, "uninitialized constant #{name}"
end
If the scope is given (via push in a previous instruction) we use it. Otherwise we have a relative lookup and so we must use current_frame.nesting.reverse.
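For reference, Ruby's own Module.nesting lists scopes innermost-first; the VM here stores frame nesting in the opposite order (note how putspecialobject below takes nesting.last as the "current" scope), which is why the lookup reverses it:
module A
  module B
    p Module.nesting # => [A::B, A] - innermost first
  end
end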
setconstant
is a bit simpler, because it always defines a constant on a scope set by a previous instruction:
> pp RubyVM::InstructionSequence.compile('A = 10; ::B = 20; Kernel::D = 30').to_a[13]
[[:putobject, 10],
[:putspecialobject, 3],
[:setconstant, :A],
[:putobject, 20],
[:putobject, Object],
[:setconstant, :B],
[:putobject, 30],
[:dup],
[:putnil],
[:getconstant, :Kernel],
[:setconstant, :D],
[:leave]]
putspecialobject is an instruction that (when called with 3) pushes the “current” scope.
def execute_putspecialobject(kind)
case kind
when 3
push(current_frame.nesting.last) # push "current" scope
else
raise NotImplementedError, "Unknown special object #{kind}"
end
end
def execute_setconstant(name)
scope = pop
value = pop
scope.const_set(name, value)
end
Instance variables are always picked from the self
of the current frame (they literally look like a simplified version of local variables that are always stored in self
of the current scope):
> pp RubyVM::InstructionSequence.compile('@a = 42; @a').to_a[13]
[[:putobject, 42],
[:setinstancevariable, :@a, 0],
[:getinstancevariable, :@a, 0],
[:leave]]
I guess you know how the code should look like:
def execute_getinstancevariable(name, _)
value = current_frame._self.instance_variable_get(name)
push(value)
end
def execute_setinstancevariable(name, _)
value = pop
current_frame._self.instance_variable_set(name, value)
end
Class variables are similar, but it is possible to access them from instance methods, so we use self if our current frame is a ClassFrame, or self.class otherwise:
> pp RubyVM::InstructionSequence.compile('@@a = 42; @@a').to_a[13]
[[:putobject, 42],
[:setclassvariable, :@@a],
[:getclassvariable, :@@a],
[:leave]]
def execute_setclassvariable(name)
value = pop
klass = current_frame._self
klass = klass.class unless klass.is_a?(Class)
klass.class_variable_set(name, value)
end
def execute_getclassvariable(name)
klass = current_frame._self
klass = klass.class unless klass.is_a?(Class)
value = klass.class_variable_get(name)
push(value)
end
But how can we construct arrays and hashes?
> pp RubyVM::InstructionSequence.compile('[ [:foo,a,:bar], [4,5], 42 ]').to_a[13]
[[:putobject, :foo],
[:putself],
[:send, {:mid=>:a, :flag=>28, :orig_argc=>0}, false, nil],
[:putobject, :bar],
[:newarray, 3],
[:duparray, [4, 5]],
[:putobject, 42],
[:newarray, 3],
[:leave]]
As you can see, the strategy of building an array depends on its dynamicity:
- for [:foo, a, :bar] MRI uses newarray (because a has to be computed at runtime)
- for the static [4, 5] it uses duparray (because it's faster)
The whole outer array is also dynamic (because one of its elements is dynamic). Let's define them:
def execute_duparray(array)
push(array.dup)
end
def execute_newarray(size)
array = size.times.map { pop }.reverse
push(array)
end
Do hashes support inlining?
> pp RubyVM::InstructionSequence.compile('{ primitive: { foo: :bar }, dynamic: { c: d } }').to_a[13]
[[:putobject, :primitive],
[:duphash, {:foo=>:bar}],
[:putobject, :dynamic],
[:putobject, :c],
[:putself],
[:send, {:mid=>:d, :flag=>28, :orig_argc=>0}, false, nil],
[:newhash, 2],
[:newhash, 4],
[:leave]]
Yes! duphash contains an inlined hash that should be pushed to the stack as is. newhash has a numeric argument that represents the number of keys and values in the hash (i.e. keys * 2 or values * 2, there's no difference). And once again, if at least one element of the hash is dynamic, the whole hash is also dynamic and so it uses newhash:
def execute_duphash(hash)
push(hash.dup)
end
def execute_newhash(size)
hash = size.times.map { pop }.reverse.each_slice(2).to_h
push(hash)
end
Why do we need .dup
in duphash
and duparray
? The reason is simple: this instruction can be executed multiple times (if it’s a part of a method or block, for example), and so the same value will be pushed to the stack multiple times. One of the next instructions can modify it but literals have to stay static no matter what. Without using .dup
the code like
2.times do
p [1, 2, 3].pop
end
would print 3
and 2
.
Splat is one of the most beautiful features of Ruby. Splat is foo, bar = *baz
(and also [*foo, *bar]
):
> pp RubyVM::InstructionSequence.compile('a, b = *c, 42').to_a[13]
[[:putself],
[:send, {:mid=>:c, :flag=>28, :orig_argc=>0}, false, nil],
[:splatarray, true],
[:putobject, 42],
[:newarray, 1],
[:concatarray],
[:dup],
[:expandarray, 2, 0],
[:setlocal_WC_0, 4],
[:setlocal_WC_0, 3],
[:leave]]
splatarray
pops the object from the stack, converts it to Array
by calling to_a
(if it’s not an array; otherwise there’s no type casting), and pushes the result back to the stack.
concatarray
constructs an array from two top elements and pushes it back. So it changes the stack [a, b]
to [ [a,b] ]
. If items are arrays it expands and merges them.
expandarray
expands it by doing pop
and pushing items back to the stack. It takes the number of elements that need to be returned, so if an array is bigger it drops some items, if it’s too small - it pushes as many nil
s as needed.
def execute_splatarray(_)
array = pop
array = array.to_a unless array.is_a?(Array)
push(array)
end
def execute_concatarray
last = pop
first = pop
push([*first, *last])
end
def execute_expandarray(size, _flag)
array = pop
if array.size < size
array.push(nil) until array.size == size
elsif array.size > size
array.pop until array.size == size
else
# they are equal
end
array.reverse_each { |item| push(item) }
end
In fact expandarray
is much, much more complicated, you can go to the repository and check it if you want.
Keyword splats (like { **x, **y }
) are really similar to array splats, I’m not going to cover them here.
To handle conditions Ruby uses local goto
(just like in C). Target of the goto
-like instruction is a label:
> pp RubyVM::InstructionSequence.compile("a = b = c = 42; if a; b; else; c; end").to_a[13]
[[:putobject, 42],
[:dup],
[:setlocal_WC_0, 3],
[:dup],
[:setlocal_WC_0, 4],
[:setlocal_WC_0, 5],
[:getlocal_WC_0, 5],
[:branchunless, :label_20],
[:jump, :label_16],
:label_16,
[:getlocal_WC_0, 4],
[:jump, :label_22],
:label_20,
[:getlocal_WC_0, 3],
:label_22,
[:leave]]
Do you see these :label_<NN> symbols? They are used as markers. branchunless takes a single argument: a label to jump to if the value on the top of the stack is false or nil. If it's truthy it does nothing.
def execute_branchunless(label)
cond = pop
unless cond
jump(label)
end
end
def jump(label)
  insns = current_frame.iseq.insns
  # mutate the list of remaining instructions in place:
  # drop everything before the label, then drop the label itself
  insns.shift until insns.first == label
  insns.shift
end
Here we do pop
, check it and call jump
if it’s false
. jump
skips instructions until it sees a given label.
MRI also has branchif and branchnil:
- branchif does if cond as the main check
- branchnil does if cond.nil?
Ruby has a few compile-time optimizations that fold code like
- "a""b"
- "#{'a'}#{'b'}"
into a single string "ab". However, more complicated cases with dynamic interpolation involve a few new instructions:
> pp RubyVM::InstructionSequence.compile('"#{a}#{:sym}"').to_a[13]
[[:putobject, ""],
[:putself],
[:send, {:mid=>:a, :flag=>28, :orig_argc=>0}, false, nil],
[:dup],
[:checktype, 5],
[:branchif, :label_18],
[:dup],
[:send, {:mid=>:to_s, :flag=>20, :orig_argc=>0}, false, nil],
[:tostring],
:label_18,
[:putobject, :sym],
[:dup],
[:checktype, 5],
[:branchif, :label_31],
[:dup],
[:send, {:mid=>:to_s, :flag=>20, :orig_argc=>0}, false, nil],
[:tostring],
:label_31,
[:concatstrings, 3],
[:leave]]
The instructions above are split into sections:
- an empty string literal gets pushed (putobject "")
- a gets computed - a method :a is called via send
- a checktype instruction checks that what's popped is a string (its argument 5 stands for RUBY_T_STRING); it pushes a boolean value back
- to_s is called if the object is not a string
- :sym gets interpolated in the same way
- concatstrings 3 does pop 3 times, concatenates 3 strings and pushes the result back to the stack
First, let's take a look at the checktype instruction:
CHECK_TYPE = ->(klass, obj) {
klass === obj
}.curry
RB_OBJ_TYPES = {
0x00 => ->(obj) { raise NotImplementedError }, # RUBY_T_NONE
0x01 => CHECK_TYPE[Object], # RUBY_T_OBJECT
0x02 => ->(obj) { raise NotImplementedError }, # RUBY_T_CLASS
0x03 => ->(obj) { raise NotImplementedError }, # RUBY_T_MODULE
0x04 => ->(obj) { raise NotImplementedError }, # RUBY_T_FLOAT
0x05 => CHECK_TYPE[String], # RUBY_T_STRING
0x06 => ->(obj) { raise NotImplementedError }, # RUBY_T_REGEXP
0x07 => ->(obj) { raise NotImplementedError }, # RUBY_T_ARRAY
0x08 => ->(obj) { raise NotImplementedError }, # RUBY_T_HASH
0x09 => ->(obj) { raise NotImplementedError }, # RUBY_T_STRUCT
0x0a => ->(obj) { raise NotImplementedError }, # RUBY_T_BIGNUM
0x0b => ->(obj) { raise NotImplementedError }, # RUBY_T_FILE
0x0c => ->(obj) { raise NotImplementedError }, # RUBY_T_DATA
0x0d => ->(obj) { raise NotImplementedError }, # RUBY_T_MATCH
0x0e => ->(obj) { raise NotImplementedError }, # RUBY_T_COMPLEX
0x0f => ->(obj) { raise NotImplementedError }, # RUBY_T_RATIONAL
0x11 => ->(obj) { raise NotImplementedError }, # RUBY_T_NIL
0x12 => ->(obj) { raise NotImplementedError }, # RUBY_T_TRUE
0x13 => ->(obj) { raise NotImplementedError }, # RUBY_T_FALSE
0x14 => ->(obj) { raise NotImplementedError }, # RUBY_T_SYMBOL
0x15 => ->(obj) { raise NotImplementedError }, # RUBY_T_FIXNUM
0x16 => ->(obj) { raise NotImplementedError }, # RUBY_T_UNDEF
0x1a => ->(obj) { raise NotImplementedError }, # RUBY_T_IMEMO
0x1b => ->(obj) { raise NotImplementedError }, # RUBY_T_NODE
0x1c => ->(obj) { raise NotImplementedError }, # RUBY_T_ICLASS
0x1d => ->(obj) { raise NotImplementedError }, # RUBY_T_ZOMBIE
0x1e => ->(obj) { raise NotImplementedError }, # RUBY_T_MOVED
0x1f => ->(obj) { raise NotImplementedError }, # RUBY_T_MASK
}.freeze
def execute_checktype(type)
item_to_check = pop
check = RB_OBJ_TYPES.fetch(type) { raise InternalError, "checktype - unknown type #{type}" }
result = check.call(item_to_check)
push(result)
end
I blindly took it from MRI and yes, this instruction supports many types. I implemented only two of them, but the rest look simple (except imemo and friends). Honestly, I have no idea why, but about 95% of specs from RubySpec (only the language group, I did not check the whole test suite) pass even with these parts missing. I have no idea how to trigger MRI to use them. Maybe it uses them internally?
concatstrings
looks just like newarray
:
def execute_concatstrings(count)
strings = count.times.map { pop }.reverse
push(strings.join)
end
Blocks are passed to method calls as a third argument:
> pp RubyVM::InstructionSequence.compile('m { |a| a + 42 }').to_a[13]
[[:putself],
[:send,
{:mid=>:m, :flag=>4, :orig_argc=>0},
false,
["YARVInstructionSequence/SimpleDataFormat",
2,
6,
1,
{:arg_size=>1,
:local_size=>1,
:stack_max=>2,
:node_id=>7,
:code_location=>[1, 2, 1, 16]},
"block in <compiled>",
"<compiled>",
"<compiled>",
1,
:block,
[:a],
{:lead_num=>1, :ambiguous_param0=>true},
[[:redo, nil, :label_1, :label_9, :label_1, 0],
[:next, nil, :label_1, :label_9, :label_9, 0]],
[1,
:RUBY_EVENT_B_CALL,
[:nop],
:label_1,
:RUBY_EVENT_LINE,
[:getlocal_WC_0, 3],
[:putobject, 42],
[:send, {:mid=>:+, :flag=>16, :orig_argc=>1}, false, nil],
:label_9,
[:nop],
:RUBY_EVENT_B_RETURN,
[:leave]]]]]
Block definitely needs a frame that looks pretty much like a MethodFrame
:
BlockFrame = FrameClass.new(:arg_values) do
def initialize(arg_values:)
self._self = parent_frame._self
self.nesting = parent_frame.nesting
MethodArguments.new(
iseq: iseq,
values: arg_values,
locals: locals,
block: nil
).extract(arity_check: false)
end
def pretty_name
name
end
end
(For simplicity let’s ignore that blocks can also take blocks; also let’s ignore lambdas, we will return to them later)
The code above looks almost like a method frame. The only difference is the arity_check
value that we pass to the MethodArguments
class.
But when should we create this frame? And how can we get a proc from it?
VM_CALL_ARGS_BLOCKARG = (0x01 << 1)
def execute_send(options, flag, block_iseq)
  _self = self
  mid = options[:mid]
  block =
    if block_iseq
      # the block is attached to the call literally; wrap its iseq into a proc
      proc do |*args|
        execute(block_iseq, _self: _self, arg_values: args)
      end
    elsif (flag & VM_CALL_ARGS_BLOCKARG).nonzero?
      # the block is passed via &block; it sits on the stack
      pop
    else
      nil
    end
  args = options[:orig_argc].times.map { pop }.reverse
  recv = pop
  result = recv.send(mid, *args, &block)
  push(result)
end
It's a generalized version of opt_send_without_block (or rather, opt_send_without_block is a specialized implementation of send).
This instruction also pops a receiver and arguments, but what's important, it also computes the block:
- if block_iseq is given, we create a proc that (once called) executes the block's iseq (i.e. the block body) with the given arguments. This block uses self of the place where it was created (i.e. self == proc { self }.call always returns true)
- if there's no block_iseq, the block can be given via a &block argument. MRI marks such a method call with VM_CALL_ARGS_BLOCKARG (this flag is just a bitmask)
An implicit block like b = proc {}; m(&b) does not need any additional implementation. The method proc here takes a block (handled by the first if branch), it gets stored in a local variable, and we pass it to the method as a block argument (the elsif branch).
It’s complicated and I don’t have a complete solution that covers all cases (I guess because MRI does not expose enough APIs to do it. Or I’m just not smart enough).
Arrow lambdas (->(){}) are just a method call to FrozenCore#lambda, and so we can easily determine that it's a lambda and not a proc. But what about lambda {}? It can be overwritten.
An incomplete (and somewhat unreliable) solution is to check that our receiver does not override lambda
method inherited from a Kernel
module:
creating_a_lambda = false
if mid == :lambda
if recv.equal?(FrozenCore)
# ->{} syntax
creating_a_lambda = true
end
if recv.class.instance_method(:lambda).owner == Kernel
if Kernel.instance_method(:lambda) == RubyRb::REAL_KERNEL_LAMBDA
# an original "lambda" method from a Kernel module
creating_a_lambda = true
end
end
end
Then we can set it on our block frame as an attribute.
# in the branch that creates a proc from the `block_iseq`
proc do |*args|
execute(block_iseq, _self: _self, arg_values: args, is_lambda: creating_a_lambda)
end
BlockFrame = FrameClass.new(:arg_values, :is_lambda) do
def initialize(arg_values:, is_lambda:)
self.is_lambda = is_lambda
self._self = parent_frame._self
self.nesting = parent_frame.nesting
MethodArguments.new(
iseq: iseq,
values: arg_values,
locals: locals,
block: nil
).extract(arity_check: is_lambda)
end
end
Arity check is enabled only if our proc is a lambda.
If you remember when we define a method we tell it to save given block in a method frame:
define_on.define_method(method_name) do |*method_args, &block|
execute(body_iseq, _self: self, method_args: method_args, block: block, parent_nesting: parent_nesting)
end
And the frame itself saves it in the attr_reader
.
So both explicit and implicit blocks are available in a method body via current_frame.block
. It’s possible to invoke it by calling block.call(arguments)
(if it’s available as an explicit block argument) or to call yield(arguments)
(in such case it does not even have to be declared in a method signature).
pp RubyVM::InstructionSequence.compile('def m; yield; end').to_a[13][4][1][13]
[[:invokeblock, {:mid=>nil, :flag=>16, :orig_argc=>0}],
[:leave]]
Honestly even before I started working on this article I expected MRI to do something like this. yield
is equivalent to <current block>.call(args)
:
def execute_invokeblock(options)
args = options[:orig_argc].times.map { pop }.reverse
frame = current_frame
frame = frame.parent_frame until frame.can_yield?
result = frame.block.call(*args)
push(result)
end
Do you see frame = frame.parent_frame until frame.can_yield?
? The reason for this line is that you may have a code like
def m
[1,2,3].each { |item| yield item }
end
^ yield
here belongs to the method m
, not to the BlockFrame
of the .each
method. There can be more nested blocks, so we have to go up until we see something that supports yield
. Well, we know that only one frame can do yield
: it’s a MethodFrame
.
Our frame class factory needs to be extended to generate this method by default and return false from it. MethodFrame has to override it and return true. Polymorphism!
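A minimal sketch of that polymorphism (plain classes here instead of the real FrameClass factory, names reused just to show the shape):
class BaseFrame
  # by default frames don't support yield
  def can_yield?
    false
  end
end

class MethodFrame < BaseFrame
  # only method frames own a block that yield can call
  def can_yield?
    true
  end
end

BaseFrame.new.can_yield?   # => false
MethodFrame.new.can_yield? # => true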
Calling super
is very similar to calling yield
: it can be replaced with method(__method__).super_method.call(args)
.
__method__
can be retrieved from current_frame.name
, args
are processed using options[:orig_argc]
:
def execute_invokesuper(options, _, _)
recv = current_frame._self
mid = current_frame.name
super_method = recv.method(mid).super_method
args = options[:orig_argc].times.map { pop }.reverse
result = super_method.call(*args)
push(result)
end
This implementation is incorrect: it can't handle a sequence of super calls (a hierarchy A < B < C where each class has a method that calls super
). I guess it’s possible to implement it by recording the class where the method was defined (i.e. by storing current_frame._self
before calling define_method
and passing it to the MethodFrame
constructor as a defined_in
attribute). This way we could do something like this:
def execute_invokesuper(options, _, _)
recv = current_frame._self
mid = current_frame.name
dispatchers = recv.class.ancestors
current_dispatcher_idx = dispatchers.index(current_frame.defined_in)
next_dispatcher = dispatchers[current_dispatcher_idx + 1]
super_method = next_dispatcher.instance_method(mid).bind(recv)
args = options[:orig_argc].times.map { pop }.reverse
result = super_method.call(*args)
push(result)
end
I did not implement it because MSpec does not rely on it and I usually try to avoid sequences of super
calls.
Similar to locals and instance variables, there are getglobal
/setglobal
instructions. They also take a variable name as an argument.
Unfortunately, Ruby has no API to dynamically get/set global variables. But we have eval
!
def execute_getglobal((name))
push eval(name.to_s)
end
def execute_setglobal((name))
# there's no way to set a gvar by name/value
# but eval can reference locals
value = pop
eval("#{name} = value")
end
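The eval trick works on its own, outside the VM, too: eval given a string can read a global by name, and it can also see the local value in the caller's binding ($demo_gvar is a hypothetical global, just for illustration):
name = :$demo_gvar
value = 42
eval("#{name} = value") # references the local `value`
eval(name.to_s)         # => 42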
The defined? keyword
As you may know, this keyword can handle pretty much anything:
> pp RubyVM::InstructionSequence.compile('defined?(42)').to_a[13]
[[:putobject, "expression"],
[:leave]]
> pp RubyVM::InstructionSequence.compile('a = 42; defined?(a)').to_a[13]
[[:putobject, 42],
[:setlocal_WC_0, 3],
[:putobject, "local-variable"],
[:leave]]
In some simple cases it does not do any computations. It’s obvious that 42
is an expression and a
is a local variable (and there’s no way to remove it by any code between assignment and defined?
check)
More advanced checks use a defined
instruction:
> pp RubyVM::InstructionSequence.compile('@a = 42; defined?(@a)').to_a[13]
[[:putobject, 42],
[:setinstancevariable, :@a, 0],
[:putnil],
[:defined, 2, :@a, true],
[:leave]]
The first argument is a special enum flag that specifies what we are trying to check here:
module DefinedType
DEFINED_NOT_DEFINED = 0
DEFINED_NIL = 1
DEFINED_IVAR = 2
DEFINED_LVAR = 3
DEFINED_GVAR = 4
DEFINED_CVAR = 5
DEFINED_CONST = 6
DEFINED_METHOD = 7
DEFINED_YIELD = 8
DEFINED_ZSUPER = 9
DEFINED_SELF = 10
DEFINED_TRUE = 11
DEFINED_FALSE = 12
DEFINED_ASGN = 13
DEFINED_EXPR = 14
DEFINED_IVAR2 = 15
DEFINED_REF = 16
DEFINED_FUNC = 17
end
I’ll show you the branch that handles instance variables:
def execute_defined(defined_type, obj, needstr)
# used only in DEFINED_FUNC/DEFINED_METHOD branches
# but we still have to do `pop` here (even if it's unused)
context = pop
verdict =
case defined_type
when DefinedType::DEFINED_IVAR
ivar_name = obj
if current_frame._self.instance_variable_defined?(ivar_name)
'instance-variable'
end
# ... other branches
end
push(verdict)
end
All other branches are similar, they do some check and push a constant string or nil
back to the stack.
For static ranges (like (1..2)
) Ruby uses inlining and a well-known putobject
instruction. But what if it's dynamic, like (a..b)?
> pp RubyVM::InstructionSequence.compile('a = 3; b = 4; p (a..b); p (a...b)').to_a[13]
[[:putobject, 3],
[:setlocal_WC_0, 4],
[:putobject, 4],
[:setlocal_WC_0, 3],
[:putself],
[:getlocal_WC_0, 4],
[:getlocal_WC_0, 3],
[:newrange, 0],
[:send, {:mid=>:p, :flag=>20, :orig_argc=>1}, false, nil],
[:pop],
[:putself],
[:getlocal_WC_0, 4],
[:getlocal_WC_0, 3],
[:newrange, 1],
[:send, {:mid=>:p, :flag=>20, :orig_argc=>1}, false, nil],
[:leave]]
There’s a special newrange
instruction that takes a flag as an argument to specify inclusion of the right side (i.e. to distinguish ..
vs ...
)
def execute_newrange(flag)
high = pop
low = pop
push(Range.new(low, high, flag == 1))
end
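The flag simply mirrors the third (exclude_end) argument of Range.new:
Range.new(3, 4, true)  # => 3...4 (newrange 1)
Range.new(3, 4, false) # => 3..4  (newrange 0)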
This is probably the most complicated part. What if you have a method that has a loop inside a loop that does return
? You want to stop executing both loops and simply exit the method, right?
def m
2.times do |i|
3.times do |j|
return [i,j]
end
end
end
m
Of course you can just find the closest frame that supports return
(i.e. a MethodFrame
), but you also need to stop execution of two running methods and blocks. In our case it’s even more complicated because we don’t control them (they are written in C).
The only way I was able to find is to throw an exception. An exception destroys all frames (including YARV's C frames) until it finds someone who can catch and handle it. If there's no such frame the program exits with an error.
Let’s create a special exception class called VM::LongJumpError
. Each frame class has to know what it can handle (for example, you can do break
in a block, but not in a method; return
is normally supported only by methods and lambdas, etc):
class LongJumpError < InternalError
attr_reader :value
def initialize(value)
@value = value
end
def do_jump!
raise InternalError, 'Not implemented'
end
def message
"#{self.class}(#{@value.inspect})"
end
end
class ReturnError < LongJumpError
def do_jump!
frame = current_frame
if frame.can_return?
# swallow and consume
frame.returning = self.value
else
pop_frame(reason: "longjmp (return) #{self}")
raise self
end
end
end
Each longjmp exception wraps the value that it "returns" with (or "breaks" with; for break we need a separate class, which I'll only sketch below - break/next and other friends are really similar to return).
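For completeness, here's a sketch of what such a BreakError could look like, mirroring ReturnError above (can_break? is an assumed predicate, analogous to can_return?):
class BreakError < LongJumpError
  def do_jump!
    frame = current_frame
    if frame.can_break?
      # swallow and consume, just like ReturnError does
      frame.returning = value
    else
      pop_frame(reason: "longjmp (break) #{self}")
      raise self
    end
  end
end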
But we need to catch them, right? Without a rescue
handler we will have something conceptually similar to segfault:
def execute(iseq, **payload)
iseq = ISeq.new(iseq)
push_frame(iseq, **payload)
# here comes the difference:
# we wrap executing instructions into a rescue handler
begin
evaluate_last_frame
rescue LongJumpError => e
e.do_jump!
end
pop_frame
end
The only missing thing is the implementation of the can_return? method in our frames. All frames except MethodFrame (and BlockFrame if it's marked as a lambda) must return false; MethodFrame must return true.
MRI uses a special instruction called throw that has a single argument, a throw_type (an enum; for return it's 1, break is 2, next is 3, there are also retry/redo and a few more - see the constants below). The value that must be attached to the thrown exception comes from the stack (so this instruction does a single pop).
VM_THROW_STATE_MASK = 0xff
RUBY_TAG_NONE = 0x0
RUBY_TAG_RETURN = 0x1
RUBY_TAG_BREAK = 0x2
RUBY_TAG_NEXT = 0x3
RUBY_TAG_RETRY = 0x4
RUBY_TAG_REDO = 0x5
RUBY_TAG_RAISE = 0x6
RUBY_TAG_THROW = 0x7
RUBY_TAG_FATAL = 0x8
RUBY_TAG_MASK = 0xf
def execute_throw(throw_type)
throw_type = throw_type & VM_THROW_STATE_MASK
throw_obj = pop
case throw_type
when RUBY_TAG_RETURN
raise VM::ReturnError, throw_obj
when RUBY_TAG_BREAK
raise VM::BreakError, throw_obj
when RUBY_TAG_NEXT
raise VM::NextError, throw_obj
# ...
end
end
longjmp
in MRIBut does it work in the same way in MRI? C does not have exceptions. And at the same time there is a bunch of places where MRI does something like
if (len > ARY_MAX_SIZE) {
rb_raise(rb_eArgError, "array size too big");
}
// handle validated data
This rb_raise
somehow exits a C function. Well, here’s the trick: non-local goto
.
There are C calls that perform a goto
to any place (that was previously marked of course, similar to jump to labels for a local goto
):
setjmp() saves the stack context/environment in env for later use by longjmp() - also known as a "context switch". And it's relatively expensive.
Even if you don’t raise
an exception and only do begin; ...; rescue; end
in your code you still have to save the context (to jump to it once you raise
an error). MRI does not know at compile time which methods can throw an error (and do you throw them at all), so each rescue
produces a setjmp
call (and each raise
triggers a longjmp
and passes closest rescue
-> saved env
as an argument)
rescue/ensure
So now we know that raise/rescue works via long jumps under the hood. Let's implement our own exceptions.
By sticking to MRI exceptions we can unwrap both internal and our stacks at the same time. I’m not going to override raise
, it should do what it originally does, but we still need to support our own rescue
blocks. Let’s see what MRI gives us:
> pp RubyVM::InstructionSequence.compile('begin; p "x"; rescue A; p "y"; end').to_a
[ # ...snip
[[:rescue,
[ # ...snip
"rescue in <compiled>",
"<compiled>",
"<compiled>",
1,
:rescue,
[:"\#$!"],
{},
[],
[1,
[:getlocal_WC_0, 3],
[:putnil],
[:getconstant, :A],
[:checkmatch, 3],
[:branchif, :label_11],
[:jump, :label_19],
:label_11,
:RUBY_EVENT_LINE,
[:putself],
[:putstring, "y"],
[:send, {:mid=>:p, :flag=>20, :orig_argc=>1}, false, nil],
[:leave],
:label_19,
[:getlocal_WC_0, 3],
[:throw, 0]]],
:label_0,
:label_7,
:label_8,
0],
[:retry, nil, :label_7, :label_8, :label_0, 0]],
[:label_0,
1,
:RUBY_EVENT_LINE,
[:putself],
[:putstring, "x"],
[:send, {:mid=>:p, :flag=>20, :orig_argc=>1}, false, nil],
:label_7,
[:nop],
:label_8,
[:leave]]]
An instruction sequence that has some rescue blocks inside includes all information about them (in the element #12, right above the instructions list). Each rescue handler is a frame with its own list of variables and instructions. Its kind is :rescue and it has at least one local variable: $!. It starts with a dollar sign, but it's a local variable. According to its semantics it has to be a local variable, but unfortunately it can't look like one (because it would potentially conflict with method calls). I mean, that's how I explain it to myself; I don't know for sure what the initial reason was to design it this way.
It also has a few labels at the bottom - :label_7, :label_8, :label_0.
The top-level instruction sequence also contains these labels, and the meaning of them is:
- once an error is raised between the "begin" and "end" labels, we push a RescueFrame with the rescue iseq
- we set the $! variable in this frame to the error that we have just caught
- we jump to the "exit" label after doing pop_frame
Let's code it!
class ISeq
attr_reader :rescue_handlers
attr_reader :ensure_handlers
def initialize(ruby_iseq)
@ruby_iseq = ruby_iseq
reset!
setup_rescue_handlers!
setup_ensure_handlers!
end
# ... other existing methods on ISeq class
def setup_rescue_handlers!
  # note the plural: it has to match the attr_reader above
  @rescue_handlers = @ruby_iseq[12]
    .select { |handler| handler[0] == :rescue }
    .map { |(_, iseq, begin_label, end_label, exit_label)| Handler.new(iseq, begin_label, end_label, exit_label) }
end
def setup_ensure_handlers!
  @ensure_handlers = @ruby_iseq[12]
    .select { |handler| handler[0] == :ensure }
    .map { |(_, iseq, begin_label, end_label, exit_label)| Handler.new(iseq, begin_label, end_label, exit_label) }
end
class Handler
attr_reader :iseq
attr_reader :begin_label, :end_label, :exit_label
def initialize(iseq, begin_label, end_label, exit_label)
@iseq = iseq
@begin_label, @end_label, @exit_label = begin_label, end_label, exit_label
end
end
end
So now each iseq object has two getters:
- rescue_handlers
- ensure_handlers
Frames must know which handlers are active (but not instruction sequences, because methods can recursively call themselves and so the same iseq will be reused; it’s a per-frame property):
class FrameClass
  def self.new(*attrs, &block)
    Struct.new(*attrs) do
      # Both must be set to `Set.new` in the constructor
      attr_accessor :enabled_rescue_handlers
      attr_accessor :enabled_ensure_handlers

      # keep supporting per-frame methods, like FrameClass.new(:arg_values) { ... }
      class_eval(&block) if block
    end
  end
end
So this way all frames have it too. Every time when we see a label in our execution loop we need to check if it matches any begin_label
or end_label
of our current_frame.iseq.rescue_handlers
(or ensure_handlers
):
def on_label(label)
{
current_iseq.rescue_handlers => current_frame.enabled_rescue_handlers,
current_iseq.ensure_handlers => current_frame.enabled_ensure_handlers,
}.each do |all_handlers, enabled_handlers|
all_handlers
.select { |handler| handler.begin_label == label }
.each { |handler| enabled_handlers << handler }
all_handlers
.select { |handler| handler.end_label == label }
.each { |handler| enabled_handlers.delete(handler) }
end
end
side note: when we do a local jump
we should also walk through skipped instructions and enable/disable our handlers; this is important
OK, now the only missing part is the reworked execution loop:
# a generic runner that is used inside the loop
def execute_insn(insn)
  case insn
  when [:leave]
    current_frame.returning = stack.pop
  when Array
    name, *payload = insn
    with_error_handling do
      # splat the operands, so handlers like execute_setlocal(id, n) work
      send(:"execute_#{name}", *payload)
    end
  # -- new branch for labels --
  when Regexp.new("label_\\d+")
    on_label(insn)
  else
    # ignore
  end
end
# a wrapper that catches and handles error
def with_error_handling
yield
rescue VM::InternalError => e
# internal errors like LongJumpError should be invisible for users
raise
rescue Exception => e
handle_error(e)
end
def handle_error(error)
  if (rescue_handler = current_frame.enabled_rescue_handlers.first)
    result = execute(rescue_handler.iseq, caught: error, exit_to: rescue_handler.exit_label)
    stack.push(result)
  else
    raise error
  end
end
# This guy also needs customization to support `jump(exit_label)`
def pop_frame
frame = @frame_stack.pop
if frame.is_a?(RescueFrame)
jump(frame.exit_to)
end
frame.returning
end
# And here's the rescue frame implementation
RescueFrame = FrameClass.new do
attr_reader :parent_frame, :caught, :exit_to
def initialize(parent_frame:, caught:, exit_to:)
self._self = parent_frame._self
self.nesting = parent_frame.nesting
@parent_frame = parent_frame
@caught = caught
@exit_to = exit_to
# $! always has an ID = 3
locals.declare(id: 3, name: :"\#$!")
locals.find(id: 3).set(caught)
end
end
But why do we handle only the first handler? Can there be multiple handlers? The answer is no, because:
- multiple rescue clauses are merged into a single set of rescue handlers (by using case error branching in a rescue body)
- rescue itself is a frame, and so a nested rescue is a rescue handler of the rescue handler
throw/catch methods
As a side (and I personally think a very interesting) note: while I was working on this project I realized that the specs for Kernel#throw
are not working for me at all. They were literally completely broken (even after I finished working on a very basic implementation of exceptions):
catch(:x) do
begin
throw :x
rescue Exception => e
puts e
end
end
This code does not print anything. However, if you do just throw :x
you get an exception:
> throw :x
Traceback (most recent call last):
2: from (irb):14
1: from (irb):14:in `throw'
UncaughtThrowError (uncaught throw :x)
Huh, what’s going on? Let’s take a look at the implementation of Kernel#throw
:
void
rb_throw_obj(VALUE tag, VALUE value)
{
rb_execution_context_t *ec = GET_EC();
struct rb_vm_tag *tt = ec->tag;
while (tt) {
if (tt->tag == tag) {
tt->retval = value;
break;
}
tt = tt->prev;
}
if (!tt) {
VALUE desc[3];
desc[0] = tag;
desc[1] = value;
desc[2] = rb_str_new_cstr("uncaught throw %p");
rb_exc_raise(rb_class_new_instance(numberof(desc), desc, rb_eUncaughtThrow));
}
ec->errinfo = (VALUE)THROW_DATA_NEW(tag, NULL, TAG_THROW);
EC_JUMP_TAG(ec, TAG_THROW);
}
This code raises an instance of rb_eUncaughtThrow
only in one case: if there’s no frame above that may “catch” it (like in the case when we did just throw :x
).
However if there’s a frame somewhere above that has the same tag MRI performs a manual longjmp
. This is why we can’t catch this exception. There’s simply no exception if there’s a catch(:x)
above (but there would be an exception if we would do catch(:y) { throw :x }
).
Is it faster? Let’s see
require 'benchmark/ips'
Benchmark.ips do |x|
x.config(:time => 3)
x.report('raise') do
begin
raise 'x'
rescue => e
end
end
x.report('throw') do
catch(:x) do
throw :x
end
end
x.compare!
end
$ ruby benchmark.rb
Warming up --------------------------------------
raise 101.832k i/100ms
throw 256.893k i/100ms
Calculating -------------------------------------
raise 1.310M (± 4.2%) i/s - 3.971M in 3.037942s
throw 4.853M (± 3.0%) i/s - 14.643M in 3.020227s
Comparison:
throw: 4852821.4 i/s
raise: 1309915.6 i/s - 3.70x slower
As expected, it's faster. But what's more important, let's see what happens if there are more frames between raise/rescue and throw/catch. First, let's create a small method that wraps the given code into N frames:
def in_n_frames(depth = 0, n, blk)
if depth == n
blk.call
else
in_n_frames(depth + 1, n, blk)
end
end
begin
in_n_frames(1000, proc { raise 'err' })
rescue => e
p e.backtrace.length
end
It prints 1003
, because there’s also a TopFrame
and a BlockFrame
(and a small bug that does +1), but that’s absolutely fine for us.
$ ruby benchmark.rb
Warming up --------------------------------------
raise 1.061k i/100ms
throw 1.115k i/100ms
Calculating -------------------------------------
raise 10.628k (± 1.3%) i/s - 32.891k in 3.095347s
throw 11.183k (± 1.5%) i/s - 34.565k in 3.091514s
Comparison:
throw: 11183.1 i/s
raise: 10627.8 i/s - 1.05x slower
There’s almost no difference! The reason is simple: the only thing that is different is creation of the exception object. throw
does not do it.
There are also a few interesting instructions that MRI uses to evaluate your complex code:
- adjuststack(n) - does n.times { pop }
- nop - literally does nothing
- dupn(n) - does pop N times and then pushes them twice (or basically duplicates the N last items)
- setn(n) - does stack[-n-1] = stack.top
- topn(n) - does push(stack[-n-1])
- swap - swaps the top two stack elements
- dup - like dupn(1)
- reverse(n) - reverses N stack elements (i.e. does n.times.map { pop }.each { |value| push(value) })
First of all, I'd like to say thank you to everyone who made YARV. I was not able to find a single place where MRI behaves inefficiently (and I spent many hours looking into instructions).
Once again, the code is available here; feel free to create issues or message me on Twitter. And please, don't use it in the real world.
]]>Disclaimer #2 I’m not going to cover popular things like flip-flops (thanks God they are deprecated in 2.6.0).
I was thinking for a while which item should go first, but finally I had to give up. I think all items are funny.
I don't even know if there's anyone in the world using it. The o flag is a very, very magical thing that "freezes" a regexp after parsing:
pry> 1.upto(5) { |i| puts /#{i}/o.source }
1
1
1
1
1
pry> 3.times.map { |i| /#{i}/o.object_id }
=> [70135960411140, 70135960411140, 70135960411140]
That’s a special syntax to define an inline regexp as a constant. It is a constant because its value is constant (object_id
returns the same value). I think the main purpose of such flag is to reduce objects allocation, and I believe it was not initially designed for such cases. If you are too lazy to extract a static regexp to a constant, simply add an o
flag.
Well, I have to confess, sometimes I hate Ruby for various reasons, this feature is one of them.
# encoding: utf-8
s = "\xff"
puts s.encoding
puts s.valid_encoding?
puts s.bytes
$ ruby test.rb
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
UTF-8
false
255
In this case the string is not a "real" string. This byte sequence is simply invalid for UTF-8 (in UTF-8 any byte > 127 works as a flag that indicates that the char is multi-byte and the next byte (or bytes) defines the real value), but Ruby allows it. It's not even a String, it's just a container of bytes. And for some reason Ruby allows you to pack an arbitrary sequence of bytes into a string, and if you want to ask "Is it valid?" you have a (I think) conceptually wrong String#valid_encoding? method. Maybe the right way to solve it would be to:
- reject such string literals at parse time (with a SyntaxError)
- remove the String#valid_encoding? method
p <<"A#{b}C"
#{
<<"A#{b}C"
A#{b}C
}
str
A#{b}C
I’m quite sure that there are no syntax highlighters that can properly handle this code. At the moment of writing GitHub is unable to do that. Try evaluating this code in IRB.
As you most probably know in Ruby setters can’t have return values. They always return their arguments:
def m=(a)
return 42
end
self.m = 'return me'
# => "return me"
Yes, you can make a return by calling a setter method using Kernel#send
:
pry> send(:m=, 'return me')
=> 42
So, the general rule is like “if you call a setter method without :send you can’t make a return”. Wrong!
pry> def []=(a); 42; end
pry> self.[]= 'return me'
=> 42
I can’t imagine any reason to use such syntax, most probably it should be deprecated.
Imagine the following piece of code:
def [](idx, &block)
puts idx + block.call
end
self[1] { 2 }
Looks valid, right? We pass a positional argument 1
and a block that returns 2
to the method called []
. The method should print 3
. Let’s run it with Ruby 2.5:
$ ruby -v test.rb
2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
3
1 + 2 == 3
, everything is fine. Let’s try 2.4:
$ ruby -v test.rb
ruby 2.4.4p296 (2018-03-28 revision 63013) [x86_64-darwin17]
test.rb:5: syntax error, unexpected { arg, expecting end-of-input
self[1] { 2 }
^
Yes, this syntax was introduced in Ruby 2.5. Did you hear any announcements about it?
Spoiler: there are no tests for this syntax in ruby/ruby repository. Guess why?
Let’s take a simple code:
pry> [1, 2, 3].join
=> "123"
Do you know anything about “pure” functions (or pure methods in this case)? Ideally methods should only use self
and provided arguments. Relying on any global state is bad because you are not the only one who can mutate it.
pry> $, = 'Ruby'
pry> [1, 2, 3].join
=> "1Ruby2Ruby3"
Array#join uses a global variable as a default value for the separator. Literally, def join(sep = $,).
By the way, maybe I should put the following code to my current project before leaving it. How much time is needed to find it?
if rand > 0.5
$, = [102, 105, 110, 100, 32, 109, 101].pack('c*')
end
After doing require 'english' you get two human-readable aliases for this global variable ($, is its real name): $OUTPUT_FIELD_SEPARATOR and $OFS.
Spoiler: you may think that it’s impossible because the parser rejects such code. But in fact, Ruby allows it and there are even some specs for this - RubySpec. I don’t know much about Ruby internals, but at least one class uses instance variables without @
, it’s called Range
. (0..3)
has 3 instance variables:
- excl = false
- begin = 0
- end = 3
“Any proves?” - let’s marshal it:
pry> Marshal.dump(0..3)
=> "\x04\bo:\nRange\b:\texclF:\nbegini\x00:\bendi\b"
This string contains a version (4.8), an indicator of the object (o
), a symbol :Range
, a hash of instance variables { excl: false, begin: 0, end: 3 }
.
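Reading that dump byte by byte (my own annotation of the 4.8 format; small integers and lengths are stored with an offset of 5):
# "\x04\b"   -> format version 4.8
# "o"        -> an object with instance variables follows
# ":\nRange" -> a symbol: length 5 ("\n" is 10, 10 - 5 = 5), then "Range"
# "\b"       -> 3 instance variables ("\b" is 8, 8 - 5 = 3)
# ":\texcl"  -> symbol "excl" (4 chars), then "F" -> false
# ":\nbegin" -> symbol "begin" (5 chars), then "i\x00" -> integer 0
# ":\bend"   -> symbol "end" (3 chars), then "i\b" -> integer 3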
Let’s change a class name a bit (but keep the same length to not break anything):
pry> class Hello; end
pry> Marshal.load("\x04\bo:\nHello\b:\texclF:\nbegini\x00:\bendi\b")
^^^^^
=> #<Hello:0x00007fece707a9c8>
Now we can change, for example, excl
to @one
(again, to keep the length the same):
pry> Marshal.load("\x04\bo:\nHello\b:\t@oneF:\nbegini\x00:\bendi\b")
^^^^
=> #<Hello:0x00007fece70533a0 @one=false>
pry> _.instance_variables
=> [:@one]
The conclusion is simple: excl
is an instance variable, but Kernel#instance_variables
hides it.
Kernel#instance_variable_get / set
and Ruby Lexer are the places that validate instance variable names. Low-level C calls don’t do it and in general when you write a C extension you may easily get an instance variable without @
char.
You can read my article about marshalling to get a full overview of its internals.
As you may know there are two types of coercing in Ruby: explicit and implicit.
Explicit is when you call methods like to_a
, to_h
, to_s
. When the object is not an Array/Hash/String, but can become it.
Implicit is when Ruby calls methods like to_ary
, to_hash
, to_str
for you. When the object acts as an Array/Hash/String and converting it to the corresponding class must happen automatically.
There are a lot of methods in the core library that are documented as “taking a String as an argument” but in fact they accept any objects that can be implicitly converted to a String.
pry> o = Object.new
pry> def o.to_str; "hello"; end
pry> "string" + o
=> "stringhello"
pry> "hello".casecmp(o)
=> 0
pry> "testhello".chomp(o)
=> "test"
pry> "hellostr".delete_prefix(o)
=> "str"
pry> "hello world".start_with?(o)
=> true
There are more methods like this, and not only for String.
Also, there's a way to implicitly convert an abstract Object to:
- an Array (when it's splatted with *)
- a Hash (when it's double-splatted with **)
- a Proc (when it's passed as a block with &)
Sometimes it can be ridiculous:
class User
def to_ary; [1, 2, 3]; end
def to_hash; { a: 1 }; end
def to_proc; proc { 42 }; end
end
def m(*rest, **kwargs, &block)
rest.length + kwargs.length + block.call
end
user = User.new
m(*user, **user, &user)
# => 44
I’m not sure that this feature is required. But remember, that’s only my opinion.
Explicit coercing is explicit and forces you to call to_a/to_h/to_s
manually. Probably it would be better to restrict */**/&
operators to accept only Array/Hash/Proc
objects (and to be as strict as possible).
to_a
The previous section says that Ruby never invokes methods for explicit coercing on its own. There's one exception: the to_a method.
o = Object.new
def o.to_a; [1, 2, 3]; end
def o.to_ary; [4, 5, 6]; end
a, b, c = o
p [a, b, c]
# [4, 5, 6]
# so, it calls to_ary, that's an implicit coercing
a, b, c = *o
p [a, b, c]
# [1, 2, 3]
# it calls to_a !! an explicit coercing gets called by Ruby
For some reason the concept of implicit/explicit coercing does not work for this case.
The section about nested HEREdocs shows a HEREdoc identifier that has an interpolation inside. Also, it’s possible to use "\n"
:
p <<"HERE
"
content
HERE
it prints "content\n"
. For some reason newline is not allowed in the middle of the HEREdoc identifier (and don’t get me wrong, I think that newlines should be rejected, no matter in the middle or in the end).
1if true
Yes, that’s a valid syntax. Ruby has very special rules for white-spaces and newlines. 1i
is a special syntax for complex numbers, but 1if true
is 1 if true
. There’s also a 1r
syntax for rational numbers, and yes, 1rescue nil
is 1 rescue nil
.
pry> 1if true
1
pry> 1rescue nil
1
But what about 1ri
?
pry> 1rif true
SyntaxError: (syntax error, unexpected tIDENTIFIER, expecting end-of-input)
1rif true
^~~
Sweet. Bonus:
pry> def m; 1return; end
SyntaxError : (syntax error, unexpected keyword_return, expecting keyword_end)
def m; 1return; end
^~~~~~
pry> def m; 1retry; end
SyntaxError: (syntax error, unexpected keyword_retry, expecting keyword_end)
def m; 1retry; end
^~~~~
pry> def m; 1redo; end
SyntaxError: (syntax error, unexpected keyword_redo, expecting keyword_end)
def m; 1redo; end
^~~~
Looks like there are special rules for keyword modifiers.
defined?
I think this is the most controversial keyword in Ruby. It takes literally everything as an argument.
pry> defined?(self)
=> "self"
pry> defined?(nil)
=> "nil"
pry> defined?(true)
=> "true"
pry> defined?(false)
=> "false"
pry> defined?(a = 1)
=> "assignment"
pry> a
=> nil
pry> a = 1; defined?(a)
=> "local-variable"
pry> defined?(begin; 1; 2; 3; end)
=> "expression"
pry> defined?(self.m)
=> nil
pry> def m; end; defined?(self.m)
=> "method"
pry> module M; def m; end; end
pry> include M
pry> def m; defined?(super); end
pry> m
=> "super"
It also can return yield
, constant
, class variable
, instance-variable
and global-variable
. By the way, where’s the dash in the class variable
?
That’s a strong violation of a single responsibility principle. This keyword can handle EVERYTHING!
Moreover, it handles all kinds of exceptions inside:
pry> a = Object.new
pry> defined?(
a.b.c.d +
MissingConstant +
yield +
super +
nil * 2 +
eval("!@#$%^") +
require('missing_file')
)
=> nil
That’s too much for a single keyword.
return in the class/module body
You can't call return from a module/class body:
pry> class A; return 1; end
SyntaxError (Invalid return in class/module body)
class A; return 1; end
^~~~~~
pry> module A; return 1; end
SyntaxError (Invalid return in class/module body)
module A; return 1; end
^~~~~~
It throws a SyntaxError
, i.e. even if the code is unreachable, you still can’t write it, it’s simply invalid.
But you can use return
in a singleton class body:
pry> class << self; return; end
LocalJumpError: unexpected return
Now that's a LocalJumpError, so the code is syntactically valid and only fails if it actually runs:
pry> class << self; return; end if false
=> nil
Again, this is something that probably could be removed from Ruby, I don’t know anyone using it.
Meta-character is a special sequence of characters that gets interpreted as a single character. Most probably you know one of them - \uDDDD
. But there are more:
pry> "\u1234"
=> "ሴ"
pry> "\377"
=> "\xFF"
pry> "\xFF"
=> "\xFF"
pry> "\C-\a"
=> "\a"
pry> "\ca"
=> "\u0001"
pry> "\M-a"
=> "\xE1"
pry> "\C-\M-f"
=> "\x86"
pry> "\M-\cf"
=> "\x86"
pry> "\c\M-f"
=> "\x86"
That’s absolutely insane! Moreover, Ruby starting from 2.6 ignores spaces (and tabs) around codepoints in the \u{}
syntax:
pry> "\u{ 123 456 }"
=> "ģі"
MRI has a special rule for Proc
class: it expands a single array argument:
pry> proc { |a, b| [a, b] }.call([1, 2])
=> [1, 2]
… but only if the proc takes more than one argument. And if the arity is 1 it works as you'd expect:
pry> proc { |a| [a] }.call([1, 2])
=> [[1, 2]]
And here’s an edge case: it’s possible to put a trailing comma after arguments list:
pry> proc { |a,| [a] }.call([1, 2])
=> [1]
… and MRI still expands an array. So how many arguments does this proc have?
pry> proc{|a,|}.arity
=> 1
What’s going on?
The answer is simple: there’s an invisible rest argument after trailing comma. The real interface of this proc is:
proc { |a, *| }
MRI generates it for you and then hides it.
If you are interested in implementation details take a look at parse.y - there’s a special field excessed_comma
that works as a flag.
Also, you can clearly see it in Ripper's output:
pry> require 'ripper'
pry> Ripper.sexp('proc{|a|}')[1][0][2][1][1]
=> [:params, [[:@ident, "a", [1, 6]]], nil, nil, nil, nil, nil, nil]
pry> Ripper.sexp('proc{|a,|}')[1][0][2][1][1]
=> [:params, [[:@ident, "a", [1, 6]]], nil, 0, nil, nil, nil, nil]
Do you see the difference?
optarg default values
In Ruby, optional arguments are very, very powerful. You can pass pretty much anything as a default value of the argument in the method signature:
pry> def m(a = (puts 'no a')); a; end
pry> m(1)
=> 1
pry> m
no a
=> nil
I don’t like it in general. I would say it’s too powerful, you can abuse this feature and do some really crazy stuff. For example like this:
def m(a = (return 1; nil))
return 2
end
What does this method return when you call it without any arguments? Yep, it returns 1
.
The reason why it works this way is that MRI inlines optional arguments initialization into the method body, so for the VM this code actually looks like:
def m(a = nil)
a ||= (return 1; nil)
return 2
end
You can even go further and redefine a method in its arguments:
def factorial(
n,
redefinition = <<-RUBY,
define_method(__method__) do |
_ = (return 1 if n == 1; nil),
_ = eval(redefinition),
_ = (return n * (n -= 1; send(__method__)); nil)
|
end
RUBY
_ = eval(redefinition),
_ = (return send(__method__); nil)
)
# <<EMPTY BODY>>
end
p factorial(5)
Yes, this method has no body but is still capable of calculating factorial.
I think only people that work with parsing tools are aware of this feature. That’s a special kind of argument that “shadows” outer variable. The syntax is |;shadowarg|
:
pry> n = 1; proc { n }.call
=> 1
pry> n = 1; proc { |;n| n }.call
=> nil
Basically, it's nice to have the ability to use your own isolated set of local variables in your block and be sure that you don't change the outer scope. But again, does anyone use it? It also reminds me of the var keyword from JavaScript.
Take a look at the following code:
begin
raise 'error message'
rescue => RuntimeError
puts 'caught an error'
end
Looks correct at first glance, right? And it even prints caught an error, but in fact it contains a broken construction. It is valid from the parser's perspective, but you don't want to write such code: it redefines the constant RuntimeError.
Ruby has a very tricky mechanism of converting getters to setters. It can convert:
- local variable get to local variable set (the most popular usage in a rescue handler)
- instance variable get to instance variable set
- const get to const set
- getter method to setter method

So if you have object = OpenStruct.new and you catch an error using rescue => object.field, you'll get object.field = <thrown error> called under the hood.
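You can see it in action with an OpenStruct (a quick irb session):
require 'ostruct'

object = OpenStruct.new
begin
  raise 'something went wrong'
rescue => object.field
  # the rescue clause calls object.field = <the exception> for us
end
object.field
# => #<RuntimeError: something went wrong>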
That's definitely very, very flexible, but does anyone need it? I would rather reject all the cases above except local variables.
I have seen the first snippet in a real codebase, and it was quite difficult to understand why a spec that asserts something like expect { code construction }.to raise_error(RuntimeError) does not work.
I used to think that positional and keyword arguments act like two completely separate groups of arguments: if the last argument is a Hash and you pass it to the method call, it gets converted to keyword arguments.
But I was wrong. One argument value can populate both a positional and a keyword argument at the same time:
def m(a = 1, b: 1)
[a, b]
end
p m(b: 2, 'b' => 3)
# => [{"b"=>3}, 2]
I feel like it's a bug:
- a must be { b: 2, 'b' => 3 }
- b must be just 1 (the default value)

This story is not about bad parts of Ruby or anything like that - don't feel bad because of it. I was trying to cover some rarely used features and explain as much as I can.
Ruby has a class in the standard library called Marshal that does all the job of serialization and deserialization.
To serialize an object, use Marshal.dump; to deserialize - Marshal.load or Marshal.restore.
marshalled = Marshal.dump([1, 2, 'string', Object.new])
# => "\x04\b[\ti\x06i\aI\"\vstring\x06:\x06ETo:\vObject\x00"
Marshal.load(marshalled)
# => [1, 2, "string", #<Object:0x00000002643000>]
This article explains the format of marshalling and shows how to write a pure Ruby marshalling library compatible with the standard Ruby implementation.
Let’s try to make a pure Ruby gem that is compatible with standard Marshal
class.
$ bundle gem pure_ruby_marshal
$ tree pure_ruby_marshal
pure_ruby_marshal
├── bin
│ ├── console
│ └── setup
├── CODE_OF_CONDUCT.md
├── Gemfile
├── lib
│ ├── pure_ruby_marshal
│ │ └── version.rb
│ └── pure_ruby_marshal.rb
├── LICENSE.txt
├── pure_ruby_marshal.gemspec
├── Rakefile
└── README.md
Our main module PureRubyMarshal
has the following interface:
module PureRubyMarshal
extend self
def dump(object)
# ...
end
def load(data)
# ...
end
end
Also I'd like to extract all the logic related to reading/writing encoded data into separate classes - ReadBuffer for reading, WriteBuffer for writing.
module PureRubyMarshal
extend self
autoload :ReadBuffer, 'pure_ruby_marshal/read_buffer'
autoload :WriteBuffer, 'pure_ruby_marshal/write_buffer'
def dump(object)
WriteBuffer.new(object).write
end
def load(data)
ReadBuffer.new(data).read
end
end
As we decided before, ReadBuffer
is our class responsible for reading an object from marshalled data.
Here is how it should look:
class PureRubyMarshal::ReadBuffer
attr_reader :data
def initialize(data)
@data = data
end
def read
# ...
end
end
First let’s take a look at Marshal.dump
.
marshalled = Marshal.dump([1, 2, 'string', Object.new])
data = marshalled.chars
# => [a long array of characters]
The first two characters represent the version of the library that was used for marshalling:
data[0].ord
# => 4
data[1].ord
# => 8
Which is the current version of the Marshal format - 4.8 (link to the source). These values are the same for any marshalled data, and they are stored in Marshal::MAJOR_VERSION and Marshal::MINOR_VERSION:
:001 > Marshal::MAJOR_VERSION
=> 4
:002 > Marshal::MINOR_VERSION
=> 8
Our PureRubyMarshal should have the same constants:
module PureRubyMarshal
MAJOR_VERSION = 4
MINOR_VERSION = 8
end
The rest of the array is actually a marshalled object.
So, for now the marshalled data looks like ['4.8', some unknown characters].
To read the version of Marshal
library we can modify ReadBuffer#initialize
:
attr_reader :minor_version, :major_version
def initialize(data)
@data = data.chars
@major_version = read_byte
@minor_version = read_byte
end
def read_char
data.shift
end
def read_byte
read_char.ord
end
# In irb:
marshalled = Marshal.dump(123)
read_buffer = PureRubyMarshal::ReadBuffer.new(marshalled)
read_buffer.major_version
=> 4
read_buffer.minor_version
=> 8
When Marshal converts your object to a string, it uses very simple rules:
- primitive objects like Fixnum, Hash or Array always use the same format of serialization, prepended by a special character (one character for each unique type)
- objects with instance variables are encoded like Iobject{hash:of,instance:variables}
- there are dedicated characters for NilClass, TrueClass and FalseClass
ReadBuffer.new(Marshal.dump(nil)).data
=> ["0"]
The character 0 means that the encoded object is nil. To handle it, we can write something like:
class PureRubyMarshal::ReadBuffer
# ...
def read
char = read_char
case char
when '0' then nil
else
raise NotImplementedError, "Unknown object type #{char}"
end
end
end
:001 > ReadBuffer.new(Marshal.dump(nil)).read
=> nil
:001 > ReadBuffer.new(Marshal.dump(true)).data
=> ["T"]
:002 > ReadBuffer.new(Marshal.dump(false)).data
=> ["F"]
true converts to T, false converts to F - nothing complex for now.
Let’s extend our case
statement:
when 'T' then true
when 'F' then false
Of course, we can't develop without tests:
# fixtures.rb
FIXTURES = {
'nil' => nil,
'true' => true,
'false' => false
}
# pure_ruby_marshal_spec.rb
require 'spec_helper'
describe PureRubyMarshal do
describe '.load' do
FIXTURES.each do |fixture_name, fixture_value|
it "loads marshalled #{fixture_name}" do
result = PureRubyMarshal.load(Marshal.dump(fixture_value))
expect(result).to eq(fixture_value)
end
end
end
end
# rspec --format documentation
PureRubyMarshal
.load
loads marshalled nil
loads marshalled true
loads marshalled false
3 examples, 0 failures
All encoded integers are prepended with an "i" character. Let's add one more when to our case:
when 'i' then read_integer
The next byte defines the strategy of decoding the rest of the data. Then comes a byte sequence representing the number.
Let's say that byte = (first_char_code_after_i ^ 128) - 128 (it ranges from -128 to 127).
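A quick sanity check of that formula in irb:
raw = Marshal.dump(3).bytes.last # the byte right after "i" => 8
byte = (raw ^ 128) - 128         # => 8
byte - 5                         # => 3 (the "small positive" case below)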
There are 5 different cases depending on the byte value:
- byte is 0
- byte in [4..127]
- byte in [1..3]
- byte in [-128..-6]
- byte in [-5..-1]
Let’s write a utility lambda to simplify examples:
get_sequence = lambda do |n|
buffer = ReadBuffer.new(Marshal.dump(n))
buffer.read_char
buffer.data.map(&:ord)
end
It:
- dumps a number with Marshal
- wraps the result into a ReadBuffer (whose constructor consumes the two version bytes)
- skips the type character ("i")
- returns the remaining byte values

The first case is the simplest one:
get_sequence[0]
=> [0]
Zero is actually encoded as zero.
The second case:
get_sequence[7]
=> [12]
get_sequence[8]
=> [13]
get_sequence[120]
=> [125]
In this case result = byte - 5
and it covers small positive numbers.
The third case:
get_sequence[99999]
=> [3, 159, 134, 1]
The leading 3 means three bytes representing a single number; they should be merged with binary OR and byte shifting:
(159 << (8*0)) | (134 << (8*1)) | (1 << (8*2))
=> 99999
The fourth and the fifth examples are just like 2nd and 3rd, but for negative numbers.
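For example, a small negative number, decoded by hand with the same get_sequence helper:
get_sequence[-7]
# => [244]
# c = (244 ^ 128) - 128 = -12, and -12 + 5 = -7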
Let’s implement our read_integer
method!
def read_integer
# c is our first byte
c = (read_byte ^ 128) - 128
case c
when 0
# 0 means 0
0
when (4..127)
# case for small positive numbers
c - 5
when (1..3)
# c next bytes are our big positive number
c.
times.
map { |i| [i, read_byte] }.
inject(0) { |result, (i, byte)| result | (byte << (8*i)) }
when (-128..-6)
# case for small negative numbers
c + 5
when (-5..-1)
# (-c) next bytes is our number
(-c).
times.
map { |i| [i, read_byte] }.
inject(-1) do |result, (i, byte)|
a = ~(0xff << (8*i))
b = byte << (8*i)
(result & a) | b
end
end
end
To add tests for integers, we can just add more key-value pairs to our
FIXTURES
constant.
Complex data types like Array, String, Hash, Regexp and others include numbers in their encoded structure to represent their length.
ReadBuffer.new(Marshal.dump(:a_symbol)).data
=> [":", "\r", "a", "_", "s", "y", "m", "b", "o", "l"]
Symbols are encoded using:
- the ":" character
- the length of the symbol ("\r".ord - 5 = 8 here) as an Integer (we can fetch it using the read_integer method)
- the characters of the symbol itself

# one more "when" statement
when ':' then read_symbol
# and implementation
def read_symbol
read_integer.times.map { read_char }.join.to_sym
end
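A quick check that symbols now load (assuming the new when branch is wired into read):
PureRubyMarshal::ReadBuffer.new(Marshal.dump(:a_symbol)).read
# => :a_symbol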
ReadBuffer.new(Marshal.dump("a string")).data
=> ["\"", "\r", "a", " ", "s", "t", "r", "i", "n", "g"]
Where:
- "\"" is actually a " character which marks the beginning of the encoded string
- n = 8 is the length of the string
- n following characters are the string itself

To support loading a marshalled string, we can use the following code:
# ...
when '"' then read_string
# ...
def read_string
read_integer.times.map { read_char }.join
end
ReadBuffer.new(Marshal.dump([1,2,3])).data
=> ["[", "\b", "i", "\x06", "i", "\a", "i", "\b"]
Where:
- "[" means that the converted object is an Array
- n = 3 is the length of our array
- "i", "1" is an Integer 1
- "i", "2" is an Integer 2
- "i", "3" is an Integer 3
Array items are encoded as separate objects, every item is prepended by its own service character:
# ...
when '[' then read_array
# ...
def read_array
read_integer.times.map { read }
end
Hash
is encoded as an array of key-value pairs:
ReadBuffer.new(Marshal.dump({ 15 => 5 })).data
=> ["{", "\x06", "i", "\x14", "i", "\n"]
Where:
- "{" is the beginning of a hash
- 1 is the number of key => value pairs
- "i" 15 is an Integer 15
- "i" 5 is an Integer 5
# ...
when '{' then read_hash
# ...
def read_hash
pairs = read_integer.times.map { [read, read] }
Hash[pairs]
end
Float is encoded as its string representation:
ReadBuffer.new(Marshal.dump(1.5)).data
=> ["f", "\b", "1", ".", "5"]
Floats are encoded using the #to_s method:
- "f" is the beginning of a Float
- 3 is the length
- "1", ".", "5" is its string representation

To get it back, we can use the String#to_f method:
# ...
when 'f' then read_float
# ...
def read_float
read_string.to_f
end
ReadBuffer.new(Marshal.dump(Array)).data
=> ["c", "\n", "A", "r", "r", "a", "y"]
Classes are represented by their names:
- "c" means a Class
- 5 is the length of its name
- then the name itself follows

After reading the name we can do Object.const_get(const_name) to retrieve the actual class.
The only remark here is that when the class does not exist anymore,
Marshal
re-raises ArgumentError, "undefined class/module #{const_name}"
instead of a NameError
.
Moreover, if the constant returned by Object.const_get
is not a class,
Marshal
raises ArgumentError, "#{const_name} does not refer to a Class"
.
So, the constant must exist and it must be a Class.
# ...
when 'c' then read_class
# ...
def marshal_const_get(const_name)
Object.const_get(const_name)
rescue NameError
raise ArgumentError, "undefined class/module #{const_name}"
end
def read_class
const_name = read_string
klass = marshal_const_get(const_name)
unless klass.instance_of?(Class)
raise ArgumentError, "#{const_name} does not refer to a Class"
end
klass
end
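And a round-trip check in irb:
PureRubyMarshal::ReadBuffer.new(Marshal.dump(Array)).read
# => Array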
I have extracted marshal_const_get
to a separate method to use it later for
reading a Module
from the marshalled data.
Modules are similar to Classes, but the "magical" character is "m" instead of "c" (and, of course, the exception messages are about modules).
# ...
when 'm' then read_module
# ...
def read_module
const_name = read_string
klass = marshal_const_get(const_name)
unless klass.instance_of?(Module)
raise ArgumentError, "#{const_name} does not refer to a Module"
end
klass
end
Struct
Struct classes are encoded by their class names + their data:
Point = Struct.new(:x, :y)
a = Point.new(3, 7)
ReadBuffer.new(Marshal.dump(a)).data
=> ["S", ":", "\n", "P", "o", "i", "n", "t", "\a", ":", "\x06", "x", "i", "\b", ":", "\x06", "y", "i", "\f"]
This output can be split into:
- "S" means the beginning of an encoded Struct
- ":", 5, "P", "o", "i", "n", "t" is a Symbol :Point
- 2, ":", 1, "x", "i", 3, ":", 1, "y", "i", 7 is a visually unmarked Hash (i.e. no "{" header) with 2 pairs:
  - ":", 1, "x" is a Symbol :x
  - "i", 3 is an Integer 3
  - ":", 1, "y" is a Symbol :y
  - "i", 7 is an Integer 7

(the integers are written here in their decoded form; on the wire 3 is "\b" and 7 is "\f")
So, we have a Struct defined by its class name and a Hash containing the object's data:
# ...
when 'S' then read_struct
# ...
def read_struct
klass = marshal_const_get(read) # Point
attributes = read_hash # { x: 3, y: 7 }
values = attributes.values_at(*klass.members) # [3, 7]
klass.new(*values)
end
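Round-trip check (the Point struct must be defined on the loading side):
Point = Struct.new(:x, :y)
PureRubyMarshal::ReadBuffer.new(Marshal.dump(Point.new(3, 7))).read
# => #<struct Point x=3, y=7>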
Why is the class name encoded as a Symbol and not a String? See the 'Symbol link' section.
ReadBuffer.new(Marshal.dump(/a_regexp/)).data
=> ["/", "\r", "a", "_", "r", "e", "g", "e", "x", "p", "\x00"]
Alright, there are:
- "/" - the beginning of a regexp
- 8 - the length of its string representation
- "a_regexp" - the string representation itself
- the kcode that was passed to the constructor ("\x00" here)

# ...
when '/' then read_regexp
# ...
def read_regexp
string = read_string
kcode = read_byte
Regexp.new(string, kcode)
end
class Point2
attr_reader :x, :y
def initialize(x, y)
@x, @y = x, y
end
def ==(other)
other.is_a?(Point2) &&
other.x == self.x &&
other.y == self.y
end
end
point = Point2.new(5, 10)
ReadBuffer.new(Marshal.dump(point)).data
=> ["o", ":", "\v", "P", "o", "i", "n", "t", "2", "\a", ":", "\a", "@", "x", "i", "\n", ":", "\a", "@", "y", "i", "\x0F"]
- "o" means the beginning of any non-standard object
- ":", 6, "P", "o", "i", "n", "t", "2" is a Symbol :Point2
- 2, ":", 2, "@", "x", "i", 5, ":", 2, "@", "y", "i", 10 is a Hash with 2 key-value pairs:
  - ":", 2, "@", "x" is a Symbol :@x
  - "i", 5 is an Integer 5
  - ":", 2, "@", "y" is a Symbol :@y
  - "i", 10 is an Integer 10
# ...
when 'o' then read_object
# ...
def read_object
klass = marshal_const_get(read) # Point2
ivars_data = read_hash # { :@x => 5, :@y => 10 }
object = klass.allocate # #<Point2>
ivars_data.each do |ivar_name, value|
object.instance_variable_set(ivar_name, value)
end
object
end
User classes are classes inherited from default classes:
class MyArray < Array
end
my_array = MyArray[1,2,3]
ReadBuffer.new(Marshal.dump(my_array)).data
=> ["C", ":", "\f", "M", "y", "A", "r", "r", "a", "y", "[", "\b", "i", "\x06", "i", "\a", "i", "\b"]
Objects of these classes are encoded in the following way:
- "C" - the beginning of a user class
- ":", 7, "M", "y", "A", "r", "r", "a", "y" - a Symbol :MyArray, the name of the user class
- the encoded object of the parent class ([1, 2, 3] for this example)
for this example)# ...
when 'C' then read_userclass
# ...
def read_userclass
klass = marshal_const_get(read)
data = read
klass.new(data)
end
Sometimes your object is very, very complex. Like an object extended with some modules:
MyModule = Module.new
obj = []
obj.extend(MyModule)
In this case Marshal prepends your object structure with the list of extended modules:
ReadBuffer.new(Marshal.dump(obj)).data
=> ["e", ":", "\r", "M", "y", "M", "o", "d", "u", "l", "e", "[", "\x00"]
Where "e" means that the marshalled data has the following scheme: a Symbol that represents the name of a Ruby module (:MyModule), followed by the encoded object itself.
# ...
when 'e' then read_extended_object
# ...
def read_extended_object
mod = marshal_const_get(read) # MyModule
object = read # []
object.extend(mod)
end
I guess this idea was originally created for compressing marshalled output.
Consider the following situation: you have a collection of objects of the same class
(the simplest example is a result of calling YourActiveRecordModel.all
).
When you pass this collection to Marshal.dump
, it converts objects one by one, writing them to an output stream.
But an abstract object is represented by its class name and a hash of instance variables.
Symbol links save you from writing the same class.name
again and again to the output.
Instead, it remembers all symbols that have been written, and when the symbol appears twice in the sequence,
it writes its sequence number.
a1 = [:symbol1, :symbol2]
a2 = [:symbol1, :symbol1]
dumped1 = Marshal.dump(a1)
=> "\x04\b[\a:\fsymbol1:\fsymbol2"
dumped1.length
=> 22
dumped2 = Marshal.dump(a2)
=> "\x04\b[\a:\fsymbol1;\x00"
dumped2.length
=> 15
For this example it saves us 31% of the initial size. Let’s see how it works:
ReadBuffer.new(Marshal.dump(a2)).data
=> ["[", "\a", ":", "\f", "s", "y", "m", "b", "o", "l", "1", ";", "\x00"]
So we have an array of two items:
- a Symbol :symbol1
- ";" representing the beginning of a symbol link, plus an encoded Integer that represents the symbol with this index (0 here)

The algorithm for parsing a Symbol should be changed a little bit to save all symbols to an internal array.
def initialize(data)
# ...
@symbols_cache = []
# ...
end
# ...
def read_symbol
symbol = read_integer.times.map { read_char }.join.to_sym # no changes here
@symbols_cache << symbol # save a symbol
symbol
end
# ...
when ';' then read_symbol_link
# ...
def read_symbol_link
@symbols_cache[read_integer]
end
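With both pieces in place symbol links resolve transparently:
PureRubyMarshal::ReadBuffer.new(Marshal.dump([:symbol1, :symbol1])).read
# => [:symbol1, :symbol1] (the second element comes from @symbols_cache)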
I'll return to symbol links, and my thoughts about how they can be used to compress marshalled output, in the 'Optimizations' section.
Same story here, when you have an object that appears multiple times in your data, Marshal
will serialize it only once:
obj1 = Object.new
obj2 = Object.new
a1 = [obj1, obj2]
a2 = [obj1, obj1]
dumped1 = Marshal.dump(a1)
dumped1.length
=> 18
dumped2 = Marshal.dump(a2)
dumped2.length
=> 16
The character that indicates the beginning of an object link is "@". However, the criterion for dumping an object link instead of the object itself is object identity (equal?).
Here’s the problem: if you dump an array [{}, {}]
, Marshal
will dump
both objects without any object links, because these objects are not equal.
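You can see the difference that identity makes by dumping the same hash twice versus two merely equal hashes:
h = { a: 1 }
Marshal.dump([h, h]).length < Marshal.dump([{ a: 1 }, { a: 1 }]).length
# => true - the second h is dumped as a 2-byte "@" link instead of a full hash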
Also Marshal does not cache:
- true/false/nil
- Integer
- String when it's a part of a float/regexp
- Hash when it's a hash of instance variables/struct members

Which is mostly correct:
- true/false/nil/integers/floats always point to the same object in memory, so there is nothing to deduplicate
- if you dump [s, Regexp.new(s)], it will not create an object link. Why? Regexp.new always creates a copy of the string inside, so there's no way to save any memory using object links; regexp.source is always a unique object in memory
- a hash of instance variables/struct members is built on the fly during dumping, so it would never show up as an [Object, ObjectLink] pair

The code for object links is quite big to paste here; you can find it here.
To be honest, there are a few more cases, but they are too complex to implement and paste here. I'm not going to cover:
- custom serialization (marshal_dump/load)
- Bignum
- special Float values like infinity
- UserDef/Hashdef/ModuleOld

If you have read the previous part, I suppose it should be clear how to write them yourself :)
Let's try it on real-world examples. The stuff that usually gets serialized is your data, and I can remember only one example where Marshal is used - model caching.
Imagine I have a model User with the following columns: id, email, created_at/updated_at.
Marshal.dump(User.first).length
=> 891
That’s a lot… The easiest solution is to dump only attributes
hash:
class User < ActiveRecord::Base
def marshal_dump(*)
attributes
end
def marshal_load(attributes)
initialize(attributes)
@new_record = id.blank?
end
end
Marshal.load Marshal.dump(User.first)
=> #<User id: 131 ...>
Marshal.dump(User.first).length
=> 205
That’s better, let’s see what else we can improve.
By default ActiveRecord::Base#attributes returns a hash where the keys are Strings. That sounds like it could be optimized by calling attributes.symbolize_keys in marshal_dump to use symbol links when we dump a collection. But that's not true!
User.all.map(&:attributes).map(&:keys).flatten.map(&:object_id).uniq.length
=> 4 (for 4 columns)
This is an amazing example of optimization!
User::AttrNames.constants
=> [:ATTR_9646, :ATTR_56d61696c6, :ATTR_36275616475646f51647, :ATTR_57074616475646f51647]
User::AttrNames.constants.map do |const_name|
User::AttrNames.const_get(const_name)
end
=> ["id", "email", "created_at", "updated_at"]
Keys in the attributes hash come from these constants, so they are always the same objects, and Marshal caches them using object links.
By the way, why does ActiveRecord save attribute names as Strings? The answer is simple: Symbol has no encoding.
Well, currently our record is represented by:
- String "id" / object link
- Integer id - can't do anything here
- String "email" / object link
- String email - nothing to optimize
- String "created_at" / object link
- ActiveSupport::TimeWithZone created_at - probably can be optimized
- String "updated_at" / object link
- ActiveSupport::TimeWithZone updated_at - probably can be optimized

Let's see what happens when we serialize AS::TimeWithZone:
Marshal.dump(Time.zone.now).length
=> 120
So, out of the 205 characters of our record, 120 are taken by a time with zone. AS::TimeWithZone has its custom implementation of marshal_dump that converts it to [utc, zone, time].
If you always use your application time zone,
then you can store it as epoch time:
class ActiveSupport::TimeWithZone
def marshal_dump(*)
to_i
end
def marshal_load(epoch)
utc = Time.at(epoch)
zone = Rails.application.config.time_zone
local = utc.in_time_zone(zone)
initialize(utc.utc, ::Time.find_zone(zone), local)
end
end
Marshal.dump(User.first).length
=> 139
Marshal.dump(User.first(3)).length
=> 261
Personally I’m not sure that the next optimization is a good idea, but you can try:
class User < ActiveRecord::Base
def self.marshallable_attributes
@marshallable_attributes ||= %w(
id
email
created_at
updated_at
)
end
def marshal_dump(*)
self.class.marshallable_attributes.map do |attribute|
attributes[attribute]
end
end
def marshal_load(attributes)
attributes = self.class.marshallable_attributes.zip(attributes)
attributes = Hash[attributes]
initialize(attributes)
@new_record = id.blank?
end
end
The idea is to store a hard-coded sequence of attribute names and, using it, serialize only the attribute values. This way we don't even serialize the object links for attribute names.
Why can't we just use attributes.keys.sort? Because you can add one more column that may land in the middle of that array. As a result, your attributes may become shuffled. But you can probably avoid this by changing the cache key right after adding a column.
You can extend this solution to something similar to ActiveModel::Serializer
where you have a separate serializer class for every model class.
Let’s see what this optimization gives us:
Marshal.dump(User.first).length
=> 84
Marshal.dump(User.first(3).to_a).length
=> 190
That's 10 times less than the initial size, good job!
Of course, if you have columns of type text there's nothing to optimize - this is just my example and my experience.
I have been working for about 3 weeks on implementing the Marshal module for Opal. It's almost compatible with the MRI implementation. I believe Marshal is the thing that can bring real isomorphism to Opal applications. If you have an object that can be dumped on the server and its class was compiled on the client, you can pass it directly from the server without any manual serialization on either side.
HandlerSocket query language is very simple (I’d even say it’s primitive), but it’s much faster than MySQL’s one. Though, of course, there are some limitations. Interested?
You already have it if you are using Percona Server or MariaDB. If not, install it from the source.
To activate the plugin, run:
INSTALL PLUGIN handlersocket SONAME 'handlersocket.so';
My configuration is the following:
# [mysqld] section
# the port number to bind to for read requests
loose_handlersocket_port = 9998
# the port number to bind to for write requests
loose_handlersocket_port_wr = 9999
# the number of worker threads for read requests
loose_handlersocket_threads = 16
# the number of worker threads for write requests
loose_handlersocket_threads_wr = 1
open_files_limit = 65535
You can find detailed documentation of all available configuration options here.
Restart your MySQL server and run:
show processlist\G
You should see a lot of rows like:
Id: 1
User: system user
Host: connecting host
db: NULL
Command: Connect
Time: NULL
State: handlersocket: mode=rd, 0 conns, 0 active
Info: NULL
Rows_sent: 0
Rows_examined: 0
which means that the HS daemon is up and running.
You can test it locally using telnet
:
$ telnet 0.0.0.0 9999
Trying 0.0.0.0...
Connected to 0.0.0.0.
Escape character is '^]'.
Type P -> 0 -> your_database -> your_table -> PRIMARY -> id,some_column
(where -> is Tab). And press Enter. It should return 0 -> 1.
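In raw form the exchange is just tab-separated lines (shown here as Ruby strings):
# the open-index command sent over the socket
request = "P\t0\tyour_database\tyour_table\tPRIMARY\tid,some_column\n"
# the server's reply
response = "0\t1\n"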
This protocol looks ugly, but it may save you a lot of network usage. It's very compact, and parsing it requires almost no CPU.
If you don't have too many queries per second, you probably don't need HS. It does not optimize queries, but you may save some time on request parsing plus some network. You may find it interesting if you have a lot of simple queries, like simple SELECT's by primary key.
Here goes my Ruby gem for the HandlerSocket protocol. You can find it here.
It has two implementations inside: a pure Ruby one and a C-based one. The Ruby implementation is very slow; it's there mainly to explain the protocol. The C-based one is quite fast.
To require a specific implementation, run
# For slow pure Ruby implementation
require 'handlersocket/pure'
# For fast C implementation
require 'handlersocket/ext'
To create a connection, run
hs = Handlersocket.new('0.0.0.0', 9999)
Both pure Ruby and C implementations have the same API, so the require
place is the only difference.
To open an index, run
hs.open_index('0', 'hs_test', 'users', 'PRIMARY', ['id', 'email'])
To read the data from that index, run
hs.find('0', '=', ['12'], ['100'])
# Which is equal to
# SELECT id, email FROM hs_test.users WHERE id = 12 LIMIT 100
Other commands like auth
/insert
/update
/delete
are not there yet. But it’s not that difficult to add them, check out this file, implementation of other methods also takes ~2 lines of code.
The most interesting part. To run benchmarks locally, clone the gem repository on GitHub and run rake benchmark
. It compares mysql2
gem to Ruby-based and C-based implementations. Here are my results:
Calculating -------------------------------------
pure HS 2.000 i/100ms
ext HS 3.149k i/100ms
mysql2 669.000 i/100ms
-------------------------------------------------
pure HS 49.979 (± 2.0%) i/s - 250.000
ext HS 110.369M (±15.7%) i/s - 467.793M
mysql2 5.218M (±16.3%) i/s - 23.869M
Comparison:
ext HS: 110369437.2 i/s
mysql2: 5218120.6 i/s - 21.15x slower
pure HS: 50.0 i/s - 2208314.91x slower
I have run these benchmarks on a 4-core server with 4 GB of RAM, on Percona Server 5.6 - with both small and huge (30 million records) datasets, with the query cache enabled and disabled, and with small and big values of innodb_buffer_pool_size.
There's a 20-30x performance difference between mysql2 and the C-based version of HandlerSocket because:
- mysql2 is just way more complex than my gem
- HandlerSocket skips query parsing and query planning entirely

And I was really disappointed by the performance of the Ruby-based implementation. It's 2 million times slower than the C-based one. Why? There's a magical number of 50.0 i/s, but I cannot figure out what it means. If you have an answer, please ping me on Twitter.
The gem mostly works, but there are some points that should be refined. Currently, when the network goes down there's no way to reconnect, because the HS protocol is stateful. There's no history tracking in HS objects, so if you open an index and then reconnect, you lose your opened index. I'm not sure if it should be implemented at such a low level of abstraction in the HS gem; probably it's better to make a separate high-level gem for AR that does this job.
Once again, HandlerSocket saves your time on query parsing, building a query plan, it’s more compact, but is very limited. If you don’t have too many requests, don’t even think about using it.
In Ruby the best candidate for doing this is the Binding class. If you have a binding, you can easily do some debugging using the well-known pry gem. But the binding itself cannot be dumped (at least not using default Ruby tools).
How to get a local binding? Just use binding. How to get a binding from an object? Just add a method to your class:
class MyClass
def local_binding
binding
end
end
MyClass.new.local_binding
# => #<Binding>
Binding encapsulates the execution context at the place in your code where the interpreter is currently running and retains this context for future use. So, to dump and load back a binding we need to:
- dump its context (binding.eval('self'))
- dump its local variables (binding.eval('local_variables'))

In fact, that's all you need to restore your binding.
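A quick look at both in irb:
x = 42
b = binding
b.eval('self')            # => main
b.eval('local_variables') # => [:x, :b]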
How can we dump an arbitrary structure? Ruby has a class in the standard library called Marshal
. The two core methods of this class are dump
and load
:
Point = Struct.new(:x, :y)
a = Point.new(34, 65)
marshaled = Marshal.dump(a)
# => "\x04\bS:\nPoint\a:\x06xi':\x06yiF"
Marshal.load(marshaled)
# => #<struct Point x=34, y=65>
Unfortunately, not everything can be marshaled. According to the documentation, the following objects cannot be dumped:
- procs and methods (proc {} or object.method(:method_name))
- IO objects (IO.new(1) or StringIO.new)
- anonymous classes and modules (Class.new or Module.new)

That sounds really sad, but in most cases we can ignore these limitations. When was the last time you needed to debug an IO object that was doing something strange? In real life we rarely use any of these classes during the debugging process. So, instead of dumping and loading back an IO object we can just return a new one.
Well, we can patch every single class in Ruby and add marshal_load
and marshal_dump
hooks to them, but that’s just horrible. It would be much, much better to write a set of classes that are each responsible for converting a specific group of objects.
With that in mind I have implemented:
- PrimitiveDumper - for dumping primitive objects, like numbers and booleans
- ArrayDumper - for arrays
- HashDumper - for hashes
- ObjectDumper - for custom objects
- ClassDumper - for classes
- ProcDumper - for proc/method objects
- MagicDumper - for "magical" objects (see the 'dumping magical objects' section)
- ExistingObjectDumper - for existing objects (see the 'dumping recurring objects' section)

Every dumper takes an object that we need to dump and returns its marshalable representation. Later you can use the same dumper to restore the representation back and get the original object.
You can find the implementation of these dumpers here and the specs for them here.
Here is, probably, the most complicated example:
def undumpable_recursive_object
@undumpable_recursive ||= begin
p = Point.allocate
p.x = p
p.y = StringIO.new
p
end
end
After converting this object with the system of dumpers, the result looks like this:
{
_klass: Point,
_ivars: {
:@x => {
_existing_object_id: 1234566 # or similar
},
:@y => {
_klass: StringIO,
_undumpable: true
}
},
_old_object_id: 1234566 # same as above
}
This hash can be easily marshaled and restored back. But yes, we lose our StringIO
instance - when the object is loaded back, that variable will be blank.
After writing the first version of the library, I have tested it with a blank Rails application. The testing code was:
class UsersController < ApplicationController
def index
@users = User.all.to_a # 5 records
local_proc = proc { 2 + 2 }
render json: @users
StoredBinding.create(data: binding.dump)
end
end
The length of the dump was ~30 screens and it took ~20 seconds to generate it. Most of the data was coming from objects related to Rails itself. Things like Rails configuration, backtrace cleaners, arrays of middlewares, and so on. Do we need them? No. These objects are the same for every request, so we can ignore them.
But at the same time, we need to save and restore all references from ‘serializable’ objects to ‘magic’ objects, we can’t just omit them. This logic is implemented in BindingDumper::MagicObjects
module and here’s how you can use it:
class A
@config = :config
end
BindingDumper::MagicObjects.register(A)
p BindingDumper::MagicObjects.pool
=> {10633360=>"A", 600668=>"A.instance_variable_get(:@config)"}
So, it builds a mapping between an object_id and the way to get that object. Using this functionality we can easily check whether an existing object is 'magical', and if yes - dump its string representation (to eval it during the loading phase). Let's say we need to dump Rails.application.config, one of the 'magical' objects. We need to get its object_id, find it in the pool and remember the string that returns the Rails configuration after evaluation, i.e.:
"Rails.
instance_variable_get(:@app_class).
instance_variable_get(:@instance).
instance_variable_get(:@config)"
After this optimization we have to spend ~20 ms to build an object pool and ~200 ms to dump a binding.
We can optimize it even more. A lot of things like request
, response
are shared as instance variables across ~10 objects. We can dump our request
object only once, remember its object_id
and use a reference while dumping other objects that use it.
Let’s say, we are in the initial memory (MEM1
). We dump a binding, open another console with separated memory (MEM2
) and restore a binding. In the example above (about recursive structure) there was a key :_existing_object_id
that returns an object_id
from MEM1
.
In MEM2
we restore a binding and create a mapping
{
object_id_from_MEM1 =>
restored_object_in_MEM2
}
Using this mapping (in the gem it’s called memories) we can restore the reference to duplicated objects.
At this point you can be really confused, but relax, we are almost done.
So, we have a binding. To dump it we need to:
1. take the context of the binding and its local variables
2. put them into a hash1
3. convert hash1 into a marshalable hash2 using the dumpers
4. call Marshal.dump(hash2)

To load it back:
1. Marshal.load the dump to get hash2
2. convert hash2 back to hash1 using the same converters
3. make the context pretty

Steps 1-4 and 1-3 are already implemented. The last step – making the context pretty – means that we need to inject a local_binding method into the context and make it look like the "real" binding (inject local variables into the binding).
# we just have it,
# it's a `self` from the place where `binding.dump` was called
context
# and we have also local variables
local_variables
# here we need to get a binding that:
subject.local_binding.eval('self') == context
# => true
subject.local_binding.eval('local_variables') == local_variables
# => true
The pseudo-code for loading and patching the context looks like:
marshaled = StoredBinding.last.data
converted = Marshal.load(marshaled)
undumped = Dumpers.load(converted)
context = undumped[:context]
locals = undumped[:locals]
class << context
def local_binding
result = binding
locals.each do |lvar_name, lvar|
result.local_variable_set(lvar_name, lvar)
end
result
end
end
The actual implementation can be found here. After calling Binding.load(dumped).pry
you can start debugging it!
Currently the gem supports Ruby versions from 1.9.3 to 2.2.3. I had a few issues with porting the code from 2.0.0 to 1.9.3, like the lack of keyword arguments and Module#prepend
. The funniest one was that in versions before 2.1.0 there is no binding.local_variable_set
- there is only binding.eval
that takes a string, not a block.
How can we pass a complex object to eval
? The solution is not so difficult, because we have the object right here and right now, and the binding uses the same memory as the main thread. This means that we can pass the object_id
of our object to eval
string and get it there using ObjectSpace._id2ref
:
undumped[:lvars].each do |lvar_name, lvar|
result.eval("#{lvar_name} = ObjectSpace._id2ref(#{lvar.object_id})")
end
I have tested the gem locally with a few projects. Everything was fine, but some classes are not loaded in the console by default (like Rails::BacktraceCleaner and some others from the NewRelic gem). You have to require the corresponding files manually before loading the binding in the console.
To try it out, clone the GitHub repository, install dependencies, prepare the database using bin/dummy_rake db:create db:migrate
, and start the server via bin/dummy_rails s
. Then visit http://localhost:3000/users to dump the binding of UsersController#index
. After that you can open a console using bin/dummy_rails c
and run StoredBinding.last.debug
. You're now in your controller, in the same state it was in a moment ago when you hit that /users page!
The gem is fully tested with its specs running on Travis CI. There’s also a script that can be used to run the whole test suite locally on every supported version of Ruby. But that’s definitely not enough for a gem to become completely production-ready.
That's why I ask everyone who reads this article: if you think that the idea of this gem should stay alive, that this method of debugging can be useful, and you would like to use it yourself - please try it out locally and share your findings with me (via Twitter or GitHub).
The task solved here is not a real one, but it's still a good example of (probably?) real work with Opal. I could choose some complex enough JavaScript library and write a simple wrapper using Opal, but that's no fun. Instead, let's write a wrapper for an existing rich client-side application (it may show you how to wrap your existing application logic). Well, a wrapper for something like a client-side scheduler may sound boring, so I have chosen a JavaScript-based browser game called BrowserQuest written by Mozilla, and I'll show you how to write a bot for it using Opal.
There are so many posts about Opal that I'm just going to say "it's a Ruby to JavaScript compiler" - that's enough.
First of all, we need something that runs the game and injects a bot into the page. Personally, while writing integration tests (this is where we usually deal with web drivers), I prefer PhantomJS, but it's headless, so you can't enjoy watching how your bot works. We have to use something like Capybara + Selenium:
# Gemfile
gem 'capybara'
gem 'selenium-webdriver'
# runner.rb
require 'capybara'
Capybara.register_driver :selenium do |app|
Capybara::Selenium::Driver.new(app, :browser => :firefox)
end
Capybara.javascript_driver = :selenium
Capybara.default_driver = :selenium
Capybara.run_server = false
So, the script registers a driver, specifies its browser (Firefox), makes it the default and runs Capybara in browser mode (i.e. without its own server in the background).
Dead simple:
Capybara.current_session.visit('http://browserquest.mozilla.org')
# or in object-oriented style
class Game
include Capybara::DSL
def initialize(url)
# all methods of
# Capybara.current_session
# are available here
visit(url)
end
def play
# logic of the bot
end
end
Game.new('http://browserquest.mozilla.org/').play
Now it runs a Firefox and opens the page with the game.
So, there are two ways to compile Ruby into JavaScript:
To compile a file with Ruby, run:
Opal.append_path('some/path/to/dir/with/your/files')
Opal::Builder.build('relative/path/from/that/dir/to/your/file')
To compile a string with ruby:
Opal.compile("plain ruby code")
The first way is what we really need:
- it resolves dependencies (require commands work as in MRI)
- so we can have a main app.rb that require-s other files
- and attach only the compiled app.rb to the page

This is the place where the main App class is created. But! It's defined in an anonymous function, so this variable is not available outside of that context.
This game uses CommonJS to load files. This library caches all previously required files and instantly returns the cached result on the second require. We can use it:
- require the app file again to get the cached class
- patch one of its prototype methods so that it saves the instance and then calls the original implementation (a manual super)

I have chosen a method called start:
# opal/bot.rb
module Patch
def self.apply
%x{
var app = require('app');
oldStart = app.prototype.start;
app.prototype.start = function(username) {
window.currentApplication = this;
oldStart.apply(this, arguments);
}
}
end
end
Patch.apply
Some explanations:
- %x{js code} just passes the provided JavaScript into the compiled output (i.e. runs it without any translation)
- once the game calls start, the patch saves the instance into a global variable currentApplication that contains an instance of the App class

As you can see, to start the game you need to: type the player's name, press the 'Play' button and close the instructions (each step is covered below).
After all of these steps the game will be ready, but the point here is that most of the steps are asynchronous. You can't just type your name and immediately press the 'Play' button (and you can't press 'Play' without waiting for the loading).
This is the place where promises shine. Opal's standard library includes a Promise that acts pretty much like jQuery.Deferred():
require 'promise'
promise = Promise.new
promise.then { puts 'Done' }
promise.fail { puts 'Fail' }
promise.resolve
# => 'Done' (in JavaScript console)
# or
promise.reject
# => 'Fail'
(A Promise is like an object that is a combination of callback-s and errback-s, but you don't invoke the callbacks manually; instead you just switch the state of your promise object and it automatically triggers the callbacks/errbacks.)
Here is a little helper module that saves our time:
module Utils
def wait_for(promise = Promise.new, &waiting)
result = waiting.call
if !!result
promise.resolve
else
after 0.1 do
wait_for(promise, &waiting)
end
end
promise
end
end
# and usage
class MyClass
include Utils
def call
some_async_method_without_ability_to_pass_callback
wait_for do
method_called && result_is_success
end
end
end
The wait_for method takes a promise (a blank promise object by default) and a block (which will be converted to a JavaScript function). It calls the block and resolve-s the promise if the block returns true. If not, it calls itself again after 100 ms (after = setTimeout) with the same promise object.
To type player’s name we should run:
# I'm using opal-jquery here
# I think it does not require any explanation
input = Element.find('#nameinput')
input.value = @player_name
wait_for do
Element.find('.play.button.disabled').length == 0
end.then do
# Button is ready, we can click it here
end
To click the button, run:
button = Element.find('#createcharacter .play.button div')
button.trigger(:click)
wait_for do
Element.find('#instructions').has_class?('active')
end.then do
# The game is ready here
# And it shows us instructions
# We are almost ready to start the game
end
To close instructions, run:
Element.find('#instructions').trigger(:click)
To run this command, call
StartGame.new('Bot player').invoke.then do
alert("I'm in the game")
end
As an entry point we are going to use the global JavaScript variable currentApplication. It has a property game (that, unexpectedly, returns an instance of the Game class). game has a player property (an instance of Player) and an entities property, which is an object containing all entities on the map, their types and coordinates. You can easily find their JavaScript implementations in the GitHub repository of the game.
So, our main objects are:
currentApplication
currentApplication.game
currentApplication.game.player
currentApplication.game.entities
The first class to wrap is definitely the App:
class Application
include Native
def self.current
self.new(`currentApplication`)
end
def initialize(native)
@native = native
end
alias_native :game, :game, as: Game
def to_n
@native
end
end
So, we have a class called Application that wraps a native JavaScript object and has a Ruby method game that calls the JavaScript method game and wraps the result using the Game class (see below). As a bonus, we have a class method current that returns the wrapped currentApplication.
The next class is a Game
:
class Game
include Native
def self.current
Application.current.game
end
def initialize(native)
@native = native
end
alias_native :player, :player, as: Player
alias_native :say
def to_n
@native
end
def entities
res = []
native_entities = `currentApplication.game.entities`
Native::Hash.new(native_entities).each do |e_id, e|
res << Native(e)
end
EntityCollection.new(res)
end
end
And again, this class can wrap any JavaScript game object and has methods player, say and entities (EntityCollection is our next class to implement).
(We can test the say method right now: just put Game.current.say('Hello') into the block where the game is ready and start chatting with other players.)
The game provides a global JavaScript object Types with all mobs/items/armors/weapons information. It allows us to identify an unknown entity and compare armors and weapons by rank. Basically, it provides everything needed for writing the bot logic.
To convert it to Ruby, use Types = Native(`Types`)
and use this object in the Ruby world!
Here is my definition of Entity
class:
class Entity
include Native
def initialize(native)
@native = native
end
def to_n
@native
end
alias_native :kind
def player?
Types.isPlayer(kind)
end
# some other methods
# like mob?
# or heal?
def weapon_rank
Types.getWeaponRank(kind)
end
def armor_rank
Types.getArmorRank(kind)
end
end
Well, this class can wrap player/mob/armor/weapon/healing, but this is only a value-object, we still need to implement our collection-object EntityCollection
:
class EntityCollection
def initialize(native_entities)
@entities = native_entities.map do |native_entity|
Entity.new(native_entity.to_n)
end
end
def players
entities = @entities.select(&:player?)
EntityCollection.new(entities)
end
# similar methods like
# mobs/weapons/armors/healings
# are omitted and are just like 'players' method
end
(quickly and without any explanation):
class Player
include Utils
include Native
def self.current
Game.current.player
end
def initialize(native)
@native = native
end
alias_native :distance_to, :getDistanceToEntity
alias_native :moving?, :isMoving
alias_native :attacking?, :isAttacking
alias_native :hp, :hitPoints
alias_native :max_hp, :maxHitPoints
def full_hp?
hp == max_hp
end
alias_native :weapon_name, :getWeaponName
alias_native :armor_name, :getArmorName
end
It's not as difficult once we have all these classes prepared. The algorithm of farming is like:
1. kill the closest mob
2. pick up a better weapon if one is nearby
3. pick up a better armor if one is nearby
4. heal
5. GOTO 1

All of these steps will be our methods, and all of them must be asynchronous.
Just one method is missing here (closest
):
class EntityCollection
def by_distance
entities = @entities.sort_by do |entity|
Player.current.distance_to(entity)
end
EntityCollection.new(entities)
end
def first
@entities.first
end
def last
@entities.last
end
def closest
by_distance.first
end
end
def kill_mob
closest_mob = Game.current.entities.mobs.closest
`#{Game.current.to_n}.makePlayerAttack(#{closest_mob.to_n})`
# TODO: move this method to the game class
# using alias_native :)
end
def pick_up(item)
`#{Game.current.to_n}.makePlayerGoToItem(#{item.to_n});`
end
def get_weapon
current_weapon_name = Player.current.weapon_name
weapons = Game.current.entities.weapons
closest_weapon = weapons.better_than(current_weapon_name).closest
if closest_weapon.nil?
# No weapon, probably next time
return
end
if Player.current.distance_to(closest_weapon) > 100
# Weapon is too far away, next time
return
end
pick_up(closest_weapon)
end
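The better_than method used above is never shown in the article. Here is a minimal sketch of how it could look; Types.getKindFromString is an assumption about BrowserQuest's Types helper, so adjust it to whatever name-to-kind mapping your version exposes:
class EntityCollection
  # hypothetical sketch: keep only entities whose weapon rank is higher
  # than the rank of the currently equipped weapon (given by its name)
  def better_than(current_weapon_name)
    current_kind = Types.getKindFromString(current_weapon_name) # assumed helper
    current_rank = Types.getWeaponRank(current_kind)
    EntityCollection.new(@entities.select { |e| e.weapon_rank > current_rank })
  end
end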
get_armor is just like the previous snippet, but with armors instead of weapons.
All of these steps should return promises, and every single one of these methods should wait for the player to stop moving and attacking. To make this work we need some common method like:
def wait_until_inactive
promise = Promise.new
wait_for do
!Player.current.moving? && !Player.current.attacking?
end.then do
# Wait 1 more second to continue
after 1 do
promise.resolve
end
end
promise
end
And put it to the end of each action method.
We need the main method farm
, right?
def farm
kill_mob.then do
get_weapon.then do
get_armor.then do
heal.then do
farm
end
end
end
end
end
This is probably the main thing that I have personally learned while writing this article. Even if you think in Ruby, you still have to deal with asynchronous components like callbacks/promises. When you need to make an HTTP request in Ruby, you just take your favorite HTTP adapter (mine is RestClient), send a request, and your interpreter waits for the response. In JavaScript you have to process the response in some callback, because you can't just stop your interpreter (you know, it blocks the UI).
As for me, the main thing Opal gives you is the ability to think in terms of Ruby classes/modules/inheritance. But it does not let you completely escape from the JavaScript ecosystem (no callbacks? - a block is a callback). I would say most of Opal's functionality related to Ruby classes can be replaced with, for example, the JsClass library (which is really wonderful). Opal allows you to compile existing Ruby libraries to JavaScript and use them on the client - this is probably the main feature. Some day a significant amount of Ruby libraries will be ported to the client side, and probably some day we will think in terms of Ruby even on the client.
First of all, we need to know what PhantomJS is. I would say it's a tool that acts like a browser but can be controlled from outside using a simple command interface. In other words, it's a web driver. It's a full-featured WebKit (the engine of Chrome, Safari, the last few versions of Opera and other browsers), but in the console. You can use it for scripting, automating or testing.
Poltergeist is a Ruby wrapper for PhantomJS. Usually you write the code for PhantomJS on JavaScript, with Poltergeist you can run it on Ruby.
Well, I'm pretty sure you know what it is. It's a test framework. And it supports different web drivers like:
- RackTest - a web driver that does not support JavaScript; extremely fast, but deadly primitive; the best solution for tests that don't require any JavaScript execution
- Selenium - the most popular web driver, with its own advantages and disadvantages
- Capybara-webkit - headless WebKit (does not run a browser), but still requires an X server
- Poltergeist - see above, headless WebKit; does not require an X server, so you can run it everywhere
When you write integration tests, sometimes you need to run asynchronous JavaScript in the context of your page and get a response back to Ruby. Here is an example from my current project: the client part of our application supports offline mode, so we store the data in WebSQL and sync it with the server once the connection is restored. The API of WebSQL is asynchronous, so we get the result of execution in a provided callback:
// something like
WebSQLWrapper.execute('select 1', function(result) {
// result of execution becomes available after some time
});
We have integration tests with the Capybara + Poltergeist + PhantomJS combo that test the whole stack of interaction between the client-side application and the server API. And WebSQL should be flushed between tests, or populated with startup data before some specific tests. We also have delayed operations on the client that run periodically.
All these features require us to run JavaScript code manually in PhantomJS between/before tests.
Quite simple:
- PhantomJS: sudo apt-get install phantomjs, or download binaries from the official site (you can even try the 2.0 beta there)
- Capybara: gem 'capybara' and that's it
- Poltergeist: gem 'poltergeist'
and force Capybara to use Poltergeist as a web driver:
require 'capybara/rails'
require 'capybara/poltergeist'
Capybara.register_driver :poltergeist_debug do |app|
driver_options = {
inspector: true,
timeout: 5,
js_errors: false,
debug: false,
phantomjs_logger: Logger.new('/dev/null')
}
if ENV['DEBUG_PHANTOMJS']
driver_options.merge!({
logger: Kernel,
js_errors: true,
debug: true,
phantomjs_logger: File.open(Rails.root.join('log/phantomjs.log'), 'a')
})
end
Capybara::Poltergeist::Driver.new(app, driver_options)
end
Capybara.javascript_driver = :poltergeist_debug
Capybara.default_driver = :poltergeist_debug
With this configuration Poltergeist does not print any noisy output, but you can enable it by passing DEBUG_PHANTOMJS=true
Here is a simple example of code that returns its response asynchronously:
setTimeout(function() {
alert('Thanks for waiting 1 second');
}, 1000);
This code actually does nothing, but it’s a demonstration of how asynchronous stuff works.
Yes, when you write something like:
it 'displays a message' do
expect(page).to have_text('Hey')
end
and at the moment of running this expectation your page does not have this text but receives it a second later - the test will be green. Why?
Capybara has a setting called default_wait_time (it was renamed to default_max_wait_time, but the old name is still accepted) which is 2 seconds by default. Here is how Capybara uses it.
It runs the code again and again, and stops when the expectation finally passes or the wait time is over.
(A little remark here: Capybara saves the time at the beginning of this method and on every iteration compares it with Time.now - this is a very nice hack that saves Capybara from wrapping API calls into Timecop.freeze - good job!)
Yes, of course. Let’s simplify it a little bit:
require 'timeout'

module WaitHelper
extend self
# Calls provided +block+ every 100ms
# and stops when it returns false
#
# @param timeout [Fixnum]
# @yield block for execution
#
# @example
# current_time = Time.now
# WaitHelper.wait_until(3) do
# Time.now - current_time > 2
# end
#
# # 2 seconds later ...
# # => true
#
# current_time = Time.now
# WaitHelper.wait_until(3) do
# Time.now - current_time > 10
# end
#
# # 3 seconds later (after timeout)
# # => false
#
def wait_until(timeout, &block)
begin
Timeout.timeout(timeout) do
sleep(0.1) until value = block.call
value
end
rescue TimeoutError
false
end
end
end
Alright, now let’s use it.
WaitHelper.wait_until(timeout) {}
code_for_execution = <<-JS
setTimeout(function() {
window.asyncResponse = 'some response';
}, 1000)
JS
code_for_polling = 'window.asyncResponse'
Capybara.current_session.evaluate_script(code_for_execution)
result = WaitHelper.wait_until(2) do
Capybara.current_session.evaluate_script(code_for_polling)
end
puts result
# => 'some response'
Why not. Here is a gem called capybara-async-runner
. And here is how to use it.
Installation:
# Gemfile
gem 'capybara-async_runner'
# spec/spec_helper.rb
require 'capybara/async_runner'
First of all, I don’t like to mix JavaScript and Ruby code in a single file, so we need templates (like .js.erb
).
You need to specify the directory with templates:
# spec/spec_helper.rb
Capybara::AsyncRunner.setup do |config|
config.commands_directory = Rails.root.join('spec/fixtures/async_commands')
end
Let’s write our first command
class TestCommand < Capybara::AsyncRunner::Command
# global command name
self.command_name = :test_command_name
# .js.erb file in directory specified above
self.file_to_run = 'template'
response :parsed_json do |data|
JSON.parse(data)
end
end
This class follows the Command pattern, you can invoke it in the following way:
Capybara::AsyncRunner.run(:test_command_name)
# or
TestCommand.new.invoke
Let’s create our template for this command:
// spec/fixtures/async_commands/template.js.erb
setTimeout(function() {
var json = JSON.stringify([1,2,3]);
<%= parsed_json(js[:json]) %>
})
There are a few things that I need to explain:
- parsed_json - this is an output point from the script, defined in the command class. When you call it in the template, it embeds some JavaScript that stores the passed data into a global variable. parsed_json(123) produces something like window.parsed_json_result = 123 (so we can grab this response later from the second script).
- this is a proxy method from the gem that acts like a Hash. The method []
on this Hash returns passed key (and the whole method js
is kind of “JavaScript memory”). <%= parsed_json(js[:json]) %>
produces window.parsed_json_result = json
which is exactly what we need.Capybara::AsyncRunner.run(:test_command_name)
, the gem executes the script generated from your template in the context of Capybara.current_session
. Then it subscribes to all defined responses (this is actually, the second script) and returns the first one that becomes defined (window.parsed_json_result
in this case).parsed_json
is like a handler for transforming data which it returns (we invoke parsed_json
method with json
variable which contains raw JSON, our handler parses it)If you are familiar with templates in Ruby, just quickly look at the code, this class is a context of rendering.
You need to follow these steps to create a command using the gem: define a command class (with a command name, a template and responses), write the .js.erb template, and invoke the command.
You can pass any data to the template:
class TestCommand < Capybara::AsyncRunner::Command
self.command_name = :test
self.template = 'template'
# if you don't pass any block
# it will return raw value
response :done
end
data = {
name: 'Ilya'
}
Capybara::AsyncRunner.run(:test, data)
# => 'Ilya'
And use it in the template:
someLongRunningMethod(function() {
// 'done' is a method that generates JavaScript
// 'data' returns data that we've passed to 'run'
// :name is a key in that Hash
<%= done(data[:name]) %>
})
As I mentioned before, on my current project we use WebSQL, but it’s deprecated, so I’m not going to use it in examples. Instead let’s write a wrapper for IndexedDB.
First of all, we need a JavaScript wrapper - I don't like the native IndexedDB API. The first result from Google is Dexie.js.
Let's plan our scenario:
1. open some page (Capybara.current_session.visit('http://google.com'))
2. inject Dexie.js into the page
3. initialize the database, insert some data and query it back
into the pageHow to invoke:
module IndexedDB
URL = 'http://www.url.to.dexie.js.source'
end
Capybara::AsyncRunner.run('indexeddb:wrapper:inject', url: IndexedDB::URL)
Explanation: the command appends a <script> tag (with the Dexie.js source) to the <head> of the page. The command raises an exception if the response is error.
All these manipulations are synchronous for Ruby - once the command finishes, we can continue execution.
Moreover, this command is safe, we can call it multiple times and it will inject the script into the page only once.
How to invoke:
Capybara::AsyncRunner.run('indexeddb:wrapper:initialize')
Explanation:
- the command initializes the database using Dexie.js
- if the callback was called, it returns success
- if the errback was called, it returns error with the error message
- an exception is raised if error was returned

How to invoke:
user_data = { name: 'Some Name' }
user_id = Capybara::AsyncRunner.run('indexeddb:insert', store: 'users', data: user_data)
p "User ID: #{user_id}"
This step is quite simple if you understand the previous one.
How to invoke:
methods = [
  { method: 'where',   arguments: ['id'] },
  { method: 'equals',  arguments: [user_id] },
  { method: 'toArray', arguments: [] }
]
p Capybara::AsyncRunner.run('indexeddb:query', store: 'users', methods: methods)
Here we pass an array of methods and their arguments to the template, iterate over them to build a Dexie.js scope (just like an ActiveRecord::Relation), and return the result back to the Ruby command.
After wrapping it even more, we can get an interface like this.
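A minimal sketch of what such a wrapper might look like, built on the two commands above (the IndexedDB::Users class and its method names are illustrative, not part of the gem):

module IndexedDB
  class Users
    STORE = 'users'

    def self.insert(data)
      Capybara::AsyncRunner.run('indexeddb:insert', store: STORE, data: data)
    end

    def self.find_by(field, value)
      methods = [
        { method: 'where',   arguments: [field] },
        { method: 'equals',  arguments: [value] },
        { method: 'toArray', arguments: [] }
      ]
      Capybara::AsyncRunner.run('indexeddb:query', store: STORE, methods: methods)
    end
  end
end

user_id = IndexedDB::Users.insert(name: 'Some Name')
IndexedDB::Users.find_by('id', user_id)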
The full example can be found here.
I would say the topic of this article is not that popular. Single-page applications that work in offline mode (and because of this use WebSQL/IndexedDB/some other asynchronous storage) are still not common today. Usually, when you build an SPA, you just write your tests using Jasmine or something like that: you mock your requests to the server API and test your client in an isolated environment. But these tests are still functional tests of a single component (the client, in this case), not of the whole application.
Someday, when working on a rich client-side application, remember the idea of this post. You can control the logical flow of your client-server communication in integration tests, and this is good. Wrap your client code, build a micro-framework on top of this gem and test every piece of your code.
Compared to other tools for generating API documentation (yardoc, sdoc), I would say that the main thing you gain with Apipie is that your documentation is real Ruby code, so you can use computations, concerns, etc.
Here is a simple example of how it looks in code:
class UsersController < ApplicationController
  resource_description do
    formats [:json]
    api_versions 'public'
  end

  api :POST, '/users', 'Create user'
  description 'Create user with specified user params'
  param :user, Hash, desc: 'User information' do
    param :full_name, String, desc: 'Full name of the user'
    param :age, Fixnum, desc: 'Age of the user'
  end
  def create
    # Some application code
  end
end
So, you invoke the Apipie DSL right before your action method and the information automatically makes it into the generated docs.
In the example above we passed the HTTP verb and the path of this action. We don’t have to do that! Instead, we can simply write:
api! 'Some description'
# other docs here...
def create
end
It automatically takes information from your routes that look like
resources :users, only: [:create]
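By the way, to serve the generated documentation you mount Apipie in the same routes file; the gem provides an apipie routes helper for this (a sketch, check the README for your version):

# config/routes.rb
Rails.application.routes.draw do
  apipie

  resources :users, only: [:create]
end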
You can specify the versions of your API that include this endpoint with
api_versions ...
As you can see, in the first example we used 3 types:
- Hash
- String
- Fixnum

Apipie also has:
- Enum
- Regexp
We have already typed our parameters, so why ignore that and write custom before_action-s to validate parameters manually? This feature is enabled by default, but if you don’t think it’s a good idea to validate your parameters through a documentation tool, just set
config.validate = false
The creator of the gem told me that ‘People either love it or don’t understand why it’s even there’, so it’s up to you to decide if you need it.
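If you keep validation enabled, a failed check raises an Apipie error before your action runs. Here is a minimal sketch of handling it globally (the rescue_from wiring is my own example; Apipie::ParamInvalid is the error class apipie-rails raises when a value fails its type check):

class ApplicationController < ActionController::Base
  # render a 422 instead of letting the validation error bubble up
  rescue_from Apipie::ParamInvalid do |e|
    render json: { error: e.message }, status: :unprocessable_entity
  end
end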
I would say that it does not work the way people usually expect. Apipie::DSL::Concern allows you to document actions that are defined in concerns.
Usually people think that it lets you extract documentation from your controller, so that you don’t mix application logic with documentation. And, to be honest, I thought so too. A workaround for doing exactly that is described below, in the section ‘Extracting docs to mixins’.
Are you tired of writing examples manually? Me too :) With Apipie you can record request/response pairs to a separate YAML file and display them in the generated HTML. Pass :show_in_doc to the metadata of your RSpec example and enjoy. Apipie embeds a module for recording requests into ActionController::TestCase::Behavior, which is the core of all request specs for RSpec and Minitest (yes, both of them internally delegate performing requests to ActionController::TestCase::Behavior - source).
There are plenty of other things in Apipie that are very cool, but I have not had a chance to use them yet.
Some of the following items can be difficult to explain; if you have any questions after reading, just google them, the answers should all be in the Ruby docs (or ping me if you still don’t get it).
This is the main question I had after reading the official README: I don’t want to mix documentation with application logic. First of all, we need to understand how exactly Apipie builds the mapping between action names and the compiled DSL. Even without reading the source code, the only reasonable guess is the method_added hook. Every time you define an instance method, Ruby automatically fires method_added on the class (for methods defined on the class itself there is a separate singleton_method_added hook).
class TestClass
  def self.method_added(method_name)
    puts "Added method #{method_name}"
  end

  def method1; end
  def method2; end
end
# => Added method method1
# => Added method method2
So, the algorithm is like this:
1. You call api (or api!) with the documentation DSL.
2. Apipie remembers the passed arguments somewhere (in Apipie.current_action_data, for example).
3. When the action method is defined, method_added fires and Apipie gets the action name (say, Apipie.current_action_name).
4. Apipie saves the mapping Apipie.current_action_name => Apipie.current_action_data into a global Hash like Apipie.all_docs.
5. When you open /docs, Apipie displays all data from Apipie.all_docs.
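To make this concrete, here is a toy re-implementation of that algorithm (MiniApipie is purely illustrative and vastly simpler than the real gem):

module MiniApipie
  def self.all_docs
    @all_docs ||= {}
  end

  # called like Apipie's api: remember the DSL arguments...
  def api(*args)
    @current_action_data = args
  end

  # ...and bind them to the next instance method that gets defined
  def method_added(method_name)
    super if defined?(super)
    return unless @current_action_data
    MiniApipie.all_docs[method_name] = @current_action_data
    @current_action_data = nil
  end
end

class UsersController
  extend MiniApipie

  api :GET, '/users', 'List users'
  def index; end
end

MiniApipie.all_docs
# => {:index=>[:GET, "/users", "List users"]}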
And if this is true, we can write something like:
# app/docs/users_doc.rb
module UsersDoc
  # we need the DSL, right?
  extend Apipie::DSL::Concern

  api :GET, '/users', 'List users'
  def show
    # Nothing here, it's just a stub
  end
end

# app/controller/users_controller.rb
class UsersController < ApplicationController
  include UsersDoc

  def show
    # Application code goes here
    # and it overrides blank method
    # from the module
  end
end
And yes, it works! Let’s add a resource description:
module UsersDoc
  extend Apipie::DSL::Concern

  resource_description do
    formats [:json]
    api_versions 'public'
  end
end
Which breaks it…
Apipie: Can not resolve resource UsersDoc name.
Yes, because our resource is UsersController. The error happens in the following lines of the Apipie source code:
def get_resource_name(klass)
  if klass.class == String
    klass
  elsif @controller_to_resource_id.has_key?(klass)
    @controller_to_resource_id[klass]
  elsif Apipie.configuration.namespaced_resources? && klass.respond_to?(:controller_path)
    return nil if klass == ActionController::Base
    path = klass.controller_path
    path.gsub(version_prefix(klass), "").gsub("/", "-")
  elsif klass.respond_to?(:controller_name)
    return nil if klass == ActionController::Base
    klass.controller_name
  else
    raise "Apipie: Can not resolve resource #{klass} name."
  end
end
Most of this code does not really matter; the point is that Apipie does not know what to do with the module UsersDoc. It has no resource_id, so let’s define one:
resource_description do
  resource_id 'Users'
  # other DSL
end
Now we get an error:
undefined method `superclass' for UsersDoc:Module
But… a Module (I mean, an instance of the Module class) can’t have a superclass. It’s a module!
Forget for a moment that you are a Ruby developer and define a method called superclass on your module :)
# before calling the DSL
def self.superclass
  UsersController
end
So, from this moment
UsersDoc.superclass == UsersController
# => true
Refresh the documentation page and see that it finally works! Here is a gist with the full code that hides the ugly parts and the blank methods: full gist
Imagine the following scenario:
I have two versions of an API. When I was young and stupid, I implemented the first one (v1), and it uses plain username/password authentication (in headers). A few years later something changed in my mind and I created another one (v2) with token-based authentication.
The first API contains 100 actions across 10 controllers, and so does the second one. To document the authentication I would have to copy-paste 2 lines of code 200 times (once per action) or 20 times (once per resource).
Instead of repeatedly copying the real description of the authentication mechanism, it would be really nice to have something like
# for Api::V1
auth_with :password
# for Api::V2
auth_with :token
and describe authentication in a more declarative way. To make this work, we need to add our own method to the DSL. Here is what it might look like:
module BaseDoc
  # ... code from the gist

  AUTH_METHODS = {
    password: Api::Auth::PasswordDoc,
    token: Api::Auth::TokenDoc
  }

  def auth_with(auth_method)
    mod = AUTH_METHODS[auth_method] or
      raise "Unknown auth strategy #{auth_method}"
    send(:include, mod)
  end
end
# app/docs/api/auth/password_doc.rb
module Api::Auth::PasswordDoc
  def self.included(base)
    base.instance_eval do
      # documentation of password-
      # based authentication
      header :username, required: true
      header :password, required: true
      error code: 401, desc: 'Unauthorized'
    end
  end
end
# app/docs/api/auth/token_doc.rb
module Api::Auth::TokenDoc
  def self.included(base)
    base.instance_eval do
      # documentation of token-
      # based authentication
      header 'Token', 'Your API token'
      error code: 401, desc: 'Token is required'
    end
  end
end
And from now on we can write:
module Api::V1::UsersDoc
  extend BaseDoc

  doc_for :show do
    auth_with :password
    # or
    # auth_with :token
  end
end
So, we have the resource_description and api methods, but how can we define common parts for all of our actions? I hate copy-paste-driven development, so let’s write a DSL for this.
We want a defaults method which takes a block and executes it for each action.
module BaseDoc
  # ...

  def defaults(&block)
    @defaults = block
  end

  def doc_for(action_name, &block)
    instance_eval(&block)
    # Only the next line added
    instance_eval(&@defaults) if @defaults
    api_version namespace_name if namespace_name

    define_method(action_name) do
      # ... define it in your controller with the real code
    end
  end
end
So, we just store the passed block and invoke it for each action. This example is much simpler than the previous one.
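For example, with defaults in place a documentation module can share authentication and common errors across all of its actions (the param lines are illustrative):

module Api::V1::UsersDoc
  extend BaseDoc

  # executed for every doc_for block below
  defaults do
    auth_with :token
    error code: 500, desc: 'Internal server error'
  end

  doc_for :show do
    param :id, String, desc: 'User ID'
  end

  doc_for :create do
    param :user, Hash, desc: 'User information'
  end
end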
You can click through the statically generated docs (thanks to GitHub Pages):
Apipie is an amazing library, and its most significant advantage is that you document Ruby using Ruby (being a Ruby developer), which gives you the ability to define custom behaviors and scenarios. I can’t even imagine myself writing API docs with yardoc (however, I use it to document plain Ruby classes).
If you find any bugs (or have ideas to implement), please create an issue on GitHub; let’s make it even better!