TL;DR
This article is about instruction sequences and evaluating them using pure Ruby.
The repository is available here.
Is it a Ruby implementation?
No. It’s just a runner of instructions. It is similar to MRI’s virtual machine, but it lacks many features and it’s 100 times slower.
Can I use it in my applications?
Of course not. Well, only if you really want to.
Does it work at all?
Yes, and it even passes most language specs from the RubySpec test suite.
How Ruby evaluates your code
Well, I think I should start with explaining the basics. There are plenty of articles about it, so I'll be short:
- First, Ruby parses your code into an AST (parse.y)
- Then, it compiles it into an instruction sequence (compile.c)
- And every time you, let's say, call a method, it evaluates these instructions (insn.def, vm_eval.c, vm_insnhelper.c)
A long time ago there was no YARV and Ruby used to evaluate the AST directly. 1+2 is just a small syntax tree with a send node at the root and two children: int(1) and int(2). To evaluate it you need to "traverse" it by walking down recursively. Primitive nodes are substituted with their values (integers 1 and 2 respectively), and the send node calls + on the pre-reduced children and returns 3.
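The tree-walking approach can be sketched in a few lines of Ruby. The Node struct below is a made-up stand-in for illustration, not MRI's actual node type:

```ruby
# Hypothetical AST node: a type plus a list of children (or a value).
Node = Struct.new(:type, :children)

# 1+2 parses into send(:+) with children int(1) and int(2)
ast = Node.new(:send, [:+, Node.new(:int, [1]), Node.new(:int, [2])])

def evaluate(node)
  case node.type
  when :int
    node.children.first # primitive nodes are substituted with their values
  when :send
    mid, receiver, arg = node.children
    # reduce the children first, then dispatch the method call
    evaluate(receiver).send(mid, evaluate(arg))
  end
end

evaluate(ast) # => 3
```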
On the other hand, YARV is a stack-based Virtual Machine (VM) that evaluates an assembly-like language consisting of more than 100 predefined instructions that store their input/output values on the VM's stack.
These instructions have arguments, some of them are primitive values, some are inlined Ruby objects, some are flags.
You can view these instructions by running Ruby with the --dump=insns flag:
$ ruby --dump=insns -e 'puts 2 + 3'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)> (catch: FALSE)
0000 putself ( 1)[Li]
0001 putobject 2
0003 putobject 3
0005 opt_plus <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache>
0008 opt_send_without_block <callinfo!mid:puts, argc:1, FCALL|ARGS_SIMPLE>, <callcache>
0011 leave
As you can see, there are 5 instructions:
- putself - pushes self onto the top of the stack
- putobject - pushes the given object (numbers 2 and 3)
- opt_plus - an optimized instruction for the + method
- opt_send_without_block - an optimized instruction for a generic method call without a block
- leave - an equivalent of return
The structure of the instruction sequence (ISeq)
Let’s start with the example above.
MRI has an API to compile an arbitrary code into instructions:
> pp RubyVM::InstructionSequence.compile("2+3").to_a
["YARVInstructionSequence/SimpleDataFormat",
2,
6,
1,
{:arg_size=>0,
:local_size=>0,
:stack_max=>2,
:node_id=>4,
:code_location=>[1, 0, 1, 3]},
"<compiled>",
"<compiled>",
"<compiled>",
1,
:top,
[],
{},
[],
[1,
:RUBY_EVENT_LINE,
[:putobject, 2],
[:putobject, 3],
[:opt_plus, {:mid=>:+, :flag=>16, :orig_argc=>1}, false],
[:leave]]]
Our code gets compiled into an array where:
- The first element is the type of the instruction sequence
- Then come the MAJOR/MINOR parts of the Ruby version that was used to compile it, followed by the iseq format version (1)
- Then comes a hash with meta information about this iseq
- Then come the relative/full/other file names. It is possible to override them, similar to passing file/line arguments to {class,module}_eval (Object.module_eval("[__FILE__, __LINE__]", '(file)', 42))
- Then comes the line where the code begins (1 by default)
- Then comes the type of the iseq (:top here means the code was parsed and compiled as a top-level block of code)
- Then we have a few empty arrays and hashes (we will return to them later)
- And the last item is the list of instructions
Each element of this list is either:
- a number
- a special symbol
- or an array
Only arrays are "real" instructions; numbers and symbols are special debug entries that are used internally by the VM. In our case, a number followed by :RUBY_EVENT_LINE is a mark that MRI uses to know which line is currently being interpreted (for example, all backtrace entries include these line numbers).
Building a PoC
How can we evaluate instructions above? Well, we definitely need a stack:
$stack = []
Then, we need to iterate over instructions (the last item of the iseq) and evaluate them one by one. We could write a giant case-when
, but I promise that it won’t fit on 10 screens. Let’s use some meta-programming and dynamic dispatching:
def execute_insn(insn)
  name, *payload = insn
  # splat the operands over the handler's parameters
  send(:"execute_#{name}", *payload)
end
This way we need to write a method per instruction type, let’s start with putobject
:
def execute_putobject(object)
  $stack.push(object)
end

def execute_opt_plus(options, flag)
  rhs = $stack.pop
  lhs = $stack.pop
  $stack.push(lhs + rhs)
end

def execute_leave
  # nothing here so far
end
This code is enough to get 5
in the stack once it’s executed. Here’s the runner:
iseq = RubyVM::InstructionSequence.compile("2+3").to_a
insns = iseq[13].grep(Array) # ignore numbers and symbols for now
insns.each { |insn| execute_insn(insn) }
pp $stack
You should see [5]
.
All instructions above simply pull some values from the stack, do some computations and push the result back.
Self
Let’s think about self
for a minute:
> pp RubyVM::InstructionSequence.compile("self").to_a[13]
[1, :RUBY_EVENT_LINE, [:putself], [:leave]]
self
is stored somewhere internally, and even more, it’s dynamic:
puts self
class X
puts self
end
The first puts self
prints main
, while the second one prints X
.
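You can see this by collecting self at both points (a global variable keeps the example short):

```ruby
$selves = []

$selves << self # the top-level "main" object

class X
  $selves << self # here self is the class X itself
end

$selves.map(&:to_s) # => ["main", "X"]
```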
Here comes the concept of frames.
Frames
A frame is an object inside the virtual machine that represents a closure. Or a binding. It's an isolated "world" with its own set of locals, its own self, its own file/line information, its own rescue and ensure handlers, etc.
A frame also has a type:
- it can be a top frame that wraps all the code in your file. All variables set in the global scope of your file belong to the top frame. Each parsed and evaluated file creates its own top frame.
- it can be a method frame. All methods create it. One method frame per method call.
- it can be a block frame. All blocks and lambdas create it. One block frame per block call.
- it can be a class frame; however, it's not instantiating a class that creates it. The whole class X; ... end does. Later, when you do X.new, Ruby does not create any class frames. This frame represents a class definition.
- it can be a rescue frame that represents the code inside a rescue => e; <...>; end block
- it can be an ensure frame (well, I'm sure you get it)
- there are also module (for a module body), sclass (for a singleton class) and a very unique eval frame.
And they are stored internally as a stack.
When you invoke the caller method you see this stack (or some information based on it). Each entry in an error backtrace is based on the state of this stack at the moment the error was thrown.
OK, let’s write some code to extend our VM:
class FrameClass
  COMMON_FRAME_ATTRIBUTES = %i[
    _self
    nesting
    locals
    file
    line
    name
    iseq
  ].freeze

  def self.new(*arguments, &block)
    Struct.new(
      *COMMON_FRAME_ATTRIBUTES,
      *arguments,
      keyword_init: true
    ) do
      class_eval(&block)

      def self.new(iseq:, **attributes)
        instance = allocate
        instance.iseq = iseq
        instance.file = iseq.file
        instance.line = iseq.line
        instance.name = iseq.name
        instance.locals = {}
        instance.send(:initialize, **attributes)
        instance
      end
    end
  end
end
I like builders and I don’t like inheritance (the fact that frames share some common attributes does not mean that they should be inherited from an AbstractFrame
class; also I don’t like abstract classes, they are dead by definition)
Here’s the first version of the TopFrame
:
TopFrame = FrameClass.new do
  def initialize(**)
    self._self = TOPLEVEL_BINDING.eval('self')
    self.nesting = [Object]
  end

  def pretty_name
    "TOP #{file}"
  end
end
Let's walk through the code:
- each frame has a common set of attributes:
  - _self - what self returns inside the frame
  - nesting - what Module.nesting returns (used for the relative constant lookup)
  - locals - a set of local variables
  - file and line - the currently running __FILE__:__LINE__
  - name - a human-readable name of the frame, we will use it mostly for debugging
- FrameClass is a builder that is capable of building a custom Frame class (similar to the Struct class)
- FrameClass.new takes:
  - a list of custom frame-class-specific attributes
  - and a block that is evaluated in the context of the created frame class
So, the TopFrame class is a Struct-like class that:
- has a constructor that takes only common attributes (because we haven't specified any custom ones in FrameClass.new)
- has custom behavior in the constructor that sets _self and nesting
- has a custom pretty_name instance method
We will create as many classes as we need to cover all kinds of frames (I will return to it later, I promise)
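The interesting part of the builder is the overridden self.new: it uses allocate plus a manual send(:initialize, ...) so the common attributes can be assigned before the frame-specific constructor runs. Here's a stripped-down sketch of the trick (DemoFrame and its attributes are illustrative, not part of the VM):

```ruby
DemoFrame = Struct.new(:file, :line, :payload, keyword_init: true) do
  # Override new: allocate, assign common attributes first,
  # then run the (custom) initialize with the remaining keywords.
  def self.new(file:, **attributes)
    instance = allocate
    instance.file = file
    instance.line = 1
    instance.send(:initialize, **attributes)
    instance
  end

  def initialize(payload: nil)
    self.payload = payload
  end
end

frame = DemoFrame.new(file: 'test.rb', payload: :data)
frame.file    # => "test.rb"
frame.payload # => :data
```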
Wrapping the ISeq
I don’t like working with plain arrays, and as I mentioned above there’s a ton of useful information in the instruction sequence that we get from RubyVM::InstructionSequence.compile("code").to_a
.
Let’s create a wrapper that knows the meaning of array items:
class ISeq
  attr_reader :insns

  def initialize(ruby_iseq)
    @ruby_iseq = ruby_iseq
    @insns = @ruby_iseq[13].dup
  end

  def file
    @ruby_iseq[6]
  end

  def line
    @ruby_iseq[8]
  end

  def kind
    @ruby_iseq[9]
  end

  def name
    @ruby_iseq[5]
  end

  def lvar_names
    @ruby_iseq[10]
  end

  def args_info
    @ruby_iseq[11]
  end
end
Instance methods are self-descriptive, but just in case:
- file/line return the file/line where the iseq has been created
- kind returns a Symbol that we will later use to distinguish frames (:top for a TopFrame)
- insns returns the list of instructions
- name is an internal name of the frame that is used in stacktraces (for class X; end it returns <class:X>)
- lvar_names is an array of all local variable names that are used in the frame
- args_info is a special Hash with meta-information about arguments (empty for all frames except methods)
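If you want to double-check the offsets against MRI, they line up with what compile returns (checked against the format shown above; the file/line arguments here are arbitrary):

```ruby
# Compile a snippet with an explicit file name and starting line.
raw = RubyVM::InstructionSequence.compile('a = 1', 'demo.rb', '/tmp/demo.rb', 10).to_a

raw[0] # => "YARVInstructionSequence/SimpleDataFormat"
raw[6] # => "demo.rb" (file)
raw[8] # => 10        (line)
raw[9] # => :top      (kind)
```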
Frame stack
Frames are organized as a stack internally: every time we enter a frame, Ruby pushes it onto this stack. When the frame ends (i.e. when its list of instructions ends or a special [:leave] instruction is executed) it pops it.
class FrameStack
  attr_reader :stack

  include Enumerable

  def initialize
    @stack = []
  end

  def each
    return to_enum(:each) unless block_given?
    @stack.each { |frame| yield frame }
  end

  def push(frame)
    @stack << frame
    frame
  end

  def push_top(**args)
    push TopFrame.new(**args)
  end

  def pop
    @stack.pop
  end

  def top
    @stack.last
  end

  def size
    @stack.size
  end

  def empty?
    @stack.empty?
  end
end
Each entry in the stack is a frame that we entered at some point, so we can quickly build a caller
:
class BacktraceEntry < String
  def initialize(frame)
    super("#{frame.file}:#{frame.line}:in `#{frame.name}'")
  end
end

stack = FrameStack.new

code = '2 + 3'
iseq = ISeq.new(RubyVM::InstructionSequence.compile(code, 'test.rb', '/path/to/test.rb', 42).to_a)
stack.push_top(iseq: iseq)

caller = stack.map { |frame| BacktraceEntry.new(frame) }.join("\n")
puts caller
# => "test.rb:42:in `<compiled>'"
Writing the evaluator
Let's write it in a "script style" (it is simplified for a good reason; the real code is much more complicated):
iseq = ISeq.new(RubyVM::InstructionSequence.compile_file('path/to/file.rb').to_a)

frame_stack = FrameStack.new
stack = []

frame_stack.push_top(iseq: iseq)

until frame_stack.empty?
  current_frame = frame_stack.top
  break if current_frame.nil?

  if current_frame.insns.empty?
    frame_stack.pop
    next
  end

  current_insn = current_frame.insns.shift
  execute_insn(current_insn)
end
Generally speaking, the code above is the core of the VM. Once it’s executed both frame_stack
and stack
must be empty. I added a bunch of consistency checks in my implementation, but for the sake of simplicity I’m going to omit them here.
Instructions
I’ll try to be short here, there are about 100 instructions in Ruby, and some of them look similar.
putself, putobject, putnil, putstring, putiseq
All of these instructions push a simple object onto the top of the stack. putnil pushes the well-known global nil object; the others have an argument that is used in stack.push(argument).
optimized instructions like opt_plus
Ruby has a mode (that is turned on by default) that optimizes some frequently used method calls, like +
or .size
. It is possible to turn them off by manipulating RubyVM::InstructionSequence.compile_option
(if you set :specialized_instruction
to false
you’ll get a normal method call instead of the specialized instruction).
All of them do one specific thing; here's an example of opt_size:
def execute_opt_size(_)
  push(pop.size)
end
We do it this way because we cannot apply MRI's optimization. MRI does a different thing:
- if there's a string/array/hash on top of the stack
- and String#size (or Array#size if it's an array) is not redefined
- then it directly calls a C function rb_str_length (or RARRAY_LEN if it's an array)
- otherwise (if it's an object of some other type, or the method has been redefined) it calls the method through the regular method dispatch mechanism (which is obviously slower)
We could do the same sequence of steps, but we can't invoke a C function, so performing the check and calling .size afterwards would be even slower. It's better for us to take the slow branch from the beginning.
You can print all available specialized instructions by running
RubyVM::INSTRUCTION_NAMES.grep(/\Aopt_/)
On Ruby 2.6.4 there are 34 of them.
opt_send_without_block (or send if specialized instructions are disabled)
This is the instruction that is used to invoke methods. puts 123 looks like this:
[:opt_send_without_block, {:mid=>:puts, :flag=>20, :orig_argc=>1}, false]
It has 2 arguments:
- a hash with options:
  - mid - the method ID (i.e. the method name)
  - flag - a bitmask with metadata about the invocation
  - orig_argc - the number of arguments passed to the method call
- a boolean flag that is called CALL_DATA in C. I have no idea what it does
Here’s the rough implementation:
def execute_opt_send_without_block(options, _)
  mid = options[:mid]
  args = options[:orig_argc].times.map { pop }.reverse
  recv = pop
  result = recv.send(mid, *args)
  push(result)
end
So here we:
- take the method name from the options hash (mid)
- then we pop N arguments from the stack (args)
- then we pop the receiver of the method (recv)
- then we call recv.send(mid, *args) (in our case it's self.send(:puts, *[123]))
- and then we push the result back to the stack
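The order of the pops matters: the arguments sit on top of the stack above the receiver, and they come off in reverse. A tiny simulation of the stack discipline:

```ruby
stack = []
# The receiver is pushed first, then the arguments, left to right.
[:receiver, 1, 2, 3].each { |x| stack.push(x) }

orig_argc = 3
args = orig_argc.times.map { stack.pop }.reverse # => [1, 2, 3]
recv = stack.pop                                 # => :receiver
```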
method definition
I intentionally started with method calls because Ruby defines methods via method calls. Yes.
Ruby has a special singleton object called the Frozen Core. When you define a method via def m; end, Ruby invokes frozen_core.send("core#define_method", :m, method_iseq):
$ ruby --dump=insns -e 'def m; end'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)> (catch: FALSE)
0000 putspecialobject 1 ( 1)[Li]
0002 putobject :m
0004 putiseq m
0006 opt_send_without_block <callinfo!mid:core#define_method, argc:2, ARGS_SIMPLE>, <callcache>
0009 leave
The object itself is defined here.
Of course, we don’t have access to the Frozen Core. But we have an instruction that pushes it at the top of the stack. We can create our own FrozenCore = Object.new.freeze
and check if recv
is equal to this frozen core.
As you may notice there are also putobject :m
and putiseq
instructions. And argc
of the method call is 2. Hmmm.
core#define_method takes two arguments:
- a method name that is pushed by a putobject instruction
- and an iseq that is … pushed by the putiseq instruction. Yes, an instruction is an argument of another instruction.
Here’s the code:
if recv.equal?(FrozenCore) && mid == :'core#define_method'
  method_name, body_iseq = *args
  result = __define_method(method_name: method_name, body_iseq: body_iseq)
end
Here’s how __define_method
looks:
def __define_method(method_name:, body_iseq:)
  parent_nesting = current_frame.nesting
  define_on = MethodDefinitionScope.new(current_frame)

  define_on.define_method(method_name) do |*method_args, &block|
    execute(body_iseq, _self: self, method_args: method_args, block: block, parent_nesting: parent_nesting)
  end

  method_name
end
When we enter a method in Ruby it inherits Module.nesting
of a frame that defines it. This is why we also copy current_frame.nesting
to a method frame.
define_on = MethodDefinitionScope.new(current_frame)
is also quite simple:
class MethodDefinitionScope
  def self.new(frame)
    case frame._self
    when Class, Module
      frame._self
    when TOPLEVEL_BINDING.eval('self')
      Object
    else
      frame._self.singleton_class
    end
  end
end
- If we define a method in a global scope it is defined on the
Object
class. - If we define a method inside a class/module context - well, class/module is where the method will be defined.
- If we define a method inside some other context (inside
instance_eval
for example) - the method is defined on the singleton class of the object.
These code constructions are equivalent:
def m; end
# and
Object.define_method(:m) {}
class X; def m; end; end
# and
X.define_method(:m) {}
o = Object.new
o.instance_eval { def m; end }
# and
o.singleton_class.define_method(:m) {}
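All three pairs are easy to verify directly (Module#define_method is public since Ruby 2.5):

```ruby
class X; end
X.define_method(:m) { :from_x } # same effect as def m inside class X
X.new.m # => :from_x

o = Object.new
o.singleton_class.define_method(:m) { :from_singleton }
o.m # => :from_singleton
```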
Then comes this part:
define_on.define_method(method_name) do |*method_args, &block|
  execute(body_iseq, _self: self, method_args: method_args, block: block, parent_nesting: parent_nesting)
end
We define a method that takes any arguments (it breaks Method#parameters
, but let’s ignore it) and an optional block and executes the ISeq of the method body in a context of self
.
I admit that it’s a very hacky trick, but it allows us to dynamically assign self
.
Plus, we pass all the other things that can (and in most cases will) be used in the method body:
- method_args - what was given to this particular invocation of our method
- block - the block given to the method call
- parent_nesting - Module.nesting in the outer scope. We have to store it at the beginning of the method definition because it may change before the method gets called.
execute(iseq, **options)
is a tiny wrapper that pushes the frame into the frame_stack
depending on the kind
of the given iseq:
def execute(iseq, **payload)
  iseq = ISeq.new(iseq)
  push_frame(iseq, **payload)
  evaluate_last_frame
  pop_frame
end

def push_frame(iseq, **payload)
  case iseq.kind
  when :top
    @frame_stack.push_top(
      iseq: iseq
    )
  when :method
    @frame_stack.push_method(
      iseq: iseq,
      parent_nesting: payload[:parent_nesting],
      _self: payload[:_self],
      arg_values: payload[:method_args],
      block: payload[:block]
    )
  else
    raise NotImplementedError, "Unknown iseq kind #{iseq.kind.inspect}"
  end
end
local variables
There are two commonly used instructions for getting and setting locals:
- getlocal
- setlocal
Both take two arguments:
- the ID of the variable
- the offset (in frames) of the frame where the variable is stored
Here's an example:
> pp RubyVM::InstructionSequence.compile('a = 10; b = 20; a; b').to_a[13]
[[:putobject, 10],
[:setlocal_WC_0, 4],
[:putobject, 20],
[:setlocal_WC_0, 3],
[:getlocal_WC_0, 3],
[:leave]]
We push 10 to the stack, then we pop it and assign it to the variable with ID = 4 in the current frame (setlocal_WC_0 here is a specialized instruction that becomes setlocal 4, 0 when the optimization is turned off).
Here’s the code to maintain locals:
require 'set'

# A simple struct that represents a single local variable;
# has a name, an ID and a value (or no value)
Local = Struct.new(:name, :id, :value, keyword_init: true) do
  def get
    value
  end

  def set(value)
    self.value = value
    value
  end
end

# A wrapper around "Set" that holds all locals for some frame;
# absolutely each frame has its own instance of "Locals"
class Locals
  UNDEFINED = Object.new
  def UNDEFINED.inspect; 'UNDEFINED'; end

  def initialize(initial_names)
    @set = Set.new

    initial_names.reverse_each.with_index(3) do |arg_name, idx|
      # implicit args (like a virtual attribute that holds an mlhs value) have numeric names
      arg_name += 1 if arg_name.is_a?(Integer)
      declare(name: arg_name, id: idx).set(Locals::UNDEFINED)
    end
  end

  def declared?(name: nil, id: nil)
    !find_if_declared(name: name, id: id).nil?
  end

  def declare(name: nil, id: nil)
    local = Local.new(name: name, id: id, value: nil)
    @set << local
    local
  end

  def find_if_declared(name: nil, id: nil)
    if name
      @set.detect { |var| var.name == name }
    elsif id
      @set.detect { |var| var.id == id }
    else
      raise NotImplementedError, "At least one of name:/id: is required"
    end
  end

  def find(name: nil, id: nil)
    result = find_if_declared(name: name, id: id)
    if result.nil?
      raise InternalError, "No local name=#{name.inspect}/id=#{id.inspect}"
    end
    result
  end

  def pretty
    @set
      .map { |local| ["#{local.name}(#{local.id})", local.value] }
      .sort_by { |(name, value)| name }
      .to_h
  end
end
So the locals of a frame are just a set. It is possible to declare a local, to check whether it's declared, and to find it.
Here’s the implementation of getlocal
:
def execute_getlocal(local_var_id, n)
  frame = n.times.inject(current_frame) { |f| f.parent_frame }
  local = frame.locals.find(id: local_var_id)
  value = local.get

  if value.equal?(Locals::UNDEFINED)
    value = nil
  end

  push(value)
end
We jump out n times to get the nth frame, we find the local, we return nil if it's undefined, and we push the value back to the stack (so the result can be used by a subsequent instruction).
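The n.times.inject walk is the little idiom worth pausing on: it follows parent_frame links n times. In isolation (with a bare struct standing in for real frames):

```ruby
# A stand-in for the VM's frames, only for this demonstration.
Frame = Struct.new(:name, :parent_frame)

top_frame    = Frame.new('top', nil)
method_frame = Frame.new('method', top_frame)
block_frame  = Frame.new('block', method_frame)

# `getlocal <id>, 2` executed inside the block walks two frames up:
found = 2.times.inject(block_frame) { |frame| frame.parent_frame }
found.name # => "top"
```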
Here’s the implementation of setlocal
:
def execute_setlocal(local_var_id, n)
  value = pop
  frame = n.times.inject(current_frame) { |f| f.parent_frame }

  local =
    if (existing_local = frame.locals.find_if_declared(id: local_var_id))
      existing_local
    elsif frame.equal?(current_frame)
      frame.locals.declare(id: local_var_id)
    else
      raise InternalError, 'locals are malformed'
    end

  local.set(value)
end
This one is a bit more complicated:
- first, we pop the value from the stack (it was pushed by the previous instruction, putobject 10)
- then we find the Nth frame
- then we look for the local in this frame
- if it's there, we use it; if not, we declare it in the current frame
- and then we set the value
method arguments
Every method has a list of arguments. Yes, sometimes it's empty, but even in that case we do an arity check. In general, argument initialization is a part of every method call.
This part of Ruby is really complicated, because there are 12 argument types:
- required positional argument - def m(x)
- optional positional argument - def m(x = 42)
- rest argument - def m(*x)
- post argument - def m(*, x)
- mlhs argument (can be used as a post argument too) - def m( (x, *y, z) )
- required keyword argument - def m(x:)
- optional keyword argument - def m(x: 42)
- rest keyword argument - def m(**x)
- block argument - def m(&x)
- shadow argument - proc { |;x| } (I did not implement it because I never used it)
- nil keyword argument (since 2.7) - def m(**nil)
- arguments forwarding (also since 2.7) - def m(...)
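Ruby itself can report most of these categories via Method#parameters, which is a handy way to explore them (note that post arguments are reported as plain :req):

```ruby
def m(a, b = 42, *c, d, e:, f: 1, **g, &h); end

method(:m).parameters
# => [[:req, :a], [:opt, :b], [:rest, :c], [:req, :d],
#     [:keyreq, :e], [:key, :f], [:keyrest, :g], [:block, :h]]
```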
First, let’s take a look at the iseq to see what we have:
> pp RubyVM::InstructionSequence.compile('def m(a, b = 42, *c, d); end').to_a[13][4][1]
["YARVInstructionSequence/SimpleDataFormat",
2,
6,
1,
{:arg_size=>4,
:local_size=>4,
:stack_max=>1,
:node_id=>7,
:code_location=>[1, 0, 1, 28]},
"m",
"<compiled>",
"<compiled>",
1,
:method,
[:a, :b, :c, :d],
{:opt=>[:label_0, :label_4],
:lead_num=>1,
:post_num=>1,
:post_start=>3,
:rest_start=>2},
[],
[:label_0,
1,
[:putobject, 42],
[:setlocal_WC_0, 5],
:label_4,
[:putnil],
:RUBY_EVENT_RETURN,
[:leave]]]
There are two entries that we are interested in:
- a list of argument names (well, it’s a list of all variables, but it works for our case)
- a hash with arguments information
Let’s prepare and group it first, it’s hard to work with such format:
class CategorizedArguments
  attr_reader :req, :opt, :rest, :post, :kw, :kwrest, :block

  def initialize(arg_names, args_info)
    @req = []
    @opt = []
    @rest = nil
    @post = []

    parse!(arg_names.dup, args_info.dup)
  end

  def parse!(arg_names, args_info)
    (args_info[:lead_num] || 0).times do
      req << take_arg(arg_names)
    end

    opt_info = (args_info[:opt] || []).dup
    # the first label is the entry point used when no optional args are given
    opt_info.shift
    opt_info.each do |label|
      opt << [take_arg(arg_names), label]
    end

    if args_info[:rest_start]
      @rest = take_arg(arg_names)
    end

    (args_info[:post_num] || 0).times do
      post << take_arg(arg_names)
    end
  end

  def take_arg(arg_names)
    arg_name_or_idx = arg_names.shift

    if arg_name_or_idx.is_a?(Integer)
      arg_name_or_idx += 1
    end

    arg_name_or_idx
  end
end
I intentionally skip keyword arguments here, but they are not that much different from other types of arguments. The only noticeable difference is that optional keyword arguments have “inlined” default values if they are simple enough (like plain strings or numbers, but not expressions like 2+2
). If you are interested you can go to the repository and check this file.
Then, we should parse arguments and assign them into local variables when we push a method frame (so they are available once we start executing instructions of a method body):
class MethodArguments
  attr_reader :args, :values, :locals

  def initialize(iseq:, values:, locals:, block: nil)
    @values = values.dup
    @locals = locals
    @iseq = iseq
    @block = block

    @args = CategorizedArguments.new(
      iseq.lvar_names,
      iseq.args_info
    )
  end

  def extract(arity_check: false)
    if arity_check && values.length < args.req.count + args.post.count
      raise ArgumentError, 'wrong number of arguments (too few)'
    end

    # Required positional args
    args.req.each do |name|
      if arity_check && values.empty?
        raise ArgumentError, 'wrong number of arguments (too few)'
      end
      value = values.shift
      locals.find(name: name).set(value)
    end

    # Optional positional args
    args.opt.each do |(name, label)|
      break if values.length <= args.post.count
      value = values.shift
      locals.find(name: name).set(value)
      VM.jump(label)
    end

    # Rest positional argument
    if (name = args.rest)
      value = values.first([values.length - args.post.length, 0].max)
      @values = values.last(args.post.length)
      locals.find(name: name).set(value)
    end

    # Required post positional arguments
    args.post.each do |name|
      if arity_check && values.empty?
        raise ArgumentError, 'Broken arguments, cannot extract required argument'
      end
      value = values.shift
      locals.find(name: name).set(value)
    end

    # Make sure there are no arguments left
    if arity_check && values.any?
      raise ArgumentError, 'wrong number of arguments (too many)'
    end
  end
end
Here values
is what we get in a method call in *method_args
, locals
is equal to MethodFrame#locals
that is set to Locals.new
by default.
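The trickiest step in extract is splitting the remaining values between the rest argument and the post arguments; in isolation the arithmetic looks like this:

```ruby
# Simulating def m(*rest, x, y) called with (1, 2, 3, 4, 5):
values = [1, 2, 3, 4, 5]
post_count = 2

# everything except the trailing post args goes into the rest arg
rest_values = values.first([values.length - post_count, 0].max) # => [1, 2, 3]
post_values = values.last(post_count)                           # => [4, 5]
```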
Let's write the MethodFrame class!
MethodFrame = FrameClass.new do
  attr_reader :arg_values
  attr_reader :block

  def initialize(parent_nesting:, _self:, arg_values:, block:)
    self._self = _self
    self.nesting = parent_nesting

    @block = block

    MethodArguments.new(
      iseq: iseq,
      values: arg_values,
      locals: locals,
      block: iseq.args_info[:block_start] ? block : nil
    ).extract(arity_check: true)
  end

  def pretty_name
    "#{_self.class}##{name}"
  end
end
Method frame is just a regular frame that extracts arguments during its initialization.
constants
A regular constant assignment (like A = 1
) is based on a scope (Module.nesting
for relative lookup) and two instructions:
- setconstant
- getconstant
Both have a single argument - the constant name. But how does Ruby distinguish relative and absolute constant lookup? I mean, what's the difference between A and ::A?
Ruby uses a special instruction to set the "constant scope":
- for relative lookup:
  - in the optimized mode it runs opt_getinlinecache before get/setconstant
  - in the non-optimized mode it runs putnil (the nil works as a flag)
- for absolute lookup Ruby computes the scope via a sequence of instructions
Let’s take a look at the non-optimized mode (because we can’t optimize it anyway):
> pp RubyVM::InstructionSequence.compile('A; ::B; Kernel::D').to_a[13]
[[:putnil],
[:getconstant, :A],
[:pop],
[:putobject, Object],
[:getconstant, :B],
[:pop],
[:putnil],
[:getconstant, :Kernel],
[:getconstant, :D],
[:leave]]
The A constant performs a relative lookup, so putnil is used.
The ::B constant performs a global lookup on Object, which is a known object, so it's inlined into the putobject instruction.
Kernel::D first searches for the Kernel constant locally, then uses it as a "scope" for the constant D.
Quite easy, right? Not so fast. Ruby uses Module.nesting to perform a bottom-up search. This is why it's so important to maintain the nesting value in frames. Thus, the lookup is performed on current_frame.nesting in reverse order:
def execute_getconstant(name)
  scope = pop

  search_in = scope.nil? ? current_frame.nesting.reverse : [scope]

  search_in.each do |mod|
    if mod.const_defined?(name)
      const = mod.const_get(name)
      push(const)
      return
    end
  end

  raise NameError, "uninitialized constant #{name}"
end
If the scope is given (pushed by a previous instruction) we use it. Otherwise we have a relative lookup, and so we must use current_frame.nesting.reverse.
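For reference, MRI's Module.nesting lists scopes innermost-first, while our frames accumulate nesting outermost-first (TopFrame starts with [Object]), hence the .reverse:

```ruby
$nesting = nil

module A
  module B
    $nesting = Module.nesting
  end
end

$nesting # => [A::B, A] - the innermost scope comes first
```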
setconstant
is a bit simpler, because it always defines a constant on a scope set by a previous instruction:
> pp RubyVM::InstructionSequence.compile('A = 10; ::B = 20; Kernel::D = 30').to_a[13]
[[:putobject, 10],
[:putspecialobject, 3],
[:setconstant, :A],
[:putobject, 20],
[:putobject, Object],
[:setconstant, :B],
[:putobject, 30],
[:dup],
[:putnil],
[:getconstant, :Kernel],
[:setconstant, :D],
[:leave]]
putspecialobject is an instruction that (when called with 3) pushes the "current" scope.
def execute_putspecialobject(kind)
  case kind
  when 3
    push(current_frame.nesting.last) # push the "current" scope
  else
    raise NotImplementedError, "Unknown special object #{kind}"
  end
end

def execute_setconstant(name)
  scope = pop
  value = pop
  scope.const_set(name, value)
end
Instance/Class variables
Instance variables are always picked from the self
of the current frame (they literally look like a simplified version of local variables that are always stored in self
of the current scope):
> pp RubyVM::InstructionSequence.compile('@a = 42; @a').to_a[13]
[[:putobject, 42],
[:setinstancevariable, :@a, 0],
[:getinstancevariable, :@a, 0],
[:leave]]
I guess you know how the code should look:
def execute_getinstancevariable(name, _)
  value = current_frame._self.instance_variable_get(name)
  push(value)
end

def execute_setinstancevariable(name, _)
  value = pop
  current_frame._self.instance_variable_set(name, value)
end
Class variables are similar, but it is possible to access them from instance methods, so we use self if our current frame is a ClassFrame, or self.class otherwise:
> pp RubyVM::InstructionSequence.compile('@@a = 42; @@a').to_a[13]
[[:putobject, 42],
[:setclassvariable, :@@a],
[:getclassvariable, :@@a],
[:leave]]
def execute_setclassvariable(name)
  value = pop

  klass = current_frame._self
  klass = klass.class unless klass.is_a?(Class)

  klass.class_variable_set(name, value)
end

def execute_getclassvariable(name)
  klass = current_frame._self
  klass = klass.class unless klass.is_a?(Class)

  value = klass.class_variable_get(name)
  push(value)
end
Literals
But how do we construct arrays and hashes?
> pp RubyVM::InstructionSequence.compile('[ [:foo,a,:bar], [4,5], 42 ]').to_a[13]
[[:putobject, :foo],
[:putself],
[:send, {:mid=>:a, :flag=>28, :orig_argc=>0}, false, nil],
[:putobject, :bar],
[:newarray, 3],
[:duparray, [4, 5]],
[:putobject, 42],
[:newarray, 3],
[:leave]]
As you can see, the strategy of building an array depends on its dynamicity:
- for the dynamic [:foo, a, :bar] MRI uses newarray (because a has to be computed at runtime)
- for the primitive [4, 5] it uses duparray (because it's faster)
The whole array is also dynamic (because one of its elements is also dynamic). Let’s define them:
def execute_duparray(array)
  push(array.dup)
end

def execute_newarray(size)
  array = size.times.map { pop }.reverse
  push(array)
end
Do hashes support inlining?
> pp RubyVM::InstructionSequence.compile('{ primitive: { foo: :bar }, dynamic: { c: d } }').to_a[13]
[[:putobject, :primitive],
[:duphash, {:foo=>:bar}],
[:putobject, :dynamic],
[:putobject, :c],
[:putself],
[:send, {:mid=>:d, :flag=>28, :orig_argc=>0}, false, nil],
[:newhash, 2],
[:newhash, 4],
[:leave]]
Yes! duphash contains an inlined hash that should be pushed to the stack as is. newhash has a numeric argument that represents the total number of keys and values in the hash (i.e. pairs * 2). And once again, if at least one element of the hash is dynamic, the whole hash is also dynamic, and so it uses newhash:
def execute_duphash(hash)
  push(hash.dup)
end

def execute_newhash(size)
  hash = size.times.map { pop }.reverse.each_slice(2).to_h
  push(hash)
end
Why do we need .dup
in duphash
and duparray
? The reason is simple: this instruction can be executed multiple times (if it’s a part of a method or block, for example), and so the same value will be pushed to the stack multiple times. One of the next instructions can modify it but literals have to stay static no matter what. Without using .dup
the code like
2.times do
p [1, 2, 3].pop
end
would print 3
and 2
.
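The effect is easy to reproduce in plain Ruby: mutating one shared array versus taking a fresh `dup` on every iteration.

```ruby
# Without dup: the same array object is mutated on every iteration.
shared = [1, 2, 3]
popped = 2.times.map { shared.pop }
# popped == [3, 2]

# With dup (what duparray effectively guarantees): every iteration
# starts from an intact copy of the literal.
template = [1, 2, 3]
popped_with_dup = 2.times.map { template.dup.pop }
# popped_with_dup == [3, 3]
```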
Splats
Splat is one of the most beautiful features of Ruby: `foo, bar = *baz` (and also `[*foo, *bar]`):
> pp RubyVM::InstructionSequence.compile('a, b = *c, 42').to_a[13]
[[:putself],
[:send, {:mid=>:c, :flag=>28, :orig_argc=>0}, false, nil],
[:splatarray, true],
[:putobject, 42],
[:newarray, 1],
[:concatarray],
[:dup],
[:expandarray, 2, 0],
[:setlocal_WC_0, 4],
[:setlocal_WC_0, 3],
[:leave]]
splatarray
pops the object from the stack, converts it to Array
by calling to_a
(if it’s not an array; otherwise there’s no type casting), and pushes the result back to the stack.
concatarray
constructs an array from two top elements and pushes it back. So it changes the stack [a, b]
to [ [a,b] ]
. If items are arrays it expands and merges them.
expandarray
expands it by doing pop
and pushing items back to the stack. It takes the number of elements that need to be returned, so if an array is bigger it drops some items, if it’s too small - it pushes as many nil
s as needed.
def execute_splatarray(_)
array = pop
array = array.to_a unless array.is_a?(Array)
push(array)
end
def execute_concatarray(_)
last = pop
first = pop
push([*first, *last])
end
def execute_expandarray(size, _flag)
array = pop
if array.size < size
array.push(nil) until array.size == size
elsif array.size > size
array.pop until array.size == size
else
# they are equal
end
array.reverse_each { |item| push(item) }
end
In fact, `expandarray` is much, much more complicated; you can go to the repository and check it if you want.
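The drop/pad behavior of `expandarray` is observable from plain Ruby multiple assignment:

```ruby
# Too many elements: the extras are dropped.
a, b = *[1, 2, 3]
# a == 1, b == 2

# Too few elements: the missing slots are padded with nils.
x, y, z = *[1]
# x == 1, y == nil, z == nil
```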
Keyword splats (like { **x, **y }
) are really similar to array splats, I’m not going to cover them here.
conditions (if/unless)
To handle conditions Ruby uses local goto
(just like in C). Target of the goto
-like instruction is a label:
> pp RubyVM::InstructionSequence.compile("a = b = c = 42; if a; b; else; c; end").to_a[13]
[[:putobject, 42],
[:dup],
[:setlocal_WC_0, 3],
[:dup],
[:setlocal_WC_0, 4],
[:setlocal_WC_0, 5],
[:getlocal_WC_0, 5],
[:branchunless, :label_20],
[:jump, :label_16],
:label_16,
[:getlocal_WC_0, 4],
[:jump, :label_22],
:label_20,
[:getlocal_WC_0, 3],
:label_22,
[:leave]]
Do you see these `:label_<NN>` symbols? They are used as markers. `branchunless` takes a single argument: a label to jump to if the value on the top of the stack is `false` or `nil`. If it’s truthy, it does nothing.
def execute_branchunless(label)
cond = pop
unless cond
jump(label)
end
end
def jump(label)
  insns = current_frame.iseq.insns
  # drop_while returns a new array and would not mutate the frame's
  # instruction list, so we shift destructively instead
  insns.shift until insns.first == label
  insns.shift # to drop the label too
end
Here we do `pop`, check the value, and call `jump` if it’s falsy. `jump` skips instructions until it sees the given label.
MRI also has `branchif` and `branchnil`:

- `branchif` does `if cond` as its main check
- `branchnil` does `if cond.nil?`
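These two can be sketched in the same pop-then-maybe-jump shape as `execute_branchunless`. The `BranchDemo` harness below is hypothetical (a bare stack plus a recording `jump`), just enough to make the sketch runnable:

```ruby
class BranchDemo
  attr_reader :jumped_to

  def initialize(stack)
    @stack = stack
    @jumped_to = nil
  end

  def pop
    @stack.pop
  end

  def jump(label)
    @jumped_to = label
  end

  # jump if the popped value is truthy
  def execute_branchif(label)
    jump(label) if pop
  end

  # jump if the popped value is exactly nil (this is what `&.` compiles to)
  def execute_branchnil(label)
    jump(label) if pop.nil?
  end
end
```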
String interpolation/concatenation
Ruby has a few compile-time optimizations that fold code like

- `"a" "b"`
- `"#{'a'}#{'b'}"`

into a single string `"ab"`. However, more complicated cases with dynamic interpolation involve a few new instructions:
> pp RubyVM::InstructionSequence.compile('"#{a}#{:sym}"').to_a[13]
[[:putobject, ""],
[:putself],
[:send, {:mid=>:a, :flag=>28, :orig_argc=>0}, false, nil],
[:dup],
[:checktype, 5],
[:branchif, :label_18],
[:dup],
[:send, {:mid=>:to_s, :flag=>20, :orig_argc=>0}, false, nil],
[:tostring],
:label_18,
[:putobject, :sym],
[:dup],
[:checktype, 5],
[:branchif, :label_31],
[:dup],
[:send, {:mid=>:to_s, :flag=>20, :orig_argc=>0}, false, nil],
[:tostring],
:label_31,
[:concatstrings, 3],
[:leave]]
The instructions above are split into sections:

- There’s an “invisible” beginning of the string (`putobject ""`)
- Then we have an interpolated method call `a`:
  - first we call `a` via `send`
  - then we run the `checktype` instruction with argument `5`, which checks that what’s popped is a string; it pushes back a boolean value
  - then we conditionally invoke `to_s` if the object is not a string
- Then we have an interpolated symbol `:sym` that gets interpolated in the same way
- And finally we invoke `concatstrings 3` that does `pop` 3 times, concatenates 3 strings, and pushes the result back to the stack
First let’s take a look at the checktype
instruction:
CHECK_TYPE = ->(klass, obj) {
klass === obj
}.curry
RB_OBJ_TYPES = {
0x00 => ->(obj) { raise NotImplementedError }, # RUBY_T_NONE
0x01 => CHECK_TYPE[Object], # RUBY_T_OBJECT
0x02 => ->(obj) { raise NotImplementedError }, # RUBY_T_CLASS
0x03 => ->(obj) { raise NotImplementedError }, # RUBY_T_MODULE
0x04 => ->(obj) { raise NotImplementedError }, # RUBY_T_FLOAT
0x05 => CHECK_TYPE[String], # RUBY_T_STRING
0x06 => ->(obj) { raise NotImplementedError }, # RUBY_T_REGEXP
0x07 => ->(obj) { raise NotImplementedError }, # RUBY_T_ARRAY
0x08 => ->(obj) { raise NotImplementedError }, # RUBY_T_HASH
0x09 => ->(obj) { raise NotImplementedError }, # RUBY_T_STRUCT
0x0a => ->(obj) { raise NotImplementedError }, # RUBY_T_BIGNUM
0x0b => ->(obj) { raise NotImplementedError }, # RUBY_T_FILE
0x0c => ->(obj) { raise NotImplementedError }, # RUBY_T_DATA
0x0d => ->(obj) { raise NotImplementedError }, # RUBY_T_MATCH
0x0e => ->(obj) { raise NotImplementedError }, # RUBY_T_COMPLEX
0x0f => ->(obj) { raise NotImplementedError }, # RUBY_T_RATIONAL
0x11 => ->(obj) { raise NotImplementedError }, # RUBY_T_NIL
0x12 => ->(obj) { raise NotImplementedError }, # RUBY_T_TRUE
0x13 => ->(obj) { raise NotImplementedError }, # RUBY_T_FALSE
0x14 => ->(obj) { raise NotImplementedError }, # RUBY_T_SYMBOL
0x15 => ->(obj) { raise NotImplementedError }, # RUBY_T_FIXNUM
0x16 => ->(obj) { raise NotImplementedError }, # RUBY_T_UNDEF
0x1a => ->(obj) { raise NotImplementedError }, # RUBY_T_IMEMO
0x1b => ->(obj) { raise NotImplementedError }, # RUBY_T_NODE
0x1c => ->(obj) { raise NotImplementedError }, # RUBY_T_ICLASS
0x1d => ->(obj) { raise NotImplementedError }, # RUBY_T_ZOMBIE
0x1e => ->(obj) { raise NotImplementedError }, # RUBY_T_MOVED
0x1f => ->(obj) { raise NotImplementedError }, # RUBY_T_MASK
}.freeze
def execute_checktype(type)
item_to_check = pop
check = RB_OBJ_TYPES.fetch(type) { raise InternalError, "checktype - unknown type #{type}" }
result = check.call(item_to_check)
push(result)
end
I blindly took it from MRI and yes, this instruction supports many types. I implemented only two of them, but the rest looks simple (except imemo
and friends). Honestly I have no idea why, but about 95% of specs from the RubySpec (only language group, I did not check the whole test suite) are passing with these missing parts. I have no idea how to trigger MRI to use them. Maybe it uses them internally?
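As a side note, the curried `CHECK_TYPE` lambda above can be exercised on its own:

```ruby
CHECK_TYPE = ->(klass, obj) { klass === obj }.curry

is_string = CHECK_TYPE[String] # partially applied, waits for obj
is_string.call("foo") # true:  String === "foo"
is_string.call(:foo)  # false: String === :foo
```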
concatstrings
looks just like newarray
:
def execute_concatstrings(count)
strings = count.times.map { pop }.reverse
push(strings.join)
end
Blocks
Blocks are passed to method calls as a third argument:
> pp RubyVM::InstructionSequence.compile('m { |a| a + 42 }').to_a[13]
[[:putself],
[:send,
{:mid=>:m, :flag=>4, :orig_argc=>0},
false,
["YARVInstructionSequence/SimpleDataFormat",
2,
6,
1,
{:arg_size=>1,
:local_size=>1,
:stack_max=>2,
:node_id=>7,
:code_location=>[1, 2, 1, 16]},
"block in <compiled>",
"<compiled>",
"<compiled>",
1,
:block,
[:a],
{:lead_num=>1, :ambiguous_param0=>true},
[[:redo, nil, :label_1, :label_9, :label_1, 0],
[:next, nil, :label_1, :label_9, :label_9, 0]],
[1,
:RUBY_EVENT_B_CALL,
[:nop],
:label_1,
:RUBY_EVENT_LINE,
[:getlocal_WC_0, 3],
[:putobject, 42],
[:send, {:mid=>:+, :flag=>16, :orig_argc=>1}, false, nil],
:label_9,
[:nop],
:RUBY_EVENT_B_RETURN,
[:leave]]]]]
A block definitely needs a frame that looks pretty much like a `MethodFrame`:
BlockFrame = FrameClass.new(:arg_values) do
def initialize(arg_values:)
self._self = parent_frame._self
self.nesting = parent_frame.nesting
MethodArguments.new(
iseq: iseq,
values: arg_values,
locals: locals,
block: nil
).extract(arity_check: false)
end
def pretty_name
name
end
end
(For simplicity let’s ignore that blocks can also take blocks; also let’s ignore lambdas, we will return to them later)
The code above looks almost like a method frame. The only difference is the arity_check
value that we pass to the MethodArguments
class.
But when should we create this frame? And how can we get a proc from it?
VM_CALL_ARGS_BLOCKARG = (0x01 << 1)
def execute_send(options, flag, block_iseq)
  _self = self
  mid = options[:mid]
  block =
    if block_iseq
      proc do |*args|
        execute(block_iseq, self: _self, arg_values: args)
      end
    elsif (flag & VM_CALL_ARGS_BLOCKARG) != 0
      # `flag & mask` is an Integer and Integers are always truthy in Ruby,
      # so the comparison with 0 is required
      pop
    else
      nil
    end
  args = options[:orig_argc].times.map { pop }.reverse
  recv = pop
  result = recv.send(mid, *args, &block)
  push(result)
end
It looks like a more generalized version of opt_send_without_block
, because opt_send_without_block
is a specialized implementation of send
.
This instruction also pops a receiver and arguments, but what’s important, it also computes the block.
- If `block_iseq` is given, we create a proc that (once called) executes the block’s instructions (i.e. the block body) with the given arguments. This block uses the `self` of the place where it was created (i.e. `self == proc { self }.call` always returns true).
- If there’s no `block_iseq`, the block can be given via a `&block` argument. MRI marks the method call with `VM_CALL_ARGS_BLOCKARG` (this flag is just a bitmask).
- And then we simply call the method with the resulting proc object.
A block passed via `&` like `b = proc {}; m(&b)` does not need any additional implementation. The method `proc` here takes a block (handled by the first `if` branch), it gets stored in a local variable, and we pass it to the method as a block argument (the `elsif` branch).
Lambdas
It’s complicated and I don’t have a complete solution that covers all cases (I guess because MRI does not expose enough APIs to do it. Or I’m just not smart enough).
Arrow lambda (->(){}
) is just a method call FrozenCore#lambda
, and so we can easily determine that it’s a lambda and not a proc. But what about lambda {}
? It can be overwritten.
An incomplete (and somewhat unreliable) solution is to check that our receiver does not override the `lambda` method inherited from the `Kernel` module:
creating_a_lambda = false
if mid == :lambda
if recv.equal?(FrozenCore)
# ->{} syntax
creating_a_lambda = true
end
if recv.class.instance_method(:lambda).owner == Kernel
if Kernel.instance_method(:lambda) == RubyRb::REAL_KERNEL_LAMBDA
# an original "lambda" method from a Kernel module
creating_a_lambda = true
end
end
end
Then we can set it on our block frame as an attribute.
# in the branch that creates a proc from the `block_iseq`
proc do |*args|
execute(block_iseq, self: _self, arg_values: args, is_lambda: creating_a_lambda)
end
BlockFrame = FrameClass.new(:arg_values, :is_lambda) do
def initialize(arg_values:, is_lambda:)
self.is_lambda = is_lambda
self._self = parent_frame._self
self.nesting = parent_frame.nesting
MethodArguments.new(
iseq: iseq,
values: arg_values,
locals: locals,
block: nil
).extract(arity_check: is_lambda)
end
end
Arity check is enabled only if our proc is a lambda.
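Plain Ruby shows the behavioral difference this `arity_check` flag models:

```ruby
pr = proc   { |a, b| [a, b] }
la = lambda { |a, b| [a, b] }

pr.call(1) # [1, nil]: procs silently pad (and drop) arguments

begin
  la.call(1) # lambdas enforce arity
rescue ArgumentError => e
  e.class # ArgumentError
end
```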
Calling a block
If you remember, when we define a method we tell it to save the given block in a method frame:
define_on.define_method(method_name) do |*method_args, &block|
execute(body_iseq, _self: self, method_args: method_args, block: block, parent_nesting: parent_nesting)
end
And the frame itself saves it in an `attr_reader`.
So both explicit and implicit blocks are available in a method body via `current_frame.block`. It’s possible to invoke it by calling `block.call(arguments)` (if it’s available as an explicit block argument) or by calling `yield(arguments)` (in which case it does not even have to be declared in the method signature).
pp RubyVM::InstructionSequence.compile('def m; yield; end').to_a[13][4][1][13]
[[:invokeblock, {:mid=>nil, :flag=>16, :orig_argc=>0}],
[:leave]]
Honestly even before I started working on this article I expected MRI to do something like this. yield
is equivalent to <current block>.call(args)
:
def execute_invokeblock(options)
args = options[:orig_argc].times.map { pop }.reverse
frame = current_frame
frame = frame.parent_frame until frame.can_yield?
result = frame.block.call(*args)
push(result)
end
Do you see `frame = frame.parent_frame until frame.can_yield?`? The reason for this line is that you may have code like
def m
[1,2,3].each { |item| yield item }
end
^ the `yield` here belongs to the method `m`, not to the `BlockFrame` of the `.each` call. There can be more nested blocks, so we have to go up until we see something that supports `yield`. Well, we know that only one frame can do `yield`: it’s a `MethodFrame`.
Our frame class factory needs to be extended to generate this method by default and return false from it. `MethodFrame` has to override it and return `true`. Polymorphism!
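The shape described above can be sketched with hypothetical mini frame classes (names assumed for illustration; the real ones come from the article’s `FrameClass` factory):

```ruby
class BaseFrame
  def can_yield?
    false # default for all frames
  end
end

class MiniMethodFrame < BaseFrame
  def can_yield?
    true # only method frames can yield
  end
end

class MiniBlockFrame < BaseFrame
  attr_accessor :parent_frame
end

# Walking up from a block frame stops at the first method frame:
method_frame = MiniMethodFrame.new
block_frame = MiniBlockFrame.new
block_frame.parent_frame = method_frame

frame = block_frame
frame = frame.parent_frame until frame.can_yield?
frame.equal?(method_frame) # true
```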
Super
Calling super
is very similar to calling yield
: it can be replaced with method(__method__).super_method.call(args)
.
__method__
can be retrieved from current_frame.name
, args
are processed using options[:orig_argc]
:
def execute_invokesuper(options, _, _)
recv = current_frame._self
mid = current_frame.name
super_method = recv.method(mid).super_method
args = options[:orig_argc].times.map { pop }.reverse
result = super_method.call(*args)
push(result)
end
This implementation is incorrect: it can’t handle a chain of `super` calls (a hierarchy `A < B < C` where each class has a method that calls `super`). I guess it’s possible to implement it by recording the class where the method was defined (i.e. by storing `current_frame._self` before calling `define_method` and passing it to the `MethodFrame` constructor as a `defined_in` attribute). This way we could do something like this:
def execute_invokesuper(options, _, _)
recv = current_frame._self
mid = current_frame.name
dispatchers = recv.class.ancestors
current_dispatcher_idx = dispatchers.index(current_frame.defined_in)
next_dispatcher = dispatchers[current_dispatcher_idx + 1]
super_method = next_dispatcher.instance_method(mid).bind(recv)
args = options[:orig_argc].times.map { pop }.reverse
result = super_method.call(*args)
push(result)
end
I did not implement it because MSpec does not rely on it and I usually try to avoid sequences of super
calls.
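The `Method#super_method` API that both snippets rely on works like this in isolation:

```ruby
class DemoBase
  def greet
    "base"
  end
end

class DemoChild < DemoBase
  def greet
    "child"
  end
end

c = DemoChild.new
c.method(:greet).call               # "child"
c.method(:greet).super_method.call  # "base"
```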
Global variables
Similar to locals and instance variables, there are getglobal
/setglobal
instructions. They also take a variable name as an argument.
Unfortunately, Ruby has no API to dynamically get/set global variables. But we have eval
!
def execute_getglobal((name))
push eval(name.to_s)
end
def execute_setglobal((name))
# there's no way to set a gvar by name/value
# but eval can reference locals
value = pop
eval("#{name} = value")
end
defined? keyword
As you may know, this keyword can handle pretty much anything:
> pp RubyVM::InstructionSequence.compile('defined?(42)').to_a[13]
[[:putobject, "expression"],
[:leave]]
> pp RubyVM::InstructionSequence.compile('a = 42; defined?(a)').to_a[13]
[[:putobject, 42],
[:setlocal_WC_0, 3],
[:putobject, "local-variable"],
[:leave]]
In some simple cases it does not do any computation. It’s obvious that `42` is an expression and `a` is a local variable (and there’s no way to undefine it with any code between the assignment and the `defined?` check).
More advanced checks use a defined
instruction:
> pp RubyVM::InstructionSequence.compile('@a = 42; defined?(@a)').to_a[13]
[[:putobject, 42],
[:setinstancevariable, :@a, 0],
[:putnil],
[:defined, 2, :@a, true],
[:leave]]
The first argument is a special enum flag that specifies what we are trying to check:
module DefinedType
DEFINED_NOT_DEFINED = 0
DEFINED_NIL = 1
DEFINED_IVAR = 2
DEFINED_LVAR = 3
DEFINED_GVAR = 4
DEFINED_CVAR = 5
DEFINED_CONST = 6
DEFINED_METHOD = 7
DEFINED_YIELD = 8
DEFINED_ZSUPER = 9
DEFINED_SELF = 10
DEFINED_TRUE = 11
DEFINED_FALSE = 12
DEFINED_ASGN = 13
DEFINED_EXPR = 14
DEFINED_IVAR2 = 15
DEFINED_REF = 16
DEFINED_FUNC = 17
end
I’ll show you the branch that handles instance variables:
def execute_defined(defined_type, obj, needstr)
# used only in DEFINED_FUNC/DEFINED_METHOD branches
# but we still have to do `pop` here (even if it's unused)
context = pop
verdict =
case defined_type
when DefinedType::DEFINED_IVAR
ivar_name = obj
if current_frame._self.instance_variable_defined?(ivar_name)
'instance-variable'
end
# ... other branches
end
push(verdict)
end
All other branches are similar, they do some check and push a constant string or nil
back to the stack.
Range literals
For static ranges (like `(1..2)`) Ruby uses inlining and the well-known `putobject` instruction. But what if it’s dynamic, like `(a..b)`?
> pp RubyVM::InstructionSequence.compile('a = 3; b = 4; p (a..b); p (a...b)').to_a[13]
[[:putobject, 3],
[:setlocal_WC_0, 4],
[:putobject, 4],
[:setlocal_WC_0, 3],
[:putself],
[:getlocal_WC_0, 4],
[:getlocal_WC_0, 3],
[:newrange, 0],
[:send, {:mid=>:p, :flag=>20, :orig_argc=>1}, false, nil],
[:pop],
[:putself],
[:getlocal_WC_0, 4],
[:getlocal_WC_0, 3],
[:newrange, 1],
[:send, {:mid=>:p, :flag=>20, :orig_argc=>1}, false, nil],
[:leave]]
There’s a special newrange
instruction that takes a flag as an argument to specify inclusion of the right side (i.e. to distinguish ..
vs ...
)
def execute_newrange(flag)
high = pop
low = pop
push(Range.new(low, high, flag == 1))
end
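The third argument of `Range.new` maps exactly to this flag:

```ruby
Range.new(3, 4, false) == (3..4)   # true: inclusive range
Range.new(3, 4, true)  == (3...4)  # true: exclusive range
Range.new(3, 4, true).exclude_end? # true
```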
Long jumps
This is probably the most complicated part. What if you have a method that has a loop inside a loop that does return
? You want to stop executing both loops and simply exit the method, right?
def m
2.times do |i|
3.times do |j|
return [i,j]
end
end
end
m
Of course you can just find the closest frame that supports return
(i.e. a MethodFrame
), but you also need to stop execution of two running methods and blocks. In our case it’s even more complicated because we don’t control them (they are written in C).
The only way I was able to find is to throw an exception. An exception destroys all frames (including YARV’s C frames) until it finds someone who can catch and handle it. If there’s no such frame the program exits with an error.
Let’s create a special exception class called VM::LongJumpError
. Each frame class has to know what it can handle (for example, you can do break
in a block, but not in a method; return
is normally supported only by methods and lambdas, etc):
class LongJumpError < InternalError
attr_reader :value
def initialize(value)
@value = value
end
def do_jump!
raise InternalError, 'Not implemented'
end
def message
"#{self.class}(#{@value.inspect})"
end
end
class ReturnError < LongJumpError
def do_jump!
frame = current_frame
if frame.can_return?
# swallow and consume
frame.returning = self.value
else
pop_frame(reason: "longjmp (return) #{self}")
raise self
end
end
end
Each longjmp
exception wraps the value that it “returns” with (or “breaks” with, for break
we need a separate class, but I’m going to skip it here. break
/next
and other friends are really similar to return
).
But we need to catch them, right? Without a `rescue` handler we would have something conceptually similar to a segfault:
def execute(iseq, **payload)
iseq = ISeq.new(iseq)
push_frame(iseq, **payload)
# here comes the difference:
# we wrap executing instructions into a rescue handler
begin
evaluate_last_frame
rescue LongJumpError => e
e.do_jump!
end
pop_frame
end
The only missing thing is the implementation of the `can_return?` method in our frames. All frames except `MethodFrame` (and `BlockFrame` if it’s marked as a lambda) must return `false`; `MethodFrame` must return `true`.
MRI uses a special instruction called `throw` that has a single argument, a `throw_type` (an enum: `return` is 1, `break` is 2, `next` is 3; there are also `retry`/`redo` and a few more). The value that must be attached to the thrown exception comes from the stack (so this instruction does a single `pop`):
VM_THROW_STATE_MASK = 0xff
RUBY_TAG_NONE = 0x0
RUBY_TAG_RETURN = 0x1
RUBY_TAG_BREAK = 0x2
RUBY_TAG_NEXT = 0x3
RUBY_TAG_RETRY = 0x4
RUBY_TAG_REDO = 0x5
RUBY_TAG_RAISE = 0x6
RUBY_TAG_THROW = 0x7
RUBY_TAG_FATAL = 0x8
RUBY_TAG_MASK = 0xf
def execute_throw(throw_type)
throw_type = throw_type & VM_THROW_STATE_MASK
throw_obj = pop
case throw_type
when RUBY_TAG_RETURN
raise VM::ReturnError, throw_obj
when RUBY_TAG_BREAK
raise VM::BreakError, throw_obj
when RUBY_TAG_NEXT
raise VM::NextError, throw_obj
# ...
end
end
longjmp in MRI
But does it work the same way in MRI? C does not have exceptions, and at the same time there are plenty of places where MRI does something like
if (len > ARY_MAX_SIZE) {
rb_raise(rb_eArgError, "array size too big");
}
// handle validated data
This rb_raise
somehow exits a C function. Well, here’s the trick: non-local goto
.
There are C calls that perform a `goto` to any place (previously marked, of course; similar to jumping to labels with a local `goto`):
`setjmp()` saves the stack context/environment in `env` for later use by `longjmp()` … also known as a “context switch”. And it’s relatively expensive.
Even if you don’t `raise` an exception and only have `begin; ...; rescue; end` in your code, you still have to save the context (to jump to it once you `raise` an error). MRI does not know at compile time which methods can throw an error (or whether you throw at all), so each `rescue` produces a `setjmp` call (and each `raise` triggers a `longjmp` and passes the closest `rescue`’s saved `env` as an argument).
rescue/ensure
So now we know that raise/rescue works via long jumps under the hood. Let’s implement our own exceptions.
By sticking to MRI exceptions we can unwind both MRI’s internal stack and ours at the same time. I’m not going to override `raise`; it should do what it originally does, but we still need to support our own `rescue` blocks. Let’s see what MRI gives us:
> pp RubyVM::InstructionSequence.compile('begin; p "x"; rescue A; p "y"; end').to_a
[ # ...snip
[[:rescue,
[ # ...snip
"rescue in <compiled>",
"<compiled>",
"<compiled>",
1,
:rescue,
[:"\#$!"],
{},
[],
[1,
[:getlocal_WC_0, 3],
[:putnil],
[:getconstant, :A],
[:checkmatch, 3],
[:branchif, :label_11],
[:jump, :label_19],
:label_11,
:RUBY_EVENT_LINE,
[:putself],
[:putstring, "y"],
[:send, {:mid=>:p, :flag=>20, :orig_argc=>1}, false, nil],
[:leave],
:label_19,
[:getlocal_WC_0, 3],
[:throw, 0]]],
:label_0,
:label_7,
:label_8,
0],
[:retry, nil, :label_7, :label_8, :label_0, 0]],
[:label_0,
1,
:RUBY_EVENT_LINE,
[:putself],
[:putstring, "x"],
[:send, {:mid=>:p, :flag=>20, :orig_argc=>1}, false, nil],
:label_7,
[:nop],
:label_8,
[:leave]]]
An instruction sequence that has some `rescue` blocks inside includes all information about them (in element #12, right above the instructions list). Each rescue handler is a frame with its own list of variables and instructions. Its kind is `:rescue` and it has at least one local variable: `$!`. It starts with a dollar sign, but it’s a local variable. According to its semantics it has to be a local variable, but unfortunately it can’t look like one (because it’d potentially conflict with method calls). I mean, that’s how I explain it to myself; I don’t know for sure what the initial reason was to design it this way.
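The per-handler binding of `$!` can be observed from plain Ruby:

```ruby
same = nil
begin
  raise "boom"
rescue => e
  same = $!.equal?(e) # inside the handler, $! is the caught exception
end
same # true
```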
It also has a few labels at the bottom - :label_7, :label_8, :label_0
:
- the first label is a “begin” label. It marks where the (potentially) critical section of your code begins
- the second label is an “end” label
- the third label is an “exit” label. It marks where we should jump to if the error has been caught and handled.
A top-level instruction sequence also contains these labels, and their meaning is:

- if we evaluate instructions and we see a label that is a “begin” label of some rescue handler, we enable the handler
- if we see a label that is an “end” label of some rescue handler, we disable it
- if we execute a single instruction and catch an exception, we:
  - iterate over all enabled rescue handlers
  - for each of them push a `RescueFrame` with a rescue iseq
  - set the `$!` variable in this frame to the error that we have just caught
  - let the rescue frame decide where to go next:
    - back to the original frame via a `jump` to the “exit” label after doing `pop_frame`
    - or somewhere else via re-raise (if the iseq contains it)
Let’s code it!
class ISeq
attr_reader :rescue_handlers
attr_reader :ensure_handlers
def initialize(ruby_iseq)
@ruby_iseq = ruby_iseq
reset!
setup_rescue_handlers!
setup_ensure_handlers!
end
# ... other existing methods on ISeq class
def setup_rescue_handlers!
  # note: the ivar name must match the attr_reader (rescue_handlers)
  @rescue_handlers = @ruby_iseq[12]
    .select { |handler| handler[0] == :rescue }
    .map { |(_, iseq, begin_label, end_label, exit_label)| Handler.new(iseq, begin_label, end_label, exit_label) }
end
def setup_ensure_handlers!
  @ensure_handlers = @ruby_iseq[12]
    .select { |handler| handler[0] == :ensure }
    .map { |(_, iseq, begin_label, end_label, exit_label)| Handler.new(iseq, begin_label, end_label, exit_label) }
end
class Handler
attr_reader :iseq
attr_reader :begin_label, :end_label, :exit_label
def initialize(iseq, begin_label, end_label, exit_label)
@iseq = iseq
@begin_label, @end_label, @exit_label = begin_label, end_label, exit_label
end
end
end
So now each iseq
object has two getters:
rescue_handlers
ensure_handlers
Frames must know which handlers are active (but not instruction sequences, because methods can recursively call themselves and so the same iseq will be reused; it’s a per-frame property):
class FrameClass
  def self.new(*members)
    Struct.new(*members) do
      # Both must be set to `Set.new` in the constructor
      attr_accessor :enabled_rescue_handlers
      attr_accessor :enabled_ensure_handlers
    end
  end
end
So this way all frames have it too. Every time when we see a label in our execution loop we need to check if it matches any begin_label
or end_label
of our current_frame.iseq.rescue_handlers
(or ensure_handlers
):
def on_label(label)
{
current_iseq.rescue_handlers => current_frame.enabled_rescue_handlers,
current_iseq.ensure_handlers => current_frame.enabled_ensure_handlers,
}.each do |all_handlers, enabled_handlers|
all_handlers
.select { |handler| handler.begin_label == label }
.each { |handler| enabled_handlers << handler }
all_handlers
.select { |handler| handler.end_label == label }
.each { |handler| enabled_handlers.delete(handler) }
end
end
side note: when we do a local jump
we should also walk through skipped instructions and enable/disable our handlers; this is important
OK, now the only missing part is the reworked execution loop:
# a generic runner that is used inside the loop
def execute_insn(insn)
case insn
when [:leave]
current_frame.returning = stack.pop
when Array
name, *payload = insn
with_error_handling do
send(:"execute_#{name}", *payload)
end
# -- new branch for labels --
when Regexp.new("label_\\d+")
on_label(insn)
else
# ignore
end
end
# a wrapper that catches and handles error
def with_error_handling
yield
rescue VM::InternalError => e
# internal errors like LongJumpError should be invisible for users
raise
rescue Exception => e
handle_error(e)
end
def handle_error(error)
if (rescue_handler = current_frame.enabled_rescue_handlers.first)
result = execute(rescue_handler.iseq, caught: error, exit_to: rescue_handler.exit_label)
stack.push(result)
else
raise error
end
end
# This guy also needs customization to support `jump(exit_label)`
def pop_frame
frame = @frame_stack.pop
if frame.is_a?(RescueFrame)
jump(frame.exit_to)
end
frame.returning
end
# And here's the rescue frame implementation
RescueFrame = FrameClass.new do
attr_reader :parent_frame, :caught, :exit_to
def initialize(parent_frame:, caught:, exit_to:)
self._self = parent_frame._self
self.nesting = parent_frame.nesting
@parent_frame = parent_frame
@caught = caught
@exit_to = exit_to
# $! always has an ID = 3
locals.declare(id: 3, name: :"\#$!")
locals.find(id: 3).set(caught)
end
end
But why do we handle only the first handler? Can there be multiple handlers? The answer is no, because:

- MRI merges multiple consecutive `rescue` handlers (by using `case error` branching in a rescue body)
- `rescue` itself is a frame, and so a nested `rescue` is a rescue handler of the rescue handler
throw/catch methods
As a side (and, I personally think, very interesting) note: while I was working on this project I realized that the specs for Kernel#throw were not working for me at all. They were literally completely broken (even after I finished working on a very basic implementation of exceptions):
catch(:x) do
begin
throw :x
rescue Exception => e
puts e
end
end
This code does not print anything. However, if you do just throw :x
you get an exception:
> throw :x
Traceback (most recent call last):
2: from (irb):14
1: from (irb):14:in `throw'
UncaughtThrowError (uncaught throw :x)
Huh, what’s going on? Let’s take a look at the implementation of Kernel#throw
:
void
rb_throw_obj(VALUE tag, VALUE value)
{
rb_execution_context_t *ec = GET_EC();
struct rb_vm_tag *tt = ec->tag;
while (tt) {
if (tt->tag == tag) {
tt->retval = value;
break;
}
tt = tt->prev;
}
if (!tt) {
VALUE desc[3];
desc[0] = tag;
desc[1] = value;
desc[2] = rb_str_new_cstr("uncaught throw %p");
rb_exc_raise(rb_class_new_instance(numberof(desc), desc, rb_eUncaughtThrow));
}
ec->errinfo = (VALUE)THROW_DATA_NEW(tag, NULL, TAG_THROW);
EC_JUMP_TAG(ec, TAG_THROW);
}
This code raises an instance of rb_eUncaughtThrow
only in one case: if there’s no frame above that may “catch” it (like in the case when we did just throw :x
).
However, if there’s a frame somewhere above that has the same tag, MRI performs a manual `longjmp`. This is why we can’t catch this exception: there’s simply no exception if there’s a `catch(:x)` above (but there would be one if we did `catch(:y) { throw :x }`).
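The matching-tag behavior is easy to check from plain Ruby; `throw` can also carry a value that becomes the result of the matching `catch`:

```ruby
catch(:x) { throw :x, 42 } # 42

# The jump goes to the *matching* tag, unwinding any non-matching catch:
r = catch(:x) do
  catch(:y) { throw :x, 1 }
  2 # never reached
end
r # 1
```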
Is it faster? Let’s see
require 'benchmark/ips'
Benchmark.ips do |x|
x.config(:time => 3)
x.report('raise') do
begin
raise 'x'
rescue => e
end
end
x.report('throw') do
catch(:x) do
throw :x
end
end
x.compare!
end
$ ruby benchmark.rb
Warming up --------------------------------------
raise 101.832k i/100ms
throw 256.893k i/100ms
Calculating -------------------------------------
raise 1.310M (± 4.2%) i/s - 3.971M in 3.037942s
throw 4.853M (± 3.0%) i/s - 14.643M in 3.020227s
Comparison:
throw: 4852821.4 i/s
raise: 1309915.6 i/s - 3.70x slower
As expected, it’s faster. But what’s more important, let’s see what happens if there are more frames between `raise/rescue` and `throw/catch`. First, let’s create a small method that wraps given code into N frames:
def in_n_frames(depth = 0, n, blk)
if depth == n
blk.call
else
in_n_frames(depth + 1, n, blk)
end
end
begin
in_n_frames(1000, proc { raise 'err' })
rescue => e
p e.backtrace.length
end
It prints 1003
, because there’s also a TopFrame
and a BlockFrame
(and a small bug that does +1), but that’s absolutely fine for us.
$ ruby benchmark.rb
Warming up --------------------------------------
raise 1.061k i/100ms
throw 1.115k i/100ms
Calculating -------------------------------------
raise 10.628k (± 1.3%) i/s - 32.891k in 3.095347s
throw 11.183k (± 1.5%) i/s - 34.565k in 3.091514s
Comparison:
throw: 11183.1 i/s
raise: 10627.8 i/s - 1.05x slower
There’s almost no difference! The reason is simple: the only real difference is the creation of the exception object. `throw` does not create one.
Utility instructions
There are also a few interesting instructions that MRI uses to evaluate your complex code:
- `adjuststack(n)` - does `n.times { pop }`
- `nop` - literally does nothing
- `dupn(n)` - does `pop` N times and then pushes them twice (basically duplicates the N last items)
- `setn(n)` - does `stack[-n-1] = stack.top`
- `topn(n)` - does `push(stack[-n-1])`
- `swap` - swaps the top two stack elements
- `dup` - like `dupn(1)`
- `reverse(n)` - reverses N stack elements (i.e. does `n.times.map { pop }.each { |value| push(value) }`)
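Most of these are one-liners over the stack. A sketch with a hypothetical `UtilStack` harness (a bare array standing in for the VM stack):

```ruby
class UtilStack
  attr_reader :stack

  def initialize(stack)
    @stack = stack
  end

  def execute_adjuststack(n)
    n.times { @stack.pop }
  end

  def execute_swap
    a = @stack.pop
    b = @stack.pop
    @stack.push(a, b) # pushes a first, so the two top items trade places
  end

  def execute_topn(n)
    @stack.push(@stack[-n - 1])
  end

  def execute_setn(n)
    @stack[-n - 1] = @stack.last
  end
end

s = UtilStack.new([1, 2, 3])
s.execute_topn(1)
s.stack # [1, 2, 3, 2]
```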
Final words
First of all, I’d like to say thank you to everyone who made YARV. I was not able to find a single place where MRI behaves inefficiently (and I spent many hours looking into instructions).
Once again, the code is available here, feel free to create issues or message me on Twitter. And please, don’t use it in the real world.