Before we start
Disclaimer #1: first of all, I’d like to say that I really like Ruby. I write a ton of Ruby code every single day and I prefer it over other languages. Please do not take this post too seriously: Ruby is nice, and this post is mostly a joke.
Disclaimer #2: I’m not going to cover popular things like flip-flops (thank God they are deprecated in 2.6.0).
I was thinking for a while which item should go first, but finally I had to give up. I think all items are funny.
Regexp ‘o’ flag
I don’t even know if there’s anyone in the world using it. The o flag is a very, very magical thing that “freezes” a regexp after parsing:
pry> 1.upto(5) { |i| puts /#{i}/o.source }
1
1
1
1
1
pry> 3.times.map { |i| /#{i}/o.object_id }
=> [70135960411140, 70135960411140, 70135960411140]
That’s a special syntax to define an inline regexp as a constant. It is a constant because its value is constant (object_id returns the same value): the interpolation runs only once, the first time the literal is evaluated. I think the main purpose of this flag is to reduce object allocations, and I believe it was not initially designed for such cases. If you are too lazy to extract a static regexp to a constant, simply add an o flag.
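To make the “interpolated only once” behaviour concrete, here is a small sketch that compares the same literal with and without the o flag. The method names once_regexp and fresh_regexp are made up for this illustration.

```ruby
# Hypothetical helper names, used only for this illustration.
def once_regexp(i)
  /#{i}/o   # interpolated the first time this literal is evaluated, then cached
end

def fresh_regexp(i)
  /#{i}/    # interpolated again on every call
end

once_sources  = (1..3).map { |i| once_regexp(i).source }
fresh_sources = (1..3).map { |i| fresh_regexp(i).source }

p once_sources                              # => ["1", "1", "1"]
p fresh_sources                             # => ["1", "2", "3"]
p once_regexp(10).equal?(once_regexp(20))   # => true, the very same cached object
```

The cache belongs to the literal itself, so every later call returns whatever value was interpolated first.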
Invalid encoding
Well, I have to confess, sometimes I hate Ruby for various reasons, and this feature is one of them.
# encoding: utf-8
s = "\xff"
puts s.encoding
puts s.valid_encoding?
puts s.bytes
$ ruby test.rb
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
UTF-8
false
255
In this case the string is not a “real” string. This byte sequence is simply invalid for UTF-8 (in UTF-8 any byte > 127 belongs to a multi-byte character whose value is defined together with its neighbouring bytes, and 0xFF never appears in valid UTF-8 at all), but Ruby allows it. It’s not even a String, it’s just a container of bytes. And for some reason Ruby allows you to pack an arbitrary sequence of bytes into a string, and if you want to ask “Is it valid?” you have a (I think) conceptually wrong String#valid_encoding? method. Maybe the right way to solve it would be to:
- reject such strings during parsing (and throw SyntaxError)
- raise an error when someone tries to put a wrong byte sequence into a string
- remove the String#valid_encoding? method
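Until something like that happens, the usual way to live with such strings is to check them and then either scrub them or treat them as binary data. A minimal sketch, where describe_bytes is a hypothetical helper name:

```ruby
# Hypothetical helper: collects a few facts about a possibly broken string.
def describe_bytes(str)
  {
    valid: str.valid_encoding?,   # is the byte sequence valid for str.encoding?
    scrubbed: str.scrub('?'),     # copy with invalid bytes replaced by "?"
    binary: str.b.encoding        # copy of the same bytes, tagged as binary
  }
end

info = describe_bytes("\xff")
p info[:valid]                        # => false
p info[:scrubbed]                     # => "?"
p info[:binary] == Encoding::BINARY   # => true
```

String#scrub and String#b both return new strings, so the original invalid string is left untouched.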
Nested HEREdocs
p <<"A#{b}C"
#{
<<"A#{b}C"
A#{b}C
}
str
A#{b}C
I’m quite sure that there are no syntax highlighters that can properly handle this code. At the moment of writing GitHub is unable to do that. Try evaluating this code in IRB.
Setters and return values
As you most probably know, in Ruby an assignment through a setter ignores the method’s return value. It always evaluates to the assigned argument:
def m=(a)
return 42
end
self.m = 'return me'
# => "return me"
Yes, you can get the method’s own return value by calling the setter via Kernel#send:
pry> send(:m=, 'return me')
=> 42
So the general rule seems to be “if you call a setter method without send, you can’t get its return value”. Wrong!
pry> def []=(a); 42; end
pry> self.[]= 'return me'
=> 42
I can’t imagine any reason to use such syntax, most probably it should be deprecated.
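For an ordinary attribute setter the two call styles can be compared side by side. This is only a sketch; the Box class and its value= method are invented for the example.

```ruby
# Box is a made-up class for this illustration.
class Box
  def value=(v)
    @value = v
    :ignored_by_assignment   # the method's own return value
  end
end

box = Box.new
assigned = (box.value = 'return me')     # assignment syntax
sent     = box.send(:value=, 'return me') # plain method call

p assigned   # => "return me"
p sent       # => :ignored_by_assignment
```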
Passing blocks to the [] method
Imagine the following piece of code:
def [](idx, &block)
puts idx + block.call
end
self[1] { 2 }
Looks valid, right? We pass a positional argument 1 and a block that returns 2 to the method called []. The method should print 3. Let’s run it with Ruby 2.5:
$ ruby -v test.rb
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
3
1 + 2 == 3, everything is fine. Let’s try 2.4:
$ ruby -v test.rb
ruby 2.4.4p296 (2018-03-28 revision 63013) [x86_64-darwin17]
test.rb:5: syntax error, unexpected { arg, expecting end-of-input
self[1] { 2 }
^
Yes, this syntax was introduced in Ruby 2.5. Did you hear any announcements about it?
Spoiler: there are no tests for this syntax in ruby/ruby repository. Guess why?
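The same call also works on an ordinary object. A small sketch, assuming Ruby 2.5 or newer; the Adder class is invented for the example.

```ruby
# Adder is a made-up class; requires Ruby 2.5+ as described above.
class Adder
  def [](idx, &block)
    idx + block.call   # combine the index with the block's result
  end
end

adder = Adder.new
result = adder[1] { 2 }
p result   # => 3
```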
Global variables
Let’s take a simple code:
pry> [1, 2, 3].join
=> "123"
Do you know anything about “pure” functions (or pure methods in this case)? Ideally methods should only use self and provided arguments. Relying on any global state is bad because you are not the only one who can mutate it.
pry> $, = 'Ruby'
pry> [1, 2, 3].join
=> "1Ruby2Ruby3"
Array#join uses a global variable as a default value for the separator. Literally def join(sep = $,).
By the way, maybe I should put the following code to my current project before leaving it. How much time is needed to find it?
if rand > 0.5
$, = [102, 105, 110, 100, 32, 109, 101].pack('c*')
end
After doing require 'English' you get two aliases for this global variable: $OUTPUT_FIELD_SEPARATOR and $OFS, that’s the real name of this global variable.
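One defensive habit is to always pass the separator explicitly, so that Array#join never consults the global. A minimal sketch; safe_join is a hypothetical helper, and newer Ruby versions print a deprecation warning when $, is set to a non-nil value.

```ruby
# Hypothetical helper: an explicit separator does not depend on $,.
def safe_join(items)
  items.join('')
end

original = $,
begin
  $, = 'Ruby'               # newer Rubies warn that $, is deprecated
  p safe_join([1, 2, 3])    # => "123"
ensure
  $, = original             # always restore the global state
end
```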
Instance variables without @ prefix
Spoiler: you may think that it’s impossible because the parser rejects such code. But in fact, Ruby allows it and there are even some specs for this - RubySpec. I don’t know much about Ruby internals, but at least one class uses instance variables without @: it’s called Range. (0..3) has 3 instance variables:
- excl = false
- begin = 0
- end = 3
“Any proof?” - let’s marshal it:
pry> Marshal.dump(0..3)
=> "\x04\bo:\nRange\b:\texclF:\nbegini\x00:\bendi\b"
This string contains a version (4.8), an indicator of the object (o), a symbol :Range, and a hash of instance variables { excl: false, begin: 0, end: 3 }.
Let’s change a class name a bit (but keep the same length to not break anything):
pry> class Hello; end
pry> Marshal.load("\x04\bo:\nHello\b:\texclF:\nbegini\x00:\bendi\b")
^^^^^
=> #<Hello:0x00007fece707a9c8>
Now we can change, for example, excl to @one (again, to keep the length the same):
pry> Marshal.load("\x04\bo:\nHello\b:\t@oneF:\nbegini\x00:\bendi\b")
^^^^
=> #<Hello:0x00007fece70533a0 @one=false>
pry> _.instance_variables
=> [:@one]
The conclusion is simple: excl is an instance variable, but Kernel#instance_variables hides it.
Kernel#instance_variable_get / set and the Ruby lexer are the places that validate instance variable names. Low-level C calls don’t do it, and in general when you write a C extension you may easily get an instance variable without the @ char.
You can read my article about marshalling to get a full overview of its internals.
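The hidden names can also be checked programmatically instead of by reading the dump by eye. A small sketch that only relies on Marshal.dump, Marshal.load and instance_variables:

```ruby
dump = Marshal.dump(0..3)

p dump.bytes.first(2)            # => [4, 8], the format version 4.8
p dump.include?('excl')          # => true
p dump.include?('begin')         # => true
p dump.include?('end')           # => true
p((0..3).instance_variables)     # => [], the names without @ are not listed
p Marshal.load(dump) == (0..3)   # => true, the round trip restores the range
```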
Implicit coercing
As you may know there are two types of coercing in Ruby: explicit and implicit.
Explicit is when you call methods like to_a, to_h, to_s. This is for the case when the object is not an Array/Hash/String, but can become one.
Implicit is when Ruby calls methods like to_ary, to_hash, to_str for you. This is for the case when the object acts as an Array/Hash/String, and converting it to the corresponding class must happen automatically.
There are a lot of methods in the core library that are documented as “taking a String as an argument” but in fact they accept any objects that can be implicitly converted to a String.
pry> o = Object.new
pry> def o.to_str; "hello"; end
pry> "string" + o
=> "stringhello"
pry> "hello".casecmp(o)
=> 0
pry> "testhello".chomp(o)
=> "test"
pry> "hellostr".delete_prefix(o)
=> "str"
pry> "hello world".start_with?(o)
=> true
There are more methods like this, and not only for String.
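The difference between the two families is easy to see with String#+, which only looks for to_str. A sketch with two invented classes, OnlyToS and WithToStr:

```ruby
# Both classes are made up for this illustration.
class OnlyToS
  def to_s; 'explicit'; end     # explicit conversion only
end

class WithToStr
  def to_str; 'implicit'; end   # implicit conversion
end

p('x' + WithToStr.new)   # => "ximplicit"

begin
  'x' + OnlyToS.new      # to_s is not consulted here
rescue TypeError => e
  p e.class              # => TypeError
end
```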
Also, there’s a way to implicitly convert an abstract Object to
- Array (using *)
- Hash (using **)
- Proc (using &)
Sometimes it can be ridiculous:
class User
def to_ary; [1, 2, 3]; end
def to_hash; { a: 1 }; end
def to_proc; proc { 42 }; end
end
def m(*rest, **kwargs, &block)
rest.length + kwargs.length + block.call
end
user = User.new
m(*user, **user, &user)
# => 44
I’m not sure that this feature is required. But remember, that’s only my opinion.
Explicit coercing is explicit and forces you to call to_a/to_h/to_s manually. Probably it would be better to restrict the */**/& operators to accept only Array/Hash/Proc objects (and to be as strict as possible).
Implicit to_a
The previous section says that Ruby never invokes methods for explicit coercing on its own. There’s one exception: the to_a method.
o = Object.new
def o.to_a; [1, 2, 3]; end
def o.to_ary; [4, 5, 6]; end
a, b, c = o
p [a, b, c]
# [4, 5, 6]
# so, it calls to_ary, that's an implicit coercing
a, b, c = *o
p [a, b, c]
# [1, 2, 3]
# it calls to_a !! an explicit coercing gets called by Ruby
For some reason the concept of implicit/explicit coercing does not work for this case.
HEREdoc identifiers and newlines
The section about nested HEREdocs shows a HEREdoc identifier that has an interpolation inside. Also, it’s possible to use "\n":
p <<"HERE
"
content
HERE
It prints "content\n". For some reason a newline is not allowed in the middle of the HEREdoc identifier (and don’t get me wrong, I think that newlines should be rejected, no matter whether in the middle or at the end).
1if true
Yes, that’s valid syntax. Ruby has very special rules for whitespace and newlines. 1i is a special syntax for complex numbers, but 1if true is 1 if true. There’s also a 1r syntax for rational numbers, and yes, 1rescue nil is 1 rescue nil.
pry> 1if true
1
pry> 1rescue nil
1
But what about 1ri?
pry> 1rif true
SyntaxError: (syntax error, unexpected tIDENTIFIER, expecting end-of-input)
1rif true
^~~
Sweet. Bonus:
pry> def m; 1return; end
SyntaxError: (syntax error, unexpected keyword_return, expecting keyword_end)
def m; 1return; end
^~~~~~
pry> def m; 1retry; end
SyntaxError: (syntax error, unexpected keyword_retry, expecting keyword_end)
def m; 1retry; end
^~~~~
pry> def m; 1redo; end
SyntaxError: (syntax error, unexpected keyword_redo, expecting keyword_end)
def m; 1redo; end
^~~~
Looks like there are special rules for keyword modifiers.
defined?
I think this is the most controversial keyword in Ruby. It takes literally everything as an argument.
pry> defined?(self)
=> "self"
pry> defined?(nil)
=> "nil"
pry> defined?(true)
=> "true"
pry> defined?(false)
=> "false"
pry> defined?(a = 1)
=> "assignment"
pry> a
=> nil
pry> a = 1; defined?(a)
=> "local-variable"
pry> defined?(begin; 1; 2; 3; end)
=> "expression"
pry> defined?(self.m)
=> nil
pry> def m; end; defined?(self.m)
=> "method"
pry> module M; def m; end; end
pry> include M
pry> def m; defined?(super); end
pry> m
=> "super"
It can also return yield, constant, class variable, instance-variable and global-variable. By the way, where’s the dash in class variable?
That’s a strong violation of the single responsibility principle. This keyword can handle EVERYTHING!
Moreover, it handles all kinds of exceptions inside:
pry> a = Object.new
pry> defined?(
a.b.c.d +
MissingConstant +
yield +
super +
nil * 2 +
eval("!@#$%^") +
require('missing_file')
)
=> nil
That’s too much for a single keyword.
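A few of the more everyday answers can be collected in one place. This is just a sketch; probe is a hypothetical method name, and the last entry shows that the argument is only inspected, never executed.

```ruby
# Hypothetical helper: gathers a handful of defined? results.
def probe
  {
    constant: defined?(String),
    global: defined?($stdout),
    method: defined?(puts),
    missing_constant: defined?(MissingConstant),
    missing_ivar: defined?(@never_assigned),
    not_evaluated: defined?(raise('never raised'))   # raise is not called
  }
end

results = probe
p results[:constant]           # => "constant"
p results[:global]             # => "global-variable"
p results[:method]             # => "method"
p results[:missing_constant]   # => nil
p results[:missing_ivar]       # => nil
p results[:not_evaluated]      # => "method"
```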
return in the class/module body
You can’t call return from a module/class body:
pry> class A; return 1; end
SyntaxError (Invalid return in class/module body)
class A; return 1; end
^~~~~~
pry> module A; return 1; end
SyntaxError (Invalid return in class/module body)
module A; return 1; end
^~~~~~
It throws a SyntaxError, i.e. even if the code is unreachable, you still can’t write it; it’s simply invalid.
But you can use return in a singleton class body:
pry> class << self; return; end
LocalJumpError: unexpected return
Now that’s a LocalJumpError raised at runtime, so this code can be parsed and loaded as long as it is never executed:
pry> class << self; return; end if false
=> nil
Meta-characters
Again, this is something that probably could be removed from Ruby; I don’t know anyone using it.
A meta-character is a special sequence of characters that gets interpreted as a single character. Most probably you know one of them - \uDDDD. But there are more:
pry> "\u1234"
=> "ሴ"
pry> "\377"
=> "\xFF"
pry> "\xFF"
=> "\xFF"
pry> "\C-\a"
=> "\a"
pry> "\ca"
=> "\u0001"
pry> "\M-a"
=> "\xE1"
pry> "\C-\M-f"
=> "\x86"
pry> "\M-\cf"
=> "\x86"
pry> "\c\M-f"
=> "\x86"
That’s absolutely insane! Moreover, starting from 2.6 Ruby ignores spaces (and tabs) around codepoints in the \u{} syntax:
pry> "\u{ 123 456 }"
=> "ģі"
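The control and meta escapes are less random than they look: a control escape masks the character’s byte with 0x9f and a meta escape sets the high bit with 0x80. The sketch below checks that arithmetic against the literals; control and meta are hypothetical helper names.

```ruby
# Hypothetical helpers that mirror the bit arithmetic of the escapes.
def control(byte)
  byte & 0x9f   # \c and \C- clear bits 5 and 6
end

def meta(byte)
  byte | 0x80   # \M- sets the high bit
end

p "\ca".bytes     == [control('a'.ord)]         # => true (0x01)
p "\M-a".bytes    == [meta('a'.ord)]            # => true (0xE1)
p "\C-\M-f".bytes == [control(meta('f'.ord))]   # => true (0x86)
```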
Invisible rest argument
MRI has a special rule for the Proc class: it expands a single array argument:
pry> proc { |a, b| [a, b] }.call([1, 2])
=> [1, 2]
… but only if the proc takes more than one argument. And if the arity is 1 it works as you’d expect:
pry> proc { |a| [a] }.call([1, 2])
=> [[1, 2]]
And here’s an edge case: it’s possible to put a trailing comma after the argument list:
pry> proc { |a,| [a] }.call([1, 2])
=> [1]
… and MRI still expands an array. So how many arguments does this proc have?
pry> proc{|a,|}.arity
=> 1
What’s going on?
The answer is simple: there’s an invisible rest argument after trailing comma. The real interface of this proc is:
proc { |a, *| }
MRI generates it for you and then hides it.
If you are interested in implementation details take a look at parse.y - there’s a special field excessed_comma that works as a flag.
Also, you can clearly see it in Ripper’s output:
pry> require 'ripper'
pry> Ripper.sexp('proc{|a|}')[1][0][2][1][1]
=> [:params, [[:@ident, "a", [1, 6]]], nil, nil, nil, nil, nil, nil]
pry> Ripper.sexp('proc{|a,|}')[1][0][2][1][1]
=> [:params, [[:@ident, "a", [1, 6]]], nil, 0, nil, nil, nil, nil]
Do you see the difference?
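Writing the rest argument out by hand gives the same unpacking but a different arity, which is one way to notice that the generated one is hidden. A small sketch with two made-up variable names:

```ruby
trailing_comma = proc { |a,| [a] }    # rest argument generated and hidden
explicit_rest  = proc { |a, *| [a] }  # rest argument written out

p trailing_comma.call([1, 2])   # => [1]
p explicit_rest.call([1, 2])    # => [1]
p explicit_rest.arity           # => -2
p trailing_comma.arity          # 1 according to the session above
```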
Dynamicity of optarg default values
In Ruby optional arguments are very, very powerful. You can pass pretty much anything as a default value of the argument in the method signature:
pry> def m(a = (puts 'no a')); a; end
pry> m(1)
=> 1
pry> m
no a
=> nil
I don’t like it in general. I would say it’s too powerful, you can abuse this feature and do some really crazy stuff. For example like this:
def m(a = (return 1; nil))
return 2
end
What does this method return when you call it without any arguments? Yep, it returns 1.
The reason why it works this way is that MRI inlines the initialization of optional arguments into the method body, so for the VM this code looks roughly like:
def m(a = nil)
a ||= (return 1; nil)
return 2
end
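Because defaults are evaluated inside the method at call time, they can see earlier arguments and can even leave the method. A minimal sketch; area and early_exit are invented names.

```ruby
# Both methods are made up for this illustration.
def area(width, height = width)   # the default refers to an earlier argument
  width * height
end

def early_exit(a = (return :from_default; nil))
  :from_body
end

p area(3)         # => 9
p area(3, 4)      # => 12
p early_exit      # => :from_default
p early_exit(1)   # => :from_body
```

The trailing nil inside the default mirrors the snippet above and keeps the default a value expression.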
You can even go further and redefine a method in its arguments:
def factorial(
n,
redefinition = <<-RUBY,
define_method(__method__) do |
_ = (return 1 if n == 1; nil),
_ = eval(redefinition),
_ = (return n * (n -= 1; send(__method__)); nil)
|
end
RUBY
_ = eval(redefinition),
_ = (return send(__method__); nil)
)
# <<EMPTY BODY>>
end
p factorial(5)
Yes, this method has no body but is still capable of calculating factorial.
Shadow arguments
I think only people that work with parsing tools are aware of this feature. That’s a special kind of argument that “shadows” an outer variable. The syntax is |;shadowarg|:
pry> n = 1; proc { n }.call
=> 1
pry> n = 1; proc { |;n| n }.call
=> nil
Basically, it’s nice to be able to use your own isolated set of local variables in a block and be sure that you don’t change the outer scope. But again, does anyone use it? It also reminds me of the var keyword from JavaScript.
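The practical effect shows up when the block assigns to the variable. A small sketch with two hypothetical methods that differ only in the shadow argument:

```ruby
# Hypothetical methods for comparison.
def without_shadow
  n = 1
  [10].each { |x| n = x }      # assigns to the outer n
  n
end

def with_shadow
  n = 1
  [10].each { |x; n| n = x }   # n is local to the block
  n
end

p without_shadow   # => 10
p with_shadow      # => 1
```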
Dynamicity of rescue
Take a look at the following code:
begin
raise 'error message'
rescue => RuntimeError
puts 'caught an error'
end
Looks correct at first glance, right? And it even prints caught an error, but in fact it contains an invalid construction. It is valid from the parser’s perspective, but I think you don’t want to write such code: it reassigns the constant RuntimeError.
Ruby has a very tricky mechanism of converting getters to setters. It can convert
- local variable get to local variable set (most popular usage of rescue handler)
- instance variable get to instance variable set
- const get to const set
- getter method to setter method
- and many more like global/class variables
So if you have object = OpenStruct.new and you catch an error using rescue => object.field you’ll get object.field = <thrown error> called under the hood.
That’s definitely very, very flexible but does anyone need it? I’d rather reject all cases above except local variables.
I have seen the first snippet in a real codebase, and it was quite difficult to understand why the spec that asserts something like expect { code construction }.to raise_error(RuntimeError) does not work.
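The getter-to-setter conversion can be observed with any object that has a writer method. The sketch below uses a Struct instead of an OpenStruct to stay dependency-free; Holder and capture_into are invented names.

```ruby
# Holder and capture_into are made up for this illustration.
Holder = Struct.new(:error)

def capture_into(holder)
  raise 'error message'
rescue => holder.error   # calls holder.error = <raised exception>
  holder
end

holder = capture_into(Holder.new)
p holder.error.class     # => RuntimeError
p holder.error.message   # => "error message"
```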
Positional/keyword arguments
I used to think that positional and keyword arguments act like two completely separate groups of arguments. If the last argument is a Hash and you pass it to the method call it
- populates all keyword arguments
- raises an error if some keyword arguments are missing
- sets default values to missing optional keyword arguments
But I was wrong. One argument value can populate both a positional and a keyword argument at the same time:
def m(a = 1, b: 1)
[a, b]
end
p m(b: 2, 'b' => 3)
# => [{"b"=>3}, 2]
I feel like it’s a bug:
- There’s only one argument provided
- And some of its keys are not symbols
- So MRI should not use it for keyword arguments initialization
- And a must be { b: 2, 'b' => 3 }
- And so b must be just 1 (default value)
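One way to stay out of this grey area is to never rely on the split and to pass the positional hash in explicit braces. A minimal sketch that reuses the method m from above:

```ruby
def m(a = 1, b: 1)
  [a, b]
end

# The braces make the first hash an ordinary positional argument.
p m({ 'b' => 3 }, b: 2) == [{ 'b' => 3 }, 2]   # => true
p m(b: 2) == [1, 2]                            # => true
```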
Final words
This story is not about bad parts of Ruby or anything like that. Don’t feel bad because of this - I’m really sorry. I was trying to cover some rarely used features and explain as much as I can.