Ruby for the uninitiated

A friend’s Slack bot has a command that requests information from a variety of sources, which took around 10 seconds to complete. The bot is written in Ruby, and while I know the basics of the language, I’m not familiar with the tooling and haven’t worked on any non-toy Ruby projects.

But I wanted to speed it up. It’s an embarrassingly parallel problem, so when I saw that the requests were processed serially, some easy wins came to mind that even I should be able to implement. But first, I had to set up the environment, something that’s always been frustrating for me in Ruby land.

This time, I did it the right way, so I’m writing this for other newcomers, and for my future self.

Installing the prerequisites

First things first, we’re gonna need a Ruby version manager. This solves a couple of problems with the default macOS Ruby setup.

The first candidate was RVM, but I didn’t like how it required me to install things in an unmanaged way, so I opted for rbenv, which is packaged for Homebrew:

> brew install rbenv

The installation doesn’t take care of everything, so run rbenv init to complete the setup. The output depends on your shell; on fish, it says:

# Load rbenv automatically by appending
# the following to ~/.config/fish/config.fish:
status --is-interactive; and source (rbenv init -|psub)

So I did, and rbenv setup was finished. With that out of the way, I installed the latest version of MRI like so:

> rbenv install 2.4.1

This compiles MRI 2.4.1 from source, which took quite a while on my machine, so don’t worry if there’s no visual feedback during that time. When this finished, I switched the active version with rbenv global 2.4.1, and installed the project dependencies:

> gem install bundler
> bundle

And… everything explodes, filling my screen with red text. The key thing seemed to be error: cannot run C compiled programs, which is misleading—the problem is that it can’t compile gems written in C. In my case, xcode-select --install fixed the problem, by installing the Xcode command line tools.

Now, I ran bundle again, and once again it blew up in my face, with errors like:

generator.c:861:25: error: ‘rb_cFixnum’ undeclared (first use in this function)
     } else if (klass == rb_cFixnum) {
                                  ^

Oh, C, something I’m intimately familiar with. This error is due to the integer unification in Ruby 2.4, which merged Fixnum and Bignum into a single Integer class and removed rb_cFixnum from the C API. The version of the JSON gem used by this project has not been updated for 2.4, so I had to do rbenv install 2.3.1. After this, bundle successfully installed the dependencies.
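The unification is easy to observe from Ruby itself. Here is a minimal sketch, assuming it runs on Ruby 2.4 or later (on 2.3 and earlier the two classes would be Fixnum and Bignum instead). The method name integer_classes is made up for illustration.

```ruby
# Before Ruby 2.4, small integers were Fixnum and large ones were Bignum.
# From 2.4 on, both report the same class.
# integer_classes is a hypothetical helper, not part of any library.
def integer_classes
  small = 1
  big = 2**64
  [small.class, big.class]
end

small_class, big_class = integer_classes
puts "1 is a #{small_class}"
puts "2**64 is a #{big_class}"
puts "same class: #{small_class == big_class}"
```

C extensions that compared against rb_cFixnum, like the generator.c above, have nothing to compare against once the two classes are merged.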

Parallelizing the requests

After all the false starts, I could finally dig into the code. The part I wanted to speed up looked like this:

@content ||= [Parser::Alpha.fetch(topic).to_s]
@content << Parser::Google.fetch_all(self.query) unless self.query.nil? || self.query.empty? || self.query == self.topic
@content << Parser::Google.fetch_all("facts about #{topic}")
@content << Parser::Wikipedia.fetch_all(topic)

This performs a synchronous HTTP request to each source, serially. There are probably nicer ways to parallelise things in Ruby, but I opted for the Thread and Mutex classes, brought in with require 'thread':

@content ||= []

mutex = Mutex.new
threads = []

# Each request runs on its own thread; the mutex guards the shared array.
threads << Thread.new do
  c = Parser::Alpha.fetch(topic).to_s
  mutex.synchronize { @content << c }
end

threads << Thread.new do
  # Skip this request when there is no distinct query,
  # so that nil is never appended to @content.
  next if self.query.nil? || self.query.empty? || self.query == self.topic
  c = Parser::Google.fetch_all(self.query)
  mutex.synchronize { @content << c }
end

threads << Thread.new do
  c = Parser::Google.fetch_all("facts about #{topic}")
  mutex.synchronize { @content << c }
end

threads << Thread.new do
  c = Parser::Wikipedia.fetch_all(topic)
  mutex.synchronize { @content << c }
end

threads.each(&:join)

This spins up a thread for each request, performs each HTTP request on its own thread, and safely appends each response to the @content instance variable, using a mutex for synchronisation. Finally, it joins all the threads before returning the aggregated results. Because the threads finish in arbitrary order, the entries in @content may end up in a different order than in the serial version.
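A variation on the same idea, as a sketch rather than what the bot actually does: Thread#value joins a thread and returns the result of its block, so results can be gathered without a shared array or a mutex. The lambdas below are hypothetical stand-ins for the Parser calls, with sleep faking the network wait.

```ruby
# Hypothetical stand-ins for the Parser calls; each lambda is one request.
def example_fetchers(topic)
  [
    -> { sleep 0.1; "alpha result for #{topic}" },
    -> { sleep 0.1; "search result for #{topic}" },
    -> { sleep 0.1; "wiki result for #{topic}" }
  ]
end

# Run every fetcher on its own thread and collect the results.
# Thread#value waits for the thread and returns its block's result,
# so the output order matches the input order.
def fetch_all_in_parallel(fetchers)
  threads = fetchers.map { |fetcher| Thread.new { fetcher.call } }
  threads.map(&:value)
end

puts fetch_all_in_parallel(example_fetchers("ruby"))
```

The first map starts all the threads before the second map waits on any of them, which is what lets the requests overlap.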

The code is very repetitive, because I couldn’t figure out how to parameterise on the thing that’s unique for each request. To test it, and get some timings, I used pry and the benchmark module:

> gem install pry
> bundle exec pry -r source-file.rb
[1] pry(main)> require 'benchmark'
[2] pry(main)> Benchmark.measure { some code }
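For a self-contained feel of what such a measurement looks like, here is a sketch that times a serial and a threaded run of the same fake workload. The sleep call stands in for an HTTP request, and fake_request, run_serially and run_threaded are made-up names.

```ruby
require 'benchmark'

# Hypothetical stand-in for one slow HTTP request.
def fake_request
  sleep 0.2
end

# Perform the requests one after another.
def run_serially(count)
  count.times { fake_request }
end

# Perform each request on its own thread, then wait for all of them.
def run_threaded(count)
  Array.new(count) { Thread.new { fake_request } }.each(&:join)
end

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
serial = Benchmark.realtime { run_serially(4) }
threaded = Benchmark.realtime { run_threaded(4) }

puts format("serial:   %.2fs", serial)
puts format("threaded: %.2fs", threaded)
```

Benchmark.measure reports CPU times as well as elapsed time. For code that mostly waits on the network, the elapsed (real) figure is the one to watch.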

This patch improved performance by a few seconds, from around 9 seconds to 6. That’s not as big an improvement as I expected, perhaps partly due to the GIL? MRI does release the GIL while a thread is blocked on I/O, though, so the slowest single request may simply set the floor. Creating threads has a cost too, so I could probably improve it further by using a thread pool instead of spinning up new ones every time.
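Both the repetition and the thread-per-request cost could be tackled by treating each request as a lambda and handing the lambdas to a fixed set of worker threads. What follows is a rough sketch under that assumption: run_in_pool and the example jobs are hypothetical names, not part of the bot, and I haven’t measured whether it would actually help here.

```ruby
require 'thread' # only needed on older Rubies; harmless on newer ones

# A minimal fixed-size pool: workers pull jobs (lambdas) off a queue
# until it is empty, and store each result at the job's original index.
def run_in_pool(jobs, pool_size)
  queue = Queue.new
  jobs.each_with_index { |job, index| queue << [job, index] }
  queue.close # once closed and drained, pop returns nil

  results = Array.new(jobs.size)
  mutex = Mutex.new

  workers = Array.new(pool_size) do
    Thread.new do
      while (item = queue.pop)
        job, index = item
        value = job.call
        mutex.synchronize { results[index] = value }
      end
    end
  end

  workers.each(&:join)
  results
end

# Example jobs; sleep fakes the network wait.
example_jobs = %w[alpha search wiki].map do |name|
  -> { sleep 0.1; "#{name} result" }
end

puts run_in_pool(example_jobs, 2)
```

Queue#close is available from Ruby 2.3, so this would run on the 2.3.1 install from earlier. In a long-running bot the workers would be started once and reused across commands; this sketch creates them per call to keep it short.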

What I really should have done is profile, of course. I’ll try Profiler and ruby-prof next time.

Wrapping up

Setting up the development environment was painful. Parallelising code with threads in Ruby is fairly easy, but my limited knowledge led to repetitive code, and the change didn’t speed things up as much as I expected. Still, it was a good learning experience, and I’m better equipped to contribute to Ruby projects than I was before.

Published in programming, ruby

Copyright © 2017 Alva