Nursing Code

Using Rust in Ruby


Introduction

I'm not going to make the case for using Rust; there are hundreds of articles and videos out there that do a much better job of explaining why Rust is an amazing language.

This guide is about putting Rust to practical use in Ruby projects: the places where you want the speed of a C extension without the crippling fear of the havoc that extension could wreak if you make a mistake.

String sanitization

Handling user input is a classic scenario where the needs of customers and of security compete. We want users to be able to write their own bios, add links and insert images, but that creates a major problem.

If we're not careful, an unscrupulous person could easily insert cross-site scripting, clickjacking code or a Bitcoin miner!
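The bluntest defence is to escape everything. The sketch below is plain Rust, and escape_html is a hypothetical helper written for illustration, not part of any library in this post. It neutralises a script tag, but it also turns a harmless <b>bold</b> into literal text, which is exactly the conflict between what users want and what security needs.

```rust
/// Hypothetical helper: escape the characters that are significant in HTML
/// text and attribute values, so the browser renders the input literally.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            // Everything else passes through untouched
            _ => out.push(c),
        }
    }
    out
}
```

Safe, yes, but every bit of formatting the user typed is now visible markup rather than formatting.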

One approach is to let users work with a restricted set of features via Markdown. This is reasonable, but then a request comes in to allow some HTML tags, because you want to allow <section>s or something.

You could reach for a regex (hint: don't) or build some kind of custom parser. Of course, enterprising folks may well find loopholes in your implementation, so you decide to look for a library. You'll probably end up finding Loofah, a Nokogiri/libxml2-based library.
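To see how easily a home-grown filter is bypassed, here is a deliberately broken sketch. It uses plain string replacement rather than a regex, but a single-pass regex substitution falls into the same trap: removing the inner tag reassembles the outer one. naive_strip_scripts is hypothetical and should never be used on real input.

```rust
/// Hypothetical, deliberately naive "sanitizer": delete literal script tags.
/// Shown only to illustrate why hand-rolled filters fail. Do NOT use.
fn naive_strip_scripts(input: &str) -> String {
    // str::replace makes one left-to-right pass and never rescans its output,
    // so removing a tag can glue the surrounding text into a new tag.
    input.replace("<script>", "").replace("</script>", "")
}
```

It is also case-sensitive, so <SCRIPT> sails straight through untouched.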

Loofah is very flexible, but I find it a little complicated.

Enter ammonia

Ammonia is a Rust-based HTML sanitization library. Under the hood it uses html5ever, the HTML parser from Mozilla's Servo project.

This means the parser is very robust and extremely well tested. It also minimizes memory allocations and is highly performant.

Because it is written in Rust, it is free from a whole class of memory-safety errors common in C code, the kind that can bring your whole system down or introduce security bugs.

Now if only we could use it in Ruby!

Rutie vs Helix

In this post I'm going to use Rutie rather than Helix. Either project can achieve the same end result, but I find that Rutie is in more active development and seems easier to use in your own gems.

Rutie aims to give Ruby users the power of Rust: it helps with building extensions and with passing data across the boundary between Rust and Ruby.

Getting started

First we will generate a new gem using bundle gem sterilize.

Next we need to add the Rust component, so we'll run cargo init --lib in the sterilize directory.

Then we add rutie and ammonia as dependencies in our Cargo.toml file:

[dependencies]
ammonia = "3.0.0"
rutie = "0.7.0"

[lib]
name = "sterilize"
crate-type = ["dylib"]

The actual Rust code is fairly straightforward if you are familiar with the language, but I'll go through what some of the macros do.

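#[macro_use]
extern crate rutie;
extern crate ammonia;

use ammonia::clean;
use rutie::{Module, Object, RString, VM};

// Declare a Rust struct that stands in for the Ruby module Sterilize
module!(Sterilize);

methods!(
    Sterilize,
    _itself,
    // Sanitize an HTML string using ammonia's default allowlist
    fn perform(input: RString) -> RString {
        // Raise a Ruby exception if the argument could not be converted
        let dirty_string = input.map_err(|e| VM::raise_ex(e)).unwrap().to_string();
        let sterile = clean(&dirty_string);
        RString::new_utf8(&sterile)
    }
);

// Entry point that Ruby calls when the compiled library is loaded
#[allow(non_snake_case)]
#[no_mangle]
pub extern "C" fn Init_sterilize() {
    Module::from_existing("Sterilize").define(|itself| {
        // Expose perform as the singleton method Sterilize.perform
        itself.def_self("perform", perform);
    });
}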

Let's break this down a bit.

module!(Sterilize);

The module! macro creates a Rust struct, Sterilize, which can later have methods added to it and finally be used from Ruby.

methods!(
    Sterilize,
    _itself,
    fn perform(input: RString) -> RString {
        let dirty_string = input.map_err(|e| VM::raise_ex(e)).unwrap().to_string();
        let sterile = clean(&dirty_string);
        RString::new_utf8(&sterile)
    }
);

The methods! macro generates the callbacks that back Ruby methods. It hides the unsafe plumbing behind a safe interface and uses Result semantics for handling arguments.

fn perform(input: RString) -> RString {
    let dirty_string = input.map_err(|e| VM::raise_ex(e)).unwrap().to_string();
    let sterile = clean(&dirty_string);
    RString::new_utf8(&sterile)
}

The perform function is the meat of the library.

What's interesting here is that the signature looks a bit wrong: RString has no .map_err method. The methods! macro rewrites each argument so that input is really a Result<RString, AnyException>, the outcome of trying to convert whatever Ruby passed in to an RString.

So first we have to deal with that Result. If the conversion failed (say, someone passed an Integer), map_err hands the AnyException to VM::raise_ex, which raises it as a Ruby exception.

If the conversion succeeded, we turn the RString into a Rust String and pass a reference to it to ammonia's clean function.

clean returns a new, sanitized String, which we wrap in a fresh RString and hand back to Ruby. The return type is a plain RString; only the arguments are wrapped in a Result.

This seems a bit magical, but unlike Ruby magic, all of it is type-checked at compile time, which makes it much safer.
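If the Result wrapping still feels abstract, here is a toy model in plain Rust. None of these names are Rutie's: RubyValue, RubyError, convert_to_string and perform_wrapper are all hypothetical. The point is only the shape: each declared argument is converted from whatever Ruby passed in, and the body receives a Result it has to handle.

```rust
// Toy model of how rutie's methods! macro treats arguments.
// All types and functions here are hypothetical, not rutie's API.

/// Stand-in for a Ruby object handed over by the VM.
#[derive(Debug, Clone, PartialEq)]
enum RubyValue {
    Str(String),
    Int(i64),
}

/// Stand-in for a Ruby exception (rutie uses AnyException).
#[derive(Debug, PartialEq)]
struct RubyError {
    class: &'static str,
    message: String,
}

/// Convert one argument to the declared type, as the macro does for `input: RString`.
fn convert_to_string(arg: Option<&RubyValue>) -> Result<String, RubyError> {
    match arg {
        None => Err(RubyError {
            class: "ArgumentError",
            message: "missing argument".to_string(),
        }),
        Some(RubyValue::Str(s)) => Ok(s.clone()),
        Some(other) => Err(RubyError {
            class: "TypeError",
            message: format!("expected String, got {:?}", other),
        }),
    }
}

/// What the generated wrapper amounts to: the body receives a Result.
/// Rutie would raise an Err as a Ruby exception; this model returns it instead.
fn perform_wrapper(args: &[RubyValue]) -> Result<String, RubyError> {
    let input: Result<String, RubyError> = convert_to_string(args.first());
    let dirty = input?;
    // Stand-in for ammonia::clean(&dirty): returned unchanged in this model
    Ok(dirty)
}
```

Returning the error is the only real departure from the extension: in Ruby, VM::raise_ex unwinds straight back into the interpreter, so the Rust function never returns an Err at all.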

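#[allow(non_snake_case)]
#[no_mangle]
pub extern "C" fn Init_sterilize() {
    Module::from_existing("Sterilize").define(|itself| {
        itself.def_self("perform", perform);
    });
}

This final piece of code is annotated to tell the compiler not to mangle the function name. Without #[no_mangle], the compiler replaces the name with a longer, unique symbol that encodes the module path and a hash, which nothing outside Rust would know to look for. We explicitly want to keep the name Init_sterilize because the function will be called from outside of Rust. Declaring it pub extern "C" makes it public and gives it the C calling convention, so it can be called like any C function. #[allow(non_snake_case)] silences the warning about the Ruby-style Init_ name.

Module::from_existing looks up the Sterilize module, which the Ruby side of the gem has already defined, and define adds perform to it as a singleton method via def_self.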
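To see why the exact symbol name matters: at runtime the caller asks for Init_sterilize by name (on Unix-like systems the dynamic loader does this with dlsym), so a mangled name would simply not be found. The toy model below replaces the library's symbol table with a HashMap; exported_symbols and call_entry_point are hypothetical, and #[no_mangle] is omitted because nothing outside this program looks the function up.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};

// Records whether our entry point has run (demonstration only)
static INITIALIZED: AtomicBool = AtomicBool::new(false);

/// Function-pointer type of an entry point: no arguments, C ABI, no return value.
type InitFn = extern "C" fn();

/// Shaped like Init_sterilize from the post, using the C calling convention.
#[allow(non_snake_case)]
extern "C" fn Init_sterilize() {
    INITIALIZED.store(true, Ordering::SeqCst);
}

/// Hypothetical stand-in for a shared library's exported symbol table.
fn exported_symbols() -> HashMap<&'static str, InitFn> {
    let mut table: HashMap<&'static str, InitFn> = HashMap::new();
    table.insert("Init_sterilize", Init_sterilize as InitFn);
    table
}

/// Look an entry point up by its exact name and call it, as a loader would.
/// Returns false when no symbol has that name.
fn call_entry_point(table: &HashMap<&'static str, InitFn>, symbol: &str) -> bool {
    match table.get(symbol) {
        Some(&init) => {
            init();
            true
        }
        None => false,
    }
}
```

Even a difference in case is enough for the lookup to miss, which is why the string passed to Rutie.new(:sterilize).init must match the Rust function name exactly.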

Now for the Ruby portion

The ruby side of this project is extremely slim.

require "sterilize/version"
require "rutie"

module Sterilize
  # Load the compiled Rust library and call its Init_sterilize function
  Rutie.new(:sterilize).init("Init_sterilize", __dir__)
end

This loads the compiled Rust library and runs the Init_sterilize function from above, which in turn defines the perform method on the Sterilize module.

From here on we can simply call Sterilize.perform("Some potentially unsafe user-entered text") and it will be sanitized for us!

What about using it in production?

This is something that isn't super well documented (well, not at all really).

I've chosen to use a gem called Thermite to handle the build. It's a fairly simple tool that helps you call out to cargo and defines Rake tasks for building your Rust code.

Thermite

The usage guide for Thermite is very helpful, so let's follow its steps.

The main thing of interest is spec.extensions = ['ext/Rakefile']. This line in the gemspec tells RubyGems (and therefore Bundler) to run the Rake task we define in ext/Rakefile when the gem is installed.

With this in place, when you add the gem to your project and run bundle, it will call out to cargo and compile the library for your environment's architecture.
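The compiled library is a platform-specific shared object, and even its file name differs between operating systems. The sketch below uses only the standard library's platform constants; dylib_file_name is a hypothetical helper, and the assumption is that cargo follows these conventions for a crate named sterilize.

```rust
use std::env::consts::{DLL_PREFIX, DLL_SUFFIX};

/// Hypothetical helper: the conventional shared-library file name for a crate
/// on the platform this code is compiled for, e.g. libsterilize.so on Linux,
/// libsterilize.dylib on macOS, sterilize.dll on Windows.
fn dylib_file_name(crate_name: &str) -> String {
    format!("{}{}{}", DLL_PREFIX, crate_name, DLL_SUFFIX)
}
```

Because the binary format differs per platform too, compiling at install time, as Thermite does here, avoids shipping a library built for the wrong system.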

But is it fast?

Using this ammonia wrapper rather than Loofah cuts the time taken by roughly a factor of eight.

Let's test it out.

We'll create a huge string with lots of naughty script tags.

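The original test string isn't reproduced here, so the sketch below is only an assumption about its shape: a fragment mixing harmless markup, a script tag and a javascript: link, repeated many times. build_unsafe_string and the fragment are hypothetical; it is written in Rust to match the other examples, and the Ruby equivalent is just String#* on the fragment.

```rust
/// Hypothetical test-data builder: repeat a fragment containing both
/// harmless markup and script content `repetitions` times.
fn build_unsafe_string(repetitions: usize) -> String {
    // The trailing backslash continues the literal and skips the next line's indentation
    let fragment = "<p>Hello <b>friend</b></p><script>alert('pwned')</script>\
                    <a href=\"javascript:alert(1)\">click me</a>";
    fragment.repeat(repetitions)
}
```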

Then we'll run both Loofah and Sterilize via the Benchmark library. Obviously benchmarks are not an absolute measure of speed, and in a running system other constraints could affect your results, but I think the difference is quite compelling.

require "benchmark"
require "loofah"
require "sterilize"

# unsafe_string is the large test string described above
Benchmark.bm do |benchmark|
  benchmark.report("Sterilize#perform") do
    50.times do
      Sterilize.perform(unsafe_string)
    end
  end
  benchmark.report("Loofah.scrub_fragment(unsafe_string, :prune).to_str") do
    50.times do
      Loofah.scrub_fragment(unsafe_string, :prune).to_str
    end
  end
end

After running the benchmark, we see that the Rust/ammonia approach is roughly 8 times faster: about 1.3 seconds versus 10.3 seconds of real time for 50 runs.

Benchmark results (times in seconds, 50 runs each):

Library                                                user        system      total       real
Sterilize#perform                                      1.284460    0.006097    1.290557    1.295062
Loofah.scrub_fragment(unsafe_string, :prune).to_str    10.183802   0.064826    10.248628   10.274430
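The same measurement idea carries over if you want to time the Rust side on its own, without going through Ruby. time_iterations and speedup are hypothetical helpers, a rough sketch rather than a rigorous harness; a dedicated benchmarking tool would control for warm-up and variance much better.

```rust
use std::time::{Duration, Instant};

/// Run `f` `iterations` times and return the total wall-clock time,
/// roughly what Benchmark's "real" column reports for a block.
fn time_iterations<F: FnMut()>(iterations: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        f();
    }
    start.elapsed()
}

/// How many times faster the fast run is than the slow one.
fn speedup(slow_secs: f64, fast_secs: f64) -> f64 {
    slow_secs / fast_secs
}
```

Plugging the real column from the table into speedup(10.274430, 1.295062) gives about 7.9.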

Summing up

It took a bit of effort to figure out how all the parts are wired together, particularly getting the gem deployable in a production environment, but I hope this post helps others bridge the gap.

Source code

Source code for this gem is available at https://github.com/mfeckie/sterilize