CV Partner blog

Using Rust Lambdas in Production

Author:

Sam Rose

Update: Join the discussion over at HackerNews!

We’ve written in the past about how the CV Partner web application is written in Ruby on Rails. The web application isn’t the whole story, though. Surrounding it are many supporting services, and we are increasingly using Rust to write these services.

We’re also using more Lambdas in our architecture, and we want to use Rust in those as well. The landscape for Rust lambdas isn’t barren, but it’s not well-trodden either.

This post is going to cover how we write, build, and deploy our Rust lambdas. Our lambdas have the following qualities that we’re proud of and want to share with you:

Fast, standardised build. All of our lambdas use the same Dockerfile to build, and make good use of Docker’s layer caching. Incremental builds in CI take under a minute.
Run locally. If you’re working on a lambda, you don’t want to have to sit through a CI build to see if your changes work. All of our lambdas can run locally and in AWS using the same code.
Private GitHub dependencies. There aren’t many options out there for private Cargo repositories, so we use private GitHub repositories for our internal libraries.

The code

The starting point for writing a Lambda in Rust is the official Rust lambda runtime. At the time of writing, the latest release of this library is version 0.2, which doesn’t support async/await. Async/await support is present in master, though, so that’s what we build against. Here’s how it looks in practice:


use lambda::{lambda, Context};
use serde_json::Value;

type Error = Box<dyn std::error::Error + Send + Sync + 'static>;

#[lambda]
#[tokio::main]
async fn main(event: Value, _: Context) -> Result<Value, Error> {
    Ok(event)
}

The problem with this is that you can’t run it locally. The #[lambda] attribute wraps your main function in another main function that calls into the AWS Lambda runtime API, which is only available inside the Lambda environment.

To get around this, we write two main functions:



#[cfg(feature = "with-lambda")]
use lambda::{lambda, Context};
use serde::{Deserialize, Serialize};

type Error = Box<dyn std::error::Error + Send + Sync + 'static>;

#[derive(Deserialize, Debug)]
struct Input {
    name: String
}

#[derive(Serialize, Debug)]
struct Output {
    greeting: String
}

async fn handler(input: Input) -> Result<Output, Error> {
    Ok(Output { greeting: format!("Hello, {}!", input.name) })
}

#[cfg(feature = "with-lambda")]
#[lambda]
#[tokio::main]
async fn main(input: Input, _: Context) -> Result<Output, Error> {
    handler(input).await
}

#[cfg(not(feature = "with-lambda"))]
#[tokio::main]
async fn main() -> Result<(), Error> {
    let input_str = std::env::args().nth(1);
    if input_str.is_none() {
        panic!(
            "you must pass an input parameter as the first argument, and it must be a JSON string"
        );
    }
    let input = serde_json::from_str(&input_str.unwrap())?;
    let output = handler(input).await?;
    println!("{}", serde_json::to_string(&output)?);
    Ok(())
}

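We’re making use of Cargo’s “feature” flags to compile a different harness around the handler function depending on whether we want to run locally or in AWS. Building with --features with-lambda produces the AWS entry point, and building without it produces the command-line one.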

Here’s the Cargo.toml file:


[package]
name = "rust-lambda-template"
version = "0.1.0"
authors = ["Sam Rose "]
edition = "2018"

[dependencies]
lambda = { git = "https://github.com/awslabs/aws-lambda-rust-runtime/", rev = "c8dbcd39e0b1cf9ecf395e2b2f9df6c6c0d97780" }
tokio = { version = "0.2", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_derive = "1"
serde_json = "1"

[features]
with-lambda = []

Two noteworthy things:

We’re using a version of the aws-lambda-rust-runtime that hasn’t officially been released. This isn’t ideal, and we’re eagerly awaiting a 0.3 release.
We have a features section, which is where we define the with-lambda feature we use in the Rust code shown above.

Running our lambda now gives us the following:


$ cargo run -- '{ "name": "Sam" }'
{"greeting":"Hello, Sam!"}

The Dockerfile

All of our lambdas build with the same Dockerfile. I’ll show it in all of its glory and then explain what’s going on bit by bit. Brace yourself.


# syntax=docker/dockerfile:experimental

FROM rust:latest as cargo-build
ARG name

RUN apt-get update
RUN apt-get install musl-tools -y
RUN rustup target add x86_64-unknown-linux-musl

WORKDIR /usr/src/${name}
COPY Cargo.toml Cargo.toml
RUN mkdir src/
RUN echo "fn main() {println!(\"if you see this, the build broke\")}" > src/main.rs
RUN mkdir -p $HOME/.ssh
RUN ssh-keyscan github.com > $HOME/.ssh/known_hosts
RUN test "$(cat $HOME/.ssh/known_hosts | ssh-keygen -lf -)" = "2048 SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8 github.com (RSA)"
RUN --mount=type=ssh RUSTFLAGS=-Clinker=musl-gcc cargo build --features with-lambda --release --target=x86_64-unknown-linux-musl

RUN rm src/main.rs
COPY src/* src
RUN touch src/**
RUN --mount=type=ssh RUSTFLAGS=-Clinker=musl-gcc cargo test --features with-lambda --release --target=x86_64-unknown-linux-musl
RUN --mount=type=ssh RUSTFLAGS=-Clinker=musl-gcc cargo build --features with-lambda --release --target=x86_64-unknown-linux-musl

FROM alpine:latest
ARG name
COPY --from=cargo-build /usr/src/${name}/target/x86_64-unknown-linux-musl/release/${name} /usr/local/bin/${name}

First of all, shout out to Shane Utt whose blog post we used as a starting point for this.

The first line is a Docker directive that says we want to use some experimental Dockerfile syntax. The syntax in question is the --mount=type=ssh flag to the RUN commands, but we’ll talk about that later.


FROM rust:latest as cargo-build
ARG name

This next bit says we want to use the latest Rust image, and we’re passing in a build arg called “name.” This is how we’re able to share this Dockerfile between all of our lambdas without having to modify it.


RUN apt-get update
RUN apt-get install musl-tools -y
RUN rustup target add x86_64-unknown-linux-musl

Next we run an update on the image, and we install musl. If you’re not familiar, musl is a libc replacement that you can link to statically. This means the resulting binary won’t depend on the system’s libc, which makes it more portable. It’s not a strict requirement for running on AWS Lambda, but it’s good practice.


WORKDIR /usr/src/${name}
COPY Cargo.toml Cargo.toml
RUN mkdir src/
RUN echo "fn main() {println!(\"if you see this, the build broke\")}" > src/main.rs

The next few lines set up a pseudo project, where the only things we’re going to compile are our dependencies and a dummy main.rs. The idea behind this is to use Docker’s layer caching to avoid having to compile our dependencies every build. This leads to significantly faster incremental builds in Docker.


RUN mkdir -p $HOME/.ssh
RUN ssh-keyscan github.com > $HOME/.ssh/known_hosts
RUN test "$(cat $HOME/.ssh/known_hosts | ssh-keygen -lf -)" = "2048 SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8 github.com (RSA)"

Up until now, we’ve done exactly what Shane Utt did in his version of this. These three lines, though, are new. Because we use SSH to fetch private dependencies (more on this later), we would sometimes find that our builds would fail with the error “host key verification failed.” To get around that we pull down GitHub’s host keys and make sure they’re what we expect them to be based on the values here.


RUN --mount=type=ssh RUSTFLAGS=-Clinker=musl-gcc cargo build --features with-lambda --release --target=x86_64-unknown-linux-musl

Our first bit of experimental syntax! The --mount flag was introduced with the BuildKit engine for Docker, and it’s covered in depth in the BuildKit documentation. The type=ssh bit is us telling Docker that we want to use an SSH agent for this command. In the docker build invocation, which we’ll see later, we can tell Docker what keys to add to this SSH agent.

We do this because it was the only way we could find to depend on private GitHub repositories in our Cargo.toml file that worked both locally and in CI. It means we can do this in our Cargo.toml file:


[dependencies]
private-library = { git = "ssh://github.com/cvpartner/private-library", tag = "1.0" }

And it Just Works™.

The rest of the RUN command is our first cargo build. It looks a lot scarier than it is. Most of it is us telling rustc to link against musl instead of the default libc. The only other interesting bit is the --features with-lambda. This matches up with the code we saw earlier to produce a binary that’s going to work properly when deployed in AWS.


RUN rm src/main.rs
COPY src/* src
RUN touch src/**
RUN --mount=type=ssh RUSTFLAGS=-Clinker=musl-gcc cargo test --features with-lambda --release --target=x86_64-unknown-linux-musl
RUN --mount=type=ssh RUSTFLAGS=-Clinker=musl-gcc cargo build --features with-lambda --release --target=x86_64-unknown-linux-musl

Next up, we’re copying over our actual source code. The touch command is necessary for cargo to realise the files are new, because when we created our dummy main.rs file earlier we created a new file with a timestamp later than the one on the real main.rs file. This is different to the approach taken by Shane Utt, as we found that approach would often result in builds where the dummy main.rs file was the one that ended up in the final build.

Another addition is the cargo test invocation. Tests are good!


FROM alpine:latest
ARG name
COPY --from=cargo-build /usr/src/${name}/target/x86_64-unknown-linux-musl/release/${name} /usr/local/bin/${name}

Lastly we create a new build stage and copy over the final executable. The new build stage is in order to keep the final image small. Ours tend to clock in at around 8MB.

The build script

Invoking Docker is done in a shell script which is also identical for all of our lambdas.


#!/usr/bin/env bash

set -e
set -x

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
cd $DIR

NAME=$(cat Cargo.toml | grep "name" | head -n 1 | sed -E 's/name = "(.*)"/\1/')
DOCKER_NAME="cvpartner/$NAME"

if [[ -z $BUILD_ID ]];
then
  TAG=$DOCKER_NAME
  SSH="--ssh default"
else
  TAG="our.private.docker.registry/$DOCKER_NAME"
  SSH="--ssh default=/home/ci/.ssh/id_rsa"
fi

DOCKER_BUILDKIT=1 docker build $SSH --cache-from $TAG --build-arg "name=$NAME" --build-arg "BUILDKIT_INLINE_CACHE=1" -t $TAG .
docker run -v $DIR:/dist --rm --entrypoint cp $TAG "/usr/local/bin/$NAME" /dist/bootstrap

if [[ -z $BUILD_ID ]];
then
  echo
else
  docker push $TAG
fi

zip lambda.zip bootstrap
rm bootstrap

Let’s walk through it just like we did with the Dockerfile.


#!/usr/bin/env bash

set -e
set -x

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
cd $DIR

These lines are common to a lot of the bash scripts we have at CV Partner. They set up the following behaviours:

The script will exit on the first unsuccessful command.
The script will echo every command run.
The script will execute as if it were run from the directory it lives in.

We find these to be useful defaults for writing most of our bash scripts.


NAME=$(cat Cargo.toml | grep "name" | head -n 1 | sed -E 's/name = "(.*)"/\1/')
DOCKER_NAME="cvpartner/$NAME"

These lines aren’t beautiful, but they get the job done. The name of the project is extracted from the Cargo.toml file and some variables are set based on it.
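
To make it clearer what that pipeline pulls out, here is a minimal Rust sketch of the same idea. The package_name function is hypothetical and not part of our build script, and it’s slightly stricter than the grep: it only accepts a line that starts with name, rather than any line containing the word.

```rust
/// Returns the value of the first `name = "..."` line in a Cargo.toml,
/// roughly what the `grep "name" | head -n 1 | sed` pipeline extracts.
fn package_name(cargo_toml: &str) -> Option<String> {
    cargo_toml.lines().find_map(|line| {
        // Accept `name = "value"` with flexible whitespace around `=`.
        let rest = line.trim().strip_prefix("name")?;
        let rest = rest.trim_start().strip_prefix('=')?;
        let value = rest.trim().strip_prefix('"')?.strip_suffix('"')?;
        Some(value.to_string())
    })
}

fn main() {
    let manifest = "[package]\nname = \"rust-lambda-template\"\nversion = \"0.1.0\"\n";
    // Prints the extracted package name wrapped in Some(..), or None.
    println!("{:?}", package_name(manifest));
}
```

Neither version is a real TOML parser, so both rely on the package name being the first name entry in the file.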


if [[ -z $BUILD_ID ]];
then
  TAG=$DOCKER_NAME
  SSH="--ssh default"
else
  TAG="our.private.docker.registry/$DOCKER_NAME"
  SSH="--ssh default=/home/ci/.ssh/id_rsa"
fi

We use the presence of an environment variable called BUILD_ID to check if we’re running in CI or locally. We use this information to set some more variables, the first one being the tag to use for the Docker image we end up building. The other is used to tell Docker what SSH keys to forward into the Docker build. This is the other half of the --mount=type=ssh thing we saw in the Dockerfile. Locally we tell it to use our existing SSH agent, and in CI we use a specific key that has access to our private GitHub repositories.

We also make sure to use our private Docker registry in CI so when we push images, they’re then available to use by later builds as a cache. Locally, we don’t do this. We just use the local Docker daemon.
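
The branching above is small enough to restate as a function. This is only an illustrative sketch in Rust: BuildSettings and build_settings are made-up names, and the real logic lives in the bash script. Like the -z test, it treats an empty BUILD_ID the same as an unset one.

```rust
/// Docker settings that differ between a local build and a CI build.
#[derive(Debug, PartialEq)]
struct BuildSettings {
    tag: String,
    ssh: String,
}

/// Mirrors the `[[ -z $BUILD_ID ]]` check: unset or empty means "local".
fn build_settings(build_id: Option<&str>, docker_name: &str) -> BuildSettings {
    let in_ci = build_id.map_or(false, |id| !id.is_empty());
    if in_ci {
        BuildSettings {
            // Push to and cache from the private registry in CI.
            tag: format!("our.private.docker.registry/{}", docker_name),
            // Forward a specific key that can read the private repositories.
            ssh: "--ssh default=/home/ci/.ssh/id_rsa".to_string(),
        }
    } else {
        BuildSettings {
            tag: docker_name.to_string(),
            // Forward whatever SSH agent the developer already has running.
            ssh: "--ssh default".to_string(),
        }
    }
}

fn main() {
    let build_id = std::env::var("BUILD_ID").ok();
    let settings = build_settings(build_id.as_deref(), "cvpartner/rust-lambda-template");
    println!("{:?}", settings);
}
```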


DOCKER_BUILDKIT=1 docker build $SSH --cache-from $TAG --build-arg "name=$NAME" --build-arg "BUILDKIT_INLINE_CACHE=1" -t $TAG .
docker run -v $DIR:/dist --rm --entrypoint cp $TAG "/usr/local/bin/$NAME" /dist/bootstrap

This is the bit that actually does the building. The Docker invocation is quite involved, so let’s break it down:

DOCKER_BUILDKIT=1 is an environment variable we have to set to tell Docker to use the BuildKit engine. We need to do this for the SSH agent forwarding.
docker build is what it says on the tin.
$SSH subs in the SSH part of the command we crafted earlier.
--cache-from $TAG tells the Docker build to use any layers it can from the tag we specified earlier. Without this, it would only search locally for layers.
--build-arg "name=$NAME" passes in the name of the project we extracted from the Cargo.toml file.
--build-arg "BUILDKIT_INLINE_CACHE=1" embeds cache metadata in the image, which is necessary for later builds to use it with --cache-from under BuildKit.
-t $TAG is what to call the image once built.

Phew. Scary but necessary to get all the good stuff.

The line after copies the resulting binary out of the image and into our current directory with the name “bootstrap”. That name is required: it’s the file AWS Lambda executes when you use the provided runtime instead of one of the built-in language runtimes. More on this later.


if [[ -z $BUILD_ID ]];
then
  echo
else
  docker push $TAG
fi

If we’re running in CI, push to our private registry.


zip lambda.zip bootstrap
rm bootstrap

Create a .zip file containing our bootstrap executable. This is the final packaged artifact we’ll be uploading to AWS Lambda in the next step.

The deployment

As mentioned in a previous post, we deploy all of our infrastructure using CloudFormation. Our lambdas are no exception.

The barebones CloudFormation template for one of our Rust lambdas contains 3 resources, and only 2 of them are strictly necessary. I’ll cover all 3 for completeness.

IAM Role

The first one is the most vanilla. It’s your bog-standard Lambda IAM role:


  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action: sts:AssumeRole
      Path: "/"
      Policies:
        - PolicyName: !Join ["-", [!Ref "AWS::StackName", Policy]]
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: "logs:CreateLogGroup"
                Resource: !Sub "arn:aws:logs:eu-west-1:${AWS::AccountId}:*"
              - Effect: Allow
                Action:
                  - "logs:CreateLogStream"
                  - "logs:PutLogEvents"
                Resource: !Sub "arn:aws:logs:eu-west-1:${AWS::AccountId}:log-group:*:*"
              - Effect: Allow
                Action: "cloudwatch:PutMetricData"
                Resource: "*"

It’s very common for our lambdas to access other AWS resources, and when they do we’ll add permissions to this role. By default it just allows lambda.amazonaws.com to assume it, and it allows the lambda to create and write logs.

The lambda

The second and last necessary resource is the lambda itself.


  Lambda:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Ref AWS::StackName
      Description: My lambda function
      Handler: doesnt.matter
      Role: !GetAtt LambdaRole.Arn
      Code: lambda.zip
      Runtime: provided
      Timeout: 300
      MemorySize: 128
      Environment:
        Variables:
          RUST_BACKTRACE: "1"

There are a few things here you might want to modify. The FunctionName and Description, for example. Also if you anticipate needing different Timeout or MemorySize parameters you should, of course, tweak those.

You can see the reference to our lambda.zip in here. This is a local file path, which means this template will need to be packaged before use. Our CI server does this for us at build time, which doubles up nicely as a basic check on the validity of our templates.

You can also see that we pass in RUST_BACKTRACE=1. We found that in practice the usefulness of this outweighed the costs. In a bunch of our lambdas we also use the env_logger crate and set RUST_LOG=rust_lambda_template=info for logging. Note that you’ll need to change rust_lambda_template to the module name of your executable. This should be the same as the name field in your Cargo.toml, with dashes replaced with underscores.
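
The dash-to-underscore rule is easy to get wrong by hand, so here is a tiny sketch of it. The rust_log_filter helper is hypothetical; we set the variable directly in the template rather than generating it.

```rust
/// Builds a RUST_LOG value from a Cargo package name and a log level.
/// Cargo package names may contain dashes, but the module name that
/// env_logger filters on uses underscores instead.
fn rust_log_filter(package_name: &str, level: &str) -> String {
    format!("{}={}", package_name.replace('-', "_"), level)
}

fn main() {
    println!("RUST_LOG={}", rust_log_filter("rust-lambda-template", "info"));
}
```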

The monitoring

This last resource is the optional one. For all of our lambdas we like to have some basic monitoring in place. The one alert we think applies to all of our lambdas is one that tells us if the lambda has a higher-than-usual error rate. Our template defines something generic that can be tweaked as necessary.


  ErrorAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: My lambda function is experiencing a high error rate
      EvaluationPeriods: 5
      Threshold: 5
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Sub "arn:aws:sns:${AWS::Region}:${AWS::AccountId}:some-alert-topic"
      OKActions:
        - !Sub "arn:aws:sns:${AWS::Region}:${AWS::AccountId}:some-alert-topic"
      Metrics:
        - Id: errorpct
          Expression: "100 * errors / MAX([errors, invocations])"
          Label: ErrorPercent
          ReturnData: true
        - Id: "errors"
          Label: "Errors"
          MetricStat:
            Metric:
              Dimensions:
                - Name: FunctionName
                  Value: !Ref Lambda
              MetricName: Errors
              Namespace: "AWS/Lambda"
            Period: 60
            Stat: Sum
            Unit: Count
          ReturnData: false
        - Id: "invocations"
          Label: "Invocations"
          MetricStat:
            Metric:
              Dimensions:
                - Name: FunctionName
                  Value: !Ref Lambda
              MetricName: Invocations
              Namespace: "AWS/Lambda"
            Period: 60
            Stat: Sum
            Unit: Count
          ReturnData: false

It’s a lot, but all it’s saying is if the error rate of the lambda is higher than 5% for 5 minutes, an alert is sent to an SNS topic. We hook this up to PagerDuty, and we hook PagerDuty up to Slack and our phones.

There’s a lot of flexibility in CloudWatch alarms. For example, if you want to alert if 2 out of the last 5 data points are above the given threshold, you can specify a DatapointsToAlarm: 2 parameter. It’s worth losing a few hours in the documentation if alerting is something you’re planning to take seriously.
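
If the metric maths is hard to picture, this Rust sketch walks through roughly what the alarm computes. It’s a simplification under our own assumptions, not how CloudWatch is implemented: error_percent and should_alarm are hypothetical names, and periods with no traffic are simply counted as not breaching, whereas CloudWatch has its own configurable handling of missing data.

```rust
/// Error percentage for one period, following the template's expression
/// `100 * errors / MAX([errors, invocations])`.
/// Returns None when there was no traffic, since the ratio is undefined.
fn error_percent(errors: f64, invocations: f64) -> Option<f64> {
    let denominator = errors.max(invocations);
    if denominator == 0.0 {
        None
    } else {
        Some(100.0 * errors / denominator)
    }
}

/// True when at least `datapoints_to_alarm` of the last `evaluation_periods`
/// values are strictly greater than `threshold` (GreaterThanThreshold).
fn should_alarm(
    percents: &[Option<f64>],
    evaluation_periods: usize,
    datapoints_to_alarm: usize,
    threshold: f64,
) -> bool {
    let start = percents.len().saturating_sub(evaluation_periods);
    let breaching = percents[start..]
        .iter()
        .filter(|p| matches!(**p, Some(v) if v > threshold))
        .count();
    breaching >= datapoints_to_alarm
}

fn main() {
    // (errors, invocations) for five consecutive 60-second periods.
    let periods = [(0.0, 100.0), (10.0, 100.0), (6.0, 100.0), (0.0, 0.0), (1.0, 100.0)];
    let percents: Vec<Option<f64>> =
        periods.iter().map(|&(e, i)| error_percent(e, i)).collect();

    // The template as written: all 5 of the last 5 periods must breach 5%.
    println!("5 of 5: {}", should_alarm(&percents, 5, 5, 5.0));
    // With DatapointsToAlarm: 2, two breaching periods are enough.
    println!("2 of 5: {}", should_alarm(&percents, 5, 2, 5.0));
}
```

In this sample only two periods are above 5%, so the alarm as written stays quiet while the 2-out-of-5 variant fires.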

A note on Rusoto

Because we’re frequently interacting with AWS services, we use the rusoto crates. They’re fantastic, but there are two things you should be aware of:

Creating a client struct is expensive.
There are no retries by default.

For the first one, we recommend using the lazy_static crate. Here’s an example:


use lazy_static::lazy_static;
use rusoto_core::Region;
use rusoto_dynamodb::{DynamoDb, DynamoDbClient};

lazy_static! {
    // Built on first use, then shared for the lifetime of the process.
    static ref DYNAMODB: DynamoDbClient = DynamoDbClient::new(Region::EuWest1);
}

This creates the client once per process and reuses it from then on. Lambda serves consecutive invocations from the same process while it stays warm, so the client is also reused across invocations.

For the second one we’ve found the again crate works well.


let res = again::retry(|| {
  DYNAMODB.delete_item(DeleteItemInput {
    // ...
  })
})
.await?;

By default this retries 5 times, with backoff. The retry policy is configurable if you need more control, and it’s covered in the crate’s documentation.
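
If you haven’t come across retry-with-backoff before, here is a minimal sketch of the general idea using only the standard library. To be clear, this is not how the again crate is implemented: it’s synchronous rather than async, and retry_with_backoff along with its parameters are names we made up for illustration.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `op` until it succeeds or `max_retries` retries have been used,
/// doubling the delay after each failed attempt.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    initial_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut retries = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of retries: hand the last error back to the caller.
            Err(err) if retries >= max_retries => return Err(err),
            Err(_) => {
                sleep(delay);
                delay *= 2;
                retries += 1;
            }
        }
    }
}

fn main() {
    // Stand-in for a flaky AWS call: fails twice, then succeeds.
    let mut calls = 0;
    let result: Result<&str, &str> = retry_with_backoff(5, Duration::from_millis(1), || {
        calls += 1;
        if calls < 3 { Err("throttled") } else { Ok("deleted") }
    });
    println!("{:?} after {} calls", result, calls);
}
```

A production version would usually also add jitter and only retry errors that are actually transient, which is the sort of thing the crate’s retry policy lets you configure.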

Closing Thoughts

That’s it. That’s how we do Rust lambdas at CV Partner, from start to finish. We hope this serves as a resource for other people who want to start running Rust lambdas in production and have been struggling to find somewhere that ties all of the pieces together.
