A short step-by-step intro to machine learning in Rust (2024)

I recently got into blending machine learning with Rust and wrote a beginner-friendly guide.
Author: James Birkenau
Affiliation: TensorScience
Published: November 25, 2023

Introduction

I’ve always been fascinated by how machine learning can seem like magic, turning raw data into predictions and insights. Recently, I took the leap into combining this field with my passion for Rust—a language that’s all about performance and safety. The journey was tough but enlightening, with Rust’s rigor ensuring that every step of building a model was precise and efficient. Along the way, I discovered tools and libraries that made Rust a surprisingly hospitable environment for ML. Now, I’m excited to share the lessons learned from setting up environments to deploying production-ready models.

Introduction to Machine Learning and Rust

A graphic showing the rust logo with machine gears and neural network motifs.

Machine learning (ML) has transformed countless industries, enabling advancements in fields from healthcare to finance. But it’s not just about the algorithms; it’s also about the tools and languages we use to implement them. Rust, a language praised for its performance and safety, is attracting growing interest for ML projects, particularly where speed and reliability matter. Let’s explore why.

As I began my journey into machine learning, the quest for performance and safety led me straight to Rust. While Python has long been the lingua franca of ML, Rust offers a compelling alternative. Its type system and ownership model prevent many classes of bugs, and it’s designed for concurrency. Performance-wise, it often rivals C++.

Here’s how I got started with a simple linear regression example, using the linregress crate, a linear regression library that’s simple and effective for beginners:

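use linregress::{FormulaRegressionBuilder, RegressionDataBuilder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Each column is a (name, values) pair; the names are referenced in the formula.
    let columns = vec![("Y", vec![2.0, 4.0, 5.9]), ("X", vec![1.0, 2.0, 3.0])];
    let data = RegressionDataBuilder::new().build_from(columns)?;

    // R-style formula: regress Y on X (an intercept is included automatically).
    let model = FormulaRegressionBuilder::new()
        .data(&data)
        .formula("Y ~ X")
        .fit()?;

    // Fitted coefficients as (name, estimate) pairs.
    let params: Vec<_> = model.iter_parameter_pairs().collect();
    println!("{:?}", params);
    Ok(())
}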

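This code snippet fits a linear model to the given x-y data points and prints the result. The beauty of Rust is its expressiveness and focus on safety, even for those of us who are just starting. Many mistakes, such as type mismatches or using a value after it has been moved, are caught at compile time. Operations that can fail return a Result that you must handle explicitly (for example with unwrap or the ? operator), which prevents a lot of hair-pulling moments at runtime.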
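For a single predictor, the fit linregress performs reduces to ordinary least squares, which is short enough to write by hand. The sketch below uses only the standard library. fit_simple_ols is a hypothetical helper name, not part of linregress. It assumes equal-length inputs with at least two distinct x values.

```rust
/// Ordinary least squares for one predictor: returns (slope, intercept).
/// Panics if the inputs are empty, differ in length, or all x values are equal.
fn fit_simple_ols(x: &[f64], y: &[f64]) -> (f64, f64) {
    assert!(!x.is_empty() && x.len() == y.len(), "need equal-length, non-empty inputs");
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;

    // S_xy = sum((x - mean_x) * (y - mean_y)), S_xx = sum((x - mean_x)^2)
    let (mut s_xy, mut s_xx) = (0.0, 0.0);
    for (&xi, &yi) in x.iter().zip(y) {
        s_xy += (xi - mean_x) * (yi - mean_y);
        s_xx += (xi - mean_x).powi(2);
    }
    assert!(s_xx > 0.0, "x must not be constant");

    let slope = s_xy / s_xx;
    // The fitted line always passes through (mean_x, mean_y).
    let intercept = mean_y - slope * mean_x;
    (slope, intercept)
}

fn main() {
    // Same three data points as the linregress example.
    let (slope, intercept) = fit_simple_ols(&[1.0, 2.0, 3.0], &[2.0, 4.0, 5.9]);
    println!("y = {intercept:.4} + {slope:.4} * x");
}
```

For these three points the slope is 1.95 and the intercept is 1/15 (about 0.0667). These are the same estimates a least-squares library should report.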

The Rust ecosystem for ML is growing, and crates like linregress are just the beginning. Libraries such as ndarray for n-dimensional arrays and rand for random number generation are critical for ML work, providing the building blocks I needed to get up and running with more complex algorithms.

Here’s a quick demonstration of how to use ndarray and rand together in Rust:

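use ndarray::Array2;
use rand::Rng;

fn main() {
    // rand 0.8 API; rand 0.9 renames these to rand::rng() and random_range.
    let mut rng = rand::thread_rng();
    let rows = 4;
    let cols = 2;
    // Explicit f64 element type, so zeros() does not rely on inference.
    let mut array = Array2::<f64>::zeros((rows, cols));
    // rows_mut() replaces the deprecated genrows_mut() in ndarray 0.15.
    for mut row in array.rows_mut() {
        for element in row.iter_mut() {
            *element = rng.gen_range(0.0..10.0);
        }
    }
    println!("Random Array:\n{}", array);
}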

In this code, we create a 4x2 matrix and fill it with random floating-point numbers in the range [0.0, 10.0) (the upper bound is excluded). Rust’s borrow checker guarantees that the mutable row and element iterators are the only access to the array while we write to it. That let me iterate and mutate the array with confidence, avoiding errors that are common in more permissive languages.

One thing I’ve learned is that, in Rust, intent is expressed very clearly. Every Option, Result, and unwrap call is explicit, which forces you to think through the possible outcomes of each step in your code’s execution.
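To make that concrete, here is a small standard-library sketch. The helper names mean and parse_features are illustrative and do not come from any crate. An empty input and a malformed number both show up in the return types, so the caller has to decide what to do about them instead of discovering them at runtime.

```rust
use std::num::ParseFloatError;

/// The mean of an empty slice is undefined, so the return type says so with None.
fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        None
    } else {
        Some(values.iter().sum::<f64>() / values.len() as f64)
    }
}

/// Parsing can fail, so the error is part of the return type.
/// Collecting into a Result stops at the first field that is not a number.
fn parse_features(line: &str) -> Result<Vec<f64>, ParseFloatError> {
    line.split(',').map(|field| field.trim().parse::<f64>()).collect()
}

fn main() {
    match parse_features("1.5, 2.5, 3.5") {
        Ok(features) => match mean(&features) {
            Some(m) => println!("mean feature value: {m}"),
            None => println!("no features on this line"),
        },
        Err(e) => println!("could not parse line: {e}"),
    }
}
```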

As someone new to both ML and Rust, starting from basic operations and simple libraries has been fundamental. Working through examples, I’ve often referred to the Rust Book (The Rust Programming Language, free online at doc.rust-lang.org/book), a comprehensive guide to the language’s syntax and concepts. For ML-specific resources, browsing crates.io, the Rust community’s crate registry, was invaluable. Repositories on GitHub, such as those in the rust-ml organization, also house several projects that offer real-world insight.

Stepping into machine learning with Rust is a rewarding challenge. The learning curve may seem steep, but progress is made bite by bite, building understanding and capability. In the following sections, we’ll further demystify the Rust ecosystem for ML and move beyond basics to build, evaluate, and deploy our Rust-powered ML models.

Setting Up the Rust Environment for ML

A screenshot of a rust installation guide or successful setup message.

Setting up your Rust environment for machine learning can seem daunting at first, but I’ll guide you through the process step-by-step so you can start crunching data in no time. Having a solid foundation is crucial before you dive into the complexities of machine learning, so let’s get your machine set up and ready.

First things first, you’ll need to install Rust. The easiest way to install Rust and manage its toolchain is through rustup. Run the following in your terminal:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

This script installs rustup, the Rust toolchain installer, together with the stable compiler. The installer normally adds Cargo’s bin directory (~/.cargo/bin) to your PATH automatically. If rustc or cargo isn’t found afterwards, restart your shell or run source "$HOME/.cargo/env". The Rust installation page has detailed instructions for other setups.

You don’t need to install Cargo, Rust’s package manager and build tool, separately: rustup installs it alongside the compiler. You can confirm that Cargo is available by running:

cargo --version

With Rust and Cargo installed, you’re now set to add machine learning functionality. In the Rust ecosystem, there are different crates (Rust’s term for packages/libraries) available for machine learning, such as linfa and smartcore. They may not be as mature as Python’s scikit-learn, but they’re quite promising.

Next, we’ll set up a simple project where we’ll later insert our machine learning code. Run the following commands in your terminal:

cargo new ml_project
cd ml_project

This creates a new directory called ml_project containing a Cargo.toml manifest and a src directory with a main.rs file. main.rs is the entry point of the program.

To use linfa, for instance, add it to the [dependencies] section of the project’s Cargo.toml (running cargo add linfa does the same):

[dependencies]
linfa = "0.7"

The linfa crate holds the shared traits and dataset types. Individual algorithms live in companion crates such as linfa-linear, which you add the same way.

Opening main.rs, you should see a simple “Hello, world!” program. Let’s replace that with a template to ensure everything is working properly.

fn main() {
    println!("Setting up ML environment in Rust!");
}

Run your project with:

cargo run

You should see the message “Setting up ML environment in Rust!” printed in your terminal, confirming everything is good to go.

Finally, to keep your code tidy and error-free, I highly recommend using Rust’s formatter and linter. rustfmt and clippy are part of rustup’s default profile, but if they are missing you can add them with:

rustup component add rustfmt clippy

Then, before every commit, simply run:

cargo fmt
cargo clippy

These will format your Rust code and provide linting to catch common mistakes or unidiomatic code.

So there you have it. With Rust, Cargo, and some initial project setup out of the way, you’re ready to move on to processing data and building models. Stick to the simplicity and performance that Rust offers, and you’ll find that it’s an exciting language to work with in machine learning. Don’t forget, as you encounter challenges or need to deepen your understanding, the Rust community is an incredibly supportive resource.

Data Preparation and Processing in Rust

An infographic of data being funneled and transformed through rust code.

As I dove into machine learning with Rust, I quickly learned that well-prepared data is the bedrock upon which reliable models are built. Let’s walk through some Rust essentials that helped me get my data in shape.

Before anything else, I needed to read my dataset. CSV is a common format here, and Rust’s csv crate is perfectly tailored for the job. Parsing a CSV file becomes as simple as this:

use csv::ReaderBuilder;
use std::error::Error;

fn read_csv(file_path: &str) -> Result<(), Box<dyn Error>> {
    let mut rdr = ReaderBuilder::new().from_path(file_path)?;
    for result in rdr.records() {
        let record = result?;
        println!("{:?}", record);
    }
    Ok(())
}
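If you only need a few numeric columns from a small, well-formed file, the parsing logic itself is simple. The following is a standard-library-only sketch. It assumes a header row, comma separators, no quoted fields, and all-numeric data. parse_numeric_csv is a hypothetical helper; the csv crate remains the better choice for real files.

```rust
/// Parses an in-memory numeric CSV: one header row, then comma-separated numbers.
/// Assumption: no quoted fields or embedded commas.
fn parse_numeric_csv(text: &str) -> Result<(Vec<String>, Vec<Vec<f64>>), String> {
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    let header: Vec<String> = lines
        .next()
        .ok_or("empty input")?
        .split(',')
        .map(|h| h.trim().to_string())
        .collect();

    let mut rows = Vec::new();
    for (i, line) in lines.enumerate() {
        // Parse every field; the first non-numeric field aborts with a message.
        let row = line
            .split(',')
            .map(|f| f.trim().parse::<f64>())
            .collect::<Result<Vec<f64>, _>>()
            .map_err(|e| format!("data row {}: {e}", i + 1))?;
        // Reject ragged rows instead of silently misaligning columns.
        if row.len() != header.len() {
            return Err(format!(
                "data row {}: expected {} fields, got {}",
                i + 1,
                header.len(),
                row.len()
            ));
        }
        rows.push(row);
    }
    Ok((header, rows))
}

fn main() {
    let csv = "x,y\n1.0,2.0\n2.0,4.0\n3.0,5.9\n";
    match parse_numeric_csv(csv) {
        Ok((header, rows)) => println!("{header:?}: {} rows", rows.len()),
        Err(e) => eprintln!("parse error: {e}"),
    }
}
```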

With my data loaded, the next step was cleaning. This included handling missing values, removing duplicates, and potentially normalizing or scaling some fields. The polars crate came in handy here, offering a DataFrame type similar to the one in Python’s pandas:

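use polars::prelude::*;

// Written against the eager polars API of late 2023 (around 0.35);
// method names and signatures change between polars releases.
fn clean_data(df: &DataFrame) -> PolarsResult<DataFrame> {
    // Drop rows with a null in any column (None = consider all columns),
    // then keep only the first occurrence of each duplicated row.
    df.drop_nulls::<String>(None)?
        .unique(None, UniqueKeepStrategy::First, None)
}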
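Conceptually, the two cleaning steps are easy to express without a DataFrame. This sketch assumes rows are stored as Vec<Option<f64>>, with None marking a missing value. clean_rows is a hypothetical helper that drops incomplete rows and then exact duplicates, keeping the first occurrence.

```rust
use std::collections::HashSet;

/// Drops rows containing a missing value (None), then drops exact duplicate
/// rows, keeping the first occurrence of each.
fn clean_rows(rows: Vec<Vec<Option<f64>>>) -> Vec<Vec<f64>> {
    let mut seen: HashSet<Vec<u64>> = HashSet::new();
    let mut cleaned = Vec::new();
    for row in rows {
        // Collecting into Option<Vec<_>> yields None if any value is missing.
        let Some(values) = row.into_iter().collect::<Option<Vec<f64>>>() else {
            continue;
        };
        // f64 is not Hash, so rows are compared by the bit patterns of their
        // values (this treats 0.0 and -0.0 as different).
        let key: Vec<u64> = values.iter().map(|v| v.to_bits()).collect();
        if seen.insert(key) {
            cleaned.push(values);
        }
    }
    cleaned
}

fn main() {
    let rows = vec![
        vec![Some(1.0), Some(2.0)],
        vec![Some(1.0), None],
        vec![Some(1.0), Some(2.0)],
    ];
    println!("{:?}", clean_rows(rows));
}
```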

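Manipulating data is a staple of preparation, and I used polars again to transform columns, perform aggregations, and so on. Here’s how I added a mean-centred copy of an existing column to my DataFrame:

fn transform_data(mut df: DataFrame) -> PolarsResult<DataFrame> {
    // Work in f64 so the mean and the subtraction are well defined.
    let col = df.column("existing_col")?.cast(&DataType::Float64)?;
    // Series::mean returns an Option (None for an empty or all-null column).
    let mean = col.mean().unwrap_or(0.0);
    // Subtracting a scalar from a Series yields a new Series of the same length.
    let centred = (&col - mean).with_name("existing_col_centred");
    df.with_column(centred)?;
    Ok(df)
}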
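Scaling, mentioned earlier as part of cleaning, is just as simple in plain Rust. Below is a minimal z-score standardization sketch (standardize is an illustrative name). It uses the population standard deviation and refuses constant columns, which would otherwise cause a division by zero.

```rust
/// Standardizes a column to zero mean and unit (population) standard deviation.
/// Returns None for an empty column or one with zero variance.
fn standardize(column: &[f64]) -> Option<Vec<f64>> {
    if column.is_empty() {
        return None;
    }
    let n = column.len() as f64;
    let mean = column.iter().sum::<f64>() / n;
    let var = column.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    if std == 0.0 {
        return None;
    }
    Some(column.iter().map(|v| (v - mean) / std).collect())
}

fn main() {
    let scaled = standardize(&[2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]);
    println!("{scaled:?}");
}
```

In a real pipeline, compute the mean and standard deviation on the training split only and apply those same values to the test split. That way no information from the test data leaks into training. This helper computes them from whatever column it is given.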

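Feature selection came next. Choosing the features that actually carry predictive signal was crucial. Here, I kept only the columns whose importance score (computed beforehand, for example from a fitted tree model) exceeds a threshold:

use polars::prelude::*;

// `importances` pairs each feature name with a score computed elsewhere.
fn select_features(
    df: &DataFrame,
    importances: &[(&str, f64)],
    threshold: f64,
) -> PolarsResult<DataFrame> {
    let keep: Vec<&str> = importances
        .iter()
        .filter(|(_, score)| *score > threshold)
        .map(|(name, _)| *name)
        .collect();
    // Select the surviving columns by name.
    df.select(keep)
}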
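Importance scores require a model. A simpler, model-free filter is to drop features that barely vary. This is a different criterion from the importance scores above, shown as a standard-library sketch. variance_threshold is a hypothetical helper that takes row-major data and returns the indices of columns whose population variance exceeds a threshold.

```rust
/// Returns the indices of columns whose population variance exceeds `threshold`.
/// `rows` is row-major: rows[i][j] is feature j of sample i.
fn variance_threshold(rows: &[Vec<f64>], threshold: f64) -> Vec<usize> {
    let Some(first) = rows.first() else {
        return Vec::new();
    };
    let n = rows.len() as f64;
    (0..first.len())
        .filter(|&j| {
            let mean = rows.iter().map(|r| r[j]).sum::<f64>() / n;
            let var = rows.iter().map(|r| (r[j] - mean).powi(2)).sum::<f64>() / n;
            var > threshold
        })
        .collect()
}

fn main() {
    // Columns 0 and 2 are constant, so only column 1 survives.
    let rows = vec![vec![1.0, 0.0, 5.0], vec![1.0, 1.0, 5.0], vec![1.0, 2.0, 5.0]];
    println!("{:?}", variance_threshold(&rows, 0.0));
}
```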

Finally, we get to the point where the dataset has to be split into training and testing sets. Shuffling before splitting keeps any ordering in the file (for example, rows sorted by class) from biasing the split. smartcore provides train_test_split for this. It shuffles but does not stratify, so for imbalanced classes you may want a stratified split instead.

use smartcore::linalg::basic::matrix::DenseMatrix;
use smartcore::model_selection::train_test_split;

// Written against smartcore 0.3: features must already be in a DenseMatrix
// (convert from a polars DataFrame first) and labels in a Vec.
fn split_data(
    x: &DenseMatrix<f64>,
    y: &Vec<u32>,
) -> (DenseMatrix<f64>, DenseMatrix<f64>, Vec<u32>, Vec<u32>) {
    // 30% test data, shuffled, with a fixed seed for reproducibility.
    // Returns (x_train, x_test, y_train, y_test).
    train_test_split(x, y, 0.3, true, Some(42))
}
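When classes are imbalanced, a stratified split keeps each class’s proportion the same in the training and test sets. Here is a standard-library sketch of the idea; it is not smartcore’s API. Indices are grouped by label and shuffled within each class, and roughly test_frac of each class goes to the test set. A tiny xorshift generator stands in for the rand crate. stratified_split and XorShift64 are illustrative names.

```rust
use std::collections::BTreeMap;

/// Minimal xorshift64 generator so the example needs no external crates.
/// Good enough for shuffling indices in a demo; use the rand crate in real code.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Stratified split: each class contributes round(test_frac * class_size)
/// samples to the test set. Returns (train_indices, test_indices).
/// `seed` should be non-zero (xorshift stays at zero forever otherwise).
fn stratified_split(labels: &[u32], test_frac: f64, seed: u64) -> (Vec<usize>, Vec<usize>) {
    assert!((0.0..=1.0).contains(&test_frac), "test_frac must be in [0, 1]");
    // Group sample indices by label; BTreeMap keeps class order deterministic.
    let mut by_class: BTreeMap<u32, Vec<usize>> = BTreeMap::new();
    for (i, &label) in labels.iter().enumerate() {
        by_class.entry(label).or_default().push(i);
    }

    let mut rng = XorShift64(seed);
    let (mut train, mut test) = (Vec::new(), Vec::new());
    for indices in by_class.values_mut() {
        // Fisher-Yates shuffle within the class.
        for i in (1..indices.len()).rev() {
            let j = (rng.next_u64() % (i as u64 + 1)) as usize;
            indices.swap(i, j);
        }
        let n_test = (indices.len() as f64 * test_frac).round() as usize;
        test.extend_from_slice(&indices[..n_test]);
        train.extend_from_slice(&indices[n_test..]);
    }
    (train, test)
}

fn main() {
    let labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1];
    let (train, test) = stratified_split(&labels, 0.3, 7);
    println!("train: {train:?}\ntest:  {test:?}");
}
```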

All this code only scratches the surface of what’s feasible in Rust. I encourage you to explore more by venturing into the documentation of these crates. The Rust community is incredibly supportive and the plethora of resources available is testament to that.

Here are some starting points for your journey:

  • csv crate docs: https://docs.rs/csv/latest/csv/
  • polars crate docs: https://docs.rs/polars/latest/polars/
  • smartcore repo: https://github.com/smartcorelib/smartcore

Machine learning data preparation can often feel daunting, but Rust’s speed and safety add a confidence that’s hard to match. As you get your hands dirty with data, you’ll start seeing patterns and abstractions that you can build upon—penning your own tale in Rust’s growing machine learning ecosystem. Happy coding!

Choosing the Right Machine Learning Library in Rust

A collage of logos from various rust machine learning libraries.

With the ever-growing ecosystem of Rust, the landscape of its machine learning libraries has also seen considerable expansion. Finding the right ML library for your project involves considering factors like ease of use, performance, and the level of community support.

Let’s look at some of the top contenders.

First off, linfa, a toolkit similar in spirit to Python’s scikit-learn, is a go-to for general-purpose machine learning tasks. It offers a variety of algorithms and utilities for classification, regression, and clustering, spread across companion crates such as linfa-linear.

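use linfa::prelude::*;
use linfa_linear::LinearRegression;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Needs the linfa-datasets crate with its "diabetes" feature.
    let dataset = linfa_datasets::diabetes();
    let model = LinearRegression::default().fit(&dataset)?;
    let prediction = model.predict(&dataset);

    // Show the first few predictions next to the true targets.
    for (pred, truth) in prediction.iter().zip(dataset.targets().iter()).take(5) {
        println!("predicted {pred:.1}, true {truth:.1}");
    }
    Ok(())
}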

For tasks like NLP or computer vision, where deep learning is a requisite, you might lean towards tch-rs, which provides Rust bindings for PyTorch.

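use tch::{nn, nn::Module, nn::OptimizerConfig, Device, Kind, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let device = Device::cuda_if_available();
    let vs = nn::VarStore::new(device);
    let root = vs.root();
    // A small MLP: 784 inputs (e.g. a flattened 28x28 image) -> 256 hidden -> 10 classes.
    let net = nn::seq()
        .add(nn::linear(&root / "fc1", 784, 256, Default::default()))
        .add_fn(|xs| xs.relu())
        .add(nn::linear(&root / "fc2", 256, 10, Default::default()));

    let mut opt = nn::Adam::default().build(&vs, 1e-3)?;

    for epoch in 1..=200 {
        // Random stand-in batch on the model's device; replace with real data.
        let input = Tensor::randn(&[64, 784], (Kind::Float, device));
        let target = Tensor::randint(10, &[64], (Kind::Int64, device));

        // Backpropagate a scalar loss, not the raw network output.
        let loss = net.forward(&input).cross_entropy_for_logits(&target);
        opt.backward_step(&loss);

        if epoch % 50 == 0 {
            println!("epoch {epoch}: loss {:.4}", loss.double_value(&[]));
        }
    }
    Ok(())
}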

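If your focus is numerical precision and performance, the combination of ndarray and ndarray-linalg is the backbone of scientific computing in Rust. ndarray provides n-dimensional arrays, and ndarray-linalg adds LAPACK-backed linear algebra such as solving linear systems and matrix decompositions. In the example below, the matrix product (a.dot(&b)) comes from ndarray itself, and the random matrices come from the ndarray-rand crate.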

use ndarray::prelude::*;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;

fn main() {
    let a = Array::random((10, 10), Uniform::new(0., 10.));
    let b = Array::random((10, 10), Uniform::new(0., 10.));

    let c = a.dot(&b);
    println!("{:8.4}", c);
}

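Remember, installing these libraries is as simple as adding the necessary dependencies to your Cargo.toml. For instance (versions current around late 2023):

[dependencies]
linfa = "0.7"
linfa-linear = "0.7"
linfa-datasets = { version = "0.7", features = ["diabetes"] }
tch = "0.14"
ndarray = "0.15"
ndarray-rand = "0.14"
ndarray-linalg = { version = "0.16", features = ["openblas-static"] }

Keep linfa and its companion crates on the same version. tch links against a specific libtorch release, and ndarray-linalg needs a LAPACK backend feature such as openblas-static.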

Before locking down your choice, consider the maturity of each library and how much support you need. Newcomers will benefit from thorough documentation and community examples. Checking the number of contributors and the frequency of releases in each library’s GitHub repository gives a sense of its long-term viability.

Building a community-driven machine learning project? Take a look at arewelearningyet.com, a community-maintained curation of machine learning crates in Rust that tracks the state of ML in the Rust ecosystem.

When I was starting out with machine learning in Rust, rummaging through community forums for insights or stumbling upon a helpful blog post was part of the process. Today, an increasing number of tutorials and the active Rust ML community on platforms like users.rust-lang.org and GitHub discussions have significantly lowered the barrier for entry.

Choosing the right machine learning library might seem daunting at first. But by examining your project’s requirements and exploring these libraries’ capabilities, you’ll find a fit that not only works for your current project but also contributes to your long-term growth in machine learning with Rust.

Building Your First Machine Learning Model in Rust

A visualization of a simple machine learning model with rust code snippets.

With the environment set up and data prepared, I’ll walk you through building your first machine learning model in Rust. Let’s select a simple yet effective algorithm for our first model: linear regression. It’s a go-to method for predictive modeling and a great starting point to understand ML paradigms.

We’ll use the linfa crate, a Rust machine learning framework offering a variety of algorithms and utilities. Make sure you’ve included linfa and linfa-linear (which provides LinearRegression) in your Cargo.toml file. Assuming you’ve already loaded and preprocessed your dataset, let’s jump into the fun part.

First, you need to split your dataset into training and testing sets to validate your model later on.

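use linfa::prelude::*;
use linfa_linear::LinearRegression;

// Any linfa Dataset works here; the bundled diabetes data (linfa-datasets crate,
// "diabetes" feature) stands in for your own preprocessed data.
let dataset = linfa_datasets::diabetes();
let (train, validation) = dataset.split_with_ratio(0.9);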

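In the snippet above, we split our dataset so that 90% is used for training and the remaining 10% for validation. split_with_ratio splits the samples in their current order, so shuffle the dataset first if it is sorted in any meaningful way. Checking that your model performs well on unseen data is critical in machine learning.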

Next, we instantiate the linear regression model. For simplicity, I’ll use default parameters, but you can tweak them for better performance later.

let lin_reg = LinearRegression::new();

Training the model is a one-liner. Just pass the training set to the .fit() method, and linfa does the heavy lifting of solving the ordinary least squares problem for you.

let model = lin_reg.fit(&train).unwrap();

After fitting the model on the training data, we can make predictions using the validation set and evaluate how well our model performs.

let predicted = model.predict(&validation);

To quantify the performance of our model, we’ll calculate the mean squared error (MSE) between our predictions and the actual values. It’s a widely used measure for regression models.

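// mean_squared_error and r2 come from linfa's SingleTargetRegression trait.
let mse = predicted.mean_squared_error(&validation).unwrap();
let r2 = predicted.r2(&validation).unwrap();
println!("Mean Squared Error: {mse:.2}, R²: {r2:.3}");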

If the MSE is large relative to the variance of the target values, the model may be underfitting. Don’t worry; it’s part of the learning curve. Iterate on your features and preprocessing, or consider more sophisticated models.
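To make the metrics concrete, here is how MSE and R² (another common regression metric) are computed. This is a standard-library sketch with illustrative function names. MSE averages the squared errors. R² compares the model’s squared error with that of always predicting the mean of the true values.

```rust
/// Mean squared error: the average of (prediction - truth)^2.
fn mean_squared_error(pred: &[f64], truth: &[f64]) -> f64 {
    assert_eq!(pred.len(), truth.len(), "length mismatch");
    assert!(!pred.is_empty(), "need at least one sample");
    pred.iter().zip(truth).map(|(p, t)| (p - t).powi(2)).sum::<f64>() / pred.len() as f64
}

/// Coefficient of determination: 1 - SS_res / SS_tot.
/// 1.0 is a perfect fit; 0.0 is no better than predicting the mean of `truth`.
fn r2(pred: &[f64], truth: &[f64]) -> f64 {
    let mean = truth.iter().sum::<f64>() / truth.len() as f64;
    let ss_res: f64 = pred.iter().zip(truth).map(|(p, t)| (t - p).powi(2)).sum();
    let ss_tot: f64 = truth.iter().map(|t| (t - mean).powi(2)).sum();
    1.0 - ss_res / ss_tot
}

fn main() {
    let truth = [1.0, 2.0, 4.0];
    let pred = [1.0, 2.0, 3.0];
    println!("MSE = {:.4}, R² = {:.4}", mean_squared_error(&pred, &truth), r2(&pred, &truth));
}
```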

To circle back, linfa is a feature-rich ML framework organized as a family of crates (linfa-linear, linfa-clustering, linfa-trees, and others) that share common traits such as Fit and Predict. If you want to dig deeper, I encourage you to explore linfa’s GitHub repository for more examples and algorithms.

Remember, this isn’t about getting the perfect model on your first try; it’s about understanding the process. Each step, from choosing your algorithm to interpreting results, adds to your grasp of machine learning with Rust.

There’s a lot more to machine learning than fitting a simple model, but every expert was once a beginner, and you’re well on your way. Keep experimenting with different models, tuning parameters, and validating your results. And, of course, once you’re comfortable with the basics, dive into more complex algorithms and embrace the full power of machine learning in Rust.

Evaluating and Tuning Machine Learning Models

Charts displaying machine learning model performance metrics next to rust code.

After training your machine learning model in Rust, it’s crucial to know how good it actually is. I’ve found that evaluating and tuning models can turn a decent model into an exceptional one. It’s a bit like fine-tuning an instrument; the process can make all the difference between a mediocre sound and a beautiful melody.

Firstly, I’ll use cross-validation to gauge the model’s performance. Rust’s ML libraries, like smartcore, make this easy. This snippet demonstrates how you might perform cross-validation on a dataset:

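use smartcore::api::SupervisedEstimator;
use smartcore::dataset::iris::load_dataset;
use smartcore::ensemble::random_forest_classifier::RandomForestClassifier;
use smartcore::linalg::basic::matrix::DenseMatrix;
use smartcore::metrics::accuracy;
use smartcore::model_selection::{cross_validate, KFold};

// smartcore 0.3 ("datasets" feature) assumed; iris.data is a row-major Vec<f32>.
let iris = load_dataset();
let x = DenseMatrix::new(iris.num_samples, iris.num_features, iris.data, false);
let y: Vec<u32> = iris.target;
let kf = KFold::default().with_n_splits(5);

// Estimator, data, default hyperparameters, fold splitter, metric.
let results = cross_validate(RandomForestClassifier::new(), &x, &y, Default::default(), &kf, &accuracy).unwrap();

println!("Per-fold accuracy: {:?}", results.test_score);
println!("Mean accuracy: {}", results.mean_test_score());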

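This reports the accuracy of the RandomForestClassifier on each of the five held-out folds; their mean is the cross-validated estimate of performance. Low variance between fold scores indicates a stable model. High variance suggests the model is sensitive to which samples it was trained on, which can point to overfitting or to folds that differ in composition.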
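Under the hood, k-fold cross-validation just partitions the sample indices. The sketch below uses only the standard library; k_fold_indices is an illustrative name. It builds contiguous folds without shuffling. The classic Iris data is ordered by species, so in practice you would shuffle the indices first. Otherwise each test fold would consist of a single class and skew the per-fold scores.

```rust
/// Splits sample indices 0..n into k contiguous folds (no shuffling) and returns
/// (train_indices, test_indices) for each fold. The first n % k folds get one
/// extra sample, so every sample is used for testing exactly once.
fn k_fold_indices(n: usize, k: usize) -> Vec<(Vec<usize>, Vec<usize>)> {
    assert!(k >= 2 && k <= n, "need 2 <= k <= n");
    let (base, extra) = (n / k, n % k);
    let mut folds = Vec::with_capacity(k);
    let mut start = 0;
    for fold in 0..k {
        let size = base + usize::from(fold < extra);
        let end = start + size;
        let test: Vec<usize> = (start..end).collect();
        // Everything outside the test fold is used for training.
        let train: Vec<usize> = (0..start).chain(end..n).collect();
        folds.push((train, test));
        start = end;
    }
    folds
}

fn main() {
    for (i, (train, test)) in k_fold_indices(10, 3).iter().enumerate() {
        println!("fold {i}: train {train:?}, test {test:?}");
    }
}
```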

Next, I dive into hyperparameter tuning, which means searching for the model settings that give the best validation score. A general-purpose optimization crate such as argmin could drive a more sophisticated search. A plain grid search, however, is easy to write as a loop around cross-validation with smartcore:

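use smartcore::api::SupervisedEstimator;
use smartcore::dataset::iris::load_dataset;
use smartcore::ensemble::random_forest_classifier::{
    RandomForestClassifier, RandomForestClassifierParameters,
};
use smartcore::linalg::basic::matrix::DenseMatrix;
use smartcore::metrics::accuracy;
use smartcore::model_selection::{cross_validate, KFold};

// Written against smartcore 0.3 ("datasets" feature); signatures differ in other releases.
let iris = load_dataset();
let x = DenseMatrix::new(iris.num_samples, iris.num_features, iris.data, false);
let y: Vec<u32> = iris.target;
let kf = KFold::default().with_n_splits(5);

// Track the best (n_trees, max_depth, mean accuracy) seen so far.
let mut best: Option<(u16, Option<u16>, f64)> = None;
for n_trees in [10u16, 50, 100] {
    for max_depth in [Some(3u16), Some(5), None] {
        let mut params = RandomForestClassifierParameters::default().with_n_trees(n_trees);
        if let Some(depth) = max_depth {
            params = params.with_max_depth(depth); // None keeps trees unlimited in depth
        }
        let score = cross_validate(RandomForestClassifier::new(), &x, &y, params, &kf, &accuracy)
            .unwrap()
            .mean_test_score();
        if best.map_or(true, |(_, _, best_score)| score > best_score) {
            best = Some((n_trees, max_depth, score));
        }
    }
}

println!("Best (n_trees, max_depth, accuracy): {:?}", best);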

This code tries all nine combinations of the number of trees and the maximum tree depth and reports the combination with the best accuracy.
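The search loop itself does not depend on smartcore. Below is a generic, standard-library sketch (grid_search is an illustrative name). It tries every pair of values from two grids and keeps the best score. In practice the closure would run cross-validation for that combination; the toy scoring function in main is made up purely for illustration.

```rust
/// Exhaustive grid search over two hyperparameters. `score` evaluates one
/// combination (higher is better). Returns the best (a, b, score), or None if
/// either grid is empty. Ties keep the first combination found.
fn grid_search<A: Copy, B: Copy>(
    grid_a: &[A],
    grid_b: &[B],
    mut score: impl FnMut(A, B) -> f64,
) -> Option<(A, B, f64)> {
    let mut best: Option<(A, B, f64)> = None;
    for &a in grid_a {
        for &b in grid_b {
            let s = score(a, b);
            if best.map_or(true, |(_, _, best_s)| s > best_s) {
                best = Some((a, b, s));
            }
        }
    }
    best
}

fn main() {
    let n_trees = [10u16, 50, 100];
    let max_depth = [Some(3u16), Some(5), None];
    // Hypothetical score standing in for mean cross-validated accuracy.
    let best = grid_search(&n_trees, &max_depth, |n, d| {
        let depth_penalty = match d {
            Some(d) => (5.0 - f64::from(d)).abs() * 0.01,
            None => 0.02,
        };
        1.0 - depth_penalty - 1.0 / f64::from(n)
    });
    println!("{best:?}");
}
```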

Remember that while tuning hyperparameters can improve performance, I also keep an eye out for overfitting: a model that is great at predicting the training data but poor at generalizing to new data.

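After tuning, it’s time to revalidate the model on data that played no part in the search, such as a held-out test set. Because the grid search picks the best cross-validation score, that score is slightly optimistic. If tuning has genuinely helped, you should still see an increase in accuracy (or whatever metric you’ve chosen) on the held-out data.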

Now, to wrap this up with a sneak peek at what deploying a model looks like, check out this snippet of serializing a model with serde_pickle for later use:

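use serde_pickle::SerOptions;
use smartcore::dataset::iris::load_dataset;
use smartcore::ensemble::random_forest_classifier::RandomForestClassifier;
use smartcore::linalg::basic::matrix::DenseMatrix;
use std::fs::File;
use std::io::Write;

// Assumes smartcore 0.3 with the "datasets" and "serde" features, and serde-pickle 1.x.
let iris = load_dataset();
let x = DenseMatrix::new(iris.num_samples, iris.num_features, iris.data, false);
let y: Vec<u32> = iris.target;
let model = RandomForestClassifier::fit(&x, &y, Default::default()).unwrap();

// serde-pickle 1.x takes SerOptions where older releases took a boolean flag.
let encoded: Vec<u8> = serde_pickle::to_vec(&model, SerOptions::new()).unwrap();

let mut file = File::create("model.pkl").unwrap();
file.write_all(&encoded).unwrap();

println!("Model saved to model.pkl");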

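This snippet fits a RandomForestClassifier to the Iris dataset and serializes the fitted model in pickle format so it can be persisted to a file. It can later be loaded back into the same Rust type with serde_pickle. Despite the .pkl extension, the file is not a scikit-learn model, so Python cannot use it to make predictions.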

Evaluating and tuning machine learning models can indeed be as straightforward as this, even in a systems programming language like Rust. I hope you’ve found these code examples and explanations handy for your journey in Rust-based machine learning. Keep adjusting those hyperparameters, and may your models predict with accuracy and grace.

Deploying Rust Machine Learning Models in Production

An image of a server rack with rust and ml icons indicating deployment.

Having built and fine-tuned your machine learning model in Rust, the final and crucial step sits on the horizon: deploying your model into production. This is where all the meticulous training and evaluation translate into a practical application, making the hard work and countless hours you’ve invested bear fruit. Let me guide you through this process with straightforward examples and explanations.

First, make sure your model’s accuracy meets expectations. On the performance side, Rust compiles to native code and has no garbage collector, which typically gives low, predictable latency and a small memory footprint. This bodes well for production, where efficiency is paramount.

Let’s consider the deployment architecture. A common pattern is to deploy the model as a microservice: a small service that exposes predictions over HTTP. Microservices are easy to scale and integrate into existing systems, especially when packaged with containerization tools like Docker.

Here’s a simple Dockerfile to get your Rust machine learning service rolling:

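FROM rust:1.XX.0-bookworm AS builder
WORKDIR /usr/src/myapp
COPY . .
RUN cargo install --path .

# Use the same Debian release as the builder image so the glibc versions match.
FROM debian:bookworm-slim
COPY --from=builder /usr/local/cargo/bin/myapp /usr/local/bin/myapp
# The warp server listens on port 3030.
EXPOSE 3030
ENTRYPOINT ["myapp"]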

Replace 1.XX.0 with the Rust version you build with, and myapp with the name of your binary. You can then build the image with docker build -t myapp . and run it with docker run -p 3030:3030 myapp, which publishes the server’s port on the host.

Next up is actually writing a basic web server in Rust that operates as a RESTful API to interact with the model. I’ll keep it simple using warp, a Rust web framework:

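use serde::{Deserialize, Serialize};
use warp::Filter;

// Hypothetical request/response types; adapt them to your model's features.
#[derive(Deserialize, Serialize)]
struct PredictRequest {
    features: Vec<f64>,
}

#[derive(Deserialize, Serialize)]
struct PredictResponse {
    prediction: f64,
}

// Placeholder: replace the body with a call into your trained model.
fn predict(features: &[f64]) -> f64 {
    features.iter().sum()
}

// GET /health -> "OK"
fn health_route() -> impl Filter<Extract = (impl warp::Reply,), Error = warp::Rejection> + Clone {
    warp::path!("health").map(|| warp::reply::json(&"OK"))
}

// POST /predict with a JSON body such as {"features": [1.0, 2.0]}
fn predict_route() -> impl Filter<Extract = (impl warp::Reply,), Error = warp::Rejection> + Clone {
    warp::post()
        .and(warp::path!("predict"))
        .and(warp::body::json())
        .map(|req: PredictRequest| {
            warp::reply::json(&PredictResponse { prediction: predict(&req.features) })
        })
}

#[tokio::main]
async fn main() {
    let routes = health_route().or(predict_route());
    // Bind to all interfaces so the container's published port reaches the server.
    warp::serve(routes).run(([0, 0, 0, 0], 3030)).await;
}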

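Note how we’ve defined a health check route and a prediction endpoint. Replace the prediction placeholder with your actual model’s prediction call. If the model is loaded at startup, wrap it in an Arc and move a clone into the route’s closure. The server needs serde (with the derive feature) and tokio (with the macros and rt-multi-thread features) alongside warp.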

Monitor your model’s performance in production. This involves knowing its ongoing accuracy, latency, throughput, and error rates. Use this information to decide when to retrain or tweak your model.
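Latency is usually tracked as percentiles rather than as an average, because a few slow requests can hide behind a good mean. Here is a standard-library sketch that times calls with Instant and reports nearest-rank percentiles. percentile is an illustrative helper, and the busy loop stands in for a real prediction call.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Nearest-rank percentile of recorded latencies, with `p` in 0.0..=100.0.
/// Returns None when nothing has been recorded.
fn percentile(latencies: &[Duration], p: f64) -> Option<Duration> {
    if latencies.is_empty() {
        return None;
    }
    let mut sorted = latencies.to_vec();
    sorted.sort();
    // Smallest value with at least p% of the samples at or below it (1-based rank).
    let rank = (p * sorted.len() as f64 / 100.0).ceil().max(1.0) as usize;
    Some(sorted[rank.min(sorted.len()) - 1])
}

fn main() {
    let mut latencies = Vec::new();
    for i in 0..100u64 {
        let start = Instant::now();
        // Stand-in for model.predict(...): work that grows with i.
        black_box((0..1_000 * (i + 1)).sum::<u64>());
        latencies.push(start.elapsed());
    }
    println!(
        "p50 = {:?}, p95 = {:?}",
        percentile(&latencies, 50.0),
        percentile(&latencies, 95.0)
    );
}
```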

Testing is an ongoing process. Unit tests cover small pieces such as input validation, integration tests check that your model and API work together as expected, and end-to-end tests exercise the whole deployed system. All three matter. Cargo has testing built in (cargo test). A test for the predict route, using warp’s built-in test helpers, might look like this:

#[cfg(test)]
mod tests {
    use super::*;
    use warp::test::request;

    #[tokio::test]
    async fn test_predict_route() {
        // MyInputData must implement serde::Serialize.
        let input_data = MyInputData { /* fields with test data */ };
        // Assumes the parent module builds the route in a function,
        // e.g. fn predict_route() -> impl Filter<...> + Clone.
        let api = predict_route();
        let response = request()
            .method("POST")
            .path("/predict")
            .json(&input_data)
            .reply(&api)
            .await;

        assert_eq!(response.status(), 200, "Expected OK response");

        // Deserialize and assert prediction result
        let prediction: MyPredictionType = serde_json::from_slice(response.body()).unwrap();
        // is_valid() is a placeholder for whatever check fits your prediction type.
        assert!(prediction.is_valid(), "Expected a valid prediction");
    }
}

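Replace MyInputData and MyPredictionType with your own types (the input type must implement serde::Serialize and the prediction type serde::Deserialize). Make sure your tests also cover failure cases such as malformed JSON, missing fields, and out-of-range values.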
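Input validation is a good candidate for fast unit tests that need no server at all. The sketch below is hypothetical: validate_features checks the feature count and rejects NaN or infinite values. The test module covers the happy path and the failure cases, and it runs with cargo test.

```rust
/// Validates a prediction request before it reaches the model.
/// `expected_len` is the number of features the (hypothetical) model was trained on.
fn validate_features(features: &[f64], expected_len: usize) -> Result<(), String> {
    if features.len() != expected_len {
        return Err(format!("expected {expected_len} features, got {}", features.len()));
    }
    if let Some(i) = features.iter().position(|v| !v.is_finite()) {
        return Err(format!("feature {i} is not a finite number"));
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn accepts_well_formed_input() {
        assert!(validate_features(&[1.0, 2.0, 3.0], 3).is_ok());
    }

    #[test]
    fn rejects_wrong_length() {
        assert!(validate_features(&[1.0, 2.0], 3).is_err());
    }

    #[test]
    fn rejects_nan_and_infinity() {
        assert!(validate_features(&[1.0, f64::NAN, 3.0], 3).is_err());
        assert!(validate_features(&[f64::INFINITY, 2.0, 3.0], 3).is_err());
    }
}

fn main() {
    println!("{:?}", validate_features(&[1.0, f64::NAN], 2));
}
```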

To wrap up, deploying a Rust machine learning model into production might seem daunting at first, especially if you’re new to systems programming or machine learning. However, following these structured steps and combining the power of Rust’s safety and performance with sound software engineering practices will set you on the path to success. Stay meticulous, understand that deployment is just as important as development, and continuously monitor your model’s performance to maintain the integrity of your application in production.