Web Scraping in Rust with Reqwest and Scraper

24 February 2025 (updated) | 8 min read

In this Rust tutorial you'll learn how to create a basic web scraper by scraping the top ten movies list from IMDb. Rust is a language known for its speed and safety and we'll try two approaches: blocking IO and asynchronous IO with tokio.


Implementing a Web Scraper in Rust

You’re going to set up a fully functioning web scraper in Rust. Your target for scraping will be IMDb, a database of movies, TV series, and other media.

In the end, you’ll have a Rust program that can scrape the top ten movies by user rating at any given moment.

This tutorial assumes you already have Rust and Cargo (Rust’s package manager) installed. If you don’t, follow the official documentation to install them.

Windows Installation Tips

If you're on Windows, after you run the installation you might need to add Cargo to your system's PATH variable and then restart the computer.

If Cargo is not in the PATH, add it manually:

  1. Press Win + R, type sysdm.cpl, and press Enter.
  2. Go to the Advanced tab.
  3. Click Environment Variables.
  4. Under System variables, select "Path" and click Edit.
  5. Click New.
  6. Add the Cargo path, e.g. C:\Users\YOURUSERNAME\.cargo\bin

Alternatively, open PowerShell as Administrator (right-click the PowerShell icon to find the Run as Administrator option) and execute the following command. Don't forget to replace YOURUSERNAME with your system's username:

[System.Environment]::SetEnvironmentVariable("Path", $env:Path + ";C:\Users\YOURUSERNAME\.cargo\bin", [System.EnvironmentVariableTarget]::Machine)

Want more web scraping options? Check out our comprehensive list of Web Scraping Tools.

Creating the Project and Adding Dependencies

To start off, you need to create a basic Rust project and add all the dependencies you’ll be using. This is best done with Cargo.

To generate a new project for a Rust binary, run:

cargo new web_scraper

Next, add the required libraries to the dependencies. We'll start with reqwest and scraper.

Open the web_scraper folder in your favorite code editor and open the Cargo.toml file. At the end of the file, add the libraries:

[dependencies]

reqwest = { version = "0.12.12", features = ["blocking"] }
scraper = "0.22.0"

Now you can move to src/main.rs and start creating your web scraper.

Getting the Website HTML

Scraping a page usually involves getting the HTML code of the page and then parsing it to find the information you need. Therefore, you’ll need to make the code of the IMDb page available in your Rust program. To do that, you first need to understand how browsers work because they’re your usual way of interacting with web pages.

To display a web page in the browser, the browser (client) sends an HTTP request to the server, which responds with the page's source code. The browser then renders this code.

HTTP has various different types of requests, such as GET (for getting the contents of a resource) and POST (for sending information to the server). To get the code of an IMDb web page in your Rust program, you’ll need to mimic the behavior of browsers by sending an HTTP GET request to IMDb.
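To make this concrete, here's a std-only sketch of the raw text an HTTP client assembles for such a GET request. The header set is illustrative; reqwest sends more headers than this on your behalf:

```rust
// Build the raw text of an HTTP GET request, the way an HTTP client
// assembles it before writing it to a TCP connection.
fn build_get_request(host: &str, path: &str) -> String {
    format!(
        "GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    )
}

fn main() {
    let request = build_get_request(
        "www.imdb.com",
        "/search/title/?groups=top_100&sort=user_rating,desc&count=10",
    );
    // The server parses exactly this text to decide what to respond with.
    println!("{request}");
}
```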

In Rust, you can use reqwest for that. This commonly used Rust library provides the features of an HTTP client. It can do a lot of the things that a regular browser can do, such as open pages, log in, and store cookies.

To request the code of a page, you can use the reqwest::blocking::get method. We'll print the response status code and the length of the page's HTML:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let response = reqwest::blocking::get(
        "https://www.imdb.com/search/title/?groups=top_100&sort=user_rating,desc&count=10",
    )?;

    // Print the HTTP response status code
    println!("Response status: {}", response.status());

    let body = response.text()?; // Fetch the response body as text
    println!("Response length: {}", body.len());

    Ok(())
}

To run the script in the terminal, cd into your web_scraper folder:

cd web_scraper

Then do cargo run:

cargo run

body will now contain the full HTML code of the page you requested.

Extracting Information from HTML

The hardest part of a web scraping project is usually getting the specific information you need out of the HTML document. For this purpose, a commonly used tool in Rust is the scraper library. It works by parsing the HTML document into a tree-like structure. You can use CSS selectors to query the elements you’re interested in.

The first step is to parse your entire HTML document using the library:

    let document = scraper::Html::parse_document(&body);

Next, find and select the parts you need. To do that, check the website’s code and find a collection of CSS selectors that uniquely identify those items.

The simplest way to do this is via your regular browser. Find the element you need, then check the code of that element by inspecting it:

How to inspect an element

In the case of IMDb, the element you need is the name of the movie. When you check the element, you’ll see that it’s wrapped in an <a> tag:

<a href="/title/tt0111161/?ref_=sr_t_1" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text">1. The Shawshank Redemption</h3></a>

Unfortunately, this tag is not unique. Since there are a lot of <a> tags on the page, it wouldn’t be a smart idea to scrape them all, as most of them won’t be the items you need. Instead, find the tag unique to movie titles and then navigate to the <a> tag inside that tag.

Currently, IMDb wraps each movie title inside an <a> tag with the class ipc-title-link-wrapper, and the title itself is inside an <h3> tag with the class ipc-title__text:

<a href="/title/tt0111161/?ref_=sr_t_1" class="ipc-title-link-wrapper" tabindex="0">
    <h3 class="ipc-title__text">1. The Shawshank Redemption</h3>
</a>

To extract movie titles, use the scraper::Selector::parse method with the following selector:

a.ipc-title-link-wrapper h3.ipc-title__text

This selector finds all <h3> tags with the class ipc-title__text that are inside an <a> tag with the class ipc-title-link-wrapper.

The text() method returns an iterator over the element's text nodes, so collecting it into a String extracts the actual movie title without any surrounding HTML markup.
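That pattern is worth unpacking: text() yields the element's text nodes as string slices, and collect::<String>() concatenates them. Here's a minimal std-only sketch of the same collect pattern (the fragment values are made up for illustration):

```rust
// Concatenate text fragments, just as x.text().collect::<String>()
// concatenates an element's text nodes in scraper.
fn join_text_nodes(nodes: Vec<&str>) -> String {
    nodes.into_iter().collect()
}

fn main() {
    let title = join_text_nodes(vec!["1. ", "The Shawshank Redemption"]);
    println!("{title}"); // prints "1. The Shawshank Redemption"
}
```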

Now, let's combine all the steps together:

let titles: Vec<String> = document
    .select(&scraper::Selector::parse(
      "a.ipc-title-link-wrapper h3.ipc-title__text",
    )?)
    .map(|x| x.text().collect::<String>())
    .collect();

for title in titles {
    println!("{}", title);
}

Your web scraper is now done.

Here’s the complete code of the scraper:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let response = reqwest::blocking::get(
        "https://www.imdb.com/search/title/?groups=top_100&sort=user_rating,desc&count=10",
    )?
    .text()?;

    let document = scraper::Html::parse_document(&response);

    let titles: Vec<String> = document
        .select(&scraper::Selector::parse(
            "a.ipc-title-link-wrapper h3.ipc-title__text",
        )?)
        .map(|x| x.text().collect::<String>())
        .collect();

    for title in titles {
        println!("{}", title);
    }
    Ok(())
}

If you save the file and run it with cargo run, you should get the list of top ten movies at any given moment:

1. The Shawshank Redemption
2. The Godfather
3. The Dark Knight
4. The Lord of the Rings: The Return of the King
5. Schindler's List
6. The Godfather: Part II
7. 12 Angry Men
8. Pulp Fiction
9. Inception
10. The Lord of the Rings: The Two Towers

Employing Asynchronous IO

Relying on blocking IO is convenient when you're trying to scrape a single page. However, it does not scale well when you're trying to scrape multiple pages at the same time.

While we could run those requests on multiple threads, Rust has a much more scalable solution: asynchronous IO with the tokio library. This allows you to trade some of the convenience of blocking IO for extra scalability.
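For comparison, the thread-based approach would look roughly like this. To keep the sketch self-contained, the network call is replaced with a sleep standing in for one blocking fetch:

```rust
use std::thread;
use std::time::{Duration, Instant};

// Simulate one blocking "fetch" taking 100 ms.
fn fetch_page() -> String {
    thread::sleep(Duration::from_millis(100));
    String::from("<html>...</html>")
}

fn main() {
    let start = Instant::now();

    // Spawn ten worker threads, each performing one blocking fetch.
    let handles: Vec<_> = (0..10).map(|_| thread::spawn(fetch_page)).collect();
    let documents: Vec<String> = handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect();

    // The fetches overlap, so this takes roughly 100 ms rather than 1 s.
    println!("Fetched {} documents in {:?}", documents.len(), start.elapsed());
}
```

Threads work, but each one carries its own OS stack, while tokio multiplexes many lightweight tasks onto a small pool of threads.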

For this example, we'll target example.com so we don't run into IMDb's rate limiting.

First, let's try to fetch example.com 100 times and log the execution time:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut documents: Vec<String> = Vec::new();

    let start = std::time::Instant::now();

    for _ in 0..100 {
        let response = reqwest::blocking::get("https://www.example.com/")?.text()?;
        documents.push(response);
    }

    let elapsed = start.elapsed();

    println!("Fetched and stored {} documents", documents.len());
    println!("Time taken to fetch and parse 100 pages: {:?}", elapsed);

    Ok(())
}

It took around 11 seconds, not bad:

Fetched and stored 100 documents
Time taken to fetch and parse 100 pages: 11.491626375s

Now, let's try the same, but with asynchronous IO.

We'll need to add tokio (the async runtime) and futures (for try_join_all) to our dependencies:

tokio = { version = "1", features = ["full"] }
futures = "0.3"

And then we can rewrite our code to use asynchronous IO:

use reqwest::Client;
use tokio::time::Instant;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new();
    let start = Instant::now();

    let futures = (0..100).map(|_| async {
        client.get("https://www.example.com/").send().await?.text().await
    });

    let documents: Vec<String> = futures::future::try_join_all(futures).await?;
    let elapsed = start.elapsed();

    println!("Fetched and stored {} documents", documents.len());
    println!("Time taken to fetch and parse 100 pages: {:?}", elapsed);

    Ok(())
}

This time, it took less than 400ms:

Fetched and stored 100 documents
Time taken to fetch and parse 100 pages: 334.931084ms

Quite a difference, right?

Naturally, this is not a scientific benchmark, but it should give you some indication of the performance difference between blocking and asynchronous IO.

Conclusion

In this tutorial, you've seen how to use Rust to create a simple web scraper using two approaches: blocking IO and asynchronous IO. Rust isn’t a popular language for scripting, but as you saw, it can get the job done quickly and efficiently.

Obviously, this is just a starting point. Here are some options you can try out as an exercise:

  • Parse data into a custom struct: You can create a typed Rust struct that holds movie data. This will make it easier to print the data and work with it further inside your program.
  • Save data in a file: Instead of printing out movie data, you can instead save it in a file.
  • Create a Client that logs into an IMDb account: You might want IMDb to display movies according to your preferences before you parse them. For example, IMDb shows film titles in the language of the country you live in. If this is an issue, you will need to configure your IMDb preferences and then create a web scraper that can log in and scrape with preferences.
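As a sketch for the first exercise: the scraped strings like "1. The Shawshank Redemption" split cleanly into a rank and a title. The Movie struct and parse_movie helper below are made up for illustration:

```rust
// A typed representation of one scraped entry.
#[derive(Debug, PartialEq)]
struct Movie {
    rank: u32,
    title: String,
}

// Parse strings of the form "1. The Shawshank Redemption".
fn parse_movie(raw: &str) -> Option<Movie> {
    let (rank, title) = raw.split_once(". ")?;
    Some(Movie {
        rank: rank.trim().parse().ok()?,
        title: title.trim().to_string(),
    })
}

fn main() {
    let movie = parse_movie("1. The Shawshank Redemption").unwrap();
    println!("{:?}", movie);
}
```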

However, sometimes working with CSS selectors isn’t enough. You might need a more advanced solution that simulates actions taken by a real browser. In that case, you can use thirtyfour, a Selenium/WebDriver client for Rust, for more powerful web scraping.

If you love low-level languages, you might also like our Web scraping with C++.

Grzegorz Piwowarek

Independent consultant, blogger at 4comprehension.com, trainer, Vavr project lead - teaching distributed systems, architecture, Java, and Golang