
Understanding Tasks and Threads in Rust: A Practical Guide

7 min read

TL;DR

The main difference between tasks and threads in Rust is the level of abstraction. Tasks are lightweight units of work provided by async runtimes like Tokio, which schedule them onto a pool of OS threads under the hood. Threads, on the other hand, are the OS-level primitive for concurrency, exposed directly through std::thread. Unlike Go, Rust doesn't ship a built-in runtime or scheduler, so you either manage concurrency yourself with OS threads or pull in a runtime like Tokio.

Rust vs. Go: No Built-in Runtime

One of the key differences between Rust and Go is that Rust doesn’t have a built-in runtime or scheduler. In Go, the runtime manages goroutines, making concurrency easy to use. In Rust, you have to manage concurrency explicitly, either by using OS threads directly or by using libraries like Tokio.

Types of Threads: Green vs. OS

In the world of concurrency, there are different types of threads that help manage and execute tasks efficiently. Two of the most common types are OS threads and green threads. Let’s explore how they work and how they come together in Rust.

OS Threads

  • Managed by the Kernel: OS threads are managed directly by the operating system’s kernel. The kernel handles scheduling, context switching, and the lifecycle of these threads.
  • Resource-Intensive: OS threads are more resource-intensive to create and manage because the kernel allocates resources for each thread.
  • True Parallelism: OS threads can run in parallel on multiple CPU cores, taking full advantage of multi-core processors.

Examples: Native threads in operating systems like Linux, Windows, and macOS.

Green Threads

  • Managed by a Runtime: Green threads are managed by a runtime or user-level library, not the kernel. The runtime handles scheduling and context switching.
  • Lightweight: Green threads are lighter and cheaper to create and manage compared to OS threads.
  • Cooperative Multitasking: Green threads typically rely on cooperative multitasking: a thread runs until it voluntarily yields control back to the runtime (for example, at an await point), rather than being preempted by the kernel.

Examples: Goroutines in Go, green threads in languages like Erlang.

Essentially, some programming languages have green threads built right into their system. For example, in Go, green threads are called goroutines, and Erlang uses lightweight processes, which are essentially green threads managed by the Erlang runtime.

In other cases, green threads are created using special libraries. These libraries provide the tools needed to manage green threads within an application.

For instance, in Python, the greenlet library allows developers to create and manage green threads.

In contrast, Rust doesn't ship green threads in its standard library (it removed its built-in green threads before 1.0). Instead, external async runtimes such as Tokio fill that role: their tasks are lightweight units of work multiplexed onto a small pool of OS threads, much like green threads.

Now that we've defined threads, let's talk about why we need them: concurrency.

Rust and Concurrency Management

In Rust, you have two main options for managing concurrency: using OS threads directly or using an async runtime like Tokio.

Using OS Threads Directly

Rust's standard library provides the std::thread module for creating and managing OS threads. This gives you direct control over thread creation and management, but requires careful handling of synchronization and shared state.

use std::thread;

fn main() {
    let handle = thread::spawn(|| {
        println!("Hello from a thread!");
    });

    handle.join().unwrap();
}

Using Tokio for Async Tasks

Tokio is a popular async runtime for Rust that provides a higher-level abstraction for managing concurrency. Tokio uses a thread pool to execute tasks, allowing you to write async code without worrying about the low-level details of thread management.

use tokio::task;

#[tokio::main]
async fn main() {
    let handle = task::spawn(async {
        println!("Hello from a task!");
    });

    handle.await.unwrap();
}

Even though Tokio takes thread management off your hands, you still need to configure the runtime so it uses resources sensibly. Fine-tuning the Tokio configuration looks like this:

use tokio::runtime::Builder;
use tokio::task;

fn main() {
    let rt = Builder::new_multi_thread()
        .worker_threads(4) // Number of async worker threads
        .max_blocking_threads(50) // Cap on the blocking-thread pool
        .enable_all() // Enable the I/O and time drivers
        .build()
        .unwrap();

    rt.spawn(async {
        println!("Hello from a task!");
    });

    rt.block_on(async {
        task::yield_now().await;
    });
}

Concurrency vs. Parallelism in Rust

When talking about concurrency, the concept of parallelism often comes up.

Parallelism and concurrency are not mutually exclusive; a program can use both at once. Concurrency is about structure: managing many tasks so that each can make progress without blocking the others. Parallelism is about execution: running tasks literally at the same time on multiple CPU cores.

Concurrency in Rust

Concurrency in Rust is managed using async tasks, which are handled by runtimes like Tokio. These tasks can be paused and resumed, allowing other tasks to run concurrently. This is ideal for I/O-bound tasks, where tasks spend a lot of time waiting for external events.

use tokio::task;

#[tokio::main]
async fn main() {
    // Spawn two tasks; Tokio interleaves them on its worker threads.
    let first = task::spawn(async {
        println!("Hello from task one!");
    });
    let second = task::spawn(async {
        println!("Hello from task two!");
    });

    first.await.unwrap();
    second.await.unwrap();
}

In this example, Tokio manages the concurrent execution of tasks, allowing them to make progress without blocking each other.

Parallelism in Rust

Parallelism in Rust is achieved using OS threads, which can run independently on multiple CPU cores. This is ideal for CPU-bound tasks, where tasks require significant computational resources.

use std::thread;

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|i| {
            thread::spawn(move || {
                println!("Hello from thread {}", i);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}

In this example, OS threads run in parallel on multiple CPU cores, taking full advantage of multi-core processors.

Combining Concurrency and Parallelism

In Rust, you can combine concurrency and parallelism to achieve efficient and scalable applications. By using async tasks managed by Tokio and configuring the runtime to use multiple worker threads, you can handle both I/O-bound and CPU-bound tasks effectively.

Example of Fine-Tuning Tokio for Parallelism:

use tokio::runtime::Builder;
use tokio::task;

fn main() {
    let rt = Builder::new_multi_thread()
        .worker_threads(4) // Number of async worker threads
        .max_blocking_threads(50) // Cap on the blocking-thread pool
        .enable_all() // Enable the I/O and time drivers
        .build()
        .unwrap();

    rt.spawn(async {
        println!("Hello from a task!");
    });

    rt.block_on(async {
        task::yield_now().await;
    });
}

In this example, Tokio is configured to use multiple worker threads, allowing tasks to run in parallel on multiple CPU cores while still managing concurrency efficiently.

Concurrency and Parallelism in Node.js

If you’re familiar with Node.js, you know that it can also achieve concurrency and parallelism. Although Node.js is primarily designed for concurrency with its event-driven, non-blocking I/O model, it can also achieve parallelism through the use of worker threads or child processes.

Worker threads let you run JavaScript in parallel, taking full advantage of multi-core processors. The worker_threads module provides a way to create and manage worker threads, so CPU-bound work can run independently of the main thread.

const { Worker, isMainThread, parentPort, workerData } = require('node:worker_threads');

if (isMainThread) {
  // Main thread: spawn a worker running this same file.
  const worker = new Worker(__filename, {
    workerData: { num: 42 },
  });

  worker.on('message', (msg) => {
    console.log('Received message from worker:', msg);
  });

  worker.on('exit', (code) => {
    if (code !== 0)
      console.error(new Error(`Worker stopped with exit code ${code}`));
  });

  worker.postMessage('Hello, worker!');
} else {
  // Worker thread: receive workerData and reply to messages.
  const data = workerData;
  console.log('Worker received data:', data);

  parentPort.on('message', (msg) => {
    console.log('Worker received message:', msg);
    parentPort.postMessage('Hello from worker!');
  });
}

Additionally, Node.js can use child processes to run separate Node.js processes in parallel. The child_process module provides a way to create and manage child processes, allowing tasks to run independently of the main process.

Example of Parallelism with Child Processes:

const { fork } = require('node:child_process');

const child = fork('child.js');

child.on('message', (msg) => {
  console.log('Received message from child:', msg);
});

child.send('Hello, child!');

By leveraging worker threads and child processes, Node.js can handle both I/O-bound and CPU-bound tasks efficiently, taking full advantage of multi-core processors.

Recap

To sum up: the difference between tasks and threads in Rust comes down to abstraction.

Tasks are managed by libraries like Tokio, using threads behind the scenes to handle concurrency.

Threads, on the other hand, are a direct way to manage concurrency at the OS level. Unlike Go, Rust doesn’t have a built-in runtime or scheduler, so you either manage concurrency yourself or use libraries like Tokio.
