C++20 Threading Primitives: jthread, Latch, Barrier, and Semaphore

by Frost · January 30, 2026

C++11 gave us std::thread, std::mutex, and std::condition_variable. They work, but they leave gaps: threads that leak if you forget to join, no built-in cancellation, and no high-level synchronization primitives beyond mutexes. C++20 fills every one of these gaps with std::jthread, cooperative cancellation via stop tokens, std::latch, std::barrier, and std::counting_semaphore.

`std::jthread` — RAII Threads

std::thread calls std::terminate if destroyed while joinable. This means every code path — including exception paths — must call join() or detach(). std::jthread fixes this by automatically requesting a stop and joining in its destructor:

#include <thread>

void process_data() {
    std::jthread worker([] {
        heavy_computation();
    });

    if (should_abort()) return;
}

When process_data returns — whether normally or via exception — the jthread destructor requests a stop, waits for the thread to finish, and cleans up. No manual join() needed.

Feature	`std::thread`	`std::jthread`
Auto-join on destruction	No (`std::terminate`)	Yes
Cooperative cancellation	No	Built-in stop tokens
Move semantics	Yes	Yes
Detach support	Yes	Yes

Cooperative Cancellation

std::jthread introduces a cancellation protocol built on three types:

std::stop_source — Issues stop requests
std::stop_token — Checks whether a stop has been requested
std::stop_callback — Invokes a callable when a stop is requested

If a thread function's first parameter is std::stop_token, jthread automatically passes one connected to its internal stop source:

void worker(std::stop_token token) {
    while (!token.stop_requested()) {
        auto batch = fetch_next_batch();
        process(batch);
    }
}

std::jthread t(worker);
std::this_thread::sleep_for(std::chrono::seconds(5));
t.request_stop();

The request_stop() call is thread-safe, atomic, and irreversible. The worker checks stop_requested() at each iteration and exits gracefully.

Stop Callbacks

std::stop_callback registers a function that runs synchronously when a stop is requested — useful for waking blocked operations:

void interruptible_wait(std::stop_token token) {
    std::mutex mtx;
    std::condition_variable_any cv;
    std::unique_lock lock(mtx);

    std::stop_callback on_stop(token, [&cv] { cv.notify_all(); });
    cv.wait(lock, [&token] { return token.stop_requested(); });
}

The callback executes in the thread that calls request_stop(). If a stop was already requested before the callback is registered, it executes immediately in the registering thread. Exceptions from callbacks call std::terminate.

`std::latch` — One-Shot Countdown

A latch is a single-use countdown synchronization primitive. You initialize it with a count, threads decrement it, and any thread can block until the count reaches zero:

#include <latch>
#include <thread>
#include <vector>

void parallel_init(std::vector<Subsystem>& systems) {
    std::latch ready(systems.size());

    std::vector<std::jthread> threads;
    for (auto& sys : systems) {
        threads.emplace_back([&sys, &ready] {
            sys.initialize();
            ready.count_down();
        });
    }

    ready.wait();
}

Key characteristics:

count_down(n) decrements by n (default 1)
wait() blocks until the count reaches zero
arrive_and_wait() combines both operations
try_wait() checks without blocking
Single-use — once the count hits zero, the latch cannot be reset

Latches solve the "wait for N things to finish" pattern without the boilerplate of a mutex, a counter, and a condition variable.

`std::barrier` — Reusable Phase Synchronization

A barrier is a latch that resets automatically after each phase. It synchronizes a fixed group of threads across multiple rounds, optionally running a completion function between phases:

#include <barrier>

void phased_simulation(int num_threads, int num_steps) {
    auto on_phase_complete = []() noexcept {
        swap_buffers();
    };

    std::barrier sync(num_threads, on_phase_complete);

    auto simulate = [&](int thread_id) {
        for (int step = 0; step < num_steps; ++step) {
            compute_partial(thread_id);
            sync.arrive_and_wait();
        }
    };

    std::vector<std::jthread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back(simulate, i);
    }
}

Each phase:

Threads call arrive() or arrive_and_wait(), decrementing the expected count
When the count reaches zero, the completion function runs
All waiting threads unblock
The count resets for the next phase

The arrive_and_drop() method lets a thread leave the group permanently, reducing the expected count for all future phases.

Method	Behavior
`arrive()`	Decrement count, return arrival token
`wait(token)`	Block until phase completes
`arrive_and_wait()`	Combine arrive + wait
`arrive_and_drop()`	Arrive and leave the group

`std::counting_semaphore` and `std::binary_semaphore`

Semaphores control concurrent access to a shared resource by maintaining an internal counter. acquire() decrements the counter (blocking if zero), and release() increments it:

#include <semaphore>

std::counting_semaphore<4> pool(4);

void limited_access() {
    pool.acquire();
    use_resource();
    pool.release();
}

This limits concurrent access to 4 threads. std::binary_semaphore is an alias for std::counting_semaphore<1>, useful for signaling between threads:

std::binary_semaphore signal(0);

std::jthread producer([&] {
    auto data = generate();
    buffer = data;
    signal.release();
});

signal.acquire();
consume(buffer);

Unlike mutexes, semaphores have no ownership — any thread can release. This makes them suitable for producer-consumer signaling where the thread that acquires is different from the thread that releases.

Type	Max Count	Use Case
`counting_semaphore<N>`	N	Resource pool, connection limiting
`binary_semaphore`	1	Thread signaling, event notification

Parallel Pipeline Example

Combining these primitives into a practical pattern — a producer-consumer pipeline with bounded buffering and graceful shutdown:

template <typename T, int Capacity>
struct BoundedChannel {
    std::array<T, Capacity> buffer_{};
    int head_ = 0, tail_ = 0;
    std::mutex mtx_;
    std::counting_semaphore<Capacity> empty_{Capacity};
    std::counting_semaphore<Capacity> full_{0};

    void send(T item) {
        empty_.acquire();
        {
            std::lock_guard lock(mtx_);
            buffer_[tail_ % Capacity] = std::move(item);
            ++tail_;
        }
        full_.release();
    }

    T recv() {
        full_.acquire();
        T item;
        {
            std::lock_guard lock(mtx_);
            item = std::move(buffer_[head_ % Capacity]);
            ++head_;
        }
        empty_.release();
        return item;
    }
};

The empty semaphore blocks producers when the buffer is full; the full semaphore blocks consumers when the buffer is empty. No condition variables, no spurious wakeups.

Choosing the Right Primitive

Need	Primitive
Thread that cleans up automatically	`jthread`
Graceful shutdown signaling	`stop_token` / `stop_source`
Wait for N tasks to complete once	`latch`
Synchronize threads across repeated phases	`barrier`
Limit concurrent access to a resource	`counting_semaphore`
Signal between two threads	`binary_semaphore`
Mutual exclusion with ownership	`mutex` (still the right tool)

These primitives are available in all three major compilers (GCC 11+, Clang 14+, MSVC 19.28+) under -std=c++20. They replace a substantial amount of hand-rolled synchronization code with tested, standardized building blocks.

std::jthread — RAII Threads