Backend Concepts

DTL specifies its backend requirements as C++20 concepts, so the compiler verifies at build time that every backend implementation provides the required interface.

Overview

DTL’s backend architecture has three primary concept hierarchies:

Concept Family    Purpose                                        Key Implementations
--------------    -------                                        -------------------
Communicator      Point-to-point and collective communication    mpi_comm_adapter
MemorySpace       Memory allocation and properties               host_memory_space, cuda_memory_space
Executor          Computation dispatch                           cpu_executor, inline_executor

Communicator Concepts

Communicators provide distributed communication capabilities across ranks.

Communicator (Base)

The core concept for point-to-point communication:

template <typename T>
concept Communicator = requires(T& comm, const T& ccomm,
                                void* buf, const void* cbuf,
                                size_type count, rank_t rank,
                                int tag, request_handle& req) {
    typename T::size_type;

    // Query operations
    { ccomm.rank() } -> std::same_as<rank_t>;
    { ccomm.size() } -> std::same_as<rank_t>;

    // Blocking point-to-point
    { comm.send(cbuf, count, rank, tag) } -> std::same_as<void>;
    { comm.recv(buf, count, rank, tag) } -> std::same_as<void>;

    // Non-blocking point-to-point
    { comm.isend(cbuf, count, rank, tag) } -> std::same_as<request_handle>;
    { comm.irecv(buf, count, rank, tag) } -> std::same_as<request_handle>;

    // Request completion
    { comm.wait(req) } -> std::same_as<void>;
    { comm.test(req) } -> std::same_as<bool>;
};

Required Types:

  • size_type: Type for counts and sizes

Required Operations:

Operation    Description
---------    -----------
rank()       Returns this process’s rank (0 to size-1)
size()       Returns total number of ranks
send()       Blocking send to destination rank
recv()       Blocking receive from source rank
isend()      Non-blocking send, returns request handle
irecv()      Non-blocking receive, returns request handle
wait()       Wait for non-blocking operation to complete
test()       Test if non-blocking operation completed

CollectiveCommunicator

Extends Communicator with collective operations:

template <typename T>
concept CollectiveCommunicator = Communicator<T> &&
    requires(T& comm, void* buf, const void* cbuf,
             size_type count, rank_t root) {
    { comm.barrier() } -> std::same_as<void>;
    { comm.broadcast(buf, count, root) } -> std::same_as<void>;
    { comm.scatter(cbuf, buf, count, root) } -> std::same_as<void>;
    { comm.gather(cbuf, buf, count, root) } -> std::same_as<void>;
    { comm.allgather(cbuf, buf, count) } -> std::same_as<void>;
    { comm.alltoall(cbuf, buf, count) } -> std::same_as<void>;
};

Additional Operations:

Operation      Description
---------      -----------
barrier()      Synchronize all ranks
broadcast()    Root sends data to all ranks
scatter()      Root distributes data to all ranks
gather()       All ranks send data to root
allgather()    Gather data to all ranks
alltoall()     All-to-all exchange

ReducingCommunicator

Extends CollectiveCommunicator with reduction operations:

template <typename T>
concept ReducingCommunicator = CollectiveCommunicator<T> &&
    requires(T& comm, const void* sendbuf, void* recvbuf,
             size_type count, rank_t root) {
    { comm.reduce_sum(sendbuf, recvbuf, count, root) } -> std::same_as<void>;
    { comm.allreduce_sum(sendbuf, recvbuf, count) } -> std::same_as<void>;
};

Additional Operations:

Operation          Description
---------          -----------
reduce_sum()       Sum reduction to root rank
allreduce_sum()    Sum reduction to all ranks

Communicator Tags

Tag types identify communicator implementations:

struct mpi_communicator_tag {};
struct shared_memory_communicator_tag {};
struct gpu_communicator_tag {};

MemorySpace Concepts

Memory spaces abstract memory allocation across different address spaces.

MemorySpace (Base)

template <typename T>
concept MemorySpace = requires(T& space, const T& cspace,
                               size_type size, size_type alignment,
                               void* ptr) {
    typename T::pointer;
    typename T::size_type;

    // Allocation
    { space.allocate(size) } -> std::same_as<void*>;
    { space.allocate(size, alignment) } -> std::same_as<void*>;
    { space.deallocate(ptr, size) } -> std::same_as<void>;

    // Properties
    { cspace.properties() } -> std::same_as<memory_space_properties>;
    { cspace.name() } -> std::convertible_to<const char*>;
};

Required Types:

  • pointer: Raw pointer type (typically void*)

  • size_type: Size type for counts

Required Operations:

Operation                    Description
---------                    -----------
allocate(size)               Allocate size bytes
allocate(size, alignment)    Allocate with alignment requirement
deallocate(ptr, size)        Free allocation
properties()                 Returns memory_space_properties
name()                       Returns human-readable name

Memory Space Properties

struct memory_space_properties {
    bool host_accessible = true;    // CPU can access
    bool device_accessible = false; // GPU can access
    bool unified = false;           // Coherent across host/device
    bool supports_atomics = true;   // Atomic operations work
    bool pageable = true;           // Pageable (vs pinned)
    size_type alignment = alignof(std::max_align_t);
};
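One way these flags might be consumed is to decide how data can move between two memory spaces. The sketch below is an assumption about usage, not DTL code: copy_path, choose_copy_path, and the three example property sets are hypothetical, and the property values shown for device and managed memory are illustrative guesses rather than what DTL's memory spaces actually report.

```cpp
#include <cstddef>

namespace sketch {

using size_type = std::size_t;  // stand-in (assumption)

// Mirrors the properties struct documented above.
struct memory_space_properties {
    bool host_accessible = true;
    bool device_accessible = false;
    bool unified = false;
    bool supports_atomics = true;
    bool pageable = true;
    size_type alignment = alignof(std::max_align_t);
};

// Hypothetical classification of how a copy could be carried out.
enum class copy_path { host_copy, device_copy, explicit_transfer };

constexpr copy_path choose_copy_path(const memory_space_properties& src,
                                     const memory_space_properties& dst) {
    // Both sides visible to the CPU: a plain host-side copy works.
    if (src.host_accessible && dst.host_accessible) return copy_path::host_copy;
    // Both sides visible to the GPU: copy on the device.
    if (src.device_accessible && dst.device_accessible) return copy_path::device_copy;
    // Otherwise an explicit host/device transfer is needed.
    return copy_path::explicit_transfer;
}

// Illustrative property sets (assumed values).
inline constexpr memory_space_properties host_props{};
inline constexpr memory_space_properties device_props{
    .host_accessible = false, .device_accessible = true, .pageable = false};
inline constexpr memory_space_properties managed_props{
    .device_accessible = true, .unified = true};

static_assert(choose_copy_path(host_props, host_props) == copy_path::host_copy);
static_assert(choose_copy_path(host_props, device_props) == copy_path::explicit_transfer);
static_assert(choose_copy_path(device_props, device_props) == copy_path::device_copy);
static_assert(choose_copy_path(managed_props, device_props) == copy_path::device_copy);

}  // namespace sketch
```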

TypedMemorySpace

Extends MemorySpace with typed allocation, construction, and destruction:

template <typename Space, typename T>
concept TypedMemorySpace = MemorySpace<Space> &&
    requires(Space& space, size_type count, T* ptr) {
    { space.template allocate_typed<T>(count) } -> std::same_as<T*>;
    { space.deallocate_typed(ptr, count) } -> std::same_as<void>;
    { space.template construct<T>(ptr) } -> std::same_as<void>;
    { space.destroy(ptr) } -> std::same_as<void>;
};

Memory Space Tags

struct host_memory_space_tag {};    // CPU memory
struct device_memory_space_tag {};  // GPU memory
struct unified_memory_space_tag {}; // Managed memory
struct pinned_memory_space_tag {};  // Page-locked host memory

Executor Concepts

Executors dispatch computation to processing resources.

Executor (Base)

template <typename T>
concept Executor = requires(T& exec, const T& cexec,
                            std::function<void()> f) {
    { exec.execute(f) } -> std::same_as<void>;
    { cexec.name() } -> std::convertible_to<const char*>;
};

Required Operations:

Operation     Description
---------     -----------
execute(f)    Execute callable f
name()        Returns executor name

ParallelExecutor

Extends Executor with parallel execution:

template <typename T>
concept ParallelExecutor = Executor<T> &&
    requires(T& exec, const T& cexec,
             size_type count, std::function<void(size_type)> f) {
    { exec.parallel_for(count, f) } -> std::same_as<void>;
    { cexec.max_parallelism() } -> std::same_as<size_type>;
    { cexec.suggested_parallelism() } -> std::same_as<size_type>;
};

Additional Operations:

Operation                  Description
---------                  -----------
parallel_for(count, f)     Execute f(i) for i in [0, count)
max_parallelism()          Maximum concurrent work items
suggested_parallelism()    Recommended parallelism level

BulkExecutor

Extends ParallelExecutor with chunked bulk execution:

template <typename T>
concept BulkExecutor = ParallelExecutor<T> &&
    requires(T& exec, size_type count,
             std::function<void(size_type, size_type)> f) {
    { exec.bulk_execute(count, f) } -> std::same_as<void>;
};
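The concept does not state what the two size_type arguments passed to f mean. The sketch below assumes they are the half-open index range [begin, end) of one chunk, which is a common convention but is not confirmed by this page. for_each_chunk and chunk_ranges are hypothetical helpers showing how an implementation might partition [0, count) into nearly equal contiguous chunks.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {

using size_type = std::size_t;  // stand-in (assumption)

// Splits [0, count) into at most max_chunks contiguous ranges and calls
// f(begin, end) once per range. Chunk sizes differ by at most one.
template <typename F>
void for_each_chunk(size_type count, size_type max_chunks, F&& f) {
    if (count == 0 || max_chunks == 0) return;
    const size_type chunks = std::min(count, max_chunks);
    const size_type base = count / chunks;
    const size_type extra = count % chunks;  // the first `extra` chunks get one more item
    size_type begin = 0;
    for (size_type c = 0; c < chunks; ++c) {
        const size_type end = begin + base + (c < extra ? 1 : 0);
        f(begin, end);
        begin = end;
    }
}

// Collects the ranges so the partitioning can be inspected.
inline std::vector<std::pair<size_type, size_type>>
chunk_ranges(size_type count, size_type max_chunks) {
    std::vector<std::pair<size_type, size_type>> ranges;
    for_each_chunk(count, max_chunks,
                   [&](size_type begin, size_type end) { ranges.emplace_back(begin, end); });
    return ranges;
}

}  // namespace sketch
```

With count = 10 and max_chunks = 3, this yields the ranges [0, 4), [4, 7), and [7, 10). Handing each worker a whole range rather than a single index lets the callable amortize per-call overhead across the chunk.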

Executor Properties

struct executor_properties {
    size_type max_concurrency = 1;    // Max concurrent work items
    bool in_order = true;              // Ordered execution
    bool owns_threads = false;         // Manages thread pool
    bool supports_work_stealing = false;
};

Executor Tags

struct inline_executor_tag {};
struct thread_pool_executor_tag {};
struct single_thread_executor_tag {};
struct gpu_executor_tag {};

Standard Executors

DTL provides these built-in executors:

inline_executor

Executes work immediately in the calling thread:

class inline_executor {
public:
    template <typename F>
    void execute(F&& f) { std::forward<F>(f)(); }

    static constexpr const char* name() noexcept { return "inline"; }
    static constexpr bool is_synchronous() noexcept { return true; }
};

sequential_executor

Runs iterations sequentially on the calling thread while exposing the parallel_for interface:

class sequential_executor {
public:
    template <typename F>
    void parallel_for(size_type count, F&& f) {
        for (size_type i = 0; i < count; ++i) { f(i); }
    }

    static constexpr size_type max_parallelism() noexcept { return 1; }
};

Concept Verification

Implementations use static_assert to verify concept satisfaction:

// In mpi_comm_adapter.hpp
static_assert(Communicator<mpi_comm_adapter>,
              "mpi_comm_adapter must satisfy Communicator concept");
static_assert(CollectiveCommunicator<mpi_comm_adapter>,
              "mpi_comm_adapter must satisfy CollectiveCommunicator concept");
static_assert(ReducingCommunicator<mpi_comm_adapter>,
              "mpi_comm_adapter must satisfy ReducingCommunicator concept");

// In cuda_memory_space.hpp
static_assert(MemorySpace<cuda_memory_space>,
              "cuda_memory_space must satisfy MemorySpace concept");

// In cpu_executor.hpp
static_assert(Executor<cpu_executor>,
              "cpu_executor must satisfy Executor concept");
static_assert(ParallelExecutor<cpu_executor>,
              "cpu_executor must satisfy ParallelExecutor concept");

Concept Header Locations

Concept           Header
-------           ------
Communicator      include/dtl/backend/concepts/communicator.hpp
MemorySpace       include/dtl/backend/concepts/memory_space.hpp
Executor          include/dtl/backend/concepts/executor.hpp
MemoryTransfer    include/dtl/backend/concepts/memory_transfer.hpp
Serializer        include/dtl/backend/concepts/serializer.hpp

See Also