Backend Concepts
DTL uses C++20 concepts to define backend requirements, so the compiler verifies that each backend implementation satisfies the required interface.
Overview
DTL’s backend architecture has three primary concept hierarchies:
| Concept Family | Purpose | Key Implementations |
|---|---|---|
| Communicator | Point-to-point and collective communication | |
| MemorySpace | Memory allocation and properties | |
| Executor | Computation dispatch | |
Communicator Concepts
Communicators provide distributed communication capabilities across ranks.
Communicator (Base)
The core concept for point-to-point communication:
```cpp
template <typename T>
concept Communicator = requires(T& comm, const T& ccomm,
                                void* buf, const void* cbuf,
                                size_type count, rank_t rank,
                                int tag, request_handle& req) {
    typename T::size_type;

    // Query operations
    { ccomm.rank() } -> std::same_as<rank_t>;
    { ccomm.size() } -> std::same_as<rank_t>;

    // Blocking point-to-point
    { comm.send(cbuf, count, rank, tag) } -> std::same_as<void>;
    { comm.recv(buf, count, rank, tag) } -> std::same_as<void>;

    // Non-blocking point-to-point
    { comm.isend(cbuf, count, rank, tag) } -> std::same_as<request_handle>;
    { comm.irecv(buf, count, rank, tag) } -> std::same_as<request_handle>;

    // Request completion
    { comm.wait(req) } -> std::same_as<void>;
    { comm.test(req) } -> std::same_as<bool>;
};
```
Required Types:
size_type: Type for counts and sizes
Required Operations:
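| Operation | Description |
|---|---|
| rank() | Returns this process’s rank (0 to size() - 1) |
| size() | Returns the total number of ranks |
| send(cbuf, count, rank, tag) | Blocking send to the destination rank |
| recv(buf, count, rank, tag) | Blocking receive from the source rank |
| isend(cbuf, count, rank, tag) | Non-blocking send; returns a request handle |
| irecv(buf, count, rank, tag) | Non-blocking receive; returns a request handle |
| wait(req) | Blocks until the non-blocking operation completes |
| test(req) | Returns whether the non-blocking operation has completed |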
CollectiveCommunicator
Extends Communicator with collective operations:
```cpp
template <typename T>
concept CollectiveCommunicator = Communicator<T> &&
    requires(T& comm, void* buf, const void* cbuf,
             size_type count, rank_t root) {
        { comm.barrier() } -> std::same_as<void>;
        { comm.broadcast(buf, count, root) } -> std::same_as<void>;
        { comm.scatter(cbuf, buf, count, root) } -> std::same_as<void>;
        { comm.gather(cbuf, buf, count, root) } -> std::same_as<void>;
        { comm.allgather(cbuf, buf, count) } -> std::same_as<void>;
        { comm.alltoall(cbuf, buf, count) } -> std::same_as<void>;
    };
```
Additional Operations:
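| Operation | Description |
|---|---|
| barrier() | Synchronize all ranks |
| broadcast(buf, count, root) | Root sends the same data to all ranks |
| scatter(cbuf, buf, count, root) | Root sends a distinct chunk of its buffer to each rank |
| gather(cbuf, buf, count, root) | Each rank sends its data to root, which collects all chunks |
| allgather(cbuf, buf, count) | Like gather, but every rank receives all chunks |
| alltoall(cbuf, buf, count) | Each rank sends a distinct chunk to every rank (all-to-all exchange) |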
ReducingCommunicator
Extends CollectiveCommunicator with reduction operations:
```cpp
template <typename T>
concept ReducingCommunicator = CollectiveCommunicator<T> &&
    requires(T& comm, const void* sendbuf, void* recvbuf,
             size_type count, rank_t root) {
        { comm.reduce_sum(sendbuf, recvbuf, count, root) } -> std::same_as<void>;
        { comm.allreduce_sum(sendbuf, recvbuf, count) } -> std::same_as<void>;
    };
```
Additional Operations:
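| Operation | Description |
|---|---|
| reduce_sum(sendbuf, recvbuf, count, root) | Sum reduction; the result is delivered to the root rank only |
| allreduce_sum(sendbuf, recvbuf, count) | Sum reduction; the result is delivered to all ranks |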
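The two reductions compute the same element-wise sum and differ only in which ranks receive it. The sketch below simulates this in a single process, with each rank's send buffer stored as one row of a vector. It performs no communication. It also assumes double elements, because the signatures above take untyped buffers and do not say how the element type is conveyed.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Buffer = std::vector<double>;  // one rank's buffer (element type assumed)

// Element-wise sum over all ranks' send buffers (all the same length).
Buffer elementwise_sum(const std::vector<Buffer>& send_per_rank) {
    Buffer total(send_per_rank.front().size(), 0.0);
    for (const Buffer& contribution : send_per_rank)
        for (std::size_t i = 0; i < total.size(); ++i)
            total[i] += contribution[i];
    return total;
}

// reduce_sum semantics: only `root` receives the result; other ranks'
// receive buffers are left untouched (modelled as std::nullopt).
std::vector<std::optional<Buffer>> simulate_reduce_sum(
        const std::vector<Buffer>& send_per_rank, std::size_t root) {
    std::vector<std::optional<Buffer>> recv(send_per_rank.size());
    recv[root] = elementwise_sum(send_per_rank);
    return recv;
}

// allreduce_sum semantics: every rank receives the same result.
std::vector<Buffer> simulate_allreduce_sum(const std::vector<Buffer>& send_per_rank) {
    return std::vector<Buffer>(send_per_rank.size(), elementwise_sum(send_per_rank));
}
```

With three ranks sending {1, 2}, {10, 20} and {100, 200}, reduce_sum with root 1 leaves {111, 222} on rank 1 only, while allreduce_sum leaves {111, 222} on every rank.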
MemorySpace Concepts
Memory spaces abstract memory allocation across different address spaces.
MemorySpace (Base)
```cpp
template <typename T>
concept MemorySpace = requires(T& space, const T& cspace,
                               size_type size, size_type alignment,
                               void* ptr) {
    typename T::pointer;
    typename T::size_type;

    // Allocation
    { space.allocate(size) } -> std::same_as<void*>;
    { space.allocate(size, alignment) } -> std::same_as<void*>;
    { space.deallocate(ptr, size) } -> std::same_as<void>;

    // Properties
    { cspace.properties() } -> std::same_as<memory_space_properties>;
    { cspace.name() } -> std::convertible_to<const char*>;
};
```
Required Types:
pointer: Raw pointer type (typically void*)
size_type: Size type for counts
Required Operations:
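| Operation | Description |
|---|---|
| allocate(size) | Allocate size bytes |
| allocate(size, alignment) | Allocate with an alignment requirement |
| deallocate(ptr, size) | Free an allocation |
| properties() | Returns the space's memory_space_properties |
| name() | Returns a human-readable name |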
Memory Space Properties
```cpp
struct memory_space_properties {
    bool host_accessible = true;     // CPU can access
    bool device_accessible = false;  // GPU can access
    bool unified = false;            // Coherent across host/device
    bool supports_atomics = true;    // Atomic operations work
    bool pageable = true;            // Pageable (vs pinned)
    size_type alignment = alignof(std::max_align_t);
};
```
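Generic code can read these flags to decide where a buffer may be dereferenced. The sketch below is illustrative only. The three presets are assumed values for typical host, device and unified spaces rather than values taken from DTL, and can_access is a hypothetical helper. The presets use C++20 designated initializers, so unnamed fields keep their defaults.

```cpp
#include <cassert>
#include <cstddef>

using size_type = std::size_t;  // stand-in type (assumption)

struct memory_space_properties {  // as defined above
    bool host_accessible = true;
    bool device_accessible = false;
    bool unified = false;
    bool supports_atomics = true;
    bool pageable = true;
    size_type alignment = alignof(std::max_align_t);
};

// Illustrative presets (assumed values, not DTL's).
constexpr memory_space_properties host_props{};
constexpr memory_space_properties device_props{
    .host_accessible = false, .device_accessible = true,
    .pageable = false, .alignment = 256};
constexpr memory_space_properties unified_props{
    .device_accessible = true, .unified = true};

enum class processor { host, device };

// Hypothetical helper: may code running on `who` dereference this memory?
constexpr bool can_access(const memory_space_properties& p, processor who) {
    return who == processor::host ? p.host_accessible : p.device_accessible;
}

static_assert(can_access(unified_props, processor::host) &&
              can_access(unified_props, processor::device));
```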
TypedMemorySpace
Adds typed allocation together with object construction and destruction:
```cpp
template <typename Space, typename T>
concept TypedMemorySpace = MemorySpace<Space> &&
    requires(Space& space, size_type count, T* ptr) {
        { space.template allocate_typed<T>(count) } -> std::same_as<T*>;
        { space.deallocate_typed(ptr, count) } -> std::same_as<void>;
        { space.template construct<T>(ptr) } -> std::same_as<void>;
        { space.destroy(ptr) } -> std::same_as<void>;
    };
```
Executor Concepts
Executors dispatch computation to processing resources.
Executor (Base)
```cpp
template <typename T>
concept Executor = requires(T& exec, const T& cexec,
                            std::function<void()> f) {
    { exec.execute(f) } -> std::same_as<void>;
    { cexec.name() } -> std::convertible_to<const char*>;
};
```
Required Operations:
| Operation | Description |
|---|---|
| execute(f) | Execute the callable f |
| name() | Returns the executor name |
ParallelExecutor
Adds parallel execution capabilities:
```cpp
template <typename T>
concept ParallelExecutor = Executor<T> &&
    requires(T& exec, const T& cexec,
             size_type count, std::function<void(size_type)> f) {
        { exec.parallel_for(count, f) } -> std::same_as<void>;
        { cexec.max_parallelism() } -> std::same_as<size_type>;
        { cexec.suggested_parallelism() } -> std::same_as<size_type>;
    };
```
Additional Operations:
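| Operation | Description |
|---|---|
| parallel_for(count, f) | Execute f(i) for every i in [0, count) |
| max_parallelism() | Maximum number of concurrent work items |
| suggested_parallelism() | Recommended parallelism level |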
BulkExecutor
Optimized for bulk operations with chunking:
```cpp
template <typename T>
concept BulkExecutor = ParallelExecutor<T> &&
    requires(T& exec, size_type count,
             std::function<void(size_type, size_type)> f) {
        { exec.bulk_execute(count, f) } -> std::same_as<void>;
    };
```
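The concept fixes only the signature of the bulk callable, f(size_type, size_type). The sequential sketch below assumes the two arguments are the begin and end of a half-open chunk; the concept itself does not state this. chunked_executor is hypothetical and shows only the bulk_execute member, so it does not satisfy the full BulkExecutor concept.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using size_type = std::size_t;  // stand-in type (assumption)

// Hypothetical executor. ASSUMPTION: bulk_execute calls f(begin, end) once
// per half-open chunk [begin, end) of a fixed size chosen at construction.
class chunked_executor {
public:
    explicit chunked_executor(size_type chunk_size)
        : chunk_(chunk_size > 0 ? chunk_size : 1) {}

    void bulk_execute(size_type count,
                      const std::function<void(size_type, size_type)>& f) {
        for (size_type begin = 0; begin < count; begin += chunk_)
            f(begin, std::min(count, begin + chunk_));  // last chunk may be short
    }

private:
    size_type chunk_;
};
```

Compared with parallel_for, the callable here runs its own tight loop over each chunk and can keep chunk-local state such as a partial sum. The result is one indirect call per chunk instead of one per element.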
Executor Properties
```cpp
struct executor_properties {
    size_type max_concurrency = 1;  // Max concurrent work items
    bool in_order = true;           // Ordered execution
    bool owns_threads = false;      // Manages thread pool
    bool supports_work_stealing = false;
};
```
Standard Executors
DTL provides these built-in executors:
inline_executor
Executes work immediately in the calling thread:
```cpp
class inline_executor {
public:
    template <typename F>
    void execute(F&& f) { std::forward<F>(f)(); }

    static constexpr const char* name() noexcept { return "inline"; }
    static constexpr bool is_synchronous() noexcept { return true; }
};
```
sequential_executor
Runs parallel_for iterations one after another on the calling thread, exposing the parallel interface with a maximum parallelism of 1:
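```cpp
class sequential_executor {
public:
    // Runs every iteration in index order on the calling thread.
    template <typename F>
    void parallel_for(size_type count, F&& f) {
        for (size_type i = 0; i < count; ++i) { f(i); }
    }

    static constexpr size_type max_parallelism() noexcept { return 1; }

    // To satisfy ParallelExecutor, the class must also provide execute(),
    // name() and suggested_parallelism().
};
```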
Concept Verification
Implementations use static_assert to verify concept satisfaction:
```cpp
// In mpi_comm_adapter.hpp
static_assert(Communicator<mpi_comm_adapter>,
              "mpi_comm_adapter must satisfy Communicator concept");
static_assert(CollectiveCommunicator<mpi_comm_adapter>,
              "mpi_comm_adapter must satisfy CollectiveCommunicator concept");
static_assert(ReducingCommunicator<mpi_comm_adapter>,
              "mpi_comm_adapter must satisfy ReducingCommunicator concept");

// In cuda_memory_space.hpp
static_assert(MemorySpace<cuda_memory_space>,
              "cuda_memory_space must satisfy MemorySpace concept");

// In cpu_executor.hpp
static_assert(Executor<cpu_executor>,
              "cpu_executor must satisfy Executor concept");
static_assert(ParallelExecutor<cpu_executor>,
              "cpu_executor must satisfy ParallelExecutor concept");
```
Concept Header Locations
| Concept | Header |
|---|---|
| Communicator | |
| MemorySpace | |
| Executor | |
| MemoryTransfer | |
| Serializer | |
See Also
Backend Selection - When to use each backend
Implementing a Backend - Step-by-step guide