# Backend Concepts

DTL uses C++20 concepts to define backend requirements. This ensures compile-time verification that backend implementations satisfy required interfaces.

## Overview

DTL's backend architecture has three primary concept hierarchies:

| Concept Family | Purpose | Key Implementations |
|----------------|---------|---------------------|
| **Communicator** | Point-to-point and collective communication | `mpi_comm_adapter` |
| **MemorySpace** | Memory allocation and properties | `host_memory_space`, `cuda_memory_space` |
| **Executor** | Computation dispatch | `cpu_executor`, `inline_executor` |

## Communicator Concepts

Communicators provide distributed communication capabilities across ranks.

### Communicator (Base)

The core concept for point-to-point communication:

```cpp
template <typename T>
concept Communicator = requires(T& comm, const T& ccomm,
                                void* buf, const void* cbuf,
                                size_type count, rank_t rank, int tag,
                                request_handle& req) {
    typename T::size_type;

    // Query operations
    { ccomm.rank() } -> std::same_as<...>;
    { ccomm.size() } -> std::same_as<...>;

    // Blocking point-to-point
    { comm.send(cbuf, count, rank, tag) } -> std::same_as<...>;
    { comm.recv(buf, count, rank, tag) } -> std::same_as<...>;

    // Non-blocking point-to-point
    { comm.isend(cbuf, count, rank, tag) } -> std::same_as<request_handle>;
    { comm.irecv(buf, count, rank, tag) } -> std::same_as<request_handle>;

    // Request completion
    { comm.wait(req) } -> std::same_as<...>;
    { comm.test(req) } -> std::same_as<...>;
};
```

**Required Types:**

- `size_type`: Type for counts and sizes

**Required Operations:**

| Operation | Description |
|-----------|-------------|
| `rank()` | Returns this process's rank (0 to size-1) |
| `size()` | Returns total number of ranks |
| `send()` | Blocking send to destination rank |
| `recv()` | Blocking receive from source rank |
| `isend()` | Non-blocking send, returns request handle |
| `irecv()` | Non-blocking receive, returns request handle |
| `wait()` | Wait for non-blocking operation to complete |
| `test()` | Test if non-blocking operation completed |

### CollectiveCommunicator

Extends `Communicator` with collective operations:

```cpp
template <typename T>
concept CollectiveCommunicator = Communicator<T> &&
    requires(T& comm, void* buf, const void* cbuf,
             size_type count, rank_t root) {
    { comm.barrier() } -> std::same_as<...>;
    { comm.broadcast(buf, count, root) } -> std::same_as<...>;
    { comm.scatter(cbuf, buf, count, root) } -> std::same_as<...>;
    { comm.gather(cbuf, buf, count, root) } -> std::same_as<...>;
    { comm.allgather(cbuf, buf, count) } -> std::same_as<...>;
    { comm.alltoall(cbuf, buf, count) } -> std::same_as<...>;
};
```

**Additional Operations:**

| Operation | Description |
|-----------|-------------|
| `barrier()` | Synchronize all ranks |
| `broadcast()` | Root sends data to all ranks |
| `scatter()` | Root distributes data to all ranks |
| `gather()` | All ranks send data to root |
| `allgather()` | Gather data to all ranks |
| `alltoall()` | All-to-all exchange |

### ReducingCommunicator

Extends `CollectiveCommunicator` with reduction operations:

```cpp
template <typename T>
concept ReducingCommunicator = CollectiveCommunicator<T> &&
    requires(T& comm, const void* sendbuf, void* recvbuf,
             size_type count, rank_t root) {
    { comm.reduce_sum(sendbuf, recvbuf, count, root) } -> std::same_as<...>;
    { comm.allreduce_sum(sendbuf, recvbuf, count) } -> std::same_as<...>;
};
```

**Additional Operations:**

| Operation | Description |
|-----------|-------------|
| `reduce_sum()` | Sum reduction to root rank |
| `allreduce_sum()` | Sum reduction to all ranks |

### Communicator Tags

Tag types identify communicator implementations:

```cpp
struct mpi_communicator_tag {};
struct shared_memory_communicator_tag {};
struct gpu_communicator_tag {};
```

## MemorySpace Concepts

Memory spaces abstract memory allocation across different address spaces.

### MemorySpace (Base)

```cpp
template <typename T>
concept MemorySpace = requires(T& space, const T& cspace,
                               size_type size, size_type alignment,
                               void* ptr) {
    typename T::pointer;
    typename T::size_type;

    // Allocation
    { space.allocate(size) } -> std::same_as<...>;
    { space.allocate(size, alignment) } -> std::same_as<...>;
    { space.deallocate(ptr, size) } -> std::same_as<...>;

    // Properties
    { cspace.properties() } -> std::same_as<memory_space_properties>;
    { cspace.name() } -> std::convertible_to<...>;
};
```

**Required Types:**

- `pointer`: Raw pointer type (typically `void*`)
- `size_type`: Size type for counts

**Required Operations:**

| Operation | Description |
|-----------|-------------|
| `allocate(size)` | Allocate `size` bytes |
| `allocate(size, alignment)` | Allocate with alignment requirement |
| `deallocate(ptr, size)` | Free allocation |
| `properties()` | Returns `memory_space_properties` |
| `name()` | Returns human-readable name |

### Memory Space Properties

```cpp
struct memory_space_properties {
    bool host_accessible = true;     // CPU can access
    bool device_accessible = false;  // GPU can access
    bool unified = false;            // Coherent across host/device
    bool supports_atomics = true;    // Atomic operations work
    bool pageable = true;            // Pageable (vs pinned)
    size_type alignment = alignof(std::max_align_t);
};
```

### TypedMemorySpace

Adds typed allocation support:

```cpp
template <typename Space, typename T>
concept TypedMemorySpace = MemorySpace<Space> &&
    requires(Space& space, size_type count, T* ptr) {
    { space.template allocate_typed<T>(count) } -> std::same_as<...>;
    { space.deallocate_typed(ptr, count) } -> std::same_as<...>;
    { space.template construct<T>(ptr) } -> std::same_as<...>;
    { space.destroy(ptr) } -> std::same_as<...>;
};
```

### Memory Space Tags

```cpp
struct host_memory_space_tag {};     // CPU memory
struct device_memory_space_tag {};   // GPU memory
struct unified_memory_space_tag {};  // Managed memory
struct pinned_memory_space_tag {};   // Page-locked host memory
```

## Executor Concepts

Executors dispatch computation to processing resources.

### Executor (Base)

```cpp
template <typename T>
concept Executor = requires(T& exec, const T& cexec, std::function<...> f) {
    { exec.execute(f) } -> std::same_as<...>;
    { cexec.name() } -> std::convertible_to<...>;
};
```

**Required Operations:**

| Operation | Description |
|-----------|-------------|
| `execute(f)` | Execute callable `f` |
| `name()` | Returns executor name |

### ParallelExecutor

Adds parallel execution capabilities:

```cpp
template <typename T>
concept ParallelExecutor = Executor<T> &&
    requires(T& exec, const T& cexec, size_type count,
             std::function<...> f) {
    { exec.parallel_for(count, f) } -> std::same_as<...>;
    { cexec.max_parallelism() } -> std::same_as<...>;
    { cexec.suggested_parallelism() } -> std::same_as<...>;
};
```

**Additional Operations:**

| Operation | Description |
|-----------|-------------|
| `parallel_for(count, f)` | Execute `f(i)` for i in [0, count) |
| `max_parallelism()` | Maximum concurrent work items |
| `suggested_parallelism()` | Recommended parallelism level |

### BulkExecutor

Optimized for bulk operations with chunking:

```cpp
template <typename T>
concept BulkExecutor = ParallelExecutor<T> &&
    requires(T& exec, size_type count, std::function<...> f) {
    { exec.bulk_execute(count, f) } -> std::same_as<...>;
};
```

### Executor Properties

```cpp
struct executor_properties {
    size_type max_concurrency = 1;        // Max concurrent work items
    bool in_order = true;                 // Ordered execution
    bool owns_threads = false;            // Manages thread pool
    bool supports_work_stealing = false;
};
```

### Executor Tags

```cpp
struct inline_executor_tag {};
struct thread_pool_executor_tag {};
struct single_thread_executor_tag {};
struct gpu_executor_tag {};
```

## Standard Executors

DTL provides these built-in executors:

### inline_executor

Executes work immediately in the calling thread:

```cpp
class inline_executor {
public:
    template <typename F>
    void execute(F&& f) {
        std::forward<F>(f)();
    }

    static constexpr const char* name() noexcept { return "inline"; }
    static constexpr bool is_synchronous() noexcept { return true; }
};
```

### sequential_executor

Sequential execution with parallel interface:

```cpp
class sequential_executor {
public:
    template <typename F>
    void parallel_for(size_type count, F&& f) {
        for (size_type i = 0; i < count; ++i) {
            f(i);
        }
    }

    static constexpr size_type max_parallelism() noexcept { return 1; }
};
```

## Concept Verification

Implementations use `static_assert` to verify concept satisfaction:

```cpp
// In mpi_comm_adapter.hpp
static_assert(Communicator<mpi_comm_adapter>,
              "mpi_comm_adapter must satisfy Communicator concept");
static_assert(CollectiveCommunicator<mpi_comm_adapter>,
              "mpi_comm_adapter must satisfy CollectiveCommunicator concept");
static_assert(ReducingCommunicator<mpi_comm_adapter>,
              "mpi_comm_adapter must satisfy ReducingCommunicator concept");

// In cuda_memory_space.hpp
static_assert(MemorySpace<cuda_memory_space>,
              "cuda_memory_space must satisfy MemorySpace concept");

// In cpu_executor.hpp
static_assert(Executor<cpu_executor>,
              "cpu_executor must satisfy Executor concept");
static_assert(ParallelExecutor<cpu_executor>,
              "cpu_executor must satisfy ParallelExecutor concept");
```

## Concept Header Locations

| Concept | Header |
|---------|--------|
| Communicator | `include/dtl/backend/concepts/communicator.hpp` |
| MemorySpace | `include/dtl/backend/concepts/memory_space.hpp` |
| Executor | `include/dtl/backend/concepts/executor.hpp` |
| MemoryTransfer | `include/dtl/backend/concepts/memory_transfer.hpp` |
| Serializer | `include/dtl/backend/concepts/serializer.hpp` |

## See Also

- [Backend Selection](backend_selection.md) - When to use each backend
- [Implementing a Backend](implementing_backend.md) - Step-by-step guide