# Legacy Deep-Dive: Policies

> This page is retained as a **detailed reference**.
> The canonical user path is now the chaptered handbook.

**Primary chapter**: [06-policies-and-execution-control.md](06-policies-and-execution-control.md)

**Runtime and handles**: [Runtime and Handle Model](13-runtime-and-handle-model.md)

---

## Detailed Reference (Legacy)

DTL uses a policy-based design that separates concerns into orthogonal configuration axes. This allows flexible, compile-time configuration of distributed behavior.

---

## Table of Contents

- [Overview](#overview)
- [The Five Policy Axes](#the-five-policy-axes)
- [Partition Policies](#partition-policies)
- [Placement Policies](#placement-policies)
- [Execution Policies](#execution-policies)
- [Consistency Policies](#consistency-policies)
- [Error Policies](#error-policies)
- [Policy Composition](#policy-composition)
- [Policy Precedence](#policy-precedence)

---

## Overview

Distributed programming entangles multiple concerns:

- How data is partitioned across ranks
- Where data resides (host vs device memory)
- How operations execute (sync vs async)
- When writes become visible
- How errors are handled

DTL separates these into **five orthogonal policy axes**, allowing you to configure each independently.

### Why Policies?

```cpp
// Without policies: hardcoded behavior
distributed_vector<double> vec(1000);
// What partition? What memory? What error handling?

// With policies: explicit, configurable behavior
distributed_vector<double, block_partition<>, host_only> vec(1000);

// Or use policy_set to compose policies into a single type
```

---

## The Five Policy Axes

| Axis | Question | Default |
|------|----------|---------|
| **Partition** | How is data divided across ranks? | `block_partition` |
| **Placement** | Where does data live (host/device)? | `host_only` |
| **Execution** | How do operations execute? | `seq` (synchronous) |
| **Consistency** | When are writes visible? | `bulk_synchronous` |
| **Error** | How are errors reported? | `expected` (result-based) |

---

## Partition Policies

Partition policies determine how global indices map to ranks.

### block_partition (Default)

Divides data into contiguous chunks:

```cpp
// 1000 elements across 4 ranks:
// Rank 0: indices [0, 250)
// Rank 1: indices [250, 500)
// Rank 2: indices [500, 750)
// Rank 3: indices [750, 1000)
dtl::distributed_vector<double, dtl::block_partition<>> vec(1000, size, rank);

// Block partition is the default
dtl::distributed_vector<double> vec_default(1000, size, rank);  // Same as above
```

Properties:

- Contiguous local storage
- Good cache locality
- Simple ownership queries
- Best for sequential access patterns

### cyclic_partition

Round-robin element distribution (planned):

```cpp
// 1000 elements across 4 ranks:
// Rank 0: indices 0, 4, 8, 12, ...
// Rank 1: indices 1, 5, 9, 13, ...
// Rank 2: indices 2, 6, 10, 14, ...
// Rank 3: indices 3, 7, 11, 15, ...
dtl::distributed_vector<double, dtl::cyclic_partition<>> vec(1000, size, rank);
```

Properties:

- Better load balancing for irregular access
- Non-contiguous local storage
- Higher overhead for sequential access

### block_cyclic_partition

Combines block and cyclic (planned):

```cpp
// Block size 64, cyclic distribution:
// Rank 0: indices [0,64), [256,320), ...
// Rank 1: indices [64,128), [320,384), ...
// etc.
dtl::distributed_vector<double, dtl::block_cyclic_partition<64>> vec(1000, size, rank);
```

Properties:

- Balance between locality and load balancing
- Standard in scientific computing (ScaLAPACK)

### hash_partition

Hash-based distribution (for associative containers):

```cpp
// Elements distributed by hash of key
// (Key and Value stand in for the map's key and mapped types)
dtl::distributed_unordered_map<Key, Value, dtl::hash_partition<>> map(size, rank);

// Custom hash function (CustomHash is a placeholder name)
dtl::distributed_unordered_map<Key, Value, dtl::hash_partition<CustomHash>> map(size, rank);
```

### replicated

Full copy on each rank:

```cpp
// Every rank has complete copy
dtl::distributed_vector<double, dtl::replicated> lookup_table(1000, size, rank);
```

Properties:

- No communication for reads
- Writes require synchronization
- Total memory use scales with rank count (each rank stores every element)

---

## Placement Policies

Placement policies determine where data resides physically.

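
One way to picture a placement policy is as a compile-time tag that decides which operations a container exposes. The sketch below is not DTL code: `host_only_tag`, `device_only_tag`, `is_host_accessible`, and `local_buffer` are hypothetical names, and the buffer keeps its elements in ordinary host memory for every placement so that the example runs without a GPU. It only illustrates how host-side indexing could be compiled out for a device-resident placement.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for placement policies; these are NOT DTL's types.
struct host_only_tag {};

template <int DeviceId>
struct device_only_tag {
    static constexpr int device_id = DeviceId;
};

// Trait: may host code dereference memory allocated under this placement?
template <typename Placement>
struct is_host_accessible : std::false_type {};

template <>
struct is_host_accessible<host_only_tag> : std::true_type {};

// Toy local buffer. Host-side indexing is available only when the
// placement tag is host accessible; otherwise operator[] does not exist.
template <typename T, typename Placement = host_only_tag>
class local_buffer {
public:
    explicit local_buffer(std::size_t n) : data_(n) {}

    std::size_t size() const { return data_.size(); }

    template <typename P = Placement,
              typename = std::enable_if_t<is_host_accessible<P>::value>>
    T& operator[](std::size_t i) {
        return data_[i];
    }

private:
    // Assumption for illustration: host storage for every placement.
    // A real device placement would allocate through the GPU runtime.
    std::vector<T> data_;
};
```

With these definitions, `local_buffer<double> b(4); b[0] = 1.5;` compiles, while indexing a `local_buffer<double, device_only_tag<0>>` from host code is rejected at compile time.
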
### host_only (Default)

Data resides in host (CPU) memory:

```cpp
dtl::distributed_vector<double, dtl::block_partition<>, dtl::host_only> vec(1000, size, rank);

// host_only is the default
dtl::distributed_vector<double> vec_default(1000, size, rank);  // Same as above
```

Properties:

- Universal compatibility
- No GPU required
- Standard allocators

### device_only

Data resides in device (GPU) memory:

```cpp
// Requires DTL_ENABLE_CUDA or DTL_ENABLE_HIP
dtl::distributed_vector<double, dtl::block_partition<>, dtl::device_only<0>> vec(1000, size, rank);

// Access requires GPU kernels or explicit transfer
auto local = vec.local_view();  // Returns device pointer
```

Properties:

- Data stays on GPU
- Host access requires transfer
- Best for GPU-only workflows

### device_preferred

Prefers device memory with automatic fallback:

```cpp
dtl::distributed_vector<double, dtl::block_partition<>, dtl::device_preferred> vec(1000, size, rank);
// Uses GPU memory if available, host memory otherwise
```

### unified_memory

CUDA Unified Memory (managed memory):

```cpp
dtl::distributed_vector<double, dtl::block_partition<>, dtl::unified_memory> vec(1000, size, rank);
// Accessible from both host and device
// Automatic page migration
```

Properties:

- Convenience for mixed host/device access
- Performance implications from page faults
- Requires CUDA unified memory support

---

## Execution Policies

Execution policies control how operations are performed.

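
As a rough illustration of how an execution-policy argument can select behavior, the sketch below uses tag dispatch over a plain `std::vector`. Everything in the `sketch` namespace is hypothetical and stands in for the library's own machinery, which this page does not show; the non-blocking overload simply wraps `std::async` from the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <vector>

namespace sketch {

// Hypothetical policy tags; not DTL's actual types.
struct seq_t {};
struct async_t {};
constexpr seq_t seq{};
constexpr async_t async{};

// Blocking overload: runs on the calling thread and returns when done.
template <typename T, typename F>
void for_each(seq_t, std::vector<T>& v, F f) {
    std::for_each(v.begin(), v.end(), f);
}

// Non-blocking overload: starts the work on another thread and hands
// back a future. The caller must keep `v` alive until the future is ready.
template <typename T, typename F>
std::future<void> for_each(async_t, std::vector<T>& v, F f) {
    return std::async(std::launch::async, [&v, f]() {
        std::for_each(v.begin(), v.end(), f);
    });
}

// No policy argument: fall back to the sequential overload.
template <typename T, typename F>
void for_each(std::vector<T>& v, F f) {
    sketch::for_each(seq, v, f);
}

}  // namespace sketch
```

Calling `sketch::for_each(sketch::async, v, f)` returns a `std::future<void>`; the caller should call `wait()` or `get()` on it before reading the results.
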
### seq (Default)

Synchronous, blocking execution:

```cpp
// Operation completes before returning
dtl::for_each(dtl::seq, vec, [](double& x) { x *= 2; });

// seq is the default
dtl::for_each(vec, [](double& x) { x *= 2; });  // Same as above
```

Properties:

- Simple to reason about
- Deterministic completion
- No concurrent execution

### par

Parallel execution (blocking):

```cpp
// Uses multiple threads, but still blocks until complete
dtl::for_each(dtl::par, vec, [](double& x) { x *= 2; });
```

Properties:

- Utilizes multiple CPU cores
- Still blocks caller
- Thread-safe functor required

### par_unseq

Parallel and vectorized (blocking):

```cpp
// Enables SIMD and multi-threading
dtl::for_each(dtl::par_unseq, vec, [](double& x) { x *= 2; });
```

Properties:

- Maximum CPU parallelism
- Functor must be vectorization-safe
- No synchronization in functor

### async

Non-blocking execution:

```cpp
// Returns immediately with a future
auto future = dtl::for_each(dtl::async, vec, [](double& x) { x *= 2; });

// Do other work...

// Wait for completion
future.wait();
```

Properties:

- Enables overlap of computation and communication
- Returns future/event handle
- Requires explicit synchronization

### Usage with Algorithms

```cpp
// Transform with parallel execution
dtl::transform(dtl::par, vec, output, [](double x) { return x * x; });

// Reduce with async execution
auto future = dtl::reduce(dtl::async, vec, 0.0, std::plus<>{});
// ... do other work ...
double result = future.get();
```

---

## Consistency Policies

Consistency policies define when writes become visible to other ranks.

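
The visibility rule of the default bulk-synchronous model can be mimicked in a single process. The toy class below is an illustration built on assumptions, not DTL's implementation: `bsp_array`, `local_write`, `remote_read`, and `barrier` are hypothetical names, and "other ranks" are simulated by a second buffer that is refreshed only at the barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy single-process model of bulk-synchronous visibility (hypothetical).
class bsp_array {
public:
    explicit bsp_array(std::size_t n) : pending_(n), published_(n) {}

    // A write by the owning rank: staged, not yet visible to others.
    void local_write(std::size_t i, double value) { pending_[i] = value; }

    // What the owning rank itself sees: its own latest write.
    double local_read(std::size_t i) const { return pending_[i]; }

    // What another rank would see: the state as of the last barrier.
    double remote_read(std::size_t i) const { return published_[i]; }

    // End of superstep: every staged write becomes visible.
    void barrier() { published_ = pending_; }

private:
    std::vector<double> pending_;    // owner's working copy
    std::vector<double> published_;  // snapshot visible to other ranks
};
```

In this model a value written with `local_write` is returned by `local_read` immediately, but `remote_read` keeps returning the old value until `barrier()` is called.
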
### bulk_synchronous (Default)

BSP model with explicit barriers:

```cpp
// Writes not visible until barrier
dtl::distributed_vector<double> vec(1000, size, rank);
auto local = vec.local_view();
local[0] = 42.0;  // Local write

// Writes become visible after barrier
vec.barrier();
```

Properties:

- Clear synchronization points
- Simple reasoning about visibility
- Standard HPC model

### sequential_consistent

Strongest consistency (planned):

```cpp
dtl::distributed_vector<double, dtl::sequential_consistent> vec(1000, size, rank);
// All operations appear in a single global order
// Higher synchronization overhead
```

### release_acquire

C++ memory model consistency (planned):

```cpp
// Writes in release-ordered operations visible to acquire-ordered readers
```

### relaxed

Minimal ordering (planned):

```cpp
// Only atomicity guaranteed, no ordering
// Maximum performance, complex reasoning
```

---

## Error Policies

Error policies determine how errors are reported.

### expected (Default)

Result-based error handling:

```cpp
dtl::distributed_vector<double> vec(1000, size, rank);
auto global = vec.global_view();

auto result = global[500].get();
if (result.has_value()) {
    double val = result.value();
} else {
    auto error = result.error();
    // Handle error
}
```

Properties:

- No exceptions
- Explicit error checking
- Compile-time enforced handling

### throwing

Exception-based error handling:

```cpp
dtl::distributed_vector<double, dtl::throwing> vec(1000, size, rank);

try {
    auto global = vec.global_view();
    double val = global[500].get();  // Throws on error
} catch (const dtl::communication_error& e) {
    // Handle error
}
```

Properties:

- Familiar exception patterns
- Automatic propagation
- Cannot be ignored

---

## Policy Composition

### Using policy_set

Combine policies into a single set:

```cpp
using my_policies = dtl::policy_set<
    dtl::block_partition<>,
    dtl::host_only,
    dtl::par,
    dtl::bulk_synchronous,
    dtl::expected
>;

dtl::distributed_vector<double, my_policies> vec(1000, size, rank);
```

### Partial Specification

Unspecified axes use defaults:

```cpp
// Only specify partition, others use defaults
dtl::distributed_vector<double, dtl::block_partition<>> vec(1000, size, rank);

// Equivalent to:
// dtl::distributed_vector<double, block_partition<>, host_only, seq, bulk_synchronous, expected>
```

### Call-Site Override

Override policies per-operation:

```cpp
dtl::distributed_vector<double> vec(1000, size, rank);  // Default policies

// Override execution policy for this call
dtl::for_each(dtl::par, vec, [](double& x) { x *= 2; });

// Override multiple policies
dtl::for_each(
    dtl::policy_set</* policies to override */>{},
    vec,
    [](double& x) { x *= 2; }
);
```

---

## Policy Precedence

When multiple policy sources exist, precedence is:

1. **Call-site policy_set** (highest priority)
2. **Container-level defaults**
3. **Context default policy_set**
4. **Library defaults** (lowest priority)

```cpp
// Context with default parallel execution
auto ctx = dtl::context(dtl::policy_set<dtl::par>{});

// Container uses context default (par)
dtl::distributed_vector<double> vec(ctx, 1000, size, rank);

// Operation uses container default (par)
dtl::for_each(vec, func);  // Parallel execution

// Call-site override beats all
dtl::for_each(dtl::seq, vec, func);  // Sequential execution
```

### Conflict Detection

Conflicting policies at the same level cause errors:

```cpp
// COMPILE ERROR: two partition policies
dtl::distributed_vector<double,
    dtl::policy_set<dtl::block_partition<>, dtl::cyclic_partition<>>
> vec(1000, size, rank);
```

---

## Common Policy Combinations

### High-Performance Computing (Default)

This set matches the library defaults on every axis except execution, where it selects `par` instead of `seq`.

```cpp
using hpc_policies = dtl::policy_set<
    dtl::block_partition<>,
    dtl::host_only,
    dtl::par,
    dtl::bulk_synchronous,
    dtl::expected
>;
```

### GPU Accelerated

```cpp
using gpu_policies = dtl::policy_set<
    dtl::block_partition<>,
    dtl::device_only<0>,
    dtl::par,
    dtl::bulk_synchronous,
    dtl::expected
>;
```

### Development/Debugging

```cpp
using debug_policies = dtl::policy_set<
    dtl::block_partition<>,
    dtl::host_only,
    dtl::seq,              // Sequential for easier debugging
    dtl::bulk_synchronous,
    dtl::throwing          // Exceptions for stack traces
>;
```

### Maximum Throughput

```cpp
using throughput_policies = dtl::policy_set<
    dtl::block_partition<>,
    dtl::device_preferred,
    dtl::par_unseq,
    dtl::bulk_synchronous,
    dtl::expected
>;
```

---

## See Also

- [Containers Guide](containers.md) - Using containers with policies
- [Algorithms Guide](algorithms.md) - Algorithm execution policies
- [Error Handling Guide](error_handling.md) - Error policy details
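
As a closing worked example, the index mappings listed under the block, cyclic, and block-cyclic partition policies can be reproduced with a few lines of arithmetic. The helper names below (`block_range`, `cyclic_owner`, `block_cyclic_owner`) are hypothetical and are not part of DTL. The handling of sizes that do not divide evenly (the first `n % ranks` ranks receive one extra element) is an assumption, since this page only shows the evenly divisible case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Half-open range [first, second) of global indices owned by `rank`
// under block partitioning. Assumption: any remainder is spread over
// the lowest-numbered ranks, one extra element each.
inline std::pair<std::size_t, std::size_t>
block_range(std::size_t n, std::size_t ranks, std::size_t rank) {
    const std::size_t base = n / ranks;
    const std::size_t rem = n % ranks;
    const std::size_t begin = rank * base + std::min(rank, rem);
    const std::size_t len = base + (rank < rem ? 1 : 0);
    return {begin, begin + len};
}

// Round-robin: element i lives on rank i mod ranks.
inline std::size_t cyclic_owner(std::size_t index, std::size_t ranks) {
    return index % ranks;
}

// Block-cyclic: whole blocks of `block` elements are dealt round-robin.
inline std::size_t block_cyclic_owner(std::size_t index,
                                      std::size_t block,
                                      std::size_t ranks) {
    return (index / block) % ranks;
}
```

For 1000 elements on 4 ranks, `block_range(1000, 4, 1)` yields `[250, 500)`, matching the block partition listing; `cyclic_owner(13, 4)` is rank 1; and with a block size of 64, `block_cyclic_owner(320, 64, 4)` is rank 1, matching the `[320,384)` block shown for rank 1.
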