# STL to DTL Migration Guide
This guide helps you migrate code from STL containers and algorithms to DTL equivalents for distributed computing.
## Table of Contents

- [Philosophy](#philosophy)
- [Container Mapping](#container-mapping)
- [View Mapping](#view-mapping)
- [Algorithm Mapping](#algorithm-mapping)
- [Common Patterns](#common-patterns)
- [Pitfalls to Avoid](#pitfalls-to-avoid)
- [Migration Checklist](#migration-checklist)
- [Quick Reference](#quick-reference)

## Philosophy
DTL is STL-inspired but not STL-identical. Key differences:
| Aspect | STL | DTL |
|---|---|---|
| Memory model | Single process | Distributed across ranks |
| Element access | Always local | Local |
| Iteration | Full container | Local partition (primary) |
| Communication | N/A | Explicit |
| Error handling | Exceptions | Configurable (result or exceptions) |
**Core principle:** DTL makes distribution explicit. Code that "just works" in STL may need restructuring in DTL.
## Container Mapping

### `std::vector` → `distributed_vector`

STL:

```cpp
std::vector<double> vec(1000);
vec[0] = 42.0;
double x = vec[500];
```

DTL:

```cpp
// Standalone mode (single rank)
dtl::distributed_vector<double> vec(1000, 1, 0);

// Multi-rank mode
dtl::distributed_vector<double> vec(1000, num_ranks, my_rank);

// Local access via local_view
auto local = vec.local_view();
local[0] = 42.0;  // First local element

// Global access via global_view (explicit)
auto global = vec.global_view();
double x = global[500].get();  // May communicate
```
### `std::array` / `std::span` → `local_view`

Local views provide STL-compatible access.

STL:

```cpp
std::span<double> span(data, size);
for (auto& x : span) {
    x *= 2;
}
```

DTL:

```cpp
auto local = vec.local_view();
for (auto& x : local) {
    x *= 2;
}
// Identical iteration pattern
```
### Multi-dimensional: `std::mdspan` → `distributed_tensor`

STL (C++23; note that `std::extents` takes the index type as its first template argument, and element access uses the multi-argument `operator[]`):

```cpp
std::mdspan<double, std::extents<size_t, 100, 100>> mat(data);
mat[50, 50] = 42.0;
```

DTL:

```cpp
dtl::distributed_tensor<double, 2> mat({100, 100}, num_ranks, my_rank);

// Local mdspan access
auto local = mat.local_mdspan();
local(i, j) = 42.0;  // Local indices

// Global access
auto global = mat.global_view();
global({50, 50}).put(42.0);  // Global indices, may communicate
```
## View Mapping

### Direct Reference vs `remote_ref`

STL (direct reference):

```cpp
double& ref = vec[i];
ref = 42.0;      // Direct write
double x = ref;  // Direct read
```

DTL (local view, same as STL):

```cpp
auto local = vec.local_view();
double& ref = local[i];  // Direct reference to local element
ref = 42.0;
double x = ref;
```

DTL (global view, explicit remote):

```cpp
auto global = vec.global_view();
auto ref = global[i];  // remote_ref<double>, NOT double&

// Must use explicit operations
ref.put(42.0);         // Write
double x = ref.get();  // Read

// These DO NOT compile:
// double& bad = global[i];  // Error: no implicit conversion
// *global[i] = 42.0;        // Error: no implicit dereference
```
### Range-Based For Loops

STL:

```cpp
for (double& x : vec) {
    x *= 2;
}
```

DTL (local):

```cpp
// Local view: identical pattern
for (double& x : vec.local_view()) {
    x *= 2;
}
```

DTL (global):

```cpp
// No direct global iteration -- use a segmented view
auto segv = vec.segmented_view();
for (auto& segment : segv.segments()) {
    for (double& x : segment.local_range()) {
        x *= 2;
    }
}
```
## Algorithm Mapping

### `std::for_each`

STL:

```cpp
std::for_each(vec.begin(), vec.end(), [](double& x) { x *= 2; });
```

DTL:

```cpp
// Operates on the local partition
dtl::for_each(vec, [](double& x) { x *= 2; });

// Or with an explicit local view
auto local = vec.local_view();
std::for_each(local.begin(), local.end(), [](double& x) { x *= 2; });
```
### `std::transform`

STL:

```cpp
std::transform(input.begin(), input.end(), output.begin(),
               [](double x) { return x * x; });
```

DTL:

```cpp
dtl::transform(input, output, [](double x) { return x * x; });
```
### `std::accumulate` / `std::reduce`

STL:

```cpp
double sum = std::accumulate(vec.begin(), vec.end(), 0.0);
```

DTL (local only):

```cpp
double local_sum = dtl::local_reduce(vec, 0.0, std::plus<>{});
```

DTL (global):

```cpp
double global_sum = dtl::distributed_reduce(vec, 0.0, std::plus<>{});
// All ranks participate; all ranks get the result
```
### `std::sort`

STL:

```cpp
std::sort(vec.begin(), vec.end());
```

DTL (local):

```cpp
dtl::local_sort(vec);  // Sorts the local partition only
```

DTL (global):

```cpp
dtl::distributed_sort(vec);  // Global sort, collective operation
```
### `std::find` / `std::find_if`

STL:

```cpp
auto it = std::find(vec.begin(), vec.end(), value);
```

DTL (local):

```cpp
auto local = vec.local_view();
auto it = std::find(local.begin(), local.end(), value);
```

DTL (global):

```cpp
// No built-in global find; implement it with a collective
auto local = vec.local_view();
auto it = std::find(local.begin(), local.end(), value);
bool found_local = (it != local.end());

// Collective to determine whether any rank found the value
bool found_global;
MPI_Allreduce(&found_local, &found_global, 1, MPI_C_BOOL, MPI_LOR, MPI_COMM_WORLD);
```
## Common Patterns

### Pattern 1: Initialize All Elements

STL:

```cpp
std::vector<double> vec(1000);
for (size_t i = 0; i < vec.size(); ++i) {
    vec[i] = static_cast<double>(i);
}
```

DTL:

```cpp
dtl::distributed_vector<double> vec(1000, num_ranks, my_rank);
auto local = vec.local_view();
for (size_t i = 0; i < local.size(); ++i) {
    // Compute the global index for this local element
    size_t global_idx = vec.global_offset() + i;
    local[i] = static_cast<double>(global_idx);
}
```
### Pattern 2: Compute Sum

STL:

```cpp
double sum = std::accumulate(vec.begin(), vec.end(), 0.0);
```

DTL:

```cpp
// Local sum only
double local_sum = dtl::local_reduce(vec, 0.0, std::plus<>{});

// Global sum
double global_sum = dtl::distributed_reduce(vec, 0.0, std::plus<>{});
```
### Pattern 3: Find Min/Max

STL:

```cpp
auto [min_it, max_it] = std::minmax_element(vec.begin(), vec.end());
double min_val = *min_it;
double max_val = *max_it;
```

DTL:

```cpp
// Local min/max
auto local = vec.local_view();
auto [local_min_it, local_max_it] = std::minmax_element(local.begin(), local.end());
double local_min = *local_min_it;
double local_max = *local_max_it;

// Global min/max
double global_min, global_max;
MPI_Allreduce(&local_min, &global_min, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);
MPI_Allreduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
```
### Pattern 4: Filter Elements

STL:

```cpp
std::vector<int> filtered;
std::copy_if(vec.begin(), vec.end(), std::back_inserter(filtered),
             [](int x) { return x > 0; });
```

DTL:

```cpp
// Filter locally
auto local = vec.local_view();
std::vector<int> local_filtered;
std::copy_if(local.begin(), local.end(), std::back_inserter(local_filtered),
             [](int x) { return x > 0; });

// Note: local_filtered is local to each rank.
// A globally filtered result requires explicit redistribution.
```
### Pattern 5: Dot Product

STL:

```cpp
double dot = std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
```

DTL:

```cpp
// Transform-reduce pattern
auto local_a = a.local_view();
auto local_b = b.local_view();
double local_dot = std::inner_product(local_a.begin(), local_a.end(),
                                      local_b.begin(), 0.0);
double global_dot;
MPI_Allreduce(&local_dot, &global_dot, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
```
## Pitfalls to Avoid

### Pitfall 1: Ignoring Distribution

Wrong:

```cpp
// Assumes all elements are local
for (size_t i = 0; i < vec.global_size(); ++i) {
    vec.global_view()[i].put(i);  // Massive communication!
}
```

Correct:

```cpp
auto local = vec.local_view();
for (size_t i = 0; i < local.size(); ++i) {
    local[i] = vec.global_offset() + i;  // Local only
}
```
### Pitfall 2: Forgetting Collective Participation

Wrong:

```cpp
if (rank == 0) {
    auto sum = dtl::distributed_reduce(vec, 0.0, std::plus<>{});  // DEADLOCK
}
```

Correct:

```cpp
auto sum = dtl::distributed_reduce(vec, 0.0, std::plus<>{});  // All ranks
if (rank == 0) {
    std::cout << "Sum: " << sum << "\n";
}
```
### Pitfall 3: Using Stale Views

Wrong:

```cpp
auto local = vec.local_view();
// ... operations ...
vec.resize(2000);
local[0] = 42.0;  // ERROR: view invalidated
```

Correct:

```cpp
auto local = vec.local_view();
// ... operations ...
vec.resize(2000);
local = vec.local_view();  // Refresh
local[0] = 42.0;           // OK
```
### Pitfall 4: Assuming Global Order

Wrong:

```cpp
dtl::local_sort(vec);  // Only sorts the local partition!
// Elements are NOT globally sorted
```

Correct:

```cpp
dtl::distributed_sort(vec);  // Global sort
// Now globally sorted
```
### Pitfall 5: Implicit Conversions

Won't compile:

```cpp
auto global = vec.global_view();
double x = global[500];  // Error: remote_ref<double> is not implicitly convertible
```

Correct:

```cpp
auto global = vec.global_view();
double x = global[500].get();  // Explicit
```
## Migration Checklist

Use this checklist when migrating STL code to DTL:

### Container Migration

- [ ] Replace `std::vector<T>` with `dtl::distributed_vector<T>`
- [ ] Add rank information to constructors (`num_ranks`, `my_rank`)
- [ ] Review all element access for local vs global needs

### Access Pattern Migration

- [ ] Replace direct `vec[i]` with `local_view()[i]` for local access
- [ ] Use `global_view()[i].get()` / `.put()` for global access
- [ ] Update range-based for loops to use `local_view()`

### Algorithm Migration

- [ ] Replace `std::accumulate` with `dtl::local_reduce` or `dtl::distributed_reduce`
- [ ] Replace `std::sort` with `dtl::local_sort` or `dtl::distributed_sort`
- [ ] Update all algorithms to operate on appropriate views

### Communication Points

- [ ] Identify where global results are needed
- [ ] Add collective operations for cross-rank data
- [ ] Ensure all ranks participate in collective calls

### Error Handling

- [ ] Choose an error policy (`expected` or throwing)
- [ ] Add error checking for operations that may fail
- [ ] Handle collective errors appropriately

### Testing

- [ ] Test with a single rank (standalone mode)
- [ ] Test with multiple ranks
- [ ] Verify results match a sequential reference
## Quick Reference

| STL | DTL Local | DTL Global |
|---|---|---|
| `vec[i]` | `local_view()[i]` | `global_view()[i].get()` / `.put()` |
| `for (auto& x : vec)` | `for (auto& x : vec.local_view())` | Use `segmented_view` |
| `std::for_each` | `dtl::for_each` | — |
| `std::accumulate` | `dtl::local_reduce` | `dtl::distributed_reduce` |
| `std::sort` | `dtl::local_sort` | `dtl::distributed_sort` |
| `std::find` | `std::find` on `local_view()` | Manual collective (see above) |