# Legacy Deep-Dive: Language Bindings

This page is retained as a detailed reference. The canonical user path is now the chaptered handbook.

- Primary chapter: 08-language-bindings-overview.md
- Runtime and handles: Runtime and Handle Model

## Detailed Reference (Legacy)
DTL provides language bindings beyond the native C++ API, enabling use from C, Python, Fortran, and other languages. This guide covers binding architecture, usage patterns, and language-specific considerations.
## Overview

### Binding Architecture
DTL’s language bindings follow a layered architecture:
```
┌─────────────────────────────────────────────────────────┐
│                    User Applications                    │
├──────────┬──────────┬──────────┬──────────┬────────────┤
│  Python  │ Fortran  │  Julia   │    R     │   Other    │
├──────────┴──────────┴──────────┴──────────┴────────────┤
│                       C ABI Layer                       │
│                (libdtl_c.so / dtl_c.dll)                │
├─────────────────────────────────────────────────────────┤
│                    C++ Core Library                     │
│                  (header-only + MPI)                    │
└─────────────────────────────────────────────────────────┘
```
**Key Design Principles:**

- **C ABI as Universal Interface**: The C bindings provide a stable ABI that any language with C FFI can use
- **Python with Native Feel**: The Python bindings wrap the C ABI with Pythonic idioms and NumPy integration
- **Zero-Copy Where Possible**: Data views share memory with the underlying C++ containers
- **Non-owning span semantics**: C/Python/Fortran expose first-class distributed span APIs that remain explicitly non-owning
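The zero-copy and non-owning principles can be sketched in plain Python with `memoryview`; this is an illustration of the semantics only, not the DTL API:

```python
import array

# Stand-in for a container's local storage buffer.
storage = array.array("d", range(8))

# A memoryview is a non-owning, zero-copy window onto that buffer:
# writes through the view are visible in the underlying storage.
view = memoryview(storage)
view[0] = 42.0

assert storage[0] == 42.0   # shared memory: the write is visible, no copy was made
assert view.obj is storage  # the view references, but does not own, the data
```

DTL's view types follow the same pattern: the view is a window onto storage owned elsewhere, so mutations are immediately visible to the owner and no buffer is duplicated at the language boundary.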
### Available Bindings

| Language | Status | Interface | Documentation |
|---|---|---|---|
| C++ | Native | Header-only | Full API reference |
| C | Complete | `libdtl_c` | |
| Python | Complete | `dtl` module | |
| Fortran | Complete | Native | |
### Feature Matrix

| Feature | C++ | C | Python |
|---|---|---|---|
| **Containers** | | | |
| distributed_vector | Native | | |
| distributed_array | Native | | |
| distributed_span | Native (non-owning) | | |
| distributed_tensor | Native | | |
| **Collective Operations** | | | |
| allreduce | Native | | |
| broadcast | Native | | |
| gather/scatter | Native | | |
| reduce | Native | | |
| allgather | Native | | |
| **Algorithms** | | | |
| for_each/transform | Native | | |
| fill/copy | Native | | |
| find/count | Native | | |
| reduce | Native | | |
| sort | Native | | |
| **Policies** | | | |
| Partition (block/cyclic) | Template params | | Constructor kwargs |
| Placement (host/device) | Template params | | Constructor kwargs |
| **RMA Operations** | | | |
| Window create/destroy | Native | | |
| put/get | Native | | |
| Atomic accumulate | Native | | |
| fetch_and_op | Native | | |
| compare_and_swap | Native | | |
| Synchronization | Native | | |
## C Bindings

The C bindings provide a stable, language-agnostic interface to DTL. They are designed for:

- Integration with C codebases
- Building bindings for other languages
- Systems requiring ABI stability
### Quick Example

```c
#include <dtl/bindings/c/dtl.h>
#include <stdio.h>

int main() {
    dtl_context_t ctx;
    dtl_status status = dtl_context_create_default(&ctx);
    if (status != DTL_SUCCESS) {
        fprintf(stderr, "Error: %s\n", dtl_status_message(status));
        return 1;
    }

    printf("Rank %d of %d\n", dtl_context_rank(ctx), dtl_context_size(ctx));

    // Create a distributed vector
    dtl_vector_t vec;
    status = dtl_vector_create(ctx, DTL_DTYPE_FLOAT64, 1000, &vec);
    if (status != DTL_SUCCESS) {
        dtl_context_destroy(ctx);
        return 1;
    }

    // Access local data
    double* data = dtl_vector_local_data_mut(vec);
    for (size_t i = 0; i < dtl_vector_local_size(vec); i++) {
        data[i] = (double)i;
    }

    // Barrier synchronization
    dtl_context_barrier(ctx);

    // Cleanup
    dtl_vector_destroy(vec);
    dtl_context_destroy(ctx);
    return 0;
}
```
### Building with C Bindings

```sh
# Compile
gcc -std=c99 -o my_program my_program.c -ldtl_c -lmpi

# Run with MPI
mpirun -np 4 ./my_program
```

See the complete C bindings guide for a detailed API reference.
## Python Bindings
The Python bindings provide a Pythonic interface with seamless NumPy integration.
### Installation

```sh
# From source (in the DTL build directory)
cd build
make python_install

# Or using pip (once the package is published)
pip install dtl
```
### Quick Example

```python
import dtl
import numpy as np

# Create context (uses MPI_COMM_WORLD by default)
with dtl.Context() as ctx:
    print(f"Rank {ctx.rank} of {ctx.size}")

    # Create a distributed vector
    vec = dtl.DistributedVector(ctx, size=1000, dtype=np.float64)

    # Access local data as a NumPy array (zero-copy)
    local = vec.local_view()
    local[:] = np.arange(len(local)) + vec.local_offset

    # Collective operations
    local_sum = np.sum(local)
    global_sum = dtl.allreduce(ctx, local_sum, op=dtl.SUM)

    if ctx.is_root:
        print(f"Global sum: {global_sum}")
```
### Running with MPI

```sh
mpirun -np 4 python my_program.py
```
### Key Features

- **Zero-Copy NumPy Views**: `local_view()` returns a NumPy array that shares memory with DTL
- **mpi4py Interoperability**: Pass mpi4py communicators to `Context()`
- **Type Annotations**: Full PEP 484 type hints for IDE support
- **Collective Operations**: `allreduce`, `broadcast`, `gather`, `scatter`, `reduce`, `allgather`
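As a single-process sketch of the contract behind a sum `allreduce` (plain Python, no DTL calls): each rank reduces only its local slice, and combining the partials yields the same value on every rank as a serial reduction over the global data.

```python
data = list(range(1000))                                  # the logical global vector
nranks = 4
local_slices = [data[r::nranks] for r in range(nranks)]   # one slice per simulated "rank"

partial_sums = [sum(s) for s in local_slices]   # local work done independently on each rank
global_sum = sum(partial_sums)                  # the value a sum-allreduce delivers to every rank

assert global_sum == sum(data)                  # identical to a serial reduction
```

In a real run, each rank computes only its own partial sum and the MPI layer performs the combine step, but the invariant demonstrated here is the same.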
See the complete Python bindings guide for detailed documentation.
## Fortran Bindings
Fortran programs can use DTL through the C bindings via ISO_C_BINDING.
### Quick Example

```fortran
program dtl_example
    use, intrinsic :: iso_c_binding
    implicit none

    ! DTL interface declarations
    interface
        function dtl_context_create_default(ctx) bind(c, name='dtl_context_create_default')
            import :: c_ptr, c_int
            type(c_ptr), intent(out) :: ctx
            integer(c_int) :: dtl_context_create_default
        end function

        function dtl_context_rank(ctx) bind(c, name='dtl_context_rank')
            import :: c_ptr, c_int
            type(c_ptr), value :: ctx
            integer(c_int) :: dtl_context_rank
        end function

        subroutine dtl_context_destroy(ctx) bind(c, name='dtl_context_destroy')
            import :: c_ptr
            type(c_ptr), value :: ctx
        end subroutine
    end interface

    type(c_ptr) :: ctx
    integer(c_int) :: status, rank

    ! Create context
    status = dtl_context_create_default(ctx)
    if (status /= 0) stop 'Failed to create context'

    rank = dtl_context_rank(ctx)
    print *, 'Rank:', rank

    call dtl_context_destroy(ctx)
end program
```
See the complete Fortran bindings guide for module templates and advanced usage.
## Other Languages
Any language with C FFI support can use DTL through the C bindings:
### Julia

```julia
using Libdl

const libdtl = dlopen("libdtl_c")
dtl_context_create_default = dlsym(libdtl, :dtl_context_create_default)
dtl_context_rank = dlsym(libdtl, :dtl_context_rank)

ctx = Ref{Ptr{Nothing}}()
status = ccall(dtl_context_create_default, Cint, (Ref{Ptr{Nothing}},), ctx)
rank = ccall(dtl_context_rank, Cint, (Ptr{Nothing},), ctx[])
println("Rank: $rank")
```
### Rust

```rust
use std::ffi::c_void;

extern "C" {
    fn dtl_context_create_default(ctx: *mut *mut c_void) -> i32;
    fn dtl_context_rank(ctx: *mut c_void) -> i32;
    fn dtl_context_destroy(ctx: *mut c_void);
}

fn main() {
    let mut ctx: *mut c_void = std::ptr::null_mut();
    unsafe {
        let status = dtl_context_create_default(&mut ctx);
        if status == 0 {
            let rank = dtl_context_rank(ctx);
            println!("Rank: {}", rank);
            dtl_context_destroy(ctx);
        }
    }
}
```
## Performance Considerations

### Binding Overhead

| Binding | Overhead | Notes |
|---|---|---|
| C++ | None | Native implementation |
| C | Minimal | Single function-call indirection |
| Python | Low | pybind11-optimized; NumPy views are zero-copy |
| Fortran | Minimal | Direct C call via ISO_C_BINDING |
### Best Practices

- **Minimize boundary crossings**: Do bulk operations rather than element-by-element access
- **Use zero-copy views**: Python's `local_view()` shares memory with C++
- **Prefer collective operations**: `allreduce`, `broadcast`, etc. are optimized
- **Keep data on the native side**: Avoid copying between language runtimes
### Example: Efficient Python Usage

```python
# GOOD: bulk operation on the NumPy view
local = vec.local_view()
local[:] = np.sin(local)          # vectorized NumPy operation, one boundary crossing

# BAD: element-by-element access in a Python loop
for i in range(len(local)):
    local[i] = np.sin(local[i])   # Python loop overhead on every element
```