C API Reference
DTL Version: 0.1.0-alpha.1 Last Updated: 2026-03-06
The DTL C bindings provide a C-compatible interface to DTL’s core functionality. All functions use opaque handle types, dtl_status return codes, and callback-based algorithms.
Master Header: #include <dtl/bindings/c/dtl.h>
Conventions
All functions return
dtl_status(integer) unless stated otherwiseDTL_SUCCESS(0) indicates success; non-zero values are error codesOpaque handles (
dtl_context_t,dtl_vector_t, etc.) are pointer-sized typesCallback functions receive a
void* user_datafor user contextNULLhandles are safe to pass to destroy functions (no-op)All collective operations require all ranks to participate
dtl_version_string()returns the full public release string (0.1.0-alpha.1)Unless documented otherwise, destroy functions own and release the underlying handle
Host-only builds are supported; MPI-dependent operations must fail explicitly when MPI is unavailable
Context Functions
Header: #include <dtl/bindings/c/dtl_context.h>
dtl_context_create
Create a context with options.
dtl_status dtl_context_create(dtl_context_t* ctx, const dtl_context_options* opts);
Parameters:
ctx— Pointer to receive the context handle (must not be NULL)opts— Configuration options (NULL for defaults)
Example:
dtl_context_t ctx;
dtl_context_options opts;
dtl_context_options_init(&opts);
opts.device_id = 0; // Use GPU 0
dtl_status status = dtl_context_create(&ctx, &opts);
if (status != DTL_SUCCESS) {
fprintf(stderr, "Failed: %s\n", dtl_status_message(status));
return 1;
}
dtl_context_create_default
Create a context with default options.
In this alpha release, callers should not assume MPI is auto-initialized on their behalf. In MPI builds, create the context only after MPI is available in the process and treat collective operations as collective over the active communicator.
dtl_status dtl_context_create_default(dtl_context_t* ctx);
dtl_context_destroy
Destroy a context and release resources.
void dtl_context_destroy(dtl_context_t ctx);
Safe to call with NULL.
dtl_context_rank / dtl_context_size
Query rank and world size.
dtl_rank_t dtl_context_rank(dtl_context_t ctx);
dtl_rank_t dtl_context_size(dtl_context_t ctx);
dtl_context_barrier
Collective barrier synchronization.
dtl_status dtl_context_barrier(dtl_context_t ctx);
dtl_context_options_init
Initialize options struct with defaults.
void dtl_context_options_init(dtl_context_options* opts);
Context Domain Composition (CUDA/NCCL)
Mode-aware context composition APIs are available for CUDA/NCCL domains.
// Duplicate or split communicator domains
dtl_status dtl_context_dup(dtl_context_t ctx, dtl_context_t* out);
dtl_status dtl_context_split(dtl_context_t ctx, int color, int key, dtl_context_t* out);
// Add CUDA domain
dtl_status dtl_context_with_cuda(dtl_context_t ctx, int device_id, dtl_context_t* out);
// Add/split NCCL domain (default mode and explicit mode variants)
dtl_status dtl_context_with_nccl(dtl_context_t ctx, int device_id, dtl_context_t* out);
dtl_status dtl_context_with_nccl_ex(
dtl_context_t ctx, int device_id, dtl_nccl_operation_mode mode, dtl_context_t* out);
dtl_status dtl_context_split_nccl(dtl_context_t ctx, int color, int key, dtl_context_t* out);
dtl_status dtl_context_split_nccl_ex(
dtl_context_t ctx, int color, int key, int device_id,
dtl_nccl_operation_mode mode, dtl_context_t* out);
// NCCL mode/capability introspection
int dtl_context_nccl_mode(dtl_context_t ctx);
int dtl_context_nccl_supports_native(dtl_context_t ctx, dtl_nccl_operation op);
int dtl_context_nccl_supports_hybrid(dtl_context_t ctx, dtl_nccl_operation op);
NCCL mode values:
DTL_NCCL_MODE_NATIVE_ONLY: reject non-native NCCL operation families.DTL_NCCL_MODE_HYBRID_PARITY: allow explicit hybrid MPI-assisted parity paths.
NCCL Device Collective APIs (Explicit Surface)
The C ABI exposes explicit NCCL device-buffer collectives and mode-aware _ex
variants for parity families.
Header:
#include <dtl/bindings/c/dtl_communicator.h>
// Native-device families (+ mode-aware variants)
dtl_status dtl_nccl_allreduce_device(...);
dtl_status dtl_nccl_allreduce_device_ex(..., dtl_nccl_operation_mode mode);
dtl_status dtl_nccl_broadcast_device(...);
dtl_status dtl_nccl_broadcast_device_ex(..., dtl_nccl_operation_mode mode);
dtl_status dtl_nccl_barrier_device(dtl_context_t ctx);
// Hybrid parity families (explicit _ex variants)
dtl_status dtl_nccl_gatherv_device_ex(..., dtl_nccl_operation_mode mode);
dtl_status dtl_nccl_scatterv_device_ex(..., dtl_nccl_operation_mode mode);
dtl_status dtl_nccl_allgatherv_device_ex(..., dtl_nccl_operation_mode mode);
dtl_status dtl_nccl_alltoallv_device_ex(..., dtl_nccl_operation_mode mode);
dtl_status dtl_nccl_scan_device_ex(..., dtl_nccl_operation_mode mode);
dtl_status dtl_nccl_exscan_device_ex(..., dtl_nccl_operation_mode mode);
All explicit NCCL collectives are device-buffer APIs. Passing host pointers is an error by contract.
Full example:
#include <dtl/bindings/c/dtl.h>
#include <stdio.h>
int main(int argc, char** argv) {
dtl_context_t ctx;
dtl_status status = dtl_context_create_default(&ctx);
if (status != DTL_SUCCESS) return 1;
printf("Rank %d of %d\n", dtl_context_rank(ctx), dtl_context_size(ctx));
dtl_context_destroy(ctx);
return 0;
}
Container Functions — Vector
Header: #include <dtl/bindings/c/dtl_vector.h>
dtl_vector_create
Create a distributed vector with specified dtype and global size.
dtl_status dtl_vector_create(dtl_context_t ctx, dtl_dtype dtype,
dtl_size_t global_size, dtl_vector_t* vec);
Supported dtypes: DTL_DTYPE_INT8, DTL_DTYPE_INT32, DTL_DTYPE_INT64, DTL_DTYPE_UINT8, DTL_DTYPE_UINT32, DTL_DTYPE_UINT64, DTL_DTYPE_FLOAT32, DTL_DTYPE_FLOAT64
dtl_vector_create_fill
Create a vector initialized with a value.
dtl_status dtl_vector_create_fill(dtl_context_t ctx, dtl_dtype dtype,
dtl_size_t global_size, const void* value,
dtl_vector_t* vec);
dtl_vector_destroy
Destroy a vector and free its memory.
void dtl_vector_destroy(dtl_vector_t vec);
Size Queries
dtl_size_t dtl_vector_global_size(dtl_vector_t vec);
dtl_size_t dtl_vector_local_size(dtl_vector_t vec);
Data Access
const void* dtl_vector_local_data(dtl_vector_t vec);
void* dtl_vector_local_data_mut(dtl_vector_t vec);
Example:
dtl_vector_t vec;
dtl_vector_create(ctx, DTL_DTYPE_FLOAT64, 1000, &vec);
double* data = (double*)dtl_vector_local_data(vec);
dtl_size_t local_n = dtl_vector_local_size(vec);
for (dtl_size_t i = 0; i < local_n; i++) {
data[i] = (double)i;
}
dtl_vector_destroy(vec);
Container Functions — Array
Header: #include <dtl/bindings/c/dtl_array.h>
Arrays have the same interface as vectors but cannot be resized.
dtl_status dtl_array_create(dtl_context_t ctx, dtl_dtype dtype,
dtl_size_t global_size, dtl_array_t* arr);
dtl_status dtl_array_create_fill(dtl_context_t ctx, dtl_dtype dtype,
dtl_size_t global_size, const void* value,
dtl_array_t* arr);
void dtl_array_destroy(dtl_array_t arr);
dtl_size_t dtl_array_global_size(dtl_array_t arr);
dtl_size_t dtl_array_local_size(dtl_array_t arr);
void* dtl_array_local_data(dtl_array_t arr);
Container Functions — Span
Header: #include <dtl/bindings/c/dtl_span.h>
dtl_status dtl_span_from_vector(dtl_vector_t vec, dtl_span_t* span);
dtl_status dtl_span_from_array(dtl_array_t arr, dtl_span_t* span);
dtl_status dtl_span_from_tensor(dtl_tensor_t tensor, dtl_span_t* span);
void dtl_span_destroy(dtl_span_t span);
dtl_size_t dtl_span_size(dtl_span_t span);
dtl_size_t dtl_span_local_size(dtl_span_t span);
dtl_size_t dtl_span_size_bytes(dtl_span_t span);
dtl_rank_t dtl_span_rank(dtl_span_t span);
dtl_rank_t dtl_span_num_ranks(dtl_span_t span);
const void* dtl_span_data(dtl_span_t span);
void* dtl_span_data_mut(dtl_span_t span);
dtl_status dtl_span_first(dtl_span_t span, dtl_size_t count, dtl_span_t* out_span);
dtl_status dtl_span_last(dtl_span_t span, dtl_size_t count, dtl_span_t* out_span);
dtl_status dtl_span_subspan(dtl_span_t span, dtl_size_t offset, dtl_size_t count,
dtl_span_t* out_span);
dtl_span_t is a non-owning view. The backing container/storage must outlive all derived span handles.
Container Functions — Tensor
Header: #include <dtl/bindings/c/dtl_tensor.h>
dtl_status dtl_tensor_create(dtl_context_t ctx, dtl_dtype dtype,
dtl_size_t ndim, const dtl_size_t* shape,
dtl_tensor_t* tensor);
void dtl_tensor_destroy(dtl_tensor_t tensor);
dtl_size_t dtl_tensor_ndim(dtl_tensor_t tensor);
dtl_size_t dtl_tensor_global_size(dtl_tensor_t tensor);
dtl_size_t dtl_tensor_local_size(dtl_tensor_t tensor);
void* dtl_tensor_local_data(dtl_tensor_t tensor);
Container Functions — Map
Header: #include <dtl/bindings/c/dtl_map.h>
dtl_status dtl_map_create(dtl_context_t ctx, dtl_dtype key_dtype,
dtl_dtype value_dtype, dtl_map_t* map);
void dtl_map_destroy(dtl_map_t map);
dtl_status dtl_map_insert(dtl_map_t map, const void* key, const void* value);
dtl_status dtl_map_find(dtl_map_t map, const void* key, void* value, int* found);
dtl_status dtl_map_erase(dtl_map_t map, const void* key);
dtl_size_t dtl_map_local_size(dtl_map_t map);
Algorithm Functions
Header: #include <dtl/bindings/c/dtl_algorithms.h>
Callback Types
// Mutable element callback
typedef void (*dtl_unary_func)(void* element, dtl_size_t index, void* user_data);
// Const element callback
typedef void (*dtl_const_unary_func)(const void* element, dtl_size_t index, void* user_data);
// Transform callback
typedef void (*dtl_transform_func)(const void* input, void* output,
dtl_size_t index, void* user_data);
// Predicate callback
typedef int (*dtl_predicate)(const void* element, void* user_data);
// Comparator callback
typedef int (*dtl_comparator)(const void* a, const void* b, void* user_data);
// Binary reduction callback
typedef void (*dtl_binary_func)(const void* a, const void* b,
void* result, void* user_data);
For-Each
dtl_status dtl_for_each_vector(dtl_vector_t vec, dtl_unary_func func, void* user_data);
dtl_status dtl_for_each_vector_const(dtl_vector_t vec, dtl_const_unary_func func, void* user_data);
dtl_status dtl_for_each_array(dtl_array_t arr, dtl_unary_func func, void* user_data);
dtl_status dtl_for_each_array_const(dtl_array_t arr, dtl_const_unary_func func, void* user_data);
Example:
void double_element(void* elem, dtl_size_t idx, void* user_data) {
double* val = (double*)elem;
*val *= 2.0;
}
dtl_for_each_vector(vec, double_element, NULL);
Transform
dtl_status dtl_transform_vector(dtl_vector_t src, dtl_vector_t dst,
dtl_transform_func func, void* user_data);
dtl_status dtl_transform_array(dtl_array_t src, dtl_array_t dst,
dtl_transform_func func, void* user_data);
Copy / Fill
dtl_status dtl_copy_vector(dtl_vector_t src, dtl_vector_t dst);
dtl_status dtl_copy_array(dtl_array_t src, dtl_array_t dst);
dtl_status dtl_fill_vector(dtl_vector_t vec, const void* value);
dtl_status dtl_fill_array(dtl_array_t arr, const void* value);
Find
dtl_status dtl_find_vector(dtl_vector_t vec, const void* value, dtl_size_t* index);
dtl_status dtl_find_if_vector(dtl_vector_t vec, dtl_predicate pred,
void* user_data, dtl_size_t* index);
dtl_status dtl_find_array(dtl_array_t arr, const void* value, dtl_size_t* index);
Count
dtl_status dtl_count_vector(dtl_vector_t vec, const void* value, dtl_size_t* count);
dtl_status dtl_count_if_vector(dtl_vector_t vec, dtl_predicate pred,
void* user_data, dtl_size_t* count);
Reduce
dtl_status dtl_reduce_vector(dtl_vector_t vec, const void* init,
dtl_binary_func func, void* result, void* user_data);
dtl_status dtl_reduce_array(dtl_array_t arr, const void* init,
dtl_binary_func func, void* result, void* user_data);
Sort
dtl_status dtl_sort_vector(dtl_vector_t vec, dtl_comparator comp, void* user_data);
dtl_status dtl_sort_array(dtl_array_t arr, dtl_comparator comp, void* user_data);
Example:
int compare_doubles(const void* a, const void* b, void* user_data) {
double da = *(const double*)a;
double db = *(const double*)b;
if (da < db) return -1;
if (da > db) return 1;
return 0;
}
dtl_sort_vector(vec, compare_doubles, NULL);
Communication Functions
Header: #include <dtl/bindings/c/dtl_communicator.h>
Point-to-Point (Blocking)
dtl_status dtl_send(dtl_context_t ctx, const void* buf, dtl_size_t count,
dtl_dtype dtype, dtl_rank_t dest, dtl_tag_t tag);
dtl_status dtl_recv(dtl_context_t ctx, void* buf, dtl_size_t count,
dtl_dtype dtype, dtl_rank_t source, dtl_tag_t tag);
dtl_status dtl_sendrecv(dtl_context_t ctx,
const void* sendbuf, dtl_size_t sendcount,
dtl_dtype senddtype, dtl_rank_t dest, dtl_tag_t sendtag,
void* recvbuf, dtl_size_t recvcount,
dtl_dtype recvdtype, dtl_rank_t source, dtl_tag_t recvtag);
Point-to-Point (Non-Blocking)
dtl_status dtl_isend(dtl_context_t ctx, const void* buf, dtl_size_t count,
dtl_dtype dtype, dtl_rank_t dest, dtl_tag_t tag,
dtl_request_t* request);
dtl_status dtl_irecv(dtl_context_t ctx, void* buf, dtl_size_t count,
dtl_dtype dtype, dtl_rank_t source, dtl_tag_t tag,
dtl_request_t* request);
dtl_status dtl_wait(dtl_request_t request);
dtl_status dtl_waitall(dtl_size_t count, dtl_request_t* requests);
dtl_status dtl_test(dtl_request_t request, int* completed);
void dtl_request_free(dtl_request_t request);
Barrier
dtl_status dtl_barrier(dtl_context_t ctx);
Broadcast
dtl_status dtl_broadcast(dtl_context_t ctx, void* buf, dtl_size_t count,
dtl_dtype dtype, dtl_rank_t root);
Reduce / Allreduce
dtl_status dtl_reduce(dtl_context_t ctx,
const void* sendbuf, void* recvbuf,
dtl_size_t count, dtl_dtype dtype,
dtl_reduce_op op, dtl_rank_t root);
dtl_status dtl_allreduce(dtl_context_t ctx,
const void* sendbuf, void* recvbuf,
dtl_size_t count, dtl_dtype dtype,
dtl_reduce_op op);
Reduction operations: DTL_REDUCE_SUM, DTL_REDUCE_PROD, DTL_REDUCE_MIN, DTL_REDUCE_MAX, DTL_REDUCE_LAND, DTL_REDUCE_LOR, DTL_REDUCE_BAND, DTL_REDUCE_BOR
Gather / Scatter
dtl_status dtl_gather(dtl_context_t ctx,
const void* sendbuf, dtl_size_t sendcount, dtl_dtype senddtype,
void* recvbuf, dtl_size_t recvcount, dtl_dtype recvdtype,
dtl_rank_t root);
dtl_status dtl_scatter(dtl_context_t ctx,
const void* sendbuf, dtl_size_t sendcount, dtl_dtype senddtype,
void* recvbuf, dtl_size_t recvcount, dtl_dtype recvdtype,
dtl_rank_t root);
dtl_status dtl_allgather(dtl_context_t ctx,
const void* sendbuf, dtl_size_t sendcount, dtl_dtype senddtype,
void* recvbuf, dtl_size_t recvcount, dtl_dtype recvdtype);
Status Codes
Header: #include <dtl/bindings/c/dtl_status.h>
Constant |
Value |
Description |
|---|---|---|
|
0 |
Operation succeeded |
|
100 |
Communication failure |
|
200 |
Memory allocation failure |
|
400 |
Index out of bounds |
|
500 |
Backend error |
|
410 |
Invalid argument |
|
411 |
Null pointer |
|
420 |
Operation not supported |
const char* dtl_status_message(dtl_status status);
int dtl_status_is_error(dtl_status status);
Data Types
Header: #include <dtl/bindings/c/dtl_types.h>
typedef int64_t dtl_size_t; // Size type
typedef int32_t dtl_rank_t; // Rank identifier
typedef int32_t dtl_tag_t; // Message tag
// Opaque handles
typedef struct dtl_context_s* dtl_context_t;
typedef struct dtl_vector_s* dtl_vector_t;
typedef struct dtl_array_s* dtl_array_t;
typedef struct dtl_tensor_s* dtl_tensor_t;
typedef struct dtl_map_s* dtl_map_t;
typedef struct dtl_request_s* dtl_request_t;