Fortran Bindings Guide
This guide covers using DTL from Fortran through the native dtl module, which
wraps the C API via Fortran 2003’s ISO_C_BINDING.
For v0.1.0-alpha.1, the supported Fortran examples are built from
bindings/fortran/examples. The legacy examples/fortran directory is
reference-only and is not part of the default top-level example build.
Table of Contents
Overview
DTL provides native Fortran bindings through the dtl module. This provides:
Zero Manual Interfaces: Just
use dtl- no interface blocks neededFull C API Access: All core C API functions wrapped
Direct Memory Access: Work with Fortran arrays pointing to DTL data
MPI Integration: Works seamlessly with MPI Fortran bindings
Type Safety: Fortran’s strong typing with explicit C interop
Distributed Span in Fortran
The Fortran module exposes first-class span bindings through the C ABI:
creation:
dtl_span_from_vector,dtl_span_from_array,dtl_span_from_tensor,dtl_span_createlifecycle:
dtl_span_destroy,dtl_span_is_validmetadata:
dtl_span_size,dtl_span_local_size,dtl_span_rank,dtl_span_num_rankslocal data and subspans:
dtl_span_data_mut,dtl_span_first,dtl_span_last,dtl_span_subspan
As in C/C++, spans are non-owning. The backing container handle must outlive all derived span handles and any c_f_pointer projections.
Quick Start
program hello_dtl
use dtl
implicit none
type(c_ptr) :: ctx
integer(c_int) :: status
status = dtl_context_create_default(ctx)
if (status /= DTL_SUCCESS) stop 'Failed'
print *, 'Rank', dtl_context_rank(ctx), 'of', dtl_context_size(ctx)
call dtl_context_destroy(ctx)
end program
Prerequisites
Fortran 2003+ compiler with
ISO_C_BINDINGsupport:gfortran 4.3+
ifort 10+
flang
nvfortran
DTL C bindings library (
libdtl_c.soorlibdtl_c.a)DTL Fortran module (
libdtl_fortran.a+dtl.mod)MPI Fortran bindings (optional, for multi-rank execution)
Building with CMake
Enable Fortran Bindings
cmake -B build \
-DDTL_BUILD_C_BINDINGS=ON \
-DDTL_BUILD_FORTRAN=ON \
-DDTL_BUILD_EXAMPLES=ON
cmake --build build
Requirements
DTL_BUILD_FORTRAN=ONrequiresDTL_BUILD_C_BINDINGS=ONA Fortran compiler must be available
CMake will automatically detect and enable the Fortran language
Build Outputs
lib/libdtl_fortran.a- Fortran module libraryinclude/dtl/fortran/dtl.mod- Fortran module filebin/fortran/dtl_fortran_hello- Example programbin/fortran/dtl_fortran_vector_demo- Vector example program
Native Module Usage
Import the Module
program my_program
use dtl
implicit none
! Your code here
end program
Available Constants
Status Codes
DTL_SUCCESS ! Operation succeeded
DTL_ERROR_COMMUNICATION ! Communication error
DTL_ERROR_MEMORY ! Memory allocation error
DTL_ERROR_BOUNDS ! Index out of bounds
DTL_ERROR_INVALID_ARGUMENT ! Invalid argument
Data Types
DTL_DTYPE_INT32 ! 32-bit integer
DTL_DTYPE_INT64 ! 64-bit integer
DTL_DTYPE_FLOAT32 ! Single precision float
DTL_DTYPE_FLOAT64 ! Double precision float
Reduction Operations
DTL_OP_SUM ! Sum
DTL_OP_PROD ! Product
DTL_OP_MIN ! Minimum
DTL_OP_MAX ! Maximum
Manual Interface (Legacy)
Note: This section documents the legacy approach of manually writing interface blocks. The native
dtlmodule is now preferred for new projects.
If you need to create custom interfaces or extend the bindings, you can create a Fortran module that declares DTL’s C interface:
dtl_bindings.f90
!> DTL Fortran Interface Module (Legacy/Custom)
!> Provides ISO_C_BINDING declarations for DTL C API
module dtl_bindings
use, intrinsic :: iso_c_binding
implicit none
! Opaque handle types
type, bind(c) :: dtl_context_handle
type(c_ptr) :: ptr = c_null_ptr
end type
type, bind(c) :: dtl_vector_handle
type(c_ptr) :: ptr = c_null_ptr
end type
type, bind(c) :: dtl_tensor_handle
type(c_ptr) :: ptr = c_null_ptr
end type
! Status codes
integer(c_int), parameter :: DTL_SUCCESS = 0
integer(c_int), parameter :: DTL_ERROR_COMMUNICATION = 100
integer(c_int), parameter :: DTL_ERROR_MEMORY = 200
integer(c_int), parameter :: DTL_ERROR_BOUNDS = 400
integer(c_int), parameter :: DTL_ERROR_BACKEND = 500
! Data types
integer(c_int), parameter :: DTL_DTYPE_INT32 = 2
integer(c_int), parameter :: DTL_DTYPE_INT64 = 3
integer(c_int), parameter :: DTL_DTYPE_FLOAT32 = 8
integer(c_int), parameter :: DTL_DTYPE_FLOAT64 = 9
! Reduction operations
integer(c_int), parameter :: DTL_OP_SUM = 0
integer(c_int), parameter :: DTL_OP_PROD = 1
integer(c_int), parameter :: DTL_OP_MIN = 2
integer(c_int), parameter :: DTL_OP_MAX = 3
! Interface declarations
interface
!> Create default context (MPI_COMM_WORLD)
function dtl_context_create_default(ctx) bind(c, name='dtl_context_create_default')
import :: c_ptr, c_int
type(c_ptr), intent(out) :: ctx
integer(c_int) :: dtl_context_create_default
end function
!> Destroy context
subroutine dtl_context_destroy(ctx) bind(c, name='dtl_context_destroy')
import :: c_ptr
type(c_ptr), value :: ctx
end subroutine
!> Get rank
function dtl_context_rank(ctx) bind(c, name='dtl_context_rank')
import :: c_ptr, c_int
type(c_ptr), value :: ctx
integer(c_int) :: dtl_context_rank
end function
!> Get size
function dtl_context_size(ctx) bind(c, name='dtl_context_size')
import :: c_ptr, c_int
type(c_ptr), value :: ctx
integer(c_int) :: dtl_context_size
end function
!> Barrier synchronization
function dtl_context_barrier(ctx) bind(c, name='dtl_context_barrier')
import :: c_ptr, c_int
type(c_ptr), value :: ctx
integer(c_int) :: dtl_context_barrier
end function
!> Create vector
function dtl_vector_create(ctx, dtype, global_size, vec) &
bind(c, name='dtl_vector_create')
import :: c_ptr, c_int, c_int64_t
type(c_ptr), value :: ctx
integer(c_int), value :: dtype
integer(c_int64_t), value :: global_size
type(c_ptr), intent(out) :: vec
integer(c_int) :: dtl_vector_create
end function
!> Destroy vector
subroutine dtl_vector_destroy(vec) bind(c, name='dtl_vector_destroy')
import :: c_ptr
type(c_ptr), value :: vec
end subroutine
!> Get global size
function dtl_vector_global_size(vec) bind(c, name='dtl_vector_global_size')
import :: c_ptr, c_int64_t
type(c_ptr), value :: vec
integer(c_int64_t) :: dtl_vector_global_size
end function
!> Get local size
function dtl_vector_local_size(vec) bind(c, name='dtl_vector_local_size')
import :: c_ptr, c_int64_t
type(c_ptr), value :: vec
integer(c_int64_t) :: dtl_vector_local_size
end function
!> Get local offset
function dtl_vector_local_offset(vec) bind(c, name='dtl_vector_local_offset')
import :: c_ptr, c_int64_t
type(c_ptr), value :: vec
integer(c_int64_t) :: dtl_vector_local_offset
end function
!> Get mutable local data pointer
function dtl_vector_local_data_mut(vec) bind(c, name='dtl_vector_local_data_mut')
import :: c_ptr
type(c_ptr), value :: vec
type(c_ptr) :: dtl_vector_local_data_mut
end function
!> Allreduce
function dtl_allreduce(ctx, sendbuf, recvbuf, count, dtype, op) &
bind(c, name='dtl_allreduce')
import :: c_ptr, c_int, c_int64_t
type(c_ptr), value :: ctx
type(c_ptr), value :: sendbuf
type(c_ptr), value :: recvbuf
integer(c_int64_t), value :: count
integer(c_int), value :: dtype
integer(c_int), value :: op
integer(c_int) :: dtl_allreduce
end function
!> Broadcast
function dtl_broadcast(ctx, buf, count, dtype, root) &
bind(c, name='dtl_broadcast')
import :: c_ptr, c_int, c_int64_t
type(c_ptr), value :: ctx
type(c_ptr), value :: buf
integer(c_int64_t), value :: count
integer(c_int), value :: dtype
integer(c_int), value :: root
integer(c_int) :: dtl_broadcast
end function
!> Get error message
function dtl_status_message(status) bind(c, name='dtl_status_message')
import :: c_ptr, c_int
integer(c_int), value :: status
type(c_ptr) :: dtl_status_message
end function
!> Check MPI availability
function dtl_has_mpi() bind(c, name='dtl_has_mpi')
import :: c_int
integer(c_int) :: dtl_has_mpi
end function
end interface
contains
!> Convert C string pointer to Fortran string
function c_to_f_string(c_str) result(f_str)
type(c_ptr), intent(in) :: c_str
character(len=:), allocatable :: f_str
character(kind=c_char), pointer :: chars(:)
integer :: i, length
if (.not. c_associated(c_str)) then
f_str = ""
return
end if
! Find string length
length = 0
call c_f_pointer(c_str, chars, [1000])
do i = 1, 1000
if (chars(i) == c_null_char) exit
length = length + 1
end do
allocate(character(len=length) :: f_str)
do i = 1, length
f_str(i:i) = chars(i)
end do
end function
!> Get error message as Fortran string
function get_error_message(status) result(msg)
integer(c_int), intent(in) :: status
character(len=:), allocatable :: msg
msg = c_to_f_string(dtl_status_message(status))
end function
end module dtl_bindings
Basic Usage
Hello World
program dtl_hello
use dtl_bindings
implicit none
type(c_ptr) :: ctx
integer(c_int) :: status, rank, size
! Create context
status = dtl_context_create_default(ctx)
if (status /= DTL_SUCCESS) then
print *, 'Error creating context: ', get_error_message(status)
stop 1
end if
! Get rank and size
rank = dtl_context_rank(ctx)
size = dtl_context_size(ctx)
print '(A,I0,A,I0)', 'Rank ', rank, ' of ', size
! Barrier
status = dtl_context_barrier(ctx)
! Cleanup
call dtl_context_destroy(ctx)
end program dtl_hello
Environment Operations
The environment manages backend lifecycle. Use environment factory methods to create contexts (preferred over dtl_context_create_default):
type(c_ptr) :: env, ctx
integer(c_int) :: status
! Create environment (initializes MPI and other backends)
status = dtl_environment_create(env)
if (status /= DTL_SUCCESS) stop 'Environment creation failed'
! Create world context from environment
status = dtl_environment_make_world_context(env, ctx)
if (status /= DTL_SUCCESS) stop 'Context creation failed'
print *, 'Rank', dtl_context_rank(ctx), 'of', dtl_context_size(ctx)
! Cleanup
call dtl_context_destroy(ctx)
call dtl_environment_destroy(env)
The Fortran interface blocks for environment functions follow the same ISO_C_BINDING pattern as context operations. See the C API reference (dtl_environment.h) for the full list of available functions.
Context Operations
Contexts can be created directly or via environment factory methods (preferred).
type(c_ptr) :: ctx
integer(c_int) :: status, rank, size
! Create context
status = dtl_context_create_default(ctx)
if (status /= DTL_SUCCESS) stop 'Context creation failed'
! Query properties
rank = dtl_context_rank(ctx)
size = dtl_context_size(ctx)
! Synchronize
status = dtl_context_barrier(ctx)
! Cleanup (MUST call when done)
call dtl_context_destroy(ctx)
NCCL Mode-Aware Context APIs (Fortran via C Interop)
When CUDA/NCCL are available, Fortran bindings expose mode-aware NCCL context composition and capability queries:
integer(c_int) :: status, mode, can_native, can_hybrid
type(c_ptr) :: base_ctx, nccl_ctx, split_ctx
status = dtl_context_with_nccl_ex(base_ctx, 0_c_int, DTL_NCCL_MODE_NATIVE_ONLY, nccl_ctx)
status = dtl_context_split_nccl_ex(nccl_ctx, 0_c_int, 0_c_int, 0_c_int, &
DTL_NCCL_MODE_HYBRID_PARITY, split_ctx)
mode = dtl_context_nccl_mode(split_ctx)
can_native = dtl_context_nccl_supports_native(split_ctx, DTL_NCCL_OP_ALLREDUCE)
can_hybrid = dtl_context_nccl_supports_hybrid(split_ctx, DTL_NCCL_OP_SCAN)
Fortran also exposes explicit C-interoperable NCCL device collective procedures such as:
dtl_nccl_allreduce_device(_ex)dtl_nccl_broadcast_device(_ex)dtl_nccl_scan_device_ex/dtl_nccl_exscan_device_exvariable-size parity families:
dtl_nccl_*v_device_ex
Container Operations
Working with Vectors
use dtl_bindings
implicit none
type(c_ptr) :: ctx, vec
integer(c_int) :: status
integer(c_int64_t) :: global_size, local_size, local_offset, i
real(c_double), pointer :: data(:)
type(c_ptr) :: data_ptr
! Create context
status = dtl_context_create_default(ctx)
! Create vector with 10000 double-precision elements
global_size = 10000_c_int64_t
status = dtl_vector_create(ctx, DTL_DTYPE_FLOAT64, global_size, vec)
if (status /= DTL_SUCCESS) then
print *, 'Error creating vector'
stop 1
end if
! Get local data pointer
local_size = dtl_vector_local_size(vec)
local_offset = dtl_vector_local_offset(vec)
data_ptr = dtl_vector_local_data_mut(vec)
! Convert to Fortran pointer
call c_f_pointer(data_ptr, data, [local_size])
! Fill with values
do i = 1, local_size
data(i) = real(local_offset + i - 1, kind=c_double)
end do
print '(A,I0,A,I0)', 'Local size: ', local_size, ', offset: ', local_offset
! Cleanup
call dtl_vector_destroy(vec)
call dtl_context_destroy(ctx)
Working with Different Data Types
! Integer vector
status = dtl_vector_create(ctx, DTL_DTYPE_INT32, global_size, int_vec)
call c_f_pointer(dtl_vector_local_data_mut(int_vec), int_data, [local_size])
! Single precision
status = dtl_vector_create(ctx, DTL_DTYPE_FLOAT32, global_size, float_vec)
call c_f_pointer(dtl_vector_local_data_mut(float_vec), float_data, [local_size])
Collective Operations
Allreduce
use dtl_bindings
implicit none
type(c_ptr) :: ctx
integer(c_int) :: status
real(c_double), target :: local_sum, global_sum
! ... create context and compute local_sum ...
status = dtl_allreduce(ctx, c_loc(local_sum), c_loc(global_sum), &
1_c_int64_t, DTL_DTYPE_FLOAT64, DTL_OP_SUM)
if (status == DTL_SUCCESS) then
print '(A,F12.4)', 'Global sum: ', global_sum
end if
Broadcast
real(c_double), target :: data(100)
integer(c_int) :: root
root = 0 ! Root rank
! Root fills data
if (dtl_context_rank(ctx) == root) then
data = 42.0d0
end if
! Broadcast to all
status = dtl_broadcast(ctx, c_loc(data), 100_c_int64_t, &
DTL_DTYPE_FLOAT64, root)
Reduce
! Reduce to root
status = dtl_reduce(ctx, c_loc(local_data), c_loc(result), &
count, DTL_DTYPE_FLOAT64, DTL_OP_SUM, root)
! Only root has valid result
if (dtl_context_rank(ctx) == root) then
print *, 'Result: ', result
end if
Error Handling
integer(c_int) :: status
character(len=:), allocatable :: error_msg
status = dtl_some_operation(...)
if (status /= DTL_SUCCESS) then
error_msg = get_error_message(status)
print *, 'DTL Error: ', error_msg
! Handle specific errors
select case (status)
case (DTL_ERROR_COMMUNICATION)
print *, 'Communication failure'
case (DTL_ERROR_MEMORY)
print *, 'Memory allocation failure'
case (DTL_ERROR_BOUNDS)
print *, 'Index out of bounds'
case default
print *, 'Unknown error'
end select
stop 1
end if
Complete Examples
Example 1: Distributed Sum
!> Compute sum of distributed vector
program distributed_sum
use dtl_bindings
implicit none
type(c_ptr) :: ctx, vec
integer(c_int) :: status, rank
integer(c_int64_t) :: global_size, local_size, local_offset, i
real(c_double), pointer :: data(:)
real(c_double), target :: local_sum, global_sum
real(c_double) :: expected
! Initialize
status = dtl_context_create_default(ctx)
if (status /= DTL_SUCCESS) stop 'Context failed'
rank = dtl_context_rank(ctx)
! Create vector
global_size = 10000_c_int64_t
status = dtl_vector_create(ctx, DTL_DTYPE_FLOAT64, global_size, vec)
if (status /= DTL_SUCCESS) stop 'Vector failed'
! Get local data
local_size = dtl_vector_local_size(vec)
local_offset = dtl_vector_local_offset(vec)
call c_f_pointer(dtl_vector_local_data_mut(vec), data, [local_size])
! Fill with indices
do i = 1, local_size
data(i) = real(local_offset + i - 1, kind=c_double)
end do
! Compute local sum
local_sum = sum(data)
! Global reduction
status = dtl_allreduce(ctx, c_loc(local_sum), c_loc(global_sum), &
1_c_int64_t, DTL_DTYPE_FLOAT64, DTL_OP_SUM)
! Verify (sum of 0..N-1 = N*(N-1)/2)
expected = real(global_size * (global_size - 1) / 2, kind=c_double)
if (rank == 0) then
print '(A,F15.1)', 'Global sum: ', global_sum
print '(A,F15.1)', 'Expected: ', expected
if (abs(global_sum - expected) < 0.01d0) then
print *, 'SUCCESS!'
else
print *, 'FAILURE!'
end if
end if
! Cleanup
call dtl_vector_destroy(vec)
call dtl_context_destroy(ctx)
end program distributed_sum
Example 2: Matrix-Vector Multiply
!> Distributed matrix-vector multiply (row-distributed matrix)
program matvec
use dtl_bindings
implicit none
type(c_ptr) :: ctx, x_vec, y_vec
integer(c_int) :: status, rank, size
integer(c_int64_t) :: n, local_rows, row_offset, i, j
real(c_double), pointer :: x(:), y(:)
real(c_double), allocatable :: local_matrix(:,:), global_x(:)
real(c_double), target :: temp
! Initialize
status = dtl_context_create_default(ctx)
rank = dtl_context_rank(ctx)
size = dtl_context_size(ctx)
n = 1000_c_int64_t ! N x N matrix, N vector
! Create distributed vectors
status = dtl_vector_create(ctx, DTL_DTYPE_FLOAT64, n, x_vec)
status = dtl_vector_create(ctx, DTL_DTYPE_FLOAT64, n, y_vec)
local_rows = dtl_vector_local_size(x_vec)
row_offset = dtl_vector_local_offset(x_vec)
! Get local data pointers
call c_f_pointer(dtl_vector_local_data_mut(x_vec), x, [local_rows])
call c_f_pointer(dtl_vector_local_data_mut(y_vec), y, [local_rows])
! Initialize x
do i = 1, local_rows
x(i) = 1.0d0
end do
! We need full x for matrix-vector multiply
allocate(global_x(n))
! Allgather x (simplified - in practice use dtl_allgather)
! Here we just use a simple pattern
global_x = 1.0d0
! Local matrix (row-distributed)
allocate(local_matrix(local_rows, n))
do i = 1, local_rows
do j = 1, n
! Simple matrix: A[i,j] = i + j
local_matrix(i, j) = real(row_offset + i + j, kind=c_double)
end do
end do
! Compute y = A * x (local rows)
y = matmul(local_matrix, global_x)
! Report
if (rank == 0) then
print '(A,I0,A)', 'Computed ', n, ' x ', n, ' matrix-vector product'
print '(A,F15.4)', 'y(1) = ', y(1)
end if
! Cleanup
deallocate(local_matrix, global_x)
call dtl_vector_destroy(x_vec)
call dtl_vector_destroy(y_vec)
call dtl_context_destroy(ctx)
end program matvec
Building and Running
CMake Integration (Recommended)
The recommended way to use DTL Fortran bindings is via CMake:
find_package(DTL REQUIRED)
add_executable(my_program my_program.f90)
target_link_libraries(my_program PRIVATE DTL::dtl_fortran)
Build DTL with Fortran bindings enabled:
cmake -B build \
-DDTL_BUILD_C_BINDINGS=ON \
-DDTL_BUILD_FORTRAN=ON
cmake --build build
cmake --install build
Running
mpirun -np 4 ./my_program
Legacy Manual Compilation
For environments without CMake:
# Compile the interface module first
gfortran -c dtl.f90
# Compile your program
gfortran -c my_program.f90
# Link with DTL and MPI
gfortran -o my_program my_program.o dtl.o -ldtl_c -lmpi_mpifh -lmpi
Note: The CMake-based build path above is the recommended approach.
References
C Bindings Guide - Full C API reference