Library for Accelerated Integrals — A high-performance molecular integral library for computational quantum chemistry.
Alpha Release (v0.1.0-alpha.2): This alpha release is primarily intended to expose the LibAccInt API and demonstrate basic usage patterns, including the shift from individual shell pairs/quartets to ShellSet-based batches as work units. This alpha is released under the BSD 3-Clause License; the final release will be relicensed under the MIT License. High-performance AM-specialized generated kernels will be available in a future public release.
LibAccInt computes one- and two-electron molecular integrals over Cartesian Gaussian basis functions using Rys quadrature, targeting Hartree-Fock and post-HF methods in quantum chemistry. The primary API is Engine::compute(...), which routes full-basis and ShellSet work units through CPU or GPU backends while preserving batched execution inside each ShellSetPair or ShellSetQuartet.
Engine::compute(...), Engine::compute_and_consume(...)FockBuilder (CPU), GpuFockBuilder (CUDA)compute_overlap_matrix(), compute_kinetic_matrix(), compute_nuclear_matrix(), compute_eri_tensor(), compute_eri_block()compute_batch_parallel(), compute_quartet(), compute_pair()compute_batch_screened(), compute_screening_statistics()sto-3g plus 28 bundled Basis Set Exchange JSON files loadable via load_basis_set()#include <libaccint/libaccint.hpp>
using namespace libaccint;
int main() {
// Define H₂O molecule (Bohr coordinates)
std::vector<data::Atom> atoms = {
{8, {0.0, 0.0, 0.0}}, // O
{1, {0.0, 1.43233673, -1.10866041}}, // H
{1, {0.0, -1.43233673, -1.10866041}} // H
};
auto basis = data::create_builtin_basis("sto-3g", atoms);
Engine engine(basis);
// One-electron matrices through the primary API
std::vector<Real> S;
engine.compute(OneElectronOperator(Operator::overlap()), S);
// Fock matrix build with the primary consumer API
Size nbf = basis.n_basis_functions();
std::vector<Real> D(nbf * nbf, 0.0);
for (Size i = 0; i < nbf; ++i) {
D[i * nbf + i] = 1.0;
}
consumers::FockBuilder fock(nbf);
fock.set_density(D.data(), nbf);
engine.compute(Operator::coulomb(), fock);
auto J = fock.get_coulomb_matrix();
auto K = fock.get_exchange_matrix();
return 0;
}
import libaccint
import numpy as np
atoms = [
libaccint.Atom(8, [0.0, 0.0, 0.0]),
libaccint.Atom(1, [0.0, 1.43233673, -1.10866041]),
libaccint.Atom(1, [0.0, -1.43233673, -1.10866041]),
]
basis = libaccint.basis_set("sto-3g", atoms)
engine = libaccint.Engine(basis)
S = engine.compute_overlap_matrix() # primary matrix helper
T = engine.compute_kinetic_matrix()
H = engine.compute_core_hamiltonian(atoms)
See examples/python/ and examples/mini-apps/python-hf/ for complete Hartree-Fock programs.
Prerequisites: CMake 3.25+, C++20 compiler (GCC 11+ / Clang 14+). Developer builds can fetch some third-party dependencies automatically; release/package builds require preinstalled nlohmann_json and pybind11.
# CPU Release build (recommended)
cmake --preset cpu-release
cmake --build --preset cpu-release
# Run tests
ctest --test-dir build/cpu-release
# Install
cmake --install build/cpu-release --prefix /usr/local
# Via CMake
cmake --preset cpu-release -DLIBACCINT_BUILD_PYTHON=ON
cmake --build --preset cpu-release
# Or via pip (editable install)
cd python && pip install -e .
| Preset | Description |
|---|---|
cpu-debug |
CPU build with debug symbols |
cpu-release |
CPU build with optimizations |
cuda-release |
CUDA + CPU build (requires CUDA Toolkit) |
cuda-release-safe |
CUDA + Python safe build for low-process/low-memory environments (tests/examples off, generated registries off, jobs=1) |
cuda-release-safe-mpi-tests |
WSL2-safe CUDA + MPI test build using system OpenMPI and capped build/test parallelism |
| Option | Default | Description |
|——–|———|————-|
| LIBACCINT_USE_CUDA | AUTO | Enable CUDA backend (AUTO/ON/OFF) |
| LIBACCINT_USE_MPI | AUTO | Enable MPI backend (AUTO/ON/OFF) |
| LIBACCINT_BUILD_TESTS | ON | Build test suite |
| LIBACCINT_BUILD_EXAMPLES | ON | Build examples |
| LIBACCINT_USE_OPENMP | ON | Enable OpenMP for CPU parallelization |
| LIBACCINT_ENABLE_CPU_GENERATED_REGISTRY | OFF | Reserved for future release (generated kernels not included in alpha) |
| LIBACCINT_ENABLE_CUDA_GENERATED_REGISTRY | OFF | Reserved for future release (generated kernels not included in alpha) |
| LIBACCINT_CUDA_HEAVY_TU_COMPILE_JOBS | 1 | Ninja job-pool cap for heavy CUDA TUs (eri_kernel*, CUDA registry); set 0 to disable throttling |
| LIBACCINT_CUDA_HEAVY_TU_OPT_LEVEL | 3 | Optimization override for heavy CUDA TUs only (0..3); lower values reduce compile-time pressure |
| LIBACCINT_BUILD_PYTHON | OFF | Build Python bindings |
cmake --preset cuda-release-safe
cmake --build --preset cuda-release-safe
pip install -e ./python
This preset disables tests/examples and generated registries, and forces low concurrency for heavy CUDA translation units to reduce process/memory spikes.
For WSL2 CUDA and MPI testing on this host class, use an explicit CUDA compiler and the system OpenMPI toolchain. The first CUDA test on an RTX 5070 may spend tens of seconds in PTX JIT warmup before subsequent tests run normally.
For repeatable local validation, use:
cmake --preset cuda-release-safe-mpi-tests
cmake --build --preset cuda-release-safe-mpi-tests
ctest --preset cuda-release-safe-mpi-tests
To build the deterministic source archive used by release packaging:
./packaging/release/build_source_release.sh
LibAccInt ships an in-tree Spack repository in packaging/spack.
spack repo add ./packaging/spack
spack install libaccint@main +openmp
spack install libaccint@main +mpi
spack install libaccint@main +cuda cuda_arch=90
For local development against the current checkout:
spack repo add ./packaging/spack
spack dev-build -d . libaccint@main +openmp
The Spack recipe disables CMake FetchContent, enables relocatable installs,
and expects all dependencies to come from Spack. Release-package hashes are
based on the deterministic source archive uploaded by the release workflow:
libaccint-${version}.tar.gz.
l<=4) across CPU/CUDA and spherical transforms.MultiGPUEngine, MultiGPUFockBuilder, CudaEngine::compute_eri_pipelined(), and CudaEngine::compute_eri_device_scatter() are experimental.LibAccInt currently provides one built-in basis (sto-3g) via create_builtin_basis().
For broader coverage, 28 Basis Set Exchange JSON basis files are bundled and loadable by name via load_basis_set().
| Category | Basis Sets |
|---|---|
| Pople | STO-3G, 3-21G, 6-31G, 6-31G*, 6-31G**, 6-31+G*, 6-31++G**, 6-311G, 6-311G*, 6-311G**, 6-311+G*, 6-311+G**, 6-311++G** |
| Dunning | cc-pVDZ, cc-pVTZ, cc-pVQZ, cc-pV5Z, aug-cc-pVDZ, aug-cc-pVTZ, aug-cc-pVQZ, aug-cc-pV5Z |
| Auxiliary (JKFIT) | cc-pVTZ-JKFIT, cc-pVQZ-JKFIT, cc-pV5Z-JKFIT |
| Auxiliary (RIFIT) | cc-pVDZ-RIFIT, cc-pVTZ-RIFIT, cc-pVQZ-RIFIT, cc-pV5Z-RIFIT |
// Built-in basis set (currently STO-3G)
auto basis = data::create_builtin_basis("sto-3g", atoms);
// Load bundled Basis Set Exchange JSON by name
auto basis2 = data::load_basis_set("cc-pvdz", atoms);
Basis set data sourced from the Basis Set Exchange. See share/basis_sets/ATTRIBUTION.md for full attribution.
| Component | Key Methods / Purpose |
|---|---|
| Engine | Central orchestrator — routes to CPU or GPU backend |
| Operators | Operator::overlap(), kinetic(), nuclear(), coulomb() |
| FockBuilder | CPU consumer — accumulates J and K matrices during integral computation |
| GpuFockBuilder | CUDA consumer — device-side Fock accumulation without host transfers |
compute_eri_tensor() |
Full (μν|λσ) tensor — for small basis sets |
compute_eri_block() |
Block of ERIs for a given shell quartet range |
compute_batch_parallel() |
OpenMP-parallel batched integral evaluation |
compute_all_2e_parallel() |
Parallel computation over all unique shell quartets |
compute_batch_screened() |
Schwarz-screened batched computation |
| DispatchPolicy | Auto-tuning CPU/GPU routing with cost model |
DispatchConfig::n_gpu_slots |
Concurrent GPU execution slots (default: 4) |
For alpha, shared-instance GPU concurrency is supported for the batched,
slot-backed APIs such as compute_batch_parallel(),
compute_all_2e_parallel(), compute_shell_set_pair(), and batched
compute_batch(...) calls routed to GPU. The underlying CudaEngine
manages a pool of GPU execution slots, each with its own CUDA stream and
device buffers:
DispatchConfig config;
config.n_gpu_slots = 4; // 4 concurrent GPU streams (reduce for small GPUs)
Engine engine(basis, config);
// Multi-threaded GPU computation over the supported batched GPU path
auto results = engine.compute_batch_parallel(
Operator::coulomb(), basis.shell_set_quartets(),
/*n_threads=*/4, BackendHint::ForceGPU);
Do not mutate dispatch/device configuration, screening inputs, or thread
configuration while concurrent GPU work is active. The authoritative alpha
support matrix is in docs/api/THREAD_SAFETY.md.
Full API documentation is available in docs/.
| Example | Description |
|---|---|
basic_hf_energy.cpp |
Minimal Hartree-Fock energy calculation |
hf_calculation.cpp |
Full HF with SCF convergence |
shellset_batched_1e.cpp |
Batched one-electron integral evaluation |
shellset_batched_fock.cpp |
Fock matrix build via compute-and-consume |
gpu_fock_build.cpp |
Device-side Fock accumulation (CUDA) |
gpu_hf_workflow.cpp |
Complete GPU-accelerated HF workflow |
density_fitting.cpp |
Density-fitted integrals |
custom_consumer.cpp |
Writing a custom integral consumer |
qmm_embedding.cpp |
QM/MM-style embedding |
hf_calculation.py |
Python HF calculation |
Complete mini-app HF programs (both C++ and Python) are in examples/mini-apps/.
The docs/ directory contains full Sphinx + Doxygen documentation covering API reference, theory, user guide, and developer guide. Build with:
cd docs && make html
Contributions are welcome. See CONTRIBUTING.md for guidelines on code style, testing, and the pull request process.
This alpha release is licensed under the BSD 3-Clause License — see LICENSE for details. The final release of LibAccInt will be relicensed under the MIT License.
Copyright (c) 2026 Bryce M. Westheimer
If you use LibAccInt in your research, please cite:
@software{libaccint2026,
title = {LibAccInt: Library for Accelerated Integrals},
author = {Bryce M. Westheimer},
year = {2026},
url = {https://github.com/brycewestheimer/libaccint-public},
note = {v0.1.0-alpha.2}
}
Acknowledgments: Rys quadrature implementation based on methods from libint. Inspired by libcint and GPU4PySCF. Basis set data from the Basis Set Exchange.