I/O Formats Guide

AutoFragment provides comprehensive I/O support for reading molecular structure files and writing fragmented systems to various quantum chemistry program formats.

Overview

The I/O module (autofragment.io) supports:

  • 11 input formats for reading molecular structures

  • 12 output formats for writing fragment-annotated input files

  • FormatRegistry for extensible format handling

Reading Molecular Structures

Quick Start

from autofragment import io

# Read from various formats (auto-detection by extension)
system = io.read_pdb("protein.pdb")
system = io.read_mol2("ligand.mol2")
system = io.read_sdf("compounds.sdf")
system = io.read_xyz("water64.xyz")

# All readers return ChemicalSystem objects
print(f"Atoms: {system.n_atoms}")
print(f"Bonds: {system.n_bonds}")

For isolated molecule lists, use the explicit XYZ helper:

molecules = io.read_xyz_molecules("water64.xyz")

Supported Input Formats

Format

Function

Extensions

Description

PDB

read_pdb()

.pdb, .ent

Protein Data Bank with CONECT records

MOL2

read_mol2()

.mol2

Tripos MOL2 with atom types and charges

SDF/MOL

read_sdf()

.sdf, .mol

MDL V2000/V3000 structure files

QCSchema

read_qcschema()

.json

MolSSI standard JSON format

GAMESS

read_gamess_input()

.inp

GAMESS input file geometry

Psi4

read_psi4_input()

.dat

Psi4 molecule blocks

Q-Chem

read_qchem_input()

.in

Q-Chem $molecule sections

ORCA

read_orca_input()

.inp

ORCA *xyz blocks

NWChem

read_nwchem_input()

.nw

NWChem geometry blocks

VASP

read_poscar()

.poscar, .contcar

VASP POSCAR/CONTCAR files

CIF

read_cif()

.cif, .mmcif

Crystallographic Information Files

PDB Reader

The PDB reader supports:

  • ATOM and HETATM records

  • CONECT records for explicit bonds

  • Multi-MODEL files (select with model=N)

  • Distance-based bond inference

from autofragment.io import read_pdb

# Read with explicit bonds only
system = read_pdb("protein.pdb", infer_bonds=False)

# Read specific model from NMR ensemble
system = read_pdb("ensemble.pdb", model=5)

# Access metadata
print(system.metadata["source_file"])

MOL2 Reader

The MOL2 reader handles:

  • Tripos atom type mapping (C.3, N.ar, etc.)

  • Bond orders (1=single, 2=double, ar=aromatic)

  • Partial charges

  • Multi-molecule files

from autofragment.io import read_mol2

system = read_mol2("ligand.mol2")

# Charges are stored per atom
for atom in system.atoms:
    print(f"{atom.symbol}: charge={atom.charge}")

SDF/MOL Reader

Supports both MDL V2000 (fixed-width) and V3000 (keyword-based) formats:

from autofragment.io import read_sdf, read_sdf_multi

# Read first molecule
system = read_sdf("compound.sdf")

# Read all molecules from multi-molecule file
systems = read_sdf_multi("library.sdf")
for i, sys in enumerate(systems):
    print(f"Molecule {i}: {sys.n_atoms} atoms")

QCSchema Reader

Reads MolSSI QCSchema JSON format with automatic Bohr→Angstrom conversion:

from autofragment.io import read_qcschema

system = read_qcschema("molecule.json")

# Access molecular properties
print(f"Charge: {system.metadata['molecular_charge']}")
print(f"Multiplicity: {system.metadata['molecular_multiplicity']}")

VASP Reader

Reads POSCAR/CONTCAR files with lattice vector handling:

from autofragment.io import read_poscar

system = read_poscar("POSCAR")

# Lattice vectors are in metadata
lattice = system.metadata["lattice"]

Writing Fragment Files

Quick Start

from autofragment.io import write_gamess_fmo, write_psi4_sapt

# After fragmenting a system...
write_gamess_fmo(fragments, "fmo_calc.inp", basis="6-31G*")
write_psi4_sapt(fragments[:2], "sapt_dimer.dat")

Supported Output Formats

Format

Function

Description

GAMESS FMO

write_gamess_fmo()

FMO with $FMO, $FMOBND, $DATA

GAMESS EFMO

write_gamess_efmo()

Effective Fragment MO

GAMESS EFP

write_gamess_efp()

Effective Fragment Potential

Psi4 SAPT

write_psi4_sapt()

SAPT with fragment separators

Psi4 Fragment

write_psi4_fragment()

General fragment calculation

Q-Chem EFP

write_qchem_efp()

EFP with $molecule sections

Q-Chem XSAPT

write_qchem_xsapt()

Extended SAPT

Q-Chem FRAGMO

write_qchem_fragmo()

Fragment MO analysis

NWChem

write_nwchem_fragment()

Fragment calculation

NWChem BSSE

write_nwchem_bsse()

Counterpoise correction

ORCA

write_orca_fragment()

Single fragment job

ORCA Multi-job

write_orca_multijob()

Separate jobs per fragment

QCSchema

write_qcschema()

JSON with fragment annotations

Molpro

write_molpro_sapt()

SAPT calculation

Turbomole

write_turbomole_fragment()

Coord file

CFOUR

write_cfour_fragment()

ZMAT format

XYZ

write_xyz_fragments()

XYZ with fragment markers

PDB

write_pdb_fragments()

PDB with fragments as chains

GAMESS FMO Writer

Generates complete GAMESS input for Fragment Molecular Orbital calculations:

from autofragment.io import write_gamess_fmo

write_gamess_fmo(
    fragments,
    "fmo_calc.inp",
    basis="6-31G*",
    method="RHF",
    runtype="energy",
    memory=500,  # MW
    fmo_level=2,
    nbody=2,
)

Output includes:

  • $CONTRL - Control options

  • $SYSTEM - Memory settings

  • $BASIS - Basis set specification

  • $FMO - Fragment definitions (NFRAG, INDAT, charges, multiplicities)

  • $GDDI - Parallel settings

  • $DATA - Atomic coordinates

Psi4 SAPT Writer

Generates Psi4 input for Symmetry-Adapted Perturbation Theory:

from autofragment.io import write_psi4_sapt

# SAPT requires exactly 2 fragments (dimer)
write_psi4_sapt(
    [monomer_a, monomer_b],
    "sapt_dimer.dat",
    method="sapt0",  # or sapt2, sapt2+, etc.
    basis="jun-cc-pvdz",
    freeze_core=True,
)

Q-Chem EFP Writer

Generates Q-Chem input for EFP calculations:

from autofragment.io import write_qchem_efp

write_qchem_efp(
    fragments,
    "efp_calc.in",
    qm_fragment_indices=[0],  # First fragment is QM
    method="hf",
    basis="6-31g*",
)

PDB Fragment Writer

Writes fragments encoded in PDB format:

from autofragment.io import write_pdb_fragments

# Fragments as separate chains (A, B, C, ...)
write_pdb_fragments(fragments, "output.pdb", mode="chains")

# Fragments as separate MODELs
write_pdb_fragments(fragments, "output.pdb", mode="models")

# Fragments as separate residues
write_pdb_fragments(fragments, "output.pdb", mode="residues")

Format Registry

The FormatRegistry provides a unified interface for format handling:

from autofragment.io import FormatRegistry

# Auto-detect format from extension
system = FormatRegistry.read("molecule.pdb")

# List supported formats
formats = FormatRegistry.supported_formats()
print(f"Readers: {formats['read']}")
print(f"Writers: {formats['write']}")

# Register custom format
def my_reader(filepath):
    # Custom parsing logic
    return ChemicalSystem(...)

FormatRegistry.register_reader(
    "myformat",
    my_reader,
    extensions=[".mfmt"],
    description="My custom format"
)

Coordinate Units

Different formats use different coordinate units:

Format

Native Units

AutoFragment Handling

PDB, MOL2, SDF

Angstroms

Direct read

QCSchema

Bohr

Converted to Angstroms on read

VASP

Angstroms or fractional

Converted to Cartesian Angstroms

CIF

Fractional

Converted to Cartesian Angstroms

Error Handling

All readers raise appropriate exceptions:

from autofragment.io import read_pdb

try:
    system = read_pdb("missing.pdb")
except FileNotFoundError:
    print("File not found")

try:
    system = read_pdb("malformed.pdb")
except ValueError as e:
    print(f"Parse error: {e}")

Tips for Large Files

For large molecular systems:

# SDF: Process molecules one at a time
from autofragment.io.readers.sdf import read_sdf_multi

for system in read_sdf_multi("large_library.sdf"):
    process(system)

# PDB: Read only specific model
system = read_pdb("trajectory.pdb", model=100)

See Also