I/O Formats Guide
AutoFragment provides comprehensive I/O support for reading molecular structure files and writing fragmented systems to various quantum chemistry program formats.
Overview
The I/O module (autofragment.io) supports:
11 input formats for reading molecular structures
12 output formats for writing fragment-annotated input files
FormatRegistry for extensible format handling
Reading Molecular Structures
Quick Start
from autofragment import io
# Read from various formats (auto-detection by extension)
system = io.read_pdb("protein.pdb")
system = io.read_mol2("ligand.mol2")
system = io.read_sdf("compounds.sdf")
system = io.read_xyz("water64.xyz")
# All readers return ChemicalSystem objects
print(f"Atoms: {system.n_atoms}")
print(f"Bonds: {system.n_bonds}")
For isolated molecule lists, use the explicit XYZ helper:
molecules = io.read_xyz_molecules("water64.xyz")
Supported Input Formats
Format |
Function |
Extensions |
Description |
|---|---|---|---|
PDB |
|
|
Protein Data Bank with CONECT records |
MOL2 |
|
|
Tripos MOL2 with atom types and charges |
SDF/MOL |
|
|
MDL V2000/V3000 structure files |
QCSchema |
|
|
MolSSI standard JSON format |
GAMESS |
|
|
GAMESS input file geometry |
Psi4 |
|
|
Psi4 molecule blocks |
Q-Chem |
|
|
Q-Chem $molecule sections |
ORCA |
|
|
ORCA *xyz blocks |
NWChem |
|
|
NWChem geometry blocks |
VASP |
|
|
VASP POSCAR/CONTCAR files |
CIF |
|
|
Crystallographic Information Files |
PDB Reader
The PDB reader supports:
ATOM and HETATM records
CONECT records for explicit bonds
Multi-MODEL files (select with
model=N)Distance-based bond inference
from autofragment.io import read_pdb
# Read with explicit bonds only
system = read_pdb("protein.pdb", infer_bonds=False)
# Read specific model from NMR ensemble
system = read_pdb("ensemble.pdb", model=5)
# Access metadata
print(system.metadata["source_file"])
MOL2 Reader
The MOL2 reader handles:
Tripos atom type mapping (C.3, N.ar, etc.)
Bond orders (1=single, 2=double, ar=aromatic)
Partial charges
Multi-molecule files
from autofragment.io import read_mol2
system = read_mol2("ligand.mol2")
# Charges are stored per atom
for atom in system.atoms:
print(f"{atom.symbol}: charge={atom.charge}")
SDF/MOL Reader
Supports both MDL V2000 (fixed-width) and V3000 (keyword-based) formats:
from autofragment.io import read_sdf, read_sdf_multi
# Read first molecule
system = read_sdf("compound.sdf")
# Read all molecules from multi-molecule file
systems = read_sdf_multi("library.sdf")
for i, sys in enumerate(systems):
print(f"Molecule {i}: {sys.n_atoms} atoms")
QCSchema Reader
Reads MolSSI QCSchema JSON format with automatic Bohr→Angstrom conversion:
from autofragment.io import read_qcschema
system = read_qcschema("molecule.json")
# Access molecular properties
print(f"Charge: {system.metadata['molecular_charge']}")
print(f"Multiplicity: {system.metadata['molecular_multiplicity']}")
VASP Reader
Reads POSCAR/CONTCAR files with lattice vector handling:
from autofragment.io import read_poscar
system = read_poscar("POSCAR")
# Lattice vectors are in metadata
lattice = system.metadata["lattice"]
Writing Fragment Files
Quick Start
from autofragment.io import write_gamess_fmo, write_psi4_sapt
# After fragmenting a system...
write_gamess_fmo(fragments, "fmo_calc.inp", basis="6-31G*")
write_psi4_sapt(fragments[:2], "sapt_dimer.dat")
Supported Output Formats
Format |
Function |
Description |
|---|---|---|
GAMESS FMO |
|
FMO with $FMO, $FMOBND, $DATA |
GAMESS EFMO |
|
Effective Fragment MO |
GAMESS EFP |
|
Effective Fragment Potential |
Psi4 SAPT |
|
SAPT with fragment separators |
Psi4 Fragment |
|
General fragment calculation |
Q-Chem EFP |
|
EFP with $molecule sections |
Q-Chem XSAPT |
|
Extended SAPT |
Q-Chem FRAGMO |
|
Fragment MO analysis |
NWChem |
|
Fragment calculation |
NWChem BSSE |
|
Counterpoise correction |
ORCA |
|
Single fragment job |
ORCA Multi-job |
|
Separate jobs per fragment |
QCSchema |
|
JSON with fragment annotations |
Molpro |
|
SAPT calculation |
Turbomole |
|
Coord file |
CFOUR |
|
ZMAT format |
XYZ |
|
XYZ with fragment markers |
PDB |
|
PDB with fragments as chains |
GAMESS FMO Writer
Generates complete GAMESS input for Fragment Molecular Orbital calculations:
from autofragment.io import write_gamess_fmo
write_gamess_fmo(
fragments,
"fmo_calc.inp",
basis="6-31G*",
method="RHF",
runtype="energy",
memory=500, # MW
fmo_level=2,
nbody=2,
)
Output includes:
$CONTRL- Control options$SYSTEM- Memory settings$BASIS- Basis set specification$FMO- Fragment definitions (NFRAG, INDAT, charges, multiplicities)$GDDI- Parallel settings$DATA- Atomic coordinates
Psi4 SAPT Writer
Generates Psi4 input for Symmetry-Adapted Perturbation Theory:
from autofragment.io import write_psi4_sapt
# SAPT requires exactly 2 fragments (dimer)
write_psi4_sapt(
[monomer_a, monomer_b],
"sapt_dimer.dat",
method="sapt0", # or sapt2, sapt2+, etc.
basis="jun-cc-pvdz",
freeze_core=True,
)
Q-Chem EFP Writer
Generates Q-Chem input for EFP calculations:
from autofragment.io import write_qchem_efp
write_qchem_efp(
fragments,
"efp_calc.in",
qm_fragment_indices=[0], # First fragment is QM
method="hf",
basis="6-31g*",
)
PDB Fragment Writer
Writes fragments encoded in PDB format:
from autofragment.io import write_pdb_fragments
# Fragments as separate chains (A, B, C, ...)
write_pdb_fragments(fragments, "output.pdb", mode="chains")
# Fragments as separate MODELs
write_pdb_fragments(fragments, "output.pdb", mode="models")
# Fragments as separate residues
write_pdb_fragments(fragments, "output.pdb", mode="residues")
Format Registry
The FormatRegistry provides a unified interface for format handling:
from autofragment.io import FormatRegistry
# Auto-detect format from extension
system = FormatRegistry.read("molecule.pdb")
# List supported formats
formats = FormatRegistry.supported_formats()
print(f"Readers: {formats['read']}")
print(f"Writers: {formats['write']}")
# Register custom format
def my_reader(filepath):
# Custom parsing logic
return ChemicalSystem(...)
FormatRegistry.register_reader(
"myformat",
my_reader,
extensions=[".mfmt"],
description="My custom format"
)
Coordinate Units
Different formats use different coordinate units:
Format |
Native Units |
AutoFragment Handling |
|---|---|---|
PDB, MOL2, SDF |
Angstroms |
Direct read |
QCSchema |
Bohr |
Converted to Angstroms on read |
VASP |
Angstroms or fractional |
Converted to Cartesian Angstroms |
CIF |
Fractional |
Converted to Cartesian Angstroms |
Error Handling
All readers raise appropriate exceptions:
from autofragment.io import read_pdb
try:
system = read_pdb("missing.pdb")
except FileNotFoundError:
print("File not found")
try:
system = read_pdb("malformed.pdb")
except ValueError as e:
print(f"Parse error: {e}")
Tips for Large Files
For large molecular systems:
# SDF: Process molecules one at a time
from autofragment.io.readers.sdf import read_sdf_multi
for system in read_sdf_multi("large_library.sdf"):
process(system)
# PDB: Read only specific model
system = read_pdb("trajectory.pdb", model=100)
See Also
Python API - Full Python API reference
Core Concepts - ChemicalSystem and Fragment types
autofragment API - Auto-generated API documentation