Water Cluster Fragmentation Tutorial

This tutorial demonstrates a complete workflow for fragmenting water clusters and writing quantum chemistry input files for fragment-based calculations. Water clusters are ideal starting points because individual water molecules are natural fragments with well-defined boundaries.

Overview

We’ll cover:

  1. Loading a water cluster from an XYZ file

  2. Partitioning into fragment groups with MolecularPartitioner

  3. Choosing a clustering method (including balanced partitioning)

  4. Tiered hierarchical fragmentation for large clusters

  5. Writing fragment-based input files for GAMESS, Psi4, and other programs

  6. Exporting results as QCSchema JSON

Prerequisites

import autofragment as af
from autofragment.partitioners import MolecularPartitioner
from autofragment.io.writers import (
    write_gamess_fmo,
    write_psi4_sapt,
    write_psi4_fragment,
)

Step 1: Load Water Cluster

From XYZ File

read_xyz auto-detects water molecules (groups of 3 atoms each) by default:

system = af.io.read_xyz("water20.xyz")

print(f"Total atoms: {len(system.atoms)}")
print(f"Molecules: {len(system.metadata.get('molecule_atom_indices', []))}")

Customizing Molecule Detection

For non-water systems or to disable water validation:

# Explicit atoms-per-molecule (e.g., methanol with 6 atoms each)
system = af.io.read_xyz("methanol_cluster.xyz", atoms_per_molecule=6)

# Skip water geometry validation
system = af.io.read_xyz("water20.xyz", validate_water=False)

Example XYZ Format

60
(H2O)20 cluster
O    0.000    0.000    0.000
H    0.957    0.000    0.000
H   -0.240    0.927    0.000
O    2.800    0.000    0.000
...

Step 2: Partition into Fragments

Basic Partitioning

partitioner = MolecularPartitioner(n_fragments=4, method="kmeans")
tree = partitioner.partition(system)

print(f"Fragments: {tree.n_fragments}")
print(f"Total atoms: {tree.n_atoms}")

for frag in tree.fragments:
    print(f"  {frag.id}: {len(frag.symbols)} atoms")

Available Clustering Methods

MolecularPartitioner supports several clustering algorithms via the method parameter:

Method

Description

Extra required

"kmeans"

Standard k-means (default)

"kmeans_constrained"

K-means with balanced cluster sizes

pip install autofragment[balanced]

"agglomerative"

Hierarchical agglomerative clustering

"spectral"

Spectral clustering

pip install autofragment[graph]

"gmm"

Gaussian Mixture Model

"birch"

BIRCH clustering

"geom_planes"

Geometric partitioning by cutting planes

Balanced Partitioning with kmeans_constrained

For many-body expansion (MBE) and FMO workflows, equally-sized fragments are often critical for load balancing and accuracy. The kmeans_constrained method enforces balanced cluster sizes:

pip install "autofragment[balanced]"
partitioner = MolecularPartitioner(
    n_fragments=4,
    method="kmeans_constrained",
)
tree = partitioner.partition(system)

# Verify balanced sizes
sizes = [len(frag.symbols) for frag in tree.fragments]
print(f"Fragment sizes: {sizes}")
# e.g., [15, 15, 15, 15] for 20 waters into 4 fragments

You can also enable balanced partitioning explicitly with any method that supports it:

partitioner = MolecularPartitioner(
    n_fragments=4,
    method="kmeans",
    strict_balanced=True,  # automatically uses kmeans_constrained
)

Topology-Aware Refinement

For better chemical locality, enable topology refinement which reassigns molecules based on neighborhood overlap:

partitioner = MolecularPartitioner(
    n_fragments=4,
    method="kmeans",
    topology_refine=True,
    topology_mode="graph",
    topology_hops=1,
)
tree = partitioner.partition(system)

Tiered Hierarchical Partitioning

For large clusters, tiered partitioning creates a hierarchy of fragments. This is useful for hierarchical many-body expansion (HMBE) workflows where you need multiple levels of spatial grouping:

# 2-tier: 4 primary fragments, each with 4 sub-fragments
partitioner = MolecularPartitioner(
    tiers=2, n_primary=4, n_secondary=4, method="kmeans"
)
tree = partitioner.partition(system)

print(f"Primary fragments: {tree.n_primary}")
print(f"Total fragments: {tree.n_fragments}")

for pf in tree.fragments:
    print(f"  {pf.id}: {len(pf.fragments)} sub-fragments")
    for sf in pf.fragments:
        print(f"    {sf.id}: {sf.n_atoms} atoms")

For even finer control, use 3-tier partitioning:

partitioner = MolecularPartitioner(
    tiers=3, n_primary=2, n_secondary=2, n_tertiary=2, method="kmeans"
)
tree = partitioner.partition(system)
# PF1 -> PF1_SF1 -> PF1_SF1_TF1, PF1_SF1_TF2, ...

K-Means Seeding Strategies

Control the initial centroids for k-means clustering:

# PCA-based seeding
partitioner = MolecularPartitioner(
    n_fragments=4, method="kmeans", init_strategy="pca"
)

# Per-tier seeding overrides (tiered mode)
partitioner = MolecularPartitioner(
    tiers=2, n_primary=4, n_secondary=4,
    method="kmeans",
    init_strategy="pca",              # default for all tiers
    init_strategy_secondary="radial"  # override for sub-fragments
)

Available strategies: "halfplane", "pca", "axis", "radial".

Step 3: Write Input Files

GAMESS FMO

write_gamess_fmo(
    tree.fragments,
    "water_fmo.inp",
    basis="aug-cc-pVDZ",
    method="MP2",
    runtype="energy",
    fmo_level=2,
    nbody=2,
)

GAMESS EFMO

from autofragment.io.writers import write_gamess_efmo

write_gamess_efmo(
    tree.fragments,
    "water_efmo.inp",
    basis="aug-cc-pVDZ",
)

Psi4 SAPT (Dimer Interaction Energies)

For symmetry-adapted perturbation theory between fragment pairs:

# Select two fragments for a SAPT calculation
dimer = [tree.fragments[0], tree.fragments[1]]

write_psi4_sapt(
    dimer,
    "water_sapt.dat",
    method="sapt2+",
    basis="aug-cc-pVDZ",
)

Psi4 General Fragment Calculation

write_psi4_fragment(
    tree.fragments,
    "water_frags.dat",
    method="mp2",
    basis="aug-cc-pVDZ",
    runtype="energy",
)

Other Programs

autofragment also provides writers for Q-Chem, NWChem, ORCA, Molpro, Turbomole, and CFOUR. See Output Formats for the full list.

Step 4: Export and Save Results

QCSchema JSON

# Save the full FragmentTree
tree.to_json("water20_fragments.json")

# Reload later
tree = af.FragmentTree.from_json("water20_fragments.json")

QCSchema Writer

from autofragment.io.writers import write_qcschema

write_qcschema(system, "water20_qcschema.json")

XYZ Fragment Output

from autofragment.io.writers import write_xyz_fragments

write_xyz_fragments(tree.fragments, "water20_fragments.xyz")

Step 5: Quick API

For simple workflows, the convenience function handles reading and partitioning in a single call:

import autofragment as af

tree = af.partition_xyz("water20.xyz", n_fragments=4, method="kmeans")
tree.to_json("water20_fragments.json")

for frag in tree.fragments:
    print(f"Fragment {frag.id}: {len(frag.symbols)} atoms")

Complete Example

#!/usr/bin/env python
"""Fragment a water cluster and write GAMESS FMO input."""

import autofragment as af
from autofragment.partitioners import MolecularPartitioner
from autofragment.io.writers import write_gamess_fmo

def main():
    # Load
    system = af.io.read_xyz("water20.xyz")
    print(f"Loaded {len(system.atoms)} atoms")

    # Partition with balanced clusters
    partitioner = MolecularPartitioner(
        n_fragments=4,
        method="kmeans_constrained",  # requires autofragment[balanced]
    )
    tree = partitioner.partition(system)
    print(f"Created {tree.n_fragments} fragments")

    for frag in tree.fragments:
        print(f"  {frag.id}: {len(frag.symbols)} atoms")

    # Write GAMESS FMO input
    write_gamess_fmo(
        tree.fragments,
        "water20_fmo.inp",
        basis="aug-cc-pVDZ",
        method="MP2",
        fmo_level=2,
        nbody=2,
    )
    print("Wrote water20_fmo.inp")

    # Also save as JSON for later use
    tree.to_json("water20_fragments.json")

if __name__ == "__main__":
    main()

Tips and Best Practices

  1. Basis set: Use augmented basis sets (aug-cc-pVDZ, aug-cc-pVTZ) for accurate hydrogen bonding.

  2. Balanced fragments: Use method="kmeans_constrained" (or strict_balanced=True) when equal-sized fragments matter for your workflow.

  3. Fragment count: Start with fewer fragments and increase; more fragments means more n-body terms.

  4. Topology refinement: Enable topology_refine=True for better spatial locality in larger clusters.

  5. Parallelization: Fragment-based QC calculations are embarrassingly parallel — each fragment or pair can run independently.

Next Steps