Fragmentation Algorithms Guide

AutoFragment provides a suite of algorithms for molecular fragmentation, ranging from simple clustering to advanced graph partitioning. This guide covers how to choose, configure, and combine these algorithms for optimal results.

Overview

Fragmentation in AutoFragment typically follows one of two paradigms:

Clustering: Spatial-based partitioning using atom/molecule coordinates.
Graph Partitioning: Topology-based partitioning using bond connectivity.

Clustering Algorithms

Clustering is fast and effective for systems without a complex covalent network, such as water clusters or solvent boxes.

from autofragment import MolecularPartitioner

# Use standard k-means (default)
partitioner = MolecularPartitioner(
    n_fragments=10,
    method="kmeans"
)
tree = partitioner.partition(system)

Available clustering methods:

kmeans: Standard k-means (default, no extra dependencies).
kmeans_constrained: K-means with size constraints (requires pip install autofragment[balanced]).
spectral: Spectral clustering on the adjacency matrix.
agglomerative: Hierarchical clustering.
gmm: Gaussian Mixture Models.
birch: BIRCH incremental clustering.
geom_planes: Geometric plane-based partitioning.

K-Means Seeding Strategies

AutoFragment provides pluggable initialization strategies for k-means clustering, giving better control over initial centroid placement:

from autofragment import MolecularPartitioner

# Use PCA-based seeding
partitioner = MolecularPartitioner(
    n_fragments=4, method="kmeans", init_strategy="pca"
)

Available strategies:

Strategy	Description
`"halfplane"`	Angular sectors in the PCA plane
`"pca"`	Binning along the first principal component
`"axis"`	Binning along a Cartesian axis (auto-detected or explicit)
`"radial"`	XY angular sectors around the center of mass

For the axis strategy, you can specify which axis to use:

partitioner = MolecularPartitioner(
    n_fragments=4, method="kmeans",
    init_strategy={"strategy": "axis", "axis": "z"}
)

Seeding strategies also work with kmeans_constrained. They are silently ignored for methods that don’t support custom initialization (agglomerative, spectral, etc.).

Direct Seeding API

from autofragment.algorithms.seeding import compute_seeds, SEEDING_STRATEGIES

# Compute seed centroids directly
seeds = compute_seeds(centroids, n_clusters=4, strategy="pca")
# seeds.shape == (4, 3)

Graph Partitioning

Graph-based methods use the covalent connectivity of the molecule to find natural “cut points,” avoiding breaking strong bonds or rings.

Min-Cut (Stoer-Wagner)

Finds the global minimum cut in the molecular graph.

from autofragment.algorithms.graph_partition import min_cut_partition

# Find the easiest way to split a molecule in two
cut_edges, partition1, partition2 = min_cut_partition(molecular_graph)

Balanced Partitioning (Kernighan-Lin)

Recursive bisection that prioritizes equal-sized fragments while minimizing the weight of cut bonds.

from autofragment.algorithms.graph_partition import balanced_partition

# Split into 4 balanced fragments
groups = balanced_partition(molecular_graph, n_fragments=4)

Community Detection (Louvain)

Finds “communities” or natural clusters based on modularity. Excellent for discovery-based fragmentation.

from autofragment.algorithms.graph_partition import community_partition

# Discover natural fragments
communities = community_partition(molecular_graph, algorithm="louvain")

Scoring

The scoring system allows you to quantify the quality of a fragmentation:

from autofragment.algorithms.scoring import FragmentationScore

score = FragmentationScore(
    bond_penalty=2.5,
    size_variance=0.1,
    interface_score=0.8,
    integrity_score=0.95,
)
print(f"Total Score: {score.total()}")
print(f"Bond Penalty: {score.bond_penalty}")
print(f"Size Variance: {score.size_variance}")

Preset scoring weights for common use cases:

from autofragment.algorithms.scoring import ScoringWeights

# For FMO calculations
weights = ScoringWeights.for_fmo()

# For many-body expansion
weights = ScoringWeights.for_mbe()

Choosing the Right Algorithm

System Type	Recommended Method	Why?
Water Clusters	`kmeans` or `kmeans_constrained`	Spatial proximity is key; uniform sizes.
Large Water Clusters	`kmeans` with `tiers=2`	Hierarchical grouping for HMBE workflows.
Proteins	`bio` (BioPartitioner)	Respects biological units and capping.
Large Molecules	`balanced_partition`	Maintains chemical connectivity while balancing load.
Disordered Solids	`community_partition`	Finds natural clusters in complex networks.

See the Theory Section for the mathematical details behind these algorithms.