Fragmentation Algorithms Guide
AutoFragment provides a suite of algorithms for molecular fragmentation, ranging from simple clustering to advanced graph partitioning. This guide covers how to choose, configure, and combine these algorithms for optimal results.
Overview
Fragmentation in AutoFragment typically follows one of two paradigms:
Clustering: Spatial-based partitioning using atom/molecule coordinates.
Graph Partitioning: Topology-based partitioning using bond connectivity.
Clustering Algorithms
Clustering is fast and effective for systems without a complex covalent network, such as water clusters or solvent boxes.
from autofragment import MolecularPartitioner
# Use standard k-means (default)
partitioner = MolecularPartitioner(
n_fragments=10,
method="kmeans"
)
tree = partitioner.partition(system)
Available clustering methods:
kmeans: Standard k-means (default, no extra dependencies).kmeans_constrained: K-means with size constraints (requirespip install autofragment[balanced]).spectral: Spectral clustering on the adjacency matrix.agglomerative: Hierarchical clustering.gmm: Gaussian Mixture Models.birch: BIRCH incremental clustering.geom_planes: Geometric plane-based partitioning.
Topology-Assisted Refinement
For systems where pure spatial clustering is ambiguous, you can apply topology-based post-refinement to improve chemical locality.
from autofragment import MolecularPartitioner
partitioner = MolecularPartitioner(
n_fragments=8,
method="kmeans",
topology_refine=True,
topology_mode="graph", # or "euclidean"
topology_hops=1,
topology_layers=1,
topology_k_per_layer=6,
topology_bond_policy="infer" # or "strict"
)
tree = partitioner.partition(system)
This refinement keeps default behavior unchanged when topology_refine=False.
K-Means Seeding Strategies
AutoFragment provides pluggable initialization strategies for k-means clustering, giving better control over initial centroid placement:
from autofragment import MolecularPartitioner
# Use PCA-based seeding
partitioner = MolecularPartitioner(
n_fragments=4, method="kmeans", init_strategy="pca"
)
Available strategies:
Strategy |
Description |
|---|---|
|
Angular sectors in the PCA plane |
|
Binning along the first principal component |
|
Binning along a Cartesian axis (auto-detected or explicit) |
|
XY angular sectors around the center of mass |
For the axis strategy, you can specify which axis to use:
partitioner = MolecularPartitioner(
n_fragments=4, method="kmeans",
init_strategy={"strategy": "axis", "axis": "z"}
)
Seeding strategies also work with kmeans_constrained. They are silently ignored
for methods that don’t support custom initialization (agglomerative, spectral, etc.).
Direct Seeding API
from autofragment.algorithms.seeding import compute_seeds, SEEDING_STRATEGIES
# Compute seed centroids directly
seeds = compute_seeds(centroids, n_clusters=4, strategy="pca")
# seeds.shape == (4, 3)
Graph Partitioning
Graph-based methods use the covalent connectivity of the molecule to find natural “cut points,” avoiding breaking strong bonds or rings.
Min-Cut (Stoer-Wagner)
Finds the global minimum cut in the molecular graph.
from autofragment.algorithms.graph_partition import min_cut_partition
# Find the easiest way to split a molecule in two
cut_edges, partition1, partition2 = min_cut_partition(molecular_graph)
Balanced Partitioning (Kernighan-Lin)
Recursive bisection that prioritizes equal-sized fragments while minimizing the weight of cut bonds.
from autofragment.algorithms.graph_partition import balanced_partition
# Split into 4 balanced fragments
groups = balanced_partition(molecular_graph, n_fragments=4)
Community Detection (Louvain)
Finds “communities” or natural clusters based on modularity. Excellent for discovery-based fragmentation.
from autofragment.algorithms.graph_partition import community_partition
# Discover natural fragments
communities = community_partition(molecular_graph, algorithm="louvain")
Scoring
The scoring system allows you to quantify the quality of a fragmentation:
from autofragment.algorithms.scoring import FragmentationScore
score = FragmentationScore(
bond_penalty=2.5,
size_variance=0.1,
interface_score=0.8,
integrity_score=0.95,
)
print(f"Total Score: {score.total()}")
print(f"Bond Penalty: {score.bond_penalty}")
print(f"Size Variance: {score.size_variance}")
Preset scoring weights for common use cases:
from autofragment.algorithms.scoring import ScoringWeights
# For FMO calculations
weights = ScoringWeights.for_fmo()
# For many-body expansion
weights = ScoringWeights.for_mbe()
Choosing the Right Algorithm
System Type |
Recommended Method |
Why? |
|---|---|---|
Water Clusters |
|
Spatial proximity is key; uniform sizes. |
Large Water Clusters |
|
Hierarchical grouping for HMBE workflows. |
Proteins |
|
Respects biological units and capping. |
Large Molecules |
|
Maintains chemical connectivity while balancing load. |
Disordered Solids |
|
Finds natural clusters in complex networks. |
See the Theory Section for the mathematical details behind these algorithms.