Fragmentation Algorithms Guide

AutoFragment provides a suite of algorithms for molecular fragmentation, ranging from simple clustering to advanced graph partitioning. This guide covers how to choose, configure, and combine these algorithms for optimal results.

Overview

Fragmentation in AutoFragment typically follows one of two paradigms:

  1. Clustering: Spatial-based partitioning using atom/molecule coordinates.

  2. Graph Partitioning: Topology-based partitioning using bond connectivity.

Clustering Algorithms

Clustering is fast and effective for systems without a complex covalent network, such as water clusters or solvent boxes.

from autofragment import MolecularPartitioner

# Use standard k-means (default)
partitioner = MolecularPartitioner(
    n_fragments=10,
    method="kmeans"
)
tree = partitioner.partition(system)

Available clustering methods:

  • kmeans: Standard k-means (default, no extra dependencies).

  • kmeans_constrained: K-means with size constraints (requires pip install autofragment[balanced]).

  • spectral: Spectral clustering on the adjacency matrix.

  • agglomerative: Hierarchical clustering.

  • gmm: Gaussian Mixture Models.

  • birch: BIRCH incremental clustering.

  • geom_planes: Geometric plane-based partitioning.

Topology-Assisted Refinement

For systems where pure spatial clustering is ambiguous, you can apply topology-based post-refinement to improve chemical locality.

from autofragment import MolecularPartitioner

partitioner = MolecularPartitioner(
    n_fragments=8,
    method="kmeans",
    topology_refine=True,
    topology_mode="graph",      # or "euclidean"
    topology_hops=1,
    topology_layers=1,
    topology_k_per_layer=6,
    topology_bond_policy="infer" # or "strict"
)

tree = partitioner.partition(system)

This refinement keeps default behavior unchanged when topology_refine=False.

K-Means Seeding Strategies

AutoFragment provides pluggable initialization strategies for k-means clustering, giving better control over initial centroid placement:

from autofragment import MolecularPartitioner

# Use PCA-based seeding
partitioner = MolecularPartitioner(
    n_fragments=4, method="kmeans", init_strategy="pca"
)

Available strategies:

Strategy

Description

"halfplane"

Angular sectors in the PCA plane

"pca"

Binning along the first principal component

"axis"

Binning along a Cartesian axis (auto-detected or explicit)

"radial"

XY angular sectors around the center of mass

For the axis strategy, you can specify which axis to use:

partitioner = MolecularPartitioner(
    n_fragments=4, method="kmeans",
    init_strategy={"strategy": "axis", "axis": "z"}
)

Seeding strategies also work with kmeans_constrained. They are silently ignored for methods that don’t support custom initialization (agglomerative, spectral, etc.).

Direct Seeding API

from autofragment.algorithms.seeding import compute_seeds, SEEDING_STRATEGIES

# Compute seed centroids directly
seeds = compute_seeds(centroids, n_clusters=4, strategy="pca")
# seeds.shape == (4, 3)

Graph Partitioning

Graph-based methods use the covalent connectivity of the molecule to find natural “cut points,” avoiding breaking strong bonds or rings.

Min-Cut (Stoer-Wagner)

Finds the global minimum cut in the molecular graph.

from autofragment.algorithms.graph_partition import min_cut_partition

# Find the easiest way to split a molecule in two
cut_edges, partition1, partition2 = min_cut_partition(molecular_graph)

Balanced Partitioning (Kernighan-Lin)

Recursive bisection that prioritizes equal-sized fragments while minimizing the weight of cut bonds.

from autofragment.algorithms.graph_partition import balanced_partition

# Split into 4 balanced fragments
groups = balanced_partition(molecular_graph, n_fragments=4)

Community Detection (Louvain)

Finds “communities” or natural clusters based on modularity. Excellent for discovery-based fragmentation.

from autofragment.algorithms.graph_partition import community_partition

# Discover natural fragments
communities = community_partition(molecular_graph, algorithm="louvain")

Scoring

The scoring system allows you to quantify the quality of a fragmentation:

from autofragment.algorithms.scoring import FragmentationScore

score = FragmentationScore(
    bond_penalty=2.5,
    size_variance=0.1,
    interface_score=0.8,
    integrity_score=0.95,
)
print(f"Total Score: {score.total()}")
print(f"Bond Penalty: {score.bond_penalty}")
print(f"Size Variance: {score.size_variance}")

Preset scoring weights for common use cases:

from autofragment.algorithms.scoring import ScoringWeights

# For FMO calculations
weights = ScoringWeights.for_fmo()

# For many-body expansion
weights = ScoringWeights.for_mbe()

Choosing the Right Algorithm

System Type

Recommended Method

Why?

Water Clusters

kmeans or kmeans_constrained

Spatial proximity is key; uniform sizes.

Large Water Clusters

kmeans with tiers=2

Hierarchical grouping for HMBE workflows.

Proteins

bio (BioPartitioner)

Respects biological units and capping.

Large Molecules

balanced_partition

Maintains chemical connectivity while balancing load.

Disordered Solids

community_partition

Finds natural clusters in complex networks.

See the Theory Section for the mathematical details behind these algorithms.