# Fragmentation Algorithms Guide

AutoFragment provides a suite of algorithms for molecular fragmentation, ranging from simple clustering to advanced graph partitioning. This guide covers how to choose, configure, and combine these algorithms for optimal results.

## Overview

Fragmentation in AutoFragment typically follows one of two paradigms:

1.  **Clustering**: Spatial-based partitioning using atom/molecule coordinates.
2.  **Graph Partitioning**: Topology-based partitioning using bond connectivity.

## Clustering Algorithms

Clustering is fast and effective for systems without a complex covalent network, such as water clusters or solvent boxes.

```python
from autofragment import MolecularPartitioner

# Use standard k-means (default)
partitioner = MolecularPartitioner(
    n_fragments=10,
    method="kmeans"
)
tree = partitioner.partition(system)
```

Available clustering methods:
- `kmeans`: Standard k-means (default, no extra dependencies).
- `kmeans_constrained`: K-means with size constraints (requires `pip install autofragment[balanced]`).
- `spectral`: Spectral clustering on the adjacency matrix.
- `agglomerative`: Hierarchical clustering.
- `gmm`: Gaussian Mixture Models.
- `birch`: BIRCH incremental clustering.
- `geom_planes`: Geometric plane-based partitioning.

### Topology-Assisted Refinement

For systems where pure spatial clustering is ambiguous, you can apply topology-based post-refinement to improve chemical locality.

```python
from autofragment import MolecularPartitioner

partitioner = MolecularPartitioner(
    n_fragments=8,
    method="kmeans",
    topology_refine=True,
    topology_mode="graph",      # or "euclidean"
    topology_hops=1,
    topology_layers=1,
    topology_k_per_layer=6,
    topology_bond_policy="infer" # or "strict"
)

tree = partitioner.partition(system)
```

This refinement keeps default behavior unchanged when `topology_refine=False`.

## K-Means Seeding Strategies

AutoFragment provides pluggable initialization strategies for k-means clustering,
giving better control over initial centroid placement:

```python
from autofragment import MolecularPartitioner

# Use PCA-based seeding
partitioner = MolecularPartitioner(
    n_fragments=4, method="kmeans", init_strategy="pca"
)
```

Available strategies:

| Strategy | Description |
|----------|-------------|
| `"halfplane"` | Angular sectors in the PCA plane |
| `"pca"` | Binning along the first principal component |
| `"axis"` | Binning along a Cartesian axis (auto-detected or explicit) |
| `"radial"` | XY angular sectors around the center of mass |

For the `axis` strategy, you can specify which axis to use:

```python
partitioner = MolecularPartitioner(
    n_fragments=4, method="kmeans",
    init_strategy={"strategy": "axis", "axis": "z"}
)
```

Seeding strategies also work with `kmeans_constrained`. They are silently ignored
for methods that don't support custom initialization (agglomerative, spectral, etc.).

### Direct Seeding API

```python
from autofragment.algorithms.seeding import compute_seeds, SEEDING_STRATEGIES

# Compute seed centroids directly
seeds = compute_seeds(centroids, n_clusters=4, strategy="pca")
# seeds.shape == (4, 3)
```

## Graph Partitioning

Graph-based methods use the covalent connectivity of the molecule to find natural "cut points," avoiding breaking strong bonds or rings.

### Min-Cut (Stoer-Wagner)

Finds the global minimum cut in the molecular graph.

```python
from autofragment.algorithms.graph_partition import min_cut_partition

# Find the easiest way to split a molecule in two
cut_edges, partition1, partition2 = min_cut_partition(molecular_graph)
```

### Balanced Partitioning (Kernighan-Lin)

Recursive bisection that prioritizes equal-sized fragments while minimizing the weight of cut bonds.

```python
from autofragment.algorithms.graph_partition import balanced_partition

# Split into 4 balanced fragments
groups = balanced_partition(molecular_graph, n_fragments=4)
```

### Community Detection (Louvain)

Finds "communities" or natural clusters based on modularity. Excellent for discovery-based fragmentation.

```python
from autofragment.algorithms.graph_partition import community_partition

# Discover natural fragments
communities = community_partition(molecular_graph, algorithm="louvain")
```

## Scoring

The scoring system allows you to quantify the quality of a fragmentation:

```python
from autofragment.algorithms.scoring import FragmentationScore

score = FragmentationScore(
    bond_penalty=2.5,
    size_variance=0.1,
    interface_score=0.8,
    integrity_score=0.95,
)
print(f"Total Score: {score.total()}")
print(f"Bond Penalty: {score.bond_penalty}")
print(f"Size Variance: {score.size_variance}")
```

Preset scoring weights for common use cases:

```python
from autofragment.algorithms.scoring import ScoringWeights

# For FMO calculations
weights = ScoringWeights.for_fmo()

# For many-body expansion
weights = ScoringWeights.for_mbe()
```

## Choosing the Right Algorithm

| System Type | Recommended Method | Why? |
|-------------|--------------------|------|
| Water Clusters | `kmeans` or `kmeans_constrained` | Spatial proximity is key; uniform sizes. |
| Large Water Clusters | `kmeans` with `tiers=2` | Hierarchical grouping for HMBE workflows. |
| Proteins | `bio` (BioPartitioner) | Respects biological units and capping. |
| Large Molecules | `balanced_partition` | Maintains chemical connectivity while balancing load. |
| Disordered Solids | `community_partition` | Finds natural clusters in complex networks. |

See the [Theory Section](../theory/scoring.md) for the mathematical details behind these algorithms.