Fragmentation Scoring Functions

Overview

Scoring functions quantify the quality of a molecular fragmentation to guide optimization algorithms. A good fragmentation should:

Minimize breaking of strong chemical bonds
Produce fragments of reasonable, uniform size
Minimize interfragment interactions
Preserve chemical integrity (functional groups, rings)
Balance computational cost across fragments

Total Score Formulation

The total score is a weighted combination of component scores:

$$ S_{\text{total}} = \sum_{i} w_i \cdot S_i $$

where:

$w_i$ are tunable weights for each component
$S_i$ are individual score components
Higher scores indicate better fragmentations

Alternative formulation as minimization (penalty-based):

$$ P_{\text{total}} = \sum_{i} w_i \cdot P_i $$

where $P_i$ are penalty terms and lower values are better.
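
The two formulations can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the function name `total_score` and the component values are made up, and the penalty form assumes $P_i = -S_i$ for every component.

```python
def total_score(components, weights):
    """Weighted sum S_total = sum_i w_i * S_i over the weighted components."""
    return sum(weights[name] * components[name] for name in weights)

# Illustrative component scores (made-up values, higher is better)
components = {"bond": -2.0, "size": 0.8, "chem": 3.0}
weights = {"bond": 1.0, "size": 0.3, "chem": 0.5}

s_total = total_score(components, weights)

# Penalty form, assuming P_i = -S_i: the ranking is identical, the sign flips
penalties = {name: -value for name, value in components.items()}
p_total = total_score(penalties, weights)
```

Under that assumption, maximizing `s_total` and minimizing `p_total` select the same fragmentation.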

Component Scores

Bond Breaking Penalty ($S_{\text{bond}}$)

The most critical component. It penalizes each broken covalent bond in proportion to its estimated energy and a bond-type multiplier:

$$ S_{\text{bond}} = -\sum_{b \in \text{broken}} E_{\text{bond}}(b) \cdot f_{\text{type}}(b) $$

where:

$E_{\text{bond}}(b)$ is the estimated bond energy
$f_{\text{type}}(b)$ is a multiplier based on bond type

Bond Type Multipliers

Bond Type	$f_{\text{type}}$	Rationale
Single (rotatable)	1.0	Preferred break points
Single (non-rotatable)	1.5	Somewhat penalized
Aromatic	10.0	Strongly discouraged
Double	50.0	Almost forbidden
Triple	100.0	Forbidden

Bond Energy Estimation

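When no tabulated value is available, a heteronuclear bond energy can be approximated by the geometric mean of the two homonuclear bond energies (all values in kcal/mol):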

$$ E_{\text{bond}}(X-Y) \approx \sqrt{E(X-X) \cdot E(Y-Y)} $$

Bond	Energy (kcal/mol)
C-C	83
C-H	99
C-N	73
C-O	86
C=O	180
C=C	146
C≡C	200
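
As a rough sketch of how the bond penalty could be evaluated from the two tables above: the dictionary keys (`single_rotatable`, etc.), the function names, and the `(label, kind)` input format are hypothetical choices for illustration, not part of any confirmed API.

```python
import math

# Multipliers and energies copied from the tables above
TYPE_MULTIPLIER = {
    "single_rotatable": 1.0,
    "single_non_rotatable": 1.5,
    "aromatic": 10.0,
    "double": 50.0,
    "triple": 100.0,
}
BOND_ENERGY = {  # kcal/mol
    "C-C": 83.0, "C-H": 99.0, "C-N": 73.0, "C-O": 86.0,
    "C=O": 180.0, "C=C": 146.0, "C≡C": 200.0,
}

def geometric_mean_energy(e_xx, e_yy):
    """Estimate E(X-Y) from the homonuclear energies E(X-X) and E(Y-Y)."""
    return math.sqrt(e_xx * e_yy)

def bond_score(broken_bonds):
    """S_bond for an iterable of (bond label, bond kind) pairs; always <= 0."""
    return -sum(BOND_ENERGY[label] * TYPE_MULTIPLIER[kind]
                for label, kind in broken_bonds)

broken = [("C-C", "single_rotatable"), ("C-N", "single_non_rotatable")]
s_bond = bond_score(broken)  # -(83 * 1.0 + 73 * 1.5) = -192.5
```

Cutting a single aromatic C=C-like bond with these numbers would cost far more than both cuts above combined, which is the intended effect of the multipliers.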

Size Variance ($S_{\text{size}}$)

Encourages fragments of uniform size to balance computational load:

$$ S_{\text{size}} = -\alpha \cdot \frac{\sigma(n_i)}{\bar{n}} $$

where:

$n_i$ is the number of atoms in fragment $i$
$\bar{n} = \frac{1}{k}\sum_i n_i$ is the mean fragment size
$\sigma(n_i) = \sqrt{\frac{1}{k}\sum_i (n_i - \bar{n})^2}$ is the standard deviation
$\alpha$ is a scaling parameter

Alternative: Use coefficient of variation as penalty:

$$ \text{CV} = \frac{\sigma(n_i)}{\bar{n}} $$

Normalized score (equal to 1 for perfectly uniform fragments, and negative whenever $\text{CV} > 1$):

$$ S_{\text{size}} = 1 - \text{CV} $$
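
Both size-variance forms follow directly from the definitions above. The sketch below uses the population standard deviation (dividing by $k$), as in the formula for $\sigma(n_i)$; the function names are illustrative.

```python
import math

def size_cv(sizes):
    """Coefficient of variation sigma / mean of the fragment sizes."""
    k = len(sizes)
    mean = sum(sizes) / k
    std = math.sqrt(sum((n - mean) ** 2 for n in sizes) / k)
    return std / mean

def size_score(sizes, alpha=1.0):
    """Scaled form: S_size = -alpha * CV."""
    return -alpha * size_cv(sizes)

def size_score_normalized(sizes):
    """Normalized form: S_size = 1 - CV."""
    return 1.0 - size_cv(sizes)

sizes = [10, 12, 8, 10]   # mean 10, variance 2
cv = size_cv(sizes)       # sqrt(2) / 10
```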

Size Range Penalty

Penalize fragments outside acceptable size range:

$$ P_{\text{range}} = \sum_i \max(0, n_i - n_{\max}) + \sum_i \max(0, n_{\min} - n_i) $$

where $n_{\min}$ and $n_{\max}$ are target fragment size bounds.
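
A direct transcription of the range penalty, as a small sketch (the function name is illustrative):

```python
def range_penalty(sizes, n_min, n_max):
    """Total number of atoms by which fragments overshoot n_max or undershoot n_min."""
    over = sum(max(0, n - n_max) for n in sizes)
    under = sum(max(0, n_min - n) for n in sizes)
    return over + under

# One fragment is 2 atoms too small, one is 5 atoms too large
penalty = range_penalty([3, 10, 25], n_min=5, n_max=20)  # 2 + 5 = 7
```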

Interface Score ($S_{\text{interface}}$)

Minimizes interaction energy across fragment boundaries:

$$ S_{\text{interface}} = -\sum_{(F_i, F_j)} E_{\text{int}}(F_i, F_j) $$

Approximations for Interface Energy

Exposed surface area:

$$ E_{\text{int}} \approx \gamma \cdot A_{\text{exposed}} $$

where $\gamma$ is surface tension and $A_{\text{exposed}}$ is solvent-accessible surface area at cut points.

Electrostatic interaction:

$$ E_{\text{int}} \approx \sum_{a \in F_i} \sum_{b \in F_j} \frac{q_a q_b}{r_{ab}} $$

Number of cut bonds (simplified):

$$ E_{\text{int}} \approx N_{\text{cut}} $$
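
The electrostatic approximation can be sketched as a double loop over point charges. This assumes atomic units (charges in $e$, distances in bohr) and a hypothetical fragment representation as a list of `(charge, (x, y, z))` tuples; it is an illustration rather than a production interface-energy routine.

```python
import math
from itertools import combinations

def coulomb_interface(frag_i, frag_j):
    """Sum of q_a * q_b / r_ab over all atom pairs across two fragments."""
    total = 0.0
    for q_a, pos_a in frag_i:
        for q_b, pos_b in frag_j:
            total += q_a * q_b / math.dist(pos_a, pos_b)
    return total

def interface_score(fragments):
    """S_interface = -sum of E_int over each unordered fragment pair."""
    return -sum(coulomb_interface(fi, fj)
                for fi, fj in combinations(fragments, 2))

frag_a = [(0.4, (0.0, 0.0, 0.0))]
frag_b = [(-0.8, (2.0, 0.0, 0.0))]
e_int = coulomb_interface(frag_a, frag_b)  # 0.4 * -0.8 / 2.0 = -0.16
```

Note that a raw Coulomb sum is signed, so attractive contacts raise the score under this definition.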

Chemical Integrity ($S_{\text{chem}}$)

Rewards preservation of chemically meaningful units:

$$ S_{\text{chem}} = \sum_f I_{\text{valid}}(f) $$

where $I_{\text{valid}}(f)$ is an indicator for chemically sensible fragments.

Validity Criteria

No broken rings: $$ I_{\text{ring}}(f) = \begin{cases} 0 & \text{if a ring is split} \\ 1 & \text{otherwise} \end{cases} $$
Complete functional groups: $$ I_{\text{func}}(f) = \begin{cases} 0 & \text{if a functional group is split} \\ 1 & \text{otherwise} \end{cases} $$
Proper valence: $$ I_{\text{valence}}(f) = \begin{cases} 1 & \text{if all atoms have valid valence (after capping)} \\ 0 & \text{otherwise} \end{cases} $$
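
One way to evaluate the ring and functional-group criteria is with set operations on atom indices. The sketch below makes two assumptions that the text above does not state: fragments and chemical units are given as sets of atom indices, and $I_{\text{valid}}(f)$ is the product of the ring and functional-group indicators. The valence check is omitted because it depends on the capping scheme.

```python
def chem_score(fragments, units):
    """Count fragments that split no ring or functional group.

    fragments: list of sets of atom indices
    units: list of sets of atom indices (rings, functional groups)
    """
    def is_valid(frag):
        # A unit is intact for this fragment if it lies entirely outside
        # the fragment or entirely inside it
        return all(unit.isdisjoint(frag) or unit <= frag for unit in units)

    return sum(1 for frag in fragments if is_valid(frag))

ring = {0, 1, 2, 3, 4, 5}
intact = [{0, 1, 2, 3, 4, 5}, {6, 7, 8}]
split = [{0, 1, 2}, {3, 4, 5, 6}, {7, 8}]

s_intact = chem_score(intact, [ring])  # both fragments valid
s_split = chem_score(split, [ring])    # only {7, 8} is untouched by the cut
```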

Computational Cost ($S_{\text{comp}}$)

Estimates total computational cost of the fragmentation:

$$ S_{\text{comp}} = -\sum_i C(n_i) $$

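where $C(n)$ is the estimated cost of a QM calculation on a fragment of $n$ atoms. Formal scaling is defined in the number of basis functions, which grows roughly linearly with atom count, so $n$ serves as a proxy.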

Scaling Estimates

Method	Scaling	$C(n)$
HF	$O(n^4)$	$n^4$
DFT	$O(n^3)$	$n^3$
MP2	$O(n^5)$	$n^5$
CCSD	$O(n^6)$	$n^6$
CCSD(T)	$O(n^7)$	$n^7$

For many-body expansion, include pair interaction costs:

$$ S_{\text{comp}}^{\text{MBE}} = -\left[ \sum_i C(n_i) + \frac{1}{2}\sum_{i \neq j} C(n_i + n_j) \right] $$
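
The cost model can be sketched using the exponents from the table above. The function names and the `include_pairs` flag are illustrative; iterating over unordered pairs is equivalent to the $\frac{1}{2}\sum_{i \neq j}$ in the formula.

```python
from itertools import combinations

# Exponents copied from the scaling table above
SCALING_EXPONENT = {"HF": 4, "DFT": 3, "MP2": 5, "CCSD": 6, "CCSD(T)": 7}

def cost(n, method):
    """C(n) = n ** p for the method's formal scaling exponent p."""
    return n ** SCALING_EXPONENT[method]

def comp_score(sizes, method, include_pairs=False):
    """S_comp, optionally with the two-body (pair) terms of an MBE."""
    total = sum(cost(n, method) for n in sizes)
    if include_pairs:
        # Each unordered pair (i, j) is visited exactly once
        total += sum(cost(a + b, method) for a, b in combinations(sizes, 2))
    return -total

monomers_only = comp_score([2, 3], "DFT")                   # -(8 + 27) = -35
with_pairs = comp_score([2, 3], "DFT", include_pairs=True)  # -(35 + 125) = -160
```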

Distance from Target Size ($S_{\text{target}}$)

Encourages fragments close to specified target size:

$$ S_{\text{target}} = -\sum_i \left| n_i - n_{\text{target}} \right|^p $$

Common choices:

$p = 1$: Linear penalty (sum of absolute deviations, proportional to MAE)
$p = 2$: Quadratic penalty (sum of squared deviations, proportional to MSE), more severe for large deviations
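
A short sketch comparing the two exponents on the same fragment sizes (the function name is illustrative):

```python
def target_score(sizes, n_target, p=2):
    """S_target = -sum_i |n_i - n_target| ** p."""
    return -sum(abs(n - n_target) ** p for n in sizes)

sizes = [8, 10, 13]
linear = target_score(sizes, n_target=10, p=1)     # -(2 + 0 + 3) = -5
quadratic = target_score(sizes, n_target=10, p=2)  # -(4 + 0 + 9) = -13
```

With $p = 2$ the fragment that is 3 atoms off contributes more than twice as much as the one that is 2 atoms off, whereas with $p = 1$ the contributions stay proportional to the deviation.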

Number of Fragments ($S_{\text{count}}$)

Penalizes deviation from a target fragment count, which discourages both excessive and insufficient fragmentation:

$$ S_{\text{count}} = -|k - k_{\text{target}}| $$

where $k$ is the number of fragments.

Weight Selection

Default Weights

Balanced for general molecular fragmentation:

Component	Weight	Rationale
$w_{\text{bond}}$	1.0	Primary consideration
$w_{\text{size}}$	0.3	Moderate importance
$w_{\text{interface}}$	0.2	Secondary
$w_{\text{chem}}$	0.5	Important for validity
$w_{\text{comp}}$	0.1	Use when balancing load

Application-Specific Weights

FMO calculations (prioritize size balance):

weights = {
    "bond": 1.0,
    "size": 0.8,  # Higher for balanced fragments
    "chem": 1.0,  # Critical for FMO
    "interface": 0.1,
}

Large protein fragmentation (prioritize computational cost):

weights = {
    "bond": 1.0,
    "size": 0.3,
    "comp": 0.5,  # Consider cost
    "chem": 0.5,
}

Water clusters (minimize interactions):

weights = {
    "bond": 0.5,  # Weak H-bonds
    "interface": 1.0,  # Minimize water-water
    "size": 0.1,  # Less important
}

Multi-Objective Optimization

Pareto Optimality

Multiple objectives can conflict. A fragmentation is Pareto optimal if no objective can be improved without worsening another.

The Pareto frontier contains all Pareto-optimal fragmentations:

$$ \mathcal{P} = \{ f : \nexists\, f' \text{ such that } S_i(f') \geq S_i(f) \ \forall i \text{ and } S_j(f') > S_j(f) \text{ for some } j \} $$
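
For a finite set of candidate fragmentations, the Pareto frontier can be found by a pairwise dominance check. The sketch below assumes every component is expressed as a score (higher is better); the candidate labels and values are made up.

```python
def dominates(a, b):
    """True if a is at least as good as b in every score and strictly better in one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Labels of candidates that no other candidate dominates.

    candidates: dict mapping a label to a tuple of component scores.
    """
    return {name for name, scores in candidates.items()
            if not any(dominates(other, scores)
                       for other in candidates.values())}

# (S_bond, S_size) for four hypothetical fragmentations
candidates = {
    "A": (-1.0, 0.90),
    "B": (-2.0, 0.95),
    "C": (-2.5, 0.50),
    "D": (-1.2, 0.70),
}
front = pareto_front(candidates)  # {"A", "B"}: A dominates both C and D
```

This check is quadratic in the number of candidates, which is fine for small candidate pools.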

Scalarization

Convert to single objective via weighted sum:

$$ S_{\text{total}}(\mathbf{w}) = \sum_i w_i S_i $$

Different weight vectors reach different points on the Pareto frontier, although a weighted sum can only recover points on the convex part of the frontier.
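
To illustrate how the choice of weights moves the optimum, the sketch below picks the best of three hypothetical candidates under two weight vectors. The labels, scores, and function name are made up for the example.

```python
def best_by_weighted_sum(candidates, weights):
    """Label of the candidate that maximizes sum_i w_i * S_i."""
    def scalar(scores):
        return sum(w * s for w, s in zip(weights, scores))
    return max(candidates, key=lambda name: scalar(candidates[name]))

# (S_bond, S_size) for three hypothetical fragmentations
candidates = {"A": (-1.0, 0.90), "B": (-2.0, 0.95), "C": (-2.5, 0.50)}

bond_heavy = best_by_weighted_sum(candidates, (1.0, 1.0))   # "A"
size_heavy = best_by_weighted_sum(candidates, (0.01, 1.0))  # "B"
```

Down-weighting the bond term lets the slightly better size balance of B outweigh its extra bond penalty.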

ε-Constraint Method

Optimize one objective subject to constraints on others:

$$ \begin{aligned} \text{maximize} \quad & S_1 \\ \text{subject to} \quad & S_i \geq \epsilon_i, \quad i = 2, \ldots, k \end{aligned} $$
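
Over a finite candidate pool, the ε-constraint method reduces to filtering by the thresholds and then maximizing the primary score. This is a minimal sketch with made-up candidates; returning `None` when nothing is feasible is a design choice for the example.

```python
def epsilon_constraint(candidates, eps):
    """Maximize the first score subject to S_i >= eps_i for the remaining scores.

    candidates: dict mapping a label to a tuple (S_1, S_2, ..., S_k)
    eps: thresholds (eps_2, ..., eps_k)
    """
    feasible = {name: scores for name, scores in candidates.items()
                if all(s >= e for s, e in zip(scores[1:], eps))}
    if not feasible:
        return None  # no candidate satisfies every constraint
    return max(feasible, key=lambda name: feasible[name][0])

# (S_bond, S_size) for four hypothetical fragmentations
candidates = {
    "A": (-1.0, 0.90),
    "B": (-2.0, 0.95),
    "C": (-2.5, 0.50),
    "D": (-1.2, 0.70),
}
loose = epsilon_constraint(candidates, eps=(0.6,))    # "A"
strict = epsilon_constraint(candidates, eps=(0.92,))  # "B": only B has S_size >= 0.92
```

Sweeping the thresholds traces out frontier points one at a time, including ones a weighted sum may miss.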

Trade-offs and Considerations

Bond Breaking vs. Size Balance

Fewer cuts → larger fragments → worse computational scaling
More cuts → smaller fragments → more interface energy

Optimal balance depends on:

Target QM method (scaling exponent)
System type (proteins need larger fragments than water)
Accuracy requirements

Chemical Integrity vs. Flexibility

Strict preservation of all functional groups may produce:

Fragments that are too large
Poor size balance
Suboptimal partitioning

Consider relaxing constraints for:

Long-chain functional groups
Repeated patterns (polymers)

Computational Cost vs. Accuracy

Many-body expansion trade-off:

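Fewer, larger fragments → fewer pairs, but each calculation is far more expensive under polynomial scaling → slower → more accurate at a given expansion order
More, smaller fragments → more pairs, but each calculation is cheap → faster → less accurate at a given expansion order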

Target fragment size should be chosen based on desired accuracy.

Score Normalization

Z-Score Normalization

For comparing across systems:

$$ S_i^{\text{norm}} = \frac{S_i - \mu_i}{\sigma_i} $$

where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of component $i$ over a set of reference fragmentations.
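
A z-score sketch using the standard library. It assumes the population standard deviation (`statistics.pstdev`), since the text does not say whether the population or sample form is intended, and it requires the reference scores to be non-constant.

```python
import statistics

def z_normalize(score, reference_scores):
    """(S - mu) / sigma, with mu and sigma taken from reference fragmentations."""
    mu = statistics.fmean(reference_scores)
    sigma = statistics.pstdev(reference_scores)  # assumption: population std
    return (score - mu) / sigma

# Made-up reference values for one component: mean 5, population std 2
reference = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_normalize(9.0, reference)  # (9 - 5) / 2 = 2.0
```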

Min-Max Scaling

$$ S_i^{\text{scaled}} = \frac{S_i - S_i^{\min}}{S_i^{\max} - S_i^{\min}} $$

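Bounds scores to the range $[0, 1]$, where $S_i^{\min}$ and $S_i^{\max}$ are the smallest and largest values of component $i$ among the fragmentations being compared.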
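
Min-max scaling over a list of scores for one component can be sketched as follows. Mapping a constant list to all zeros is an assumption made here to avoid division by zero; other conventions are possible.

```python
def min_max_scale(scores):
    """Rescale scores linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: all candidates tie, so return 0.0 for each
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

scaled = min_max_scale([-4.0, -1.0, 2.0])  # [0.0, 0.5, 1.0]
```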

Implementation Notes

from autofragment.scoring import FragmentationScorer

scorer = FragmentationScorer(
    weights={
        "bond": 1.0,
        "size": 0.3,
        "interface": 0.2,
        "chem": 0.5,
    }
)

# Score a fragmentation
# (`partitioner` and `system` are assumed to have been created earlier)
fragments = partitioner.partition(system)
score = scorer.score(fragments)

# Get component breakdown
breakdown = scorer.score_breakdown(fragments)
for component, value in breakdown.items():
    print(f"{component}: {value:.3f}")
