Fragmentation Scoring Functions

Overview

Scoring functions quantify the quality of a molecular fragmentation to guide optimization algorithms. A good fragmentation should:

  1. Minimize breaking of strong chemical bonds

  2. Produce fragments of reasonable, uniform size

  3. Minimize interfragment interactions

  4. Preserve chemical integrity (functional groups, rings)

  5. Balance computational cost across fragments

Total Score Formulation

The total score is a weighted combination of component scores:

$$ S_{\text{total}} = \sum_{i} w_i \cdot S_i $$

where:

  • $w_i$ are tunable weights for each component

  • $S_i$ are individual score components

  • Higher scores indicate better fragmentations

Alternative formulation as minimization (penalty-based):

$$ P_{\text{total}} = \sum_{i} w_i \cdot P_i $$

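where $P_i$ are penalty terms and lower values are better. Several component scores below are written as negated penalties ($S_i = -P_i$) so that they fit the maximization form.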
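A minimal sketch of the weighted sum, assuming component scores and weights are kept in plain dictionaries keyed by component name. The function name `total_score` and the numbers are illustrative only and are not taken from any library:

```python
def total_score(components: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum S_total = sum_i w_i * S_i.

    Components that have no entry in `weights` contribute nothing (weight 0).
    """
    return sum(weights.get(name, 0.0) * value for name, value in components.items())


# Hypothetical component scores for one candidate fragmentation.
components = {"bond": -166.0, "size": -0.25, "chem": 3.0}
weights = {"bond": 1.0, "size": 0.3, "chem": 0.5}
print(total_score(components, weights))
```

In this example the bond term dominates because the raw components sit on very different scales, which is the motivation for the Score Normalization section.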

Component Scores

Bond Breaking Penalty ($S_{\text{bond}}$)

The most critical component. It penalizes breaking covalent bonds, weighting each broken bond by its estimated energy and a bond-type multiplier:

$$ S_{\text{bond}} = -\sum_{b \in \text{broken}} E_{\text{bond}}(b) \cdot f_{\text{type}}(b) $$

where:

  • $E_{\text{bond}}(b)$ is the estimated bond energy

  • $f_{\text{type}}(b)$ is a multiplier based on bond type

Bond Type Multipliers

| Bond Type | $f_{\text{type}}$ | Rationale |
|---|---|---|
| Single (rotatable) | 1.0 | Preferred break points |
| Single (non-rotatable) | 1.5 | Somewhat penalized |
| Aromatic | 10.0 | Strongly discouraged |
| Double | 50.0 | Almost forbidden |
| Triple | 100.0 | Forbidden |

Bond Energy Estimation

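Approximate bond energies are tabulated below in kcal/mol. For a heteronuclear bond X-Y without a tabulated value, the geometric mean of the homonuclear energies gives a rough estimate (it neglects the ionic contribution, so it tends to underestimate polar bonds):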

$$ E_{\text{bond}}(X-Y) \approx \sqrt{E(X-X) \cdot E(Y-Y)} $$

| Bond | Energy (kcal/mol) |
|---|---|
| C-C | 83 |
| C-H | 99 |
| C-N | 73 |
| C-O | 86 |
| C=O | 180 |
| C=C | 146 |
| C≡C | 200 |
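The bond penalty can be sketched as a lookup over the two tables above. The dictionary layout, the key names, and the function `bond_score` are assumptions made for illustration; only the numeric values come from the tables:

```python
# Energies in kcal/mol, copied from the bond energy table.
BOND_ENERGY = {
    "C-C": 83.0, "C-H": 99.0, "C-N": 73.0, "C-O": 86.0,
    "C=O": 180.0, "C=C": 146.0,
    "C#C": 200.0,  # triple bond, written C≡C in the table
}
# Multipliers f_type, copied from the bond type table.
TYPE_MULTIPLIER = {
    "single_rotatable": 1.0, "single_non_rotatable": 1.5,
    "aromatic": 10.0, "double": 50.0, "triple": 100.0,
}


def bond_score(broken_bonds: list[tuple[str, str]]) -> float:
    """S_bond = -sum over broken bonds of E_bond(b) * f_type(b)."""
    return -sum(BOND_ENERGY[bond] * TYPE_MULTIPLIER[kind] for bond, kind in broken_bonds)


# Two rotatable C-C cuts versus one rotatable C-C cut plus one C=C cut.
print(bond_score([("C-C", "single_rotatable")] * 2))                 # -166.0
print(bond_score([("C-C", "single_rotatable"), ("C=C", "double")]))  # -7383.0
```

The second result shows how the multiplier makes a single double-bond cut far more costly than several single-bond cuts.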

Size Variance ($S_{\text{size}}$)

Encourages fragments of uniform size to balance computational load:

$$ S_{\text{size}} = -\alpha \cdot \frac{\sigma(n_i)}{\bar{n}} $$

where:

  • $n_i$ is the number of atoms in fragment $i$

  • $\bar{n} = \frac{1}{k}\sum_i n_i$ is the mean fragment size

  • $\sigma(n_i) = \sqrt{\frac{1}{k}\sum_i (n_i - \bar{n})^2}$ is the standard deviation

  • $\alpha$ is a scaling parameter

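Alternatively, use the coefficient of variation directly:

$$ \text{CV} = \frac{\sigma(n_i)}{\bar{n}} $$

This gives a normalized score that equals 1 for perfectly uniform fragments and decreases as sizes become more uneven: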

$$ S_{\text{size}} = 1 - \text{CV} $$
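A short sketch of both forms using only the standard library. The function names are illustrative, and the population standard deviation (`pstdev`) is used to match the $\frac{1}{k}$ definition above:

```python
from statistics import fmean, pstdev


def coefficient_of_variation(sizes: list[int]) -> float:
    """CV = sigma / mean, with the population standard deviation (1/k form)."""
    return pstdev(sizes) / fmean(sizes)


def size_score(sizes: list[int], alpha: float = 1.0) -> float:
    """S_size = -alpha * CV; zero for perfectly uniform fragments."""
    return -alpha * coefficient_of_variation(sizes)


def size_score_normalized(sizes: list[int]) -> float:
    """S_size = 1 - CV; equals 1 for perfectly uniform fragments."""
    return 1.0 - coefficient_of_variation(sizes)


print(size_score([10, 10, 10]))            # uniform sizes
print(size_score([5, 10, 15]))             # uneven sizes score lower
print(size_score_normalized([5, 10, 15]))
```

Note that $1 - \text{CV}$ is not bounded below by 0: a strongly skewed size distribution can have $\text{CV} > 1$, so clip the value if a $[0, 1]$ range is required.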

Size Range Penalty

Penalize fragments outside acceptable size range:

$$ P_{\text{range}} = \sum_i \max(0, n_i - n_{\max}) + \sum_i \max(0, n_{\min} - n_i) $$

where $n_{\min}$ and $n_{\max}$ are target fragment size bounds.
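As a sketch, the range penalty counts how many atoms each fragment lies outside the allowed window. The function name and the bounds used in the example are illustrative, not recommended values:

```python
def range_penalty(sizes: list[int], n_min: int, n_max: int) -> int:
    """P_range: total atoms by which fragments exceed n_max or fall short of n_min."""
    too_large = sum(max(0, n - n_max) for n in sizes)
    too_small = sum(max(0, n_min - n) for n in sizes)
    return too_large + too_small


# Fragment of 5 atoms is 3 below the minimum; fragment of 26 is 6 above the maximum.
print(range_penalty([5, 12, 26], n_min=8, n_max=20))  # 9
```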

Interface Score ($S_{\text{interface}}$)

Minimizes interaction energy across fragment boundaries:

$$ S_{\text{interface}} = -\sum_{(F_i, F_j)} E_{\text{int}}(F_i, F_j) $$

Approximations for Interface Energy

Exposed surface area:

$$ E_{\text{int}} \approx \gamma \cdot A_{\text{exposed}} $$

where $\gamma$ is an effective surface-tension coefficient (energy per unit area) and $A_{\text{exposed}}$ is the solvent-accessible surface area exposed at the cut points.

Electrostatic interaction:

$$ E_{\text{int}} \approx \sum_{a \in F_i} \sum_{b \in F_j} \frac{q_a q_b}{r_{ab}} $$

Number of cut bonds (simplified):

$$ E_{\text{int}} \approx N_{\text{cut}} $$
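The electrostatic approximation can be sketched as a double loop over point charges. The `Atom` layout, the function names, and the use of atomic units (so that no Coulomb constant appears) are assumptions for illustration:

```python
import math
from itertools import combinations

# (partial charge, (x, y, z) position); atomic units assumed.
Atom = tuple[float, tuple[float, float, float]]


def coulomb_interface(frag_i: list[Atom], frag_j: list[Atom]) -> float:
    """E_int ~ sum over a in F_i and b in F_j of q_a * q_b / r_ab."""
    return sum(
        qa * qb / math.dist(ra, rb)
        for qa, ra in frag_i
        for qb, rb in frag_j
    )


def interface_score(fragments: list[list[Atom]]) -> float:
    """S_interface = -sum of E_int over all unordered fragment pairs."""
    return -sum(coulomb_interface(fi, fj) for fi, fj in combinations(fragments, 2))


# Two single-atom fragments with opposite charges, 2 length units apart.
frag_a = [(0.5, (0.0, 0.0, 0.0))]
frag_b = [(-0.5, (2.0, 0.0, 0.0))]
print(coulomb_interface(frag_a, frag_b))  # -0.125
```

With signed Coulomb energies, an attractive contact raises $S_{\text{interface}}$. Summing magnitudes $|E_{\text{int}}|$ instead is one way to penalize any strong coupling across a boundary, but that choice is not specified here.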

Chemical Integrity ($S_{\text{chem}}$)

Rewards preservation of chemically meaningful units:

$$ S_{\text{chem}} = \sum_f I_{\text{valid}}(f) $$

where $I_{\text{valid}}(f)$ is an indicator for chemically sensible fragments.

Validity Criteria

  1. No broken rings: $$ I_{\text{ring}}(f) = \begin{cases} 0 & \text{if ring is split} \\ 1 & \text{otherwise} \end{cases} $$

  2. Complete functional groups: $$ I_{\text{func}}(f) = \begin{cases} 0 & \text{if functional group is split} \\ 1 & \text{otherwise} \end{cases} $$

  3. Proper valence: $$ I_{\text{valence}}(f) = \begin{cases} 1 & \text{if all atoms have valid valence (after capping)} \\ 0 & \text{otherwise} \end{cases} $$
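A minimal sketch of the ring criterion, assuming fragments and rings are both given as sets of atom indices. The function names are hypothetical. The same check would cover functional groups if their atom sets were passed in place of rings:

```python
def ring_indicator(fragment: set[int], rings: list[set[int]]) -> int:
    """I_ring(f): 0 if the fragment holds only part of some ring, else 1."""
    for ring in rings:
        shared = fragment & ring
        if shared and shared != ring:
            return 0
    return 1


def chem_score(fragments: list[set[int]], rings: list[set[int]]) -> int:
    """S_chem restricted to the ring criterion: count of fragments with no split ring."""
    return sum(ring_indicator(f, rings) for f in fragments)


# Toluene-like toy system: atoms 0-5 form the ring, atom 6 is the methyl carbon.
ring = {0, 1, 2, 3, 4, 5}
print(chem_score([{0, 1, 2, 3, 4, 5}, {6}], [ring]))  # 2, ring kept whole
print(chem_score([{0, 1, 2}, {3, 4, 5, 6}], [ring]))  # 0, ring split across both fragments
```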

Computational Cost ($S_{\text{comp}}$)

Estimates total computational cost of the fragmentation:

$$ S_{\text{comp}} = -\sum_i C(n_i) $$

where $C(n)$ is the estimated cost of a QM calculation on a fragment of $n$ atoms, with atom count used as a proxy for basis-set size.

Scaling Estimates

| Method | Formal scaling | $C(n)$ |
|---|---|---|
| HF | $O(n^4)$ | $n^4$ |
| DFT | $O(n^3)$ | $n^3$ |
| MP2 | $O(n^5)$ | $n^5$ |
| CCSD | $O(n^6)$ | $n^6$ |
| CCSD(T) | $O(n^7)$ | $n^7$ |

For a many-body expansion truncated at two-body terms, include the cost of each fragment-pair (dimer) calculation:

$$ S_{\text{comp}}^{\text{MBE}} = -\left[ \sum_i C(n_i) + \frac{1}{2}\sum_{i \neq j} C(n_i + n_j) \right] $$
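The sum $\frac{1}{2}\sum_{i \neq j}$ visits each unordered pair once, which `itertools.combinations` expresses directly. The sketch below assumes a pure power-law cost $C(n) = n^p$ with no prefactor. The function names are illustrative:

```python
from itertools import combinations


def cost(n: int, exponent: int = 3) -> float:
    """C(n) = n**exponent; exponent 3 corresponds to the DFT row of the table."""
    return float(n) ** exponent


def comp_score_mbe(sizes: list[int], exponent: int = 3) -> float:
    """Negative of the monomer costs plus the cost of every unordered fragment pair."""
    monomers = sum(cost(n, exponent) for n in sizes)
    dimers = sum(cost(ni + nj, exponent) for ni, nj in combinations(sizes, 2))
    return -(monomers + dimers)


# Three 10-atom fragments: 3 monomers of 10 atoms and 3 dimers of 20 atoms.
print(comp_score_mbe([10, 10, 10]))  # -(3 * 10**3 + 3 * 20**3) = -27000.0
```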

Distance from Target Size ($S_{\text{target}}$)

Encourages fragments close to specified target size:

$$ S_{\text{target}} = -\sum_i \left| n_i - n_{\text{target}} \right|^p $$

Common choices:

  • $p = 1$: Linear penalty (sum of absolute deviations, as in MAE)

  • $p = 2$: Quadratic penalty (sum of squared deviations, as in MSE), more severe for large deviations
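A brief sketch comparing the two exponents on the same fragment sizes. The function name and the target of 12 atoms are illustrative:

```python
def target_score(sizes: list[int], n_target: int, p: int = 1) -> float:
    """S_target = -sum_i |n_i - n_target|**p."""
    return -float(sum(abs(n - n_target) ** p for n in sizes))


sizes = [8, 12, 20]
print(target_score(sizes, n_target=12, p=1))  # -(4 + 0 + 8) = -12.0
print(target_score(sizes, n_target=12, p=2))  # -(16 + 0 + 64) = -80.0
```

With $p = 2$ the 20-atom fragment accounts for 64 of the 80 penalty units, so a single outlier dominates the score.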

Number of Fragments ($S_{\text{count}}$)

Penalizes deviation from a target number of fragments, in either direction (too many or too few):

$$ S_{\text{count}} = -|k - k_{\text{target}}| $$

where $k$ is the number of fragments.

Weight Selection

Default Weights

Balanced for general molecular fragmentation:

| Component | Weight | Rationale |
|---|---|---|
| $w_{\text{bond}}$ | 1.0 | Primary consideration |
| $w_{\text{size}}$ | 0.3 | Moderate importance |
| $w_{\text{interface}}$ | 0.2 | Secondary |
| $w_{\text{chem}}$ | 0.5 | Important for validity |
| $w_{\text{comp}}$ | 0.1 | Use when balancing load |

Application-Specific Weights

FMO calculations (prioritize size balance):

weights = {
    "bond": 1.0,
    "size": 0.8,  # Higher for balanced fragments
    "chem": 1.0,  # Critical for FMO
    "interface": 0.1,
}

Large protein fragmentation (prioritize computational cost):

weights = {
    "bond": 1.0,
    "size": 0.3,
    "comp": 0.5,  # Consider cost
    "chem": 0.5,
}

Water clusters (minimize interactions):

weights = {
    "bond": 0.5,  # Weak H-bonds
    "interface": 1.0,  # Minimize water-water
    "size": 0.1,  # Less important
}

Multi-Objective Optimization

Pareto Optimality

Multiple objectives can conflict. A fragmentation is Pareto optimal if no objective can be improved without worsening another.

The Pareto frontier contains all Pareto-optimal fragmentations:

$$ \mathcal{P} = \{ f : \nexists\, f' \text{ such that } S_i(f') \geq S_i(f) \ \forall i \text{ and } S_j(f') > S_j(f) \text{ for some } j \} $$
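For a finite set of candidate fragmentations, the frontier can be found by a direct pairwise dominance check. This is a sketch with hypothetical names and made-up scores, quadratic in the number of candidates:

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """a dominates b if it is at least as good in every score and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: dict[str, tuple[float, ...]]) -> set[str]:
    """Names of candidates that no other candidate dominates."""
    return {
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for other in scores.values())
    }


# Hypothetical (S_bond, S_size) scores for four candidate fragmentations.
candidates = {
    "A": (-83.0, -0.40),
    "B": (-166.0, -0.10),
    "C": (-166.0, -0.30),  # dominated by B: same S_bond, worse S_size
    "D": (-249.0, -0.05),
}
print(sorted(pareto_front(candidates)))  # ['A', 'B', 'D']
```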

Scalarization

Convert to single objective via weighted sum:

$$ S_{\text{total}}(\mathbf{w}) = \sum_i w_i S_i $$

Different weight vectors select different points on the Pareto frontier.
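The sketch below sweeps three weight vectors over three hypothetical candidates whose scores are assumed to be already scaled to $[0, 1]$. The function name and all values are illustrative:

```python
def best_by_weighted_sum(
    scores: dict[str, tuple[float, ...]], weights: tuple[float, ...]
) -> str:
    """Name of the candidate with the highest weighted sum of its scores."""
    return max(scores, key=lambda name: sum(w * s for w, s in zip(weights, scores[name])))


# Hypothetical normalized (bond, size) scores; higher is better.
candidates = {"A": (1.0, 0.0), "B": (0.6, 0.7), "D": (0.0, 1.0)}

for weights in [(1.0, 0.2), (0.5, 0.5), (0.2, 1.0)]:
    print(weights, best_by_weighted_sum(candidates, weights))
# (1.0, 0.2) selects A, (0.5, 0.5) selects B, (0.2, 1.0) selects D
```

A known limitation of weighted sums is that they can only reach frontier points on the convex hull of the attainable scores. Candidates in a non-convex region of the frontier are never selected, whatever the weights.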

ε-Constraint Method

Optimize one objective subject to constraints on others:

$$ \begin{aligned} \text{maximize} \quad & S_1 \\ \text{subject to} \quad & S_i \geq \epsilon_i, \quad i = 2, \ldots, m \end{aligned} $$

where $m$ is the number of objectives.
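Over a finite candidate set, this amounts to filtering by the thresholds and then maximizing the primary score. A sketch with hypothetical names and values:

```python
from typing import Optional


def epsilon_constraint(
    scores: dict[str, tuple[float, ...]], epsilons: tuple[float, ...]
) -> Optional[str]:
    """Maximize the first score among candidates whose other scores meet their thresholds.

    Returns None when no candidate satisfies every constraint.
    """
    feasible = {
        name: s
        for name, s in scores.items()
        if all(si >= eps for si, eps in zip(s[1:], epsilons))
    }
    if not feasible:
        return None
    return max(feasible, key=lambda name: feasible[name][0])


# Hypothetical (S_bond, S_size) scores; S_bond is the primary objective.
candidates = {"A": (-83.0, -0.40), "B": (-166.0, -0.10), "D": (-249.0, -0.05)}
print(epsilon_constraint(candidates, epsilons=(-0.2,)))  # B
print(epsilon_constraint(candidates, epsilons=(-0.5,)))  # A
print(epsilon_constraint(candidates, epsilons=(0.0,)))   # None
```

Tightening the threshold on size balance moves the selected candidate along the frontier from A to B.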

Trade-offs and Considerations

Bond Breaking vs. Size Balance

  • Fewer cuts → larger fragments → worse computational scaling

  • More cuts → smaller fragments → more interface energy

Optimal balance depends on:

  • Target QM method (scaling exponent)

  • System type (proteins need larger fragments than water)

  • Accuracy requirements

Chemical Integrity vs. Flexibility

Strict preservation of all functional groups may produce:

  • Fragments that are too large

  • Poor size balance

  • Suboptimal partitioning

Consider relaxing constraints for:

  • Long-chain functional groups

  • Repeated patterns (polymers)

Computational Cost vs. Accuracy

Many-body expansion trade-off:

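  • Fewer, larger fragments → fewer pairs, but each calculation is costlier under steep scaling → typically slower → more accurate at a given truncation order

  • More, smaller fragments → more pairs, but each calculation is cheaper → typically faster → less accurate at a given truncation order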

Target fragment size should be chosen based on desired accuracy.

Score Normalization

Z-Score Normalization

For comparing across systems:

$$ S_i^{\text{norm}} = \frac{S_i - \mu_i}{\sigma_i} $$

where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of component $i$ over a set of reference fragmentations.

Min-Max Scaling

$$ S_i^{\text{scaled}} = \frac{S_i - S_i^{\min}}{S_i^{\max} - S_i^{\min}} $$

Bounds scores to $[0, 1]$ range.
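Both normalizations can be sketched with the standard library. The reference set here is simply the list of values passed in. The use of the population standard deviation and the handling of a constant component (returning zeros) are assumptions:

```python
from statistics import fmean, pstdev


def z_scores(values: list[float]) -> list[float]:
    """(S - mu) / sigma over the reference values; zeros if all values are equal."""
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def min_max(values: list[float]) -> list[float]:
    """(S - S_min) / (S_max - S_min); zeros if all values are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical bond scores for three candidate fragmentations.
bond_scores = [-83.0, -166.0, -249.0]
print(min_max(bond_scores))  # [1.0, 0.5, 0.0]
print(z_scores(bond_scores))
```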

Implementation Notes

from autofragment.scoring import FragmentationScorer

scorer = FragmentationScorer(
    weights={
        "bond": 1.0,
        "size": 0.3,
        "interface": 0.2,
        "chem": 0.5,
    }
)

# Score a fragmentation (`partitioner` and `system` are assumed to be defined earlier)
fragments = partitioner.partition(system)
score = scorer.score(fragments)

# Get component breakdown
breakdown = scorer.score_breakdown(fragments)
for component, value in breakdown.items():
    print(f"{component}: {value:.3f}")
