Fragmentation Rules

The Rules Engine is the core component that controls which bonds can or cannot be broken during molecular fragmentation. Rules encode chemical knowledge to ensure fragments remain chemically meaningful.

Overview

The rules system provides:

  • Flexible rule definitions via the FragmentationRule abstract base class

  • Priority-based evaluation where critical rules are evaluated first

  • Conflict resolution where the most restrictive action wins

  • Pre-built rules for common chemistry, biology, and materials science

Quick Start

from autofragment.rules import (
    RuleEngine,
    RuleAction,
    AromaticRingRule,
    DoubleBondRule,
    PeptideBondRule,
)

# Create an engine with rules
engine = RuleEngine([
    AromaticRingRule(),      # Never break aromatic rings
    DoubleBondRule(),        # Never break double/triple bonds
    PeptideBondRule(),       # Prefer keeping peptide bonds
])

# Evaluate a specific bond
action = engine.evaluate_bond((0, 1), chemical_system)
if action == RuleAction.MUST_NOT_BREAK:
    print("This bond cannot be broken")

# Get all breakable bonds in a system
breakable = engine.get_breakable_bonds(chemical_system)

Rule Actions

Rules return one of four actions, ordered from most to least restrictive:

Action

Description

Use Case

MUST_NOT_BREAK

Bond must never be broken

Aromatic rings, double bonds

PREFER_KEEP

Prefer keeping, can break if necessary

Peptide bonds, metal coordination

ALLOW

No preference (neutral)

Default when no rules apply

PREFER_BREAK

Good fragmentation point

Alpha-beta carbon bonds

When multiple rules apply to the same bond, the most restrictive action wins.

Priority System

Rules have integer priorities (higher = evaluated first):

Constant

Value

Typical Use

PRIORITY_CRITICAL

1000

Aromatic rings, proline rings

PRIORITY_HIGH

800

Metal coordination, peptide bonds

PRIORITY_MEDIUM

500

General rules (default)

PRIORITY_LOW

200

Hydrogen bonds

# Override default priority
rule = AromaticRingRule(priority=999)
print(rule.priority)  # 999

Built-in Rules

Common Chemical Rules

These rules apply universally to all molecular systems.

AromaticRingRule

Never break bonds within aromatic ring systems (benzene, naphthalene, pyridine, etc.).

rule = AromaticRingRule()
# Priority: CRITICAL
# Action: MUST_NOT_BREAK

DoubleBondRule

Never break double or triple bonds. Configurable minimum bond order.

rule = DoubleBondRule(min_order=1.5)  # Catches aromatic bonds too
# Priority: CRITICAL
# Action: MUST_NOT_BREAK

MetalCoordinationRule

Preserve metal-ligand coordination bonds. Supports custom metal sets.

rule = MetalCoordinationRule(
    metals={"Fe", "Cu", "Zn"},
    rule_action=RuleAction.MUST_NOT_BREAK
)
# Priority: HIGH

FunctionalGroupRule

Keep common functional groups intact (carboxyl, nitro, phosphate, etc.).

rule = FunctionalGroupRule()
# Default groups: carboxyl, nitro, sulfonate, phosphate, amino

Biological Rules

Designed for proteins and nucleic acids. These rules are configurable to support different fragmentation strategies.

PeptideBondRule

Handle peptide bonds between amino acid residues.

# Traditional FMO: keep peptide bonds
rule = PeptideBondRule(rule_action=RuleAction.PREFER_KEEP)

# Residue-based fragmentation: break at peptide bonds
rule = PeptideBondRule(rule_action=RuleAction.PREFER_BREAK)

DisulfideBondRule

Handle disulfide bridges (Cys-Cys S-S bonds).

# Preserve structure
rule = DisulfideBondRule(rule_action=RuleAction.MUST_NOT_BREAK)

# Allow breaking for domain separation
rule = DisulfideBondRule(rule_action=RuleAction.ALLOW)

AlphaBetaCarbonRule

Mark C-alpha to C-beta bonds as preferred break points (sidechain separation).

rule = AlphaBetaCarbonRule()
# Default action: PREFER_BREAK

ProlineRingRule

Never break the proline pyrrolidine ring (5-membered ring with N).

rule = ProlineRingRule()
# Always: MUST_NOT_BREAK

HydrogenBondRule

Handle hydrogen bonds. Defaults to ALLOW because H-bonds often need to be broken.

# For water clusters: allow breaking (default)
rule = HydrogenBondRule(rule_action=RuleAction.ALLOW)

# For secondary structure: prefer keeping
rule = HydrogenBondRule(rule_action=RuleAction.PREFER_KEEP)

Materials Science Rules

For solid-state and extended materials systems.

SiloxaneBridgeRule

Handle Si-O-Si siloxane bridges in silica and zeolites.

rule = SiloxaneBridgeRule(rule_action=RuleAction.PREFER_BREAK)

MOFLinkerRule

Preserve organic linker molecules in Metal-Organic Frameworks.

rule = MOFLinkerRule()
# Protects C-C, C-N, C-O bonds within linkers
# Does NOT protect metal-linker bonds

MetalNodeRule

Preserve metal node clusters (Zn4O, Cu paddlewheels, etc.).

rule = MetalNodeRule(cluster_distance=4.0)  # Angstroms
# Groups metals within 4Å as a cluster

PerovskiteOctahedralRule

Preserve MO₆ octahedra in perovskite structures.

rule = PerovskiteOctahedralRule(
    b_site_metals={"Ti", "Zr", "Nb"},
    allow_corner_breaking=True
)

ZeoliteAcidSiteRule

Protect Brønsted acid sites (Al-O(H)-Si) in zeolites.

rule = ZeoliteAcidSiteRule()
# Action: MUST_NOT_BREAK

SilanolRule

Preserve surface silanol groups (Si-OH).

rule = SilanolRule()
# Action: MUST_NOT_BREAK

MetalCarboxylateRule

Identify and allow breaking of metal-carboxylate (M-O) bonds to separate nodes from organic linkers.

rule = MetalCarboxylateRule(rule_action=RuleAction.PREFER_BREAK)

PolymerBackboneRule

Heuristic for identifying polymer backbone bonds.

rule = PolymerBackboneRule()

Creating Custom Rules

Extend FragmentationRule to create custom rules:

from autofragment.rules import FragmentationRule, RuleAction

class MyCustomRule(FragmentationRule):
    """Custom rule for my specific system."""
    
    name = "my_custom_rule"  # Required: unique identifier
    
    def __init__(self, priority=None):
        super().__init__(priority=priority or self.PRIORITY_MEDIUM)
    
    def applies_to(self, bond, system):
        """Return True if this rule applies to the bond."""
        graph = system.to_graph()
        atom1 = graph.get_atom(bond[0])
        atom2 = graph.get_atom(bond[1])
        
        # Your logic here
        return atom1["element"] == "X" and atom2["element"] == "Y"
    
    def action(self):
        """Return the action when this rule applies."""
        return RuleAction.MUST_NOT_BREAK

RuleEngine API

The RuleEngine manages rules and evaluates bonds:

engine = RuleEngine()

# Add/remove rules
engine.add_rule(AromaticRingRule())
engine.remove_rule("aromatic_ring")  # By name
engine.clear_rules()

# Query rules
rules = engine.get_rules()  # Sorted by priority
print(len(engine))  # Number of rules

# Evaluate bonds
action = engine.evaluate_bond((0, 1), system)
all_actions = engine.get_bond_actions(system)

# Find fragmentable bonds
breakable = engine.get_breakable_bonds(system)  # Not MUST_NOT_BREAK
preferred = engine.get_preferred_breaks(system)  # PREFER_BREAK only

Best Practices

  1. Start with critical rules: Always include AromaticRingRule and DoubleBondRule

  2. Be explicit about biological rules: Choose appropriate actions for your fragmentation strategy

  3. Layer rules by priority: Use CRITICAL for absolute constraints, HIGH for strong preferences

  4. Test with simple systems: Verify rule behavior on small molecules before applying to large systems

  5. Log rule decisions: Check which rules are triggering for debugging

Example: Protein Fragmentation

from autofragment.rules import (
    RuleEngine,
    AromaticRingRule,
    DoubleBondRule,
    PeptideBondRule,
    DisulfideBondRule,
    ProlineRingRule,
    AlphaBetaCarbonRule,
)

# FMO-style: fragment at alpha carbons, keep peptide bonds
fmo_engine = RuleEngine([
    AromaticRingRule(),                           # Protect aromatic sidechains
    DoubleBondRule(),                             # Protect double bonds
    ProlineRingRule(),                            # Protect proline rings
    DisulfideBondRule(),                          # Keep disulfides
    PeptideBondRule(rule_action=RuleAction.PREFER_KEEP),
    AlphaBetaCarbonRule(),                        # Break at C-alpha
])

# Get fragmentation points
breakable = fmo_engine.get_breakable_bonds(protein_system)
preferred = fmo_engine.get_preferred_breaks(protein_system)

Example: MOF Fragmentation

from autofragment.rules import (
    RuleEngine,
    AromaticRingRule,
    DoubleBondRule,
    MetalCoordinationRule,
    MOFLinkerRule,
    MetalNodeRule,
)

mof_engine = RuleEngine([
    AromaticRingRule(),        # Protect aromatic linkers
    DoubleBondRule(),          # Protect carboxylate C=O
    MOFLinkerRule(),           # Keep organic linkers intact
    MetalNodeRule(),           # Keep metal clusters intact
    # Metal-linker bonds will be the break points
])