Protein Fragmentation Guide
This guide covers the workflow for fragmenting protein structures using AutoFragment’s BioPartitioner.
Overview
Protein fragmentation presents unique challenges:
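The peptide backbone is covalently continuous, so every residue-based fragment boundary cuts a covalent bond
Residue charges depend on pH
Disulfide bonds create covalent cross-links between residues that may be far apart in sequence
Water molecules around the protein need to be grouped into clusters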
Prerequisites
from autofragment import ChemicalSystem, BioPartitioner
from autofragment.io.writers import write_gamess_fmo
Install the bio extra for mmCIF support:
pip install "autofragment[bio]"
Step 1: Partition from mmCIF
The BioPartitioner reads mmCIF files and partitions proteins into residue-based fragments with water clustering.
partitioner = BioPartitioner(
    water_clusters=10,
    water_cluster_method="kmeans",
    random_state=42,
    ph=7.4,
)
tree = partitioner.partition_file("1ubq.cif")
print(f"Created {len(tree.fragments)} fragments")
for frag in tree.fragments:
    print(f" {frag.label}: {frag.n_atoms} atoms")
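To give a feel for what k-means water clustering does conceptually, here is a minimal standard-library sketch of Lloyd's algorithm applied to water-oxygen coordinates. It is not AutoFragment's implementation: the function `kmeans`, its parameters, and the sample coordinates are hypothetical, and BioPartitioner may cluster waters differently internally.

```python
import math
import random


def kmeans(points, k, random_state=42, max_iter=100):
    """Cluster 3D points with Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(random_state)
    # Seed the centroids with k distinct input points.
    centroids = rng.sample(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster emptied
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


# Hypothetical water-oxygen coordinates (angstroms): two well-separated groups.
waters = [
    (0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.5, 1.0, 0.5),
    (20.0, 20.0, 20.0), (21.0, 20.5, 20.0), (20.5, 21.0, 20.5),
]
labels = kmeans(waters, k=2)
print(labels)
```

Fixing the seed, as `random_state=42` does in the partitioner call, makes the initial centroids and therefore the resulting clusters reproducible between runs.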
CLI Usage
autofragment bio --input protein.cif --output partitioned.json --water-clusters 10
Step 2: Generate FMO Input
from autofragment.io.writers import write_gamess_fmo
write_gamess_fmo(
    tree.fragments,
    "protein_fmo.inp",
    method="MP2",
    basis="6-31G*",
)
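GAMESS FMO inputs carry an integer charge for each fragment, so a useful sanity check before running the calculation is that the fragment charges add up to the net charge you expect for the whole system. The sketch below works on plain `(label, charge)` pairs; the helper `check_total_charge` and the sample data are hypothetical and do not reflect how AutoFragment stores charges on its fragment objects.

```python
def check_total_charge(fragment_charges, expected_total):
    """Return the summed fragment charge, raising if it differs from expected_total."""
    for label, charge in fragment_charges:
        if not isinstance(charge, int):
            raise TypeError(f"{label}: charge {charge!r} is not an integer")
    total = sum(charge for _, charge in fragment_charges)
    if total != expected_total:
        raise ValueError(
            f"fragment charges sum to {total:+d}, expected {expected_total:+d}"
        )
    return total


# Hypothetical (label, charge) pairs; illustrative only, not AutoFragment output.
fragments = [("MET1", 1), ("GLN2", 0), ("LYS6", 1), ("GLU16", -1), ("ASP21", -1)]
print(check_total_charge(fragments, expected_total=0))
```

A mismatch usually points to a protonation state that differs from what was intended, which is cheaper to catch here than after a failed run.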
Step 3: Charge Assignment
pH-Dependent Charges
Whether an ionizable group carries a charge depends on how the pH compares with its pKa. Typical values:
| Residue | pKa | Charge at pH 7.4 |
|---|---|---|
| Asp | 3.9 | -1 |
| Glu | 4.1 | -1 |
| His | 6.0 | 0 (may vary) |
| Lys | 10.5 | +1 |
| Arg | 12.5 | +1 |
| Cys | 8.3 | 0 (unless in S-S) |
| Tyr | 10.5 | 0 |
| N-terminus | 9.7 | +1 |
| C-terminus | 2.0 | -1 |
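The integer charges in the table follow from the Henderson-Hasselbalch relation: the protonated fraction of a group is 1 / (1 + 10^(pH - pKa)). The sketch below reproduces the table from its pKa values. The function `average_charge` and the `GROUPS` dictionary are illustrative names, not part of AutoFragment, and the model ignores the pKa shifts that a residue's local environment can cause.

```python
def average_charge(pka, ph, kind):
    """Mean charge of one titratable group from the Henderson-Hasselbalch relation.

    kind="acid": neutral when protonated, -1 when deprotonated.
    kind="base": +1 when protonated, neutral when deprotonated.
    """
    protonated = 1.0 / (1.0 + 10.0 ** (ph - pka))
    if kind == "acid":
        return protonated - 1.0
    if kind == "base":
        return protonated
    raise ValueError(f"unknown kind: {kind!r}")


# pKa values copied from the table above.
GROUPS = {
    "Asp": (3.9, "acid"),
    "Glu": (4.1, "acid"),
    "His": (6.0, "base"),
    "Lys": (10.5, "base"),
    "Arg": (12.5, "base"),
    "Cys": (8.3, "acid"),
    "Tyr": (10.5, "acid"),
    "N-terminus": (9.7, "base"),
    "C-terminus": (2.0, "acid"),
}

for name, (pka, kind) in GROUPS.items():
    q = average_charge(pka, 7.4, kind)
    # Show the fractional mean charge and the integer it rounds to.
    print(f"{name:<10} {q:+.2f} -> {round(q):+d}")
```

Histidine is the sensitive case: its pKa is close to physiological pH, so a small shift in pH or in the effective pKa changes its dominant protonation state, which is why the table marks it "may vary".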
Complete Example
#!/usr/bin/env python
"""Complete protein fragmentation workflow."""
from autofragment import BioPartitioner
from autofragment.io.writers import write_gamess_fmo


def fragment_protein(cif_file, output_file):
    # Configure and partition
    partitioner = BioPartitioner(
        water_clusters=10,
        water_cluster_method="kmeans",
        random_state=42,
        ph=7.4,
    )
    tree = partitioner.partition_file(cif_file)

    # Report
    print(f"Created {len(tree.fragments)} fragments:")
    for i, frag in enumerate(tree.fragments):
        print(f" {i}: {frag.label}, {frag.n_atoms} atoms")

    # Write FMO input
    write_gamess_fmo(
        tree.fragments,
        output_file,
        method="MP2",
        basis="6-31G*",
    )
    print(f"\nWrote {output_file}")


if __name__ == "__main__":
    fragment_protein("1ubq.cif", "ubiquitin_fmo.inp")
Troubleshooting
Common Issues
Incorrect charges: Check the pH setting and the histidine protonation states.
Missing atoms: Ensure hydrogens are added before fragmentation (use the --add-implicit-hydrogens flag).
FMO convergence: Try a larger basis set for the initial guess.
Next Steps
Output Formats: Alternative QC program formats
Algorithms Guide: Details on clustering methods