prxteinmpnn.io
Utilities for processing structure and trajectory files.
- prxteinmpnn.io.string_key_to_index(string_keys, key_map, unk_index=None)
Convert string keys to integer indices based on a mapping.
Efficient vectorized implementation to convert a 1D array of string keys to a 1D array of integer indices using a provided mapping. If a key is not found in the mapping, it is replaced with a specified unknown index.
- Parameters:
- Return type:
Array
- Returns:
A 1D array of integer indices corresponding to the string keys.
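To make the vectorized lookup concrete, here is a minimal NumPy sketch of one way to map string keys to indices without looping over the query keys. It is not the library's implementation: the name `string_key_to_index_sketch`, the binary-search approach, and the fallback of `len(key_map)` when `unk_index` is None are all assumptions made for illustration.

```python
import numpy as np


def string_key_to_index_sketch(string_keys, key_map, unk_index=None):
    """Map a 1D array of string keys to integer indices (illustrative sketch)."""
    keys = np.asarray(string_keys, dtype=str)
    if unk_index is None:
        # Assumption for this sketch only: unknown keys get index len(key_map).
        unk_index = len(key_map)

    # Sort the known keys once so every lookup can use binary search.
    sorted_keys = np.array(sorted(key_map), dtype=str)
    sorted_vals = np.array([key_map[k] for k in sorted_keys.tolist()])

    # Candidate position of each query key in the sorted key table.
    pos = np.searchsorted(sorted_keys, keys)
    pos = np.clip(pos, 0, len(sorted_keys) - 1)

    # A key is known only if the table entry at its position matches exactly.
    found = sorted_keys[pos] == keys
    return np.where(found, sorted_vals[pos], unk_index)


key_map = {"ALA": 0, "ARG": 1, "GLY": 7}
indices = string_key_to_index_sketch(
    np.array(["GLY", "ALA", "XXX", "ARG"]), key_map, unk_index=20
)
```

In this sketch the key `"XXX"` is absent from `key_map`, so it receives the unknown index. The sketch assumes a non-empty mapping.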
- prxteinmpnn.io.string_to_protein_sequence(sequence, aa_map=None, unk_index=None)
Convert a string sequence to a ProteinSequence.
- Parameters:
sequence (str) – A string containing the protein sequence.
aa_map (dict | None) – A dictionary mapping amino acid names to integer indices. If None, uses the default restype_order mapping.
unk_index (int | None) – The index to use for unknown amino acids not found in the mapping. If None, uses unk_restype_index.
- Return type:
Int[Array, 'num_residues']
- Returns:
A ProteinSequence containing the amino acid type indices corresponding to the input string.
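The encoding step can be sketched as a per-character lookup with a fallback for unknown residues. The alphabet `EXAMPLE_AA_MAP`, the helper name `encode_sequence`, and the choice of `len(aa_map)` as the unknown index are hypothetical; the library's default restype_order and unk_restype_index may differ.

```python
import numpy as np

# Hypothetical one-letter alphabet for illustration; the library's default
# restype_order may use a different ordering.
EXAMPLE_AA_MAP = {aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


def encode_sequence(sequence, aa_map=None, unk_index=None):
    """Convert a one-letter sequence string to an int32 index array (sketch)."""
    aa_map = EXAMPLE_AA_MAP if aa_map is None else aa_map
    # Assumption: the unknown index sits right after the known residue types.
    unk_index = len(aa_map) if unk_index is None else unk_index
    return np.array([aa_map.get(aa, unk_index) for aa in sequence], dtype=np.int32)


# "B" is not in the example alphabet, so it maps to the unknown index.
encoded = encode_sequence("ACYB")
```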
- prxteinmpnn.io.protein_sequence_to_string(sequence, aa_map=None)
Convert a ProteinSequence to a string.
- Parameters:
- Return type:
- Returns:
A string representation of the protein sequence.
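The reverse conversion amounts to inverting the mapping and indexing into a table of letters. The following is only a sketch under stated assumptions: `EXAMPLE_AA_MAP`, `decode_sequence`, and the `"X"` placeholder for out-of-range indices are hypothetical and not taken from the library.

```python
import numpy as np

# Hypothetical one-letter alphabet for illustration; the library's default
# mapping may use a different ordering.
EXAMPLE_AA_MAP = {aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


def decode_sequence(sequence, aa_map=None, unk_letter="X"):
    """Convert an array of residue type indices back to a string (sketch)."""
    aa_map = EXAMPLE_AA_MAP if aa_map is None else aa_map

    # Build an index -> letter table with one extra slot for unknown indices.
    unk_slot = max(aa_map.values()) + 1
    letters = np.full(unk_slot + 1, unk_letter, dtype="<U1")
    for aa, index in aa_map.items():
        letters[index] = aa

    # Redirect any index outside the known range to the unknown slot.
    seq = np.asarray(sequence, dtype=np.int64)
    in_range = (seq >= 0) & (seq < unk_slot)
    return "".join(letters[np.where(in_range, seq, unk_slot)].tolist())


decoded = decode_sequence([0, 1, 19, 20])
```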
- prxteinmpnn.io.residue_names_to_aatype(residue_names, aa_map=None)
Convert 3-letter residue names to amino acid type indices.
- Parameters:
- Return type:
Int[Array, 'num_residues']
- Returns:
A 1D array of amino acid type indices corresponding to the residue names.
- prxteinmpnn.io.atom_names_to_index(atom_names, atom_map=None)
Convert atom names to atom type indices.
- Parameters:
- Return type:
Int[Array, 'num_residues']
- Returns:
A 1D array of atom type indices corresponding to the atom names.
- prxteinmpnn.io._check_atom_array_length(atom_array)
Check if the AtomArray has a valid length.
- Parameters:
atom_array (AtomArray) – The AtomArray to check.
- Raises:
ValueError – If the AtomArray is empty.
- Return type:
- prxteinmpnn.io._get_chain_index(atom_array)
Get the chain index from the AtomArray.
- Return type:
Int[Array, 'num_residues num_atoms']
- Parameters:
atom_array (AtomArray)
- prxteinmpnn.io._process_chain_id(atom_array, chain_id=None)
Process the chain_id of the AtomArray.
- prxteinmpnn.io._fill_in_cb_coordinates(coords_37, residue_names, atom_map=None)
Fill in the CB coordinates for residues that have them.
- Parameters:
coords_37 (Array) – A 3D array of shape (N, 37, 3) containing the coordinates of the atoms.
residue_names (ndarray) – A 1D array of residue names corresponding to the coordinates.
atom_map (dict[str, int] | None) – A dictionary mapping atom names to their atom indices. If None, uses the default atom_order mapping.
- Return type:
Array
- Returns:
A 3D array of shape (N, 37, 3) with the C-beta coordinates filled in for residues that have them. For glycine residues, the C-beta coordinates are computed from the N, CA, and C atoms. For other residues, the original C-beta coordinates are retained if they exist.
- NOTE: This is not part of the pipeline: although the original code performs this step, it is bypassed during feature extraction.
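The glycine case can be illustrated with the ideal-geometry construction used in the original ProteinMPNN featurization, which places a virtual C-beta from the backbone N, CA, and C atoms. This is a sketch, not this function's source: the slot indices (N=0, CA=1, C=2, CB=3), the helper name `fill_glycine_cb`, and the assumption that this module uses the same coefficients are all unverified here.

```python
import numpy as np

# Assumed slot indices in the 37-atom layout (hypothetical for this sketch).
N_IDX, CA_IDX, C_IDX, CB_IDX = 0, 1, 2, 3


def fill_glycine_cb(coords_37, residue_names):
    """Return a copy of coords_37 with a virtual CB written for GLY residues."""
    coords = np.array(coords_37, dtype=float)  # copy; the input is left untouched
    n, ca, c = coords[:, N_IDX], coords[:, CA_IDX], coords[:, C_IDX]

    # Backbone bond vectors and their normal.
    b = ca - n
    c_vec = c - ca
    a = np.cross(b, c_vec)

    # Ideal CB position expressed in the local backbone frame
    # (coefficients as in the original ProteinMPNN featurization).
    ideal_cb = -0.58273431 * a + 0.56802827 * b - 0.54067466 * c_vec + ca

    # Only glycine rows are overwritten; other residues keep their CB.
    is_gly = np.asarray(residue_names) == "GLY"
    coords[is_gly, CB_IDX] = ideal_cb[is_gly]
    return coords


# Two residues with the same idealized backbone; the second already has a CB.
coords = np.zeros((2, 37, 3))
coords[:, N_IDX] = [-0.525, 1.363, 0.0]
coords[:, C_IDX] = [1.526, 0.0, 0.0]
coords[1, CB_IDX] = [9.0, 9.0, 9.0]
filled = fill_glycine_cb(coords, np.array(["GLY", "ALA"]))
```

With this backbone the virtual CB lands roughly 1.53 Å from CA, which is consistent with a typical CA–CB bond length.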
- prxteinmpnn.io.process_atom_array(atom_array, atom_map=None, chain_id=None)
Process an AtomArray to create a ProteinStructure.
- prxteinmpnn.io.from_structure_file(file_path, model=1, chain_id=None)
Construct a Protein object from a structure file (PDB, PDBx/mmCIF).
This implementation uses biotite for robust parsing and JAX for efficient vectorized processing to create a dense, fixed-size representation for each residue (37 atoms).
- WARNING: All non-standard residue types will be converted into UNK. All
atoms not in the canonical 37-atom set will be ignored.
- Parameters:
- Return type:
- Returns:
A new ProteinStructure parsed from the file contents.
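The dense, fixed-size representation described above can be pictured as scattering a flat atom list into an (N, 37, 3) array plus a presence mask. The sketch below uses plain NumPy rather than biotite or JAX. `EXAMPLE_ATOM_ORDER` is a truncated, hypothetical stand-in for the full 37-atom mapping, and `to_dense` is not a function of this module.

```python
import numpy as np

# Truncated, hypothetical atom-name -> slot mapping; the real mapping
# covers all 37 canonical atom types.
EXAMPLE_ATOM_ORDER = {"N": 0, "CA": 1, "C": 2, "CB": 3, "O": 4}
NUM_ATOM_SLOTS = 37


def to_dense(residue_index, atom_names, coords, num_residues):
    """Scatter per-atom coordinates into a (num_residues, 37, 3) array."""
    residue_index = np.asarray(residue_index, dtype=np.int64)
    coords = np.asarray(coords, dtype=float)

    # Atoms outside the mapping get slot -1 and are dropped.
    slot = np.array([EXAMPLE_ATOM_ORDER.get(name, -1) for name in atom_names])
    keep = slot >= 0

    dense = np.zeros((num_residues, NUM_ATOM_SLOTS, 3))
    mask = np.zeros((num_residues, NUM_ATOM_SLOTS), dtype=bool)
    dense[residue_index[keep], slot[keep]] = coords[keep]
    mask[residue_index[keep], slot[keep]] = True
    return dense, mask


# Residue 0 has N, CA and a hydrogen (ignored); residue 1 has only CA.
dense, mask = to_dense(
    residue_index=[0, 0, 0, 1],
    atom_names=["N", "CA", "H", "CA"],
    coords=[[1, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0]],
    num_residues=2,
)
```

The mask records which slots were actually observed, since a zero coordinate alone cannot distinguish a missing atom from one at the origin.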
- prxteinmpnn.io.from_trajectory(trajectory_file, topology_file=None, chain_id=None)
Construct ProteinStructure objects from a trajectory file.
This function reads a trajectory and yields a ProteinStructure for each frame.
- Parameters:
trajectory_file (str) – Path to the trajectory file (e.g., DCD, XTC, multi-model PDB).
topology_file (str | None) – Path to the topology file (e.g., PDB, PSF), required for coordinate-only trajectory formats.
chain_id (str | Sequence[str] | None) – If specified, only atoms from this chain will be included.
- Return type:
- Returns:
An iterator that yields a ProteinStructure for each frame in the trajectory.
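The per-frame iteration pattern can be sketched with a generator that pairs one shared topology (here just a list of atom names) with each frame of coordinates. This shows why coordinate-only formats need a separate topology. The helper `iter_frames` and its inputs are hypothetical and involve no file I/O.

```python
import numpy as np


def iter_frames(frame_coords, atom_names):
    """Lazily yield one (atom_names, coords) pair per trajectory frame."""
    frame_coords = np.asarray(frame_coords, dtype=float)
    if frame_coords.ndim != 3 or frame_coords.shape[1] != len(atom_names):
        raise ValueError("expected coordinates of shape (num_frames, num_atoms, 3)")
    for coords in frame_coords:
        # The topology is shared; only the coordinates change between frames.
        yield atom_names, coords


# Three frames of a two-atom system, consumed one frame at a time.
trajectory = np.arange(18, dtype=float).reshape(3, 2, 3)
frames = list(iter_frames(trajectory, ["N", "CA"]))
```

Because the function is a generator, frames are produced on demand, so a caller can stop early without processing the whole trajectory.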
- prxteinmpnn.io.from_string(pdb_string, model=1, chain_id=None)
Construct a ProteinStructure from a PDB string.
- Parameters:
- Return type:
- Returns:
A new ProteinStructure parsed from the PDB string.
- prxteinmpnn.io.protein_structure_to_model_inputs(protein_structure, bias=None)
Convert a ProteinStructure to model inputs.
- Parameters:
protein_structure (ProteinStructure) – A ProteinStructure object containing the structure data.
bias (Float[Array, 'num_residues num_classes'] | None) – An optional InputBias jnp.ndarray with shape (num_residues, 20) containing bias information. This will shift output probabilities for each residue. Defaults to zero.
- Return type:
- Returns:
A ModelInputs containing the model inputs derived from the ProteinStructure.
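To illustrate how a per-residue bias can shift output probabilities, the sketch below adds the bias to logits before a softmax. That the model applies the bias in exactly this way is an assumption, and `biased_probabilities` is a hypothetical helper written in NumPy rather than JAX.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def biased_probabilities(logits, bias=None):
    """Add an optional (num_residues, num_classes) bias to logits, then normalize."""
    logits = np.asarray(logits, dtype=float)
    if bias is None:
        # A zero bias leaves the distribution unchanged.
        bias = np.zeros_like(logits)
    return softmax(logits + np.asarray(bias, dtype=float))


# Two residues, 20 classes, uniform logits; favor class 3 at residue 0 only.
logits = np.zeros((2, 20))
bias = np.zeros((2, 20))
bias[0, 3] = 2.0
unbiased = biased_probabilities(logits)
biased = biased_probabilities(logits, bias)
```

In this sketch only residue 0 changes: class 3 gains probability there, while residue 1 keeps its original distribution.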