data_structures#
Dataclasses for the PrxteinMPNN project.
prxteinmpnn.utils.data_structures
- class prxteinmpnn.utils.data_structures.ProteinTuple(coordinates, aatype, atom_mask, residue_index, chain_index, full_coordinates=None, dihedrals=None, source=None, mapping=None)[source]#
Bases:
NamedTuple
Tuple-based protein structure representation.
- Parameters:
coordinates (np.ndarray)
aatype (np.ndarray)
atom_mask (np.ndarray)
residue_index (np.ndarray)
chain_index (np.ndarray)
full_coordinates (np.ndarray | None)
dihedrals (np.ndarray | None)
source (str | None)
mapping (np.ndarray | None)
- coordinates#
Cartesian coordinates of the atoms in the structure, in angstroms. The atom types correspond to residue_constants.atom_types, i.e. the first three are N, CA, CB. Shape is [num_res, num_atom_type, 3], where num_res is the number of residues, num_atom_type is the number of atom types (e.g., N, CA, CB, C, O), and 3 is the spatial dimension (x, y, z).
- Type:
StructureAtomicCoordinates
- aatype#
Amino-acid type for each residue represented as an integer between 0 and 20, where 20 is ‘X’. Shape is [num_res].
- Type:
ProteinSequence
- atom_mask#
Binary float mask to indicate presence of a particular atom. 1.0 if an atom is present and 0.0 if not. This should be used for loss masking. Shape is [num_res, num_atom_type].
- Type:
AtomMask
- residue_index#
Residue index as used in PDB. It is not necessarily continuous or 0-indexed. Shape is [num_res].
- Type:
ResidueIndex
- chain_index#
Chain index for each residue. Shape is [num_res].
- Type:
ChainIndex
- dihedrals#
Dihedral angles for backbone atoms (phi, psi, omega). Shape is [num_res, 3]. If not provided, defaults to None.
- Type:
BackboneDihedrals | None
- coordinates: np.ndarray#
Alias for field number 0
- aatype: np.ndarray#
Alias for field number 1
- atom_mask: np.ndarray#
Alias for field number 2
- residue_index: np.ndarray#
Alias for field number 3
- chain_index: np.ndarray#
Alias for field number 4
- full_coordinates: np.ndarray | None#
Alias for field number 5
- dihedrals: np.ndarray | None#
Alias for field number 6
- source: str | None#
Alias for field number 7
- mapping: np.ndarray | None#
Alias for field number 8
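Because ProteinTuple is a NamedTuple, each field can be read by name or by position, and the last four fields default to None. The sketch below illustrates this with a stand-in class, ProteinTupleSketch, which is hypothetical and not part of prxteinmpnn; it only mirrors the field order documented above, and plain Python lists stand in for NumPy arrays so that it runs without dependencies. The helper numbering_gaps is likewise an assumption, shown as one possible way to handle a residue_index that is not continuous.

```python
from typing import Any, NamedTuple, Optional


class ProteinTupleSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented ProteinTuple field order."""

    coordinates: Any  # [num_res, num_atom_type, 3]
    aatype: Any  # [num_res]
    atom_mask: Any  # [num_res, num_atom_type]
    residue_index: Any  # [num_res]
    chain_index: Any  # [num_res]
    full_coordinates: Optional[Any] = None
    dihedrals: Optional[Any] = None
    source: Optional[str] = None
    mapping: Optional[Any] = None


def numbering_gaps(residue_index, chain_index):
    """Return positions i where residue i + 1 does not directly follow residue i.

    Only consecutive residues in the same chain are compared.
    """
    gaps = []
    for i in range(len(residue_index) - 1):
        same_chain = chain_index[i] == chain_index[i + 1]
        if same_chain and residue_index[i + 1] - residue_index[i] != 1:
            gaps.append(i)
    return gaps


# Three residues with two atom types each; lists stand in for arrays.
protein = ProteinTupleSketch(
    coordinates=[[[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]] * 3,
    aatype=[0, 5, 20],
    atom_mask=[[1.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    residue_index=[10, 11, 15],  # PDB-style numbering: starts at 10, skips 12-14
    chain_index=[0, 0, 0],
)

by_name = protein.aatype
by_position = protein[1]  # field number 1 is aatype
gaps = numbering_gaps(protein.residue_index, protein.chain_index)
```

Both access styles return the same object, and the optional fields stay None unless supplied. Access by name is usually the safer choice, since it does not depend on the field order.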
- class prxteinmpnn.utils.data_structures.TrajectoryStaticFeatures(aatype, static_atom_mask_37, residue_indices, chain_index, valid_atom_mask, nitrogen_mask, num_residues)[source]#
Bases:
object
A container for pre-computed, frame-invariant protein features.
- Parameters:
aatype (np.ndarray)
static_atom_mask_37 (np.ndarray)
residue_indices (np.ndarray)
chain_index (np.ndarray)
valid_atom_mask (np.ndarray)
nitrogen_mask (np.ndarray)
num_residues (int)
- class prxteinmpnn.utils.data_structures.Protein(coordinates, aatype, one_hot_sequence, atom_mask, residue_index, chain_index, dihedrals=None, mapping=None, full_coordinates=None)[source]#
Bases:
object
Protein structure or ensemble representation.
- Parameters:
coordinates (Float[Array, 'num_residues num_atoms 3'])
aatype (Int[Array, 'num_residues'])
one_hot_sequence (Float[Array, 'num_residues num_classes'])
atom_mask (Int[Array, 'num_residues num_atoms'])
residue_index (Int[Array, 'num_residues'])
chain_index (Int[Array, 'num_residues'])
dihedrals (Float[Array, 'num_residues 3'] | None)
mapping (Int | None)
full_coordinates (Float[Array, 'num_residues num_atoms 3'] | None)
- coordinates#
Cartesian coordinates of the atoms in the structure, in angstroms. The atom types correspond to residue_constants.atom_types, i.e. the first three are N, CA, CB. Shape is [num_res, num_atom_type, 3], where num_res is the number of residues, num_atom_type is the number of atom types (e.g., N, CA, CB, C, O), and 3 is the spatial dimension (x, y, z).
- Type:
StructureAtomicCoordinates
- aatype#
Amino-acid type for each residue represented as an integer between 0 and 20, where 20 is ‘X’. Shape is [num_res].
- Type:
Sequence
- atom_mask#
Binary float mask to indicate presence of a particular atom. 1.0 if an atom is present and 0.0 if not. This should be used for loss masking. Shape is [num_res, num_atom_type].
- Type:
AtomMask
- residue_index#
Residue index as used in PDB. It is not necessarily continuous or 0-indexed. Shape is [num_res].
- Type:
AtomResidueIndex
- coordinates: Float[Array, 'num_residues num_atoms 3']#
- aatype: Int[Array, 'num_residues']#
- one_hot_sequence: Float[Array, 'num_residues num_classes']#
- atom_mask: Int[Array, 'num_residues num_atoms']#
- residue_index: Int[Array, 'num_residues']#
- chain_index: Int[Array, 'num_residues']#
- classmethod from_tuple(protein_tuple)[source]#
Create a Protein instance from a ProteinTuple.
- Parameters:
protein_tuple (ProteinTuple) – The input protein tuple.
- Returns:
The output protein dataclass.
- Return type:
Protein
- replace(**updates)#
Returns a new object replacing the specified fields with new values.
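The two methods above can be sketched with the standard library alone. The classes TupleSketch and ProteinSketch below are hypothetical stand-ins, not the prxteinmpnn classes, and lists replace JAX arrays. Two assumptions are made: that from_tuple derives one_hot_sequence from aatype using 21 classes (20 amino acids plus ‘X’), and that replace behaves like dataclasses.replace on an immutable dataclass. Neither is confirmed by this page.

```python
import dataclasses
from typing import Any, NamedTuple, Optional

NUM_CLASSES = 21  # assumption: 20 amino acids plus 'X', matching aatype in 0..20


class TupleSketch(NamedTuple):
    """Hypothetical stand-in for ProteinTuple (required fields plus dihedrals)."""

    coordinates: Any
    aatype: Any
    atom_mask: Any
    residue_index: Any
    chain_index: Any
    dihedrals: Optional[Any] = None


@dataclasses.dataclass(frozen=True)
class ProteinSketch:
    """Hypothetical stand-in for Protein; lists replace JAX arrays."""

    coordinates: Any
    aatype: Any
    one_hot_sequence: Any
    atom_mask: Any
    residue_index: Any
    chain_index: Any
    dihedrals: Optional[Any] = None

    @classmethod
    def from_tuple(cls, protein_tuple):
        # Assumed behavior: build one row per residue with a single 1.0
        # at the position given by that residue's aatype.
        one_hot = [
            [1.0 if k == a else 0.0 for k in range(NUM_CLASSES)]
            for a in protein_tuple.aatype
        ]
        return cls(
            coordinates=protein_tuple.coordinates,
            aatype=protein_tuple.aatype,
            one_hot_sequence=one_hot,
            atom_mask=protein_tuple.atom_mask,
            residue_index=protein_tuple.residue_index,
            chain_index=protein_tuple.chain_index,
            dihedrals=protein_tuple.dihedrals,
        )

    def replace(self, **updates):
        """Return a new object with the given fields replaced."""
        return dataclasses.replace(self, **updates)


tup = TupleSketch(
    coordinates=[[[0.0, 0.0, 0.0]], [[3.8, 0.0, 0.0]]],
    aatype=[3, 20],
    atom_mask=[[1], [1]],
    residue_index=[1, 2],
    chain_index=[0, 0],
)
protein = ProteinSketch.from_tuple(tup)

# replace() leaves the original untouched and returns a modified copy.
renumbered = protein.replace(residue_index=[0, 1])
```

Because the dataclass is frozen, direct attribute assignment raises an error, so replace is the way to obtain a modified copy; fields that are not named in the call are shared with the original rather than copied.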
- class prxteinmpnn.utils.data_structures.ProteinEnsemble(coordinates, aatype, one_hot_sequence, atom_mask, residue_index, chain_index, dihedrals=None, mapping=None)[source]#
Bases:
object
Protein structure or ensemble representation.
- Parameters:
coordinates (Float[Array, 'num_residues num_atoms 3'])
aatype (Int[Array, 'num_residues'])
one_hot_sequence (Float[Array, 'num_residues num_classes'])
atom_mask (Int[Array, 'num_residues num_atoms'])
residue_index (Int[Array, 'num_residues'])
chain_index (Int[Array, 'num_residues'])
dihedrals (Float[Array, 'num_residues 3'] | None)
mapping (Int | None)
- coordinates#
Cartesian coordinates of the atoms in the structure, in angstroms. The atom types correspond to residue_constants.atom_types, i.e. the first three are N, CA, CB. Shape is [num_res, num_atom_type, 3], where num_res is the number of residues, num_atom_type is the number of atom types (e.g., N, CA, CB, C, O), and 3 is the spatial dimension (x, y, z).
- Type:
StructureAtomicCoordinates
- aatype#
Amino-acid type for each residue represented as an integer between 0 and 20, where 20 is ‘X’. Shape is [num_res].
- Type:
Sequence
- atom_mask#
Binary float mask to indicate presence of a particular atom. 1.0 if an atom is present and 0.0 if not. This should be used for loss masking. Shape is [num_res, num_atom_type].
- Type:
AtomMask
- residue_index#
Residue index as used in PDB. It is not necessarily continuous or 0-indexed. Shape is [num_res].
- Type:
AtomResidueIndex
- chain_index#
Chain index for each residue. Shape is [num_res].
- Type:
ChainIndex
- dihedrals#
Dihedral angles for backbone atoms (phi, psi, omega). Shape is [num_res, 3]. If not provided, defaults to None.
- Type:
BackboneDihedrals | None
- mapping#
Optional array mapping residues in the ensemble to original structure indices. Shape is [num_res, num_frames]. If not provided, defaults to None.
- Type:
jnp.Array | None
- coordinates: Float[Array, 'num_residues num_atoms 3']#
- aatype: Int[Array, 'num_residues']#
- one_hot_sequence: Float[Array, 'num_residues num_classes']#
- atom_mask: Int[Array, 'num_residues num_atoms']#
- residue_index: Int[Array, 'num_residues']#
- chain_index: Int[Array, 'num_residues']#
- replace(**updates)#
Returns a new object replacing the specified fields with new values.
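The atom_mask description says the mask should be used for loss masking. A minimal sketch of what that could look like follows: a per-atom quantity is averaged over only those atoms whose mask entry is 1. The function masked_mean and the per_atom_error values are hypothetical and not part of prxteinmpnn, and nested lists stand in for arrays of shape [num_res, num_atom_type].

```python
def masked_mean(per_atom_values, atom_mask):
    """Average per-atom values over atoms whose mask entry is nonzero."""
    total = 0.0
    count = 0.0
    for row_values, row_mask in zip(per_atom_values, atom_mask):
        for value, present in zip(row_values, row_mask):
            total += value * present  # absent atoms contribute nothing
            count += present
    # Guard against an all-zero mask to avoid division by zero.
    return total / count if count > 0 else 0.0


# Two residues, three atom types. The 9.0 entries belong to missing atoms
# and must not influence the result.
per_atom_error = [[0.5, 1.5, 9.0], [1.0, 9.0, 9.0]]
atom_mask = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

loss = masked_mean(per_atom_error, atom_mask)
```

Dividing by the number of present atoms, rather than by the total number of mask entries, keeps the average comparable between structures with different amounts of missing atoms.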