data_structures

Dataclasses for the PrxteinMPNN project.

prxteinmpnn.utils.data_structures

class prxteinmpnn.utils.data_structures.ProteinTuple(coordinates, aatype, atom_mask, residue_index, chain_index, full_coordinates=None, dihedrals=None, source=None, mapping=None)

Bases: NamedTuple

Tuple-based protein structure representation.

Parameters:

coordinates (np.ndarray)
aatype (np.ndarray)
atom_mask (np.ndarray)
residue_index (np.ndarray)
chain_index (np.ndarray)
full_coordinates (np.ndarray | None)
dihedrals (np.ndarray | None)
source (str | None)
mapping (np.ndarray | None)

coordinates

Atom positions in the structure, represented as a 3D array of Cartesian coordinates in angstroms. The atom types follow the order of residue_constants.atom_types, whose first three entries are N, CA, C. Shape is [num_res, num_atom_type, 3], where num_res is the number of residues, num_atom_type is the number of atom types (e.g., N, CA, C, CB, O), and 3 is the spatial dimension (x, y, z).

Type:: StructureAtomicCoordinates
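As an illustration of the [num_res, num_atom_type, 3] layout, the sketch below slices out one atom type across all residues and measures consecutive CA-CA distances. It assumes CA sits at index 1 of the atom axis (its position in residue_constants.atom_types) and uses a toy array with 37 atom types; the helper name is hypothetical and not part of PrxteinMPNN.

```python
import numpy as np

# Assumption: CA is the second entry of residue_constants.atom_types (index 1).
CA_INDEX = 1


def consecutive_ca_distances(coordinates: np.ndarray) -> np.ndarray:
    """Distances in angstroms between CA atoms of neighbouring residues.

    coordinates: array of shape (num_res, num_atom_type, 3).
    Returns an array of shape (num_res - 1,).
    """
    ca = coordinates[:, CA_INDEX, :]  # (num_res, 3): one atom type, all residues
    return np.linalg.norm(ca[1:] - ca[:-1], axis=-1)


# Toy structure: 3 residues, 37 atom types, CAs placed 3.8 A apart along x.
coords = np.zeros((3, 37, 3))
coords[:, CA_INDEX, 0] = [0.0, 3.8, 7.6]
print(consecutive_ca_distances(coords))
```

Indexing a single atom type this way keeps the residue axis intact, which is usually what per-residue geometric features need.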

aatype

Amino-acid type for each residue, represented as an integer between 0 and 20, where 20 is ‘X’. Shape is [num_res].

Type:: ProteinSequence
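Integer residue types are often expanded to a one-hot encoding before being fed to a model (the Protein class below carries a one_hot_sequence field alongside aatype). A minimal sketch, assuming 21 classes (indices 0 to 19 for the standard amino acids plus 20 for ‘X’); the helper name is hypothetical, and the class count PrxteinMPNN actually uses for one_hot_sequence is not stated on this page.

```python
import numpy as np

# Assumption: 20 standard amino acids plus 'X' at index 20.
NUM_CLASSES = 21


def aatype_to_one_hot(aatype: np.ndarray, num_classes: int = NUM_CLASSES) -> np.ndarray:
    """Map integer residue types [num_res] to a one-hot array [num_res, num_classes]."""
    aatype = np.asarray(aatype)
    if aatype.size and (aatype.min() < 0 or aatype.max() >= num_classes):
        raise ValueError("aatype values must lie in [0, num_classes)")
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[aatype]


one_hot = aatype_to_one_hot(np.array([0, 5, 20]))
print(one_hot.shape)  # (3, 21)
```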

atom_mask

Binary float mask to indicate presence of a particular atom. 1.0 if an atom is present and 0.0 if not. This should be used for loss masking. Shape is [num_res, num_atom_type].

Type:: AtomMask
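Because atom_mask is meant for loss masking, a typical use is to zero out the contribution of absent atoms and average only over present ones. The sketch below computes a masked mean squared coordinate error; the function name and the choice of MSE are illustrative assumptions, not the project's loss.

```python
import numpy as np


def masked_mse(pred: np.ndarray, target: np.ndarray, atom_mask: np.ndarray) -> float:
    """Mean squared coordinate error over atoms whose mask entry is 1.0.

    pred, target: (num_res, num_atom_type, 3); atom_mask: (num_res, num_atom_type).
    """
    sq_err = np.sum((pred - target) ** 2, axis=-1)  # per-atom squared distance
    mask = atom_mask.astype(sq_err.dtype)
    # Divide by the number of present atoms; the max() guards an all-zero mask.
    return float(np.sum(sq_err * mask) / np.maximum(np.sum(mask), 1.0))


target = np.zeros((2, 4, 3))
pred = np.ones((2, 4, 3))   # each present atom is off by (1, 1, 1): squared error 3.0
pred[0, 2:] = 100.0         # large errors on atoms that are masked out below
atom_mask = np.array([[1, 1, 0, 0],
                      [1, 0, 0, 0]], dtype=np.float32)
print(masked_mse(pred, target, atom_mask))  # 3.0
```

The masked-out atoms carry large errors in this toy input, yet the result reflects only the three present atoms.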

residue_index

Residue index as used in PDB. It is not necessarily continuous or 0-indexed. Shape is [num_res].

Type:: ResidueIndex

chain_index

Chain index for each residue. Shape is [num_res].

Type:: ChainIndex
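Since residue_index follows PDB numbering and need not be continuous, it can be combined with chain_index to locate breaks in the backbone. The sketch below is a heuristic under stated assumptions: it treats any step other than +1 within a chain, or any change of chain, as a break, and it ignores insertion codes (which can repeat a residue number). The helper name is hypothetical.

```python
import numpy as np


def find_chain_breaks(residue_index: np.ndarray, chain_index: np.ndarray) -> np.ndarray:
    """Positions i where residue i + 1 does not directly follow residue i.

    A break is reported when the chain changes or when residue_index jumps by
    anything other than +1 (missing residues or gaps in PDB numbering).
    """
    same_chain = chain_index[1:] == chain_index[:-1]
    consecutive = np.diff(residue_index) == 1
    return np.nonzero(~(same_chain & consecutive))[0]


residue_index = np.array([10, 11, 12, 15, 16, 1, 2])  # gap 12 -> 15, then a new chain
chain_index = np.array([0, 0, 0, 0, 0, 1, 1])
print(find_chain_breaks(residue_index, chain_index))  # [2 4]
```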

dihedrals

Dihedral angles for backbone atoms (phi, psi, omega). Shape is [num_res, 3]. If not provided, defaults to None.

Type:: BackboneDihedrals | None
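Each backbone dihedral is the torsion angle defined by four consecutive atoms: phi uses C(i-1), N, CA, C; psi uses N, CA, C, N(i+1); omega uses CA, C, N(i+1), CA(i+1). The sketch below shows the standard signed dihedral computation for four points using arctan2. How PrxteinMPNN fills the [num_res, 3] array at chain termini, and whether it stores radians or degrees, is not stated here and is not assumed.

```python
import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in radians, in [-pi, pi], defined by four 3D points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.arctan2(y, x))


# cis arrangement (0), trans arrangement (pi) and a quarter turn (pi / 2).
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1]))
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [-1, 0, 1]))
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1]))
```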

coordinates: np.ndarray: Alias for field number 0

aatype: np.ndarray: Alias for field number 1

atom_mask: np.ndarray: Alias for field number 2

residue_index: np.ndarray: Alias for field number 3

chain_index: np.ndarray: Alias for field number 4

full_coordinates: np.ndarray | None: Alias for field number 5

dihedrals: np.ndarray | None: Alias for field number 6

source: str | None: Alias for field number 7

mapping: np.ndarray | None: Alias for field number 8
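Because ProteinTuple is a NamedTuple, each field name is an alias for a tuple position, instances are immutable, and trailing fields with defaults may be omitted. To stay runnable without the package installed, the sketch below defines a hypothetical stand-in, ProteinTupleSketch, with the same field names, order and defaults as documented above; the array sizes (including 37 atom types) and the source string are illustrative.

```python
from typing import NamedTuple, Optional

import numpy as np


class ProteinTupleSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented ProteinTuple fields."""

    coordinates: np.ndarray
    aatype: np.ndarray
    atom_mask: np.ndarray
    residue_index: np.ndarray
    chain_index: np.ndarray
    full_coordinates: Optional[np.ndarray] = None
    dihedrals: Optional[np.ndarray] = None
    source: Optional[str] = None
    mapping: Optional[np.ndarray] = None


num_res = 4
pt = ProteinTupleSketch(
    coordinates=np.zeros((num_res, 37, 3)),
    aatype=np.zeros(num_res, dtype=np.int32),
    atom_mask=np.ones((num_res, 37)),
    residue_index=np.arange(num_res),
    chain_index=np.zeros(num_res, dtype=np.int32),
)

# Field 0 and .coordinates are the same object.
assert pt[0] is pt.coordinates
# Tuples are immutable; _replace returns a modified copy.
pt2 = pt._replace(source="example.pdb")
print(pt.source, pt2.source)  # None example.pdb
```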

class prxteinmpnn.utils.data_structures.TrajectoryStaticFeatures(aatype, static_atom_mask_37, residue_indices, chain_index, valid_atom_mask, nitrogen_mask, num_residues)

Bases: object

A container for pre-computed, frame-invariant protein features.

Parameters:

aatype (np.ndarray)
static_atom_mask_37 (np.ndarray)
residue_indices (np.ndarray)
chain_index (np.ndarray)
valid_atom_mask (np.ndarray)
nitrogen_mask (np.ndarray)
num_residues (int)

aatype: ndarray

static_atom_mask_37: ndarray

residue_indices: ndarray

chain_index: ndarray

valid_atom_mask: ndarray

nitrogen_mask: ndarray

num_residues: int

class prxteinmpnn.utils.data_structures.Protein(coordinates, aatype, one_hot_sequence, atom_mask, residue_index, chain_index, dihedrals=None, mapping=None, full_coordinates=None)

Bases: object

Protein structure or ensemble representation.

Parameters:

coordinates (Float[Array, 'num_residues num_atoms 3'])
aatype (Int[Array, 'num_residues'])
one_hot_sequence (Float[Array, 'num_residues num_classes'])
atom_mask (Int[Array, 'num_residues num_atoms'])
residue_index (Int[Array, 'num_residues'])
chain_index (Int[Array, 'num_residues'])
dihedrals (Float[Array, 'num_residues 3'] | None)
mapping (Int | None)
full_coordinates (Float[Array, 'num_residues num_atoms 3'] | None)

coordinates

Atom positions in the structure, represented as a 3D array of Cartesian coordinates in angstroms. The atom types follow the order of residue_constants.atom_types, whose first three entries are N, CA, C. Shape is [num_res, num_atom_type, 3], where num_res is the number of residues, num_atom_type is the number of atom types (e.g., N, CA, C, CB, O), and 3 is the spatial dimension (x, y, z).

Type:: StructureAtomicCoordinates

aatype

Amino-acid type for each residue represented as an integer between 0 and 20, where 20 is ‘X’. Shape is [num_res].

Type:: Sequence

atom_mask

Binary float mask to indicate presence of a particular atom. 1.0 if an atom is present and 0.0 if not. This should be used for loss masking. Shape is [num_res, num_atom_type].

Type:: AtomMask

residue_index

Residue index as used in PDB. It is not necessarily continuous or 0-indexed. Shape is [num_res].

Type:: AtomResidueIndex

coordinates: Float[Array, 'num_residues num_atoms 3']

aatype: Int[Array, 'num_residues']

one_hot_sequence: Float[Array, 'num_residues num_classes']

atom_mask: Int[Array, 'num_residues num_atoms']

residue_index: Int[Array, 'num_residues']

chain_index: Int[Array, 'num_residues']

dihedrals: Float[Array, 'num_residues 3'] | None = None

mapping: Int | None = None

full_coordinates: Float[Array, 'num_residues num_atoms 3'] | None = None

classmethod from_tuple(protein_tuple)

Create a Protein instance from a ProteinTuple.

Parameters:: protein_tuple (ProteinTuple) – The input protein tuple.
Returns:: The output protein dataclass.
Return type:: Protein

replace(**updates): Returns a new object replacing the specified fields with new values.
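The replace docstring matches the method that flax.struct.dataclass adds, which delegates to the standard library's dataclasses.replace; whether Protein is actually built that way is an assumption. The sketch below reproduces the documented behaviour with a hypothetical frozen stdlib dataclass, ProteinSketch, holding a subset of Protein's fields: a new object is returned, the original is untouched, and fields not named in updates are carried over by reference.

```python
import dataclasses
from typing import Optional

import numpy as np


@dataclasses.dataclass(frozen=True)
class ProteinSketch:
    """Hypothetical frozen stand-in with a subset of Protein's fields."""

    coordinates: np.ndarray
    aatype: np.ndarray
    dihedrals: Optional[np.ndarray] = None

    def replace(self, **updates) -> "ProteinSketch":
        # Build a new instance; fields not named in `updates` keep their values.
        return dataclasses.replace(self, **updates)


p = ProteinSketch(coordinates=np.zeros((2, 37, 3)), aatype=np.array([0, 1]))
p_shifted = p.replace(coordinates=p.coordinates + 0.1)

assert p_shifted.aatype is p.aatype      # untouched fields are shared, not copied
assert np.allclose(p.coordinates, 0.0)   # the original object is unchanged
```

Returning a new object instead of mutating in place fits JAX-style code, where arrays are treated as immutable values.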

class prxteinmpnn.utils.data_structures.ProteinEnsemble(coordinates, aatype, one_hot_sequence, atom_mask, residue_index, chain_index, dihedrals=None, mapping=None)

Bases: object

Protein structure or ensemble representation.

Parameters:

coordinates (Float[Array, 'num_residues num_atoms 3'])
aatype (Int[Array, 'num_residues'])
one_hot_sequence (Float[Array, 'num_residues num_classes'])
atom_mask (Int[Array, 'num_residues num_atoms'])
residue_index (Int[Array, 'num_residues'])
chain_index (Int[Array, 'num_residues'])
dihedrals (Float[Array, 'num_residues 3'] | None)
mapping (Int | None)

coordinates

Atom positions in the structure, represented as a 3D array of Cartesian coordinates in angstroms. The atom types follow the order of residue_constants.atom_types, whose first three entries are N, CA, C. Shape is [num_res, num_atom_type, 3], where num_res is the number of residues, num_atom_type is the number of atom types (e.g., N, CA, C, CB, O), and 3 is the spatial dimension (x, y, z).

Type:: StructureAtomicCoordinates

aatype

Amino-acid type for each residue represented as an integer between 0 and 20, where 20 is ‘X’. Shape is [num_res].

Type:: Sequence

atom_mask

Binary float mask to indicate presence of a particular atom. 1.0 if an atom is present and 0.0 if not. This should be used for loss masking. Shape is [num_res, num_atom_type].

Type:: AtomMask

residue_index

Residue index as used in PDB. It is not necessarily continuous or 0-indexed. Shape is [num_res].

Type:: AtomResidueIndex

chain_index

Chain index for each residue. Shape is [num_res].

Type:: ChainIndex

dihedrals

Dihedral angles for backbone atoms (phi, psi, omega). Shape is [num_res, 3]. If not provided, defaults to None.

Type:: BackboneDihedrals | None

mapping

Optional array mapping residues in the ensemble to original structure indices. Shape is [num_res, num_frames]. If not provided, defaults to None.

Type:: jnp.Array | None

coordinates: Float[Array, 'num_residues num_atoms 3']

aatype: Int[Array, 'num_residues']

one_hot_sequence: Float[Array, 'num_residues num_classes']

atom_mask: Int[Array, 'num_residues num_atoms']

residue_index: Int[Array, 'num_residues']

chain_index: Int[Array, 'num_residues']

dihedrals: Float[Array, 'num_residues 3'] | None = None

mapping: Int | None = None

replace(**updates): Returns a new object replacing the specified fields with new values.
