data_structures#

Dataclasses for the PrxteinMPNN project.

prxteinmpnn.utils.data_structures

class prxteinmpnn.utils.data_structures.ProteinTuple(coordinates, aatype, atom_mask, residue_index, chain_index, full_coordinates=None, dihedrals=None, source=None, mapping=None)[source]#

Bases: NamedTuple

Tuple-based protein structure representation.

Parameters:
  • coordinates (np.ndarray)

  • aatype (np.ndarray)

  • atom_mask (np.ndarray)

  • residue_index (np.ndarray)

  • chain_index (np.ndarray)

  • full_coordinates (np.ndarray | None)

  • dihedrals (np.ndarray | None)

  • source (str | None)

  • mapping (np.ndarray | None)

coordinates#

Cartesian coordinates of atoms in angstroms, stored as a 3D array. The atom types correspond to residue_constants.atom_types, i.e. the first three are N, CA, CB. Shape is [num_res, num_atom_type, 3], where num_res is the number of residues, num_atom_type is the number of atom types (e.g., N, CA, CB, C, O), and 3 is the spatial dimension (x, y, z).

Type:

StructureAtomicCoordinates
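As a rough illustration of this layout, the sketch below builds a dummy coordinate array with NumPy and slices out one atom type across all residues. The atom count of 37 and the use of slot 1 for CA are assumptions made for the example (37 is only suggested by the static_atom_mask_37 field name elsewhere on this page); check residue_constants.atom_types for the actual ordering.

```python
import numpy as np

num_res = 4
num_atom_type = 37  # assumption: atom37-style layout

# Dummy coordinates in angstroms: [num_res, num_atom_type, 3].
coordinates = np.zeros((num_res, num_atom_type, 3), dtype=np.float32)

# Place a hypothetical CA atom for each residue along the x axis, 3.8 A apart.
ca_slot = 1  # assumption: CA is the second atom type
coordinates[:, ca_slot, 0] = 3.8 * np.arange(num_res)

# Select one atom type for every residue: shape [num_res, 3].
ca_xyz = coordinates[:, ca_slot, :]

# Distances between consecutive CA atoms: shape [num_res - 1].
ca_ca = np.linalg.norm(ca_xyz[1:] - ca_xyz[:-1], axis=-1)
```

Indexing the middle axis picks an atom type, and the last axis always holds x, y, z.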

aatype#

Amino-acid type for each residue represented as an integer between 0 and 20, where 20 is ‘X’. Shape is [num_res].

Type:

ProteinSequence

atom_mask#

Binary float mask to indicate presence of a particular atom. 1.0 if an atom is present and 0.0 if not. This should be used for loss masking. Shape is [num_res, num_atom_type].

Type:

AtomMask
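One way to use the mask for loss masking is sketched below with NumPy. The function name masked_mean_squared_error is hypothetical and is not part of this module; the example only shows the general pattern of zeroing out absent atoms and normalizing by the number of present ones.

```python
import numpy as np

def masked_mean_squared_error(pred, target, atom_mask):
    """Mean squared coordinate error over atoms that are present."""
    # Per-atom squared error, summed over x, y, z: [num_res, num_atom_type].
    sq_err = np.sum((pred - target) ** 2, axis=-1)
    # Zero out atoms that are absent from the structure.
    masked = sq_err * atom_mask
    # Normalize by the number of present atoms; guard against an empty mask.
    return masked.sum() / np.maximum(atom_mask.sum(), 1.0)

target = np.zeros((2, 3, 3))
pred = np.ones((2, 3, 3))          # every atom is off by (1, 1, 1)
atom_mask = np.array([[1.0, 1.0, 0.0],
                      [1.0, 0.0, 0.0]])
pred[0, 2] = 100.0                 # large error on an absent atom is ignored

loss = masked_mean_squared_error(pred, target, atom_mask)
```

Because the absent atom at position [0, 2] is masked, its large error does not affect the result.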

residue_index#

Residue index as used in PDB. It is not necessarily continuous or 0-indexed. Shape is [num_res].

Type:

ResidueIndex
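Since PDB numbering can start anywhere and skip residues, array positions and residue_index values should not be used interchangeably. The NumPy sketch below, using made-up numbering for a single chain, shows one way to locate numbering gaps.

```python
import numpy as np

# PDB-style numbering: starts at 5 and skips residues 8 and 9.
residue_index = np.array([5, 6, 7, 10, 11])

# Positions in the array are 0-based and contiguous; residue_index is not.
array_positions = np.arange(residue_index.shape[0])

# A gap exists wherever consecutive residue numbers differ by more than 1.
gap_after = np.flatnonzero(np.diff(residue_index) > 1)
```

Here gap_after holds the array position of the residue that precedes each gap.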

chain_index#

Chain index for each residue. Shape is [num_res].

Type:

ChainIndex

dihedrals#

Dihedral angles for backbone atoms (phi, psi, omega). Shape is [num_res, 3]. If not provided, defaults to None.

Type:

BackboneDihedrals | None
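For reference, a dihedral angle is defined by four points: phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1); omega uses CA(i), C(i), N(i+1), CA(i+1). The NumPy sketch below computes one such angle with a standard atan2 formulation. It is an illustration only; the sign convention, units, and column order used by this library may differ.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)

p0 = np.array([0.0, 1.0, 0.0])
p1 = np.array([0.0, 0.0, 0.0])
p2 = np.array([1.0, 0.0, 0.0])

cis = dihedral(p0, p1, p2, np.array([1.0, 1.0, 0.0]))     # same side
trans = dihedral(p0, p1, p2, np.array([1.0, -1.0, 0.0]))  # opposite side
```

Applying such a function per residue to the three atom quadruples would give an array of shape [num_res, 3], with undefined entries at chain termini.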

coordinates: np.ndarray#

Alias for field number 0

aatype: np.ndarray#

Alias for field number 1

atom_mask: np.ndarray#

Alias for field number 2

residue_index: np.ndarray#

Alias for field number 3

chain_index: np.ndarray#

Alias for field number 4

full_coordinates: np.ndarray | None#

Alias for field number 5

dihedrals: np.ndarray | None#

Alias for field number 6

source: str | None#

Alias for field number 7

mapping: np.ndarray | None#

Alias for field number 8
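The "Alias for field number N" entries reflect standard NamedTuple behavior: each field can be read by name or by position. The sketch below uses a hypothetical stand-in class, ProteinTupleSketch, that mirrors the documented field order with placeholder list values and a made-up file name, so it runs without the library installed.

```python
from typing import Any, NamedTuple, Optional

class ProteinTupleSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented field order."""
    coordinates: Any
    aatype: Any
    atom_mask: Any
    residue_index: Any
    chain_index: Any
    full_coordinates: Optional[Any] = None
    dihedrals: Optional[Any] = None
    source: Optional[str] = None
    mapping: Optional[Any] = None

pt = ProteinTupleSketch(
    coordinates=[[[0.0, 0.0, 0.0]]],
    aatype=[0],
    atom_mask=[[1.0]],
    residue_index=[1],
    chain_index=[0],
    source="example.pdb",  # placeholder file name
)

# Named access and positional access refer to the same field.
same_field = pt.aatype is pt[1]

# Tuple unpacking follows the documented field order.
coords, aatype, *rest = pt

# NamedTuples are immutable; _replace returns an updated copy.
pt2 = pt._replace(source="other.pdb")
```

Access by name is usually preferable, since positional access breaks silently if the field order changes.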

class prxteinmpnn.utils.data_structures.TrajectoryStaticFeatures(aatype, static_atom_mask_37, residue_indices, chain_index, valid_atom_mask, nitrogen_mask, num_residues)[source]#

Bases: object

A container for pre-computed, frame-invariant protein features.

Parameters:
  • aatype (np.ndarray)

  • static_atom_mask_37 (np.ndarray)

  • residue_indices (np.ndarray)

  • chain_index (np.ndarray)

  • valid_atom_mask (np.ndarray)

  • nitrogen_mask (np.ndarray)

  • num_residues (int)

aatype: ndarray#

static_atom_mask_37: ndarray#

residue_indices: ndarray#

chain_index: ndarray#

valid_atom_mask: ndarray#

nitrogen_mask: ndarray#

num_residues: int#

class prxteinmpnn.utils.data_structures.Protein(coordinates, aatype, one_hot_sequence, atom_mask, residue_index, chain_index, dihedrals=None, mapping=None, full_coordinates=None)[source]#

Bases: object

Protein structure or ensemble representation.

Parameters:
  • coordinates (Float[Array, 'num_residues num_atoms 3'])

  • aatype (Int[Array, 'num_residues'])

  • one_hot_sequence (Float[Array, 'num_residues num_classes'])

  • atom_mask (Int[Array, 'num_residues num_atoms'])

  • residue_index (Int[Array, 'num_residues'])

  • chain_index (Int[Array, 'num_residues'])

  • dihedrals (Float[Array, 'num_residues 3'] | None)

  • mapping (Int | None)

  • full_coordinates (Float[Array, 'num_residues num_atoms 3'] | None)

coordinates#

Cartesian coordinates of atoms in angstroms, stored as a 3D array. The atom types correspond to residue_constants.atom_types, i.e. the first three are N, CA, CB. Shape is [num_res, num_atom_type, 3], where num_res is the number of residues, num_atom_type is the number of atom types (e.g., N, CA, CB, C, O), and 3 is the spatial dimension (x, y, z).

Type:

StructureAtomicCoordinates

aatype#

Amino-acid type for each residue represented as an integer between 0 and 20, where 20 is ‘X’. Shape is [num_res].

Type:

Sequence
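The integer encoding in aatype relates naturally to the one_hot_sequence field. The NumPy sketch below shows one plausible conversion between the two, assuming num_classes is 21 (20 standard amino acids plus 'X'); the page does not state the value of num_classes or how the library derives one_hot_sequence.

```python
import numpy as np

num_classes = 21  # assumption: 20 standard amino acids plus 'X' at index 20

aatype = np.array([0, 4, 20, 7])

# Row i of the identity matrix is the one-hot vector for class i.
one_hot_sequence = np.eye(num_classes, dtype=np.float32)[aatype]

# argmax over the class axis recovers the integer encoding.
recovered = one_hot_sequence.argmax(axis=-1)
```

The resulting array has shape [num_res, num_classes], matching the annotation on one_hot_sequence.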

atom_mask#

Binary float mask to indicate presence of a particular atom. 1.0 if an atom is present and 0.0 if not. This should be used for loss masking. Shape is [num_res, num_atom_type].

Type:

AtomMask

residue_index#

Residue index as used in PDB. It is not necessarily continuous or 0-indexed. Shape is [num_res].

Type:

AtomResidueIndex

coordinates: Float[Array, 'num_residues num_atoms 3']#

aatype: Int[Array, 'num_residues']#

one_hot_sequence: Float[Array, 'num_residues num_classes']#

atom_mask: Int[Array, 'num_residues num_atoms']#

residue_index: Int[Array, 'num_residues']#

chain_index: Int[Array, 'num_residues']#

dihedrals: Float[Array, 'num_residues 3'] | None = None#

mapping: Int | None = None#

full_coordinates: Float[Array, 'num_residues num_atoms 3'] | None = None#

classmethod from_tuple(protein_tuple)[source]#

Create a Protein instance from a ProteinTuple.

Parameters:

protein_tuple (ProteinTuple) – The input protein tuple.

Returns:

The output protein dataclass.

Return type:

Protein

replace(**updates)#

Returns a new object replacing the specified fields with new values.
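The semantics described here match the usual immutable-update pattern for dataclasses. The sketch below reproduces that pattern with a hypothetical stand-in, ProteinSketch, built on the standard library's dataclasses.replace; how the real Protein class provides replace is not stated on this page, and the stand-in carries only a subset of the documented fields.

```python
import dataclasses
from typing import Any, Optional

@dataclasses.dataclass(frozen=True)
class ProteinSketch:
    """Hypothetical stand-in with a subset of the documented fields."""
    coordinates: Any
    aatype: Any
    dihedrals: Optional[Any] = None

    def replace(self, **updates):
        # Return a new object with the given fields swapped; self is untouched.
        return dataclasses.replace(self, **updates)

original = ProteinSketch(coordinates=[[0.0, 0.0, 0.0]], aatype=[3])
updated = original.replace(dihedrals=[[-1.0, 2.0, 3.1]])
```

Fields that are not named in the call are carried over by reference, so the update does not copy the underlying arrays.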

class prxteinmpnn.utils.data_structures.ProteinEnsemble(coordinates, aatype, one_hot_sequence, atom_mask, residue_index, chain_index, dihedrals=None, mapping=None)[source]#

Bases: object

Protein structure or ensemble representation.

Parameters:
  • coordinates (Float[Array, 'num_residues num_atoms 3'])

  • aatype (Int[Array, 'num_residues'])

  • one_hot_sequence (Float[Array, 'num_residues num_classes'])

  • atom_mask (Int[Array, 'num_residues num_atoms'])

  • residue_index (Int[Array, 'num_residues'])

  • chain_index (Int[Array, 'num_residues'])

  • dihedrals (Float[Array, 'num_residues 3'] | None)

  • mapping (Int | None)

coordinates#

Cartesian coordinates of atoms in angstroms, stored as a 3D array. The atom types correspond to residue_constants.atom_types, i.e. the first three are N, CA, CB. Shape is [num_res, num_atom_type, 3], where num_res is the number of residues, num_atom_type is the number of atom types (e.g., N, CA, CB, C, O), and 3 is the spatial dimension (x, y, z).

Type:

StructureAtomicCoordinates

aatype#

Amino-acid type for each residue represented as an integer between 0 and 20, where 20 is ‘X’. Shape is [num_res].

Type:

Sequence

atom_mask#

Binary float mask to indicate presence of a particular atom. 1.0 if an atom is present and 0.0 if not. This should be used for loss masking. Shape is [num_res, num_atom_type].

Type:

AtomMask

residue_index#

Residue index as used in PDB. It is not necessarily continuous or 0-indexed. Shape is [num_res].

Type:

AtomResidueIndex

chain_index#

Chain index for each residue. Shape is [num_res].

Type:

ChainIndex

dihedrals#

Dihedral angles for backbone atoms (phi, psi, omega). Shape is [num_res, 3]. If not provided, defaults to None.

Type:

BackboneDihedrals | None

mapping#

Optional array mapping residues in the ensemble to original structure indices. Shape is [num_res, num_frames]. If not provided, defaults to None.

Type:

jnp.Array | None

coordinates: Float[Array, 'num_residues num_atoms 3']#

aatype: Int[Array, 'num_residues']#

one_hot_sequence: Float[Array, 'num_residues num_classes']#

atom_mask: Int[Array, 'num_residues num_atoms']#

residue_index: Int[Array, 'num_residues']#

chain_index: Int[Array, 'num_residues']#

dihedrals: Float[Array, 'num_residues 3'] | None = None#

mapping: Int | None = None#

replace(**updates)#

Returns a new object replacing the specified fields with new values.