aa_convert

Utility functions for converting between AlphaFold and ProteinMPNN amino acid orders, and between string and integer-index representations of protein sequences.

prxteinmpnn.utils.aa_convert.af_to_mpnn(sequence)

Convert a sequence of integer indices from AlphaFold’s to ProteinMPNN’s alphabet order.

Parameters:
  • sequence (Int[Array, 'num_residues'])

Return type:

Int[Array, 'num_residues']

prxteinmpnn.utils.aa_convert.mpnn_to_af(sequence)

Convert a sequence of integer indices from ProteinMPNN’s to AlphaFold’s alphabet order.

Parameters:
  • sequence (Int[Array, 'num_residues'])

Return type:

Int[Array, 'num_residues']
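As an illustration of what converting between the two alphabet orders involves, here is a minimal sketch using NumPy lookup tables. The alphabet strings are the orderings commonly used by AlphaFold and ProteinMPNN and are assumed here rather than read from prxteinmpnn; the names `AF_ALPHABET`, `MPNN_ALPHABET`, `af_to_mpnn_sketch`, and `mpnn_to_af_sketch` are hypothetical and are not part of the library.

```python
import numpy as np

# Assumed residue orderings (not read from prxteinmpnn); "X" is the unknown residue.
AF_ALPHABET = "ARNDCQEGHILKMFPSTWYVX"
MPNN_ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"

# AF_TO_MPNN[i] is the ProteinMPNN index of the residue stored at AlphaFold index i.
AF_TO_MPNN = np.array([MPNN_ALPHABET.index(aa) for aa in AF_ALPHABET])
# MPNN_TO_AF is the inverse permutation.
MPNN_TO_AF = np.array([AF_ALPHABET.index(aa) for aa in MPNN_ALPHABET])


def af_to_mpnn_sketch(sequence):
    """Re-index AlphaFold-ordered residue indices into ProteinMPNN order."""
    return AF_TO_MPNN[np.asarray(sequence)]


def mpnn_to_af_sketch(sequence):
    """Re-index ProteinMPNN-ordered residue indices into AlphaFold order."""
    return MPNN_TO_AF[np.asarray(sequence)]


af_indices = np.array([0, 1, 4, 19])          # A, R, C, V in AlphaFold order
mpnn_indices = af_to_mpnn_sketch(af_indices)  # array([ 0, 14,  1, 17])
round_trip = mpnn_to_af_sketch(mpnn_indices)  # back to [0, 1, 4, 19]
```

Because each table is a permutation of the same 21 symbols, applying one conversion and then the other returns the original indices.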

prxteinmpnn.utils.aa_convert.string_key_to_index(string_keys, key_map, unk_index=None)

Convert string keys to integer indices based on a mapping.

Converts a 1D array of string keys to a 1D array of integer indices using the provided mapping, in a single vectorized pass rather than a per-element loop. Any key not found in the mapping is replaced with the unknown index.

Parameters:
  • string_keys (ndarray) – A 1D array of string keys.

  • key_map (Mapping[str, int]) – A dictionary mapping string keys to integer indices.

  • unk_index (int | None) – The index to use for unknown keys not found in the mapping. If None, uses the length of the key_map as the unknown index.

Return type:

Array

Returns:

A 1D array of integer indices corresponding to the string keys.
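One way such a vectorized lookup could be written is with a sorted key array and `np.searchsorted`. The sketch below is an assumption about a possible approach, not the library's implementation; `string_key_to_index_sketch` and the three-letter keys are hypothetical, and the sketch assumes a non-empty `key_map`.

```python
import numpy as np


def string_key_to_index_sketch(string_keys, key_map, unk_index=None):
    """Map a 1D array of string keys to integer indices without a Python loop over keys."""
    if unk_index is None:
        # Mirror the documented default: unknown keys get len(key_map).
        unk_index = len(key_map)

    keys = sorted(key_map)                       # sorted Python strings
    sorted_keys = np.array(keys)
    sorted_values = np.array([key_map[k] for k in keys])

    string_keys = np.asarray(string_keys)
    # Binary-search each query key's insertion position in the sorted keys.
    pos = np.searchsorted(sorted_keys, string_keys)
    # Queries larger than every key land one past the end; clip them back in range.
    pos = np.clip(pos, 0, len(sorted_keys) - 1)
    # A query is known only if the key at its position matches exactly.
    found = sorted_keys[pos] == string_keys
    return np.where(found, sorted_values[pos], unk_index)


key_map = {"ALA": 0, "GLY": 1, "SER": 2}
queries = np.array(["GLY", "ALA", "XYZ", "SER"])
default_unknown = string_key_to_index_sketch(queries, key_map)       # [1, 0, 3, 2]
explicit_unknown = string_key_to_index_sketch(queries, key_map, -1)  # [1, 0, -1, 2]
```

The sort happens once per call over the mapping, so the cost per query key is a binary search rather than a dictionary lookup inside a Python loop.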

prxteinmpnn.utils.aa_convert.string_to_protein_sequence(sequence, aa_map=None, unk_index=None)

Convert a string sequence to a ProteinSequence.

Parameters:
  • sequence (str) – A string containing the protein sequence.

  • aa_map (dict | None) – A dictionary mapping amino acid names to integer indices. If None, uses the default restype_order mapping.

  • unk_index (int | None) – The index to use for unknown amino acids not found in the mapping. If None, uses unk_restype_index.

Return type:

Int[Array, 'num_residues']

Returns:

A ProteinSequence containing the amino acid type indices corresponding to the input string.
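The behaviour described above can be sketched as follows. The default alphabet and unknown index are stand-ins modelled on AlphaFold's residue ordering, which is an assumption; the library's actual `restype_order` and `unk_restype_index` may differ, and the library may apply further re-indexing. `EXAMPLE_RESTYPE_ORDER`, `EXAMPLE_UNK_INDEX`, and `string_to_indices_sketch` are hypothetical names.

```python
import numpy as np

# Stand-in defaults (assumed, not taken from prxteinmpnn).
EXAMPLE_RESTYPES = "ARNDCQEGHILKMFPSTWYV"
EXAMPLE_RESTYPE_ORDER = {aa: i for i, aa in enumerate(EXAMPLE_RESTYPES)}
EXAMPLE_UNK_INDEX = len(EXAMPLE_RESTYPES)  # 20


def string_to_indices_sketch(sequence, aa_map=None, unk_index=None):
    """Encode a one-letter protein string as an integer index array."""
    if aa_map is None:
        aa_map = EXAMPLE_RESTYPE_ORDER
    if unk_index is None:
        unk_index = EXAMPLE_UNK_INDEX
    # Letters missing from the mapping fall back to the unknown index.
    return np.array([aa_map.get(aa, unk_index) for aa in sequence], dtype=np.int32)


encoded = string_to_indices_sketch("ACDZ")  # "Z" is not in the stand-in alphabet
```

With the stand-in defaults, `encoded` holds `[0, 4, 3, 20]`: A, C, and D map to their positions in the alphabet, and the unrecognised letter maps to the unknown index.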

prxteinmpnn.utils.aa_convert.protein_sequence_to_string(sequence, aa_map=None)

Convert a ProteinSequence to a string.

Parameters:
  • sequence (Int[Array, 'num_residues']) – A ProteinSequence containing amino acid type indices.

  • aa_map (dict | None) – A dictionary mapping amino acid type indices to their corresponding names. If None, uses the default mapping derived from restype_order.

Return type:

str

Returns:

A string representation of the protein sequence.
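Decoding is the reverse lookup, from index to letter. The sketch below again uses a stand-in alphabet modelled on AlphaFold's residue ordering (an assumption); `EXAMPLE_INDEX_TO_AA`, `indices_to_string_sketch`, and the `unk_char` parameter are hypothetical and do not appear in the documented signature.

```python
import numpy as np

# Stand-in index-to-letter mapping (assumed, not taken from prxteinmpnn).
EXAMPLE_RESTYPES = "ARNDCQEGHILKMFPSTWYV"
EXAMPLE_INDEX_TO_AA = dict(enumerate(EXAMPLE_RESTYPES))


def indices_to_string_sketch(sequence, aa_map=None, unk_char="X"):
    """Decode an integer index array into a one-letter protein string."""
    if aa_map is None:
        aa_map = EXAMPLE_INDEX_TO_AA
    # int() converts NumPy scalars to plain ints so they match the dict keys.
    return "".join(aa_map.get(int(i), unk_char) for i in np.asarray(sequence))


decoded = indices_to_string_sketch(np.array([0, 4, 3, 20]))
```

With the stand-in mapping, `decoded` is `"ACDX"`: index 20 has no entry, so it falls back to the hypothetical `unk_char`.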