foldcomp_utils

Utilities for processing and manipulating protein structures from foldcomp.

prxteinmpnn.utils.foldcomp_utils

prxteinmpnn.utils.foldcomp_utils._setup_foldcomp_database(database)

Set up the FoldComp database synchronously.

This is designed to be called from within a synchronous worker process.

Return type:

None

Parameters:

database (Literal['esmatlas', 'esmatlas_v2023_02', 'highquality_clust30', 'afdb_uniprot_v4', 'afdb_swissprot_v4', 'afdb_rep_v4', 'afdb_rep_dark_v4', 'afdb_h_sapiens', 'a_thaliana', 'c_albicans', 'c_elegans', 'd_discoideum', 'd_melanogaster', 'd_rerio', 'e_coli', 'g_max', 'm_jannaschii', 'm_musculus', 'o_sativa', 'r_norvegicus', 's_cerevisiae', 's_pombe', 'z_mays']) – The name of the FoldComp database to set up.
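Because the database argument is restricted to a fixed set of names, it can help to validate a user-supplied string before passing it on. The following is a minimal sketch, not part of prxteinmpnn: the alias name FoldCompDatabase is borrowed from the parameter description of get_protein_structures, and check_database is a hypothetical helper introduced only for illustration.

```python
from typing import Literal, get_args

# Assumed alias mirroring the Literal shown in the signature above.
FoldCompDatabase = Literal[
    "esmatlas", "esmatlas_v2023_02", "highquality_clust30",
    "afdb_uniprot_v4", "afdb_swissprot_v4", "afdb_rep_v4",
    "afdb_rep_dark_v4", "afdb_h_sapiens", "a_thaliana", "c_albicans",
    "c_elegans", "d_discoideum", "d_melanogaster", "d_rerio", "e_coli",
    "g_max", "m_jannaschii", "m_musculus", "o_sativa", "r_norvegicus",
    "s_cerevisiae", "s_pombe", "z_mays",
]

# get_args extracts the allowed string values from the Literal type.
VALID_DATABASES = get_args(FoldCompDatabase)


def check_database(name: str) -> str:
    """Hypothetical helper: return name if it is an allowed database name."""
    if name not in VALID_DATABASES:
        raise ValueError(
            f"Unknown FoldComp database {name!r}; expected one of {VALID_DATABASES}"
        )
    return name
```

Calling check_database("e_coli") returns the name unchanged, while an unlisted name raises ValueError before any setup work starts.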

prxteinmpnn.utils.foldcomp_utils.get_protein_structures(protein_ids, database=None)

Retrieve protein structures from the FoldComp database and yield them one at a time.

This is a synchronous, blocking function designed to be run in an executor.

Parameters:
  • protein_ids (Sequence[str]) – A sequence of protein IDs to retrieve.

  • database (Optional[Literal['esmatlas', 'esmatlas_v2023_02', 'highquality_clust30', 'afdb_uniprot_v4', 'afdb_swissprot_v4', 'afdb_rep_v4', 'afdb_rep_dark_v4', 'afdb_h_sapiens', 'a_thaliana', 'c_albicans', 'c_elegans', 'd_discoideum', 'd_melanogaster', 'd_rerio', 'e_coli', 'g_max', 'm_jannaschii', 'm_musculus', 'o_sativa', 'r_norvegicus', 's_cerevisiae', 's_pombe', 'z_mays']]) – The FoldComp database to use. Defaults to None.

Return type:

Generator[ProteinTuple, None]

Returns:

A generator of ProteinTuple objects, each containing a structure for one of the requested protein IDs.
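Since get_protein_structures is described as a blocking function meant to run in an executor, the sketch below shows one way such a call might be dispatched from asyncio code. It does not import prxteinmpnn: fake_get_protein_structures is a hypothetical stand-in with the same call shape, and collect and load_structures are illustrative names. The sketch assumes the documented Generator return type, in which case the generator has to be drained inside the worker; otherwise the blocking work would run later, on the event loop thread, when the caller iterates.

```python
import asyncio
import time
from functools import partial
from typing import Iterator, Optional, Sequence, Tuple


def fake_get_protein_structures(
    protein_ids: Sequence[str], database: Optional[str] = None
) -> Iterator[Tuple[str, Optional[str]]]:
    """Hypothetical stand-in for get_protein_structures (a blocking generator)."""
    for protein_id in protein_ids:
        time.sleep(0.01)  # simulate blocking database access
        yield (protein_id, database)


def collect(fetch, protein_ids, database=None):
    """Drain the generator in the worker so all blocking work happens there."""
    return list(fetch(protein_ids, database=database))


async def load_structures(protein_ids, database=None, fetch=fake_get_protein_structures):
    loop = asyncio.get_running_loop()
    # Passing None selects the event loop's default thread pool executor.
    return await loop.run_in_executor(
        None, partial(collect, fetch, protein_ids, database)
    )


results = asyncio.run(
    load_structures(["id_a", "id_b"], database="afdb_swissprot_v4")
)
```

With the real function, fetch would be replaced by get_protein_structures. A thread pool is used here for simplicity; whether a process pool is more appropriate depends on how the library is meant to be deployed, which this page does not specify.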