PrxteinMPNN: A functional interface for ProteinMPNN.

class prxteinmpnn.JacobianSpecification(inputs, topology=None, model_weights='original', model_version='v_48_020.pkl', batch_size=32, backbone_noise=(0.0,), foldcomp_database=None, num_workers=0, ar_mask=None, random_seed=42, chain_id=None, model=None, altloc='first', decoding_order_fn=None, conformational_states=None, noise_batch_size=1, jacobian_batch_size=16, combine=False, combine_batch_size=8, combine_noise_batch_size=1, combine_weights=None, combine_fn=None, combine_fn_kwargs=None, output_h5_path=None, compute_apc=True, apc_batch_size=8, apc_residue_batch_size=1000)

Bases: RunSpecification

Configuration for computing categorical Jacobians.

Parameters:
  • inputs (Sequence[str | StringIO] | str | StringIO)

  • topology (str | Path | None)

  • model_weights (ModelWeights)

  • model_version (ModelVersion)

  • batch_size (int)

  • backbone_noise (Sequence[float] | float)

  • foldcomp_database (FoldCompDatabase | None)

  • num_workers (int)

  • ar_mask (None | ArrayLike)

  • random_seed (int)

  • chain_id (Sequence[str] | str | None)

  • model (int | None)

  • altloc (Literal['first', 'all'])

  • decoding_order_fn (DecodingOrderFn | None)

  • conformational_states (ConformationalStates | None)

  • noise_batch_size (int)

  • jacobian_batch_size (int)

  • combine (bool)

  • combine_batch_size (int)

  • combine_noise_batch_size (int)

  • combine_weights (ArrayLike | None)

  • combine_fn (CombineCatJacPairFn | None)

  • combine_fn_kwargs (dict[str, Any] | None)

  • output_h5_path (str | Path | None)

  • compute_apc (bool)

  • apc_batch_size (int)

  • apc_residue_batch_size (int)

apc_batch_size: int = 8
apc_residue_batch_size: int = 1000
combine: bool = False
combine_batch_size: int = 8
combine_fn: CombineCatJacPairFn | None = None
combine_fn_kwargs: dict[str, Any] | None = None
combine_noise_batch_size: int = 1
combine_weights: ArrayLike | None = None
compute_apc: bool = True
jacobian_batch_size: int = 16
noise_batch_size: int = 1
output_h5_path: str | Path | None = None
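
The compute_apc option refers to an APC-corrected Frobenius norm. As a rough, library-independent sketch of what that quantity usually means: each (i, j) block of a Jacobian indexed as jacobian[i][a][j][b] is reduced to its Frobenius norm, and the average product correction (APC) then subtracts row_sum × column_sum / total from every entry. The index layout, the function names, and the use of plain nested lists are assumptions made for illustration; PrxteinMPNN's own implementation may differ, for example in how the diagonal is treated.

```python
import math


def block_frobenius_norms(jacobian):
    """Reduce jacobian[i][a][j][b] to an L x L matrix of per-block Frobenius norms."""
    length = len(jacobian)
    return [
        [
            # Sum of squares over both amino-acid axes (a and b) of block (i, j).
            math.sqrt(sum(v * v for per_a in jacobian[i] for v in per_a[j]))
            for j in range(length)
        ]
        for i in range(length)
    ]


def apc_correct(matrix):
    """Subtract the average product correction from a square coupling matrix."""
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_sums)
    if total == 0:
        # Nothing to correct; return a copy so the input is never mutated.
        return [row[:] for row in matrix]
    return [
        [matrix[i][j] - row_sums[i] * col_sums[j] / total for j in range(n)]
        for i in range(n)
    ]
```

A matrix that is a pure outer product of per-residue factors carries no pair-specific signal, so the correction removes it entirely; only the part of a coupling that the row and column averages cannot explain survives.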
class prxteinmpnn.RunSpecification(inputs, topology=None, model_weights='original', model_version='v_48_020.pkl', batch_size=32, backbone_noise=(0.0,), foldcomp_database=None, num_workers=0, ar_mask=None, random_seed=42, chain_id=None, model=None, altloc='first', decoding_order_fn=None, conformational_states=None)

Bases: object

Configuration for running the model.

Parameters:
  • inputs (Sequence[str | StringIO] | str | StringIO)

  • topology (str | Path | None)

  • model_weights (ModelWeights)

  • model_version (ModelVersion)

  • batch_size (int)

  • backbone_noise (Sequence[float] | float)

  • foldcomp_database (FoldCompDatabase | None)

  • num_workers (int)

  • ar_mask (None | ArrayLike)

  • random_seed (int)

  • chain_id (Sequence[str] | str | None)

  • model (int | None)

  • altloc (Literal['first', 'all'])

  • decoding_order_fn (DecodingOrderFn | None)

  • conformational_states (ConformationalStates | None)

inputs

A single input file path or StringIO object, or a sequence of them.

model_weights

The model weights to use (default is “original”).

model_version

The model version to use (default is “v_48_020.pkl”).

batch_size

The number of structures to process in a single batch (default is 32).

backbone_noise

The backbone noise levels to use (default is (0.0,)). Can be a single float or a sequence of floats.

foldcomp_database

An optional FoldComp database used to resolve FoldComp IDs (default is None).

num_workers

The number of worker processes for data loading (default is 0).

ar_mask

An optional array-like mask for autoregressive positions (default is None).

random_seed

The random seed to use (default is 42).

chain_id

An optional chain ID, or sequence of chain IDs, to parse from the structure (default is None).

model

An optional model number to load (default is None, which loads all models).

altloc

The alternate location handling to use, either “first” or “all” (default is “first”).

decoding_order_fn

An optional function to generate the decoding order (default is None).

conformational_states

An optional ConformationalStates object used to coarse-grain the inference (default is None).

altloc: Literal['first', 'all'] = 'first'
ar_mask: None | ArrayLike = None
backbone_noise: Sequence[float] | float = (0.0,)
batch_size: int = 32
chain_id: Sequence[str] | str | None = None
conformational_states: ConformationalStates | None = None
decoding_order_fn: DecodingOrderFn | None = None
foldcomp_database: FoldCompDatabase | None = None
model: int | None = None
model_version: ModelVersion = 'v_48_020.pkl'
model_weights: ModelWeights = 'original'
num_workers: int = 0
random_seed: int = 42
topology: str | Path | None = None
inputs: Sequence[str | StringIO] | str | StringIO
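
The ar_mask and decoding_order_fn fields are related: an autoregressive mask records which positions are already decoded when a given position is predicted. The sketch below shows one way such an (L, L) mask could be derived from a decoding order, following the convention commonly used with ProteinMPNN-style decoders (entry [i][j] is 1 when position j is decoded before position i). The helper names are hypothetical, and whether PrxteinMPNN expects exactly this convention is an assumption to check against the library.

```python
import random


def random_decoding_order(length, seed=42):
    """Return a seeded random permutation of residue positions (hypothetical helper)."""
    rng = random.Random(seed)
    order = list(range(length))
    rng.shuffle(order)
    return order


def autoregressive_mask(decoding_order):
    """Build an L x L 0/1 mask: mask[i][j] == 1 iff j is decoded before i."""
    length = len(decoding_order)
    rank = [0] * length
    for step, position in enumerate(decoding_order):
        rank[position] = step  # the step at which each position is decoded
    return [
        [1 if rank[j] < rank[i] else 0 for j in range(length)]
        for i in range(length)
    ]
```

For the order [2, 0, 1], position 2 is decoded first and sees nothing, position 0 sees only position 2, and position 1 sees both of the others.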
class prxteinmpnn.SamplingSpecification(inputs, topology=None, model_weights='original', model_version='v_48_020.pkl', batch_size=32, backbone_noise=(0.0,), foldcomp_database=None, num_workers=0, ar_mask=None, random_seed=42, chain_id=None, model=None, altloc='first', decoding_order_fn=None, conformational_states=None, num_samples=1, sampling_strategy='temperature', temperature=0.1, bias=None, fixed_positions=None, iterations=None, learning_rate=None, output_h5_path=None)

Bases: RunSpecification

Configuration for sampling sequences.

Parameters:
  • inputs (Sequence[str | StringIO] | str | StringIO)

  • topology (str | Path | None)

  • model_weights (ModelWeights)

  • model_version (ModelVersion)

  • batch_size (int)

  • backbone_noise (Sequence[float] | float)

  • foldcomp_database (FoldCompDatabase | None)

  • num_workers (int)

  • ar_mask (None | ArrayLike)

  • random_seed (int)

  • chain_id (Sequence[str] | str | None)

  • model (int | None)

  • altloc (Literal['first', 'all'])

  • decoding_order_fn (DecodingOrderFn | None)

  • conformational_states (ConformationalStates | None)

  • num_samples (int)

  • sampling_strategy (Literal['temperature', 'straight_through'])

  • temperature (float)

  • bias (ArrayLike | None)

  • fixed_positions (ArrayLike | None)

  • iterations (int | None)

  • learning_rate (float | None)

  • output_h5_path (str | Path | None)

bias: ArrayLike | None = None
fixed_positions: ArrayLike | None = None
iterations: int | None = None
learning_rate: float | None = None
num_samples: int = 1
output_h5_path: str | Path | None = None
sampling_strategy: Literal['temperature', 'straight_through'] = 'temperature'
temperature: float = 0.1
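
To make the roles of temperature, bias, and fixed_positions concrete, here is a minimal pure-Python sketch of temperature sampling over per-position logits. It is a simplified illustration, not the library's sampler: the logits are treated as given rather than produced autoregressively, adding the bias before dividing by the temperature is an assumption, and all function names are hypothetical.

```python
import math
import random


def sample_residue(logits, temperature=0.1, bias=None, rng=None):
    """Draw one class index from softmax((logits + bias) / temperature)."""
    rng = rng or random.Random(0)
    if bias is not None:
        logits = [l + b for l, b in zip(logits, bias)]
    scaled = [l / temperature for l in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_sequence(logits_per_position, native, fixed_positions=(),
                    temperature=0.1, seed=42):
    """Sample every position independently, keeping fixed positions at their native residue."""
    rng = random.Random(seed)
    fixed = set(fixed_positions)
    return [
        native[i] if i in fixed else sample_residue(row, temperature, rng=rng)
        for i, row in enumerate(logits_per_position)
    ]
```

A low temperature such as the default 0.1 sharpens the distribution so that the highest-logit residue is chosen almost always, while larger values yield more diverse sequences.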
class prxteinmpnn.ScoringSpecification(inputs, topology=None, model_weights='original', model_version='v_48_020.pkl', batch_size=32, backbone_noise=(0.0,), foldcomp_database=None, num_workers=0, ar_mask=None, random_seed=42, chain_id=None, model=None, altloc='first', decoding_order_fn=None, conformational_states=None, sequences_to_score=(), temperature=1.0, return_logits=False, return_decoding_orders=False, return_all_scores=False, score_batch_size=16, output_h5_path=None)

Bases: RunSpecification

Configuration for scoring sequences.

Parameters:
  • inputs (Sequence[str | StringIO] | str | StringIO)

  • topology (str | Path | None)

  • model_weights (ModelWeights)

  • model_version (ModelVersion)

  • batch_size (int)

  • backbone_noise (Sequence[float] | float)

  • foldcomp_database (FoldCompDatabase | None)

  • num_workers (int)

  • ar_mask (None | ArrayLike)

  • random_seed (int)

  • chain_id (Sequence[str] | str | None)

  • model (int | None)

  • altloc (Literal['first', 'all'])

  • decoding_order_fn (DecodingOrderFn | None)

  • conformational_states (ConformationalStates | None)

  • sequences_to_score (Sequence[str])

  • temperature (float)

  • return_logits (bool)

  • return_decoding_orders (bool)

  • return_all_scores (bool)

  • score_batch_size (int)

  • output_h5_path (str | Path | None)

sequences_to_score

The amino acid sequences to score (default is an empty tuple).

temperature

The temperature for scoring (default is 1.0).

return_logits

Whether to return the raw logits (default is False).

return_decoding_orders

Whether to return decoding orders (default is False).

return_all_scores

Whether to return scores for all sequences (default is False).

score_batch_size

The batch size for scoring sequences (default is 16).

output_h5_path

Optional path to an HDF5 file for streaming output (default is None).

output_h5_path: str | Path | None = None
return_all_scores: bool = False
return_decoding_orders: bool = False
return_logits: bool = False
score_batch_size: int = 16
sequences_to_score: Sequence[str] = ()
temperature: float = 1.0
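
This page does not define how a score is computed. For ProteinMPNN-style models a sequence score is typically the mean negative log-probability of the sequence under the model's temperature-scaled logits, and the sketch below illustrates that definition in plain Python. Treat both the definition and the helper names as assumptions rather than a description of PrxteinMPNN's internals.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax over one position's logits."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def sequence_score(logits_per_position, residue_indices, temperature=1.0):
    """Mean negative log-likelihood of a sequence; lower means a better fit."""
    if not residue_indices:
        raise ValueError("cannot score an empty sequence")
    total = 0.0
    for logits, index in zip(logits_per_position, residue_indices):
        total -= log_softmax(logits, temperature)[index]
    return total / len(residue_indices)
```

Under this definition, uniform logits over k classes give a score of log(k) for any sequence, which is a convenient baseline when comparing scores.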
prxteinmpnn.categorical_jacobian(spec=None, **kwargs)

Compute the Jacobian of the model’s logits with respect to the input sequence.

Parameters:
  • spec (JacobianSpecification | None) – An optional JacobianSpecification object. If None, a default specification is created from the keyword arguments.

  • **kwargs (Any) – Options used to build the specification when spec is None, plus any additional keyword arguments for structure loading. The following options can be set:

      • inputs: A single input or a sequence of inputs (files, PDB IDs, etc.).
      • chain_id: Specific chain(s) to parse from the structure.
      • model: The model number to load. If None, all models are loaded.
      • altloc: The alternate location identifier to use.
      • model_version: The model version to use.
      • model_weights: The model weights to use.
      • foldcomp_database: The FoldComp database to use for FoldComp IDs.
      • random_seed: The seed for the random number generator.
      • backbone_noise: The amount of noise to add to the backbone.
      • batch_size: The number of structures to process in a single batch.
      • noise_batch_size: Batch size for noise levels in Jacobian computation.
      • jacobian_batch_size: Inner batch size for Jacobian computation.
      • combine_batch_size: Batch size for combining Jacobians.
      • num_workers: Number of parallel workers for data loading.
      • combine_fn: Function or string specifying how to combine Jacobian pairs (e.g., “add”, “subtract”).
      • combine_fn_kwargs: Optional dictionary of keyword arguments for the combine function.
      • combine_weights: Optional weights to use when combining Jacobians.
      • combine: Whether to combine Jacobians across samples.
      • output_h5_path: Optional path to an HDF5 file for streaming output.
      • compute_apc: Whether to compute the APC-corrected Frobenius norm.

Return type:

dict[str, Array | dict[str, JacobianSpecification] | None]

Returns:

A dictionary containing the Jacobian tensor and metadata.
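
As a conceptual aid, one common way to define a categorical Jacobian is by substitution: replace the residue at position i with each possible category a, rerun the model, and record how the logits at every position j change. The sketch below does this for a toy stand-in model in plain Python. The function names, the jac[i][a][j][b] layout, and the toy model are all assumptions for illustration; how PrxteinMPNN actually computes the Jacobian (for instance by differentiating with respect to a one-hot input) is not stated on this page.

```python
def toy_logits(sequence, alphabet_size=2):
    """Toy model: the logits at position j one-hot encode the left neighbour's residue."""
    length = len(sequence)
    return [
        [1.0 if sequence[(j - 1) % length] == b else 0.0 for b in range(alphabet_size)]
        for j in range(length)
    ]


def categorical_jacobian_fd(logits_fn, sequence, alphabet_size):
    """Return jac[i][a][j][b]: change in logit (j, b) when position i is set to category a."""
    base = logits_fn(sequence)
    length = len(sequence)
    jac = []
    for i in range(length):
        per_substitution = []
        for a in range(alphabet_size):
            mutated = list(sequence)
            mutated[i] = a  # substitute category a at position i
            out = logits_fn(mutated)
            per_substitution.append(
                [[out[j][b] - base[j][b] for b in range(alphabet_size)]
                 for j in range(length)]
            )
        jac.append(per_substitution)
    return jac
```

In the toy model a substitution at position i only changes the logits at position i + 1, so the resulting tensor is non-zero only in those blocks, which is the kind of pairwise dependency the Jacobian is meant to expose.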

prxteinmpnn.sample(spec=None, **kwargs)

Sample new sequences for the given input structures.

This function uses a high-performance Grain pipeline to load and process structures, then samples new sequences for each structure.

Parameters:
  • spec (SamplingSpecification | None) – An optional SamplingSpecification object. If None, a default specification is created from the keyword arguments.

  • **kwargs (Any) – Options used to build the specification when spec is None, plus any additional keyword arguments for structure loading. The following options can be set:

      • inputs: A single input or a sequence of inputs (files, PDB IDs, etc.).
      • chain_id: Specific chain(s) to parse from the structure.
      • model: The model number to load. If None, all models are loaded.
      • altloc: The alternate location identifier to use.
      • model_version: The model version to use.
      • model_weights: The model weights to use.
      • foldcomp_database: The FoldComp database to use for FoldComp IDs.
      • random_seed: The seed for the random number generator.
      • backbone_noise: The amount of noise to add to the backbone.
      • num_samples: The number of sequences to sample per structure/noise level.
      • sampling_strategy: The sampling strategy to use.
      • temperature: The sampling temperature.
      • bias: An optional array to bias the logits.
      • fixed_positions: An optional array of residue indices to keep fixed.
      • iterations: Number of optimization iterations for “straight_through” sampling.
      • learning_rate: Learning rate for “straight_through” sampling.
      • batch_size: The number of structures to process in a single batch.
      • num_workers: Number of parallel workers for data loading.

Return type:

dict[str, Any]

Returns:

A dictionary containing sampled sequences, logits, and metadata.
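
A hedged usage sketch for the spec-based calling style: the helper below collects documented SamplingSpecification fields into a dictionary, and a second function builds the specification and passes it to sample. The helper names and the example file names are hypothetical, and the library import is deferred so the option-building part can be exercised without PrxteinMPNN installed. The keys of the returned dictionary are not listed on this page, so none are assumed here.

```python
from pathlib import Path


def build_sampling_options(structure_paths, num_samples=8, temperature=0.1,
                           fixed_positions=None):
    """Collect documented SamplingSpecification fields into a plain dict."""
    return {
        "inputs": [str(Path(p)) for p in structure_paths],
        "num_samples": num_samples,
        "sampling_strategy": "temperature",
        "temperature": temperature,
        "fixed_positions": fixed_positions,
        "random_seed": 42,
    }


def run_sampling(structure_paths):
    """Build a SamplingSpecification and sample sequences (requires prxteinmpnn)."""
    import prxteinmpnn  # deferred import: only needed when sampling is actually run

    spec = prxteinmpnn.SamplingSpecification(**build_sampling_options(structure_paths))
    return prxteinmpnn.sample(spec)
```

Keeping the options in a plain dictionary makes a run easy to log or serialize, and the same dictionary could instead be unpacked directly into sample as keyword arguments.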

prxteinmpnn.score(spec=None, **kwargs)

Score all provided sequences against all input structures.

This function uses a high-performance Grain pipeline to load and process structures, then scores all provided sequences against each structure.

Parameters:
  • spec (ScoringSpecification | None) – An optional ScoringSpecification object. If None, a default specification is created from the keyword arguments.

  • **kwargs (Any) – Options used to build the specification when spec is None, plus any additional keyword arguments for structure loading. The following options can be set:

      • inputs: A single input or a sequence of inputs (files, PDB IDs, etc.).
      • sequences_to_score: A list of protein sequences (strings) to score.
      • chain_id: Specific chain(s) to parse from the structure.
      • model: The model number to load. If None, all models are loaded.
      • altloc: The alternate location identifier to use.
      • model_version: The model version to use.
      • model_weights: The model weights to use.
      • foldcomp_database: The FoldComp database to use for FoldComp IDs.
      • random_seed: The seed for the random number generator.
      • backbone_noise: The amount of noise to add to the backbone.
      • ar_mask: An optional array of shape (L, L) to mask out certain residue pairs.
      • batch_size: The number of structures to process in a single batch.
      • num_workers: Number of parallel workers for data loading.

Return type:

dict[str, Any]

Returns:

A dictionary containing scores, logits, and metadata.
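
A hedged usage sketch for the keyword-argument calling style with an in-memory structure: the helper wraps structure text in a StringIO (one of the documented input types) and tidies the sequences before they are passed as sequences_to_score. The helper names are hypothetical, the stripping and upper-casing of sequences is an illustrative choice rather than a library requirement, and passing return_logits and score_batch_size as keywords assumes that every ScoringSpecification field is accepted this way.

```python
from io import StringIO


def build_scoring_kwargs(structure_text, sequences, noise_levels=(0.0,)):
    """Assemble keyword arguments for score() from in-memory inputs."""
    # Drop empty entries and normalise case; an assumption, not a library rule.
    cleaned = tuple(seq.strip().upper() for seq in sequences if seq.strip())
    return {
        "inputs": StringIO(structure_text),
        "sequences_to_score": cleaned,
        "backbone_noise": tuple(noise_levels),
        "return_logits": True,
        "score_batch_size": 16,
    }


def run_scoring(structure_text, sequences):
    """Score every sequence against the structure (requires prxteinmpnn)."""
    import prxteinmpnn  # deferred import: only needed when scoring is actually run

    return prxteinmpnn.score(**build_scoring_kwargs(structure_text, sequences))
```

Because every sequence is scored against every input structure, the amount of work grows with the product of the two counts, which is worth keeping in mind when choosing score_batch_size.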