class prxteinmpnn.JacobianSpecification(inputs, topology=None, model_weights='original', model_version='v_48_020.pkl', batch_size=32, backbone_noise=(0.0,), foldcomp_database=None, num_workers=0, ar_mask=None, random_seed=42, chain_id=None, model=None, altloc='first', decoding_order_fn=None, conformational_states=None, noise_batch_size=1, jacobian_batch_size=16, combine=False, combine_batch_size=8, combine_noise_batch_size=1, combine_weights=None, combine_fn=None, combine_fn_kwargs=None, output_h5_path=None, compute_apc=True, apc_batch_size=8, apc_residue_batch_size=1000)

Bases: RunSpecification

Configuration for computing categorical Jacobians.

Parameters:

inputs (Sequence[str | StringIO] | str | StringIO)
topology (str | Path | None)
model_weights (ModelWeights)
model_version (ModelVersion)
batch_size (int)
backbone_noise (Sequence[float] | float)
foldcomp_database (FoldCompDatabase | None)
num_workers (int)
ar_mask (None | ArrayLike)
random_seed (int)
chain_id (Sequence[str] | str | None)
model (int | None)
altloc (Literal['first', 'all'])
decoding_order_fn (DecodingOrderFn | None)
conformational_states (ConformationalStates | None)
noise_batch_size (int)
jacobian_batch_size (int)
combine (bool)
combine_batch_size (int)
combine_noise_batch_size (int)
combine_weights (ArrayLike | None)
combine_fn (CombineCatJacPairFn | None)
combine_fn_kwargs (dict[str, Any] | None)
output_h5_path (str | Path | None)
compute_apc (bool)
apc_batch_size (int)
apc_residue_batch_size (int)

apc_batch_size: int = 8

apc_residue_batch_size: int = 1000

combine: bool = False

combine_batch_size: int = 8

combine_fn: CombineCatJacPairFn | None = None

combine_fn_kwargs: dict[str, Any] | None = None

combine_noise_batch_size: int = 1

combine_weights: ArrayLike | None = None

compute_apc: bool = True

jacobian_batch_size: int = 16

noise_batch_size: int = 1

output_h5_path: str | Path | None = None

class prxteinmpnn.RunSpecification(inputs, topology=None, model_weights='original', model_version='v_48_020.pkl', batch_size=32, backbone_noise=(0.0,), foldcomp_database=None, num_workers=0, ar_mask=None, random_seed=42, chain_id=None, model=None, altloc='first', decoding_order_fn=None, conformational_states=None)

Bases: object

Configuration for running the model.

Parameters:

inputs (Sequence[str | StringIO] | str | StringIO)
topology (str | Path | None)
model_weights (ModelWeights)
model_version (ModelVersion)
batch_size (int)
backbone_noise (Sequence[float] | float)
foldcomp_database (FoldCompDatabase | None)
num_workers (int)
ar_mask (None | ArrayLike)
random_seed (int)
chain_id (Sequence[str] | str | None)
model (int | None)
altloc (Literal['first', 'all'])
decoding_order_fn (DecodingOrderFn | None)
conformational_states (ConformationalStates | None)

inputs: A single input (file path or StringIO object) or a sequence of such inputs.

model_weights: The model weights to use (default is “original”).

model_version: The model version to use (default is “v_48_020.pkl”).

batch_size: The number of structures to process in a single batch (default is 32).

backbone_noise: The backbone noise levels to use (default is (0.0,)). Can be a single float or a sequence of floats.

foldcomp_database: An optional FoldComp database to use for FoldComp IDs (default is None).

num_workers: The number of worker processes for data loading (default is 0).

ar_mask: An optional array-like mask for autoregressive positions (default is None).

random_seed: The random seed to use (default is 42).

chain_id: An optional chain ID, or sequence of chain IDs, to parse from the structure (default is None).

model: An optional model number to load (default is None).

altloc: The alternate location identifier to use, either “first” or “all” (default is “first”).

decoding_order_fn: An optional function to generate the decoding order (default is None).

conformational_states: Optional ConformationalStates to use for coarse-graining the inference (default is None).

altloc: Literal['first', 'all'] = 'first'

ar_mask: None | ArrayLike = None

backbone_noise: Sequence[float] | float = (0.0,)

batch_size: int = 32

chain_id: Sequence[str] | str | None = None

conformational_states: ConformationalStates | None = None

decoding_order_fn: DecodingOrderFn | None = None

foldcomp_database: FoldCompDatabase | None = None

model: int | None = None

model_version: ModelVersion = 'v_48_020.pkl'

model_weights: ModelWeights = 'original'

num_workers: int = 0

random_seed: int = 42

topology: str | Path | None = None

inputs: Sequence[str | StringIO] | str | StringIO
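
Because backbone_noise accepts either a single float or a sequence of floats, downstream code usually wants one canonical form. The following is a minimal sketch of such a normalization, not the library's own implementation; the helper name normalize_backbone_noise is hypothetical.

```python
from __future__ import annotations

from collections.abc import Sequence


def normalize_backbone_noise(backbone_noise: Sequence[float] | float) -> tuple[float, ...]:
    """Return noise levels as a tuple, accepting a single number or a sequence.

    Hypothetical helper for illustration; prxteinmpnn may normalize differently.
    """
    if isinstance(backbone_noise, (int, float)):
        # A bare number becomes a one-element tuple.
        return (float(backbone_noise),)
    # Any other sequence is copied into a tuple of floats.
    return tuple(float(level) for level in backbone_noise)


single = normalize_backbone_noise(0.2)
several = normalize_backbone_noise([0.0, 0.1])
print(single, several)
```

With this form, a caller can always iterate over the noise levels, whether one or many were requested.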

class prxteinmpnn.SamplingSpecification(inputs, topology=None, model_weights='original', model_version='v_48_020.pkl', batch_size=32, backbone_noise=(0.0,), foldcomp_database=None, num_workers=0, ar_mask=None, random_seed=42, chain_id=None, model=None, altloc='first', decoding_order_fn=None, conformational_states=None, num_samples=1, sampling_strategy='temperature', temperature=0.1, bias=None, fixed_positions=None, iterations=None, learning_rate=None, output_h5_path=None)

Bases: RunSpecification

Configuration for sampling sequences.

Parameters:

inputs (Sequence[str | StringIO] | str | StringIO)
topology (str | Path | None)
model_weights (ModelWeights)
model_version (ModelVersion)
batch_size (int)
backbone_noise (Sequence[float] | float)
foldcomp_database (FoldCompDatabase | None)
num_workers (int)
ar_mask (None | ArrayLike)
random_seed (int)
chain_id (Sequence[str] | str | None)
model (int | None)
altloc (Literal['first', 'all'])
decoding_order_fn (DecodingOrderFn | None)
conformational_states (ConformationalStates | None)
num_samples (int)
sampling_strategy (Literal['temperature', 'straight_through'])
temperature (float)
bias (ArrayLike | None)
fixed_positions (ArrayLike | None)
iterations (int | None)
learning_rate (float | None)
output_h5_path (str | Path | None)

bias: ArrayLike | None = None

fixed_positions: ArrayLike | None = None

iterations: int | None = None

learning_rate: float | None = None

num_samples: int = 1

output_h5_path: str | Path | None = None

sampling_strategy: Literal['temperature', 'straight_through'] = 'temperature'

temperature: float = 0.1
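
To give a feel for what a low default temperature such as 0.1 does, here is a small sketch of temperature-scaled softmax. It assumes the “temperature” strategy divides the logits by the temperature before sampling, which is the common convention; this page does not confirm how prxteinmpnn applies it. The function name is hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]


logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, temperature=0.1)  # close to one-hot
flat = softmax_with_temperature(logits, temperature=1.0)   # plain softmax
print(max(sharp), max(flat))
```

Lower temperatures concentrate probability on the highest-scoring residue, so samples become less diverse.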

class prxteinmpnn.ScoringSpecification(inputs, topology=None, model_weights='original', model_version='v_48_020.pkl', batch_size=32, backbone_noise=(0.0,), foldcomp_database=None, num_workers=0, ar_mask=None, random_seed=42, chain_id=None, model=None, altloc='first', decoding_order_fn=None, conformational_states=None, sequences_to_score=(), temperature=1.0, return_logits=False, return_decoding_orders=False, return_all_scores=False, score_batch_size=16, output_h5_path=None)

Bases: RunSpecification

Configuration for scoring sequences.

Parameters:

inputs (Sequence[str | StringIO] | str | StringIO)
topology (str | Path | None)
model_weights (ModelWeights)
model_version (ModelVersion)
batch_size (int)
backbone_noise (Sequence[float] | float)
foldcomp_database (FoldCompDatabase | None)
num_workers (int)
ar_mask (None | ArrayLike)
random_seed (int)
chain_id (Sequence[str] | str | None)
model (int | None)
altloc (Literal['first', 'all'])
decoding_order_fn (DecodingOrderFn | None)
conformational_states (ConformationalStates | None)
sequences_to_score (Sequence[str])
temperature (float)
return_logits (bool)
return_decoding_orders (bool)
return_all_scores (bool)
score_batch_size (int)
output_h5_path (str | Path | None)

sequences_to_score: A sequence of amino acid sequences to score.

temperature: The temperature for scoring (default is 1.0).

return_logits: Whether to return the raw logits (default is False).

return_decoding_orders: Whether to return decoding orders (default is False).

return_all_scores: Whether to return scores for all sequences (default is False).

score_batch_size: The batch size for scoring sequences (default is 16).

output_h5_path: Optional path to an HDF5 file for streaming output.

output_h5_path: str | Path | None = None

return_all_scores: bool = False

return_decoding_orders: bool = False

return_logits: bool = False

score_batch_size: int = 16

sequences_to_score: Sequence[str] = ()

temperature: float = 1.0

prxteinmpnn.categorical_jacobian(spec=None, **kwargs)

Compute the Jacobian of the model’s logits with respect to the input sequence.

Parameters:

spec (JacobianSpecification | None) – An optional JacobianSpecification object. If None, a default is created from the options passed as keyword arguments.
**kwargs (Any) – Options used to build the specification when spec is None, plus additional keyword arguments for structure loading. The following options can be set:
inputs: A single input or a sequence of inputs (files, PDB IDs, etc.).
chain_id: Specific chain(s) to parse from the structure.
model: The model number to load. If None, all models are loaded.
altloc: The alternate location identifier to use.
model_version: The model version to use.
model_weights: The model weights to use.
foldcomp_database: The FoldComp database to use for FoldComp IDs.
random_seed: The random number generator key.
backbone_noise: The amount of noise to add to the backbone.
batch_size: The number of structures to process in a single batch.
noise_batch_size: Batch size for noise levels in Jacobian computation.
jacobian_batch_size: Inner batch size for Jacobian computation.
combine_batch_size: Batch size for combining Jacobians.
num_workers: Number of parallel workers for data loading.
combine_fn: Function or string specifying how to combine Jacobian pairs (e.g., “add”, “subtract”).
combine_fn_kwargs: Optional dictionary of keyword arguments for the combine function.
combine_weights: Optional weights to use when combining Jacobians.
combine: Whether to combine Jacobians across samples.
output_h5_path: Optional path to an HDF5 file for streaming output.
compute_apc: Whether to compute the APC-corrected Frobenius norm.

Return type:

dict[str, Array | dict[str, JacobianSpecification] | None]

Returns:

A dictionary containing the Jacobian tensor and metadata.
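
The compute_apc option refers to an APC-corrected Frobenius norm. As a rough illustration of that idea, the sketch below collapses a toy Jacobian to a residue-by-residue matrix of block Frobenius norms and then applies the standard average product correction, F_ij - (row mean_i * column mean_j) / overall mean. The [i][a][j][b] index layout, the function names, and the handling of the diagonal are assumptions; the library's tensor layout and exact correction may differ.

```python
import math


def frobenius_norms(jacobian):
    """Collapse a Jacobian indexed [i][a][j][b] to an L x L matrix of block norms."""
    length = len(jacobian)
    return [
        [
            # Sum squares over both amino-acid axes (a and b) for the pair (i, j).
            math.sqrt(sum(value * value for row in jacobian[i] for value in row[j]))
            for j in range(length)
        ]
        for i in range(length)
    ]


def apc_correct(matrix):
    """Subtract the average product term from every entry of a square matrix."""
    length = len(matrix)
    row_means = [sum(row) / length for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(length)) / length for j in range(length)]
    total_mean = sum(row_means) / length
    if total_mean == 0:
        return [row[:] for row in matrix]  # nothing to correct
    return [
        [matrix[i][j] - row_means[i] * col_means[j] / total_mean for j in range(length)]
        for i in range(length)
    ]


# Toy Jacobian with 2 residues and 2 "amino acid" channels.
toy = [
    [[[3.0, 4.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]],
    [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 2.0], [0.0, 0.0]]],
]
norms = frobenius_norms(toy)
corrected = apc_correct(norms)
print(norms)
print(corrected)
```

The correction removes the part of each coupling that is explained by the overall row and column magnitudes, so a purely rank-one matrix is corrected to zero.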

prxteinmpnn.sample(spec=None, **kwargs)

Sample new sequences for the given input structures.

This function uses a high-performance Grain pipeline to load and process structures, then samples new sequences for each structure.

Parameters:

spec (SamplingSpecification | None) – An optional SamplingSpecification object. If None, a default is created from the options passed as keyword arguments.
**kwargs (Any) – Options used to build the specification when spec is None, plus additional keyword arguments for structure loading. The following options can be set:
inputs: A single input or a sequence of inputs (files, PDB IDs, etc.).
chain_id: Specific chain(s) to parse from the structure.
model: The model number to load. If None, all models are loaded.
altloc: The alternate location identifier to use.
model_version: The model version to use.
model_weights: The model weights to use.
foldcomp_database: The FoldComp database to use for FoldComp IDs.
random_seed: The random number generator key.
backbone_noise: The amount of noise to add to the backbone.
num_samples: The number of sequences to sample per structure/noise level.
sampling_strategy: The sampling strategy to use.
temperature: The sampling temperature.
bias: An optional array to bias the logits.
fixed_positions: An optional array of residue indices to keep fixed.
iterations: Number of optimization iterations for “straight_through” sampling.
learning_rate: Learning rate for “straight_through” sampling.
batch_size: The number of structures to process in a single batch.
num_workers: Number of parallel workers for data loading.

Return type:

dict[str, Any]

Returns:

A dictionary containing sampled sequences, logits, and metadata.
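
The calling convention shared by these functions, an optional specification object with keyword arguments as a fallback, can be sketched with stand-in types. ToySamplingSpec and toy_sample are hypothetical names that mirror only a few documented fields; they do not call prxteinmpnn, and ignoring keyword arguments when spec is given is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToySamplingSpec:
    """Hypothetical stand-in with a few fields documented for SamplingSpecification."""

    inputs: str | list[str]
    num_samples: int = 1
    temperature: float = 0.1


def toy_sample(spec: ToySamplingSpec | None = None, **kwargs) -> ToySamplingSpec:
    """Return the specification that would drive the run."""
    if spec is None:
        # Build a default specification from keyword arguments;
        # unknown keys raise TypeError in the dataclass constructor.
        spec = ToySamplingSpec(**kwargs)
    return spec


from_kwargs = toy_sample(inputs="structure.pdb", num_samples=4)
explicit = toy_sample(ToySamplingSpec(inputs="structure.pdb", temperature=0.5))
print(from_kwargs)
print(explicit)
```

Passing a prebuilt specification makes a run easy to reuse and record, while keyword arguments are convenient for one-off calls.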

prxteinmpnn.score(spec=None, **kwargs)

Score all provided sequences against all input structures.

This function uses a high-performance Grain pipeline to load and process structures, then scores all provided sequences against each structure.

Parameters:

spec (ScoringSpecification | None) – An optional ScoringSpecification object. If None, a default is created from the options passed as keyword arguments.
**kwargs (Any) – Options used to build the specification when spec is None, plus additional keyword arguments for structure loading. The following options can be set:
inputs: A single input or a sequence of inputs (files, PDB IDs, etc.).
sequences_to_score: A list of protein sequences (strings) to score.
chain_id: Specific chain(s) to parse from the structure.
model: The model number to load. If None, all models are loaded.
altloc: The alternate location identifier to use.
model_version: The model version to use.
model_weights: The model weights to use.
foldcomp_database: The FoldComp database to use for FoldComp IDs.
random_seed: The random number generator key.
backbone_noise: The amount of noise to add to the backbone.
ar_mask: An optional array of shape (L, L) to mask out certain residue pairs.
batch_size: The number of structures to process in a single batch.
num_workers: Number of parallel workers for data loading.

Return type:

dict[str, Any]

Returns:

A dictionary containing scores, logits, and metadata.
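
As an illustration of what an (L, L) ar_mask can encode, the sketch below builds a mask from a decoding order so that entry [i][j] is 1 when residue j is decoded before residue i. This orientation follows the usual autoregressive convention in MPNN-style models and is an assumption here; this page does not state which convention prxteinmpnn expects. The function name is hypothetical.

```python
def autoregressive_mask(decoding_order: list[int]) -> list[list[int]]:
    """Build an L x L mask where [i][j] is 1 if residue j is decoded before residue i."""
    length = len(decoding_order)
    step = [0] * length
    for position_in_order, residue in enumerate(decoding_order):
        step[residue] = position_in_order  # when each residue gets decoded
    return [
        [1 if step[j] < step[i] else 0 for j in range(length)]
        for i in range(length)
    ]


# Residue 2 is decoded first, then residue 0, then residue 1.
mask = autoregressive_mask([2, 0, 1])
for row in mask:
    print(row)
```

In this toy case residue 1, decoded last, can attend to both other residues, while residue 2, decoded first, sees none.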