PrxteinMPNN: A functional interface for ProteinMPNN.
- class prxteinmpnn.JacobianSpecification(inputs, topology=None, model_weights='original', model_version='v_48_020.pkl', batch_size=32, backbone_noise=(0.0,), foldcomp_database=None, num_workers=0, ar_mask=None, random_seed=42, chain_id=None, model=None, altloc='first', decoding_order_fn=None, conformational_states=None, noise_batch_size=1, jacobian_batch_size=16, combine=False, combine_batch_size=8, combine_noise_batch_size=1, combine_weights=None, combine_fn=None, combine_fn_kwargs=None, output_h5_path=None, compute_apc=True, apc_batch_size=8, apc_residue_batch_size=1000)
Bases: RunSpecification
Configuration for computing categorical Jacobians.
- Parameters:
topology (str | Path | None)
model_weights (ModelWeights)
model_version (ModelVersion)
batch_size (int)
foldcomp_database (FoldCompDatabase | None)
num_workers (int)
ar_mask (None | ArrayLike)
random_seed (int)
model (int | None)
altloc (Literal['first', 'all'])
decoding_order_fn (DecodingOrderFn | None)
conformational_states (ConformationalStates | None)
noise_batch_size (int)
jacobian_batch_size (int)
combine (bool)
combine_batch_size (int)
combine_noise_batch_size (int)
combine_weights (ArrayLike | None)
combine_fn (CombineCatJacPairFn | None)
output_h5_path (str | Path | None)
compute_apc (bool)
apc_batch_size (int)
apc_residue_batch_size (int)
- apc_batch_size: int = 8
- apc_residue_batch_size: int = 1000
- combine: bool = False
- combine_batch_size: int = 8
- combine_fn: CombineCatJacPairFn | None = None
- combine_fn_kwargs: dict[str, Any] | None = None
- combine_noise_batch_size: int = 1
- combine_weights: ArrayLike | None = None
- compute_apc: bool = True
- jacobian_batch_size: int = 16
- noise_batch_size: int = 1
- output_h5_path: str | Path | None = None
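The combine options (combine_fn, combine_fn_kwargs, combine_weights) describe merging pairs of Jacobians with a function given either as a callable or by name, such as "add" or "subtract". The sketch below is one way such a lookup could work on plain nested lists. The registry `COMBINE_FNS`, the helper `combine_pair`, and the way weights are applied are all hypothetical and are not taken from the library.

```python
from typing import Callable, Dict, List, Sequence, Union

Matrix = List[List[float]]
CombineFn = Callable[[float, float], float]

# Hypothetical registry: only the names "add" and "subtract" come from the
# documentation; the registry itself is an assumption for illustration.
COMBINE_FNS: Dict[str, CombineFn] = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
}


def combine_pair(
    jac_a: Matrix,
    jac_b: Matrix,
    combine_fn: Union[str, CombineFn] = "add",
    weights: Sequence[float] = (1.0, 1.0),
) -> Matrix:
    """Combine two equally shaped matrices element by element."""
    # Accept either a registered name or a user-supplied callable.
    fn = COMBINE_FNS[combine_fn] if isinstance(combine_fn, str) else combine_fn
    w_a, w_b = weights
    # Assumed weighting scheme: scale each operand before combining.
    return [
        [fn(w_a * a, w_b * b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(jac_a, jac_b)
    ]
```

For example, `combine_pair(j1, j2, "subtract")` would highlight where two conformational states disagree, while equal weights with "add" give a simple average-like sum.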
- class prxteinmpnn.RunSpecification(inputs, topology=None, model_weights='original', model_version='v_48_020.pkl', batch_size=32, backbone_noise=(0.0,), foldcomp_database=None, num_workers=0, ar_mask=None, random_seed=42, chain_id=None, model=None, altloc='first', decoding_order_fn=None, conformational_states=None)
Bases: object
Configuration for running the model.
- Parameters:
topology (str | Path | None)
model_weights (ModelWeights)
model_version (ModelVersion)
batch_size (int)
foldcomp_database (FoldCompDatabase | None)
num_workers (int)
ar_mask (None | ArrayLike)
random_seed (int)
model (int | None)
altloc (Literal['first', 'all'])
decoding_order_fn (DecodingOrderFn | None)
conformational_states (ConformationalStates | None)
- inputs
A sequence of input file paths or StringIO objects, or a single input.
- model_weights
The model weights to use (default is “original”).
- model_version
The model version to use (default is “v_48_020.pkl”).
- batch_size
The batch size to use (default is 32).
- backbone_noise
The backbone noise levels to use (default is (0.0,)). Can be a single float or a sequence of floats.
- foldcomp_database
An optional path to a FoldComp database (default is None).
- num_workers
The number of worker processes for data loading (default is 0).
- ar_mask
An optional array-like mask for autoregressive positions (default is None).
- random_seed
The random seed to use (default is 42).
- chain_id
An optional chain ID to use (default is None).
- model
An optional model ID to use (default is None).
- altloc
The alternate location to use (default is “first”).
- decoding_order_fn
An optional function to generate the decoding order (default is None).
- conformational_states
The ConformationalStates to use for coarse-graining the inference (default is None).
- altloc: Literal['first', 'all'] = 'first'
- ar_mask: None | ArrayLike = None
- backbone_noise: Sequence[float] | float = (0.0,)
- batch_size: int = 32
- chain_id: Sequence[str] | str | None = None
- conformational_states: ConformationalStates | None = None
- decoding_order_fn: DecodingOrderFn | None = None
- foldcomp_database: FoldCompDatabase | None = None
- model: int | None = None
- model_version: ModelVersion = 'v_48_020.pkl'
- model_weights: ModelWeights = 'original'
- num_workers: int = 0
- random_seed: int = 42
- topology: str | Path | None = None
- inputs: Sequence[str | StringIO] | str | StringIO
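The specification classes share the RunSpecification fields and add task-specific ones, and the entry points accept either a ready-made specification or keyword arguments. The sketch below mimics that layout with stand-in dataclasses so it runs without the library. `ToyRunSpec`, `ToyScoringSpec`, `noise_levels`, and `resolve_spec` are hypothetical names, and the real classes may normalize backbone_noise or build defaults differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple, Union


@dataclass
class ToyRunSpec:
    """Stand-in for RunSpecification with a small subset of its fields."""

    inputs: Union[Sequence[str], str]
    batch_size: int = 32
    backbone_noise: Union[Sequence[float], float] = (0.0,)
    random_seed: int = 42

    def noise_levels(self) -> Tuple[float, ...]:
        """Return backbone_noise as a tuple, wrapping a single float."""
        if isinstance(self.backbone_noise, (int, float)):
            return (float(self.backbone_noise),)
        return tuple(float(level) for level in self.backbone_noise)


@dataclass
class ToyScoringSpec(ToyRunSpec):
    """Stand-in for ScoringSpecification: inherits run fields, adds its own."""

    sequences_to_score: Sequence[str] = ()
    temperature: float = 1.0


def resolve_spec(spec: Optional[ToyScoringSpec] = None, **kwargs) -> ToyScoringSpec:
    """Use the given spec, or build a default one from keyword arguments."""
    return spec if spec is not None else ToyScoringSpec(**kwargs)
```

With this layout, `resolve_spec(inputs="1abc.pdb", backbone_noise=0.2)` and `resolve_spec(ToyScoringSpec(inputs="1abc.pdb", backbone_noise=0.2))` describe the same run, which is the convenience the `spec=None, **kwargs` signatures appear to offer.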
- class prxteinmpnn.SamplingSpecification(inputs, topology=None, model_weights='original', model_version='v_48_020.pkl', batch_size=32, backbone_noise=(0.0,), foldcomp_database=None, num_workers=0, ar_mask=None, random_seed=42, chain_id=None, model=None, altloc='first', decoding_order_fn=None, conformational_states=None, num_samples=1, sampling_strategy='temperature', temperature=0.1, bias=None, fixed_positions=None, iterations=None, learning_rate=None, output_h5_path=None)
Bases: RunSpecification
Configuration for sampling sequences.
- Parameters:
topology (str | Path | None)
model_weights (ModelWeights)
model_version (ModelVersion)
batch_size (int)
foldcomp_database (FoldCompDatabase | None)
num_workers (int)
ar_mask (None | ArrayLike)
random_seed (int)
model (int | None)
altloc (Literal['first', 'all'])
decoding_order_fn (DecodingOrderFn | None)
conformational_states (ConformationalStates | None)
num_samples (int)
sampling_strategy (Literal['temperature', 'straight_through'])
temperature (float)
bias (ArrayLike | None)
fixed_positions (ArrayLike | None)
iterations (int | None)
learning_rate (float | None)
output_h5_path (str | Path | None)
- bias: ArrayLike | None = None
- fixed_positions: ArrayLike | None = None
- iterations: int | None = None
- learning_rate: float | None = None
- num_samples: int = 1
- output_h5_path: str | Path | None = None
- sampling_strategy: Literal['temperature', 'straight_through'] = 'temperature'
- temperature: float = 0.1
- class prxteinmpnn.ScoringSpecification(inputs, topology=None, model_weights='original', model_version='v_48_020.pkl', batch_size=32, backbone_noise=(0.0,), foldcomp_database=None, num_workers=0, ar_mask=None, random_seed=42, chain_id=None, model=None, altloc='first', decoding_order_fn=None, conformational_states=None, sequences_to_score=(), temperature=1.0, return_logits=False, return_decoding_orders=False, return_all_scores=False, score_batch_size=16, output_h5_path=None)
Bases: RunSpecification
Configuration for scoring sequences.
- Parameters:
topology (str | Path | None)
model_weights (ModelWeights)
model_version (ModelVersion)
batch_size (int)
foldcomp_database (FoldCompDatabase | None)
num_workers (int)
ar_mask (None | ArrayLike)
random_seed (int)
model (int | None)
altloc (Literal['first', 'all'])
decoding_order_fn (DecodingOrderFn | None)
conformational_states (ConformationalStates | None)
sequences_to_score (Sequence[str])
temperature (float)
return_logits (bool)
return_decoding_orders (bool)
return_all_scores (bool)
score_batch_size (int)
output_h5_path (str | Path | None)
- sequences_to_score
A sequence of amino acid sequences to score.
- temperature
The temperature for scoring (default is 1.0).
- return_logits
Whether to return the raw logits (default is False).
- return_decoding_orders
Whether to return decoding orders (default is False).
- return_all_scores
Whether to return scores for all sequences (default is False).
- score_batch_size
The batch size for scoring sequences (default is 16).
- output_h5_path
Optional path to an HDF5 file for streaming output.
- output_h5_path: str | Path | None = None
- return_all_scores: bool = False
- return_decoding_orders: bool = False
- return_logits: bool = False
- score_batch_size: int = 16
- sequences_to_score: Sequence[str] = ()
- temperature: float = 1.0
- prxteinmpnn.categorical_jacobian(spec=None, **kwargs)
Compute the Jacobian of the model’s logits with respect to the input sequence.
- Parameters:
spec (JacobianSpecification | None) – An optional JacobianSpecification object. If None, a default is created from kwargs.
kwargs (Any) – Options provided as keyword arguments. The following options can be set:
inputs: A single input or a sequence of inputs (files, PDB IDs, etc.).
chain_id: Specific chain(s) to parse from the structure.
model: The model number to load. If None, all models are loaded.
altloc: The alternate location identifier to use.
model_version: The model version to use.
model_weights: The model weights to use.
foldcomp_database: The FoldComp database to use for FoldComp IDs.
random_seed: The seed for the random number generator.
backbone_noise: The amount of noise to add to the backbone.
batch_size: The number of structures to process in a single batch.
noise_batch_size: Batch size for noise levels in Jacobian computation.
jacobian_batch_size: Inner batch size for Jacobian computation.
combine_batch_size: Batch size for combining Jacobians.
num_workers: Number of parallel workers for data loading.
combine_fn: Function or string specifying how to combine Jacobian pairs (e.g., “add”, “subtract”).
combine_fn_kwargs: Optional dictionary of keyword arguments for the combine function.
combine_weights: Optional weights to use when combining Jacobians.
combine: Whether to combine Jacobians across samples.
output_h5_path: Optional path to an HDF5 file for streaming output.
compute_apc: Whether to compute the APC-corrected Frobenius norm.
**kwargs – Additional keyword arguments for structure loading.
- Returns:
A dictionary containing the Jacobian tensor and metadata.
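The compute_apc option refers to an APC-corrected Frobenius norm. As a rough illustration of that idea, the sketch below collapses a Jacobian to one norm per residue pair and then applies the textbook average product correction, F_ij - (row mean_i × column mean_j) / overall mean. The nested [L][A][L][A] layout and both helper names are assumptions. The library's own implementation may differ, for instance in symmetrization or diagonal handling.

```python
import math
from typing import List

Matrix = List[List[float]]


def frobenius_norms(jacobian: List[List[List[List[float]]]]) -> Matrix:
    """Collapse an assumed [L][A][L][A] Jacobian to an L x L matrix of block norms."""
    length = len(jacobian)
    return [
        [
            # Sum of squares over both amino acid axes for the (i, j) block.
            math.sqrt(sum(v * v for per_aa in jacobian[i] for v in per_aa[j]))
            for j in range(length)
        ]
        for i in range(length)
    ]


def apc_correct(matrix: Matrix) -> Matrix:
    """Apply the average product correction to a square matrix."""
    n = len(matrix)
    row_means = [sum(row) / n for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    if total_mean == 0.0:
        # Nothing to correct; return a copy to avoid dividing by zero.
        return [row[:] for row in matrix]
    return [
        [matrix[i][j] - row_means[i] * col_means[j] / total_mean for j in range(n)]
        for i in range(n)
    ]
```

A matrix that is a pure outer product of per-residue factors is reduced to zeros by this correction, which is why APC is commonly used to suppress background signal that depends on each position alone rather than on the pair.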
- prxteinmpnn.sample(spec=None, **kwargs)
Sample new sequences for the given input structures.
This function uses a high-performance Grain pipeline to load and process structures, then samples new sequences for each structure.
- Parameters:
spec (SamplingSpecification | None) – An optional SamplingSpecification object. If None, a default is created from kwargs.
kwargs (Any) – Options provided as keyword arguments. The following options can be set:
inputs: A single input or a sequence of inputs (files, PDB IDs, etc.).
chain_id: Specific chain(s) to parse from the structure.
model: The model number to load. If None, all models are loaded.
altloc: The alternate location identifier to use.
model_version: The model version to use.
model_weights: The model weights to use.
foldcomp_database: The FoldComp database to use for FoldComp IDs.
random_seed: The seed for the random number generator.
backbone_noise: The amount of noise to add to the backbone.
num_samples: The number of sequences to sample per structure/noise level.
sampling_strategy: The sampling strategy to use.
temperature: The sampling temperature.
bias: An optional array to bias the logits.
fixed_positions: An optional array of residue indices to keep fixed.
iterations: Number of optimization iterations for “straight_through” sampling.
learning_rate: Learning rate for “straight_through” sampling.
batch_size: The number of structures to process in a single batch.
num_workers: Number of parallel workers for data loading.
**kwargs – Additional keyword arguments for structure loading.
- Returns:
A dictionary containing sampled sequences, logits, and metadata.
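To make the temperature, bias, and fixed_positions options more concrete, the sketch below samples one residue per position from per-position logits using only the standard library. It is a simplification under stated assumptions: positions are sampled independently (the actual model decodes autoregressively), the 20-letter `ALPHABET` ordering is a toy choice, and `sample_sequence` and its `native` argument are hypothetical.

```python
import math
import random
from typing import List, Optional, Set

# Toy alphabet of the 20 standard amino acids; the model's own ordering may differ.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def sample_sequence(
    logits: List[List[float]],
    native: str,
    temperature: float = 0.1,
    bias: Optional[List[List[float]]] = None,
    fixed_positions: Optional[Set[int]] = None,
    seed: int = 42,
) -> str:
    """Sample one residue per position from temperature-scaled logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = random.Random(seed)
    fixed = fixed_positions or set()
    residues = []
    for pos, row in enumerate(logits):
        if pos in fixed:
            # Fixed positions keep the residue from the native sequence.
            residues.append(native[pos])
            continue
        if bias is not None:
            row = [x + b for x, b in zip(row, bias[pos])]
        scaled = [x / temperature for x in row]
        peak = max(scaled)  # subtract the maximum for numerical stability
        weights = [math.exp(x - peak) for x in scaled]
        residues.append(rng.choices(ALPHABET, weights=weights, k=1)[0])
    return "".join(residues)
```

Lower temperatures concentrate probability on the highest-logit residue, so a value such as the default 0.1 yields nearly greedy designs, while values near 1.0 give more diverse sequences.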
- prxteinmpnn.score(spec=None, **kwargs)
Score all provided sequences against all input structures.
This function uses a high-performance Grain pipeline to load and process structures, then scores all provided sequences against each structure.
- Parameters:
spec (ScoringSpecification | None) – An optional ScoringSpecification object. If None, a default is created from kwargs.
kwargs (Any) – Options provided as keyword arguments. The following options can be set:
inputs: A single input or a sequence of inputs (files, PDB IDs, etc.).
sequences_to_score: A list of protein sequences (strings) to score.
chain_id: Specific chain(s) to parse from the structure.
model: The model number to load. If None, all models are loaded.
altloc: The alternate location identifier to use.
model_version: The model version to use.
model_weights: The model weights to use.
foldcomp_database: The FoldComp database to use for FoldComp IDs.
random_seed: The seed for the random number generator.
backbone_noise: The amount of noise to add to the backbone.
ar_mask: An optional array of shape (L, L) to mask out certain residue pairs.
batch_size: The number of structures to process in a single batch.
num_workers: Number of parallel workers for data loading.
**kwargs – Additional keyword arguments for structure loading.
- Returns:
A dictionary containing scores, logits, and metadata.
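Scoring every sequence against every structure can be pictured as a nested loop over per-structure logits. The sketch below assumes the score is the mean negative log-likelihood of a sequence under temperature-scaled logits, a common convention that is not confirmed by this page. The `ALPHABET` ordering and the helper names `log_softmax`, `score_sequence`, and `score_all` are hypothetical.

```python
import math
from typing import List, Sequence

# Toy alphabet of the 20 standard amino acids; the model's own ordering may differ.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(row: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def score_sequence(
    logits: Sequence[Sequence[float]], sequence: str, temperature: float = 1.0
) -> float:
    """Mean negative log-likelihood of a sequence (lower means a better fit)."""
    if not sequence or len(logits) != len(sequence):
        raise ValueError("sequence must be non-empty and match the logits length")
    total = 0.0
    for row, residue in zip(logits, sequence):
        total -= log_softmax(row, temperature)[ALPHABET.index(residue)]
    return total / len(sequence)


def score_all(
    logits_per_structure: Sequence[Sequence[Sequence[float]]],
    sequences: Sequence[str],
    temperature: float = 1.0,
) -> List[List[float]]:
    """Return a [structure][sequence] table of scores."""
    return [
        [score_sequence(logits, seq, temperature) for seq in sequences]
        for logits in logits_per_structure
    ]
```

With uniform logits every residue is equally likely, so each sequence scores log(20) under this toy alphabet, which gives a convenient baseline for sanity checks.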