sample_data
- autoplex.data.common.jobs.sample_data(selection_method='random', num_of_selection=5, bcur_params=None, dir=None, structure=None, traj_path=None, isolated_atom_energies=None, random_seed=None, remove_traj_files=False)
Job to sample training configurations from MD or RSS trajectories.
- Parameters:
selection_method (Literal['cur', 'bcur1s', 'bcur2s', 'random', 'uniform']) –
- Method for selecting samples. Options include:
  - 'cur': Pure CUR selection.
  - 'bcur': Boltzmann flat histogram in enthalpy, then CUR. Selected through one of two variants:
    - 'bcur1s': Run bcur in a single shot (1s).
    - 'bcur2i': Run bcur with two iterations (2i).
  - 'random': Random selection.
  - 'uniform': Uniform selection.
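The non-Boltzmann options can be illustrated on a plain descriptor matrix. The following is a minimal NumPy sketch under stated assumptions, not autoplex's implementation: 'uniform' is assumed to mean evenly spaced frames along a trajectory, and 'cur' is sketched with one common CUR variant (deterministic leverage-score selection from a truncated SVD). autoplex works on SOAP descriptors and its CUR routine may differ. The helper names `select_random`, `select_uniform`, and `select_cur` are hypothetical.

```python
from __future__ import annotations

import numpy as np


def select_random(n_frames: int, k: int, seed: int | None = None) -> list[int]:
    """Pick k distinct frame indices at random (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.choice(n_frames, size=k, replace=False).tolist())


def select_uniform(n_frames: int, k: int) -> list[int]:
    """Pick k evenly spaced frame indices (assumed meaning of 'uniform')."""
    return np.linspace(0, n_frames - 1, num=k).round().astype(int).tolist()


def select_cur(descriptors: np.ndarray, k: int, rank: int | None = None) -> list[int]:
    """Leverage-score CUR selection of rows of an (n_frames, n_features) matrix.

    Rows (frames) with the largest statistical leverage in the top-`rank`
    left singular subspace are kept. This is one common CUR variant,
    assumed here for illustration only.
    """
    rank = rank or k
    u, _, _ = np.linalg.svd(descriptors, full_matrices=False)
    leverage = np.sum(u[:, :rank] ** 2, axis=1) / rank
    return sorted(np.argsort(leverage)[::-1][:k].tolist())


# Usage on a toy descriptor matrix: 100 frames, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
print(select_random(100, 5, seed=42))
print(select_uniform(100, 5))  # → [0, 25, 50, 74, 99]
print(select_cur(X, 5))
```

Random and uniform selection ignore the structures themselves, while CUR favours frames whose descriptors are least well represented by the rest, which is why it is combined with Boltzmann weighting in the 'bcur' variants.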
num_of_selection (int) – Number of structures to be sampled.
bcur_params (dict) –
Parameters for Boltzmann CUR selection. The default dictionary includes:
- 'soap_paras': SOAP descriptor parameters:
  - 'l_max': int, Maximum degree of spherical harmonics (default 12).
  - 'n_max': int, Maximum number of radial basis functions (default 12).
  - 'atom_sigma': float, Width of the Gaussian smearing (default 0.0875).
  - 'cutoff': float, Radial cutoff distance (default 10.5).
  - 'cutoff_transition_width': float, Width of the cutoff transition region (default 1.0).
  - 'zeta': float, Exponent for the dot-product SOAP kernel (default 4.0).
  - 'average': bool, Whether to average the SOAP vectors (default True).
  - 'species': bool, Whether to include species information (default True).
- 'kt': float, Thermal energy kT in eV used for Boltzmann weighting (default 0.3).
- 'frac_of_bcur': float, Fraction of Boltzmann CUR selections (default 0.8).
- 'bolt_max_num': int, Maximum number of Boltzmann selections (default 3000).
- 'kernel_exp': float, Exponent for the kernel (default 4.0).
- 'energy_label': str, Label for the energy data (default 'energy').
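For reference, the documented defaults can be written out as a plain dictionary. This is a sketch: the nesting (SOAP settings under 'soap_paras', the other keys at the top level) is inferred from the description above, and since it is not stated whether a partial dictionary is merged with the defaults, passing the full dictionary is the cautious choice. The helper `boltzmann_weights` is hypothetical and only illustrates the role of 'kt' as the energy scale in a Boltzmann factor exp(-ΔE/kT); it is not the flat-histogram procedure that 'bcur' uses.

```python
from __future__ import annotations

import numpy as np

# Documented defaults for bcur_params (nesting inferred from the parameter list).
bcur_params = {
    "soap_paras": {
        "l_max": 12,
        "n_max": 12,
        "atom_sigma": 0.0875,
        "cutoff": 10.5,
        "cutoff_transition_width": 1.0,
        "zeta": 4.0,
        "average": True,
        "species": True,
    },
    "kt": 0.3,  # thermal energy kT in eV
    "frac_of_bcur": 0.8,
    "bolt_max_num": 3000,
    "kernel_exp": 4.0,
    "energy_label": "energy",
}


def boltzmann_weights(energies_per_atom: np.ndarray, kt: float) -> np.ndarray:
    """Normalized Boltzmann factors exp(-(E - E_min) / kT) (hypothetical helper).

    Shifting by the minimum energy avoids overflow and leaves the
    normalized weights unchanged.
    """
    shifted = energies_per_atom - energies_per_atom.min()
    weights = np.exp(-shifted / kt)
    return weights / weights.sum()


# Usage: three configurations 0.3 eV/atom apart, i.e. one kT apart at the default kt.
w = boltzmann_weights(np.array([-4.0, -3.7, -3.4]), bcur_params["kt"])
print(w.round(3))
```

With the default kt of 0.3 eV, each 0.3 eV/atom step up in energy lowers the weight by a factor of e, so low-energy configurations dominate unless the histogram flattening counteracts this.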
dir (str) – Directory containing trajectory files for MD/RSS simulations. Default is None.
structure (list[Structure]) – List of structures for sampling. Default is None.
traj_path (list[list[str]]) – List of lists containing trajectory paths. Default is None.
isolated_atom_energies (dict) – Dictionary of isolated-atom energies for each species. Required for the Boltzmann CUR ('bcur') selection methods. Default is None.
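A plausible reason the Boltzmann CUR methods need isolated_atom_energies is to reference total energies to free atoms, so that structures of different composition can be compared on a per-atom scale. The sketch below assumes that use; the helper name `energy_per_atom_vs_isolated`, the choice of atomic numbers as dictionary keys, and the energy values are all illustrative assumptions.

```python
from __future__ import annotations


def energy_per_atom_vs_isolated(
    total_energy: float,
    numbers: list[int],
    isolated_atom_energies: dict[int, float],
) -> float:
    """(E_total - sum of isolated-atom energies) / N (hypothetical helper).

    `numbers` lists the atomic number of every atom in the structure; a
    species missing from `isolated_atom_energies` raises a KeyError.
    """
    e_ref = sum(isolated_atom_energies[z] for z in numbers)
    return (total_energy - e_ref) / len(numbers)


# Illustrative values only: two Si atoms (Z = 14) with a made-up reference energy.
iso = {14: -0.5}
print(energy_per_atom_vs_isolated(-11.0, [14, 14], iso))  # → -5.0
```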
random_seed (int, optional) – Seed for random number generation, ensuring reproducibility of sampling.
remove_traj_files (bool) – Whether to remove all trajectory files generated by RSS to save disk space. Default is False.
- Returns:
The selected atoms.
- Return type:
list of ase.Atoms