sample_data

autoplex.data.common.jobs.sample_data(selection_method='random', num_of_selection=5, bcur_params=None, dir=None, structure=None, traj_path=None, isolated_atom_energies=None, random_seed=None, remove_traj_files=False)

Job to sample training configurations from trajectories of MD/RSS.

Parameters:
  • selection_method (Literal['cur', 'bcur1s', 'bcur2i', 'random', 'uniform']) –

    Method for selecting samples. Default is ‘random’. Options include:
    • ‘cur’: Pure CUR selection.

    • ‘bcur’: Boltzmann flat histogram in enthalpy, then CUR. Selected through one of two variants:
      • ‘bcur1s’: Run bcur in one shot (1s).

      • ‘bcur2i’: Run bcur with two iterations (2i).

    • ‘random’: Random selection.

    • ‘uniform’: Uniform selection.

  • num_of_selection (int) – Number of structures to be sampled. Default is 5.

  • bcur_params (dict) –

    Parameters for Boltzmann CUR selection. The default dictionary includes:

    • ‘soap_paras’: SOAP descriptor parameters:
      • ‘l_max’: int, Maximum degree of spherical harmonics (default 12).

      • ‘n_max’: int, Maximum number of radial basis functions (default 12).

      • ‘atom_sigma’: float, Width of Gaussian smearing (default 0.0875).

      • ‘cutoff’: float, Radial cutoff distance (default 10.5).

      • ‘cutoff_transition_width’: float, Width of the transition region (default 1.0).

      • ‘zeta’: float, Exponent for dot-product SOAP kernel (default 4.0).

      • ‘average’: bool, Whether to average the SOAP vectors (default True).

      • ‘species’: bool, Whether to consider species information (default True).

    • ‘kt’: float, Temperature in eV for Boltzmann weighting (default 0.3).

    • ‘frac_of_bcur’: float, Fraction of Boltzmann CUR selections (default 0.8).

    • ‘bolt_max_num’: int, Maximum number of Boltzmann selections (default 3000).

    • ‘kernel_exp’: float, Exponent for the kernel (default 4.0).

    • ‘energy_label’: str, Label for the energy data (default ‘energy’).

  • dir (str) – Directory containing trajectory files for MD/RSS simulations. Default is None.

  • structure (list[Structure]) – List of structures for sampling. Default is None.

  • traj_path (list[list[str]]) – List of lists containing trajectory paths. Default is None.

  • isolated_atom_energies (dict) – Dictionary of isolated-atom energies for each species. Required for the Boltzmann CUR selection methods (‘bcur1s’ and ‘bcur2i’). Default is None.

  • random_seed (int, optional) – Seed for random number generation, ensuring reproducibility of sampling. Default is None.

  • remove_traj_files (bool) – Whether to remove all trajectory files generated by RSS to save storage space. Default is False.
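
As an illustration of how a bcur_params dictionary might be put together, the following minimal sketch collects the default values listed above and overrides a few of them. It assumes that the SOAP settings are nested under 'soap_paras' and that the remaining keys sit at the top level. The helper make_bcur_params is hypothetical and not part of autoplex.

```python
import copy

# Defaults copied from the parameter list above.
# Assumption: SOAP settings are nested under "soap_paras".
DEFAULT_BCUR_PARAMS = {
    "soap_paras": {
        "l_max": 12,
        "n_max": 12,
        "atom_sigma": 0.0875,
        "cutoff": 10.5,
        "cutoff_transition_width": 1.0,
        "zeta": 4.0,
        "average": True,
        "species": True,
    },
    "kt": 0.3,
    "frac_of_bcur": 0.8,
    "bolt_max_num": 3000,
    "kernel_exp": 4.0,
    "energy_label": "energy",
}


def make_bcur_params(soap_overrides=None, **overrides):
    """Return a copy of the defaults with selected values replaced.

    Hypothetical helper for illustration only.
    """
    params = copy.deepcopy(DEFAULT_BCUR_PARAMS)
    # Top-level overrides must name an existing non-SOAP key.
    allowed = set(params) - {"soap_paras"}
    unknown = set(overrides) - allowed
    if unknown:
        raise KeyError(f"Unknown bcur_params keys: {sorted(unknown)}")
    params["soap_paras"].update(soap_overrides or {})
    params.update(overrides)
    return params


# Example: shorter SOAP cutoff and a lower Boltzmann temperature.
custom = make_bcur_params(soap_overrides={"cutoff": 6.0}, kt=0.1)
```

A dictionary built this way could then be passed as bcur_params=custom together with selection_method='bcur1s' or 'bcur2i'.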

Returns:

The selected atoms.

Return type:

list of ase.Atoms
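
To make the difference between the 'random' and 'uniform' options concrete, here is a minimal sketch of both strategies applied to frame indices. It is not autoplex's implementation: it assumes that 'uniform' means evenly spaced frames along the trajectory and that 'random' means drawing frames without replacement. The function names are hypothetical; only num_of_selection and random_seed mirror the parameters documented above.

```python
import random


def uniform_indices(num_frames, num_of_selection):
    """Pick evenly spaced frame indices, including the first and last frame."""
    if num_frames <= 0 or num_of_selection <= 0:
        return []
    if num_of_selection >= num_frames:
        return list(range(num_frames))
    if num_of_selection == 1:
        return [0]
    last = num_frames - 1
    gaps = num_of_selection - 1
    # Integer arithmetic avoids floating-point rounding surprises.
    return [(i * last) // gaps for i in range(num_of_selection)]


def random_indices(num_frames, num_of_selection, random_seed=None):
    """Draw distinct frame indices at random; a fixed seed makes the draw repeatable."""
    rng = random.Random(random_seed)
    count = max(0, min(num_of_selection, num_frames))
    return sorted(rng.sample(range(num_frames), count))


print(uniform_indices(10, 5))  # [0, 2, 4, 6, 9]
print(random_indices(10, 5, random_seed=42))
```

With a fixed random_seed, repeated calls to random_indices return the same frames, which is the kind of reproducibility the random_seed parameter is meant to provide.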