RssMaker

class autoplex.auto.rss.flows.RssMaker(name='ml-driven rss', path_to_default_config_parameters=None)

Bases: Maker

Maker to set up and run RSS for exploring and learning potential-energy surfaces (from scratch).

Parameters:
  • name (str) – Name of the flow.

  • path_to_default_config_parameters (Path | str | None) – Path to the default RSS configuration file ‘rss_default_configuration.yaml’. If None, the default path will be used.

make(config_file=None, **kwargs)

Make an RSS workflow using the specified configuration file and additional keyword arguments.

Parameters:
  • config_file (str | None) – Path to the configuration file that defines the setup parameters for the whole RSS workflow. If not provided, the default file ‘rss_default_configuration.yaml’ will be used.

  • kwargs (dict, optional) – Additional optional keyword arguments to customize the job execution.
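
A minimal usage sketch, assuming autoplex is installed and that the keyword arguments documented for this method are passed directly to `make`. The values are illustrative, and `build_rss_flow` is a hypothetical helper, not part of autoplex. The import is deferred so the settings dictionary can be inspected without the package.

```python
# Illustrative settings; every key is a keyword argument documented for make().
rss_settings = {
    "tag": "SiO2",                 # read as a 1:2 ratio of Si to O
    "train_from_scratch": True,    # start a fresh workflow
    "mlip_type": "GAP",            # documented default
    "max_iteration_number": 25,    # documented default
    "stop_criterion": 0.01,        # documented default
}


def build_rss_flow(settings, config_file=None):
    """Hypothetical helper: build the RSS workflow from keyword settings."""
    # Deferred import: this line requires an autoplex installation.
    from autoplex.auto.rss.flows import RssMaker

    maker = RssMaker(name="ml-driven rss")
    # With config_file=None, 'rss_default_configuration.yaml' is used.
    return maker.make(config_file=config_file, **settings)
```

Calling `build_rss_flow(rss_settings)` would hand back whatever `make` returns; how that object is submitted depends on the workflow manager in use.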

Keyword Arguments:
  • tag (-) – Tag of systems. It can also be used for setting up elements and stoichiometry. For example, the tag of ‘SiO2’ will be recognized as a 1:2 ratio of Si to O and passed into the parameters of buildcell. However, note that this will be overwritten if the stoichiometric ratio of elements is defined in the ‘buildcell_options’.

  • train_from_scratch (-) – If True, it starts the workflow from scratch. If False, it resumes from a previous state.

  • resume_from_previous_state (-) –

    A dictionary containing the state information required to resume a previously interrupted or saved RSS workflow. When ‘train_from_scratch’ is set to False, this parameter is mandatory for the workflow to pick up from a saved state. Expected keys within this dictionary:

    • test_error: float

      The test error from the last completed training step.

    • pre_database_dir: str

      Path to the directory containing the pre-existing database for resuming.

    • mlip_path: str

      Path to the file of a previous MLIP model.

    • isolated_atom_energies: dict

      A dictionary of isolated atom energy values, with atomic numbers as keys and their energies as values.

  • generated_struct_numbers (-) – Expected number of generated randomized unit cells by buildcell.

  • buildcell_options (-) – Customized parameters for buildcell. Default is None.

  • fragment (-) – Fragment(s) for random structures, e.g., molecules, to be placed individually intact. atoms.arrays should have a ‘fragment_id’ key with unique identifiers for each fragment if they are in the same Atoms object. atoms.cell must be defined (e.g., Atoms.cell = np.eye(3)*20).

  • fragment_numbers (-) – Numbers of each fragment to be included in the random structures. Defaults to 1 for all specified.

  • num_processes_buildcell (-) – Number of processes to use for parallel computation during buildcell generation.

  • num_of_initial_selected_structs (-) – Number of structures to be sampled directly from the buildcell-generated randomized cells.

  • num_of_rss_selected_structs (-) – Number of structures to be selected from each RSS iteration.

  • initial_selection_enabled (-) – If true, sample structures from initially generated randomized cells using CUR.

  • rss_selection_method (-) – Method for selecting samples from the RSS trajectories: Boltzmann flat histogram in enthalpy first, then CUR. Options include:

    • ‘bcur1s’: Execute bcur with one shot (1s)

    • ‘bcur2i’: Execute bcur with two iterations (2i)

  • bcur_params (-) –

    Parameters for Boltzmann CUR selection. The default dictionary includes:

    • soap_paras: dict

      SOAP descriptor parameters:

      • l_max: int

        Maximum degree of spherical harmonics (default 12).

      • n_max: int

        Maximum number of radial basis functions (default 12).

      • atom_sigma: float

        Width of Gaussian smearing (default 0.0875).

      • cutoff: float

        Radial cutoff distance (default 10.5).

      • cutoff_transition_width: float

        Width of the transition region (default 1.0).

      • zeta: float

        Exponent for dot-product SOAP kernel (default 4.0).

      • average: bool

        Whether to average the SOAP vectors (default True).

      • species: bool

        Whether to consider species information (default True).

    • kb_temp: float

      Temperature in eV for Boltzmann weighting (default 0.3).

    • frac_of_bcur: float

      Fraction of Boltzmann CUR selections (default 0.8).

    • bolt_max_num: int

      Maximum number of Boltzmann selections (default 3000).

    • kernel_exp: float

      Exponent for the kernel (default 4.0).

    • energy_label: str

      Label for the energy data (default ‘energy’).

  • random_seed (-) – A seed to ensure reproducibility of CUR selection. Default is None.

  • include_isolated_atom (-) – If true, perform single-point calculations for isolated atoms.

  • isolatedatom_box (-) – List of the lattice constants for an isolated atom configuration.

  • e0_spin (-) – If true, include spin polarization in isolated atom and dimer calculations. Default is False.

  • include_dimer (-) – If true, perform single-point calculations for dimers only once. Default is False.

  • dimer_box (-) – The lattice constants of a dimer box.

  • dimer_range (-) – Range of distances for dimer calculations.

  • dimer_num (-) – Number of different distances to consider for dimer calculations. Default is 21.

  • custom_incar (-) – Dictionary of custom VASP input parameters. If provided, will update the default parameters. Default is None.

  • custom_potcar (-) – Dictionary of POTCAR settings to update. Keys are element symbols, values are the desired POTCAR labels. Default is None.

  • vasp_ref_file (-) – Reference file for VASP data. Default is ‘vasp_ref.extxyz’.

  • config_types (-) – Configuration types for the VASP calculations. Default is None.

  • rss_group (-) – Group name for RSS, used when setting up regularization.

  • test_ratio (-) – The proportion of the test set after splitting the data. The value may be set to 0; in that case, the test error is no longer meaningful.

  • regularization (-) – If True, apply regularization. This only works for GAP to date. Default is False.

  • scheme (-) – Method to use for regularization. Options are:

    • ‘linear_hull’: for single-composition system, use 2D convex hull (E, V)

    • ‘volume-stoichiometry’: for multi-composition system, use 3D convex hull of (E, V, mole-fraction)

  • reg_minmax (-) – List of (min, max) tuples for the energy, force, and virial sigmas used in regularization.

  • distillation (-) – If true, apply data distillation. Default is True.

  • force_max (-) – Maximum force value to exclude structures. Default is 50.

  • force_label (-) – The label of force values to use for distillation. Default is ‘REF_forces’.

  • pre_database_dir (-) – Directory where the previous database was saved.

  • mlip_type (-) – Choose one specific MLIP type to be fitted: ‘GAP’ | ‘J-ACE’ | ‘NEQUIP’ | ‘M3GNET’ | ‘MACE’. Default is ‘GAP’.

  • ref_energy_name (-) – Reference energy name. Default is ‘REF_energy’.

  • ref_force_name (-) – Reference force name. Default is ‘REF_forces’.

  • ref_virial_name (-) – Reference virial name. Default is ‘REF_virial’.

  • auto_delta (-) – If true, apply automatic determination of delta for GAP terms. Default is False.

  • num_processes_fit (-) – Number of processes used for fitting. Default is 1.

  • device_for_fitting (-) – Device to be used for model fitting, either “cpu” or “cuda”.

  • **fit_kwargs (-) –

    Additional keyword arguments for the MLIP fitting process.

  • scalar_pressure_method (-) – Method for adding external pressures. Acceptable options are:

    • ‘exp’: Applies pressure using an exponential distribution.

    • ‘uniform’: Applies pressure using a uniform distribution.

  • scalar_exp_pressure (-) – Scalar exponential pressure. Default is 100.

  • scalar_pressure_exponential_width (-) – Width for scalar pressure exponential. Default is 0.2.

  • scalar_pressure_low (-) – Low limit for scalar pressure. Default is 0.

  • scalar_pressure_high (-) – High limit for scalar pressure. Default is 50.

  • max_steps (-) – Maximum number of steps for relaxation. Default is 200.

  • force_tol (-) – Force residual tolerance for relaxation. Default is 0.05.

  • stress_tol (-) – Stress residual tolerance for relaxation. Default is 0.05.

  • hookean_repul (-) – If true, apply Hookean repulsion. Default is False.

  • hookean_paras (-) – Parameters for Hookean repulsion as a dictionary of tuples. Default is None.

  • keep_symmetry (-) – If true, preserve symmetry during relaxation. Default is False.

  • write_traj (-) – If true, write trajectory of RSS. Default is True.

  • num_processes_rss (-) – Number of processes used for running RSS. Default is 1.

  • device_for_rss (-) – Device to use for running RSS, either “cuda” or “cpu”. Default is “cpu”.

  • stop_criterion (-) – Convergence criterion for stopping RSS iterations. Default is 0.01.

  • max_iteration_number (-) – Maximum number of RSS iterations to perform. Default is 25.

  • num_groups (-) – Number of structure groups, used for assigning tasks across multiple nodes. For example, if there are 10,000 trajectories to relax and ‘num_groups=10’, the trajectories will be divided into 10 groups and 10 independent jobs will be created, with each job handling 1,000 trajectories.

  • initial_kb_temp (-) – Initial temperature (in eV) for Boltzmann sampling. Default is 0.3.

  • current_iter_index (-) – Index for the current RSS iteration. Default is 1.
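
Two of the keyword arguments above describe simple transformations: ‘tag’ is read as a stoichiometry (‘SiO2’ as 1:2 Si to O), and ‘num_groups’ divides the trajectories into independent jobs. The sketch below illustrates both with plain Python. The helpers `stoichiometry_from_tag` and `split_into_groups` are hypothetical and are not claimed to match how autoplex implements either step; the tag parser handles only simple formulas without brackets.

```python
import re


def stoichiometry_from_tag(tag):
    """Read element counts from a simple tag such as 'SiO2' (no brackets)."""
    counts = {}
    # Each match is an element symbol followed by an optional integer count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", tag):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def split_into_groups(items, num_groups):
    """Divide items into num_groups contiguous chunks of near-equal size."""
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")
    base, extra = divmod(len(items), num_groups)
    groups, start = [], 0
    for index in range(num_groups):
        # The first `extra` groups take one additional item each.
        size = base + (1 if index < extra else 0)
        groups.append(items[start:start + size])
        start += size
    return groups


ratio = stoichiometry_from_tag("SiO2")               # {'Si': 1, 'O': 2}
groups = split_into_groups(list(range(10_000)), 10)  # stand-in for trajectories
group_sizes = [len(group) for group in groups]       # ten groups of 1,000 each
```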

Output:
  • dict – A dictionary containing the following keys:

    • test_error: float

      The test error of the fitted MLIP.

    • pre_database_dir: str

      The directory of the latest RSS database.

    • mlip_path: str

      The path to the latest fitted MLIP.

    • isolated_atom_energies: dict

      The isolated energy values.

    • current_iter: int

      The current iteration index.

    • kb_temp: float

      The temperature (in eV) for Boltzmann sampling.
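
The output dictionary carries the four keys that ‘resume_from_previous_state’ expects, so a later run can be resumed from it. The sketch below shows one way to assemble those keyword arguments. `resume_kwargs_from_output` is a hypothetical helper, and the values in `previous_output` are placeholders, not real results. Whether ‘current_iter’ and ‘kb_temp’ should also be forwarded (for example as ‘current_iter_index’ and ‘initial_kb_temp’) is not stated here, so they are left out.

```python
RESUME_KEYS = (
    "test_error",
    "pre_database_dir",
    "mlip_path",
    "isolated_atom_energies",
)


def resume_kwargs_from_output(output):
    """Build make() keyword arguments that resume from a previous output dict."""
    missing = [key for key in RESUME_KEYS if key not in output]
    if missing:
        raise KeyError(f"output is missing keys: {missing}")
    return {
        # Resuming requires train_from_scratch to be False.
        "train_from_scratch": False,
        "resume_from_previous_state": {key: output[key] for key in RESUME_KEYS},
    }


# Placeholder values for illustration only.
previous_output = {
    "test_error": 0.05,
    "pre_database_dir": "path/to/database",
    "mlip_path": "path/to/mlip",
    "isolated_atom_energies": {14: -0.8, 8: -1.9},  # keyed by atomic number
    "current_iter": 3,
    "kb_temp": 0.1,
}
resume_kwargs = resume_kwargs_from_output(previous_output)
```

The resulting dictionary can be unpacked into `make(**resume_kwargs)` together with any other keyword arguments.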