do_rss_iterations

autoplex.auto.rss.jobs.do_rss_iterations(input, tag, generated_struct_numbers, num_of_initial_selected_structs=None, buildcell_options=None, fragment_file=None, fragment_numbers=None, num_processes_buildcell=1, initial_selection_enabled=False, rss_selection_method=None, num_of_rss_selected_structs=100, bcur_params=None, random_seed=None, include_isolated_atom=False, isolatedatom_box=None, e0_spin=False, include_dimer=False, dimer_box=None, dimer_range=None, dimer_num=21, custom_incar=None, custom_potcar=None, config_types=None, vasp_ref_file='vasp_ref.extxyz', rss_group='rss', test_ratio=0.1, regularization=False, retain_existing_sigma=False, scheme=None, reg_minmax=None, distillation=True, force_max=200, force_label='REF_forces', mlip_type='GAP', ref_energy_name='REF_energy', ref_force_name='REF_forces', ref_virial_name='REF_virial', auto_delta=False, num_processes_fit=1, device_for_fitting='cpu', scalar_pressure_method='exp', scalar_exp_pressure=100, scalar_pressure_exponential_width=0.2, scalar_pressure_low=0, scalar_pressure_high=50, max_steps=200, force_tol=0.05, stress_tol=0.05, hookean_repul=False, hookean_paras=None, keep_symmetry=False, write_traj=True, num_processes_rss=1, device_for_rss='cpu', stop_criterion=0.01, max_iteration_number=5, num_groups=1, initial_kt=0.3, current_iter_index=1, **fit_kwargs)

Perform iterative random structure searching (RSS) to improve the accuracy of a machine-learned interatomic potential (MLIP).

Each iteration involves generating new structures, sampling, running VASP calculations, collecting data, preprocessing data, and fitting a new MLIP.
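The control flow implied by stop_criterion and max_iteration_number can be sketched as below. This is a minimal illustration, not the autoplex implementation: run_rss_loop and fake_iteration are hypothetical names, and the exact stopping condition (test error at or below stop_criterion, or the iteration index reaching max_iteration_number) is an assumption based on the parameter descriptions.

```python
from typing import Callable


def run_rss_loop(
    state: dict,
    run_one_iteration: Callable[[dict], dict],
    stop_criterion: float = 0.01,
    max_iteration_number: int = 5,
) -> dict:
    """Repeat iterations until the test error is small enough or the cap is hit."""
    # Assumption: the loop ends once test_error is no longer above
    # stop_criterion, or once current_iter reaches max_iteration_number.
    while (
        state["test_error"] > stop_criterion
        and state["current_iter"] < max_iteration_number
    ):
        state = run_one_iteration(state)
    return state


def fake_iteration(state: dict) -> dict:
    """Stand-in for one generate/sample/VASP/fit cycle: halves the test error."""
    return {
        **state,
        "test_error": state["test_error"] / 2,
        "current_iter": state["current_iter"] + 1,
    }


final = run_rss_loop({"test_error": 0.06, "current_iter": 1}, fake_iteration)
print(final["current_iter"], final["test_error"])
```

In the real workflow each iteration is far more expensive than this stand-in, so the cap set by max_iteration_number is what bounds the total cost when the error does not converge.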

Parameters:
  • input (dict) –

    A dictionary that passes the input data required during the RSS iterations. Each key must be one of the following:

    • test_error (float) – The test error of the fitted MLIP.

    • pre_database_dir (str) – The directory of the preprocessed database.

    • mlip_path (str) – The path to the fitted MLIP.

    • isolated_atom_energies (dict) – The isolated-atom energies.

    • current_iter (int) – The current iteration index.

    • kt (float) – The temperature (in eV) for Boltzmann sampling.

  • tag (str) – Tag of the system. It can also be used to set the elements and stoichiometry. For example, ‘SiO2’ will generate structures with a 1:2 ratio of Si to O.

  • generated_struct_numbers (list[int]) – Expected number of generated randomized unit cells.

  • num_of_initial_selected_structs (list[int] | None) – Number of structures to be sampled. Default is None.

  • buildcell_options (list[dict] | None) – Customized parameters for buildcell. Default is None.

  • fragment_file (Atoms | list[Atoms] | None) – Fragment(s) for random structures, e.g. molecules, to be placed individually and kept intact. If several fragments share one Atoms object, atoms.arrays should have a ‘fragment_id’ key with a unique identifier for each fragment. atoms.cell must be defined (e.g. Atoms.cell = np.eye(3)*20).

  • fragment_numbers (list[str] | None) – Numbers of each fragment to be included in the random structures. Defaults to 1 for all specified.

  • num_processes_buildcell (int) – Number of processes to use for parallel computation during buildcell generation. Default is 1.

  • initial_selection_enabled (bool) – If true, sample structures using CUR. Default is False.

  • rss_selection_method (str) – Method for selecting samples from the generated structures. Default is None.

  • num_of_rss_selected_structs (int) – Number of structures to be selected. Default is 100.

  • bcur_params (dict | None) – Parameters for Boltzmann CUR selection. Default is None.

  • random_seed (int | None) – A seed to ensure reproducibility of CUR selection. Default is None.

  • include_isolated_atom (bool) – If true, perform single-point calculations for isolated atoms. Default is False.

  • isolatedatom_box (list[float] | None) – List of the lattice constants for an isolated atom configuration. Default is None.

  • e0_spin (bool) – If true, include spin polarization in isolated atom and dimer calculations. Default is False.

  • include_dimer (bool) – If true, perform single-point calculations for dimers only once. Default is False.

  • dimer_box (list[float] | None) – The lattice constants of a dimer box. Default is None.

  • dimer_range (list[float] | None) – Range of distances for dimer calculations. Default is None.

  • dimer_num (int) – Number of different distances to consider for dimer calculations. Default is 21.

  • custom_incar (dict | None) – Dictionary of custom VASP input parameters. If provided, will update the default parameters. Default is None.

  • custom_potcar (dict | None) – Dictionary of POTCAR settings to update. Keys are element symbols, values are the desired POTCAR labels. Default is None.

  • config_types (list[str] | None) – Configuration types for the VASP calculations. Default is None.

  • vasp_ref_file (str) – Reference file for VASP data. Default is ‘vasp_ref.extxyz’.

  • rss_group (str) – Group name for GAP RSS. Default is ‘rss’.

  • test_ratio (float) – The proportion of the test set after splitting the data. Default is 0.1.

  • regularization (bool) – If true, apply regularization. This only works for GAP. Default is False.

  • retain_existing_sigma (bool) – If true, keep the existing sigma values for specific configuration types unchanged. Default is False.

  • scheme (str | None) – Scheme to use for regularization. Default is None.

  • reg_minmax (list[tuple] | None) – A list of tuples representing the minimum and maximum values for regularization. Default is None.

  • distillation (bool) – If true, apply data distillation. Default is True.

  • force_max (float) – Maximum force value to exclude structures. Default is 200.

  • force_label (str) – The label of force values to use for distillation. Default is ‘REF_forces’.

  • mlip_type (str) – Choose one specific MLIP type: ‘GAP’ | ‘J-ACE’ | ‘NequIP’ | ‘M3GNet’ | ‘MACE’. Default is ‘GAP’.

  • ref_energy_name (str) – Reference energy name. Default is ‘REF_energy’.

  • ref_force_name (str) – Reference force name. Default is ‘REF_forces’.

  • ref_virial_name (str) – Reference virial name. Default is ‘REF_virial’.

  • auto_delta (bool) – If true, apply automatic determination of delta for GAP terms. Default is False.

  • num_processes_fit (int) – Number of processes used for fitting. Default is 1.

  • device_for_fitting (str) – Device to be used for model fitting, either “cpu” or “cuda”. Default is “cpu”.

  • scalar_pressure_method (str) – Method for adding external pressures. Default is ‘exp’.

  • scalar_exp_pressure (float) – Scalar exponential pressure. Default is 100.

  • scalar_pressure_exponential_width (float) – Width for scalar pressure exponential. Default is 0.2.

  • scalar_pressure_low (float) – Low limit for scalar pressure. Default is 0.

  • scalar_pressure_high (float) – High limit for scalar pressure. Default is 50.

  • max_steps (int) – Maximum number of steps for relaxation. Default is 200.

  • force_tol (float) – Force residual tolerance for relaxation. Default is 0.05.

  • stress_tol (float) – Stress residual tolerance for relaxation. Default is 0.05.

  • hookean_repul (bool) – If true, apply Hookean repulsion. Default is False.

  • hookean_paras (dict[tuple[int, int], tuple[float, float]] | None) – Parameters for Hookean repulsion as a dictionary of tuples. Default is None.

  • keep_symmetry (bool) – If true, preserve symmetry during relaxation. Default is False.

  • write_traj (bool) – If true, write trajectory of RSS. Default is True.

  • num_processes_rss (int) – Number of processes used for running RSS. Default is 1.

  • device_for_rss (str) – Device to use for running RSS, either “cuda” or “cpu”. Default is “cpu”.

  • stop_criterion (float) – Convergence criterion for stopping RSS iterations. Default is 0.01.

  • max_iteration_number (int) – Maximum number of RSS iterations to perform. Default is 5.

  • num_groups (int) – Number of structure groups, used for assigning tasks across multiple nodes. Default is 1.

  • initial_kt (float) – Initial temperature (in eV) for Boltzmann sampling. Default is 0.3.

  • current_iter_index (int) – Index for the current RSS iteration. Default is 1.

  • fit_kwargs – Additional keyword arguments for the MLIP fitting process.
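As a rough illustration of how the arguments above fit together, the sketch below checks the keys of the input dictionary and assembles a keyword dictionary for the call. The helper build_rss_arguments and every value shown (paths, energies, structure counts) are hypothetical placeholders, not recommended settings. The call itself is left commented out because it requires autoplex and its dependencies to be installed.

```python
# Keys accepted in the `input` dictionary, as listed in the parameter description.
VALID_INPUT_KEYS = {
    "test_error",
    "pre_database_dir",
    "mlip_path",
    "isolated_atom_energies",
    "current_iter",
    "kt",
}


def build_rss_arguments(previous: dict) -> dict:
    """Validate the `input` keys and collect keyword arguments (hypothetical helper)."""
    unknown = set(previous) - VALID_INPUT_KEYS
    if unknown:
        raise ValueError(f"Unexpected input keys: {sorted(unknown)}")
    return {
        "input": previous,
        "tag": "SiO2",                     # 1:2 ratio of Si to O
        "generated_struct_numbers": [1000],  # placeholder count
        "mlip_type": "GAP",
        "test_ratio": 0.1,
        "stop_criterion": 0.01,
        "max_iteration_number": 5,
        "initial_kt": 0.3,
    }


arguments = build_rss_arguments(
    {
        "test_error": 0.05,                       # placeholder values throughout
        "pre_database_dir": "path/to/database",
        "mlip_path": "path/to/mlip",
        "isolated_atom_energies": {14: -0.8, 8: -1.9},
        "current_iter": 1,
        "kt": 0.3,
    }
)

# With autoplex installed, the arguments would be passed on like this:
# from autoplex.auto.rss.jobs import do_rss_iterations
# result = do_rss_iterations(**arguments)
```

Parameters that are left out of the dictionary fall back to the defaults shown in the signature.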

Returns:

A dictionary with the following entries:

  • ‘test_error’: float, The test error of the fitted MLIP.

  • ‘pre_database_dir’: str, The directory of the preprocessed database.

  • ‘mlip_path’: str, The path to the fitted MLIP.

  • ‘isolated_atom_energies’: dict, The isolated-atom energies.

  • ‘current_iter’: int, The current iteration index.

  • ‘kt’: float, The temperature (in eV) for Boltzmann sampling.

Return type:

dict
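The returned dictionary uses the same keys that the input parameter accepts. One way to make that shape explicit in calling code is a TypedDict, sketched below. RSSIterationResult and missing_result_keys are hypothetical names introduced for illustration and are not part of autoplex.

```python
from typing import TypedDict


class RSSIterationResult(TypedDict):
    """Shape of the documented return value (illustrative type, not from autoplex)."""

    test_error: float
    pre_database_dir: str
    mlip_path: str
    isolated_atom_energies: dict
    current_iter: int
    kt: float


def missing_result_keys(result: dict) -> list[str]:
    """Return the documented keys that are absent from `result`, sorted by name."""
    return sorted(set(RSSIterationResult.__annotations__) - set(result))


# A partial dictionary is reported as incomplete before it is reused.
print(missing_result_keys({"test_error": 0.02, "current_iter": 3}))
```

A check like this is cheap to run before passing a result on as the input of a later call.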