initial_rss

autoplex.auto.rss.jobs.initial_rss(tag, generated_struct_numbers, num_of_initial_selected_structs=None, cell_seed_paths=None, buildcell_options=None, fragment_file=None, fragment_numbers=None, num_processes_buildcell=1, initial_selection_enabled=False, bcur_params=None, random_seed=None, include_isolated_atom=False, isolatedatom_box=None, e0_spin=False, include_dimer=False, dimer_box=None, dimer_range=None, dimer_num=21, custom_incar=None, custom_potcar=None, config_type=None, vasp_ref_file='vasp_ref.extxyz', rss_group='initial', test_ratio=0.1, regularization=False, retain_existing_sigma=False, scheme=None, element_order=None, reg_minmax=None, distillation=False, force_max=None, force_label='REF_forces', pre_database_dir=None, mlip_type='GAP', ref_energy_name='REF_energy', ref_force_name='REF_forces', ref_virial_name='REF_virial', auto_delta=False, num_processes_fit=1, device_for_fitting='cpu', **fit_kwargs)

Run the initial Random Structure Searching (RSS) workflow from scratch.

Parameters:

tag (str) – Tag of the system. It can also be used to set up the elements and stoichiometry. For example, the tag ‘SiO2’ will be recognized as a 1:2 ratio of Si to O and passed to the buildcell parameters. Note, however, that this will be overwritten if the stoichiometric ratio of elements is defined in ‘cell_seed_paths’ or ‘buildcell_options’.
generated_struct_numbers (list[int]) – Number of randomized unit cells to generate.
num_of_initial_selected_structs (list[int] | None) – Number of structures to be sampled. Default is None.
cell_seed_paths (list[str] | None) – A list of paths to custom buildcell control files, which end with ‘.cell’. If these files exist, the buildcell_options argument no longer takes effect. Default is None.
buildcell_options (list[dict] | None) – Customized parameters for buildcell. Default is None.
fragment_file (Atoms | list[Atoms] | None) – Fragment(s) for random structures, e.g. molecules, to be placed individually intact. If several fragments are stored in the same Atoms object, atoms.arrays should have a ‘fragment_id’ key with a unique identifier for each fragment. atoms.cell must be defined (e.g. Atoms.cell = np.eye(3)*20). Default is None.
fragment_numbers (list[str] | None) – Number of copies of each fragment to include in the random structures. Defaults to 1 for each specified fragment.
num_processes_buildcell (int) – Number of processes to use for parallel computation during buildcell generation. Default is 1.
initial_selection_enabled (bool) – If true, sample structures using CUR. Default is False.
bcur_params (dict | None) – Parameters for Boltzmann CUR selection. Default is None.
random_seed (int | None) – A seed to ensure reproducibility of CUR selection. Default is None.
include_isolated_atom (bool) – If true, perform single-point calculations for isolated atoms. Default is False.
isolatedatom_box (list[float] | None) – List of the lattice constants for an isolated atom configuration. Default is None.
e0_spin (bool) – If true, include spin polarization in isolated atom and dimer calculations. Default is False.
include_dimer (bool) – If true, perform single-point calculations for dimers. Default is False.
dimer_box (list[float] | None) – The lattice constants of a dimer box. Default is None.
dimer_range (list[float] | None) – Range of distances for dimer calculations. Default is None.
dimer_num (int) – Number of different distances to consider for dimer calculations. Default is 21.
custom_incar (dict | None) – Dictionary of custom VASP input parameters. If provided, will update the default parameters. Default is None.
custom_potcar (dict | None) – Dictionary of POTCAR settings to update. Keys are element symbols, values are the desired POTCAR labels. Default is None.
config_type (str | None) – Configuration type for the VASP calculations. Default is None.
vasp_ref_file (str) – Reference file for VASP data. Default is ‘vasp_ref.extxyz’.
rss_group (str) – Group name for GAP RSS. Default is ‘initial’.
test_ratio (float) – Proportion of the data assigned to the test set when splitting. If None, no splitting is performed. Default is 0.1.
regularization (bool) – If true, apply regularization. This only works for GAP. Default is False.
retain_existing_sigma (bool) – Whether to keep the current sigma values for specific configuration types. If True, existing sigma values for those configuration types remain unchanged. Default is False.
scheme (str | None) – Scheme to use for regularization. Default is None.
element_order (list | None) – List of atomic numbers in the order of choice (e.g. [42, 16] for MoS2). This value is useful when constructing high-dimensional convex hulls based on the “volume-stoichiometry” scheme. In particular, if the dataset contains compounds with different numbers of constituent elements (e.g. both binary and ternary structures), this value must be set explicitly to ensure the convex hull is constructed consistently.
reg_minmax (list[tuple] | None) – A list of tuples representing the minimum and maximum values for regularization. Default is None.
distillation (bool) – If true, apply data distillation. Default is False.
force_max (float | None) – Force threshold used during distillation; structures with forces above this value are excluded. Default is None.
force_label (str) – The label of force values to use for distillation. Default is ‘REF_forces’.
pre_database_dir (str | None) – Directory where the previous database was saved. Default is None.
mlip_type (Literal["GAP", "J-ACE", "NEP", "NEQUIP", "M3GNET", "MACE"]) – Choose one specific MLIP type to be fitted. Default is ‘GAP’.
ref_energy_name (str) – Reference energy name. Default is ‘REF_energy’.
ref_force_name (str) – Reference force name. Default is ‘REF_forces’.
ref_virial_name (str) – Reference virial name. Default is ‘REF_virial’.
auto_delta (bool) – If true, apply automatic determination of delta for GAP terms. Default is False.
num_processes_fit (int) – Number of processes used for fitting. Default is 1.
device_for_fitting (str) – Device to be used for model fitting, either “cpu” or “cuda”. Default is ‘cpu’.
fit_kwargs – Additional keyword arguments for the MLIP fitting process.
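
The tag description states that ‘SiO2’ is read as a 1:2 ratio of Si to O. A minimal sketch of that kind of formula parsing follows, assuming a plain regular expression over element symbols; `parse_tag_stoichiometry` is a hypothetical helper for illustration, not autoplex's internal parser.

```python
import re


def parse_tag_stoichiometry(tag: str) -> dict[str, int]:
    """Split a chemical-formula tag such as 'SiO2' into element counts.

    Hypothetical helper for illustration only; not part of autoplex.
    An element symbol is one uppercase letter optionally followed by one
    lowercase letter; a missing count is treated as 1.
    """
    counts: dict[str, int] = {}
    # Each match is a (symbol, digits) tuple, e.g. ("O", "2") or ("Si", "").
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", tag):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


print(parse_tag_stoichiometry("SiO2"))  # {'Si': 1, 'O': 2}
print(parse_tag_stoichiometry("MoS2"))  # {'Mo': 1, 'S': 2}
```

The regular expression covers ordinary one- and two-letter element symbols; nested groups such as parentheses in a formula are deliberately out of scope for this sketch.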
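
For dimer_range and dimer_num, a natural reading is that dimer_num separations are spread across the given range. The sketch below assumes linear spacing with both endpoints included (the same convention as `numpy.linspace`); `dimer_distances` is hypothetical, and autoplex may space the points differently.

```python
def dimer_distances(dimer_range: list[float], dimer_num: int = 21) -> list[float]:
    """Evenly spaced dimer separations from dimer_range[0] to dimer_range[1].

    Hypothetical helper: assumes linear spacing with both endpoints
    included, analogous to numpy.linspace(start, stop, dimer_num).
    """
    start, stop = dimer_range
    if dimer_num < 2:
        return [start]
    step = (stop - start) / (dimer_num - 1)
    return [start + i * step for i in range(dimer_num)]


distances = dimer_distances([1.0, 5.0], dimer_num=21)
print(len(distances), distances[0], distances[-1])  # 21 1.0 5.0
```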
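
test_ratio controls how much of the data is held out for testing. As a sketch of what such a split does, assuming a simple seeded random shuffle (the helper `split_train_test` and its `seed` argument are hypothetical, and autoplex's own splitting procedure may differ):

```python
import random


def split_train_test(structures, test_ratio=0.1, seed=None):
    """Randomly split a list into (train, test) lists.

    Hypothetical sketch. If test_ratio is None, everything goes to the
    training set, mirroring the documented "no splitting" behaviour.
    """
    if test_ratio is None:
        return list(structures), []
    indices = list(range(len(structures)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(round(len(structures) * test_ratio))
    test_idx = set(indices[:n_test])
    train = [s for i, s in enumerate(structures) if i not in test_idx]
    test = [s for i, s in enumerate(structures) if i in test_idx]
    return train, test


train, test = split_train_test(list(range(100)), test_ratio=0.1, seed=42)
print(len(train), len(test))  # 90 10
```

Preserving the original order within each subset keeps the split easy to compare against the source file.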
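
With distillation=True, force_max and force_label decide which structures are dropped. The sketch below assumes the criterion is the largest per-atom force magnitude; `exclude_high_force` is a hypothetical helper, and the plain dicts stand in for ASE Atoms objects, where per-atom properties read from extended XYZ files are typically stored in atoms.arrays.

```python
import math


def exclude_high_force(structures, force_max, force_label="REF_forces"):
    """Keep only structures whose largest per-atom force norm is <= force_max.

    Hypothetical stand-in: each structure is a dict mapping force_label to a
    list of (fx, fy, fz) tuples, one per atom.
    """
    kept = []
    for s in structures:
        max_norm = max(
            math.sqrt(fx**2 + fy**2 + fz**2) for fx, fy, fz in s[force_label]
        )
        if max_norm <= force_max:
            kept.append(s)
    return kept


structures = [
    {"REF_forces": [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0)]},  # max norm 0.2
    {"REF_forces": [(3.0, 4.0, 0.0)]},                   # max norm 5.0
]
kept = exclude_high_force(structures, force_max=2.0)
print(len(kept))  # 1
```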

Returns:

A dictionary with the following keys:

‘test_error’: float, the test error of the fitted MLIP.
‘pre_database_dir’: str, the directory of the preprocessed database.
‘mlip_path’: list, paths to the fitted MLIP(s).
‘isolated_atom_energies’: dict, the isolated-atom energy values.
‘current_iter’: int, the current iteration index, set to 0.
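
The returned isolated_atom_energies are the usual reference for expressing energies relative to free atoms. A minimal sketch, assuming the dictionary is keyed by element symbol (it may instead be keyed by atomic number); the helper name and the numbers in the usage line are placeholders chosen for easy arithmetic, not computed results.

```python
def energy_relative_to_isolated_atoms(
    total_energy: float,
    symbols: list[str],
    isolated_atom_energies: dict[str, float],
) -> float:
    """Total energy minus the sum of isolated-atom reference energies.

    Hypothetical helper; assumes isolated_atom_energies is keyed by symbol.
    """
    return total_energy - sum(isolated_atom_energies[s] for s in symbols)


# Placeholder values for illustration only, not real VASP energies.
e0 = {"Si": -1.0, "O": -0.5}
print(energy_relative_to_isolated_atoms(-12.0, ["Si", "O", "O"], e0))  # -10.0
```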

Return type:

dict
