initial_rss

autoplex.data.rss.flows.initial_rss(struct_number=10000, tag='GeSb2Te4', selection_method='cur', num_of_selection=3, bcur_params=None, random_seed=None, e0_spin=False, isolated_atom=True, dimer=True, dimer_range=None, dimer_num=10, custom_set=None, config_types=None, vasp_ref_file='vasp_ref.extxyz', rss_group='initial', test_ratio=0.1, regularization=True, distillation=True, f_max=200, pre_database_dir=None, mlip_type='GAP', mlip_hyper=None, ref_energy_name='REF_energy', ref_force_name='REF_forces', ref_virial_name='REF_virial', num_processes_fit=None, kt=None, **fit_kwargs)

Run the initial random structure searching (RSS) workflow from scratch.

The workflow consists of the following jobs:

- job1 - RandomizedStructure: Generates randomized structures.
- job2 - Sampling: Samples a subset of the generated structures using CUR.
- job3 - DFTStaticMaker: Runs single-point calculations on the sampled structures.
- job4 - VASP_collect_data: Collects VASP calculation data.
- job5 - Data_preprocessing: Preprocesses the data for fitting ML models.
- job6 - MLIPFitMaker: Fits an ML interatomic potential (MLIP).
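The preprocessing job (job5) is controlled by the f_max and test_ratio parameters documented below. The following plain-Python sketch only illustrates what those two parameters mean; it is not autoplex's implementation. It assumes that a structure is dropped when its largest force value exceeds f_max and that the remaining structures are shuffled before splitting. The function name `filter_and_split` and the toy force values are hypothetical.

```python
import random


def filter_and_split(max_forces, f_max=200.0, test_ratio=0.1, seed=0):
    """Toy illustration of force filtering followed by a train/test split.

    max_forces: one number per structure, the largest force value in it.
    Returns (train_ids, test_ids) as sorted lists of structure indices.
    """
    # Keep only structures whose largest force does not exceed f_max
    # (assumption: the threshold is inclusive).
    kept = [i for i, force in enumerate(max_forces) if force <= f_max]

    # Shuffle reproducibly, then reserve a test_ratio share for testing.
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_test = round(len(kept) * test_ratio)
    return sorted(kept[n_test:]), sorted(kept[:n_test])


# Made-up force maxima for twelve structures; two exceed f_max=200.
max_forces = [0.5, 3.2, 250.0, 12.0, 1.1, 0.9, 999.0, 45.0, 7.7, 2.4, 0.3, 18.0]
train_ids, test_ids = filter_and_split(max_forces)
print(len(train_ids), len(test_ids))
```

With the defaults, the two high-force structures are excluded and one of the ten remaining structures lands in the test set.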

Parameters:

struct_number (int, optional) – Number of structures to generate. Default is 10000.
tag (str, optional) – Tag for the generated structures. Default is ‘GeSb2Te4’.
selection_method (str, optional) – Method for selecting structures. Default is ‘cur’.
num_of_selection (int, optional) – Number of structures to select. Default is 3.
bcur_params (str, optional) – Parameters for the CUR method. Default is None.
random_seed (int, optional) – Seed for random number generator. Default is None.
e0_spin (bool, optional) – Whether to include spin polarization in the static calculations of isolated atoms and dimers. Default is False.
isolated_atom (bool, optional) – Whether to include isolated atom calculations. Default is True.
dimer (bool, optional) – Whether to include dimer calculations. Default is True.
dimer_range (list, optional) – Distance range for dimer calculations. Default is None.
dimer_num (int, optional) – Number of dimers generated for calculations. Default is 10.
custom_set (dict, optional) – Custom set of parameters for VASP. Default is None.
config_types (list[str], optional) – List of configuration types corresponding to the structures. If provided, should have the same length as the ‘structures’ list. If None, defaults to ‘bulk’. Default is None.
vasp_ref_file (str, optional) – File name of collected VASP data. Default is ‘vasp_ref.extxyz’.
rss_group (str, optional) – Group name of structures for RSS. Default is ‘initial’.
test_ratio (float, optional) – The proportion of the test set after splitting the data. Default is 0.1.
regularization (bool, optional) – Whether to apply regularization. This only works for GAP. Default is True.
distillation (bool, optional) – Whether to apply distillation of structures. Default is True.
f_max (float, optional) – Maximum force value to exclude structures. Default is 200.
pre_database_dir (str, optional) – Directory for the preprocessed database. Default is None.
mlip_type (str, optional) – Type of MLIP to fit. Default is ‘GAP’.
mlip_hyper (str, optional) – Hyperparameters for the MLIP. Default is None.
ref_energy_name (str, optional) – Reference energy name. Default is “REF_energy”.
ref_force_name (str, optional) – Reference force name. Default is “REF_forces”.
ref_virial_name (str, optional) – Reference virial name. Default is “REF_virial”.
num_processes_fit (int, optional) – Number of processes for fitting. Default is None.
kt (float, optional) – Value of kT. Default is None.
fit_kwargs (dict, optional) – Additional keyword arguments for the machine learning fit, collected via **fit_kwargs. Empty by default.
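As a usage sketch, the keyword arguments above can be collected in a dictionary and passed to `initial_rss`. The parameter names and the import path come from this page; the override values are illustrative assumptions rather than recommendations. The helper name `make_initial_rss_job` is hypothetical, and it is assumed, not confirmed here, that calling `initial_rss` returns an object to be run by a workflow manager.

```python
# Overrides for a small trial run. Every key is a documented parameter;
# the values are illustrative assumptions, not recommended settings.
rss_kwargs = {
    "struct_number": 1000,
    "tag": "GeSb2Te4",
    "selection_method": "cur",
    "num_of_selection": 50,
    "random_seed": 42,
    "isolated_atom": True,
    "dimer": True,
    "dimer_num": 10,
    "test_ratio": 0.1,
    "f_max": 200,
    "mlip_type": "GAP",
}


def make_initial_rss_job(kwargs):
    """Hypothetical helper: import autoplex lazily and call initial_rss.

    The import is deferred so this module can be loaded, and the keyword
    arguments inspected, on a machine without autoplex installed.
    """
    from autoplex.data.rss.flows import initial_rss  # requires autoplex

    return initial_rss(**kwargs)
```

Keeping the arguments in a plain dictionary makes it easy to log or version the exact settings used for a run before anything is submitted.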

Returns:

- test_error (float) – The test error of the fitted MLIP.
- pre_database_dir (str) – The directory of the preprocessed database.
- mlip_path (str) – The path to the fitted MLIP.
- isol_es (dict) – The isolated-atom energy values.
- current_iter (int) – The current iteration index, set to 0.
- kt (float) – The value of kT.
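To make the return fields concrete, here is a sketch of how downstream code might read them. It assumes the output is available as a mapping keyed by the names above, which this page does not confirm. `RssOutput` and `parse_output` are hypothetical names, and the sample values are made-up placeholders, not real results.

```python
from dataclasses import dataclass, fields


@dataclass
class RssOutput:
    """Typed view of the documented return fields."""

    test_error: float
    pre_database_dir: str
    mlip_path: str
    isol_es: dict
    current_iter: int
    kt: float


def parse_output(output):
    """Build an RssOutput from a mapping; raises KeyError on a missing field."""
    return RssOutput(**{f.name: output[f.name] for f in fields(RssOutput)})


# Placeholder values for illustration only (not real results).
example = {
    "test_error": 0.05,
    "pre_database_dir": "/path/to/pre_database",
    "mlip_path": "/path/to/mlip",
    "isol_es": {},
    "current_iter": 0,
    "kt": 0.6,
}
result = parse_output(example)
print(result.current_iter, result.mlip_path)
```

Failing loudly on a missing key is a deliberate choice here: a later RSS iteration that silently received no mlip_path or pre_database_dir would be harder to debug.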
