initial_rss

autoplex.data.rss.flows.initial_rss(struct_number=10000, tag='GeSb2Te4', selection_method='cur', num_of_selection=3, bcur_params=None, random_seed=None, e0_spin=False, isolated_atom=True, dimer=True, dimer_range=None, dimer_num=10, custom_set=None, config_types=None, vasp_ref_file='vasp_ref.extxyz', rss_group='initial', test_ratio=0.1, regularization=True, distillation=True, f_max=200, pre_database_dir=None, mlip_type='GAP', mlip_hyper=None, ref_energy_name='REF_energy', ref_force_name='REF_forces', ref_virial_name='REF_virial', num_processes_fit=None, kt=None, **fit_kwargs)

Run the initial Random Structure Searching (RSS) workflow from scratch.

The workflow consists of the following jobs:
  • job1 - RandomizedStructure: Generates randomized structures.

  • job2 - Sampling: Samples a subset of the generated structures using CUR.

  • job3 - DFTStaticMaker: Runs single-point calculations on the sampled structures.

  • job4 - VASP_collect_data: Collects VASP calculation data.

  • job5 - Data_preprocessing: Preprocesses the data for fitting ML models.

  • job6 - MLIPFitMaker: Fits an ML interatomic potential (MLIP).

Parameters:
  • struct_number (int, optional) – Number of structures to generate. Default is 10000.

  • tag (str, optional) – Tag for the generated structures. Default is ‘GeSb2Te4’.

  • selection_method (str, optional) – Method for selecting structures. Default is ‘cur’.

  • num_of_selection (int, optional) – Number of structures to select. Default is 3.

  • bcur_params (str, optional) – Parameters for the CUR method. Default is None.

  • random_seed (int, optional) – Seed for random number generator. Default is None.

  • e0_spin (bool, optional) – Whether to include spin polarization in the static calculations of isolated atoms and dimers. Default is False.

  • isolated_atom (bool, optional) – Whether to include isolated atom calculations. Default is True.

  • dimer (bool, optional) – Whether to include dimer calculations. Default is True.

  • dimer_range (list, optional) – Distance range for dimer calculations. Default is None.

  • dimer_num (int, optional) – Number of dimers generated for calculations. Default is 10.

  • custom_set (dict, optional) – Custom set of parameters for VASP. Default is None.

  • config_types (list[str], optional) – List of configuration types corresponding to the structures. If provided, should have the same length as the ‘structures’ list. If None, defaults to ‘bulk’. Default is None.

  • vasp_ref_file (str, optional) – File name of collected VASP data. Default is ‘vasp_ref.extxyz’.

  • rss_group (str, optional) – Group name of structures for RSS. Default is ‘initial’.

  • test_ratio (float, optional) – The proportion of the test set after splitting the data. Default is 0.1.

  • regularization (bool, optional) – Whether to apply regularization. This only works for GAP. Default is True.

  • distillation (bool, optional) – Whether to apply distillation of structures. Default is True.

  • f_max (float, optional) – Force threshold; structures with forces above this value are excluded. Default is 200.

  • pre_database_dir (str, optional) – Directory for the preprocessed database. Default is None.

  • mlip_type (str, optional) – Type of MLIP to fit. Default is ‘GAP’.

  • mlip_hyper (str, optional) – Hyperparameters for the MLIP. Default is None.

  • ref_energy_name (str, optional) – Reference energy name. Default is “REF_energy”.

  • ref_force_name (str, optional) – Reference force name. Default is “REF_forces”.

  • ref_virial_name (str, optional) – Reference virial name. Default is “REF_virial”.

  • num_processes_fit (int, optional) – Number of processes for fitting. Default is None.

  • kt (float, optional) – Value of kT. Default is None.

  • fit_kwargs (dict, optional) – Additional keyword arguments for the machine learning fit, collected from any extra keyword arguments passed to the function.
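
The interplay of f_max and test_ratio can be illustrated with a minimal, library-free sketch. It assumes that a structure is excluded when its largest absolute force component exceeds f_max and that the split is a simple shuffled partition; the library may use a different criterion (for example a force norm) or a different splitting scheme. The helper names `max_force_magnitude` and `filter_and_split`, the `seed` argument, and the dictionary layout of the toy structures are hypothetical and exist only for this illustration.

```python
import random


def max_force_magnitude(forces):
    """Largest absolute force component over all atoms in one structure."""
    return max(abs(component) for atom in forces for component in atom)


def filter_and_split(structures, f_max=200.0, test_ratio=0.1, seed=0):
    """Drop structures above the force threshold, then split train/test.

    `structures` is a list of dicts with a "forces" key holding one
    [fx, fy, fz] triple per atom. This layout is an assumption made
    for the sketch, not the format used by the library.
    """
    kept = [s for s in structures if max_force_magnitude(s["forces"]) <= f_max]
    shuffled = kept[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    # The first n_test shuffled entries form the test set, the rest train.
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: 20 structures with small forces and one outlier above f_max.
structures = [{"id": i, "forces": [[0.1 * i, 0.0, -0.2]]} for i in range(20)]
structures.append({"id": 20, "forces": [[350.0, 0.0, 0.0]]})

train, test = filter_and_split(structures)
print(len(train), len(test))  # 18 2
```

With the defaults, the outlier is removed first, so the 10 % test fraction is taken from the 20 remaining structures rather than from the original 21.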

Returns:

  • test_error (float) – The test error of the fitted MLIP.

  • pre_database_dir (str) – The directory of the preprocessed database.

  • mlip_path (str) – The path to the fitted MLIP.

  • isol_es (dict) – The isolated energy values.

  • current_iter (int) – The current iteration index, set to 0.

  • kt (float) – The value of kT.
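
A call that overrides a few of the documented parameters might look like the sketch below. The values are illustrative only, and `rss_settings` and `make_initial_rss` are hypothetical names introduced for this example. The import is deferred into the function so that the settings can be prepared and inspected on a machine where autoplex is not installed. How the object returned by initial_rss is executed, and in which container the values listed under Returns are delivered, depends on the workflow setup and is not shown here.

```python
# Illustrative overrides; every key is a parameter name from the signature.
rss_settings = {
    "struct_number": 1000,      # fewer random structures than the default
    "tag": "GeSb2Te4",
    "selection_method": "cur",
    "num_of_selection": 50,
    "random_seed": 42,
    "dimer_num": 10,
    "test_ratio": 0.1,
    "f_max": 200,
    "mlip_type": "GAP",
}


def make_initial_rss(settings):
    """Hypothetical wrapper: call initial_rss with the given settings.

    Requires autoplex to be installed; the import happens only when
    this function is actually called.
    """
    from autoplex.data.rss.flows import initial_rss

    return initial_rss(**settings)


# Parameters not listed in rss_settings keep their documented defaults.
print(sorted(rss_settings))
```

Any keyword that is not a named parameter of the signature is collected into fit_kwargs, so a misspelled parameter name would not necessarily raise an error at call time.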