DataPreprocessing

class autoplex.fitting.common.flows.DataPreprocessing(name='data_preprocessing_for_fitting', split_ratio=0.5, regularization=False, separated=False, distillation=False, f_max=40.0)

Bases: Maker

Data preprocessing of the provided dataset.

Parameters:
  • name (str) – Name of the flows produced by this maker.

  • split_ratio (float) – Fraction of the data assigned to the test set; the remainder is used for training. A value of 0.1 means that the ratio of the training set to the test set is 9:1.

  • regularization (bool) – Whether to use sigma regularization.

  • separated (bool) – Repeat the fit for each data_type available in the (combined) database.

  • distillation (bool) – Whether to use data distillation.

  • f_max (float) – Maximum allowed force in the dataset.
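
The meaning of split_ratio can be illustrated with a minimal, self-contained sketch. It is not autoplex's implementation: the helper name split_dataset, the shuffling step, and the seed argument are assumptions made for illustration only. It shows that a split_ratio of 0.1 leaves nine training entries for every test entry.

```python
import random


def split_dataset(items, split_ratio, seed=0):
    """Hypothetical helper: return (train, test) lists.

    Roughly `split_ratio` of the entries go to the test set,
    the remainder to the training set.
    """
    shuffled = list(items)
    # Shuffle with a fixed seed so the split is reproducible (an assumption,
    # the library may order or group the data differently).
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * split_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_dataset(range(100), split_ratio=0.1)
print(len(train), len(test))  # 90 10, i.e. a 9:1 ratio
```

With the default split_ratio of 0.5, the same sketch would place half of the entries in each set.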

make(fit_input, pre_database_dir=None, pre_xyz_files=None, atomwise_regularization_parameter=0.1, f_min=0.01, atom_wise_regularization=True)

Maker for data preprocessing.

Parameters:
  • fit_input (dict) – Fit input data, given as a nested structure of dictionaries and lists.

  • pre_database_dir (str or None) – The pre-database directory.

  • pre_xyz_files (list[str] or None) – Names of the pre-database train xyz file and test xyz file labeled by VASP.

  • atomwise_regularization_parameter (float) – Regularization value for the atom-wise force components.

  • f_min (float) – Minimal force cutoff value for atom-wise regularization.

  • atom_wise_regularization (bool) – Whether to include atom-wise regularization.
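
A possible way to configure the maker is sketched below, using only the keyword arguments listed in the signatures above. The chosen values are illustrative, and the structure of fit_input is not described in this section, so the call to make is shown only as a comment. The import is guarded so that the snippet also runs where autoplex is not installed.

```python
# Keyword arguments for the maker, named as in the class signature.
# Values are illustrative; only split_ratio differs from the documented default.
maker_kwargs = {
    "name": "data_preprocessing_for_fitting",
    "split_ratio": 0.1,  # 9:1 training-to-test ratio
    "regularization": False,
    "separated": False,
    "distillation": False,
    "f_max": 40.0,
}

# Keyword arguments for make(), set to the documented defaults.
make_kwargs = {
    "pre_database_dir": None,
    "pre_xyz_files": None,
    "atomwise_regularization_parameter": 0.1,
    "f_min": 0.01,
    "atom_wise_regularization": True,
}

try:
    from autoplex.fitting.common.flows import DataPreprocessing
except ImportError:
    # autoplex is not available; the dictionaries above still document the setup.
    DataPreprocessing = None

if DataPreprocessing is not None:
    maker = DataPreprocessing(**maker_kwargs)
    # fit_input is assumed to come from upstream data-generation steps:
    # job = maker.make(fit_input=fit_input, **make_kwargs)
```

Keeping the arguments in plain dictionaries makes it easy to log or reuse the same settings across several fits.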