DataPreprocessing
- class autoplex.fitting.common.flows.DataPreprocessing(name='data_preprocessing_for_fitting', split_ratio=0.5, regularization=False, separated=False, distillation=False, f_max=40.0)
Bases: Maker
Data preprocessing of the provided dataset.
- Parameters:
name (str) – Name of the flows produced by this maker.
split_ratio (float) – Fraction of the data assigned to the test set. A value of 0.1 means that the ratio of the training set to the test set is 9:1.
regularization (bool) – Whether to use sigma regularization.
separated (bool) – Repeat the fit for each data_type available in the (combined) database.
distillation (bool) – Whether to use data distillation.
f_max (float) – Maximum allowed force in the data set.
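The meaning of `split_ratio` and `f_max` can be illustrated with a small standalone sketch. It does not call autoplex: the helper names `split_dataset` and `filter_by_max_force` are hypothetical, and the shuffling, rounding, and per-structure filtering shown here are assumptions made for illustration, not the library's actual implementation.

```python
import math
import random


def split_dataset(items, split_ratio, seed=0):
    """Shuffle items and hold out a split_ratio fraction as the test set.

    Hypothetical helper: it only illustrates the documented meaning of
    split_ratio (0.1 gives a 9:1 training-to-test split).
    """
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * split_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def filter_by_max_force(structures, f_max):
    """Keep structures whose largest per-atom force norm does not exceed f_max.

    Assumption: each structure is a dict with a "forces" list of
    (fx, fy, fz) tuples, and a structure is dropped as a whole.
    """
    kept = []
    for structure in structures:
        largest = max(math.hypot(*force) for force in structure["forces"])
        if largest <= f_max:
            kept.append(structure)
    return kept


train, test = split_dataset(range(100), split_ratio=0.1)
print(len(train), len(test))  # 90 10
```

With the default `split_ratio=0.5`, the same sketch would divide the data evenly between training and test sets.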
- make(fit_input, pre_database_dir=None, pre_xyz_files=None, atomwise_regularization_parameter=0.1, f_min=0.01, atom_wise_regularization=True)
Maker for data preprocessing.
- Parameters:
fit_input (dict) – Fit input data, given as a nested structure of dictionaries and lists.
pre_database_dir (str or None) – Path to the pre-database directory.
pre_xyz_files (list[str] or None) – Names of the pre-database training and test xyz files, labeled by VASP.
atomwise_regularization_parameter (float) – Regularization value for the atom-wise force components.
f_min (float) – Minimum force cutoff value for atom-wise regularization.
atom_wise_regularization (bool) – Whether to include atom-wise regularization.
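One plausible reading of how `atomwise_regularization_parameter` and `f_min` interact is sketched below: each atom receives a force sigma proportional to its force magnitude, with `f_min` acting as a floor so that atoms with near-zero forces do not get a vanishing sigma. The formula and the function name `atomwise_force_sigma` are assumptions made for illustration and are not confirmed by this page; only the default values are taken from the signature above.

```python
import math


def atomwise_force_sigma(forces, atomwise_regularization_parameter=0.1, f_min=0.01):
    """Return one sigma per atom: parameter * max(|F|, f_min).

    Assumed formula for illustration only; hypothetical helper,
    not part of autoplex.
    """
    sigmas = []
    for force in forces:
        magnitude = math.hypot(*force)  # Euclidean norm of (fx, fy, fz)
        sigmas.append(atomwise_regularization_parameter * max(magnitude, f_min))
    return sigmas


# Two atoms: one with zero force (floored at f_min), one with |F| = 5.
forces = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
sigmas = atomwise_force_sigma(forces)
print(sigmas)
```

Under this assumption, the first atom's sigma is set by `f_min` and the second by its actual force magnitude.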