DataPreprocessing
- class autoplex.fitting.common.flows.DataPreprocessing(name='data_preprocessing_for_fitting', split_ratio=0.5, regularization=False, separated=False, distillation=False, force_max=40.0, force_min=0.01, pre_database_dir=None, pre_xyz_files=None, atomwise_regularization_parameter=0.1, atom_wise_regularization=True, run_fits_on_different_cluster=False)
Bases: Maker
Maker that preprocesses the provided dataset before fitting.
- Parameters:
name (str) – Name of the flows produced by this maker.
split_ratio (float) – Fraction of the dataset assigned to the test set. For example, 0.1 gives a training-to-test ratio of 9:1.
regularization (bool) – If True, applies sigma regularization.
separated (bool) – If True, repeats the fit for each data_type available in the (combined) database.
distillation (bool) – If True, applies data distillation.
force_max (float) – Maximum force allowed in the data set.
force_min (float) – Minimal force cutoff value for atom-wise regularization.
pre_database_dir (str or None) – The pre-database directory.
pre_xyz_files (list[str] or None) – Names of the pre-database train xyz file and test xyz file labelled by VASP.
atomwise_regularization_parameter (float) – Regularization value for the atom-wise force components.
atom_wise_regularization (bool) – If True, includes atom-wise regularization.
run_fits_on_different_cluster (bool) – If True, copies the fitting database to MongoDB.
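The split_ratio semantics (0.1 gives a 9:1 training-to-test split) can be illustrated with a minimal sketch. The helper split_train_test is hypothetical and not part of the autoplex API; it assumes split_ratio is the test-set fraction, and the shuffling and rounding autoplex actually uses may differ.

```python
import random


def split_train_test(items, split_ratio, seed=0):
    """Hypothetical helper: shuffle items and put a fraction split_ratio in the test set.

    Assumption: split_ratio is the test-set fraction (0.1 -> 9:1 train:test),
    following the parameter description; autoplex's own rounding may differ.
    """
    if not 0.0 < split_ratio < 1.0:
        raise ValueError("split_ratio must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)          # reproducible shuffle
    n_test = round(len(shuffled) * split_ratio)    # size of the test set
    return shuffled[n_test:], shuffled[:n_test]    # (train, test)


train, test = split_train_test(range(100), split_ratio=0.1)
print(len(train), len(test))  # 90 10
```

With the default split_ratio=0.5, the same helper would divide the data evenly between training and test sets.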
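To make the interplay of force_max, force_min, and atomwise_regularization_parameter concrete, here is a minimal sketch under stated assumptions. The function filter_structures_and_set_sigma is hypothetical, not autoplex code. It assumes that a structure is dropped when any atomic force norm exceeds force_max, and that each atom's force sigma is atomwise_regularization_parameter times its force norm, floored at force_min. The real implementation may compare force components instead of norms or treat outliers differently.

```python
import numpy as np


def filter_structures_and_set_sigma(
    forces_per_structure,
    force_max=40.0,
    force_min=0.01,
    atomwise_regularization_parameter=0.1,
):
    """Hypothetical sketch: drop high-force structures, compute per-atom sigmas.

    Assumptions (not taken from autoplex): a structure is discarded if any
    atom's force norm exceeds force_max; each atom's sigma is
    atomwise_regularization_parameter * |F_i|, floored at force_min.
    """
    kept = []
    for forces in forces_per_structure:
        forces = np.asarray(forces, dtype=float)   # shape (n_atoms, 3)
        norms = np.linalg.norm(forces, axis=1)     # per-atom force magnitude
        if norms.size and norms.max() > force_max:
            continue                               # exceeds force_max: skip
        # Scale sigma with the force, but never below force_min.
        sigma = np.maximum(atomwise_regularization_parameter * norms, force_min)
        kept.append((forces, sigma))
    return kept


kept = filter_structures_and_set_sigma([
    [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]],   # max norm 5.0: kept
    [[50.0, 0.0, 0.0]],                   # norm 50.0 > 40.0: dropped
])
print(len(kept), kept[0][1].tolist())  # 1 [0.01, 0.5]
```

The floor matters because a sigma proportional to a near-zero force would approach zero and give that atom an almost unbounded weight in the fit.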