DataPreprocessing

class autoplex.fitting.common.flows.DataPreprocessing(name='data_preprocessing_for_fitting', split_ratio=0.5, regularization=False, separated=False, distillation=False, force_max=40.0, force_min=0.01, pre_database_dir=None, pre_xyz_files=None, atomwise_regularization_parameter=0.1, atom_wise_regularization=True, run_fits_on_different_cluster=False)

Bases: Maker

Data preprocessing of the provided dataset.

Parameters:
  • name (str) – Name of the flows produced by this maker.

  • split_ratio (float) – Parameter that divides the data into a training set and a test set. A value of 0.1 means that the ratio of the training set to the test set is 9:1.

  • regularization (bool) – If True, uses sigma regularization.

  • separated (bool) – Repeat the fit for each data_type available in the (combined) database.

  • distillation (bool) – If True, uses data distillation.

  • force_max (float) – Maximum allowed force in the dataset.

  • force_min (float) – Minimal force cutoff value for atom-wise regularization.

  • pre_database_dir (str or None) – The pre-database directory.

  • pre_xyz_files (list[str] or None) – Names of the pre-database train xyz file and test xyz file labelled by VASP.

  • atomwise_regularization_parameter (float) – Regularization value for the atom-wise force components.

  • atom_wise_regularization (bool) – If True, includes atom-wise regularization.

  • run_fits_on_different_cluster (bool) – If True, copies the fitting database to MongoDB.
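
To make the meaning of split_ratio and force_max more concrete, the following is a minimal, standalone sketch in plain Python. It is not autoplex's implementation: the helper names split_dataset and within_force_limit are hypothetical, the random shuffle is an assumption, and treating force_max as a limit on the per-atom force magnitude (rather than on individual components) is also an assumption.

```python
import math
import random


def split_dataset(items, split_ratio, seed=0):
    """Hypothetical helper: hold out a `split_ratio` fraction as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    n_test = round(len(shuffled) * split_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def within_force_limit(forces, force_max):
    """Hypothetical helper: True if every force vector has magnitude <= force_max."""
    return all(math.hypot(*vec) <= force_max for vec in forces)


# With split_ratio=0.1, 100 structures are divided 90:10 (i.e. 9:1).
train, test = split_dataset(range(100), split_ratio=0.1)
print(len(train), len(test))

# A structure whose largest force magnitude exceeds force_max would be rejected
# under the assumption stated above.
print(within_force_limit([(1.0, 2.0, 2.0)], force_max=40.0))
print(within_force_limit([(30.0, 30.0, 0.0)], force_max=40.0))
```

With the class default split_ratio=0.5, the same sketch would divide the data evenly between the training and test sets.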

make(fit_input)

Maker for data preprocessing.

Parameters:

fit_input (dict) – Mixed list of dictionaries and lists of the fit input data.