DataPreprocessing


class autoplex.fitting.common.flows.DataPreprocessing(name='data_preprocessing_for_fitting', split_ratio=0.5, regularization=False, separated=False, distillation=False, ref_energy_name='REF_energy', ref_force_name='REF_forces', ref_virial_name='REF_virial', force_max=40.0, force_min=0.01, pre_database_dir=None, pre_xyz_files=None, atomwise_regularization_parameter=0.1, atom_wise_regularization=True, train_data_file='train.extxyz', test_data_file='test.extxyz', run_fits_on_different_cluster=False)

Bases: Maker

Data preprocessing of the provided dataset.

Parameters:

name (str) – Name of the flows produced by this maker.
split_ratio (float) – Fraction of the data assigned to the test set when dividing the dataset into training and test sets. A value of 0.1 gives a training-to-test ratio of 9:1.
regularization (bool) – If True, uses sigma regularization.
separated (bool) – Repeat the fit for each data_type available in the (combined) database.
distillation (bool) – If True, uses data distillation.
ref_energy_name (str) – Reference energy name in xyz file.
ref_force_name (str) – Reference force name in xyz file.
ref_virial_name (str) – Reference virial name in xyz file.
force_max (float) – Maximum force allowed in the data set.
force_min (float) – Minimal force cutoff value for atom-wise regularization.
pre_database_dir (str or None) – The pre-database directory.
pre_xyz_files (list[str] or None) – Names of the pre-database train xyz file and test xyz file labelled by VASP.
atomwise_regularization_parameter (float) – Regularization value for the atom-wise force components.
atom_wise_regularization (bool) – If True, includes atom-wise regularization.
train_data_file (str) – Name of the training xyz data file.
test_data_file (str) – Name of the test xyz data file.
run_fits_on_different_cluster (bool) – If True, copies the fitting database to MongoDB.
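
The meaning of split_ratio can be illustrated with a minimal sketch. This is not the autoplex implementation: the helper name split_train_test, the seeded shuffle and the rounding rule are assumptions made for illustration only. It shows how a split_ratio of 0.1 leads to a 9:1 training-to-test ratio.

```python
import random


def split_train_test(items, split_ratio, seed=0):
    """Hypothetical helper: hold out a fraction `split_ratio` as the test set."""
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be between 0 and 1")
    shuffled = list(items)
    # Seeded shuffle so the illustration is reproducible (an assumption).
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * split_ratio)
    # The first n_test shuffled entries form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(100), split_ratio=0.1)
print(len(train), len(test))  # 90 10
```

With the default split_ratio of 0.5, the same sketch would place half of the structures in each set.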

make(fit_input)

Maker for data preprocessing.

Parameters:

fit_input (dict) – Mixed list of dictionaries and lists of the fit input data.
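
A hedged usage sketch follows. The keyword names and the import path come from the signature documented above; the chosen values are illustrative, and the import guard is only there so the snippet also runs where autoplex is not installed. The layout of fit_input is not described on this page, so the call to make is left as a comment.

```python
from importlib.util import find_spec

# Keyword arguments taken from the documented signature; values are illustrative.
settings = {
    "split_ratio": 0.1,        # 9:1 training-to-test ratio
    "force_max": 40.0,         # documented default
    "train_data_file": "train.extxyz",
    "test_data_file": "test.extxyz",
}

maker = None
if find_spec("autoplex") is not None:
    from autoplex.fitting.common.flows import DataPreprocessing

    maker = DataPreprocessing(**settings)
    # maker.make(fit_input) would then build the preprocessing step;
    # fit_input must be supplied by the surrounding workflow.
```

Parameters not listed in settings keep the defaults shown in the class signature.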
