CompleteDFTvsMLBenchmarkWorkflow

class autoplex.auto.phonons.flows.CompleteDFTvsMLBenchmarkWorkflow(name='add_data', add_dft_phonon_struct=True, add_dft_rattled_struct=True, add_rss_struct=False, displacement_maker=None, phonon_bulk_relax_maker=None, phonon_static_energy_maker=None, rattled_bulk_relax_maker=None, isolated_atom_maker=None, n_structures=10, displacements=<factory>, symprec=0.0001, uc=False, volume_custom_scale_factors=None, volume_scale_factor_range=None, rattle_std=0.01, distort_type=0, min_distance=1.5, angle_percentage_scale=10, angle_max_attempts=1000, rattle_type=0, rattle_mc_n_iter=10, w_angle=None, ml_models=<factory>, atomwise_regularization_parameter=0.1, atom_wise_regularization=True, force_max=40.0, force_min=0.01, split_ratio=0.4, regularization=False, separated=False, num_processes_fit=None, distillation=True, apply_data_preprocessing=True, auto_delta=False, hyper_para_loop=False, atomwise_regularization_list=None, soap_delta_list=None, n_sparse_list=None, supercell_settings=<factory>, benchmark_kwargs=<factory>, path_to_hyperparameters=PosixPath('/home/runner/micromamba/envs/autoplex_docs/lib/python3.10/site-packages/autoplex/fitting/common/mlip-phonon-defaults.json'), summary_filename_prefix='results_', glue_xml=False, glue_file_path='glue.xml', use_defaults_fitting=True, run_fits_on_different_cluster=False)

Bases: Maker

Maker to construct a DFT (VASP)-based dataset composed of the following two configuration types:

  1. supercells with a single displaced atom (based on the atomate2 PhononMaker subroutines)

  2. supercells with randomly displaced atoms (based on the ASE rattle function).

Machine-learned interatomic potential(s) are then fitted on the dataset, and the resulting potential(s) are benchmarked against DFT (VASP) using the provided benchmark structure(s) by comparing the respective DFT- and MLIP-based phonon calculations. The benchmark metrics are provided in the form of a phonon band structure comparison and q-point-wise phonon RMSE plots, as well as a summary text file.
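To make the q-point-wise RMSE metric concrete, the following is a minimal, stand-alone sketch of how such a metric can be computed from two phonon band structures. It assumes that the DFT and MLIP frequencies are sampled on the same q-points and that the bands are listed in the same order at each q-point; the function names are hypothetical, and this is not autoplex's own implementation.

```python
import math


def qpoint_rmse(freqs_dft, freqs_ml):
    """Return one RMSE value per q-point (hypothetical helper).

    Both inputs are nested lists indexed as [q_point][band], e.g. phonon
    frequencies in THz sampled along the same q-point path.
    """
    if len(freqs_dft) != len(freqs_ml):
        raise ValueError("both band structures must use the same q-points")
    rmse = []
    for bands_dft, bands_ml in zip(freqs_dft, freqs_ml):
        if len(bands_dft) != len(bands_ml):
            raise ValueError("both band structures must have the same number of bands")
        # Mean squared deviation over all bands at this q-point
        mse = sum((a - b) ** 2 for a, b in zip(bands_dft, bands_ml)) / len(bands_dft)
        rmse.append(math.sqrt(mse))
    return rmse


def overall_rmse(freqs_dft, freqs_ml):
    """RMSE over all q-points and bands together (hypothetical helper)."""
    squared = [
        (a - b) ** 2
        for bands_dft, bands_ml in zip(freqs_dft, freqs_ml)
        for a, b in zip(bands_dft, bands_ml)
    ]
    return math.sqrt(sum(squared) / len(squared))


dft = [[0.0, 0.0, 0.0], [2.0, 2.0, 4.0]]  # two q-points, three bands each
ml = [[0.0, 0.0, 0.0], [2.0, 2.0, 1.0]]
print(qpoint_rmse(dft, ml))   # 0.0 at the first q-point, sqrt(3) at the second
print(overall_rmse(dft, ml))  # sqrt(1.5)
```

Matching bands by index is the simplest choice; near band crossings, a real comparison may need band reordering, which this sketch does not attempt.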

Parameters:
  • name (str) – Name of the flow produced by this maker.

  • add_dft_phonon_struct (bool) – If True, add displaced supercells generated via phonopy for DFT calculations.

  • add_dft_rattled_struct (bool) – If True, add rattled structures for DFT calculations.

  • add_rss_struct (bool) – If True, add structures generated by random structure searching (RSS) for DFT calculations.

  • n_structures (int) – Total number of randomly displaced or distorted structures to be generated. Must be provided if distorting volume without specifying a range, or if distorting angles. Default=10.

  • displacement_maker (BaseVaspMaker) – Maker used for a static calculation for a supercell.

  • phonon_bulk_relax_maker (BaseVaspMaker) – Maker used for the bulk relax unit cell calculation.

  • rattled_bulk_relax_maker (BaseVaspMaker) – Maker used for the bulk relax unit cell calculation of the rattled structures.

  • phonon_static_energy_maker (BaseVaspMaker) – Maker used for the static energy unit cell calculation.

  • isolated_atom_maker (IsoAtomStaticMaker) – VASP maker for the isolated atom calculation.

  • displacements (list[float]) – Displacement distances for phonon data generation for the fitting. For the benchmark, only 0.01 is currently used, and this value cannot be changed.

  • symprec (float) – Symmetry precision to use in the reduction of symmetry to find the primitive/conventional cell (use_primitive_standard_structure, use_conventional_standard_structure) and to handle all symmetry-related tasks in phonopy.

  • uc (bool) – If True, generate randomly distorted structures (unit cells) and add static computation jobs to the flow.

  • distort_type (int) – 0: volume distortion, 1: angle distortion, 2: volume and angle distortion. Default=0.

  • volume_scale_factor_range (list[float]) – [min, max] of volume scale factors, e.g. [0.90, 1.10] will distort the volume by ±10%.

  • volume_custom_scale_factors (list[float]) – Explicit scale factors (used if no range is specified). If None, defaults to [0.90, 0.95, 0.98, 0.99, 1.01, 1.02, 1.05, 1.10].

  • min_distance (float) – Minimum separation allowed between any two atoms. Default=1.5 Å.

  • angle_percentage_scale (float) – Angle scaling factor. The default of 10 randomly distorts angles by ±10% of their original values.

  • angle_max_attempts (int) – Maximum number of attempts to distort the structure before aborting. Default=1000.

  • w_angle (list[float]) – List of indices of the angles to be changed, where 0=alpha, 1=beta, 2=gamma. Default=[0, 1, 2].

  • rattle_type (int) – 0: standard rattling, 1: Monte Carlo rattling. Default=0.

  • rattle_std (float) – Rattle amplitude (standard deviation of the normal distribution). Default=0.01. Note that for MC rattling, the generated displacements will be roughly rattle_mc_n_iter**0.5 * rattle_std for small values of rattle_mc_n_iter.

  • rattle_mc_n_iter (int) – Number of Monte Carlo iterations. A larger number of iterations generates larger displacements. Default=10.

  • ml_models (list[str]) – List of the ML models to be used. Default is GAP.

  • force_max (float) – Maximum allowed force in the dataset.

  • force_min (float) – Minimum force cutoff value for atom-wise regularization.

  • split_ratio (float) – Fraction of the data assigned to the test set. For example, a value of 0.1 gives a training-to-test ratio of 9:1.

  • regularization (bool) – Whether to use sigma regularization.

  • distillation (bool) – Whether to use data distillation.

  • separated (bool) – Repeat the fit for each data_type available in the (combined) database.

  • num_processes_fit (int) – Number of processes for fitting.

  • apply_data_preprocessing (bool) – Whether to apply data preprocessing.

  • atomwise_regularization_parameter (float) – Regularization value for the atom-wise force components.

  • atom_wise_regularization (bool) – Whether to include atom-wise regularization.

  • auto_delta (bool) – Automatically determine delta for the 2b, 3b and SOAP terms.

  • hyper_para_loop (bool) – Enable looping through several hyperparameter sets.

  • atomwise_regularization_list (list) – List of atom-wise regularization parameters that are checked.

  • soap_delta_list (list) – List of SOAP delta values that are checked.

  • n_sparse_list (list) – List of GAP n_sparse values that are checked.

  • supercell_settings (dict) – Settings for supercell generation.

  • benchmark_kwargs (dict) – Keyword arguments for the benchmark flows.

  • path_to_hyperparameters (str or Path) – Path to the JSON file containing the MLIP hyperparameters.

  • summary_filename_prefix (str) – Prefix of the result summary file.

  • glue_xml (bool) – Use the glue.xml core potential instead of fitting 2b terms.

  • glue_file_path (str) – Path to the glue.xml file.

  • use_defaults_fitting (bool) – Use the fit defaults.

  • run_fits_on_different_cluster (bool) – Run the fits on a different cluster from the DFT calculations (the fit database is transferred via MongoDB, which might be slow).
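The split_ratio arithmetic can be illustrated with a hypothetical helper that shuffles the data and cuts off the test fraction. The random shuffle is an assumption made for illustration only; autoplex's actual splitting (for example in combination with distillation or separate configuration types) may differ.

```python
import random


def train_test_split(items, split_ratio, seed=None):
    """Split items so that roughly split_ratio of them land in the test set.

    Hypothetical helper: a seeded random shuffle followed by a single cut.
    Returns (train, test) as lists.
    """
    if not 0.0 <= split_ratio < 1.0:
        raise ValueError("split_ratio must be in [0, 1)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * split_ratio)
    # The first n_test shuffled items form the test set, the rest the training set
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(10), 0.1, seed=0)
print(len(train), len(test))  # 9 1, i.e. the 9:1 ratio for split_ratio=0.1

train, test = train_test_split(range(10), 0.4, seed=0)
print(len(train), len(test))  # 6 4 with the default split_ratio=0.4
```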

make(structure_list, mp_ids, dft_references=None, benchmark_structures=None, benchmark_mp_ids=None, pre_database_dir=None, pre_xyz_files=None, rattle_seed=42, fit_kwargs_list=None)

Make flow for constructing the dataset, fitting the potentials and performing the benchmarks.

Parameters:
  • structure_list (list[Structure]) – List of pymatgen structures.

  • mp_ids – Materials Project IDs.

  • dft_references (list[PhononBSDOSDoc] | None) – List of DFT reference documents (PhononBSDOSDoc objects). The references must have been computed with a finite displacement of 0.01; for benchmarking, only 0.01 is supported.

  • benchmark_structures (list[Structure] | None) – The pymatgen structures for benchmarking.

  • benchmark_mp_ids (list[str] | None) – Materials Project IDs of the benchmarking structures.

  • pre_xyz_files (list[str] or None) – Names of the pre-database train xyz file and test xyz file.

  • pre_database_dir (str or None) – The pre-database directory.

  • rattle_seed (int | None) – Random seed for structure generation.

  • fit_kwargs_list (list[dict]) – List of dicts containing the MLIP fit keyword arguments.

Return type:

Flow

static add_dft_phonons(structure, mp_id, displacements, symprec, phonon_bulk_relax_maker, phonon_static_energy_maker, phonon_displacement_maker, supercell_settings)

Add DFT phonon runs for reference structures.

Parameters:
  • structure (Structure) – The pymatgen Structure object.

  • mp_id (str) – Materials Project ID.

  • displacements (list[float]) – Displacement distances for the phonon calculations.

  • symprec (float) – Symmetry precision to use in the reduction of symmetry to find the primitive/conventional cell (use_primitive_standard_structure, use_conventional_standard_structure) and to handle all symmetry-related tasks in phonopy.

  • phonon_displacement_maker (BaseVaspMaker) – Maker used to compute the forces for a supercell.

  • phonon_bulk_relax_maker (BaseVaspMaker) – Maker used for the bulk relax unit cell calculation.

  • phonon_static_energy_maker (BaseVaspMaker) – Maker used for the static energy unit cell calculation.

  • supercell_settings (dict) – Supercell settings.
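The idea behind single-atom displaced supercells can be sketched in plain Python: each generated structure differs from the undistorted one by a small displacement of exactly one atom. This sketch displaces every atom along ±x, ±y and ±z without any symmetry reduction. phonopy, which the workflow relies on, uses crystal symmetry to generate only the symmetrically distinct displacements, so the number of supercells it actually produces is usually much smaller. The helper name is hypothetical.

```python
def single_atom_displacements(positions, distance=0.01):
    """Yield (atom_index, direction, displaced_positions) for every atom and axis.

    positions are Cartesian coordinates. Each generated structure moves exactly
    one atom by +distance or -distance along x, y or z and leaves all other
    atoms unchanged. No symmetry reduction is applied (unlike phonopy).
    """
    axes = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    for index, position in enumerate(positions):
        for axis in axes:
            for sign in (1.0, -1.0):
                direction = tuple(sign * component for component in axis)
                # Start from a copy of the undistorted structure
                displaced = [list(p) for p in positions]
                displaced[index] = [x + distance * d for x, d in zip(position, direction)]
                yield index, direction, displaced


positions = [[0.0, 0.0, 0.0], [1.35, 1.35, 1.35]]
configs = list(single_atom_displacements(positions, distance=0.01))
print(len(configs))  # 12 = 2 atoms x 3 axes x 2 signs
```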

static add_dft_rattled(structure, mp_id, rattled_bulk_relax_maker, displacement_maker, uc=False, volume_custom_scale_factors=None, volume_scale_factor_range=None, rattle_std=0.01, distort_type=0, n_structures=10, min_distance=1.5, angle_percentage_scale=10, angle_max_attempts=1000, rattle_type=0, rattle_seed=42, rattle_mc_n_iter=10, w_angle=None, supercell_settings=None)

Add DFT static runs for randomly displaced structures.

Parameters:
  • structure (Structure) – The pymatgen Structure object.

  • mp_id (str) – Materials Project ID.

  • displacement_maker (BaseVaspMaker) – Maker used for a static calculation for a supercell.

  • rattled_bulk_relax_maker (BaseVaspMaker) – Maker used for the bulk relax unit cell calculation.

  • uc (bool) – If True, generate randomly distorted structures (unit cells) and add static computation jobs to the flow.

  • distort_type (int) – 0: volume distortion, 1: angle distortion, 2: volume and angle distortion. Default=0.

  • n_structures (int) – Total number of distorted structures to be generated. Must be provided if distorting volume without specifying a range, or if distorting angles. Default=10.

  • volume_scale_factor_range (list[float]) – [min, max] of volume scale factors, e.g. [0.90, 1.10] will distort the volume by ±10%.

  • volume_custom_scale_factors (list[float]) – Explicit scale factors (used if no range is specified). If None, defaults to [0.90, 0.95, 0.98, 0.99, 1.01, 1.02, 1.05, 1.10].

  • min_distance (float) – Minimum separation allowed between any two atoms. Default=1.5 Å.

  • angle_percentage_scale (float) – Angle scaling factor. The default of 10 randomly distorts angles by ±10% of their original values.

  • angle_max_attempts (int) – Maximum number of attempts to distort the structure before aborting. Default=1000.

  • w_angle (list[float]) – List of indices of the angles to be changed, where 0=alpha, 1=beta, 2=gamma. Default=[0, 1, 2].

  • rattle_type (int) – 0: standard rattling, 1: Monte Carlo rattling. Default=0.

  • rattle_std (float) – Rattle amplitude (standard deviation of the normal distribution). Default=0.01. Note that for MC rattling, the generated displacements will be roughly rattle_mc_n_iter**0.5 * rattle_std for small values of rattle_mc_n_iter.

  • rattle_seed (int) – Seed for setting up the NumPy random state from which random numbers are generated. Default=42.

  • rattle_mc_n_iter (int) – Number of Monte Carlo iterations. A larger number of iterations generates larger displacements. Default=10.

  • supercell_settings (dict) – Settings for supercells.
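Two of the distortions parameterized above can be sketched with the standard library: scaling the cell volume by a factor s multiplies each lattice vector by s^(1/3), and standard rattling adds normally distributed noise with standard deviation rattle_std to every Cartesian coordinate. All function names are hypothetical. The rattle uses Python's random module rather than the NumPy random state the workflow uses, so the numbers differ for the same seed, and the min_distance check ignores periodic images for brevity.

```python
import itertools
import math
import random


def cell_volume(lattice):
    """Volume of the cell spanned by three lattice vectors (scalar triple product)."""
    a, b, c = lattice
    b_cross_c = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return abs(sum(x * y for x, y in zip(a, b_cross_c)))


def scale_volume(lattice, volume_factor):
    """Scale all lattice vectors so that the cell volume changes by volume_factor."""
    linear = volume_factor ** (1.0 / 3.0)
    return [[linear * x for x in vector] for vector in lattice]


def rattle(positions, rattle_std=0.01, seed=42):
    """Standard (Gaussian) rattling: add N(0, rattle_std**2) noise to each coordinate."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, rattle_std) for x in position] for position in positions]


def min_pair_distance(positions):
    """Smallest interatomic distance, ignoring periodic images."""
    return min(math.dist(p, q) for p, q in itertools.combinations(positions, 2))


lattice = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
print(cell_volume(scale_volume(lattice, 1.10)))  # about 70.4, i.e. 1.10 * 64

positions = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
rattled = rattle(positions, rattle_std=0.01, seed=42)
print(min_pair_distance(rattled) >= 1.5)  # True: the min_distance criterion holds
```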