Generating reference data#

This tutorial will explain how the reference data from DFT-based phonon structures for the MLIP fit can be generated. The idea behind the VASP reference data generation stems from J. Chem. Phys. 153, 044104 (2020) and demonstrates that a robust database of crystalline structures for MLIP reproducing accurate phonon structures can be build by generating the single-atom displaced supercells using phonopy and combining them with a set of rattled supercell structures, generated from the same unit cell models.

Single-atom displaced supercell structures#

The single-atom displaced supercell structures used in autoplex are generated by phonopy VASP-related routines, that are collected in the dft_phonopy_gen_data flow (see diagram in the general tutorial. The displacement default is 0.1 Å and can be adjusted. A phonopy calculation with a displacement of 0.1 Å is automatically included in the workflow for calculating the DFT benchmark reference (if no reference is provided by the user). The min_length parameter controls the supercells size and is set to 20 per default as this value is good for ensuring that the periodic boundary conditions and the energy convergence criteria for phonon calculations are met. More settings can be found in the API reference.

There is the possibility to run the complete autoplex workflow using only phonopy generated supercells:

complete_flow = CompleteDFTvsMLBenchmarkWorkflow(add_dft_random_struct=False, min_length=20,
                                                 displacements=displacement_list).make(
    structure_list=structure_list, mp_ids=mpids, 
    benchmark_structures=benchmark_structure_list, benchmark_mp_ids=mpbenchmark)

By doing so, the generation of the randomized structures has to be tuned off by setting the add_dft_random_struct bool to False. You can also decide if you want to only use those single-atom displaced supercells only for the MLIP fit, or if the data shall be added to an existing database.

Adding data to an existing database is achieved by:

complete_flow = CompleteDFTvsMLBenchmarkWorkflow(
    add_dft_random_struct=False,
    pre_xyz_files=["vasp_ref.extxyz"], pre_database_dir=path/to/database
).make(
    structure_list=structure_list, mp_ids=mpids, 
    benchmark_structures=benchmark_structure_list, benchmark_mp_ids=mpbenchmark,)

Where pre_xyz_files can also take a train and test database as argument, e.g. as pre_xyz_files=["pre_xyz_train.extxyz", "pre_xyz_test.extxyz"].

autoplex is equipped with a DFTPhononMaker class that inherits from the atomate2 PhononMaker with specific VASP input adjustments to guarantee high quality fit data. It can be used to run individual and customized phonopy workflows to generate MLIP fit data.

Rattled supercell structures#

There are several ways available in autoplex to rattle supercell structures, that are collected in the dft_random_gen_data flow (see diagram in the general tutorial. The size of the supercell is determined by the supercell_matrix, and there is the option of volume distortion, angle distortion or a combination of both provided by distort_type. The displacement of all atomic positions (“rattling”) is controlled by the parameter rattle_type, which uses the the ase rattle function (using a normal distribution of a certain standard deviation to draw the displacement value) by default and can be changed to Monte-Carlo rattling.

complete_flow = CompleteDFTvsMLBenchmarkWorkflow(rattle_type=0, # 0 = standard ase.Atoms.rattle(stddev)
                                                 distort_type=0, # only volume distortion
                                                 supercell_matrix=[[3, 0, 0], [0, 3, 0]],
                                                 volume_scale_factor_range=[0.95, 1.05],
                                                 n_structures=20).make(
    structure_list=structure_list, mp_ids=mpids, 
    benchmark_structures=benchmark_structure_list, benchmark_mp_ids=mpbenchmark)

The combination of parameters volume_scale_factor_range and n_structures will produce 21 (20 volume distorted + the undistorted supercell) supercells with a volume range of 95 to 105% of the original supercell. Alternatively, the parameter volume_custom_scale_factors can be used to set specific scale factors.

ℹ️ It is important to note that by using volume_custom_scale_factors the parameter n_structures is ignored and only one rattled supercell for each given factor is generated. If more supercells with the same volume scale are needed, this can be achieved by e.g.

scale_factors = [0.90, 0.95, 1.00, 1.05, 1.10]

complete_flow = CompleteDFTvsMLBenchmarkWorkflow(
    volume_custom_scale_factors=[val for val in scale_factors for _ in range(5)]).make(...)
                                  # will repeat each scale factor five times

Explicitly specifying volume_custom_scale_factors is useful if you don’t want evenly spaced intervals between scale factors as e.g., you want to sample around the minimum more closely.

More details and settings are given in the API reference.

Similar to the single-atom displaced supercells, you can run the complete autoplex workflow using only randomized structures by setting add_dft_phonon_struct to False.

complete_flow = CompleteDFTvsMLBenchmarkWorkflow(add_dft_phonon_struct=False).make(
    structure_list=structure_list, mp_ids=mpids, preprocessing_data=True,
    benchmark_structures=benchmark_structure_list, benchmark_mp_ids=mpbenchmark)

It can also be used to extend an already existing database in the same way as demonstrated above.

As a counterpart to the DFTPhononMaker for generating data, autoplex includes a RandomStructuresDataGenerator that can be used to construct customized randomized structures workflows. autoplex provides a variety of utility subroutines to further customize a workflow.

Adjust supercell settings#

You can adjust the supercell settings by passing a dictionary containing your specific supercell settings for each MP-ID to CompleteDFTvsMLBenchmarkWorkflow, e.g. like:

mp_id = "mp-22905"

supercell_settings = {
    mp_id: {
        "supercell_matrix": [[0, 2, 0], [0, 0, 2], [2, 0, 0]]
    },
    "min_length": 11,  
    "max_length": 25,
    "fallback_min_length": 9,
    "min_atoms": 100,
    "max_atoms": 200,
    "step_size": 1,
}

complete_flow = CompleteDFTvsMLBenchmarkWorkflow(
    ..., 
    supercell_settings=supercell_settings, 
    ...).make(...)

To keep the calculations consistent, this will adjust the settings of single-atom displaced and rattled supercells.

VASP settings#

This part will show you how you can adjust the different Makers for the VASP calculations in the workflow.

For the single-atom displaced as well as the rattled structures the autoplex TightDFTStaticMaker is used to set up the VASP calculation input and settings. PBEsol is the default GGA functional. For the VASP calculation of the isolated atoms’ energies, autoplex also provides its own IsoAtomStaticMaker, which settings you can further adjust. For the VASP geometry relaxation and static calculations of the unit cells as prerequisite calculations for generating the single-atom displaced as well as the rattled supercells, we rely on the atomate2 Makers StaticMaker, TightRelaxMaker in combination with the StaticSetGenerator VASP input set generator for this example.

from autoplex.auto.phonons.flows import CompleteDFTvsMLBenchmarkWorkflow
from autoplex.data.phonons.flows import IsoAtomStaticMaker, TightDFTStaticMaker
from atomate2.vasp.jobs.core import StaticMaker, TightRelaxMaker
from atomate2.vasp.sets.core import StaticSetGenerator

example_input_set = StaticSetGenerator(  # you can also define multiple input sets
    user_kpoints_settings={"grid_density": 1},
    user_incar_settings={
        "ALGO": "Normal",
        "IBRION": -1,
        "ISPIN": 1,
        "ISMEAR": 0,
         ...,         # set all INCAR tags you need
        "SIGMA": 0.05,
        "GGA": "PE",  # switches to PBE
         ...},
)
static_isolated_atom_maker = IsoAtomStaticMaker(
    name="isolated_atom_maker",
    input_set_generator=example_input_set,
)
displacement_maker = TightDFTStaticMaker(
    name="displacement_maker",
    input_set_generator=example_input_set,
)
rattled_bulk_relax_maker = TightRelaxMaker(
    name="bulk_rattled_maker",
    input_set_generator=example_input_set,
)
phonon_bulk_relax_maker = TightRelaxMaker(
    name="bulk_phonon_maker",
    input_set_generator=example_input_set,
)
phonon_static_energy_maker = StaticMaker(
    name="phonon_static_energy_maker",
    input_set_generator=example_input_set,
)

complete_flow = CompleteDFTvsMLBenchmarkWorkflow(
    displacement_maker=displacement_maker,  # one displacement maker for rattled and single-atom displaced supercells to keep VASP settings consistent
    phonon_bulk_relax_maker=phonon_bulk_relax_maker,
    phonon_static_energy_maker=phonon_static_energy_maker,
    rattled_bulk_relax_maker=rattled_bulk_relax_maker,
    isolated_atom_maker=static_isolated_atom_maker,).make(...)

Note, that for consistency of job handling, autoplex internally will override the jobs names to the autoplex defaults:

INFO Started executing jobs locally
INFO Starting job - reduce_supercell_size_job_mp-22905 
INFO Finished job - reduce_supercell_size_job_mp-22905 
INFO Starting job - rattled supercells_mp-22905 
INFO Finished job - rattled supercells_mp-22905 
INFO Starting job - dft tight relax_mp-22905 
INFO Finished job - dft tight relax_mp-22905 
INFO Starting job - generate_randomized_structures_mp-22905 
INFO Finished job - generate_randomized_structures_mp-22905 
INFO Starting job - run_phonon_displacements_mp-22905 
INFO Finished job - run_phonon_displacements_mp-22905 
INFO Starting job - dft rattle static 1/12_mp-22905 
INFO Finished job - dft rattle static 1/12_mp-22905
INFO Starting job - dft rattle static 2/12_mp-22905 
INFO Finished job - dft rattle static 2/12_mp-22905 
...
INFO Starting job - single-atom displaced supercells_mp-22905 
INFO Finished job - single-atom displaced supercells_mp-22905 
INFO Starting job - dft tight relax_mp-22905
INFO Finished job - dft tight relax_mp-22905 
INFO Starting job - dft static_mp-22905 
INFO Finished job - dft static_mp-22905 
INFO Starting job - generate_phonon_displacements_mp-22905 
INFO Finished job - generate_phonon_displacements_mp-22905 
INFO Starting job - run_phonon_displacements_mp-22905 
INFO Finished job - run_phonon_displacements_mp-22905 
INFO Starting job - dft phonon static 1/2_mp-22905 
INFO Finished job - dft phonon static 1/2_mp-22905 
INFO Starting job - dft phonon static 2/2_mp-22905 
INFO Finished job - dft phonon static 2/2_mp-22905
...
INFO Starting job - write_benchmark_metrics
INFO Finished job - write_benchmark_metrics
INFO Finished executing jobs locally