Datasets used in the publication "Machine Learning Stability and Bandgaps of Lead-Free Perovskites for Photovoltaics" [doi:10.1002/adts.201900178].
All structures were relaxed with the following parameters using Quantumwise QATK 2017:
- SG15-GGA norm-conserving (Vanderbilt) pseudopotentials employed in an LCAO approach (200 Hartree cutoff)
- 2x1x2 cubic perovskite supercells, relaxed from cubic 11.4 Å x 5.7 Å x 11.4 Å structures (forces < 0.01 eV/Å)
- Fermi-Dirac smearing at 300 K
- a 6x12x6 k-point grid (Monkhorst-Pack)
Specifically, the included files are:
db_2.data: the database used for model building (JSON format)
lead_set.data: the "external" test set used to test predictive power on out-of-sample compounds (JSON format)
load_stanley_c.py: a Python script that parses the JSON data files into a Python dictionary, including the relaxed and unrelaxed structures as ASE Atoms objects
The format of the data files is as follows (-1 generally denotes a value not parsed from the raw data):
{
"<id>" : {
"trajectory" : n/a,
"energy" : total DFT energy in eV,
"rstruc" : relaxed structure, 3-tuple: (cell-vectors, scaled_positions, elements),
"gaps" : { "opt_gap", "ind_gap" } - the direct and the indirect gap,
"effective_mass" : n/a,
"iterations" : number of relaxation steps,
"calc" : some calculation metadata,
"ustruc" : unrelaxed input structure,
}
}
Missing ids correspond to structures that were filtered out because the calculation did not converge.
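The record layout described above can be read with the Python standard library alone. The following is a minimal sketch, not the bundled load_stanley_c.py: the helper names (load_records, split_structure, to_cartesian, gap_or_none) are hypothetical, the demo record is synthetic and not taken from the dataset, and the sketch assumes that the cell vectors are stored as rows and that tuples appear as JSON lists.

```python
import json

PLACEHOLDER = -1  # per the format description, -1 marks values not parsed from the raw data


def load_records(path):
    """Read one of the JSON data files into a dict keyed by structure id."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def split_structure(struc):
    """Unpack the 3-tuple (cell-vectors, scaled_positions, elements)."""
    cell, scaled_positions, elements = struc
    return cell, scaled_positions, elements


def to_cartesian(cell, scaled_positions):
    """Convert scaled (fractional) positions to Cartesian ones.

    Assumes each row of `cell` is one lattice vector.
    """
    return [
        [sum(frac[i] * cell[i][k] for i in range(3)) for k in range(3)]
        for frac in scaled_positions
    ]


def gap_or_none(record, key):
    """Return a gap value, or None if it is absent or the -1 placeholder."""
    gaps = record.get("gaps")
    if not isinstance(gaps, dict):
        return None
    value = gaps.get(key, PLACEHOLDER)
    return None if value == PLACEHOLDER else value


# Synthetic record for illustration only; the values are NOT from the dataset.
demo = json.loads("""
{
  "0": {
    "energy": -10.0,
    "rstruc": [[[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]],
               [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
               ["Cs", "Sn"]],
    "gaps": {"opt_gap": 1.5, "ind_gap": -1}
  }
}
""")

for struct_id, record in demo.items():
    cell, scaled, elements = split_structure(record["rstruc"])
    cartesian = to_cartesian(cell, scaled)
    print(struct_id, elements, cartesian, gap_or_none(record, "opt_gap"))
```

For the real files, load_records("db_2.data") would replace the synthetic demo dictionary. Because ids of non-converged structures are absent, iterating over the keys is safer than looping over a contiguous id range. If ASE is installed and the elements entry holds chemical symbols (an assumption), the same three fields could be passed to ase.Atoms(symbols=elements, scaled_positions=scaled, cell=cell, pbc=True); the bundled load_stanley_c.py remains the reference for that conversion.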
Code that works with a different representation of this data is available at
https://github.com/jstanai/Machine-Learning-Perovskite-Properties-for-Photovoltaics