Dataset for 'Machine Learning Stability and Bandgaps of Lead-Free Perovskites for Photovoltaics'

Stanley, Jared C.; Gagliardi, Alessio

perovskite, machine learning, materials science

Organizations

MDF Open

Year

2023

Source Name

stanley_machine_learning_photovoltaics

DOI

10.18126/tu66-3hpl


Datasets used in the publication "Machine Learning Stability and Bandgaps of Lead-Free Perovskites for Photovoltaics" [doi:10.1002/adts.201900178]. All structures were relaxed with the following parameters using Quantumwise QATK 2017:

SG15-GGA norm-conserving (Vanderbilt) pseudopotentials used in an LCAO approach (200 Hartree cutoff)
2x1x2 cubic perovskite supercells, relaxed from cubic 11.4 Å x 5.7 Å x 11.4 Å structures (forces < 0.01 eV/Å)
300 K Fermi-Dirac smearing
6x12x6 Monkhorst-Pack k-point grid
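
For readers who want the parameters above in other units, the following is a minimal sketch of the arithmetic. The variable names are illustrative, the physical constants are standard CODATA values rather than anything taken from the dataset, and the site count assumes an ABX3 perovskite with 5 atoms per cubic unit cell, which the description does not state explicitly.

```python
# Unit conversions and simple geometry for the relaxation parameters listed above.
# Constants are CODATA values; they are not part of the dataset.
HARTREE_IN_EV = 27.211386245988
BOLTZMANN_EV_PER_K = 8.617333262e-5

cutoff_ev = 200 * HARTREE_IN_EV          # 200 Hartree cutoff expressed in eV
smearing_ev = 300 * BOLTZMANN_EV_PER_K   # k_B * T for the 300 K smearing, in eV

supercell = (2, 1, 2)                    # repetitions of the cubic cell
cell_lengths = (11.4, 5.7, 11.4)         # supercell edge lengths in Angstrom
kpoint_grid = (6, 12, 6)

# Edge length of the underlying cubic cell along each axis.
unit_cell_lengths = tuple(l / n for l, n in zip(cell_lengths, supercell))

# ASSUMPTION: ABX3 stoichiometry, i.e. 5 atoms per cubic unit cell.
sites_per_supercell = 5 * supercell[0] * supercell[1] * supercell[2]

# k-points times cell length is the same along every axis, so the grid
# samples reciprocal space with equal density in all three directions.
sampling = tuple(k * l for k, l in zip(kpoint_grid, cell_lengths))

total_kpoints = kpoint_grid[0] * kpoint_grid[1] * kpoint_grid[2]
```

The 6x12x6 grid follows the shape of the supercell: the short 5.7 Å axis receives twice as many k-points as the two 11.4 Å axes.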

Specifically, the included files are:

db_2.data: the actual database used for model building (JSON format)
lead_set.data: the "external" test set used to test predictive power on out-of-sample compounds (JSON format)
load_stanley_c.py: a Python script that parses the JSON files into a Python dictionary, including the structures (relaxed and unrelaxed) as ASE Atoms objects

The format of the data files is as follows (-1 generally denotes values not parsed from the raw data):

{
  "" : {
    "trajectory" : n/a,
    "energy" : total DFT energy in eV,
    "rstruc" : relaxed structure, 3-tuple: (cell-vectors, scaled_positions, elements),
    "gaps" : { "opt_gap", "ind_gap" } - both direct and indirect gap,
    "effective_mass" : n/a,
    "iterations" : number of relaxation steps,
    "calc" : some calculation metadata,
    "ustruc" : unrelaxed input structure,
  }
}

Missing ids relate to structures that were filtered out because the calculation did not converge. Code that works with a different representation of this data can be found at https://github.com/jstanai/Machine-Learning-Perovskite-Properties-for-Photovoltaics
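
As an illustration of the file format described above, here is a minimal sketch of how the data files might be read with the Python standard library. It is not the bundled load_stanley_c.py, whose contents are not shown here. The function names (load_database, none_if_missing, summarize, to_atoms) are hypothetical, and the sketch assumes that each file is a single JSON object, that "rstruc" is stored as a three-element list, and that "elements" holds chemical symbols. The ASE import is deferred so that the rest of the code runs without ASE installed.

```python
import json

MISSING = -1  # per the description, -1 generally marks values not parsed from the raw data


def load_database(path):
    """Read one data file, ASSUMING it is a single JSON object of entries."""
    with open(path) as handle:
        return json.load(handle)


def none_if_missing(value):
    """Map the -1 sentinel to None; leave every other value unchanged."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return None if is_number and value == MISSING else value


def summarize(entry):
    """Collect the scalar fields of one entry, with missing values as None."""
    gaps = entry.get("gaps")
    if not isinstance(gaps, dict):
        gaps = {}
    rstruc = entry.get("rstruc")
    has_structure = isinstance(rstruc, (list, tuple)) and len(rstruc) == 3
    return {
        "energy_eV": none_if_missing(entry.get("energy")),
        "opt_gap": none_if_missing(gaps.get("opt_gap")),
        "ind_gap": none_if_missing(gaps.get("ind_gap")),
        "iterations": none_if_missing(entry.get("iterations")),
        # ASSUMPTION: the third item of the 3-tuple lists one element per atom.
        "elements": list(rstruc[2]) if has_structure else None,
    }


def to_atoms(struc):
    """Build an ASE Atoms object from (cell-vectors, scaled_positions, elements)."""
    from ase import Atoms  # imported lazily; requires the ase package

    cell, scaled_positions, elements = struc
    return Atoms(symbols=elements, scaled_positions=scaled_positions,
                 cell=cell, pbc=True)
```

Under these assumptions, load_database("db_2.data") would return the full dictionary, summarize could be applied to each of its values, and to_atoms could be given either the "rstruc" or the "ustruc" field of an entry. The sentinel is only checked on scalar fields, so a legitimate coordinate of -1 inside a structure is never altered.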