MDDB provides information on symmetry/phase labels (SPL), sample descriptors (DSC), material properties (PRO), material applications (APL), synthesis methods (SMT), and characterization methods (CMT) for inorganic materials. The data are extracted automatically from more than 10 million English-language, materials-related publications using ScholarBERT, a large language model for science, fine-tuned on the Solid State Materials dataset.
If you use the data, please cite the following article:
Hong, Zhi, Aswathy Ajith, Gregory Pauloski, Eamon Duede, Kyle Chard, and Ian Foster. "The Diminishing Returns of Masked Language Models to Science." arXiv preprint arXiv:2205.11342 (2023).