It is a truism that nothing in life is free, and the same principle applies to materials properties: generally, the more accurate a method for computing or measuring a property, the more expensive it is. That is why, despite maturing software and exponentially growing computing power, today's large databases of materials properties are still primarily built on cheap but less accurate semi-local density functional theory (DFT) functionals. Datasets based on higher-accuracy DFT methods and experimental measurements tend to be orders of magnitude smaller and less diverse in coverage. This scarcity and heterogeneity of high-quality data is a critical bottleneck in the development of machine learning (ML) models for accurate materials property predictions.
Our idea to address this fundamental trade-off is extraordinarily simple: by combining data from multiple fidelities to train a single model, we can leverage the large low-fidelity dataset to help the model learn better latent representations of materials, which in turn lead to more accurate predictions on the small high-fidelity datasets. Our model of choice is the MatErials Graph Network (MEGNet) framework, a deep learning approach that naturally represents the atoms and bonds in a material as the nodes and edges of a mathematical graph. Information flows between connected nodes and edges via graph convolutional layers, mimicking the atomic interactions in a real-world material. Crucially, the MEGNet architecture incorporates a global state input, which provides a conduit for encoding fidelity information.
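To make the fidelity-encoding idea concrete, here is a minimal sketch of one simple scheme: tagging every training sample with a vector that identifies which method produced its label, so that a single model can be trained on all fidelities at once. The names `FIDELITIES` and `encode_fidelity` are our own illustrative choices, not part of the MEGNet codebase, and a one-hot vector stands in for the learned fidelity embedding used in practice.

```python
import numpy as np

# The five band-gap fidelities discussed in this post.
FIDELITIES = ["PBE", "GLLB-SC", "HSE", "SCAN", "Expt"]

def encode_fidelity(name: str) -> np.ndarray:
    """One-hot state vector identifying which method produced a band gap label.

    A single multi-fidelity model receives this vector as its global state
    input alongside the material graph, so it can learn fidelity-specific
    corrections while sharing latent representations across all datasets.
    """
    vec = np.zeros(len(FIDELITIES))
    vec[FIDELITIES.index(name)] = 1.0
    return vec

# The same model sees both samples; only the state input differs.
state_pbe = encode_fidelity("PBE")
state_expt = encode_fidelity("Expt")
```

At prediction time, requesting, say, an experimental-fidelity band gap is just a matter of passing the corresponding state vector with the material graph.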
The effectiveness of our approach is summarized in the figure below. The low-fidelity dataset comprises more than 50,000 band gaps calculated using the standard semi-local PBE functional. The high-fidelity datasets are ~1,000-3,000 band gaps each, computed using the more accurate GLLB-SC, HSE, and SCAN functionals or obtained from experimental measurements. The multi-fidelity (2-fi, 4-fi, and 5-fi) models achieve significantly lower mean absolute errors (~20-40% reduction) than the single-fidelity (1-fi) models.
It was serendipity that led us to one of the other potentially transformative features of the MEGNet approach. In the MEGNet framework, each element's atomic attributes are represented as a learned length-16 embedding vector, and the correlations between the embedding vectors for different elements reproduce the chemical trends in the periodic table. We found that interpolating these learned embedding vectors provides a way to model disordered materials, i.e., materials with sites occupied by more than one element and/or vacancies. While the bulk of computational and machine learning studies have focused on ordered materials, disordered compounds actually form the majority of known materials. Using this approach, multi-fidelity graph network models can reproduce the trends in the band gaps of disordered materials with reasonable accuracy (Figure 3).
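The embedding-interpolation idea can be sketched in a few lines: a site shared by several elements is represented by the occupancy-weighted average of their embedding vectors. This is a hedged illustration, not the paper's implementation; the random 16-dimensional vectors below are stand-ins for the embeddings a trained MEGNet model would supply, and `disordered_site_embedding` is a hypothetical helper name.

```python
import numpy as np

# Stand-in element embeddings (length 16, as in MEGNet); in practice these
# would be read from a trained model's atomic embedding layer.
rng = np.random.default_rng(0)
embeddings = {el: rng.normal(size=16) for el in ("Ga", "Al")}

def disordered_site_embedding(occupancy: dict) -> np.ndarray:
    """Linearly interpolate element embeddings by fractional site occupancy.

    `occupancy` maps element symbols to fractions, e.g. {"Ga": 0.5, "Al": 0.5}
    for a 50/50 mixed site; fractions summing to less than 1 would leave
    room for vacancies.
    """
    return sum(frac * embeddings[el] for el, frac in occupancy.items())

# A Ga0.5Al0.5 mixed site, as might occur in a GaN-AlN alloy:
site_vec = disordered_site_embedding({"Ga": 0.5, "Al": 0.5})
```

Because the interpolated vector lives in the same space as the ordered-element embeddings, it can be fed through the rest of the graph network unchanged, which is what makes this route to disordered materials so cheap.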
If you are interested in more details, please refer to our paper “Learning Properties of Ordered and Disordered Materials from Multi-fidelity Data”, published in Nature Computational Science: https://doi.org/10.1038/s43588-020-00002-x
Chen, C.; Ye, W.; Zuo, Y.; Zheng, C.; Ong, S. P. Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. Chemistry of Materials 2019, 31 (9), 3564-3572. doi:10.1021/acs.chemmater.9b01294

Chen, C.; Zuo, Y.; Ye, W.; Li, X. G.; Ong, S. P. Learning Properties of Ordered and Disordered Materials from Multi-fidelity Data. Nature Computational Science 2020. doi:10.1038/s43588-020-00002-x