This dataset comprises multi-fidelity PBE and HSE-PBE+SOC data, which were used to train the RGF ML model. This surrogate model was then used to make predictions for, and to screen, a large set of 151,140 novel halide perovskite alloy compositions. A second dataset contains the 3,043 screened halide perovskites found suitable for photocatalytic water splitting.