Dataset associated with "Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing," a paper published in the proceedings of the Machine Learning in HPC Environments workshop at SC'21. The dataset contains the source code used to produce the paper's results, the output of every active learning run it reports, and the Jupyter notebooks used to create its figures.