Long-acting injectables (LAIs) are considered one of the most promising therapeutic strategies for the treatment of chronic diseases, as they can afford improved therapeutic efficacy, safety, and patient compliance. The use of polymer materials in such a drug formulation strategy can offer unparalleled diversity owing to the ability to synthesize materials with a wide range of properties. However, the interplay between multiple parameters, including the physicochemical properties of the drug and polymer, makes it very difficult to intuitively predict the performance of these systems. This necessitates the development and characterization of a wide array of formulation candidates through extensive and time-consuming in vitro experimentation. Machine learning (ML) is enabling leap-step advances in a number of fields, including drug discovery and materials science. Our study takes a critical step towards data-driven drug formulation development with an emphasis on long-acting injectables. A series of machine learning algorithms were trained and refined for accurate prediction of experimental drug release profiles using the dataset described here.
The dataset was constructed from previously published studies by our research group and other research groups. The studies performed by our group include spherical and cylindrical polymeric LAIs. Data from external sources were identified using the Web of Science search engine and the keyword combination “polymeric microparticle” and “drug delivery”. Information related to the preparation, final composition, and release kinetics of drug from LAIs was collected. The release kinetics were primarily extracted from figures of in vitro drug release profiles using the “GetData Graph Digitizer” application. The final dataset contained 181 drug release profiles for 43 unique drug-polymer combinations. In total, this comprised 3783 individual fractional release measurements. The initially collected dataset was composed of a table of drug and polymer names, physicochemical properties of the formulation, and fractional drug release values at various timepoints. To construct and train ML models from these data, the various elements must be described using machine-readable descriptors, which were generated using RDKit. The polymers and LAI formulations were described exclusively using information reported in the relevant published articles. These descriptors included: polymer molecular weight (Polymer_MW), lactide-to-glycolide ratio (LA/GA; for non-PLGA systems this was set to zero), molecular crosslinking ratio of polymers (CL_Ratio; for non-cross-linked systems this was set to zero), initial drug-to-polymer ratio (Initial D/M ratio), drug loading capacity (DLC), surface area-to-volume (SA-V) ratio for the LAI system, fractional drug release at 6 h (T=0.25), fractional drug release at 12 h (T=0.5), fractional drug release at 24 h (T=1.0), and the percent of surfactant present in the release media (SE; where no surfactant was present in the release media, this was set to zero).
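The SA-V feature listed above can be illustrated with a minimal sketch. It assumes ideal geometries (a perfect sphere, and a closed cylinder whose end caps count towards the surface area) and dimensions given in one consistent length unit. The function names are hypothetical and are not taken from the study's code, which may compute SA-V differently.

```python
import math


def sphere_sa_v(diameter):
    """SA-V ratio of an ideal sphere: (pi * d^2) / (pi * d^3 / 6) = 6 / d."""
    if diameter <= 0:
        raise ValueError("diameter must be positive")
    return 6.0 / diameter


def cylinder_sa_v(diameter, length):
    """SA-V ratio of an ideal closed cylinder (assumption: end caps included)."""
    if diameter <= 0 or length <= 0:
        raise ValueError("diameter and length must be positive")
    radius = diameter / 2.0
    surface = 2.0 * math.pi * radius ** 2 + 2.0 * math.pi * radius * length
    volume = math.pi * radius ** 2 * length
    return surface / volume


# Hypothetical dimensions: a microparticle and a rod-shaped implant, both in mm.
print(sphere_sa_v(0.05))
print(cylinder_sa_v(1.0, 10.0))
```

Because both shapes reduce to a single number in inverse length units, a spherical microparticle and a cylindrical implant can share one feature column, provided the same length unit is used for every formulation.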
With the exception of SA-V, T=0.25, T=0.5, and T=1.0, the 17 input features were either extracted from the original publications or calculated using the RDKit package. SA-V was constructed and implemented for this study because it conveys information related to the size and shape of the LAI system. This enables the inclusion of both spherical and cylindrical LAIs in one model. Where the initial fractional drug release values (i.e., T=0.25, T=0.5, and T=1.0) were not available from the previously published studies, they were imputed using best-fit polynomial curves that range from T = 0 to T = 2 days.
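The imputation of missing early timepoints can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the polynomial degree (2 here), the clamping of predictions to the range 0 to 1, and the function names are all hypothetical, and the fit is taken to use only the measurements that fall between T = 0 and T = 2 days. The least-squares fit is written in plain Python to keep the example self-contained; in practice a routine such as numpy.polyfit would be the usual choice.

```python
def fit_polynomial(ts, ys, degree):
    """Least-squares coefficients c[0] + c[1]*t + ... via the normal equations."""
    n = degree + 1
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs


def impute_early_release(times_days, fractions, degree=2,
                         targets=(0.25, 0.5, 1.0)):
    """Estimate fractional release at the target times from a 0-2 day fit."""
    window = [(t, y) for t, y in zip(times_days, fractions) if 0.0 <= t <= 2.0]
    if len({t for t, _ in window}) <= degree:
        raise ValueError("not enough distinct timepoints within 0-2 days")
    ts, ys = zip(*window)
    coeffs = fit_polynomial(ts, ys, degree)
    imputed = {}
    for t in targets:
        value = sum(c * t ** k for k, c in enumerate(coeffs))
        # Assumption: clamp so the result remains a valid release fraction.
        imputed[t] = min(1.0, max(0.0, value))
    return imputed


# Hypothetical profile with no measurements at 6 h, 12 h, or 24 h.
times = [0.0, 0.75, 1.5, 2.0, 5.0]
release = [0.0, 0.24375, 0.375, 0.4, 0.9]
print(impute_early_release(times, release))
```

The measurement at day 5 lies outside the fitting window and is ignored, so the early-release estimates depend only on the initial portion of the profile.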
The code and results that support the findings of our study are available at the Aspuru-Guzik Group’s GitHub page (https://github.com/aspuru-guzik-group/long-acting-injectables) and in the preprint of the related manuscript available on ChemRxiv (https://doi.org/10.26434/chemrxiv-2021-mxrxw-v2).