The goal of the Open Access COVID-19 Drug Discovery Project is to engage physical and medical researchers and the machine learning (ML) community to advance data-centric approaches to SARS-CoV-2 drug discovery. ML-based approaches offer promise in drug discovery because ML models are fast to apply and can therefore screen large databases for the most promising candidates for further study.
While experimental and physics-based computational methods for studying new drugs are essential to this effort, they may be limited in the number of candidates they can assess per unit time. With over 12 million commercially available molecules, fast screening methods can provide a valuable way to downselect from such large sets of candidates. This is where ML comes in: trained on the data generated in rigorous experimental and computational studies, ML models can learn to identify which molecular features are most important and apply these design principles rapidly to millions of candidate therapies.
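As a rough illustration of the train-then-screen idea described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two-number feature vectors, the activity scores, the molecule names, and the plain linear model fitted by gradient descent. It is not the model or the data used on the platform, which this page does not specify.

```python
import heapq

def fit_linear(X, y, lr=0.1, epochs=2000):
    """Fit y ~ w.x + b by batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, features):
    """Predicted score for one candidate's feature vector."""
    return sum(wj * xj for wj, xj in zip(w, features)) + b

def screen(w, b, candidates, top_k):
    """Score every (name, features) candidate and keep the top_k highest."""
    scored = ((predict(w, b, feats), name) for name, feats in candidates)
    return heapq.nlargest(top_k, scored)

# Hypothetical training set: feature vectors with made-up measured scores.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_y = [0.5, 2.5, -0.5, 1.5]
w, b = fit_linear(train_X, train_y)

# Hypothetical candidate library; a real one would hold millions of entries.
library = [
    ("mol_A", [0.9, 0.1]),
    ("mol_B", [0.2, 0.8]),
    ("mol_C", [0.5, 0.5]),
    ("mol_D", [1.0, 0.0]),
]
top = screen(w, b, library, top_k=2)
for score, name in top:
    print(f"{name}: predicted score {score:.2f}")
```

The fitted weights play the role of the "most important molecular features": a large positive weight marks a feature that raises the predicted score. Scoring a candidate costs only a few arithmetic operations, which is why this kind of screening scales to very large libraries.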
Through this project, we will aggregate and provide open access to relevant COVID-19 datasets, models, and screening results to the broader scientific community, while making every step of the modeling process transparent and open to criticism.
This is an open-access version of our materials informatics software platform, which we are actively populating with relevant datasets, trained models, and candidate screening results. Currently, only users with an Aionics account can tune the models and designs on-platform, while guests can view the results or download the data at any step in the pipeline. Perhaps you want to download the datasets we’ve compiled from various literature sources, or the entire set of molecular features for a given dataset for your own modeling, or you just want to skip to the end and download the list of the most promising candidates discovered in the screening process. You can do all of these on the read-only platform. If you are interested in gaining greater access to the resources presented here, please contact us.
Any and all data presented here are available for your use. Please cite this resource as:
"Open Access COVID-19 Drug Discovery Project," Aionics Technologies, a business unit of Rho AI, accessed < date >, https://covid-19.aionics.io
We aim for this platform to enable the community to share and inspect the latest relevant datasets and ML predictions. If you find it valuable, we encourage you to take and use this data. Additionally, you can help us by: providing domain expertise on this problem to help us improve our approaches; letting us know if we’re missing some important data sources; providing feedback on the best sets of candidate drugs to consider in the screening steps; letting us know if you discover a better ML model for these datasets than those currently implemented here (we may implement your model and credit you!); or generally providing any other feedback about how to make this resource more valuable.
We are a collaboration between Aionics Technologies, Rho AI, and the research group of Prof. Evan Reed at Stanford University. We have experience in ML modeling for materials and chemicals, largely for battery applications, but are repurposing the Aionics platform to help in the search for solutions to the COVID-19 pandemic.
We want to hear from you! Please email Austin Sendek at firstname.lastname@example.org with any questions or feedback.