AWS COVID-19 data lake makes regularly updated and curated COVID-19 datasets available to anyone with access to an AWS account

Amazon Web Services (AWS) has created a public AWS COVID-19 data lake, a centralized repository of datasets related to the spread of the novel coronavirus and associated illnesses.

AWS said on April 8 that it was working with partners to make the growing collection of COVID-19 datasets freely available and to keep it up to date. AWS has seeded the data lake with COVID-19 case tracking data from Johns Hopkins and the New York Times, hospital bed availability from Definitive Healthcare, and more than 45,000 research articles about COVID-19 and related coronaviruses from the Allen Institute for AI. AWS will regularly add more datasets as they become publicly available.

The AWS COVID-19 data lake lets experimenters quickly run analyses on data in place without having to spend time extracting and wrangling data from all available data sources. Tools from AWS or third parties can be used to perform trend and question/answer analyses, execute keyword searches, build machine learning models, or run custom analyses to meet specific needs. Users can choose to work with the public lake data, combine it with their own data, or subscribe to source datasets via AWS Data Exchange.

AWS envisions that local health authorities could build dashboards to track infections and collaborate to deploy vital resources such as ventilators or hospital beds. Epidemiologists could complement their own datasets and models to generate forecasts of trends and hotspots. In its April 8 bulletin, AWS provides an example of how to use the AWS COVID-19 data lake for analysis. To make use of the data lake, you must have access to an AWS account and permissions to create an AWS CloudFormation stack and AWS Glue resources.
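
For readers who want a sense of what querying the lake in place might look like, here is a minimal Python sketch. It is not the example from the AWS bulletin. It assumes the Glue resources have already been created and that Amazon Athena, which can query tables registered in the AWS Glue Data Catalog, is used to run the SQL. The function names, the database and table names passed in, and the S3 output location are all hypothetical placeholders supplied by the caller.

```python
"""Sketch: preview a table in the data lake through Amazon Athena.

Assumptions (not from the AWS bulletin): the Glue database and table
already exist, and the caller supplies their names and an S3 location
where Athena may write query results.
"""


def build_preview_query(database: str, table: str, limit: int = 10) -> str:
    """Return SQL that selects a small sample of rows from one table."""
    for name in (database, table):
        # Accept only letters, digits, hyphens, and underscores so the
        # names can be safely wrapped in double quotes.
        if not name.replace("-", "").replace("_", "").isalnum():
            raise ValueError(f"unexpected identifier: {name!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'


def submit_query(sql: str, database: str, output_location: str) -> str:
    """Submit the SQL to Athena and return the query execution ID.

    Requires the boto3 SDK and AWS credentials with Athena permissions.
    """
    import boto3  # imported lazily so the query builder works without the SDK

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

A caller would first build the statement, for instance `build_preview_query("my_database", "my_cases_table")`, then pass it to `submit_query` together with an S3 path such as `s3://my-results-bucket/athena/`. Both names here are placeholders, not actual data lake identifiers.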

