Twitter launches a COVID-19 data set of tweets for approved developers and researchers
Twitter is making it possible for developers and researchers to study the public conversation around COVID-19 in real time with an update to its API platform. The company is introducing a new COVID-19 stream endpoint to those participating in Twitter Developer Labs — a program that offers access to new API endpoints and other features ahead of their public release. The new COVID-19 endpoint will allow approved developers to access COVID-19 and coronavirus-related tweets across languages, resulting in a data set that will include tens of millions of tweets daily, Twitter says.
The data can be used to research a range of topics related to the coronavirus pandemic, including the spread of the disease, the spread of misinformation and crisis management within communities.
Developers may also use the new data set to build machine learning and data tools to help the scientific community answer key questions about COVID-19, Twitter notes.
The company itself will determine which tweets qualify for inclusion in this data set based on which words are used in the tweets — like “COVID-19” or “coronavirus,” for example. It also will pull tweets that use common coronavirus hashtags, which tend to be language-agnostic. These, by the way, are the same keywords that Twitter uses for its existing COVID-19 topic, which is powered by a Tweet annotation.
Twitter will also filter this data stream to exclude spammy and low-quality content.
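For readers who want a concrete picture of keyword-based selection, here is a minimal sketch in Python. It is not Twitter's implementation: the term lists below are hypothetical stand-ins (the article names only "COVID-19" and "coronavirus" as examples), the function name is invented for illustration, and the spam and low-quality filtering step is not modeled at all.

```python
import re

# Hypothetical term lists for illustration only. Twitter's actual keyword
# and hashtag lists are not published in this article.
KEYWORDS = ("covid-19", "coronavirus")
HASHTAGS = ("#covid19", "#coronavirus")

# Matches a "#" followed by one or more word characters.
_HASHTAG_RE = re.compile(r"#\w+")


def matches_covid_terms(text: str) -> bool:
    """Return True if the text contains an assumed keyword or hashtag."""
    lowered = text.casefold()
    # Plain substring match on the keyword list.
    if any(keyword in lowered for keyword in KEYWORDS):
        return True
    # Exact match on extracted hashtags, which works regardless of the
    # language the rest of the tweet is written in.
    tags = set(_HASHTAG_RE.findall(lowered))
    return any(hashtag in tags for hashtag in HASHTAGS)


if __name__ == "__main__":
    samples = [
        "New COVID-19 guidance was published today",
        "Quédate en casa #COVID19",
        "Nice weather today",
    ]
    for sample in samples:
        print(matches_covid_terms(sample), sample)
```

Matching on hashtags as whole tokens, rather than as substrings, is one way to reflect the point that hashtags tend to carry across languages while the surrounding text does not.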
While access to the endpoint will be free, Twitter will hand-select which developers and researchers are granted permission to use it. Applicants will have to share their project plan with Twitter and detail both their experience working with big data and the resources they have available to process such a data set.
“Given the expertise and computational resources necessary to handle this data, and recognizing the sensitivity of it, we’ve created a dedicated application to access this endpoint and plan to carefully review access requests to ensure they support the public good,” notes Twitter in an announcement. “We also encourage applicants to describe in detail the safeguards they intend to implement to protect the privacy and safety of people represented in these data, including applicable institutional reviews and ethics screenings,” it says.
Twitter says it will prioritize processing applications from researchers and developers with established expertise and resources.
The application and endpoint are launching today. No developers or researchers had early access.
Beyond applying for access to the new endpoint, developers will need an approved developer account and must adhere to the terms of Twitter’s Developer Agreement and Policy, which provides guidance about restricted use cases relevant to projects analyzing health-related topics. To ensure the data is kept in compliance, approved developers will also gain access to a new compliance stream endpoint.
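As a rough illustration of what keeping a stored data set in compliance can involve, the Python sketch below removes tweets from a local store when a deletion event arrives. The event format here is an assumption made for the example (one JSON object per line, with a hypothetical `delete.tweet_id` field); the article does not describe the compliance stream's payload, and the function name is invented.

```python
import json
from typing import Dict, Iterable


def apply_compliance_events(store: Dict[str, dict], lines: Iterable[str]) -> int:
    """Remove deleted tweets from `store`; return how many were removed.

    Assumes each non-empty line is a JSON object, and that a deletion looks
    like {"delete": {"tweet_id": "<id>"}}. This shape is hypothetical.
    """
    removed = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, such as keep-alive signals
        event = json.loads(line)
        delete = event.get("delete")
        if delete is None:
            continue  # ignore event types this sketch does not handle
        tweet_id = delete["tweet_id"]
        if store.pop(tweet_id, None) is not None:
            removed += 1
    return removed


if __name__ == "__main__":
    stored = {"1": {"text": "first"}, "2": {"text": "second"}}
    events = ['{"delete": {"tweet_id": "1"}}', "", '{"delete": {"tweet_id": "9"}}']
    print(apply_compliance_events(stored, events), sorted(stored))
```

A real consumer would read these lines from the network and would likely need to handle more event types than deletions; this sketch covers only the simplest case.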
The new endpoint is one of several efforts Twitter has made since the coronavirus outbreak began, focused on connecting people with information about the pandemic. Across its platform, it introduced changes to make COVID-19 facts and reliable health information more accessible. It also updated its ads policy, partnered with relief organizations and matched fundraising donations, among other things.
“Public conversation can help the world learn faster, solve problems better and realize we’re all in this together,” said Twitter CEO Jack Dorsey in a statement, taken from a recent interview, that Twitter shared today. “Facing a devastating global pandemic really brings that, and Twitter’s role, to light,” he added.