Name | File Type | Size | Last Modified
COVID19_twitter_full_dataset.csv.zip | application/zip | 4.8 GB | 09/11/2021 09:27:AM
COVID_Twitter_database_paper.pdf | application/pdf | 1.4 MB | 09/11/2021 08:43:AM
Dataset-Terms of Use.pdf | application/pdf | 318.8 KB | 09/11/2021 08:09:AM
readme.txt | text/plain | 3.2 KB | 09/11/2021 06:39:AM
tweetid_userid_keyword_sentiments_emotions_Argentina.csv.zip | application/zip | 1.4 MB | 09/11/2021 06:16:AM
tweetid_userid_keyword_sentiments_emotions_Australia.csv.zip | application/zip | 76.3 MB | 09/11/2021 06:14:AM
tweetid_userid_keyword_sentiments_emotions_Brazil.csv.zip | application/zip | 6.2 MB | 09/11/2021 05:08:AM
tweetid_userid_keyword_sentiments_emotions_Canada.csv.zip | application/zip | 161 MB | 09/11/2021 06:18:AM
tweetid_userid_keyword_sentiments_emotions_Colombia.csv.zip | application/zip | 1.3 MB | 09/11/2021 06:18:AM
tweetid_userid_keyword_sentiments_emotions_Denmark.csv.zip | application/zip | 2.6 MB | 09/11/2021 06:19:AM

Citation: 

Gupta, Raj, Vishwanath, Ajay, and Yang, Yinping. COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes: Twitter COVID dataset Sep2021. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-11-04. https://doi.org/10.3886/E120321V11-101600

To view the citation for the overall project, see http://doi.org/10.3886/E120321V11.

Project Description

Summary: This project aims to present a large dataset for researchers to discover public conversation on Twitter surrounding the COVID-19 pandemic. From 28 January 2020 to 1 September 2021, we collected over 198 million Twitter posts from more than 25 million unique users using four keywords: “corona”, “wuhan”, “nCov” and “covid”. Leveraging topic modeling techniques and pre-trained machine learning-based emotion analytic algorithms, we labeled each tweet with seventeen semantic attributes, including a) ten binary attributes indicating the tweet’s relevance or irrelevance to the top ten detected topics, b) five quantitative emotion attributes indicating the degree of intensity of the valence or sentiment (from 0: very negative to 1: very positive), and the degree of intensity of fear, anger, happiness and sadness emotions (from 0: not at all to 1: extremely intense), and c) two qualitative attributes indicating the sentiment category (very negative, negative, neutral or mixed, positive, very positive) and the dominant emotion category (fear, anger, happiness, sadness, no specific emotion) the tweet is mainly expressing.
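The seventeen-attribute labeling scheme described above can be checked row by row once a CSV is unpacked. The following is a minimal sketch only: every column name in it (topic_1 ... topic_10, valence_intensity, sentiment_category, and so on) is a hypothetical placeholder, not the dataset's actual header, which should be taken from readme.txt. Only the value ranges and category labels come from the summary.

```python
# Hypothetical column names (assumptions for illustration); see readme.txt
# for the real headers. Value ranges and labels follow the project summary.
TOPIC_COLUMNS = [f"topic_{i}" for i in range(1, 11)]  # ten binary topic flags
INTENSITY_COLUMNS = [
    "valence_intensity",    # 0: very negative .. 1: very positive
    "fear_intensity",       # 0: not at all .. 1: extremely intense
    "anger_intensity",
    "happiness_intensity",
    "sadness_intensity",
]
SENTIMENT_CATEGORIES = {
    "very negative", "negative", "neutral or mixed", "positive", "very positive",
}
EMOTION_CATEGORIES = {
    "fear", "anger", "happiness", "sadness", "no specific emotion",
}


def validate_row(row):
    """Return a list of problems found in one labeled tweet record.

    `row` is a dict of strings, e.g. as produced by csv.DictReader.
    """
    problems = []
    # a) ten binary topic-relevance attributes
    for col in TOPIC_COLUMNS:
        if row.get(col) not in {"0", "1"}:
            problems.append(f"{col}: expected 0 or 1, got {row.get(col)!r}")
    # b) five intensity attributes, each within [0, 1]
    for col in INTENSITY_COLUMNS:
        try:
            value = float(row.get(col))
        except (TypeError, ValueError):
            problems.append(f"{col}: not a number ({row.get(col)!r})")
            continue
        if not 0.0 <= value <= 1.0:
            problems.append(f"{col}: {value} is outside [0, 1]")
    # c) two categorical attributes
    if row.get("sentiment_category") not in SENTIMENT_CATEGORIES:
        problems.append(f"sentiment_category: {row.get('sentiment_category')!r}")
    if row.get("emotion_category") not in EMOTION_CATEGORIES:
        problems.append(f"emotion_category: {row.get('emotion_category')!r}")
    return problems
```

In practice the function could be applied to each row yielded by csv.DictReader over an extracted country file, after replacing the placeholder names with the real headers.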

Scope of Project

Subject Terms: COVID-19; pandemic; twitter; social media; sentiment analysis; emotion recognition
Geographic Coverage: Global
Universe: Twitter posts
Data Type(s): other; program source code; text
Collection Notes: The latest version has data updated to 1 September 2021, including additional CSV downloads for 29 countries.


This material is distributed exactly as it arrived from the data depositor. ICPSR has not checked or processed this material. Users should consult the investigator(s) if further information is desired.