Using social media data to assess the impact of COVID-19 on mental health in China

Yongjian Zhu; Liqing Cao; Jingui Xie; Yugang Yu; Anfan Chen; Fengming Huang

doi:10.1017/S0033291721001598

Published online by Cambridge University Press: 20 April 2021

Yongjian Zhu: School of Management, University of Science and Technology of China, Hefei, China
Liqing Cao*: The First Affiliated Hospital of USTC, University of Science and Technology of China, Hefei, China
Jingui Xie: School of Management, Technical University of Munich, Heilbronn, Germany
Yugang Yu: School of Management, University of Science and Technology of China, Hefei, China
Anfan Chen: School of Humanity and Social Science, University of Science and Technology of China, Hefei, China
Fengming Huang: The First Affiliated Hospital of USTC, University of Science and Technology of China, Hefei, China
*Author for correspondence: Liqing Cao, E-mail: caoliqing@ustc.edu.cn

Abstract

Background

The outbreak and rapid spread of coronavirus disease 2019 (COVID-19) not only had an adverse impact on physical health but also brought about mental health problems among the public.

Methods

To assess the causal impact of COVID-19 on psychological changes in China, we constructed a city-level panel data set based on the sentiment expressed in 13 million geotagged tweets on Sina Weibo, China's largest microblog platform.

Results

Applying a difference-in-differences approach, we found a significant deterioration in mental health status after the occurrence of COVID-19. We also observed that this psychological effect faded out over time during our study period and was more pronounced among women, teenagers and older adults. The mental health impact was more likely to be observed in cities with low levels of initial mental health status, economic development, medical resources and social security.

Conclusions

Our findings may aid understanding of the mental health impact of COVID-19 and yield useful insights into how to design effective psychological interventions during sudden public health events of this kind.

Keywords

COVID-19; difference-in-differences; mental health; sentiment analysis; social media

Type: Original Article
Information: Psychological Medicine, Volume 53, Issue 2, January 2023, pp. 388-395

DOI: https://doi.org/10.1017/S0033291721001598
Copyright: Copyright © The Author(s), 2021. Published by Cambridge University Press

Introduction

The epidemic of coronavirus disease 2019 (COVID-19) has become a severe public health crisis (Sohrabi et al., 2020). In addition to its adverse impact on physical health, the outbreak and rapid spread of COVID-19 have also brought about mental health problems among the public, such as anxiety and depression (Holmes et al., 2020; Liu et al., 2020; Wang et al., 2020a). To capture psychological problems during the COVID-19 epidemic, ongoing studies widely use online questionnaires and surveys (Gao et al., 2020; Hao et al., 2020; Rajkumar, 2020; Wang et al., 2020b, 2020c). Researchers detect the symptoms of mental illness and identify risk factors by asking participants to answer well-designed questions and report their characteristics. The challenge of these traditional methods is that they make it difficult to monitor mental health in real time and to understand its dynamic changes (Areán, Ly, & Andersson, 2016; Gruebner et al., 2017). The large-scale, real-time data generated by the widespread use of social media provide a way to overcome these problems. By applying natural language processing (NLP), the sentiment expressed in tweets posted on online social media platforms can be extracted from the text (Conway & O'Connor, 2016; Gohil, Vuik, & Darzi, 2018). This expressed sentiment is an effective indicator of psychological response and has been increasingly used to measure mental health status (Gruebner et al., 2016; Liu, Zhu, Yu, Rasin, & Young, 2017; Wongkoblap, Vadillo, & Curcin, 2017).

In this study, we investigate how the COVID-19 epidemic affected mental health across China's cities using social media data from Sina Weibo, the largest microblog platform in China. The data included around 13 million geotagged tweets posted in mainland China between 1 January 2020 and 1 March 2020 by active Weibo users. For each tweet, we conducted sentiment analysis to extract the expressed sentiment using an open-source NLP model from Baidu (Tian et al., 2020a). We then measured the daily mental health status of a city as the median sentiment value of tweets posted in that city on that day (Zheng, Wang, Sun, Zhang, & Kahn, 2019); this measure ranges from 0 to 1, with 0 indicating a strongly negative emotion and 1 a strongly positive emotion. To quantify the causal effect of the COVID-19 epidemic on mental health, we employed a difference-in-differences (DiD) approach (Dimick & Ryan, 2014; Donald & Lang, 2007; He, Pan, & Tanaka, 2020). The treatment group was defined as cities that reported their first COVID-19 case during the study period. Following this definition, our analyses included 324 treated cities and 35 control cities. COVID-19 was first detected in Wuhan in December 2019, but the pathogen was initially unknown and the severity was underestimated (Li, Wang, Xue, Zhao, & Zhu, 2020; Tian et al., 2020b; Yu et al., 2020). Because the situation in Wuhan therefore differs from that of the other treated cities, we excluded Wuhan from our data. We controlled for daily air pollution and weather conditions, since these factors could also affect the sentiment expressed in Weibo tweets (Zheng et al., 2019).

Our study has the following strengths and contributions. First, our data set is large: it was collected from a pool of 20 million active Sina Weibo users (Hu, Huang, Chen, & Mao, 2020). All geotagged tweets posted by these active users during our study period were used to construct a national panel data set covering 359 cities in China. Second, the DiD approach helps us infer the causal relationship between the COVID-19 epidemic and mental health. For example, because the outbreak of COVID-19 in China almost coincided with the Chinese Spring Festival (25 January 2020), a simple before-after comparison cannot distinguish the effect of the national holiday from the impact of the epidemic (Li et al., 2020; Su et al., 2020). In our DiD strategy, cities without COVID-19 cases serve as the counterfactual, and various confounding factors can be controlled for in the model. We can therefore plausibly identify the causal impact of COVID-19. Third, our comprehensive data set allows us to examine whether COVID-19 disproportionately affects the mental health of different segments of the population, categorized by gender and age, and whether the psychological effect varies across different types of cities. Relying on these strengths, our findings may help policymakers understand the impact of COVID-19 on mental health in detail and provide useful implications for psychological interventions during this kind of public health crisis.

Materials and methods

Data

Social media data

Sina Weibo (https://www.weibo.com/), the Chinese equivalent of Twitter, is the largest microblog platform in China. Large-scale data access is difficult on Weibo because of the limitations of its application programming interface (API) (Shen et al., 2020). To solve this problem, we obtained our Weibo data from a pool of 20 million active users (Hu et al., 2020), which was selected from over 250 million Weibo users identified through snowball sampling. We collected all geotagged tweets posted by these active users between 1 January 2020 and 1 March 2020. A geotagged tweet carries the user's location, given as exact latitude and longitude, at the time of posting. From these, we selected around 13 million geotagged tweets posted in mainland China during our study period, together with the gender and age information of their users.

Using these data, we conducted sentiment analysis by applying the SKEP model (Tian et al., 2020a), published in 2020 and available through Baidu Senta (an open-source Python library). SKEP integrates sentiment knowledge into pre-trained models and achieved new state-of-the-art results on most of the test data sets. For each tweet, the sentiment analysis returns two probabilities representing the intensity of positive and negative emotion in the text; these two probabilities sum to 1. In this study, we used the positive probability as a measure of the user's mental health status at the time the tweet was posted. The daily mental health status of a city is measured as the median positive probability for that city on each day (Zheng et al., 2019). This city-level measure ranges from 0 to 1, with 0 indicating a strongly negative emotion and 1 a strongly positive emotion. In our robustness checks, we also used the mean positive probability as an alternative measure of city-level mental health status.
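The aggregation from tweets to a city-day measure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the SKEP positive probability for each tweet has already been computed (the negative probability is simply 1 minus it) and that each tweet has already been mapped to a city and a date. The tuple layout, the sample values and the function name `city_day_status` are hypothetical. Passing `mean` instead of `median` gives the alternative measure used in the robustness checks.

```python
from collections import defaultdict
from statistics import mean, median


def city_day_status(tweets, stat=median):
    """Aggregate tweet-level positive probabilities into a city-day measure.

    `tweets` is an iterable of (city, date, positive_prob) tuples with
    positive_prob in [0, 1] (hypothetical input format). `stat` is the
    summary statistic: median for the main measure, mean for robustness.
    """
    groups = defaultdict(list)
    for city, date, positive_prob in tweets:
        groups[(city, date)].append(positive_prob)
    # One value per (city, date) cell; it stays in [0, 1] because every input does.
    return {cell: stat(probs) for cell, probs in groups.items()}


# Made-up probabilities for illustration only.
tweets = [
    ("Hefei", "2020-01-20", 0.90),
    ("Hefei", "2020-01-20", 0.20),
    ("Hefei", "2020-01-20", 0.70),
    ("Hefei", "2020-01-21", 0.40),
]
print(city_day_status(tweets))             # median per city-day
print(city_day_status(tweets, stat=mean))  # mean per city-day
```

The median is less sensitive than the mean to a handful of extreme tweets in a city-day cell, which is one reason to treat it as the main measure and the mean as a check.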

COVID-19 epidemic data

In this paper, the treatment group was defined as cities that reported their first COVID-19 case during the study period. We collected the date of the first confirmed case in each city from the official websites of local health commissions. COVID-19 was first detected in Wuhan in December 2019, but the pathogen was initially unknown and human-to-human transmission had not been verified. The situation in Wuhan therefore differs from that of the other treated cities, and we excluded Wuhan from our data. Our final data set included 324 treated cities and 35 control cities. The geographical distribution of these 359 cities and the cumulative number of treated cities by 1 March 2020 are presented in Fig. 1 and online Supplementary Fig. S1.

Fig. 1. Geographical distribution of 359 cities. As of 1 March 2020, 324 cities (excluding Wuhan) had reported COVID-19 cases; the remaining 35 cities form the control group.

Air pollution and weather data

Because air pollution and weather conditions could affect the sentiment expressed on social media (Zheng et al., 2019), these confounders should be controlled for in our analyses. In China, the air quality index (AQI) is a composite measure of air pollution constructed from the concentrations of PM₂.₅, PM₁₀, SO₂, CO, O₃ and NO₂ (Zhong, Yu, & Zhu, 2019). A lower AQI indicates better air quality. We collected daily city-level AQI data from the Ministry of Ecology and Environment of China (https://datacenter.mee.gov.cn/). City-level weather data, including daily mean temperature, wind speed, rainfall and cloud cover, were obtained from an online platform called Huiju Data (http://hz.zc12369.com/), which collects data from the China Meteorological Administration.

Socio-economic data

To explore the heterogeneity across cities, we collected the cities' socio-economic status from the 2019 China City Statistical Yearbook (National Bureau of Statistics of China, 2019). These data contain city-level statistics reflecting economic development, medical resources and social security level.

Summary statistics

Summary statistics for all variables between 1 January 2020 and 1 March 2020 are reported in online Supplementary Table S1. The average city-level mental health status was 0.6397, with a standard deviation of 0.0684. The descriptive statistics also show a decline in the mental health status of treated cities after they reported COVID-19 cases.

Models

We used a DiD model to identify the impact of the COVID-19 epidemic on mental health in China. This model estimates the relative change in mental health status between treated and control cities and is specified as follows:

$$Y_{it} = \alpha + \beta \cdot {\rm COVID}\_19_{it} + \gamma \cdot X_{it} + \mu_i + \pi_t + \varepsilon_{it} \qquad (1)$$

where $Y_{it}$ represents the mental health status in city $i$ on date $t$, measured from the social media data. ${\rm COVID}\_19_{it}$ indicates whether the COVID-19 epidemic has occurred in city $i$ on date $t$: it takes the value 1 if the city has reported its first COVID-19 case by date $t$ and 0 otherwise. $X_{it}$ are the control variables, including AQI, mean temperature, mean temperature squared, rainfall, wind speed and cloud cover. $\mu_i$ denotes city fixed effects, a set of city-specific dummy variables. By introducing city fixed effects, we control for time-invariant confounders specific to each city, such as geographical conditions and the level of economic development, which changes little over our short study period. $\pi_t$ denotes date fixed effects, a set of dummy variables accounting for shocks common to all cities on a given day, such as the Chinese Spring Festival and nationwide policies. Because both city and date fixed effects are included in the regression, the coefficient $\beta$ estimates the change in mental health status in treated cities relative to control cities before and after the occurrence of the COVID-19 epidemic. We expected $\beta$ to be negative, as both the coronavirus itself and counter-COVID-19 measures such as lockdowns could harm mental health (Fu et al., 2020; Pfefferbaum & North, 2020).
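For intuition about how $\beta$ in equation (1) is identified, the sketch below estimates it on a toy balanced panel. It is a minimal illustration under stated assumptions, not the authors' estimation code (the source does not name its software): it drops the controls $X_{it}$, assumes every city is observed on every day, and uses the two-way within transformation $\tilde{x}_{it} = x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{x}$, which in a balanced panel yields the same $\beta$ as including city and date dummies. It does not compute the city-clustered standard errors used in the paper. The function names and toy numbers are hypothetical.

```python
from statistics import mean


def two_way_demean(x):
    """Remove city and date means from a balanced panel x[city][day]."""
    n_cities, n_days = len(x), len(x[0])
    city_means = [mean(row) for row in x]
    day_means = [mean(x[i][t] for i in range(n_cities)) for t in range(n_days)]
    grand_mean = mean(city_means)
    return [
        [x[i][t] - city_means[i] - day_means[t] + grand_mean for t in range(n_days)]
        for i in range(n_cities)
    ]


def did_beta(y, treated):
    """OLS slope of demeaned outcome on demeaned treatment (equation (1) without X)."""
    y_dm, d_dm = two_way_demean(y), two_way_demean(treated)
    cells = [(i, t) for i in range(len(y)) for t in range(len(y[0]))]
    num = sum(y_dm[i][t] * d_dm[i][t] for i, t in cells)
    den = sum(d_dm[i][t] ** 2 for i, t in cells)
    return num / den


# Toy panel: 3 cities x 6 days. City 0 reports its first case on day 2,
# city 1 on day 4, city 2 never does (control). True effect set to -0.0097.
onsets = [2, 4, None]
treated = [[1.0 if o is not None and t >= o else 0.0 for t in range(6)] for o in onsets]
y = [[0.60 + 0.01 * i + 0.005 * t - 0.0097 * treated[i][t] for t in range(6)] for i in range(3)]
print(round(did_beta(y, treated), 6))
```

Because the toy outcome is built as city effect plus date effect plus a constant treatment effect with no noise, the estimator recovers the planted value; with real data the city and date means would absorb holiday shocks such as the Spring Festival in the same way.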

The key identifying assumption of the DiD estimator is that treated and control cities would have followed parallel trends in mental health status in the absence of the COVID-19 event. Even if mental health status declines in treated cities after the occurrence of COVID-19, the decline might be driven not by the epidemic but by systematic differences between treated and control cities. For example, if treated cities had a decreasing trend in mental health status while control cities did not, this alone could produce the result. Although we cannot observe what would have happened to mental health in the treated cities had the epidemic not occurred, we can examine whether the two groups followed parallel trends before the epidemic and are therefore comparable. To this end, we adopted an event study approach using the following relative time model (Burtch, Carnahan, & Greenwood, 2018; Greenwood & Agarwal, 2016; Liu & Bharadwaj, 2020):

$$Y_{it} = \alpha + \sum_{k \in M,\, k \neq -1} \beta^{k} \cdot {\rm COVID}\_19_{it,k} + \gamma \cdot X_{it} + \mu_i + \pi_t + \varepsilon_{it} \qquad (2)$$

where ${\rm COVID}\_19_{it,k}$ is a set of dummy variables indicating the treatment status of city $i$ in period $k$, measured in weeks relative to the occurrence of COVID-19, and $M$ is the set of weekly bins. Each bin groups 7 days (1 week), which smooths the high day-to-day volatility of the mental health measure so that it does not distort the trend test (He et al., 2020). We omit the dummy for $k = -1$ (1 week before the event), so the coefficient $\beta^{k}$ measures the difference in mental health status between treated and control cities in period $k$ relative to the difference 1 week before the treatment. This specification can both test the parallel trends assumption and examine whether the impact of the COVID-19 epidemic fades out over time. If the pre-treatment trends are parallel, $\beta^{k}$ should not be significantly different from zero for $k \le -2$. The psychological effect of COVID-19 would fade out over time during our study period if $\beta^{k}$ were negative at first and then became not significantly different from zero in subsequent periods ($k \ge 0$). In all analyses, standard errors were clustered at the city level.
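The weekly binning behind equation (2) can be sketched as follows. This is an assumption-laden illustration, not the authors' code: days are integer indices, `onset_day` is the day of a city's first reported case (`None` for control cities), the relative week $k$ is computed by floor division so that the 7 days before onset form bin $k = -1$, and days outside the chosen event window are pooled into the end bins, a common event-study convention that the source does not specify. The function names and the window are hypothetical; the reference bin $k = -1$ is dropped, as in the paper.

```python
def relative_week(day, onset_day):
    """Week index relative to the first-case day: days 0..6 -> 0, days -7..-1 -> -1."""
    # Python's floor division rounds toward negative infinity, so negative
    # offsets land in the correct pre-treatment bin.
    return (day - onset_day) // 7


def event_dummies(day, onset_day, bins, reference=-1):
    """Return {k: 0 or 1} for every bin k except the omitted reference week."""
    lo, hi = min(bins), max(bins)
    if onset_day is None:
        k = None  # control city: never treated, so every dummy stays 0
    else:
        # Endpoint binning (assumption): days beyond the window join the end bins.
        k = min(max(relative_week(day, onset_day), lo), hi)
    return {b: int(k == b) for b in bins if b != reference}


window = range(-4, 5)  # hypothetical event window: 4 weeks before to 4 weeks after
print(event_dummies(2, 10, window))   # 8 days before onset -> bin -2
print(event_dummies(9, 10, window))   # 1 day before onset -> reference bin, all zeros
```

Dropping the $k = -1$ dummy is what makes every $\beta^{k}$ a comparison against the week just before treatment; a row that falls in the reference week therefore has all event dummies equal to zero.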

Results

The impact of COVID-19 epidemic on mental health

We estimated the relative change in mental health status between treated and control cities using equation (1). The results reported in Table 1 indicate that the occurrence of COVID-19 had a significantly negative impact on mental health. After the first COVID-19 case was reported, the mental health status in treated cities, measured by the median sentiment value of Weibo tweets, declined by 0.0097 relative to cities without COVID-19 cases, controlling for air pollution, weather conditions and a set of fixed effects (column 2). We also observed that including the air pollution and weather variables raised the R² of our DiD model, improving its fit. This finding is consistent with our expectation that both the coronavirus itself and subsequent transmission control measures such as lockdowns could worsen mental health (Fu et al., 2020; Pfefferbaum & North, 2020).
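As a rough sense of scale (a back-of-the-envelope comparison of our own, not one reported by the authors), the 0.0097 decline can be set against the summary statistics reported for the study period: a mean city-level mental health status of 0.6397 and a standard deviation of 0.0684. Because those statistics pool all cities and days, the ratios are only indicative:

$$\frac{0.0097}{0.0684} \approx 0.14\ \text{SD}, \qquad \frac{0.0097}{0.6397} \approx 1.5\%\ \text{of the mean}$$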

Table 1. Effect of COVID-19 on mental health

Note. Due to some missing values of air pollution and weather data, the numbers of observations in the two columns are not the same. Standard errors are clustered at the city level and shown in parentheses. *p < 0.05; **p < 0.01; ***p < 0.001.

We conducted additional analyses to validate the robustness of our main finding. First, we excluded cities in Hubei province, the worst-hit region in China during the epidemic. The results were similar, suggesting that the psychological effect of COVID-19 is not driven solely by these cities (online Supplementary Table S2). Second, we replaced the dependent variable with the mean (instead of the median) of the sentiment expressed in Weibo tweets, and the finding remained consistent (online Supplementary Table S3). Third, after the occurrence of COVID-19, our Weibo data may contain more tweets from people who are more sensitive to COVID-19, because such people may participate more in social media during the epidemic; this change in sample composition could affect our results. To address this issue, we conducted a tweet-level analysis controlling for user fixed effects. The dependent variable is the expressed sentiment of each tweet, and the independent variable is a binary variable equal to 1 if the tweet was posted after the city reported its first COVID-19 case and 0 otherwise. We still observed similar results when controlling for user fixed effects (online Supplementary Table S4).

Although our results are consistent across various robustness checks, the decrease in mental health status may still be driven by unobserved differences between the treatment and control groups. If this were the case, the psychological effect of COVID-19 would be statistically significant under any ordering of COVID-19 occurrence across treated cities. We therefore carried out a random implementation model to determine how likely it is that a random occurrence of COVID-19 would yield an aggregate effect size comparable to our true estimate (Greenwood & Agarwal, 2016; Greenwood & Wattal, 2017; Liu & Bharadwaj, 2020). First, we randomly assigned the pseudo-presence of COVID-19 to our treated cities and then estimated the effect of this random occurrence using equation (1) to obtain the coefficient for the pseudo-treatment (denoted $\beta_{\rm pseudo}$). This procedure was repeated 1000 times, and we calculated the mean and standard deviation of $\beta_{\rm pseudo}$. A Z-score was used to test the difference between our original estimate $\beta$ (reported in Table 1) and the mean of $\beta_{\rm pseudo}$. We also replicated the whole procedure on all cities (instead of only the treatment group). The results show that the mean of $\beta_{\rm pseudo}$ is close to 0 and significantly different from the true estimate $\beta$ (online Supplementary Table S4). Therefore, our original estimate is not spurious, and the causal claim is strengthened.
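One plausible reading of the random implementation procedure is sketched below: the observed first-case dates are randomly reassigned among treated cities, equation (1) is re-estimated each time (here without controls, on a balanced toy panel, using a compact two-way within estimator), and the Z-score $(\beta - \bar{\beta}_{\rm pseudo}) / {\rm sd}(\beta_{\rm pseudo})$ is returned. The permutation scheme, the toy data and all function names are assumptions for illustration; the source does not specify exactly how pseudo-occurrences were drawn, and this sketch covers only the treated-cities variant, not the all-cities replication.

```python
import random
from statistics import mean, stdev


def twfe_beta(y, d):
    """Two-way within estimator of beta for a balanced panel y[city][day]."""
    n, T = len(y), len(y[0])

    def demean(x):
        row = [sum(r) / T for r in x]
        col = [sum(x[i][t] for i in range(n)) / n for t in range(T)]
        grand = sum(row) / n
        return [[x[i][t] - row[i] - col[t] + grand for t in range(T)] for i in range(n)]

    yt, dt = demean(y), demean(d)
    num = sum(yt[i][t] * dt[i][t] for i in range(n) for t in range(T))
    den = sum(dt[i][t] ** 2 for i in range(n) for t in range(T))
    return num / den


def treatment_matrix(onsets, n_days):
    """onsets[i] is city i's first-case day, or None for a control city."""
    return [[1.0 if o is not None and t >= o else 0.0 for t in range(n_days)] for o in onsets]


def placebo_z(y, onsets, true_beta, n_reps=1000, seed=0):
    """Shuffle first-case dates among treated cities and re-estimate beta each time."""
    rng = random.Random(seed)
    treated = [i for i, o in enumerate(onsets) if o is not None]
    dates = [onsets[i] for i in treated]
    betas = []
    for _ in range(n_reps):
        rng.shuffle(dates)  # pseudo-occurrence: a random permutation of onset dates
        pseudo = list(onsets)
        for i, o in zip(treated, dates):
            pseudo[i] = o
        betas.append(twfe_beta(y, treatment_matrix(pseudo, len(y[0]))))
    mu, sd = mean(betas), stdev(betas)
    return mu, sd, (true_beta - mu) / sd


# Toy data: 4 treated cities with staggered onsets, 2 controls, 20 days.
onsets = [3, 6, 9, 12, None, None]
n_days = 20
d_true = treatment_matrix(onsets, n_days)
y = [[0.6 + 0.01 * i + 0.002 * t - 0.01 * d_true[i][t] for t in range(n_days)] for i in range(6)]
beta_hat = twfe_beta(y, d_true)
print(placebo_z(y, onsets, beta_hat, n_reps=200))
```

A fixed seed makes the placebo distribution reproducible; with only four treated cities the toy has just 24 possible orderings, whereas the real data have hundreds of treated cities and far more.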

Test for pre-treatment parallel trends

To test whether the parallel trends assumption of our DiD model is violated, we adopted an event study approach and fitted the relative time model in equation (2) (Greenwood & Agarwal, 2016; He et al., 2020; Liu & Bharadwaj, 2020). This model measures the difference in mental health status between treated and control cities in each period relative to the difference 1 week before the treatment. The estimated coefficients and their 95% confidence intervals are plotted in Fig. 2. The estimated coefficients are not significantly different from 0 before the occurrence of COVID-19, suggesting no systematic difference in trends between treated and control cities before the treatment. This supports the assumption that, in the absence of the COVID-19 epidemic, treated and control cities would have followed parallel trends.

Fig. 2. Effect of COVID-19 on mental health over time. The estimated coefficients from equation (2) and their 95% confidence intervals (error bars) are shown. The dummy variable indicating 1 week before the occurrence of COVID-19 is omitted from the regression. Thus, the difference in mental health status between treated and control cities 1 week before the treatment is set to zero and serves as the reference point. Each estimate represents the difference in mental health status in a given period relative to the difference 1 week before the treatment.

In addition to testing for pre-treatment parallel trends, this relative time model can examine whether the impact of the COVID-19 epidemic on public mental health changes over time. We expected the estimated coefficients to be negative and statistically significant at first after the occurrence of the epidemic and then to become not significantly different from 0 in subsequent periods, because the psychological effect is likely to fade as the epidemic stabilizes during our study period. All results shown in Fig. 2 are consistent with this expectation except for the estimated coefficient for 1 week after the occurrence of the COVID-19 epidemic. This unexpected result is probably due to a temporary ‘pulling together’ or ‘honeymoon period’ phenomenon (Gordon, Bresin, Dombeck, Routledge, & Wonderlich, 2011; Madianos & Evi, 2010; Matsubayashi, Sawada, & Ueda, 2013): in the fight against COVID-19, social connectedness, community cohesion and mutual support are enhanced, mitigating the negative psychological impact of the epidemic. Further studies are needed to explore the underlying mechanisms. We also obtained similar results when using the mean sentiment value as the dependent variable in equation (2) (online Supplementary Fig. S2).

Heterogeneity across different subpopulations and cities

To investigate the heterogeneous effects of the COVID-19 epidemic on mental health, we conducted two types of heterogeneity analysis. In the first, we examined whether different subpopulations are disproportionately affected by the occurrence of COVID-19. To do so, we divided the tweet data into subgroups according to the self-reported gender and age of the Weibo users, calculated the daily city-level mental health status for each subgroup using the same method as before, and estimated equation (1) separately for each subgroup. The effects on mental health across gender and age groups are shown in Fig. 3. Women are more susceptible to the psychological impact of COVID-19. After the occurrence of the epidemic, the negative effect on mental health is also more pronounced among teenagers (younger than 18 years) and older adults (older than 45 years). Collectively, these results imply that the adverse psychological outcomes caused by COVID-19 are more likely to be observed among vulnerable groups.

Fig. 3. Heterogeneous effects of COVID-19 on mental health across different subpopulations. Each row shows a separate regression of equation (1) on the corresponding subsample. We use the gender and age information of Weibo users to split our data. The estimated effects of COVID-19 and their 95% confidence intervals (error bars) are plotted.

In the second heterogeneity analysis, we investigated whether the psychological effect of COVID-19 varies across different types of cities. We first collected socio-economic statistics for the cities in our data, such as regional GDP and the number of hospitals, from the 2019 China City Statistical Yearbook (National Bureau of Statistics of China, 2019). We measured each city's initial mental health status as the median sentiment value of tweets posted in that city during the first week of our study period. We then partitioned the cities into high and low groups at the median of each factor. For example, a city whose regional GDP is lower than the median GDP falls into the low GDP group; otherwise it falls into the high GDP group. The psychological effect was estimated separately for each subgroup using equation (1). We expected the deterioration of mental health after the occurrence of COVID-19 to be more pronounced in cities with low levels of economic development, medical resources and social security, since these areas have less financial, material and human support both for fighting the epidemic and for providing mental health services. This conjecture is confirmed in Fig. 4a–c: the negative effect is more notable in the low group. Fig. 4d shows that cities with poor initial mental health status are more susceptible to the psychological impact of COVID-19, suggesting that more targeted measures should be taken in these areas after the occurrence of an epidemic.
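The median split used to form the high and low groups can be sketched as follows. The city labels and values are made up and the function name is hypothetical; each resulting group would then be passed separately to the equation (1) regression. Ties at the median go to the high group, matching the rule that a city below the median is low and every other city is high.

```python
from statistics import median


def median_split(values):
    """Split cities into 'low' (strictly below the median) and 'high' groups.

    `values` maps city -> factor value (e.g. regional GDP); hypothetical input.
    """
    cut = median(values.values())
    low = {city for city, v in values.items() if v < cut}
    high = set(values) - low  # everything at or above the median
    return low, high


gdp = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}  # hypothetical GDP figures
print(median_split(gdp))
```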

Fig. 4. Heterogeneous effects of COVID-19 on mental health across cities. These heterogeneity analyses cover four categories: economic development (a), medical resources (b), social security (c) and initial mental health status (d). Data are partitioned into High and Low groups based on the median value of each factor. Each row shows a separate regression of equation (1) on the corresponding subsample. The estimated effects of COVID-19 and their 95% confidence intervals (error bars) are plotted.

Discussion

In addition to physical harm, the outbreak and rapid spread of COVID-19 have had other effects, such as improved air quality (He et al., 2020) and changes in mental health (Pfefferbaum & North, 2020). To fully understand the influence of this unprecedented event, we need to quantify these additional effects, and this paper contributes to that effort. Our findings help answer three research questions related to the mental health impact of COVID-19.

First, does COVID-19 have a causal effect on the psychological changes reflected on social media in China? Applying a DiD approach to a comprehensive panel data set, our analyses reveal a deterioration in mental health status caused by the occurrence of COVID-19 among users of Sina Weibo, the Chinese equivalent of Twitter. This finding holds across a set of robustness checks. However, the mental health measure is derived from people who post on social media. Although this group contains a large number of people, we acknowledge that it is not randomly drawn from the full population. Young children and very old people are less likely to use Sina Weibo (Wong, Merchant, & Moreno, 2014; Zheng et al., 2019), and these individuals may in fact be more vulnerable to the psychological effect of COVID-19 (Jiao et al., 2020; Yang et al., 2020). Therefore, our results may underestimate the overall adverse effect of the COVID-19 epidemic on the mental health of a representative sample of the full population. In addition, fear of legal consequences, such as being accused of spreading rumors, may have led to self-censorship of sensitive conversations on Weibo, especially during the early stage of COVID-19, which may bias our data. Moreover, although Weibo is the largest microblog platform in China, some social media users may use a virtual private network (VPN) to access overseas media. Relying heavily on a single data source, Weibo, is therefore a limitation. Further studies drawing on diverse data sources are needed to validate our results and take a more comprehensive look at the mental health impact of COVID-19.

Second, does the psychological impact of COVID-19 fade as the epidemic stabilizes over time? The results of our relative time model show that the effect of COVID-19 on mental health is likely to fade during our study period. However, our results do not allow us to conclude that the psychological effect will disappear in the long term, even though the epidemic in China has been largely controlled. The end of the COVID-19 epidemic does not necessarily mean the end of its effect on public mental health. The socio-economic effects of COVID-19, such as economic recession and social inequalities, are also harmful to mental health in the post-epidemic era and might last for a long period (Kathirvel, 2020). In addition, some groups, such as students, may have difficulty adjusting back to normal life when the epidemic is over (Lee, 2020). For example, during the COVID-19 epidemic, students had to adapt to online study; when schools reopen, they have to readjust to traditional classes. Such frequent shifts in lifestyle could bring about further psychological problems. Moreover, subsequent vaccine-related problems could also affect public mental health. Therefore, assessing the psychological impact in the post-COVID-19 stage may be complex and requires further rigorous analysis.

Third, does the effect of COVID-19 on mental health vary across different population groups and cities? Our first heterogeneity analysis shows that the psychological effect is more pronounced among women, teenagers (younger than 18 years old) and older adults (older than 45 years old). Thus, more attention should be paid to these vulnerable groups when providing mental health services. Nevertheless, we are unable to capture the heterogeneous effects on young children and the very old, owing to the age distribution of Weibo users (Wong et al., 2014; Zheng et al., 2019). Traditional questionnaires and surveys may be better methods for investigating the psychological impact on these population groups. Besides, if the data were accessible, it would be interesting to further divide the population into a caregiver group living with elderly family members and a non-caregiver group living apart from them. Future studies could examine whether the COVID-19 epidemic places additional stress on caregivers, given the high mortality rate among the elderly (Jordan, Adab, & Cheng, 2020). The results of the second heterogeneity analysis imply that the mental health impact of COVID-19 is more likely to be observed in cities with low levels of initial mental health status, economic development, medical resources and social security. Therefore, people with poor mental health status before COVID-19 and those living in underdeveloped areas that lack financial, material and human support could suffer from more serious mental health problems. These findings may help the government set priorities in decision making. For example, when allocating public resources and providing mental health support, giving priority to these high-risk areas may yield greater benefits from the same inputs.
Additionally, the heterogeneity analysis points to the important role of economic conditions, medical resources and social security in mitigating the negative psychological effect.
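
One simple way to examine city-level heterogeneity of this kind is to split cities at the median of a baseline characteristic and estimate the DiD effect separately in each half. The sketch below does this with city-level pre/post sentiment averages. The median split, the field names (`gdp_index`, `treated`, `pre`, `post`) and the data are hypothetical assumptions for illustration; the heterogeneity specification used in this study may differ.

```python
from statistics import mean, median


def did_by_median_split(cities, key):
    """Subgroup DiD, splitting cities at the median of a baseline characteristic.

    cities: list of dicts with hypothetical fields
        key               -> baseline characteristic (e.g. an economic index)
        "treated"         -> True if the city reported COVID-19 cases
        "pre", "post"     -> mean sentiment before / after the outbreak
    Each half must contain at least one treated and one control city,
    otherwise statistics.mean raises StatisticsError.
    Returns {"low": effect, "high": effect}.
    """
    cutoff = median(c[key] for c in cities)
    halves = (("low", [c for c in cities if c[key] <= cutoff]),
              ("high", [c for c in cities if c[key] > cutoff]))
    effects = {}
    for label, group in halves:
        treated = [c["post"] - c["pre"] for c in group if c["treated"]]
        control = [c["post"] - c["pre"] for c in group if not c["treated"]]
        effects[label] = mean(treated) - mean(control)
    return effects


# Hypothetical cities; gdp_index is an illustrative baseline ranking.
cities = [
    {"gdp_index": 1, "treated": True, "pre": 0.60, "post": 0.50},
    {"gdp_index": 2, "treated": True, "pre": 0.62, "post": 0.52},
    {"gdp_index": 3, "treated": False, "pre": 0.61, "post": 0.60},
    {"gdp_index": 4, "treated": False, "pre": 0.63, "post": 0.62},
    {"gdp_index": 5, "treated": True, "pre": 0.64, "post": 0.61},
    {"gdp_index": 6, "treated": True, "pre": 0.66, "post": 0.63},
    {"gdp_index": 7, "treated": False, "pre": 0.65, "post": 0.64},
    {"gdp_index": 8, "treated": False, "pre": 0.67, "post": 0.66},
]
effects = did_by_median_split(cities, "gdp_index")
print({k: round(v, 3) for k, v in effects.items()})  # prints {'low': -0.09, 'high': -0.02}
```

A more negative effect in the low half would correspond to the pattern described above; a regression with an interaction between treatment and the continuous baseline characteristic is a common alternative to a median split.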

Conclusion

We conclude this paper by pointing out several directions for future research. First, we only focus on the text of tweets. However, some tweets contain other types of valuable data, such as pictures and videos, which provide rich information (Pittman & Reich, 2016). Further studies are needed to extract sentiment from these data and use them to measure the psychological response more accurately. Besides, poor mental health status could lead to severe subsequent consequences, such as suicidal behavior (Sher, 2020). This suggests the need to collect related data to quantify the causal impact of COVID-19 on these adverse outcomes. In addition, the outbreak of COVID-19 simultaneously brought about an infodemic (Zarocostas, 2020). The rapid spread of misinformation through social media platforms may also affect mental health, and assessing this phenomenon is a meaningful task (Casigliani et al., 2020). We believe that our findings in this study, together with future research, will assist in understanding the mental health impact of COVID-19 and yield useful insights into how to make effective psychological interventions in sudden public health events of this kind.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S0033291721001598.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (NSFC) with grant Nos. 71921001 and 71571176, the 2019 New Humanities Funding of University of Science and Technology of China (grant No. YD2110002015), and the 25th Department Funding of University of Science and Technology of China (grant No. DA2110251001). We also thank Y. Hu, H. Huang, X.-L. Mao, M. Zhang and Q. Zhao for their contributions to Weibo data collection and cleaning.

Conflict of interest

The authors declare no competing interests.

References

Areán, P. A., Ly, K. H., & Andersson, G. (2016). Mobile technology for mental health assessment. Dialogues in Clinical Neuroscience, 18(2), 163.

Burtch, G., Carnahan, S., & Greenwood, B. N. (2018). Can you gig it? An empirical examination of the gig economy and entrepreneurial activity. Management Science, 64(12), 5497–5520.

Casigliani, V., De Nard, F., De Vita, E., Arzilli, G., Grosso, F. M., Quattrone, F., … Lopalco, P. (2020). Too much information, too little evidence: Is waste in research fuelling the COVID-19 infodemic? BMJ, 370, m2672.

Conway, M., & O'Connor, D. (2016). Social media, big data, and mental health: Current advances and ethical implications. Current Opinion in Psychology, 9, 77–82.

Dimick, J. B., & Ryan, A. M. (2014). Methods for evaluating changes in health care policy: The difference-in-differences approach. JAMA, 312(22), 2401–2402. doi: 10.1001/jama.2014.16153

Donald, S. G., & Lang, K. (2007). Inference with difference-in-differences and other panel data. The Review of Economics and Statistics, 89(2), 221–233. doi: 10.1162/rest.89.2.221

Fu, W., Wang, C., Zou, L., Guo, Y., Lu, Z., Yan, S., & Mao, J. (2020). Psychological health, sleep quality, and coping styles to stress facing the COVID-19 in Wuhan, China. Translational Psychiatry, 10(1), 225.

Gao, J., Zheng, P., Jia, Y., Chen, H., Mao, Y., Chen, S., … Dai, J. (2020). Mental health problems and social media exposure during COVID-19 outbreak. PLoS One, 15(4), e0231924.

Gohil, S., Vuik, S., & Darzi, A. (2018). Sentiment analysis of health care tweets: Review of the methods used. JMIR Public Health and Surveillance, 4(2), e43.

Gordon, K. H., Bresin, K., Dombeck, J., Routledge, C., & Wonderlich, J. A. (2011). The impact of the 2009 Red River Flood on interpersonal risk factors for suicide. Crisis, 32(1), 52–55.

Greenwood, B. N., & Agarwal, R. (2016). Matching platforms and HIV incidence: An empirical investigation of race, gender, and socioeconomic status. Management Science, 62(8), 2281–2303.

Greenwood, B. N., & Wattal, S. (2017). Show me the way to go home: An empirical investigation of ride-sharing and alcohol related motor vehicle fatalities. MIS Quarterly, 41(1), 163–187.

Gruebner, O., Lowe, S. R., Sykora, M., Shankardass, K., Subramanian, S., & Galea, S. (2017). A novel surveillance approach for disaster mental health. PLoS One, 12(7), e0181233.

Gruebner, O., Sykora, M., Lowe, S. R., Shankardass, K., Trinquart, L., Jackson, T., … Galea, S. (2016). Mental health surveillance after the terrorist attacks in Paris. The Lancet, 387(10034), 2195–2196.

Hao, F., Tan, W., Jiang, L., Zhang, L., Zhao, X., Zou, Y., … McIntyre, R. S. (2020). Do psychiatric patients experience more psychiatric symptoms during COVID-19 pandemic and lockdown? A case-control study with service and research implications for immunopsychiatry. Brain, Behavior, and Immunity, 87, 100–106.

He, G., Pan, Y., & Tanaka, T. (2020). The short-term impacts of COVID-19 lockdown on urban air pollution in China. Nature Sustainability, 3, 1005–1011. doi: 10.1038/s41893-020-0581-y

Holmes, E. A., O'Connor, R. C., Perry, V. H., Tracey, I., Wessely, S., Arseneault, L., … Everall, I. (2020). Multidisciplinary research priorities for the COVID-19 pandemic: A call for action for mental health science. The Lancet Psychiatry, 7(6), 547–560.

Hu, Y., Huang, H., Chen, A., & Mao, X.-L. (2020). Weibo-COV: A Large-Scale COVID-19 Tweets Dataset from Weibo. arXiv preprint arXiv:2005.09174.

Jiao, W. Y., Wang, L. N., Liu, J., Fang, S. F., Jiao, F. Y., Pettoello-Mantovani, M., & Somekh, E. (2020). Behavioral and emotional disorders in children during the COVID-19 epidemic. The Journal of Pediatrics, 221, 264.

Jordan, R. E., Adab, P., & Cheng, K. K. (2020). COVID-19: Risk factors for severe disease and death. BMJ, 368, m1198.

Kathirvel, N. (2020). Post COVID-19 pandemic mental health challenges. Asian Journal of Psychiatry, 53, 102430.

Lee, J. (2020). Mental health effects of school closures during COVID-19. The Lancet Child & Adolescent Health, 4(6), 421.

Li, S., Wang, Y., Xue, J., Zhao, N., & Zhu, T. (2020). The impact of COVID-19 epidemic declaration on psychological consequences: A study on active Weibo users. International Journal of Environmental Research and Public Health, 17(6), 2032.

Liu, J., & Bharadwaj, A. (2020). Drug abuse and the internet: Evidence from craigslist. Management Science, 66(5), 2040–2049. doi: 10.1287/mnsc.2019.3479

Liu, S., Yang, L., Zhang, C., Xiang, Y.-T., Liu, Z., Hu, S., & Zhang, B. (2020). Online mental health services in China during the COVID-19 outbreak. The Lancet Psychiatry, 7(4), e17–e18.

Liu, S., Zhu, M., Yu, D. J., Rasin, A., & Young, S. D. (2017). Using real-time social media technologies to monitor levels of perceived stress and emotional state in college students: A web-based questionnaire study. JMIR Mental Health, 4(1), e2.

Madianos, M. G., & Evi, K. (2010). Trauma and natural disaster: The case of earthquakes in Greece. Journal of Loss and Trauma, 15(2), 138–150.

Matsubayashi, T., Sawada, Y., & Ueda, M. (2013). Natural disasters and suicide: Evidence from Japan. Social Science and Medicine, 82, 126–133.

National Bureau of Statistics of China (2019). 2019 China city statistical yearbook. Beijing: China Statistics Press.

Pfefferbaum, B., & North, C. S. (2020). Mental health and the COVID-19 pandemic. New England Journal of Medicine, 383(6), 510–512.

Pittman, M., & Reich, B. (2016). Social media and loneliness: Why an Instagram picture may be worth more than a thousand twitter words. Computers in Human Behavior, 62, 155–167.

Rajkumar, R. P. (2020). COVID-19 and mental health: A review of the existing literature. Asian Journal of Psychiatry, 52, 102066.

Shen, C., Chen, A., Luo, C., Zhang, J., Feng, B., & Liao, W. (2020). Using reports of symptoms and diagnoses on social media to predict COVID-19 case counts in mainland China: Observational infoveillance study. Journal of Medical Internet Research, 22(5), e19421.

Sher, L. (2020). The impact of the COVID-19 pandemic on suicide rates. QJM: An International Journal of Medicine, 113(10), 707–712.

Sohrabi, C., Alsafi, Z., O'Neill, N., Khan, M., Kerwan, A., Al-Jabir, A., … Agha, R. (2020). World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19). International Journal of Surgery, 76, 71–76.

Su, Y., Xue, J., Liu, X., Wu, P., Chen, J., Chen, C., … Zhu, T. (2020). Examining the impact of COVID-19 lockdown in Wuhan and Lombardy: A psycholinguistic analysis on Weibo and Twitter. International Journal of Environmental Research and Public Health, 17(12), 4552.

Tian, H., Gao, C., Xiao, X., Liu, H., He, B., Wu, H., … Wu, F. (2020a). SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis. arXiv preprint arXiv:2005.05635.

Tian, H., Liu, Y., Li, Y., Wu, C.-H., Chen, B., Kraemer, M. U., … Yang, Q. (2020b). An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China. Science, 368(6491), 638–642. doi: 10.1126/science.abb6105

Wang, C., Pan, R., Wan, X., Tan, Y., Xu, L., Ho, C. S., & Ho, R. C. (2020a). Immediate psychological responses and associated factors during the initial stage of the 2019 coronavirus disease (COVID-19) epidemic among the general population in China. International Journal of Environmental Research and Public Health, 17(5), 1729.

Wang, C., Pan, R., Wan, X., Tan, Y., Xu, L., McIntyre, R. S., … Sharma, V. K. (2020b). A longitudinal study on the mental health of general population during the COVID-19 epidemic in China. Brain, Behavior, and Immunity, 87, 40–48.

Wang, Y., Duan, Z., Ma, Z., Mao, Y., Li, X., Wilson, A., … Zhou, F. (2020c). Epidemiology of mental health problems among patients with cancer during COVID-19 pandemic. Translational Psychiatry, 10(1), 263.

Wong, C. A., Merchant, R. M., & Moreno, M. A. (2014). Using social media to engage adolescents and young adults with their health. Healthcare, 2(4), 220–224.

Wongkoblap, A., Vadillo, M. A., & Curcin, V. (2017). Researching mental health disorders in the era of social media: Systematic review. Journal of Medical Internet Research, 19(6), e228.

Yang, Y., Li, W., Zhang, Q., Zhang, L., Cheung, T., & Xiang, Y.-T. (2020). Mental health services for older adults in China during the COVID-19 outbreak. The Lancet Psychiatry, 7(4), e19.

Yu, N., Li, W., Kang, Q., Xiong, Z., Wang, S., Lin, X., … Deng, D. (2020). Clinical features and obstetric and neonatal outcomes of pregnant patients with COVID-19 in Wuhan, China: A retrospective, single-centre, descriptive study. The Lancet Infectious Diseases, 20(5), 559–564.

Zarocostas, J. (2020). How to fight an infodemic. The Lancet, 395(10225), 676.

Zheng, S., Wang, J., Sun, C., Zhang, X., & Kahn, M. E. (2019). Air pollution lowers Chinese urbanites’ expressed happiness on social media. Nature Human Behaviour, 3(3), 237–243.

Zhong, S., Yu, Z., & Zhu, W. (2019). Study of the effects of air pollutants on human health based on Baidu indices of disease symptoms and air quality monitoring data in Beijing, China. International Journal of Environmental Research and Public Health, 16(6), 1014.

Fig. 1. Geographical distribution of 359 cities. As of 1 March 2020, 324 cities (excluding Wuhan) had reported COVID-19 cases; the remaining 35 cities form the control group.

Table 1. Effect of COVID-19 on mental health

