We monitored Twitter for tweets relevant to European COVID-19 mobile contact tracing apps for one year, from July 2020 to June 2021. We used the official Twitter streaming APIs to collect the relevant tweets, opening one tweet stream for each contact tracing app. The table below shows the contact tracing apps analysed and the search keys used for streaming.
Every tweet went through a predetermined set of analyses in real time, combining the Tweepy library, Apache Kafka and Elasticsearch. We developed a system capable of processing multiple tweet streams and extracting place names, opinions, hashtags and entities, as well as computing a number of aggregations.
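As an illustration of the hashtag extraction and aggregation step, the following is a minimal sketch rather than the system's actual code. It assumes tweets arrive as dictionaries shaped like the Twitter API v1.1 payload, where hashtags are listed under `entities.hashtags`; the function names and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def extract_hashtags(tweet: dict) -> List[str]:
    """Return the lower-cased hashtags of one tweet (v1.1-style payload assumed)."""
    entities = tweet.get("entities") or {}
    return [h["text"].lower() for h in entities.get("hashtags", []) if h.get("text")]


def count_hashtags(tweets: Iterable[dict]) -> Counter:
    """Count in how many tweets each hashtag appears."""
    counts = Counter()
    for tweet in tweets:
        # set() so a hashtag repeated inside one tweet is counted once
        counts.update(set(extract_hashtags(tweet)))
    return counts


# Hypothetical sample payloads, for illustration only
sample = [
    {"text": "...", "entities": {"hashtags": [{"text": "ExampleApp"}, {"text": "COVID19"}]}},
    {"text": "...", "entities": {"hashtags": [{"text": "exampleapp"}]}},
    {"text": "no hashtags here"},
]
hashtag_counts = count_hashtags(sample)
```

Lower-casing before counting is a design choice that merges spelling variants such as `ExampleApp` and `exampleapp`; a real pipeline might keep the original casing for display.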
The table below shows the number of tweets collected for each mobile contact tracing app, ordered by the number of tweets. The table also reports the percentage of tweets with opinions, the percentage with geographic (geo) information and the number of relevant EMM news items detected.
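The per-app figures in such a table can be derived from per-tweet flags. The sketch below shows one way to do it, assuming each processed tweet is reduced to a record with the hypothetical keys `app`, `has_opinion` and `has_geo`; the descending sort order is also an assumption.

```python
from collections import defaultdict


def summarise(records):
    """Aggregate per-tweet flags into one row per app (hypothetical record keys)."""
    totals = defaultdict(lambda: {"tweets": 0, "opinion": 0, "geo": 0})
    for record in records:
        row = totals[record["app"]]
        row["tweets"] += 1
        row["opinion"] += bool(record.get("has_opinion"))
        row["geo"] += bool(record.get("has_geo"))

    table = []
    for app, row in totals.items():
        n = row["tweets"]
        table.append({
            "app": app,
            "tweets": n,
            "opinion_pct": round(100 * row["opinion"] / n, 1),
            "geo_pct": round(100 * row["geo"] / n, 1),
        })
    # Assumed ordering: apps with the most tweets first
    table.sort(key=lambda r: r["tweets"], reverse=True)
    return table


# Hypothetical records, for illustration only
records = [
    {"app": "app_a", "has_opinion": True, "has_geo": True},
    {"app": "app_a", "has_opinion": True, "has_geo": False},
    {"app": "app_a", "has_opinion": False, "has_geo": False},
    {"app": "app_b", "has_opinion": False, "has_geo": True},
]
summary = summarise(records)
```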
Opinions were extracted for English, French, German, Spanish and Italian, which are the most frequently used languages in our dataset. For language detection we used Twitter's standard engine for language identification and evaluation. The Sentiment analysis section gives more information and our relevant observations.
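Selecting tweets for opinion extraction can be sketched as a simple filter. This assumes the language detected by Twitter is exposed in the tweet's `lang` field as a short language code, as in the v1.1 payload; the constant and function names are hypothetical.

```python
# Language codes for English, French, German, Spanish and Italian
OPINION_LANGS = {"en", "fr", "de", "es", "it"}


def eligible_for_opinion(tweet: dict) -> bool:
    """True when the tweet's detected language is one of the five analysed ones."""
    return tweet.get("lang") in OPINION_LANGS


# Hypothetical payloads: "und" is the code Twitter uses for undetermined language
sample_langs = [{"lang": "it"}, {"lang": "und"}, {"lang": "nl"}, {}]
eligible = [t for t in sample_langs if eligible_for_opinion(t)]
```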
Tweets containing any kind of geo information were identified using the Mordecai geoparsing library. The table shows the percentage of tweets that contain a place name in the tweet text or in the user profile.
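The check on tweet text and user profile can be sketched as follows. This is an illustration under assumptions, not the project's code: the geoparser is passed in as a callable that returns a list of detected places, so that the example runs without external services. With Mordecai, that callable would presumably be `Geoparser().geoparse`, which requires a running Geonames index in Elasticsearch. The `toy_geoparse` stand-in and the sample tweets are hypothetical.

```python
from typing import Callable, List


def has_place_name(tweet: dict, geoparse: Callable[[str], List[dict]]) -> bool:
    """True if a place is found in the tweet text or in the profile location."""
    user = tweet.get("user") or {}
    texts = [tweet.get("text") or "", user.get("location") or ""]
    return any(geoparse(text) for text in texts if text.strip())


def toy_geoparse(text: str) -> List[dict]:
    """Stand-in geoparser: looks words up in a tiny hard-coded gazetteer."""
    known = {"rome", "berlin"}
    words = text.replace(",", " ").split()
    return [{"word": w} for w in words if w.lower() in known]


# Hypothetical tweets, for illustration only
in_profile = {"text": "New release today", "user": {"location": "Rome, Italy"}}
in_text = {"text": "Using the app in Berlin"}
no_place = {"text": "Nothing to see", "user": {"location": None}}
flags = [has_place_name(t, toy_geoparse) for t in (in_profile, in_text, no_place)]
```

Passing the geoparser as a parameter keeps the counting logic testable independently of the geoparsing backend.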
To detect meaningful tweets, we cross-checked the pages linked in tweets against the dedicated EMM channel for contact tracing apps.
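One way to implement such a cross-check is to compare normalised URLs. The sketch below assumes that the EMM channel can be exported as a list of article URLs and that tweets carry their links under `entities.urls[].expanded_url`, as in the v1.1 payload. The normalisation rules (dropping the scheme, a leading `www.`, the query string and a trailing slash) are an illustrative choice, and all URLs shown are hypothetical.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Reduce a URL to host + path so trivially different forms compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def linked_news(tweet: dict, news_index: set) -> set:
    """Return the normalised news URLs that the tweet links to."""
    urls = (tweet.get("entities") or {}).get("urls", [])
    linked = {normalise(u["expanded_url"]) for u in urls if u.get("expanded_url")}
    return linked & news_index


# Hypothetical news URLs and tweet, for illustration only
news_index = {normalise(u) for u in ["http://example.org/news/app-release"]}
tweet = {
    "entities": {
        "urls": [
            {"expanded_url": "https://www.example.org/news/app-release/?utm_source=twitter"},
            {"expanded_url": "https://example.com/unrelated"},
        ]
    }
}
matches = linked_news(tweet, news_index)
```

A tweet would then count as meaningful when `matches` is non-empty. Shortened links would need to be resolved before comparison, which this sketch does not attempt.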
Finally, we considered hashtags, which are a very good source of information. The Hashtags section presents further hashtag analyses, with related results and observations.
The stacked area chart below shows user activity around the main European COVID-19 mobile contact tracing apps. Activity was high during the second wave of the pandemic, in October and November 2020. The chart also shows peaks lasting around five days, which usually correspond to relevant events such as software releases or news.
The chart below shows the number of relevant EMM news items detected by cross-checking with the linked pages; the background shows the trend in the number of tweets.
As the two analyses return related results, dedicated sections have been created. The first contains an in-depth analysis of opinions, and the second an exploratory analysis based on hashtags and network analysis.