This is the repository with all the publicly available* data and analysis scripts for the research paper: "Who is driving the conversation? Studying the nodality of British MPs and journalists on social media".
*: Twitter data is made available in accordance with Twitter’s terms of service.
The package is written in Python (version 3.8). We recommend installing it inside a virtual environment created with conda.
The tool conda, which comes bundled with Anaconda, has the advantage that it lets us specify the version of Python that we want to use. Python 3.8 is required.
After navigating to the local copy of the repository (e.g. cd Documents/Local_Github/nodality), a new environment can be created with
$ conda env create -f environment.yaml
The environment's name will be nodality. The environment must be activated before using it with
$ conda activate nodality
Following Twitter's terms and conditions, we can only share Tweet IDs. Our database consists of the activity of UK journalists and MPs from January 14, 2022, to January 13, 2023.
Each tweet is assigned one of the following topic labels:
- 1 for the Russia-Ukraine War
- 2 for the COVID-19 pandemic
- 5 for the Cost of Living Crisis
- 6 for Brexit
- -1 for any other topic
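
For convenience when reading the data, the label codes above can be mapped to topic names. The following is a minimal sketch; the names `TOPIC_LABELS` and `topic_name` are illustrative and are not taken from the repository.

```python
# Topic codes as listed above. The dictionary and helper names are
# hypothetical and only serve to illustrate the mapping.
TOPIC_LABELS = {
    1: "Russia-Ukraine War",
    2: "COVID-19 pandemic",
    5: "Cost of Living Crisis",
    6: "Brexit",
    -1: "Other",
}


def topic_name(label: int) -> str:
    """Return the topic name for a numeric label; raise on unknown codes."""
    try:
        return TOPIC_LABELS[label]
    except KeyError:
        raise ValueError(f"Unknown topic label: {label}") from None
```

Note that the codes are not contiguous (3 and 4 are unused), so a dictionary is safer than a positional list.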
We use a weak supervision classifier based on Ratner et al. (2019). The classifier can be found in classifier.py, while the accompanying labeling functions can be reviewed in labeling_functions.py. A confusion matrix of the classifier can be reviewed in confusion_matrix.csv.
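
To give an idea of what labeling functions look like, here is a simplified sketch in plain Python. It is not the code in classifier.py or labeling_functions.py: the keywords are made up for illustration, and the votes are combined by a simple majority vote rather than the label model described by Ratner et al. A labeling function either votes for a topic or abstains (returns `None`); tweets with no votes fall back to -1.

```python
from collections import Counter
from typing import Callable, List, Optional

OTHER = -1  # label for "any other topic", as in the list of labels

LabelingFunction = Callable[[str], Optional[int]]


def make_keyword_lf(label: int, keywords: List[str]) -> LabelingFunction:
    """Build a labeling function that votes `label` if any keyword appears."""
    def lf(text: str) -> Optional[int]:
        lowered = text.lower()
        return label if any(k in lowered for k in keywords) else None
    return lf


# Hypothetical keyword lists, chosen only for this example.
LABELING_FUNCTIONS = [
    make_keyword_lf(1, ["ukraine", "putin"]),
    make_keyword_lf(2, ["covid", "vaccine"]),
    make_keyword_lf(5, ["cost of living", "energy bills"]),
    make_keyword_lf(6, ["brexit"]),
]


def majority_vote(text: str) -> int:
    """Combine the votes of all labeling functions by majority vote."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v is not None]
    if not votes:
        return OTHER
    # On ties, the label that received its first vote earliest wins.
    return Counter(votes).most_common(1)[0][0]
```

For example, `majority_vote("Brexit talks resume")` returns 6, while a tweet matching none of the keywords returns -1.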
Given the classification of tweets, we generate the interaction networks, where nodes are Twitter users and links represent the interactions between them.
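
The construction of the networks can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: we assume each interaction is available as a `(source_user, target_user, topic_label)` tuple, that links are directed and weighted by the number of interactions, and that one network is built per topic. The function name `build_networks` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Interaction = Tuple[str, str, int]  # (source_user, target_user, topic_label)


def build_networks(
    interactions: Iterable[Interaction],
) -> Dict[int, Counter]:
    """Return, for each topic, a Counter mapping (source, target) to a weight."""
    networks: Dict[int, Counter] = defaultdict(Counter)
    for source, target, topic in interactions:
        if source == target:
            continue  # assumption: self-interactions are ignored
        networks[topic][(source, target)] += 1
    return dict(networks)


# Made-up interactions for illustration only.
sample = [
    ("mp_a", "journalist_b", 6),
    ("mp_a", "journalist_b", 6),
    ("journalist_b", "mp_c", 6),
    ("mp_c", "mp_a", 2),
]
networks = build_networks(sample)
```

Here `networks[6]` holds the Brexit network, in which the link from `mp_a` to `journalist_b` has weight 2.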
The analysis of the networks is done in the folders linear_model/ and pca_kmeans/.
The folder figures_tables/ contains all the notebooks to create the Figures and Tables of the pre-print.