# Twitter Sentiment Analysis

In this experiment, we reproduce the statistical analysis experiment conducted in the [LEAF paper](https://arxiv.org/abs/1812.01097). Specifically, we investigate how varying the minimum number of training samples per user affects model accuracy when training with the `FedAvg` algorithm in the LEAF framework. For this example, we use the Sentiment140 dataset (containing 1.6 million tweets) and train a 2-layer LSTM model with cross-entropy loss and pre-trained GloVe embeddings.

# Experiment Setup and Execution

## Quickstart script

For ease of use, we provide a script that runs the experiment for different min-sample counts. It can be executed as:

```bash
leaf/paper_experiments $> ./sent140.sh
```

This script executes the instructions provided below for min-sample counts of 3, 10, 30 and 100, reproducibly generating the data partitions and results observed by the authors during analysis.

## Pre-requisites

Since this experiment requires pre-trained word embeddings, we recommend running the `models/sent140/get_embs.sh` script, which fetches 300-dimensional pre-trained GloVe vectors.

```bash
leaf/models/sent140/ $> ./get_embs.sh
```

After extraction, this data is stored in `models/sent140/embs.json`.

## Dataset fetching and pre-processing

LEAF provides scripts for fetching data and converting it into JSON format for easy use. These scripts can also subsample the dataset and split it into training and testing sets.

For our experiment, as a first step, we use 50% of the dataset in an 80-20 train/test split, and we discard all users with fewer than 3 tweets.
The following command shows how this can be accomplished (the `--spltseed` flag enables reproducible generation of the dataset):

```bash
leaf/data/sent140/ $> ./preprocess.sh --sf 0.5 -t sample -s niid --tf 0.8 -k 3 --spltseed 1549775860
```

After running this script, the `data/sent140/data` directory should contain `train/` and `test/` directories.

## Model Execution

Now that we have our data, we can execute our model! For this experiment, the model file is stored at `models/sent140/stacked_lstm.py`. In order to train this model using `FedAvg` with 2 clients every round for 10 rounds, we execute the following command:

```bash
leaf/models $> python3 main.py -dataset sent140 -model stacked_lstm -lr 0.0003 --clients-per-round 2 --num-rounds 10
```

Alternatively, passing `-t small` in place of the last 2 flags provides the same functionality (as defined in the `models/baseline_constants.py` file).

## Metrics Collection

Executing the above command writes statistical and system metrics to `leaf/models/metrics/stat_metrics.csv` and `leaf/models/metrics/sys_metrics.csv`, respectively. Since these files are overwritten on every run, we __highly recommend__ copying the generated metrics files to a different location after each run.

To experiment with a different min-sample setting, re-run the preprocessing script with a different value for the `-k` flag. The plots shown below can be generated using the `plots.py` file in the repo root.

# Results and Analysis

Upon performing this experiment, we see that, while median performance degrades only slightly with data-deficient users (i.e., k = 3), the 25th percentile (bottom of box) degrades dramatically.

![](../_static/images/leaf_rep_sent140.png "Sentiment140 Results")
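
The median and 25th-percentile figures discussed above can also be recomputed directly from a saved copy of the statistical metrics file. The following is a minimal Python sketch, not the code in `plots.py`. It assumes the CSV has a `round_number` column and an `accuracy` column with one row per user per evaluation round; these column names are assumptions and should be checked against the header of your `stat_metrics.csv`. The function names `summarize_accuracy` and `summarize_file` are hypothetical, and the sketch does not filter rows by evaluation set.

```python
import csv
import statistics
from collections import defaultdict


def summarize_accuracy(rows, round_column="round_number", accuracy_column="accuracy"):
    """Return (median, 25th percentile) of per-user accuracy in the final round.

    `rows` is any iterable of dict-like records, e.g. a csv.DictReader.
    The default column names are assumptions about the metrics file layout.
    """
    accuracies_by_round = defaultdict(list)
    for row in rows:
        round_number = int(row[round_column])
        accuracies_by_round[round_number].append(float(row[accuracy_column]))

    if not accuracies_by_round:
        raise ValueError("no metric rows found")

    # Only the last recorded round is summarized.
    final_round = max(accuracies_by_round)
    final_accuracies = accuracies_by_round[final_round]

    median = statistics.median(final_accuracies)
    if len(final_accuracies) > 1:
        # The first quartile corresponds to the bottom edge of a box plot.
        first_quartile = statistics.quantiles(
            final_accuracies, n=4, method="inclusive"
        )[0]
    else:
        # quantiles() needs at least two data points.
        first_quartile = final_accuracies[0]
    return median, first_quartile


def summarize_file(path):
    """Summarize a metrics CSV stored at `path` (a copy saved after a run)."""
    with open(path, newline="") as handle:
        return summarize_accuracy(csv.DictReader(handle))
```

Calling `summarize_file` on the metrics file saved for each `-k` setting gives one (median, 25th percentile) pair per setting, which can then be compared across min-sample counts. Requires Python 3.8 or later for `statistics.quantiles`.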
# More Information

More information about the framework, challenges and experiments can be found in the [LEAF paper](https://arxiv.org/abs/1812.01097).