Signal-1M Related Tweets

A TREC-like data collection to evaluate approaches for the task of related-tweet retrieval for news articles.

Download

To get the dataset, follow this link:
https://goo.gl/forms/R9yYo3lQSQTUtnHc2

Format

The download is a single compressed file, which you can extract with unzip. Extracting it yields a folder with two files:

topics: a file containing all of the topics (the news articles) used as queries to retrieve tweets.
signal1m_tweets_qrels: a TREC qrels-formatted file with the following fields:
- TOPIC - a unique identifier for an article from the Signal-1M dataset
- ITERATION - Unused (always 0); included to match TREC Qrels format.
- DOCUMENT - a tweet ID
- RELEVANCY - the graded relevance judgement, one of:
  - 0: not relevant
  - 1: somewhat relevant
  - 2: highly relevant
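As a rough illustration of the field layout above, the following sketch reads qrels lines into a nested dictionary. It assumes the fields are whitespace-separated, as is usual for TREC qrels; the function name `parse_qrels` and the sample identifiers are made up for the example and do not come from the dataset.

```python
import io
from collections import defaultdict


def parse_qrels(lines):
    """Parse qrels lines into {topic: {tweet_id: relevance}}.

    Assumes four whitespace-separated fields per line:
    TOPIC ITERATION DOCUMENT RELEVANCY.
    """
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, tweet_id, relevance = parts
        qrels[topic][tweet_id] = int(relevance)
    return dict(qrels)


# Hypothetical sample lines; real topic and tweet IDs will look different.
sample = io.StringIO(
    "topic-a 0 1001 2\n"
    "topic-a 0 1002 0\n"
    "topic-b 0 1003 1\n"
)
qrels = parse_qrels(sample)
```

To read the real file, pass an open file handle instead of the sample, for example `parse_qrels(open("signal1m_tweets_qrels"))`.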

Using the dataset

As in any TREC task, to use the dataset:

Use the topics file as input to your tweet retrieval approach. For each news article (topic), your approach should return a ranked list of tweet IDs in the TREC results file format. Let's call this file approach.result.
Each line in the file should conform to the following:
```
topic Q0 tweet-id rank score NAME
```
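One possible way to produce lines in this layout is sketched below. It assumes your approach yields, for each topic, a list of (tweet ID, score) pairs; the function name `format_run`, the run name, the 1-based ranks, and the sample data are illustrative assumptions rather than part of the dataset.

```python
def format_run(rankings, run_name="my_run"):
    """Turn {topic: [(tweet_id, score), ...]} into TREC results lines.

    Tweets are ordered by descending score and ranked from 1 (an assumption).
    """
    lines = []
    for topic, scored in rankings.items():
        ordered = sorted(scored, key=lambda pair: pair[1], reverse=True)
        for rank, (tweet_id, score) in enumerate(ordered, start=1):
            lines.append(f"{topic} Q0 {tweet_id} {rank} {score:.4f} {run_name}")
    return lines


# Hypothetical scores for a single topic.
rankings = {"topic-a": [("1001", 0.4), ("1002", 0.9)]}
run_lines = format_run(rankings, run_name="demo")
```

Writing `"\n".join(run_lines)` to approach.result would then give a file in the expected layout.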
You can find the tweet collection used to build this dataset here.
Use trec_eval to evaluate the effectiveness of your approach by running:
```
trec_eval -q signal1m_tweets_qrels approach.result
```
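For a quick sanity check before running trec_eval, a measure such as precision at k can be computed directly from the judgements. The sketch below is not a substitute for trec_eval: it assumes that any judgement of 1 or higher counts as relevant and that unjudged tweets count as not relevant. The function name and sample data are hypothetical.

```python
def precision_at_k(ranked_ids, judged, k=10, min_rel=1):
    """Fraction of the top-k tweet IDs whose judgement is at least min_rel.

    Unjudged tweets are treated as not relevant. The denominator is always k,
    even when fewer than k tweets were returned.
    """
    top = ranked_ids[:k]
    hits = sum(1 for tweet_id in top if judged.get(tweet_id, 0) >= min_rel)
    return hits / k


# Hypothetical judgements and ranking for one topic.
judged = {"1001": 2, "1002": 0, "1003": 1}
ranked = ["1003", "1002", "1001", "1004"]
p_at_2 = precision_at_k(ranked, judged, k=2)
```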

Citing

This collection is described in a paper presented at ECIR 2018: A Data Collection for Evaluating the Retrieval of Related Tweets to News Articles.

@inproceedings{Signal1MRelatedTweetsRetrieval2018,
  author    = {Axel Suarez and Dyaa Albakour and David Corney and Miguel Martinez and Jose Esquivel},
  title     = {A Data Collection for Evaluating the Retrieval of Related Tweets to News Articles},
  booktitle = {40th European Conference on Information Retrieval Research ({ECIR} 2018), Grenoble, France, March 2018},
  year      = {2018},
  pages     = {780--786},
  url       = {https://link.springer.com/chapter/10.1007/978-3-319-76941-7_76}
}