A summary article aggregates several otherwise disparate topical sections.
If one writes a summary of another topical article (an article discussing
one topic), the resulting article is still clearly topical. By contrast, in our definition, a
summary article encompasses a collection of topics that do not bear any manifest relation.
Such articles are often created by web aggregators.
At Signal AI we process massive streams of news articles and supply
topics and entities of interest to our customers. In this process,
we face several information overload problems.
Apart from identifying topically relevant articles, these include
identifying duplicates as well as filtering out
summary articles that comprise disparate topical sections.
Below is an example of a topical article and a summary article.
The ECIR 2019 paper describes this dataset in Section 4.1.
Note that the total number of labels in this dataset is 2878. This is short of the 2900 labels described in the paper.
The reason is that some documents were labeled twice, and we removed these exact duplicates.
Download
Follow this link to get the dataset:
https://goo.gl/forms/0LLs8R2HuRbTgwUk2
Format
Upon downloading the data, you'll get a single CSV file with the following columns:
- id : The id of a document in the Signal 1M dataset.
- label : Whether or not the article with this id is a summary article (1 means positive, 0 means negative).
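As a minimal sketch of how the file might be read, the following Python uses only the standard library. It assumes the CSV has a header row naming the columns exactly `id` and `label`; the function names and the file name in the usage note are hypothetical and not part of the dataset.

```python
import csv
from collections import Counter


def count_labels(csv_path):
    """Return a Counter mapping each label (0 or 1) to its number of rows.

    Assumes a header row with the column names `id` and `label`.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts[int(row["label"])] += 1
    return counts


def summary_article_ids(csv_path):
    """Return the ids of documents labelled as summary articles (label == 1)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [
            row["id"]
            for row in csv.DictReader(handle)
            if int(row["label"]) == 1
        ]
```

For example, `count_labels("summary_articles.csv")` (a placeholder file name) returns the number of positive and negative labels. If the file matches the description above, the two counts should sum to 2878.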
Citing
This dataset was described in a paper at ECIR 2019: Recognising Summary Articles.
@inproceedings{Signal1MRecognisingSummaryArticles2019,
  author    = {Mark Fisher and Dyaa Albakour and Udo Kruschwitz and Miguel Martinez},
  title     = {Recognising Summary Articles},
  booktitle = {41st European Conference on Information Retrieval Research (ECIR 2019), Cologne, Germany, April, 2019},
  year      = {2019},
  pages     = {To appear},
  url       = {To appear}
}