A summary article aggregates several otherwise disparate topical sections. If one writes a summary of another topical article (an article discussing one topic), the resulting article is still clearly topical. By contrast, in our definition, a summary article encompasses a collection of topics that do not bear any manifest relation. Such articles are often created by web aggregators.

At Signal AI we process massive streams of news articles and supply topics and entities of interest to our customers. In this process, we face several information overload problems. Apart from identifying topically relevant articles, these include identifying duplicates and filtering out summary articles that comprise disparate topical sections. Below is an example of a topical and a summary article; Section 4.1 of the ECIR 2019 paper describes this dataset.

Note that the total number of labels in this dataset is 2878. This is short of the 2900 labels described in the paper because we removed exact duplicates: some documents were labeled twice.
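
To illustrate the kind of clean-up mentioned above, here is a minimal sketch of removing exact duplicates while keeping the first occurrence of each row. It is not the procedure used to build this dataset: the function name `drop_exact_duplicates` and the toy `(document, label)` pairs are hypothetical, and treating a duplicate as "every field identical" is an assumption.

```python
def drop_exact_duplicates(rows):
    """Return rows with exact duplicates removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # a row is a duplicate only if every field matches
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Toy data with hypothetical (document, label) pairs; not taken from the dataset.
labelled = [
    ("doc-1", "summary"),
    ("doc-2", "topical"),
    ("doc-1", "summary"),  # the same document labelled twice with the same label
]
deduped = drop_exact_duplicates(labelled)
print(len(labelled), "->", len(deduped))
```

A document that received two different labels would not be collapsed by this sketch, since the rows would differ in the label field.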

Examples of topical (left) and summary (right) articles.


To get the dataset, follow this link.


Upon downloading the data, you'll get a single CSV file with the following columns:
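
As a quick check after downloading, the header row and the number of records can be read with Python's standard `csv` module. This is only a sketch: the function name `read_columns_and_rows` is hypothetical, the file path is whatever you saved the download as, and UTF-8 encoding is an assumption.

```python
import csv
import sys


def read_columns_and_rows(path):
    """Read a CSV file and return (column names, list of row dictionaries)."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return reader.fieldnames, rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_csv.py <path-to-downloaded-file>
    columns, rows = read_columns_and_rows(sys.argv[1])
    print("columns:", columns)
    print("rows:", len(rows))
```

If each row corresponds to one label (an assumption about the file layout), the row count should come out at 2878.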


This dataset was described in a paper at ECIR 2019: Recognising Summary Articles.

  author    = {Mark Fisher and Dyaa Albakour and Udo Kruschwitz and Miguel Martinez},
  title     = {Recognising Summary Articles},
  booktitle = {41st European Conference on Information Retrieval Research ({ECIR} 2019), Cologne, Germany, April, 2019},
  year      = {2019},
  pages     = {To appear},
  url       = {To appear}