Aug 19, 2016

Daily Mail

The UK media landscape is divided between print and digital, but this schism rarely affects content in any meaningful sense. For many publishers, articles appear in both print and online editions with minimal, if any, changes. However, certain publications have made a point of separating print editorial from their digital operations in order to capitalise on two very different markets. The Daily Mail and its digital counterpart Mail Online are perhaps the best example of this.

To explore the scale of the changes between the Mail’s print and online editions, Signal AI performed a statistical comparison of the language used. To get a representative sample of data, we pulled 100 stories per day from each of the online and print sources throughout the month of June, giving us a corpus of 6,000 articles to analyse. We then used chi-squared (\(\chi^2\)) analysis to see just how different the online and print language is. (The full table of results is below.)

For the uninitiated, chi-squared is a way of measuring how strongly two groups differ. When analysing text, the high-scoring entries for one group are the terms that most distinguish it from the other: a high score indicates both a strong presence in one group and a relatively low presence in the other. More formally, Pearson’s chi-squared statistic compares the observed and expected frequencies of an event; in our analysis, we treat word frequencies in one source as the ‘observed’ values and those in the other as the ‘expected’ values.
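To make the idea concrete, here is a minimal sketch of a per-term score of the form \((O - E)^2 / E\), where \(O\) is a term’s count in one source and \(E\) is the count we would expect if that source used the term as often as the other source does. It is an illustration under stated assumptions rather than Signal AI’s actual pipeline: the function name `chi_squared_scores`, the add-one smoothing, the scaling of expected counts to the observed corpus size, the decision to keep only over-represented terms, and the toy token lists are all hypothetical choices made for this example.

```python
from collections import Counter


def chi_squared_scores(observed_tokens, expected_tokens, smoothing=1.0):
    """Score terms that are over-represented in observed_tokens.

    Hypothetical helper: each term gets (O - E)^2 / E, where E is the other
    source's smoothed relative frequency scaled to the observed corpus size.
    """
    observed = Counter(observed_tokens)
    expected = Counter(expected_tokens)
    vocab = set(observed) | set(expected)

    obs_total = sum(observed.values())
    # Assumption: add-one smoothing so that E is never zero.
    exp_total = sum(expected.values()) + smoothing * len(vocab)

    scores = {}
    for term in vocab:
        o = observed[term]
        e = (expected[term] + smoothing) / exp_total * obs_total
        # Keep only terms used more often than expected, so the list
        # describes what is distinctive about the observed source.
        if o > e:
            scores[term] = (o - e) ** 2 / e
    return scores


# Toy data (invented for illustration, not taken from the real corpus).
print_tokens = ["yesterday"] * 8 + ["today"] * 4 + ["the"] * 8
online_tokens = ["wednesday"] * 8 + ["today"] * 4 + ["the"] * 8

print_scores = chi_squared_scores(print_tokens, online_tokens)
online_scores = chi_squared_scores(online_tokens, print_tokens)

print(max(print_scores, key=print_scores.get))
print(max(online_scores, key=online_scores.get))
```

In this toy example the term used equally by both sources never rises to the top of either ranking, which mirrors why a word can be common in both editions and still be missing from both lists.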

Different terms

The differences between the two groups are stark. Interestingly, the top entry for print is the word yesterday. Print newspapers spend a lot of time referring to the previous day, it seems, with clear context provided by the current date at the top of every page. Online, the 24-hour news cycle is particularly visible: the relative term yesterday becomes incorrect if an article is viewed more than a day after being written, so absolute terms like Wednesday, Tuesday and Friday are far more present online than in print. (The word today is notable for its absence from both lists, which indicates that it is used in roughly equivalent amounts in print and online.)

The subject matter of print and digital is also very different. High-scoring print terms are redolent of the establishment, including Tory, MPs, Sir, royal and Commons. The online list, by contrast, puts Mail Online’s strategy into sharp focus: its top-scoring terms include brunette, model, blonde, beauty and wore. The different emphases of the Mail’s online and print publications are laid out plainly here, with the online edition showing a strong tendency towards clickbait to maximise ad revenue.

This is perhaps most relevant when we look at how the media itself is represented in the two datasets. For the Daily Mail, the BBC, BBC1 and BBC2 are high on the list; meanwhile, high-scoring terms for Mail Online include Instagram, shared, video, captioned, posted and image. The presence and significance of social networks and multimedia runs throughout the language of Mail Online, and Signal AI’s chi-squared data quantifies the extent to which content like this is absent from the printed Daily Mail.

Different audiences

Newspaper proprietors are becoming increasingly reliant on digital, both as a revenue stream and, more generally, to remain relevant to audiences in the 21st century. However, retaining a ‘traditional’ print brand is also an important part of many organisations’ business models. This poses its own problem: how far should publications distinguish between the content they provide to increasingly different audiences? As news presentation becomes ever more personalised, will every reader become an audience of one, shown their own filtered content? The Mail shows us how far one publisher will go in separating print and digital, and our chi-squared analysis quantifies that difference. Will other newspapers look enviously at the Mail’s readership figures and follow suit? Or will the Mail’s strategy remain an outlier?

Top 50 terms for each source

| Daily Mail (print) term | \(\chi^2\) | Mail Online term | \(\chi^2\) |
|---|---|---|---|
| yesterday | 168.0 | June | 222.4 |
| Scotland | 38.5 | editing | 185.2 |
| pc | 37.6 | video | 182.4 |
| Scottish | 33.6 | told | 147.9 |
| Tory | 27.6 | Wednesday | 143.0 |
| Glasgow | 26.8 | reporting | 135.8 |
| 9:00 PM | 26.4 | instagram | 131.1 |
| UK | 24.0 | scroll | 128.3 |
| EU | 23.2 | pictured | 128.1 |
| MP | 21.9 | Thursday | 123.4 |
| co | 19.6 | pair | 122.6 |
| BBC | 19.4 | Tuesday | 122.0 |
| Scotland’s | 19.2 | shared | 121.9 |
| MPs | 18.4 | Friday | 110.0 |
| Wimbledon | 17.8 | star | 105.3 |
| bbc2 | 17.8 | percent | 103.5 |
| Edinburgh | 17.3 | according | 99.6 |
| sir | 16.2 | photo | 90.7 |
| miss | 15.8 | added | 90.2 |
| bbc1 | 15.0 | beauty | 87.9 |
| Cameron | 14.6 | posted | 84.3 |
| 10:00 PM | 14.4 | earlier | 81.7 |
| ch4 | 14.1 | afp | 81.1 |
| England | 14.0 | wore | 79.4 |
| SNP | 13.5 | reported | 77.3 |
| royal | 13.4 | black | 76.5 |
| British | 13.4 | media | 75.0 |
| commons | 12.2 | Monday | 74.4 |
| ch5 | 12.0 | white | 72.9 |
| tennis | 11.8 | looked | 72.7 |
| Jeremy | 11.7 | Sunday | 71.7 |
| Scots | 11.7 | appeared | 69.7 |
| bbc4 | 11.5 | locks | 69.5 |
| pensions | 11.4 | blonde | 67.3 |
| Labour | 11.1 | seen | 66.9 |
| Britain | 11.1 | Sydney | 64.8 |
| firm | 10.9 | wrote | 64.5 |
| BHS | 10.8 | snap | 63.2 |
| 8:00 PM | 10.8 | following | 63.1 |
| eighties | 10.6 | statement | 61.9 |
| effective | 10.5 | image | 60.6 |
| pages | 10.4 | incident | 60.5 |
| ie | 10.3 | captioned | 60.4 |
| pension | 10.3 | brunette | 60.4 |
| patients | 10.0 | 2015 | 59.7 |
| Cambridge | 9.9 | model | 57.7 |
| masterclass | 9.8 | Australia | 56.0 |
| Labour’s | 9.8 | file | 55.7 |
| Donovan | 9.6 | morning | 55.5 |