Aug 30, 2016

The UK often prides itself on being a developed country that champions the achievements of women, both in the workplace and more generally. Despite this, opinions differ as to how effective initiatives like the Davies Review have been in improving female representation at the highest levels of business.

Meanwhile, the perception that women receive a different kind of media coverage to men, being scrutinised in different ways, is a persistent one. Digging into sentiment is appealing, but presents its own challenges. Easier, and in some ways more concretely instructive, is monitoring the presence of gendered pronouns across a range of segments of the UK’s press.

Men and women are represented through five key pairs of personal pronouns: he/she; his/hers; he’s/she’s; him/her; and himself/herself. (People are referred to in many other ways, of course, but analysing the same number of pronouns for each gender keeps the comparison proportionate.) Using these terms to search across different parts of the UK media, we can uncover the comparative amounts of coverage generated by men and by women. Turning this into a ratio, we are also able to compare, for example, how the proportions differ between print and online news, or between national and regional papers.

Compiling the data

To get our dataset, Signal AI selected articles from the following six media segments, split into three opposing pairs: UK national newspapers versus regional newspapers; global online news versus print news; and lastly, blog versus broadcast content. For each of these segments, we analysed 100 randomly selected articles per day throughout June 2016, giving us a collection of 3,000 articles per segment. We split the text into tokens (roughly speaking, words) and counted the tokens corresponding to the pronouns listed above.
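The counting step can be illustrated with a minimal Python sketch. This is not Signal AI's actual pipeline: the regular-expression tokeniser and the `count_pronouns` helper are assumptions made for illustration, and only the pronoun lists come from the article.

```python
import re
from collections import Counter

# Pronoun lists taken from the five pairs named in the article.
MALE = {"he", "his", "he's", "him", "himself"}
FEMALE = {"she", "hers", "she's", "her", "herself"}

# Hypothetical tokeniser (an assumption, not the article's method):
# runs of letters, optionally followed by an apostrophe and more letters.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def count_pronouns(text):
    """Return (male, female, total) token counts for one piece of text."""
    # Lowercase and normalise curly apostrophes so "He’s" matches "he's".
    normalised = text.lower().replace("\u2019", "'")
    tokens = TOKEN_RE.findall(normalised)
    counts = Counter(tokens)
    male = sum(counts[p] for p in MALE)
    female = sum(counts[p] for p in FEMALE)
    return male, female, len(tokens)


print(count_pronouns("He said she’s late. Her bag is his."))  # (2, 2, 8)
```

Summing these per-article counts over all 3,000 articles in a segment would give the kind of segment totals reported in the table. A different tokeniser would shift the exact totals slightly, which is one reason to treat the figures as indicative.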

It’s worth pointing out here that not every segment is directly comparable with every other. National and regional coverage can certainly be compared against each other. However, national and regional articles may also appear within the online, print, blog and broadcast samples, because those segments describe the type of coverage rather than any specific scale or focus.

Personal pronoun frequencies

| | Nationals | Regional | Online | Print | Blogs | Broadcast |
|---|---|---|---|---|---|---|
| Male | 21,742 | 12,649 | 14,021 | 11,187 | 8,965 | 11,057 |
| Female | 12,789 | 4,129 | 6,338 | 3,330 | 3,246 | 4,389 |
| All tokens | 1,855,254 | 1,170,765 | 1,422,016 | 1,093,327 | 1,346,826 | 1,181,451 |
| Male (% tokens) | 1.17% | 1.08% | 0.99% | 1.02% | 0.67% | 0.94% |
| Female (% tokens) | 0.69% | 0.35% | 0.45% | 0.30% | 0.24% | 0.37% |
| Male:Female ratio | 1.70 | 3.06 | 2.21 | 3.36 | 2.76 | 2.52 |

Simply looking at the volume of mentions, the gender coverage gap becomes clear: in every media type, male pronouns appear far more often than female ones. However, we can also see that the proportion of male to female mentions differs dramatically depending on the media type being analysed.

It is interesting to observe that, from an identically sized sample of articles, national and regional newspapers seem to use pronouns far more than, for example, blog publications: gendered pronouns make up about 1.86% of tokens in national newspapers but only about 0.91% in blogs. Does this indicate that blogs are simply less interested in talking about people than newspapers? Potentially, given newspapers’ reliance on interviews and other forms of reportage. Clearly, more research is required to determine exactly why volumes differ depending on the kind of media one examines.

Turning the individual tallies of male and female pronouns into ratios (last row of the table) allows us to see just how much more coverage men receive than women. Regional newspapers exhibit a highly uneven ratio, using a male pronoun about three times as often as a female one (3.06). The ratio is more balanced in national newspapers (1.70). However, a sample of all Signal AI print coverage (which includes national and regional print editions as well as non-UK print sources) brings back the highest ratio of all segments, with male pronouns being used nearly three-and-a-half times as often as their female counterparts (3.36).
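As a quick check on the arithmetic, the percentage and ratio rows can be recomputed from the raw counts. The sketch below copies the counts from the table; the `SEGMENTS` dictionary and the `summarise` helper are illustrative names, not part of the original analysis.

```python
# Counts copied from the table: (male pronouns, female pronouns, all tokens).
SEGMENTS = {
    "Nationals": (21742, 12789, 1855254),
    "Regional": (12649, 4129, 1170765),
    "Online": (14021, 6338, 1422016),
    "Print": (11187, 3330, 1093327),
    "Blogs": (8965, 3246, 1346826),
    "Broadcast": (11057, 4389, 1181451),
}


def summarise(male, female, total):
    """Return (male % of tokens, female % of tokens, male:female ratio)."""
    male_pct = round(100 * male / total, 2)
    female_pct = round(100 * female / total, 2)
    ratio = round(male / female, 2)
    return male_pct, female_pct, ratio


for name, counts in SEGMENTS.items():
    male_pct, female_pct, ratio = summarise(*counts)
    print(f"{name:<10} male {male_pct:.2f}%  female {female_pct:.2f}%  ratio {ratio:.2f}")
```

Run as written, this reproduces the last three rows of the table. Dividing by the token total matters because the segments differ in size: the national sample contains roughly 1.86 million tokens against about 1.09 million for print, so raw counts alone would overstate the difference between segments.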

The finding that male pronouns are much more common than female pronouns supports previous analysis, such as counting the most common words in the Corpus of Contemporary American English. This analysis of 450m words from many sources ranks “he” as the 15th most common word and “she” as the 31st, for example, with a similar difference for other pronouns.

Does the media reflect society?

It’s clear that even a simple analysis of text can be very revealing when investigating how the media deals with certain subjects. Gender is a contentious issue, and the data generated by Signal AI shows a clear discrepancy in how often men and women are referred to in the press. But should we reduce this to a simple case of media bias? The gap could stem from a wider problem with female representation in society: after all, if men are the ones making the news, they are the people who will be written about. Either way, data is an increasingly important part of quantifying such nebulous concepts.