How do you say “data”?
I only ask because it’s a contentious issue. Along with split infinitives, getting this one wrong offends and delights in equal measure. And, as we write about data every day, we’re either getting it very wrong or very right.
The Wall Street Journal has just published this blog post, in which it finally decides to move away from data “are”, saying:
Most style guides and dictionaries have come to accept the use of the noun data with either singular or plural verbs, and we hereby join the majority.
As usage has evolved from the word’s origin as the Latin plural of datum, singular verbs now are often used to refer to collections of information: Little data is available to support the conclusions.
Otherwise, generally continue to use the plural: Data are still being collected.
When we asked the question a couple of years ago, loads of you debated it on Twitter, and opinion was sharply polarised.
@jhugman Data is plural. Unsure the correct “datum point” will catch on though. Having referenda about latin declentions belong in musea.
@mkdDCC No to datum. We need to relax about the data is/are thing. It may not be good Latin, but we’re not speaking Latin.
@DerekL Of course data is plural. And what is wrong with datum for a single item of data?
@holizz Singular data annoys the same people that find split infinitives objectionable – pedants with no understanding of linguistics.
Here’s the root of the matter: strictly speaking, data is a plural term. That is, if we’re following the rules of grammar, we shouldn’t write “the data is” or “the data shows” but instead “the data are” or “the data show”.
In Latin, data is the plural of datum and, historically and in specialized scientific fields, it is also treated as a plural in English, taking a plural verb, as in “the data were collected and classified”. In modern non-scientific use, however, despite the complaints of traditionalists, it is often not treated as a plural. Instead, it is treated as a mass noun, similar to a word like information, which cannot normally have a plural and which takes a singular verb. Sentences such as “data was (as well as data were) collected over a number of years” are now widely accepted in standard English.
The official view from the Office for National Statistics takes the traditional approach. The ONS style guide for those writing official statistics says:
The word data is a plural noun so write “data are”. Datum is the singular.
Andrew Garratt of the Royal Statistical Society says the debate goes back to the 1920s, and it reared its head again recently in some heated discussion in the Society’s newsletter. “We don’t have an official view,” he says. “Statisticians of a certain age and status refer to them as plural but people like me use it in the singular.” National Geographic magazine has debated it too.
For what it’s worth, I can confidently say that this will probably be the only time I ever write the word “datum” in a Datablog post. Data as a plural term may be the proper usage, but language evolves, and we want to write in terms that everyone understands and that don’t seem ridiculous.
So, over to Guardian style guru David Marsh, who makes the rules in these parts about language use. He says:
It’s like agenda, a Latin plural that is now almost universally used as a singular. Technically the singular is datum/agendum, but we feel it sounds increasingly hyper-correct, old-fashioned and pompous to say “the data are”.
Data takes a singular verb (like agenda), though strictly a plural; no one ever uses “agendum” or “datum”