Using Twitter for linguistic research: Benefits and difficulties
University of Kent, Canterbury
Tuesday 31st May 2016

*Submission deadline: 29 April 2016*

In recent years, researchers in a number of fields (including sociology, psychology, and political science) have turned to Twitter to investigate how ideas, news, and opinions spread in real time (DeAndrea et al. 2012, Tumasjan et al. 2010, Huberman et al. 2009). Unlike other text corpora, which are static, Twitter corpora are collected from a continuous stream of tweets posted in real time. They therefore provide a unique opportunity to track changes in response to specific events. For linguists, Twitter can provide access to a large body of language data that (1) comes from a wide sample of the world's population of English speakers and (2) contains a high proportion of "everyday" language. This sets Twitter apart from most of the corpora used in corpus linguistics research, which are often collections of news articles or texts from other narrow genres and which do not reflect the most contemporary uses of language. In addition, collecting linguistic data from Twitter offers linguists a solution to the so-called Observer's Paradox: people who are aware that their language is being observed may use language differently than they otherwise would. Linguists have always had to grapple with the challenge of obtaining spontaneous, naturalistic language data from people who are unaware that they are being observed, without violating ethical protocols. This is exactly what Twitter provides.

Because Twitter corpora have only recently been used for linguistic research, there is little published work on the topic (notable exceptions include Zappavigna (2011) and Page (2012)). However, researchers across a wide range of linguistic subdisciplines are beginning to exploit Twitter as a source of linguistically informative data. For instance, Wieling et al. (submitted) examined variation and change in the use of hesitation markers (e.g. "um") in Germanic languages, Haddican & Johnson (2012) found differences between British and American English in phrasal verbs, and Willis et al. (ongoing) are using Twitter as part of their data collection for the Syntactic Atlas of Welsh Dialects. Hardaker (2013) studies aggression, trolling, and the forensic aspects of online language, while Vessey (2015) investigates the interaction of French and English language ideologies among Canadian Twitter users.

Twitter offers researchers a further benefit: it can spread awareness of research and increase participation. If a study captures public interest, it can spread quickly through a community, giving members of the general public access to research that is normally available only to an academic audience. This exposure can also attract many more responses than traditional data collection methods do, as in Durham & Bailey's ongoing research into the use of the word "cheeky". Such rapid spread calls for a fresh look at data collection and sampling methods, so that rigorous research standards can be maintained while the opportunities it affords are exploited to the full.

Twitter studies are in their infancy: they have extraordinary potential but have not yet coalesced into a set of methodological tools widely accessible to language researchers. What is needed is guidance from experts in the field on what has been achieved successfully and on what kinds of questions can and cannot be addressed using Twitter.

The scope of the conference covers the syntax, semantics, pragmatics, and discourse of tweets in English and other modern languages, as well as methods of collecting and analysing large corpora of tweets. As such, it will appeal to researchers in linguistics, modern languages, English literature, psycholinguistics, journalism, and sociology, as well as in any discipline interested in what people are talking about and how they say it. We are particularly interested in engaging with postgraduate researchers in these areas.

We invite submission of abstracts on any area of linguistic research that engages with Twitter or similar media. Email abstracts of no more than 300 words to

Talks will be 20 minutes long, followed by 10 minutes for questions.

  • DeAndrea, D. C., Ellison, N. B., LaRose, R., Steinfield, C. and Fiore, A. (2012). Serious social media: On the use of social media for improving students' adjustment to college. The Internet and Higher Education 15(1), 15-23.
  • Haddican, B. and Johnson, D. E. (2012). Effects on the particle verb alternation across English dialects. University of Pennsylvania Working Papers in Linguistics 18(2): Selected papers from NWAV 40. Available at:
  • Hardaker, C. (2013). "Uh…..not to be nitpicky,,,,,but…the past tense of drag is dragged, not drug.": An overview of trolling strategies. Journal of Language Aggression and Conflict 1(1), 57-86.
  • Huberman, B. A., Romero, D. M. and Wu, F. (2009). Social networks that matter: Twitter under the microscope. First Monday 14(1).
  • Page, R. (2012). Stories and Social Media: Identities and Interaction. Routledge.
  • Tumasjan, A., Sprenger, T., Sandner, P. G. and Welpe, I. M. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM), 178-185. AAAI Press.
  • Vessey, R. (2015). Food fight: Conflicting language ideologies in English and French news and social media. Journal of Multicultural Discourses 10(2), 253-271.
  • Wieling, M., Grieve, J., Bouma, G. and Liberman, M. (submitted). Variation and change in the use of hesitation markers in Germanic languages. Language Dynamics and Change.