Dawn Knight  BA, MA, PhD (Nottingham), FLSW

Professor

School of English, Communication and Philosophy

Email
KnightD5@cardiff.ac.uk
Telephone
+44 29208 76325
Campuses
John Percival Building, Room 3.57, Colum Drive, Cardiff, CF10 3EU
Available for postgraduate supervision

Overview

I am a member of the Centre for Language and Communication Research, and have been employed by Cardiff University since 2015. I have been involved, as Principal Investigator (PI) or Co-Investigator (CI), in a range of externally funded research projects (with circa £3.6m of external funding obtained to date). Recent projects (i.e. 2021 onwards) include the following:

  • 2022-23: CI, Welsh Government-funded ‘ThACC – Thesawrws Ar-lein Cymraeg Cyfoes - Using Word Embeddings to Create a Thesaurus of Contemporary Welsh’ project. Working with colleagues from the Schools of Welsh and Computer Science at Cardiff University and Lancaster University respectively, this project developed an open-access, freely available online thesaurus of the Welsh language for Welsh speakers and learners alike. We received £90,000 for this project. For more information on this project, see here.
  • 2022-23: PI, AHRC-funded ‘FreeTxt: supporting bilingual free-text survey and questionnaire data analysis’ project. Working with colleagues from Lancaster University, and co-designed and co-constructed with partners Cadw and National Trust Wales, this project created an innovative open-source online free-text analysis tool that enables quick and easy analysis of English and Welsh language data. We received £100,000 for this project. For more information on this project, see here.
  • 2022-23: CI, AHRC-funded ‘Wild Swimming and Blue Spaces: Mobilising interdisciplinary knowledge and partnerships to combat health inequalities at scale’ project (with Svenja Adolphs, University of Nottingham, as PI). This project developed a new mixed methods approach, drawing on corpus linguistics and narrative analysis, to create effective public health messaging (with a focus on the benefits of wild swimming) that includes content from a range of academic disciplines. Ultimately, this project aims to benefit the many individuals and diverse communities who will be enabled to enjoy wild swimming safely to improve their health, and to gain an increased awareness of the nature of blue spaces and their role as a community asset. We received £178,000 for this project. Visit the project website here.
  • 2021-24: Co-PI (with Anne O’Keeffe, Mary Immaculate College), AHRC/IRC-funded ‘Interactional variation online: harnessing emerging technologies in the digital humanities to analyse online discourse in different workplace contexts’ project. Working with colleagues from Mary Immaculate College, Swansea University, The University of Nottingham, University College Dublin, and University of Aberdeen, the project first aimed to examine virtual workplace communication to gain in-depth insight into the potential barriers to effective communication. Our second aim was to propose the next generation of frameworks for analysing online discourse, and to make these frameworks available to all arts and humanities research and end-user communities. We received £390,000 from the AHRC and €270,000 from the IRC for this project (circa £620,700 in total). Visit the project website here.

From 2016 to 2020, I was also PI on the ‘CorCenCC: Corpws Cenedlaethol Cymraeg Cyfoes (The National Corpus of Contemporary Welsh): A community driven approach to linguistic corpus construction’ project. Funded by the ESRC (Economic and Social Research Council) and AHRC (Arts and Humanities Research Council), this £1.8 million interdisciplinary and multi-institutional project led to the creation of a large-scale, open-source corpus of contemporary Welsh. Full details of project outputs, including links to the corpus query interface, the full corpus dataset, the project report, the Y Tiwtiadur pedagogic toolkit, the CyTag part-of-speech tagger/tag-set and the CySemTag semantic tagger/tag-set, can be found on the CorCenCC project website and via the CorCenCC GitHub page.

Details of my other research activities, and previously funded projects, can be found on the 'research' tab of this page.

Regarding external and professional leadership roles, I was Chair of BAAL (British Association for Applied Linguistics) from 2018 to 2021. BAAL is a learned society with over 1,300 members internationally, making it the most influential forum for academics and professionals interested in language and applied linguistics within the UK and beyond. For further information see: www.baal.org.uk

I am currently a member of the Economic and Social Research Council’s (ESRC) Strategic Advisory Network (SAN) (2021-2024). The SAN comprises leading experts from the academic and user communities. It helps the ESRC exploit opportunities and access the voice and expertise of its communities. For further details of the SAN, see here. I am also an academic lead of the AHRC (Arts and Humanities Research Council) Peer Review College (2022-2025) and the ESRC Peer Review College (2024+), the strategic lead for the ESRC Impact Acceleration Account (IAA) at Cardiff University (2023-2026), and currently the Director of Research Funding for ENCAP.

I am a Fellow of the Learned Society of Wales (FLSW, 2023+).

Publication

2019

  • Ezeani, I., Piao, S., Neale, S., Rayson, P. and Knight, D. 2019. Leveraging pre-trained embeddings for Welsh Taggers. Presented at: 4th Workshop on Representation Learning for NLP, Florence, Italy, July 2019. In: Proceedings of the 4th Workshop on Representation Learning for NLP (ACL Anthology, Vol. W19-43). Association for Computational Linguistics. (10.18653/v1/W19-4332)
  • Spasic, I., Owen, D., Knight, D. and Artemiou, A. 2019. Unsupervised multi-word term recognition in Welsh. Presented at: Celtic Language Technology Workshop 2019, Dublin, Ireland, 19 August 2019. In: Lynn, T. et al. eds. Proceedings of the Celtic Language Technology Workshop. European Association for Machine Translation.

2010

  • Knight, D., Tennent, P., Adolphs, S. and Carter, R. 2010. Developing heterogeneous corpora using the Digital Replay System (DRS). Presented at: Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, Malta, 18 May 2010. In: Kipp, M. et al. eds. Proceedings of the LREC 2010 (Language Resources and Evaluation Conference) Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, May 2010, Malta. European Language Resources Association, pp. 16-21.
  • Adolphs, S. and Knight, D. 2010. Building a spoken corpus: What are the basics? In: O’Keeffe, A. and McCarthy, M. eds. The Routledge handbook of corpus linguistics. Routledge handbooks in applied linguistics. Oxford: Routledge.

2006

  • Knight, D., Bayoumi, S., Mills, S., Crabtree, A., Adolphs, S., Pridmore, T. and Carter, R. 2006. Beyond the text: construction and analysis of multi-modal linguistic corpora. Presented at: 2nd International Conference on e-Social Science, Manchester, UK, 28-30 June 2006. In: Proceedings of the 2nd International Conference on e-Social Science, Manchester, 28-30 June 2006. ICeSS.

Research

Research interests:

I am an applied linguist whose research interests lie in the areas of corpus linguistics, discourse analysis and multimodality. I have expertise in conceptualising, theorising and applying innovative interdisciplinary approaches and methodologies for extracting and predicting patterns of language use within and across social and linguistic contexts. While my research is rooted in linguistics and the digital humanities, it is fundamentally interdisciplinary, and this is reflected in my multi-authored publications and interdisciplinary research projects.

My work on Welsh language resource development, supported by major AHRC, ESRC and Welsh Government grants (e.g. CorCenCC; see here for further information), aims to change the landscape of minoritised language research and the potential real-world applications of corpora and corpus-based enquiry.

I have (co)presented 104 papers and posters, and delivered 48 keynotes at seminars and conferences since 2006.

Externally funded research projects:

  • 2023: £20,000 received from the Welsh Government to create the GDC-WDG Welsh language resource site.
  • 2022: £90,000 received from the Welsh Government for the ‘ThACC – Thesawrws Ar-lein Cymraeg Cyfoes - Using Word Embeddings to Create a Thesaurus of Contemporary Welsh’ project. Working with colleagues from WELSH and Computer Science at Cardiff and Lancaster Universities (with Morris as PI - I am one of the CIs), the project developed an open-access, freely available online thesaurus of the Welsh language, for Welsh speakers and learners alike.
  • 2022: £178,000 received from the AHRC for the ‘Wild Swimming and Blue Spaces: Mobilising interdisciplinary knowledge and partnerships to combat health inequalities at scale’ project (with Adolphs, Nottingham as PI - I am one of the CIs). This project developed a new mixed methods approach, drawing on corpus linguistics and narrative analysis, for effective public health messaging (with a focus on the benefits of wild swimming) that includes content from a range of academic disciplines. Visit the project website here.
  • 2022: £100,000 received from the AHRC for the ‘FreeTxt: supporting bilingual free-text survey and questionnaire data analysis’ project. I was PI on this project. Working with colleagues from Lancaster University, and co-designed and co-constructed with partners Cadw and National Trust Wales, this project created an innovative open-source online free-text analysis tool that enables quick and easy analysis of English and Welsh language data: FreeTxt. Visit the project website here.
  • 2021-24: Co-PI (with Anne O’Keeffe, Mary Immaculate College), AHRC/IRC-funded ‘Interactional variation online: harnessing emerging technologies in the digital humanities to analyse online discourse in different workplace contexts’ project. Working with colleagues from Mary Immaculate College, Swansea University, The University of Nottingham, University College Dublin, and University of Aberdeen, the project first aimed to examine virtual workplace communication to gain in-depth insight into the potential barriers to effective communication. Our second aim was to propose the next generation of frameworks for analysing online discourse, and to make these frameworks available to all arts and humanities research and end-user communities. We received £390,000 from the AHRC and €270,000 from the IRC for this project (circa £620,700 in total). Visit the project website here.
  • 2021: £14,988 received from the ESRC Impact Acceleration Account (IAA). This was for a project, working with the National Centre for Learning Welsh, that supported the creation of vocabulary lists, based on data extracted from CorCenCC (National Corpus of Contemporary Welsh). 
  • 2021: £90,000 received from the Welsh Government for the ‘Welsh Automatic Text Summarisation’ project. Working with colleagues from WELSH and Computer Science at Cardiff and Lancaster Universities, the project team built a summarisation tool that will allow professionals to quickly summarise long documents for efficient presentation. Visit the project website here.
  • 2021: £450,000 received from the AHRC for the ‘Coronavirus Discourses: linguistic evidence for effective public health messaging’ project. Developed in partnership with Public Health England, Public Health Wales and NHS Education for Scotland, this project addressed key challenges that the coronavirus pandemic presents in relation to understanding the flow and impact of public health messages as reflected in public and private discourses. Led by Svenja Adolphs (Nottingham; I was CI on this project), this interdisciplinary project carried out the first large-scale analysis of the trajectories of public health messages relating to the coronavirus pandemic in the UK [£465,000]. Visit the project website here.
  • 2020: £90,000 received from the Welsh Government for the ‘Learning English-Welsh bilingual embeddings and applications in text categorisation’ project. This was an interdisciplinary project involving Irena Spasić, Padraig Corcoran, Luis Espinosa-Anke (School of Computer Science and Informatics – COMSC) and Geraint Palmer (School of Mathematics) as Co-Investigators (CIs). I was PI on this project. For more information, see here.
  • 2019: £90,000 received from the Welsh Government for the ‘Welsh words by numbers: “Wales” + “capital” = “Cardiff”’ project (focusing on word embeddings for Welsh). I was a CI on this project.
  • 2019: £2,100 received for the internally funded CUROP project entitled ‘FreeTxt: analysing free-text comments using a corpus-based approach’. I was PI on this project.
  • 2019: £20,000 received from the Welsh Government for the Welsh Stemmer project. I was CI on this project, with Irena Spasić (Cardiff) as PI.
  • 2018: £2,100 received for the internally funded CUROP project entitled ‘Corpws Cenedlaethol Cymraeg Cyfoes: National Corpus of Contemporary Welsh – a focus on spoken data’. I was PI on this project (with Lowri Williams).
  • 2018: £2,100 received for the internally funded CUROP project entitled ‘Corpws Cenedlaethol Cymraeg Cyfoes: National Corpus of Contemporary Welsh – semantic tagging and data annotation’. I was PI on this project (with Paul Rayson).
  • 2017: £19,964 received from the Grant Cymraeg 2050 fund to automatically construct a WordNet for Welsh, a lexical database in which words are grouped into sets of synonyms (synsets), which are then organised into a network of lexico-semantic relationships. I was CI on this project.
  • 2017: £2,000 received (as PI) from the British Council in support of a launch event for the CorCenCC project (held on 28th February 2017).
  • 2016-19: £1,800,000 received from the ESRC and AHRC for the CorCenCC project (Corpws Cenedlaethol Cymraeg Cyfoes (The National Corpus of Contemporary Welsh): A community driven approach to linguistic corpus construction). I was PI on this project.
  • 2016: £1,600 received for the internally funded CUROP project entitled ‘Analysis on non-verbal communication in construction industry interactions’. I was CI on this project (with Mike Handford).
  • 2015: £24,999 received from the AHSS (College of Arts, Humanities and Social Sciences) Network Digital Humanities Initiator Bid. The aim of this network bid was to build significant capacity in Digital Humanities at Cardiff University. I was CI on this project.
  • 2014: £3,850 received from the Newcastle University Faculty Research Fund for a project entitled ‘Crowdsourcing data collection for corpus compilation: Scoping methods for the future’ (with Patrick Olivier).
  • 2013: £3,900 received from the Newcastle University Faculty Bid Preparation Fund for Corpws Cenedlaethol Cymraeg (CorCenCC) to support the development of the bid application.
  • 2013: £17,500 funding received from the British Council Aptis Research Grants for a project entitled ‘Characterising interactional competence in higher education small group talk’. I am a Co-I on this project with Steve Walsh (PI) and Paul Seedhouse.
  • 2012: £3,920 received from the Newcastle University Faculty Research Fund for a pilot project entitled ‘Gesture and talk “in the wild”’ (with Professor Olivier).

Research experience/positions:

  • Research Fellow on Crowd Sourcing: A Toolkit-based Approach (2010-2011). RCUK Grant EP/G065802/1 Horizon Digital Economy Research. Work carried out at The University of Nottingham.
  • Research Associate on DReSS II (Understanding Digital Records for eSocial Science) (2008-2011). ESRC Grant No. RES-149-25-1067. Work carried out at The University of Nottingham.
  • Research Assistant on DReSS I (Understanding Digital Records for eSocial Science) (2005-2008). ESRC Grant No. RES-149-25-0035. Work carried out at The University of Nottingham.
  • Research Assistant on HeadTalk (2005-2006). ESRC Grant No. RES-149-25-1016. Work carried out at The University of Nottingham.
  • I have also been involved in work with Cambridge University Press (CUP) on the English Profile (EP) Project, and from 2009 to 2012 I was involved in the construction of CANELC, the Cambridge and Nottingham e-Language Corpus (working with CUP and staff from the University of Nottingham), the first large-scale corpus of digital discourse.

Biography

  • 2015: Certificate in Advanced Studies in Academic Practice, Newcastle University
  • 2004 – 2009: PhD in Applied Linguistics, The University of Nottingham
    • Thesis title: A multi-modal corpus approach to the analysis of backchanneling behaviour
    • Funding: ESRC +3 award winner
  • 2003 – 2004: MA in Applied Linguistics, The University of Nottingham
  • 2000 – 2003: BA in English Studies, The University of Nottingham

Professional memberships

  • Fellow, Learned Society of Wales (FLSW), 2023-present.
  • Associate Fellow of the Higher Education Academy (AFHEA), 2013 – present.
  • Member, BAAL (British Association for Applied Linguistics).
  • Executive Committee member, CRiLLS (Centre for Research in Linguistics and Language Sciences, Newcastle University), 2011 – 2015.
  • Member, CRAL (Centre for Research in Applied Linguistics), 2006 – 2011.
  • Member, IVACS (Inter-Varietal Applied Corpus Studies), 2004 – present
  • Member, AILA (International Association of Applied Linguistics), 2004 – present
  • Member, Language Teaching and Technology; Language Learning and Teaching and iLaB (ICT) research clusters in ECLS, 2012 – 2015.

Academic positions

  • 2016 – present: Reader in Applied Linguistics, Cardiff University
  • 2015 – 2016: Senior Lecturer in Applied Linguistics, Cardiff University.
  • 2014 – 2015: Senior Lecturer in Applied Linguistics, Newcastle University.
  • 2011 – 2014: Lecturer in Applied Linguistics, Newcastle University.
  • 2009 – 2011: Part-time Research Fellow and lecturer on BA and M-Level home and distance learning modules, The University of Nottingham.
  • 2006 – 2009: Part-time Research Assistant and lecturer on BA and M-Level home and distance learning modules, The University of Nottingham.
  • 2005 – 2006: Full-time Research Assistant, ESRC funded HeadTalk interdisciplinary project, The University of Nottingham.
  • 2004 – 2005: Resident Hall Tutor, Hugh Stewart Hall, The University of Nottingham.

Committees and reviewing

  • Editorial board member of Applied Linguistics (journal, 2021+)
  • Ambassador of the Data Innovation Research Institute (DIRI) at Cardiff University. In this role I lead a special interest group (SIG) that facilitates deep interdisciplinary collaboration across the University in the area of data science (2018+).
  • Editorial board member of Elements in Corpus Linguistics (book series) published by Cambridge University Press.
  • Lead organiser and Chair of the 2020 online BAAL conference. Over 400 members of the association registered to participate in this conference.
  • Lead organiser of the biennial International Corpus Linguistics Conference (CL2019), a five-day, globally leading conference for academics working within this discipline (2018-2019).
  • Member of the ESRC’s Centres for Doctoral Training (CDT) Peer Review College (2016+)
  • Honorary Visiting Fellow at the Centre for Research in Applied Linguistics (CRAL), The University of Nottingham (May–July 2018, during Research Leave)
  • Visiting Researcher at the Department of English Language and Applied Linguistics, Swansea University (April–July 2018, during Research Leave)
  • General Secretary for BAAL, the British Association for Applied Linguistics (2013 - 2018); Meetings Secretary for BAAL (2010-2013); Postgraduate Development and Liaison Officer for BAAL (2007-2009).
  • Co-organiser of the IVACS (Inter-Varietal and Applied Corpus Studies) 2006 and IVACS 2014 conferences.
  • Editor (with Professor Svenja Adolphs) of the Routledge Handbook of English Language and the Digital Humanities [under contract].
  • Reviews Editor for the Yearbook of Corpus Linguistics and Pragmatics, 2012-2015 (Springer Verlag).
  • Editorial board member for the journal Discourse, Context and Media
  • Reviewer for International Journal of Corpus Linguistics (IJCL), Journal of Pragmatics, Context and Discourse, Corpora Journal and the BAAL annual book prize.
  • Programme committee member: Big Data and Natural Language Processing workshop hosted at IEEE Big Data, December 2016.
  • Programme committee member: 9th International Corpus Linguistics conference, July 2017, University of Birmingham; Challenges in the Management of Large Corpora + Big Data and Natural Language Processing joint meeting, July 2017, University of Birmingham.
  • Advisory Editorial Board member for the Journal of Corpus Linguistics and Pragmatics (Springer Verlag).
  • Advisory board member for Language, Texts and Society (LTS) – a journal produced at the University of Nottingham.
  • Advisory board member for CLiC – a corpus tool for the analysis of literary texts, led by Professor Mahlberg, University of Birmingham (funded by the AHRC).

Supervisions

  • Corpus linguistics
  • Corpus pragmatics
  • Language use in context
  • Non-verbal communication
  • Discourse analysis
  • Digital interaction (‘E-language’)

Current supervision

Jen Jordan-Grote

Research student

Debora Cabral Lima

Research Associate

Yipei Kou

Research student

Charlie Brookes

Research student

Past projects

In addition to the students listed above, I also supervised the RAs involved in work on the CorCenCC, IVO and FreeTxt projects, and co-supervised the following PhD students to completion (at 50%, unless otherwise stated):

  1. Shanru Yang (30:70 with Steve Walsh, Newcastle University)
  2. Rezan Alharbi (with Mei Lin, Newcastle University)
  3. Vigneshwaran Muralidaran (with Irena Spasic, COMSC) 
  4. David Griffin (with Christopher Heffer, ENCAP)
  5. Emily Powell (with Christopher Heffer, ENCAP)
  6. Kate Barber (with Amanda Potts, ENCAP)

Specialisms

  • Applied linguistics and educational linguistics
  • Multimodal discourse analysis
  • Discourse and pragmatics
  • Corpus linguistics