Professor Dawn Knight
BA, MA, PhD (Nottingham), FLSW
Professor
School of English, Communication and Philosophy
- Available for postgraduate supervision
Overview
I am a member of the Centre for Language and Communication Research and have been employed by Cardiff University since 2015. I have been involved, as Principal Investigator (PI) or Co-Investigator (CI), in a range of externally funded research projects, which have received circa £4.2m in external funding to date. Recent projects (i.e. 2021 onwards) include:
- 2024-27: CI, NIHR funded 'MASS: Mobilising Alliances to Enhance Community Capacity Building for SOGIESC affirming Mental Health Services' project (with Sharifah Ayeshah Syed Mohd Noori, Universiti Malaya, as PI). [£576,384]
- 2024-25: PI, Welsh Government funded 'A Small Welsh Language Model Pilot for Sentiment Analysis Testing' project. This collaborative project includes computer science colleagues from Lancaster University. [£8,000]
- 2024-25: PI, Welsh Government funded 'Welsh Digital Grid' project. The Welsh Digital Grid is an online collection of freely available digital resources designed to support the exploration, analysis, learning and reference of the Welsh language. This collaborative project includes computer science colleagues from Lancaster University. [£15,000]
- 2022-23: CI, Welsh Government funded ‘ThACC – Thesawrws Ar-lein Cymraeg Cyfoes - Using Word Embeddings to Create a Thesaurus of Contemporary Welsh’ project. Working with colleagues from the Schools of Welsh and Computer Science at Cardiff University and Lancaster University respectively, this project developed an open-access, freely available online thesaurus of the Welsh language, for Welsh speakers and learners alike. For more information see here. [£90,000]
- 2022-23: PI, AHRC funded ‘FreeTxt: supporting bilingual free-text survey and questionnaire data analysis’. Working with colleagues from Lancaster University, and co-designed and co-constructed with partners Cadw, Museums Wales and National Trust Wales, this project created an innovative open-source online free-text analysis tool that enables the quick and easy analysis of English and Welsh language data. For more information see here. [£100,000]
- 2022-23: CI, AHRC-funded ‘Wild Swimming and Blue Spaces: Mobilising interdisciplinary knowledge and partnerships to combat health inequalities at scale’ project (with Adolphs, Nottingham, as PI). This project aimed to develop a new mixed-methods approach, drawing on corpus linguistics and narrative analysis, to create effective public health messaging (with a focus on the benefits of wild swimming) that includes content from a range of academic disciplines. Visit the project website here. [£178,000]
- 2021-24: Co-PI (with Anne O’Keeffe, Mary Immaculate College), AHRC/IRC funded ‘Interactional variation online: harnessing emerging technologies in the digital humanities to analyse online discourse in different workplace contexts’ project. Working with colleagues from Mary Immaculate College, the University of Nottingham, University College Dublin and the University of Aberdeen, the project's first aim was to examine virtual workplace communication to gain in-depth insight into the potential barriers to effective communication. The second aim was to propose the next generation of frameworks for analysing online discourse and to make these frameworks available to all arts and humanities research and end-user communities. Visit the project website here. [£390,000 from the AHRC + €270,000 from the IRC = circa £620,700]
From 2016 to 2020, I was also PI on the ‘CorCenCC: Corpws Cenedlaethol Cymraeg Cyfoes (The National Corpus of Contemporary Welsh): A community driven approach to linguistic corpus construction’ project. Funded by the ESRC (Economic and Social Research Council) and the AHRC (Arts and Humanities Research Council), this £1.8 million interdisciplinary and multi-institutional project led to the creation of a large-scale, open-source corpus of contemporary Welsh. Full details of project outputs can be found on the CorCenCC project website and the CorCenCC GitHub page. These outputs include links to the corpus query interface, the full corpus dataset, the project report, the Y Tiwtiadur pedagogic toolkit, the CyTag part-of-speech tagger and tagset, and the CySemTag semantic tagger and tagset.
Details of my other research activities, and previously funded projects, can be found on the 'research' tab of this page.
Regarding external and professional leadership roles, I was Chair of BAAL (British Association for Applied Linguistics) from 2018 to 2021. BAAL is a learned society with over 1,300 members internationally, making it the most influential forum for academics and professionals interested in language and applied linguistics within the UK and beyond. For further information see: www.baal.org.uk
I am currently a member of the ESRC's Strategic Advisory Network (SAN) (2021-2026). The SAN comprises leading experts from the academic and user communities who help the ESRC to exploit opportunities and access the voice and expertise of its communities. I am also a member of the AHRC Peer Review College (2022-2025) and the ESRC Peer Review College (2024 onwards). I was the strategic lead for the ESRC Impact Acceleration Account (IAA) at Cardiff University (2023-2024) and Director of Research Funding for ENCAP (2023-2024).
I am a Fellow of the Learned Society of Wales (FLSW, 2023+).
Publications
2025
- Knight, D. et al. 2025. Corpus linguistics for virtual workplace discourse. Abingdon and New York: Routledge.
2024
- Knight, D., Khallaf, N., Rayson, P., El-Haj, M., Ezeani, I. and Morris, S. 2024. FreeTxt: A corpus-based bilingual free-text survey and questionnaire data analysis toolkit. Applied Corpus Linguistics 4(3), article number: 100103. (10.1016/j.acorp.2024.100103)
- Chen, Y., Adolphs, S. and Knight, D. 2024. Towards a speech-gesture profile of discourse markers: The case of ‘I mean’. Lingua 312, article number: 103836. (10.1016/j.lingua.2024.103836)
- Knight, D. et al. 2024. Indicating engagement in online workplace meetings: The role of backchannelling head nods. International Journal of Corpus Linguistics (IJCL) 29(3), pp. 389-416. (10.1075/ijcl.24060.kni)
- Watkins, G. et al. 2024. Crynhoi Testun Awtomatig ar gyfer y Gymraeg. Prifysgol Bangor.
- Morris, J., Ezeani, I., Gruffydd, I., Young, K., El-Haj, M. and Knight, D. Watkins, G. ed. 2024. Language and Technology in Wales: Volume II. Language and Technology in Wales Vol. 2. Bangor University.
- Vilar Lluch, S., McClaughlin, E., Adolphs, S., Knight, D. and Nichele, E. 2024. The effects of modal value and imperative mood on self-predicted compliance to health guidance: The case of COVID-19. Text & Talk (10.1515/text-2023-0125)
- Morris, J., Arfon, E., Khallaf, N., El-Haj, M. and Knight, D. 2024. Datblygu thesawrws y Gymraeg drwy dechnoleg. [Online]. Gwerddon Fach: Golwg Ltd. Available at: https://golwg.360.cymru/gwerddon/2143591-datblygu-thesawrws-gymraeg-drwy-dechnoleg
- Fitzgerald, C. et al. 2024. Multi-modal considerations for social media discourse analysis: A specialised corpus of Twitter commentary on working from home. In: Coats, S. and Laippala, V. eds. Linguistics across Disciplinary Borders - The March of Data. London: Bloomsbury, pp. 187-212.
- Arfon, E., Morris, J., Khallaf, N. and Knight, D. 2024. Developing the Welsh thesaurus through technology. [Online]. Golwg 360 Cymru - Gwerddon Fach: Golwg Ltd. Available at: https://golwg.360.cymru/gwerddon/2143591-datblygu-thesawrws-gymraeg-drwy-dechnoleg
- O'Keeffe, A. et al. 2024. “We’ve lost you Ian”: Multi-modal corpus innovations in capturing, processing and analysing professional online spoken interactions. Research in Corpus Linguistics 12(2), pp. 1-23. (10.32714/ricl.12.02.02)
2023
- Vilar Lluch, S., McClaughlin, E., Knight, D., Adolphs, S. and Nichele, E. 2023. The language of vaccination campaigns during COVID-19. Medical Humanities 49(3), pp. 487-496. (10.1136/medhum-2022-012583)
- Knight, D., Fitzpatrick, T., Morris, S., Tovey-Walsh, B., Prosser, H. and Davies, E. 2023. Corpus to curriculum: Developing word lists for adult learners of Welsh. Applied Corpus Linguistics 3(2), article number: 100052. (10.1016/j.acorp.2023.100052)
- Adolphs, S. et al. 2023. Communicating health threats: Linguistic evidence for effective public health messaging during the Covid-19 pandemic. University of Nottingham.
- Khallaf, N. et al. 2023. Open-source thesaurus development for under-resourced languages: a Welsh case study. Presented at: LDK 2023 – 4th Conference on Language, Data and Knowledge, Vienna, Austria, 12-15 September 2023.
2022
- McClaughlin, E. et al. 2022. The reception of public health messages during the COVID-19 pandemic. Applied Corpus Linguistics 3(1), article number: 100037. (10.1016/j.acorp.2022.100037)
- Morris, J., Ezeani, I., Gruffydd, I., Young, K., Davies, L., El-Haj, M. and Knight, D. 2022. Welsh automatic text summarisation. Presented at: Wales Academic Symposium on Language Technologies 2022, Bangor, Wales, 28 January 2022. In: Language and Technology in Wales, Vol. 2. Bangor: Canolfan Bedwyr.
- Clos, J., McClaughlin, E., Barnard, P., Nichele, E., Knight, D., McAuley, D. and Adolphs, S. 2022. PriPA: a tool for privacy-preserving analytics of linguistic data. Presented at: Legal and Ethical Issues in Human Language Technologies 2022, Marseille, France, 24 June 2022.
- El-Haj, M., Ezeani, I., Morris, J. and Knight, D. 2022. Creation of an evaluation corpus and baseline evaluation scores for Welsh text summarisation. Presented at: 4th Celtic Language Technology Workshop (CLTW 2022), Marseille, France, 20 June 2022.
- Ezeani, I., El-Haj, M., Morris, J. and Knight, D. 2022. Introducing the Welsh text summarisation dataset and baseline systems. Presented at: 13th ELRA Language Resources and Evaluation Conference (LREC 2022), Marseille, France, 20-25 June 2022.
2021
- McClaughlin, E. et al. 2021. Privacy preserving corpus linguistics: investigating the trajectories of public health messaging online. University of Nottingham.
- Muralidaran, V., Spasic, I. and Knight, D. 2021. A systematic review of unsupervised approaches to grammar induction. Natural Language Engineering 27(6), pp. 647-689. (10.1017/S1351324920000327)
- Knight, D., Morris, S., Arman, L., Needs, J. and Rees, M. 2021. Building a national corpus: a Welsh language case study. Basingstoke: Palgrave Macmillan.
- Knight, D., Loizides, F., Neale, S., Anthony, L. and Spasic, I. 2021. Developing computational infrastructure for the CorCenCC corpus - the National Corpus of Contemporary Welsh. Language Resources and Evaluation 55, pp. 789-816. (10.1007/s10579-020-09501-9)
- McClaughlin, E. et al. 2021. Public health messaging by political leaders: a corpus linguistic analysis of COVID-19 speeches delivered by Boris Johnson. University of Nottingham. Available at: https://doi.org/10.17639/3fgb-fn44
- Corcoran, P., Palmer, G., Arman, L., Knight, D. and Spasic, I. 2021. Creating Welsh language word embeddings. Applied Sciences 11(15), article number: 6896. (10.3390/app11156896)
- Espinosa-Anke, L., Palmer, G., Filimonov, M., Corcoran, P., Spasic, I. and Knight, D. 2021. English–Welsh cross-lingual embeddings. Applied Sciences 11(14), article number: 6541. (10.3390/app11146541)
- Knight, D., Morris, S. and Fitzpatrick, T. 2021. Corpus design and construction in minoritised language contexts - Cynllunio a chreu corpws mewn cyd-destunau Ieithoedd lleiafrifoledig: The National Corpus of Contemporary Welsh - Corpws Cenedlaethol Cymraeg Cyfoes. Basingstoke: Palgrave Macmillan.
- McClaughlin, E. et al. 2021. Using online news comments to gather fast feedback on issues with public health messaging: The Guardian as a case study. Project Report. [Online]. University of Nottingham. Available at: https://nottingham-repository.worktribe.com/output/5717332
- Palmer, G., Corcoran, P., Arman, L., Knight, D. and Spasic, I. 2021. A closer look at Welsh word embeddings. In: Prys, D. ed. Language and Technology in Wales: Volume 1. Bangor: Bangor University, pp. 21-29.
- Muralidaran, V., Palmer, G., Arman, L., O'Hare, K., Knight, D. and Spasic, I. 2021. A practical implementation of a Porter stemmer for Welsh. In: Prys, D. ed. Language and Technology in Wales: Volume 1. Bangor: Bangor University, pp. 30-43.
2020
- Chen, Y., Adolphs, S. and Knight, D. 2020. Multimodal discourse analysis. In: Friginal, E. and Hardy, J. eds. The Routledge Handbook of Corpus Approaches to Discourse Analysis. London: Routledge.
- Knight, D. and Adolphs, S. 2020. Multimodal corpora. In: Paquot, M. and Gries, S. T. eds. A Practical Handbook of Corpus Linguistics. Springer International Publishing, pp. 351-369.
- Knight, D., Morris, S., Fitzpatrick, T., Rayson, P., Spasić, I. and Môn Thomas, E. 2020. The national corpus of contemporary Welsh: project report | Y corpws cenedlaethol Cymraeg cyfoes: adroddiad y prosiect. Project Report. CorCenCC.
- Muralidaran, V., Spasic, I. and Knight, D. 2020. A cognitive approach to parsing with neural networks. Presented at: International Conference on Statistical Language and Speech Processing (SLSP), Cardiff, UK, 14-16 October 2020. In: Statistical Language and Speech Processing, Vol. 12379. Springer Verlag, pp. 71-84. (10.1007/978-3-030-59430-5_6)
- Adolphs, S., Knight, D., Smith, C. and Price, D. 2020. Crowdsourcing formulaic phrases: towards a new type of spoken corpus. Corpora 15(2), pp. 141-168. (10.3366/COR.2020.0192)
- Adolphs, S. and Knight, D. eds. 2020. The Routledge handbook of English language and digital humanities. Routledge Handbooks in English Language Studies. Abingdon: Routledge.
2019
- Ezeani, I., Piao, S., Neale, S., Rayson, P. and Knight, D. 2019. Leveraging pre-trained embeddings for Welsh taggers. Presented at: 4th Workshop on Representation Learning for NLP, Florence, Italy, July 2019. In: Proceedings of the 4th Workshop on Representation Learning for NLP (ACL Anthology W19-43). Association for Computational Linguistics. (10.18653/v1/W19-4332)
- Spasic, I., Owen, D., Knight, D. and Artemiou, A. 2019. Unsupervised multi-word term recognition in Welsh. Presented at: Celtic Language Technology Workshop 2019, Dublin, Ireland, 19 August 2019. In: Lynn, T. et al. eds. Proceedings of the Celtic Language Technology Workshop. European Association for Machine Translation.
2018
- Piao, S., Rayson, P., Knight, D. and Watkins, G. 2018. Towards a Welsh semantic annotation system. Presented at: LREC (Language Resources and Evaluation Conference) 2018, Miyazaki, Japan, 7-12 May 2018.
- Neale, S., Donnelly, K., Watkins, G. and Knight, D. 2018. Leveraging lexical resources and constraint grammar for rule-based part-of-speech tagging in Welsh. Presented at: LREC (Language Resources and Evaluation Conference) 2018, Miyazaki, Japan, 7-12 May 2018.
2017
- Neale, S. et al. 2017. The CorCenCC crowdsourcing app: a bespoke tool for the user-driven creation of the national corpus of contemporary Welsh. Presented at: The 9th International Corpus Linguistics Conference, Birmingham, UK, 24-28 July 2017.
- Knight, D., Walsh, S. and Papagiannidis, S. 2017. I’m having a spring clear out: a corpus-based analysis of e-transactional discourse. Applied Linguistics 38(2), pp. 234-257. (10.1093/applin/amv019)
2016
- Walsh, S. and Knight, D. 2016. Analysing spoken discourse in University small group teaching. In: Corrigan, K. P. and Mearns, A. eds. Creating and Digitizing Language Corpora: Volume 3: Databases for Public Engagement. Basingstoke: Palgrave Macmillan, pp. 291-319.
- Knight, D. et al. 2016. Lexical coverage evaluation of large-scale multilingual semantic lexicons for twelve languages. Presented at: LREC 2016, Tenth International Conference on Language Resources and Evaluation, Portorož, Slovenia, 23-28 May 2016. European Language Resources Association (ELRA).
- Seedhouse, P. and Knight, D. 2016. Applying digital sensor technology: A problem-solving approach. Applied Linguistics 37(1), pp. 7-32. (10.1093/applin/amv065)
2015
- Knight, D. 2015. e-Language: communication in the digital age. In: Baker, P. and McEnery, T. eds. Corpora and Discourse Studies: Integrating Discourse and Corpora. Palgrave Advances in Language and Linguistics. Basingstoke: Palgrave Macmillan, pp. 20-40. (10.1057/9781137431738_2)
- Crabtree, A., Tennent, P., Brundell, P. and Knight, D. 2015. Digital records and the digital replay system. In: Halfpenny, P. J. and Proctor, R. eds. Innovations in Digital Research Methods. London: Sage.
- Dörk, M. and Knight, D. 2015. WordWanderer: A navigational approach to text visualisation. Corpora 10(1), pp. 83-94. (10.3366/cor.2015.0067)
- Adolphs, S. and Knight, D. 2015. Beyond monomodal spoken corpora. In: Baker, P. and McEnery, T. eds. Corpora and Discourse Studies: Integrating Discourse and Corpora. Palgrave Advances in Language and Linguistics. Houndmills, Basingstoke: Palgrave Macmillan, pp. 41-62.
2014
- Knight, D., Adolphs, S. and Carter, R. 2014. CANELC – constructing an e-language corpus. Corpora 9(1), pp. 29-56. (10.3366/cor.2014.0050)
2013
- Knight, D. 2013. Corpus linguistics: methods, theory and practice by Tony McEnery and Andrew Hardie [Book Review]. In: Romero-Trillo, J. ed. Yearbook of corpus linguistics and pragmatics 2013: new domains and methodologies. Yearbook of Corpus Linguistics and Pragmatics, Vol. 1. Springer Netherlands, pp. 275-277. (10.1007/978-94-007-6250-3_13)
- Knight, D., Adolphs, S. and Carter, R. 2013. Formality in digital discourse: a study of hedging in CANELC. In: Romero-Trillo, J. ed. Yearbook of corpus linguistics and pragmatics 2013: new domains and methodologies. Yearbook of Corpus Linguistics and Pragmatics, Vol. 1. Springer Netherlands, pp. 131-152. (10.1007/978-94-007-6250-3_7)
2011
- Adolphs, S., Knight, D. and Carter, R. 2011. Capturing context for heterogeneous corpus analysis: some first steps. International Journal of Corpus Linguistics 16(3), pp. 305-324. (10.1075/ijcl.16.3.02ado)
- Knight, D. 2011. The future of multimodal corpora. Revista Brasileira de Linguística Aplicada 11(2), pp. 391-415. (10.1590/S1984-63982011000200006)
- Knight, D. 2011. Multimodality and active listenership: a corpus approach. Corpus and Discourse. London: Bloomsbury.
2010
- Knight, D., Tennent, P., Adolphs, S. and Carter, R. 2010. Developing heterogeneous corpora using the Digital Replay System (DRS). Presented at: Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, Malta, 18 May 2010. In: Kipp, M. et al. eds. Proceedings of the LREC 2010 (Language Resources and Evaluation Conference) Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, May 2010, Malta. European Language Resources Association, pp. 16-21.
- Adolphs, S. and Knight, D. 2010. Building a spoken corpus: What are the basics? In: O’Keeffe, A. and McCarthy, M. eds. The Routledge handbook of corpus linguistics. Routledge Handbooks in Applied Linguistics. Oxford: Routledge.
2009
- Knight, D. 2009. A multi-modal corpus approach to the analysis of backchanneling behaviour. PhD Thesis, University of Nottingham.
- Knight, D., Evans, D., Carter, R. and Adolphs, S. 2009. HeadTalk, HandTalk and the corpus: towards a framework for multi-modal, multi-media corpus development. Corpora 4(1), pp. 1-32. (10.3366/E1749503209000203)
2008
- Knight, D. and Tennent, P. 2008. Introducing DRS (The Digital Replay System): A tool for the future of corpus linguistic research and analysis. Presented at: Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 26 May-1 June 2008. In: Calzolari, N. et al. eds. Proceedings of the 6th Language Resources and Evaluation Conference (LREC), Palais des Congrès, Marrakech, Morocco, 28-30 May 2008. European Language Resources Association, pp. 26-31.
- Brundell, P. et al. 2008. The experience of using Digital Replay System for social science research. Presented at: 4th International Conference on e-Social Science (ICeSS), Manchester, UK, 18-20 June 2008. In: Proceedings of the 4th International Conference on e-Social Science (ICeSS), Manchester, 18-20 June 2008. ICeSS, pp. 1-10.
- Knight, D., Adolphs, S., Tennent, P. and Carter, R. 2008. The Nottingham Multi-Modal Corpus: a demonstration. Presented at: 6th Language Resources and Evaluation Conference (LREC), Marrakech, Morocco, 28-30 May 2008. In: Proceedings of the 6th Language Resources and Evaluation Conference (LREC), Palais des Congrès, Marrakech, Morocco, 28-30 May 2008. European Language Resources Association, pp. 1-7.
- Knight, D. and Adolphs, S. 2008. Multi-modal corpus pragmatics: the case of active listenership. In: Romero-Trillo, J. ed. Pragmatics and corpus linguistics: a mutualistic entente. Mouton series in pragmatics Vol. 2. Mouton de Gruyter, pp. 175-190.
- Brundell, P. et al. 2008. Digital Replay System (DRS): a tool for interaction analysis. Presented at: ICLS 2008: International Perspectives in the Learning Sciences: Cre8ing a Learning World, Utrecht, The Netherlands, 23-28 June 2008.
2006
- Knight, D., Bayoumi, S., Mills, S., Crabtree, A., Adolphs, S., Pridmore, T. and Carter, R. 2006. Beyond the text: construction and analysis of multi-modal linguistic corpora. Presented at: 2nd International Conference on e-Social Science, Manchester, UK, 28-30 June 2006. In: Proceedings of the 2nd International Conference on e-Social Science, Manchester, 28-30 June 2006. ICeSS.
Articles
- Knight, D., Khallaf, N., Rayson, P., El-Haj, M., Ezeani, I. and Morris, S. 2024. FreeTxt: A corpus-based bilingual free-text survey and questionnaire data analysis toolkit. Applied Corpus Linguistics 4(3), article number: 100103. (10.1016/j.acorp.2024.100103)
- Chen, Y., Adolphs, S. and Knight, D. 2024. Towards a speech-gesture profile of discourse markers: The case of ‘I mean’. Lingua 312, article number: 103836. (10.1016/j.lingua.2024.103836)
- Knight, D. et al. 2024. Indicating engagement in online workplace meetings: The role of backchannelling head nods. International Journal of Corpus Linguistics (IJCL) 29(3), pp. 389-416. (10.1075/ijcl.24060.kni)
- Vilar Lluch, S., McClaughlin, E., Adolphs, S., Knight, D. and Nichele, E. 2024. The effects of modal value and imperative mood on self-predicted compliance to health guidance: The case of COVID-19. Text & Talk (10.1515/text-2023-0125)
- O'Keeffe, A. et al. 2024. “We’ve lost you Ian”: Multi-modal corpus innovations in capturing, processing and analysing professional online spoken interactions. Research in Corpus Linguistics 12(2), pp. 1-23. (10.32714/ricl.12.02.02)
- Vilar Lluch, S., McClaughlin, E., Knight, D., Adolphs, S. and Nichele, E. 2023. The language of vaccination campaigns during COVID-19. Medical Humanities 49(3), pp. 487-496. (10.1136/medhum-2022-012583)
- Knight, D., Fitzpatrick, T., Morris, S., Tovey-Walsh, B., Prosser, H. and Davies, E. 2023. Corpus to curriculum: Developing word lists for adult learners of Welsh. Applied Corpus Linguistic 3(2), article number: 100052. (10.1016/j.acorp.2023.100052)
- McClaughlin, E. et al. 2022. The reception of public health messages during the COVID-19 pandemic. Applied Corpus Linguistics 3(1), article number: 100037. (10.1016/j.acorp.2022.100037)
- Muralidaran, V., Spasic, I. and Knight, D. 2021. A systematic review of unsupervised approaches to grammar induction. Natural Language Engineering 27(6), pp. 647-689. (10.1017/S1351324920000327)
- Knight, D., Loizides, F., Neale, S., Anthony, L. and Spasic, I. 2021. Developing computational infrastructure for the CorCenCC corpus - the National Corpus of Contemporary Welsh. Language Resources and Evaluation 55, pp. 789-816. (10.1007/s10579-020-09501-9)
- Corcoran, P., Palmer, G., Arman, L., Knight, D. and Spasic, I. 2021. Creating Welsh language word embeddings. Applied Sciences 11(15), article number: 6896. (10.3390/app11156896)
- Espinosa-Anke, L., Palmer, G., Filimonov, M., Corcoran, P., Spasic, I. and Knight, D. 2021. English–Welsh cross-lingual embeddings. Applied Sciences 11(14), article number: 6541. (10.3390/app11146541)
- Adolphs, S., Knight, D., Smith, C. and Price, D. 2020. Crowdsourcing formulaic phrases: towards a new type of spoken corpus. Corpora 15(2), pp. 141-168. (10.3366/COR.2020.0192)
- Knight, D., Walsh, S. and Papagiannidis, S. 2017. I’m having a spring clear out: a corpus-based analysis of e-transactional discourse. Applied Linguistics 38(2), pp. 234-257. (10.1093/applin/amv019)
- Seedhouse, P. and Dawn, K. 2016. Applying digital sensor technology: A problem-solving approach. Applied Linguistics 37(1), pp. 7-32. (10.1093/applin/amv065)
- Dörk, M. and Knight, D. 2015. WordWanderer: A navigational approach to text visualisation. Corpora 10(1), pp. 83-94. (10.3366/cor.2015.0067)
- Knight, D., Adolphs, S. and Ronald, C. 2014. CANELC – constructing an e-language corpus. Corpora 9(1), pp. 29-56. (10.3366/cor.2014.0050)
- Adolphs, S., Knight, D. and Carter, R. 2011. Capturing context for heterogeneous corpus analysis: some first steps. International journal of corpus linguistics 16(3), pp. 305-324. (10.1075/ijcl.16.3.02ado)
- Knight, D. 2011. The future of multimodal corpora. Revista Brasileira de Linguística Aplicada 11(2), pp. 391-415. (10.1590/S1984-63982011000200006)
- Knight, D., Evans, D., Carter, R. and Adolphs, S. 2009. HeadTalk, HandTalk and the corpus: towards a framework for multi-modal, multi-media corpus development. Corpora 4(1), pp. 1-32. (10.3366/E1749503209000203)
Book sections
- Fitzgerald, C. et al. 2024. Multi-modal considerations for social media discourse analysis: A specialised corpus of Twitter commentary on working from home. In: Coats, S. and Laippala, V. eds. Linguistics across Disciplinary Borders - The March of Data. London: Bloomsbury, pp. 187-212.
- Palmer, G., Corcoran, P., Arman, L., Knight, D. and Spasic, I. 2021. A closer look at Welsh word embeddings. In: Prys, D. ed. Language and Technology in Wales: Volume 1. Bangor: Bangor University, pp. 21-29.
- Muralidaran, V., Palmer, G., Arman, L., O'Hare, K., Knight, D. and Spasic, I. 2021. A practical implementation of a porter stemmer for Welsh. In: Prys, D. ed. Language and Technology in Wales: Volume 1. Bangor: Bangor University, pp. 30-43.
- Chen, Y., Adolphs, S. and Knight, D. 2020. Multimodal discourse analysis. In: Friginal, E. and Hardy, J. eds. The Routledge Handbook of Corpus Approaches to Discourse Analysis. London: Routledge
- Knight, D. and Adolphs, S. 2020. Multimodal corpora. In: Paquot, M. and Gries, S. T. eds. A Practical Handbook of Corpus Linguistics. Springer International Publishing, pp. 351-369.
- Walsh, S. and Knight, D. 2016. Analysing spoken discourse in University small group teaching. In: Corrigan, K. P. and Mearns, A. eds. Creating and Digitizing Language Corpora: Volume 3: Databases for Public Engagement., Vol. 3. Basingstoke: Palgrave Macmillan, pp. 291-319.
- Knight, D. 2015. e-Language: communication in the digital age. In: Baker, P. and McEnery, T. eds. Corpora and Discourse Studies: Integrating Discourse and Corpora. Palgrave Advances in Language and Linguistics Basingstoke: Palgrave Macmillan, London, pp. 20-40., (10.1057/9781137431738_2)
- Crabtree, A., Tennent, P., Brundell, P. and Knight, D. 2015. Digital records and the digital replay system. In: Halfpenny, P. J. and Proctor, R. eds. Innovations in Digital Research Methods. London: Sage
- Adolphs, S. and Knight, D. 2015. Beyond monomodal spoken corpora. In: Baker, P. and McEnery, T. eds. Corpora and Discourse Studies: Integrating Discourse and Corpora. Palgrave Advances in Language and Linguistics Houndsmill, Basingstoke: Palgrave Macmillan, pp. 41-62.
- Knight, D. 2013. Corpus linguistics: methods, theory and practice by Tony McEnery and Andrew Hardie [Book Review]. In: Romero-Trillo, J. ed. Yearbook of corpus linguistics and pragmatics 2013: new domains and methodologies. Yearbook of corpus linguistics and pragmatics Vol. 1. Springer Netherlands, pp. 275-277., (10.1007/978-94-007-6250-3_13)
- Knight, D., Adolphs, S. and Carter, R. 2013. Formality in digital discourse: a study of hedging in CANELC. In: Romero-Trillo, J. ed. Yearbook of corpus linguistics and pragmatics 2013: new domains and methodologies. Yearbook of corpus linguistics and pragmatics Vol. 1. Springer Netherlands, pp. 131-152., (10.1007/978-94-007-6250-3_7)
- Adolphs, S. and Knight, D. 2010. Building a spoken corpus: What are the basics?. In: O’Keeffe, A. and McCarthy, M. eds. The Routledge handbook of corpus linguistics. Routledge handbooks in applied linguistics Oxford: Routledge
- Knight, D. and Adolphs, S. 2008. Multi-modal corpus pragmatics: the case of active listenership. In: Romero-Trillo, J. ed. Pragmatics and corpus linguistics: a mutualistic entente. Mouton series in pragmatics Vol. 2. Mouton de Gruyter, pp. 175-190.
Books
- Knight, D. et al. 2025. Corpus linguistics for virtual workplace discourse. Abingdon and New York: Routledge.
- Watkins, G. et al. 2024. Crynhoi Testun Awtomatig ar gyfer y Gymraeg. Prifysgol Bangor.
- Morris, J., Ezeani, I., Gruffydd, I., Young, K., El-Haj, M. and Knight, D. Watkins, G. ed. 2024. Language and Technology in Wales: Volume II. Language and Technology in Wales Vol. 2. Bangor University.
- Knight, D., Morris, S., Arman, L., Needs, J. and Rees, M. 2021. Building a national corpus: a Welsh language case study. Basingstoke: Palgrave Macmillan.
- Knight, D., Morris, S. and Fitzpatrick, T. 2021. Corpus design and construction in minoritised language contexts - Cynllunio a chreu corpws mewn cyd-destunau Ieithoedd lleiafrifoledig: The National Corpus of Contemporary Welsh - Corpws Cenedlaethol Cymraeg Cyfoes. Basingstoke: Palgrave Macmillan.
- Adolphs, S. and Knight, D. eds. 2020. The Routledge handbook of English language and digital humanities. Routledge Handbooks in English Language Studies. Abingdon: Routledge.
- Knight, D. 2011. Multimodality and active listenership: a corpus approach. Corpus and discourse. London: Bloomsbury.
Conferences
- Khallaf, N. et al. 2023. Open-source thesaurus development for under-resourced languages: a Welsh case study. Presented at: LDK 2023 – 4th Conference on Language, Data and Knowledge, Vienna, Austria, 12-15 September 2023.
- Morris, J., Ezeani, I., Gruffydd, I., Young, K., Davies, L., El-Haj, M. and Knight, D. 2022. Welsh automatic text summarisation. Presented at: Wales Academic Symposium on Language Technologies 2022, Bangor, Wales, 28/01/2022Language and Technology in Wales, Vol. 2. Bangor: Banolfan Bedwyr
- Clos, J., McClaughlin, E., Barnard, P., Nichele, E., Knight, D., McAuley, D. and Adolphs, S. 2022. PriPA: a tool for privacy-preserving analytics of linguistic data. Presented at: Legal and Ethical Issues in Human Language Technologies 2022, Marseille, France, 24 June 2022.
- El-Haj, M., Ezeani, I., Morris, J. and Knight, D. 2022. Creation of an evaluation corpus and baseline evaluation scores for Welsh text summarisation. Presented at: 4th Celtic Language Technology Workshop (CLTW 2022), Marseille, France, 20 June 2022.
- Ezeani, I., El-Haj, M., Morris, J. and Knight, D. 2022. Introducing the Welsh text summarisation dataset and baseline systems. Presented at: 13th ELRA Language Resources and Evaluation Conference (LREC 2022), Marseille, France, 20-25 June 2022.
- Muralidaran, V., Spasic, I. and Knight, D. 2020. A cognitive approach to parsing with neural networks. Presented at: International Conference on Statistical Language and Speech Processing (SLSP), Cardiff, UK, 14–16 Oct 2020. Statistical Language and Speech Processing, Vol. 12379. Springer Verlag, pp. 71-84. (10.1007/978-3-030-59430-5_6)
- Ezeani, I., Piao, S., Neale, S., Rayson, P. and Knight, D. 2019. Leveraging pre-trained embeddings for Welsh Taggers. Presented at: 4th Workshop on Representation Learning for NLP, Florence, Italy, July 2019. ACL Anthology: Proceedings of the 4th Workshop on Representation Learning for NLP, Vol. W19-43. Association for Computational Linguistics. (10.18653/v1/W19-4332)
- Spasic, I., Owen, D., Knight, D. and Artemiou, A. 2019. Unsupervised multi-word term recognition in Welsh. Presented at: Celtic Language Technology Workshop 2019, Dublin, Ireland, 19 August 2019. In: Lynn, T. et al. eds. Proceedings of the Celtic Language Technology Workshop. European Association for Machine Translation.
- Piao, S., Rayson, P., Knight, D. and Watkins, G. 2018. Towards a Welsh semantic annotation system. Presented at: LREC (Language Resources and Evaluation) 2018 Conference, Miyazaki, Japan, 7 - 12 May 2018.
- Neale, S., Donnelly, K., Watkins, G. and Knight, D. 2018. Leveraging lexical resources and constraint grammar for rule-based part-of-speech tagging in Welsh. Presented at: LREC (Language Resources and Evaluation) 2018 Conference, Miyazaki, Japan, 7 - 12 May 2018.
- Neale, S. et al. 2017. The CorCenCC crowdsourcing app: a bespoke tool for the user-driven creation of the national corpus of contemporary Welsh. Presented at: The 9th International Corpus Linguistics Conference, Birmingham, UK, 24-28 July 2017.
- Knight, D. et al. 2016. Lexical coverage evaluation of large-scale multilingual semantic lexicons for twelve languages. Presented at: LREC 2016, Tenth International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA), Portorož, Slovenia, 23-28 May 2016.
- Knight, D., Tennent, P., Adolphs, S. and Carter, R. 2010. Developing heterogeneous corpora using the Digital Replay System (DRS). Presented at: Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, Malta, 18 May 2010. In: Kipp, M. et al. eds. Proceedings of the LREC 2010 (Language Resources and Evaluation Conference) Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, May 2010, Malta. European Language Resources Association, pp. 16-21.
- Knight, D. and Tennent, P. 2008. Introducing DRS (The Digital Replay System): A tool for the future of corpus linguistic research and analysis. Presented at: Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakesh, Morocco, 26 May - 1 June 2008. In: Calzolari, N. et al. eds. Proceedings of the 6th Language Resources and Evaluation Conference (LREC), Palais des Congrés, Marrakech, Morocco, 28-30th May 2008. European Language Resources Association, pp. 26-31.
- Brundell, P. et al. 2008. The experience of using Digital Replay System for social science research. Presented at: 4th International Conference on e-Social Science (ICeSS), Manchester, UK, 18-20 June 2008. Proceedings of the 4th International Conference on e-Social Science (ICeSS), Manchester, 18-20 June 2008. ICeSS, pp. 1-10.
- Knight, D., Adolphs, S., Tennent, P. and Carter, R. 2008. The Nottingham Multi-Modal Corpus: a demonstration. Presented at: 6th Language Resources and Evaluation Conference (LREC), Marrakesh, Morocco, 28-30 May 2008. Proceedings of the 6th Language Resources and Evaluation Conference (LREC), Palais des Congrés, Marrakech, Morocco, 28-30th May 2008. European Language Resources Association, pp. 1-7.
- Brundell, P. et al. 2008. Digital Replay System (DRS): a tool for interaction analysis. Presented at: ICLS2008: International Perspectives in the Learning Sciences Cre8ing a learning world, Utrecht, The Netherlands, 23-28 June 2008.
- Knight, D., Bayoumi, S., Mills, S., Crabtree, A., Adolphs, S., Pridmore, T. and Carter, R. 2006. Beyond the text: construction and analysis of multi-modal linguistic corpora. Presented at: 2nd International Conference on e-Social Science, Manchester, UK, 28-30 June 2006. Proceedings of the 2nd International Conference on e-Social Science, Manchester, 28 - 30 June 2006. ICeSS.
Monographs
- Adolphs, S. et al. 2023. Communicating health threats: Linguistic evidence for effective public health messaging during the Covid-19 pandemic. University of Nottingham.
- McClaughlin, E. et al. 2021. Privacy preserving corpus linguistics: investigating the trajectories of public health messaging online. University of Nottingham.
- McClaughlin, E. et al. 2021. Public health messaging by political leaders: a corpus linguistic analysis of COVID-19 speeches delivered by Boris Johnson. University of Nottingham. Available at: https://doi.org/10.17639/3fgb-fn44
- McClaughlin, E. et al. 2021. Using online news comments to gather fast feedback on issues with public health messaging: The Guardian as a case study. Project Report. [Online]. University of Nottingham. Available at: https://nottingham-repository.worktribe.com/output/5717332
- Knight, D., Morris, S., Fitzpatrick, T., Rayson, P., Spasić, I. and Môn Thomas, E. 2020. The national corpus of contemporary Welsh: project report | Y corpws cenedlaethol Cymraeg cyfoes: adroddiad y prosiect. Project Report. CorCenCC.
Thesis
- Knight, D. 2009. A multi-modal corpus approach to the analysis of backchanneling behaviour. PhD Thesis, University of Nottingham.
Websites
- Morris, J., Arfon, E., Khallaf, N., El-Haj, M. and Knight, D. 2024. Datblygu thesawrws y Gymraeg drwy dechnoleg. [Online]. Gwerddon Fach: Golwg Ltd. Available at: https://golwg.360.cymru/gwerddon/2143591-datblygu-thesawrws-gymraeg-drwy-dechnoleg
- Arfon, E., Morris, J., Khallaf, N. and Knight, D. 2024. Developing the Welsh thesaurus through technology. [Online]. Golwg 360 Cymru - Gwerddon Fach: Golwg Ltd. Available at: https://golwg.360.cymru/gwerddon/2143591-datblygu-thesawrws-gymraeg-drwy-dechnoleg
Research
Research interests:
I am an applied linguist whose research interests lie in corpus linguistics, discourse analysis and multimodality. I have expertise in conceptualising, theorising and applying innovative interdisciplinary approaches and methodologies for extracting and predicting patterns of language use within and across social and linguistic contexts in these areas. While rooted in linguistics and the digital humanities, my research is fundamentally interdisciplinary, as reflected in my multi-authored publications and interdisciplinary research projects.
My work on Welsh language resource development, supported by major AHRC, ESRC and Welsh Government grants (e.g. CorCenCC, also see here for further information), aims to change the landscape of minoritised language research and the potential real-world applications of corpora and corpus-based enquiry.
I have (co)presented 106 papers and posters, and delivered 54 keynotes and invited talks at seminars and conferences since 2006.
Externally funded research projects:
- 2024-27: £576,384 received from NIHR for a project entitled MASS: Mobilising Alliances to Enhance Community Capacity Building for SOGIESC-affirming Mental Health Services (with Sharifah Ayeshah Syed Mohd Noori, University of Malaya as PI).
- 2024-25: £8,000 received from the Welsh Government for a project entitled A Small Welsh Language Model Pilot for Sentiment Analysis Testing. This collaborative project included computer science colleagues from Lancaster University.
- 2023-24: £15,000 received from the Welsh Government for the Welsh Digital Grid project. The Welsh Digital Grid is an online collection of freely available digital resources designed to support the exploration, analysis, learning and reference of the Welsh language. This collaborative project included computer science colleagues from Lancaster University.
- 2022-23: £90,000 received from the Welsh Government for the ThACC – Thesawrws Ar-lein Cymraeg Cyfoes - Using Word Embeddings to Create a Thesaurus of Contemporary Welsh project. Working with colleagues from the School of Welsh at Cardiff University and Computer Science at Lancaster University (with Morris as PI; I was one of the CIs), the project developed an open-access, freely available online thesaurus of the Welsh language, for Welsh speakers and learners alike.
- 2022-23: £178,000 received from the AHRC for the Wild Swimming and Blue Spaces: Mobilising interdisciplinary knowledge and partnerships to combat health inequalities at scale project (with Svenja Adolphs, Nottingham as PI - I was one of the CIs). This project developed a new mixed methods approach, drawing on corpus linguistics and narrative analysis, for effective public health messaging (with a focus on the benefits of wild swimming) that includes content from a range of academic disciplines. Visit the project website here.
- 2022-23: £100,000 received from the AHRC for the FreeTxt: supporting bilingual free-text survey and questionnaire data analysis project. I was PI on this project. Working with colleagues from Lancaster University, and co-designed and co-constructed with partners Cadw, Museums Wales and National Trust Wales, this project created an innovative open-source online free-text analysis tool that enables the quick and easy analysis of English and Welsh language data: FreeTxt. Visit the project website here.
- 2021-24: Co-PI (with Anne O’Keeffe, Mary Immaculate College), AHRC/IRC funded Interactional variation online: harnessing emerging technologies in the digital humanities to analyse online discourse in different workplace contexts project. Working with colleagues from Mary Immaculate College, Swansea University, The University of Nottingham, University College Dublin, and University of Aberdeen, the project first aimed to examine virtual workplace communication to gain in-depth insight into the potential barriers to effective communication. Its second aim was to propose the next generation of frameworks for analysing online discourse and to make these frameworks available to all arts and humanities research and end-user communities. We received £390,000 from the AHRC and €270,000 from the IRC (circa £620,700 in total) for this project. Visit the project website here.
- 2021-22: £14,988 received from the ESRC Impact Acceleration Account (IAA). This was for a project, working with the National Centre for Learning Welsh, that supported the creation of vocabulary lists, based on data extracted from CorCenCC (National Corpus of Contemporary Welsh). For further information see here.
- 2021-22: £90,000 received from the Welsh Government for the Welsh Automatic Text Summarisation project. Working with colleagues from the School of Welsh and Computer Science at Cardiff and Lancaster Universities, the project team built a summarisation tool that allows professionals to summarise long documents quickly for efficient presentation. Visit the project website here.
- 2021-22: £450,000 received from AHRC for the Coronavirus Discourses: linguistic evidence for effective public health messaging project. Developed in partnership with Public Health England, Public Health Wales and NHS Education for Scotland, this project addressed key challenges that the coronavirus pandemic presented in relation to understanding the flow and impact of public health messages as reflected in public and private discourses. Led by Svenja Adolphs (Nottingham; I was CI on this project), this interdisciplinary project carried out the first large-scale analysis of the trajectories of public health messages relating to the coronavirus pandemic in the UK. Visit the project website here.
- 2020-21: £90,000 received from the Welsh Government for the Learning English-Welsh bilingual embeddings and applications in text categorisation project. This was an interdisciplinary project involving Irena Spasić, Padraig Corcoran, Luis Espinosa-Anke (School of Computer Science and Informatics – COMSC) and Geraint Palmer (School of Mathematics) as Co-Investigators (CIs). I was PI on this project. For more information, see here.
- 2019-20: £90,000 received from the Welsh Government for the Welsh words by numbers: “Wales” + “capital” = “Cardiff” project (focusing on word embeddings for Welsh). I was a CI on this project with Irena Spasić (Cardiff) as PI.
- 2019: £20,000 received from the Welsh Government for the Welsh Stemmer project. I was CI on this project with Irena Spasić (computer science, Cardiff) as PI.
- 2017: £19,964 received from the Grant Cymraeg 2050 fund to automatically construct a WordNet for Welsh, a lexical database in which words are grouped into sets of synonyms (synsets), which are then organised into a network of lexico-semantic relationships. I was CI on this project with Irena Spasić (computer science, Cardiff) as PI.
- 2017: £2,000 received (as PI) from the British Council in support of a launch event for the CorCenCC project (held on 28th February 2017).
- 2016-20: £1,800,000 received from the ESRC and AHRC for the CorCenCC project (Corpws Cenedlaethol Cymraeg Cyfoes (The National Corpus of Contemporary Welsh): A community driven approach to linguistic corpus construction). I was PI on this project. For more information see here.
- 2016: £17,500 received from the British Council for an Aptis Research Grants project entitled Characterising interactional competence in higher education small group talk (with Walsh, Newcastle University, as PI).
Research experience/positions:
- Research Fellow on Crowd Sourcing: A Toolkit-based Approach (2010-2011). RCUK Grant EP/G065802/1 Horizon Digital Economy Research. Work carried out at The University of Nottingham.
- Research Associate on DReSS II (Understanding Digital Records for eSocial Science) (2008-2011). ESRC Grant No. RES-149-25-1067. Work carried out at The University of Nottingham.
- Research Assistant on DReSS I (Understanding Digital Records for eSocial Science) (2005-2008), ESRC Grant No. RES-149-25-0035, and on Headtalk (2005-2006), ESRC Grant No. RES-149-25-1016. Work carried out at The University of Nottingham.
- I have also been involved in work with the Cambridge University Press (CUP) on the English Profile (EP) Project and from 2009-2012 I was involved in the construction of CANELC, the Cambridge and Nottingham e-Language Corpus (working with CUP and staff from the University of Nottingham), the first large-scale corpus of digital discourse.
Biography
- 2015: Certificate in Advanced Studies in Academic Practice, Newcastle University [FHEA status]
- 2004 – 2009: PhD in Applied Linguistics, The University of Nottingham
- Thesis title: A multi-modal corpus approach to the analysis of backchanneling behaviour
- Funding: ESRC +3 award winner
- 2003 – 2004: MA in Applied Linguistics, The University of Nottingham
- 2000 – 2003: BA in English Studies, The University of Nottingham
Professional memberships
- Fellow, Learned Society of Wales (FLSW), 2023-present.
- Associate Fellow of the Higher Education Academy (AFHEA), 2013 – present.
- Member, BAAL (British Association for Applied Linguistics).
- Executive Committee member, CRiLLS (Centre for Research in Linguistics and Language Sciences, Newcastle University), 2011 – 2015.
- Member, CRAL (Centre for Research in Applied Linguistics), 2006 – 2011.
- Member, IVACS (Inter-Varietal Applied Corpus Studies), 2004 – present.
- Member, AILA (International Association of Applied Linguistics), 2004 – present.
- Member, Language Teaching and Technology; Language Learning and Teaching and iLaB (ICT) research clusters in ECLS, 2012 – 2015.
Academic positions
- 2023 – present: Professor of English Language and Applied Linguistics, Cardiff University
- 2016 – 2023: Reader in Applied Linguistics, Cardiff University
- 2015 – 2016: Senior Lecturer in Applied Linguistics, Cardiff University.
- 2014 – 2015: Senior Lecturer in Applied Linguistics, Newcastle University.
- 2011 – 2014: Lecturer in Applied Linguistics, Newcastle University.
- 2009 – 2011: Part-time Research Fellow and lecturer on BA and M-Level home and distance learning modules, The University of Nottingham.
- 2006 – 2009: Part-time Research Assistant and lecturer on BA and M-Level home and distance learning modules, The University of Nottingham.
- 2005 – 2006: Full-time Research Assistant, ESRC funded HeadTalk interdisciplinary project, The University of Nottingham.
- 2004 – 2005: Resident Hall Tutor, Hugh Stewart Hall, The University of Nottingham.
Speaking engagements
- Knight, D. (2025). Invited panel discussant at the Nakba-NLP 2025 - The 1st International Workshop on Nakba Narratives as Language Resources, COLING-2025, Abu Dhabi, UAE, January 2025.
- Knight, D. (2025). Automating Qualitative Text Analysis using FreeTxt: A Demonstration. Invited talk delivered as part of the DSTL speaker series. DSTL, January 2025.
- Knight, D. (2024). Raising hands, waving and nodding: Exploring verbal and non-verbal behaviour in virtual workplace meetings. Invited plenary delivered as part of ALR2024 (International Symposium on Applied Linguistic Research), Riyadh, Saudi Arabia, November 2024.
- Knight, D. (2024). Invited panel contributor at the ‘Trust the text? Generative AI and Corpus Linguistics’ panel held as part of the IVACS (Intervarietal Applied Corpus Studies) 2024 conference, Cambridge University, July 2024.
- Knight, D. (2024). Applying Corpus Linguistics: impacts of corpus research in a minoritised language context. Annual John Sinclair Lecture. Birmingham University, July 2024.
- Knight, D. (2023). Why applied linguistics really matters: impact of corpus-based research. Invited Inaugural lecture. York St John University, November 2023.
- Knight, D. (2022). Enhancing language technology resources in minoritised language contexts: roles and applications of corpora. Invited presentation delivered as part of the Applied Linguistics and Discourse Analysis (ALDS) Seminar Series, December 2022.
- Knight, D. and O’Keeffe, A. (2022). The contribution of corpus linguistics to research on online interaction. Invited podcast talk, delivered as part of the CorpusCast series, November 2022.
- O’Keeffe, A., Knight, D. and Fitzgerald, C. (2022). “I think you’re on mute”: Variation in Online Workplace Discourse. Invited presentation delivered as part of the CALS Seminar Series, Mary Immaculate College, Ireland, March 2022.
- Knight, D. and Fitzgerald, C. (2022). Navigating Virtual Meetings: Multimodality and Variation in Online Professional Discourse. Invited presentation delivered as part of the DiscourseNet Seminar Series, Open University, February 2022.
- Morris, J., Ezeani, I., Gruffydd, I., Young, K., Davies, L., El-Haj, M. and Knight, D. (2022). Welsh Automatic Text Summarisation. Wales Academic Symposium on Language Technologies 2022, Bangor University, 28 January 2022.
- Atkins, S. and Knight, D. (2021). Good Practice in Applied Linguistics: a call to action. Invited paper delivered as part of the BAAL Executive Committee Invited Colloquium: Ethics in Social Justice in Applied Linguistics, BAAL (British Association for Applied Linguistics) 2021 conference, Northumbria University, UK.
- Knight, D. (2020). Ethical considerations for corpus construction: A Welsh language case study. Invited seminar presentation delivered as part of the Centre of Forensic Linguistics seminar series. Aston University, December 2020.
- Knight, D. (2019). Multimodal Corpus Linguistics: Looking back and thinking forward. Invited keynote delivered at the CLAVIER conference on Knowledge Dissemination and Multimodal Literacy: Research Perspectives on ESP in a Digital World. University of Pisa, November 2019.
- Knight, D. (2019). The application of corpora: minoritised language contexts: supporting and informing the pedagogic landscape. Invited keynote presentation delivered at the Assessing World Languages conference 2019, University of Macau, Macau, China.
- Knight, D. (2019). Invited LTF pre-conference workshop entitled ‘Corpus Linguistics for researchers and practitioners’ delivered at the UKALTA (UK Association for Language Testing and Assessment) conference, Swansea University, November 2019.
- Knight, D. and Morris, S. (2019). Invited LTF pre-conference workshop entitled ‘Exploring the National Corpus of Contemporary Welsh (CorCenCC): user-driven corpus design for under-resourced languages’ delivered at the UKALTA (UK Association for Language Testing and Assessment) conference, Swansea University, November 2019.
- Knight, D. (2019). CorCenCC: Corpws Cenedlaethol Cymraeg Cyfoes - The National Corpus of Contemporary Welsh. Invited keynote delivered at the Welsh Government hosted British-Irish Council’s Indigenous, Minority and Lesser-used Languages (IML) meeting, Welsh Government, Bedwas, October 2019.
- Knight, D. (2019). From ECR to PI: some reflections from a decade of Dr-hood. Invited panel member for the PGR BAAL Colloquium, ‘How can an Early-Career Researcher best succeed in Applied Linguistics’, held at the annual BAAL conference, Manchester Met University, August 2019.
- Knight, D. (2019). Welsh language in healthcare. Invited panel member for the Healthcare Text Analytics Conference, 24-25 April 2019. Cardiff University, Cardiff.
- Knight, D. (2019). Examining patterns of language use: a guide to WMatrix. Invited workshop delivered as part of the Applied Linguistics Research Seminar Series, Swansea University, 6th March 2019.
- Knight, D. (2018). Representativeness in CorCenCC: corpus design in minoritised languages. Invited plenary delivered to the JET workshop as part of the French Cognitive Linguistics Association (AFLiCo) conference, 3 – 4 May 2018. Paris, France.
- Knight, D. (2018). An overview of the CorCenCC Welsh Corpus project. Invited presentation delivered as part of the Applied Linguistics Research Seminar Series, Swansea University, 2nd February 2018.
- Knight, D. (2018). A corpus approach to free-text analysis: examining the NSS. Invited presentation delivered to members of the Senior Management Team at York St John University, 24th January 2018.
- Knight, D. (2017). Perseverance pays: reflections on getting your first grant. Invited presentation delivered as part of the Securing your First Research Grant workshop event, 28/11/17, Cardiff University.
- Knight, D. (2017). A corpus-based analysis of the NSS results. Invited presentation delivered to the Data Professionals Group, 3/7/17, Cardiff University.
- Knight, D. (2017). Analysing the NSS – some insights from corpus linguistics. Invited presentation delivered to the Academic Performance Group, 3/7/17, Cardiff University.
- Knight, D. (2017). Qualitative Analysis of NSS Responses. Invited presentation delivered to the Business Intelligence Unit, 3/4/17, Cardiff University.
- Knight, D. (2017). Big Data and Corpus Construction: Introducing CorCenCC. Invited seminar presentation at the Investigating (with) Big Data event run by the Cardiff University Digital Humanities Network, 24/5/17, Cardiff University.
- Knight, D. (2017). Research funding and building networks in the Arts, Humanities and Social Sciences: the case of CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes - The National Corpus of Contemporary Welsh). Invited seminar presentation as part of the Cardiff School of Journalism, Media and Cultural Studies 2016/17 research seminar series, 5/4/17, Cardiff University.
- Knight, D. (2017). Constructing corpora of minoritised languages: A focus on CorCenCC. Invited plenary presentation delivered as part of the Corpus Linguistics in the South Conference, 4/3/17, Birkbeck University.
- Knight, D. (2016). Constructing E-Language Corpora: a focus on CorCenCC (The National Corpus of Contemporary Welsh). Invited plenary presentation at the 4th Computer-Mediated Communication and Social Media Corpora for the Humanities conference, 27-28/9/16, University of Ljubljana, Slovenia.
- Knight, D. (2016). Innovations in corpus-based research. Invited seminar presentation at the Tokyo Chapter of the Japanese Association of Language Teachers (JALT) meeting, 9/9/16, Tokyo.
- Knight, D. (2016). The application of corpora: supporting and informing the pedagogic landscape. Invited plenary presentation at the InForm Conference, 16/7/16, Durham University.
- Knight, D. (2016). Corpora and Pedagogy: developing the community-driven National Corpus of Contemporary Welsh. Invited presentation at the Welsh for Adults annual conference, 8/7/16, Cardiff.
- Knight, D. (2016). The National Corpus of Contemporary Welsh: A community driven approach to linguistic corpus construction. Invited presentation at the UCREL Corpus Research Seminar Series, 9/6/16, Lancaster University.
- Knight, D. (2015). Getting that grant: from ECR to PI. Invited plenary at the AHSS Early Career Grant event, 4th December 2015, Cardiff University.
- Knight, D. (2015). Applications of corpus-based enquiry. Invited workshop delivered as part of the MA TESOL seminar series, 2/7/15, Bath Spa University.
- Knight, D. (2015). Dispelling the myths: the ubiquity of corpora in linguistic research. Invited keynote presentation at the annual Cardiff University ENCAP Postgraduate Conference, 2/6/15, Cardiff University.
- Knight, D. (2015). Multimodal Corpus Linguistics. Invited presentation delivered at the joint seminar between Lund University Humanities Lab and the Linnaeus Centre CCL, 26/5/15, Lund University.
- Knight, D. (2015). Analysing Literature using Corpora. Invited presentation delivered as part of the Cardiff BookTalk series, 30/4/15, Cardiff University.
- Knight, D. (2015). Multimodal Corpus Linguistics. Invited presentation delivered as part of the Vlunch Seminar Series, School of Computer Science and Informatics, Cardiff University, 30th April 2015.
- Knight, D. (2015). WMatrix workshop. Workshop delivered as part of the Lexical Studies Corpus Day, 9th March 2015.
- Knight, D. (2014). Practical applications for corpora. Invited workshop at Welsh tutors conference, 5/12/14, Cardiff University.
- Knight, D. (2014). (Re)defining context in corpus linguistics. Invited presentation delivered as part of the Information Visualization seminar series, 5/11/14, Potsdam University of Applied Sciences.
- Knight, D. and Murphy, B. (2014). Exploring the meta in 'meta-data': corpus investigations in sociolinguistic contexts. Invited keynote presentation at IVACS 2014 (Inter-Varietal Applied Corpus Studies Conference), 13/6/14, Newcastle University.
- Knight, D. (2013). A corpus-based approach to Digital Discourse. Invited keynote presentation at the BAAL Language and New Media SIG event ‘Research Methods and Approaches for Analysing Social Media’, 22/11/13, Leicester University.
- Knight, D. (2013). Record – Transcribe – Code – Analyse: Tackling Multimodal Data. Invited keynote presentation at the annual Newcastle University ECLS Postgraduate Conference, 20/6/13, Newcastle University.
- Knight, D. (2013). Recording and analysing real-life interaction ‘in the wild’. Invited keynote presentation at the Cardiff School of English PhD Applied Linguistics (Lexical Studies) Annual conference, 21/3/13, Cardiff University, Wales.
- Knight, D. (2013). Gesture and talk ‘in the wild’. Invited keynote presentation at the BAAL Corpus Linguistics SIG event, 22/2/13, Edinburgh.
- Carter, R. and Knight, D. (2013). CANELC – The Cambridge and Nottingham e-Language Corpus. Invited keynote presentation at the ELT Insights Seminar, 24/1/13, Cambridge University Press, Cambridge.
- Knight, D. and Adolphs, S. (2011). Multimodal Corpora for Sign Language Research. Invited keynote presentation at the 2nd Symposium in Applied Sign Linguistics. 25/6/11, Bristol.
- Knight, D. (2011). Mobile and Location-based Data: Capture, Representation and Analysis. Invited paper at the CAQDAS digital social research showcase event, 23/2/11, Oxford.
Committees and reviewing
- Learned society leadership (elected): finance committee of the Learned Society of Wales (2024+); Chair of the British Association for Applied Linguistics (BAAL, 2018-21), the most influential forum for academics and professionals interested in language and applied linguistics within the UK, with an international membership of over 1,300; BAAL Secretary (2013-18); BAAL Meetings Secretary (2010-13); BAAL Postgraduate Development and Liaison Officer (2007-09).
- Other memberships: member of the CLARIN knowledge centre entitled Digital Resources for the Languages in Ireland and Britain (2024+).
- Reviews editor for the Yearbook of Corpus Linguistics and Pragmatics (2013-16).
- Editorial board memberships: Applied Corpus Linguistics (journal); Discourse, Context and Media (journal); Elements in Corpus Linguistics (book series, Cambridge University Press); Advancing Disciplinary Methods for the Social Sciences (book series, Palgrave Macmillan).
- Advisory board membership: Applied Linguistics (journal); Journal of Corpus Linguistics and Pragmatics (2016+); CLiC project - a corpus tool for the analysis of literary texts, led by Professor Mahlberg; Language, Texts and Society (a journal produced at the University of Nottingham).
- Requests to review funding proposals: ESRC Centres Competition 2018; ESRC-NCRM; Leverhulme; ESRC-CPT; ESRC; AHRC; SSHRC (Canada’s Social Sciences and Humanities Research Council).
- Reviewing for journals and prizes: Context and Discourse; Language Sciences; Language and Communication; Journal of Pragmatics; Multimodal Communication; International Journal of Corpus Linguistics; Communication and Medicine; Corpora Journal; BAAL book prize.
- Programme committee memberships: Challenges in the Management of Large Corpora Workshop, 2017, 2018, 2020, 2022; 9th International Corpus Linguistics conference (2017); Big Data and Natural Language Processing workshop hosted at IEEE Big Data (2016); scientific committee member for the Language Resources and Evaluation Conference in 2020 and 2022 and the 31st International Conference on Computational Linguistics (2025).
Supervisions
- Corpus linguistics
- Corpus pragmatics
- Language use in context
- Non-verbal communication
- Discourse analysis
- Digital interaction (‘E-language’)
Current supervision
Jen Jordan-Grote
Research student
Debora Cabral Lima
Teaching Associate
Yipei Kou
Research student
Charlie Brookes
Research student
Past projects
In addition to the students listed above, I also supervised the RAs involved in work on the CorCenCC, IVO and FreeTxt projects and co-supervised the following PhD students to completion (at 50%, unless otherwise stated):
- Shanru Yang (30:70 with Steve Walsh, Newcastle University)
- Rezan Alharbi (with Mei Lin, Newcastle University)
- Vigneshwaran Muralidaran (with Irena Spasic, COMSC)
- David Griffin (with Christopher Heffer, ENCAP)
- Emily Powell (with Christopher Heffer, ENCAP)
- Kate Barber (with Amanda Potts, ENCAP)
Contact Details
+44 29208 76325
John Percival Building, Room 3.57, Colum Drive, Cardiff, CF10 3EU
Research themes
Specialisms
- Applied linguistics and educational linguistics
- Multimodal discourse analysis
- Discourse and pragmatics
- Corpus linguistics