Professor Dawn Knight
BA, MA, PhD (Nottingham), FLSW
School of English, Communication and Philosophy
- KnightD5@caerdydd.ac.uk
- +44 29208 76325
- John Percival Building, Room 3.57, Colum Drive, Cardiff, CF10 3EU
- Available for postgraduate supervision
I am a member of the Centre for Language and Communication Research and have been employed by Cardiff University since 2015. I have taken part, as Principal Investigator (PI) or Co-Investigator (CI), in a range of externally funded research projects (with around £3.6m of external funding secured to date). Recent projects (i.e. 2021+) include:
- 2022-23: CI, Welsh Government-funded 'ThACC – Thesawrws Ar-lein Cymraeg Cyfoes - Using Word Embeddings to Create a Thesaurus of Contemporary Welsh'. Working with colleagues from the Schools of Welsh and Computer Science at Cardiff University and from Lancaster University, this project developed an open-access thesaurus for Welsh, freely available online to Welsh speakers and learners alike. We received £90,000 for this project. More information about this project is available online.
- 2022-23: PI on the funded project 'FreeTxt: supporting bilingual free text survey and questionnaire data analysis'. Working with colleagues from Lancaster University, and co-designing and co-building with project partners Cadw and National Trust Wales, this project created an innovative, free, open-source online text analysis tool that enables quick and easy analysis of English and Welsh data. We received £100,000 for this project. More information about this project is available online.
- 2022-23: CI, project 'Wild Swimming and Blue Spaces: Mobilising interdisciplinary knowledge and partnerships to combat health inequalities at scale' (with Adolphs, Nottingham, as PI). This project aims to develop a new mixed-methods approach, drawing on corpus linguistics and narrative analysis, to create effective public health messaging (with a focus on the benefits of wild swimming) that incorporates content from a range of academic disciplines. Ultimately, this project will benefit the many individuals and diverse communities who will be enabled to enjoy wild swimming safely to improve their health, and to gain greater awareness of the nature of blue spaces and their role as a community asset. We received £178,000 for this project. See the project website for details.
- 2021-24: Co-PI (with Anne O'Keeffe, Mary Immaculate College), AHRC/IRC-funded project 'Interactional Variation Online: harnessing emerging technologies in the digital humanities to analyse online discourse in different workplace contexts'. Working with colleagues from Mary Immaculate College, Swansea University, the University of Nottingham, University College Dublin, and the University of Aberdeen, the project aimed to examine virtual workplace communication to gain an in-depth understanding of the potential barriers to effective communication. Our second aim was to propose the next generation of frameworks for analysing online discourse and to make these frameworks available to all arts and humanities research and end-user communities. We received £390,000 from the AHRC and €270,000 from the IRC for this project [approx. £620,700 in total]. See the project website for details.
Between 2016 and 2020, I was also PI on the project 'CorCenCC: Corpws Cenedlaethol Cymraeg Cyfoes (The National Corpus of Contemporary Welsh): A community-driven approach to linguistic corpus construction'. Funded by the ESRC (Economic and Social Research Council) and the AHRC (Arts and Humanities Research Council), this £1.8 million interdisciplinary, multi-institutional project led to the creation of a large-scale open-source corpus of contemporary Welsh. Full details of the project outputs, including links to the corpus query interface, full corpus dataset, project report, Y Tiwtiadur pedagogic toolkit, CyTag tagger/tag-set and CySemTag semantic tagger/tag-set, can be found on the CorCenCC project website and via the CorCenCC GitHub page.
Details of my other research activities, and of previously funded projects, can be found on the 'research' tab on this page.
In terms of external and professional leadership roles, I was Chair of BAAL (the British Association for Applied Linguistics) from 2018 to 2021. BAAL is a learned society with over 1,300 members internationally, making it the most influential forum for academics and professionals with an interest in language and applied linguistics in the UK and beyond. For more information visit: www.baal.org.uk
I am currently a member of the Economic and Social Research Council (ESRC) Strategic Advisory Network (SAN), 2021-2024. The SAN comprises leading experts from the academic and user communities. It helps the ESRC to make the most of opportunities and to access the voice and expertise of its communities. More details about the SAN are available online. I am also an academic lead for the AHRC Peer Review College (2022-2025) and the ESRC Peer Review College (2024+), the strategic lead for the ESRC IAA at Cardiff University (2023-2026), and currently ENCAP's Director of Research Funding.
I am a Fellow of the Learned Society of Wales (FLSW, 2023+).
- Knight, D., Khallaf, N., Rayson, P., El-Haj, M., Ezeani, I. and Morris, S. 2024. FreeTxt: A corpus-based bilingual free-text survey and questionnaire data analysis toolkit. Applied Corpus Linguistics (10.1016/j.acorp.2024.100103)
- Morris, J., Arfon, E., Khallaf, N., El-Haj, M. and Knight, D. 2024. Datblygu thesawrws y Gymraeg drwy dechnoleg. [Online]. Gwerddon Fach: Golwg Ltd. Available at: https://golwg.360.cymru/gwerddon/2143591-datblygu-thesawrws-gymraeg-drwy-dechnoleg
- O'Keeffe, A. et al. 2024. “We’ve lost you Ian”: Multi-modal corpus innovations in capturing, processing and analysing professional online spoken interactions. Research in Corpus Linguistics 12(2) (10.32714/ricl.12.02.02)
- Fitzgerald, C. et al. 2024. Multi-modal considerations for social media discourse analysis: A specialised corpus of Twitter commentary on working from home. In: Coats, S. and Laippala, V. eds. Linguistics across Disciplinary Borders - The March of Data. London: Bloomsbury, pp. 187-212.
- Knight, D. et al. 2024. Indicating engagement in online workplace meetings: The role of backchannelling head nods. International Journal of Corpus Linguistics (IJCL)
- Arfon, E., Morris, J., Khallaf, N. and Knight, D. 2024. Developing the Welsh thesaurus through technology. [Online]. Golwg 360 Cymru - Gwerddon Fach: Golwg Ltd. Available at: https://golwg.360.cymru/gwerddon/2143591-datblygu-thesawrws-gymraeg-drwy-dechnoleg
- Adolphs, S., Chen, Y. and Knight, D. 2024. Towards a speech-gesture profile of discourse markers: The case of "I mean". Lingua
- Vilar-Lluch, S., McClaughlin, E., Adolphs, S., Knight, D. and Nichele, E. 2024. The effects of modal value and imperative mood on self-predicted compliance to health guidance: The case of COVID-19. Text & Talk
- Vilar-Lluch, S., McClaughlin, E., Knight, D., Adolphs, S. and Nichele, E. 2023. The language of vaccination campaigns during COVID-19. Medical Humanities 49(3), pp. 487-496. (10.1136/medhum-2022-012583)
- Knight, D., Fitzpatrick, T., Morris, S., Tovey-Walsh, B., Prosser, H. and Davies, E. 2023. Corpus to curriculum: Developing word lists for adult learners of Welsh. Applied Corpus Linguistics 3(2), article number: 100052. (10.1016/j.acorp.2023.100052)
- Adolphs, S. et al. 2023. Communicating health threats: Linguistic evidence for effective public health messaging during the Covid-19 pandemic. University of Nottingham.
- Khallaf, N. et al. 2023. Open-source thesaurus development for under-resourced languages: a Welsh case study. Presented at: LDK 2023 – 4th Conference on Language, Data and Knowledge, Vienna, Austria, 12-15 September 2023.
- McClaughlin, E. et al. 2022. The reception of public health messages during the COVID-19 pandemic. Applied Corpus Linguistics 3(1), article number: 100037. (10.1016/j.acorp.2022.100037)
- Ezeani, I., El-Haj, M., Morris, J. and Knight, D. 2022. Introducing the Welsh text summarisation dataset and baseline systems. Presented at: 13th ELRA Language Resources and Evaluation Conference (LREC 2022), Marseille, France, 20-25 June 2022.
- El-Haj, M., Ezeani, I., Morris, J. and Knight, D. 2022. Creation of an evaluation corpus and baseline evaluation scores for Welsh text summarisation. Presented at: 4th Celtic Language Technology Workshop (CLTW 2022), Marseille, France, 20 June 2022.
- Clos, J., McClaughlin, E., Barnard, P., Nichele, E., Knight, D., McAuley, D. and Adolphs, S. 2022. PriPA: a tool for privacy-preserving analytics of linguistic data. Presented at: Legal and Ethical Issues in Human Language Technologies 2022, Marseille, France, 24 June 2022.
- Morris, J., Ezeani, I., Gruffydd, I., Young, K., Davies, L., El-Haj, M. and Knight, D. 2022. Welsh automatic text summarisation. Presented at: Wales Academic Symposium on Language Technologies 2022, Bangor, Wales, 28 January 2022. Language and Technology in Wales, Vol. 2. Bangor: Canolfan Bedwyr
- McClaughlin, E. et al. 2021. Privacy preserving corpus linguistics: investigating the trajectories of public health messaging online. University of Nottingham.
- Muralidaran, V., Spasic, I. and Knight, D. 2021. A systematic review of unsupervised approaches to grammar induction. Natural Language Engineering 27(6), pp. 647-689. (10.1017/S1351324920000327)
- Knight, D., Morris, S., Arman, L., Needs, J. and Rees, M. 2021. Building a national corpus: a Welsh language case study. Basingstoke: Palgrave Macmillan.
- Knight, D., Loizides, F., Neale, S., Anthony, L. and Spasic, I. 2021. Developing computational infrastructure for the CorCenCC corpus - the National Corpus of Contemporary Welsh. Language Resources and Evaluation 55, pp. 789-816. (10.1007/s10579-020-09501-9)
- McClaughlin, E. et al. 2021. Public health messaging by political leaders: a corpus linguistic analysis of COVID-19 speeches delivered by Boris Johnson. University of Nottingham. Available at: https://doi.org/10.17639/3fgb-fn44
- Corcoran, P., Palmer, G., Arman, L., Knight, D. and Spasic, I. 2021. Creating Welsh language word embeddings. Applied Sciences 11(15), article number: 6896. (10.3390/app11156896)
- Espinosa-Anke, L., Palmer, G., Filimonov, M., Corcoran, P., Spasic, I. and Knight, D. 2021. English–Welsh cross-lingual embeddings. Applied Sciences 11(14), article number: 6541. (10.3390/app11146541)
- Knight, D., Morris, S. and Fitzpatrick, T. 2021. Corpus design and construction in minoritised language contexts - Cynllunio a chreu corpws mewn cyd-destunau Ieithoedd lleiafrifoledig: The National Corpus of Contemporary Welsh - Corpws Cenedlaethol Cymraeg Cyfoes. Basingstoke: Palgrave Macmillan.
- McClaughlin, E. et al. 2021. Using online news comments to gather fast feedback on issues with public health messaging: The Guardian as a case study. Project Report. [Online]. University of Nottingham. Available at: https://nottingham-repository.worktribe.com/output/5717332
- Palmer, G., Corcoran, P., Arman, L., Knight, D. and Spasic, I. 2021. A closer look at Welsh word embeddings. In: Prys, D. ed. Language and Technology in Wales: Volume 1. Bangor: Bangor University, pp. 21-29.
- Muralidaran, V., Palmer, G., Arman, L., O'Hare, K., Knight, D. and Spasic, I. 2021. A practical implementation of a porter stemmer for Welsh. In: Prys, D. ed. Language and Technology in Wales: Volume 1. Bangor: Bangor University, pp. 30-43.
- Chen, Y., Adolphs, S. and Knight, D. 2020. Multimodal discourse analysis. In: Friginal, E. and Hardy, J. eds. The Routledge Handbook of Corpus Approaches to Discourse Analysis. London: Routledge
- Knight, D. and Adolphs, S. 2020. Multimodal corpora. In: Paquot, M. and Gries, S. T. eds. A Practical Handbook of Corpus Linguistics. Springer International Publishing, pp. 351-369.
- Knight, D., Morris, S., Fitzpatrick, T., Rayson, P., Spasić, I. and Môn Thomas, E. 2020. The national corpus of contemporary Welsh: project report | Y corpws cenedlaethol Cymraeg cyfoes: adroddiad y prosiect. Project Report. CorCenCC.
- Muralidaran, V., Spasic, I. and Knight, D. 2020. A cognitive approach to parsing with neural networks. Presented at: International Conference on Statistical Language and Speech Processing (SLSP), Cardiff, UK, 14–16 October 2020. Statistical Language and Speech Processing, Vol. 12379. Springer Verlag, pp. 71-84. (10.1007/978-3-030-59430-5_6)
- Adolphs, S., Knight, D., Smith, C. and Price, D. 2020. Crowdsourcing formulaic phrases: towards a new type of spoken corpus. Corpora 15(2), pp. 141-168. (10.3366/COR.2020.0192)
- Adolphs, S. and Knight, D. eds. 2020. The Routledge handbook of English language and digital humanities. Routledge Handbooks in English Language Studies. Abingdon: Routledge.
- Ezeani, I., Piao, S., Neale, S., Rayson, P. and Knight, D. 2019. Leveraging pre-trained embeddings for Welsh Taggers. Presented at: 4th Workshop on Representation Learning for NLP, Florence, Italy, July 2019. ACL Anthology: Proceedings of the 4th Workshop on Representation Learning for NLP, Vol. W19-43. Association for Computational Linguistics. (10.18653/v1/W19-4332)
- Spasic, I., Owen, D., Knight, D. and Artemiou, A. 2019. Unsupervised multi-word term recognition in Welsh. Presented at: Celtic Language Technology Workshop 2019, Dublin, Ireland, 19 August 2019. In: Lynn, T. et al. eds. Proceedings of the Celtic Language Technology Workshop. European Association for Machine Translation
- Piao, S., Rayson, P., Knight, D. and Watkins, G. 2018. Towards a Welsh semantic annotation system. Presented at: LREC (Language Resources Evaluation) 2018 Conference, Miyazaki, Japan, 7-12 May 2018.
- Neale, S., Donnelly, K., Watkins, G. and Knight, D. 2018. Leveraging lexical resources and constraint grammar for rule-based part-of-speech tagging in Welsh. Presented at: LREC (Language Resources Evaluation) 2018 Conference, Miyazaki, Japan, 7-12 May 2018.
- Neale, S. et al. 2017. The CorCenCC crowdsourcing app: a bespoke tool for the user-driven creation of the national corpus of contemporary Welsh. Presented at: The 9th International Corpus Linguistics Conference, Birmingham, UK, 24-28 July 2017.
- Knight, D., Walsh, S. and Papagiannidis, S. 2017. I’m having a spring clear out: a corpus-based analysis of e-transactional discourse. Applied Linguistics 38(2), pp. 234-257. (10.1093/applin/amv019)
- Walsh, S. and Knight, D. 2016. Analysing spoken discourse in University small group teaching. In: Corrigan, K. P. and Mearns, A. eds. Creating and Digitizing Language Corpora: Volume 3: Databases for Public Engagement, Vol. 3. Basingstoke: Palgrave Macmillan, pp. 291-319.
- Knight, D. et al. 2016. Lexical coverage evaluation of large-scale multilingual semantic lexicons for twelve languages. Presented at: LREC 2016, Tenth International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA), Portorož, Slovenia, 23-28 May 2016.
- Seedhouse, P. and Knight, D. 2016. Applying digital sensor technology: A problem-solving approach. Applied Linguistics 37(1), pp. 7-32. (10.1093/applin/amv065)
- Knight, D. 2015. e-Language: communication in the digital age. In: Baker, P. and McEnery, T. eds. Corpora and Discourse Studies: Integrating Discourse and Corpora. Palgrave Advances in Language and Linguistics. Basingstoke: Palgrave Macmillan, pp. 20-40. (10.1057/9781137431738_2)
- Crabtree, A., Tennent, P., Brundell, P. and Knight, D. 2015. Digital records and the digital replay system. In: Halfpenny, P. J. and Proctor, R. eds. Innovations in Digital Research Methods. London: Sage
- Dörk, M. and Knight, D. 2015. WordWanderer: A navigational approach to text visualisation. Corpora 10(1), pp. 83-94. (10.3366/cor.2015.0067)
- Adolphs, S. and Knight, D. 2015. Beyond monomodal spoken corpora. In: Baker, P. and McEnery, T. eds. Corpora and Discourse Studies: Integrating Discourse and Corpora. Palgrave Advances in Language and Linguistics. Houndmills, Basingstoke: Palgrave Macmillan, pp. 41-62.
- Knight, D., Adolphs, S. and Carter, R. 2014. CANELC – constructing an e-language corpus. Corpora 9(1), pp. 29-56. (10.3366/cor.2014.0050)
- Knight, D., Adolphs, S. and Carter, R. 2013. Formality in digital discourse: a study of hedging in CANELC. In: Romero-Trillo, J. ed. Yearbook of corpus linguistics and pragmatics 2013: new domains and methodologies. Yearbook of corpus linguistics and pragmatics, Vol. 1. Springer Netherlands, pp. 131-152. (10.1007/978-94-007-6250-3_7)
- Knight, D. 2013. Corpus linguistics: methods, theory and practice by Tony McEnery and Andrew Hardie [Book Review]. In: Romero-Trillo, J. ed. Yearbook of corpus linguistics and pragmatics 2013: new domains and methodologies. Yearbook of corpus linguistics and pragmatics, Vol. 1. Springer Netherlands, pp. 275-277. (10.1007/978-94-007-6250-3_13)
- Knight, D. 2011. Multimodality and active listenership: a corpus approach. Corpus and discourse. London: Bloomsbury.
- Knight, D. 2011. The future of multimodal corpora. Revista Brasileira de Linguística Aplicada 11(2), pp. 391-415. (10.1590/S1984-63982011000200006)
- Adolphs, S., Knight, D. and Carter, R. 2011. Capturing context for heterogeneous corpus analysis: some first steps. International journal of corpus linguistics 16(3), pp. 305-324. (10.1075/ijcl.16.3.02ado)
- Knight, D., Tennent, P., Adolphs, S. and Carter, R. 2010. Developing heterogeneous corpora using the Digital Replay System (DRS). Presented at: Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, Malta, 18 May 2010. In: Kipp, M. et al. eds. Proceedings of the LREC 2010 (Language Resources Evaluation Conference) Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, May 2010, Malta. European Language Resources Association, pp. 16-21.
- Adolphs, S. and Knight, D. 2010. Building a spoken corpus: What are the basics? In: O’Keeffe, A. and McCarthy, M. eds. The Routledge handbook of corpus linguistics. Routledge handbooks in applied linguistics. Oxford: Routledge
- Knight, D., Evans, D., Carter, R. and Adolphs, S. 2009. HeadTalk, HandTalk and the corpus: towards a framework for multi-modal, multi-media corpus development. Corpora 4(1), pp. 1-32. (10.3366/E1749503209000203)
- Knight, D. 2009. A multi-modal corpus approach to the analysis of backchanneling behaviour. PhD Thesis, University of Nottingham.
- Brundell, P. et al. 2008. The experience of using Digital Replay System for social science research. Presented at: 4th International Conference on e-Social Science (ICeSS), Manchester, UK, 18-20 June 2008. Proceedings of the 4th International Conference on e-Social Science (ICeSS), Manchester, 18-20 June 2008. ICeSS, pp. 1-10.
- Knight, D. and Tennent, P. 2008. Introducing DRS (The Digital Replay System): A tool for the future of corpus linguistic research and analysis. Presented at: Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakesh, Morocco, 26 May - 1 June 2008. In: Calzolari, N. et al. eds. Proceedings of the 6th Language Resources and Evaluation Conference (LREC), Palais des Congrès, Marrakech, Morocco, 28-30th May 2008. European Language Resources Association, pp. 26-31.
- Knight, D., Adolphs, S., Tennent, P. and Carter, R. 2008. The Nottingham Multi-Modal Corpus: a demonstration. Presented at: 6th Language Resources and Evaluation Conference (LREC), Marrakesh, Morocco, 28-30 May 2008. Proceedings of the 6th Language Resources and Evaluation Conference (LREC), Palais des Congrès, Marrakech, Morocco, 28-30th May 2008. European Language Resources Association, pp. 1-7.
- Knight, D. and Adolphs, S. 2008. Multi-modal corpus pragmatics: the case of active listenership. In: Romero-Trillo, J. ed. Pragmatics and corpus linguistics: a mutualistic entente. Mouton series in pragmatics Vol. 2. Mouton de Gruyter, pp. 175-190.
- Brundell, P. et al. 2008. Digital Replay System (DRS): a tool for interaction analysis. Presented at: ICLS2008: International Perspectives in the Learning Sciences Cre8ing a learning world, Utrecht, The Netherlands, 23-28 June 2008.
- Knight, D., Bayoumi, S., Mills, S., Crabtree, A., Adolphs, S., Pridmore, T. and Carter, R. 2006. Beyond the text: construction and analysis of multi-modal linguistic corpora. Presented at: 2nd International Conference on e-Social Science, Manchester, UK, 28-30 June 2006. Proceedings of the 2nd International Conference on e-Social Science, Manchester, 28-30 June 2006. ICeSS.
- Knight, D., Khallaf, N., Rayson, P., El-Haj, M., Ezeani, I. and Morris, S. 2024. FreeTxt: A corpus-based bilingual free-text survey and questionnaire data analysis toolkit. Applied Corpus Linguistics (10.1016/j.acorp.2024.100103)
- O'Keeffe, A. et al. 2024. “We’ve lost you Ian”: Multi-modal corpus innovations in capturing, processing and analysing professional online spoken interactions. Research in Corpus Linguistics 12(2) (10.32714/ricl.12.02.02)
- Knight, D. et al. 2024. Indicating engagement in online workplace meetings: The role of backchannelling head nods. International Journal of Corpus Linguistics (IJCL)
- Adolphs, S., Chen, Y. and Knight, D. 2024. Towards a speech-gesture profile of discourse markers: The case of "I mean". Lingua
- Vilar Lluch, S., McClaughlin, S., Adolphs, S., Knight, D. and Nichele, E. 2024. The effects of modal value and imperative mood on self-predicted compliance to health guidance: The case of COVID-19. Text & Talk
- Vilar-Lluch, S., McClaughlin, E., Knight, D., Adolphs, S. and Nichele, E. 2023. The language of vaccination campaigns during COVID-19. Medical Humanities 49(3), pp. 487-496. (10.1136/medhum-2022-012583)
- Knight, D., Fitzpatrick, T., Morris, S., Tovey-Walsh, B., Prosser, H. and Davies, E. 2023. Corpus to curriculum: Developing word lists for adult learners of Welsh. Applied Corpus Linguistic 3(2), article number: 100052. (10.1016/j.acorp.2023.100052)
- McClaughlin, E. et al. 2022. The reception of public health messages during the COVID-19 pandemic. Applied Corpus Linguistics 3(1), article number: 100037. (10.1016/j.acorp.2022.100037)
- Muralidaran, V., Spasic, I. and Knight, D. 2021. A systematic review of unsupervised approaches to grammar induction. Natural Language Engineering 27(6), pp. 647-689. (10.1017/S1351324920000327)
- Knight, D., Loizides, F., Neale, S., Anthony, L. and Spasic, I. 2021. Developing computational infrastructure for the CorCenCC corpus - the National Corpus of Contemporary Welsh. Language Resources and Evaluation 55, pp. 789-816. (10.1007/s10579-020-09501-9)
- Corcoran, P., Palmer, G., Arman, L., Knight, D. and Spasic, I. 2021. Creating Welsh language word embeddings. Applied Sciences 11(15), article number: 6896. (10.3390/app11156896)
- Espinosa-Anke, L., Palmer, G., Filimonov, M., Corcoran, P., Spasic, I. and Knight, D. 2021. English–Welsh cross-lingual embeddings. Applied Sciences 11(14), article number: 6541. (10.3390/app11146541)
- Adolphs, S., Knight, D., Smith, C. and Price, D. 2020. Crowdsourcing formulaic phrases: towards a new type of spoken corpus. Corpora 15(2), pp. 141-168. (10.3366/COR.2020.0192)
- Knight, D., Walsh, S. and Papagiannidis, S. 2017. I’m having a spring clear out: a corpus-based analysis of e-transactional discourse. Applied Linguistics 38(2), pp. 234-257. (10.1093/applin/amv019)
- Seedhouse, P. and Dawn, K. 2016. Applying digital sensor technology: A problem-solving approach. Applied Linguistics 37(1), pp. 7-32. (10.1093/applin/amv065)
- Dörk, M. and Knight, D. 2015. WordWanderer: A navigational approach to text visualisation. Corpora 10(1), pp. 83-94. (10.3366/cor.2015.0067)
- Knight, D., Adolphs, S. and Ronald, C. 2014. CANELC – constructing an e-language corpus. Corpora 9(1), pp. 29-56. (10.3366/cor.2014.0050)
- Knight, D. 2011. The future of multimodal corpora. Revista Brasileira de Linguística Aplicada 11(2), pp. 391-415. (10.1590/S1984-63982011000200006)
- Adolphs, S., Knight, D. and Carter, R. 2011. Capturing context for heterogeneous corpus analysis: some first steps. International journal of corpus linguistics 16(3), pp. 305-324. (10.1075/ijcl.16.3.02ado)
- Knight, D., Evans, D., Carter, R. and Adolphs, S. 2009. HeadTalk, HandTalk and the corpus: towards a framework for multi-modal, multi-media corpus development. Corpora 4(1), pp. 1-32. (10.3366/E1749503209000203)
Book sections
- Fitzgerald, C. et al. 2024. Multi-modal considerations for social media discourse analysis: A specialised corpus of Twitter commentary on working from home. In: Coats, S. and Laippala, V. eds. Linguistics across Disciplinary Borders - The March of Data. London: Bloomsbury, pp. 187-212.
- Palmer, G., Corcoran, P., Arman, L., Knight, D. and Spasic, I. 2021. A closer look at Welsh word embeddings. In: Prys, D. ed. Language and Technology in Wales: Volume 1. Bangor: Bangor University, pp. 21-29.
- Muralidaran, V., Palmer, G., Arman, L., O'Hare, K., Knight, D. and Spasic, I. 2021. A practical implementation of a porter stemmer for Welsh. In: Prys, D. ed. Language and Technology in Wales: Volume 1. Bangor: Bangor University, pp. 30-43.
- Chen, Y., Adolphs, S. and Knight, D. 2020. Multimodal discourse analysis. In: Friginal, E. and Hardy, J. eds. The Routledge Handbook of Corpus Approaches to Discourse Analysis. London: Routledge
- Knight, D. and Adolphs, S. 2020. Multimodal corpora. In: Paquot, M. and Gries, S. T. eds. A Practical Handbook of Corpus Linguistics. Springer International Publishing, pp. 351-369.
- Walsh, S. and Knight, D. 2016. Analysing spoken discourse in University small group teaching. In: Corrigan, K. P. and Mearns, A. eds. Creating and Digitizing Language Corpora: Volume 3: Databases for Public Engagement., Vol. 3. Basingstoke: Palgrave Macmillan, pp. 291-319.
- Knight, D. 2015. e-Language: communication in the digital age. In: Baker, P. and McEnery, T. eds. Corpora and Discourse Studies: Integrating Discourse and Corpora. Palgrave Advances in Language and Linguistics Basingstoke: Palgrave Macmillan, London, pp. 20-40., (10.1057/9781137431738_2)
- Crabtree, A., Tennent, P., Brundell, P. and Knight, D. 2015. Digital records and the digital replay system. In: Halfpenny, P. J. and Proctor, R. eds. Innovations in Digital Research Methods. London: Sage
- Adolphs, S. and Knight, D. 2015. Beyond monomodal spoken corpora. In: Baker, P. and McEnery, T. eds. Corpora and Discourse Studies: Integrating Discourse and Corpora. Palgrave Advances in Language and Linguistics Houndsmill, Basingstoke: Palgrave Macmillan, pp. 41-62.
- Knight, D., Adolphs, S. and Carter, R. 2013. Formality in digital discourse: a study of hedging in CANELC. In: Romero-Trillo, J. ed. Yearbook of corpus linguistics and pragmatics 2013: new domains and methodologies. Yearbook of corpus linguistics and pragmatics Vol. 1. Springer Netherlands, pp. 131-152., (10.1007/978-94-007-6250-3_7)
- Knight, D. 2013. Corpus linguistics: methods, theory and practice by Tony McEnery and Andrew Hardie [Book Review]. In: Romero-Trillo, J. ed. Yearbook of corpus linguistics and pragmatics 2013: new domains and methodologies. Yearbook of corpus linguistics and pragmatics Vol. 1. Springer Netherlands, pp. 275-277., (10.1007/978-94-007-6250-3_13)
- Adolphs, S. and Knight, D. 2010. Building a spoken corpus: What are the basics?. In: O’Keeffe, A. and McCarthy, M. eds. The Routledge handbook of corpus linguistics. Routledge handbooks in applied linguistics Oxford: Routledge
- Knight, D. and Adolphs, S. 2008. Multi-modal corpus pragmatics: the case of active listenership. In: Romero-Trillo, J. ed. Pragmatics and corpus linguistics: a mutualistic entente. Mouton series in pragmatics Vol. 2. Mouton de Gruyter, pp. 175-190.
- Knight, D., Morris, S., Arman, L., Needs, J. and Rees, M. 2021. Building a national corpus: a Welsh language case study. Basingstoke: Palgrave Macmillan.
- Knight, D., Morris, S. and Fitzpatrick, T. 2021. Corpus design and construction in minoritised language contexts - Cynllunio a chreu corpws mewn cyd-destunau Ieithoedd lleiafrifoledig: The National Corpus of Contemporary Welsh - Corpws Cenedlaethol Cymraeg Cyfoes. Basingstoke: Palgrave Macmillan.
- Adolphs, S. and Knight, D. eds. 2020. The Routledge handbook of English language and digital humanities. Routledge Handbooks in English Language Studies. Abingdon: Routledge.
- Knight, D. 2011. Multimodality and active listenership: a corpus approach. Corpus and discourse. London: Bloomsbury.
- Khallaf, N. et al. 2023. Open-source thesaurus development for under-resourced languages: a Welsh case study. Presented at: LDK 2023 – 4th Conference on Language, Data and Knowledge, Vienna, Austria, 12-15 September 2023.
- Ezeani, I., El-Haj, M., Morris, J. and Knight, D. 2022. Introducing the Welsh text summarisation dataset and baseline systems. Presented at: 13th ELRA Language Resources and Evaluation Conference (LREC 2022), Marseille, France, 20-25 June 2022.
- El-Haj, M., Ezeani, I., Morris, J. and Knight, D. 2022. Creation of an evaluation corpus and baseline evaluation scores for Welsh text summarisation. Presented at: 4th Celtic Language Technology Workshop (CLTW 2022), Marseille, France, 20 June 2022.
- Clos, J., McClaughlin, E., Barnard, P., Nichele, E., Knight, D., McAuley, D. and Adolphs, S. 2022. PriPA: a tool for privacy-preserving analytics of linguistic data. Presented at: Legal and Ethical Issues in Human Language Technologies 2022, Marseille, France, 24 June 2022.
- Morris, J., Ezeani, I., Gruffydd, I., Young, K., Davies, L., El-Haj, M. and Knight, D. 2022. Welsh automatic text summarisation. Presented at: Wales Academic Symposium on Language Technologies 2022, Bangor, Wales, 28/01/2022Language and Technology in Wales, Vol. 2. Bangor: Banolfan Bedwyr
- Muralidaran, V., Spasic, I. and Knight, D. 2020. A cognitive approach to parsing with neural networks. Presented at: International Conference on Statistical Language and Speech Processing (SLSP), Cardiff, UK, 14–16 Oct 2020Statistical Language and Speech Processing, Vol. 12379. Springer Verlag pp. 71-84., (10.1007/978-3-030-59430-5_6)
- Ezeani, I., Piao, S., Neale, S., Rayson, P. and Knight, D. 2019. Leveraging pre-trained embeddings for Welsh Taggers. Presented at: 4th Workshop on Representation Learning for NLP, Florence, Italy, July 2019ACL Anthology: Proceedings of the 4th Workshop on Representation Learning for NLP, Vol. W19-43. Association for Computational Linguistics pp. -., (10.18653/v1/W19-4332)
- Spasic, I., Owen, D., Knight, D. and Artemiou, A. 2019. Unsupervised multi-word term recognition in Welsh. Presented at: Celtic Language Technology Workshop 2019, Dublin, Ireland, 19 August 2019 Presented at Lynn, T. et al. eds.Proceedings of the Celtic Language Technology Workshop. European Association for Machine Translation
- Piao, S., Rayson, P., Knight, D. and Watkins, G. 2018. Towards a Welsh semantic annotation system.. Presented at: LREC (Language Resources Evaluation) 2018 Conference, Miyazaki, Japan., 7 - 12 May 2018.
- Neale, S., Donnelly, K., Watkins, G. and Knight, D. 2018. Leveraging lexical resources and constraint grammar for rule-based part-of-speech tagging in Welsh. Presented at: LREC (Language Resources Evaluation) 2018 Conference, Miyazaki, Japan, 7 - 12 May 2018.
- Neale, S. et al. 2017. The CorCenCC crowdsourcing app: a bespoke tool for the user-driven creation of the national corpus of contemporary Welsh. Presented at: The 9th International Corpus Linguistics Conference, Birmingham, UK, 24-28 July 2017.
- Knight, D. et al. 2016. Lexical coverage evaluation of large-scale multilingual semantic lexicons for twelve languages. Presented at: LREC 2016, Tenth International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA), Portoro, Slovenia, 23-28 May 2016.
- Knight, D., Tennent, P., Adolphs, S. and Carter, R. 2010. Developing heterogeneous corpora using the Digital Replay System (DRS).. Presented at: Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, Malta, 18 May 2010 Presented at Kipp, M. et al. eds.Proceedings of the LREC 2010 (Language Resources Evaluation Conference) Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, May 2010, Malta.. European Language Resources Association pp. 16-21.
- Brundell, P. et al. 2008. The experience of using Digital Replay System for social science research. Presented at: 4th International Conference on e-Social Science (ICeSS), Manchester, UK, 18-20 June 2008. In: Proceedings of the 4th International Conference on e-Social Science (ICeSS), Manchester, 18-20 June 2008. ICeSS, pp. 1-10.
- Knight, D. and Tennent, P. 2008. Introducing DRS (The Digital Replay System): A tool for the future of corpus linguistic research and analysis. Presented at: Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 26 May - 1 June 2008. In: Calzolari, N. et al. eds. Proceedings of the 6th Language Resources and Evaluation Conference (LREC), Palais des Congrès, Marrakech, Morocco, 28-30 May 2008. European Language Resources Association, pp. 26-31.
- Knight, D., Adolphs, S., Tennent, P. and Carter, R. 2008. The Nottingham Multi-Modal Corpus: a demonstration. Presented at: 6th Language Resources and Evaluation Conference (LREC), Marrakech, Morocco, 28-30 May 2008. In: Proceedings of the 6th Language Resources and Evaluation Conference (LREC), Palais des Congrès, Marrakech, Morocco, 28-30 May 2008. European Language Resources Association, pp. 1-7.
- Brundell, P. et al. 2008. Digital Replay System (DRS): a tool for interaction analysis. Presented at: ICLS2008: International Perspectives in the Learning Sciences Cre8ing a learning world, Utrecht, The Netherlands, 23-28 June 2008.
- Knight, D., Bayoumi, S., Mills, S., Crabtree, A., Adolphs, S., Pridmore, T. and Carter, R. 2006. Beyond the text: construction and analysis of multi-modal linguistic corpora. Presented at: 2nd International Conference on e-Social Science, Manchester, UK, 28-30 June 2006. In: Proceedings of the 2nd International Conference on e-Social Science, Manchester, 28-30 June 2006. ICeSS.
- Adolphs, S. et al. 2023. Communicating health threats: Linguistic evidence for effective public health messaging during the Covid-19 pandemic. University of Nottingham.
- McClaughlin, E. et al. 2021. Privacy preserving corpus linguistics: investigating the trajectories of public health messaging online. University of Nottingham.
- McClaughlin, E. et al. 2021. Public health messaging by political leaders: a corpus linguistic analysis of COVID-19 speeches delivered by Boris Johnson. University of Nottingham. Available at: https://doi.org/10.17639/3fgb-fn44
- McClaughlin, E. et al. 2021. Using online news comments to gather fast feedback on issues with public health messaging: The Guardian as a case study. Project Report. [Online]. University of Nottingham. Available at: https://nottingham-repository.worktribe.com/output/5717332
- Knight, D., Morris, S., Fitzpatrick, T., Rayson, P., Spasić, I. and Môn Thomas, E. 2020. The national corpus of contemporary Welsh: project report | Y corpws cenedlaethol Cymraeg cyfoes: adroddiad y prosiect. Project Report. CorCenCC.
- Knight, D. 2009. A multi-modal corpus approach to the analysis of backchanneling behaviour. PhD Thesis, University of Nottingham.
- Morris, J., Arfon, E., Khallaf, N., El-Haj, M. and Knight, D. 2024. Datblygu thesawrws y Gymraeg drwy dechnoleg. [Online]. Gwerddon Fach: Golwg Ltd. Available at: https://golwg.360.cymru/gwerddon/2143591-datblygu-thesawrws-gymraeg-drwy-dechnoleg
- Arfon, E., Morris, J., Khallaf, N. and Knight, D. 2024. Developing the Welsh thesaurus through technology. [Online]. Golwg 360 Cymru - Gwerddon Fach: Golwg Ltd. Available at: https://golwg.360.cymru/gwerddon/2143591-datblygu-thesawrws-gymraeg-drwy-dechnoleg
Research interests:
I am an applied linguist whose research interests lie in corpus linguistics, discourse analysis and multimodality. I have expertise in conceptualising, theorising and applying innovative interdisciplinary methods and methodologies for extracting and predicting patterns of language within and across social and linguistic contexts (within the broad scope of the research areas above). Although rooted in Linguistics and the Digital Humanities, my research is fundamentally interdisciplinary, as reflected in the multi-authored nature of my publications and in my interdisciplinary research projects.
My work on developing Welsh-language resources, supported by major grants from the AHRC, ESRC and Welsh Government (e.g. CorCenCC®; see also here for more information), aims to change the landscape of minority-language research and the potential real-world applications of corpus-based enquiry.
I have (co-)presented 104 papers and posters, and delivered 48 keynotes at seminars and conferences, since 2006.
Externally funded research projects:
- 2023: £20,000 received from the Welsh Government to create the GDC-WDG Welsh-language resources site.
- 2022: £90,000 received from the Welsh Government for the project 'ThACC – Thesawrws Ar-lein Cymraeg Cyfoes - Defnyddio Gwreiddiau Geiriau i Greu Thesawrws o Gymraeg Cyfoes'. Working with colleagues from Welsh and Computer Science at Cardiff and Lancaster Universities (with Morris as PI; I am one of the CIs), the project developed an open-access thesaurus for Welsh, freely available online to Welsh speakers and learners alike.
- 2022: £178,000 received from the AHRC for the project 'Wild Swimming and Blue Spaces: Mobilising interdisciplinary knowledge and partnerships to combat health inequality at scale' (with Adolphs, Nottingham, as PI; I am one of the CIs). This project developed a new mixed-methods approach, drawing on corpus linguistics and narrative analysis, for effective public health messaging (with a focus on the benefits of wild swimming) that incorporates content from a range of academic disciplines. Visit the project website here.
- 2022: £100,000 received from the AHRC for the project 'FreeTxt: supporting bilingual free text survey and questionnaire data analysis'. I was PI on this project. Working with colleagues from Lancaster University, and co-designing and co-building with partners Cadw and National Trust Wales, this project created FreeTxt, an innovative, free, open-source online text analysis tool that enables quick and easy analysis of English and Welsh data. Visit the project website here.
- 2021-24: Co-PI (with Anne O'Keeffe, Mary Immaculate College), AHRC/IRC-funded 'Interactional Variation Online: harnessing emerging technologies in the digital humanities to analyse online discourse in different workplace contexts'. Working with colleagues from Mary Immaculate College, Swansea University, the University of Nottingham, University College Dublin and the University of Aberdeen, the project aimed to examine virtual workplace communication to gain an in-depth understanding of the potential barriers to effective communication. Our second aim was to propose the next generation of frameworks for analysing online discourse and to make these frameworks available to all arts and humanities research and end-user communities. We received £390,000 from the AHRC + €270,000 [approx. £620,700] from the IRC for this project. Visit the project website here.
- 2021: £14,988 received from the ESRC Impact Acceleration Account (IAA). This was for a project, working with the National Centre for Learning Welsh, that supported the creation of vocabulary lists based on data drawn from CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes, the National Corpus of Contemporary Welsh).
- 2021: £90,000 received from the Welsh Government for the project 'Welsh Automatic Text Summarisation'. Working with colleagues from Welsh and Computer Science at Cardiff and Lancaster Universities, the project team built a summarisation tool that allows professionals to quickly summarise long documents for efficient presentation. Visit the project website here.
- 2021: £450,000 received from the AHRC for the project 'Coronavirus Discourses: linguistic evidence for effective public health messaging'. Developed in partnership with Public Health England, Public Health Wales and NHS Education for Scotland, this project addressed the key challenges that the coronavirus pandemic presents in relation to understanding the flow and impact of public health messaging as reflected in public and private discourse. Led by Svenja Adolphs (Nottingham; I was a CI on this project), this interdisciplinary project conducted the first large-scale analysis of the trajectories of public health messaging relating to the coronavirus pandemic in the UK [£465,000]. Visit the project website here.
- 2020: £90,000 received from the Welsh Government for the project 'Learning English-Welsh bilingual embeddings and applications in text categorisation'. This was an interdisciplinary project involving Irena Spasić, Padraig Corcoran, Luis Espinosa-Anke (School of Computer Science and Informatics, COMSC) and Geraint Palmer (School of Mathematics) as Co-Investigators (CIs). I was PI on this project. For more information, see here.
- 2019: £90,000 received from the Welsh Government for the project 'Welsh words by numbers: "Wales" + "capital" = "Cardiff"' (focusing on word embeddings for Welsh). I am a member of this project.
- 2019: £2,100 received for an internally funded CUROP project entitled 'FreeTxt: analysing free text comments using a corpus-based approach'. I was a member of this project.
- 2019: £20,000 received from the Welsh Government for the Welsh Stemmer project. I was a CI on this project, with Irena Spasić (Cardiff) as PI.
- 2018: £2,100 received for an internally funded CUROP project entitled 'Corpws Cenedlaethol Cymraeg Cyfoes: The National Corpus of Contemporary Welsh – a focus on spoken data'. I was PI on this project (with Lowri Williams).
- 2018: £2,100 received for an internally funded CUROP project entitled 'Corpws Cenedlaethol Cymraeg Cyfoes: The National Corpus of Contemporary Welsh – semantic tagging and data annotation'. I was a member of this project (with Paul Rayson).
- 2017: £19,964 received from the Cymraeg 2050 Grant fund to automatically construct a WordNet for Welsh, a lexical database in which words are grouped into sets of synonyms (synsets), which are in turn organised into a network of lexico-semantic relations. I was a member of this project.
- 2017: £2,000 received (as PI) from the British Council to support a launch event for the CorCenCC project (held on 28 February 2017).
- 2016-19: £1,800,000 received from the ESRC and AHRC for the CorCenCC project (Corpws Cenedlaethol Cymraeg Cyfoes (The National Corpus of Contemporary Welsh): A community driven approach to linguistic corpus construction). I am a member of this project.
- 2016: £1,600 received for an internally funded CUROP project entitled 'Analysis on non-verbal communication in construction industry interactions'. I was a CI on this project (with Mike Handford).
- 2015: £24,999 received from an AHSS (College of Arts, Humanities and Social Sciences) Digital Humanities Starter Bid. This network proposal aims to build substantial capacity in the Digital Humanities at Cardiff University. I was a CI on this project.
- 2014: £3,850 received from the Newcastle University Faculty Research Fund for a project entitled 'Crowdsourcing data collection for corpus compilation: Scoping methods for the future' (with Patrick Olivier).
- 2013: £3,900 received from the Newcastle University Faculty Bid Preparation Fund for the National Corpus of Welsh (CorCenCC), to support development of the bid.
- 2013: £17,500 of funding received from British Council Aptis Research Grants for a project entitled 'Characterising interactional competence in higher education small group talk'. I am a Co-Investigator on this project with Steve Walsh (PI) and Paul Seedhouse.
- 2012: £3,920 received from the Newcastle University Faculty Research Fund for a pilot project entitled 'Gesture and talk "in the wild"' (with Professor Olivier).
Experience / Research:
- Research Fellow on Crowdsourcing: A Toolkit-Based Approach (2010-2011). RCUK grant EP/G065802/1, Horizon Digital Economy Research. Work carried out at the University of Nottingham.
- Research Associate on DReSS II (Understanding Digital Records for eSocial Science) (2008-2011). ESRC grant no. RES-149-25-1067. Work carried out at the University of Nottingham.
- Research Assistant on DReSS I (Understanding Digital Records for eSocial Science) (2005-2008), ESRC grant no. RES-149-25-0035, and on Headtalk (2005-2006), ESRC grant no. RES-149-25-1016. Work carried out at the University of Nottingham.
- I have also been involved in work with Cambridge University Press (CUP) on the English Profile (EP) project, and from 2009 to 2012 I was part of the team building CANELC, the Cambridge and Nottingham e-Language Corpus (working with CUP and staff from the University of Nottingham), the first large-scale corpus of digital discourse.
- 2015: Certificate in Advanced Studies in Academic Practice, Newcastle University
- 2004 – 2009: PhD in Applied Linguistics, University of Nottingham
- Thesis title: A multi-modal corpus approach to the analysis of backchanneling behaviour
- Funding: ESRC +3 award holder
- 2003 – 2004: MA in Applied Linguistics, University of Nottingham
- 2000 – 2003: BA in English Studies, University of Nottingham
Professional memberships
- Fellow, Learned Society of Wales (FLSW), 2023 – present.
- Associate Fellow of the Higher Education Academy (AFHEA), 2013 – present.
- Member, BAAL (British Association for Applied Linguistics).
- Executive Committee member, CRiLLS (Centre for Research in Linguistics and Language Sciences, Newcastle University), 2011 – 2015.
- Member, CRAL (Centre for Research in Applied Linguistics), 2006 – 2011.
- Member, IVACS (Inter-Varietal Applied Corpus Studies), 2004 – present
- Member, AILA (International Association of Applied Linguistics), 2004 – present
- Member, Language Teaching and Technology; Language Learning and Teaching; and iLaB (ICT) research clusters in ECLS, 2012 – 2015.
Previous academic positions
- 2015 – present: Senior Lecturer in Applied Linguistics, Cardiff University
- 2014 – 2015: Senior Lecturer in Applied Linguistics, Newcastle University
- 2011 – 2014: Lecturer in Applied Linguistics, Newcastle University
- 2006 – 2011: Research Assistant (then Associate, then Fellow), The University of Nottingham
Committees and reviewing
- Member of the editorial board of Applied Linguistics (journal, 2021+)
- Ambassador for the Data Innovation Research Institute (DIRI) at Cardiff University. In this role I lead a special interest group (SIG) that facilitates deep interdisciplinary collaboration across the University in the field of data science (2018+).
- Member of the editorial board of Elements in Corpus Linguistics (book series), published by Cambridge University Press.
- Lead organiser and Chair of the online BAAL 2020 conference. Over 400 members of the association registered to take part in this conference.
- Lead organiser of the International Corpus Linguistics Conference (CL2019), a world-leading 5-day conference for academics working in this discipline (2018-2019).
- Member of the ESRC Doctoral Training Centres Peer Review College (2016+)
- Honorary Visiting Fellow at the Centre for Research in Applied Linguistics (CRAL), University of Nottingham (May–July 2018, during research leave)
- Visiting Researcher in the Department of English Language and Applied Linguistics, Swansea University (April–July 2018, during research leave)
- General Secretary of BAAL, the British Association for Applied Linguistics (2013-2018); BAAL Meetings Secretary (2010-2013); Postgraduate Development and Liaison Officer for BAAL (2007-2009).
- Co-organiser of the IVACS (Inter-Varietal Applied Corpus Studies) 2006 and IVACS 2014 conferences.
- Editor (with Professor Svenja Adolphs) of the Routledge Handbook of Language and the Digital Humanities [under contract].
- Reviews Editor for the Yearbook of Corpus Linguistics and Pragmatics, 2012-2015 (Springer Verlag).
- Member of the editorial board of the journal Discourse, Context and Media
- Reviewer for the International Journal of Corpus Linguistics (IJCL), Journal of Pragmatics, Context and Discourse, the journal Corpora and the annual BAAL book prize.
- Programme committee member: Big Data and Natural Language Processing Workshop held at IEEE Big Data, December 2016.
- Programme committee member: 9th International Corpus Linguistics Conference, July 2017, University of Birmingham; Challenges in the Management of Large Corpora + Big Data and Natural Language Processing joint meeting, July 2017, University of Birmingham.
- Member of the Advisory Editorial Board for the Journal of Corpus Linguistics and Pragmatics (Springer Verlag).
- Member of the advisory board for Language, Texts and Society (LTS), a journal produced at the University of Nottingham.
- Member of the Advisory Board for CLiC, a corpus tool for analysing literary texts, led by Professor Mahlberg, University of Birmingham (AHRC-funded).
Areas of supervision
- Corpus linguistics
- Corpus pragmatics
- Language use in context
- Non-verbal communication
- Discourse analysis
- Digital interaction ('e-language')
Current supervision
Jen Jordan-Grote
Research student
Debbie Cabral Lima
Research student
Yipei Kou
Research student
Charlie Brookes
Research student
Past projects
In addition to the students listed above, I have also supervised the RAs involved in work on the CorCenCC, IVO and FreeTxt projects, and co-supervised the following PhD students to completion (at 50%, unless otherwise stated):
- Shanru Yang (30:70 with Steve Walsh, Newcastle University)
- Rezan Alharbi (with Mei Lin, Newcastle University)
- Vigneshwaran Muralidaran (with Irena Spasic, COMSC)
- David Griffin (with Christopher Heffer, ENCAP)
- Emily Powell (with Christopher Heffer, ENCAP)
- Kate Barber (with Amanda Potts, ENCAP)
Research themes
- Applied linguistics and educational linguistics
- Multimodal discourse analysis
- Discourse and pragmatics
- Corpus linguistics