
Professor Thomas Connor


School of Biosciences

+44 29208 74147
Cardiff School of Biosciences, Sir Martin Evans Building, Museum Avenue, Cardiff, CF10 3AX
Media commentator
Available for postgraduate supervision


Research in my lab seeks to understand pathogen variation in order to answer questions ranging from how pathogens evolve to how and why they spread in local and global outbreaks. This work goes by a number of labels, but broadly speaking we use genomic epidemiology, phylogenomic and population genetic approaches to answer our research questions.

Historically we have examined questions on a range of pathogens, including gastrointestinal pathogens (E. coli, Salmonella, Shigella, C. difficile), which collectively account for over a billion cases of disease around the world every year; pathogens that are particularly associated with antimicrobial resistance; and viruses including HIV and Influenza.

My research is underpinned by whole genome sequencing, using the data from organisms' genomes to work out how they are related to, and different from, other organisms of interest. This work is heavily computational, and we use and develop mathematical and computational approaches to analyse and interpret the "Big Biological Data" that we and our collaborators generate.

This work is also readily translatable, and I have spent the past few years working closely with colleagues in the NHS to translate the approaches we develop into diagnostic and surveillance tools that can be used at a local and national level. In practical terms, we have taken approaches we use in our research and built clinical diagnostic and surveillance services in the NHS in Wales for HIV, TB and other Mycobacteria, C. difficile and Influenza. Our Pathogen Genomics work is undertaken with the Public Health Wales Pathogen Genomics Unit and Genomics Partnership Wales. In addition to my research team, I also lead the Bioinformatics team within the Pathogen Genomics Unit.

Our HIV work was highlighted in the 2020 Cardiff University Innovation Awards.

During the COVID-19 pandemic, almost all of my time has been spent supporting the pandemic response in Wales. We have drawn on our expertise in other pathogens to develop the capacity to sequence and analyse SARS-CoV-2 from Welsh patients, sequencing more than 7,500 SARS-CoV-2 genomes since the start of the pandemic. We use this data to understand the spread of the pandemic over time, and to support outbreak response and national surveillance efforts. This work is undertaken as part of the COVID-19 Genomics UK Consortium.

Alongside my pathogen research, I also focus on the design and development of research infrastructures to support the analysis of the genomic datasets that we produce. This work has included the design and development of national computational resources in the UK, as well as the design of computational resources used in translational settings in the NHS and industry. The largest of these infrastructures, MRC CLIMB, which I helped design and play a role in leading, supports over 1,000 microbiology researchers from across the UK, and has been the key piece of UK infrastructure underpinning the COG-UK sequencing effort to date. The CLIMB-COVID system has provided a core infrastructure for the collation and analysis of over 70,000 SARS-CoV-2 genomes sequenced across the UK.

Microbiomes, Microbes and Informatics

The Connor group is part of the recently formed Microbiomes, Microbes and Informatics (MMI) group (webpage under development). The MMI group currently comprises the research groups of Thomas Connor, Esh Mahenthiralingam, Julian Marchesi and Andrew Weightman, and has over 25 active research staff and postgraduate students.

The MMI group is highly research active, generating over £3.5 million in grant income between 2010 and 2017, and publishing extensively in top journals (cumulative h index > 150, > 400 publications, and > 25,000 citations).

The four current MMI staff recently moved (June 2017) to a single shared location within a new £1.6 million refurbished area of the Sir Martin Evans Building. This comprises a large class II certified research laboratory, equipment and tissue culture rooms, a group office area and academic offices. The MMI group welcomes approaches by potential fellowship applicants and funded PhD students to host their research and expand our strategic research on Microbiomes, Microbes and Informatics.


Module Leader: BI3252 The ‘omics revolution (Bioinformatics & Functional Genomics)

Biocomputing Research Hub lead

Member of College of Biomedical and Life Sciences Data Strategy Group

Member of Supercomputing Wales Infrastructure Committee

Member of Cardiff Supercomputing Facility Oversight Group

Wales regional lead and technical lead for the Cloud Infrastructure for Microbial Bioinformatics

Bioinformatics Lead for the Public Health Wales Pathogen Genomics Unit.

Interested in joining my lab as a self-funded postgraduate student or a postdoc/fellow? Please contact me by email.






















Population and Comparative Genomics

Whole genome sequences provide us with the complete blueprint for the organisms that we are investigating. To understand our organisms of interest, we consider how their genomes vary between organisms (comparative genomics) and how they have changed/evolved over time (population genomics).

Unlike eukaryotic organisms, bacteria have highly variable genomes; they can gain and lose genes at a very high frequency, and members of the same named species may have fewer than half of their genes in common. This genomic plasticity is hugely important, as the genes that vary between strains are often the genes that are associated with characteristics of interest, such as virulence or antimicrobial resistance. Using whole genome sequence data we perform comparative genomics to:

  • work out how pathogens are related, in terms of the gene content they share
  • work out how they vary in their gene content
  • work out how their genetic variation relates to differences in their phenotype (basically their behaviour – such as the seriousness of disease that they cause)
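
As a rough illustration of the first two points, the sketch below splits genes into those shared by every isolate (often called the core genome) and those found in only some isolates (the accessory genome), and scores how much gene content two isolates share. It is a minimal sketch rather than the pipeline used in the lab: the isolate names, gene names and the helper functions `core_and_accessory` and `shared_fraction` are all hypothetical, and real analyses work from annotated genome assemblies rather than hand-written sets.

```python
from typing import Dict, Set, Tuple


def core_and_accessory(gene_content: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split genes into core (in every isolate) and accessory (in only some)."""
    if not gene_content:
        return set(), set()
    gene_sets = list(gene_content.values())
    core = set.intersection(*gene_sets)   # genes present in all isolates
    pan = set.union(*gene_sets)           # genes present in at least one isolate
    return core, pan - core


def shared_fraction(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared genes divided by all genes seen in either isolate."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy data: gene presence for three made-up isolates.
isolates = {
    "isolate_A": {"gene1", "gene2", "gene3", "toxin_x"},
    "isolate_B": {"gene1", "gene2", "gene4"},
    "isolate_C": {"gene1", "gene2", "gene3", "resistance_y"},
}

core, accessory = core_and_accessory(isolates)
similarity_ab = shared_fraction(isolates["isolate_A"], isolates["isolate_B"])
```

Accessory genes such as the made-up `toxin_x` or `resistance_y` are the ones that would then be compared against phenotype data, which is the third point in the list above.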

We complement comparative genomics with phylogenetics, which enables us to determine the relationships between isolates. By integrating the results from these in silico analyses with phenotypic data produced from in vitro and in vivo experimental work, we are able to derive a better understanding of how and why our organisms of interest cause disease.

While the comparative genomics work is focused on examining the similarities and differences between organisms, and how this relates to the phenotype of organisms, we supplement this by performing population genetic analyses to identify structure within the population, and to infer the recent evolutionary history of strains of interest.  This work has been underpinned by a strong, longstanding collaboration with Professor Jukka Corander of the University of Helsinki, with whom I have developed a number of population genetic approaches to analyse bacterial genome-scale datasets (Cheng et al. 2011, Cheng et al. 2013, Marttinen et al. 2012).

I have developed considerable expertise in these approaches, and to date I have applied them to datasets including those comprising Vibrio cholerae (Mutreja et al. 2011), Salmonella Typhimurium (Mather et al. 2013, Okoro et al. 2012) and Clostridium difficile (He et al. 2013). In these cases, using a population genetic framework called BEAST, we reconstructed the evolutionary history of these organisms not in evolutionary time, but in human-understandable calendar units (years or days). Using this data I have been able to contribute significantly to answering key questions about how and when outbreaks began, as well as identifying key events in the evolution of the pathogens of interest.
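
The idea of placing evolution on a calendar scale can be illustrated with a much simpler technique than BEAST: root-to-tip regression, in which each isolate's genetic distance from the root of a tree is regressed against its sampling date. The slope estimates the substitution rate and the point where the line crosses zero estimates the date of the common ancestor. The sketch below is an illustration under stated assumptions, not the Bayesian analysis used in the cited studies: the function name `root_to_tip_regression` and the sampling dates and distances are invented.

```python
from typing import List, Tuple


def root_to_tip_regression(dates: List[float], distances: List[float]) -> Tuple[float, float]:
    """Least-squares fit of distance = rate * (date - root_date).

    Returns (rate per year, estimated date of the root).
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two isolates, with one distance per date")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("all isolates share one sampling date; no temporal signal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("non-positive slope; the data show no usable temporal signal")
    intercept = mean_y - rate * mean_x
    root_date = -intercept / rate  # date at which the fitted distance is zero
    return rate, root_date


# Invented toy data: sampling years and root-to-tip distances (substitutions).
sampling_years = [2000.0, 2004.0, 2008.0, 2012.0]
root_to_tip = [20.0, 28.0, 36.0, 44.0]

rate, root_year = root_to_tip_regression(sampling_years, root_to_tip)
```

With these invented numbers the fit gives two substitutions per year and a root dated to 1990. Real datasets are noisier, and frameworks such as BEAST additionally account for uncertainty in the tree and in the rate, which a simple regression does not.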

Virus research

The same approaches that can be used to investigate and characterise bacteria can also be used to examine viruses. Over the past four years we have worked to develop tools to examine HIV and Influenza, and to implement these in clinical service. Our HIV analyses enable the identification of mutations that relate to drug resistance, with the systems we have built forming the basis of clinical services that are provided to all of the HIV patients in Wales. In the case of Influenza our systems enable the examination of different Influenza cases from across Wales to support surveillance and tracking activities. 

As part of the COVID-19 response, we have applied many of the genomic epidemiology approaches we use for other pathogens to the analysis of SARS-CoV-2 genomes, in order to understand the spread of COVID-19.


Bacteria do not respect borders, and local outbreaks can, and sadly sometimes do, lead to global epidemics. By combining population genomic approaches with excellent metadata, we are able to move beyond simple dated phylogenies towards a greater understanding of how bacteria move in time and space. I have worked extensively on projects that have examined the phylogeography of bacterial pathogens such as Vibrio cholerae, Salmonella Typhimurium and Clostridium difficile, deploying approaches that combine strain metadata and genomic information to derive insight into how and when pathogens of interest have spread around the world.


Cheng L, Connor T R, Aanensen D M, Spratt B G and Corander J (2011) Bayesian semi-supervised classification of bacterial samples using MLST databases. BMC Bioinformatics 12 302.

Cheng L, Connor T R, Siren J, Aanensen D M and Corander J (2013) Hierarchical and spatially explicit clustering of DNA sequences with BAPS software. Mol Biol Evol 30 (5) 1224-1228.

Dziva F, Hauser H*, Connor T R*, van Diemen P M, Prescott G, Langridge G C, Eckert S, Chaudhuri R R, Ewers C, Mellata M, Mukhopadhyay S, Curtiss R, 3rd, Dougan G, Wieler L H, Thomson N R, Pickard D J and Stevens M P (2013) Sequencing and functional annotation of avian pathogenic Escherichia coli serogroup O78 strains reveal the evolution of E. coli lineages pathogenic for poultry via distinct mechanisms. Infect Immun 81 (3) 838-849.

Fookes M, Schroeder G N, Langridge G C, Blondel C J, Mammina C, Connor T R, Seth-Smith H, Vernikos G S, Robinson K S, Sanders M, Petty N K, Kingsley R A, Baumler A J, Nuccio S P, Contreras I, Santiviago C A, Maskell D, Barrow P, Humphrey T, Nastasi A, Roberts M, Frankel G, Parkhill J, Dougan G and Thomson N R (2011) Salmonella bongori provides insights into the evolution of the Salmonellae. PLoS Pathog 7 (8) e1002191.

He M, Miyajima F, Roberts P, Ellison L, Pickard D J, Martin M J, Connor T R, Harris S R, Fairley D, Bamford K B, D'Arc S, Brazier J, Brown D, Coia J E, Douce G, Gerding D, Kim H J, Koh T H, Kato H, Senoh M, Louie T, Michell S, Butt E, Peacock S J, Brown N M, Riley T, Songer G, Wilcox M, Pirmohamed M, Kuijper E, Hawkey P, Wren B W, Dougan G, Parkhill J and Lawley T D (2013) Emergence and global spread of epidemic healthcare-associated Clostridium difficile. Nat Genet 45 (1) 109-113.

Marttinen P, Hanage W P, Croucher N J, Connor T R, Harris S R, Bentley S D and Corander J (2012) Detection of recombination events in bacterial genomes from large population samples. Nucleic Acids Res 40 (1) e6.

Mather A E, Reid S W, Maskell D J, Parkhill J, Fookes M C, Harris S R, Brown D J, Coia J E, Mulvey M R, Gilmour M W, Petrovska L, de Pinna E, Kuroda M, Akiba M, Izumiya H, Connor T R, Suchard M A, Lemey P, Mellor D J, Haydon D T and Thomson N R (2013) Distinguishable epidemics of multidrug-resistant Salmonella Typhimurium DT104 in different hosts. Science 341 (6153) 1514-1517.

Mutreja A, Kim D W, Thomson N R, Connor T R, Lee J H, Kariuki S, Croucher N J, Choi S Y, Harris S R, Lebens M, Niyogi S K, Kim E J, Ramamurthy T, Chun J, Wood J L, Clemens J D, Czerkinsky C, Nair G B, Holmgren J, Parkhill J and Dougan G (2011) Evidence for several waves of global transmission in the seventh cholera pandemic. Nature 477 (7365) 462-465.

Okoro C K, Kingsley R A, Connor T R, Harris S R, Parry C M, Al-Mashhadani M N, Kariuki S, Msefula C L, Gordon M A, de Pinna E, Wain J, Heyderman R S, Obaro S, Alonso P L, Mandomando I, MacLennan C A, Tapia M D, Levine M M, Tennant S M, Parkhill J and Dougan G (2012) Intracontinental spread of human invasive Salmonella Typhimurium pathovariants in sub-Saharan Africa. Nat Genet 44 (11) 1215-1221.


At the present time the majority of my time is taken up working with the NHS.

My teaching interests predominantly relate to Bioinformatics, and I tend to teach on modules that have a strong bioinformatics or genomics component. When I am able to host students for projects, they are normally focused on questions that can be answered using computational approaches. 


I grew up in Essex, the eldest of four children in what was a single parent family after my father passed away when I was 8.

I completed my undergraduate degree at the University of Nottingham in Biochemistry and Genetics, and moved back home to complete my Masters and PhD at Imperial College London.

Following this I took up a Postdoctoral fellowship at the Sanger Institute in 2010, before joining Cardiff as a junior lecturer in 2012.

I have been at Cardiff ever since, being awarded a personal chair in the summer of 2020.