Professor Jose Camacho Collados
- Available for postgraduate supervision
Overview
I am a Professor at the School of Computer Science and Informatics of Cardiff University. I am also currently a UKRI Future Leaders Fellow. Previously, I was a Google Doctoral Fellow in the area of Natural Language Processing and completed my PhD at Sapienza University of Rome.
My main research interest is Natural Language Processing (NLP), where I have worked in different areas such as semantics, multilinguality and social media.
Please check my personal website for more details.
Publication
2025
- Edwards, A. and Camacho-Collados, J. 2025. Language models for text classification: is in-context learning enough? Presented at: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) Torino, Italy 20-25 May 2024. Published in: Calzolari, N. et al., Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy: ELRA and ICCL. , pp.10058–10072.
- Edwards, A. et al. 2025. Large language model–supported identification of intellectual disabilities in clinical free-text summaries: mixed methods study. JMIR AI 4 e72256. (10.2196/72256)
- Grijalba, J. O. et al., 2025. Overview of PRESTA at IberLEF 2025: question answering over tabular data in Spanish. Procesamiento del Lenguaje Natural 75 , pp.475-486. (10.26342/2025-75-35)
- Ushio, A. , Camacho Collados, J. and Schockaert, S. 2025. RelBERT: Embedding relations with language models. Artificial Intelligence 347 104359. (10.1016/j.artint.2025.104359)
2024
- Antypas, D. et al. 2024. Words as trigger points in social media discussions. [Online]. arXiv. (10.48550/arXiv.2405.10213) Available at: https://doi.org/10.48550/arXiv.2405.10213.
- Antypas, D. , Preece, A. and Camacho Collados, J. 2024. A multi-faceted NLP analysis of misinformation spreaders in Twitter. Presented at: 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis Bangkok, Thailand 15 August 2024. Published in: De Clercq, O. et al., Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis. Association for Computational Linguistics. , pp.71-83.
- Antypas, D. et al. 2024. Sensitive content classification in social media: A holistic resource and evaluation. [Online]. arXiv. (10.48550/arXiv.2411.19832) Available at: https://doi.org/10.48550/arXiv.2411.19832.
- Grijalba, J. O. et al., 2024. Towards quality benchmarking in question answering over tabular data in Spanish. Procesamiento del Lenguaje Natural 73 , pp.283-296. (10.26342/2024-73-21)
- Koch, E. et al., 2024. How real-world data can facilitate the development of precision medicine treatment in psychiatry. Biological Psychiatry 96 (7), pp.543-551. (10.1016/j.biopsych.2024.01.001)
- Lee, N. et al., 2024. Exploring cross-cultural differences in English hate speech annotations: From dataset construction to analysis. Presented at: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics Mexico City, Mexico 16-21 June 2024. Published in: Duh, K. , Gomez, H. and Bethard, S. eds. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). , pp.4205-4224. (10.18653/v1/2024.naacl-long.236)
- Myung, J. et al., 2024. BLEND: A benchmark for LLMs on everyday knowledge in diverse cultures and languages. Presented at: 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks Vancouver, BC, Canada 9-15 December 2024. Published in: Globerson, A. et al., NeurIPS Proceedings: Advances in Neural Information Processing Systems. Vol. 37. Curran Associates, Inc. , pp.78104-78146.
- Owen, D. et al. 2024. AI for analyzing mental health disorders among social media users: Quarter-century narrative review of progress and challenges. Journal of Medical Internet Research 26 e59225. (10.2196/59225)
- Perez Almendros, C. and Camacho Collados, J. 2024. Do Large Language Models understand mansplaining? Well, actually.... Presented at: LREC-COLING 2024 - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation Torino, Italy 20-25 May 2024. Published in: Calzolari, N. et al., Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA and ICCL. , pp.5235-5246.
- Rodríguez-Barroso, N. et al., 2024. Federated learning for exploiting annotators' disagreements in natural language processing. Transactions of the Association for Computational Linguistics 12 , pp.630–648. (10.1162/tacl_a_00664)
- Ushio, A. , Camacho Collados, J. and Schockaert, S. 2024. A RelEntLess benchmark for modelling graded relations between named entities. Presented at: The 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL) St Julian's, Malta 17-22 March 2024. Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics. Vol. 1. Association for Computational Linguistics. , pp.2473-2486.
2023
- Antypas, D. , Preece, A. and Camacho Collados, J. 2023. Negativity spreads faster: A large-scale multilingual twitter analysis on the role of sentiment in political communication. Online Social Networks and Media 33 100242. (10.1016/j.osnem.2023.100242)
- Antypas, D. et al., 2023. SuperTweetEval: A challenging, unified and heterogeneous benchmark for social media NLP research. Presented at: The 2023 Conference on Empirical Methods in Natural Language Processing Singapore 6 - 10 December 2023. Published in: Bouamor, H. , Pino, J. and Bali, K. eds. Findings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics. , pp.12590-12697. (10.18653/v1/2023.findings-emnlp.838)
- Boisson, J. , Espinosa-Anke, L. and Camacho Collados, J. 2023. Construction artifacts in metaphor identification datasets. Presented at: 2023 Conference on Empirical Methods in Natural Language Processing Singapore 6-10 December 2023. Published in: Bouamor, H. , Pino, J. and Bali, K. eds. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. , pp.6581–6590. (10.18653/v1/2023.emnlp-main.406)
- Doval, Y. et al., 2023. Meemi: a simple method for post-processing and integrating cross-lingual word embeddings. Natural Language Engineering 29 (3), pp.746-768. (10.1017/S1351324921000280)
- Owen, D. et al. 2023. Enabling early health care intervention by detecting depression in users of web-based forums using language models: longitudinal analysis and evaluation. JMIR AI 2 e41205. (10.2196/41205)
- Raganato, A. et al., 2023. SemEval-2023 Task 1: Visual word sense disambiguation. Presented at: 17th International Workshop on Semantic Evaluation (SemEval-2023) Toronto, ON, Canada 13-14 July 2023. Published in: Bouamar, H. , Pino, J. and Bali, K. eds. Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023). Association for Computational Linguistics. , pp.2227–2234. (10.18653/v1/2023.semeval-1.308)
- Zhou, Y. , Camacho Collados, J. and Bollegala, D. 2023. A predictive factor analysis of social biases and task-performance in pretrained masked language models. Presented at: 2023 Conference on Empirical Methods in Natural Language Processing Singapore 6-10 December 2023. Published in: Bouamor, H. , Pino, J. and Bali, K. eds. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. , pp.11082–11100. (10.18653/v1/2023.emnlp-main.683)
2022
- Edwards, A. et al. 2022. Guiding generative language models for data augmentation in few-shot text classification. Presented at: DaSH 2022 Abu Dhabi, United Arab Emirates (Hybrid) 08 December 2022. Published in: Dragut, E. et al., Proceedings of the Fourth Workshop on Data Science with Human-in-the-Loop (Language Advances). Abu Dhabi, United Arab Emirates: ACL. , pp.51-63.
- Loureiro, D. , Mário Jorge, A. and Camacho-Collados, J. 2022. LMMS reloaded: Transformer-based sense embeddings for disambiguation and beyond. Artificial Intelligence 305 103661. (10.1016/j.artint.2022.103661)
- Ushio, A. , Alva Manchego, F. and Camacho Collados, J. 2022. Generative language models for paragraph-level question generation. Presented at: Conference on Empirical Methods in Natural Language Processing Abu Dhabi, UAE 7-11 December 2022. Published in: Goldberg, Y. , Kozareva, Z. and Zhang, Y. eds. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. , pp.670-688. (10.18653/v1/2022.emnlp-main.42)
2021
- Camacho Collados, J. , Liberatore, F. and Ushio, A. 2021. Back to the basics: a quantitative analysis of statistical and graph-based term weighting schemes for keyword extraction. Presented at: EMNLP 2021 Conference online and at Punta Cana, Dominican Republic 7-11 November 2021. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. , pp.8089-8103.
- Ito, T. et al., 2021. Learning company embeddings from annual reports for fine-grained industry characterization. Presented at: FinNLP-2020 Kyoto, Japan 11-13 July 2020. Published in: Chen, C. -. et al., Proceedings of the Second Workshop on Financial Technology and Natural Language Processing. , pp.27-33.
- Li, N. et al., 2021. Modelling general properties of nouns by selectively averaging contextualised embeddings. Presented at: 30th International Joint Conference on Artificial Intelligence (IJCAI 2021) Virtual 21-26 August 2021.
- Ushio, A. , Camacho Collados, J. and Schockaert, S. 2021. Distilling relation embeddings from pre-trained language models. Presented at: 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) Punta Cana, Dominican Republic 7-11 November 2021. Published in: Moens, M. et al., Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. , pp.9044-9062. (10.18653/v1/2021.emnlp-main.712)
- Ushio, A. et al. 2021. BERT is to NLP what AlexNet is to CV: can pre-trained language models identify analogies?. Presented at: 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021) Bangkok, Thailand 1-6 August 2021.
2020
- Bouraoui, Z. et al., 2020. Modelling semantic categories using conceptual neighborhood. Presented at: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) New York, NY, USA 7-12 February 2020. Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34(05). PKP Publishing Services. , pp.7448-7455. (10.1609/aaai.v34i05.6241)
- Bouraoui, Z. , Camacho Collados, J. and Schockaert, S. 2020. Inducing relational knowledge from BERT. Presented at: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) New York, NY, USA 7-12 February 2020. Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34(05). , pp.7456-7463. (10.1609/aaai.v34i05.6242)
- Camacho Collados, J. et al. 2020. Learning cross-lingual word embeddings from Twitter via distant supervision. Proceedings of the International AAAI Conference on Web and Social Media 14 (1), pp.72-82.
- Chiang, H. , Camacho-Collados, J. and Pardos, Z. 2020. Understanding the source of semantic regularities in word embeddings. Presented at: SIGNLL Conference Computational Natural Language Learning (CoNLL 2020) Virtual 19-20 November 2020. Proceedings of the 24th Conference on Computational Natural Language Learning. Association for Computational Linguistics. , pp.119-131.
- Edwards, A. et al. 2020. Go simple and pre-train on domain-specific corpora: on the role of training data for text classification. Presented at: 28th International Conference on Computational Linguistics Barcelona, Spain 8-13 December 2020. Published in: Scott, D. , Bel, N. and Zong, C. eds. Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics. , pp.5522–5529. (10.18653/v1/2020.coling-main.481)
- Lee, J. H. et al. 2020. Capturing word order in averaging based sentence embeddings. Presented at: European Conference on Artificial Intelligence (ECAI2020) Santiago de Compostela, Spain 29 August - 2 September 2020. Published in: De Giacomo, G. et al., 24th European Conference on Artificial Intelligence. Vol. 325. IOS Press. , pp.2062-2069. (10.3233/FAIA200328)
- Owen, D. , Camacho Collados, J. and Espinosa-Anke, L. 2020. Towards preemptive detection of depression and anxiety in Twitter. Presented at: Social Media Mining for Health Applications Workshop & Shared Task 2020 Barcelona, Spain 8-13 December 2020. Published in: Gonzalez-Hernandez, G. et al., Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task. Association for Computational Linguistics. , pp.82-89.
- Tuxworth, D. et al. 2020. Deriving disinformation insights from geolocalized Twitter callouts. Presented at: Workshop On Deriving Insights From User-Generated Text @KDD2021 14-18 August 2021.
2019
- Camacho Collados, J. et al. 2019. A latent variable model for learning distributional relation vectors. Presented at: IJCAI-19: International Joint Conference on Artificial Intelligence Macau, China 10-16 August 2019. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence Main track. IJCAI. , pp.4911-4917. (10.24963/ijcai.2019/682)
- Camacho Collados, J. , Espinosa-Anke, L. and Schockaert, S. 2019. Relational word embeddings. Presented at: 57th Annual Meeting of the Association for Computational Linguistics (ACL) Florence, Italy 28 July - 2 August 2019. Published in: Korhonen, A. , Traum, D. and Marquez, L. eds. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. , pp.3286-3296.
- Sinoara, R. et al., 2019. Knowledge-enhanced document embeddings for text classification. Knowledge-Based Systems 163 , pp.955-971. (10.1016/j.knosys.2018.10.026)
2018
- Barbieri, F. and Camacho-Collados, J. 2018. How Gender and Skin Tone Modifiers Affect Emoji Semantics in Twitter. Presented at: 7th Conference on Lexical and Computational Semantics (*SEM 2018) New Orleans, Louisiana 5-6 June 2018. Proceedings of the 7th Joint Conference on Lexical and Computational Semantics (*SEM). Stroudsburg, PA: The Association for Computational Linguistics. , pp.101-106. (10.18653/v1/S18-2011)
- Camacho Collados, J. and Pilehvar, M. T. 2018. From word to sense embeddings: a survey on vector representations of meaning. Journal of Artificial Intelligence Research 63 , pp.743-788. (10.1613/jair.1.11259)
- Quijano-Sánchez, L. et al., 2018. Applying automatic text-based detection of deceptive language to police reports: Extracting behavioral patterns from a multi-step classification model to understand how we lie to the police. Knowledge-Based Systems 149 , pp.155-168. (10.1016/j.knosys.2018.03.010)
2017
- Camacho Collados, J. et al. 2017. SemEval-2017 Task 2: Multilingual and cross-lingual semantic word similarity. Presented at: 11th International Workshop on Semantic Evaluations (SemEval-2017) Vancouver, Canada 3rd-4th August 2017. Proceedings of the 11th International Workshop on Semantic Evaluations (SemEval-2017). Stroudsburg, PA: The Association for Computational Linguistics. , pp.15-26. (10.18653/v1/S17-2002)
- Delli Bovi, C. et al., 2017. EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text. Presented at: The 55th Annual Meeting of the Association for Computational Linguistics Vancouver, Canada 30th July - 4th August 2017. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: The Association for Computational Linguistics. , pp.594-600. (10.18653/v1/P17-2094)
- Mancini, M. et al., 2017. Embedding words and senses together via joint knowledge-enhanced training. Presented at: 21st Conference on Computational Natural Language Learning (CoNLL 2017) Vancouver, Canada 3rd-4th August 2017. Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). Stroudsburg, PA: The Association for Computational Linguistics. , pp.100-111. (10.18653/v1/K17-1012)
- Pilehvar, M. T. et al., 2017. Towards a seamless integration of word senses into downstream NLP applications. Presented at: The 55th Annual Meeting of the Association for Computational Linguistics Vancouver, Canada 30th July - 4th August 2017. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: The Association for Computational Linguistics. , pp.1857-1869. (10.18653/v1/P17-1170)
2016
- Camacho Collados, J. and Navigli, R. 2016. Find the word that does not belong: a framework for an intrinsic evaluation of word vector representations. Presented at: 1st Workshop on Evaluating Vector Space Representations for NLP Berlin 12 August 2016. Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP. Stroudsburg, PA: The Association for Computational Linguistics. , pp.43-50. (10.18653/v1/W16-2508)
- Camacho Collados, J. , Pilehvar, M. T. and Navigli, R. 2016. NASARI: integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artificial Intelligence 240 , pp.36-64. (10.1016/j.artint.2016.07.005)
2015
- Camacho Collados, J. , Pilehvar, M. T. and Navigli, R. 2015. A framework for the construction of monolingual and cross-lingual word similarity datasets. Presented at: ACL-IJCNLP 2015: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing Beijing, China 26-31 July 2015. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics. , pp.1-7.
- Camacho Collados, J. , Pilehvar, M. T. and Navigli, R. 2015. A unified multilingual semantic representation of concepts. Presented at: ACL-IJCNLP 2015 Beijing, China 26-31 July 2015. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). , pp.741-751.
- Camacho Collados, J. , Pilehvar, M. T. and Navigli, R. 2015. NASARI: A novel approach to a semantically-aware representation of items. Presented at: NAACL HLT 2015 Denver, CO 31 May - 5 June. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. , pp.567-577.
2014
- Billami, M. et al., 2014. Annotation sémantique et validation terminologique en texte intégral en SHS. Presented at: TALN 2014 Marseille 1-4 July 2014. Actes de la 21e conférence sur le Traitement Automatique des Langues Naturelles.
- Camacho Collados, J. et al. 2014. Approche statistique pour le filtrage terminologique des occurrences de candidats termes en texte intégral. Presented at: JADT 2014 Paris 3-6 June 2014. Proceedings of the 12th International Conference on the Statistical Analysis of Textual Data.
2013
- Camacho Collados, J. 2013. Splitting complex sentences for natural language processing applications: Building a simplified Spanish corpus. Procedia Social and Behavioral Sciences 95 , pp.464-472. (10.1016/j.sbspro.2013.10.670)
- Camacho Collados, J. 2013. Syntactic simplification for machine translation. BULAG: Bulletin de Linguistique Appliquée et Générale 38
Articles
- Antypas, D. , Preece, A. and Camacho Collados, J. 2023. Negativity spreads faster: A large-scale multilingual twitter analysis on the role of sentiment in political communication. Online Social Media and Networks 33 100242. (10.1016/j.osnem.2023.100242)
- Camacho Collados, J. 2013. Splitting complex sentences for natural language processing applications: Building a simplified Spanish corpus. Procedia Social and Behavioral Sciences 95 , pp.464-472. (10.1016/j.sbspro.2013.10.670)
- Camacho Collados, J. 2013. Syntactic simplification for machine translation. BULAG: Bulletin de Linguistique Appliquée et Générale 38
- Camacho Collados, J. et al. 2020. Learning cross-lingual word embeddings from Twitter via distant supervision. Proceedings of the International AAAI Conference on Web and Social Media 14 (1), pp.72-82.
- Camacho Collados, J. and Pilehvar, M. T. 2018. From word to sense embeddings: a survey on vector representations of meaning. Journal of Artificial Intelligence Research 63 , pp.743-788. (10.1613/jair.1.11259)
- Camacho Collados, J. , Pilehvar, M. T. and Navigli, R. 2016. NASARI: integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artificial Intelligence 240 , pp.36-64. (10.1016/j.artint.2016.07.005)
- Doval, Y. et al., 2023. Meemi: a simple method for post-processing and integrating cross-lingual word embeddings. Natural Language Engineering 29 (3), pp.746-768. (10.1017/S1351324921000280)
- Edwards, A. et al. 2025. Large language model–supported identification of intellectual disabilities in clinical free-text summaries: mixed methods study. JMIR AI 4 e72256. (10.2196/72256)
- Grijalba, J. O. et al., 2024. Towards quality benchmarking in question answering over tabular data in Spanish. Procesamiento del Lenguaje Natural 73 , pp.283-296. (10.26342/2024-73-21)
- Grijalba, J. O. et al., 2025. Overview of PRESTA at IberLEF 2025: question answering over tabular data In Spanish. Procesamiento del Lenguaje Natural 75 , pp.475-486. (10.26342/2025-75-35)
- Koch, E. et al., 2024. How real-world data can facilitate the development of precision medicine treatment in psychiatry. Biological Psychiatry 96 (7), pp.543-551. (10.1016/j.biopsych.2024.01.001)
- Loureiro, D. , Mário Jorge, A. and Camacho-Collados, J. 2022. LMMS reloaded: Transformer-based sense embeddings for disambiguation and beyond. Artificial Intelligence 305 103661. (10.1016/j.artint.2022.103661)
- Owen, D. et al. 2023. Enabling early health care intervention by detecting depression in users of web-based forums using Language models: longitudinal analysis and evaluation. JMIR AI 2 e41205. (10.2196/41205)
- Owen, D. et al. 2024. AI for analyzing mental health disorders among social media users: Quarter-century narrative review of progress and challenges. Journal of Medical Internet Research 26 e59225. (10.2196/59225)
- Quijano-Sánchez, L. et al., 2018. Applying automatic text-based detection of deceptive language to police reports: Extracting behavioral patterns from a multi-step classification model to understand how we lie to the police. Knowledge-Based Systems 149 , pp.155-168. (10.1016/j.knosys.2018.03.010)
- Rodríguez-Barroso, N. et al., 2024. Federated learning for exploiting annotators? Disagreements in natural language processing. Transactions of the Association for Computational Linguistics 12 , pp.630–648. (10.1162/tacl_a_00664)
- Sinoara, R. et al., 2019. Knowledge-enhanced document embeddings for text classification. Knowledge-Based Systems 163 , pp.955-971. (10.1016/j.knosys.2018.10.026)
- Ushio, A. , Camacho Collados, J. and Schockaert, S. 2025. RelBERT: Embedding relations with language models. Artificial Intelligence 347 104359. (10.1016/j.artint.2025.104359)
Conferences
- Antypas, D. , Preece, A. and Camacho Collados, J. 2024. A multi-faceted NLP analysis of misinformation spreaders in Twitter. Presented at: 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis Bangkok, Thailand 15 August 2024. Published in: De Clercq, O. et al., Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis. Association for Computational Linguistics. , pp.71-83.
- Antypas, D. et al., 2023. SuperTweetEval: A challenging, unified and heterogeneous benchmark for social media NLP research. Presented at: The 2023 Conference on Empirical Methods in Natural Language Processing Singapore 6 - 10 December 2023. Published in: Bouamor, H. , Pino, J. and Bali, K. eds. Findings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics. , pp.12590-12697. (10.18653/v1/2023.findings-emnlp.838)
- Barbieri, F. and Camacho-Collados, J. 2018. How Gender and Skin Tone Modifiers Affect Emoji Semantics in Twitter. Presented at: 7th Conference on Lexical and Computational Semantics (*SEM 2018) New Orleans, Louisiana 5-6 June 2018. Proceedings of the 7th Joint Conference on Lexical and Computational Semantics (*SEM). Stroudsburg, PA: The Association for Computational Linguistics. , pp.101-106. (10.18653/v1/S18-2011)
- Billami, M. et al., 2014. Annotation sémantique et validation terminologique en texte intégral en SHS. Presented at: TALN 2014 Marseille 1-4 July 2014. Actes de la 21e conférence sur le Traitement Automatique des Langues Naturelles.
- Boisson, J. , Espinosa-Anke, L. and Camacho Collados, J. 2023. Construction artifacts in metaphor identification datasets. Presented at: 2023 Conference on Empirical Methods in Natural Language Processing Singapore 6-10 December 2023. Published in: Bouamor, H. , Pino, J. and Bali, K. eds. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. , pp.6581–6590. (10.18653/v1/2023.emnlp-main.406)
- Bouraoui, Z. et al., 2020. Modelling semantic categories using conceptual neighborhood. Presented at: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) New York, NY, USA 7-12 February 2020. Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34(05).PKP Publishing Services. , pp.7448-7455. (10.1609/aaai.v34i05.6241)
- Bouraoui, Z. , Camacho Collados, J. and Schockaert, S. 2020. Inducing relational knowledge from BERT. Presented at: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) New York, NY, USA 7-12 February 2020. Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34.Vol. 5. , pp.7456-7463. (10.1609/aaai.v34i05.6242)
- Camacho Collados, J. et al. 2014. Approche statistique pour le filtrage terminologique des occurrences de candidats termes en texte intégral. Presented at: JADT 2014 Paris 3-6 June 2014. Proceedings of the 12th International Conference on the Statistical Analysis of Textual Data.
- Camacho Collados, J. et al. 2019. A latent variable model for learning distributional relation vectors. Presented at: IJCAI-19: International Joint Conference on Artificial Intelligence Macau, China 10-16 August 2019. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence Main track. IJCAI. , pp.4911-4917. (10.24963/ijcai.2019/682)
- Camacho Collados, J. , Espinosa-Anke, L. and Schockaert, S. 2019. Relational word embeddings. Presented at: 57th Annual Meeting of the Association for Computational Linguistics (ACL) Florence, Italy 28 July - 2 August 2019. Published in: Korhonen, A. , Traum, D. and Marquez, L. eds. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. , pp.3286-3296.
- Camacho Collados, J. , Liberatore, F. and Ushio, A. 2021. Back to the basics: a quantitative analysis of statistical and graph-based term weighting schemes for keyword extraction. Presented at: EMNLP 2021 Conference online and at Punta Cana, Dominican Republic 7-11 November 2021. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. , pp.8089-8103.
- Camacho Collados, J. and Navigli, R. 2016. Find the word that does not belong: a framework for an intrinsic evaluation of word vector representations. Presented at: 1st Workshop on Evaluating Vector Space Representations for NLP Berlin 12 August 2016. Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP. Stroudsburg, PA: The Association for Computational Linguistics. , pp.43-50. (10.18653/v1/W16-2508)
- Camacho Collados, J. et al. 2017. SemEval-2017 Task 2: Multilingual and cross-lingual semantic word similarity. Presented at: 11th International Workshop on Semantic Evaluations (SemEval-2017) Vancouver, Canada 3rd-4th August 2017. Proceedings of the 11th International Workshop on Semantic Evaluations (SemEval-2017). Stroudsburg, PA: The Association for Computational Linguistics. , pp.15-26. (10.18653/v1/S17-2002)
- Camacho Collados, J. , Pilehvar, M. T. and Navigli, R. 2015. A framework for the construction of monolingual and cross-lingual word similarity datasets. Presented at: ACL-IJCNLP 2015: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing Beijing, China 26-31 July 2015. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics. , pp.1-7.
- Camacho Collados, J. , Pilehvar, M. T. and Navigli, R. 2015. A unified multilingual semantic representation of concepts. Presented at: ACL-IJCNLP 2015 Beijing, China 26-31 July 2015. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). , pp.741-751.
- Camacho Collados, J. , Pilehvar, M. T. and Navigli, R. 2015. NASARI: A novel approach to a semantically-aware representation of items. Presented at: NAACL HLT 2015 Denver, CO 31 May - 5 June. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. , pp.567-577.
- Chiang, H. , Camacho-Collados, J. and Pardos, Z. 2020. Understanding the source of semantic regularities in word embeddings. Presented at: SIGNLL Conference Computational Natural Language Learning (CoNLL 2020) Virtual 19-20 November 2020. Proceedings of the 24th Conference on Computational Natural Language Learning. Association for Computational Linguistics. , pp.119-131.
- Delli Bovi, C. et al., 2017. EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text. Presented at: The 55th Annual Meeting of the Association for Computational Linguistics Vancouver, Canada 30th July - 4th August 2017. Proceedings of the the 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: The Association for Computational Linguistics. , pp.594-600. (10.18653/v1/P17-2094)
- Edwards, A. and Camacho-Collados, J. 2025. Language models for text classification: is in-context learning enough?. Presented at: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) Torino, Italy 20 - 25 May, 2024. Published in: Calzolari, N. et al., Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy: ELRA and ICCL. , pp.10058–10072.
- Edwards, A. et al. 2020. Go simple and pre-train on domain-specific corpora: on the role of training data for text classification. Presented at: 28th International Conference on Computational Linguistics Barcelona, Spain 8-13 December 2020. Published in: Scott, D. , Bel, N. and Zong, C. eds. Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics. , pp.5522–5529. (10.18653/v1/2020.coling-main.481)
- Edwards, A. et al. 2022. Guiding generative language models for data augmentation in few-shot text classification. Presented at: DaSH 2022 Abu Dhabi, United Arab Emirates (Hybrid) 08 December 2022. Published in: Dragut, E. et al., Proceedings of the Fourth Workshop on Data Science with Human-in-the-Loop (Language Advances). Abu Dhabi, United Arab Emirates: ACL. , pp.51-63.
- Ito, T. et al., 2021. Learning company embeddings from annual reports for fine-grained industry characterization. Presented at: FinNLP-2020 Kyoto, Japan 11-13 July 2020. Published in: Chen, C. -. et al., Proceedings of the Second Workshop on Financial Technology and Natural Language Processing. , pp.27-33.
- Lee, J. H. et al. 2020. Capturing word order in averaging based sentence embeddings. Presented at: European Conference on Artificial Intelligence (ECAI2020) Santiago de Compostela, Spain 29 August - 2 September 2020. Published in: De Giacomo, G. et al., 24th European Conference on Artificial Intelligence. Vol. 325. IOS Press. , pp.2062-2069. (10.3233/FAIA200328)
- Lee, N. et al., 2024. Exploring cross-cultural differences in English hate speech annotations: From dataset construction to analysis. Presented at: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics Mexico City, Mexico 16-21 June 2024. Published in: Duh, K. , Gomez, H. and Bethard, S. eds. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). , pp.4205-4224. (10.18653/v1/2024.naacl-long.236)
- Li, N. et al., 2021. Modelling general properties of nouns by selectively averaging contextualised embeddings. Presented at: 30th International Joint Conference on Artificial Intelligence (IJCAI 2021) Virtual 21-26 August 2021.
- Mancini, M. et al., 2017. Embedding words and senses together via joint knowledge-enhanced training. Presented at: 21st Conference on Computational Natural Language Learning (CoNLL 2017) Vancouver, Canada 3rd-4th August 2017. Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). Stroudsburg, PA: The Association for Computational Linguistics. , pp.100-111. (10.18653/v1/K17-1012)
- Myung, J. et al., 2024. BLEND: A benchmark for LLMs on everyday knowledge in diverse cultures and languages. Presented at: 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks Vancouver, BC, Canada 9-15 December 2024. Published in: Globerson, A. et al., NeurIPS Proceedings: Advances in Neural Information Processing Systems. Vol. 37. Curran Associates, Inc. , pp.78104-78146.
- Owen, D. , Camacho Collados, J. and Espinosa-Anke, L. 2020. Towards preemptive detection of depression and anxiety in Twitter. Presented at: Social Media Mining for Health Applications Workshop & Shared Task 2020 Barcelona, Spain 8-13 December 2020. Published in: Gonzalez-Hernandez, G. et al., Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task. Association for Computational Linguistics. , pp.82-89.
- Perez Almendros, C. and Camacho Collados, J. 2024. Do Large Language Models understand mansplaining? Well, actually.... Presented at: LREC-COLING 2024 - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation Torino, Italy 20-25 May 2024. Published in: Calzolari, N. et al., Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA and ICCL. , pp.5235-5246.
- Pilehvar, M. T. et al., 2017. Towards a seamless integration of word senses into downstream NLP applications. Presented at: The 55th Annual Meeting of the Association for Computational Linguistics Vancouver, Canada 30th July - 4th August 2017. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: The Association for Computational Linguistics. , pp.1857-1869. (10.18653/v1/P17-1170)
- Raganato, A. et al., 2023. SemEval-2023 Task 1: Visual word sense disambiguation. Presented at: 17th International Workshop on Semantic Evaluation (SemEval-2023) Toronto, ON, Canada 13-14 July 2023. Published in: Bouamar, H. , Pino, J. and Bali, K. eds. Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023). Association for Computational Linguistics. , pp.2227–2234. (10.18653/v1/2023.semeval-1.308)
- Tuxworth, D. et al. 2020. Deriving disinformation insights from geolocalized Twitter callouts. Presented at: Workshop On Deriving Insights From User-Generated Text @KDD2021 14-18 August 2021.
- Ushio, A. , Alva Manchego, F. and Camacho Collados, J. 2022. Generative language models for paragraph-level question generation. Presented at: Conference on Empirical Methods in Natural Language Processing Abu Dhabi, UAE 7-11 December 2022. Published in: Goldberg, Y. , Kozareva, Z. and Zhang, Y. eds. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. , pp.670-688. (10.18653/v1/2022.emnlp-main.42)
- Ushio, A. , Camacho Collados, J. and Schockaert, S. 2024. A RelEntLess benchmark for modelling graded relations between named entities. Presented at: The 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL) St Julian's, Malta 17-22 March 2024. Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics. Vol. 1. Association for Computational Linguistics. , pp.2473-2486.
- Ushio, A. , Camacho Collados, J. and Schockaert, S. 2021. Distilling relation embeddings from pre-trained language models. Presented at: 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) Punta Cana, Dominican Republic 7-11 November 2021. Published in: Moens, M. et al., Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. , pp.9044-9062. (10.18653/v1/2021.emnlp-main.712)
- Ushio, A. et al. 2021. BERT is to NLP what AlexNet is to CV: can pre-trained language models identify analogies?. Presented at: 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021) Bangkok, Thailand 1-6 August 2021.
- Zhou, Y. , Camacho Collados, J. and Bollegala, D. 2023. A predictive factor analysis of social biases and task-performance in pretrained masked language models. Presented at: 2023 Conference on Empirical Methods in Natural Language Processing Singapore 6-10 December 2023. Published in: Bouamor, H. , Pino, J. and Bali, K. eds. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. , pp.11082–11100. (10.18653/v1/2023.emnlp-main.683)
Websites
- Antypas, D. et al. 2024. Words as trigger points in social media discussions. [Online]. arXiv. (10.48550/arXiv.2405.10213) Available at: https://doi.org/10.48550/arXiv.2405.10213.
- Antypas, D. et al. 2024. Sensitive content classification in social media: A holistic resource and evaluation. [Online]. arXiv. (10.48550/arXiv.2411.19832) Available at: https://doi.org/10.48550/arXiv.2411.19832.
Teaching
Currently, I am teaching on the following module:
Previously, I was module leader for the following master's modules:
- CMT307: Applied Machine Learning, MSc Data Science and Analytics.
- CMT316: Applications of Machine Learning: Natural Language Processing and Computer Vision, MSc Artificial Intelligence.
Supervisions
I'm currently actively supervising the following PhD students:
- David Owen, with Luis Espinosa-Anke.
- Joanne Boisson, with Luis Espinosa-Anke.
- Dimosthenis Antypas, with Alun Preece.
- Kiamehr Rezaee, with Taher Pilehvar.
- Yuefeng Shi, with Nedjma Djouhra Ousidhoum.
- Jingxuan Chen, with Taher Pilehvar.
- Asahi Ushio, with Steven Schockaert (graduated in 2023).
I'm also co-supervising the following PhD students:
- David Tuxworth, with Alun Preece, Luis Espinosa-Anke and Martin Innes.
- Tristan Naidoo (Imperial College London), with Neil Ferguson and Xingyi Song.
As part of my UKRI Future Leaders Fellowship, the following postdocs have worked on my team:
- Mark Anderson
- Daniel Loureiro
- Yi Zhou
- Aleksandra Edwards
Specialisms
- AI
- Natural language processing