Jan Botha

Research page



(see also Google Scholar page)

Cross-Lingual Morphological Tagging for Low-Resource Languages
J. Buys and J.A. Botha
In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Berlin, Germany, 2016.

Compositional Morphology for Word Representations and Language Modelling
J.A. Botha and P. Blunsom
In Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China, 2014.
*Award for best application paper* | slides | talk | poster

Adaptor Grammars for Learning Non-Concatenative Morphology
J.A. Botha and P. Blunsom
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Seattle, Washington, USA, 2013.
slides | code

Bayesian Language Modelling of German Compounds
J.A. Botha, C. Dyer and P. Blunsom
In Proceedings of the International Conference on Computational Linguistics (COLING), Mumbai, India, 2012.

Hierarchical Bayesian Language Modelling for the Linguistically Informed
J.A. Botha
In Proceedings of the EACL 2012 Student Research Workshop, Avignon, France, 2012.
*Award for best student paper*


Probabilistic Modelling of Morphologically Rich Languages
DPhil Dissertation, University of Oxford, 2014. (citation info)
Supervisors: Phil Blunsom, Stephen Pulman
Examiners: Nando de Freitas, Sharon Goldwater

ICML-2014 data

To facilitate comparisons and the development of future models, I have released the preprocessed version of the smaller monolingual dataset used for the language modelling experiments in my ICML-2014 paper. The paper reports only aggregated perplexity numbers; more detailed numbers can be found on page 74 of my dissertation.