Jan A. Botha

Research page

email

Publications

(see also Google Scholar page)

Natural Language Processing with Small Feed-Forward Networks
J.A. Botha, E. Pitler, J. Ma, A. Bakalov, A. Salcianu, D. Weiss, R. McDonald, S. Petrov
In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, 2017.
slides | arXiv with Supplementary Material | bibtex

Cross-Lingual Morphological Tagging for Low-Resource Languages
J. Buys and J.A. Botha
In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Berlin, Germany, 2016.
poster | bibtex

Compositional Morphology for Word Representations and Language Modelling
J.A. Botha and P. Blunsom
In Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China, 2014.
*Award for best application paper* | slides | talk | poster

Adaptor Grammars for Learning Non-Concatenative Morphology
J.A. Botha and P. Blunsom
In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), Seattle, Washington, USA, 2013.
slides | code | bibtex

Bayesian Language Modelling of German Compounds
J.A. Botha, C. Dyer, P. Blunsom
In Proceedings of the International Conference on Computational Linguistics (COLING), Mumbai, India, 2012.
slides | bibtex

Hierarchical Bayesian Language Modelling for the Linguistically Informed
J.A. Botha
In Proceedings of the EACL 2012 Student Research Workshop, Avignon, France, 2012.
*Award for best student paper* | bibtex

Dissertation

Probabilistic Modelling of Morphologically Rich Languages
DPhil Dissertation, University of Oxford, 2014. (citation info)
Supervisors: Phil Blunsom, Stephen Pulman
Examiners: Nando de Freitas, Sharon Goldwater

ICML-2014 data

To facilitate comparisons and the development of future models, I have released the preprocessed version of the smaller monolingual dataset used for the language modelling experiments in my ICML-2014 paper. The paper reports only aggregated perplexity numbers; more detailed numbers can be found on page 74 of my dissertation.

Contact

email