DISAMBIGUATING SYMBOLIC EXPRESSIONS IN INFORMAL DOCUMENTS

Müller D, Kaliszyk C (2021)


Publication Type: Conference contribution

Publication year: 2021

Publisher: International Conference on Learning Representations, ICLR

Conference Proceedings Title: ICLR 2021 - 9th International Conference on Learning Representations

Event location: Virtual, Online

Abstract

We propose the task of disambiguating symbolic expressions in informal STEM documents in the form of LaTeX files - that is, determining their precise semantics and abstract syntax tree - as a neural machine translation task. We discuss the distinct challenges involved and present a dataset with roughly 33,000 entries. We evaluated several baseline models on this dataset, which failed to yield even syntactically valid LaTeX before overfitting. Consequently, we describe a methodology using a transformer language model pre-trained on sources obtained from arxiv.org, which yields promising results despite the small size of the dataset. We evaluate our model using a plurality of dedicated techniques, taking the syntax and semantics of symbolic expressions into account.

How to cite

APA:

Müller, D., & Kaliszyk, C. (2021). DISAMBIGUATING SYMBOLIC EXPRESSIONS IN INFORMAL DOCUMENTS. In ICLR 2021 - 9th International Conference on Learning Representations. Virtual, Online: International Conference on Learning Representations, ICLR.

MLA:

Müller, Dennis, and Cezary Kaliszyk. "DISAMBIGUATING SYMBOLIC EXPRESSIONS IN INFORMAL DOCUMENTS." Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual, Online: International Conference on Learning Representations, ICLR, 2021.