January 22, 2025, 16:00-18:00
Mini-Workshop on the theme of Multilingual NLP, on the occasion of Rochelle Choenni's PhD defense
Organised by Computational Linguistics Seminar
Wednesday 22nd January 2025, 16:00-18:00. Room OMHP C0.23 at Oude Manhuispoort 4-6, Center campus (note the unusual location and time!).
  • 16h00-16h50: Arianna Bisazza (University of Groningen): Studying Language Evolution and Acquisition with Modern Neural Networks
  • 17h00-17h50: Goran Glavaš (University of Würzburg): How Many Words is A Picture Really Worth? On Training and Evaluating Multilingual Vision-Language Models

Studying Language Evolution and Acquisition with Modern Neural Networks

Arianna Bisazza (University of Groningen)

Why do human languages look the way they do? And what makes us so good at learning language as we grow up? In this talk, I’ll propose that modern NNs are a valuable tool to simulate and study processes of human language evolution and acquisition, provided they are used in the right way: under controlled setups where the training data, model architecture, and learning regime are known and can be varied across experiments. I will then present two lines of research following this approach, namely: (1) simulating processes of language change using small NN-agents that learn to communicate with pre-defined artificial languages, and (2) simulating the acquisition of syntax by training LMs on more realistic input data, such as child-directed language. After presenting some of my work in these directions, I’ll end with a discussion of the value of interdisciplinarity and the importance of experimenting in small controlled setups, rather than focusing all our efforts on the evaluation of Large Pre-trained Language Models.

How Many Words is A Picture Really Worth? On Training and Evaluating Multilingual Vision-Language Models

Goran Glavaš (University of Würzburg)

Large Vision-Language Models (LVLMs), commonly obtained by aligning a pretrained visual encoder (e.g., a Vision Transformer, ViT) to a pre-trained large language model (LLM), have recently led to impressive results not only in image captioning, but also on a wide range of visual understanding and reasoning tasks (e.g., visual question answering). Nonetheless, a number of factors, ranging from the architecture of the alignment module to the exact “training mix” (i.e., training tasks and data), strongly determine the effectiveness of the resulting LVLM. Moreover, LVLMs, much like their text-only counterparts, are not inherently multilingual and suffer from hallucination. In this talk, I’ll explore training and evaluation protocols for LVLMs, focusing in particular on (i) efficiently training competitive massively multilingual LVLMs, (ii) training with grounding objectives, which have been reported to reduce the hallucination tendencies of LVLMs, and (iii) pitfalls of existing LVLM evaluation and possible remedies.
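
To make the "alignment module" idea concrete, here is a minimal sketch, not the speaker's actual method, of the common LLaVA-style setup: a small trainable projector maps frozen ViT patch embeddings into the LLM's token-embedding space so that image patches can be fed to the LLM as pseudo-tokens. The class name, dimensions, and two-layer MLP design are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    """Hypothetical alignment module: ViT patch embeddings -> LLM embedding space."""
    def __init__(self, vit_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        # Two-layer MLP projector, a common choice in LLaVA-style models (assumed here).
        self.proj = nn.Sequential(
            nn.Linear(vit_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # patch_embeddings: (batch, num_patches, vit_dim) from a frozen visual encoder
        # returns:          (batch, num_patches, llm_dim) "visual tokens"
        return self.proj(patch_embeddings)

# Usage sketch with dummy tensors: project ViT outputs and prepend them to text embeddings.
projector = VisionToLLMProjector()
vit_out = torch.randn(2, 196, 768)      # e.g., 14x14 patches per image
text_emb = torch.randn(2, 32, 4096)     # embedded text prompt tokens
visual_tokens = projector(vit_out)
llm_input = torch.cat([visual_tokens, text_emb], dim=1)  # sequence fed to the LLM
```

In this kind of setup, typically only the projector (and sometimes the LLM) is trained during alignment, which is part of what the talk's "training mix" question is about.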