ELLIS AMS teams showcasing results in the BIG-bench challenge

We are very glad to announce that, for the first time, ELLIS AMS teams will showcase their results from the Beyond the Imitation Game Benchmark (BIG-bench) collaborative task. The talk will take place on Tuesday, 5 October, at 16:00. The seminar will be held online; the Zoom meeting link will be sent out later.

Title: 

The Beyond the Imitation Game Benchmark (BIG-bench) challenge

Abstract:

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and provide concrete evidence of their capabilities and limitations. In the community-building spirit of ELLIS-Amsterdam, we have formed three teams mixing Bachelor’s, Master’s, and PhD students, and have contributed three tasks to the benchmark. In the seminar, we will briefly introduce the BIG-bench challenge, and then the three teams will present their benchmarking tasks.

The Metaphor Understanding task tests the capability of language models to understand English metaphors. It consists of two subtasks: in the first, a language model is asked to map a metaphorical expression to its correct literal paraphrase; in the second, the model must map a literal paraphrase to the corresponding metaphorical expression. The two subtasks form a new dataset that takes into account the lessons learned from existing models and benchmarks.
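To give a flavour of the first subtask, a single multiple-choice item could look roughly like the following sketch, written as a Python dictionary in the "input"/"target_scores" style used by BIG-bench JSON tasks; the metaphor and paraphrases here are invented for illustration and are not taken from the actual dataset.

```python
# Hypothetical item, invented for illustration (not from the real dataset):
# the model must pick the literal paraphrase of the metaphorical sentence.
metaphor_item = {
    "input": "The news hit her like a ton of bricks.",
    "target_scores": {
        "The news shocked and overwhelmed her.": 1,  # correct literal paraphrase
        "Bricks fell on her while she was reading the news.": 0,
        "She did not hear the news.": 0,
    },
}
```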

The Implicit Relations task evaluates a model’s ability to infer relations between characters from short passages of English narrative in which the relations are left implicit. In each example, a passage and a question of the form “What is X to Y?” are presented, and the model must select the correct relation. Our new dataset uses 25 relation labels, ranging from familial to professional relations.
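Sketched in the same hypothetical "input"/"target_scores" style, an Implicit Relations item might look as follows; the passage and the candidate labels are invented for illustration, and only a few of the 25 relation labels are shown.

```python
# Hypothetical item, invented for illustration (not from the real dataset):
# the relation between the two characters is implied but never stated.
implicit_relations_item = {
    "input": (
        "Sam handed the scalpel to Dr. Lee and adjusted the operating lamp. "
        "'Ready when you are,' Dr. Lee said. What is Sam to Dr. Lee?"
    ),
    "target_scores": {
        "colleague": 1,  # implied by the shared surgical setting
        "sibling": 0,
        "patient": 0,
        "landlord": 0,
    },
}
```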

Finally, the Fantasy Reasoning task assesses a language model’s ability to reason in situations that go against common sense or otherwise violate the rules of the real world; humans do this easily, e.g., when reading a science fiction book. We collect a corpus of contexts that language models are extremely unlikely to be familiar with, paired with yes/no questions.
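An item for this task might be sketched as follows, again assuming the BIG-bench "input"/"target_scores" layout; the fantastical context and the question are invented for illustration and are not part of the collected corpus.

```python
# Hypothetical item, invented for illustration (not from the real corpus):
# the model must answer according to the stated fantasy rule,
# not according to real-world common sense.
fantasy_reasoning_item = {
    "input": (
        "In this world, rivers flow uphill during a full moon. "
        "Q: During a full moon, will a leaf dropped into a river drift "
        "towards the mountains? A:"
    ),
    "target_scores": {
        "Yes": 1,  # correct under the fantasy rule, despite common sense
        "No": 0,
    },
}
```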

References:

Metaphor Understanding: https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/metaphor_understanding

Implicit Relations: https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/implicit_relations

Fantasy Reasoning: https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/fantasy_reasoning