
foundation_models

This repository is a course project on Foundation Models. Our domain is scientific question answering, based on the ScienceQA dataset.

📄 Data 📄

The dataset contains middle- to high-school-level tasks in the natural and social sciences. The task input is always a short question with 2-5 answer choices, optionally accompanied by lecture material, a hint, and a relevant image (the image can be part of the question or an additional illustration). The expected output is the correct answer plus a plausible explanation of the answer choice.

We work with four experimental settings for input:

  • Question - Task - Choices - Hint (QTCH)
  • Question - Task - Choices - Hint - Lecture (QTCHL)
  • Question - Task - Choices - Hint - Lecture - Solution (QTCHLS)
  • Question - Task - Choices - Hint - Solution (QTCHS)

An image is always attached to the input when available; otherwise a blank placeholder image is created (see the sketch below).
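A minimal sketch of how a single input could be assembled, assuming field names from the public ScienceQA schema (question, task, choices, hint, lecture, solution, image); the exact prompt template and placeholder size used in our experiments may differ:

```python
from PIL import Image

def build_prompt(sample: dict, setting: str = "QTCH") -> str:
    # Assemble the textual part of the input for one of the four settings.
    parts = [
        f"Question: {sample['question']}",
        f"Task: {sample.get('task', '')}",
        "Choices: " + "; ".join(sample["choices"]),
        f"Hint: {sample.get('hint', '')}",
    ]
    if "L" in setting:  # QTCHL and QTCHLS add the lecture material
        parts.append(f"Lecture: {sample.get('lecture', '')}")
    if "S" in setting:  # QTCHS and QTCHLS add the golden solution
        parts.append(f"Solution: {sample.get('solution', '')}")
    return "\n".join(parts)

def get_image(sample: dict) -> Image.Image:
    # Attach the provided image if present; otherwise create a blank placeholder.
    if sample.get("image") is not None:
        return sample["image"]
    return Image.new("RGB", (224, 224), color="white")  # placeholder size is an assumption
```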

🚀 Approach 🚀

We benchmark several large multimodal LLMs to assess:

  • their ability to answer scientific questions from existing knowledge and common sense;
  • their ability to retrieve information from context (i.e. choose the correct answer when the solution is given);
  • their reasoning abilities (i.e. reach a correct conclusion when lecture material is provided).

Our metrics are: accuracy for the answer choice, and, for the explanation, the average of several normalised textual similarity metrics: BLEU-1, BLEU-4, ROUGE-L, METEOR, and cosine similarity of BERT embeddings (a scoring sketch follows below).
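A minimal sketch of the averaged explanation score, assuming nltk, rouge_score, and sentence-transformers are available; the embedding model name, smoothing choice, and exact normalisation are placeholders and may differ from our experiments:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score  # requires nltk.download('wordnet')
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

_rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
_embedder = SentenceTransformer("bert-base-nli-mean-tokens")  # assumption: any BERT-based sentence encoder

def explanation_score(reference: str, hypothesis: str) -> float:
    ref_tok, hyp_tok = reference.split(), hypothesis.split()
    smooth = SmoothingFunction().method1
    bleu1 = sentence_bleu([ref_tok], hyp_tok, weights=(1, 0, 0, 0), smoothing_function=smooth)
    bleu4 = sentence_bleu([ref_tok], hyp_tok, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=smooth)
    rouge_l = _rouge.score(reference, hypothesis)["rougeL"].fmeasure
    meteor = meteor_score([ref_tok], hyp_tok)
    emb = _embedder.encode([reference, hypothesis], convert_to_tensor=True)
    cosine = util.cos_sim(emb[0], emb[1]).item()
    # All five components lie (approximately) in [0, 1]; report their mean.
    return (bleu1 + bleu4 + rouge_l + meteor + cosine) / 5
```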

We further attempt:

  • adapter tuning to adapt existing small language models to the dataset (a configuration sketch follows this list), as well as
  • learning from a teacher model's outputs (a knowledge distillation scenario).
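For adapter tuning, a minimal LoRA configuration sketch using PEFT; the base checkpoint, target modules, and hyperparameters are illustrative placeholders, not the exact setup used in our experiments:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForVision2Seq, AutoProcessor

base_model_id = "Salesforce/blip2-opt-2.7b"  # assumption: any small multimodal model
model = AutoModelForVision2Seq.from_pretrained(base_model_id)
processor = AutoProcessor.from_pretrained(base_model_id)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections of the language backbone
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```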

💡 Research Questions & Analysis 💡

1. How well do frontier LLMs perform on Scientific QA & Scientific Reasoning?

  • They achieve high accuracy on scientific questions with multimodal inputs (average accuracy of 80.5125 across the four experimental settings and six models);
  • Large LLMs benefit from additional context for reasoning (settings with more context score consistently higher in both accuracy and semantic similarity than the task-only setting);
  • Extracting information (from solutions) is easier than producing new ideas (from lectures), as shown by both accuracy and textual similarity scores.

2. How does learning an adapter for correct solution generation affect the generations of smaller multimodal LLMs on this task?

  • Learning an adapter changes the probability that the generative LM produces explanations that deviate from the golden ones.
  • In the experimental setting that required the model to reproduce the exact golden explanation from the training dataset, the adapters silenced the model's generations, resulting in empty strings or answers made up of rarely seen tokens.
  • We suspect this is due to an overly strict error function, which effectively turns the adapter into a brake that discourages the model from making any error.

3. How does learning from golden data compare to learning from teacher model's outputs?

  • Learning from human-prepared data (the original dataset) consistently outperforms learning from the teacher model's generations on the train partition, as shown by the textual similarity metrics of the explanations.
  • The validation error in the KD setting is close to the validation error in the golden setting, while the training error in the KD setting is consistently higher than in the golden setting (a sketch of the two supervision targets follows this list).
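For reference, a hypothetical sketch of how the supervision targets differ between the golden setting and the KD setting (sequence-level distillation from teacher generations); `teacher_generate` is an illustrative placeholder, and `build_prompt`/`get_image` refer to the sketch in the Data section:

```python
def make_targets(train_split, setting: str = "golden"):
    # The student is trained with the same objective in both settings;
    # only the target explanation changes.
    targets = []
    for sample in train_split:
        if setting == "golden":
            # golden setting: supervise on the human-written explanation
            targets.append(sample["solution"])
        else:
            # KD setting: supervise on the teacher model's generated explanation
            targets.append(teacher_generate(build_prompt(sample), get_image(sample)))
    return targets
```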

About

Course Project, UniStuttgart, WiSe2425
