- Cost-Effective: Uses pre-trained vision-language models without requiring fine-tuning, expensive GPUs, or specialized datasets.
- Context-Aware Analysis: Detects foods, utensils, and eating actions frame by frame for accurate tracking throughout the input video.
- Domain Adaptable/Scalable: Provides labeled dietary insights applicable to healthcare, childcare, and assisted living environments without additional equipment.
Extracts nutritional information, ingredients, and utensils from video frames using Vision-Language Models, then groups frames into intervals based on consistent food item presence. The code can be modified to accommodate any of the following HuggingFace VLMs (see the sketch after this list):
- `liuhaotian/llava-v1.5-7b`
- `llava-hf/llava-1.5-7b-hf`
- `llava-hf/llava-v1.6-mistral-7b-hf`
- `Salesforce/blip2-opt-2.7b`
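A minimal sketch of the frame-level extraction step, assuming the `llava-hf/llava-1.5-7b-hf` checkpoint from the list above and the `transformers` and `opencv-python` packages. The prompt wording, the sampling loop, and the video filename `meal.mp4` are illustrative, not the exact pipeline; the other checkpoints require their own model classes and prompt formats.

```python
import cv2
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative prompt; the actual prompt used by the project may differ.
PROMPT = (
    "USER: <image>\n"
    "List the food items, ingredients, and utensils visible in this frame. ASSISTANT:"
)

def describe_frame(frame_bgr):
    """Run the VLM on a single OpenCV (BGR) frame and return its text output."""
    image = Image.fromarray(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    inputs = processor(images=image, text=PROMPT, return_tensors="pt")
    inputs = inputs.to(model.device, torch.float16)
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]

# Sample every Nth frame (Frame Step Size from the hyperparameter table below).
FRAME_STEP_SIZE = 20
cap = cv2.VideoCapture("meal.mp4")
descriptions = []
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % FRAME_STEP_SIZE == 0:
        descriptions.append((frame_idx, describe_frame(frame)))
    frame_idx += 1
cap.release()
```

Consecutive sampled frames whose extracted food items match can then be merged into a single interval, which is how the frame grouping described above is intended to work.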
Detects eating behavior by checking whether the mouth is open and whether food is near the mouth, using bounding boxes and pose landmarks. We use DWPose to detect mouth landmarks and GroundingDINO to localize food items; a sketch of the decision logic is shown below.
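A sketch of the per-frame decision logic, assuming mouth landmarks and a mouth bounding box have already been obtained from DWPose and food bounding boxes from GroundingDINO. The helper names (`iou`, `is_eating`) and the pixel-coordinate conventions are hypothetical; the thresholds come from the hyperparameter table below.

```python
import numpy as np

LIP_SEPARATION_THRESHOLD = 8.0  # pixel distance treated as an open mouth
IOU_THRESHOLD = 0.15            # mouth/food box overlap counted as "food near mouth"

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes in pixels."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def is_eating(upper_lip, lower_lip, mouth_box, food_boxes):
    """upper_lip / lower_lip: (x, y) landmarks from the pose model;
    mouth_box and food_boxes: (x1, y1, x2, y2) boxes in pixel coordinates."""
    lip_separation = np.linalg.norm(np.array(upper_lip) - np.array(lower_lip))
    mouth_open = lip_separation > LIP_SEPARATION_THRESHOLD
    food_near_mouth = any(iou(mouth_box, fb) > IOU_THRESHOLD for fb in food_boxes)
    return mouth_open and food_near_mouth
```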
| Hyperparameter | Value |
|---|---|
| Frame Step Size | 20 frames |
| Frame Tolerance Threshold | 15 frames |
| Lip Separation Threshold | 8.0 |
| IoU Threshold | 0.15 |
View and modify hyperparameters here.
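One way to keep these values in a single place is a small config object, sketched below; the field names are illustrative, and the comment on the frame tolerance threshold (merging nearby detection intervals) is an assumed role rather than a confirmed one.

```python
from dataclasses import dataclass

@dataclass
class Hyperparameters:
    frame_step_size: int = 20            # sample every Nth frame for VLM analysis
    frame_tolerance_threshold: int = 15  # gap in frames tolerated when merging intervals (assumed role)
    lip_separation_threshold: float = 8.0  # pixel distance treated as an open mouth
    iou_threshold: float = 0.15          # mouth/food box overlap counted as "food near mouth"

HPARAMS = Hyperparameters()
```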
- This work was supported by the National Research Council Canada (NRC) through the Aging in Place (AiP) Challenge Program, project number AiP-006.
- The authors thank the Vision and Image Processing Lab (VIP Lab) at the University of Waterloo for facilitating this project.