Stars
[IEEE SPL] Official Implementation for Pose-promote: Progressive Visual Perception for Indoor Action Recognition
ZitongYu / PhysMamba
Forked from Chaoqi31/PhysMamba
PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba
[ECCV 2024🔥] The official code for the paper DiffFAS: Face Anti-Spoofing via Generative Diffusion Models.
RizhaoCai / FAS-Aug
Forked from CeceliaSoh/FAS-Aug
Bag of Augmentations for Generalized Face Anti-Spoofing
PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba
The source code of "SFDA-rPPG: Source-free Domain Adaptive rPPG Measurement with Spatial-Temporal Consistency"
Official code repository of "DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection"
Real-time face swap and one-click video deepfake with only a single image
Official Implementation for "Cue-N: Cue-Aware Network for Audio-Visual Question Answering"
[ICML 2024] Official code repository for 3D embodied generalist agent LEO
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
The official code implementation of the paper "Interpretable Multimodal Misinformation Detection with Logic Reasoning", accepted at Findings of ACL 2023.
MMPD: Multi-Domain Mobile Video Physiology Dataset (EMBC 2023 Oral)
FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
[ECCV 2024 Oral🔥] Arc2Face: A Foundation Model for ID-Consistent Human Faces
U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing
ZitongYu / Bay-CAT
Forked from rikeilong/Bay-CAT
Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
[ECCV 2024🔥] The official code for the paper AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors.
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
OMGGGGG / mmdg
Forked from ZitongYu/Flex-Modal-FAS
Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
Gemma open-weight LLM library, from Google DeepMind