Referring Human Pose and Mask Estimation In the Wild

This is the official PyTorch implementation of our NeurIPS 2024 paper "Referring Human Pose and Mask Estimation In the Wild".

Introduction

  • We propose Referring Human Pose and Mask Estimation (R-HPM) in the wild, a new task that requires a unified model to predict both the body keypoints and the mask of a specified individual from a text or positional prompt. This enables comprehensive, identity-aware human representations that enhance human-AI interaction.
  • We introduce RefHuman, a benchmark dataset with comprehensive human annotations, including pose, mask, text, and positional prompts, collected in unconstrained environments.
  • We propose UniPHD, an end-to-end promptable model that supports various prompt types for R-HPM and achieves top-tier performance.

⭐ UniPHD

[Figure: overview of the UniPHD method]

⭐ RefHuman Dataset

The images and validation-split annotations of the RefHuman dataset are available for download from Google Drive.

To request access to the training annotations, please fill out this form and send it to [email protected] using your .edu or company email address. We will respond as soon as possible.

The parsing code is provided in ./datasets/refhuman.py.

path/to/refhuman/
├── images/
├── RefHuman_train.json   # annotations for the train split
└── RefHuman_val.json     # annotations for the val split
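
For a quick look at the files, a minimal sketch like the one below can be used. It assumes the annotations follow a COCO-style layout and uses a placeholder path; the authoritative parsing logic is in ./datasets/refhuman.py.

# Minimal inspection sketch, assuming a COCO-style annotation layout.
# The path is a placeholder; see ./datasets/refhuman.py for the official parsing.
from pycocotools.coco import COCO

coco = COCO("path/to/refhuman/RefHuman_val.json")
img_ids = coco.getImgIds()
print(len(img_ids), "images in the val split")

# Load the annotations attached to the first image.
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[0]))
print(sorted(anns[0].keys()))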

Setup

The code is developed with python=3.9.0, pytorch=1.11.0, and cuda=11.3.

First, clone the repository and install the required packages.

git clone https://github.com/bo-miao/RefHuman
cd RefHuman
pip install pycocotools timm termcolor opencv-python addict yapf scipy

Then, compile the CUDA operators.

cd models/uniphd/ops
python setup.py build install
python test.py  # check if correctly installed
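
If the build or the test fails, a quick environment check (a hypothetical snippet, not part of the repository) can confirm that the installed versions match those listed above:

import torch

# Environment check: versions should match the ones the code was developed with.
print(torch.__version__)          # expected: 1.11.0
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())  # must be True to build and run the CUDA ops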

Evaluation on RefHuman

Our UniPHD checkpoint trained on RefHuman is available on Google Drive.

The evaluation script is located at ./scripts/eval_coco_swint.sh (please download the pretrained Swin-T checkpoint first).

You can also run the following command (use --eval_trigger to control the prompt type):

GPU_NUM=4
BATCH=24
CKPT_PATH=path/to/uniphd_checkpoint.pth   # the downloaded UniPHD checkpoint
python -m torch.distributed.launch --nproc_per_node=${GPU_NUM} --master_port 22222 \
    main.py -c config/uniphd.py \
    --backbone swin_T_224_1k \
    --options batch_size=${BATCH} \
    --resume ${CKPT_PATH} \
    --eval \
    --eval_trigger 'text'
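
For a quick single-GPU run, a variant along the following lines should also work. This is a sketch using only the flags shown above; batch_size=6 mirrors the per-GPU batch of the distributed command and is not an official setting.

# Hypothetical single-GPU variant (not an official script).
python main.py -c config/uniphd.py \
    --backbone swin_T_224_1k \
    --options batch_size=6 \
    --resume ${CKPT_PATH} \
    --eval \
    --eval_trigger 'text'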

Acknowledgements

This project is built on the open-source repositories SgMg, GroupPose, and Deformable DETR. We thank the authors for their well-organized code!

Citation

☀️ If you find this work useful, please cite our paper! ☀️

@InProceedings{Miao_2024_NeurIPS,
    author    = {Miao, Bo and Feng, Mingtao and Wu, Zijie and Bennamoun, Mohammed and Gao, Yongsheng and Mian, Ajmal},
    title     = {Referring Human Pose and Mask Estimation In the Wild},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
    year      = {2024},
}
