
🔮 Project: Large Language Model Efficiency Challenge #3

Open · MotzWanted opened this issue Aug 14, 2023 · 0 comments

MotzWanted (Contributor) commented Aug 14, 2023

WHY
The costs of accessing, fine-tuning, and querying foundation models for new tasks are substantial. Because of these costs, performant LLMs have been gated behind the expensive, often proprietary hardware used to train them, putting them out of reach for anyone without significant resources. This project explores the latest innovations in adapting LLMs to specific tasks under tight GPU constraints while maintaining performance quality.

HOW
The challenge is set with specific constraints and an ambitious goal:

  • Constraint: Adapt a foundation model to specific tasks by fine-tuning it on a single GPU (A100) within a 24-hour time frame.
  • Goal: Maintain high accuracy for the desired tasks.

Techniques to be explored and analyzed include:

  1. Low-Rank Adaptation (LoRA):
    Designing adapters as the product of two low-rank matrices.
    Building on the insight that pre-trained language models can be fine-tuned effectively in a much smaller parameter subspace (see the first sketch after this list).

  2. QLoRA:
    Building on LoRA with a 4-bit quantized base model.
    Innovations include the 4-bit NormalFloat data type, double quantization, and paged optimizers (see the second sketch after this list).

  3. Lightning/FlashAttention/DeepSpeed/FairScale:
    Using external tools and plugins to improve data throughput, training efficiency, and model quality (see the Lightning Fabric sketch after this list).

  4. Advanced topic - Blackbox LoRA:
    Current fine-tuning methods rely on backpropagating through the whole model; blackbox optimization instead tunes the small set of adapter weights without backpropagation. Contact Valentin for more details about the theory.
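
To make item 1 concrete, here is a minimal sketch in plain PyTorch (not a project decision, just an illustration of the math): the frozen base weight is augmented by the trainable product of two low-rank matrices, scaled by `alpha / r`.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: y = W x + (alpha / r) * B A x, with W frozen."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter matrices are trained
        # A starts with small random values, B with zeros, so the adapted
        # layer is initially identical to the frozen base layer.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

In practice we would likely use the `peft` library rather than hand-rolling this, but the construction above is exactly the product-of-two-low-rank-matrices idea described in the LoRA paper.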
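
For item 2, a rough sketch of how QLoRA-style fine-tuning is typically wired up with Hugging Face `transformers`, `bitsandbytes`, and `peft`. The base model name, rank, and target modules below are placeholder choices for illustration, not decisions for this project.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NormalFloat quantization with double quantization, as in the QLoRA paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # example base model; swap for the chosen foundation model
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical attention projections; adjust per architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```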
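
For item 3, a hedged sketch of a single-GPU training loop using Lightning Fabric (one of the referenced tools). The model, data loader, and loss computation are placeholders; Fabric also exposes DeepSpeed and FSDP strategies through its `strategy` argument if we need them.

```python
import torch
from lightning.fabric import Fabric

def train(model: torch.nn.Module, dataloader, num_epochs: int = 1):
    # bf16 mixed precision on a single CUDA device (e.g. the A100 constraint).
    fabric = Fabric(accelerator="cuda", devices=1, precision="bf16-mixed")
    fabric.launch()

    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )
    model, optimizer = fabric.setup(model, optimizer)
    dataloader = fabric.setup_dataloaders(dataloader)

    for _ in range(num_epochs):
        for batch in dataloader:
            optimizer.zero_grad()
            loss = model(**batch).loss  # assumes an HF-style model that returns .loss
            fabric.backward(loss)       # replaces loss.backward()
            optimizer.step()
```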

WHAT
This project is expected to produce:

  • Insights and Lessons: A distilled set of well-documented steps and easy-to-follow tutorials that capture what was learned during the challenge.
  • Innovation in Efficiency: New techniques and methods that can significantly change how VOD-trained models are adapted and fine-tuned.

References

  • Low-Rank Adaptation (LoRA)
  • QLoRA
  • FlashAttention
  • Lightning Fabric
  • DeepSpeed
  • FairScale
  • NeurIPS LLM Efficiency Challenge

MotzWanted changed the title from "🔮 Project: NeurIPS Large Language Model Efficiency Challenge" to "🔮 Project: Large Language Model Efficiency Challenge" on Aug 14, 2023