
Add EXAONE #34652

Open

lgai-exaone wants to merge 17 commits into main from add-exaone
Conversation

@lgai-exaone lgai-exaone commented Nov 8, 2024

What does this PR do?

Add EXAONE model released by LG AI Research.

Test code and documentation are currently in progress.

Please refer to the corresponding issue: #34651

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker

@lgai-exaone lgai-exaone marked this pull request as draft November 8, 2024 07:27
@lgai-exaone lgai-exaone mentioned this pull request Nov 8, 2024
@lgai-exaone lgai-exaone force-pushed the add-exaone branch 8 times, most recently from 21bcbe1 to df34864 on November 12, 2024 12:50
@lgai-exaone lgai-exaone marked this pull request as ready for review November 12, 2024 13:34
@lgai-exaone lgai-exaone changed the title from [WIP] Add EXAONE to Add EXAONE on Nov 12, 2024
@lgai-exaone (Author)

Please review this PR. @ArthurZucker @itazap

@ArthurZucker (Collaborator)

Hey! Of course 🤗 Sorry for being late, we were on a company-wide offsite!

Merge remote-tracking branch 'upstream/main' into add-exaone
@ArthurZucker (Collaborator) left a comment


Hey! I should have done this a while ago 😅 Thanks a lot for contributing! Model looks nice 🚀
My main question is: what are the main differences, in terms of code, from other models like Llama?

The key point about adding a new model is isolating the differences! Based on that, you can easily add the model with the modular format now (I'll help you with that!).
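
For reference, when a model is architecturally identical to an existing one, the modular file can be almost empty. A rough sketch only, with hypothetical class names (the real file would be worked out together in review):

```python
# modular_exaone.py -- illustrative sketch; class names are hypothetical.
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaForCausalLM, LlamaModel


class ExaoneConfig(LlamaConfig):
    model_type = "exaone"


class ExaoneModel(LlamaModel):
    # Inherits the Llama architecture unchanged; the modular converter
    # expands this into a standalone modeling file.
    pass


class ExaoneForCausalLM(LlamaForCausalLM):
    pass
```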

Apart from some layer renaming, it seems at first glance that we could probably just add a conversion script, no? 🤗
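
For illustration, such a script could be little more than a key-renaming pass over the state dict. A minimal sketch, assuming EXAONE-side names of the transformer.h.*.attn.attention.* form; the exact keys would need to be checked against the released checkpoint:

```python
import re

# Illustrative mapping from assumed EXAONE-style keys to the Llama layout.
KEY_RULES = [
    (r"^transformer\.wte\.", "model.embed_tokens."),
    (r"^transformer\.h\.(\d+)\.ln_1\.", r"model.layers.\1.input_layernorm."),
    (r"^transformer\.h\.(\d+)\.ln_2\.", r"model.layers.\1.post_attention_layernorm."),
    (r"^transformer\.h\.(\d+)\.attn\.attention\.q_proj\.", r"model.layers.\1.self_attn.q_proj."),
    (r"^transformer\.h\.(\d+)\.attn\.attention\.k_proj\.", r"model.layers.\1.self_attn.k_proj."),
    (r"^transformer\.h\.(\d+)\.attn\.attention\.v_proj\.", r"model.layers.\1.self_attn.v_proj."),
    (r"^transformer\.h\.(\d+)\.attn\.attention\.out_proj\.", r"model.layers.\1.self_attn.o_proj."),
    (r"^transformer\.ln_f\.", "model.norm."),
]


def convert_state_dict(state_dict: dict) -> dict:
    """Rename parameter keys to the Llama layout; the tensors are untouched."""
    converted = {}
    for key, tensor in state_dict.items():
        for pattern, repl in KEY_RULES:
            key = re.sub(pattern, repl, key)
        converted[key] = tensor
    return converted
```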

@SangbumChoi (Contributor)

SangbumChoi commented Nov 19, 2024

I think there's some chance it was built purely on top of transformers? Maybe asking for the training script could be one way to check.

@lgai-exaone (Author)

Hey! Thanks so much for your kind words and for reaching out! 🚀 We're thrilled to hear of your interest in EXAONE and in exploring its integration possibilities.

To address your question:

1. Key Differences from LLaMA

EXAONE differs from LLaMA in several foundational ways:

  • Tokenizer: While LLaMA 2 uses LlamaTokenizer and LLaMA 3 adopts Tiktoken, EXAONE uses GPT2Tokenizer as its default tokenizer.
  • Impact of Tokenizer on Performance: A common issue occurs when users apply LLaMA's tokenizer after converting EXAONE into a LLaMA-style implementation. This often leads to significant accuracy degradation, particularly due to token handling in LLaMA's tokenizer, which doesn't align with EXAONE's design.

While adding a conversion script might address token compatibility, it doesn't fully resolve the potential issues stemming from deeper differences in model architecture and training methodology.
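
To make the tokenizer point concrete, here is a minimal sketch of the loading path, using the released EXAONE 3.0 checkpoint as an example (trust_remote_code is assumed to be needed while the modeling code lives outside transformers):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True
)
# EXAONE ships a BPE (GPT-2 style) tokenizer rather than LLaMA's
# SentencePiece/Tiktoken tokenizers, so pairing EXAONE weights with a
# LLaMA tokenizer silently changes the token ids the model was trained on.
print(type(tok).__name__)
print(tok("안녕하세요, EXAONE!").input_ids)
```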

2. Why EXAONE Modeling Should Remain Independent

EXAONE was trained entirely from scratch and has no direct relation to LLaMA, though we’ve noticed a tendency for people to misunderstand this point. As EXAONE continues to evolve, its implementation may diverge even further from LLaMA as we experiment with novel techniques and enhancements. Relying on LLaMA's modeling could limit our flexibility to innovate and adapt.
Furthermore, EXAONE is a bilingual model that supports Korean, unlike English-centric models, which makes managing it separately even more important.
Our ultimate goal is to open-source EXAONE in a way that maximizes clarity, usability, and scalability—preserving the model's unique characteristics while empowering contributors like you to make the most of its capabilities.

We’re excited to collaborate with you and would love to hear your thoughts or ideas on how we can move forward.

We hope you'll consider this positively; we also intend to keep contributing actively. 🤗

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator)

Hey! Sorry for the late answer.
I will answer as I always have: this goes against the philosophy of transformers!
We are trying our best to set good standards, and by making sure your model is converted to the Llama format, you ensure it's available day-0 in all the other frameworks, making it usable by the entire community at zero software cost!

Super happy to see a new model, and I think the good results mean your team did a great job at training it 🚀

In transformers, we don't change past modeling code to adapt to new checkpoints; for us, a different codepath means a new architecture!

The tokenizer should simply use PreTrainedTokenizerFast; this can be changed in the config.json!
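
For illustration, that change is essentially a one-line edit to the checkpoint's tokenizer configuration. A sketch against the standard transformers layout (treat the file name and field as the usual convention, not EXAONE's actual files):

```python
import json

# Point the checkpoint at the generic fast tokenizer so AutoTokenizer no
# longer needs a custom class (or remote code) to load it.
with open("tokenizer_config.json") as f:
    cfg = json.load(f)

cfg["tokenizer_class"] = "PreTrainedTokenizerFast"

with open("tokenizer_config.json", "w") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
```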

@lgai-exaone (Author)

Hey! Thank you for your response.

We understand and respect the transformers library's philosophy of maintaining consistent architecture standards.

However, as we mentioned before, we have decided to maintain a modeling implementation separate from LLaMA's for future development needs, while acknowledging the benefits of cross-framework compatibility.

We look forward to exploring ways to integrate EXAONE's distinct architectural features with the transformers library in a standardized way and to continuing our constructive discussions in the future.

We truly appreciate your time in reviewing our model and your valuable contributions to the open-source community.

@BBC-Esq
BBC-Esq commented Dec 18, 2024

Please accept this new architecture, or whatever the issue is... I've tested the EXAONE models and they beat Qwen's comparably sized models. The 7.8B model was the only model of comparable size to achieve a perfect score on my custom benchmark, which I can provide details about if you want. It's specifically geared toward typical RAG questions, e.g. needle in a haystack and chain reasoning (e.g. how many albums an artist has released when the LLM has to count across multiple contexts).

I'm having trouble understanding the issue with forcing them to use Llama's architecture. Sorry, this isn't my profession.

EDIT: I'm referring to the EXAONE 3.5 models.

@ArthurZucker (Collaborator)

Hey! You can already use the model by simply converting the layer names! The same drama happened with the Qwen 1 model, and the authors saw that it was more beneficial to accept the Llama architecture.
The key idea is that this model IS a Llama model, with absolutely zero differences in terms of architecture. There is zero motivation not to use an architecture supported by more than 20 software stacks (transformers, TGI, vLLM, PEFT, llama.cpp, Ollama, MLX, etc.) when the model can be made available at absolutely zero cost.
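
Concretely, once the layer names are converted, every Llama-aware stack loads it out of the box (the repo id below is hypothetical):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id for a converted, Llama-layout EXAONE checkpoint.
model = AutoModelForCausalLM.from_pretrained("your-org/exaone-llamafied")
tokenizer = AutoTokenizer.from_pretrained("your-org/exaone-llamafied")
```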

I am super happy to promote LG's work, as I believe they pulled off something super nice by training this model and pushing its performance!

Regarding:

we have decided to maintain a modeling implementation separate from LLaMA's for future development needs, while acknowledging the benefits of cross-framework compatibility

This is not something that can happen in transformers: for us, 1 architecture == 1 codepath, meaning that we never go back and patch modeling code; if there is a newer version, we just add a new model! This is how we have been doing things for the past 4 years! 🤗
