Add EXAONE #34652
base: main
Conversation
Please review this PR. @ArthurZucker @itazap
Hey! Of course 🤗 Sorry for being late, we were on a company-wide offsite!
Hey! I should have done this a while ago 😅 Thanks a lot for contributing! The model looks nice 🚀
My main question is: what are the main differences with other models, like Llama for example, in terms of code?
The key point about adding a new model is isolating the differences! Based on that, you can easily add the model with modular now (I'll help you with that!). See the sketch below.
Apart from some layer renaming, it seems at first glance that we could probably just add a conversion script, no? 🤗
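For illustration: if the computation really does match Llama's apart from naming, a modular definition could be as small as the sketch below. The file name, class names, and inheritance here are assumptions for the discussion, not code from this PR.

```python
# Hypothetical modular_exaone.py sketch: if EXAONE's forward pass matched
# Llama's exactly, the modular system could express the model by plain
# inheritance. All names here are assumptions, not the merged design.
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaForCausalLM


class ExaoneConfig(LlamaConfig):
    # A distinct model_type keeps the checkpoint identifiable on the Hub.
    model_type = "exaone"


class ExaoneForCausalLM(LlamaForCausalLM):
    # No new compute: the architecture reuses Llama's code path entirely.
    config_class = ExaoneConfig
```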
I think there's some chance it was built purely on top of the existing transformers code? Maybe asking for the training script could be one way to check.
Hey! Thanks so much for your kind words and for reaching out! 🚀 We're thrilled to hear about your interest in EXAONE and its integration possibilities. To address your question:

1. Key differences from LLaMA

EXAONE differs from LLaMA in several foundational ways. While adding a conversion script might address token compatibility, it doesn't fully resolve the potential issues stemming from deeper differences in model architecture and training methodology.

2. Why EXAONE's modeling should remain independent

EXAONE was trained entirely from scratch and has no direct relation to LLaMA, though we've noticed a tendency for people to misunderstand this point. As EXAONE continues to evolve, its implementation may diverge even further from LLaMA's as we experiment with novel techniques and enhancements. Relying on LLaMA's modeling code could limit our flexibility to innovate and adapt.

We're excited to collaborate with you and would love to hear your thoughts or ideas on how we can move forward. Please consider this positively; we intend to contribute a lot as well. 🤗
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hey! Sorry for the late answer. Super happy to see a new model, and I think the good results mean your team did a great job at training it 🚀 In transformers, we don't change past modeling code to adapt to new checkpoints; for us, a different codepath means a new architecture! The tokenizer should simply use …
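In the meantime, the released checkpoints can already be used through the Hub's remote-code path. A usage sketch, assuming the public EXAONE 3.0 instruct repo id and that `trust_remote_code=True` is required while the architecture is not yet part of transformers:

```python
# Usage sketch, not part of this PR: load the released EXAONE checkpoint
# via the Hub's remote code path. trust_remote_code=True pulls in the
# model code hosted alongside the weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Hello, EXAONE!", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0]))
```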
Hey! Thank you for your response. We understand and respect the transformers library's philosophy of maintaining consistent architecture standards. However, as we mentioned before, we have decided to maintain separate modeling code from LLaMA's for future development needs, while we acknowledge the benefits of cross-framework compatibility. We look forward to exploring ways to integrate EXAONE's distinct architectural features with the transformers library in a standardized way, and to continuing our constructive discussions in the future. We truly appreciate your time in reviewing our model and your valuable contributions to the open-source community.
Please accept this new architecture, or whatever the issue is... I've tested the EXAONE models and they beat Qwen's comparably sized models. The 7.8B model was the only model of comparable size to achieve a perfect score on my custom benchmark, which I can provide details about if you want. It's specifically geared toward typical RAG questions, e.g. needle-in-a-haystack and chain reasoning (e.g. how many albums has XYZ artist released, where the LLM has to count from multiple contexts). I'm having trouble understanding the issue with forcing them to use Llama's architecture. Sorry, this isn't my profession. EDIT: I'm referring to the EXAONE 3.5 models.
Hey! You can already use the model by simply converting layer names! The same drama happened with the Qwen 1 model, and its authors found it more beneficial to accept the Llama architecture. I am super happy to promote LG's work, as I believe they pulled off something super nice by training this model and pushing its performance! Regarding:
This is not something that can happen in transformers, as for us 1 architecture == 1 codepath, meaning that we never go back and patch modeling code; if there is a newer version, we just add a new model! This is how we have been doing things for the past 4 years! 🤗
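To make the "convert layer names" suggestion concrete, here is a minimal sketch of such a conversion script. The EXAONE-side key patterns are assumptions for illustration (only weight keys are shown); the real checkpoint layout should be checked before relying on them.

```python
# Hedged sketch of a layer-name conversion: rename EXAONE-style state-dict
# keys to the Llama layout so the checkpoint loads as LlamaForCausalLM.
# The source-side patterns are assumptions, not the official key names.
import re

import torch

# (hypothetical EXAONE key pattern, Llama replacement)
KEY_RULES = [
    (r"^transformer\.wte\.weight$", "model.embed_tokens.weight"),
    (r"^transformer\.ln_f\.weight$", "model.norm.weight"),
    (r"^lm_head\.weight$", "lm_head.weight"),
    (r"^transformer\.h\.(\d+)\.ln_1\.weight$",
     r"model.layers.\1.input_layernorm.weight"),
    (r"^transformer\.h\.(\d+)\.ln_2\.weight$",
     r"model.layers.\1.post_attention_layernorm.weight"),
    (r"^transformer\.h\.(\d+)\.attn\.attention\.(q|k|v)_proj\.weight$",
     r"model.layers.\1.self_attn.\2_proj.weight"),
    (r"^transformer\.h\.(\d+)\.attn\.attention\.out_proj\.weight$",
     r"model.layers.\1.self_attn.o_proj.weight"),
    (r"^transformer\.h\.(\d+)\.mlp\.c_fc_0\.weight$",
     r"model.layers.\1.mlp.gate_proj.weight"),
    (r"^transformer\.h\.(\d+)\.mlp\.c_fc_1\.weight$",
     r"model.layers.\1.mlp.up_proj.weight"),
    (r"^transformer\.h\.(\d+)\.mlp\.c_proj\.weight$",
     r"model.layers.\1.mlp.down_proj.weight"),
]


def convert_state_dict(state_dict):
    """Rename every key via the first matching rule; fail loudly otherwise."""
    converted = {}
    for key, tensor in state_dict.items():
        for pattern, replacement in KEY_RULES:
            new_key, n = re.subn(pattern, replacement, key)
            if n:
                converted[new_key] = tensor
                break
        else:
            raise KeyError(f"No rename rule for checkpoint key: {key}")
    return converted


if __name__ == "__main__":
    state_dict = torch.load("exaone_checkpoint.bin", map_location="cpu")
    torch.save(convert_state_dict(state_dict), "llama_converted.bin")
```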
What does this PR do?
Add EXAONE model released by LG AI Research.
Test code and documentation are currently in progress.
Please refer to the corresponding issue: #34651
Before submitting
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker