Add EXAONE #34652
base: main
Conversation
Please review this PR. @ArthurZucker @itazap
Hey! Of course 🤗 Sorry for being late, we were on a company-wide offsite!
Hey! I should have done this a while ago 😅 Thanks a lot for contributing! The model looks nice 🚀
My main question is: what are the main differences with other models, like Llama for example, in terms of code?
The key point about adding a new model is isolating the differences! Based on that, you can easily add the model with modular now (I'll help you with that!). See the sketch below.
Apart from some layer renaming, it seems at first glance that we could probably just add a conversion script, no? 🤗
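For illustration: if the computation really does match Llama's apart from naming, a modular definition could be as small as the sketch below. The file name, class names, and inheritance here are assumptions for the discussion, not code from this PR.

```python
# Hypothetical modular_exaone.py sketch: if EXAONE's forward pass matched
# Llama's exactly, the modular system could express the model by plain
# inheritance. All names here are assumptions, not the merged design.
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaForCausalLM


class ExaoneConfig(LlamaConfig):
    # A distinct model_type keeps the checkpoint identifiable on the Hub.
    model_type = "exaone"


class ExaoneForCausalLM(LlamaForCausalLM):
    # No new compute: the architecture reuses Llama's code path entirely.
    config_class = ExaoneConfig
```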
I think there's some chance it was built purely on top of the existing transformers code? Maybe asking for the training script could be one way to check.
Hey! Thanks so much for your kind words and for reaching out! 🚀 We're thrilled to hear about your interest in EXAONE and its integration possibilities. To address your question:

1. Key differences from LLaMA

EXAONE differs from LLaMA in several foundational ways. While adding a conversion script might address token compatibility, it doesn't fully resolve the potential issues stemming from deeper differences in model architecture and training methodology.

2. Why EXAONE's modeling should remain independent

EXAONE was trained entirely from scratch and has no direct relation to LLaMA, though we've noticed a tendency for people to misunderstand this point. As EXAONE continues to evolve, its implementation may diverge even further from LLaMA's as we experiment with novel techniques and enhancements. Relying on LLaMA's modeling code could limit our flexibility to innovate and adapt.

We're excited to collaborate with you and would love to hear your thoughts or ideas on how we can move forward. Please consider this positively; we intend to contribute a lot as well. 🤗
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hey! Sorry for the late answer. Super happy to see a new model, and I think the good results mean your team did a great job at training it 🚀 In transformers, we don't change past modeling code to adapt to new checkpoints; for us, a different codepath means a new architecture! The tokenizer should simply use …
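In the meantime, the released checkpoints can already be used through the Hub's remote-code path. A usage sketch, assuming the public EXAONE 3.0 instruct repo id and that `trust_remote_code=True` is required while the architecture is not yet part of transformers:

```python
# Usage sketch, not part of this PR: load the released EXAONE checkpoint
# via the Hub's remote code path. trust_remote_code=True pulls in the
# model code hosted alongside the weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Hello, EXAONE!", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0]))
```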
Hey! Thank you for your response. We understand and respect the transformers library's philosophy of maintaining consistent architecture standards. However, as we mentioned before, we have decided to maintain separate modeling code from LLaMA's for future development needs, while we acknowledge the benefits of cross-framework compatibility. We look forward to exploring ways to integrate EXAONE's distinct architectural features with the transformers library in a standardized way, and to continuing our constructive discussions in the future. We truly appreciate your time in reviewing our model and your valuable contributions to the open-source community.
Please accept this new architecture, or whatever the issue is... I've tested the EXAONE models and they beat Qwen's comparably sized models. The 7.8B model was the only model of comparable size to achieve a perfect score on my custom benchmark, which I can provide details about if you want. It's specifically geared toward typical RAG questions, e.g. needle-in-a-haystack and chain reasoning (e.g. how many albums has XYZ artist released, where the LLM has to count from multiple contexts). I'm having trouble understanding the issue with forcing them to use Llama's architecture. Sorry, this isn't my profession. EDIT: I'm referring to the EXAONE 3.5 models.
Hey! You can already use the model by simply converting layer names! The same drama happened with the Qwen 1 model, and its authors found it more beneficial to accept the Llama architecture. I am super happy to promote LG's work, as I believe they pulled off something super nice by training this model and pushing its performance! Regarding:
This is not something that can happen in transformers, as for us 1 architecture == 1 codepath, meaning that we never go back and patch modeling code; if there is a newer version, we just add a new model! This is how we have been doing things for the past 4 years! 🤗
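To make the "convert layer names" suggestion concrete, here is a minimal sketch of such a conversion script. The EXAONE-side key patterns are assumptions for illustration (only weight keys are shown); the real checkpoint layout should be checked before relying on them.

```python
# Hedged sketch of a layer-name conversion: rename EXAONE-style state-dict
# keys to the Llama layout so the checkpoint loads as LlamaForCausalLM.
# The source-side patterns are assumptions, not the official key names.
import re

import torch

# (hypothetical EXAONE key pattern, Llama replacement)
KEY_RULES = [
    (r"^transformer\.wte\.weight$", "model.embed_tokens.weight"),
    (r"^transformer\.ln_f\.weight$", "model.norm.weight"),
    (r"^lm_head\.weight$", "lm_head.weight"),
    (r"^transformer\.h\.(\d+)\.ln_1\.weight$",
     r"model.layers.\1.input_layernorm.weight"),
    (r"^transformer\.h\.(\d+)\.ln_2\.weight$",
     r"model.layers.\1.post_attention_layernorm.weight"),
    (r"^transformer\.h\.(\d+)\.attn\.attention\.(q|k|v)_proj\.weight$",
     r"model.layers.\1.self_attn.\2_proj.weight"),
    (r"^transformer\.h\.(\d+)\.attn\.attention\.out_proj\.weight$",
     r"model.layers.\1.self_attn.o_proj.weight"),
    (r"^transformer\.h\.(\d+)\.mlp\.c_fc_0\.weight$",
     r"model.layers.\1.mlp.gate_proj.weight"),
    (r"^transformer\.h\.(\d+)\.mlp\.c_fc_1\.weight$",
     r"model.layers.\1.mlp.up_proj.weight"),
    (r"^transformer\.h\.(\d+)\.mlp\.c_proj\.weight$",
     r"model.layers.\1.mlp.down_proj.weight"),
]


def convert_state_dict(state_dict):
    """Rename every key via the first matching rule; fail loudly otherwise."""
    converted = {}
    for key, tensor in state_dict.items():
        for pattern, replacement in KEY_RULES:
            new_key, n = re.subn(pattern, replacement, key)
            if n:
                converted[new_key] = tensor
                break
        else:
            raise KeyError(f"No rename rule for checkpoint key: {key}")
    return converted


if __name__ == "__main__":
    state_dict = torch.load("exaone_checkpoint.bin", map_location="cpu")
    torch.save(convert_state_dict(state_dict), "llama_converted.bin")
```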
What does this PR do?
Add EXAONE model released by LG AI Research.
Test code and documentation are currently in progress.
Please refer to the corresponding issue: #34651
Before submitting
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker