Support Self extend for server (main is already supported) #4886
Self-extend is now supported for `main`: #4815
Paper: https://arxiv.org/pdf/2401.01325.pdf
It would be great if it were also supported for the server; any guidance or support is welcome!

Comments
yes please
I'll look into implementing it. In the meantime, do you observe positive results when using it in `main`?
@ggerganov I'm happy to announce that the problem with Dolphin Phi with long context is solved now, and it's even better with group attention. I also tried the other model and it makes a difference, so I think it works. I just don't understand how to calculate the desired context. Could you give some enlightenment here? 😄
@ggerganov @x4080 Yes, I tested with SeaLLM 7B Chat (Llama 2 architecture) and extended the context to 16k and 26k; the results are also quite good and look promising. I will test with Mistral, Phi, etc. to see how it works, and I have the same question as @x4080: how do you calculate the desired context? Could you give some enlightenment here, @ggerganov? It's a bit different from https://github.com/datamllab/LongLM
First, you set `--ctx-size` (`-c`) to the extended context that you want to achieve, for example `-c 8192`. Next, given that the original training context of the model is `T` (for example, `T = 2048`), you set the group-attention factor `--grp-attn-n` to at least `8192 / 2048 = 4`. The `--grp-attn-w` parameter corresponds to the group-attention width `W` from the paper; it must not exceed the training context `T`, and a value around `T/2` is a reasonable starting point. Additionally, `--grp-attn-w` has to be a multiple of `--grp-attn-n`.
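As a worked example of that arithmetic (the model file, prompt, and the assumed training context `T = 2048` are illustrative, not from the thread):

```bash
# Target extended context: 8192; assumed training context T = 2048.
# grp-attn-n = 8192 / 2048 = 4 (round up when the division is not exact).
# grp-attn-w = T/2 = 1024, which is a multiple of grp-attn-n as required.
./main -m models/model.gguf \
    -c 8192 \
    --grp-attn-n 4 \
    --grp-attn-w 1024 \
    -p "A very long prompt..." \
    -n 256
```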
@ggerganov Thanks for the detailed answer.
Is there any update on implementing self-extend for the server, @ggerganov?
If someone wants to give it a try, go ahead. When I get to this, I will assign myself to the issue; for now, other priorities.
There is a pull request for this issue: #4963
It's just adding the cmd-line arguments - there is no actual implementation
Yes, I see that too. Waiting for the actual implementation!
Sorry, I started to do it in the server example, following what was done for `main`, but then my kid made a big mess, I didn't get to finish, and I still haven't had a chance to get back to it. Hopefully someone will have time to do it before I do.
@ggerganov Is it OK if I do this? I will start working on it now.
I looked at your `main` implementation and it looks doable for me. Is there anything I need to look out for?
I suppose it would be a good idea to put this code behind a shared helper instead of duplicating it between `main` and `server`.
@ggerganov Well, actually I just copy-pasted it. But I will refactor it once I know I have the correct way of doing this!
I finished porting self-extend to the server.
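Assuming the server port mirrors the `main` flags described above, a launch with self-extend enabled would look roughly like this (the model path and parameter values are illustrative assumptions):

```bash
# 4x extension of an assumed 4096-token training context:
# grp-attn-n = 16384 / 4096 = 4; grp-attn-w (2048) is a multiple of 4.
./server -m models/model.gguf \
    -c 16384 \
    --grp-attn-n 4 \
    --grp-attn-w 2048
```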
Hi @Maximilian-Winter, first, thanks for your work. But I have found a problem with the KV cache: #5104 (comment)
Closing as completed. (@duykhanhbk I found the same issue, but self-extend is not the cause.)
Hi, I found that using the server with `--grp-attn-n` can in some situations stop inference prematurely. I tested the server with and without it, and without `--grp-attn-n` it works flawlessly. I then tried the same thing outside the server (cmd-line inference with group attention) and it works fine, so maybe there's a problem with the implementation in the server?
@x4080 It would really help if you added a scenario covering group-attention self-extension using the server test framework.
@phymbert Thanks for replying. I'm currently using my own fine-tuned model with my private data; what I do with the model is translate text into another language. I know it's difficult to fix things without reproducible evidence, so maybe I can find another example with a public model. I'll share it.
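Until a proper scenario lands in the server test framework, a rough A/B check along these lines could help narrow the report down (model, port, prompt, and parameter values are placeholders, not taken from the thread):

```bash
# Run A: baseline server, no group attention.
./server -m models/model.gguf -c 4096 --port 8080

# Run B: same model with self-extend enabled.
./server -m models/model.gguf -c 8192 --grp-attn-n 2 --grp-attn-w 2048 --port 8080

# Send the same long request to each run and compare where generation stops.
curl -s http://localhost:8080/completion \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Translate the following text: ...", "n_predict": 512}'
```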