Anthropic's Prompt Caching not working for HumanMessage #6705
Comments
Hey @alexander-schick! I'm Dosu, here to help you with any bugs, questions, or contributions. Let's tackle this issue together while we wait for a human maintainer.

Anthropic's prompt caching is not working for `HumanMessage`. Here is the relevant code snippet from the documentation:

```ts
import { ChatAnthropic } from "@langchain/anthropic";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";
const modelWithCaching = new ChatAnthropic({
model: "claude-3-haiku-20240307",
clientOptions: {
defaultHeaders: {
"anthropic-beta": "prompt-caching-2024-07-31",
},
},
});
const LONG_TEXT = `You are a pirate. Always respond in pirate dialect.
Use the following as context when answering questions:
${CACHED_TEXT}`;
const messages = [
new SystemMessage({
content: [
{
type: "text",
text: LONG_TEXT,
// Tell Anthropic to cache this block
cache_control: { type: "ephemeral" },
},
],
}),
new HumanMessage({
content: "What types of messages are supported in LangChain?",
}),
];
const res = await modelWithCaching.invoke(messages);
console.log("USAGE:", res.response_metadata.usage); In this example, the Additionally, the test case from the LangChain.js repository confirms that the caching mechanism is verified by checking the const res = await model.invoke(messages);
expect(
res.response_metadata.usage.cache_creation_input_tokens
).toBeGreaterThan(0);
expect(res.response_metadata.usage.cache_read_input_tokens).toBe(0);
const res2 = await model.invoke(messages);
expect(res2.response_metadata.usage.cache_creation_input_tokens).toBe(0);
expect(res2.response_metadata.usage.cache_read_input_tokens).toBeGreaterThan(
0
);
```

This confirms that the caching mechanism is functioning as expected when `cache_control` is set on a `SystemMessage` content block.
Follow-up for the human maintainer: As mentioned, the native Anthropic SDK does not have this limitation.

Thank you for reporting this! I've pushed up a fix, and it'll be merged & released soon.

@alexander-schick 0.2.17 released with a fix! Please tag me if you experience any more issues!

Awesome. Thank you very much!
Checked other resources
Example Code
Error Message and Stack Trace (if applicable)
Here is the output with the code example. No cache tokens are being used. (Moving the long text as well as cache_control to the SystemMessage works as expected.)

```
{
  input_tokens: 2350,
  cache_creation_input_tokens: 0,
  cache_read_input_tokens: 0,
  output_tokens: 45
}
```
Description
Hello everyone,
I tried using Anthropic's prompt caching in LangChain with @langchain/anthropic (0.2.16) and I believe there might be an issue with caching behavior for HumanMessage.
To reproduce the behavior, I am adding two code snippets.
This code works as expected and I can see the cached token usage. The cache_control is set in the SystemMessage.
The same code does not work if I move the cache_control to the HumanMessage: neither cache write nor cache read tokens are used, even though the first content block of the HumanMessage is static and does not change.
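For illustration, a minimal sketch of this failing variant might look like the following (this is not the original snippet; `CACHED_TEXT` stands in for the long static context, as in the documentation example above):

```ts
import { ChatAnthropic } from "@langchain/anthropic";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

const modelWithCaching = new ChatAnthropic({
  model: "claude-3-haiku-20240307",
  clientOptions: {
    defaultHeaders: {
      "anthropic-beta": "prompt-caching-2024-07-31",
    },
  },
});

// Placeholder for a long, static block of context.
declare const CACHED_TEXT: string;

const messages = [
  new SystemMessage("You are a pirate. Always respond in pirate dialect."),
  new HumanMessage({
    content: [
      {
        type: "text",
        text: `Use the following as context when answering questions:\n${CACHED_TEXT}`,
        // The static part of the user turn, marked for caching
        cache_control: { type: "ephemeral" },
      },
      {
        type: "text",
        text: "What types of messages are supported in LangChain?",
      },
    ],
  }),
];

// With @langchain/anthropic 0.2.16 this reports zero cache tokens;
// the same cache_control block on the SystemMessage does populate them.
const res = await modelWithCaching.invoke(messages);
console.log("USAGE:", res.response_metadata.usage);
```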
Doing the same with the native Anthropic SDK works, so caching does not appear to be limited to a particular message type there.
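A rough sketch of the equivalent request with the native `@anthropic-ai/sdk` (same beta header; depending on the SDK version, `cache_control` may only be typed under the prompt-caching beta surface, so a cast could be needed):

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  // Same beta header used with the LangChain client above
  defaultHeaders: { "anthropic-beta": "prompt-caching-2024-07-31" },
});

// Placeholder for the long, static block of context.
declare const CACHED_TEXT: string;

const response = await client.messages.create({
  model: "claude-3-haiku-20240307",
  max_tokens: 1024,
  system: "You are a pirate. Always respond in pirate dialect.",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: `Use the following as context when answering questions:\n${CACHED_TEXT}`,
          // Cache the static part of the user turn
          cache_control: { type: "ephemeral" },
        },
        {
          type: "text",
          text: "What types of messages are supported in LangChain?",
        },
      ],
    },
  ],
});

// cache_creation_input_tokens / cache_read_input_tokens are reported here
// when the cached block is accepted.
console.log(response.usage);
```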
Looking forward to your response!
System Info
Node version:
v20.16.0
LangChain:
"@langchain/anthropic": "^0.2.16",
"langchain": "^0.2.18",