Anthropic's Prompt Caching not working for HumanMessage #6705
Comments
Hey @alexander-schick! I'm Dosu, here to help you with any bugs, questions, or contributions. Let's tackle this issue together while we wait for a human maintainer.

Anthropic's prompt caching is not working for `HumanMessage`. Here is the relevant code snippet from the documentation:

```ts
import { ChatAnthropic } from "@langchain/anthropic";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";
const modelWithCaching = new ChatAnthropic({
model: "claude-3-haiku-20240307",
clientOptions: {
defaultHeaders: {
"anthropic-beta": "prompt-caching-2024-07-31",
},
},
});
const LONG_TEXT = `You are a pirate. Always respond in pirate dialect.
Use the following as context when answering questions:
${CACHED_TEXT}`;
const messages = [
new SystemMessage({
content: [
{
type: "text",
text: LONG_TEXT,
// Tell Anthropic to cache this block
cache_control: { type: "ephemeral" },
},
],
}),
new HumanMessage({
content: "What types of messages are supported in LangChain?",
}),
];
const res = await modelWithCaching.invoke(messages);
console.log("USAGE:", res.response_metadata.usage); In this example, the Additionally, the test case from the LangChain.js repository confirms that the caching mechanism is verified by checking the const res = await model.invoke(messages);
expect(
res.response_metadata.usage.cache_creation_input_tokens
).toBeGreaterThan(0);
expect(res.response_metadata.usage.cache_read_input_tokens).toBe(0);
const res2 = await model.invoke(messages);
expect(res2.response_metadata.usage.cache_creation_input_tokens).toBe(0);
expect(res2.response_metadata.usage.cache_read_input_tokens).toBeGreaterThan(
0
);
```

This confirms that the caching mechanism is functioning as expected when `cache_control` is set on a `SystemMessage` content block.
Follow-up for the human maintainer: As mentioned, the native Anthropic SDK does not have this limitation.

Thank you for reporting this! I've pushed up a fix, and it'll be merged & released soon.

@alexander-schick 0.2.17 released with a fix! Please tag me if you experience any more issues!

Awesome. Thank you very much!
Checked other resources
Example Code
Error Message and Stack Trace (if applicable)
Here is the output with the code example. No cache tokens are being used. (Moving the long text as well as cache_control to the SystemMessage works as expected.)

```
{
  input_tokens: 2350,
  cache_creation_input_tokens: 0,
  cache_read_input_tokens: 0,
  output_tokens: 45
}
```
Description
Hello everyone,
I tried using Anthropic's prompt caching in LangChain with @langchain/anthropic (0.2.16) and I believe there might be an issue with caching behavior for HumanMessage.
To reproduce the behavior, I am adding two code snippets.
This code works as expected and I can see the cached token usage. The cache_control is set in the SystemMessage.
The same code does not work if I move the cache_control to the HumanMessage: neither cache write nor cache read tokens are used, even though the first content block of the HumanMessage is static and does not change.
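For illustration, a minimal sketch of this failing variant might look like the following (this is not the original snippet; `CACHED_TEXT` stands in for the long static context, as in the documentation example above):

```ts
import { ChatAnthropic } from "@langchain/anthropic";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

const modelWithCaching = new ChatAnthropic({
  model: "claude-3-haiku-20240307",
  clientOptions: {
    defaultHeaders: {
      "anthropic-beta": "prompt-caching-2024-07-31",
    },
  },
});

// Placeholder for a long, static block of context.
declare const CACHED_TEXT: string;

const messages = [
  new SystemMessage("You are a pirate. Always respond in pirate dialect."),
  new HumanMessage({
    content: [
      {
        type: "text",
        text: `Use the following as context when answering questions:\n${CACHED_TEXT}`,
        // The static part of the user turn, marked for caching
        cache_control: { type: "ephemeral" },
      },
      {
        type: "text",
        text: "What types of messages are supported in LangChain?",
      },
    ],
  }),
];

// With @langchain/anthropic 0.2.16 this reports zero cache tokens;
// the same cache_control block on the SystemMessage does populate them.
const res = await modelWithCaching.invoke(messages);
console.log("USAGE:", res.response_metadata.usage);
```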
Doing the same with the native Anthropic SDK works, so caching does not appear to be limited to a particular message type there.
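A rough sketch of the equivalent request with the native `@anthropic-ai/sdk` (same beta header; depending on the SDK version, `cache_control` may only be typed under the prompt-caching beta surface, so a cast could be needed):

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  // Same beta header used with the LangChain client above
  defaultHeaders: { "anthropic-beta": "prompt-caching-2024-07-31" },
});

// Placeholder for the long, static block of context.
declare const CACHED_TEXT: string;

const response = await client.messages.create({
  model: "claude-3-haiku-20240307",
  max_tokens: 1024,
  system: "You are a pirate. Always respond in pirate dialect.",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: `Use the following as context when answering questions:\n${CACHED_TEXT}`,
          // Cache the static part of the user turn
          cache_control: { type: "ephemeral" },
        },
        {
          type: "text",
          text: "What types of messages are supported in LangChain?",
        },
      ],
    },
  ],
});

// cache_creation_input_tokens / cache_read_input_tokens are reported here
// when the cached block is accepted.
console.log(response.usage);
```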
Looking forward to your response!
System Info
Node version:
v20.16.0
LangChain:
"@langchain/anthropic": "^0.2.16",
"langchain": "^0.2.18",