Add image generation functionality #407

andrewfrench · 2023-11-02T02:14:21Z

This PR adds image generation functionality, including driver support for the following providers/models:

OpenAI DALLE2
Leonardo
Stable Diffusion via Amazon Bedrock

along with unit tests for the above drivers. A new tool, ImageGenerator, accepts an ImageGenerationEngine configured to use the desired driver:

import boto3

from griptape.drivers import AmazonBedrockStableDiffusionImageGenerationDriver
from griptape.engines import ImageGenerationEngine
from griptape.tools import ImageGenerator
from griptape.structures import Agent

...

driver = AmazonBedrockStableDiffusionImageGenerationDriver(
  session=boto3.Session(),
  style_preset="cinematic",
  sampler="K_EULER",
  # ..., etc.
)

image_generator = ImageGenerator(
  image_generation_engine=ImageGenerationEngine(
    image_generation_driver=driver,
  )
)

Agent(tools=[image_generator])
...

Resolves #332

vasinov · 2023-11-02T19:19:09Z

griptape/artifacts/image_artifact.py

+
+@define(frozen=True)
+class ImageArtifact(BlobArtifact):
+    mime_type: str = field(default="image/png", kw_only=True)


Not sure this should be the default. Can we make it more generic? Seems like we might have options.

Agree that a default doesn't quite make sense here. Updated to require the mime_type value.

vasinov · 2023-11-02T19:21:14Z

griptape/drivers/image_generation/base_image_generation_driver.py

+from griptape.artifacts import ImageArtifact
+
+
+class BaseImageGenerationDriver:


Add (ABC).
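
A short sketch of what inheriting from ABC buys here: the base driver can no longer be instantiated directly, and subclasses must implement the abstract method. The method signature and the FakeImageGenerationDriver class are simplified assumptions for illustration, not the code in this PR.

```python
from abc import ABC, abstractmethod


class BaseImageGenerationDriver(ABC):
    @abstractmethod
    def generate_image(self, prompts: list[str]) -> bytes:
        """Return encoded image bytes for the given prompts."""


class FakeImageGenerationDriver(BaseImageGenerationDriver):
    # Hypothetical concrete driver used only for this illustration.
    def generate_image(self, prompts: list[str]) -> bytes:
        return ", ".join(prompts).encode()


driver = FakeImageGenerationDriver()
```

Calling BaseImageGenerationDriver() directly raises a TypeError, which surfaces a missing implementation early.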

vasinov · 2023-11-02T19:22:53Z

griptape/drivers/image_generation/dalle2_image_generation_driver.py

+
+
+@define
+class Dalle2ImageGenerationDriver(BaseImageGenerationDriver):


Let's make it a generic DalleImageGenerationDriver and manage version at the class property level.

Should we prepend with OpenAi to match other driver naming schemas?

vasinov · 2023-11-02T19:23:11Z

griptape/drivers/image_generation/dalle2_image_generation_driver.py

+import requests
+from attr import field, Factory, define
+from griptape.artifacts import ImageArtifact
+from griptape.drivers.image_generation.base_image_generation_driver import (


Drop base_image_generation_driver.

vasinov · 2023-11-02T19:27:46Z

griptape/engines/image_generation/prompt_image_generation_engine.py

+
+
+@define
+class PromptImageGenerationEngine(BaseImageGenerationEngine):


Should we make it a generic ImageGenerationEngine and add multiple methods for text-to-image and image-to-image modes? Or should we have different engines that encapsulate text-to-image (i.e., PromptImageGenerationEngine) and image-to-image?

I've gone back and forth on it. This implementation reflects the assumption that the interface and mechanics for text-to-image and image-to-image will be substantially different, or that providers (and their drivers) might only provide a subset of that functionality. We could overcome each of those obstacles; I just chose a path here.

vasinov · 2023-11-02T19:29:24Z

griptape/tasks/image_generation_task.py

+    image_generation_driver: BaseImageGenerationDriver = field(
+        default=Factory(lambda: Dalle2ImageGenerationDriver()), kw_only=True
+    )


Should we remove this to simplify the interface? The user can always pass a custom driver into image_generation_engine.

vasinov · 2023-11-02T19:30:32Z

griptape/templates/engines/image_generation/prompt_image_generation.j2

+Description: """{{ description }}"""
+
+Generate an image based on the description provided.


Let's sync up with @shhlife, @Amaru-Zeas, or @averyroche about what the best base prompt is.

+1. I think we should leave the flowery specifics to the description field, but I can imagine reinforcing instructions like 'stick to the description' or optionally 'be creative', etc.

vasinov · 2023-11-02T19:30:56Z

griptape/tools/image_generator/tool.py

+    image_generation_driver: BaseImageGenerationDriver = field(
+        default=Factory(lambda: Dalle2ImageGenerationDriver()), kw_only=True
+    )


Same comment as in the task: could probably remove this.

collindutter

Great work, this is going to be awesome!

griptape/artifacts/image_artifact.py

griptape/drivers/image_generation/amazon_bedrock_stable_diffusion_image_generation_driver.py

griptape/drivers/image_generation/base_image_generation_driver.py

griptape/drivers/image_generation/openai_dalle_image_generation_driver.py

griptape/schemas/artifacts/image_artifact_schema.py

griptape/tools/image_generator/tool.py

griptape/tasks/image_generation_task.py

griptape/engines/image_generation/image_generation_engine.py

griptape/drivers/image_generation/openai_dalle_image_generation_driver.py

griptape/tasks/image_generation_task.py

andrewfrench · 2023-11-14T17:51:08Z

griptape/drivers/image_generation/openai_dalle_image_generation_driver.py

+    image_size: Union[
+        Literal["256x256"], Literal["512x512"], Literal["1024x1024"], Literal["1024x1792"], Literal["1792x1024"]
+    ] = field(default=Literal["512x512"], kw_only=True)


Not sure how I feel about this -- the patterns we surface to users seem to be heavily inspired by the underlying SDKs being interfaced with. In this case (and in OpenAI Chat Prompt Driver) we expose the same Literal values the OpenAI SDK expects, but we don't see this elsewhere in the framework. Should we define a more concrete Griptape style and transform inputs to what's expected by the dependencies? Are the utilities provided by typing the way we want to go?

I think it's somewhat unavoidable to expose SDK patterns in Drivers since their primary purpose is to sit right on top of the SDK/API. Even in the case of image_size, a seemingly universal field, it seems non-straightforward to implement it at the BaseImageGenerationDriver level. I think providing helpful type hints is a good enough solution.

It's certainly less maintenance to do it this way. If we define and enforce a standard Griptape interface style, the intermediate layer would be a constant source of frustration as we'd have to re-define (or somehow handle) all possible options to some degree. On the other hand, we require the user to do some research and have knowledge of the underlying dependency (OpenAI SDK, Leonardo API) before use. Not at all unreasonable, but a bit annoying.

collindutter · 2023-11-14T21:08:21Z

griptape/drivers/image_generation/base_image_generation_driver.py

@@ -10,5 +10,5 @@ class BaseImageGenerationDriver(ABC):
    model: str = field(kw_only=True)

    @abstractmethod
-    def generate_image(self, prompts: list[str], negative_prompts: list[str], **kwargs) -> ImageArtifact:
+    def generate_image(self, prompts: list[str], negative_prompts: list[str] = list, **kwargs) -> ImageArtifact:


Python gotcha https://docs.python-guide.org/writing/gotchas/
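
The comment presumably refers to default-argument pitfalls. A small sketch of two of them, with hypothetical function names: `= list` makes the default the `list` class itself rather than an empty list, and `= []` shares one list object across calls.

```python
def bad_type_default(prompts, negative_prompts=list):
    # The default here is the `list` class itself, not an empty list.
    return negative_prompts


def bad_mutable_default(prompt, seen=[]):
    # The same list object is reused on every call that omits `seen`.
    seen.append(prompt)
    return seen


def good_default(prompt, seen=None):
    # A fresh list is created per call when the caller passes nothing.
    seen = [] if seen is None else seen
    seen.append(prompt)
    return seen
```

Using None as the sentinel and creating the list inside the function body avoids both problems.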

collindutter · 2023-11-14T22:34:55Z

griptape/drivers/image_generation/base_image_generation_driver.py

@@ -10,5 +10,5 @@ class BaseImageGenerationDriver(ABC):
    model: str = field(kw_only=True)

    @abstractmethod
-    def generate_image(self, prompts: list[str], negative_prompts: list[str] = list, **kwargs) -> ImageArtifact:
+    def generate_image(self, prompts: list[str], negative_prompts: list[str] = None, **kwargs) -> ImageArtifact:


list[str] should be Optional[list[str]]

collindutter · 2023-11-14T22:35:04Z

griptape/drivers/image_generation/leonardo_image_generation_driver.py

@@ -31,7 +31,10 @@ class LeonardoImageGenerationDriver(BaseImageGenerationDriver):
    image_width: int = field(default=512, kw_only=True)
    image_height: int = field(default=512, kw_only=True)

-    def generate_image(self, prompts: list[str], negative_prompts: list[str], **kwargs) -> ImageArtifact:
+    def generate_image(self, prompts: list[str], negative_prompts: list[str] = None, **kwargs) -> ImageArtifact:


Optional[list[str]]

collindutter · 2023-11-14T22:35:14Z

griptape/engines/image_generation/image_generation_engine.py

-        negative_prompts: list[str] = list,
-        rulesets: list[Ruleset] = list,
-        negative_rulesets: list[Ruleset] = list,
+        negative_prompts: list[str] = None,


collindutter · 2023-11-14T22:36:14Z

griptape/engines/image_generation/image_generation_engine.py

        **kwargs
    ):
+        if not negative_prompts:
+            negative_prompts = []
+
        for ruleset in rulesets:


Need to initialize rulesets and negative_rulesets to empty lists if None
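
A sketch of what that normalization could look like for all three optional arguments. The build_prompts function is hypothetical, and each "ruleset" is modeled as a plain list of rule strings, which is an assumption and not how Griptape's Ruleset class is defined.

```python
from typing import Optional


def build_prompts(
    prompts: list[str],
    negative_prompts: Optional[list[str]] = None,
    rulesets: Optional[list[list[str]]] = None,
    negative_rulesets: Optional[list[list[str]]] = None,
) -> tuple[list[str], list[str]]:
    # Copy inputs so the caller's lists are never mutated.
    prompts = list(prompts)
    negative_prompts = list(negative_prompts or [])

    # Treat a missing ruleset argument as an empty list.
    for ruleset in rulesets or []:
        prompts.extend(ruleset)
    for ruleset in negative_rulesets or []:
        negative_prompts.extend(ruleset)

    return prompts, negative_prompts
```

Without the `or []` fallbacks, iterating over a None argument would raise a TypeError on the first loop.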

collindutter · 2023-11-15T17:41:51Z

griptape/engines/image_generation/image_generation_engine.py

+        negative_prompts: Optional[list[str]] = None,
+        rulesets: Optional[list[Ruleset]] = None,
+        negative_rulesets: Optional[list[Ruleset]] = None,
+        **kwargs


Do we need kwargs? As far as I can tell it's not being used anywhere.

That was needed when additional parameters came through the method call and not the driver setup. Removed.

collindutter · 2023-11-15T17:42:55Z

griptape/drivers/image_generation/leonardo_image_generation_driver.py

+    https://docs.leonardo.ai/reference/creategeneration
+    """
+
+    api_key: str = field(default=Factory(lambda: os.environ.get("LEONARDO_API_KEY")), kw_only=True)


Looks like this change didn't get implemented

collindutter · 2023-11-15T17:47:08Z

griptape/drivers/image_generation/leonardo_image_generation_driver.py

+    def _get_image_url(self, generation_id: str):
+        for attempt in range(self.max_attempts):
+            response = self.requests_session.get(
+                url=f"{self.api_base}/generations/{generation_id}", headers={"authorization": f"Bearer {self.api_key}"}


Capitalize authorization
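
HTTP header names are case-insensitive, so this is a consistency fix rather than a behavioral one. A sketch of the polling loop with the capitalized header, plus the exponential backoff mentioned in a later commit; the helper names, the injected fetch callable, and the delay schedule are assumptions for illustration and not the driver's actual code.

```python
import time
from typing import Callable, Optional


def bearer_headers(api_key: str) -> dict[str, str]:
    # Canonical capitalization, used consistently across drivers.
    return {"Authorization": f"Bearer {api_key}"}


def poll_for_image_url(
    fetch: Callable[[dict[str, str]], Optional[str]],
    api_key: str,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    headers = bearer_headers(api_key)
    for attempt in range(max_attempts):
        # `fetch` returns the image URL once ready, otherwise None.
        url = fetch(headers)
        if url is not None:
            return url
        # Wait 1x, 2x, 4x, ... the base delay between attempts.
        sleep(base_delay * 2**attempt)
    raise TimeoutError("image was not ready after max_attempts polls")
```

Passing fetch and sleep as parameters keeps the loop testable without network access or real waiting.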

collindutter · 2023-11-15T17:48:13Z

griptape/drivers/image_generation/openai_dalle_image_generation_driver.py

+    ) = field(default="1024x1024", kw_only=True)
+    response_format: Literal["b64_json"] = field(default="b64_json", kw_only=True)
+
+    def generate_image(self, prompts: list[str], **kwargs) -> ImageArtifact:


Missing negative_prompts?

OpenAI's API doesn't support negative prompts. The prompt is rewritten, so the specifics of any 'do not' instructions have an opportunity to be warped by rewriting or summarization. From my experimentation, negative prompts in a unified prompt seem to be counterproductive: for example, adding 'negative prompts: text, clouds' results in more text and clouds in the image than leaving the negative prompts out entirely.

I think the method signature should still include it though, right? Maybe we throw an error if it's provided.

kwargs is masking the fact that the parameters do not line up properly.

I realize now that you might just mean in the signature (i.e. instead of **kwargs). Updated to make that change. If I'm mistaken, let me know.
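
One way the suggestion in this thread could look: keep negative_prompts in the signature so every driver lines up with the base class, and fail loudly if a caller supplies it. OpenAiDalleDriverSketch is a hypothetical stand-in that returns the combined prompt instead of calling the OpenAI SDK, and raising ValueError is only one of the options floated above.

```python
from typing import Optional


class OpenAiDalleDriverSketch:
    # Hypothetical stand-in; the real driver calls the OpenAI SDK.
    def generate_image(
        self, prompts: list[str], negative_prompts: Optional[list[str]] = None
    ) -> str:
        if negative_prompts:
            raise ValueError("negative prompts are not supported by this driver")
        # Return the combined prompt that would be sent to the provider.
        return ", ".join(prompts)


combined = OpenAiDalleDriverSketch().generate_image(["a cat", "watercolor"])
```

An explicit parameter makes the mismatch visible in the signature, where **kwargs would have hidden it.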

collindutter · 2023-11-15T17:48:45Z

griptape/drivers/image_generation/openai_dalle_image_generation_driver.py

+            prompt=prompt,
+        )
+
+    @staticmethod


Should this be a static method? I don't think we really use @staticmethod anywhere else.

Interesting, it is a static method, but if we don't use that internally I'll remove it.

vasinov

Great work!

vasinov reviewed Nov 2, 2023

andrewfrench changed the base branch from main to dev November 8, 2023 17:40

andrewfrench force-pushed the french/image-generation branch 2 times, most recently from 04fcba4 to ff54a45 November 9, 2023 05:37

andrewfrench marked this pull request as ready for review November 9, 2023 05:37

andrewfrench force-pushed the french/image-generation branch from 6538b39 to b17d023 November 9, 2023 16:15

andrewfrench requested review from vasinov and collindutter November 9, 2023 21:44

collindutter requested changes Nov 9, 2023

griptape/tasks/image_generation_task.py (resolved)

griptape/engines/image_generation/image_generation_engine.py (outdated, resolved)

collindutter requested changes Nov 14, 2023

griptape/drivers/image_generation/openai_dalle_image_generation_driver.py (outdated, resolved)

griptape/tasks/image_generation_task.py (outdated, resolved)

andrewfrench commented Nov 14, 2023

collindutter requested changes Nov 14, 2023

griptape/drivers/image_generation/openai_dalle_image_generation_driver.py (outdated, resolved)

collindutter requested changes Nov 14, 2023

collindutter requested changes Nov 15, 2023

collindutter approved these changes Nov 15, 2023

vasinov approved these changes Nov 16, 2023

andrewfrench added 12 commits November 16, 2023 12:15

Add image generation functionality

46acd05

Save docstrings

d53b18e

Improve driver docstrings

2fe7984

Add image artifact tests

d25ee56

Updates for typing errors

29182eb

Updates for typing

3fd7495

Update image generation task

696cd1f

Add Leonardo driver test

3226622

Add Amazon Bedrock Stable Diffusion unit tests

7368542

Add OpenAI Dalle2 unit tests

b5bfd0d

Add unit tests for ImageGenerator tool

cfb3292

Review comments squashed after merge

5fe5952

andrewfrench added 16 commits November 16, 2023 12:15

Update to surface OpenAI use of literal types

5ee465b

Fix issues with Literals and Unions -> |, tests

8fc5e7a

Update image generation task to support negative rules, rulesets

c2c6eb7

Add task tests

37ca3fc

Fix list initialization

0bdeebd

Negative prompt arguments are Optional

8184c38

Save image artifact locally

c0458cf

Fix args

7218604

Use default image size that works for dall-e-2 and dall-e-3

e295b14

Fix test

5013189

Remove env var api key default

72c227a

Remove kwargs from image gen engine

7e21954

Capitalize all bearer auth headers

2c730ec

Remove static method decoratorss

11ea134

Add exponential backoff retries to image generation

3bdd618

Further remove **kwargs

0907d69

collindutter force-pushed the french/image-generation branch from 3b119cf to 0907d69 November 16, 2023 20:15

collindutter merged commit 828b3f5 into dev Nov 16, 2023

andrewfrench deleted the french/image-generation branch November 16, 2023 21:25
