Commit

Add uv/ruff/pyright/typos (#36)
* chore(deps): add uv

* chore(deps): add mise

* chore: apply ruff format/fix to entire repo

* chore: add typos

* chore(ci): add github actions for checks

* docs: update README with local dev instructions

* chore(mise): don't commit mise

* chore: fix pyright errors

* fix(csv): add back logs
collindutter authored Feb 17, 2025
1 parent 72c0e4d commit 4a11a09
Showing 44 changed files with 3,869 additions and 453 deletions.
23 changes: 23 additions & 0 deletions .github/actions/init-environment/action.yml
@@ -0,0 +1,23 @@
name: "Init Environment"
description: "Initialize environment"
runs:
using: "composite"
steps:
- name: Checkout actions
uses: actions/checkout@v4

- id: setup-python
name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Install uv
uses: astral-sh/setup-uv@v5
with:
enable-cache: true


- name: Install dependencies
run: uv sync --all-extras --all-groups
shell: bash
67 changes: 67 additions & 0 deletions .github/workflows/code-checks.yml
@@ -0,0 +1,67 @@
name: Code Checks

on:
push:
branches: [ "main" ]
pull_request:
branches: [ "main" ]
merge_group:
types: [checks_requested]

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true

jobs:
format:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ["3.9"]
steps:
- name: Checkout actions
uses: actions/checkout@v4
- name: Init environment
uses: ./.github/actions/init-environment
- name: Run formatter
run: uv run ruff format --check
type-check:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: [ "3.9" ]
steps:
- name: Checkout actions
uses: actions/checkout@v4
- name: Init environment
uses: ./.github/actions/init-environment
- name: Run type checker
run: uv run pyright
lint:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: [ "3.9" ]
steps:
- name: Checkout actions
uses: actions/checkout@v4
- name: Init environment
uses: ./.github/actions/init-environment
- name: Run linter
run: uv run ruff check
spell-check:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: [ "3.9" ]
steps:
- name: Checkout actions
uses: actions/checkout@v4
- name: Init environment
uses: ./.github/actions/init-environment
- name: Run spell checker
run: uv run typos
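Each of the four jobs above runs a single command through `uv run`. For a quick local pass before pushing, the same commands can be chained from a small script. This is a minimal sketch, not part of the commit: `CHECKS`, `run_checks`, and `failed` are hypothetical names, and `--check` is added to the formatter command so that it reports unformatted files instead of rewriting them.

```python
import subprocess
from typing import Callable, Dict, List

# Commands mirror the jobs in code-checks.yml. `--check` on the formatter is
# an addition in this sketch so the run reports problems without editing files.
CHECKS: Dict[str, List[str]] = {
    "format": ["uv", "run", "ruff", "format", "--check"],
    "type-check": ["uv", "run", "pyright"],
    "lint": ["uv", "run", "ruff", "check"],
    "spell-check": ["uv", "run", "typos"],
}


def _run(cmd: List[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def run_checks(runner: Callable[[List[str]], int] = _run) -> Dict[str, int]:
    """Run every check, even if an earlier one fails, and collect exit codes."""
    return {name: runner(cmd) for name, cmd in CHECKS.items()}


def failed(results: Dict[str, int]) -> List[str]:
    """Return the names of the checks that exited non-zero."""
    return [name for name, code in results.items() if code != 0]
```

Calling `failed(run_checks())` from the repository root returns the names of the checks that need attention. The `runner` parameter exists only so the helper can be exercised without `uv` installed.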
4 changes: 3 additions & 1 deletion .gitignore
@@ -5,4 +5,6 @@

downloaded
output
input
input

.mise.local.toml
51 changes: 51 additions & 0 deletions README.md
@@ -27,3 +27,54 @@ Once logged in, you can connect your GitHub account and [create a structure from
## Running Samples

Each Sample's README has more details on how to call and run the Sample. If you wish to run the Sample via the Griptape Framework, take a look at [Structure Run Drivers](https://docs.griptape.ai/stable/griptape-framework/drivers/structure-run-drivers/).

## Local Dev

### uv

This project uses [uv](https://docs.astral.sh/uv/) to manage project dependencies.
If you're familiar with using [poetry](https://python-poetry.org/), `uv` is a more modern alternative.
After you've [installed uv](https://docs.astral.sh/uv/getting-started/installation/), you can install the project dependencies by running:

```bash
uv sync --all-extras --all-groups
```

While each sample defines its own dependencies in a directory-local `requirements.txt`, this command will install some common dependencies that are used across all samples for an easier local development experience.

### ruff

This project uses [ruff](https://docs.astral.sh/ruff/) for linting and formatting.

You can run the `ruff` formatter on the project by running:

```bash
uv run ruff format
```

You can run the `ruff` linter on the project by running:

```bash
uv run ruff check --fix
```

### pyright

This project uses [pyright](https://github.com/microsoft/pyright) for static type checking.

You can run `pyright` on the project by running:

```bash
uv run pyright
```


### typos

This project uses [typos](https://github.com/crate-ci/typos) for spell checking.

You can run `typos` on the project by running:

```bash
uv run typos
```
Empty file.
3 changes: 2 additions & 1 deletion griptape_aws_bill_pdf_to_csv/download.py
@@ -1,5 +1,6 @@
import argparse
import os

from dotenv import load_dotenv
from griptape.drivers import GriptapeCloudFileManagerDriver

@@ -45,4 +46,4 @@
)

with open(file=local_file_path, mode="wb") as file:
file.write(gtc_file_manager_driver.try_load_file(file_name))
file.write(gtc_file_manager_driver.try_load_file(file_name))
120 changes: 63 additions & 57 deletions griptape_aws_bill_pdf_to_csv/structure.py
@@ -3,20 +3,26 @@
import argparse
import csv
import json
import logging
import os
import pypdf
from io import BytesIO
from pathlib import Path

import pypdf
from attrs import define
from dotenv import load_dotenv
from io import BytesIO

from griptape.artifacts import TextArtifact, ListArtifact
from griptape.drivers import GriptapeCloudEventListenerDriver, GriptapeCloudFileManagerDriver
from griptape.events import EventListener, EventBus, FinishStructureRunEvent
from griptape.loaders import BaseFileLoader
from griptape.artifacts import ListArtifact, TextArtifact
from griptape.drivers import (
GriptapeCloudEventListenerDriver,
GriptapeCloudFileManagerDriver,
)
from griptape.events import EventBus, EventListener, FinishStructureRunEvent
from griptape.loaders import PdfLoader
from griptape.rules import Rule, Ruleset
from griptape.structures import Agent

logger = logging.getLogger(__name__)


def is_running_in_managed_environment() -> bool:
return "GT_CLOUD_STRUCTURE_RUN_ID" in os.environ
@@ -29,15 +35,8 @@ def get_gtc_base_url() -> str:
def get_gtc_api_key() -> str:
api_key = os.environ.get("GT_CLOUD_API_KEY", "")
if is_running_in_managed_environment() and not api_key:
print(
"""
****WARNING****: No value was found for the 'GT_CLOUD_API_KEY' environment variable.
This environment variable is required when running in Griptape Cloud for authorization.
You can generate a Griptape Cloud API Key by visiting https://cloud.griptape.ai/keys .
Specify it as an environment variable when creating a Managed Structure in Griptape Cloud.
"""
)
raise ValueError("No value was found for the 'GT_CLOUD_API_KEY' environment variable.")
msg = "No value was found for the 'GT_CLOUD_API_KEY' environment variable."
raise ValueError(msg)
return api_key
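
The change above drops the printed banner and raises a `ValueError` whose message is bound to a variable first. The same fail-fast rule can be exercised without Griptape Cloud by passing the environment in explicitly. A minimal sketch, assuming a hypothetical `require_api_key` helper that takes a mapping instead of reading `os.environ` inside the function body:

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key, raising only when running in the managed environment.

    Hypothetical helper: it applies the same rule as get_gtc_api_key in the
    diff, but the environment is a parameter so the behavior is easy to test.
    """
    api_key = env.get("GT_CLOUD_API_KEY", "")
    managed = "GT_CLOUD_STRUCTURE_RUN_ID" in env
    if managed and not api_key:
        msg = "No value was found for the 'GT_CLOUD_API_KEY' environment variable."
        raise ValueError(msg)
    return api_key
```

Outside the managed environment a missing key yields an empty string, which matches the behavior of the function in the diff.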


@@ -52,13 +51,11 @@ def get_gtc_api_key() -> str:
],
)

agent = Agent(
conversation_memory=None,
rulesets=[csv_rules]
)
agent = Agent(conversation_memory=None, rulesets=[csv_rules])


@define
class AWSBillPdfLoader(BaseFileLoader[TextArtifact]):
class AWSBillPdfLoader(PdfLoader):
region = "UNKNOWN"
service = "UNKNOWN"
type = "UNKNOWN"
@@ -73,61 +70,68 @@ def parse(
extracted_text = page.extract_text(extraction_mode="layout")
artifacts.extend(self._text_to_artifacts(extracted_text))
return ListArtifact(artifacts)
def _text_to_artifacts(self, text: str) -> list[TextArtifact]:

def _text_to_artifacts(self, text: str) -> list[TextArtifact]: # noqa: C901, PLR0912
artifacts = []

chunks = text.splitlines()

for chunk in chunks:
lstrip_chunk = chunk.lstrip()
spaces = len(chunk)-len(lstrip_chunk)
if 'USD' in lstrip_chunk:
if 11 <= spaces <= 14:
striped_value = lstrip_chunk.split('USD')[0].strip()
spaces = len(chunk) - len(lstrip_chunk)
if "USD" in lstrip_chunk:
if 11 <= spaces <= 14: # noqa: PLR2004
striped_value = lstrip_chunk.split("USD")[0].strip()
# Special case for CodeBuild USW2-Build-Min:ARM:g1.small like types
if striped_value.startswith("AWS") or striped_value.startswith("Amazon") or striped_value.startswith("CodeBuild "):
if striped_value.startswith(("AWS", "Amazon", "CodeBuild ")):
self.type = striped_value
else:
response = agent.run(f'Only return a single word, GEOGRAPHIC or OTHER. "Any" must be classified as GEOGRAPHIC. Phrases that are related to geography or locations on Earth must be classified at GEOGRAPHIC. Classify the following phrase after removing superfluous whitespace from it: "{striped_value}"')
response = agent.run(
'Only return a single word, GEOGRAPHIC or OTHER. "Any" must be classified as GEOGRAPHIC. '
"Phrases that are related to geography or locations on Earth must be classified as GEOGRAPHIC. "  # noqa: E501
# The last fragment needs the f prefix so that {striped_value} is interpolated.
f'Classify the following phrase after removing superfluous whitespace from it: "{striped_value}"'  # noqa: E501
)
if "GEOGRAPHIC" in response.output.value:
self.region = striped_value
elif "OTHER" in response.output.value:
self.service = striped_value
else:
print(f"Invalid classification: {response.output.value} for: {striped_value}")
elif 16 <= spaces <= 18:
if '(USD' in lstrip_chunk:
usd_split = lstrip_chunk.rsplit('(USD', 1)
elif 'USD' in lstrip_chunk:
usd_split = lstrip_chunk.rsplit('USD', 1)
logger.warning("Unrecognized type: %s", striped_value)
continue
elif 16 <= spaces <= 18: # noqa: PLR2004
if "(USD" in lstrip_chunk:
usd_split = lstrip_chunk.rsplit("(USD", 1)
elif "USD" in lstrip_chunk:
usd_split = lstrip_chunk.rsplit("USD", 1)
else:
print(f"Invalid cost: {cost}")
logger.warning("Unrecognized USD format: %s", lstrip_chunk)
continue

cost = usd_split[1]
try:
float(cost)
except ValueError:
print(f"Nonnumerical cost: {cost}")
logger.warning("Unrecognized cost: %s", cost)
continue

usage_split = usd_split[0].rsplit(' ')
usage_split = usd_split[0].rsplit(" ")
usage = list(filter(None, usage_split))[-1].strip()
quantity_and_unit = usage.split(" ", 1)
quantity = quantity_and_unit[0]
try:
unit = quantity_and_unit[1]
except IndexError:
unit = ""

description = list(filter(None, usage_split))[0].strip()
result = f'["{self.region}", "{self.service}", "{self.type}", "{quantity}", "{unit}", "{cost}", "{description}"]'
formatted_value = agent.run(f'Reformat, remove whitespace that is inside of words, and return the following: {result}').output.value

description = next(filter(None, usage_split)).strip()
result = f'["{self.region}", "{self.service}", "{self.type}", "{quantity}", "{unit}", "{cost}", "{description}"]' # noqa: E501
formatted_value = agent.run(
f"Reformat, remove whitespace that is inside of words, and return the following: {result}"
).output.value

artifacts.append(TextArtifact(formatted_value))
else:
print(f"Unsupported {spaces}: {repr(lstrip_chunk)}")
logger.warning("Unrecognized spaces: %s", spaces)
continue

return artifacts
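
The parser above relies on layout-mode extraction preserving indentation: lines containing `USD` with 11 to 14 leading spaces are treated as type, region, or service headers, and lines with 16 to 18 leading spaces as usage lines ending in a cost. A reduced sketch of that idea, without the agent calls, is given here. `classify_line` and `split_cost` are hypothetical names, and the parenthesized `(USD` form handled in the diff is left out for brevity.

```python
from typing import Optional, Tuple

HEADER_INDENT = range(11, 15)     # 11..14 leading spaces: type/region/service
LINE_ITEM_INDENT = range(16, 19)  # 16..18 leading spaces: usage line with a cost


def classify_line(chunk: str) -> str:
    """Label one line of extracted text by its leading-whitespace count."""
    stripped = chunk.lstrip()
    spaces = len(chunk) - len(stripped)
    if "USD" not in stripped:
        return "skip"
    if spaces in HEADER_INDENT:
        return "header"
    if spaces in LINE_ITEM_INDENT:
        return "line_item"
    return "skip"


def split_cost(chunk: str) -> Optional[Tuple[str, float]]:
    """Split a usage line into (text before the last 'USD', cost as a float).

    Returns None when the text after 'USD' is not a plain number.
    """
    before, _, after = chunk.strip().rpartition("USD")
    try:
        return before.strip(), float(after)
    except ValueError:
        return None
```

Keeping the indentation bands in named constants also removes the need for the `PLR2004` magic-number suppressions, at the cost of two extra module-level names.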
@@ -185,35 +189,37 @@ def _text_to_artifacts(self, text: str) -> list[TextArtifact]:
api_key=get_gtc_api_key(),
base_url=get_gtc_base_url(),
bucket_id=bucket_id,
workdir=workdir
workdir=workdir,
)

loader = AWSBillPdfLoader(file_manager_driver=gtc_file_manager_driver)
list_artifact = loader.load(pdf_file_name)
print(list_artifact)

with open(csv_file_name, 'w', newline='') as destination_file:
fieldnames = ['region', 'service', 'type', 'quantity', 'unit', 'cost', 'description']
with open(csv_file_name, "w", newline="") as destination_file:
fieldnames = [
"region",
"service",
"type",
"quantity",
"unit",
"cost",
"description",
]
writer = csv.DictWriter(destination_file, fieldnames=fieldnames)

writer.writeheader()
for artifact in list_artifact.value:
writer.writerow(dict(zip(fieldnames, json.loads(artifact.value))))
writer.writerow(dict(zip(fieldnames, json.loads(artifact.value), strict=False)))

gtc_file_manager_driver.try_save_file(path=csv_file_name, value=open(csv_file_name, "rb").read())
gtc_file_manager_driver.try_save_file(path=csv_file_name, value=Path(csv_file_name).read_bytes())

if is_running_in_managed_environment():
if os.path.exists(csv_file_name):
os.remove(csv_file_name)
if is_running_in_managed_environment() and Path(csv_file_name).exists():
Path(csv_file_name).unlink()

# This code is if you run this Structure as a GTC DC
if event_driver is not None:
print("Publishing final event...")
task_input = TextArtifact(value=None)
done_event = FinishStructureRunEvent(
output_task_input=task_input, output_task_output=list_artifact
)
done_event = FinishStructureRunEvent(output_task_input=task_input, output_task_output=list_artifact)

EventBus.add_event_listener(EventListener(event_listener_driver=event_driver))
EventBus.publish_event(done_event, flush=True)
print("Published final event")
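
The final step of the script turns each artifact's JSON array into a CSV row by zipping it with the field names. A standalone sketch of that step, writing to an in-memory buffer, is shown here; `rows_to_csv` is a hypothetical helper. `zip` is called without `strict` in this sketch because that keyword only exists on Python 3.10 and later, while the workflow in this commit sets up Python 3.9.

```python
import csv
import io
import json
from typing import Iterable

FIELDNAMES = ["region", "service", "type", "quantity", "unit", "cost", "description"]


def rows_to_csv(json_rows: Iterable[str]) -> str:
    """Write JSON-array strings as CSV rows under FIELDNAMES and return the text."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for raw in json_rows:
        values = json.loads(raw)
        # zip stops at the shorter input; DictWriter fills missing fields with "".
        writer.writerow(dict(zip(FIELDNAMES, values)))
    return buffer.getvalue()
```

A row with fewer than seven values is written with trailing empty fields rather than rejected, which is the same lenient behavior the diff selects with `strict=False`.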
3 changes: 2 additions & 1 deletion griptape_aws_bill_pdf_to_csv/upload.py
@@ -1,5 +1,6 @@
import argparse
import os

from dotenv import load_dotenv
from griptape.drivers import GriptapeCloudFileManagerDriver

@@ -45,4 +46,4 @@
)

with open(file=local_file_path, mode="rb") as data:
gtc_file_manager_driver.try_save_file(file_name, data)
gtc_file_manager_driver.try_save_file(file_name, data.read())
Empty file.