TODOs and client fix
jakep-allenai committed Nov 20, 2024
1 parent 3153aea commit 67d11ec
Showing 1 changed file with 1 addition and 3 deletions.
4 changes: 1 addition & 3 deletions pdelfin/beakerpipeline.py
@@ -180,7 +180,7 @@ async def process_page(args, session: httpx.AsyncClient, worker_id: int, pdf_s3_
output_tokens=base_response_data["usage"].get("completion_tokens", 0),
is_fallback=False,
)
-except (httpx.TimeoutException, asyncio.TimeoutError) as e:
+except (httpx.TimeoutException, httpx.ConnectError, asyncio.TimeoutError) as e:
logger.warning(f"Client error on attempt {attempt} for {pdf_s3_path}-{page_num}: {e}")

# Now we want to do exponential backoff, and not count this as an actual page retry
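
The hunk above widens the `except` clause so that connection failures, like timeouts, trigger a backoff instead of using up a page retry. The rest of the loop is not shown in this diff, so the following is only a minimal sketch of that pattern under stated assumptions. The names `call_with_retries`, `backoff_delay`, `max_page_retries` and `max_client_errors`, the delay constants, and the use of `ValueError` as the "real" page failure are all hypothetical. To stay standard-library only, `TRANSIENT_ERRORS` stands in for the tuple in the diff.

```python
import asyncio

# Hypothetical stand-in for the tuple caught in the diff. With httpx installed it
# would read (httpx.TimeoutException, httpx.ConnectError, asyncio.TimeoutError).
TRANSIENT_ERRORS = (asyncio.TimeoutError, ConnectionError)


def backoff_delay(client_errors, base=1.0, cap=60.0):
    """Seconds to wait after `client_errors` earlier failures: base * 2**n, capped."""
    return min(cap, base * (2 ** client_errors))


async def call_with_retries(make_request, max_page_retries=3, max_client_errors=10,
                            sleep=asyncio.sleep, transient=TRANSIENT_ERRORS):
    """Await make_request() until it succeeds or the page retry budget runs out."""
    page_attempt = 0    # counts only failures that are the page's fault
    client_errors = 0   # counts transport-level failures (timeouts, refused connections)
    while page_attempt < max_page_retries:
        try:
            return await make_request()
        except transient:
            # Client/transport error: back off exponentially and leave
            # page_attempt unchanged, so this is not counted as a page retry.
            if client_errors >= max_client_errors:
                raise  # assumed safety valve; the original loop may not have one
            await sleep(backoff_delay(client_errors))
            client_errors += 1
        except ValueError:
            # Assumed placeholder for a bad response: this does consume a page retry.
            page_attempt += 1
    return None  # page retry budget exhausted
```

Keeping two counters is the point of the design: a server that is still starting up can refuse connections many times without exhausting the budget reserved for pages that genuinely fail. Passing `sleep` as a parameter lets a caller substitute a no-op during tests.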
@@ -845,9 +845,7 @@ async def main():
asyncio.run(main())

# TODO
# - Add logging of failed pages and have the stats function read them
# - Sglang commit a fix for the context length issue
# - pypdf fix for the 'v' error
# - aiohttp repro and bug report
# - Get a solid benchmark on the stream vs non stream approach
