Commit

Support stats feature later
jakep-allenai committed Jan 28, 2025
1 parent 48447b6 commit dbe5487
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions olmocr/pipeline.py
@@ -731,6 +731,8 @@ def submit_beaker_job(args):
def print_stats(args):
    LONG_CONTEXT_THRESHOLD = 32768

    assert args.workspace.startswith("s3://"), "Printing stats functionality only works with s3 workspaces for now."

    # Get total work items and completed items
    index_file_s3_path = os.path.join(args.workspace, "work_index_list.csv.zstd")
    output_glob = os.path.join(args.workspace, "results", "*.jsonl")
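
The guard added in this commit can be illustrated in isolation. The following is a minimal sketch, not code from olmocr: the helper names `require_s3_workspace` and `workspace_paths` and the bucket name are hypothetical. It raises `ValueError` instead of using `assert`, on the assumption that the check should still run when Python is started with `-O` (which skips assert statements), and it joins with `posixpath` so the separator is always `/` regardless of platform.

```python
from __future__ import annotations

import posixpath


def require_s3_workspace(workspace: str) -> str:
    """Return the workspace unchanged if it is an s3:// URI, else raise.

    Hypothetical helper; mirrors the assert in print_stats but is not
    skipped when Python runs with -O.
    """
    if not workspace.startswith("s3://"):
        raise ValueError(
            "Printing stats functionality only works with s3 workspaces for now."
        )
    return workspace


def workspace_paths(workspace: str) -> tuple[str, str]:
    """Build the index file path and results glob for an S3 workspace.

    Hypothetical helper; the file names are the ones used in the diff above.
    """
    workspace = require_s3_workspace(workspace)
    # posixpath.join always uses "/" as the separator, which suits S3 keys.
    index_file = posixpath.join(workspace, "work_index_list.csv.zstd")
    output_glob = posixpath.join(workspace, "results", "*.jsonl")
    return index_file, output_glob


if __name__ == "__main__":
    # "example-bucket" is a made-up name used only for illustration.
    print(workspace_paths("s3://example-bucket/workspace"))
```

Whether a hard failure or a friendlier message is preferable for local workspaces is a design choice; the commit itself only adds the assert.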