Dockerfile fixes
jakep-allenai committed Nov 13, 2024
1 parent 6c9c785 commit 83bb1dc
Showing 3 changed files with 16 additions and 7 deletions.
pdelfin/beakerpipeline.py (4 changes: 3 additions & 1 deletion)
@@ -638,15 +638,17 @@ async def main():
    pdf_s3 = pdf_session.client("s3")

    check_poppler_version()
    logger.info(f"Starting pipeline with PID {os.getpid()}")

    if args.pdfs:
        logger.info("Got --pdfs argument, going to add to the work queue")
        await populate_pdf_work_queue(args)

    if args.beaker:
        submit_beaker_job(args)
        return

    logger.info(f"Starting pipeline with PID {os.getpid()}")

    # Create a semaphore to control worker access
    # We only allow one worker to move forward with requests, until the server has no more requests in its queue
    # This lets us get full utilization by having many workers, but also to be outputting dolma docs as soon as possible
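The comment at the end of this hunk describes a gating scheme whose implementation lies outside the diff. The following is a minimal asyncio sketch of that idea, not the pipeline's actual code: `ToyServer`, `worker`, `gate_monitor` and `run_pipeline` are hypothetical names, and the polling monitor is an assumption about how "the server has no more requests in its queue" might be detected. A worker takes the gate before submitting its batch but never releases it; the monitor reopens the gate once nothing is waiting in the server queue, so the next worker can start while earlier requests are still running.

```python
import asyncio


class ToyServer:
    """Hypothetical stand-in for the inference server: at most `slots`
    requests run at once; the rest wait in a queue."""

    def __init__(self, slots: int):
        self.running = asyncio.Semaphore(slots)
        self.waiting = 0  # requests submitted but not yet running

    async def request(self, item: int) -> str:
        self.waiting += 1
        async with self.running:
            self.waiting -= 1
            await asyncio.sleep(0.01)  # pretend to run inference
        return f"doc-{item}"


async def worker(wid, items, server, gate, log):
    # Block until the gate lets this worker push its requests.
    await gate.acquire()
    log.append(("start", wid))
    # The worker never releases the gate itself; the monitor does.
    results = await asyncio.gather(*(server.request(i) for i in items))
    log.append(("done", wid))
    return results


async def gate_monitor(server, gate, stop, poll=0.005):
    # Reopen the gate whenever nothing is waiting in the server queue.
    # Checking locked() first keeps the BoundedSemaphore from over-releasing.
    while not stop.is_set():
        await asyncio.sleep(poll)
        if server.waiting == 0 and gate.locked():
            gate.release()


async def run_pipeline(num_workers=3, items_per_worker=4, slots=2):
    server = ToyServer(slots)
    gate = asyncio.BoundedSemaphore(1)  # one worker may submit at a time
    stop = asyncio.Event()
    log = []
    monitor = asyncio.create_task(gate_monitor(server, gate, stop))
    batches = [list(range(w * items_per_worker, (w + 1) * items_per_worker))
               for w in range(num_workers)]
    results = await asyncio.gather(
        *(worker(w, b, server, gate, log) for w, b in enumerate(batches)))
    stop.set()
    await monitor
    return results, log


if __name__ == "__main__":
    results, log = asyncio.run(run_pipeline())
    print(log)
```

Each worker returns its own results as soon as its batch finishes, which mirrors the stated goal of emitting output documents early while keeping the server busy.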

pdelfin/version.py (2 changes: 1 addition & 1 deletion)
@@ -2,7 +2,7 @@
_MINOR = "1"
# On main and in a nightly release the patch should be one ahead of the last
# released build.
_PATCH = "1"
_PATCH = "3"
# This is mainly for nightly builds which have the suffix ".dev$DATE". See
# https://semver.org/#is-v123-a-semantic-version for the semantics.
_SUFFIX = ""
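The hunk shows only `_MINOR`, `_PATCH` and `_SUFFIX`. The sketch below shows one common way such components are joined into a version string. It is hypothetical: `_MAJOR = "0"`, `VERSION`, `VERSION_SHORT` and `nightly_version` are assumptions, not lines from this file. The nightly form follows the ".dev$DATE" suffix described in the comment.

```python
# Hypothetical reconstruction: only _MINOR, _PATCH and _SUFFIX appear in the diff.
_MAJOR = "0"  # assumption: not shown in this hunk
_MINOR = "1"
_PATCH = "3"  # value after this commit
_SUFFIX = ""  # empty for releases

VERSION_SHORT = f"{_MAJOR}.{_MINOR}"
VERSION = f"{_MAJOR}.{_MINOR}.{_PATCH}{_SUFFIX}"


def nightly_version(date: str) -> str:
    """Version a nightly build would carry, using the ".dev$DATE" suffix."""
    return f"{_MAJOR}.{_MINOR}.{_PATCH}.dev{date}"


if __name__ == "__main__":
    print(VERSION, nightly_version("20241113"))
```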

scripts/beaker/Dockerfile-inference (17 changes: 12 additions & 5 deletions)
@@ -4,6 +4,11 @@ RUN apt-get update -y && apt-get install -y software-properties-common \
    && add-apt-repository ppa:deadsnakes/ppa \
    && apt-get -y update

# Install requirements specific to pdfs
RUN apt-get update && apt-get -y install python3-apt
RUN echo "ttf-mscorefonts-installer msttcorefonts/accepted-mscorefonts-eula select true" | debconf-set-selections
RUN apt-get update -y && apt-get install -y poppler-utils ttf-mscorefonts-installer msttcorefonts fonts-crosextra-caladea fonts-crosextra-carlito gsfonts lcdf-typetools

RUN apt-get update -y && apt-get install -y --no-install-recommends \
    python3.11 \
    python3.11-dev \
@@ -20,21 +25,23 @@ RUN rm -rf /var/lib/apt/lists/* \
    && curl -sS https://bootstrap.pypa.io/get-pip.py | python \
    && pip3 install -U pip

RUN apt-get update && apt-get -y install python3.11-venv

RUN apt-get update && apt-get -y install python3.11-venv
ADD --chmod=755 https://astral.sh/uv/install.sh /install.sh
RUN /install.sh && rm /install.sh

# Install flashinfer early, before the project's own dependencies
RUN /root/.local/bin/uv pip install --system flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/

ENV PYTHONUNBUFFERED=1
WORKDIR /root
COPY pyproject.toml pyproject.toml
COPY pdelfin pdelfin
COPY pdelfin/version.py pdelfin/version.py

RUN /root/.local/bin/uv pip install --system --no-cache -e .[inference]

# TODO You can remove this and move it into the pyproject.toml once sglang makes a release > 0.35.0
RUN /root/.local/bin/uv pip install --system "sglang[all] @ git+https://github.com/sgl-project/sglang.git@eff468dd5a3d24646560eb044276585f7a11ac3c#subdirectory=python"
RUN /root/.local/bin/uv pip install --system flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/

RUN python3 -m pdelfin.beakerpipeline --help
COPY pdelfin pdelfin

RUN python3 -m pdelfin.beakerpipeline --help
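One plausible reading of the reordering in this hunk is Docker layer caching: copying only `pyproject.toml` and `pdelfin/version.py` before the dependency install means that editing pipeline code no longer forces a reinstall. The toy model below is a sketch of that effect, not Docker's real algorithm. It assumes a simplified cache key (parent key, instruction text, and the paths and contents of copied files), abbreviates the instructions, and treats the pre-commit order as inferred from the diff. `layer_keys`, `rebuilt_steps` and the file contents are hypothetical.

```python
import hashlib


def layer_keys(steps, files):
    """Toy model of a Docker build cache (simplified assumption): each layer's
    key hashes the parent layer's key, the instruction text and, for COPY,
    the paths and contents of the files it copies."""
    keys, parent = [], ""
    for step in steps:
        h = hashlib.sha256(parent.encode())
        h.update(step.encode())
        if step.startswith("COPY "):
            src = step.split()[1]
            copied = sorted(p for p in files if p == src or p.startswith(src + "/"))
            for path in copied:
                h.update(path.encode())
                h.update(files[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps, before, after):
    """Instructions whose cached layer is invalidated going from `before` to `after`."""
    old_keys, new_keys = layer_keys(steps, before), layer_keys(steps, after)
    return [s for s, a, b in zip(steps, old_keys, new_keys) if a != b]


# Abbreviated instructions, before and after this commit (order inferred from the diff).
OLD_ORDER = [
    "COPY pyproject.toml pyproject.toml",
    "COPY pdelfin pdelfin",
    "RUN uv pip install --system --no-cache -e .[inference]",
    "RUN python3 -m pdelfin.beakerpipeline --help",
]
NEW_ORDER = [
    "COPY pyproject.toml pyproject.toml",
    "COPY pdelfin/version.py pdelfin/version.py",
    "RUN uv pip install --system --no-cache -e .[inference]",
    "COPY pdelfin pdelfin",
    "RUN python3 -m pdelfin.beakerpipeline --help",
]

# Hypothetical file contents; only the pipeline module changes between builds.
before = {
    "pyproject.toml": "deps",
    "pdelfin/version.py": '_PATCH = "3"',
    "pdelfin/beakerpipeline.py": "v1",
}
after = {**before, "pdelfin/beakerpipeline.py": "v2"}

if __name__ == "__main__":
    print(rebuilt_steps(OLD_ORDER, before, after))
    print(rebuilt_steps(NEW_ORDER, before, after))
```

In this model, editing only `pdelfin/beakerpipeline.py` invalidates the install layer under the old order, but under the new order only the final `COPY pdelfin pdelfin` and the `--help` check rerun. Bumping `version.py`, as this commit does, still invalidates the install layer in both orders.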
