Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update run-docs to avoid code duplication #1439

Merged
merged 29 commits into from
Jan 30, 2025
Merged

Conversation

mikekgfb
Copy link
Contributor

@mikekgfb mikekgfb commented Dec 23, 2024

Update run-docs to avoid duplicate code (hygiene for scalability with more doc files)

Also, add support for command line options without space for runner (in line with how POSIX treats commandline arguments) , to offer a more portable way to specify rewrite from readme.md to tests with updown.py.

Update run-docs to avoid duplicate code
Copy link

pytorch-bot bot commented Dec 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1439

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b77ddf3 with merge base 5684175 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 23, 2024
Add back command explaining seemingly extraneous `echo exit 1`
Update to C++11 ABI for AOTI, similar to ET
@mikekgfb mikekgfb changed the title Update run-docs to avoid duplicate code Update run-docs to avoid code duplication Jan 12, 2025
fi
# Pre-initialize variables
filepath=""
parameters="--replace 'llama3:stories15M,-l 3:-l 2' --suppress huggingface-cli,HF_TOKEN"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we have quotation bug making the tests fail silently

 python3 torchchat/utils/scripts/updown.py --file torchchat/utils/docs/evaluation.md --replace ''\''llama3:stories15M,-l' 3:-l '2'\''' --suppress huggingface-cli,HF_TOKEN
  usage: updown.py [-h] [-f FILENAME] [-p PREDICATE] [-r REPLACE] [-s SUPPRESS]
                   [-e] [-g]
  updown.py: error: unrecognized arguments: 3:-l 2'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On what system is that? There's definitely something weird going on here. Angela committed a change that we don't need -l 2/3 anymore, so we would not have to switch around that flag? the problem originates because of the space in the arg that should be properly escaped, but some shell does not seem to handle this gracefully.

And yeah, all errors are being suppressed on run-docs commands, so it's been attracting a ton of fails that are surfacing when we look closely

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/pytorch/torchchat/actions/runs/12996075947/job/36258706324?pr=1439
https://github.com/pytorch/torchchat/actions/runs/12996075978/job/36258715248?pr=1439

Angela committed a change that we don't need -l 2/3 anymore, so we would not have to switch around that flag?

#1159 Correct for AOTI we should't need the flag

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, all errors are being suppressed on run-docs commands, so it's been attracting a ton of fails that are surfacing when we look closely

Surfacing suppressed errors is a good thing :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/pytorch/torchchat/actions/runs/12996075947/job/36258706324?pr=1439 https://github.com/pytorch/torchchat/actions/runs/12996075978/job/36258715248?pr=1439

Angela committed a change that we don't need -l 2/3 anymore, so we would not have to switch around that flag?

#1159 Correct for AOTI we should't need the flag

et_run still needs the flag? Can we do something for et_run too? It would both help the users and simplify the cleanup ;)

PS: would -l3 work based on the argline parser? Coz that one doesn't trigger the space in arguments issue that leads to the problem we're seeing. (I think it's a badly implemented shell, but it is what it is, and no point to visit "is there a better shell" and then investigate how to rebuild git around it.... Pragmatically, cheapest way we get over that hump, since iit's sorta a nuisance to say -l anyway.... so if we can just kill it off, that's the best path!)

Copy link
Contributor Author

@mikekgfb mikekgfb Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I've removed -l 3 from aoti_run, and rewritten it as -l3 for et_run. I've also rewritten the aoti_run/et_run commandline parser to allow this behavior (glue flag values onto single letter options) in keeping with traditional POSIX command line processing (where this space is historically optional).

That being saaid, if we don't need -l 2/3 on et_run either, that would be best for our users anyway.

@Jack-Khuu

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the workaround, agree we should add it for ET (pretty straight forward)

#1484

Remove -l 3 since no longer necessary after Angea's change
remove -l 3 from aoti run , and write -l3 for et_run
-l 3:-l 2 -> -l3:-l2

after modifying the command lines.  Hopefull this is legal for et_run
Update to support non-space separated args
Add a gs=32 cuda.json for test runs with stories15M
add gs=32 variant of mobile for tests
Use gs=32 variants with stories models
undo gs32
mikekgfb added a commit to mikekgfb/torchchat-1 that referenced this pull request Jan 28, 2025
switch to gs=32 quantization
(requires consolidated run-docs of pytorch#1439)
Extend timeout to avoid timeout of mps quantization test
enforce that and argument must have at least length 2, and refine check for uniarg (ie arg plus flag value in one option) to be args with more than 2 characters
@mikekgfb
Copy link
Contributor Author

@Jack-Khuu any concerns with landing this after CI/CD runs pass?

@Jack-Khuu
Copy link
Contributor

Thanks for this, kicked off the CI

@Jack-Khuu Jack-Khuu merged commit 4356b4c into pytorch:main Jan 30, 2025
69 checks passed
vmpuri pushed a commit that referenced this pull request Feb 4, 2025
* Update run-docs to avoid duplicate code

Update run-docs to avoid duplicate code

* Update run-docs

Add back command explaining seemingly extraneous `echo exit 1`

* Update build_native.sh

Update to C++11 ABI for AOTI, similar to ET

* Update run-docs

* Update run-docs

Update to run distributed inference test with open-llama instead of llama3.1

* Update run-docs

Open-llama -> stories to avoid tokens.

* Update README.md

Remove -l 3 since no longer necessary after Angea's change

* Update quantization.md

remove -l 3 from aoti run , and write -l3 for et_run

* Update run-docs

-l 3:-l 2 -> -l3:-l2

after modifying the command lines.  Hopefull this is legal for et_run

* Update run.cpp

Update to support non-space separated args

* Update run.cpp

typo

* Create cuda-32.json

Add a gs=32 cuda.json for test runs with stories15M

* Create mobile-32.json

add gs=32 variant of mobile for tests

* Update run-docs

Use gs=32 variants with stories models

* Update run-docs

undo gs32

* Update run-readme-pr-mps.yml

Extend timeout to avoid timeout of mps quantization test

* Update run.cpp

enforce that and argument must have at least length 2, and refine check for uniarg (ie arg plus flag value in one option) to be args with more than 2 characters

* Update run.cpp

typos

---------

Co-authored-by: Jack-Khuu <[email protected]>
Jack-Khuu added a commit that referenced this pull request Feb 21, 2025
* Update run-readme-pr-macos.yml

source test commands instead of executing them.  
(Possible fix for #1315 )

* Update run-docs

source instead of exec

* Update README.md

somebody pushed all the model exports into exportedModels, but... we never create the directory.

we should do that also do this in the user instructions, just because storing into a directory that doesn't exist is not good :)

* Update multimodal.md

multimodal doc needed end of tests comment.

* Update ADVANCED-USERS.md

Need to download files before using them, lol. We expect the users to do this, but we should verbalize.  Plus, if we extract for testing, then it obviously fails.

* Update native-execution.md

( triggers unexpected token in macos zsh

* Update run-readme-pr-macos.yml

          # metadata does not install properly on macos
          # .ci/scripts/run-docs multimodal

* Update run-readme-pr-mps.yml

          # metadata does not install properly on macos
          # .ci/scripts/run-docs multimodal

* Update ADVANCED-USERS.md

install wget

* Update run-readme-pr-macos.yml

          echo ".ci/scripts/run-docs native DISABLED"
          # .ci/scripts/run-docs native

* Update run-readme-pr-mps.yml

          echo ".ci/scripts/run-docs native DISABLED"
          # .ci/scripts/run-docs native

* Update run-docs

switch to gs=32 quantization
(requires consolidated run-docs of #1439)

* Create cuda-32.json

add gs=32 cuda quantization for use w/ stories15M

* Create mobile-32.json

add gs=32 for stories15M

* Update run-readme-pr.yml

Comment out tests that currently fail, as per summary in PR comments

* Update install_requirements.sh

Dump location of executable to understand these errors:
https://hud.pytorch.org/pr/pytorch/torchchat/1476#36452260294

2025-01-31T00:18:57.1405698Z + pip3 install -r install/requirements.txt --extra-index-url https://download.pytorch.org/whl/nightly/cpu
2025-01-31T00:18:57.1406689Z ./install/install_requirements.sh: line 101: pip3: command not found

* Update install_requirements.sh

dump candidate locations for pip

* Update README.md

Some of the updown commands were getting rendered. Not sure why/when that happens?

* Update run-docs

readme switched from llama3 to llama3.1, so replace llama3.1 with stories15M

* Update run-readme-pr-macos.yml

remove failing gguf test

* Update run-readme-pr-mps.yml

Remove failing gguf test

* Update run-readme-pr.yml

Can we mix `steps:` with `script: |` in git workflows?

Testing 123 testing!

* Update run-docs

remove quotes around replace as the nested quotes are not interpreted by the shall but seem to be passed to updown.py.

We don't have spaces in replace, so no need for escapes.

* Update run-readme-pr.yml

1 - Remove steps experiment.
2 - add at-get install pip3

Maybe releng needs to look at what's happening with pip?

* Update run-docs

remove quotes that mess up parameter identification.

* Update run-readme-pr.yml

try to install pip & pip3

* Update run-readme-pr.yml

debug

        which pip || true
        which pip3 || true
        which conda || true

* Update run-readme-pr-macos.yml

* Update run-readme-pr-linuxaarch64.yml

debug info

```
        which pip || true
        which pip3 || true
        which conda || true
```

* Update quantization.md

use group size 32 which works on all models

* Update run-readme-pr.yml

Cleanup, comment non-working tests

* Update run-readme-pr-macos.yml

Uncomment test code requiring unavailable pip3

* Update run-readme-pr-mps.yml

comment non-working tests

* Update run-readme-pr-linuxaarch64.yml

comment out test code requiring pip3

* Update run-docs

Avoid nested quotes

* Update run-readme-pr.yml

Enable distributed test

* Update install_requirements.sh

Remove extraneous debug messages from install_requirements.sh

* Update install_requirements.sh

remove debug

* Update run-readme-pr.yml

Comment out failing quantization-any (glibc version issue) and distributed (nccl usage)

* Update run-readme-pr.yml

Disable remaining tests

* Update run-readme-pr.yml

enable readme

* Update run-readme-pr.yml

remove run of readme

---------

Co-authored-by: Jack-Khuu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants