v2.0 #185

Merged: 205 commits, merged Apr 4, 2024

Changes from 1 commit

Commits (205)
612c9eb
initial changes to hf miner
p-ferreira Jan 29, 2024
afe3993
adds 8bit and 4bit config
p-ferreira Jan 30, 2024
a424e6e
simplifies torch type logic
p-ferreira Jan 30, 2024
843f9a2
adapt zephyr miner to hf miner
p-ferreira Jan 30, 2024
e51ff26
update README for hf miner
p-ferreira Jan 30, 2024
65f3b74
adds should_force_model_loading flag to miners
p-ferreira Jan 30, 2024
1ae7540
miners module refactor
p-ferreira Jan 31, 2024
c2d40f6
runs black on miners code
p-ferreira Jan 31, 2024
9669cab
properly adds miner integration with system prompt args
p-ferreira Jan 31, 2024
dde3840
adds check verification on global logger definition
p-ferreira Feb 1, 2024
9b5b91f
clean main func from hf miner
p-ferreira Feb 1, 2024
a8e3bc8
refactor agent code + adds react agent
p-ferreira Feb 5, 2024
369820b
agents adjustments
p-ferreira Feb 5, 2024
53ca503
adds max iteration to agents
p-ferreira Feb 5, 2024
cc2b5c9
fix load llm issue
p-ferreira Feb 5, 2024
ce160ca
fix max tokens param
p-ferreira Feb 5, 2024
b8dd71b
adds toolminer for experiments
p-ferreira Feb 6, 2024
df7638b
fix error case for non wiki match
p-ferreira Feb 7, 2024
8dc340d
increase wiki retries and fix tool_miner bug
p-ferreira Feb 8, 2024
32e2fbf
dsstore gitignore, change miner base
mccrindlebrian Feb 13, 2024
cd45132
edit StreamPromptingSynapse, and add _forward to BaseStreamMinerNeuron
mccrindlebrian Feb 13, 2024
4c1902b
OpenAIUtils
mccrindlebrian Feb 13, 2024
c5066ce
change llm pipeline for streaming
mccrindlebrian Feb 13, 2024
76b7806
hf_miner to streaming
mccrindlebrian Feb 13, 2024
0cd3820
openai stream miner
mccrindlebrian Feb 13, 2024
7f983b4
echo miner
mccrindlebrian Feb 13, 2024
2b30a33
phrase miner
mccrindlebrian Feb 13, 2024
898b4b8
fix bugs
mccrindlebrian Feb 13, 2024
85c46a2
black entire workspace
mccrindlebrian Feb 13, 2024
9ad977a
add try except in langchain miners
mccrindlebrian Feb 13, 2024
4b4af19
format_send on HuggingFaceMiner
mccrindlebrian Feb 13, 2024
f1e5e57
mock miner
mccrindlebrian Feb 13, 2024
4438f15
add handle_response for StreamMiners
mccrindlebrian Feb 13, 2024
b6cf4a5
remove streaming docs
mccrindlebrian Feb 13, 2024
0735046
remove streaming docs
mccrindlebrian Feb 14, 2024
5fe465d
add client-side querying
mccrindlebrian Feb 14, 2024
1e5e9cb
black, debugging
mccrindlebrian Feb 14, 2024
5478ecd
remove format send bc broke streaming
mccrindlebrian Feb 14, 2024
41d3b5c
remove format_send
mccrindlebrian Feb 14, 2024
443f831
black
mccrindlebrian Feb 14, 2024
aa58e8d
Merge branch 'features/hf-miner' into features/streaming
mccrindlebrian Feb 14, 2024
ae8392e
remove .DS_Store
mccrindlebrian Feb 14, 2024
a129564
add tool miner
mccrindlebrian Feb 14, 2024
da90d8e
add timeout checking to stop streaming tokens
mccrindlebrian Feb 14, 2024
1b63bcd
add timeout checking to all miners
mccrindlebrian Feb 14, 2024
78a8281
add return_streamer logic
mccrindlebrian Feb 15, 2024
c7b52c6
remove force
mccrindlebrian Feb 15, 2024
5eeabb7
add thread closing and queue clearing for huggingface models
mccrindlebrian Feb 16, 2024
1f5e54f
add docstrings, more logic for cleaning
mccrindlebrian Feb 16, 2024
f3ec3fc
add timeout check to openai
mccrindlebrian Feb 16, 2024
ab81674
add streaming_batch_size to config
mccrindlebrian Feb 21, 2024
723b64c
remove dep
mccrindlebrian Feb 21, 2024
5792dfc
add wandb logging if wandb.on
mccrindlebrian Feb 21, 2024
3b4df22
remove get_openai_callback, black repo
mccrindlebrian Feb 21, 2024
784b100
fix batch size bug in config
mccrindlebrian Feb 21, 2024
516fb25
add try except to hf miners
mccrindlebrian Feb 21, 2024
1abe949
add try except to tool_miner
mccrindlebrian Feb 21, 2024
38b9a85
Merge branch 'features/streaming' of github.com:opentensor/prompting …
mccrindlebrian Feb 21, 2024
e1f6ddb
add try except to tool_miner
mccrindlebrian Feb 21, 2024
c86431f
add back in blacklist and priorty, test working
mccrindlebrian Feb 21, 2024
6bae36a
change back logging time
mccrindlebrian Feb 21, 2024
22d562a
develop mock dendrite for testing
mccrindlebrian Feb 22, 2024
f0de609
remove dep
mccrindlebrian Feb 22, 2024
425b31d
add tests for stream miners
mccrindlebrian Feb 22, 2024
5f31d16
add docstrings
mccrindlebrian Feb 23, 2024
b7bdf0b
Merge branch 'main' into features/hf-miner
p-ferreira Feb 23, 2024
13c4bea
Merge branch 'features/hf-miner' into features/streaming
mccrindlebrian Feb 23, 2024
157a9f9
change process timing
mccrindlebrian Feb 23, 2024
d3bfcfb
add constants
mccrindlebrian Feb 23, 2024
38232b1
add default port/id
mccrindlebrian Feb 23, 2024
07a73f3
merge staging
mccrindlebrian Feb 28, 2024
8f12375
remove todo
mccrindlebrian Feb 28, 2024
d1255aa
depreciation on agent miners
mccrindlebrian Feb 28, 2024
e94efae
merge main
mccrindlebrian Feb 28, 2024
46f798e
add depreciation to reqs
mccrindlebrian Feb 28, 2024
51de23f
add log_status
mccrindlebrian Feb 28, 2024
39d8ad0
change version to 1.1.2
mccrindlebrian Feb 28, 2024
8957ff9
add docstrings
mccrindlebrian Feb 29, 2024
4f573fb
move exception clause
mccrindlebrian Feb 29, 2024
d31ec32
remove all agents from PR to separate from features/hf-miners
mccrindlebrian Feb 29, 2024
64d67e7
merge pre-staging, resolve conflicts
mccrindlebrian Feb 29, 2024
860ec69
black
mccrindlebrian Feb 29, 2024
f9863b7
test precommit hook
mccrindlebrian Mar 1, 2024
4b0d76f
add .DS_Store and test precommit hook
mccrindlebrian Mar 1, 2024
cb529ee
Merge branch 'pre-staging' into features/hf-miner
p-ferreira Mar 1, 2024
493e7e4
Update prompting/miners/agents/react_agent.py
p-ferreira Mar 1, 2024
7226fa1
Update prompting/miners/openai_miner.py
p-ferreira Mar 1, 2024
6d34528
update docstring
p-ferreira Mar 1, 2024
70ab932
adds deprecation tags to all agent classes
p-ferreira Mar 1, 2024
1ad9571
Manually remove .DS_Store files
steffencruz Mar 1, 2024
41fd140
Remove .DS_Store in all directories
steffencruz Mar 1, 2024
9186c8b
fix flake8 warnings
p-ferreira Mar 1, 2024
7719a48
drops unnecessary run code for base miner
p-ferreira Mar 1, 2024
10fc0ce
Update prompting/miners/openai_miner.py
p-ferreira Mar 1, 2024
acba249
deprecates tool miner
p-ferreira Mar 1, 2024
cb99e73
Merge pull request #91 from opentensor/features/hf-miner
steffencruz Mar 1, 2024
e12e2fc
merge pre-staging, resolve conflicts, delete stream_tutorial docs
mccrindlebrian Mar 2, 2024
10aae24
change test output type
mccrindlebrian Mar 2, 2024
e72b60a
fix breaking tests
mccrindlebrian Mar 4, 2024
e826bfa
fix streaming test with percentage comparison on timeout:
mccrindlebrian Mar 4, 2024
15e6c91
rework where process_time is calculated to minimize time inconsistancies
mccrindlebrian Mar 4, 2024
c13bfff
check if process_time >= timeout
mccrindlebrian Mar 4, 2024
3567259
samples uids only on candidate uids
mccrindlebrian Mar 5, 2024
e19229d
add additional test when k > number of available uids
mccrindlebrian Mar 5, 2024
d55e953
Update neuron.py
steffencruz Mar 6, 2024
01255d1
black entire repo
mccrindlebrian Mar 6, 2024
bf71611
Merge pull request #145 from opentensor/hotfix/black-pre-staging
steffencruz Mar 7, 2024
d44228c
black
mccrindlebrian Mar 7, 2024
c72b1e4
Merge pull request #143 from opentensor/hotfix/fix-set-weights
mccrindlebrian Mar 7, 2024
ab5f6b2
merge pre-staging
mccrindlebrian Mar 11, 2024
66c7dbd
remove tokenizer mapping for models that use different tokenizers tha…
mccrindlebrian Mar 11, 2024
acf3f0a
remove unfinished wording is docstring
mccrindlebrian Mar 11, 2024
bbfdcfe
remove debugging
mccrindlebrian Mar 12, 2024
d969897
add new readme for pre-staging experiments
mccrindlebrian Mar 12, 2024
e316e93
remove old docs
mccrindlebrian Mar 12, 2024
b847daa
add docs for streaming, and change output types of miners
mccrindlebrian Mar 12, 2024
fe6782c
fix typo
mccrindlebrian Mar 12, 2024
c7caf1a
fix client script
mccrindlebrian Mar 12, 2024
5f9c591
add more to readme
mccrindlebrian Mar 12, 2024
aaed470
format docs to stream_miner_template
mccrindlebrian Mar 12, 2024
8b0a3aa
Update prompting/mock.py
mccrindlebrian Mar 12, 2024
9885128
Update prompting/miners/hf_miner.py
mccrindlebrian Mar 12, 2024
deacce0
Update prompting/utils/config.py
steffencruz Mar 12, 2024
ef911db
Update prompting/utils/uids.py
steffencruz Mar 12, 2024
ad9e1ff
Update prompting/utils/uids.py
steffencruz Mar 12, 2024
29680ec
Update prompting/utils/uids.py
steffencruz Mar 12, 2024
6095539
Merge branch 'pre-staging' into features/change-uid-sampling
mccrindlebrian Mar 12, 2024
03c1df4
fix bug with brackets
mccrindlebrian Mar 12, 2024
9b490b4
remove partial kwargs
mccrindlebrian Mar 12, 2024
326a910
Merge pull request #142 from opentensor/features/change-uid-sampling
steffencruz Mar 12, 2024
f879b48
Merge branch 'pre-staging' into features/streaming
mccrindlebrian Mar 12, 2024
720ec70
Apply suggestions from code review
mccrindlebrian Mar 12, 2024
439fb15
remove depreciated miners
mccrindlebrian Mar 12, 2024
e6359ff
Merge branch 'features/streaming' of github.com:opentensor/prompting …
mccrindlebrian Mar 12, 2024
91330cf
Merge pull request #103 from opentensor/features/streaming
steffencruz Mar 12, 2024
d8cbc16
remove miner from init
mccrindlebrian Mar 12, 2024
3c17dc7
Merge pull request #155 from opentensor/hotfix/remove-tool-miner
mccrindlebrian Mar 12, 2024
de1f84c
add torch type to load_pipeline
mccrindlebrian Mar 12, 2024
7395643
add torch type to load_pipeline in miner
mccrindlebrian Mar 12, 2024
94b80c5
Merge pull request #156 from opentensor/hotfix/add-torch-type
mccrindlebrian Mar 12, 2024
aebea20
merge prestaging into vllm
mccrindlebrian Mar 14, 2024
51660a6
fix test_tasks
mccrindlebrian Mar 14, 2024
6e72729
fix broken mockpipeline
mccrindlebrian Mar 14, 2024
28d3984
add customstreamiterator and remove docs
mccrindlebrian Mar 14, 2024
cfa4014
remove deps in openai
mccrindlebrian Mar 14, 2024
3d9eaef
remove old miners
mccrindlebrian Mar 14, 2024
ef221ae
add return streamer back into load_hf_pipeline for mocking
mccrindlebrian Mar 14, 2024
438fd93
black
mccrindlebrian Mar 14, 2024
48c8363
add vllm section
mccrindlebrian Mar 14, 2024
1324eeb
merge pre-staging into vllm streaming
mccrindlebrian Mar 14, 2024
663ed07
adds asyncio semaphore
p-ferreira Mar 14, 2024
f53427a
change docs for axon port
mccrindlebrian Mar 16, 2024
5e3e4bc
Merge pull request #161 from opentensor/hotfix/readme
steffencruz Mar 16, 2024
cb79147
remove return streamer from vllm
mccrindlebrian Mar 18, 2024
a245e02
test code for semaphore
mccrindlebrian Mar 18, 2024
6462df8
refactors streaming miner with asyncio loop approach
p-ferreira Mar 20, 2024
ea37a31
adds custom simple_hf miner with pseudo streaming
p-ferreira Mar 21, 2024
96a155a
overall logging adjustments
p-ferreira Mar 21, 2024
e8a92dd
Merge branch 'main' into pre-staging
p-ferreira Mar 21, 2024
6036fae
Merge branch 'pre-staging' into features/thread_lock_stream
p-ferreira Mar 21, 2024
314544c
fix merge conflicts
p-ferreira Mar 21, 2024
a2404e6
minor updates
p-ferreira Mar 21, 2024
53b60c1
merge pre-staging into vllm streaming
mccrindlebrian Mar 21, 2024
76dc85f
Merge branch 'features/thread_lock_stream' into features/vllm-streaming
p-ferreira Mar 21, 2024
6f7280f
improves stream encapsulation
p-ferreira Mar 21, 2024
915593a
blacks the repo
p-ferreira Mar 21, 2024
7764676
Merge branch 'features/thread_lock_stream' into features/vllm-streaming
p-ferreira Mar 21, 2024
4b63675
Merge pull request #159 from opentensor/features/vllm-streaming
p-ferreira Mar 21, 2024
7af0793
pipeline adjustments
p-ferreira Mar 21, 2024
856675c
precommit-hook-test
p-ferreira Mar 21, 2024
cfed494
black
mccrindlebrian Mar 21, 2024
c8c47bb
Merge pull request #171 from opentensor/hotfix/black
p-ferreira Mar 21, 2024
0e9d53f
drops debug from repo
p-ferreira Mar 21, 2024
0a2084b
Merge pull request #170 from opentensor/features/thread_lock_stream
p-ferreira Mar 21, 2024
10cd731
merge main into prestaing
mccrindlebrian Mar 27, 2024
1c3de3d
adapts forward test to streaming + bugfix on mock metagraph
p-ferreira Mar 28, 2024
5372f27
remove unneeded parsing in protocol
mccrindlebrian Mar 28, 2024
e103c2b
Remove <|im_start|> with other roles
bkb2135 Apr 2, 2024
b6499f6
Update all_cleaners.py
bkb2135 Apr 2, 2024
40ff2f7
Update all_cleaners.py
bkb2135 Apr 2, 2024
c5fb529
Update all_cleaners.py
bkb2135 Apr 2, 2024
8e84844
Update all_cleaners.py
bkb2135 Apr 2, 2024
123070e
Merge pull request #184 from opentensor/bkb2135-patch-1
mccrindlebrian Apr 2, 2024
6331f30
quick dirty fix on forward
p-ferreira Apr 2, 2024
7b3d082
Filter CS Math Questions
bkb2135 Apr 3, 2024
8233c45
first attempt on parallelizing stream calls
p-ferreira Apr 3, 2024
601b6b7
Merge pull request #186 from opentensor/hotfix/reduce-cs-hallucinations
mccrindlebrian Apr 3, 2024
411b932
change timing logs from dendrite process_time to be timeout
mccrindlebrian Apr 3, 2024
718daae
Merge pull request #187 from opentensor/fix-dendrite-timing
mccrindlebrian Apr 3, 2024
5f463aa
brians refactoring on dendrite
p-ferreira Apr 3, 2024
f826d2b
parallelize streaming with results wrapper
p-ferreira Apr 3, 2024
778620e
Merge branch 'pre-staging' into from-main
p-ferreira Apr 3, 2024
fa29bd4
updates versioning
p-ferreira Apr 3, 2024
f022a0a
adds safe failure clause on forward
p-ferreira Apr 3, 2024
9551663
adapts unit test + fix typo
p-ferreira Apr 4, 2024
65354fe
add doc strings and typing
mccrindlebrian Apr 4, 2024
18f7496
adds timeout on forward pass
p-ferreira Apr 4, 2024
195d011
runs black on validator code
p-ferreira Apr 4, 2024
ddade62
Merge branch 'staging' into pre-staging
p-ferreira Apr 4, 2024
55c0f2f
Merge branch 'pre-staging' into from-main
p-ferreira Apr 4, 2024
802eaad
replace readme for streaming, simplify
mccrindlebrian Apr 4, 2024
83ad35b
fix variable missing from merge
p-ferreira Apr 4, 2024
c5277ed
Merge pull request #190 from opentensor/features/improve-readme-for-s…
mccrindlebrian Apr 4, 2024
ec10276
Merge pull request #182 from opentensor/from-main
mccrindlebrian Apr 4, 2024
0da2f4b
Merge pull request #136 from opentensor/pre-staging
p-ferreira Apr 4, 2024
Merge branch 'main' into pre-staging
p-ferreira committed Mar 21, 2024
commit e8a92dd77bba094f98490ae9c9b3ac5f4d6bd960
5 changes: 1 addition & 4 deletions .github/workflows/python-package.yml
@@ -1,13 +1,10 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: Python package
name: Prompting CI/CD

on:
push:
branches: [ "main", "staging", "pre-staging" ]
pull_request:
branches: [ "main", "staging", "pre-staging" ]

jobs:
build:
147 changes: 147 additions & 0 deletions docs/changelogs/1.2.0.md
@@ -0,0 +1,147 @@
# Release Notes for prompting Version 1.2.0
## Date
2024-03-19

## What's Changed

- Adds utility scripts to facilitate environment setup (new setup script for generic CPU/GPU Ubuntu machines) by @mccrindlebrian in #139
- Refactors validator code to support multiple LLM pipelines by @p-ferreira in #138
- Implements the vLLM pipeline by @p-ferreira in #138
- Adds an [install.sh](../../install.sh) script to facilitate installation by @p-ferreira in #158

## TL;DR

### Results
From a practical standpoint, integrating vLLM yields a significant efficiency boost, ranging from 8.85% to 14.17% on lower- and mid-tier devices such as the A10, A40, and A6000. On higher-tier devices, specifically the A100, the improvement is even more pronounced, with gains between 54.88% and 57.33%. This allows our validators to run up to twice as fast with vLLM, significantly increasing the network's generation throughput.

![Main comparison](imgs/1.2.0_plots_no_gpu_rest.png)


## Known Issues and Solutions

The [vLLM framework](https://github.com/vllm-project/vllm) has rapidly gained popularity since its inception. As of March 14, 2024, despite being barely a year old, the project has 859 open issues on GitHub, indicating that vLLM is still growing quickly and has not yet reached full maturity.

Users of vLLM may encounter specific challenges, including dependency conflicts with other libraries such as bittensor and prompting. To mitigate these issues, **it is highly recommended to set up a new Python environment when installing prompting v1.2.0**. This is a temporary workaround for the current conflicts. If you experience any unexpected behavior with your validator after this update, **please recreate your environment before exploring the known issues listed below or seeking assistance on Discord.**

- **Issue 1: Dependency Conflicts**

There are conflicting dependencies between VLLM and other required libraries:
```bash
The conflict is caused by:
prompting 1.1.2 depends on transformers==4.36.2
vllm 0.3.0 depends on transformers>=4.37.0
bittensor 6.6.0 depends on pydantic!=1.8, !=1.8.1, <2.0.0 and >=1.7.4
vllm 0.3.0 depends on pydantic>=2.0
```

The AnglE Embedding model utilized in our reward stack relies on `transformers==4.36.2`, which is incompatible with VLLM's requirement for `transformers>=4.37.0`. This specific conflict does not directly impact VLLM's functionality for our use case.

**Solution**:
- Follow the [README](../../README.md#installation) instructions to recreate your environment.
- Execute the following commands within your new Python environment:
```bash
pip install pydantic==1.10.7 transformers==4.36.2
```

- **Issue 2: Runtime Error due to Absence of Current Event Loop**

VLLM's dependencies include [uvloop](https://github.com/MagicStack/uvloop), which directly conflicts with bittensor.

**Solution**:

- Recreate your environment as per the [README](../../README.md#installation) guidelines.
- Within your new Python environment, run:

```bash
pip uninstall uvloop
```


- **Issue 3: Manually Specifying GPU Devices for VLLM**

VLLM currently requires manual configuration of the **`CUDA_VISIBLE_DEVICES`** environment variable to specify GPU devices:

```bash
export CUDA_VISIBLE_DEVICES=1,2
python neurons/validator.py ...
```
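
For reference, the same device selection can also be done from Python before any CUDA initialization. This is only an illustrative sketch, not part of the repository's tooling; the placement matters, since the variable must be set before `torch`/`vllm` touch CUDA:

```python
# Illustrative alternative to the shell export above (assumption, not repo code):
# select GPUs from Python before any CUDA initialization.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"  # equivalent to the export above

import torch  # imported only after the variable is set

print(torch.cuda.device_count())  # should report the two visible devices
```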

Relevant discussions and issues:

- [How to specify which GPU to use? vllm-project/vllm#691](https://github.com/vllm-project/vllm/discussions/691)
- [Unable to specify GPU usage in VLLM code vllm-project/vllm#3012](https://github.com/vllm-project/vllm/issues/3012)
- [Specifying GPU for model inference vllm-project/vllm#352](https://github.com/vllm-project/vllm/issues/352)



## Experiment Overview: Benchmarking vLLM vs. Hugging Face Pipelines
This experiment was designed to evaluate the performance differences between the vLLM and Hugging Face (HF) pipelines across various hardware configurations. Using the latest vLLM pipeline implementation from opentensor/prompting (vLLM-test branch), we conducted a comprehensive timing benchmark.

**Dataset:**
For this comparison, we used 100 random QA challenge samples sourced from wandb. Each pipeline processed the same samples so that their performance could be compared directly.

**Machines Used for Testing:**
The benchmark tests were conducted on a range of machines to capture performance across different hardware capabilities:
- Runpod A40
- Runpod A6000
- Runpod A100 SXM4 80GB
- Lambda A10
- Lambda A100 SXM4 40GB

**Testing Procedure:**
The benchmark executed the same set of QA challenges using both the Hugging Face pipeline (as currently implemented in the validator code) and the proposed vLLM validator pipeline. We aimed to compare the timing and GPU resource usage of each pipeline across the specified machines.
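
The benchmark harness itself is not included in this PR; a minimal sketch of the timing approach described above, with hypothetical `hf_generate`/`vllm_generate` callables standing in for the two pipelines, would look roughly like this:

```python
import time
import statistics

def mean_latency(generate_fn, prompts):
    """Run one pipeline over the QA samples and return the average seconds per generation."""
    timings = []
    for prompt in prompts:
        start = time.perf_counter()
        generate_fn(prompt)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)

# qa_samples: the 100 QA challenges pulled from wandb
# results = {
#     "hf_avg_in_secs": mean_latency(hf_generate, qa_samples),
#     "vllm_avg_in_secs": mean_latency(vllm_generate, qa_samples),
# }
```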

**GPU Footprint Analysis:**
An overall examination of the GPU footprint for each model is listed in the table below, highlighting the resource efficiency of each pipeline:
| Machine | Total available memory | zephyr vLLM | zephyr vLLM validator pipeline with 24GB GPU limitation | zephyr 🤗 HF | zephyr 🤗 HF validator pipeline |
| --- | --- | --- | --- | --- | --- |
| Lambda A10 | 23GB | 16.25GB | 15.35GB | OOM | 15GB |
| Lambda A100 | 40.9GB | 34.58GB | 22.31GB | 31.35GB | 15.9GB |
| RunPod A40 | 46GB | 36.89GB | 14.99GB | 31.4GB | 15.3GB |
| RunPod A100 | 81.92GB | 69GB | 15.45GB | 31GB | 15.3GB |
| RunPod A6000 | 49.14GB | 39.74GB | 15.41GB | 39.78GB | 15.8GB |


**Key Differences in Pipeline Configurations:**

A critical distinction between the **Zephyr 🤗 HF** and the **Zephyr 🤗 HF Validator Pipeline** is the latter's optimization for model inference and GPU usage. This optimization is achieved by leveraging **`torch_dtype_float16`**, significantly reducing the memory footprint. Notably, without this optimization, the 🤗 Zephyr model encounters Out of Memory (OOM) exceptions on the A10 hardware.
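
As a rough illustration of that optimization (a sketch only; the model id and `device_map` choice are assumptions, not the validator's exact loading code), loading the Hugging Face pipeline in half precision looks like this:

```python
import torch
from transformers import pipeline

# Default load keeps fp32 weights (roughly the ~31GB figures in the table above):
# pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

# Half-precision load, mirroring the "validator pipeline" configuration:
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",  # assumed model id for the zephyr rows
    torch_dtype=torch.float16,             # roughly halves the weight memory
    device_map="auto",
)
```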

This experiment aimed to furnish a clear, comparative analysis of the vLLM and HF pipelines, with a particular focus on timing performance and GPU efficiency across a variety of computing environments.

## Results
The following analysis presents the impact of different operational configurations on the timing performance of two scenarios: with and without GPU and torch optimizations. Specifically, we compare the efficiency of the vLLM and Hugging Face (🤗 HF) pipelines under these conditions:

- **No GPU/Torch Restrictions**: This scenario involves the minimal setup configuration, where there is no constraint on **`gpu_utilization`** for vLLM and no utilization of **`torch_dtype_float16`** for 🤗 HF.
- **With GPU/Torch Restrictions**: This setup incorporates pipeline code adjustments that restrict **`gpu_utilization`** to 24GB for vLLM and employ **`torch_dtype_float16`** for 🤗 HF, mirroring the current validator implementation.
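
On the vLLM side, the memory cap is expressed through vLLM's `gpu_memory_utilization` argument, which takes a fraction of total GPU memory rather than an absolute size. The sketch below illustrates the idea only; the model id and the 24GB/80GB arithmetic are assumptions, not the repository's exact configuration:

```python
from vllm import LLM

TOTAL_GPU_MEM_GB = 80  # e.g. an A100 SXM4 80GB
MEMORY_CAP_GB = 24     # mirrors the 24GB limitation used in the restricted runs

llm = LLM(
    model="HuggingFaceH4/zephyr-7b-beta",  # assumed model id
    dtype="float16",
    gpu_memory_utilization=MEMORY_CAP_GB / TOTAL_GPU_MEM_GB,  # a fraction, not GB
)
outputs = llm.generate(["What is the capital of France?"])
```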

**No GPU / torch restrictions:**
![No_GPU_restriction_plot](imgs/1.2.0_plots_no_gpu_rest.png)

| Machine | vllm_avg_in_secs | hf_avg_in_secs | % vllm_efficiency_gain |
| --- | --- | --- | --- |
| Lambda A10 | 8.41 | NaN | NaN |
| Lambda A100 | 3.29 | 6.94 | 52.59% |
| RunPod A100 | 2.95 | 6.67 | 55.77% |
| RunPod A40 | 6.75 | 13.05 | 48.28% |
| RunPod A6000 | 9.73 | 14.53 | 33.04% |

**With GPU / torch restrictions:**
![GPU_restriction_plot](imgs/1.2.0_plots_gpu_rest.png)

| Machine | vllm_with_restrictions_avg_in_secs | hf_with_restrictions_avg_in_secs | % vllm_efficiency_gain |
| --- | --- | --- | --- |
| Lambda A10 | 8.34 | 9.15 | 8.85% |
| Lambda A100 | 3.28 | 7.27 | 54.88% |
| RunPod A100 | 2.94 | 6.89 | 57.33% |
| RunPod A40 | 6.75 | 7.83 | 13.79% |
| RunPod A6000 | 9.75 | 11.36 | 14.17% |

Our findings reveal that vLLM's performance remains relatively stable regardless of GPU restrictions, in contrast to the HF pipeline, which shows notable timing improvements across several hardware configurations (A40, A100s, A6000) when **`torch_dtype_float16`** is employed.

From a production standpoint, particularly when vLLM is configured with GPU restrictions aimed at maximizing usage within the 20GB and 24GB limits, we observed significant variation in efficiency gains, as follows:

- **Low-End GPUs with Slow Inference (A10):** We recorded an efficiency gain of 8.85%, highlighting modest improvements in this configuration.
- **Mid-Range GPUs with Slow Inference (A40, A6000):** Efficiency gains in this category ranged from 13.79% to 14.17%, indicating noticeable enhancements in performance.
- **High-End/Mid-Range GPUs with Fast Inference (A100s):** The most substantial efficiency improvements were observed here, with gains ranging from 54.88% to 57.33%.

These results underscore the differential impacts of GPU and torch optimizations on the performance of vLLM and 🤗 HF pipelines, with significant implications for resource allocation and efficiency in production environments.


Binary file added docs/changelogs/imgs/1.2.0_plots_gpu_rest.png
Binary file added docs/changelogs/imgs/1.2.0_plots_no_gpu_rest.png
16 changes: 16 additions & 0 deletions install.sh
@@ -0,0 +1,16 @@
#!/bin/bash

# Installing package from the current directory
pip install -e .

# Uninstalling mathgenerator
pip uninstall mathgenerator -y

# Reinstalling requirements to ensure mathgenerator is installed appropriately
pip install -r requirements.txt

# Uninstalling uvloop to prevent conflicts with bittensor
pip uninstall uvloop -y

# Reinstalling pydantic and transformers with specific versions that work with our repository and vllm
pip install pydantic==1.10.7 transformers==4.36.2
1 change: 1 addition & 0 deletions neurons/miners/test/echo.py
@@ -19,6 +19,7 @@
from prompting.miners import EchoMiner



# This is the main function, which runs the miner.
if __name__ == "__main__":
with EchoMiner() as miner:
1 change: 1 addition & 0 deletions neurons/miners/test/mock.py
@@ -19,6 +19,7 @@
from prompting.miners import MockMiner



# This is the main function, which runs the miner.
if __name__ == "__main__":
with MockMiner() as miner:
1 change: 1 addition & 0 deletions neurons/miners/test/phrase.py
@@ -19,6 +19,7 @@
from prompting.miners import PhraseMiner



# This is the main function, which runs the miner.
if __name__ == "__main__":
with PhraseMiner() as miner:
12 changes: 4 additions & 8 deletions neurons/validator.py
@@ -14,14 +14,11 @@
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.


import time
import torch
import bittensor as bt

from prompting.forward import forward
from prompting.llm import load_pipeline
from prompting.llms import HuggingFacePipeline, vLLMPipeline
from prompting.base.validator import BaseValidatorNeuron
from prompting.rewards import RewardPipeline

@@ -35,11 +32,10 @@ def __init__(self, config=None):
super(Validator, self).__init__(config=config)

bt.logging.info("load_state()")
self.load_state()

self.llm_pipeline = load_pipeline(
self.load_state()
self.llm_pipeline = vLLMPipeline(
model_id=self.config.neuron.model_id,
torch_dtype=torch.bfloat16,
device=self.device,
mock=self.config.mock,
return_streamer=False,
4 changes: 2 additions & 2 deletions prompting/__init__.py
@@ -16,7 +16,7 @@
# DEALINGS IN THE SOFTWARE.

# Define the version of the template module.
__version__ = "1.1.2"
__version__ = "1.2.0"
version_split = __version__.split(".")
__spec_version__ = (
(10000 * int(version_split[0]))
@@ -36,4 +36,4 @@
from . import agent
from . import conversation
from . import dendrite
from . import llm
from .llms import hf
4 changes: 2 additions & 2 deletions prompting/agent.py
@@ -19,15 +19,15 @@
import bittensor as bt
from dataclasses import asdict
from prompting.tasks import Task
from prompting.llm import HuggingFaceLLM
from prompting.llms import HuggingFaceLLM, vLLM_LLM
from prompting.cleaners.cleaner import CleanerPipeline

from prompting.persona import Persona, create_persona

from transformers import Pipeline


class HumanAgent(HuggingFaceLLM):
class HumanAgent(vLLM_LLM):
"Agent that impersonates a human user and makes queries based on its goal."

@property
3 changes: 1 addition & 2 deletions prompting/base/neuron.py
@@ -106,8 +106,7 @@ def forward(self, synapse: bt.Synapse) -> bt.Synapse:
...

@abstractmethod
def run(self):
...
def run(self): ...

def sync(self):
"""
1 change: 1 addition & 0 deletions prompting/cleaners/__init__.py
@@ -0,0 +1 @@
from .cleaner import CleanerPipeline
4 changes: 1 addition & 3 deletions prompting/conversation.py
@@ -18,9 +18,7 @@

def create_task(llm_pipeline: Pipeline, task_name: str) -> Task:
wiki_based_tasks = ["summarization", "qa"]
coding_based_tasks = ["debugging"]
# TODO Add math and date_qa to this structure

coding_based_tasks = ["debugging"]
# TODO: Abstract dataset classes into common dynamic interface
if task_name in wiki_based_tasks:
dataset = WikiDataset()
3 changes: 3 additions & 0 deletions prompting/llms/__init__.py
@@ -0,0 +1,3 @@
from .base_llm import BasePipeline, BaseLLM
from .hf import HuggingFacePipeline, HuggingFaceLLM
from .vllm_llm import vLLM_LLM, vLLMPipeline, load_vllm_pipeline
47 changes: 47 additions & 0 deletions prompting/llms/base_llm.py
@@ -0,0 +1,47 @@
import bittensor as bt
from abc import ABC, abstractmethod
from prompting.cleaners.cleaner import CleanerPipeline
from typing import Any, Dict, List


class BasePipeline(ABC):
@abstractmethod
def __call__(self, composed_prompt: str, **kwargs: dict) -> Any:
...


class BaseLLM(ABC):
def __init__(
self,
llm_pipeline: BasePipeline,
system_prompt: str,
model_kwargs: dict,
):
self.llm_pipeline = llm_pipeline
self.system_prompt = system_prompt
self.model_kwargs = model_kwargs
self.messages = []
self.times = []

def query(
self,
message: str,
role: str = "user",
disregard_system_prompt: bool = False,
cleaner: CleanerPipeline = None,
) -> str:
...

def forward(self, messages: List[Dict[str, str]]):
...

def clean_response(self, cleaner: CleanerPipeline, response: str) -> str:
if cleaner is not None:
clean_response = cleaner.apply(generation=response)
if clean_response != response:
bt.logging.debug(
f"Response cleaned, chars removed: {len(response) - len(clean_response)}..."
)

return clean_response
return response
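
To make the new interface concrete, here is a hypothetical minimal subclass pair built on the `BasePipeline`/`BaseLLM` abstractions above (illustration only; `EchoPipeline`/`EchoLLM` are not part of the repository, which instead provides `HuggingFacePipeline`/`HuggingFaceLLM` and `vLLMPipeline`/`vLLM_LLM`):

```python
# Hypothetical example (not repository code): a minimal pipeline/LLM pair
# implementing the BasePipeline/BaseLLM interface introduced above.
from prompting.llms.base_llm import BasePipeline, BaseLLM


class EchoPipeline(BasePipeline):
    def __call__(self, composed_prompt: str, **kwargs: dict) -> str:
        # A real pipeline would run model inference here.
        return f"echo: {composed_prompt}"


class EchoLLM(BaseLLM):
    def query(self, message: str, role: str = "user", disregard_system_prompt: bool = False, cleaner=None) -> str:
        messages = [] if disregard_system_prompt else [{"role": "system", "content": self.system_prompt}]
        messages += self.messages + [{"role": role, "content": message}]
        response = self.forward(messages)
        return self.clean_response(cleaner, response)

    def forward(self, messages):
        composed_prompt = "\n".join(m["content"] for m in messages)
        return self.llm_pipeline(composed_prompt, **self.model_kwargs)


llm = EchoLLM(EchoPipeline(), system_prompt="You are a helpful assistant.", model_kwargs={})
print(llm.query("hello"))  # -> "echo: You are a helpful assistant.\nhello"
```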