* change type hint to be compatible with older pythons
* update a few instructions, including flash attention
* update title
* ignore more
* Remove reference to webtasks, make uid_key generic (instead of hardcoding "data-webtasks-id")
* ignore checkpoints
* Fix reference to candidates.path
* Fix naming in splits.json data file
* improve command line for weblinx.eval CLI
* Update modeling instructions
* Update main readme with examples
* Final readme update
README.md (+3 -1)
@@ -75,7 +75,9 @@ To run the automatic evaluation, you can use the following command:
 python -m weblinx.eval --help
 ```
 
-Note: We are still working on the code for `weblinx.eval` and `weblinx.processing.outputs`. If you have any questions or would like to contribute docs, please feel free to open an issue or a pull request.
+For more examples on how to use `weblinx.eval`, take a look at the [modeling README](./modeling/README.md).
+
+> Note: We are still working on the code for `weblinx.eval` and `weblinx.processing.outputs`. If you have any questions or would like to contribute docs, please feel free to open an issue or a pull request.
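Put together, a hypothetical invocation of the `weblinx.eval` CLI might look like the following. The `-b` and `-d` flags are described later in this diff; the directory paths are assumptions based on the defaults mentioned there, so adjust them to your layout:

```bash
# -b: base directory containing the demonstrations (assumed path)
# -d: directory containing model results to score (assumed path)
python -m weblinx.eval -b ./wl_data/demonstrations -d ./results
```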
 The following instructions assume you are running from this directory (you may need to `cd` to this directory).
 
-### Download Candidates
+### Download Data
 
-First, you need to download the `train.jsonl` candidate selected by `McGill-NLP/MiniLM-L6-DMR`:
+First, you need to download the `splits.json` file containing information about all the splits, as well as the `train.jsonl` candidate selected by `McGill-NLP/MiniLM-L6-DMR`:
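One way to fetch these files is via `huggingface_hub`. The repo id and file layout below are assumptions inferred from the paths mentioned in this diff; verify them against the project README before running:

```python
from huggingface_hub import snapshot_download

# Assumed dataset repo and file layout -- check the WebLINX README for the exact names
snapshot_download(
    repo_id="McGill-NLP/WebLINX",
    repo_type="dataset",
    allow_patterns=["splits.json", "candidates/train.jsonl"],
    local_dir="./wl_data",
)
```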
-The default configs (`config.yml`) assume that the `train.jsonl` is located at `./candidates/train.jsonl`. If you want to change the path, you need to modify the `config.yml` accordingly.
+The default configs (`llama/conf/config.yml`) assume that the `train.jsonl` is located at `./wl_data/candidates/train.jsonl`. If you want to change the path, you need to modify the `config.yml` accordingly.
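The relevant entry might look roughly like this. This is a hypothetical excerpt: the commit log above mentions a `candidates.path` key, but the actual structure of `llama/conf/config.yml` may differ, so check the file itself:

```yaml
# Hypothetical excerpt of llama/conf/config.yml -- key names may differ
candidates:
  path: ./wl_data/candidates/train.jsonl
```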
 
 ### Set `WEBLINX_PROJECT_DIR`
 
@@ -57,7 +50,54 @@ You need to install the dependencies by running the following command:
 pip install -r requirements.txt
 ```
 
-### Action Model: LLaMA
+However, due to `flash-attention` requiring `torch` to be pre-installed, it has to be installed right after everything else has been installed:
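A typical install sequence under that constraint might look like the following. The `--no-build-isolation` flag is the commonly documented way to build `flash-attn` against an already-installed `torch`; check the flash-attention README for the exact requirements of your CUDA setup:

```bash
# Install everything else first (this brings in torch)
pip install -r requirements.txt

# Then build flash-attn against the torch that is now present
pip install flash-attn --no-build-isolation
```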
Results will be saved in `./results` and checkpoints in `./checkpoints`.
-
-#### Evaluate DMR
-
-You need to specify which `eval.split` you want to evaluate on. For example, to evaluate on the `iid` split, you can run the following command:
-
-```bash
-export CUDA_VISIBLE_DEVICES="0"  # Set the GPU device you want to use
-
-# On just one
-python -m dmr.eval eval.split=valid
+In this case, `-b` is the base directory for the demonstrations, and `-d` is the directory containing the results (generated above by the `llama.eval` script). This will automatically run the evaluation metrics and save the results in the `results/aggregated_scores.json` file. If you are only interested in the overall score for a split (e.g. `valid`), you can look for the following entry in the aggregated score file (as an example):
Behind the scenes, this uses the `weblinx.eval.auto_eval_and_save` function to run the evaluation metrics. If you want more control, you can call `weblinx.eval.auto_eval_and_save` directly; for an example, check out `weblinx/eval/__main__.py`.
Note that it might be slow the first time you run it, because it reads a lot of demonstrations and loads millions of files. However, a demo-level cache is automatically created (see `./.cache/demonstrations`), so subsequent runs should be much faster.
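The demo-level caching pattern described above can be sketched as follows. This is a hypothetical illustration (the `load_demo` function and the pickle format are assumptions, not weblinx's actual implementation): each demonstration directory is parsed once, then served from a single cached file on later runs.

```python
import json
import pickle
from pathlib import Path

CACHE_DIR = Path("./.cache/demonstrations")


def load_demo(demo_dir: Path) -> dict:
    """Load one demonstration, with a per-demo pickle cache.

    Hypothetical sketch of demo-level caching; weblinx's actual
    cache format and layout may differ.
    """
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file = CACHE_DIR / f"{demo_dir.name}.pkl"
    if cache_file.exists():
        # Fast path: reuse the previously parsed demonstration
        return pickle.loads(cache_file.read_bytes())
    # Slow path: parse the many small JSON files that make up a demo
    demo = {p.name: json.loads(p.read_text()) for p in sorted(demo_dir.glob("*.json"))}
    cache_file.write_bytes(pickle.dumps(demo))
    return demo
```

The first call pays the parsing cost; every later call for the same demo reads one pickle instead of many small files, which is where the speedup comes from.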