* Create a readme for docs, add script to convert to webp
* Create webp versions of the files
* Update conversion script and readme
* Add webnav figures
* Add lazy loading and alts for images
* Update theme-changing button to have a name
* Make webnav.demo smaller
* Embed fonts
<img loading="lazy" alt="Example of ai tasks" src="{{ '/assets/images/examples/ai.1.oyuiubm.webp' | relative_url }}" width="24%" height="auto">
<img loading="lazy" alt="Example of booking tasks" src="{{ '/assets/images/examples/booking.1.pxtuocd.webp' | relative_url }}" width="24%" height="auto">
<img loading="lazy" alt="Example of composing tasks" src="{{ '/assets/images/examples/composing.1.tbtnzql.webp' | relative_url }}" width="24%" height="auto">
<img loading="lazy" alt="Example of lookup tasks" src="{{ '/assets/images/examples/lookup.1.zbrxcee.webp' | relative_url }}" width="24%" height="auto">
<img loading="lazy" alt="Example of productivity tasks" src="{{ '/assets/images/examples/productivity.1.ytcgitj.webp' | relative_url }}" width="24%" height="auto">
<img loading="lazy" alt="Example of shopping tasks" src="{{ '/assets/images/examples/shopping.1.wbamufj.webp' | relative_url }}" width="24%" height="auto">
<img loading="lazy" alt="Example of social tasks" src="{{ '/assets/images/examples/social.1.xmrqcyz.webp' | relative_url }}" width="24%" height="auto">
<img loading="lazy" alt="Example of summarizing tasks" src="{{ '/assets/images/examples/summarizing.1.bctdmtt.webp' | relative_url }}" width="24%" height="auto">
## What is *conversational web navigation*?

We propose the problem of *conversational web navigation*, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. To accomplish this, agents can learn from expert demonstrations, as shown below:
<!-- Here, there should be a 3-column layout with the following content:
2. The action and conversation history (preferably with nice text formatting)
3. The screenshot -->
## Can we download WebLINX now?

__[You can find our dataset on Huggingface Datasets](https://huggingface.co/datasets/McGill-NLP/weblinx)__
## What if I want to download the raw data (HTML, screenshots, etc.)?

If you are interested in the full data, the easiest way to download the raw dataset is to use the `huggingface_hub` library with `snapshot_download`. We show you how in the [docs' prerequisites section]({{'/docs/#prerequisites' | relative_url }}).
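As a minimal sketch of that approach: `snapshot_download` is the real `huggingface_hub` function and the repository id matches the dataset link above, but the helper name and `local_dir` default below are illustrative assumptions, not part of the project's code.

```python
# Sketch: download the raw WebLINX dataset with huggingface_hub.
# The helper name and local_dir default are illustrative assumptions.
def download_weblinx_raw(local_dir: str = "./wl_data") -> str:
    # Lazy import so the helper can be defined without the package installed.
    from huggingface_hub import snapshot_download

    # repo_type="dataset" targets the dataset repo rather than a model repo.
    return snapshot_download(
        repo_id="McGill-NLP/weblinx",
        repo_type="dataset",
        local_dir=local_dir,
    )

# Usage (downloads many gigabytes; run deliberately):
# path = download_weblinx_raw()
```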
You can install the library with `pip install weblinx`.
Please take a look at the [library documentation]({{'/docs/' | relative_url }}) for more information on how to use it.
## How can we use WebLINX to train agents?

Our agent is composed of two main components: a __Dense Markup Ranker (DMR)__ and an __action model__.
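Conceptually, the two components compose as a rank-then-generate pipeline: the DMR scores candidate DOM elements against the dialogue context, and the action model only sees the top-ranked candidates. The sketch below is ours, not the project's code; `rank` and `generate` are stand-ins for the two models.

```python
# Conceptual two-stage agent step (illustrative, not the project's code):
# rank(element, context) -> relevance score   (plays the role of DMR)
# generate(context, candidates) -> action str (plays the role of the action model)
def agent_step(elements, context, rank, generate, k=10):
    # Stage 1: rank all candidate markup elements against the dialogue context.
    scored = sorted(elements, key=lambda el: rank(el, context), reverse=True)
    # Stage 2: the action model conditions only on the top-k elements.
    return generate(context, scored[:k])
```

The point of the split is that the action model never has to read the full page: the ranker prunes the markup first.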
<!-- There should be a card of 5 models here (MindAct, Pix2Act, Fuyu-8B, LLaMA-13B, GPT-4V) with links to the original papers of those models. -->
## Where can we find the finetuned models?

We provide the weights for the models we finetuned. You can [access them on Huggingface Hub](https://huggingface.co/collections/McGill-NLP/weblinx-models-65c57d4afeeb282d1dcf8434). We will share [code to reproduce our experiments on our GitHub repository](https://github.com/mcgill-nlp/weblinx). Please note that they were finetuned for research purposes, so they are not production-ready.
## How do we use the agent to control browsers?

Our `weblinx` library lets you convert HTML into a format that can be consumed by DMR or by an action model; it can also parse valid model outputs into a dictionary that can be converted into browser commands.
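To illustrate the parsing step only: the toy parser below turns a model output string into a command dictionary. It is not `weblinx`'s parser, and the `click(uid="…")`-style action format it assumes is an illustration, not the library's exact specification.

```python
import re

# Toy parser (illustrative only; weblinx ships its own parsing utilities).
# Assumed action format: intent(key="value", key="value", ...)
def parse_action(output: str) -> dict:
    match = re.fullmatch(r'(\w+)\((.*)\)', output.strip())
    if match is None:
        raise ValueError(f"Unparseable action: {output!r}")
    intent, arg_str = match.groups()
    # Collect the comma-separated key="value" arguments into a dict.
    args = dict(re.findall(r'(\w+)="([^"]*)"', arg_str))
    return {"intent": intent, "args": args}

# parse_action('click(uid="abc123")')
# -> {"intent": "click", "args": {"uid": "abc123"}}
```

A dictionary like this is what gets translated into an actual browser command in the next step.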
You will need Selenium or Puppeteer to control the browser (take screenshots, grab HTML, insert unique IDs, execute actions from the dictionary); you can [learn Selenium here](https://www.selenium.dev/documentation/webdriver/getting_started/).
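A sketch of that browser side with Selenium, under stated assumptions: the action-dictionary format and both helper names are ours, and `run_step` requires a local chromedriver; only `page_source`, `save_screenshot`, `find_element`, and `click` are real Selenium APIs.

```python
def locator_for(action: dict) -> tuple:
    # Map an action dictionary (format is an illustrative assumption)
    # to a (strategy, value) pair accepted by Selenium's find_element.
    if action["intent"] == "click":
        return ("css selector", action["args"]["css"])
    raise ValueError(f"Unsupported intent: {action['intent']}")

def run_step(url: str, action: dict) -> None:
    # One browser step; needs selenium installed and a local chromedriver.
    from selenium import webdriver  # lazy import: only needed when run

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        html = driver.page_source           # grab HTML for the ranker
        driver.save_screenshot("page.png")  # screenshot for vision models
        by, value = locator_for(action)
        driver.find_element(by, value).click()  # execute the action
    finally:
        driver.quit()

# Usage (opens a real browser window):
# run_step("https://example.com", {"intent": "click", "args": {"css": "a"}})
```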
## How do we cite WebLINX?

If you use our dataset, code, or models, please cite WebLINX with the following BibTeX entry: