Commit a7c4777

copyedit vignette Structured data: Examples to end (#374)
1 parent 9fb73c9 commit a7c4777

1 file changed: +10 -11 lines changed

vignettes/structured-data.Rmd

+10 -11
@@ -107,7 +107,7 @@ Now we'll dive into some examples before coming back to talk more about the deta
 
 ## Examples
 
-The following examples are [closely inspired by the Claude documentation](https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/extracting_structured_json.ipynb) and hint at some of the ways you can use structured data extraction.
+The following examples, which are [closely inspired by the Claude documentation](https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/extracting_structured_json.ipynb), hint at some of the ways you can use structured data extraction.
 
 ### Example 1: Article summarisation
 
@@ -177,7 +177,7 @@ chat <- chat_openai()
 str(chat$extract_data(text, type = type_sentiment))
 ```
 
-Note that we've asked nicely for the scores to sum 1, and they do in this example (at least when I ran the code), but it's not guaranteed.
+Note that while we've asked nicely for the scores to sum 1, which they do in this example (at least when I ran the code), this is not guaranteed.
 
 ### Example 4: Text classification
 
@@ -231,14 +231,13 @@ chat <- chat_claude()
 str(chat$extract_data(prompt, type = type_characteristics))
 ```
 
-This examples only works with Claude, not GPT or Gemini, because only Claude
-supports adding arbitrary additional properties.
+This example only works with Claude, not GPT or Gemini, because only Claude supports adding additional, arbitrary properties.
 
 ### Example 6: Extracting data from an image
 
-This example comes from [Dan Nguyen](https://gist.github.com/dannguyen/faaa56cebf30ad51108a9fe4f8db36d8) and you can see other interesting applications at that link. The goal is to extract structured data from this screenshot:
+The final example comes from [Dan Nguyen](https://gist.github.com/dannguyen/faaa56cebf30ad51108a9fe4f8db36d8) (you can see other interesting applications at that link). The goal is to extract structured data from this screenshot:
 
-![A screenshot of schedule A: a table showing assets and "unearned" income](congressional-assets.png)
+![Screenshot of schedule A: a table showing assets and "unearned" income](congressional-assets.png)
 
 Even without any descriptions, ChatGPT does pretty well:
 
@@ -298,9 +297,9 @@ chat$extract_data(prompt, type = type_article)
 str(data)
 ```
 
-Note that I've used more of an explict prompt here. For this example, I found that this generated better results, and it's a useful place to put additional instructions.
+Note that I've used more of an explict prompt here. For this example, I found that this generated better results and that it's a useful place to put additional instructions.
 
-If let the LLM know that the fields are all optional, it'll instead return `NULL` for the missing fields:
+If I let the LLM know that the fields are all optional, it'll return `NULL` for the missing fields:
 
 ```{r}
 #| label: type-optional
@@ -315,7 +314,7 @@ chat$extract_data(prompt, type = type_article)
 
 ### Data frames
 
-If you want to define a data frame like object, you might be tempted to create a definition similar to what R uses: an object (i.e. a named list) containing multiple vectors (i.e. arrays):
+If you want to define a data frame like object, you might be tempted to create a definition similar to what R uses: an object (i.e., a named list) containing multiple vectors (i.e., an array):
 
 ```{r}
 #| cache: false
@@ -327,7 +326,7 @@ type_my_df <- type_object(
 )
 ```
 
-This however, is not quite right becuase there's no way to specify that each array should have the same length. Instead you need to turn the data structure "inside out", and instead create an array of objects:
+This, however, is not quite right becuase there's no way to specify that each array should have the same length. Instead, you'll need to turn the data structure "inside out" and create an array of objects:
 
 ```{r}
 #| cache: false
@@ -341,7 +340,7 @@ type_my_df <- type_array(
 )
 ```
 
-If you're familiar with the terms between row-oriented and column-oriented data frames, this is the same idea. Since most language don't possess vectorisation like R, row-oriented structures tend to be much more common in the wild.
+If you're familiar with the terms row-oriented and column-oriented data frames, this is the same idea. Since most languages don't possess vectorisation like R, row-oriented structures tend to be much more common in the wild.
 
 ## Token usage
 