vignettes/structured-data.Rmd
+10 -11
@@ -107,7 +107,7 @@ Now we'll dive into some examples before coming back to talk more about the deta
 
 ## Examples
 
-The following examples are [closely inspired by the Claude documentation](https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/extracting_structured_json.ipynb) and hint at some of the ways you can use structured data extraction.
+The following examples, which are [closely inspired by the Claude documentation](https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/extracting_structured_json.ipynb), hint at some of the ways you can use structured data extraction.
 
 ### Example 1: Article summarisation
 
@@ -177,7 +177,7 @@ chat <- chat_openai()
 str(chat$extract_data(text, type = type_sentiment))
 ```
 
-Note that we've asked nicely for the scores to sum 1, and they do in this example (at least when I ran the code), but it's not guaranteed.
+Note that while we've asked nicely for the scores to sum 1, which they do in this example (at least when I ran the code), this is not guaranteed.
 
 ### Example 4: Text classification
 
@@ -231,14 +231,13 @@ chat <- chat_claude()
 str(chat$extract_data(prompt, type = type_characteristics))
 ```
 
-This examples only works with Claude, not GPT or Gemini, because only Claude
-supports adding arbitrary additional properties.
+This example only works with Claude, not GPT or Gemini, because only Claude supports adding additional, arbitrary properties.
 
 ### Example 6: Extracting data from an image
 
-This example comes from [Dan Nguyen](https://gist.github.com/dannguyen/faaa56cebf30ad51108a9fe4f8db36d8) and you can see other interesting applications at that link. The goal is to extract structured data from this screenshot:
+The final example comes from [Dan Nguyen](https://gist.github.com/dannguyen/faaa56cebf30ad51108a9fe4f8db36d8) (you can see other interesting applications at that link). The goal is to extract structured data from this screenshot:
 
-
+
 
 Even without any descriptions, ChatGPT does pretty well:
 
@@ -298,9 +297,9 @@ chat$extract_data(prompt, type = type_article)
 str(data)
 ```
 
-Note that I've used more of an explict prompt here. For this example, I found that this generated better results, and it's a useful place to put additional instructions.
+Note that I've used more of an explict prompt here. For this example, I found that this generated better results and that it's a useful place to put additional instructions.
 
-If let the LLM know that the fields are all optional, it'll instead return `NULL` for the missing fields:
+If I let the LLM know that the fields are all optional, it'll return `NULL` for the missing fields:
 
 ```{r}
 #| label: type-optional
@@ -315,7 +314,7 @@ chat$extract_data(prompt, type = type_article)
 
 ### Data frames
 
-If you want to define a data frame like object, you might be tempted to create a definition similar to what R uses: an object (i.e. a named list) containing multiple vectors (i.e. arrays):
+If you want to define a data frame like object, you might be tempted to create a definition similar to what R uses: an object (i.e., a named list) containing multiple vectors (i.e., an array):
 
 ```{r}
 #| cache: false
@@ -327,7 +326,7 @@ type_my_df <- type_object(
 )
 ```
 
-This however, is not quite right becuase there's no way to specify that each array should have the same length. Instead you need to turn the data structure "inside out", and instead create an array of objects:
+This, however, is not quite right becuase there's no way to specify that each array should have the same length. Instead, you'll need to turn the data structure "inside out" and create an array of objects:
 
 ```{r}
 #| cache: false
@@ -341,7 +340,7 @@ type_my_df <- type_array(
 )
 ```
 
-If you're familiar with the terms between row-oriented and column-oriented data frames, this is the same idea. Since most language don't possess vectorisation like R, row-oriented structures tend to be much more common in the wild.
+If you're familiar with the terms row-oriented and column-oriented data frames, this is the same idea. Since most languages don't possess vectorisation like R, row-oriented structures tend to be much more common in the wild.