
Update embed-jobs-api.mdx (#231)
* Update embed-jobs-api.mdx

Update code examples for text-embedding part

Signed-off-by: Max Shkutnyk <[email protected]>

* update code examples

* fix code example alignment

---------

Signed-off-by: Max Shkutnyk <[email protected]>
Co-authored-by: Max Shkutnyk <[email protected]>
Co-authored-by: trentfowlercohere <[email protected]>
3 people authored Nov 1, 2024
1 parent a8cdf86 commit 11660ac
21 changes: 10 additions & 11 deletions fern/pages/v2/text-embeddings/embed-jobs-api.mdx
@@ -29,7 +29,7 @@ The Embed Jobs API works in conjunction with the Embed API; in production use-ca
![](../../../assets/images/0826a69-image.png)
### Constructing a Dataset for Embed Jobs

- To create a dataset for Embed Jobs, you will need to specify the `embedding_types`, and you need to set `dataset_type` as `embed-input`. The schema of the file looks like: `text:string`.
+ To create a dataset for Embed Jobs, you will need to set dataset `type` as `embed-input`. The schema of the file looks like: `text:string`.

The Embed Jobs and Dataset APIs respect metadata through two fields: `keep_fields` and `optional_fields`. During the `create dataset` step, you can specify either `keep_fields` or `optional_fields`, each a list of strings corresponding to the metadata fields you’d like to preserve. `keep_fields` is more restrictive, since validation will fail if a listed field is missing from an entry. `optional_fields`, by contrast, will skip missing fields and allow validation to pass.
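To make the schema concrete, here is a hypothetical sketch of building such a JSONL file. The metadata field names (`wiki_id`, `url`, `views`, `title`, `langs`) mirror the sample dataset used in the snippets below; the row values are invented for illustration.

```python PYTHON
# Hypothetical embed-input file: every row needs a `text` field; any other
# fields are metadata that keep_fields / optional_fields can preserve.
import json

rows = [
    {
        "text": "Ottawa is the capital city of Canada.",
        "wiki_id": 101,
        "url": "https://en.wikipedia.org/wiki/Ottawa",
        "views": 5000,
        "title": "Ottawa",
        "langs": 12,
    },
    # `langs` is omitted below: validation still passes if it is listed in
    # optional_fields, but would fail if it were listed in keep_fields.
    {
        "text": "The CN Tower is a landmark in Toronto.",
        "wiki_id": 102,
        "url": "https://en.wikipedia.org/wiki/CN_Tower",
        "views": 7000,
        "title": "CN Tower",
    },
]

with open("embed_jobs_sample_data.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```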

@@ -66,10 +66,9 @@ ds=co.datasets.create(
    name='sample_file',
    # insert your file path here - you can upload it on the right - we accept .csv and jsonl files
    data=open('embed_jobs_sample_data.jsonl', 'rb'),
-   keep_fields=['wiki_id','url','views','title']
-   optional_fields=['langs']
-   dataset_type="embed-input",
-   embedding_types=['float']
+   keep_fields=['wiki_id','url','views','title'],
+   optional_fields=['langs'],
+   type="embed-input"
)

# wait for the dataset to finish validation
@@ -89,7 +88,7 @@ co = cohere.ClientV2(api_key="<YOUR API KEY>")
input_dataset=co.datasets.create(
    name='your_file_name',
    data=open('/content/your_file_path', 'rb'),
-   dataset_type="embed-input"
+   type="embed-input"
)

# block on server-side validation
@@ -115,15 +114,15 @@ If your dataset hits a validation error, please refer to the dataset validation
Your dataset is now ready to be embedded. Here's a code snippet illustrating what that looks like:

```python PYTHON
- embed_job = co.embed_jobs.create(
+ embed_job_response = co.embed_jobs.create(
      dataset_id=input_dataset.id,
      input_type='search_document',
      model='embed-english-v3.0',
      embedding_types=['float'],
      truncate='END')

# block until the job is complete
- co.wait(embed_job)
+ embed_job = co.wait(embed_job_response)
```

Since we’d like to search over these embeddings and we can think of them as constituting our knowledge base, we set `input_type='search_document'`.
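For contrast, a query against this knowledge base would be embedded with `input_type='search_query'` so that it lands in the same vector space as the documents. A minimal sketch using the same v2 client (the query text is invented for illustration):

```python PYTHON
# Embed a user query with input_type='search_query' to pair with documents
# that were embedded with input_type='search_document'.
query_response = co.embed(
    texts=["What is the capital of Canada?"],
    model="embed-english-v3.0",
    input_type="search_query",
    embedding_types=["float"],
)
query_embedding = query_response.embeddings.float_[0]
```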
@@ -133,14 +132,14 @@ Since we’d like to search over these embeddings and we can think of them as co
The output of embed jobs is a dataset object which you can download or pipe directly to a database of your choice:

```python PYTHON
- output_dataset=co.datasets.get(id=embed_job.output.id)
+ output_dataset=co.datasets.get(id=embed_job.output_dataset_id)
# pass the dataset object explicitly when saving it to disk
co.utils.save_dataset(dataset=output_dataset, filepath='/content/embed_job_output.csv', format="csv")
```

Alternatively if you would like to pass the dataset into a downstream function you can do the following:

```python PYTHON
- output_dataset=co.datasets.get(id=embed_job.output.id)
+ output_dataset=co.datasets.get(id=embed_job.output_dataset_id)
results=[]
for record in output_dataset:
    results.append(record)
```
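From here, `results` can be piped into a vector database or searched directly. As a rough sketch only, assuming each record exposes its vector under `record['embeddings']['float']` (check the actual record schema of your output dataset), a brute-force cosine-similarity search might look like:

```python PYTHON
# Assumption: each output record stores its embedding at
# record["embeddings"]["float"]; adjust to the real record schema.
import numpy as np

doc_vectors = np.array([r["embeddings"]["float"] for r in results])

def top_k(query_embedding, k=3):
    # rank documents by cosine similarity to the query embedding
    q = np.asarray(query_embedding)
    sims = doc_vectors @ q
    sims = sims / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)[:k]
```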
