filter out null content before embedding (#498)
edknv authored Feb 28, 2025
1 parent b4e389e commit 31e1907
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/nv_ingest/stages/embeddings/text_embeddings.py
@@ -303,7 +303,7 @@ def _generate_text_embeddings_df(

 # Extract content from metadata and filter out rows with empty content.
 extracted_content = df.loc[content_mask, "metadata"].apply(content_getter)
-non_empty_mask = extracted_content.str.strip() != ""
+non_empty_mask = extracted_content.notna() & (extracted_content.str.strip() != "")
 final_mask = content_mask & non_empty_mask
 if not final_mask.any():
     continue
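
A minimal sketch of why the added notna() check matters, assuming content_getter can return None for some rows (the commit title suggests this, but the sample data below is made up for illustration). On an object-dtype Series, .str.strip() leaves missing values missing, and comparing a missing value with != "" evaluates to True, so null rows would pass the old mask as if they were non-empty.

```python
import pandas as pd

# Hypothetical stand-in for the extracted content column; None models a
# row whose metadata carried no content.
extracted_content = pd.Series(["hello", None, "   ", ""], dtype=object)

# Old mask: the missing value compares as "not equal to ''", so it passes.
old_mask = extracted_content.str.strip() != ""

# New mask from the commit: require a non-null value before the strip check.
new_mask = extracted_content.notna() & (extracted_content.str.strip() != "")

print(old_mask.tolist())
print(new_mask.tolist())
```

Under these assumptions the old mask keeps the None row, while the new mask drops it along with the whitespace-only and empty rows.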