[Programming question]: #238

haesleinhuepf · 2025-02-19T13:08:26Z

I need a Jupyter notebook that does the following to determine the copyright status of materials used in a slide deck saved as a PDF.

Opens a PDF file, extracts the text page by page, and stores the URLs found on each slide in a list.
Goes through the pages and their corresponding URLs, loads the websites, and extracts the text from them.
For each page, the text is sent to a prompt(message: str) -> str function. The goal of this LLM-based prompt function is to identify the license of the page.
Returns a list of licenses per page.
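
To make the first two steps concrete, here is a minimal sketch of the per-page text and URL extraction. It assumes the pypdf package for reading the PDF; the function names and the URL pattern are hypothetical and only illustrate the idea. URLs that are split across line breaks in the extracted text, or that exist only as clickable link annotations, would not be found this way.

```python
import re

# Hypothetical pattern: http(s) URLs up to whitespace, a quote, or a closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_page_texts(pdf_path):
    """Return one text string per PDF page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # lazy import so the URL helpers work without pypdf

    reader = PdfReader(pdf_path)
    # extract_text() may yield nothing for image-only slides
    return [page.extract_text() or "" for page in reader.pages]


def find_urls(text):
    """Return the URLs found in a page's text, in order, without duplicates."""
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls


def urls_per_page(page_texts):
    """Map every page's text to its list of URLs."""
    return [find_urls(text) for text in page_texts]
```

Calling `urls_per_page(extract_page_texts("slides.pdf"))` would then give one list of URLs per slide, where `slides.pdf` is a placeholder file name.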

git-bob try to do this


github-actions · 2025-02-19T13:12:34Z

This message was generated by git-bob (version: 0.19.0, model: claude-3-5-sonnet-20241022), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

I created a Jupyter notebook that analyzes PDF files to determine copyright information of materials used in slide decks. The notebook uses various Python libraries to extract text from PDFs and associated URLs, and then processes this information through an LLM-based prompt function to identify licenses for each page. I also generated a sample PDF file for testing and created a CSV file to store the license analysis results. Additionally, I updated the notebook's dependencies to include the fpdf2 library for improved PDF handling.
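
The license-identification step described above could look roughly like the following sketch. It is not the code from the generated notebook: the helper names, the prompt wording, and the `max_chars` cutoff are assumptions made for illustration. The only thing taken from the request is that `prompt` is a function accepting a string and returning a string, so it is passed in as a parameter here.

```python
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Reduce an HTML document to its visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_text(url, timeout=10):
    """Download a page and return its visible text (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)


def licenses_per_page(page_urls, prompt, fetch=fetch_text, max_chars=4000):
    """Return, for each page, one license answer per URL on that page."""
    results = []
    for urls in page_urls:
        page_licenses = []
        for url in urls:
            try:
                text = fetch(url)
            except Exception as error:  # keep going if one site is unreachable
                page_licenses.append(f"unknown (could not load: {error})")
                continue
            # Hypothetical prompt wording; truncated to keep the message short.
            message = (
                "Identify the license of the following web page. "
                "Answer with the license name only, or 'unknown'.\n\n"
                + text[:max_chars]
            )
            page_licenses.append(prompt(message).strip())
        results.append(page_licenses)
    return results
```

Passing `fetch` as a parameter makes it possible to try the loop with a stub instead of real network requests. The answers are only as reliable as the LLM behind `prompt`, so they should be checked by hand.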

sample.pdf: Created a sample PDF file for testing the license analysis functionality.
license_analysis.csv: Generated a CSV file containing the license analysis results for each page.
pdf_license_analyzer.ipynb: Created a Jupyter notebook that extracts and analyzes license information from PDF files and their associated URLs.
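
For reference, writing per-page results to a CSV file can be sketched as follows. The column names `page` and `license` and the function name are assumptions; the actual layout of license_analysis.csv is not shown in this thread.

```python
import csv


def save_license_table(page_licenses, csv_path):
    """Write one row per (page, license) pair; pages are numbered from 1."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["page", "license"])  # hypothetical column names
        for page_number, licenses in enumerate(page_licenses, start=1):
            # pages without any identified license still get a row
            for license_name in licenses or ["unknown"]:
                writer.writerow([page_number, license_name])
```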
