JustTheInstruction Chrome Extension

💡 Project Description

Isn't it the worst when you find a recipe online, but need to scroll down for what feels like hours to find the actual instructions? We too have been bothered by this issue and began to think of a solution. What if we could use AI to extract JustTheInstructions from any recipe, arts & crafts, or other DIY website?

Well, we built and trained an AI model to do just that, and the power of this tool is one click away thanks to our Google Chrome Extension!

Project Objectives

Build a binary text classification model using TensorFlow.
Scrape training data that covers a variety of instruction categories.
Develop a Google Chrome extension that automatically reads an entire website and promptly provides the user with JustTheInstructions.

💻 Tech Stack

Frontend Framework:

Backend Framework:

Data Processing:

Hosting & Deployment Tools:

🔧 Model Architecture & Training the Model

Note: We built and trained our model on Google Colab!

We decided to build an LSTM neural network using TensorFlow. The model performs binary text classification: it predicts whether a piece of text is "instructions" or "not instructions". Click the following link to see the documented process: https://colab.research.google.com/drive/1nkqleu9FP2pN5D40q1NK_xuyOvsKG7vy?usp=sharing

We used a variety of sources to collect training data for our model. The categories of sites we wanted the model to recognize include recipes, crafts, circuits, and other DIY projects.

While many entries were collected from a public Kaggle dataset, we also scraped our own data using the Python library Beautiful Soup. We were able to scrape over 1000 unique entries for each of the categories mentioned above! To learn more about how we did this, click the link to view our Google Colab file that documents the data scraping process: https://colab.research.google.com/drive/1k1D4zRW0nFicjkS-KqtCVW3y4mn8qSJR?usp=sharing

🔍 Using the Model

You can find the isolate function in the isolate.py file located in the `api` directory.

Given plaintext, we use the model to identify only the section of the text that contains procedural writing (instructions) and isolate it from the surrounding irrelevant text. We first split the website's plaintext into individual sentences. We then step through the sentences until the model predicts a value greater than our experimentally determined threshold, which marks the start of the instructions section. The isolate function then keeps appending sentences to the section, having the model make a new prediction each time. When the prediction drops significantly (below the prediction for the previous section minus an experimentally determined buffer value), this marks the end of the instructions section. This section, consisting of JustTheInstructions, is then returned to the user.
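
The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in isolate.py: the names `split_sentences`, `predict`, `threshold`, and `buffer` are assumptions, the default values are placeholders rather than our experimentally determined ones, and `toy_predict` is a hypothetical stand-in for the trained model so the sketch runs without TensorFlow.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def isolate(text: str,
            predict: Callable[[str], float],
            threshold: float = 0.5,   # placeholder, not the real tuned value
            buffer: float = 0.2) -> str:  # placeholder, not the real tuned value
    """Return the contiguous run of sentences that looks like instructions."""
    sentences = split_sentences(text)

    # Step 1: find the first sentence the model scores above the threshold.
    start = None
    for i, sentence in enumerate(sentences):
        if predict(sentence) > threshold:
            start = i
            break
    if start is None:
        return ""  # nothing on the page looked like instructions

    # Step 2: grow the section one sentence at a time, re-scoring the whole
    # section, and stop when the score drops by more than the buffer.
    section = [sentences[start]]
    prev_score = predict(sentences[start])
    for sentence in sentences[start + 1:]:
        score = predict(" ".join(section + [sentence]))
        if score < prev_score - buffer:
            break  # significant drop: the instructions section has ended
        section.append(sentence)
        prev_score = score

    return " ".join(section)


# Hypothetical stand-in for the trained LSTM: the fraction of sentences
# that start with a cooking verb. The real model returns a learned score.
CUE_VERBS = {"preheat", "mix", "add", "stir", "pour", "bake"}


def toy_predict(text: str) -> float:
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences
               if s.split()[0].lower().strip(".,!?") in CUE_VERBS)
    return hits / len(sentences)


page = ("I love autumn. My grandmother made this every year. "
        "Preheat the oven. Mix the flour and sugar. Bake for 20 minutes. "
        "Thanks for reading my blog. Subscribe for more.")
print(isolate(page, toy_predict))
```

With the toy scorer, the three cooking steps are kept and the surrounding blog text is dropped. Passing the model in as a `predict` callable keeps the search logic separate from TensorFlow, which makes the threshold and buffer easier to tune on their own.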
