IainMac32/JustTheInstruction


JustTheInstruction Chrome Extension

💡 Project Description

Isn't it the worst when you find a recipe online, but need to scroll down for what feels like hours to find the actual instructions? We were bothered by this issue too and began to think of a solution. What if we could use AI to extract JustTheInstructions from any recipe, arts & crafts, or other DIY website?

Well, we built and trained an AI model to do just that, and the power of this tool is one click away thanks to our Google Chrome Extension!

Project Objectives

  1. Build a binary text classification model using TensorFlow.
  2. Scrape training data that covers a variety of instruction categories.
  3. Develop a Google Chrome extension that automatically reads an entire website and promptly provides the user with JustTheInstructions.

💻 Tech Stack

Frontend Framework:

Backend Framework:

Python TensorFlow Flask

Data Processing:

Pandas NumPy Beautiful Soup

Hosting & Deployment Tools:

Docker Google Cloud Google Cloud Run

🔧 Model Architecture & Training the Model

Note: We built and trained our model on Google Colab!

We decided to build an LSTM neural network using TensorFlow. The model performs binary text classification, predicting whether a piece of text is "instructions" or "not instructions". Follow this link to see the documented process: https://colab.research.google.com/drive/1nkqleu9FP2pN5D40q1NK_xuyOvsKG7vy?usp=sharing
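As a rough illustration of what an LSTM binary text classifier looks like in TensorFlow, here is a minimal sketch. It is not the architecture from the Colab notebook: the vocabulary size, embedding width, LSTM width, and the use of pre-tokenized integer IDs are all assumptions made for the example.

```python
import numpy as np
import tensorflow as tf

# All three sizes below are illustrative assumptions, not the project's values.
VOCAB_SIZE = 10_000
EMBED_DIM = 64
LSTM_UNITS = 64


def build_model() -> tf.keras.Model:
    """Build a small LSTM classifier that outputs one probability per input."""
    model = tf.keras.Sequential([
        # Maps integer token IDs to dense vectors.
        tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        # Reads the token sequence and keeps the final hidden state.
        tf.keras.layers.LSTM(LSTM_UNITS),
        # Sigmoid output: values near 1 mean "instructions".
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


model = build_model()

# Two fake "sentences" of 20 token IDs each, standing in for tokenized text.
token_ids = np.random.randint(1, VOCAB_SIZE, size=(2, 20))
scores = model.predict(token_ids, verbose=0)
```

The model here is untrained, so the scores are meaningless until it is fitted on labeled "instructions" / "not instructions" examples.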

We used a variety of sources to collect training data for our model. The categories of sites we wanted the model to recognize include recipes, crafts, circuits, and other DIYs.

While many entries were collected from a public Kaggle dataset, we also scraped our own data using the Python library Beautiful Soup. We were able to scrape over 1000 unique entries for each of the mentioned categories! To learn more about how we did this, see our Google Colab file that documents the data scraping process: https://colab.research.google.com/drive/1k1D4zRW0nFicjkS-KqtCVW3y4mn8qSJR?usp=sharing
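To give a feel for this kind of scraping, here is a small sketch that pulls text entries out of HTML with Beautiful Soup. The sample HTML, the `extract_entries` helper, and the choice of `<p>` and `<li>` tags are assumptions for illustration. The actual scraping logic is documented in the Colab file and may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical page content; a real run would fetch HTML from a website.
SAMPLE_HTML = """
<html><body>
  <script>var tracking = 1;</script>
  <p>My grandmother loved this soup.</p>
  <ol>
    <li>Chop the onions.</li>
    <li>Simmer for 20 minutes.</li>
    <li>Chop the onions.</li>
  </ol>
</body></html>
"""


def extract_entries(html: str) -> list[str]:
    """Return the unique, non-empty text of paragraph and list-item tags."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style tags so their contents never count as text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    texts = []
    for node in soup.find_all(["p", "li"]):
        text = node.get_text(" ", strip=True)
        if text:
            texts.append(text)
    # dict.fromkeys removes duplicates while preserving document order.
    return list(dict.fromkeys(texts))


entries = extract_entries(SAMPLE_HTML)
```

Each entry would still need a label ("instructions" or "not instructions") before it could be used as training data.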

🔍 Using the Model

You can find the isolate function in the isolate.py file located in the api directory.

Given a page's plaintext, we use the model to identify only the section that contains procedural writing (instructions) and isolate it from the surrounding irrelevant text. The isolate function first splits the website's plaintext into individual sentences. It then steps through the sentences until the model predicts a value greater than our experimentally determined threshold, which marks the start of the instructions section. From there, the function keeps appending sentences to the section and has the model make a prediction on each extended section. When the prediction drops significantly (below the previous section's prediction minus an experimentally determined buffer value), the instructions section ends. This section, consisting of JustTheInstructions, is then returned to the user.
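The procedure described above can be sketched roughly as follows. This is not the code from isolate.py: the `predict` callable stands in for the trained model, `toy_predict` is a keyword-based placeholder, and the `threshold` and `buffer` defaults are made-up values rather than the experimentally determined ones.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def isolate(text: str, predict: Callable[[str], float],
            threshold: float = 0.5, buffer: float = 0.2) -> str:
    """Return the instructions section of `text`, or "" if none is found."""
    sentences = split_sentences(text)
    # Find the first sentence whose score exceeds the threshold.
    for start, sentence in enumerate(sentences):
        previous = predict(sentence)
        if previous > threshold:
            break
    else:
        return ""
    section = [sentences[start]]
    # Grow the section one sentence at a time, re-scoring the whole section.
    for sentence in sentences[start + 1:]:
        score = predict(" ".join(section + [sentence]))
        if score < previous - buffer:
            break  # significant drop: the instructions have ended
        section.append(sentence)
        previous = score
    return " ".join(section)


IMPERATIVES = {"chop", "add", "stir", "mix", "bake"}  # placeholder vocabulary


def toy_predict(text: str) -> float:
    """Stand-in model: fraction of sentences that start with an imperative verb."""
    sentences = split_sentences(text)
    hits = sum(s.split()[0].lower() in IMPERATIVES for s in sentences)
    return hits / len(sentences)


SAMPLE = ("I love autumn. This recipe is from my aunt. Chop the onions. "
          "Add the garlic. Stir for five minutes. Thanks for reading my blog. "
          "Follow me for more.")
result = isolate(SAMPLE, toy_predict)
```

With the placeholder model, `result` contains only the three instruction sentences from the middle of `SAMPLE`. Swapping `toy_predict` for a function that wraps the trained classifier gives the behaviour described above.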
