Apples to Audiences: A tool for targeting organic produce buyers

This repository contains the model development code and Streamlit app deployment for the Insight Data Science project Apples to Audiences.

Motivation

Organic food is the fastest-growing category in retail grocery today, but it still accounts for just 6% of the total U.S. market. To grow the market, organic trade associations provide coupons and other advertising to incentivize purchases, but such promotions are currently not targeted to individual consumers.

Can likely organic buyers be identified in a way that grows organic sales?

Apples to Audiences is a tool that identifies users who are likely to buy organic produce based on their past shopping history, combining recommendations from a collaborative filtering model with predictions from a logistic regression model.

The model behind Apples to Audiences identifies 10% new likely buyers of organic food and reduces spam to unlikely buyers by 40% compared with sending promotions without targeting. More focused targeting will increase lift in the percentage of purchases that include organic items.

Table of Contents

- OrganicFoodBuyers: contains standalone code for the final model development pipeline.
- notebooks: contains Jupyter notebooks documenting different experimental pipelines for the project.
- webapp_streamlit: contains the Streamlit front-end. See Running Streamlit on EC2 for instructions on deploying to AWS.

Instructions to run

Clone the repo:

git clone https://github.com/jeffcfho/OrganicFoodBuyers.git

Use either of the requirements files to install the necessary packages. Both contain every package required for reproducibility, but also some superfluous packages left over from the experimentation stage that have not yet been removed (see Issue #3).

conda create --name <env> --file conda_requirements.txt

OR

pip install -r pip_requirements.txt

To run the simple unit tests, run pytest from the main repo folder:

pytest

Tech stack and data sources

Pandas
LightFM
Scikit-learn
Streamlit
EC2

Data: Instacart order data (3.4 million orders made by 200k users)

Notes

Pandas dataframe manipulations

The order data lives in several tables: orders, products, order_products__train, and order_products__prior.

Get product purchase frequency by user

from IPython.display import display  # already available inside a Jupyter notebook

# Filter orders table to only get past orders
users_order_prior = orders.loc[orders['eval_set'] == 'prior']

# Join products in prior orders to the user for each order
user_order_products_prior = order_products__prior.merge(users_order_prior[['order_id','user_id']],
                                                        on='order_id')
# Check join worked
display(user_order_products_prior.loc[user_order_products_prior['order_id']==2]) # does this return user 202279?

# Group products ordered by user
user_products_freq_long = (user_order_products_prior[['user_id','product_id']]
                           .groupby(['user_id','product_id']).size())

# Reset index to remove MultiIndex
user_products_freq_long = user_products_freq_long.reset_index()
user_products_freq_long.columns = ['user_id','product_id','freq']
display(user_products_freq_long.head(3))
# 		user_id	product_id	freq
# 	0	1		196			10
# 	1	1		10258		9
# 	2	1		10326		1
# 	...
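
The same filter, join, and group-by steps can be checked end to end on a tiny table before running them on the full dataset. The sketch below is only an illustration: the toy rows and the names toy_orders, toy_order_products, and toy_freq are made up, and only the column names follow the tables described above.

```python
import pandas as pd

# Hypothetical toy data that mimics the column layout of the real tables
toy_orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "user_id": [1, 1, 2, 2],
    "eval_set": ["prior", "prior", "prior", "train"],
})
toy_order_products = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "product_id": [196, 10258, 196, 196],
})

# Keep only past orders, and only the columns needed for the join
prior_orders = toy_orders.loc[toy_orders["eval_set"] == "prior", ["order_id", "user_id"]]

# validate="many_to_one" raises if an order_id appears twice in prior_orders
joined = toy_order_products.merge(prior_orders, on="order_id", validate="many_to_one")

# Count how many times each user bought each product
toy_freq = (
    joined.groupby(["user_id", "product_id"])
    .size()
    .reset_index(name="freq")
)
print(toy_freq)
```

Passing name="freq" to reset_index names the count column in one step, so the separate column-renaming line is not needed in this variant.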

Create user-product sparse matrix needed for LightFM

from scipy.sparse import coo_matrix

# Row index of matrix will be user_id - 1 (user_ids start at 1, but matrix rows start at 0)
row_ind = user_products_freq_long['user_id'].values - 1
num_rows = user_products_freq_long['user_id'].max()

# Column index of matrix will be product_id - 1
col_ind = user_products_freq_long['product_id'].values - 1
num_cols = user_products_freq_long['product_id'].max()

# Frequency data are cell values
freq = user_products_freq_long['freq'].values

# Create sparse matrix
user_prod_interaction_train = coo_matrix((freq, (row_ind, col_ind)), shape = (num_rows, num_cols))

user_prod_interaction_train
# <206209x49688 sparse matrix of type '<class 'numpy.int64'>'
#	with 13307953 stored elements in COOrdinate format>
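
Using id - 1 as the matrix index relies on the ids being positive integers starting at 1, and it leaves empty rows or columns wherever an id is missing. If the ids were sparse or non-numeric, one alternative is to let pandas assign the indices. The sketch below is an assumption-labeled variant, not the pipeline used in this repo; the toy rows are hypothetical and pd.factorize is swapped in for the id - 1 arithmetic.

```python
import pandas as pd
from scipy.sparse import coo_matrix

# Hypothetical toy frequency table with non-contiguous user ids
toy_freq_long = pd.DataFrame({
    "user_id": [10, 10, 42],
    "product_id": [196, 10258, 196],
    "freq": [2, 1, 1],
})

# factorize returns integer codes 0..n-1 (in order of first appearance)
# together with the unique original ids, so codes can be mapped back later
row_ind, user_ids = pd.factorize(toy_freq_long["user_id"])
col_ind, product_ids = pd.factorize(toy_freq_long["product_id"])

toy_interactions = coo_matrix(
    (toy_freq_long["freq"].to_numpy(), (row_ind, col_ind)),
    shape=(len(user_ids), len(product_ids)),
)

# Row i of the matrix corresponds to user_ids[i], column j to product_ids[j]
dense = toy_interactions.toarray()
print(dense)
```

The trade-off is that user_ids and product_ids have to be kept alongside the matrix to translate row and column positions back into the original ids.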

Visualizing classification trees

Install pydotplus (rendering to PNG also requires Graphviz to be installed on the system):

pip install pydotplus

# Fit a tree from sklearn
import pydotplus
from IPython.display import Image, display
from sklearn import tree
rt = tree.DecisionTreeClassifier(max_depth=2,random_state=0,criterion='entropy')
rt.fit(X_train,y_train)

# tree module has export_graphviz function whose output can be passed into pydotplus
dot_data = tree.export_graphviz(rt, out_file=None, 
                         feature_names=feature_cols,  
                         class_names=y_col,  
                         filled=True, rounded=True,  
                         special_characters=True)  
graph = pydotplus.graph_from_dot_data(dot_data) 

display(Image(graph.create_png()))
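
When Graphviz is not available, for example on a fresh EC2 instance, scikit-learn's export_text (added in version 0.21) prints the same tree as plain text. This is a minimal sketch on made-up data: the feature names organic_share and n_orders and all values are hypothetical and are not the features used in this project.

```python
from sklearn import tree

# Hypothetical toy training data: two features, binary label
toy_feature_cols = ["organic_share", "n_orders"]
toy_X = [[0.0, 3], [0.1, 10], [0.8, 4], [0.9, 12]]
toy_y = [0, 0, 1, 1]

# Same tree settings as in the snippet above
toy_tree = tree.DecisionTreeClassifier(max_depth=2, random_state=0, criterion="entropy")
toy_tree.fit(toy_X, toy_y)

# export_text returns the split rules as an indented string
tree_text = tree.export_text(toy_tree, feature_names=toy_feature_cols)
print(tree_text)
```

The text output carries less information than the rendered PNG (no colors or node impurity), but it needs no extra system packages and can be written straight to a log.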

Running Streamlit on EC2

Instructions:

1. Set up an EC2 instance and download the pem file locally.
2. SSH into the instance: ssh -i "YourPEMfile.pem" ec2-user@<Your Public DNS(IPv4) Address>
3. Install Python 3 and git: sudo yum install python36, then sudo yum install git
4. Install any necessary packages: python36 -m pip install streamlit, python36 -m pip install plotly_express, and others as necessary
5. Clone the repo: git clone https://github.com/jeffcfho/OrganicFoodBuyers.git
6. Run the app: streamlit run applestoaudiences.py

If you want the app to keep running after you disconnect from your EC2 instance, use tmux:

1. Install tmux: sudo yum install tmux
2. Start a new session: tmux new -s st_instance
3. Inside the new window, run the app: streamlit run applestoaudiences.py
4. Hit CTRL+B, let go, then hit D to detach from the window with the app still running.

Domain name forwarding to your Streamlit-hosted EC2 instance

Instructions

You may have to append --server.enableCORS false when you run your streamlit app, as per this issue.
