GitHub - epfl-ada/ada-2024-project-pada-wan: ada-2024-project-pada-wan created by GitHub Classroom

URL to Datastory: https://owhenthesaints.github.io/posts/project/

Abstract

We are a group of students passionate about brewing and eager to launch our first beer. We have therefore taken it upon ourselves to study beer reviews, identify customer needs and make our market entry a success. To do so, we first survey the global beer landscape to see how satisfied customers are with the current market and whether an entry is possible. If it is, we analyse which product characteristics are viewed favourably, in order to improve our chances of a successful market entry. Once we have our product, we develop a launch strategy containing A DAta-driven communication plan and launch schedule. All in all, we want to determine how to make the most successful newcomer beer!

Research Questions

How to launch a successful beer?

Product launch:
0. Dataset overview
0.1: Data processing and dataset enrichment

What are good strategies to single out newcomers?
What is the makeup of the ideal startup location (add type of beer)?
When to launch a newcomer beer?

Methods

Data Preprocessing

The review data from two beer rating websites, RateBeer and BeerAdvocate, come as two .txt files, which we used to create our initial dataset. Building our own dataset also lets us keep only the data that is useful for the following tasks and discard the rest. The matched beer dataset is useful for evaluating user origin bias in ratings, as it keeps only reviews of beers present on both websites, which have different demographics. The downside is a significant decrease in the number of reviews.
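As an illustration of this step, here is a minimal sketch of how such a .txt file could be turned into records while keeping only selected fields. It assumes each review is a block of `key: value` lines separated by a blank line; that layout, the function name `parse_reviews` and the field names in the sample are assumptions made for the example, not taken from the project code.

```python
from typing import Dict, Iterable, Iterator, Optional, Set


def parse_reviews(
    lines: Iterable[str], keep: Optional[Set[str]] = None
) -> Iterator[Dict[str, str]]:
    """Yield one dict per review from 'key: value' lines separated by blank lines.

    If `keep` is given, only those keys are stored (hypothetical field filter).
    """
    record: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current review block.
            if record:
                yield record
                record = {}
            continue
        # Split at the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that do not look like 'key: value'
        key = key.strip()
        if keep is None or key in keep:
            record[key] = value.strip()
    if record:
        # The file may end without a trailing blank line.
        yield record


# Made-up sample in the assumed layout.
sample = """beer_name: Example Ale
rating: 4.2
text: Malty, slightly sweet.

beer_name: Other Lager
rating: 3.1
text: Crisp.
""".splitlines()

reviews = list(parse_reviews(sample, keep={"beer_name", "rating"}))
```

Because the function is a generator over lines, it can be fed an open file handle directly, which avoids loading a large review file into memory at once.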

Data Enrichment

To complete our analysis and be able to normalise the results, we enriched it with three other datasets. In particular, we noticed a strong US bias that needs further investigation. The additional data are the following:

The population per country in the world, from the World Bank Group
The list of countries by beer consumption, from Wikipedia
Median income per year (after tax), from Our World in Data
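The normalisation mentioned above could, for instance, express the number of reviews per million inhabitants instead of as a raw count. The sketch below shows one possible way to do this. The function name `reviews_per_million`, the country labels and all figures are made up for the example and are not project results.

```python
from collections import Counter
from typing import Dict, Iterable


def reviews_per_million(
    review_countries: Iterable[str], population: Dict[str, int]
) -> Dict[str, float]:
    """Count reviews per country and scale them by population."""
    counts = Counter(review_countries)
    return {
        country: n * 1_000_000 / population[country]
        for country, n in counts.items()
        if population.get(country)  # skip countries with missing or zero population
    }


# Made-up figures, for illustration only.
population = {"A": 2_000_000, "B": 500_000}
review_countries = ["A", "A", "A", "A", "B", "B", "C"]

rates = reviews_per_million(review_countries, population)
```

In this toy example country A has twice as many reviews as country B, yet B has the higher rate once population is taken into account. Country C is dropped because no population figure is available for it.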

Time series data

Each review uploaded to RateBeer and BeerAdvocate has a date and location associated with it. Reviews can therefore be analysed as a time series over a chosen period, which makes it possible to study seasonal changes. To avoid bias, location must also be taken into account so that reviews from the southern hemisphere are assigned the reversed seasons. With this in place, we can create a predictive mapping of when reviews are more favourable and when they are less favourable. This can also be combined with other information, such as how long users have been writing reviews, and with other datasets that inform us about the availability of beer throughout the year. Together, these give us insight into when we should bring our beer to market and whether the optimal beer changes over time.
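The hemisphere correction described above can be sketched as follows. This is a minimal example under stated assumptions: seasons are the meteorological ones (by calendar month), the hemisphere is derived from a small hand-written country set, and the names `SOUTHERN_COUNTRIES`, `season_of` and `mean_rating_by_season` as well as the sample reviews are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, Tuple

# Meteorological seasons of the northern hemisphere, keyed by month.
NORTHERN_SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}
OPPOSITE = {"winter": "summer", "summer": "winter",
            "spring": "autumn", "autumn": "spring"}

# Illustrative, incomplete set; a real analysis would use latitude data.
SOUTHERN_COUNTRIES = {"Australia", "New Zealand", "Argentina", "Chile", "South Africa"}


def season_of(review_date: date, southern: bool) -> str:
    """Return the local season, reversing it for the southern hemisphere."""
    season = NORTHERN_SEASONS[review_date.month]
    return OPPOSITE[season] if southern else season


def mean_rating_by_season(
    reviews: Iterable[Tuple[date, str, float]]
) -> Dict[str, float]:
    """Average ratings per local season from (date, country, rating) tuples."""
    buckets = defaultdict(list)
    for review_date, country, rating in reviews:
        season = season_of(review_date, country in SOUTHERN_COUNTRIES)
        buckets[season].append(rating)
    return {season: mean(values) for season, values in buckets.items()}


# Made-up reviews, for illustration only.
reviews = [
    (date(2015, 7, 10), "United States", 4.0),
    (date(2015, 1, 5), "Australia", 3.0),
    (date(2015, 12, 20), "United States", 2.0),
]
by_season = mean_rating_by_season(reviews)
```

Here the January review from Australia is counted as a summer review, alongside the July review from the United States, rather than being grouped with northern winter reviews.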

Additional Datasets

The current datasets from RateBeer and BeerAdvocate are already quite dense and contain a wide range of information. However, we observed a significant concentration of users from the USA compared to other regions. To mitigate this bias, we considered two options. Firstly, for location-dependent questions, we will account for the overrepresentation of U.S. users by normalising with population data from the World Bank. Secondly, we want to incorporate additional beer reviews from other datasets. We intend to complement our datasets, especially regarding the appearance of our beers, with this dataset. We also intend to gather more data by scraping websites such as BeerAdvocate and beerizer.com, to get more up-to-date information about some beers and to extract prices.

Contributions

Matti: Problem formulation, Newcomer Analysis, Data Story Structure and Text Production
Anderson: Left the group for his startup
Sylvain: Location Analysis, Data Story Structure and Text Production
Owen: Data scraping, website setup, Plotting interactive graphs
Tom: Linking the notebooks, Data Story Structure and Text Production

