The MrOlympia-DataAnalysis project scrapes Wikipedia to gather data about the winners of Mr. Olympia in the Open Division. From this data, we extract interesting information and create visualizations that make patterns and trends over the years easier to understand.
For the development of this project, we used the following technologies:
- pandas: for data manipulation and analysis.
- requests: to make HTTP requests and obtain the content of web pages.
- BeautifulSoup4: to parse HTML and extract relevant data.
- matplotlib: for creating graphs and visualizations.
The project consists of four main parts:
- Scraping Data from Wikipedia: In this part, we implemented the web scraping logic that obtains data about Mr. Olympia and its participants from Wikipedia. The collected data is stored in lists.
- Create DataFrame: After collecting the data, we loaded it into DataFrames using the pandas library, which makes data manipulation and analysis easier.
- Get Data of Each Olympia: In this section, we created a separate DataFrame for each edition of Mr. Olympia, including detailed information about the participants, such as weight and height.
- Display Data in Graphs: We used matplotlib to create graphs and visualizations that help understand the collected data. Some of the generated graphs include:
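The four parts above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`fetch_html`, `parse_table`, `build_dataframe`, `plot_prize`), the column names, and the inline sample table are all assumptions, and the sample rows are made-up placeholder values rather than real Mr. Olympia results. Real Wikipedia tables often use merged cells and footnote markers, so they usually need extra cleaning.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Placeholder HTML standing in for a Wikipedia results table.
# The rows are invented illustration values, NOT real Mr. Olympia data.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Year</th><th>Winner</th><th>Prize</th><th>Venue</th></tr>
  <tr><td>2001</td><td>Athlete A</td><td>$100,000</td><td>City X, Country P</td></tr>
  <tr><td>2002</td><td>Athlete A</td><td>$110,000</td><td>City Y, Country P</td></tr>
  <tr><td>2003</td><td>Athlete B</td><td>$120,000</td><td>City Z, Country Q</td></tr>
</table>
"""


def fetch_html(url):
    """Part 1a: download a page (requests is imported lazily so the offline demo runs without it)."""
    import requests
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_table(html):
    """Part 1b: extract the header and data rows of the first 'wikitable' into plain lists."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    headers, rows = [], []
    for tr in table.find_all("tr"):
        data_cells = tr.find_all("td")
        if data_cells:
            rows.append([td.get_text(strip=True) for td in data_cells])
        else:
            headers = [th.get_text(strip=True) for th in tr.find_all("th")]
    return headers, rows


def build_dataframe(headers, rows):
    """Part 2: load the lists into a DataFrame and convert text columns to numbers."""
    df = pd.DataFrame(rows, columns=headers)
    df["Year"] = df["Year"].astype(int)
    # Strip the currency symbol and thousands separators before converting.
    df["Prize"] = df["Prize"].str.replace(r"[$,]", "", regex=True).astype(int)
    return df


def plot_prize(df):
    """Part 4: line chart of prize value per year (matplotlib imported only when plotting)."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.plot(df["Year"], df["Prize"], marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Prize (USD)")
    ax.set_title("Prize value by year (sample data)")
    return fig


headers, rows = parse_table(SAMPLE_HTML)
df = build_dataframe(headers, rows)

# Part 3: one small DataFrame per edition, keyed by year.
editions = {year: group.reset_index(drop=True) for year, group in df.groupby("Year")}

# Simple aggregate: number of wins per athlete.
wins = df["Winner"].value_counts()
```

To run this against a live page, pass the result of `fetch_html(url)` to `parse_table` instead of `SAMPLE_HTML`, and call `plot_prize(df)` followed by `matplotlib.pyplot.show()` to display the chart.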
Some of the observed data:
The MrOlympia-DataAnalysis project provided interesting insights into the winners and patterns of Mr. Olympia over the years. Some of the discoveries include the evolution of award values, the performance of notable competitors like Ronnie Coleman, and the distribution of events by country.
Possible extensions of the project may include the analysis of other Mr. Olympia categories, the inclusion of more recent data, and the creation of interactive visualizations using libraries like Plotly or Bokeh.