This project is the final assignment for the Erasmus Big Data and Data Mining course in the Fall semester of the 2023-2024 academic year.
The purpose of this laboratory report is to analyze the World University Rankings 2023 dataset, focusing on gender ratios and on using machine learning to classify universities by rank. The analysis covers data exploration, visualization, statistical testing, and machine learning modeling.
Data Preprocessing: The dataset was loaded and checked for missing values, and normalization was applied to selected columns so that they share a consistent scale.
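The preprocessing steps above can be sketched in plain Python. This is a minimal, hypothetical illustration rather than the report's actual code: the row layout, the field names (`score`, `female_male`), and the assumption that the gender ratio is stored as a "female : male" string such as "48 : 52" are all illustrative. The report does not name its normalization method, so min-max scaling to [0, 1] is assumed here.

```python
# Minimal preprocessing sketch. Field names and the "F : M" ratio format are
# hypothetical assumptions, not the dataset's confirmed schema.

# Hypothetical sample rows; None marks a missing value.
ROWS = [
    {"name": "Univ A", "country": "X", "score": 95.2, "female_male": "48 : 52"},
    {"name": "Univ B", "country": "Y", "score": None, "female_male": "55 : 45"},
    {"name": "Univ C", "country": "X", "score": 60.0, "female_male": None},
]


def count_missing(rows):
    """Count missing (None) values per column."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (value is None)
    return counts


def female_share(ratio):
    """Convert an assumed 'F : M' string into the female percentage (None if missing)."""
    if ratio is None:
        return None
    female, male = (float(part) for part in ratio.split(":"))
    total = female + male
    return 100.0 * female / total if total else None


def min_max(values):
    """Rescale non-missing values to [0, 1] (assumed min-max normalization)."""
    present = [v for v in values if v is not None]
    low, high = min(present), max(present)
    span = high - low
    return [None if v is None else (0.0 if span == 0 else (v - low) / span) for v in values]


print(count_missing(ROWS))  # {'name': 0, 'country': 0, 'score': 1, 'female_male': 1}
print([female_share(r["female_male"]) for r in ROWS])  # [48.0, 55.0, None]
print(min_max([r["score"] for r in ROWS]))  # [1.0, None, 0.0]
```

Missing values are left as None rather than imputed, so later steps can decide whether to drop or fill them.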
Descriptive statistics were computed to summarize the distribution of each variable, and histograms and boxplots were used to visualize these distributions and identify potential outliers. Hypothesis testing was conducted to compare the gender ratio of the top 100 universities against the global average. A geographical analysis visualized gender ratios by country.
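One way to carry out the outlier check and the hypothesis test is sketched below using only the Python standard library. The report does not state which test it used, so this assumes a two-sided one-sample test of the top-100 mean female share against a fixed global mean, with a normal approximation to the t distribution for the p-value (adequate for n of about 100; `scipy.stats.ttest_1samp` gives the exact t-based p-value). The outlier rule is the common 1.5 × IQR boxplot whisker rule. The function names are hypothetical.

```python
from statistics import NormalDist, mean, quantiles, stdev


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual boxplot whisker rule."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def one_sample_test(sample, global_mean):
    """Two-sided one-sample test of mean(sample) against a fixed global mean.

    The p-value uses a normal approximation to the t distribution, which is
    reasonable for samples of about 100 but not for very small samples.
    """
    se = stdev(sample) / len(sample) ** 0.5
    t = (mean(sample) - global_mean) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical female-share percentages, for illustration only.
print(iqr_outliers([10, 11, 12, 13, 14, 15, 100]))  # [100]
t, p = one_sample_test([51, 49, 52, 50, 53], global_mean=51)
print(t, p)  # 0.0 1.0
```

Because the top 100 universities are themselves part of the global population, treating the global mean as fixed is a simplification; a two-sample Welch test comparing the top 100 with the remaining universities is a reasonable alternative.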
The dataset covers 2,362 universities. The average gender ratio in the top 100 universities is nearly balanced, at 51.14% female and 48.86% male, and the hypothesis test found no statistically significant difference between this ratio and the global average. The geographical analysis identified the countries at the two extremes of gender representation in their universities.
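The per-country comparison and the top-100 average can be sketched as follows. This is a hypothetical illustration: it assumes each university has already been reduced to a (country, female percentage) pair with missing ratios stored as None, that the male share is 100 minus the female share, and that missing values among the top n are simply dropped. The report does not specify these details, and the function names are illustrative.

```python
from collections import defaultdict
from statistics import mean


def country_female_share(records):
    """Average female share (%) per country from (country, female_pct) pairs, skipping None."""
    by_country = defaultdict(list)
    for country, female_pct in records:
        if female_pct is not None:
            by_country[country].append(female_pct)
    return {country: mean(pcts) for country, pcts in by_country.items()}


def extremes(averages, k=3):
    """Return (lowest k, highest k) countries by average female share."""
    ranked = sorted(averages.items(), key=lambda item: item[1])
    return ranked[:k], ranked[::-1][:k]


def top_n_balance(female_pcts_by_rank, n=100):
    """Mean (female %, male %) among the first n ranked universities, ignoring missing data."""
    top = [p for p in female_pcts_by_rank[:n] if p is not None]
    female = mean(top)
    return female, 100.0 - female


# Hypothetical data for illustration only.
records = [("A", 40.0), ("A", 50.0), ("B", 60.0), ("C", None), ("C", 30.0)]
averages = country_female_share(records)
print(averages)  # {'A': 45.0, 'B': 60.0, 'C': 30.0}
print(extremes(averages, k=1))  # ([('C', 30.0)], [('B', 60.0)])
print(top_n_balance([50.0, None, 54.0, 20.0], n=3))  # (52.0, 48.0)
```

Countries represented by only one or two universities can easily dominate both extremes, so requiring a minimum number of universities per country before ranking is worth considering.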
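Universities aiming for top rankings may wish to maintain a balanced gender ratio, as the top 100 universities exhibit a near-equal distribution; however, because this ratio does not differ significantly from the global average, the data alone do not show that gender balance drives a university's rank. Regions with skewed gender ratios in universities might benefit from policies that promote gender equality in higher education.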
The World University Rankings 2023 dataset offers valuable insights into the state of higher education worldwide. The balanced gender ratio in top universities is a positive sign, but there is room for improvement in certain regions. Future work could examine the factors that influence university rankings in more depth and explore correlations with other global indicators.