While I didn't personally think that women had "inferior negotiation skills" compared to men, I was interested in the topic and decided to conduct my own data analysis to see whether an income gap existed between men and women when compared under equal conditions. For the purpose of this analysis, I decided to compare individuals who have never married.
This analysis was performed using the R programming language. The AdultUCI dataset in the arules package for R contains information about individuals drawn from census data, including age, education, marital status, occupation, relationship, race, sex, work hours per week, native country, and income. Income is classified as either 'small' ($50K/year or less) or 'large' (more than $50K/year). CRAN package URL: http://cran.r-project.org/web/packages/arules/index.html
** Note that the dataset does not provide the actual salaries of the sample individuals. Instead, it only reports whether an individual's income is 'large' (more than $50K/year) or 'small' ($50K/year or less).
The AdultUCI data (a processed version of census income data) was further processed for the following reasons:
- To eliminate any rows with missing values.
- To rename the columns by replacing `-` characters with `_` characters.
- To add a new column, age_group (in addition to the age column).
The final dataset for the analysis was stored inside the data folder as incomeData.RData.
The script used to process the AdultUCI data can be seen here.
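As a rough sketch of the three steps listed above (this is not the linked script itself), the processing could look something like the following. The helper name `preprocess_income`, the decade-wide age bins, and their labels are assumptions made for illustration; the toy data frame stands in for AdultUCI so the example runs without the arules package.

```r
# Hypothetical helper mirroring the three preprocessing steps.
preprocess_income <- function(df) {
  # 1. Drop rows with any missing value.
  df <- na.omit(df)
  # 2. Replace '-' with '_' in column names (marital-status -> marital_status).
  names(df) <- gsub("-", "_", names(df), fixed = TRUE)
  # 3. Add an age_group column; these decade-wide bins are an assumption.
  df$age_group <- cut(df$age,
                      breaks = c(-Inf, 19, 29, 39, 49, 59, Inf),
                      labels = c("<20", "20s", "30s", "40s", "50s", "60+"))
  df
}

# Toy input using AdultUCI-style column names (not the real data).
toy <- data.frame(age = c(25, 34, NA, 61),
                  `marital-status` = c("Never-married", "Divorced",
                                       "Never-married", "Widowed"),
                  check.names = FALSE)
clean <- preprocess_income(toy)
print(clean)

# With arules installed, the same helper could be applied to the real data:
# library(arules); data("AdultUCI")
# incomeData <- preprocess_income(AdultUCI)
# save(incomeData, file = "./data/incomeData.RData")
```

Keeping the steps in one function makes it easy to rerun the processing if the bin boundaries need to change.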
For the purpose of this analysis, I decided to analyze individuals who have never been married. The R code below was used to load the pre-processed AdultUCI data and subset males and females who have never been married.
> load('./data/incomeData.RData') # load pre-processed data
> m.never_married <- subset(incomeData, sex=='Male' & marital_status=='Never-married') # select all males who never married
> f.never_married <- subset(incomeData, sex=='Female' & marital_status=='Never-married') # select all females who never married
> mf.never_married <- rbind(m.never_married, f.never_married) # select all males and females who never married
The following statistical tests were used:
- Chi-square test
- Fisher's exact test
- Mann-Whitney U test
- Levene's test
![Alt text](./images/individuals_who_never_married_by_income_and_gender.png)
There didn't appear to be any significant age difference between men and women in the dataset.
Average age of men who never married: 28.4
Average age of women who never married: 28.5
Median age of men who never married: 26
Median age of women who never married: 25
A Mann-Whitney U test (a non-parametric median comparison test) was performed to see if the median age difference between men and women was significant. The test rendered a p-value of 0.162 (greater than 0.05), so I retained the null hypothesis: the data show no statistically significant difference in age between men and women.
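For reference, a comparison like the one above can be sketched in R with `wilcox.test`, which performs the Mann-Whitney U test when given two independent samples. The helper name `compare_by_sex` is hypothetical, and the ages below are simulated for illustration only; they are not the census data.

```r
# Hypothetical helper: summary statistics plus a Mann-Whitney U test
# (wilcox.test on two independent samples) for one numeric variable.
compare_by_sex <- function(male_values, female_values) {
  test <- wilcox.test(male_values, female_values)  # two-sided by default
  list(mean_male     = mean(male_values),
       mean_female   = mean(female_values),
       median_male   = median(male_values),
       median_female = median(female_values),
       p_value       = test$p.value)
}

# Simulated ages, for illustration only.
set.seed(1)
male_age   <- sample(18:60, 200, replace = TRUE)
female_age <- sample(18:60, 200, replace = TRUE)
result <- compare_by_sex(male_age, female_age)
print(result)

# With the pre-processed data loaded, the call would look like:
# compare_by_sex(m.never_married$age, f.never_married$age)
```

The same helper would apply unchanged to work hours per week or years in school.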
There appeared to be a notable difference in the number of work hours per week between men and women.
Average work hours per week for men who have never married: 38.7
Average work hours per week for women who have never married: 35.3
Median work hours per week for men who have never married: 40
Median work hours per week for women who have never married: 40
A Mann-Whitney U test (a non-parametric median comparison test) was performed to see if the difference between men and women's work hours per week was statistically significant. The test rendered a p-value less than 2.2e-16 (much, much less than 0.05), which allowed me to conclude that there exists a statistically significant difference in the number of work hours per week between men and women (that men tend to work longer hours than women). The medians are identical here, but the test compares the two distributions as a whole through their ranks, so it can still detect a shift.
There appeared to be a notable difference in the average number of years in school between men and women.
Average number of years in school for men who have never married: 9.8
Average number of years in school for women who have never married: 10.3
Median number of years in school for men who have never married: 10
Median number of years in school for women who have never married: 10
A Mann-Whitney U test (a non-parametric median comparison test) was performed to see if the difference between men and women's number of years in school was statistically significant. The test rendered a p-value less than 2.2e-16 (much, much less than 0.05), which allowed me to conclude that there exists a statistically significant difference in the number of years in school between men and women (and that women tend to stay in school longer).
![Alt text](./images/individuals_who_never_married_by_occupation_type_and_gender.png)

Men and women's income levels (large or small) were counted after fixing the following variables:
- age group
- number of work hours per week
- occupational field
- educational background (highest education)
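A minimal sketch of how such a fixed-variable count could be produced with `subset` and `table` is shown below. The column names (`age_group`, `hours_per_week`, `occupation`, `education`, `income`) and the level labels are assumptions based on AdultUCI after renaming, and the rows are toy data rather than the census sample.

```r
# Toy rows with AdultUCI-style values; column names are assumed.
toy <- data.frame(
  sex            = c("Female", "Male", "Male", "Female", "Male", "Female"),
  marital_status = "Never-married",
  age_group      = c("20s", "20s", "20s", "20s", "30s", "20s"),
  hours_per_week = c(40, 40, 40, 40, 40, 35),
  occupation     = "Exec-managerial",
  education      = "Some-college",
  income         = factor(c("small", "small", "large", "small", "small", "small"),
                          levels = c("small", "large"))
)

# Fix age group, weekly hours, occupation and education, then count
# income level by sex.
fixed <- subset(toy,
                age_group == "20s" & hours_per_week == 40 &
                  occupation == "Exec-managerial" & education == "Some-college")
counts <- table(income = fixed$income, sex = fixed$sex)
print(counts)
```

Dropping one condition from the `subset` call corresponds to leaving that variable unfixed.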
| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (a) | 17 | 1 | 23 | 0 | 0.439 |
| (b) | 19 | 0 | 14 | 0 | 1 |
| (c) | 11 | 0 | 12 | 1 | 1 |
| (d) | 10 | 0 | 5 | 0 | 1 |
All four sub-groups (a, b, c, d) displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in the executive/managerial field who never married, worked 40 hours per week, and:
- were in their 20s and had attended college (no degree), p > 0.05, (a)
- were in their 20s and had graduated from high school, p > 0.05, (b)
- were in their 30s and had attended college (no degree), p > 0.05, (c)
- were in their 30s and had graduated from high school, p > 0.05, (d)
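To illustrate how one of these p-values can be reproduced, here is a short sketch that runs base R's `fisher.test` on the counts for sub-group (a) from the table above. The variable names are hypothetical; only the counts come from the analysis.

```r
# Counts for sub-group (a), taken from the table above.
group_a <- matrix(c(17, 1,    # Female: small, large
                    23, 0),   # Male:   small, large
                  nrow = 2,
                  dimnames = list(income = c("small", "large"),
                                  sex    = c("Female", "Male")))
result_a <- fisher.test(group_a)   # two-sided by default
print(round(result_a$p.value, 3))  # 0.439
```

With a single 'large' earner among 41 people, the two-sided p-value is simply the chance that this person is one of the 18 women, 18/41.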
| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (e) | 38 | 1 | 36 | 0 | 1 |
| (f) | 56 | 0 | 65 | 0 | 1 |
| (g) | 13 | 0 | 12 | 0 | 1 |
| (h) | 32 | 0 | 33 | 0 | 1 |
All four sub-groups (e, f, g, h) displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in the other service field who never married, worked 40 hours per week, and:
- were in their 20s and had attended college (no degree), p > 0.05, (e)
- were in their 20s and had graduated from high school, p > 0.05, (f)
- were in their 30s and had attended college (no degree), p > 0.05, (g)
- were in their 30s and had graduated from high school, p > 0.05, (h)
| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (i) | 11 | 0 | 13 | 1 | 1 |
| (j) | 11 | 1 | 10 | 0 | 1 |
| (k) | 5 | 0 | 6 | 1 | 1 |
| (l) | 1 | 0 | 5 | 1 | 1 |
All four sub-groups (i, j, k, l) displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in the professional specialty field who never married, worked 40 hours per week, and:
- were in their 20s and had attended college (no degree), p > 0.05, (i)
- were in their 20s and had graduated from high school, p > 0.05, (j)
- were in their 30s and had attended college (no degree), p > 0.05, (k)
- were in their 30s and had graduated from high school, p > 0.05, (l)
| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (m) | 39 | 0 | 36 | 0 | 1 |
| (n) | 23 | 0 | 43 | 0 | 1 |
| (o) | 9 | 0 | 5 | 0 | 1 |
| (p) | 18 | 0 | 6 | 1 | 0.28 |
All four sub-groups (m, n, o, p) displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in the sales field who never married, worked 40 hours per week, and:
- were in their 20s and had attended college (no degree), p > 0.05, (m)
- were in their 20s and had graduated from high school, p > 0.05, (n)
- were in their 30s and had attended college (no degree), p > 0.05, (o)
- were in their 30s and had graduated from high school, p > 0.05, (p)

Men and women's income levels (large or small) were counted after fixing the following variables:
- age group
- number of work hours per week
- occupational field
The following variable was not fixed:
- educational background (highest education)
| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (q) | 89 | 3 | 89 | 1 | 0.6209 |
| (r) | 55 | 4 | 45 | 12 | 0.03211 |
Fisher's exact test performed on the 30s group (r) resulted in a p-value of 0.03211 (less than 0.05). Therefore, I rejected the null hypothesis and concluded that there is a statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women (in favor of men) among individuals in their 30s working in the executive/managerial field who never married and worked 40 hours per week when educational background is not fixed.
![Alt text](./images/individuals_in_other-service_field_who_never_married_and_work_40hpw_2.png)

| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (s) | 125 | 1 | 144 | 0 | 0.4667 |
| (t) | 65 | 0 | 69 | 0 | 1 |
Both sub-groups displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in the other service field who never married, worked 40 hours per week, and:
- were in their 20s, p > 0.05, (s)
- were in their 30s, p > 0.05, (t)
| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (u) | 134 | 11 | 102 | 5 | 0.4381 |
| (v) | 73 | 4 | 56 | 20 | 0.0003111 |
Fisher's exact test performed on the 30s group (v) resulted in a p-value of 0.0003111 (less than 0.05). Therefore, I rejected the null hypothesis and concluded that there is a statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women (in favor of men) among individuals in their 30s working in the professional specialty field who never married and worked 40 hours per week when educational background is not fixed.
![Alt text](./images/individuals_in_sales_field_who_never_married_and_work_40hpw_2.png)

| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (w) | 108 | 0 | 128 | 2 | 0.5022 |
| (x) | 49 | 1 | 28 | 7 | 0.007476 |
Fisher's exact test performed on the 30s group (x) resulted in a p-value of 0.007476 (less than 0.05). Therefore, I rejected the null hypothesis and concluded that there is a statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women (in favor of men) among individuals in their 30s working in the sales field who never married and worked 40 hours per week when educational background is not fixed.
Men and women's income levels (large or small) were counted after fixing the following variables:
- age group
- occupational field
The following variables were not fixed:
- number of work hours per week
- educational background (highest education)
| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (y) | 176 | 6 | 177 | 12 | 0.26 |
| (z) | 96 | 15 | 97 | 29 | 0.08727 |
Both sub-groups displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in the executive/managerial field who never married and:
- were in their 20s, (y)
- were in their 30s, (z)
| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (aa) | 390 | 3 | 364 | 2 | 0.7146 |
| (ab) | 141 | 1 | 127 | 0 | 1 |
Both sub-groups displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in the other service field who never married and:
- were in their 20s, p > 0.05, (aa)
- were in their 30s, p > 0.05, (ab)
| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (ac) | 308 | 14 | 241 | 20 | 0.1283 |
| (ad) | 138 | 25 | 125 | 47 | 0.01119 |
The test performed on the 30s group (ad) resulted in a p-value of 0.01119 (less than 0.05). Therefore, I rejected the null hypothesis and concluded that there is a statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women (in favor of men) among individuals in their 30s working in the professional specialty field who never married when neither educational background nor the number of hours worked is fixed.
| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (ae) | 320 | 1 | 341 | 13 | 0.001935 |
| (af) | 93 | 5 | 93 | 21 | 0.00316 |
Fisher's exact tests performed on the 20s group (ae) and the 30s group (af) resulted in p-values of 0.001935 and 0.00316, respectively (both less than 0.05). Therefore, I rejected the null hypothesis and concluded that there is a statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women (both in favor of men) among individuals in their 20s and 30s working in the sales field who never married when neither educational background nor the number of hours worked is fixed.
Men and women's income levels (large or small) were counted after fixing the following variable:
- occupational field
The following variables were not fixed:
- age group
- number of work hours per week
- educational background (highest education)
| Sub-group | Female, small | Female, large | Male, small | Male, large | p-value |
|---|---|---|---|---|---|
| (ag) | 343 | 42 | 328 | 70 | 0.01027 |
| (ah) | 830 | 5 | 766 | 7 | 0.568 |
| (ai) | 578 | 71 | 457 | 100 | 0.0006787 |
| (aj) | 667 | 10 | 565 | 48 | 8.112e-08 |
Fisher's exact test performed on the executive/managerial group (ag) and chi-square tests performed on the professional specialty group (ai) and the sales group (aj) resulted in p-values of 0.01027, 0.0006787, and 8.112e-08, respectively (all less than 0.05). Therefore, I rejected the null hypothesis and concluded that there is a statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women (all three in favor of men) among never-married individuals working in the executive/managerial, professional specialty, and sales fields when age group, educational background, and the number of hours worked are not fixed. The other service group (ah), with a p-value of 0.568, showed no such difference.
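As an illustration of the chi-square step, the sketch below runs base R's `chisq.test` on the sales group (aj) counts listed above. Whether the original analysis used the default continuity correction is an assumption on my part, so the printed p-value should be compared against the reported 8.112e-08 rather than taken as given.

```r
# Counts for the sales group (aj), as listed above.
group_aj <- matrix(c(667, 10,   # Female: small, large
                     565, 48),  # Male:   small, large
                   nrow = 2,
                   dimnames = list(income = c("small", "large"),
                                   sex    = c("Female", "Male")))

# chisq.test applies Yates' continuity correction to 2x2 tables by default.
result_aj <- chisq.test(group_aj)
print(result_aj$p.value)

# Share of 'large' earners within each sex, to show the direction of the gap.
print(prop.table(group_aj, margin = 2)["large", ])
```

All expected cell counts are well above 5 here, which is the usual condition for relying on the chi-square approximation.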
When men and women were compared after fixing for marital status (never married), age group, occupational field, number of hours worked, and educational background, there seemed to be no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women in all 16 comparisons (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p).

When men and women were compared after fixing for marital status (never married), age group, occupational field and number of hours worked but not for educational background, there were three out of eight comparisons (r, v, x) that showed statistically significant differences in the proportions of 'large' income-earners (more than $50K/year) between men and women.
When men and women were compared after fixing for marital status (never married), age group, and occupational field but not for number of hours worked or educational background, there were three out of eight comparisons (ad, ae, af) that showed statistically significant differences in the proportions of 'large' income-earners (more than $50K/year) between men and women.
When men and women were compared after fixing for only marital status and occupational field, three of the four comparisons (ag, ai, aj) showed statistically significant differences in the proportions of 'large' income-earners (more than $50K/year) between men and women among never-married individuals, namely in the executive/managerial, professional specialty, and sales fields. The other service field (ah) showed no significant difference (p = 0.568).
This suggests that the differences in the proportions of 'large' income-earners arise when individuals are compared without fixing for variables that are related to their pay. When men and women's incomes were counted and compared after fixing marital status, occupational field, age group, educational background, and number of work hours per week, there didn't appear to be any statistically significant differences in the proportions of 'large' income-earners between men and women.
For future analysis, it would be great to use a dataset that contains individuals' actual income (as opposed to a binary variable that classifies income as either 'large' or 'small'). In addition, I would also like to examine individuals in other marital status categories (e.g. married to civilian, divorced, etc.) to see whether the same absence of differences in income proportions is observed when occupational field, age group, number of work hours per week, and educational background are fixed.
Lastly, the claim that there existed no statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women with fixed variables (marital status, occupational field, number of work hours per week, age group, and educational background) relied on Fisher's exact tests conducted on sub-groups (a) through (p). Unfortunately, many of those cases contained cell counts less than or equal to 5, which forced me to perform Fisher's exact tests as opposed to chi-square tests. In fact, in all 16 cases from (a) through (p), the 'large' income-earner counts were either 0 or 1 (mostly 0s). With so few 'large' income-earners, these tests have very little power to detect a difference even if one exists. The analysis would have been much more robust if all of the 'large' income-earner counts were greater than 5 and chi-square tests had still shown no differences in income proportions between genders. Hence, for the next analysis, it would be great to either focus on sample subsets that can yield higher numbers of 'large' income-earners after fixing for variables or use a much bigger dataset.
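To make the sparsity concern concrete, the sketch below computes the smallest two-sided Fisher p-value that any outcome could have produced for a given pair of group sizes and total number of 'large' earners. The helper name `min_fisher_p` is hypothetical, and the approach assumes the margins are held fixed, which is how Fisher's exact test conditions.

```r
# Hypothetical helper: smallest two-sided Fisher p-value attainable for fixed
# group sizes and a fixed total number of 'large' earners.
min_fisher_p <- function(n_female, n_male, n_large) {
  # k = number of 'large' earners who are female, over all feasible values.
  k_values <- max(0, n_large - n_male):min(n_large, n_female)
  p_values <- sapply(k_values, function(k) {
    tab <- matrix(c(n_female - k, k,                         # Female: small, large
                    n_male - (n_large - k), n_large - k),    # Male:   small, large
                  nrow = 2)
    fisher.test(tab)$p.value
  })
  min(p_values)
}

# Sub-group (a): 18 women, 23 men, 1 'large' earner in total.
print(round(min_fisher_p(18, 23, 1), 3))  # 0.439
# Sub-group (p): 18 women, 7 men, 1 'large' earner in total.
print(round(min_fisher_p(18, 7, 1), 3))   # 0.28
```

In both cases the smallest attainable p-value is already above 0.05, so with these counts the test could not have rejected the null hypothesis whichever way the data had fallen.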