This project is my capstone project for #30daysoflearning organized by Olanrewaju Oyinbooke. I analyzed a dataset from kaggle about airplane crashes and fatalities around the world from a period of 1908-2009. The dataset contained 5268 cases of crashes and 13 columns which included information about date of crash, number of people aboard, fatalities, location, its operator/airline,type of aircraft and a summary of the cause of the crash.
After importing my data into power query, I checked for duplicates(found none),the validity of my data using column distribution, quality and profiling and removed columns not neccessary for my analysis. I also made sure my columns were in the right data type.
I noticed the location column contained cities, states and countries so i split them using the "split by delimiter function"(since i needed the countries). In the newly created column, i discovered there were variations of spellings and USA was represented by its states. I cleaned this by adding a fuzzy cluster column in Power query and replacing the states with country. You can find the power query script here.
To get the causes of crashes from the Summary column, I extracted the keywords(see script here) from the column and used conditional column to group them into:
- Mechanical Failure: involves crashes due to engine, propeller or wing failure.
- Weather condition: includes crashes affected by any form of weather e.g rain, storm, fog etc.
- Pilot error: involves loss of control.
- Struck an Object: plane crashed into/or struck a mountain, tree etc.
- Shot down: plane crashing due to being shot at
- Fire Outbreak: plane crashed due to explosion or sudden fire.
- Collision: airplanes colliding.
- Fuel: crash caused by low fuel
- Unknown causes
Lastly, I made a column to group each operator into Civil or Military Aviation
- Civil Aviation: this includes all aircrafts used for personal and business purposes rather than for military Purposes.
- Military Aviation: this includes all aircrafts for the purposes of conducting or enabling warfare.
- Crashes over the years: crashes kept increasing from 1908 as more aircrafts were produced and airlines came into the industry. In recent years, there is a decrease in the crashes which can be drawn to technological advancements. For Military Aviation, the highest year of crash was in 1945(marked the end of World War II).
- Aeroflot and The US Air force had the top most crashes. Aeroflot is Russia largest airline and one of the oldest airlines. Also, The US Airforce accounted for 22.54% of total Military aviation crashes.
- The most Crashed aircraft was the Douglas DC 3.
4.The most fatal crash occurred on March 27, 1977 at Tenerife, Canary islands when 2 Airplanes(a KLM Boeing 747 and a Pan American 747) collided killing a total of 583 people out of 644 people on board.