Lesson 1

Most of this tutorial has been stolen from pandas lesson 1.

In this tutorial we will assume you have included DataFrame in your project.

Create data

The data set will consist of 5 baby names and the number of births recorded for that year (1880).

# The initial set of baby names and birth counts
names = ["Bob","Jessica","Mary","John","Mel"]
births = [968, 155, 77, 578, 973]

First we create a 2D data structure with these two columns:

values = DataFrame.Table.new([names, births], from_columns: true)
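To make the column-to-row idea concrete, here is a minimal sketch in plain Elixir of turning two column lists into a list of rows. It uses only the standard library and is an illustration of the concept, not the library's internal implementation; the `rows` variable is a name chosen for this example.

```elixir
names = ["Bob", "Jessica", "Mary", "John", "Mel"]
births = [968, 155, 77, 578, 973]

# Pair the i-th name with the i-th birth count:
# column-oriented data becomes row-oriented data.
rows =
  [names, births]
  |> Enum.zip()
  |> Enum.map(&Tuple.to_list/1)

IO.inspect(rows)
# [["Bob", 968], ["Jessica", 155], ["Mary", 77], ["John", 578], ["Mel", 973]]
```

Each inner list corresponds to one row of the table shown further down.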

We are basically done creating the data set. We will now use the library to export it to a CSV file.

frame will be a DataFrame data structure. You can think of this structure as holding the contents of the baby names data set in a format similar to an Excel spreadsheet. Let's take a look at the contents of frame below.

frame = DataFrame.new(values, ["Names", "Births"])

Output:

             Names         Births
0             Bob           968
1             Jessica       155
2             Mary          77
3             John          578
4             Mel           973

Exporting and importing data

Export the dataframe to a CSV file. We can name the file births1880.csv. The function to_csv will be used to export the file. The file will be saved in the same location as the project unless specified otherwise.

DataFrame.to_csv(frame, "births1880.csv")

To import the data we can use the from_csv function.

frame_from_file = DataFrame.from_csv("births1880.csv")

Output:

              Names         Births
0             Bob           968
1             Jessica       155
2             Mary          77
3             John          578
4             Mel           973

As we can see, the default is to write and read the column names and to use automatic indexing. Check the documentation of these functions for other options.
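For readers curious about what a CSV round trip involves, the following sketch writes and reads the same data with the Elixir standard library alone. It assumes a header row and a plain comma separator with no quoting; the file that to_csv actually produces may differ (for example, it may include the index column). The path and variable names are hypothetical and used only for this example.

```elixir
header = ["Names", "Births"]
rows = [["Bob", 968], ["Jessica", 155], ["Mary", 77], ["John", 578], ["Mel", 973]]

# Hypothetical path in the system temp directory, used only for this sketch.
path = Path.join(System.tmp_dir!(), "births1880_sketch.csv")

# Write: one comma-separated line per row, header first.
contents =
  [header | rows]
  |> Enum.map(&Enum.join(&1, ","))
  |> Enum.join("\n")

File.write!(path, contents)

# Read back: split into lines, then into fields.
[read_header | read_rows] =
  path
  |> File.read!()
  |> String.split("\n", trim: true)
  |> Enum.map(&String.split(&1, ","))

# Fields come back as strings, so convert the births column to integers.
parsed = Enum.map(read_rows, fn [name, births] -> [name, String.to_integer(births)] end)

IO.inspect(read_header)
IO.inspect(parsed)
```

The conversion step at the end shows why a CSV reader has to decide on column types: the file itself only stores text.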

Analyze data

To find the most popular name, that is, the baby name with the highest number of births, we can sort the dataframe and select the top row:

 frame_from_file |> DataFrame.sort_values("Births") |> DataFrame.head(1)

This gives us:

              Names         Births
4             Mel           973
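The same "sort, then take the top row" logic can be sketched with plain Elixir lists. This is an illustration using the standard library only, not the DataFrame API; it assumes Elixir 1.10 or later for the `:desc` option, and the variable names are chosen for this example. Note that the sort has to be in descending order for the first row to be the largest.

```elixir
rows = [["Bob", 968], ["Jessica", 155], ["Mary", 77], ["John", 578], ["Mel", 973]]

# Sort by the births column, largest first, then take the first row.
[top | _rest] = Enum.sort_by(rows, fn [_name, births] -> births end, :desc)

# The same answer without sorting the whole list.
top_via_max = Enum.max_by(rows, fn [_name, births] -> births end)

IO.inspect(top)
# ["Mel", 973]
IO.inspect(top_via_max)
# ["Mel", 973]
```

When only the single largest row is needed, `Enum.max_by/2` avoids the cost of a full sort.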