JuliaData · bkamins · Apr 27, 2020 · Apr 16, 2020 · Apr 16, 2020 · Apr 17, 2020
diff --git a/Project.toml b/Project.toml
@@ -35,7 +35,7 @@ test = ["DataStructures", "DataValues", "Dates", "Logging", "Random", "Test"]
 julia = "1"
 CategoricalArrays = "0.8"
 Compat = "2.2, 3"
-DataAPI = "1.0.1"
+DataAPI = "1.2"
 InvertedIndices = "1"
 IteratorInterfaceExtensions = "0.1.1, 1"
 Missings = "0.4.2"

diff --git a/docs/src/lib/indexing.md b/docs/src/lib/indexing.md
@@ -17,9 +17,11 @@ and broadcasting are intended to work with `DataFrame`, `SubDataFrame` and `Data
 The rules for a valid type of index into a column are the following:
 * a value, later denoted as `col`:
     * a `Symbol`;
+    * an `AbstractString`;
     * an `Integer` that is not `Bool`;
 * a vector, later denoted as `cols`:
     * a vector of `Symbol` (does not have to be a subtype of `AbstractVector{Symbol}`);
+    * a vector of `AbstractString` (does not have to be a subtype of `AbstractVector{<:AbstractString}`);
     * a vector of `Integer` other than `Bool` (does not have to be a subtype of `AbstractVector{<:Integer}`);
     * a vector of `Bool` that has to be a subtype of `AbstractVector{Bool}`;
     * a regular expression, which gets expanded to a vector of matching column names;
@@ -122,13 +124,14 @@ so it is unsafe to use it afterwards (the column length correctness will be pres
 * `df[CartesianIndex(row, col)] = v` -> the same as `df[row, col] = v`;
 * `df[row, cols] = v` -> set row `row` of columns `cols` in-place; the same as `dfr = df[row, cols]; dfr[:] = v`;
 * `df[rows, col] = v` -> set rows `rows` of column `col` in-place; `v` must be an `AbstractVector`;
-                         if `rows` is `:` and `col` is a `Symbol` that is not present in `df` then a new column
-                         in `df` is created and holds a `copy` of `v`; equivalent to `df.col = copy(v)` if `col` is a valid identifier;
+                         if `rows` is `:` and `col` is a `Symbol` or `AbstractString`
+                         that is not present in `df` then a new column in `df` is created and holds a `copy` of `v`; equivalent to `df.col = copy(v)` if `col` is a valid identifier;
 * `df[rows, cols] = v` -> set rows `rows` of columns `cols` in-place; `v` must be an `AbstractMatrix` or an `AbstractDataFrame`
                       (in this case column names must match);
 * `df[!, col] = v` -> replaces `col` with `v` without copying
                       (with the exception that if `v` is an `AbstractRange` it gets converted to a `Vector`);
-                      also if `col` is a `Symbol` that is not present in `df` then a new column in `df` is created and holds `v`;
+                      also if `col` is a `Symbol` or `AbstractString` that is not present in `df` then
+                      a new column in `df` is created and holds `v`;
                       equivalent to `df.col = v` if `col` is a valid identifier;
                       this is allowed if `ncol(df) == 0 || length(v) == nrow(df)`;
 * `df[!, cols] = v` -> replaces existing columns `cols` in data frame `df` with copying;
@@ -183,10 +186,10 @@ Additional rules:
 * in the `df[CartesianIndex(row, col)] .= v`, `df[row, col] .= v` syntaxes `v` is broadcasted into the contents of `df[row, col]` (this is consistent with Julia Base);
 * in the `df[row, cols] .= v` syntaxes the assignment to `df` is performed in-place;
 * in the `df[rows, col] .= v` and `df[rows, cols] .= v` syntaxes the assignment to `df` is performed in-place;
-  if `rows` is `:` and `col` is `Symbol` and it is missing from `df` then a new column is allocated and added;
+  if `rows` is `:` and `col` is `Symbol` or `AbstractString` and it is missing from `df` then a new column is allocated and added;
   the length of the column is always the value of `nrow(df)` before the assignment takes place;
 * in the `df[!, col] .= v` syntax column `col` is replaced by a freshly allocated vector;
-  if `col` is `Symbol` and it is missing from `df` then a new column is allocated added;
+  if `col` is `Symbol` or `AbstractString` and it is missing from `df` then a new column is allocated and added;
   the length of the column is always the value of `nrow(df)` before the assignment takes place;
 * the `df[!, cols] .= v` syntax replaces existing columns `cols` in data frame `df` with freshly allocated vectors;
 * `df.col .= v` syntax is allowed and performs in-place assignment to an existing vector `df.col`.
@@ -197,9 +200,8 @@ Additional rules:
 
 Note that `sdf[!, col] .= v` and `sdf[!, cols] .= v` syntaxes are not allowed as `sdf` can be only modified in-place.
 
-If column indexing using `Symbol` names in `cols` is performed, the order of columns in the operation is specified
-by the order of names.
-
+If column indexing using `Symbol` or `AbstractString` names in `cols` is performed, the order
+of columns in the operation is specified by the order of names.
 
 ## Indexing `GroupedDataFrame`s
 
@@ -230,3 +232,18 @@ The elements of a `GroupedDataFrame` are [`SubDataFrame`](@ref)s of its parent.
 * `gd[n::Not]` -> Any of the above types wrapped in `Not`. The result
    will be a new `GroupedDataFrame` containing all groups in `gd` *not* selected
    by the wrapped index.
+
+# Common API for types defined in DataFrames.jl
+
+This table presents return value types of calling `names`, `propertynames` and `keys`
+on types exposed to the user by DataFrames.jl:
+
+| Type                | `names`          | `propertynames`  | `keys`           |
+|---------------------|------------------|------------------|------------------|
+| `AbstractDataFrame` | `Vector{String}` | `Vector{Symbol}` | undefined        |
+| `DataFrameRow`      | `Vector{String}` | `Vector{Symbol}` | `Vector{Symbol}` |
+| `DataFrameRows`     | `Vector{String}` | `Vector{Symbol}` | vector of `Int`  |
-| `DataFrameRows`     | `Vector{String}` | `Vector{Symbol}` | vector of `Int`  |
+| `DataFrameRows`     | `Vector{String}` | `Vector{Symbol}` | `Vector{Int}`  |
+| `DataFrameColumns`  | `Vector{String}` | `Vector{Symbol}` | `Vector{Symbol}` |
+| `GroupedDataFrame`  | `Vector{String}` | tuple of fields  | `GroupKeys`      |
+| `GroupKeys`         | undefined        | tuple of fields  | vector of `Int`  |
-| `GroupKeys`         | undefined        | tuple of fields  | vector of `Int`  |
+| `GroupKeys`         | undefined        | tuple of fields  | `AbstractVector{Int}`  |
+| `GroupKey`          | `Vector{String}` | `Vector{Symbol}` | `Vector{Symbol}` |
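
The indexing rules and the `names`/`propertynames` table in the `docs/src/lib/indexing.md` hunks above can be checked with a short sketch. It assumes a DataFrames.jl release that already includes this change (string column indexing and `names` returning strings); the column names `C`, `D` and `E` and the vector `v` are made up for illustration.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

# A Symbol and a string naming the same column select the same vector.
same_column = df[!, :A] === df[!, "A"]

# `names` is documented to return strings, `propertynames` to return Symbols.
string_names = names(df)
symbol_names = propertynames(df)

# `df[!, col] = v` stores `v` itself, while `df[:, col] = v` stores a copy.
# Both create the column when a string (or Symbol) name is not yet present.
v = [10, 20, 30, 40]
df[!, "C"] = v
df[:, "D"] = v
stored_without_copy = df.C === v
stored_as_copy = df.D !== v && df.D == v

# An `AbstractRange` assigned with `!` is converted to a `Vector`.
df[!, "E"] = 1:4
range_converted = df.E isa Vector{Int}

# With a vector of names, the column order follows the order of the names.
reordered = names(df[:, ["B", "A"]])
```

Whether to index with strings or `Symbol`s is a matter of convenience here; the sketch only illustrates that both forms address the same columns.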
diff --git a/docs/src/lib/types.md b/docs/src/lib/types.md
@@ -109,6 +109,7 @@ without caution because:
 
 ```@docs
 AbstractDataFrame
+AsTable
 ByRow
 DataFrame
 DataFrameRow

diff --git a/docs/src/man/getting_started.md b/docs/src/man/getting_started.md
@@ -45,7 +45,8 @@ julia> df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
 
 ```
 
-Columns can be directly (i.e. without copying) accessed via `df.col` or `df[!, :col]`. The latter syntax is more flexible as it allows passing a variable holding the name of the column, and not only a literal name. Note that column names are symbols (`:col` or `Symbol("col")`) rather than strings (`"col"`). Columns can also be accessed using an integer index specifying their position.
+Columns can be directly (i.e. without copying) accessed via `df.col`, `df."col"`, `df[!, :col]` or `df[!, "col"]`. The two latter syntaxes are more flexible as they allow passing a variable holding the name of the column, and not only a literal name. Note that column names can be either symbols (written as `:col`, `:var"col"` or `Symbol("col")`) or strings (written as `"col"`).
+Columns can also be accessed using an integer index specifying their position.
 
 Since `df[!, :col]` does not make a copy, changing the elements of the column vector returned by this syntax will affect the values stored in the original `df`. To get a copy of the column use `df[:, :col]`: changing the vector returned by this syntax does not change `df`.
 
@@ -58,6 +59,13 @@ julia> df.A
  3
  4
 
+julia> df."A"
+4-element Array{Int64,1}:
+ 1
+ 2
+ 3
+ 4
+
 julia> df.A === df[!, :A]
 true
 
@@ -67,6 +75,15 @@ false
 julia> df.A == df[:, :A]
 true
 
+julia> df.A === df[!, "A"]
+true
+
+julia> df.A === df[:, "A"]
+false
+
+julia> df.A == df[:, "A"]
+true
+
 julia> df.A === df[!, 1]
 true
 
@@ -89,15 +106,28 @@ julia> df[:, firstcolumn] == df.A
 true
 ```
 
-Column names can be obtained using the `names` function:
+Column names can be obtained as strings using the `names` function:
 
 ```jldoctest dataframe
 julia> names(df)
-2-element Array{Symbol,1}:
- :A
- :B
+2-element Array{String,1}:
+ "A"
+ "B"
+ ```
+
+To get column names as `Symbol`s use the `propertynames` function:
+```
+julia> propertynames(df)
+2-element Array{Symbol,1}:
+ :A
+ :B
 ```
 
+!!! note
+
+    DataFrames.jl allows using `Symbol`s (like `:A`) and strings (like `"A"`)
+    for all column indexing operations for convenience.
+    However, using `Symbol`s is slightly faster and should generally be preferred.
-    However, using `Symbol`s is slightly faster and should generally be preferred.
+    However, using `Symbol`s is slightly faster and should generally be preferred, unless the names are generated via string manipulation.
+
+
 ### Constructing Column by Column
 
 It is also possible to start with an empty `DataFrame` and add columns to it one by one: