Major refactor to push generic methods to other packages #31

Merged
merged 20 commits into from
Jun 23, 2019
14 changes: 9 additions & 5 deletions .travis.yml
@@ -4,8 +4,8 @@ os:
- linux
- osx
julia:
- 1.0
- 1.1
- 1.2
- nightly
notifications:
email: false
@@ -24,19 +24,23 @@ matrix:
#before_script: # homebrew for mac
# - if [ $TRAVIS_OS_NAME = osx ]; then brew install gcc; fi

# # uncomment the following lines to override the default test script
# script:
# - julia -e 'Pkg.clone(pwd()); Pkg.build("Microbiome"); Pkg.test("Microbiome"; coverage=true)'
# comment the following lines to use the default test script
script:
- julia -e 'using Pkg.Registry; Registry.add(Registry.RegistrySpec(url = "https://github.com/BioJulia/BioJuliaRegistry.git"))'
- julia -e 'using Pkg.Registry; Registry.add(Registry.RegistrySpec(url = "https://github.com/JuliaRegistries/General.git"))'
- julia -e 'using Pkg; Pkg.build(); Pkg.test("Microbiome", coverage = true)'
after_success:
- julia --project=coverage/ -e 'using Pkg; Pkg.instantiate()'
- julia --project=coverage/ coverage/coverage.jl

jobs:
include:
- stage: "Documentation"
julia: 1.0
julia: 1.1
os: linux
script:
- julia -e 'using Pkg.Registry; Registry.add(Registry.RegistrySpec(url = "https://github.com/BioJulia/BioJuliaRegistry.git"))'
- julia -e 'using Pkg.Registry; Registry.add(Registry.RegistrySpec(url = "https://github.com/JuliaRegistries/General.git"))'
- julia --project=docs/ -e 'using Pkg; Pkg.instantiate();
Pkg.develop(PackageSpec(path=pwd()))'
- julia --project=docs/ docs/make.jl
8 changes: 8 additions & 0 deletions NEWS.md
@@ -1,5 +1,13 @@
# News for Microbiome.jl

## v0.5.0

Major Changes:
- Dropped Microbiome.jl-defined types and methods for DistanceMatrix. Use Distances.jl instead. [Requires this SpatialEcology PR](https://github.com/EcoJulia/SpatialEcology.jl/pull/36)
- Dropped Microbiome.jl-defined types and methods for PCoA. Use MDS from MultivariateStats.jl instead. [Requires this MultivariateStats PR](https://github.com/JuliaStats/MultivariateStats.jl/pull/85)
- Dropped Hclust leaf ordering. Added to Clustering.jl instead ([see Clustering.jl PR](https://github.com/JuliaStats/Clustering.jl/pull/170))


## v0.4.1

Major Changes:
13 changes: 6 additions & 7 deletions Project.toml
@@ -3,26 +3,25 @@ uuid = "3bd8f0ae-a0f2-5238-a5af-e1b399a4940c"
keywords = ["microbiology", "microbiome", "biology"]
license = "MIT"
desc = "Functions for working with microbial community data"
authors = ["kescobo <[email protected]"]
version = "0.4.1"
authors = ["kescobo <[email protected]>"]
version = "0.5.0"

[deps]
Clustering = "aaaa29a8-35af-508c-8bc3-b662a17a0fe5"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MultivariateStats = "6f286f6a-111f-5878-ab1e-185364afe411"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
SpatialEcology = "348f2d5d-71a3-5ad4-b565-8af070f99681"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

[compat]
julia = "1.0, 1.1"
julia = "1.1, 1.2"

[extras]
Coverage = "a2441757-f6aa-5fb2-8edb-039e3f45d037"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
coverage = ["Coverage"]
test = ["Test"]
test = ["Test", "Random"]
2 changes: 1 addition & 1 deletion appveyor.yml
@@ -1,7 +1,7 @@
environment:
matrix:
- julia_version: 1.0
- julia_version: 1.1
- julia_version: 1.2
- julia_version: latest

platform:
11 changes: 10 additions & 1 deletion docs/Project.toml
@@ -1,5 +1,14 @@
[deps]
Clustering = "aaaa29a8-35af-508c-8bc3-b662a17a0fe5"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Microbiome = "3bd8f0ae-a0f2-5238-a5af-e1b399a4940c"
MicrobiomePlots = "bed85cf4-5cc8-5ac4-9eb3-09fbf92b2ce2"
MultivariateStats = "6f286f6a-111f-5878-ab1e-185364afe411"
SpatialEcology = "348f2d5d-71a3-5ad4-b565-8af070f99681"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"

[compat]
Documenter = "~0.19"
Documenter = "~0.22"
5 changes: 3 additions & 2 deletions docs/make.jl
@@ -1,15 +1,16 @@
using Documenter, Microbiome

makedocs(
format = :html,
sitename = "Microbiome.jl",
pages = [
"Home" => "index.md",
"Microbial Abundances" => "abundances.md",
"Distances & Dissimilarity" => "distances.md",
"Contributing" => "contributing.md"
],
authors = "Kevin Bonham, PhD"
authors = "Kevin Bonham, PhD",
format = Documenter.HTML(
prettyurls = get(ENV, "CI", nothing) == "true")
)

deploydocs(
64 changes: 46 additions & 18 deletions docs/src/abundances.md
@@ -6,7 +6,7 @@ names are also stored, and there's a convenience function if you want to convert
a `DataFrame` to a `ComMatrix`, assuming the first column contains feature
names:

```@repl 1
```@example 1
using Microbiome
using DataFrames

@@ -22,7 +22,7 @@ Forgive the clutter... ComMatrices name rows as species (which is true in this
case, but need not be), and columns are "sites" rather than samples. That will
be fixed eventually.

```@repl 1
```@example 1
samplenames(abund)
featurenames(abund)
sampletotals(abund) # this is column sums
@@ -32,7 +32,7 @@ featuretotals(abund) # this is row sums
If you want relative abundance, you can do `relativeabundance(abund)` or
`relativeabundance!(abund)`:

```@repl 1
```@example 1
relativeabundance!(abund)

sampletotals(abund)
@@ -43,33 +43,61 @@ function automatically generates an `n+1` row for `other` containing the
remaining features. Note - this doesn't modify in place, so you've gotta
reassign if you want to update:

```@repl 1
```@example 1
abund2 = filterabund(abund, 1)

featurenames(abund2)
```
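
To make the two operations above concrete, here is a minimal sketch on a plain `Matrix` (features in rows, samples in columns). The helper names `relab` and `topn_with_other` are hypothetical, the input numbers are made up, and ranking features by their total across samples is an assumption; this is not how `relativeabundance` or `filterabund` are implemented in the package.

```julia
# Hypothetical helpers on a plain features × samples matrix.
# They illustrate the idea only and are not Microbiome.jl's implementation.

# Divide every column (sample) by its total so that each sample sums to 1.
relab(m::AbstractMatrix) = m ./ sum(m, dims=1)

# Keep the n features with the largest row totals (an assumed ranking rule)
# and pool everything else into one extra "other" row at the bottom.
function topn_with_other(m::AbstractMatrix, n::Integer)
    totals = vec(sum(m, dims=2))
    order = sortperm(totals, rev=true)
    keep = sort(order[1:n])              # preserve the original feature order
    rest = setdiff(1:size(m, 1), keep)
    other = sum(m[rest, :], dims=1)      # 1 × samples row
    return vcat(m[keep, :], other)       # a new matrix; the input is untouched
end

# Made-up counts: 3 features in 3 samples.
counts = [1.0 3.0 0.0;
          4.0 8.0 3.0;
          5.0 0.0 4.0]

ra = relab(counts)
filtered = topn_with_other(ra, 1)
```

Because the pooled row holds whatever was removed, every column of `filtered` still sums to 1, and `ra` itself is left unchanged.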

## Plotting

**NOTE: The following functions are not currently working - I've moved them to a new package to simplify dependencies. I'm leaving the docs for now as a reference - see `Microbiome.jl` versions 0.2.1 and below for working versions**
Some convenience plotting types are available using
[MicrobiomePlots](https://github.com/BioJulia/MicrobiomePlots.jl)
and [StatsPlots](https://github.com/juliaplots/StatsPlots.jl)

```@example 1
ENV["GKSwstype"] = "100" # hide
using StatsPlots
using MicrobiomePlots
using Distributions
using Random # hide
Random.seed!(1) # hide

# add some high abundance bugs to be a bit more realistic
function spikein(spikes, y, x)
m = rand(LogNormal(), y, x)
for s in spikes
m[s, :] = rand(LogNormal(3., 0.5), x)'
end
return m
end

# 100 species in 10 samples, with every 10th bug a bit more abundant
bugs = spikein(1:10:100, 100, 10)

abund = abundancetable(bugs,
["sample_$x" for x in 1:10],
["species_$x" for x in 1:100]);

Some convenience plotting types are available using [`RecipesBase`](https://github.com/juliaplots/recipesbase.jl) and
[StatPlots](https://github.com/juliaplots/StatPlots.jl)
relativeabundance!(abund)
abundanceplot(abund, xticks=(1:10, samplenames(abund)), xrotation=45)

savefig("abundanceplot.png"); nothing # hide
```

```@repl 1
using StatPlots
![abundance plot](./abundanceplot.png)

srand(1) # hide
Perhaps you have some metadata that you'd like to add as well:

abund = abundancetable(
rand(100, 10),
["sample_$x" for x in 1:10],
["feature_$x" for x in 1:100]);
relativeabundance!(abund)
```@example 1
labels = ["a","a","b","a","b","b","b","b","a","a"]

abundanceplot(abund)
plot(
abundanceplot(abund, xticks=(1:10, samplenames(abund)), xrotation=45),
plot(annotationbar(labels)),
layout=grid(2,1, heights=[0.9,0.1]))

savefig("abundanceplot.png"); nothing # hide
savefig("abundanceplot-annotations.png"); nothing # hide
```

![](abundanceplot.png)
![abundance plot with annotations](./abundanceplot-annotations.png)
82 changes: 34 additions & 48 deletions docs/src/distances.md
@@ -1,90 +1,76 @@
# Working with Distances / Dissimilarity

Quite often, it's useful to boil stuff down to distances between samples. For
this, I'm using an interface with `Distances.jl` to generate a symmetric
`DistanceMatrix`, which also contains a vector for samples, and a field
specifying which type of distance was used to calculate it. You can load one
in manually, or generate it from an `AbundanceTable`.
Quite often, it's useful to boil stuff down to distances between samples.
`AbundanceTable`s can be used with the `pairwise()` function
from [`Distances.jl`](https://github.com/JuliaStats/Distances.jl)
to get a symmetric distance matrix.

```@repl 2
```@example 2
using Distances
using Microbiome

abund = abundancetable([1 3 0;
4 8 3;
5 0 4]);

dm = getdm(abund, BrayCurtis())
dm = pairwise(BrayCurtis(), abund, dims=2)
```
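
For intuition about what that call returns, here is a hand-rolled sketch of Bray-Curtis dissimilarity between the columns of the same matrix. It assumes columns are samples (as `dims=2` suggests) and non-negative abundances; `braycurtis` and `pairwise_columns` are hypothetical names, not functions from Distances.jl.

```julia
# Bray-Curtis dissimilarity between two non-negative abundance vectors:
# sum(|x - y|) / sum(x + y).
braycurtis(x, y) = sum(abs.(x .- y)) / sum(x .+ y)

# Apply f to every pair of columns (samples) of a features × samples matrix.
function pairwise_columns(f, m::AbstractMatrix)
    n = size(m, 2)
    return [f(m[:, i], m[:, j]) for i in 1:n, j in 1:n]
end

m = [1 3 0;
     4 8 3;
     5 0 4]

dm = pairwise_columns(braycurtis, m)
```

The result is a 3×3 matrix with zeros on the diagonal; for example, the entry for samples 1 and 2 is (2 + 4 + 5) / (4 + 12 + 5) = 11/21.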

I've also implemented a method to do a principal coordinates analysis. If
necessary, you can include `correct_neg=true` to use the correction method
described in [Lingoes (1971)](http://dx.doi.org/10.1007/BF02291398)
To plot this, use the `MDS` or `PCA` implementations
from [MultivariateStats](https://github.com/JuliaStats/MultivariateStats.jl) [^1]
and plotting functionality
from [StatsPlots](https://github.com/JuliaPlots/StatsPlots.jl)[^2].

```@repl 2
p = pcoa(dm)
```@example 2
using MultivariateStats
using StatsPlots

eigenvalue(p, 2)
principalcoord(p, 1)
variance(p, [1,2])
```

## Plotting

**NOTE: The following functions are not currently working - I've moved them to a new package to simplify dependencies. I'm leaving the docs for now as a reference - see `Microbiome.jl` versions 0.2.1 and below for working versions**

Some convenience plotting types are available using [`RecipesBase`](https://github.com/juliaplots/recipesbase.jl).
mds = fit(MDS, dm, distances=true)

```@repl 2
using StatPlots
plot(mds)

srand(1) # hide
abund = abundancetable(
rand(100, 10),
["sample_$x" for x in 1:10],
["feature_$x" for x in 1:100]);

dm = getdm(abund, BrayCurtis());
p = pcoa(dm, correct_neg=true);

plot(p, title="Random PCoA")
savefig("pcoplot.png"); nothing # hide
savefig("mds.png"); nothing # hide
```

![pcoa plot](pcoplot.png)
![mds plot](./mds.png)
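
Classical MDS (also known as principal coordinates analysis) can be sketched with nothing but `LinearAlgebra`: double-center the squared distances, then take the leading eigenvectors. The function name `classical_mds` and the toy distance matrix are assumptions for illustration; this is not the MultivariateStats.jl code, which handles more cases.

```julia
using LinearAlgebra

# Hypothetical helper: embed n points in k dimensions from an n × n distance matrix.
function classical_mds(D::AbstractMatrix, k::Integer)
    n = size(D, 1)
    J = I - fill(1 / n, n, n)              # centering matrix
    B = -0.5 * J * (D .^ 2) * J            # double-centered squared distances
    E = eigen(Symmetric(B))                # real eigenvalues, ascending order
    idx = sortperm(E.values, rev=true)[1:k]
    vals = E.values[idx]
    # Scale eigenvectors by the square roots of their eigenvalues; negative
    # eigenvalues (non-Euclidean input) are clamped to zero in this sketch.
    coords = E.vectors[:, idx] .* sqrt.(max.(vals, 0))'
    return coords, vals
end

# Toy example: three points on a line at positions 0, 1 and 3.
D = [0.0 1.0 3.0;
     1.0 0.0 2.0;
     3.0 2.0 0.0]

coords, vals = classical_mds(D, 1)
```

With Euclidean input like this, the distances between the recovered coordinates match `D` up to floating-point error; the sign of each axis is arbitrary.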

### Optimal Leaf Ordering

I've also provided a plotting recipe for making treeplots for `Hclust` objects
from the [`Clustering.jl`](http://github.com/JuliaStats/Clustering.jl) package:
I also wrote a plotting recipe for making tree plots of `Hclust` objects
from the [`Clustering.jl`](http://github.com/JuliaStats/Clustering.jl) package;
that recipe has since been moved into StatsPlots:

```@repl 2
```@example 2
using Clustering

dm = [0. .1 .2
.1 0. .15
.2 .15 0.];

h = hclust(dm, :single);
h.labels = ["a", "b", "c"];
h = hclust(dm, linkage=:single);

hclustplot(h)
plot(h)
savefig("hclustplot1.png"); nothing # hide
```

![hclust plot 1](hclustplot1.png)
![hclust plot 1](./hclustplot1.png)

Note that even though this is a valid tree, the leaf `a` is closer to leaf `c`,
despite the fact that `c` is more similar to `b` than to `a`. This can be fixed
with a method derived from the paper:

[Bar-Joseph et. al. "Fast optimal leaf ordering for hierarchical clustering." _Bioinformatics_. (2001)](https://doi.org/10.1093/bioinformatics/17.suppl_1.S22)
[Bar-Joseph et al. "Fast optimal leaf ordering for hierarchical clustering." _Bioinformatics_. (2001)](https://doi.org/10.1093/bioinformatics/17.suppl_1.S22)[^3]

```@repl 2
optimalorder!(h, dm)
hclustplot(h)
```@example 2
h2 = hclust(dm, linkage=:single, branchorder=:optimal);

plot(h2)

savefig("hclustplot2.png"); nothing # hide
```

![hclust plot 1](hclustplot2.png)
![hclust plot 2](./hclustplot2.png)
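
As a rough illustration of what "optimal" means here, the sketch below scores a leaf order by the sum of distances between neighbouring leaves and brute-forces the four orders that branch flips allow for this three-leaf tree. Brute force stands in for the dynamic programming method from the paper, and `adjacent_cost` is a hypothetical helper, not part of Clustering.jl.

```julia
# Sum of distances between neighbouring leaves for a given left-to-right order.
adjacent_cost(dm, order) = sum(dm[order[i], order[i + 1]] for i in 1:length(order) - 1)

dm = [0.0 0.1  0.2;
      0.1 0.0  0.15;
      0.2 0.15 0.0]

labels = ["a", "b", "c"]

# Single linkage merges a and b first, then attaches c, so flipping
# branches can only produce these four leaf orders.
candidates = [[1, 2, 3], [2, 1, 3], [3, 1, 2], [3, 2, 1]]

costs = [adjacent_cost(dm, o) for o in candidates]
best = candidates[argmin(costs)]

println("best order: ", join(labels[best], " "))
```

Orders that put `b` next to `c` score lower than those that put `a` next to `c`, which is the fix described above. Enumeration like this only works for tiny trees, since the number of possible flips doubles with every internal node.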

[^1]: Requires https://github.com/JuliaStats/MultivariateStats.jl/pull/85
[^2]: Requires https://github.com/JuliaPlots/StatsPlots.jl/pull/152
[^3]: Requires https://github.com/JuliaStats/Clustering.jl/pull/170
4 changes: 3 additions & 1 deletion docs/src/index.md
@@ -1,4 +1,6 @@
# Microbiome.jl <small>For analysis of microbiome and microbial community data</small>
# Microbiome.jl

## For analysis of microbiome and microbial community data

[![Latest Release](https://img.shields.io/github/release/BioJulia/Microbiome.jl.svg)](https://github.com/BioJulia/Microbiome.jl/releases/latest)
[![Microbiome](http://pkg.julialang.org/badges/Microbiome_0.6.svg)](http://pkg.julialang.org/?pkg=Microbiome)