-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathREADME.Rmd
140 lines (114 loc) · 4.46 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE, message=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
*Package is work in progress! If you encounter errors /
problems, please file an issue or make a PR.*

[](https://codecov.io/gh/lorenzwalthert/gitsum)
[](https://travis-ci.org/lorenzwalthert/gitsum)
[](https://ci.appveyor.com/project/lorenzwalthert/gitsum)
# Introduction
This package parses a git repository history to collect
comprehensive information about the activity in the repo. The parsed
data is made available to the user in a tabular format. The package can also
generate reports based on the parse data. You can install the development
version from GitHub.
```{r, eval=FALSE}
remotes::install_github("lorenzwalthert/gitsum")
```
There are two main functions for parsing the history, both return tabular data:
* `parse_log_simple()` is a relatively fast parser and returns a tibble with
one commit per row. There is no file-specific information.
* `parse_log_detailed()` outputs a nested tibble and for each commit, the names
of the amended files, number of lines changed ect. available. This function
is slower.
`report_git()` creates a html, pdf, or word report with the parsed log data
according to a template. Templates can be created by the user or a template
from the `gitsum` package can be used.
Let's see the package in action.
```{r, message=FALSE, warning=FALSE}
library("gitsum")
library("tidyverse")
library("forcats")
```
We can obtain a parsed log like this:
```{r, include=FALSE}
remove_gitsum()
```
```{r}
init_gitsum()
tbl <- parse_log_detailed() %>%
select(short_hash, short_message, total_files_changed, nested)
tbl
```
Since we used `parse_log_detailed()`, there is detailed file-specific
information available for every commit:
```{r}
tbl$nested[[3]]
```
Since the data has such a high resolution, various graphs, tables etc. can be
produced from it to provide insights into the git history.
# Examples
Since the output of `git_log_detailed()` is a nested tibble, you can work on it
as you work on any other tibble.
Let us first have a look at who comitted to this repository:
```{r}
log <- parse_log_detailed()
log %>%
group_by(author_name) %>%
summarize(n = n())
```
We can also investigate how the number of lines of each file in the R
directory evolved. For that, we probaly want to view files with changed names
as one file. Also, we probably don't want to see boring plots for files that
got changed only a few times. Let's focus on files that were changed in
at least five commits.
```{r per_file, message=FALSE, warning=FALSE}
lines <- log %>%
unnest_log() %>%
set_changed_file_to_latest_name() %>%
add_line_history()
r_files <- grep("^R/", lines$changed_file, value = TRUE)
to_plot <- lines %>%
filter(changed_file %in% r_files) %>%
add_n_times_changed_file() %>%
filter(n_times_changed_file >= 10)
ggplot(to_plot, aes(x = date, y = current_lines)) +
geom_step() +
scale_y_continuous(name = "Number of Lines", limits = c(0, NA)) +
facet_wrap(~changed_file, scales = "free_y")
```
Next, we want to see which files were contained in most commits:
```{r ggplot1}
log %>%
unnest_log() %>%
mutate(changed_file = fct_lump(fct_infreq(changed_file), n = 10)) %>%
filter(changed_file != "Other") %>%
ggplot(aes(x = changed_file)) + geom_bar() + coord_flip() +
theme_minimal()
```
We can also easily get a visual overview of the number of insertions & deletions in commits over time:
```{r ggplot2}
commit.dat <- data.frame(
edits = rep(c("Insertions", "Deletions"), each = nrow(log)),
commit = rep(1:nrow(log), 2),
count = c(log$total_insertions, -log$total_deletions))
ggplot(commit.dat, aes(x = commit, y = count, fill = edits)) +
geom_bar(stat = "identity", position = "identity") +
theme_minimal()
```
Or the number of commits broken down by day of the week:
```{r ggplot3}
log %>%
mutate(weekday = factor(weekday, c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))) %>%
ggplot(aes(x = weekday)) + geom_bar() +
theme_minimal()
```