---
title: "Stimulus Timings"
author: "TJ Mahr"
date: "May 2, 2017"
output:
  rmarkdown::github_document: default
  ghdown::github_html_document: default
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  comment = "#>",
  collapse = TRUE)
```

Over the course of the longitudinal study, we re-recorded the stimuli used in
our word recognition experiments. I would like to double-check when the stimuli
were changed in each experiment. This script iterates over the wav files for
each combination of study year, experiment, and dialect version and computes
the durations of the files.

## Helpers

```{r}
library(purrr)
library(dplyr, warn.conflicts = FALSE)

# Measure the duration of a wav file in ms
get_wave_duration <- function(filepath) {
  # header = TRUE reads only the header (sample rate, channels, bits, samples)
  info <- tuneR::readWave(filepath, header = TRUE)
  # milliseconds = (1000 ms per second / samples per second) * num samples
  (1000 / info$sample.rate) * info$samples
}

# List every file ending in .wav (any letter case) below a folder
find_waves <- function(dirpath) {
  list.files(dirpath, pattern = "\\.wav$", ignore.case = TRUE,
             recursive = TRUE, full.names = TRUE)
}
```

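
As a sanity check on the helpers, the chunk below works through the duration formula with made-up header values and shows why the file-name pattern needs an escaped dot. The helper `ms_from_header()` and all of the numbers and file names are hypothetical; no audio files are read.

```{r}
# Hypothetical helper: the same arithmetic as get_wave_duration(), but taking
# the two header fields directly so it runs without any audio files
ms_from_header <- function(sample_rate, n_samples) {
  (1000 / sample_rate) * n_samples
}

# Made-up header values: 22050 samples at 44.1 kHz is half a second
ms_from_header(44100, 22050)
# 33075 samples at 22.05 kHz is one and a half seconds
ms_from_header(22050, 33075)

# Made-up file names. An unescaped "." matches any character, so ".wav"
# also matches the "_wav" inside "notes_wav.txt".
demo_names <- c("dog.wav", "DOG.WAV", "notes_wav.txt")
grepl(".wav|.WAV", demo_names)
# Escaping the dot, anchoring at the end, and ignoring case keeps only the
# names that really end in .wav
grepl("\\.wav$", demo_names, ignore.case = TRUE)
```
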
List all the wav files in each of the longitudinal study folders and compute
their durations.

```{r}
wav_files <- list(tp1 = "tp1", tp2 = "tp2", tp3 = "tp3") %>%
  # find the files in each study, creating a data frame for each study
  map(find_waves) %>%
  map(~ tibble::tibble(file = .x)) %>%
  # collapse to a single data frame, using the list names as the study label
  bind_rows(.id = "study") %>%
  mutate(
    token = basename(file),
    duration = map_dbl(file, get_wave_duration)) %>%
  select(study, token, duration, file)
wav_files
```

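
The list-then-`bind_rows()` step is what attaches a study label to each file. A minimal sketch on made-up paths (nothing is read from disk) shows how the names of the input list become the values of the `study` column:

```{r}
library(purrr)
library(dplyr, warn.conflicts = FALSE)

# Made-up file paths standing in for the output of find_waves()
demo_paths <- list(
  tp1 = c("tp1/MP/SAE/dog.wav", "tp1/MP/SAE/ball.wav"),
  tp2 = "tp2/RWL/AAE/cup.wav")

demo_files <- demo_paths %>%
  # one small data frame per study
  map(~ tibble::tibble(file = .x)) %>%
  # the list names ("tp1", "tp2") fill the new `study` column
  bind_rows(.id = "study")
demo_files
```
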
Deduce the dialect and the experiment (task) name from each file path.

```{r}
wav_files <- wav_files %>%
  mutate(
    task = file %>% stringr::str_extract("MP|RWL"),
    dialect = file %>% stringr::str_extract("AAE|SAE")) %>%
  select(study, task, dialect, token, duration, file)
wav_files
```

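
`stringr::str_extract()` returns the first match found anywhere in the string, so this step assumes the task and dialect codes occur only as folder names in the paths. A sketch on hypothetical paths shows how an unanchored pattern can be fooled by a file name, plus one stricter alternative that uses lookarounds to accept a code only when it sits between two slashes. Whether the stricter form fits depends on the real folder layout, which is an assumption here.

```{r}
# Made-up paths; the last one is hypothetical, with "MP" only inside a file name
task_paths <- c("tp1/MP/SAE/dog.wav", "tp2/RWL/AAE/cup.wav",
                "tp3/Other/SAE_CAMP.wav")

# Unanchored: returns the first match anywhere in the path
loose_task <- stringr::str_extract(task_paths, "MP|RWL")
loose_task

# Anchored: only matches a code that forms a whole folder name
strict_task <- stringr::str_extract(task_paths, "(?<=/)(MP|RWL)(?=/)")
strict_task
```
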
For each experiment, count the number of distinct token _durations_. If the
tokens in an experiment were normalized to have the same duration, that
experiment will have exactly one distinct duration.

```{r}
wav_files %>%
  group_by(study, task, dialect) %>%
  summarise(num_durations = n_distinct(duration))
```

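
The logic of this check can be sketched on made-up durations. Tokens with the same sample count and sample rate produce exactly the same number from the duration formula, so exact comparison in `n_distinct()` is safe. The `normalized` column is a hypothetical addition, not part of the original analysis.

```{r}
library(dplyr, warn.conflicts = FALSE)

# Made-up durations (ms): the MP tokens share one length, the RWL tokens do not
demo_durations <- tibble::tibble(
  task = c("MP", "MP", "MP", "RWL", "RWL", "RWL"),
  duration = c(1000, 1000, 1000, 812.5, 1130, 964))

demo_counts <- demo_durations %>%
  group_by(task) %>%
  summarise(num_durations = n_distinct(duration)) %>%
  # a single distinct duration means every token was cut to the same length
  mutate(normalized = num_durations == 1)
demo_counts
```
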
It looks like the RWL experiments did *not* have normalized tokens.

Now, I would like to exclude the files that do not contain the target nouns.
Our trials involved three parts: a carrier phrase, a noun, and an
attention-getter phrase. For example, _find the_ [carrier phrase] _dog_ [noun]
... _check it out_ [attention getter]. I'm going to clean up the token filenames
and work out a pattern that separates the nouns from the non-nouns.

```{r}
# Strip the dialect, digits, and other junk from each file name to see its
# main content
words <- wav_files %>%
  getElement("token") %>%
  stringr::str_replace_all("AAE|SAE", "") %>%
  stringr::str_replace_all("\\d", "") %>%
  stringr::str_replace_all("_[A-Z]_", "") %>%
  stringr::str_replace_all("_", "") %>%
  stringr::str_replace_all(stringr::regex("\\.wav$", ignore_case = TRUE), "") %>%
  unique() %>%
  sort()
words

# File-name fragments that mark carrier phrases and attention-getters
non_noun <- "check|Check|Fin|Fun|look|Look|See|this"
noun_files <- wav_files %>%
  filter(!stringr::str_detect(token, non_noun))
```

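
Because the non-noun pattern matches substrings rather than whole words, it would also drop any noun whose file name happens to contain one of the listed fragments. A cheap guard is to print the tokens the filter removes and check them by eye. The sketch below uses hypothetical token names, including an invented noun `Funnel` that the filter wrongly removes:

```{r}
# Hypothetical token names; "SAE_Funnel.wav" is an invented noun used to show
# that the filter matches substrings, not whole words
demo_tokens <- c("SAE_dog_1.wav", "AAE_Find_the_2.wav",
                 "SAE_check_it_out.wav", "SAE_Funnel.wav")
demo_non_noun <- "check|Check|Fin|Fun|look|Look|See|this"

is_non_noun <- stringr::str_detect(demo_tokens, demo_non_noun)
# Tokens the filter would drop: worth eyeballing for real nouns
demo_tokens[is_non_noun]
# Tokens the filter would keep
demo_tokens[!is_non_noun]
```
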
After this filter, each experiment has fewer distinct file durations.

```{r}
noun_files %>%
  group_by(study, task, dialect) %>%
  summarise(num_durations = n_distinct(duration))
```

And now I can get what I wanted: the durations of the noun tokens in each
experiment.

```{r}
noun_files %>%
  group_by(study, task, dialect) %>%
  summarise(
    n_durations = n_distinct(duration),
    min_dur = min(duration) %>% round(1),
    med_dur = median(duration) %>% round(1),
    max_dur = max(duration) %>% round(1)) %>%
  ungroup() %>%
  knitr::kable()
```

Okay, that confirms what I thought: the stimuli were re-recorded after TP1, and
the same files were used in TP2 and TP3.
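
Matching durations are consistent with reusing the same recordings, but two different recordings can have the same length. If the study folders are still available, comparing file checksums is a stricter test. The sketch below assumes that TP2 and TP3 use the same token file names; it builds two hypothetical folders of fake bytes in a temporary directory and compares them with base R's `tools::md5sum()`. The helper `hash_folder()` is hypothetical.

```{r}
# Hypothetical demo: two "study" folders under a temp dir with made-up bytes
demo_root <- file.path(tempdir(), "md5-demo")
dir.create(file.path(demo_root, "tp2"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_root, "tp3"), recursive = TRUE, showWarnings = FALSE)
writeBin(as.raw(1:10), file.path(demo_root, "tp2", "dog.wav"))
writeBin(as.raw(1:10), file.path(demo_root, "tp3", "dog.wav"))
writeBin(as.raw(1:10), file.path(demo_root, "tp2", "ball.wav"))
writeBin(as.raw(10:1), file.path(demo_root, "tp3", "ball.wav"))

# One row per file: its base name and the MD5 hash of its bytes
hash_folder <- function(dirpath) {
  files <- list.files(dirpath, full.names = TRUE)
  data.frame(token = basename(files), md5 = unname(tools::md5sum(files)),
             stringsAsFactors = FALSE)
}

# Pair files by name; equal hashes mean byte-identical files
md5_check <- merge(
  hash_folder(file.path(demo_root, "tp2")),
  hash_folder(file.path(demo_root, "tp3")),
  by = "token", suffixes = c("_tp2", "_tp3"))
md5_check$same_bytes <- md5_check$md5_tp2 == md5_check$md5_tp3
md5_check
```

On real data, `hash_folder()` would need `recursive = TRUE` and a path-based key if token names repeat across task or dialect subfolders.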