# (PART\*) CAD applications {-}
# Standalone CAD {#standalone-cad-radiologists}
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(dplyr)
```
## How much finished 99% {#standalone-cad-radiologists-how-much-finished}
```{r, include=FALSE}
source("R/collapse_rows_df.R", local = knitr::knit_global())
```
## Introduction {#standalone-cad-radiologists-introduction}
In the US the majority of screening mammograms are analyzed by computer aided detection (CAD) algorithms [@rao2010widely]. Almost all major imaging device manufacturers provide CAD as part of their imaging workstation display software. In the US CAD is approved for use as a second reader: the radiologist first interprets the images (typically 4 views, 2 of each breast) without CAD, and then the CAD information (i.e., cued suspicious regions, possibly shown with associated probabilities of malignancy) is displayed and the radiologist has the opportunity to revise the initial interpretation. In response to the FDA-approved second-reader usage, the evolution of CAD algorithms has been guided mainly by comparing observer performance of radiologists with and without CAD.
Clinical CAD systems sometimes report only the locations of suspicious regions, i.e., they may not provide ratings. Analysis of this type of data is deferred to a later **TBA** chapter. However, a malignancy index (a continuous variable) for every CAD-found suspicious region is available to the algorithm designer [@edwards2002maximum]. Standalone performance, i.e., the performance of designer-level CAD by itself, regarded as an algorithmic reader, vs. radiologists, is rarely measured. In breast cancer screening I am aware of only one study [@hupse2013standalone] where standalone performance was measured. [Standalone performance has been measured in CAD for computed tomography colonography, chest radiography and three-dimensional ultrasound [@hein2010computeraided; @summers2008performance; @taylor2006computerassisted; @deBoo2011computeraided; @tan2012computeraided].]
One possible reason for not measuring standalone performance of CAD is the lack of an accepted assessment method for such measurements. This chapter removes that impediment. It describes a method for comparing standalone performance of designer-level CAD to a group of radiologists interpreting the same cases and compares the method to those described in two relevant publications [@hupse2013standalone; @kooi2016comparison].
## Overview {#standalone-cad-radiologists-overview}
This chapter extends the method used in a study of standalone CAD performance [@hupse2013standalone], termed one-treatment random-reader fixed case or **1T-RRFC** analysis, since CAD is treated as an additional reader within a single treatment and since it only accounts for reader variability but does not account for case-variability.
The extension includes the effect of case-sampling variability and is hence termed one-treatment random-reader random-case or **1T-RRRC** analysis. The method is based on an existing method allowing comparison of the average performance of readers in a single treatment to a specified value. The key modification is to regard the difference in performance between radiologists over CAD as a figure of merit to which the existing work is directly applicable. The 1T-RRRC method is compared to 1T-RRFC.
The 1T-RRRC method is also compared to an unorthodox usage of the conventional multiple-treatment multiple-reader method, termed **2T-RRRC** analysis, which involves replicating the CAD ratings as many times as there are radiologists, in effect simulating a second treatment, i.e., CAD is regarded as the second treatment (with identical readers within this treatment) to which existing methods (DBM or OR, as described in [RJafrocRocBook](https://dpc10ster.github.io/RJafrocRocBook/dbm-analysis-significance-testing.html)) are applied.
## Methods {#standalone-cad-radiologists-methods}
Summarized are two relevant studies of CAD vs. radiologists in mammography. This is followed by comments on the methods used in the two studies. The second study used multi-treatment multi-reader receiver operating characteristic (ROC) software in an unorthodox way. A statistical model and analysis method is described that avoids the unorthodox usage of ROC software and has fewer model parameters.
### Studies assessing performance of CAD vs. radiologists {#standalone-cad-radiologists-two-previous-studies}
The first study [@hupse2013standalone] measured performance in finding and localizing lesions in mammograms, i.e., visual search was involved, while the second study [@kooi2016comparison] measured lesion classification performance between non-diseased and diseased regions of interest (ROIs) previously found on mammograms by an independent algorithmic reader, i.e., visual search was not involved.
#### Study - 1 {#standalone-cad-radiologists-study1}
The first study [@hupse2013standalone] compared standalone performance of a CAD device to that of 9 radiologists interpreting the same cases (120 non-diseased and 80 with a single malignant mass per case). It used the LROC (localization ROC) paradigm [@starr1975visual; @metz1976observer; @swensson1996unified], in which the observer gives an overall rating for presence of disease (an integer 0 to 100 scale was used) and indicates the location of the most suspicious region. On a non-diseased case the rating is classified as a false positive (FP) but on a diseased case it is classified as a *correct localization* (CL) if the location is sufficiently close to the lesion and otherwise it is classified as an *incorrect localization*. For a given reporting threshold, the number of correct localizations divided by the number of diseased cases estimates the probability of correct localization (PCL) at that threshold. On non-diseased cases the number of false positives (FPs) divided by the number of non-diseased cases estimates the probability of a false positive, or false positive fraction (FPF), at that threshold. The plot of PCL (ordinate) vs. FPF defines the empirical LROC curve. Study - 1 used as figures of merit (FOMs) the interpolated PCL at two values of FPF, specifically FPF = 0.05 and FPF = 0.2, denoted $\text{PCL}_{0.05}$ and $\text{PCL}_{0.2}$, respectively. A t-test between the radiologist $\text{PCL}_{\text{FPF}}$ values and that of CAD was used to compute the two-sided p-value for rejecting the NH of equal performance. Study - 1 reported p-value = 0.17 for $\text{PCL}_{0.05}$ and p-value $\leq$ 0.001, with CAD being inferior, for $\text{PCL}_{0.2}$.
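As an aside, the empirical $\text{PCL}_{\text{FPF}}$ computation just described can be sketched in a few lines of R. The function below is a minimal illustration, not `RJafroc` code; the input names (`fp`, `cl`, `nDiseased`) and the simulated ratings are assumptions made for this sketch.
```{r standalone-cad-pcl-sketch}
# Minimal sketch of the empirical PCL at a specified FPF from LROC-style ratings.
# fp = ratings on non-diseased cases (higher = more suspicious); cl = ratings on
# diseased cases with CORRECT localization (incorrect localizations simply do
# not contribute); nDiseased = total number of diseased cases.
pclAtFpf <- function(fp, cl, nDiseased, fpfValue) {
  z <- sort(unique(c(fp, cl)), decreasing = TRUE)         # candidate thresholds
  fpf <- sapply(z, function(zeta) mean(fp >= zeta))       # false positive fraction
  pcl <- sapply(z, function(zeta) sum(cl >= zeta) / nDiseased) # prob. of correct localization
  # linearly interpolate the empirical LROC curve at the target FPF
  approx(c(0, fpf), c(0, pcl), xout = fpfValue, ties = "ordered")$y
}

# illustrative use with simulated ratings: 120 non-diseased and 80 diseased cases,
# with correct localizations on 70 of the 80 diseased cases
set.seed(1)
fp <- rnorm(120)
cl <- rnorm(70, mean = 1.5)
pclAtFpf(fp, cl, nDiseased = 80, fpfValue = 0.2)
```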
#### Study - 2 {#standalone-cad-radiologists-study2}
The second study [@kooi2016comparison] used 199 diseased and 199 non-diseased ROIs extracted by an independent CAD algorithm. These were analyzed by a CAD algorithmic observer different from the one used to determine the ROIs, and by four expert radiologists. In either case the ROC paradigm was used, i.e., a rating was obtained for each ROI. The figure of merit was the empirical area (AUC) under the respective ROC curves (one for each radiologist and one for CAD). The p-value for the difference between the average radiologist's AUC and the CAD AUC was determined using an unorthodox application of the Dorfman-Berbaum-Metz [@dorfman1992receiver] multiple-treatment multiple-reader multiple-case (DBM-MRMC) software.
The application was unorthodox in the sense that in the input data file **radiologists and CAD were entered as two treatments**. In conventional (or orthodox) DBM-MRMC each reader provides two ratings per case and the data file would consist of paired ratings of a set of cases interpreted by 4 readers. To accommodate the paired data structure assumed by the software, the authors of Study - 2 **replicated the CAD ratings four times in the input data file**, as explained in the caption to Table \@ref(tab:standalone-cad-table-conventional). By this artifice they converted a single-treatment 5-reader (4 radiologists plus CAD) data file to a two-treatment 4-reader data file in which the four readers in treatment 1 were the radiologists, and the four "readers" in treatment 2 were the replicated CAD ratings. Note that for each case the four "readers" in the second treatment had identical ratings. In Table \@ref(tab:standalone-cad-table-conventional) the replicated CAD readers are labeled C1, C2, C3 and C4.
```{r standalone-cad-table-conventional, echo=FALSE}
cells = array(dim = c(15,9))
cells[1,] <- c("R1", "1", "1", "75", "", "R1", "1", "1", "75")
cells[2,] <- c("...", "...", "...", "...", "", "...", "...", "...", "...")
cells[3,] <- c("R1", "1", "398", "0", "", "R1", "1", "398", "0")
cells[4,] <- c("...", "...", "...", "...", "", "...", "...", "...", "...")
cells[5,] <- c("R4", "1", "1", "50", "", "R4", "1", "1", "50")
cells[6,] <- c("...", "...", "...", "...", "", "...", "...", "...", "...")
cells[7,] <- c("R4", "1", "398", "25", "", "R4", "1", "398", "25")
cells[8,] <- rep("", 9)
cells[9,] <- c("R1", "2", "1", "45", "", "C1", "2", "1", "55")
cells[10,] <- c("...", "...", "...", "...", "", "...", "...", "...", "...")
cells[11,] <- c("R1", "2", "398", "25", "", "C1", "2", "398", "5")
cells[12,] <- c("...", "...", "...", "...", "", "...", "...", "...", "...")
cells[13,] <- c("R4", "2", "1", "95", "", "C4", "2", "1", "55")
cells[14,] <- c("...", "...", "...", "...", "", "...", "...", "...", "...")
cells[15,] <- c("R4", "2", "398", "20", "", "C4", "2", "398", "5")
df <- as.data.frame(cells)
colnames(df) <- c("Reader", "Treatment", "Case", "Rating", "", "Reader", "Treatment", "Case", "Rating")
kableExtra::kbl(df, caption = "The differences between the data structures in conventional DBM-MRMC analysis and the unorthodox application of the software used in Study - 2. There are four radiologists, labeled R1, R2, R3 and R4 interpreting 398 cases labeled 1, 2, …, 398, in two treatments, labeled 1 and 2. Sample ratings are shown only for the first and last radiologist and the first and last case. In the first four columns, labeled \"Standard DBM-MRMC\", each radiologist interprets each case twice. In the next four columns, labeled \"Unorthodox DBM-MRMC\", the radiologists interpret each case once. CAD ratings are replicated four times to effectively create the second \"treatment\". The quotations emphasize that there is, in fact, only one treatment. The replicated CAD observers are labeled C1, C2, C3 and C4.", booktabs = TRUE) %>% kableExtra::kable_styling() %>% kableExtra::add_header_above(c("Standard DBM-MRMC" = 4, "", "Unorthodox DBM-MRMC" = 4))
```
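The replication artifice can be made concrete with a few lines of code. The following is a minimal sketch using simulated ratings; all names and values are illustrative, not those of Study - 2.
```{r standalone-cad-replication-sketch}
# Minimal sketch of the replication artifice: single-treatment data are recast
# as a two-treatment data frame in which the CAD ratings are copied once per
# radiologist, so all "readers" in treatment 2 are identical.
set.seed(1)
J <- 4; K <- 398
radRatings <- matrix(round(runif(J * K, 0, 100)), J, K)  # radiologist ratings (illustrative)
cadRatings <- round(runif(K, 0, 100))                    # CAD ratings (illustrative)
rad <- data.frame(Reader    = rep(paste0("R", 1:J), each = K),
                  Treatment = 1,
                  Case      = rep(1:K, times = J),
                  Rating    = as.vector(t(radRatings)))
cad <- data.frame(Reader    = rep(paste0("C", 1:J), each = K),
                  Treatment = 2,
                  Case      = rep(1:K, times = J),
                  Rating    = rep(cadRatings, times = J)) # identical replicated "readers"
unorthodox <- rbind(rad, cad)  # two-treatment file fed to DBM-MRMC software
```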
Study -- 2 reported a non-significant difference between CAD and the radiologists (p = 0.253).
#### Comments {#standalone-cad-radiologists-comments}
For the purpose of this work, which focuses on the respective analysis methods, the difference in observer performance paradigms between the two studies, namely a search paradigm in Study -- 1 vs. an ROI classification paradigm in Study -- 2, is inconsequential. The t-test used in Study -- 1 treats the case-sample as fixed: the analysis accounts for reader variability but not for case-sampling variability. While not explicitly stated, the reason for the unorthodox analysis in Study -- 2 was the desire to include case-sampling variability. Prof. Karssemeijer (private communication, 10/27/2017) had consulted with a few ROC experts to determine if the procedure used in Study -- 2 was valid, and while the experts thought it was probably valid they were not sure.
In what follows, the analysis in Study -- 1 is referred to as **single-treatment random-reader fixed-case (1T-RRFC)** while that in Study -- 2 is referred to as **dual-treatment random-reader random-case (2T-RRRC)**.
### The 1T-RRFC analysis model
The sampling model for the FOM is:
```{=tex}
\begin{equation}
\left.
\begin{aligned}
\theta_j=\mu+R_j \\
\left (j = 1,2,...J \right )
\end{aligned}
\right \}
(\#eq:standalone-1t-rrfc)
\end{equation}
```
Here $\mu$ is a constant, $\theta_j$ is the FOM for reader $j$, and $R_j$ is the random contribution for reader $j$ distributed as:
```{=tex}
\begin{equation}
R_j \sim N\left ( 0,\sigma_R^2 \right )
(\#eq:standalone-cad-2t-rrfc-rj-sampling)
\end{equation}
```
Because of the assumed normal distribution of $R_j$, comparing the average reader performance to a fixed value, namely that of CAD, denoted $\theta_0$, reduces to a one-sample t-test, as done in Study -- 1. As evident from the model, no allowance is made for case-sampling variability, which is the reason for calling this the 1T-RRFC method.
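A minimal sketch of the 1T-RRFC computation follows; the FOM values are illustrative, not taken from any study.
```{r standalone-cad-1t-rrfc-sketch}
# Minimal sketch of 1T-RRFC analysis: compare the average radiologist FOM to
# the fixed CAD value with a one-sample t-test (illustrative values only).
thetaRad <- c(0.59, 0.62, 0.66, 0.57, 0.70, 0.61, 0.64, 0.58, 0.67) # J = 9 radiologists
theta0 <- 0.52                                                      # CAD, treated as fixed
t.test(thetaRad, mu = theta0) # case-sampling variability is NOT accounted for
```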
Performance of CAD on a fixed dataset does exhibit within-CAD variability, i.e., CAD applied repeatedly to a fixed dataset does not always produce the same mark-rating data. However, this source of variability is much smaller than the *inter-reader* variability of radiologists interpreting the same dataset. The *within-reader* variability of radiologists is smaller than their *inter-reader* variability, and *within-CAD* variability is smaller still. For this reason one is justified in regarding $\theta_0$ as a fixed quantity for a given dataset. Varying the dataset will result in different values of $\theta_0$, reflecting case-sampling variability, which needs to be accounted for, as done in the following analyses.
### The 2T-RRRC analysis model {#standalone-cad-radiologists-2TRRRC-anlaysis}
This could be termed the conventional or the orthodox method. There are two treatments and the study design is fully crossed: each reader interprets each case in each treatment, i.e., the data structure is as in the left half of Table \@ref(tab:standalone-cad-table-conventional).
The following approach, termed 2T-RRRC, uses the Obuchowski and Rockette (OR) figure of merit sampling model [@obuchowski1995hypothesis]. The OR model is:
```{=tex}
\begin{equation}
\theta_{ij\{c\}}=\mu+\tau_i+R_j+\left ( \tau R \right )_{ij}+\epsilon_{ij\{c\}}
(\#eq:standalone-cad-model-2t-rrrc)
\end{equation}
```
Assuming two treatments, $i$ ($i = 1, 2$) is the treatment index, $j$ ($j = 1, ..., J$) is the reader index, and $k$ ($k = 1, ..., K$) is the case index (the case index does not appear explicitly because the figure of merit is computed over an entire case-sample). $\theta_{ij\{c\}}$ is the figure of merit in treatment $i$ for reader $j$ and case-sample $\{c\}$. A case-sample is a set or ensemble of cases, diseased and non-diseased, and different integer values of $c$ correspond to different case-samples.
The first two terms on the right hand side of Eqn. \@ref(eq:standalone-cad-model-2t-rrrc) are fixed effects (average performance and treatment effect, respectively). The next two terms are random effect variables that, by assumption, are sampled as follows:
```{=tex}
\begin{equation}
\left.
\begin{aligned}
R_j \sim N\left ( 0,\sigma_R^2 \right )\\
\left ( \tau R \right )_{ij} \sim N\left ( 0,\sigma_{\tau R}^2 \right )\\
\end{aligned}
\right \}
(\#eq:standalone-cad-2t-r-taur-sampling)
\end{equation}
```
The term $R_j$ represents the random treatment-independent contribution of reader $j$, modeled as a sample from a zero-mean normal distribution with variance $\sigma_R^2$, and $\left ( \tau R \right )_{ij}$ represents the random treatment-dependent contribution of reader $j$ in treatment $i$, modeled as a sample from a zero-mean normal distribution with variance $\sigma_{\tau R}^2$. The sampling of the last (error) term is described by:
```{=tex}
\begin{equation}
\epsilon_{ij\{c\}}\sim N_{I \times J}\left ( \vec{0} , \Sigma \right )
(\#eq:standalone-cad-2t-eps-sampling)
\end{equation}
```
Here $N_{I \times J}$ is the $I \times J$ variate normal distribution and $\vec{0}$, an $I \times J$ length zero-vector, represents the mean of the distribution. The $\{I \times J\} \times \{I \times J\}$ dimensional covariance matrix $\Sigma$ is defined by 4 parameters, $\text{Var}$, $\text{Cov}_1$, $\text{Cov}_2$, $\text{Cov}_3$, as follows:
```{=tex}
\begin{equation}
\text{Cov} \left (\epsilon_{ij\{c\}},\epsilon_{i'j'\{c\}} \right ) =
\left\{\begin{matrix}
\text{Var} & (i=i',j=j') \\
\text{Cov}_1 & (i\ne i',j=j')\\
\text{Cov}_2 & (i = i',j \ne j')\\
\text{Cov}_3 & (i\ne i',j \ne j')
\end{matrix}\right\}
(\#eq:standalone-cad-2t-rrrc-cov)
\end{equation}
```
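The structure of $\Sigma$ can be made concrete with a short sketch. The following builds the $\{I \times J\} \times \{I \times J\}$ matrix from parameter values chosen purely for illustration.
```{r standalone-cad-sigma-sketch}
# Minimal sketch assembling the covariance matrix Sigma of the preceding
# equation for I = 2 treatments and J = 4 readers; parameter values are
# illustrative assumptions only.
I <- 2; J <- 4
Var <- 8e-4; Cov1 <- 5e-4; Cov2 <- 4e-4; Cov3 <- 3e-4
idx <- expand.grid(j = 1:J, i = 1:I)  # one row per (treatment, reader) pairing
Sigma <- matrix(Cov3, I * J, I * J)   # default: different treatment AND reader
for (a in seq_len(I * J)) for (b in seq_len(I * J)) {
  sameTrt <- idx$i[a] == idx$i[b]
  sameRdr <- idx$j[a] == idx$j[b]
  if (sameTrt && sameRdr) Sigma[a, b] <- Var
  else if (!sameTrt && sameRdr) Sigma[a, b] <- Cov1
  else if (sameTrt && !sameRdr) Sigma[a, b] <- Cov2
}
```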
Software implementing the OR method (e.g., the University of Iowa software and `RJafroc`) yields estimates of all terms appearing on the right hand side of Eqn. \@ref(eq:standalone-cad-2t-rrrc-cov). Excluding fixed effects, the model represented by Eqn. \@ref(eq:standalone-cad-model-2t-rrrc) contains six parameters:
```{=tex}
\begin{equation}
\sigma_R^2, \sigma_{\tau R}^2, \text{Var}, \text{Cov}_1, \text{Cov}_2, \text{Cov}_3
(\#eq:standalone-cad-2t-rrrc-varcom)
\end{equation}
```
The meanings of the last four terms are described in [@hillis2007comparison; @obuchowski1995hypothesis; @hillis2005comparison; @chakraborty2017observer]. Briefly:

* $\text{Var}$ is the variance of a reader's FOMs, in a given treatment, over interpretations of different case-samples, averaged over readers and treatments.
* $\text{Cov}_1/\text{Var}$ is the correlation of a reader's FOMs over interpretations of different case-samples in different treatments, averaged over all different-treatment same-reader pairings.
* $\text{Cov}_2/\text{Var}$ is the correlation of different readers' FOMs over interpretations of different case-samples in the same treatment, averaged over all same-treatment different-reader pairings.
* $\text{Cov}_3/\text{Var}$ is the correlation of different readers' FOMs over interpretations of different case-samples in different treatments, averaged over all different-treatment different-reader pairings.

One expects the following inequalities to hold:
```{=tex}
\begin{equation}
\text{Var} \geq \text{Cov}_1 \geq \text{Cov}_2 \geq \text{Cov}_3
(\#eq:standalone-cad-2t-rrrc-varcom-ordering)
\end{equation}
```
In practice, since one is usually limited to one case-sample, i.e., $c = 1$, resampling techniques [@efron1994introduction] -- e.g., the jackknife -- are used to estimate these terms.
### The 1T-RRRC analysis model {#standalone-cad-radiologists-1TRRRC-anlaysis}
The difference from the approach in Study - 2, and the main contribution of this work, is to regard standalone CAD as a different reader, not as a different treatment. This section describes a single treatment method for analyzing readers and CAD, where CAD is regarded as an additional reader and artificially replicated CAD data becomes unnecessary. Accordingly the proposed method is termed **single-treatment random-reader random-case (1T-RRRC)** analysis.
The starting point is the Obuchowski and Rockette model [@obuchowski1995hypothesis], which for the radiologists (i.e., *excluding* CAD) interpreting in a single treatment reduces to the following:
```{=tex}
\begin{equation}
\theta_{j\{c\}}=\mu+R_j+\epsilon_{j\{c\}}
(\#eq:standalone-or-model-single-treatment)
\end{equation}
```
$\theta_{j\{c\}}$ is the figure of merit for radiologist $j$ ($j = 1, 2, ..., J$) interpreting case-sample $\{c\}$; $R_j$ is the random effect of radiologist $j$ and $\epsilon_{j\{c\}}$ is the error term. For single-treatment multiple-reader interpretations the error term is distributed as:
```{=tex}
\begin{equation}
\epsilon_{j\{c\}}\sim N_{J}\left ( \vec{0} , \Sigma \right )
(\#eq:standalone-cad-1t-eps-sampling)
\end{equation}
```
The $J \times J$ covariance matrix $\Sigma$ is defined by two parameters, $\text{Var}$ and $\text{Cov}_2$, as follows:
```{=tex}
\begin{equation}
\Sigma_{jj'} = \text{Cov}\left ( \epsilon_{j\{c\}}, \epsilon_{j'\{c\}} \right )
=
\left\{\begin{matrix}
\text{Var} & j = j'\\
\text{Cov}_2 & j \neq j'
\end{matrix}\right.
(\#eq:standalone-cad-1t-var-cov2-sampling)
\end{equation}
```
In practice the terms $\text{Var}$ and $\text{Cov}_2$ are estimated using the jackknife method.
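A minimal sketch of the jackknife estimation follows; `fomFn` is an assumed scalar FOM function for one reader (e.g., one computing the Wilcoxon statistic), and the scaling follows the standard jackknife variance formula.
```{r standalone-cad-jack-sketch}
# Minimal sketch of jackknife estimation of Var and Cov2 for a single treatment.
# ratings = J x K matrix (readers x cases); truth = length-K 0/1 vector;
# fomFn(x, truth) = assumed scalar FOM function for one reader's ratings.
jackVarCov2 <- function(ratings, truth, fomFn) {
  J <- nrow(ratings); K <- ncol(ratings)
  jkFom <- matrix(NA_real_, J, K)   # FOM of reader j with case k deleted
  for (k in 1:K) {
    jkFom[, k] <- apply(ratings[, -k, drop = FALSE], 1, fomFn, truth = truth[-k])
  }
  covMat <- cov(t(jkFom)) * (K - 1)^2 / K      # jackknife scaling
  list(Var  = mean(diag(covMat)),              # average on-diagonal element
       Cov2 = mean(covMat[upper.tri(covMat)])) # average off-diagonal element
}
```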
#### Single treatment analysis for radiologists
Hillis [@hillis2005comparison; @hillis2007comparison] has described how to use the single treatment model, Eqn. \@ref(eq:standalone-or-model-single-treatment), to compare a group of radiologists' average performance to a specified value, in effect testing $\text{NH}: \mu = \mu_0$, where $\mu_0$ is a pre-specified constant.
One might be tempted to set $\mu_0$ equal to the performance of CAD but that would not be accounting for the fact that the performance of CAD is itself a random variable whose case-sampling variability needs to be accounted for.
#### Adaptation of single treatment analysis to accommodate CAD
Instead, the following model is used for the figure of merit of the radiologists **and** CAD (note that $j = 0$ is used to denote the CAD algorithmic reader):
```{=tex}
\begin{equation}
\left.
\begin{aligned}
\theta_{j\{c\}} = \theta_{0\{c\}} + \Delta \theta + R_j + \epsilon_{j\{c\}} \\
\left (j = 1,2,...,J \right )
\end{aligned}
\right \}
(\#eq:standalone-cad-1t-thetaj)
\end{equation}
```
$\theta_{0\{c\}}$ is the CAD figure of merit for case-sample $\{c\}$ and $\Delta \theta$ is the average figure of merit increment of the radiologists over CAD. To reduce this model to one to which Hillis' formulae are directly applicable, one subtracts the CAD figure of merit from each radiologist's figure of merit for the same case-sample and defines the result as the difference figure of merit $\psi_{j\{c\}}$, i.e.,
```{=tex}
\begin{equation}
\psi_{j\{c\}} = \theta_{j\{c\}} - \theta_{0\{c\}}
(\#eq:standalone-cad-diff-reader-def)
\end{equation}
```
Then Eqn. \@ref(eq:standalone-cad-1t-thetaj) reduces to:
```{=tex}
\begin{equation}
\psi_{j\{c\}} = \Delta \theta + R_j + \epsilon_{j\{c\}}
(\#eq:standalone-cad-1t-psi)
\end{equation}
```
Eqn. \@ref(eq:standalone-cad-1t-psi) is identical in form to Eqn. \@ref(eq:standalone-or-model-single-treatment) except that the figure of merit on the left hand side is a *difference FOM*, that between a radiologist and CAD, i.e., it describes a model for $J$ radiologists interpreting a common case set, each of whose performances is measured *relative* to that of CAD. Under the NH the expected difference is zero: $\text{NH:} \Delta \theta = 0$. The method [@hillis2005comparison; @hillis2007comparison] for single-treatment multiple-reader analysis is now directly applicable to the model described by Eqn. \@ref(eq:standalone-cad-1t-psi).
Apart from fixed effects, the model in Eqn. \@ref(eq:standalone-cad-1t-psi) contains three parameters:
```{=tex}
\begin{equation}
\sigma_R^2, \text{Var}, \text{Cov}_2
(\#eq:standalone-cad-1t-parms)
\end{equation}
```
Setting $\text{Var} = 0$ and $\text{Cov}_2 = 0$ yields the 1T-RRFC model, which contains only one random parameter, namely $\sigma_R^2$. One expects 1T-RRFC and 1T-RRRC analyses to yield identical estimates of this parameter.
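The following is a minimal sketch of how the resulting test proceeds, following Hillis' single-treatment formulae as I understand them; `psi` (the radiologist-minus-CAD difference FOMs) and `cov2` (the jackknife estimate of $\text{Cov}_2$ of the difference FOMs) are assumed inputs.
```{r standalone-cad-1t-rrrc-sketch}
# Minimal sketch of the 1T-RRRC significance test via Hillis' single-treatment
# formulae; psi = J difference FOMs, cov2 = jackknife Cov2 of the differences.
oneT_RRRC <- function(psi, cov2) {
  J <- length(psi)
  msR <- var(psi)                     # reader mean square of the differences
  den <- msR + J * max(cov2, 0)       # case-sampling inflated denominator
  tStat <- mean(psi) / sqrt(den / J)  # tests NH: Delta theta = 0
  ddf <- (J - 1) * (den / msR)^2      # Hillis denominator degrees of freedom
  c(tStat = tStat, ddf = ddf, pval = 2 * pt(-abs(tStat), ddf))
}
```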
## Implementation {#standalone-cad-radiologists-computational-details}
The three analyses, namely random-reader fixed-case (1T-RRFC), dual-treatment random-reader random-case (2T-RRRC) and single-treatment random-reader random-case (1T-RRRC), are implemented in `RJafroc`.
The following code shows usage of the software to generate the results. Note that `RJafroc::datasetCadLroc` is the LROC dataset and `RJafroc::dataset09` is the corresponding ROC dataset.
```{r standalone-cad-rjafroc-implementation, cache=TRUE, echo=TRUE}
RRFC_1T_PCL_0_05 <- RJafroc::StCadVsRad (RJafroc::datasetCadLroc,
FOM = "PCL", FPFValue = 0.05, method = "1T-RRFC")
RRRC_2T_PCL_0_05 <- RJafroc::StCadVsRad (RJafroc::datasetCadLroc,
FOM = "PCL", FPFValue = 0.05, method = "2T-RRRC")
RRRC_1T_PCL_0_05 <- RJafroc::StCadVsRad (RJafroc::datasetCadLroc,
FOM = "PCL", FPFValue = 0.05, method = "1T-RRRC")
RRFC_1T_PCL_0_2 <- RJafroc::StCadVsRad (RJafroc::datasetCadLroc,
FOM = "PCL", FPFValue = 0.2, method = "1T-RRFC")
RRRC_2T_PCL_0_2 <- RJafroc::StCadVsRad (RJafroc::datasetCadLroc,
FOM = "PCL", FPFValue = 0.2, method = "2T-RRRC")
RRRC_1T_PCL_0_2 <- RJafroc::StCadVsRad (RJafroc::datasetCadLroc,
FOM = "PCL", FPFValue = 0.2, method = "1T-RRRC")
RRFC_1T_PCL_1 <- RJafroc::StCadVsRad (RJafroc::datasetCadLroc,
FOM = "PCL", FPFValue = 1, method = "1T-RRFC")
RRRC_2T_PCL_1 <- RJafroc::StCadVsRad (RJafroc::datasetCadLroc,
FOM = "PCL", FPFValue = 1, method = "2T-RRRC")
RRRC_1T_PCL_1 <- RJafroc::StCadVsRad (RJafroc::datasetCadLroc,
FOM = "PCL", FPFValue = 1, method = "1T-RRRC")
RRFC_1T_AUC <- RJafroc::StCadVsRad (RJafroc::dataset09,
FOM = "Wilcoxon", method = "1T-RRFC")
RRRC_2T_AUC <- RJafroc::StCadVsRad (RJafroc::dataset09,
FOM = "Wilcoxon", method = "2T-RRRC")
RRRC_1T_AUC <- RJafroc::StCadVsRad (RJafroc::dataset09,
FOM = "Wilcoxon", method = "1T-RRRC")
```
The results are organized as follows:
* `RRFC_1T_PCL_0_05` contains the results of 1T-RRFC analysis for figure of merit = $\text{PCL}_{0.05}$.
* `RRRC_2T_PCL_0_05` contains the results of 2T-RRRC analysis for figure of merit = $\text{PCL}_{0.05}$.
* `RRRC_1T_PCL_0_05` contains the results of 1T-RRRC analysis for figure of merit = $\text{PCL}_{0.05}$.
* `RRFC_1T_PCL_0_2` contains the results of 1T-RRFC analysis for figure of merit = $\text{PCL}_{0.2}$.
* `RRRC_2T_PCL_0_2` contains the results of 2T-RRRC analysis for figure of merit = $\text{PCL}_{0.2}$.
* `RRRC_1T_PCL_0_2` contains the results of 1T-RRRC analysis for figure of merit = $\text{PCL}_{0.2}$.
* `RRFC_1T_PCL_1` contains the results of 1T-RRFC analysis for figure of merit = $\text{PCL}_{1}$.
* `RRRC_2T_PCL_1` contains the results of 2T-RRRC analysis for figure of merit = $\text{PCL}_{1}$.
* `RRRC_1T_PCL_1` contains the results of 1T-RRRC analysis for figure of merit = $\text{PCL}_{1}$.
* `RRFC_1T_AUC` contains the results of 1T-RRFC analysis for the Wilcoxon figure of merit.
* `RRRC_2T_AUC` contains the results of 2T-RRRC analysis for the Wilcoxon figure of merit.
* `RRRC_1T_AUC` contains the results of 1T-RRRC analysis for the Wilcoxon figure of merit.
The structures of these objects are illustrated with examples in the Appendix.
```{r do_one_ST_1T_RRFC, echo=FALSE}
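# Formats one row of the significance-testing results table from a 1T-RRFC
# output object `x`. The global flag `TF` (scientific notation on/off) is
# defined in the `allCellsSigTest` chunk before this function is called.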
do_one_ST_1T_RRFC <- function(x,fom,method){
dig <- 3
ciRad <- paste0("(",
format(x$CIAvgRadFom[1], scientific = TF, digits = 2),
",",
format(x$CIAvgRadFom[2], scientific = TF, digits = 2),
")")
ciDif <- paste0("(",
format(x$CIAvgDiffFom[1], scientific = TF, digits = 2),
",",
format(x$CIAvgDiffFom[2], scientific = TF, digits = 2)
,")")
ret <- c(
fom,
method,
format(x$fomCAD, scientific = TF, digits = dig),
"NA", # ci CAD
format(x$avgRadFom, scientific = TF, digits = dig),
ciRad, # ci RAD
format(x$avgDiffFom, scientific = TF, digits = dig),
ciDif, # ci DIFF
format(as.numeric(x$Tstat)^2, scientific = TF, digits = 2),
format(as.numeric(x$df), scientific = TF, digits = 2),
format(as.numeric(x$pval), scientific = TF, digits = 2))
return(seqinr::stresc(ret))
}
```
```{r do_one_ST_2T_RRRC, echo=FALSE}
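# Formats one row of the significance-testing results table from a 2T-RRRC
# output object `x`; the CAD confidence interval comes from the first
# "treatment" of the two-treatment analysis.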
do_one_ST_2T_RRRC <- function(x,fom,method){
dig <- 3
ciCad <- paste0("(",
format(x$ciAvgRdrEachTrt$CILower[1], scientific = TF, digits = 2),
",",
format(x$ciAvgRdrEachTrt$CIUpper[1], scientific = TF, digits = 2),
")")
ciRad <- paste0("(",
format(x$ciAvgRdrEachTrt$CILower[2], scientific = TF, digits = 2),
",",
format(x$ciAvgRdrEachTrt$CIUpper[2], scientific = TF, digits = 2),
")")
ciDif <- paste0("(",
format(x$ciDiffFom$CILower, scientific = TF, digits = 2),
",",
format(x$ciDiffFom$CIUpper, scientific = TF, digits = 2),
")")
ret <- c(
fom,
method,
format(x$fomCAD, scientific = TF, digits = dig),
ciCad, # ci CAD
format(x$avgRadFom, scientific = TF, digits = dig),
ciRad, # ci RAD
format(x$avgDiffFom, scientific = TF, digits = dig),
ciDif, # ci DIFF
format(x$FStat, scientific = TF, digits = 2),
format(x$df, scientific = TF, digits = 2),
format(x$pval, scientific = TF, digits = 2))
return(seqinr::stresc(ret))
}
```
```{r do_one_ST_1T_RRRC, echo=FALSE}
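# Formats one row of the significance-testing results table from a 1T-RRRC
# output object `x`; no CAD confidence interval applies, hence the "NA" entry.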
do_one_ST_1T_RRRC <- function(x,fom,method){
dig <- 3
ciRad <- paste0("(", format(x$CIAvgRad[1], scientific = TF, digits = 2),
",",
format(x$CIAvgRad[2], scientific = TF, digits = 2),
")")
ciDif <- paste0("(", format(x$CIAvgDiffFom[1], scientific = TF, digits = 2),
",",
format(x$CIAvgDiffFom[2], scientific = TF, digits = 2),
")")
ret <- c(
fom,
method,
format(x$fomCAD, scientific = TF, digits = dig),
"NA", # ci CAD
format(x$avgRadFom, scientific = TF, digits = dig),
ciRad, # ci RAD
format(x$avgDiffFom, scientific = TF, digits = dig),
ciDif, # ci DIFF
format(as.numeric(x$Tstat)^2, scientific = TF, digits = 2),
format(x$df, scientific = TF, digits = 2),
format(x$pval, scientific = TF, digits = 2))
return(seqinr::stresc(ret))
}
```
```{r allCellsSigTest, echo=FALSE}
allCells <- array("", dim = c(2,12,11))
TF <- FALSE
allCells[1,1,] <- do_one_ST_1T_RRFC (RRFC_1T_PCL_0_05,"PCL_0_05","1T-RRFC")
allCells[1,2,] <- do_one_ST_2T_RRRC (RRRC_2T_PCL_0_05,"PCL_0_05","2T-RRRC")
allCells[1,3,] <- do_one_ST_1T_RRRC (RRRC_1T_PCL_0_05,"PCL_0_05","1T-RRRC")
allCells[1,4,] <- do_one_ST_1T_RRFC (RRFC_1T_PCL_0_2,"PCL_0_2","1T-RRFC")
allCells[1,5,] <- do_one_ST_2T_RRRC (RRRC_2T_PCL_0_2,"PCL_0_2","2T-RRRC")
allCells[1,6,] <- do_one_ST_1T_RRRC (RRRC_1T_PCL_0_2,"PCL_0_2","1T-RRRC")
allCells[1,7,] <- do_one_ST_1T_RRFC (RRFC_1T_PCL_1,"PCL_1","1T-RRFC")
allCells[1,8,] <- do_one_ST_2T_RRRC (RRRC_2T_PCL_1,"PCL_1","2T-RRRC")
allCells[1,9,] <- do_one_ST_1T_RRRC (RRRC_1T_PCL_1,"PCL_1","1T-RRRC")
allCells[1,10,] <- do_one_ST_1T_RRFC (RRFC_1T_AUC,"Wilcoxon","1T-RRFC")
allCells[1,11,] <- do_one_ST_2T_RRRC (RRRC_2T_AUC,"Wilcoxon","2T-RRRC")
allCells[1,12,] <- do_one_ST_1T_RRRC (RRRC_1T_AUC,"Wilcoxon","1T-RRRC")
```
## Results {#standalone-cad-radiologists-results}
The three methods, 1T-RRFC, 2T-RRRC and 1T-RRRC, were applied to an LROC dataset similar to that used in Study -- 1 (I thank Prof. Karssemeijer for making this dataset available), Table \@ref(tab:standalone-cad-table2).
```{r standalone-cad-table2, echo=FALSE}
cells <- array(dim = c(12,11))
captionStr <- "Significance testing results for an LROC dataset. For each figure of merit (FOM) shown are results of 1T-RRFC, 2T-RRRC and 1T-RRRC analyses. Because it accounts for an additional source of variability, each RRRC row yields a larger p-value and a wider confidence interval than the corresponding RRFC row. [$\\theta_0$ = FOM CAD; $\\theta_{\\bullet}$ = average FOM of radiologists; $\\psi_{\\bullet}$ = average FOM of radiologists minus CAD; CI = 95 percent confidence interval of the quantity indicated by the subscript; F = F-statistic; ddf = denominator degrees of freedom; p = p-value for rejecting the null hypothesis: $\\psi_{\\bullet} = 0$.]"
for (j in 1:12) cells[j,] <- allCells[1,j,]
df <- as.data.frame(cells, stringsAsFactors = FALSE)
colnames(df) <-c("C1", "Analysis", "C3", "$CI_{\\theta_0}$", "C5", "$CI_{\\theta_{\\bullet}}$", "C7", "$CI_{\\psi_{\\bullet}}$", "F", "ddf", "p")
df <- collapse_rows_df(collapse_rows_df(collapse_rows_df(collapse_rows_df(df,C1),C3),C5),C7)
colnames(df) <-c("FOM", "Analysis", "$\\theta_0$", "$CI_{\\theta_0}$", "$\\theta_{\\bullet}$", "$CI_{\\theta_{\\bullet}}$", "$\\psi_{\\bullet}$", "$CI_{\\psi_{\\bullet}}$", "F", "ddf", "p")
# escape = FALSE is critical in getting Math right
kableExtra::kbl(df, caption = captionStr, booktabs = TRUE, escape = FALSE) %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), font_size = 10)
```
Results are shown for the following FOMs: $\text{PCL}_{0.05}$, $\text{PCL}_{0.2}$, $\text{PCL}_{1}$ and the empirical area (AUC) under the ROC curve estimated by the Wilcoxon statistic. The first two FOMs are identical to those used in Study -- 1. Columns 3 and 4 list the CAD FOM $\theta_0$ and its 95% confidence interval $CI_{\theta_0}$; columns 5 and 6 list the average radiologist FOM $\theta_{\bullet}$ (the dot symbol represents an average over the non-zero radiologist index $j = 1, 2, ..., 9$) and its 95% confidence interval $CI_{\theta_{\bullet}}$; columns 7 and 8 list the average difference FOM $\psi_{\bullet}$, i.e., radiologist average minus CAD, and its 95% confidence interval $CI_{\psi_{\bullet}}$; and the last three columns list the F-statistic, the denominator degrees of freedom (ddf) and the p-value for rejecting the null hypothesis (the numerator degrees of freedom of the F-statistic is unity).
> **The last three columns show that 2T-RRRC and 1T-RRRC analyses yield identical F-statistics, ddf and p-values.** So the intuition of the authors of Study -- 2, namely that the unorthodox use of DBM-MRMC software accounts for both reader and case-sampling variability, turns out to be correct. If interest is solely in these statistics one is justified in using the unorthodox method. Important caveats are noted below.
Other results evident in Table \@ref(tab:standalone-cad-table2):
1. Where a direct comparison is possible, namely 1T-RRFC analysis using $\text{PCL}_{0.05}$ and $\text{PCL}_{0.2}$ as FOMs, the p-values in Table \@ref(tab:standalone-cad-table2) are very close to those reported in Study -- 1.
2. All FOMs (i.e., $\theta_0$, $\theta_{\bullet}$ and $\psi_{\bullet}$) in Table \@ref(tab:standalone-cad-table2) are independent of the method of analysis. However, the corresponding confidence intervals (i.e., $CI_{\theta_0}$, $CI_{\theta_{\bullet}}$ and $CI_{\psi_{\bullet}}$) depend on the analyses.
3. Since the CAD figure of merit is a constant no confidence interval is appropriate for it for either 1T-RRFC or 1T-RRRC analysis and the listed values are NA (not applicable). Since 2T-RRRC analysis assumes CAD is a different treatment the analysis lists a confidence interval that is correctly centered on the CAD value but is otherwise meaningless, i.e., it is an artifact of the unintended usage of the OR analysis method.
4. The p-value for either RRRC analysis (2T or 1T) is larger than the corresponding 1T-RRFC value. Accounting for case-sampling variability increases the p-value, reducing the chance of finding a significant difference.
5. The LROC FOMs increase as the value of FPF (the subscript) increases, a general feature of any partial-curve-based figure of merit, as is the observation that the area (AUC) under the ROC curve is larger than the largest PCL value.
6. With either RRRC analysis, ignoring localization information (i.e., using the AUC FOM) leads to a non-significant difference between CAD and the radiologists ($p$ = 0.32), while using localization information via the $\text{PCL}_1$ FOM yields a significant difference ($p$ = 0.041), consistent with the expectation that using localization information increases statistical power.
7. Partial-curve-based FOMs, such as $\text{PCL}_\text{FPF}$, lead, depending on the choice of $\text{FPF}$, to different conclusions on whether to reject the NH. With either RRRC analysis the p-values decrease as $\text{FPF}$ increases (e.g., $0.67 > 0.042 > 0.041$). This trend is not observed for 1T-RRFC analysis, which shows a "sweet-spot" effect where the p-value has a minimum at $\text{FPF} = 0.2$.
Shown next, Table \@ref(tab:standalone-cad-table3), are the model-parameters corresponding to the three analyses.
```{r do_one_parms_1T_RRFC, echo=FALSE}
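# Formats one row of the model-parameters table from a 1T-RRFC output object
# `x`; only varR exists in this model, so the remaining cells are NA.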
do_one_parms_1T_RRFC <- function(x,fom,method){
dig <- 3
ret <- c(
fom,
method,
format(x$varR, scientific = TF, digits = dig),
rep(as.character(NA),5))
return(seqinr::stresc(ret))
}
```
```{r do_one_parms_2T_RRRC, echo=FALSE}
# Formats one row of the model-parameters table from a 2T-RRRC output object
# `x`; varR is printed in scientific notation because it is near machine zero.
do_one_parms_2T_RRRC <- function(x,fom,method){
dig <- 3
ret <- c(
fom,
method,
format(x$varR, scientific = TRUE, digits = 2),
format(x$varTR, scientific = TF, digits = dig),
format(x$cov1, scientific = TF, digits = dig),
format(x$cov2, scientific = TF, digits = dig),
format(x$cov3, scientific = TF, digits = dig),
format(x$varError, scientific = TF, digits = dig))
return(seqinr::stresc(ret))
}
```
```{r do_one_parms_1T_RRRC, echo=FALSE}
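# Formats one row of the model-parameters table from a 1T-RRRC output object
# `x`; varTR, cov1 and cov3 do not appear in the single-treatment model.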
do_one_parms_1T_RRRC <- function(x,fom,method){
dig <- 3
ret <- c(
fom,
method,
format(x$varR, scientific = TF, digits = dig),
"NA",
"NA",
format(x$cov2, scientific = TF, digits = dig),
"NA",
format(x$varError, scientific = TF, digits = dig))
return(seqinr::stresc(ret))
}
```
```{r allCellsParameters, echo=FALSE}
allCells[2,1,1:8] <- do_one_parms_1T_RRFC (RRFC_1T_PCL_0_05,"PCL_0_05","1T-RRFC")
allCells[2,2,1:8] <- do_one_parms_2T_RRRC (RRRC_2T_PCL_0_05,"PCL_0_05","2T-RRRC")
allCells[2,3,1:8] <- do_one_parms_1T_RRRC (RRRC_1T_PCL_0_05,"PCL_0_05","1T-RRRC")
allCells[2,4,1:8] <- do_one_parms_1T_RRFC (RRFC_1T_PCL_0_2,"PCL_0_2","1T-RRFC")
allCells[2,5,1:8] <- do_one_parms_2T_RRRC (RRRC_2T_PCL_0_2,"PCL_0_2","2T-RRRC")
allCells[2,6,1:8] <- do_one_parms_1T_RRRC (RRRC_1T_PCL_0_2,"PCL_0_2","1T-RRRC")
allCells[2,7,1:8] <- do_one_parms_1T_RRFC (RRFC_1T_PCL_1,"PCL_1","1T-RRFC")
allCells[2,8,1:8] <- do_one_parms_2T_RRRC (RRRC_2T_PCL_1,"PCL_1","2T-RRRC")
allCells[2,9,1:8] <- do_one_parms_1T_RRRC (RRRC_1T_PCL_1,"PCL_1","1T-RRRC")
allCells[2,10,1:8] <- do_one_parms_1T_RRFC (RRFC_1T_AUC,"Wilcoxon","1T-RRFC")
allCells[2,11,1:8] <- do_one_parms_2T_RRRC (RRRC_2T_AUC,"Wilcoxon","2T-RRRC")
allCells[2,12,1:8] <- do_one_parms_1T_RRRC (RRRC_1T_AUC,"Wilcoxon","1T-RRRC")
```
```{r standalone-cad-table3, echo=FALSE}
captionStr <- "Model parameter estimates for the LROC dataset. For each figure of merit (FOM) shown are the parameter estimates of 1T-RRFC, 2T-RRRC and 1T-RRRC analyses. NA = not applicable, i.e., the parameter does not appear in the indicated model."
cells = array(dim = c(12,8))
for (j in 1:12) cells[j,] <- allCells[2,j,1:8]
df <- as.data.frame(cells, stringsAsFactors = FALSE)
colnames(df) <- c("FOM", "Analysis", "$\\sigma_R^2$", "$\\sigma_{\\tau R}^2$", "Cov1", "Cov2", "Cov3", "Var")
df <- collapse_rows_df(df,FOM)
# escape = FALSE is critical in getting Math right
kableExtra::kbl(df, caption = captionStr, booktabs = TRUE, escape = FALSE) %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), font_size = 10)
```
From Table \@ref(tab:standalone-cad-table3) some inconsistencies are evident for 2T-RRRC analysis:
1. For 2T-RRRC analyses the listed values of $\sigma_R^2$ are smaller than machine accuracy, so one concludes that in fact $\sigma_R^2 = 0$, which is **clearly an incorrect result as the radiologists do not have identical performances**. In contrast, 1T-RRRC analysis yields the expected non-zero values, identical to those obtained by 1T-RRFC analysis (see comment following Eqn. \@ref(eq:standalone-cad-1t-parms)).
2. For the 2T-RRRC method the expected ordering of the inequalities, Eqn. \@ref(eq:standalone-cad-2t-rrrc-varcom-ordering), is not observed: one expects $\text{Cov}_1 \geq \text{Cov}_2 \geq \text{Cov}_3$ but instead one observes $\text{Cov}_1 = \text{Cov}_3$ and $\text{Cov}_2 > \text{Cov}_1$.
The design of a ratings simulator to statistically match a given dataset is addressed in Chapter 23 of my print book [@chakraborty2017observer]. Using this simulator, the 1T-RRRC method had the expected null hypothesis behavior (Table 23.5, ibid).
## Discussion {#standalone-cad-radiologists-discussion}
Described is an extension of the analysis used in Study -- 1 that accounts for case-sampling variability. It extends Hillis' single-treatment analysis [@hillis2005comparison] to a situation where one of the "readers" is a special reader subject to case-sampling variability only, and the desire is to compare the performance of this special reader to the average of the remaining readers. Usage of the method, along with two other methods, is illustrated with an LROC dataset.
The proposed method, 1T-RRRC analysis, yields identical "overall" results (specifically the F-statistic, degrees of freedom and p-value) to those yielded by the unorthodox application of commonly available software, termed 2T-RRRC analysis, where the CAD reader is regarded as a second treatment (specifically the CAD ratings are replicated to match the number of radiologists). If interest is in just these values one is justified in using the 2T-RRRC method. However, the 2T-RRRC model parameter estimates were unrealistic: for example, the analysis yields zero between-reader variance. The result $\sigma_R^2 = 0$ is clearly an artifact. One can only speculate as to what happens when software is used in a manner for which it was not designed: perhaps finding that all readers in the second treatment have identical FOMs led the software to yield $\sigma_R^2 = 0$. Additionally, the covariance estimates are incorrect. Since sample-size estimation requires some of the covariance values, the 2T-RRRC method should never be used to perform sample-size estimation for a prospective study.
The 1T-RRRC method described here is applicable to any scalar figure of merit. The paradigm used to collect the observer performance data -- ROC, FROC, LROC or ROI -- is irrelevant.
Assessing CAD utility by measuring performance with and without CAD may have inadvertently set a low bar for CAD to be considered useful. As examples, CAD is not penalized for missing cancers as long as the radiologist finds them, and CAD is not penalized for excessive false positives (FPs) as long as the radiologist ignores them. Moreover, since both such measurements include the variability of radiologists, additional noise is introduced that presumably makes it harder to determine if the CAD system is optimal.
In my opinion standalone performance is the most direct measure of CAD performance. Lack of a clear-cut method for assessing standalone CAD performance may have limited past CAD research. The current work hopefully removes that impediment. Going forward, assessment of standalone performance of CAD vs. expert radiologists is strongly encouraged.
## Appendix 1 {#standalone-cad-radiologists-appendix1}
The structures of the `R` objects generated by the software are illustrated with three examples.
### Example 1
The first example shows the structure of `RRFC_1T_PCL_0_2`.
```{r RRFC_1T_PCL_0_2code, echo=TRUE}
x <- RRFC_1T_PCL_0_2
fom_individual_rad <- as.data.frame(t(x$fomRAD))
colnames(fom_individual_rad) <- paste0("rdr", 1:9)
stats <- data.frame(fomCAD = x$fomCAD, avgRadFom = x$avgRadFom, avgDiffFom = x$avgDiffFom, varR = x$varR, Tstat = x$Tstat, df = x$df, pval = x$pval)
ConfidenceIntervals <- data.frame(CIAvgRadFom = x$CIAvgRadFom, CIAvgDiffFom = x$CIAvgDiffFom)
rownames(ConfidenceIntervals) <- c("Lower", "Upper")
```
```{r RRFC_1T_PCL_0_2show, echo=TRUE}
print(fom_individual_rad)
print(stats)
print(ConfidenceIntervals)
```
The results are displayed as three data frames.
The first data frame:
* `fom_individual_rad` shows the figures of merit for the nine radiologists in the study.
The next data frame summarizes the statistics.
* `fomCAD` is the figure of merit for CAD.
* `avgRadFom` is the average figure of merit of the nine radiologists in the study.
* `avgDiffFom` is the average difference figure of merit, RAD - CAD.
* `varR` is the variance of the figures of merit for the nine radiologists in the study.
* `Tstat` is the t-statistic for testing the NH that the average difference FOM `avgDiffFom` is zero, whose square is the F-statistic.
* `df` is the degrees of freedom of the t-statistic.
* `pval` is the p-value for rejecting the NH. In the example shown above the value is highly significant.
The last data frame summarizes the 95 percent confidence intervals.
* `CIAvgRadFom` is the 95 percent confidence interval, listed as pairs `Lower`, `Upper`, for `avgRadFom`.
* `CIAvgDiffFom` is the 95 percent confidence interval for `avgDiffFom`.
* If the pair `CIAvgDiffFom` excludes zero, the difference is statistically significant.
* In the example the interval excludes zero showing that the FOM difference is significant.
### Example 2
The next example shows the structure of `RRRC_2T_PCL_0_2`.
```{r RRRC_2T_PCL_0_2, echo=TRUE}
x <- RRRC_2T_PCL_0_2
fom_individual_rad <- as.data.frame(t(x$fomRAD))
colnames(fom_individual_rad) <- paste0("rdr", 1:9)
stats1 <- data.frame(fomCAD = x$fomCAD, avgRadFom = x$avgRadFom, avgDiffFom = x$avgDiffFom)
stats2 <- data.frame(varR = x$varR, varTR = x$varTR,
cov1 = x$cov1, cov2 = x$cov2 ,
cov3 = x$cov3 , Var = x$varError,
FStat = x$FStat, df = x$df, pval = x$pval)
```
```{r RRRC_2T_PCL_0_2Show1}
print(fom_individual_rad)
print(stats1)
print(stats2)
```
In addition to the quantities defined previously, the output contains the covariance matrix for the Obuchowski-Rockette model, summarized in Eqn. \@ref(eq:standalone-cad-model-2t-rrrc) -- Eqn. \@ref(eq:standalone-cad-2t-rrrc-cov).
* `varTR` is $\sigma_{\tau R}^2$.
* `cov1` is $\text{Cov}_1$.
* `cov2` is $\text{Cov}_2$.
* `cov3` is $\text{Cov}_3$.
* `Var` is $\text{Var}$.
* `FStat` is the F-statistic for testing the NH.
* `ndf` is the numerator degrees of freedom, equal to unity.
* `df` is denominator degrees of freedom of the F-statistic for testing the NH.
* `Tstat` is the t-statistic for testing the NH that the average difference FOM `avgDiffFom` is zero.
* `pval` is the p-value for rejecting the NH. In the example shown above the value is significant.
Notice that including the variability of cases results in a higher p-value for 2T-RRRC as compared to 1T-RRFC.
Shown next are the confidence interval statistics `x$ciAvgRdrEachTrt` for the two treatments ("trt1" = CAD, "trt2" = RAD):
```{r RRRC_2T_PCL_0_2Show2}
print(x$ciAvgRdrEachTrt)
```
* `Estimate` contains the FOM estimate, averaged over readers, for the indicated treatment.
* `StdErr` contains the standard error of that estimate.
* `DF` contains the degrees of freedom of the t-statistic.
* `t` contains the value of the t-statistic.
* `PrGtt` contains the probability of exceeding the magnitude of the t-statistic.
* `CILower` is the lower confidence interval for the difference FOM.
* `CIUpper` is the upper confidence interval for the difference FOM.
Shown next are the confidence interval statistics `x$ciDiffFom` between the two treatments ("trt1-trt2" = CAD - RAD):
```{r RRRC_2T_PCL_0_2Show3}
print(x$ciDiffFom)
```
The difference figure of merit statistics are contained in a dataframe `x$ciDiffFom` with elements:
* `Estimate` contains the difference FOM estimate.
* `StdErr` contains the standard error of the difference FOM estimate.
* `DF` contains the degrees of freedom of the t-statistic.
* `t` contains the value of the t-statistic.
* `PrGtt` contains the probability of exceeding the magnitude of the t-statistic.
* `CILower` is the lower confidence interval for the difference FOM.
* `CIUpper` is the upper confidence interval for the difference FOM.
The figure of merit statistics for the two treatments, where treatment 1 is CAD and treatment 2 is RAD, are:
* `trt1`: statistics for CAD.
* `trt2`: statistics for RAD.
* `Cov2`: $\text{Cov}_2$ calculated over individual treatments.
### Example 3
The last example shows the structure of `RRRC_1T_PCL_0_2`.
```{r RRRC_1T_PCL_0_2}
RRRC_1T_PCL_0_2
```
The differences from `RRFC_1T_PCL_0_2` are listed next:
* `varR` is $\sigma_R^2$ of the single treatment model for comparing CAD to RAD, Eqn. \@ref(eq:standalone-cad-1t-parms).
* `cov2` is $\text{Cov}_2$ of the single treatment model for comparing CAD to RAD.
* `varError` is $\text{Var}$ of the single treatment model for comparing CAD to RAD.
Notice that the `RRRC_1T_PCL_0_2` p value, i.e., `r RRRC_1T_PCL_0_2$pval`, is identical to that of `RRRC_2T_PCL_0_2`, i.e., `r RRRC_2T_PCL_0_2$pval`.
## Appendix 2 {#standalone-cad-radiologists-appendix2}
Two text files, `R/standalone-cad/jaf_truth.txt` and a companion ratings file in the same directory, were provided by Prof. Nico Karssemeijer. These are read into a dataset object by the following code.
```{r readLrocFiles}
source(here::here("R/standalone-cad/DfReadLrocDataFile.R"))
lrocDataset <- DfReadLrocDataFile()
```