Abstract
The replicability crisis in psychology is stimulating researchers to increase transparency in the process of producing research, from the development of the initial idea (e.g., pre-registrations) to the dissemination of the final product (e.g., pre-prints). However, it is still quite difficult to trace back how exactly a particular set of results was generated. Increasing openness in how these steps are performed would not only allow us to identify potential problems, but also facilitate transfer of knowledge among peers. We propose a document that: (i) unifies data, analysis, their interpretation, and the final report in one place; (ii) ensures full reproducibility (the manuscript can be reproduced exactly); (iii) promotes openness and transparency (everything is available for inspection and re-use); (iv) allows researchers to showcase all the work done under the hood and be rewarded accordingly.
The replicability crisis (or revolution) in psychology has led many researchers to re-evaluate and improve many aspects of how empirical studies are conducted.
Some popular initiatives carried out in recent years are:
However, the step from data analysis to data communication is often not very transparent. In most cases, the reader does not know exactly how the authors reached a particular result starting from the raw data. In other words, there is a disconnect between the creation and the dissemination of the results of empirical studies.
This is in part due to the traditional model of scientific publication, in which research materials such as procedures, data, and analytic methods are described rather than distributed. Academic articles typically show only the final product of a complex process, and honest mistakes, questionable research practices, or deliberate fraud can occur at each step. Moreover, the file formats typically used to publish academic articles online (e.g., PDF) were developed to mimic printed documents and therefore suffer from similar limitations (i.e., they are static and non-transparent).
We propose an alternative way of disseminating knowledge². Inspired by the dynamic and interactive nature of online blogs, we use free and open-source software to create a form of scientific publication that is fully reproducible and inspectable³.
Bibliographic references can be included in the text (Upper, 1974; Molloy, 1983; Skinner et al., 1985; Skinner and Perlini, 1996; Didden et al., 2013; but see Hermann, 1984; for reviews, see Olson, 1984; McLean and Thomas, 2014).
Another possibility would be to directly link to the published version of each manuscript (Hermann, 1984). This works better when journals are not behind a paywall (but see here; if you are one of those wretched rebels, see here and here).
Original data can be hosted on public repositories (e.g., Open Science Framework, figshare, Zenodo, Dryad, …) and downloaded from the document.
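A minimal sketch of how such a download step might look in R, assuming the data are shared as a CSV file at a stable URL. The helper name `fetch_dataset`, its arguments, and the placeholder URL are hypothetical and not part of the original document.

```r
# Hypothetical helper: download a CSV once, then reuse the local copy.
fetch_dataset <- function(url, dest = file.path(tempdir(), basename(url))) {
  # Skip the download when a local copy already exists
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  }
  read.csv(dest, stringsAsFactors = FALSE)
}

# Usage (placeholder URL, to be replaced with the repository's download link):
# dat <- fetch_dataset("https://example.org/path/to/data.csv")
```

Caching the file locally means the document can be re-rendered without contacting the repository each time.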
For this example we will use the mpg dataset from the ggplot2 package, with fuel economy data from years 1999 and 2008 for 38 popular models of car.
A total number of 38 car models participated in this study. None of them were harmed.
Note that this summary is dynamically generated from the dataset. For example, the total number of cars is calculated with the following code: length(unique(mpg$model)). Any changes in the dataset would automatically be reflected in the report.
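As a rough sketch of the idea, the participants sentence above could be assembled from the data rather than typed by hand. The names `n_models` and `participants_sentence` are illustrative, and the snippet assumes the ggplot2 package is installed.

```r
# Use the example dataset shipped with ggplot2
mpg <- ggplot2::mpg

# Count distinct car models instead of hard-coding the number
n_models <- length(unique(mpg$model))

# Build the sentence that appears in the report
participants_sentence <- sprintf(
  "A total number of %d car models participated in this study.", n_models
)
print(participants_sentence)
```

In an R Markdown source, the same expression can be placed inline in the text, between backticks and prefixed with `r`, so that the number is recomputed every time the document is rendered.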
Lame jokes aside, the point here is the possibility to embed videos or gifs showing an example of the procedure, which can be more intuitive than displaying a static timeline of serial events.
include_graphics("TOJ_static.jpg")
include_graphics("TOJ_dynamic.gif")
The mpg dataset includes the following variables:
Again, variable names are not hard-coded but extracted from the dataset.
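One possible way to extract such a list, sketched here under the assumption that ggplot2 is installed; the object name `var_overview` is illustrative only.

```r
# Use the example dataset shipped with ggplot2
mpg <- ggplot2::mpg

# Variable names and their storage types, read directly from the data
var_overview <- data.frame(
  variable = names(mpg),
  type = vapply(mpg, function(col) class(col)[1], character(1)),
  row.names = NULL
)
print(var_overview)
```

Because the names come from `names(mpg)`, renaming or adding a column in the dataset changes the listing without any edit to the text.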
DISCLAIMER: I know very little about cars. Please don’t ask me what the above variables actually mean.
This is the section that would be maximally improved by adopting an augmented publishing approach.
An example of inline tables. Here we display the first 6 rows of the mpg dataset.
kable(head(mpg)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
---|---|---|---|---|---|---|---|---|---|---|
audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |
audi | a4 | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | compact |
One of the advantages of using a native online publication system is the possibility to create interactive plots, which help readers explore the data and are also more appealing to an audience of non-experts.
Among the R packages that help create such graphs, see plotly and highcharter.
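library(plotly)

# Interactive scatter plot: city mileage vs. engine displacement,
# with marker color and size mapped to highway mileage
plot_ly(mpg,
  x = ~ cty,
  y = ~ displ,
  type = "scatter",
  text = paste("manufacturer: ", mpg$manufacturer),
  mode = "markers",
  color = ~ hwy,
  size = ~ hwy
)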
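library(dplyr)
library(highcharter)

# Bar chart: number of entries per manufacturer, grouped by model year
count(mpg, manufacturer, year) %>%
  hchart(.,
    "bar",
    hcaes(x = manufacturer,
          y = n,
          group = year),
    color = c("#263ada", "#d3b421"),
    name = c("year 1999", "year 2008"))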
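# Treemap: tile area shows the number of entries per manufacturer,
# tile color the number of distinct models
mpg %>%
  group_by(manufacturer) %>%
  summarise(
    n = n(),
    unique = length(unique(model))
  ) %>%
  arrange(-n, -unique) %>%
  hchart(
    .,
    "treemap",
    hcaes(
      x = manufacturer,
      value = n,
      color = unique
    )
  )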
As an example, let’s run a simple regression to investigate the linear relationship between engine displacement (displ) and number of cylinders (cyl).
regr.results <- summary(lm(displ ~ cyl, data = mpg))
regr.results
##
## Call:
## lm(formula = displ ~ cyl, data = mpg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.05466 -0.34617 -0.06314 0.35383 1.95383
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.91989 0.11791 -7.801 2.07e-13 ***
## cyl 0.74576 0.01932 38.609 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4751 on 232 degrees of freedom
## Multiple R-squared: 0.8653, Adjusted R-squared: 0.8647
## F-statistic: 1491 on 1 and 232 DF, p-value: < 2.2e-16
The cluttered output above can be simplified by including the relevant results directly in the text:
The number of cylinders significantly predicts engine displacement, \(\beta\) = 0.75, \(t_{232}\) = 38.61, p < .001. The number of cylinders also explains a significant proportion of variance in engine displacement, \(\sf{R^2_{adj}}\) = 0.86, \(F_{(1, 232)}\) = 1490.63, p < .001.
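The values in the sentence above can be pulled from the fitted model instead of being copied by hand. The following is a minimal sketch of that step, assuming ggplot2 is installed; the variable names (`slope`, `t_val`, `report`, and so on) are illustrative and not taken from the original source.

```r
# Refit the model from the text on the ggplot2 example data
mpg <- ggplot2::mpg
fit <- summary(lm(displ ~ cyl, data = mpg))

# Pull the reported quantities out of the summary object
slope  <- fit$coefficients["cyl", "Estimate"]
t_val  <- fit$coefficients["cyl", "t value"]
df_res <- fit$df[2]                          # residual degrees of freedom
r2_adj <- fit$adj.r.squared
f_val  <- unname(fit$fstatistic["value"])

# Format them as they would appear in the running text
report <- sprintf(
  "slope = %.2f, t(%d) = %.2f, adjusted R2 = %.2f, F(1, %d) = %.2f",
  slope, df_res, t_val, r2_adj, df_res, f_val
)
print(report)
```

Each of these objects can then be referenced inline in the manuscript text, so the reported statistics always match the model that was actually fitted.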
The code that generates plots and statistical results is hidden by default to improve readability. Interested reviewers and readers can inspect it easily by clicking on the “Code” button.
Analyses can also be run with other software but embedded and run from this document. Here is an example of Python code:
regr.fit(mpg_displ, mpg_cyl)
Supported programming languages can be found here.
If researchers use other statistical software that does not directly interface with R (e.g., SPSS), the corresponding syntax can be included as simple text. It will not dynamically generate the results (which would have to be inserted manually), but at least reviewers and readers would be able to inspect the code:
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT displ
/METHOD=ENTER cyl.
To summarize, this augmented document:
This document can be hosted on public repositories that assign a DOI (e.g., Open Science Framework, Zenodo). To promote peer reviewing, the host would also have a comment section similar to what can be found on most blogs (e.g., Disqus) and some preprint servers (e.g., bioRxiv).
An even better solution would be to integrate an online annotation system like Hypothes.is (an interesting discussion can be found here). A promising collaboration between the Center for Open Science and Hypothes.is has recently been announced.
Several projects (at various stages of development) share a similar idea of interactive scholarly publication:
We hope this proof of concept has sparked your interest and made you consider and support alternative ways to disseminate your work. Given the recent interest around this issue, we believe that the publishing landscape is going to change rapidly… we are excited to see what the future will bring!
1 Department of Experimental-Clinical & Health Psychology, Ghent University (Belgium)
2 An earlier draft of this document was presented at figshare Fest (Nov. 16th, 2017, Gent).
3 Other projects (e.g., the R package papaja) effectively increase reproducibility by allowing the generation of manuscripts from raw data in standard APA format. However, the output is still a static document that is submitted to classical publishing routes. Our project is by definition dynamic and expresses its full potential online.
Didden, R., Sigafoos, J., O’Reilly, M. F., Lancioni, G. E., & Sturmey, P. (2013). “A Multisite Cross-Cultural Replication of Upper’s (1974) Unsuccessful Self-Treatment of Writer’s Block.” Journal of Applied Behavior Analysis, 40(4): 773–73. doi:10.1901/jaba.2007.773.
Hermann, B. P. (1984). “Unsuccessful Self-Treatment of a Case of ‘Writer’s Block’: A Partial Failure to Replicate.” Perceptual and Motor Skills, 58(2): 350–50. doi:10.2466/pms.1984.58.2.350.
McLean, D. C., & Thomas, B. R. (2014). “Unsuccessful Treatments of ‘Writer’s Block’: A Meta-Analysis.” Psychological Reports, 115(1): 276–78. doi:10.2466/28.PR0.115c12z0.
Molloy, G. N. (1983). “The Unsuccessful Self-Treatment of a Case of ‘Writer’s Block’: A Replication.” Perceptual and Motor Skills, 57(2): 566–66. doi:10.2466/pms.1983.57.2.566.
Olson, K. R. (1984). “Unsuccessful Self-Treatment of ‘Writer’s Block’: A Review of the Literature.” Perceptual and Motor Skills, 59(1): 158–58. doi:10.2466/pms.1984.59.1.158.
Skinner, N. F., & Perlini, A. H. (1996). “The Unsuccessful Group Treatment of ‘Writer’s Block’: A Ten-Year Follow-up.” Perceptual and Motor Skills, 82(1): 138–38. doi:10.2466/pms.1996.82.1.138.
Skinner, N. F., Perlini, A. H., Fric, L., Werstine, E. P., & Calla, J. (1985). “The Unsuccessful Group-Treatment of ‘Writer’s Block’.” Perceptual and Motor Skills, 61(1): 298–98. doi:10.2466/pms.1985.61.1.298.
Upper, D. (1974). “The Unsuccessful Self-Treatment of a Case of ‘Writer’s Block’.” Journal of Applied Behavior Analysis, 7(3): 497–97. doi:10.1901/jaba.1974.7-497a.
This section would greatly help diagnose and debug possible problems in reproducing the document, e.g.:
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_BE.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=de_BE.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_BE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_BE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] bindrcpp_0.2.2 highcharter_0.5.0 plotly_4.7.1
## [4] forcats_0.3.0 stringr_1.3.1 dplyr_0.7.5
## [7] purrr_0.2.5 readr_1.1.1 tidyr_0.8.1
## [10] tibble_1.4.2 ggplot2_2.2.1 tidyverse_1.2.1
## [13] kableExtra_0.9.0 knitr_1.20 crayon_1.3.4
##
## loaded via a namespace (and not attached):
## [1] httr_1.3.1 jsonlite_1.5 viridisLite_0.3.0
## [4] modelr_0.1.2 shiny_1.1.0 assertthat_0.2.0
## [7] TTR_0.23-3 highr_0.7 cellranger_1.1.0
## [10] yaml_2.1.19 pillar_1.2.3 backports_1.1.2
## [13] lattice_0.20-35 reticulate_1.8 glue_1.2.0
## [16] rlist_0.4.6.1 digest_0.6.15 promises_1.0.1
## [19] rvest_0.3.2 colorspace_1.3-2 Matrix_1.2-12
## [22] htmltools_0.3.6 httpuv_1.4.3 plyr_1.8.4
## [25] psych_1.8.4 pkgconfig_2.0.1 broom_0.4.4
## [28] haven_1.1.1 xtable_1.8-2 scales_0.5.0
## [31] jpeg_0.1-8 later_0.7.3 lazyeval_0.2.1
## [34] cli_1.0.0 quantmod_0.4-13 mnormt_1.5-5
## [37] magrittr_1.5 readxl_1.1.0 mime_0.5
## [40] evaluate_0.10.1 nlme_3.1-131 xts_0.10-2
## [43] xml2_1.2.0 foreign_0.8-69 tools_3.4.4
## [46] data.table_1.11.4 hms_0.4.2 munsell_0.5.0
## [49] compiler_3.4.4 rlang_0.2.1 grid_3.4.4
## [52] rstudioapi_0.7 htmlwidgets_1.2 crosstalk_1.0.0
## [55] igraph_1.2.1 rmarkdown_1.10 gtable_0.2.0
## [58] curl_3.2 reshape2_1.4.3 R6_2.2.2
## [61] zoo_1.8-2 lubridate_1.7.4 bindr_0.1.1
## [64] rprojroot_1.3-2 stringi_1.2.3 parallel_3.4.4
## [67] Rcpp_0.12.17 tidyselect_0.2.4
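A listing of this kind matches the format printed by base R's `sessionInfo()`. As a small sketch, it could be generated and optionally saved alongside the report as follows; the file name and location are illustrative assumptions.

```r
# Record the R version, platform, and package versions used for this run
info <- sessionInfo()
print(info)

# Optionally save the same listing to a text file (illustrative file name)
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(print(info)), info_file)
```

Keeping this record with the document lets readers compare their own package versions against those used to produce the reported results.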