
Augmented Publishing - A Proof of Concept

Antonio Schettino ¹ & Ian Hussey ¹

June 24, 2018

Abstract

The replicability crisis in psychology is stimulating researchers to increase transparency in the process of producing research, from the development of the initial idea (e.g., pre-registrations) to the dissemination of the final product (e.g., pre-prints). However, it is still quite difficult to trace back how exactly a particular set of results was generated. Increasing openness in how these steps are performed would not only allow us to identify potential problems, but also facilitate transfer of knowledge among peers. We propose a document that: (i) unifies data, analysis, their interpretation, and the final report in one place; (ii) ensures full reproducibility (the manuscript can be reproduced exactly); (iii) promotes openness and transparency (everything is available for inspection and re-use); (iv) allows researchers to showcase all the work done under the hood and be rewarded accordingly.

1 Introduction

The replicability crisis (or revolution) in psychology has led many researchers to re-evaluate and improve how empirical studies are conducted.

Some popular initiatives carried out in recent years are:

- encouraging the sharing of data, materials, and analysis protocols on public repositories
- replication of published studies
- pre-registrations and registered reports
- improving statistical literacy, e.g.:
  - addressing common misinterpretations of p-values
  - reducing the p-value threshold for claiming statistical significance of new results
  - popularization of alternative ways to analyze data, e.g., emphasis on effect sizes, multilevel modeling, Bayes factors
- promoting international collaborations to facilitate the collection of larger datasets

However, the step from data analysis to data communication is often not very transparent. In most cases, the reader does not know exactly how the authors reached a particular result starting from the raw data. In other words, there is a disconnect between the creation and the dissemination of the results of empirical studies.

This is in part due to the traditional method of scientific publication, where research materials such as procedures, data, and analytic methods are described rather than distributed. Academic articles typically show only the final product of a complex process, and honest mistakes, questionable research practices, or deliberate fraud can occur at each step. Moreover, the file formats typically used to publish academic articles online (e.g., PDF) were developed to mimic printed documents and therefore suffer from similar limitations (i.e., they are static and non-transparent).

We propose an alternative way of disseminating knowledge². Inspired by the dynamic and interactive nature of online blogs, we use free and open-source software to create a form of scientific publication that is fully reproducible and inspectable³.

1.1 Authors and affiliations

Clicking on author names can open their personal website (click on my name) or send a direct email (click on Ian’s name). Affiliations can be paired with their respective websites (see footnote).

1.2 Inline references

Bibliographic references can be included in the text (Upper (1974), Molloy (1983), Skinner et al. (1985), Skinner and Perlini (1996), Didden et al. (2013); but see Hermann (1984); for reviews, see Olson (1984), McLean and Thomas (2014)).

Another possibility would be to directly link to the published version of each manuscript (Hermann (1984)). This works better when journals are not behind a paywall (but see here; if you are one of those wretched rebels, see here and here).

2 Method

Original data can be hosted on public repositories (e.g., Open Science Framework, figshare, Zenodo, Dryad, …) and downloaded from the document.

For this example we will use the mpg dataset from the ggplot2 package, with fuel economy data from years 1999 and 2008 for 38 popular models of car.

2.1 Participants

A total of 38 car models participated in this study. None of them were harmed.

Note that this summary is dynamically generated from the dataset. For example, the total number of cars is calculated with the following code: length(unique(mpg$model)). Any changes in the dataset would automatically be reflected in the report.
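As a minimal sketch of how such a dynamic summary might be produced, the snippet below computes the count and builds the sentence from it. The names `n_models` and `participants_sentence` are illustrative and not part of the original source; in an R Markdown document the same expression could instead sit in an inline code chunk.

```r
# Load ggplot2 only for its bundled mpg dataset.
library(ggplot2)

# Number of distinct car models in the dataset (hypothetical variable name).
n_models <- length(unique(mpg$model))

# Build the sentence from the computed value rather than typing the number.
participants_sentence <- sprintf(
  "A total of %d car models participated in this study.",
  n_models
)
print(participants_sentence)
```

If rows were added to or removed from the dataset, re-running the snippet would update the sentence without any manual editing.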

2.2 Procedure

Probably not an accurate representation of the procedure.

Lame jokes aside, the point here is that videos or GIFs showing an example of the procedure can be embedded, which can be more intuitive than a static timeline of serial events.

include_graphics("TOJ_static.jpg")

A static display. Example from here.

include_graphics("TOJ_dynamic.gif")

A dynamic display. In this case, the timing is not precise due to technical limitations, but the concept is clear. Example from here.

2.3 Measures

The mpg dataset includes the following variables:

- manufacturer: car manufacturer
- model: car model
- displ: engine displacement (in litres)
- year: year of manufacture
- cyl: number of cylinders
- trans: type of transmission
- drv: f = front-wheel drive; r = rear-wheel drive; 4 = 4wd
- cty: city miles per gallon
- hwy: highway miles per gallon
- fl: fuel type
- class: “type” of car

Again, variable names are not hard-coded but extracted from the dataset.
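One way this extraction could look is sketched below. The variable names are read with `names(mpg)`, while the `descriptions` lookup is a hypothetical helper that covers only a few columns for illustration; the original document may generate its list differently.

```r
# Load ggplot2 only for its bundled mpg dataset.
library(ggplot2)

# Variable names are read from the data, so renamed or added columns
# show up automatically.
variable_names <- names(mpg)

# Descriptions still have to be written by hand; this lookup is hypothetical
# and only covers a few columns for illustration.
descriptions <- c(
  displ = "engine displacement (in litres)",
  cty   = "city miles per gallon",
  hwy   = "highway miles per gallon"
)

# Pair each name with its description, leaving a gap where none is supplied.
for (v in variable_names) {
  label <- if (v %in% names(descriptions)) descriptions[[v]] else ""
  cat(v, ": ", label, "\n", sep = "")
}
```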

DISCLAIMER: I know very little about cars. Please don’t ask me what the above variables actually mean.

3 Results

This is the section that would benefit most from an augmented publishing approach.

3.1 Tables

An example of inline tables. Here we display the first 6 rows of the mpg dataset.

kable(head(mpg)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

manufacturer	model	displ	year	cyl	trans	drv	cty	hwy	fl	class
audi	a4	1.8	1999	4	auto(l5)	f	18	29	p	compact
audi	a4	1.8	1999	4	manual(m5)	f	21	29	p	compact
audi	a4	2.0	2008	4	manual(m6)	f	20	31	p	compact
audi	a4	2.0	2008	4	auto(av)	f	21	30	p	compact
audi	a4	2.8	1999	6	auto(l5)	f	16	26	p	compact
audi	a4	2.8	1999	6	manual(m5)	f	18	26	p	compact

3.2 Graphs

One of the advantages of using a native online publication system is the possibility to create interactive plots, which help readers explore the data and are also more appealing to an audience of non-experts.

R packages that help create such graphs include plotly and highcharter.

plot_ly(mpg,
  x = ~ cty,
  y = ~ displ,
  type = "scatter",
  text = paste("manufacturer: ", mpg$manufacturer),
  mode = "markers",
  color = ~ hwy,
  size = ~ hwy
)

Figure 1. An interactive plot made with plotly.

count(mpg, manufacturer, year) %>%
  hchart(
    .,
    "bar",
    hcaes(
      x = manufacturer,
      y = n,
      group = year
    ),
    color = c("#263ada", "#d3b421"),
    name = c("year 1999", "year 2008")
  )

Figure 2. An interactive plot made with highcharter. Example taken from here.

mpg %>%
  group_by(manufacturer) %>%
  summarise(
    n = n(),
    unique = length(unique(model))
  ) %>%
  arrange(-n, -unique) %>%
  hchart(
    .,
    "treemap",
    hcaes(
      x = manufacturer,
      value = n,
      color = unique
    )
  )

Figure 3. An interactive treemap made with highcharter. Example taken from here.

3.3 Statistical results

As an example, let’s run a simple regression to investigate the linear relationship between engine displacement (displ) and number of cylinders (cyl).

regr.results <- summary(lm(displ ~ cyl, data = mpg))
regr.results

## 
## Call:
## lm(formula = displ ~ cyl, data = mpg)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.05466 -0.34617 -0.06314  0.35383  1.95383 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.91989    0.11791  -7.801 2.07e-13 ***
## cyl          0.74576    0.01932  38.609  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4751 on 232 degrees of freedom
## Multiple R-squared:  0.8653, Adjusted R-squared:  0.8647 
## F-statistic:  1491 on 1 and 232 DF,  p-value: < 2.2e-16

The cluttered output above can be simplified by including the relevant results directly in the text:

The number of cylinders significantly predicts engine displacement, b = 0.75, t(232) = 38.61, p < .001. The number of cylinders also explains a significant proportion of variance in engine displacement, adjusted R² = 0.86, F(1, 232) = 1490.63, p < .001.
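The values in the sentence above can be pulled from the fitted model instead of being typed by hand. The following is a minimal sketch, not necessarily the code used in the original document: the helper names (`b`, `t_val`, `df_res`, `f_val`, `adj_r2`, `report`) are illustrative, and the adjusted R-squared is used on the assumption that it is the quantity that rounds to the 0.86 reported in the text.

```r
# Load ggplot2 only for its bundled mpg dataset.
library(ggplot2)

# Refit the model from the text and keep the summary object.
regr.results <- summary(lm(displ ~ cyl, data = mpg))

# Pull the pieces that are reported in the sentence.
b      <- regr.results$coefficients["cyl", "Estimate"]
t_val  <- regr.results$coefficients["cyl", "t value"]
df_res <- regr.results$df[2]                  # residual degrees of freedom
f_val  <- regr.results$fstatistic[["value"]]
adj_r2 <- regr.results$adj.r.squared          # assumption: adjusted, not multiple, R-squared

# Format them the way they would appear inline.
report <- sprintf(
  "b = %.2f, t(%d) = %.2f; adjusted R2 = %.2f, F(1, %d) = %.2f",
  b, df_res, t_val, adj_r2, df_res, f_val
)
print(report)
```

Because every number comes from `regr.results`, a change in the data or the model formula would propagate to the reported statistics on the next render.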

3.3.1 Review the code

The code that generates plots and statistical results is hidden by default to improve readability. Interested reviewers and readers can inspect it easily by clicking on the “Code” button.

3.3.2 Communicate with other software

Analyses written in other languages can also be embedded in and run from this document. Here is an example of Python code:

regr.fit(mpg_displ, mpg_cyl)

Supported programming languages can be found here.

If researchers use other statistical software that does not directly interface with R (e.g., SPSS), the corresponding syntax can be included as simple text. It will not dynamically generate the results (which would have to be inserted manually), but at least reviewers and readers would be able to inspect the code:

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT displ
/METHOD=ENTER cyl.

4 Discussion

To summarize, this augmented document:

- is directly linked to the raw data
- is universally accessible (as long as you have a browser, you can read its content)
- is clear to read and easy to navigate
- improves on the “classical” PDF because it is intrinsically dynamic and scalable
- can be used with different programming languages
- includes technical details that can be accessed without impairing the overall narrative, thanks to the hidden code (give readers a story, but also the possibility to check the details)

This document can be hosted on public repositories that assign a DOI (e.g., Open Science Framework, Zenodo). To promote peer review, the host could also provide a comment section similar to what can be found on most blogs (e.g., Disqus) and some preprint servers (e.g., bioRxiv).

An even better solution would be to integrate an online annotation system like Hypothes.is (an interesting discussion can be found here). A promising collaboration between the Center for Open Science and Hypothes.is has recently been announced.

4.1 Similar projects

Several projects (at various stages of development) share a similar idea of interactive scholarly publication:

- Datazar Paper (example)
- Andrew York developed an impressive template using HTML, CSS, and JavaScript. An example publication from his lab can be found here
- eLife partnered with Hypothes.is to allow annotations on published manuscripts (see example)
- eLife, in collaboration with Substance and Stencila, is also supporting the development of software that would power interactive and reproducible publications (see announcement here)

4.2 Conclusions

We hope this proof of concept sparked your interest and made you consider supporting alternative ways to disseminate your work. Given the recent interest around this issue, we believe that the publishing landscape is going to change rapidly… we are excited to see what the future will bring!

5 Footnotes

¹ Department of Experimental-Clinical & Health Psychology, Ghent University (Belgium)

² An earlier draft of this document was presented at figshare Fest (Nov. 16th, 2017, Gent).

³ Other projects (e.g., the R package papaja) effectively increase reproducibility by allowing manuscripts in standard APA format to be generated from raw data. However, the output is still a static document that is submitted through classical publishing routes. Our project is by definition dynamic and expresses its full potential online.

6 References

Didden, R., Sigafoos, J., O’Reilly, M. F., Lancioni, G. E., & Sturmey, P. (2013). “A Multisite Cross-Cultural Replication of Upper’s (1974) Unsuccessful Self-Treatment of Writer’s Block.” Journal of Applied Behavior Analysis, 40(4): 773–73. doi:10.1901/jaba.2007.773.

Hermann, B. P. (1984). “Unsuccessful Self-Treatment of a Case of ‘Writer’s Block’: A Partial Failure to Replicate.” Perceptual and Motor Skills, 58(2): 350–50. doi:10.2466/pms.1984.58.2.350.

McLean, D. C., & Thomas, B. R. (2014). “Unsuccessful Treatments of ‘Writer’s Block’: A Meta-Analysis.” Psychological Reports, 115(1): 276–78. doi:10.2466/28.PR0.115c12z0.

Molloy, G. N. (1983). “The Unsuccessful Self-Treatment of a Case of ‘Writer’s Block’: A Replication.” Perceptual and Motor Skills, 57(2): 566–66. doi:10.2466/pms.1983.57.2.566.

Olson, K. R. (1984). “Unsuccessful Self-Treatment of ‘Writer’s Block’: A Review of the Literature.” Perceptual and Motor Skills, 59(1): 158–58. doi:10.2466/pms.1984.59.1.158.

Skinner, N. F., & Perlini, A. H. (1996). “The Unsuccessful Group Treatment of ‘Writer’s Block’: A Ten-Year Follow-up.” Perceptual and Motor Skills, 82(1): 138–38. doi:10.2466/pms.1996.82.1.138.

Skinner, N. F., Perlini, A. H., Fric, L., Werstine, E. P., & Calla, J. (1985). “The Unsuccessful Group-Treatment of ‘Writer’s Block’.” Perceptual and Motor Skills, 61(1): 298–98. doi:10.2466/pms.1985.61.1.298.

Upper, D. (1974). “The Unsuccessful Self-Treatment of a Case of ‘Writer’s Block’.” Journal of Applied Behavior Analysis, 7(3): 497–97. doi:10.1901/jaba.1974.7-497a.

7 Session Info

This section would greatly help diagnose and debug possible problems in reproducing the document, e.g.:

- on which operating system were the analyses run?
- which R version was it?
- what packages were used but not explicitly mentioned?

## R version 3.4.4 (2018-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_BE.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_BE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_BE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_BE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] bindrcpp_0.2.2    highcharter_0.5.0 plotly_4.7.1     
##  [4] forcats_0.3.0     stringr_1.3.1     dplyr_0.7.5      
##  [7] purrr_0.2.5       readr_1.1.1       tidyr_0.8.1      
## [10] tibble_1.4.2      ggplot2_2.2.1     tidyverse_1.2.1  
## [13] kableExtra_0.9.0  knitr_1.20        crayon_1.3.4     
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.3.1        jsonlite_1.5      viridisLite_0.3.0
##  [4] modelr_0.1.2      shiny_1.1.0       assertthat_0.2.0 
##  [7] TTR_0.23-3        highr_0.7         cellranger_1.1.0 
## [10] yaml_2.1.19       pillar_1.2.3      backports_1.1.2  
## [13] lattice_0.20-35   reticulate_1.8    glue_1.2.0       
## [16] rlist_0.4.6.1     digest_0.6.15     promises_1.0.1   
## [19] rvest_0.3.2       colorspace_1.3-2  Matrix_1.2-12    
## [22] htmltools_0.3.6   httpuv_1.4.3      plyr_1.8.4       
## [25] psych_1.8.4       pkgconfig_2.0.1   broom_0.4.4      
## [28] haven_1.1.1       xtable_1.8-2      scales_0.5.0     
## [31] jpeg_0.1-8        later_0.7.3       lazyeval_0.2.1   
## [34] cli_1.0.0         quantmod_0.4-13   mnormt_1.5-5     
## [37] magrittr_1.5      readxl_1.1.0      mime_0.5         
## [40] evaluate_0.10.1   nlme_3.1-131      xts_0.10-2       
## [43] xml2_1.2.0        foreign_0.8-69    tools_3.4.4      
## [46] data.table_1.11.4 hms_0.4.2         munsell_0.5.0    
## [49] compiler_3.4.4    rlang_0.2.1       grid_3.4.4       
## [52] rstudioapi_0.7    htmlwidgets_1.2   crosstalk_1.0.0  
## [55] igraph_1.2.1      rmarkdown_1.10    gtable_0.2.0     
## [58] curl_3.2          reshape2_1.4.3    R6_2.2.2         
## [61] zoo_1.8-2         lubridate_1.7.4   bindr_0.1.1      
## [64] rprojroot_1.3-2   stringi_1.2.3     parallel_3.4.4   
## [67] Rcpp_0.12.17      tidyselect_0.2.4
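A listing in this format is what base R's `sessionInfo()` prints. The sketch below shows one way the same information could be captured and queried programmatically; the names `info` and `attached` are illustrative, and the output will differ from the listing above depending on the machine and the packages loaded.

```r
# Capture the session information once so it can be printed and queried.
info <- sessionInfo()

# The R version and platform answer the first two questions in this section.
cat("R version:", info$R.version$version.string, "\n")
cat("Platform: ", info$platform, "\n")

# Attached (non-base) packages with their versions; empty if none are attached.
attached <- vapply(info$otherPkgs, function(p) p$Version, character(1))
print(attached)
```

Calling `print(info)` at the end of a document would reproduce the full listing, including packages loaded via a namespace.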