1 Frey - Beilstein-Institut

Reducing uncertainty: the raison d’être of open science?

Jeremy G. Frey* and Colin Bird

Beilstein Magazine 2018, 4, No. 10, doi:10.3762/bmag.10

published: 18 June 2018

Abstract

Science has been aware for centuries of the idea of known unknowns and unknown unknowns, even though scientists would articulate the notion differently from Donald Rumsfeld. The methodology of science is to investigate the known unknowns, striving to reduce uncertainty, but occasionally the unexpected happens and the scientist enters the realm of unknown unknowns. Uncertainty reduces when research is reproducible, but a survey published by Nature in May 2016 suggested that about half of researchers replying considered that there is a significant reproducibility crisis.

The premise of this paper is that the more open the science is from the outset, the more likely it is that uncertainties will be resolved at an earlier stage. We suggest that open science can be characterised by three fundamental principles: transparency, obtainability, and capability, while adding that their attainability depends strongly on reliable recording. Accordingly, in this paper we consider briefly the basis of open science and how uncertainty can compromise its three fundamental principles. We then review the extent to which early scientists published their work openly, and how scientific communication, open and otherwise, has evolved and how this influences the evaluation and appreciation of the uncertainty in the results. We illustrate the discussion with research in chemistry conducted at the University of Southampton, UK. Our specific areas of interest are: data curation and preservation; digital recording platforms, such as electronic laboratory notebooks (ELNs); and the evolution of digital chemistry.

 

Introduction

The inspiration for this article came first from Jeremy Frey’s talk at the 2016 San Diego ACS meeting [1] and then crystallised following the first Beilstein Open Science meeting [2] in 2017; some of the ideas presented are echoed in the TV interview from that meeting [3]. The key premise of this paper is that the more open the science is from the outset, the more likely it is that uncertainties will be resolved at an earlier stage. To appreciate the issues, it is worth looking into the background of uncertainties in science.

While much maligned when uttered by Donald Rumsfeld during a US Department of Defence briefing [4], the following statement has a significant history in the study of knowledge (epistemology) and captures what many new researchers feel when they take up their first research position, or perhaps when they have their first discussion with their research supervisor.

… as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know.

More formally, Logan recognised the relevance of this statement to science, pointing out that scientists develop hypotheses that they test by experiment, to investigate the unknowns that they know about [5]. Occasionally, the unexpected happens and the scientist enters the realm of unknown unknowns. This approach is consistent with Popper’s view of scientific methodology: science seeks to reduce uncertainty, progressively refining theories by testing them rigorously [6]. Science cannot eliminate, but it can uncover and then seek to reduce, uncertainty.

While it might seem self-evident that scientists should record not only their findings but also the path that led to their discoveries, the latter information is by no means always readily available. Throughout this paper, we shall return to how uncertainty can arise from fragmentary recording of research investigations.

From curiosity to curation

The authors, with others in an earlier paper looking at curation in the chemical sciences, highlighted the importance of curation generally as part of progress in science:

All science is strongly dependent on preserving, maintaining, and adding value to the research record, including the data, both raw and derived, generated during the scientific process [7].

However, with hindsight (of course we have a full record of the work that led to the earlier papers), this statement omits the pivotal point that the communication of the process and the outcomes should, as much as is feasible, aim to reduce uncertainty and certainly not increase the uncertainty in the processes undertaken. Recently the processes and players in the academic publication ecosystem have come under scrutiny. Twiss-Brooks [8] argues that academic success needs to become more dependent on sharing and openness, deprecating the current system for inhibiting an open community culture. Figure 1 is a simplified schematic of the scientific process that illustrates the potential sources of uncertainty. Some have even gone as far as to refer to destruction of information in the traditional publication process [9]. Grossman [10] argues that we do not need scientific journals because they are no longer the most appropriate way to communicate science: the scientific discourse is now digital.

Figure 1: Simplified schematic of the scientific process, in which we note that each arrow is also a potential source of uncertainty.

The scientific process, or methodology, is arguably less prescriptive than might appear from the schematic in Figure 1. The initial creative step, leading to the germination of an idea and the formulation of a hypothesis, can arise from a variety of stimuli. The hypothesis predicts outcomes that can be tested by some form of experiment, the results of which, following analysis, either support the hypothesis or expose some deficiency. In the latter case, the hypothesis must be modified to accommodate the observations. Even if supported, the hypothesis might still be subjected to further, more exacting tests.

As the practice of science has evolved, it has become increasingly a collaborative exercise, with progress depending on scientists building on the results already produced by others. The foundation of collaboration has been and remains the communication of scientific knowledge through publications, conferences, and face-to-face meetings, with a basic tenet that shared findings should be capable of reproduction. The advent of digital communication has not altered the need for all communications to be as full and complete as possible, to reduce and ideally to eliminate uncertainty. That should include the stimuli that led to the idea.

When scientists publish papers in the traditional format; when they record results, often with minimal insight for the reader into why they performed the experiments; or when they store data in a repository with incomplete metadata, are they adding value, adding uncertainty, or adding a mixture of the two?

 

Reproducibility

The ability to reproduce scientific research is a crucial test of its veracity. Although the meaning of the term ‘reproducibility’, and to a lesser extent that of ‘replicability’, is subject to debate, for the purposes of this paper we adopt the basic interpretation that reproducibility means being able to obtain the same results by repeating the same method.

According to Popper, “non-reproducible single occurrences are of no significance to science” [6]. While other thoughts about the scientific method are pertinent to current ideas of automated scientific discovery and Artificial and Augmented Intelligence (AI), all rely on the ability to reproduce findings in one way or another.

According to the admittedly brief online survey conducted by Nature in May 2016, 52% of researchers consider that there is a significant reproducibility crisis [11]. According to the survey: “More than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments.” The first five results of a systematic effort to replicate the results of cancer biology experiments reported in high-profile papers add further weight to this concern, given the news headline that only two of the five studies could be repeated.

Nosek and Errington examine the meaning of replication in the context of these results, providing links to each of the five studies [12]. They distinguish between direct and conceptual replication, the former being an attempt to reproduce using the same or a similar procedure and the latter being a test of the same hypothesis using a different methodology. They argue that “direct and conceptual replications are vital to scientific progress” and their combination “provides confidence in the reproducibility of a finding and the explanation for the finding.” They also appraise how a study might qualify as a successful replication and consider some of the implications of irreproducibility.

Previously, in 2015, the Guardian newspaper had reported on a study in which an international team attempted to replicate 100 psychology experiments published in respected journals but could reproduce only 36% of the original results [13]. After expressing his disappointment, Nosek, who led the study, noted that:

Science is a process of uncertainty reduction, and no one study is almost ever a definitive result on its own.

In a very recent Guardian article, Devlin refers to a psychology replication study when discussing the influence of narcissism in present-day science [14]. Citing Bruno Lemaitre, an immunologist, she says:

The replication crisis in psychology and the life sciences, in which “sexy” papers fail to stand up to closer scrutiny, can be blamed in part on scientists being motivated by a need for attention and authority as well as curiosity about the natural world, he said.

We would assert strongly that an expectation of, and adherence to, openness militates against narcissism.

In 2016, Rosenblatt considered the reproducibility crisis from the perspective of commercial organisations, particularly in the pharmaceutical sector, that attempt to reproduce academic research findings, and proposed a solution based on funding the researcher rather than the project [15]:

Pharmaceutical companies should listen to this suggestion and consider moving first to fund researchers in an academic environment who have a proven ability to generate reproducible research. This might be done by funding data reproduction studies or relevant new areas of research.

Although the term ‘reproducibility’ is not to be found in the writings of the earliest scientists, there is little doubt that they were aware of its importance and the potential difficulties involved. Had they ever tried to repeat the experiments of their fellow scientists, they must have been aware of the uncertainties, as we are today. One key difference is that today we are much better placed to articulate the problems, the consequences, and the possible solutions.

Thorough recording, open to continuous appraisal by supervisors, colleagues, and possibly other researchers, clearly has the potential to reduce uncertainty. Numerous authors have described the principles of open science, of which the blog by Gezelter offers one of the most concise and usable elucidations [16]. The four goals that he puts forward can be further reduced to three fundamental principles:

  • Transparency of the entire process
  • Obtainability of all relevant data and supporting information
  • Capability for collaboration and/or further contribution

However, thorough recording requires careful attention to a number of practices:

  • Documenting the reasoning behind the experiment or series of experiments
  • Cataloguing the equipment and other resources used in each experiment
  • Capturing at source all the metadata associated with the results
  • Preserving the data, metadata, and other information in an accessible form and in an accessible place

Complying fully with all these practices might seem like a noble goal, but with recognition and commitment from the community it would be an attainable one. Therefore, in this paper, we consider briefly the principles of open science and how uncertainty can compromise the three fundamental principles listed above; by looking at where we have come from, we hope to help move the scientific research community towards a position where these ideas are widely adopted. We review how early scientists, styled as virtuosi (in the English literature) owing to their proficiency and accomplishments, communicated their work, and trace how this led to publication in learned journals as the medium of choice for reporting science. We then consider how open recording, in digital notebooks (commonly known as electronic laboratory notebooks, or ELNs) and in social media, has altered, and might continue to modify, how we understand science as it is being conducted. We illustrate the discussion with research in chemistry conducted at the University of Southampton.

 

Open science and uncertainty

On its Horizon 2020 website, the European Commission asserts [17]:

Science has always been open, unlike the processes for producing research and diffusing its results.

However, the page title, “Open Science (Open Access)”, tends to encourage the perception that open science is primarily about access to scientific publications and to research data. As outlined in the Introduction, obtainability is only one of the fundamental principles and, moreover, applies not only to publications but also to data and other supporting information. In particular, as argued by Hey and Payne, computer code is also necessary, with a strong emphasis on the quality of that code [18]. They go over the specific challenges of reproducible computational science, observing that open source is not a panacea, improvements being needed in areas such as: software training; funding for necessary software development; and the reporting of results. Meeting these challenges will require the recruitment and training of large numbers of expert data scientists.

Gezelter stresses the importance of greater transparency in methodology, for numerical experiments as well as for those conducted in laboratories and in the field. He argues that proprietary code or hidden parameters mean that “numerical experimentation isn’t even science.” He goes on to assert [16],

Science has to be “verifiable in practice” as well as “verifiable in principle”.

Gezelter’s point is a clear example of the inhibiting effect of uncertainty, caused by the lack of knowledge about the methodology, algorithm, and implementation of the software, which would be (or should be) embedded or exemplified in the source code of the program used.

In 2009, Kaitlin Thaney, then working for Science Commons (now re-integrated with Creative Commons), laid out the principles of open science in a conference presentation, in which she made the following point regarding transparency of the entire process [19]: “Research materials represent an incredible investment in tacit knowledge”.

Methods documented traditionally in research papers very rarely acknowledge the tacit knowledge that the researchers have used to guide their decisions. In the absence of a rationale for the research process, it can be difficult for other researchers to understand the decisions that have been taken, and hence to follow the possibly novel path that has been discovered. Unfortunately, tacit knowledge may become so ingrained that researchers do not realise that what is certain for them may lead to uncertainty for others, thus inhibiting them from reproducing the process.

The case for collaboration and further contribution is well made in the following extract from a page about the promise of open science collaboration, given in the material provided by the Open Science Framework [20]:

Done well, open collaboration can radically transform scientific practice. Scientific expertise is distributed across many minds, but scientific problems are rarely localized to the existing expertise of one or a few minds. Broadcasting problems openly increases the odds a person with the right expertise will see it and be able to solve it easily.

Put another way, the uncertainty felt by one person might well be alleviated by another person “with the right expertise”. Collaborations should act to spread the tacit knowledge, but perhaps only as far as the collaborating group; wider exposure of all the required knowledge requires the type of full and open accounts we are promoting in this paper.

Uncertainty can lead to many questions, some less obvious than others:

  • What do we already know?
  • Who is working in this area, and what are they doing?
  • How was this result actually obtained?
  • What does this data mean: where is the metadata, the additional information that provides the context and provenance for the data?
  • What is the provenance of this finding?
  • Why did they plan the experiment that way?

Returning to the words of Donald Rumsfeld, “there are things that we know we don't know. But there are also unknown unknowns.”

 

Was scientific research in the past really always open?

Progress in almost every sphere of human activity relies on the discoveries and understanding of people who have worked in those areas before. This principle was captured in the words of Isaac Newton, writing to Robert Hooke [21], “If I have seen further it is by standing on the shoulders of Giants”, but the relationship between Newton and Hooke needs to be appreciated to fully understand what Newton was conveying to Hooke. Hooke was not a tall man!

Notwithstanding the assertion by the European Commission that science has always been open, it is highly unlikely to have been entirely true, as the savants, those possessing scientia – that is specialised knowledge – would want to keep their knowledge to themselves, disclosing it only when appropriate (for example to a patron). Paul David, in his explanation of the evolution of ‘open science’, describes the pre-Renaissance state as the “dominant ethos of secrecy in pursuit of 'Nature's secrets'”[22].

David attributes the emergence of ‘open science’ to changes in the system of patronage, brought about by the growing sophistication of the science, particularly the increasing use of mathematical methods. A key change occurred when sponsors unable themselves to assess the worth of candidates for patronage progressively delegated the evaluation and selection to the communities of “fellow practitioners and correspondents.” Scientists looking for patronage had to prove “their own credibility and scientific status.” This change in the nature of the community who could judge the worth of the research is significant and underlies the debate over peer review and public engagement of and in research.

David also observes, “Indeed, it was very much in the interest of a patron for the reputation of those he patronized to be enhanced in this way, for their fame augmented his own” [22].

Scientific societies appeared in greater numbers, particularly during the sixteenth and seventeenth centuries, endorsing the reputations of individual scientists and encouraging collaborations amongst them. The consequent change of ethos placed greater emphasis on cooperation and the reliability of findings: the prompt disclosure of results and methods became a characteristic of what might be regarded as 'public science', if not yet publicly-funded science.

By the nineteenth century, science had become part of the public discourse and “had to be conducted and defended, to a significant extent, in the public domain.” While science retained a flavour of scientia through the development of specialized subjects and the societies formed to promote them, the advancement of science “was not always guaranteed by its diffusion to popular audiences, especially if these were more attracted by the utility of scientific discoveries and by factual results rather than by theoretical implications” [23]. Nevertheless, obtaining funding for scientific investigations at soirées created a need to entertain and inform the public. However, the expansion of the sciences and the broadening of interest in their findings were dependent on scientists being open about what they had done; being open about what was being done was still a step for future generations.

More open disclosure cultivated an interest in reproducing the results of other scientists. Tycho Brahe (1546-1601) was certainly keen on results by different observers matching up, and Robert Boyle demonstrated his awareness of the significance of reproducibility in the following quote from 1673 [24]:

… you will meet with several observations and experiments which, though communicated for true by candid authors or undistrusted eye-witnesses, or perhaps recommended by your own experience, may, upon further trial, disappoint your expectation, either not at all succeeding constantly, or at least varying much from what you expected.

It follows naturally from Boyle’s words that researchers have an obligation to provide sufficient information to enable verification of their work, which raises the question of the extent to which uncertainty might be the cause of failures to reproduce the results of other researchers. In Boyle's time, a precisely reproducible chemical reaction would have been difficult to achieve, especially owing to uncertainty about the purity of the reactants [23].

Moreover, the early scientists were by no means universally willing to be open about their ideas, methods, and results. However, the evidence for secrecy seems largely anecdotal and relates to specific activities rather than habitual behaviour. For example, Hooke announced his discovery of the law that now bears his name, which he described then as the ‘Law of Spring’, with an anagram, or more correctly a logogriph, cediinnoopsssttuu [25].

In a book whose title begins with the phrase “Opening Science”, editors Bartling & Friesike [26] have assembled an extensive set of articles covering open science, the communication of science, and related issues from a range of perspectives. In a brief exploration of the history of knowledge creation and dissemination, the authors submit that “modern science came to life in the 17th century”.

Könneker & Lugger [27] contend that digitisation has enhanced communication and collaboration, not only between scientists but also with the public, whose engagement, sometimes as citizen scientists, in a sense renews the relationship that existed in the early modern period. They suggest that the availability of online information about the processes of science has to some extent reopened the discourse to the interested public. Exposure of processes has certainly enabled some misguided as well as informed public discussions.

 

Reliable recording

A review of the history of scientific recording is beyond the scope of this paper. Yeo has written a very thorough account of recording by English scientists – whom he calls virtuosi – in the early modern period, a book that has undoubtedly influenced our thinking about the evolution of open science [28]. Moreover, two of the authors have recently contributed to a study of the role of digital laboratory notebooks in record keeping, albeit from an experimental chemistry perspective [29].

Figure 2: The changing style of notetaking through the ages. As the quantities of data increased, notetaking failed to keep pace, leading to more orphan data (more information on the source of the images can be found in the list of figures).

Yeo attributes the conscientious note taking by the virtuosi to their concern about the fragility of memory [28]. Unless they were engaged in systematic information gathering, they would collect circumstantial details, such as location and time, because they could not be sure of remembering them. Robert Hooke gave his attention to ways of collecting and sharing information; he was “preoccupied with the weakness and fragility of his own memory”. Many of today's researchers (and certainly supervisors) would wish that more of their fellow scientists would adopt the same point of view.

While uncertainties often arise as a consequence of lapses in the recording of detail, a lack of organization can also be a cause. Hooke put considerable effort into devising a scheme for recording observations of the weather. As a form of organization, commonplace books were popular in the early modern period.

In his book, Encyclopaedic Visions, Yeo discusses commonplaces as containers for knowledge and the commonplace method of arrangement, topics that are particularly relevant to the development of notebooks and their usage [30]. Yeo observes that notebook usage underwent a significant change during the seventeenth century. The material that was recorded in notebooks increased in amount to the extent that they replaced memory rather than being the means for refreshing memory. The headings that identified categories in commonplaces became tags with which to retrieve information [31].

During the early modern period, the primary aim was to push knowledge into commonplaces; we retain that aim now, but with much greater emphasis on the ability to pull knowledge readily from such repositories. Although the capability for capturing, recording, and organizing data became established, the less tangible facets of transparency, the methods and procedures that had been used, were not as easy to record [30].

The difficulty of capturing manual skills and tacit knowledge in print soon confronted Bacon’s disciples. Even Hooke, an instrument maker and experimental demonstrator, had a harsh opinion of the unwillingness or inability of tradesmen to give adequate descriptions of materials and processes.

We can conclude from Yeo’s explorations and analyses that the virtuosi were anxious to reduce uncertainty, even if they did not use those words when expressing their motivations. We can also surmise that the first “green shoots” of open science were present in the seventeenth century, if not earlier. However, it was probably as true then as it is now that experience is the key to learning procedural skill and acquiring tacit knowledge, facilitated by reliably recorded intelligence. If reliable recording is to advance the cause of open science and to reduce uncertainty, it must adhere to the three fundamental principles that we listed:

  • Transparency of the entire process
  • Obtainability of all relevant data and supporting information
  • Capability for collaboration and/or further contribution

Figure 3 illustrates these three principles.

Figure 3: The three fundamental principles in an Open Science world.

The automated recording of experimental procedures, together with documentation advanced enough to drive automated (robotic) experimentation, could be added as a fourth principle. These ideas were the topics of considerable discussion at the first Beilstein Open Science Symposium [2].

The desirability of reducing uncertainty by providing as much research information as possible in accessible formats is exemplified by projects that integrate notebooks and repositories and provide resolvable references, for example, the work of Harvey, Mason, and Rzepa [32]. This project relates well to our own work at Southampton, as reported in the discussion of the evolution of digital chemistry at Southampton [33].

 

Transparency

To ensure transparency of the experiment or other procedure being recorded, it is necessary to pay attention to the following practices:

  • Documenting the reasoning behind the experiment or series of experiments
  • Cataloguing the equipment and other resources used in each experiment
  • Capturing at source all the metadata associated with the results
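As a minimal sketch of how these three practices might be enforced at the moment of recording, the Python fragment below defines a notebook entry that timestamps itself on creation and reports which practices have not yet been satisfied. It is an illustration only: the class, its field names, and the example values are hypothetical and do not represent the data model of LabTrove or of any other notebook.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExperimentRecord:
    """Hypothetical notebook entry; all field names are illustrative."""
    title: str
    rationale: str = ""                            # reasoning behind the experiment
    equipment: list = field(default_factory=list)  # instruments and other resources
    metadata: dict = field(default_factory=dict)   # context captured with the results
    # The timestamp is taken when the record is created, not reconstructed later.
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def gaps(self):
        """Return the names of the practices this record has not yet satisfied."""
        missing = []
        if not self.rationale.strip():
            missing.append("rationale")
        if not self.equipment:
            missing.append("equipment")
        if not self.metadata:
            missing.append("metadata")
        return missing


# Made-up example values: the equipment and metadata are present,
# but the reasoning behind the experiment was never written down.
record = ExperimentRecord(
    title="Recrystallisation trial",
    equipment=["hotplate stirrer"],
    metadata={"solvent": "ethanol", "temperature_C": 60},
)
print(record.gaps())  # → ['rationale']
```

A check of this kind could be run whenever an entry is saved, so that a missing rationale is flagged while the researcher still remembers it rather than at the point of writing up.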

The review of laboratory notebooks in the digital era, which shares two authors with this article, covers not only the history of scientific recording but also, importantly, the issues associated with preserving and curating the record. Proper attention to these two activities is vital for all three principles of open science. Lyon et al. have investigated the subject of transparency in some depth [34]. They too have three dimensions for open science, the other two being access and participation, which we would equate to obtainability and capability respectively.

At the University of Southampton, we have developed the LabTrove laboratory notebook. Although we still refer to it as an Electronic Laboratory Notebook (ELN), we now consider that the description Digital Research Platform (DRP) reflects more accurately the function and operation of LabTrove and kindred tools. LabTrove is an open source platform, developed with a researcher-based frame of reference. It uses a blog-based approach to embody the journal characteristics of traditional notebooks, while also incorporating the potential for linking together procedures, materials, samples, observations, data, and analysis reports. LabTrove extends the traditional blog paradigm with full access control, enabling it to meet regulatory requirements alongside flexibility for the individual user [35].

 

Obtainability

However effective the documenting, cataloguing, and capturing at source have been for transparency, ensuring the obtainability of all relevant data and supporting information requires attention to one additional practice: “Preserving the data, metadata, and other information in an accessible form and in an accessible place”.

The development of the culture of the academic journal for these purposes followed the decline in the patronage of individual savants, which led to the establishment of academies that supported the work of groups of scientists, relying on their disclosure of methods and results to establish their reputation [22]. In 1665, the Philosophical Transactions of the Royal Society became the first journal dedicated to science. Since then, the number of scientific publications has continued to increase. While public disclosure might be considered to be a herald of open science, various conventions, including house styles, have created a publication ethos that does not necessarily encourage true open science.

Scientists conducting publicly funded research are usually expected, by the funding institution, to make the publications resulting from their research available to the public. There is an increasing move to make the data that they generate available in open repositories, with the presumption that they will curate their data sufficiently to enable other researchers to reproduce and/or reuse their results.

The situation is less clear with regard to methods and procedures, which should be as readily obtainable, with sufficient clarity, as the results. In 2011, Lang and Botstein took what was at the time the unusual step of publishing as supporting information a scanned copy of the complete laboratory notebook detailing the work described in the paper [36]. On the other hand, some advocates of open science consider the present publishing system to be broken, by reason of restricting access to information that should be openly available to everyone [9,10,19]. Although such issues clearly bear strongly on obtainability and the reduction of uncertainty, we consider a fuller discussion to be beyond the scope of this paper. However, problems of obtainability are not confined to issues arising from a need to pay for access to information. Even when the original data, the laboratory record, or both are ostensibly available, the links to them might have been lost or even be non-existent.

Two of us were co-authors of a recent review of the evolution of digital chemistry at Southampton, in which we describe contributions to open access, open exchange formats, open repositories, and open notebook science [33]. Of the issues explored, the need for open exchange format standards in chemistry is the one that seems to have advanced the least. Reporting a workshop run under the auspices of the CrystalGrid network in 2004, the review noted [37]:

The workshop also concluded that further research was required with regard to standards for referring to data sets from publications and also that standards were required to enable instrument vendors to offer raw data in open formats as well as in their own proprietary formats.

The lack of clarity over data formats leads to considerable pain in the reuse of data and thereby to increased uncertainty. Cross-industry initiatives such as the Allotrope Foundation [38] are working on modern data standards for the exchange of analytical data, and the FAIR (Findable, Accessible, Interoperable and Re-usable) initiatives [39] have focussed more people on this issue (for example, see the data format catalogue at FAIRsharing [40]).

 

Capability

Making novel contributions to science - whether as an individual or through collaboration - naturally depends on the provision of clear and complete descriptions of methods and procedures. Uncertainty can lead to errors and failures to reproduce previous results. Gezelter argues that research should not be considered complete until data and metadata have been made available, or code has been documented and released, concluding that it will be necessary to raise the expectations for completing a project [16].

The so-called ‘ClimateGate’ affair led to consequential demands for scientists to “show their working” [41]. However, it is vital that showing the working should not be a retrospective activity, undertaken only if requested: it should be an integral part of the planning and provenance recording of the experiments. Yeo described as a regular confounding factor the tendency for people to postpone recording their “working” and then rely on their memories [28]. Thesis writing by research students often involves dealing with the consequences of having delayed the huge organizational effort until the point of writing up. Delay embeds inaccuracies and uncertainties, both of which are inimical to the effective reuse of scientific knowledge in collaboration or for making further novel contributions.

The capture at source strategy underpins the primacy of the notebook for recording observations, leading Dartmouth College to offer the following advice to their students [42]:

If you are caught using the scrap of paper technique, your improperly recorded data may be confiscated by your TA or instructor at any time.

This emphasizes the need to record properly what you see as you see it. Many modern digital replacements for paper notebooks have not necessarily achieved the laboratory compatibility required to ensure that this is possible. Ruggedized notebooks are one possible solution, and there is a continuing need for high quality education in this area. However, the best approach is not yet obvious: for example, the use of templates does not always lead to better record keeping [43].

Collaboration and further contribution are, of course, dependent on the easy transferability of recorded information, so digital capture at source is clearly preferable to recording in a paper notebook, as examined fully in our review of notebooks in the digital era [29]. The review describes initiatives for facilitating digital capture, including the Smart Tea project conducted at Southampton [44]. More recently, we have developed techniques for the direct capture of experimental conditions using sensors and Internet of Things devices and indeed the use of speech interfaces.
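As an illustration only, digital capture at source might be sketched as follows: each reading is stored, at the moment of observation, as a self-describing record with its unit and timestamp, and appended to a plain-text log in an open format. The function names, field names and the JSON Lines layout are our assumptions for this sketch; they are not taken from Smart Tea or from any of the platforms cited.

```python
import json
from datetime import datetime, timezone


def capture_reading(sensor_id, quantity, value, unit, when=None):
    """Build one self-describing record at the moment of observation.

    All field names here are hypothetical, chosen for illustration.
    """
    when = when or datetime.now(timezone.utc)
    return {
        "sensor_id": sensor_id,
        "quantity": quantity,
        "value": value,
        "unit": unit,
        # Timestamp is stored in ISO 8601 form so that it stays unambiguous.
        "recorded_at": when.isoformat(),
    }


def append_record(path, record):
    """Append the record as one JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")


def read_records(path):
    """Read the log back as a list of dictionaries."""
    with open(path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Because every record carries its own unit and time of capture, nothing has to be reconstructed from memory later, and the plain-text log can be read without proprietary software.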

 

Conclusions

Reducing uncertainty is both an ingredient and a requirement of open science. The best way to achieve that goal is arguably to be open about everything, which is the thinking behind Open Notebook Science, of which Matthew Todd is a notable proponent. For example, he and his team conducted their research project investigating the Pictet–Spengler route to Praziquantel in public view [45]. Given the concern expressed recently that some scientists are neglecting the traditional scientific ethos, owing to narcissistic motivations [14], we believe that greater openness in scientific practice would not only militate against such trends but also improve reproducibility.

Open Notebook Science will not always be the appropriate way to work, owing to commercial and intellectual property considerations, as well as issues relating to recognition and to the sustainability of long term projects. The nature of most current industrial research and the patent process necessarily involves a trade-off between privacy and disclosure. However, future science and indeed innovation become much easier if the data, procedures and methods are open and the inevitable uncertainties have therefore, as far as possible, been understood and subsequently resolved.

The examples discussed illustrate the key role of e-Science (Cyber-Science) in enabling the ideas of open science to be applied at the scale of much modern research. The ideals of the early communicators of science (for example the Royal Society) in conveying all the details of the research, which fell away as the scale of research data outgrew the scale of dissemination, are again attainable. This is the subject of another series of talks and a paper, which we plan to formalise in the near future.

 

Acknowledgements

The discussion reported in this paper was presented at the Beilstein meeting on Open Science and has been informed by the research undertaken in projects supported by the following grants under the EPSRC Science, UK e-Science, and Digital Economy programmes, including: Digital Economy IT as a Utility Network+ (EP/K003569/1), myGrid: A Platform for e-Biology Renewal (EP/G026238/1), South East Regional e-Research Consortium (EP/F05811X/1), PLATFORM: End-to-End pipeline for chemical information: from the laboratory to literature and back again (EP/C008863/1), Structure-Property Mapping: Combinatorial Chemistry & the Grid (GR/R67729/1), Basic Technology: New Technology for nanoscale X-ray sources: Towards single isolated molecule scattering (GR/R87307/01), and by many valuable discussions with all our colleagues and friends in Chemistry and the wider physical, digital, social and life sciences.

 

References

[1]

Frey, J. G. Is open science an inevitable outcome of e-science? At 251st American Chemical Society National Meeting & Exposition - Computers in Chemistry, United States. 13–17 Mar 2016. https://eprints.soton.ac.uk/389833/ (accessed June 5, 2018).

[2]

Open science and the chemistry lab of the future, Beilstein Open Science Symposium 2017, 22–24 May 2017, Hotel Jagdschloss Niederwald, Rüdesheim, Germany (accessed June 12, 2018).

[3]

Uncertainties in Chemistry. Interview with Jeremy G. Frey recorded at the Beilstein Open Science Symposium 2017. http://www.beilstein.tv/video/uncertainties-in-chemistry/ June 22, 2017 (accessed June 12, 2018).

[4]

Rumsfeld, D. 2002, quoted in: http://archive.defense.gov/Transcripts/Transcript.aspx?TranscriptID=2636 (accessed June 12, 2018).

[5]

Logan, D. C. J. Exp. Bot. 2009, 60, No. 3, 712–714. doi:10.1093/jxb/erp043

[6]

Popper, K. R. The Logic of Scientific Discovery; 1934 (as Logik der Forschung, English translation 1959), ISBN 0-415-27844-9.

[7]

Bird, C. L.; Willoughby, C.; Coles, S. J. and Frey, J. G. Information Standards Quarterly 2013, Fall 2013, 25 (3): 4–12. doi:10.3789/isqv25no3.2013.02

[8]

Twiss-Brooks, A. Chemical Information Bulletin 2012, 64 (1), 2012, available from http://bulletin.acscinf.org/node/306 (accessed June 12, 2018).

[9]

Murray-Rust, P. Principles and practice of Open Science, 2015, presentation available at: http://www.slideshare.net/petermurrayrust/principles-and-practice-of-open-science (accessed June 12, 2018).

[10]

Grossman, A. Publishing in transition – do we still need scientific journals? 2015, available at: https://www.scienceopen.com/document?vid=ad4713e2-4db2-4e37-bcf3-94ff0311fc7c (accessed June 12, 2018).

[11]

Baker, M. Nature 2016, 533, 452–454. http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970 (accessed June 12, 2018).

[12]

Nosek, B. A. and Errington, T. M. eLife 2017, 6, e23383. doi:10.7554/eLife.23383.001

[13]

Sample, I. “Study delivers bleak verdict on validity of Psychology experiment results” The Guardian August 2015. https://www.theguardian.com/science/2015/aug/27/study-delivers-bleak-verdict-on-validity-of-psychology-experiment-results (accessed June 12, 2018).

[14]

Devlin, H. “Science falling victim to ‘crisis of narcissism’” The Guardian January 2017 https://www.theguardian.com/science/2017/jan/20/science-victim-crisis-narcissism-academia  (accessed June 12, 2018).

[15]

Rosenblatt, M. Sci. Transl. Med. 2016, Vol. 8, Issue 336, pp. 336ed5. doi:10.1126/scitranslmed.aaf5003

[16]

Gezelter, D. “What, exactly, is Open Science?” 2009. http://www.openscience.org/blog/?p=269 (accessed June 12, 2018). 

[17]

European Commission, “Open Science (Open Access)”, Horizon 2020. https://ec.europa.eu/programmes/horizon2020/en/h2020-section/open-science-open-access (accessed June 12, 2018). 

[18]

Hey, T. and Payne, M. C. Nat. Phys. 2015, 11, 367–369. doi:10.1038/nphys3313 

[19]

Thaney, K. “Laying out the principles of open science”, 2009. http://www.slideshare.net/kaythaney/laying-out-the-principles-of-open-science-presentation (accessed June 12, 2018).

[20]

Open Science Framework, “Open Science Collaboration”. https://osf.io/vmrgu/wiki/home/ (accessed June 12, 2018).

[21]

According to Wikipedia, amongst other sources, the concept has been attributed to Bernard of Chartres, in the 12th century: https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants (accessed June 12, 2018).

[22]

David, P. A. Industrial and Corporate Change 2004, 13, (4): 571–589. doi:10.1093/icc/dth023

[23]

Chapman, A. personal communications, 2016 and 2017.

[24]

Boyle, R. “The First Essay Concerning the Unsuccessfulness of Experiments”, 1673, obtained from: “Science Quotes by Robert Boyle” http://todayinsci.com/B/Boyle_Robert/BoyleRobert-Quotations.htm (accessed June 12, 2018).

[25]

Chapman, A. Proc. R. Inst. G. B. 1996, 67, 239–275. Available from: https://web.archive.org/web/20110306084446/http://home.clara.net/rod.beavon/leonardo.htm (accessed June 12, 2018).

[26]

Bartling, S. and Friesike, S., Eds. Opening Science – The Evolving Guide on How the Web is Changing Research, Collaboration and Scholarly Publishing; Springer Open, 2014.

[27]

Könneker, C. and Lugger, B. Science 2013, 342(6154), 49–50. doi:10.1126/science.1245848

[28]

Yeo, R. Notebooks, English Virtuosi, and Early Modern Science; University of Chicago Press, 2014, ISBN: 9780226106564.

[29]

Bird, C., Willoughby, C. and Frey, J. G. Chem. Soc. Rev. 2013, 42, 8157–8175. doi:10.1039/C3CS60122F

[30]

Yeo, R. Encyclopaedic Visions: Scientific Dictionaries and Enlightenment Culture; Cambridge University Press, 2001, ISBN 978-0-521-65191-2.

[31]

Yeo, R. Defining science; Cambridge University Press, 1993. ISBN 9780511521515. doi:10.1017/CBO9780511521515

[32]

Harvey, M. J.; Mason, N. J. and Rzepa, H. S. J. Chem. Inf. Model. 2014, 54 (10), 2627–2635. doi:10.1021/ci500302p

[33]

Bird, C.; Coles, S. J. and Frey, J. G. Mol. Inf. 2015, 34, 585–597. doi:10.1002/minf.201500008

[34]

Lyon, L.; Jeng, W. and Mattern, E. International Journal of Digital Curation 2017, 12 (1). doi:10.2218/ijdc.v12i1.530

[35]

Milsted, A. J.; Hale, J. R.; Frey, J. G. and Neylon, C. PLoS One, 2013, doi:10.1371/journal.pone.0067460

[36]

Lang, G. I. and Botstein, D. PLoS One 2011. doi:10.1371/journal.pone.0025290

[37]

Coles, S. J.; Frey, J. G.; DeRoure, D. and Hursthouse, M. The CrystalGrid Collaboratory Foundation Workshop, Southampton, 13–17 September, 2004: a selection of presentations. http://eprints.soton.ac.uk/9777/ (accessed June 12, 2018).

[38]

Allotrope Foundation – Rethinking Scientific Data. https://www.allotrope.org (accessed June 12, 2018).

[39]

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., et al. Scientific Data 2016, 3, 160018.  doi:10.1038/sdata.2016.18

[40]

McQuilton, P.; Gonzalez-Beltran, A.; Rocca-Serra, P.; Thurston, M.; Lister, A.; Maguire, E.; Sansone, S.-A. Database 2016, May 17. doi:10.1093/database/baw075. https://fairsharing.org (accessed June 12, 2018).

[41]

Hulme, M. and Ravetz, J. 'Show Your Working': What 'ClimateGate' means, BBC News, 2009, available at: http://news.bbc.co.uk/2/hi/technology/8388485.stm (accessed June 12, 2018).

[42]

Dartmouth College ChemLab, "How to Keep a Notebook", originally from https://www.dartmouth.edu/~chemlab/info/notebooks/how_to.html Accessed in 2013 but currently the ChemLab site is unavailable. Copies are available, e.g. at: https://web.archive.org/web/20100501224102/https://www.dartmouth.edu/~chemlab/info/notebooks/how_to.html  (accessed June 12, 2018).

[43]

Willoughby, C.; Logothetis, T. A. and Frey, J. G. J. Cheminform. 2016, 8:9. doi:10.1186/s13321-016-0118-6

[44]

Hughes, G.; Mills, H.; De Roure, D.; Frey, J. G.; Moreau, L.; Schraefel, M. C.; Smith, G. and Zaluska, E. Org. Biomol. Chem. 2004, 2, 3284–3293. doi:10.1039/b410075a

[45]

Todd, M. et al., Pictet–Spengler route to Praziquantel, 2013, available at: http://www.ourexperiment.org/racemic_pzq (accessed June 12, 2018).

 

List of figures

1     

© Jeremy G. Frey

2

© Jeremy G. Frey - Picture sources:

Newton’s Trinity College Student Notebook, http://cudl.lib.cam.ac.uk/view/MS-ADD-03996/5 (accessed June 12, 2018)

Alexander Graham Bell's unpublished laboratory notebook https://commons.wikimedia.org/wiki/File:AGBell_Notebook.jpg (accessed June 12, 2018)

Einstein - John D. Norton's homepage, https://www.pitt.edu/~jdnorton/Goodies/Zurich_Notebook/32R.jpeg (accessed June 12, 2018)

MoreTea Notebook Project, https://www.slideshare.net/jgf/blogs-logs-pods-smart-labs (accessed June 12, 2018)

LabTrove https://malaria.ourexperiment.org/daraprim_synthesis/15428/1_g_scaleup_of_alternate_Daraprim_reaction_SGS_163_from_24chlorophenyl32methylpropoxypent2enenitrile_SGS_152.html (accessed June 12, 2018)

3

© Jeremy G. Frey