Introduction to data management best practices


With the realisation that even linked data may not be enough for scientists (1), and with the European Union's decision to embrace open access and best practices in data management (2–4), many psychologists find themselves treading on unfamiliar terrain. Given that an estimated ~85% of health research is wasted, this is nothing short of a pressing issue in related fields.

Here, I comment on the FAIR Guiding Principles for scientific data management and stewardship (5) for the benefit of myself and perhaps others, who have not been involved with data management best practices.

[Note: all this does NOT mean that you are forced to share sensitive data. But if your work cannot be checked or reused (even after anonymisation), calling it scientific might be a stretch.]

What goes in a data management plan?

A necessary document to accompany any research plan is the data management plan. This plan should first of all specify the purpose of the data collection, and how it relates to the objectives of one’s research project. It should state which types of data are collected – for example, in the context of an intervention to promote physical activity, one might collect survey data, as well as accelerometer and body composition measures. The steps to assure the quality of the data can be described, too.

Next, the file formats for this data should be specified, along with which parts of the data will be made openly available, if the whole data is not made so. When and where will the data be made available, and what software is needed to read it? Will there be restrictions to access? Will there be an embargo, and if so, why?

The data management plan should also state whether existing data is being re-used. The researcher should clarify the origin of the data, whether existing or new, comment on its size (if known), and outline for whom the data will be useful (4).

Bad practices leading to unusable data are still common, so adopting proper data management practices can incur costs. The data management plan should explicate these, how they are covered and who is responsible for the data management process.

The importance of collecting original data in psychology cannot be overstated. Data are a conditio sine qua non for any empirical science. Anyone who generates data and shares them publicly should be adequately recognized. (6)

Note: metadata means any information about the data. For example, descriptive metadata increases discovery and identification; includes elements such as keywords, title, abstract, author. Administrative metadata informs the management of the data; creation dates, file types, version numbers.
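To make the distinction concrete, here is a minimal sketch of a metadata record for the hypothetical physical activity intervention mentioned above. The field names and values are illustrative assumptions of mine, not a formal standard such as DDI; a data archive will have its own required schema.

```python
import json

# Hypothetical metadata record; all field names and values are illustrative only.
metadata = {
    "descriptive": {
        "title": "Physical activity intervention: survey and accelerometer data",
        "keywords": ["physical activity", "intervention", "accelerometer"],
        "abstract": "Survey, accelerometer and body composition measures.",
        "authors": ["A. Researcher"],
    },
    "administrative": {
        "created": "2017-03-29",
        "file_types": ["csv"],
        "version": "1.0.0",
    },
}

# Fields this sketch treats as mandatory (an assumption, not an archive rule).
REQUIRED = {
    "descriptive": ["title", "keywords", "abstract", "authors"],
    "administrative": ["created", "file_types", "version"],
}


def missing_fields(record, required):
    """Return the required (section, field) pairs that are absent or empty."""
    return [
        (section, field)
        for section, fields in required.items()
        for field in fields
        if not record.get(section, {}).get(field)
    ]


print(json.dumps(metadata, indent=2))
print("Missing:", missing_fields(metadata, REQUIRED))
```

A check like this is handy before depositing: an empty list means every field you decided to require has been filled in.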

The FAIR principles for data management

The FAIR principles have been composed to help both machines and humans (such as meta-analysts) to find and use existing data. The principles consist of four requirements: Findability, Accessibility, Interoperability and Reusability. Note that the adherence to these principles is not just a yes-no question, but a gradient where data stewards should aspire for an increased uptake.

Below, the exact formulation of the (sub-)principles is in italics, my comments in bullet points.


F1. data are assigned a globally unique and eternally persistent identifier.

  • This is mostly handled in psychological research by making sure the research document is supplied with a DOI (Digital Object Identifier (7)). In addition to journals (for published research), most repositories where one can deposit any material (such as FigShare or Zenodo), or preprints (such as PsyArxiv), assign the work a DOI automatically.

F2. data are described with rich metadata.

  • This relates to R1 below. There should be data about the data telling you what the data is. Also: What is your approach to making versioning clear? In the Open Science Framework (OSF), you can upload new versions of your document and it automatically saves the previous version behind the new one, given that the new file has the same name as the old one.
  • Your data archiver helps you with metadata. E.g. the Finnish Social Science Data Archive (FSD) uses the DDI 2.1. metadata standard.

F3. data are registered or indexed in a searchable resource.

  • The researcher should deposit the data in a searchable repository. Your own website, or the website of your research group, is unfortunately not enough.

F4. metadata specify the data identifier.

  • Make sure your data actually shows its DOI somewhere, and include a link to the dataset in the metadata. As far as I know, repositories such as the OSF do this for you.

Non-transparent, inaccessible data. [Photo by Maarten van den Heuvel on Unsplash.]

  • From what I understand, the accessibility principles (A1–A2 below) are not too relevant to individual researchers. Basically, if your work can be accessed via “http://”, you are complying with A1. You should also be mindful of storing your data in one repository only, and avoid having multiple DOIs. Regarding A2: if your data is sensitive and you cannot share it openly, the description of the data should still be accessible to researchers. I am not certain about how repositories deal with accessibility after the data has been taken offline.

A1. data are retrievable by their identifier using a standardized communications protocol.

A1.1 the protocol is open, free, and universally implementable.

A1.2 the protocol allows for an authentication and authorization procedure, where necessary.

A2. metadata are accessible, even when the data are no longer available.
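As a rough illustration of A1, a DOI can be turned into an ordinary web address through the doi.org resolver, so that the data (or its landing page) is retrievable over plain HTTPS. The sketch below uses the DOI of the FAIR paper (5); the helper names are my own, and the metadata request relies on DOI content negotiation, which may not be supported for every DOI.

```python
import json
import urllib.request
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build the resolver URL for a DOI (hypothetical helper)."""
    return DOI_RESOLVER + quote(doi.strip(), safe="/")


def fetch_doi_metadata(doi: str) -> dict:
    """Ask the resolver for machine-readable metadata instead of the landing page.

    Needs network access; raises urllib.error.URLError if the request fails.
    """
    request = urllib.request.Request(
        doi_to_url(doi),
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    print(doi_to_url("10.1038/sdata.2016.18"))
```

The point is only that nothing beyond an open, standard protocol is needed to get from the identifier to the resource.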


  • Behind these items (and the FAIR principles in general) is the idea that machines could read the data and mine it for e.g. meta-analyses. I am blissfully unaware of the intricacies related to that endeavour, so I just comment from the perspective of a common researcher here.

I1. data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

  • It is better to prefer simple formats that can be opened without special software (e.g. spreadsheets with comma-separated values, “file.csv”) over formats that require it (e.g. SPSS, “file.sav”).
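A minimal sketch of what this can look like in practice: the data go into a plain CSV file, and a small codebook, also CSV, records what each column means. The variable names, values and file names below are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Made-up example rows; a real dataset would come from your own study.
rows = [
    {"participant_id": 1, "steps_per_day": 8421, "intention_1to7": 5},
    {"participant_id": 2, "steps_per_day": 10233, "intention_1to7": 6},
]
codebook = [
    {"variable": "participant_id", "description": "Anonymised running number"},
    {"variable": "steps_per_day", "description": "Mean daily steps (accelerometer)"},
    {"variable": "intention_1to7", "description": "Intention item, 1 = low, 7 = high"},
]


def write_csv(path, records):
    """Write a list of dicts to a UTF-8 CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)


def read_csv(path):
    """Read a CSV file back into a list of dicts (all values come back as text)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Write to a temporary directory so the sketch leaves no files behind in your project.
out_dir = Path(tempfile.mkdtemp())
write_csv(out_dir / "data.csv", rows)
write_csv(out_dir / "codebook.csv", codebook)
print(read_csv(out_dir / "data.csv"))
```

Both files open in any text editor or spreadsheet program, which is the whole appeal of the format.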

I2. data use vocabularies that follow FAIR principles.

  • This principle may seem somewhat vague and hard for others than computer scientists to grasp. It relates to index terms or glossaries used. In psychology, one possibility would be the APA thesaurus used by Psycinfo.

I3. data include qualified references to other (meta)data.

  • This should be a given, and the citation culture of psychology seems well-equipped to follow. But it is still important to cite the original source of questionnaires, accelerometer algorithms etc.

Accessible, transparent and FAIR data. [Photo by Pahala Basuki on Unsplash.]

R1. data have a plurality of accurate and relevant attributes.

  • This means that the research should be accompanied by e.g. tags or a description, which provides sufficient information for information seekers to determine the value of reuse.

R1.1. data are released with a clear and accessible data usage license.

  • You should state which licence the work is under. It is commonly recommended to use “CC0”, which allows all reuse, even without attribution. The second-best alternative, “CC-BY” (which requires attribution), can lead to interpretation problems of attribution stacking, when licences pile on each other (see chapter 10.4 in reference 8). It is a commonly accepted practice to cite others’ work in psychology, so CC0 seems a reasonable option, though I sympathise with the (almost invariably unfounded) fear of being scooped.

R1.2. data are associated with their provenance.

  • This means that the source of the data is clear, so that the data can be cited.

R1.3. data meet domain-relevant community standards.

  • In psychology, there are not many well-known community standards, but e.g. the DFG guidelines (6) are showing the way.


The FAIR principles can be hard to comply with exhaustively, as they are sometimes difficult to interpret (even by people who work in data archives) and take a lot of effort to implement. Hence, everyone should consider whether their data is FAIR enough. As with open data in general, one should be able to describe why best practices could not be followed, when that is the case. But, for the sake of ethics if nothing else, we should aim to do the best we can.

Additional information on the FAIR principles can be found here, and some difficulties in assessing the adherence to them in (9). A 20min webinar in Finnish is available here.



  1. Bechhofer S, Buchan I, De Roure D, Missier P, Ainsworth J, Bhagat J, et al. Why linked data is not enough for scientists. Future Gener Comput Syst. 2013;29(2):599–611.
  2. Khomami N. All scientific papers to be free by 2020 under EU proposals. The Guardian [Internet]. 2016 May 28 [cited 2017 Mar 29]; Available from:
  3. European Commission. Open access – H2020 Online Manual [Internet]. [cited 2017 Mar 29]. Available from:
  4. European Commission. Guidelines on data management in Horizon 2020 [Internet]. 2016 [cited 2017 Mar 29]. Available from:
  5. Wilkinson MD, Dumontier M, Aalbersberg IjJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016 Mar 15;3:160018.
  6. Schönbrodt F, Gollwitzer M, Abele-Brehm A. Data Management in Psychological Science: Specification of the DFG Guidelines [Internet]. 2017 [cited 2017 Mar 29]. Available from:
  7. International DOI Foundation. Digital Object Identifier System FAQs [Internet]. [cited 2017 Mar 29]. Available from:
  8. Briney K. Data Management for Researchers: Organize, maintain and share your data for research success [Internet]. Pelagic Publishing Ltd; 2015 [cited 2017 Mar 29]. Preview available from:
  9. Dunning A. FAIR Principles – Connecting the Dots for the IDCC 2017 [Internet]. Open Working. 2017 [cited 2017 Mar 29]. Available from:


The legacy of social psychology

To anyone teaching psychology.

In this post I express some concerns about the prestige given to ‘classic’ studies, which are widely taught in undergraduate social psychology courses around the world. I argue that rather than just demonstrating a bunch of clever but dodgy experiments, we could teach undergraduates to evaluate studies for themselves. To exemplify this, I quickly demonstrate power, Bayes factors, the p-checker app and the GRIM test.

psychology’s foundations are built not of theory but with the rock of classic experiments

Christian Jarrett

Here is an out-of-context quote from Sanjay Srivastava from a while back:


This got me thinking about why and how we teach classic studies.

Psychologists usually lack the luxury of well-behaving theories. Some have thus proposed that the classic experiments, which have survived in the literature until the present, serve as the bedrock of our knowledge 1. In the introduction to a book retelling the stories of classic studies in social psychology 2, the authors note that classical studies have “played an important role in setting the research agenda for the field as it has progressed over time” and “serve as common points of reference for researchers, teachers and students alike”. The authors continue by pointing out that many of these classics lacked sophistication, but that this in fact is a feature of their enduring appeal, as laypeople can understand the “points” the studies make. Exposing the classics to modern statistical methods would thus miss their point.

Now, this makes me wonder; if the point of a study is not to assess the existence of a phenomenon, what in the world may it be? One answer would be to serve as historical examples of practices no longer considered scientific, but I doubt this is what’s normally thought. Notwithstanding, I wanted to dip into the “foundations” of our knowledge by demonstrating the use of some more-or-less recently developed tools on a widely known article. According to Google Scholar, the Festinger and Carlsmith cognitive dissonance experiment 3 has been cited over three thousand times, so its influence is hard to downplay.


But first, a necessary digression: statistical power is the probability of detecting a “significant” effect, if an effect of the postulated size really exists. As explained in Brunner & Schimmack 4, it is an interesting anomaly that the statistical power of studies in psychology is usually small, but almost all of them end up finding these “significant” results. As to how small, average power is unlikely to exceed 50% 5–7, and for small (conventional?) effect sizes, the mean has been shown to be as low as 24%. As a recent replication project regarding the ego depletion effect 8 exemplified, a highly “replicable” (as judged by the published record) phenomenon may turn out to be a fluke when null findings are taken into account. This has recently made psychologists consider the uncomfortable possibility that entire research lines consisting of “accumulated scientific evidence” may in fact not contain that much evidence 9,10.

So, what is the statistical power of Festinger and Carlsmith? Using G*Power 11, it turns out that they had an 80% chance to discover a humongous effect of d = 0.9, and only a coin flip’s probability to find a (still large) effect of d = 0.64. Now, if an underpowered study finds an effect, with current practices it is likely to be exaggerated, and/or even of the wrong sign 12. Here would be a nice opportunity to demonstrate these concepts to students.
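These numbers can be roughly reproduced by hand. The sketch below uses a normal approximation to a two-sided, two-sample t-test at α = 0.05 with n = 20 per group (the group size given in the Figure 1 caption). It is my own approximation, not G*Power's exact noncentral-t computation, so at this sample size it comes out slightly higher than the figures above.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample t-test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    noncentrality = d * sqrt(n_per_group / 2)
    # Probability of landing beyond either critical value.
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


for d in (0.9, 0.64, 0.2):
    print(f"d = {d}: power is roughly {approx_power(d, 20):.2f}")
```

Letting students vary d and n in a function like this makes it obvious how quickly power collapses for the small effects that are typical in the field.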

Considering the low power, it may not come as a surprise that the evidence the study provided was weak to begin with. A Bayes Factor (BF) is an indicator of evidence for one hypothesis, in relation to another. In this case, a BF of ~3 moves an impartial observer from being 50% sure the experiment works to being 75% sure, or a skeptic from being 25% sure to being 43% sure that the effect is small instead of nil.
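The arithmetic behind such statements is simple: convert the prior probability to odds, multiply by the Bayes factor, and convert back. In the sketch below the function name is mine. With a Bayes factor of exactly 3 the impartial observer ends up at 75% and a skeptic starting from 25% ends up at 50%; a skeptic's end point of 43% corresponds to a Bayes factor of about 2.3, which shows how sensitive the result is to the exact value behind a rounded "~3".

```python
def update_belief(prior_probability: float, bayes_factor: float) -> float:
    """Posterior probability after multiplying the prior odds by the Bayes factor."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


print(update_belief(0.50, 3))    # impartial observer
print(update_belief(0.25, 3))    # skeptic, Bayes factor of exactly 3
print(update_belief(0.25, 2.3))  # skeptic, Bayes factor of 2.3
```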

It would be relatively simple to introduce Bayes Factors with this study. The effect of a prior scale in this case does not matter much for reasonable choices, as exemplified with a plot made in JASP with two clicks:

Figure 1: Bayes factor robustness check for the main finding of the dissonance study. Plotted by JASP, using n=20 for both groups, a t-value of 2.48 and a cauchy prior scale of 0.4.

Nowadays it is possible to easily check whether a paper correctly reports test statistics and their associated p-values. The p-checker app (this link feeds the relevant statistics to the app) can do this, and it turns out that most of the t-values in the paper are incorrectly rounded down (assuming that “significant at the 0.08 level” means p < 0.08). You can demonstrate this by including the link on your slides, using it to go to p-checker and choosing “p-values correct?”.
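The kind of check involved can be sketched by recomputing a p-value from a reported t-value and its degrees of freedom. The code below is not the app's implementation; it integrates the t density numerically with the standard library only. The example uses t = 2.48 from the Figure 1 caption and assumes df = 38, i.e. an independent-samples t-test with 20 participants per group.

```python
from math import exp, lgamma, log, pi


def t_density(x: float, df: float) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value, integrating the density over [0, |t|] with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = 2 * total * h / 3  # probability mass between -|t| and |t|
    return max(0.0, 1 - central_area)


print(f"t(38) = 2.48 gives p of about {two_sided_p(2.48, 38):.4f}")
```

Comparing the recomputed value with the reported threshold is then a one-line test that students can run on any paper they are assigned.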

Finally, you can look at the study using the GRIM test 13, which evaluates if the reported means are mathematically possible. As it turns out, a quarter of the reported means in the table with the main results do not pass the test. One more time: 25% of the reported means are mathematically impossible. The most likely explanation for this is shoddy reporting of means or accidental misreporting of sample sizes, but I find it telling that—to my knowledge, at least—the issue has not come up in fifty years of scientific investigation.

Figure 2: Main results table of the Festinger & Carlsmith study. Circled means are mathematically impossible given the reported sample sizes.
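The logic of the GRIM test fits in a few lines: if responses are whole numbers, the sum of n responses is an integer, so only certain means are possible for a given n. The sketch below is my own simplified version (one integer-valued response per participant, either rounding direction accepted at exact ties), and the example values are hypothetical, not taken from the Festinger & Carlsmith table.

```python
from decimal import Decimal


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Check whether a mean, reported as text, can arise from n integer responses.

    The mean is passed as a string so that the number of reported decimals is known.
    """
    mean = Decimal(reported_mean)
    decimals = max(0, -mean.as_tuple().exponent)
    scale = 10 ** decimals
    scaled_mean = int(mean * scale)  # e.g. "3.45" becomes 345
    # The only candidate integer sums are the two nearest to mean * n.
    base = (scaled_mean * n) // scale
    # A sum k fits if k / n lies within half a unit of the last reported decimal.
    return any(
        abs(2 * k * scale - 2 * scaled_mean * n) <= n for k in (base, base + 1)
    )


for mean in ("3.45", "3.44"):
    verdict = "possible" if grim_consistent(mean, 20) else "impossible"
    print(f"mean {mean} with n = 20: {verdict}")
```

With 20 participants the mean can only move in steps of 0.05, which is why a reported 3.44 cannot come from whole-number responses.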

Now, even though I have doubts about this study, as well as the process by which the theory has “evolved” 14, it does not mean that cognitive dissonance effects do not exist. It is just that the research may not have been able to capture the essence of this everyday phenomenon (which, if it exists, can influence behaviour without the help of academics). Under the traditional paradigm of psychological science, fraught with publication bias and unhelpful incentives 10, a Registered Replication Report (RRR) -type of work would be needed, and even that could only test one operationalisation. As an undergraduate, I would have been exhilarated to hear early about how and why such initiatives work, and why the approach is much more informative than any singular experiments.

To return to the notion of the bedrock of psychology consisting of classic experiments, instead of theories as in the natural sciences 1: perhaps we need a more solid foundation, regardless of whether some flashy findings from decades ago happened to spur a progressive-ish 15,16 line of research.

How would such a foundation come to be? Maybe teaching could play a role?


  1. Jarrett, C. Foundations of sand? The Psychologist 21, 756–759 (2008).
  2. Smith, J. R. & Haslam, S. A. Social psychology: Revisiting the classic studies. (SAGE Publications, 2012).
  3. Festinger, L. & Carlsmith, J. M. Cognitive consequences of forced compliance. The Journal of Abnormal and Social Psychology 58, 203–210 (1959).
  4. Brunner, J. & Schimmack, U. How replicable is psychology? A comparison of four methods of estimating replicability on the basis of test statistics in original studies. (2016).
  5. Button, K. S. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14, 365–376 (2013).
  6. Cohen, J. Things I have learned (so far). American psychologist 45, 1304 (1990).
  7. Sedlmeier, P. & Gigerenzer, G. Do studies of statistical power have an effect on the power of studies? Psychological bulletin 105, 309 (1989).
  8. Hagger, M. S. et al. A multi-lab pre-registered replication of the ego-depletion effect. Perspectives on Psychological Science (2016).
  9. Earp, B. D. & Trafimow, D. Replication, falsification, and the crisis of confidence in social psychology. Front. Psychol 6, 621 (2015).
  10. Smaldino, P. E. & McElreath, R. The Natural Selection of Bad Science. arXiv preprint arXiv:1605.09511 (2016).
  11. Faul, F., Erdfelder, E., Lang, A.-G. & Buchner, A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods 39, 175–191 (2007).
  12. Gelman, A. & Carlin, J. Beyond Power Calculations Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspectives on Psychological Science 9, 641–651 (2014).
  13. Brown, N. J. L. & Heathers, J. A. J. The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science (2016). doi:10.1177/1948550616673876
  14. Aronson, E. in The science of social influence: Advances and future progress (ed. Pratkanis, A. R.) 17–82 (Psychology Press, 2007).
  15. Lakatos, I. History of science and its rational reconstructions. (Springer, 1971).
  16. Meehl, P. E. Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry 1, 108–141 (1990).


How lack of transparency feeds the beast

This is a presentation I gave for the young researchers branch of the Finnish Psychological Society. I show how low power and lack of transparency can lead to weird situations, where the published literature contains little or no knowledge.


We had big fun with Markus Mattsson and Leo Aarnio in a seminar, presenting to a great audience of eager young researchers.

The slides for my talk are here:

If you’re interested in more history and solutions, check out Felix Schönbrodt‘s slides here. Some pictures were made adapting code from a wonderful Coursera MOOC by Daniel Lakens. For Bayes, check out Alexander Etz‘s blog.

Oh, and for the monster analogy; this piece made me think of it.

The myth of the magical “Because”

In this post I try to answer the call for increased transparency in psychological science by presenting my master’s thesis. I ask for feedback about the idea and the methods. I’d also appreciate suggestions for which journal it might be wise to submit the paper I’m now starting to write with co-authors. Check out OSF for the Master’s thesis documents and a supplementary website for analyses in the manuscript in preparation (I presented the design analysis in a previous post).

In my previous career as a marketing professional, I was often enchanted by news about behavioral science. Such small things could have such large effects! When I moved into social psychology, it turned out that things weren’t quite so simple.

One study that intrigued me was done in the 1970s, and has since gained huge publicity (see here and here, for examples). The basic story is that you could use the word because to get people to do things, due to a learned “reason → compliance” link.


Long story short, I was able to experiment in a within-trial setting of a health psychology intervention. Here’s a slideshow adapted from what I presented at the annual conference of the European Health Psychology Society:


Things I’m happy about:

  • Maintaining a Bayes Factor / p-value ratio of about 1:2. It’s not “a B for every p”, but it’s a start…
  • Learning basic R and redoing all analyses in the last minute, so I wouldn’t have to mention SPSS 🙂
  • Figuring out how this pre-registration thing works, and registering before end of data collection.
  • Using the word “significant” only twice and not in the context of results.

Things I’m not happy about:

  • Not having pre-registered before starting data collection.
  • Not knowing what I now know, when the project started. Especially about theory formation and appraisal (Meehl).
  • Not having an in-depth understanding of the mathematics underlying the analyses (although math and logic are priority items on my stuff-to-learn-list).
  • Not having the data public… yet. It will be by 2017 at the latest, but hopefully already this autumn.

A key factor for fixing psychological science is transparency; making analyses, intentions and data available for all researchers. As a consequence, anyone can point out inconsistencies and use the findings to elaborate on the theory, making accumulation of knowledge possible.

Science is all about predicting, and everyone knows how anyone can say “yeah, I knew that’d happen”. The most impressive predictions are those made well before things start happening. So don’t be like me: pre-register your study before the start of data collection. It’s not as hard as it sounds! For clinical trials, this can be done for free in the WHO-approved German Clinical Trials Register (DRKS). For all trials, the Open Science Framework (OSF) website can be used for pre-registering plans and protocols, as well as making study data available for researchers everywhere. There’s also an extremely easy-to-use pre-registration site, AsPredicted.

One can also use the OSF website as a cloud server to privately manage one’s workflow (for free). As a consequence, automated version control protects the researcher in the case of accusations of fraud or questionable research practices.

ps. If there’s anything weird in that thesis, it’s probably because I have disregarded some piece of advice from Nelli Hankonen, Keegan Knittle and Ari Haukkala, to whom I’m indebted for their comments.

Defeating the crisis of confidence in science: 3 + 3 ideas

[Update 6. March 2016: new figure for the Bayesian RP:P + some minor changes]

The first thing you need to know about practical science is that it is not a miraculous (or often, even awesome) way to learn about the world. As put in the excellent blog Less Wrong, it is just the first set of methods that isn’t totally useless when trying to make sense of the modern world. Although problems are somewhat similar in all sciences, I will focus on psychology here.

One of the most important projects in the history of psychology was published in the journal Science at the end of August. In the “Reproducibility Project: Psychology”, 356 contributors tried to re-do 100 studies published in high-profile psychology journals (all in 2008). (You can download the open data and see further info here.) Care was taken to mimic the original experiments as closely as possible, just with many more participants for increased reliability.

The results? Not too flattering: the effects in the replications were only about half as big as in the original studies. Alexander Etz provides an informative summary in this figure from this blog about a recent paper:

If the Bayes Factor (B) is somewhere between 1/10 and 10 (or, for example, 1/3 and 3, if you have less strict evidence standards), you can’t make confident conclusions. Most of the replicated studies (the ones in the lower left corner) never contained much information in the first place!

The number of “successes” in different fields of psychology depends on how you count and what you include (for example, what you think counts as social or cognitive psychology). For social psychology, the success rate of replication was somewhere between 8% (3 out of 38; by Replication Index) and 25% (14 out of 55; the paper in Science). Even conceding that the scientific method is not perfect, as referred to in the beginning, this was not what I expected to see.

My thoughts and beliefs often torment the hell out of me, so I’ve learned to celebrate when they turn out to be false. Thus, I ended up informing people at Helsinki University’s discipline of social psychology by replicating a “Friday-cake-for-no-reason” from a month earlier by one awesome colleague:

Behold; the Replicake!

The messenger cake worked well in Helsinki, but unfortunately the news was too bitter a dish for many. The results raised several confused reactions from psychologists who refused to believe the sorry state of the status quo. I like the term “hand-wringing” as a description, the loudest arguments being (in no specific order):

If you're a researcher in the psychological sciences, denial of troubles isn't a real option.
Don’t just sit there, go make the world a better place!
  1. The replicators didn’t know what they were doing.
  2. The studies replicated weren’t representative of the actual state of art.
  3. This is how science is supposed to work, no cause for alarm!
    1. … because some fields are doing even worse (e.g. replicability of cancer biology may be just 10%-25%, economics ~49%, psychiatry less than 40% etc.)
    2. … because non-replications are a part of the self-correcting nature of science.

Andrew Gelman answers these points eloquently, so I won’t go much deeper into them. Note also that Daniel “Stumbling on Happiness” Gilbert & co. used these arguments in their much publicised (but unhappily, flawed) critique of psychology’s replication effort.

Suffice it to say that I value practicality; claiming there is a phenomenon only you can show (ideally, when no-one is looking) doesn’t sound too impressive to me. What worries me is this: science is supposed to embrace change and move forward with cumulative knowledge. Instead, researchers often take their favourite findings to a bunker and start shouting profanities to whoever wants to have a second look.

I think anyone who’s seen the Internet recognises the issue. Personally, I find it hard to believe that arguing can change things, so I’d rather see people exemplify their values by their actions.

Sometimes I find consolation in Buddhist philosophy, so here are some thoughts maybe worth considering when you need to amp up your cognitive flexibility:

1. “You” are not being attacked.

Things are non-self. Just as wishing doesn’t make winning the lottery more likely, a thought of yours that turns out to be ill-informed doesn’t destroy “you”. Your beloved ideas, whether they concern the right government policy, the right way to deal with refugees or the right statistical methods in research, may turn out to be wrong. It’s okay. You can say you don’t know or don’t expect your view to be the final solution.

When Richard Ryan visited our research group, I asked him when he expects his self-determination theory to die. The answer was fast: “When it gets replaced by a higher-order synthesis”. He had thought about it and I respect that.

2. The business of living includes stress, but it’s worse if you cling to stuff.

Wanting to hold on to things you like and resist things you don’t is normal, but takes up a lot of energy. You might want to try not gripping so hard and see if it makes an actual difference in how long the pleasure or displeasure lasts. So; if your ideas are under fire, take a moment to think about what life would be without whatever it is being threatened.

One of the big ideas in science is that we need big ideas. And, of course, big ideas are exciting. The problem is that most ideas – big or small – will turn out to be wrong and if we don’t want to be spectacularly wrong, we might want to take small steps at a time. As Daniel Lakens, one of the authors of the reproducibility project, put it:


3. Nothing will last (and this, too, will pass – but the past will never return).

Although calls for change in research practices have been made for at least half a century, this time the status quo is going away fast. It might be a product of the accelerating change we see in all human domains. It’s impossible to predict how things end up, but change isn’t going away. What you can do is try to create the kind of change that reflects what you think is right.

For an example in research, take statistical methods, where the insanity of the whole “p<0.05 -> effect exists” approach has become more and more common knowledge in recent years. Another change is happening in publishing practices; we are no longer bound by the shackles of the printing press, which did serve science well for a long time. This means infinite storage space for supplements and open data for anyone to confirm or elaborate upon another researcher’s conclusions. Of course, the traditional publishing industry isn’t that happy about seeing their dominance crumble. But in the end, they too must change to avoid the fate of (music industry) dinosaurs in this NOFX-song from 15 years ago.

Change in action: Mentions of “Bayesian” in the English literature since the death of rev. Thomas Bayes in 1761. Click for source in Google Ngram. For an intro to Bayesian ideas, check out this or this.

Good research with real effects does exist!

The reproducibility project described above was actually not the first large-scale replication project in psychology. Projects called “Many Labs”, where effects in psychology are tested with different emphases, are just now beginning to bear fruit:

  • Many Labs 1 (over 6 000 participants; published fall 2014) picked 13 classic and contemporary effects and managed to replicate 10 consistently. Priming studies were found hard to replicate. Interestingly, the fact that most psychology studies are conducted on US citizens didn’t have much of an effect.
  • Many Labs 2 (ca. 15 000 participants; expected in October 2015) studied how effects vary across persons and situations.
  • Many Labs 3 (around 3 500 participants; currently in press) studied mainly the so-called “semester-effects”. As study participants usually are university students, it has been thought they might behave differently at different points of the semester. Apparently they don’t, which is good news. The not-so-good news is that only three of the original 10 results were replicated.
  • Many Labs 4 (in preparation phase) will study how replicator expertise affects replicability, as well as whether involving the original author makes a difference.

These projects definitely will increase our understanding of psychological science, although they suffer from some limitations themselves (such as the fact that e.g. really expensive studies get fewer replication attempts for practical reasons).

… It’s just really hard to tell what’s real and what’s not.

At the Cochrane Colloquium 2015, John “God of Meta-analysis” Ioannidis (the guy who published the 3000+ times cited paper Why Most Published Research Findings Are False) ended his presentation with a discouraging slide. He concluded that systematic reviews in biomedicine have become marketing tools with illusory credibility assigned to them.

The field I’m most interested in is health psychology. So when one of the world’s top researchers in the field tweeted that poorly performing meta-analyses are increasingly biasing psychological knowledge, I asked him to elaborate. Here’s his reply:


Susan Michie addressed the reproducibility problem in her talk at the annual conference of the European Health Psychology Society, with an emphasis on behavior change. She mostly addressed reporting, but questionable research practices are undoubtedly important, too.

Susan Michie presenting at EHPS 2015.

This became clear in the very same conference, when a PhD student told me how a professor reacted to his null results: “Ok, send me your data and you’ll have a statistically significant finding in two weeks”. I have hope that young researchers are getting more savvy with methods and more confident that the game of publishing can be changed. This opens the door for fraudulent authority figures to exit the researcher pool like Diederik Stapel – by the hands of their students, instead of a failed peer-review process.


Based on all the above, here’s what I think makes science worth the taxpayers’ money:

  1. Sharing and collaborating. Not identifying with one’s ideas. Maybe openness to the possibility of being wrong is the first step towards true transparency?
  2. Doing good, cumulative research [1], even if it means doing less of it. Evaluating eligibility for funding by the number of publications (or related twisted metrics) must stop. [2]
  3. Studying how things can be made better, instead of just describing the problems. Driving change instead of clinging to the status quo!

[1] The need for better statistical education has been apparent for decades but not much has changed… Until the 2010s.

[2] See here for reasoning. (Any thoughts on this alternative?)