The legacy of social psychology

To anyone teaching psychology.

In this post I express some concerns about the prestige given to ‘classic’ studies, which are widely taught in undergraduate social psychology courses around the world. I argue that rather than just demonstrating a bunch of clever but dodgy experiments, we could teach undergraduates to evaluate studies for themselves. To exemplify this, I quickly demonstrate power, Bayes factors, the p-checker app and the GRIM test.

psychology’s foundations are built not of theory but with the rock of classic experiments

Christian Jarrett

Here is an out-of-context quote from Sanjay Srivastava from a while back:


This got me thinking about why and how we teach classic studies.

Psychologists usually lack the luxury of well-behaved theories. Some have thus proposed that the classic experiments, which have survived in the literature to the present day, serve as the bedrock of our knowledge 1. In the introduction to a book retelling the stories of classic studies in social psychology 2, the authors note that classic studies have “played an important role in setting the research agenda for the field as it has progressed over time” and “serve as common points of reference for researchers, teachers and students alike”. The authors continue by pointing out that many of these classics lacked sophistication, but that this is in fact a feature of their enduring appeal, as laypeople can understand the “points” the studies make. Exposing the classics to modern statistical methods would thus miss their point.

Now, this makes me wonder: if the point of a study is not to assess the existence of a phenomenon, what in the world might it be? One answer would be to serve as a historical example of practices no longer considered scientific, but I doubt this is what’s normally meant. Notwithstanding, I wanted to dip into the “foundations” of our knowledge by demonstrating the use of some more-or-less recently developed tools on a widely known article. According to Google Scholar, the Festinger and Carlsmith cognitive dissonance experiment 3 has been cited over three thousand times, so its influence is hard to downplay.


But first, a necessary digression: statistical power is the probability of detecting a “significant” effect, assuming an effect of the postulated size actually exists. As explained in Brunner & Schimmack 4, it is an interesting anomaly that the statistical power of studies in psychology is usually small, yet almost all of them end up reporting “significant” results. As to how small, average power doubtfully exceeds 50% 5–7, and for small (conventional?) effect sizes, the mean has been estimated to be as low as 24%. As a recent replication project on the ego depletion effect 8 exemplified, a highly “replicable” (as judged by the published record) phenomenon may turn out to be a fluke once null findings are taken into account. This has recently made psychologists consider the uncomfortable possibility that entire research lines of “accumulated scientific evidence” may in fact not contain much evidence at all 9,10.

So, what was the statistical power of Festinger and Carlsmith? Using G*Power 11, it turns out that they had an 80% chance to discover a humongous effect of d = 0.9, and only a coin flip’s probability of finding a (still large) effect of d = 0.64. Now, if an underpowered study finds an effect, under current practices the estimate is likely to be exaggerated, and possibly even of the wrong sign 12. Here would be a nice opportunity to demonstrate these concepts to students.
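These power figures are easy to sanity-check in class without G*Power. Below is a minimal Monte Carlo sketch of my own (standard-library Python, not the original analysis code), simulating a two-sample t-test with n = 20 per group; the critical value 2.024 for df = 38 is hard-coded to match that design:

```python
import random
import statistics

def simulated_power(d, n_per_group, sims=10_000, seed=1):
    """Monte Carlo power of a two-sided, two-sample t-test (alpha = .05)
    for a true standardised effect d. The critical value 2.024 is the
    two-sided cutoff for df = 38, i.e. n = 20 per group."""
    t_crit = 2.024
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        a = [rng.gauss(d, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
        t = (statistics.mean(a) - statistics.mean(b)) / (
            pooled_sd * (2 / n_per_group) ** 0.5)
        hits += abs(t) > t_crit
    return hits / sims

print(simulated_power(0.90, 20))  # close to 0.8
print(simulated_power(0.64, 20))  # close to 0.5
```

With d = 0.9 the simulated power lands near 0.8, and with d = 0.64 near 0.5, matching the G*Power results above.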

Considering the low power, it may not come as a surprise that the evidence the study provided was weak to begin with. A Bayes factor (BF) quantifies the evidence for one hypothesis relative to another. In this case, a BF of ~3 moves an impartial observer from being 50% sure the effect is real to being 75% sure, or a skeptic from being 25% sure to being 50% sure that the effect is small instead of nil.
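The arithmetic behind this is just the odds form of Bayes’ rule: posterior odds = Bayes factor × prior odds. A few lines of Python (a hypothetical helper of my own, not part of any package) make the updating transparent for students:

```python
def posterior_probability(prior_prob, bayes_factor):
    """Update a probability of a hypothesis with a Bayes factor:
    posterior odds = Bayes factor * prior odds."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1 + posterior_odds)

print(posterior_probability(0.50, 3))  # impartial observer: 0.75
print(posterior_probability(0.25, 3))  # skeptic: 0.5
```

Note how the same Bayes factor moves observers with different priors by different amounts; that is the point of separating the evidence (BF) from the belief (prior).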

It would be relatively simple to introduce Bayes factors with this study. The choice of prior scale does not matter much here for any reasonable value, as shown by a robustness plot made in JASP with two clicks:

Figure 1: Bayes factor robustness check for the main finding of the dissonance study. Plotted in JASP, using n = 20 for both groups, a t-value of 2.48 and a Cauchy prior scale of 0.4.

Nowadays it is easy to check whether a paper correctly reports test statistics and their associated p-values. The p-checker app (this link feeds the relevant statistics to the app) can do this, and it turns out that most of the p-values implied by the t-values in the paper are incorrectly rounded down (assuming that “significant at the 0.08 level” means p < 0.08). You can demonstrate this by including the link on your slides, using it to go to p-checker and choosing “p-values correct?”.
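If you want students to verify a reported p-value themselves rather than trust an app, the two-sided p of a t statistic can be recovered with nothing but the standard library by integrating the t density numerically. This is my own sketch; the df = 38 assumes the two-group comparison with n = 20 per cell:

```python
import math

def two_sided_p(t, df, upper=60.0, steps=100_000):
    """Two-sided p-value of a t statistic: twice the area under the
    Student-t density beyond |t|, by trapezoidal integration."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    lo = abs(t)
    h = (upper - lo) / steps
    area = 0.5 * (pdf(lo) + pdf(upper))
    for i in range(1, steps):
        area += pdf(lo + i * h)
    return 2 * area * h

# The main contrast was reported as t = 2.48; with n = 20 per group, df = 38:
print(round(two_sided_p(2.48, 38), 3))  # just under .02
```

Comparing such recomputed p-values against what a paper actually reports is exactly what p-checker automates.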

Finally, you can look at the study using the GRIM test 13, which evaluates whether the reported means are mathematically possible given the sample sizes. As it turns out, a quarter of the means in the main results table do not pass the test. One more time: 25% of the reported means are mathematically impossible. The most likely explanation is shoddy reporting of means or accidental misreporting of sample sizes, but I find it telling that the issue has not, to my knowledge at least, come up in fifty years of scientific investigation.
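The core of the GRIM test is simple enough to live-code in a lecture: with n integer-valued responses, the only attainable means are multiples of 1/n. Here is a minimal sketch of my own (the real GRIM procedure handles composite scales and rounding conventions more carefully):

```python
def grim_consistent(reported_mean, n, decimals=2):
    """GRIM test: could `reported_mean` (rounded to `decimals`) be the
    mean of n integer item scores? Attainable means are multiples of 1/n."""
    target = round(reported_mean, decimals)
    k = round(reported_mean * n)      # nearest attainable numerator
    for cand in (k - 1, k, k + 1):    # guard against rounding at the edge
        if round(cand / n, decimals) == target:
            return True
    return False

# With n = 20, two-decimal means must be multiples of 0.05:
print(grim_consistent(2.25, 20))  # True
print(grim_consistent(2.26, 20))  # False: unreachable from 20 integers
```

With n = 20 per cell, any reported two-decimal mean that is not a multiple of 0.05 is flagged immediately; the hypothetical values 2.25 and 2.26 above are for illustration only, not taken from the paper.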

Figure 2: Main results table of the Festinger & Carlsmith study. Circled means are mathematically impossible given the reported sample sizes.

Now, even though I have doubts about this study, as well as about the process by which the theory has “evolved” 14, this does not mean that cognitive dissonance effects do not exist. It is just that the research may not have been able to capture the essence of this everyday phenomenon (which, if it exists, can influence behaviour without the help of academics). Under the traditional paradigm of psychological science, fraught with publication bias and unhelpful incentives 10, a Registered Replication Report (RRR) type of study would be needed, and even that could only test one operationalisation. As an undergraduate, I would have been exhilarated to hear early on about how and why such initiatives work, and why the approach is much more informative than any single experiment.

To return to the notion that the bedrock of psychology consists of classic experiments rather than theories, as in the natural sciences 1: perhaps we need a more solid foundation, regardless of whether some flashy findings from decades ago happened to spawn a progressive-ish 15,16 line of research.

How would such a foundation come to be? Maybe teaching could play a role?


  1. Jarrett, C. Foundations of sand? The Psychologist 21, 756–759 (2008).
  2. Smith, J. R. & Haslam, S. A. Social psychology: Revisiting the classic studies. (SAGE Publications, 2012).
  3. Festinger, L. & Carlsmith, J. M. Cognitive consequences of forced compliance. The Journal of Abnormal and Social Psychology 58, 203–210 (1959).
  4. Brunner, J. & Schimmack, U. How replicable is psychology? A comparison of four methods of estimating replicability on the basis of test statistics in original studies. (2016).
  5. Button, K. S. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14, 365–376 (2013).
  6. Cohen, J. Things I have learned (so far). American psychologist 45, 1304 (1990).
  7. Sedlmeier, P. & Gigerenzer, G. Do studies of statistical power have an effect on the power of studies? Psychological bulletin 105, 309 (1989).
  8. Hagger, M. S. et al. A multi-lab pre-registered replication of the ego-depletion effect. Perspectives on Psychological Science (2016).
  9. Earp, B. D. & Trafimow, D. Replication, falsification, and the crisis of confidence in social psychology. Front. Psychol 6, 621 (2015).
  10. Smaldino, P. E. & McElreath, R. The Natural Selection of Bad Science. arXiv preprint arXiv:1605.09511 (2016).
  11. Faul, F., Erdfelder, E., Lang, A.-G. & Buchner, A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods 39, 175–191 (2007).
  12. Gelman, A. & Carlin, J. Beyond Power Calculations Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspectives on Psychological Science 9, 641–651 (2014).
  13. Brown, N. J. L. & Heathers, J. A. J. The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science (2016). doi:10.1177/1948550616673876
  14. Aronson, E. in The science of social influence: Advances and future progress (ed. Pratkanis, A. R.) 17–82 (Psychology Press, 2007).
  15. Lakatos, I. History of science and its rational reconstructions. (Springer, 1971).
  16. Meehl, P. E. Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry 1, 108–141 (1990).


How lack of transparency feeds the beast

This is a presentation I gave for the young researchers’ branch of the Finnish Psychological Society. I show how low power and lack of transparency can lead to weird situations where the published literature contains little or no knowledge.


Markus Mattsson, Leo Aarnio and I had great fun at the seminar, presenting to an audience of eager young researchers.

The slides for my talk are here:

If you’re interested in more history and solutions, check out Felix Schönbrodt’s slides here. Some pictures were made by adapting code from a wonderful Coursera MOOC by Daniel Lakens. For Bayes, check out Alexander Etz’s blog.

Oh, and for the monster analogy; this piece made me think of it.

Getting Started With Bayes

This post presents a Bayesian roundtable I convened for the EHPS/DHP 2016 health psychology conference. Slides for the three talks are included.


So, we kicked off the session with Susan Michie, acknowledging Jamie Brown, who was key in making it happen but could not attend.


Robert West was the first to present; you’ll find his slides, “Bayesian analysis: a brief introduction”, here. His talk gave a brief introduction to Bayes and to how belief updating with Bayes factors works.

I was the second speaker, building on Robert’s presentation. Here are the slides for my talk, in which I introduced some practical resources for getting started with Bayes. The slides are also embedded below (some got corrupted by Slideshare, so the ones behind the .ppt link are a bit nicer).

The third and final presentation was by Niall Bolger. In his talk, he gave a great example of how using Bayes in a multilevel model enabled him to incorporate more realistic assumptions and, consequently, evaporate a finding he had considered somewhat solid. His slides, “Bayesian Estimation: Implications for Modeling Intensive Longitudinal Data”, are here.

Let me know if you don’t agree with something (especially in my presentation) or have ideas regarding how to improve the methods in (especially health) psychology research!

A short intro to what’s up

Slides below are a presentation of what I was up to in 2016.

Don’t pay too much attention, it’s not 2016 any more.

Currently (September 2019), I’m part of the Behaviour Change and Wellbeing group of the University of Helsinki, working on my doctoral dissertation on complex systems approaches to changing health behaviour, and planning a self-leadership intervention to improve occupational health.

The spirit of my dissertation is reflected in the CARMA syllabus.



Esittelyssä Raistlin Laplace

[See English version here.]


Raistlin Laplace on juuri saanut psykiatriltaan diagnoosin, jota hän istuutuu lukemaan keväisenä päivänä New Yorkin aurinkoisen Keskuspuiston penkille. Ohuet huulet tapailevat luisevien sormien pitelemää tuomiota: “Epistemologinen meluyliherkkyys”. Se liittyi jotenkin siihen, kuinka hahmoja (signaali) erotetaan melun (tai “kohinan”) keskeltä, kuinka esimerkiksi kanavien välille viritetty radio ei kerro paljoakaan soittolistojen laatijoiden musiikkimauista, koska melua on liikaa signaaliin nähden. Toisaalta ihmisaivot ovat hahmontunnistuskone vailla vertaa, ja voivat vaivatta havaita saatanallisia säkeitä takaperin soitetussa musiikissa tai Jeesuksen koiran anuksessa. Herra Laplacen ongelma oli käänteinen raamatusta salattuja koodeja etsivän väen tulokulmaan nähden; pakkomielteinen satunnaisuuden luomien illusoristen hahmojen välttäminen. Diagnoosi kävi tietyllä tavalla järkeen, mutta hän oli kauan sitten lakannut luottamasta asioihin, jotka kävivät järkeen.

Raistlin vietti lapsuutensa Napoli-nimisessä pikkukylässä Yhdysvalloissa. Se oli kuulostanut sopivan eurooppalaiselta hänen ranskalais-venäläisille siirtolaisvanhemmilleen, jotka halusivat tarjota ainoalle lapselleen suvaitsevaisen kasvuympäristön jostain vähemmän sotaisasta maankolkasta. Vasta muutettuaan heille valkeni, että Napoli oli tosiasiassa punaniskakylä, jossa vanhemmat käyttivät suuren osan päivästään työmatkoihin ja lapset verisiin tappeluihin naapurikylien nuorten kanssa.

Raistlin oli aina ollut olemukseltaan sairaalloinen, vaikkei juurikaan tavannut sairastaa. Hänen hintelä ja kalvakka ulkonäkönsä, sekä pistävän sinisten silmien ja pikimustan tukan luoma kontrasti sai alusta lähtien taikauskoiset vanhukset kuiskimaan. Tietoisena tästä, hän ala-asteikäisenä paikallisen kirjaston löydettyään huomasi nauttivansa suunnattomasti rajatiede-nurkkauksen kirjoista, ammentaen itseensä kaikkea okkultismista ja samanismista new age-niteisiin. Urheilusta – tai sen puoleen mistään muustakaan, mikä muita lapsia kiinnosti – hän ei koskaan välittänyt, ja olisikin kaikkein mieluiten vain halunnut viettää aikaa yksin kirjojensa parissa.

Ensimmäisen kerran hänen informaatiomaailmankaikkeutensa romahti, kun hän kaikkea kokeilleena joutui ylä-asteella myöntämään, etteivät rajatiede-nurkkauksen kirjojen rituaalit ja tekniikat toimineetkaan luvatusti. Kaikki aikuisten kirjoittama ei ollutkaan erehtymätöntä, kaikki informaatio ei sisältänytkään tietoa. Mutta tämä oli vasta alkua.

Eräänä New Yorkin syksyisenä sadepäivänä 28-vuotias Raistlin liimasi kiinni startup-yrityksensä konkurssihakemuksen sisältävän kirjekuoren ja mietti, mikä oli mennyt vikaan. Hän oli tehnyt kaiken oikein; lukenut oikeat kirjat, noudattanut menestysyritysten taktiikoita, kuunnellut satoja tunteja populaaripsykologiaa hyödyntäviä myyntikoulutusnauhoja, ottanut harkittuja riskejä ja tehnyt vuosia työtä periksiantamattomalla asenteella. Jossain vaiheessa rahat vain loppuivat ja –  velkojien hengittäessä niskaan – lisää ei tullut. Yksiönsä himmeässä valaistuksessa Raistlin hitaasti kasvavan kauhun vallassa pohti, mistä kirjailijat tiesivät niiden asioiden, joiden he väittivät tietävänsä todeksi, olevan totta? Erosivatko he tosiaan jollain tapaa hänestä itsestään, joka olisi voinut tällä hetkellä olla menestyvän teknologiayrityksen johtaja, mikäli vain muutama pikkuasia olisi sattunut menemään toisin?

Mies ei nukkunut sinä yönä. Hänen mielessään pyörivät ne lukemattomat tunnit, jotka hän oli viettänyt sanomalehtien parissa oppimatta mitään maailman toiminnasta. Uusista läpimurroista kertovat tiedeuutiset, joista kaikki olivat jälkeenpäin osoittautuneet ennenaikaisiksi; kaikki ne kirjat, joiden kirjoittajat luulivat kokemuksensa johtuvan satunnaistapahtumien sijaan omasta toiminnastaan ja kyvyistään.

Kaksi vuotta myöhemmin hän luki enää vain vertaisarvioituja tieteellisiä artikkeleita, kunnes matemaatikko-tilastotieteilijä John Ioannidisin artikkelista “Why most published research findings are false” seurannut keskustelu sai hänet sille kannalle, ettei tieteelliseenkään tietoon ole luottaminen. Informaation ja tiedon välinen suhde, josta hän oli ylä-asteella oppinut, alkoi muodostua hänelle pakkomielteeksi: Raistlin ei halunnut enää yhtään enempää informaatiota, hän janosi tietoa. Puhdas matematiikka todistettavissa olevine aksioomineen viimein tarjosi juuri tätä, ja hintelä oppimisaddiktimme paneutuikin siihen täysin rinnoin, päätyen sukulaisen suosituksen kautta pankkiin töihin. Hän asetti tavoitteekseen välttää kaikkea sellaista informaatiota, mikä ei ollut kosher – jos signaali oli heikko suhteessa meluun, mielen portit pysyivät visusti suljettuina.

Tästä seurasi odottamaton ongelma: mitä enemmän hän pyrki eristämään itsensä “hyödyttömältä hölynpölyltä”, sitä herkemmäksi hän sille tuli. Silloin harvoin kun hän enää käyskenteli ulkona, iltapäivälehtien shokkiotsikot tuntuivat vatsanpohjassa asti. Mainokset saivat hänet raivon valtaan. Hän alkoi myös välttelemään sosiaalisia tilanteita tajutessaan, kuinka helposti hyvät tarinat jäivät hänen mieleensä kummittelemaan. Hän työskenteli riskianalyytikkona, eikä halunnut alkaa pelätä lentokoneita, koska jonkun tutun tutun tuttu oli kokenut kauheita pakkolaskun tehneessä koneessa. Pakko-oireiden (kuten venäläisten matemaatikkojen nimien nopea peräkkäinen toistaminen jonkun perustellessa kantaansa anekdootein) pahetessa, Raistlinin huolestunut työnantaja ohjasi hänet ammattiavun piiriin.

Kevään voimistuvien auringonsäteiden lämmittämällä Keskuspuiston penkillä diagnoosiaan tarkasteleva Raistlin oli luvannut psykiatrilleen aloittaa terapiaryhmässä. Hänen ottamansa ahdistuslääkkeet olivat myös alkaneet tehota, mikä sai hänet ostamaan viereiseltä hodarikauppiaalta iltapäivälehden ja lukemaankin siitä pari sivua. Se ei tuntunut enää niin pahalta, suunnilleen yhtä järkevältä kuin hänen diagnoosinsakin; psykiatri oli selittänyt epistemologisen meluyliherkkyyden tarkoittavan tiedon alkuperään liittyvää ahdistusta siitä, ettei kohinan keskeltä löydykään signaalia, ja kuoleman hetkellä tajuaa eläneensä elämänsä reagoiden mielen melussa näkemiin aaveisiin, todellisten ilmiöiden sijaan.

Joitain tunteja myöhemmin nuori, koiraa ulkoiluttava opiskelija löysi ilokseen puistosta päivän lehden, jonka hän vei kotiinsa ja avasi murokulhon ääressä. Se näytti muuten lähes koskemattomalta, mutta usean artikkelin perään oli hyvin pienellä mutta varmalla käsialalla kirjoitettu: “Kolmogorov. Kolmogorov. Kolmogorov.

Introducing Raistlin Laplace

In this post, you meet Raistlin Laplace. You will hear more of him at a later time. Please find the Finnish version here.


Raistlin Laplace has just received a diagnosis from his psychiatrist. It’s a sunny day in early spring as he sits down on a bench in New York’s Central Park and opens the envelope. His thin lips hesitate over the judgement held in his bony fingers: “Epistemological Hypersensitivity”. It had something to do with how patterns (the signal) are distinguished from the midst of noise. Like how a radio tuned between two channels doesn’t tell you much about the DJs’ music taste; too much noise, too little signal. The human brain, on the other hand, is a pattern detection machine without comparison, as it can detect satanic verses in backwards-played metal music or see Jesus in a dog’s anus (this has happened). But Mr. Laplace’s problem was the inverse of that of the people who seek secret codes in the Bible: he was obsessed with avoiding the illusory patterns created by randomness. The diagnosis made sense in a way, but he had long ago given up trusting things that made sense.

Raistlin spent his youth in a small town called Naples in southwest Florida. The name had sounded aptly European to his French-Russian immigrant parents, who wanted to offer their only child a tolerant environment in a less war-prone part of the world. Only after the move did it become clear that Naples was, in fact, a redneck town where parents spent most of their days commuting, and children in bloody fights with the youngsters of nearby villages.

Raistlin had always had a sickly appearance, although he was seldom ill. His feeble posture and pale complexion, combined with the contrast between his icy blue eyes and jet-black hair, were more than enough to make the superstitious elderly whisper. Knowing this, upon discovering the local library in elementary school he found he took great delight in the books in the corner marked occultism, devouring everything from shamanism to theosophy and new age. Sports, or for that matter anything else that interested other children, he couldn’t care less about. Having a hard time fitting in, he would have liked most of all just to spend time alone with his books.

The first time his information universe collapsed was in junior high, when, having tried everything, he had to admit that the rituals and techniques of the occult corner didn’t work as promised. Not everything adults wrote was unerring; not all information was knowledge. But the shock waned quickly, and little did he know that this was only the beginning.

On a rainy New York day, 28-year-old Raistlin sealed the envelope containing the bankruptcy application of his startup company and pondered what had gone wrong. He had done everything right: read all the right books, followed the strategies of highly successful companies, listened to hundreds of hours of popular-psychology-inspired sales training tapes, taken educated risks, and worked for years with a relentless, never-give-up attitude. At some point the money just ran out and, as creditors breathed down his neck, more wasn’t coming. In the dim lighting of his studio apartment, Raistlin felt horror escalate. How could those writers, who so confidently spewed out facts about the world, actually know how things truly worked? Were they really any different from him, who, had any of a myriad small things gone differently, could now well be the CEO of a highly successful tech company?

He didn’t sleep that night. He watched an agonising replay of all those hours he had spent reading newspapers without learning anything about how the world actually worked. All the popular science news touting great new discoveries, all of which had later turned out to be premature. All the books written by those who thought their success was caused by their own actions and aptitude, instead of random occurrences of serendipity.

Two years later he read only peer-reviewed scientific journals. That is, until the discussion that followed John Ioannidis’ article “Why most published research findings are false” persuaded him of the fallibility of the scientific method (outside of physics, at least). The relationship between information and knowledge he had learnt about in junior high began forming into an obsession: Raistlin wanted no more information; he hungered for knowledge. Pure mathematics, with its provable axioms, finally offered just this, and our bony learning addict delved right in. By a stroke of luck and a relative’s recommendation, he ended up working in a bank. He vowed to avoid all information that wasn’t kosher; if the signal-to-noise ratio was low, the gates of his mind remained sealed.

This resulted in an unexpected problem: the more he aspired to isolate himself from “useless nonsense”, the more sensitive to it he became. On the few days he still strolled outside, the shock headlines at newspaper stands turned his stomach into knots. Advertisements filled him with outrage. He also started avoiding social situations when he realised how easily good stories got stuck in his brain. He worked as a risk analyst, and didn’t want to start fearing airplanes just because some acquaintance of an acquaintance had experienced dread during an emergency landing. His compulsions, like the rapid repetition of the names of old Russian mathematicians whenever someone used anecdotes to advocate a position, got worse, and eventually his worried employer steered him towards professional help.

Rays of intensifying sunlight warmed Raistlin’s bench and whispered promises of summer to the people wandering about Central Park. Raistlin had promised his psychiatrist to start attending a therapy group. The anxiety meds he was taking had also begun to kick in, which made him buy a newspaper from a nearby hot dog stand and even read a couple of pages. It didn’t feel so bad anymore; about as sensible as his diagnosis, perhaps. The psychiatrist had explained that epistemological hypersensitivity meant anxiety about the origin of knowledge: the fear that there is no signal to be found in the noise, and that at the moment of one’s death one realises one has spent one’s life reacting to ghosts the mind saw in the noise, instead of real phenomena.

Some hours later, a young student walking her dog in the park stumbled, to her delight, upon a pristine newspaper. She took it home and opened it in front of a bowl of cereal. At first glance the paper looked untouched, but after reading several articles she noticed some very small yet resolute handwriting. In the margins, someone had written: “Kolmogorov. Kolmogorov. Kolmogorov.”

The myth of the magical “Because”

In this post I try to answer the call for increased transparency in psychological science by presenting my master’s thesis. I ask for feedback on the idea and the methods. I’d also appreciate suggestions as to which journal might be a wise place to submit the paper I’m now starting to write with co-authors. Check out the OSF for the master’s thesis documents, and a supplementary website for the analyses in the manuscript in preparation (I presented the design analysis in a previous post).

In my previous career as a marketing professional, I was often enchanted by news about behavioral science. Such small things could have such large effects! When I moved into social psychology, it turned out that things weren’t quite so simple.

One study that intrigued me was done in the 1970s, and has since gained huge publicity (see here and here for examples). The basic story is that you can use the word because to get people to do things, due to a learned “reason → compliance” link.


Long story short, I was able to run the experiment within a trial of a health psychology intervention. Here’s a slideshow adapted from what I presented at the annual conference of the European Health Psychology Society:


Things I’m happy about:

  • Maintaining a Bayes Factor / p-value ratio of about 1:2. It’s not “a B for every p“, but it’s a start…
  • Learning basic R and redoing all analyses at the last minute, so I wouldn’t have to mention SPSS 🙂
  • Figuring out how this pre-registration thing works, and registering before end of data collection.
  • Using the word “significant” only twice and not in the context of results.

Things I’m not happy about:

  • Not having pre-registered before starting data collection.
  • Not knowing what I now know, when the project started. Especially about theory formation and appraisal (Meehl).
  • Not having an in-depth understanding of the mathematics underlying the analyses (although math and logic are priority items on my stuff-to-learn-list).
  • Not having the data public… yet. It will be by 2017 at the latest, but hopefully already this autumn.

A key factor for fixing psychological science is transparency; making analyses, intentions and data available for all researchers. As a consequence, anyone can point out inconsistencies and use the findings to elaborate on the theory, making accumulation of knowledge possible.

Science is all about prediction, and everyone knows how anyone can say “yeah, I knew that’d happen”. The most impressive predictions are those made well before things start happening. So don’t be like me: pre-register your study before the start of data collection. It’s not as hard as it sounds! For clinical trials, this can be done for free in the WHO-approved German Clinical Trials Register (DRKS). For all trials, the Open Science Framework (OSF) website can be used for pre-registering plans and protocols, as well as for making study data available to researchers everywhere. There’s also an extremely easy-to-use pre-registration site, AsPredicted.

One can also use the OSF website as a cloud server to manage one’s workflow privately (for free). As a bonus, automated version control protects the researcher in case of accusations of fraud or questionable research practices.

ps. If there’s anything weird in that thesis, it’s probably because I have disregarded some piece of advice from Nelli Hankonen, Keegan Knittle or Ari Haukkala, to whom I’m indebted for their comments.

Analyse your research design, before someone else does

In this post, I demonstrate how one could use Gelman & Carlin’s (2014) method to analyse a research design for Type S (wrong sign) and Type M (exaggeration ratio) errors, when studying an unknown real effect. Please let me know if you find problems in the code presented here.

[Concept recap:]

Statistical power is the probability that you will detect an effect when it is really there. Often disregarded completely in practice, but conventionally set at 80% (more is better, though).

Alpha is the probability you’ll say there’s something when there’s really nothing, in the long run (as put by Daniel Lakens). Conventionally set at 5%.

Two classic types of errors. Mnemonic: with type 1, there’s one person, and with type 2, there are two people. Not making a type 2 error is called ‘power’ (feel free to make your own mnemonic for that one).

Why do we need to worry about research design?

If you have been at all exposed to the recent turbulence in the psychological sciences, you may have bumped into discussions about the importance of bigger-than-conventional sample sizes. The reason, in a nutshell, is that if we find a “statistically significant” effect in an underpowered study, the estimate is likely to be grossly exaggerated and may even be fatally wrong.

Traditionally, if people have considered their design at all, they have done so in relation to Type 1 and Type 2 errors. Gelman and Carlin, in a cool paper, bring another perspective to this thinking. They propose considering two questions:

Say you have discovered a “statistically significant” effect (p < alpha)…

  1. How probable is it that you have in your hands a result of the wrong sign? Call this a Type S (sign) error.
  2. How exaggerated is this finding likely to be? Call this a Type M (magnitude) error.

Let me exemplify this with a research project we’re writing up at the moment. We had two groups of around 130 participants each, and exposed one of them to a message with the word “because” followed by a reason. The other received a succinct message, and we observed their subsequent behavior. Note that you can’t use the observed effect size to figure out your power (see this paper by Dienes). That’s why I figured out a minimally interesting effect size of around d = .40 [defined by calculating the mean difference considered meaningful and dividing it by the standard deviation we got in another study].
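Gelman and Carlin’s design analysis is easy to approximate by simulation. The sketch below is my own toy version (not their retrodesign code), using the normal approximation and illustrative numbers: an assumed true effect of d = 0.2, half the minimally interesting 0.4, with a standard error of roughly 0.124 for a standardised mean difference and ~130 per group:

```python
import random
from statistics import NormalDist, mean

def retrodesign(true_effect, se, alpha=0.05, sims=50_000, seed=7):
    """Design analysis in the spirit of Gelman & Carlin (2014), via a
    normal-approximation simulation: returns power, the Type S error
    rate and the exaggeration ratio among 'significant' estimates."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    rng = random.Random(seed)
    estimates = (rng.gauss(true_effect, se) for _ in range(sims))
    significant = [est for est in estimates if abs(est) > z_crit * se]
    power = len(significant) / sims
    type_s = sum(est * true_effect < 0 for est in significant) / len(significant)
    exaggeration = mean(abs(est) for est in significant) / abs(true_effect)
    return power, type_s, exaggeration

power, type_s, exaggeration = retrodesign(true_effect=0.2, se=0.124)
print(round(power, 2), round(type_s, 3), round(exaggeration, 1))
```

Under these assumptions, power is modest, Type S errors are essentially absent, and the significant estimates overshoot the true effect by half or more: the Type M problem in a nutshell.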

First, see how we had OK power to detect a wide array of decent effects:


So, unless the (unknown) effect is smaller than what we care about, we should be able to detect it.


Next, we see above that the probability of observing an effect of the wrong sign would be minuscule for any effect over d = .2. Such an error would mean it looked like the succinct message worked better than the reason message, when it really was the other way around.

Finally, and a little surprisingly, we can see that even relatively large true effects would actually be exaggerated by a factor of two!


But what can you do; those were all the participants we could muster with our resources. An interesting additional perspective comes from the “v-statistic”, a measure of how your model compares to random guessing: 0.5 represents coin-flipping accuracy (see here for a full explanation and the original code I used).


The figure above shows how we start exceeding random guessing at an R^2 of around 0.025 (d = .32 according to this). The purple line is there to show how an additional 90 people would help a little, but not do wonders. I’ll write about the results of this study in a later post.

Until then, please let me know if you spot errors or find this remotely helpful. In case of the latter, you might be interested in how to calculate power in cluster randomised designs.

Oh, and the heading? I believe it’s better to do as much of this sort of thinking as possible before someone looking to have your job (or, perhaps, Reviewer 2) does it for you.



Bias, meditation and the pursuit of clarity

[Update: Short Twitter-discussion on the issue w/ Headspace’s Andy Puddicombe here]

Quick summary: In this post, I evaluate the effect of the “anchoring heuristic” on my meditation data by dabbling with Bayesian(ish?) model fitting. I also find that my perceived clarity is not improving with time, and ask for your favourite explanations. Markdown code for the analysis can be found here.

For a while now, I’ve been collecting data to keep me motivated with my daily meditation practice. This is the first time I took a sneak peek into it, just to see how much my assessments depend on the so-called anchoring effect.

Roughly speaking, the anchoring effect is a cognitive bias where people base (“anchor”) their estimates on unrelated prior information. For example, in one classic study, people were asked to estimate the percentage of African countries in the United Nations after spinning a wheel of fortune to obtain a seemingly random number. Those who got a high number guessed a higher percentage than those who got a low one.

This app helped me develop a sticking meditation habit a couple of years ago.

My meditation practice is 120 minutes a day, broken down into a combination of 30/60/90 minute blocks. One of the 30 minute blocks is a 20 minute Headspace session (the extra 10 minutes come from the time it takes to feel out and log the variables I’m interested in).

This is how my spreadsheet begins every day. There are, in total, ~45 columns, including pre- and post-meditation assessments of calm, reaction times (Stroop test) etc. I’ll write more about these if/when I find the time for analysis!


Column A is the date, column B the “package” (the type of meditation), and column C the day within the package. There are 30 days per package, so column C runs from 1 to 30 for each package. Finally, column D is my subjective sense of clarity, from 1 (completely unclear) through 5 (neither clear nor unclear) to 10 (completely clear).

Now, because I jot down the clarity assessment right after the day number, I would expect higher day numbers to boost my clarity assessment via the anchoring effect. Why? Because Kahneman, in his famous book Thinking, Fast and Slow, proclaims:

[D]isbelief is not an option. […] You have no choice but to accept that the major conclusions of these studies are true. More important, you must accept that they are true about you. – Daniel Kahneman

Without going too deep into the sorry state of replicability in priming effects (anchoring seems robust, though), let’s see how this particular effect may have affected my assessments:

D’oh! If anchoring were affecting my clarity assessments, the line in the plot should go up from left to right. It clearly does not.

[In technical blahblah: The plot above shows (slightly jittered) values for clarity for each day number and the Maximum A Posteriori (MAP) line. It’s basically linear regression with priors, and with this much data the priors don’t matter much. Darker shade hugging the line shows the 50% highest probability density interval (top 50% of most probable lines) and the lighter shade shows the 90% interval. Read more about priors here and learn everything you ever need to know about making inferences here]
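For readers who want the gist of “linear regression with priors” without the full machinery, here is a minimal Python sketch of the slope estimate. The toy data (540 trend-free days) and the simplifications (a normal prior on the slope only, a fixed noise standard deviation) are mine; the real model also estimates the intercept and the noise:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy stand-in for the real data: 540 days, clarity on a 1-10 scale, no true trend.
day = np.arange(540)
clarity = np.clip(np.round(rng.normal(6.0, 1.0, day.size)), 1, 10)

def slope_posterior(x, y, prior_mean=0.0, prior_sd=1.0, noise_sd=1.0):
    """Posterior for the slope of y ~ a + b*x under a normal prior on b,
    treating the noise sd as known (a conjugate simplification of a
    MAP-with-priors fit)."""
    xc = x - x.mean()  # centering x makes the slope independent of the intercept
    prec = 1 / prior_sd**2 + np.sum(xc**2) / noise_sd**2
    mean = (prior_mean / prior_sd**2 + np.sum(xc * (y - y.mean())) / noise_sd**2) / prec
    sd = np.sqrt(1 / prec)
    return mean, (mean - 1.645 * sd, mean + 1.645 * sd)  # MAP estimate and 90% interval

b, interval = slope_posterior(day, clarity)
print(b, interval)  # slope essentially zero for this trend-free toy data
```

With 500+ data points the prior barely moves the estimate, which is exactly the “with this much data the priors don’t matter much” point above.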

I would have expected the slope of the line to be positive, maybe something like 0.15. Instead, the most credible (90%) interval for the slope of the line is from -0.0148 to 0.

What if the magic hides in row numbering?

If you look again at the picture of my spreadsheet at the beginning of this post, you’ll notice that to the left of the date cell there’s the row number. Perhaps that’s what I anchor on?

So let’s see how clarity changes with the running number of the row:


Yay! At least now the slope is positive. Although, upon closer inspection, the 90% interval runs from 0 to 0.0011, which is pretty much zero.

Another thing this plot reveals is that my clarity assessments haven’t gone up during the past 500+ days. This might be because there has been no effect (I wasn’t exactly a meditation newbie when I started). Alternatively, I may be unconsciously shifting the scale (what I would have considered “clear” a year ago now seems less so). What do you think?

And why am I not seeing an anchoring effect here? What am I missing?

Any thoughts?