Randomised experiments (mis?)informing social policy in complex systems

In this post, I vent about anti-interdisciplinarity, introduce some basic perspectives of complexity science, and wonder whether certain experimental design decisions may actually leave us worse off than we were before we started using experimental evidence to inform social policy.

People in our research group recently organised a symposium, Interdisciplinary perspectives on evaluating societal interventions to change behaviour (talks watchable here), as part of a series called Behaviour Change Science & Policy (BeSP). The idea is to bring together people from fields ranging from philosophy to the behavioural sciences, medicine and beyond, in order to better tackle problems such as climate change and lifestyle diseases.

One presentation touched upon Finland’s randomised controlled trial to test the effects of basic income on employment (see also the report on first-year results). In crude summary, they did not detect effects of free money on finding employment. (Disclaimer: they had aimed for 80% statistical power, meaning that even if all your assumptions about the size of the effect are correct, in the long run you would fail to get a statistically significant result 20% of the time despite there being a real effect.)

During post-symposium drinks, I spoke with an economist about the trial. I was wondering why they used individual instead of cluster randomisation – randomising neighbourhoods, for example. The answer was resource constraints: much larger sample sizes are needed for the statistics to work. To me it seemed clear that it’s a very different situation if one person in a network of friends gets free money than if everyone does. The economist wondered: “How come there could be second-order effects when there were no first-order effects?” The conversation took a weird turn. Paraphrasing:

Me: Blahblah compelling evidence from engineering and social sciences to math and physics that “more is different”, i.e. phenomena play out differently depending on the scale at consideration… blahblah micro-level interactions create emergent macro-level patterns blahblah.

Economist: Yeah, we’re not having that conversation in our field.

Me: Oh, what do you mean?

Economist: Well, those are not things discussed in our top journals, or considered interesting subjects to research.

Me: I think they have huge consequences, and specifically in economics, this guy in Oxford just gave a presentation on what he called “Complexity economics“. He had been doing it for some decades already, I think he originally had a physics background…

Economist: No thanks, no physicists in my economics.

Me: Huh?

Economist: [exits the conversation]

Now, wasn’t that fun for a symposium on interdisciplinary perspectives.

I have a lot of respect for the mathematical prowess of economists and econometricians, don’t get me wrong. One of my favourites is Scott E. Page, though I only know him through an excellent course on complexity (also available as an audio book). I probably like him because he breaks out of the monodisciplinary, insulationist mindset economists are often accused of. Page’s view of complexity actually relates to our conversation. Let’s see how.

First off, he describes complexity (and most social phenomena of interest) as arising from four factors, which can be thought of as tuning knobs or dials. Complexity arises when the dials are tuned not to either extreme, which is where equilibria arise, but somewhere in the middle. And complex systems tend to reside far from equilibrium, permanently.

To dig more deeply into how the attributes of interdependence, connectedness, diversity, and adaptation and learning generate complexity, we can imagine that each of these attributes is a dial that can be turned from 0 (lowest) to 10 (highest).

Scott E. Page

  • Interdependence means the extent to which one person’s actions affect those of others. This dial ranges from complete independence, where one person’s actions do not affect others at all, to complete dependence, where everyone observes and tries to perfectly match everyone else’s actions. In real life, we see both unexpected cascades (such as US decision makers’ ethanol regulations feeding into the Arab Spring) and some, but never complete, independence – that is, manifestations that do not fit either extreme of the dial, but lie somewhere in between.
  • Connectedness refers to how many other people a person is connected to. The extremes range from people living all alone in a cabin in the woods, to hypersocial youth living on Instagram, trying to keep tabs on everyone and everything. The vast majority of people lie somewhere in between.
  • Diversity is the presence of qualitatively different types of actors: If every person is a software engineer, mankind is obviously doomed… But the same happens if there’s only one engineer, one farmer etc. Different samples of real-world social systems (e.g. counties) consist of intermediate amounts of diversity, lying somewhere in between.
  • Adaptation and learning refer to how smart the actors are. This ranges from following simple, unchanging rules, to being perfectly rational and informed, as assumed in classical economics. In actual decision making, we see “bounded rationality”, reliance on rules of thumb and tradition, and both optimising and satisficing behaviours – the “somewhere in between”.

The complexity of complex systems arises when diverse, connected people interact on the micro level and, by doing so, produce “emergent” macro-level states of the world, to which they then adapt, creating new, unexpected states of the world.

You might want to read that one again.

Back to basic income: when we pick 2000 random individuals around the country and give them free money, we’re implicitly assuming that they are not connected to any other people, and/or that they are completely independent of the actions of others. We’re also assuming that they are either all of the same type, or that it’s not interesting that they are of different types. And so forth. If we later compare their employment data to that of those who were not given basic income, the result we get is an estimate of the causal effect in the population, provided all the assumptions hold.

But consider how these assumptions may fail. If the free money were perceived as a permanent thing, and given to people’s whole network of unemployed buddies, it seems quite plausible that they would adapt their behaviour in response to the changing dynamics of their social network. This might even play out differently in different cliques: some might use the safety net of basic income to collectively found companies and take risks, while others might alter their daily drinking behaviour to match the costs with the predictable income – for better or worse. But when you randomise individually and ignore how people cluster in networks, you’re studying a different thing. Whether it’s an interesting thing or a silly thing is another issue.
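To make that concern concrete, here is a minimal toy simulation of my own (it has nothing to do with the actual trial design, and every number is invented): each person’s chance of finding a job depends both on their own treatment and on the share of treated people in their social cluster. Under individual randomisation, roughly half of everyone’s contacts are treated regardless of one’s own assignment, so the naive treated-versus-control contrast recovers only the direct effect; under cluster randomisation it also captures the spillover, and the two designs give systematically different answers.

```python
import numpy as np

rng = np.random.default_rng(42)

def naive_effect(cluster_randomise, n_clusters=200, cluster_size=10,
                 base_rate=0.20, direct_effect=0.05, spillover_effect=0.10):
    """Toy model: employment probability depends on one's own treatment
    AND on the share of treated people in one's cluster (the spillover)."""
    if cluster_randomise:
        treated_clusters = rng.random(n_clusters) < 0.5
        treat = np.repeat(treated_clusters, cluster_size).astype(float)
    else:
        treat = (rng.random(n_clusters * cluster_size) < 0.5).astype(float)
    cluster_id = np.repeat(np.arange(n_clusters), cluster_size)
    treated_share = np.bincount(cluster_id, weights=treat) / cluster_size
    p_job = base_rate + direct_effect * treat + spillover_effect * treated_share[cluster_id]
    employed = rng.random(treat.size) < p_job
    # Naive estimate: employment rate of treated minus that of untreated individuals
    return employed[treat == 1].mean() - employed[treat == 0].mean()

print("individual randomisation:", np.mean([naive_effect(False) for _ in range(500)]))
print("cluster randomisation:   ", np.mean([naive_effect(True) for _ in range(500)]))
# Individual randomisation recovers only the direct effect (about 0.05);
# cluster randomisation also picks up the spillover (about 0.15).
```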

Now, it’s easy to come up with these kinds of assumption-destroying scenarios, but a whole different ordeal to study them empirically. We need to simplify reality in order to deal with it. The question is this: how much of an abstraction can a map (i.e. a model in a research study, making those simplified assumptions) be and still represent reality adequately? This is also an ontological question, because if you take the complexity perspective seriously, you say bye-bye to the kind of thinking that lets you dream up the predictable effects a button press (such as a policy change) has on the state of a system. People who act in, or try to steer, complex systems control almost nothing but influence almost everything.

An actor in a complex system controls almost nothing but influences almost everything.

Scott E. Page

Is some information, some model, still better than none? Maybe. Maybe not. In Helsinki, you’re better off without a map than with a map of Stockholm – the so-called “best map fallacy” (explained here in detail). Rare, highly influential events drive the behaviour of complex systems: the Finnish economy was not electrified by average companies starting to sell more, but by Nokia hitting the jackpot. And these events are very hard, if not impossible, to predict✱.

Ok, back to basic income again. I must say that the people who devised the experiment were not idiots; they included, for example, interviews to get some idea of unexpected effects. I think this type of approach is definitely necessary when dealing with complexity, and all social interventions should include qualitative data in their evaluation. But, again, unless the unemployed do not interact at all, randomising individually means studying a different thing than randomising in clusters. I do wonder whether it would have been possible to include some matched clusters, to see if any qualitatively different dynamics take place when you give basic income to a whole area instead of to randomly picked individuals within it.

Complex systems organizational map.jpg
Society is a complex system, and must be studied as such. Figure: Hiroki Sayama (click to enlarge)

But, to wrap up this flow of thought, I’m curious whether you think it is possible to randomise a social intervention individually AND always keep in mind that the conclusions are only valid if there are no interactions between people’s behaviour and that of their neighbours. Or is it inevitable that the human mind smooths out the details?

Importantly: is our map better now than it was before? Will this particular experiment go down in history as showing, as the economist put it with “there were no first-order effects”, that basic income has no effect on job seeking? (Remember, the aim was only 80% statistical power.) Lastly, I want to say that I consider it unforgivable to work within only one discipline and disregard the larger world you’re operating in: when we bring science to policy making, we must be doubly cautious about the assumptions our conclusions stand on. Luckily, transparent scientific methodology allows us to be explicit about them.

Let me hear your thoughts, and especially objections, on Twitter, or by email!

✱ One solution is to harness convexity, which can be oversimplified like this:

  1. Unpredictable things will happen, and they will make you either better or worse off.
  2. The magnitude of an event is different from its effect on you, i.e. there are huge events that don’t impact you at all, and small events that are highly meaningful to you. Often that impact depends on the interdependence and connectedness dials.
  3. To an extent, you can control the impact an event has on you.
  4. You want to control your exposure in such a way that surprise losses are bounded, while surprise gains are as unbounded as possible (a minimal sketch of this asymmetry follows below).
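As a back-of-the-envelope illustration (the payoff functions and numbers here are invented purely for this sketch), this is what bounded downside with open-ended upside looks like next to its mirror image when random shocks roll in:

```python
import numpy as np

rng = np.random.default_rng(1)
shocks = 3 * rng.standard_normal(100_000)   # unpredictable events, some of them huge

convex = np.maximum(shocks, -1)    # losses capped at -1, gains open-ended
concave = np.minimum(shocks, 1)    # gains capped at +1, losses open-ended

for name, payoff in (("convex", convex), ("concave", concave)):
    print(f"{name:8s} mean={payoff.mean():6.2f}  worst={payoff.min():7.2f}  best={payoff.max():6.2f}")
```

The convex exposure comes out ahead on average and its worst case is known in advance; the concave one has the same-sized shocks but an unbounded downside.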

Replication is impossible, falsification unnecessary and truth lies in published articles (?)

joonasautocomic
Is psychology headed towards being a science conducted by “zealots”, or towards a post-car (or train) crash metaphysics, where anything goes because nothing is even supposed to replicate?

I recently peer reviewed a partly shocking piece called “Reproducibility in Psychological Science: When Do Psychological Phenomena Exist?“ (Iso-Ahola, 2017). In the article, the author makes some very good points, which unfortunately get drowned out by some very strange statements and positions. Eiko Fried, Etienne LeBel and I addressed those briefly in a commentary (preprint; UPDATE: published piece). Below, I’d like to expand upon some additional thoughts I had about the piece, to answer Martin Hagger’s question.

On complexity

When all parts do the same thing on a certain scale (planets on Newtonian orbits), their behaviour is relatively easy to predict for many purposes. The same goes when all molecules act independently in a random fashion: the risk that most or all beer molecules in a pint move upward at the same time is ridiculously low, and thus we don’t have to worry about the yellow (or black, if you’re into that) gold escaping the glass. Both are relatively easy systems to describe, as opposed to complex systems, where interactions, sensitivity to initial conditions and so on can produce a huge variety of behaviours and states. Complexity science is the study of these phenomena, which science has increasingly had to grapple with since the 1900s (Weaver, 1948).

Iso-Ahola (2017) quotes (though somewhat unfaithfully) the complexity scientist Bar-Yam (2016b): “for complex systems (humans), all empirical inferences are false… by their assumptions of replicability of conditions, independence of different causal factors, and transfer to different conditions of prior observations”. He takes this to mean that “phenomena’s existence should not be defined by any index of reproducibility of findings” and that “falsifiability and replication are of secondary importance to advancement of scientific fields”. But this is a highly misleading representation of the complexity science perspective.

In his article, Bar-Yam used an information-theoretic approach to analyse the limits of what we can say about complex systems. The position is that while a full description of such systems via empirical observation is impossible, we should aim to identify the factors which are meaningful in terms of the replicability of findings, or the utility of the acquired knowledge. As he elaborates elsewhere: “There is no utility to information that is only true in a particular instance. Thus, all of scientific inquiry should be understood as an inquiry into universality—the determination of the degree to which information is general or specific” (Bar-Yam, 2016a, p. 19).

This is fully in line with the Fisher quote presented in Mayo’s slides:

Fisher quote Mayo

The same goes for replications; no single one-lab study can disprove a finding:

“’Thus a few stray basic statements contradicting a theory will hardly induce us to reject it as falsified. We shall take it as falsified only if we discover a reproducible effect which refutes the theory. In other words, we only accept the falsification if a low-level empirical hypothesis which describes such an effect is proposed and corroborated’ (Popper, 1959, p. 66)” (see Holtz & Monnerjahn, 2017)

So, if a high-quality non-replication itself replicates, one must consider that something may be off with the original finding. This leads us to the question of what researchers should study in the first place.

On research programmes

Lakatos (1971) posits a difference between progressive and degenerating research lines. In a progressive research line, investigators explain a negative result by modifying the theory in a way which leads to new predictions that subsequently pan out. Coming up with explanations that make no further contributions, but merely explain away the negative finding, leads to a degenerating research line. Iso-Ahola quotes Lakatos to argue that, although theories may have a “poor public record” that should not be denied, falsification should not lead to the abandonment of theories. Here’s Lakatos:

“One may rationally stick to a degenerating [research] programme until it is overtaken by a rival and even after. What one must not do is to deny its poor public record. […] It is perfectly rational to play a risky game: what is irrational is to deceive oneself about the risk” (Lakatos, 1971, p. 104)

As Meehl (1990, p. 115) points out, the quote continues as follows:

“This does not mean as much licence as might appear for those who stick to a degenerating programme. For they can do this mostly only in private. Editors of scientific journals should refuse to publish their papers which will, in general, contain either solemn reassertions of their position or absorption of counterevidence (or even of rival programmes) by ad hoc, linguistic adjustments. Research foundations, too, should refuse money.” (Lakatos, 1971, p. 105)

Perhaps researchers should pay more attention to which programme they are following?

As an ending note, here’s one more interesting quote: “Zealotry of reproducibility has unfortunately reached the point where some researchers take a radical position that the original results mean nothing if not replicated in the new data.” (Iso-Ahola, 2017)

For exploratory research, I largely agree with these zealots. I believe exploration is all well and good, but the results mean next to nothing unless replicated in new data (de Groot, 2014). One cannot hypothesise and confirm with the same data.

Perhaps I focus too much on what was said in the paper rather than on what the author actually meant, and we do apologise if we have failed to abide by the principle of charity in the commentary or in this blog post. I do believe the paper will best serve as a pedagogical example for aspiring researchers of how strangely arguments could be constructed in the olden times.

ps. Bar-Yam later commented on this blog post, confirming the misrepresentation/misinterpretation of his research by the author of the reproducibility paper:

baryam

pps. Here’s Fred Hasselman‘s comment on the article, from the Frontiers website (when you scroll down all the way to the bottom, there’s a comment option):

1. Whether or not a posited entity (e.g. a theoretical object of measurement) exists or not, is a matter of ontology.

2. Whether or not one can, in principle, generate scientific knowledge about a posited entity, is a matter of epistemology.

3. Whether or not the existence claim of a posited entity (or law) is scientifically plausible depends on the ability of a theory or nomological network to produce testable predictions (predictive power) and the accuracy of those predictions relative to measurement outcomes (empirical accuracy).

4. The comparison of the truth status of psychological theoretical constructs to the Higgs Boson is a false equivalence: One is formally defined and deduced from a highly corroborated model and predicts the measurement context in which its existence can be verified or falsified (the LHC), the other is a common language description of a behavioural phenomenon “predicted” by a theory constructed from other phenomena published in the scientific record of which the reliability is… unknown.

5. It is the posited entity itself -by means of its definition in a formalism or theory that predicts its existence- that decides how it can be evidenced empirically. If it cannot be evidenced using population statistics, don’t use it! If the analytic tools to evidence it do not exist, develop them! Quantum physics had to develop a new theory of probability, new mathematics to be able to make sense of measurement outcomes of different experiments. Study non-ergodic physics, complexity science, emergence and self-organization in physics, decide if it is sufficient, if not, develop a new formalism. That is how science advances and scientific knowledge is generated. Not by claiming all is futile.

To summarise: The article continuously confuses ontological and epistemic claims, it does not provide a future direction even though many exist or are being proposed by scholars studying phenomena of the mind, moreover the article makes no distinction between sufficiency and necessity in existence claims, and this is always problematic.

Contrary to the claim here, a theory (and the ontology and epistemology that spawned it) can enjoy high perceived scientific credibility even if some things cannot be known in principle, or if there’s always uncertainty in measurements. It can do so by being explicit about what it is that can and cannot be known about posited entities.

E.g. Quantum physics is a holistic physical theory, also in the epistemic sense: It is in principle not possible to know anything about a quantum system at the level of the whole, based on knowledge about its constituent parts. Even so, quantum physical theories have the highest predictive power and empirical accuracy of all scientific theories ever produced by human minds!

As evidenced by the history of succession of theories in physics, successful scientific theorising about the complex structure of reality seems to be a highly reproducible phenomenon of the mind. Let’s apply it to the mind itself!

Bibliography:

Bar-Yam, Y. (2016a). From big data to important information. Complexity, 21(S2), 73–98.

Bar-Yam, Y. (2016b). The limits of phenomenology: From behaviorism to drug testing and engineering design. Complexity, 21(S1), 181–189. https://doi.org/10.1002/cplx.21730

de Groot, A. D. (2014). The meaning of “significance” for different types of research [translated and annotated by Eric-Jan Wagenmakers, Denny Borsboom, Josine Verhagen, Rogier Kievit, Marjan Bakker, Angelique Cramer, Dora Matzke, Don Mellenbergh, and Han L. J. van der Maas]. Acta Psychologica, 148, 188–194. https://doi.org/10.1016/j.actpsy.2014.02.001

Holtz, P., & Monnerjahn, P. (2017). Falsificationism is not just ‘potential’ falsifiability, but requires ‘actual’ falsification: Social psychology, critical rationalism, and progress in science. Journal for the Theory of Social Behaviour. https://doi.org/10.1111/jtsb.12134

Iso-Ahola, S. E. (2017). Reproducibility in Psychological Science: When Do Psychological Phenomena Exist? Frontiers in Psychology, 8. https://doi.org/10.3389/fpsyg.2017.00879

Lakatos, I. (1971). History of science and its rational reconstructions. Springer. Retrieved from http://link.springer.com/chapter/10.1007/978-94-010-3142-4_7

Meehl, P. E. (1990). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1(2), 108–141.

Weaver, W. (1948). Science and complexity. American Scientist, 36(4), 536–544.

 

The art of expecting p-values

In this post, I try to present the intuition behind the fact that, when studying real effects, one usually should not expect p-values near the 0.05 threshold. If you don’t read quantitative research, you may want to skip this one. If you think I’m wrong about something, please leave a comment and set the record straight!

Recently, I attended a presentation by a visiting senior scholar. He spoke about how their group had discovered a surprising but welcome correlation between two measures, and subsequently managed to replicate the result. What struck me was his choice of words:

“We found this association, which was barely significant. So we replicated it with the same sample size of ~250, and found that the correlation was almost the same as before and, as expected, of similar statistical significance (p < 0.05)“.

This highlights a threefold, often implicit (but WRONG) mental model:

[EDIT: due to Markus’ comments, I realised the original, off-the-top-of-my-head examples were numerically impossible and changed them a bit. Also, added stuff in brackets that the post hopefully clarifies as you read on.]

  1. “Replications with a sample size similar to the original, should produce p-values similar to the original.”
    • Example: in subsequent studies with n = 100 each, a correlation (p = 0.04) should replicate as the same correlation (p ≈ 0.04) [this happens about 0.02% of the time when population r is 0.3; in these cases you actually observe an r≈0.19]
  2. “P-values are linearly related with sample size, i.e. bigger sample gives you proportionately more small p-values.”
    • Example: a correlation (n = 100, p = 0.04), should replicate as a correlation of about the same, when n = 400, with e.g. a p ≈ 0.02. [in the above-mentioned case, the replication gives observed r±0.05 about 2% of the time, but the p-value is smaller than 0.0001 for the replication]
  3. “We study real effects.” [we should think a lot more about how our observations could have come by in the absence of a real effect!]

It is obvious that the third point is contentious, and I won’t consider it here much. But the first two points are less clear, although the confusion is understandable if one has learned and always applied Jurassic (pre-Bem) statistics.

[Note: “statistical power”, or simply “power”, is the probability of finding an effect if it really exists. The more obvious an effect is, and the bigger your sample size, the better your chances of detecting these real effects – i.e. the greater your power. You want to be pretty sure your study detects what it’s designed to detect, so you may want to have a power of 90%, for example.]
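If you want to play with these numbers yourself, here is a minimal sketch using statsmodels’ standard power routines for the two-sample t-test; the effect size and sample sizes below are arbitrary placeholders, not values from any study discussed here.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sided, two-sample t-test for a medium effect (d = 0.5), 64 people per group
print(analysis.solve_power(effect_size=0.5, nobs1=64, alpha=0.05))    # roughly 0.80

# Sample size per group needed to reach 90% power for the same effect
print(analysis.solve_power(effect_size=0.5, power=0.90, alpha=0.05))  # about 85, i.e. 86 when rounded up
```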

revolving_lottery_machinekaitenshiki-cyusenkijapan
Figure 1. A lottery machine. Source: Wikipedia

To get a handle on how p behaves, we must understand the nature of p-values as random variables 1. They are much like the balls in a lottery machine, with values between zero and one marked on them. The lottery machine of real effects has disproportionately more low (e.g. < 0.01) values on the balls, while the lottery machine of null effects contains a “fair” distribution of numbers on the balls (where each number is as likely as any other). If this doesn’t make sense yet, read on.

Let us exemplify this with a simulation. Figure 2 shows the expected distribution of p-values when we do 10 000 studies with one t-test each and report the p of each test. You can think of this as 9999 replications with the same sample size as the original.

50percent-power-distribution
Figure 2: p-value distribution for 10 000 simulated studies, under 50% power when the alternative hypothesis is true. (When power increases, the curve gets pushed even farther to the left, leaving next to no p-values over 0.01)
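If you want to recreate the gist of Figure 2 yourself, here is a minimal sketch (not the actual code behind the figures, which is linked at the end of the post). A two-sample t-test with d = 0.5 and 32 people per group has roughly 50% power under the assumptions below, and the shares it prints line up with the numbers discussed next.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2023)
n_studies, n_per_group, d = 10_000, 32, 0.5   # d = 0.5 with 32 per group gives roughly 50% power

pvals = np.empty(n_studies)
for i in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(d, 1.0, n_per_group)
    pvals[i] = stats.ttest_ind(treatment, control).pvalue

print("share of p < .05:       ", (pvals < 0.05).mean())                     # about one half
print("share of p < .01:       ", (pvals < 0.01).mean())                     # much more common than "barely significant"
print("share of .04 < p < .05: ", ((pvals > 0.04) & (pvals < 0.05)).mean())  # only a few percent
```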

Now, if we did just six studies with the parameters laid out above, we could see a set of p-values like {0.002, 0.009, 0.024, 0.057, 0.329, 0.479}, half of them (the first three) being “significant”. If we had 80% power to detect the difference we are looking for, about 80% of the p-values would be “significant”. As an additional note, with 50% power, 4% of the 10 000 studies give a p between 0.04 and 0.05. With 80% power, this number goes down to 3%. For 97.5% power, only 0.5% of studies (yes, five in every thousand studies) are expected to give such a “barely significant” p-value.

The senior scholar mentioned in the beginning was studying correlations. They work the same way. The animation below shows how p-values are distributed for different sample sizes, if we do 10 000 studies at every sample size (i.e. every frame is 10 000 studies with that sample size). The samples are from a population where the real correlation is 0.3. The red dotted line is p = 0.05.

corrplot_varyspeed
Figure 3. P-value distributions for different sample sizes, when studying a real correlation of 0.3. Each frame is 10 000 replications with a given sample size. If pic doesn’t show, click here for the gif (and/or try another browser).
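Here is a similar sketch for correlations (again my own toy code, not the code behind the animation): draw repeated samples of various sizes from a bivariate normal population with a true correlation of 0.3, and watch the p-values pile up below 0.01 as the sample size grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
rho, n_studies = 0.3, 10_000
cov = [[1.0, rho], [rho, 1.0]]   # population correlation of 0.3

for n in (30, 100, 250, 500):
    pvals = np.empty(n_studies)
    for i in range(n_studies):
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        r, p = stats.pearsonr(x, y)
        pvals[i] = p
    print(f"n = {n:3d}   share p < .05: {(pvals < 0.05).mean():.2f}   "
          f"share p < .01: {(pvals < 0.01).mean():.2f}")
```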

The next animation zooms in on “significant” p-values in the same way as in Figure 2 (though the largest bar goes through the roof quickly here). As you can see, it is almost impossible to get a p-value close to 5% with large power. Thus, there is no way we should “expect” a p-value over 0.01 when we replicate a real effect with large power. Very low p-values are always more probable than “barely significant” ones.

zoomplot_quick
Figure 4. Zooming in on the “significant” p-values. It is more probable to get a very low p than a barely significant one, even with small samples. If pic doesn’t show, click here for the gif.

But what if there is no effect? In this case, every p-value is equally likely (see Figure 5). This means that, in the long run, getting a p = 0.01 is just as likely as getting a p = 0.97, and by implication, 5% of all p-values fall under 0.05. Therefore, the proportion of studies that generate a p between 0.04 and 0.05 is 1%. Remember how this percentage was 0.5% (five in a thousand) when the alternative hypothesis was true under 97.5% power? Indeed, when power is high, these “barely significant” p-values may actually speak for the null, not the alternative hypothesis! The same goes for e.g. p = 0.024 when power is 99% [see here].

5percent-power-distribution
Figure 5. p-value distribution when the null hypothesis is true. Every p is just as likely as any other.
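The null case needs only a small tweak to the previous sketch: make the two variables unrelated, and the p-values come out roughly uniform, with about 1% of them landing between 0.04 and 0.05 regardless of sample size. (Again just an illustrative sketch with arbitrary settings.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
pvals = np.empty(10_000)
for i in range(10_000):
    # Two unrelated variables: the population correlation is exactly zero
    r, p = stats.pearsonr(rng.standard_normal(250), rng.standard_normal(250))
    pvals[i] = p

print("share of p < .05:       ", (pvals < 0.05).mean())                     # about 0.05
print("share of .04 < p < .05: ", ((pvals > 0.04) & (pvals < 0.05)).mean())  # about 0.01
```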

Consider the lottery machine analogy again. Does it make better sense now?

The lottery machine of real effects has disproportionately more low (e.g. < 0.01) values on the balls, while the lottery machine of null effects contains a “fair” distribution of numbers on balls (each number is as likely as any other).

Let’s look at one more visualisation of the same thing:

pcurveplot_quick.gif
Figure 6. The percentages of “statistically significant” p-values evolving as sample size increases. If the gif doesn’t show, you’ll find it here.

Aside: when the effect one studies is enormous, sample size naturally matters less. I calculated Cohen’s d for the Asch 2 line segment study, and a whopping d = 1.59 emerged. This is surely a very unusual effect size in psychological experiments, and it leads to high statistical power even with small sample sizes. In such a case, by the logic presented above, one should be extremely wary of p-values closer to 0.05 than to zero.
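For a rough sense of scale (taking the d = 1.59 estimate at face value and picking small group sizes arbitrarily), the same statsmodels routine as earlier suggests power is already very high with tiny groups:

```python
from statsmodels.stats.power import TTestIndPower

# Power of a two-sided, two-sample t-test for d = 1.59 at small group sizes
analysis = TTestIndPower()
for n in (10, 15, 20, 30):
    power = analysis.solve_power(effect_size=1.59, nobs1=n, alpha=0.05)
    print(f"n per group = {n:2d}   power = {power:.2f}")
```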

Understanding all this is vital for interpreting past research. We never know what the data-generating system has been (i.e. whether the p-values were drawn from a distribution under the null or under the alternative), but the data give us hints about which is more likely. Let us take an example from a social psychology classic, Moscovici’s “Towards a theory of conversion behaviour” 3. The article reviews results, which are then used to support a nuanced theory of minority influence. Low p-values are taken as evidence for an effect.

Based on what we learned earlier about the distribution of p-values under the null vs. the alternative, we can now see under which hypothesis the p-values are more likely to occur. The tool to use here is called the p-curve 4, and it is presented in Figure 7.

moscovici-p-curve
Figure 7. A quick-and-dirty p-curve of Moscovici (1980). See this link for the data you can paste into p-checker or p-curve.

You can see directly how a big portion of the p-values sits in the 0.05 region, whereas you would expect them to cluster near 0.01. The p-curve analysis (from the p-curve website) shows that evidential value, if there is any, is inadequate (Z = -2.04, p = .0208). Power is estimated to be 5%, consistent with the null hypothesis being true.

The null being true may or may not have been the case here. But looking at the curve might have helped the researchers who spent some forty years unsuccessfully trying to replicate the pattern of Moscovici’s afterimage study results 5.

In a recent talk, I joked about a bunch of researchers who tour around holiday resorts every summer, making people fill in IQ tests. Each summer they keep the results which show p < 0.05 and scrap the others, eventually ending up in the headlines with a nice meta-analysis of the results.

Don’t be those guys.

lomavessa

Disclaimer: the results discussed here may not generalise to some more complex models, where the p-value is not uniformly distributed under the null. I don’t know much about those cases, so please feel free to educate me!

Code for the animated plots is here. It was inspired by code from Daniel Lakens, whose blog post inspired this piece. Check out his MOOC here. Additional thanks to Jim Grange for advice on gif making and Alexander Etz for constructive comments.

Bibliography:

  1. Murdoch, D. J., Tsai, Y.-L. & Adcock, J. P-Values are Random Variables. The American Statistician 62, 242–245 (2008).
  2. Asch, S. E. Studies of independence and conformity: I. A minority of one against a unanimous majority. Psychological monographs: General and applied 70, 1 (1956).
  3. Moscovici, S. in Advances in Experimental Social Psychology 13, 209–239 (Elsevier, 1980).
  4. Simonsohn, U., Simmons, J. P. & Nelson, L. D. Better P-curves: Making P-curve analysis more robust to errors, fraud, and ambitious P-hacking, a Reply to Ulrich and Miller (2015). J Exp Psychol Gen 144, 1146–1152 (2015).
  5. Smith, J. R. & Haslam, S. A. Social psychology: Revisiting the classic studies. (SAGE Publications, 2012).

How lack of transparency feeds the beast

This is a presentation I held for the young researchers branch of the Finnish Psychological Society. I show how low power and lack of transparency can lead to weird situations, where the published literature contains little or no knowledge.

nutu_markuspanorama_cartoonise

Markus Mattsson, Leo Aarnio and I had great fun at the seminar, presenting to a great audience of eager young researchers.

The slides for my talk are here:

If you’re interested in more history and solutions, check out Felix Schönbrodt‘s slides here. Some pictures were made adapting code from a wonderful Coursera MOOC by Daniel Lakens. For Bayes, check out Alexander Etz‘s blog.

Oh, and as for the monster analogy: this piece made me think of it.

Getting Started With Bayes

This post presents a Bayesian roundtable I convened for the EHPS/DHP 2016 health psychology conference. Slides for the three talks are included.

bayes healthpsych cover

So, we kicked off the session with Susan Michie and acknowledged Jamie Brown, who was key in making it happen but could not attend.

start

Robert West was the first to present; you’ll find his slides, “Bayesian analysis: a brief introduction”, here. This presentation gave a brief introduction to Bayes and how belief updating with Bayes factors works.

I was the second speaker, building on Robert’s presentation. Here are the slides for my talk, in which I introduced some practical resources for getting started with Bayes. The slides are also embedded below (some slides got corrupted by Slideshare, so the ones in the .ppt link are a bit nicer).

The third and final presentation was by Niall Bolger. In his talk, he gave a great example of how using Bayes in a multilevel model enabled him to incorporate more realistic assumptions and, consequently, watch a finding he had considered somewhat solid evaporate. His slides, “Bayesian Estimation: Implications for Modeling Intensive Longitudinal Data“, are here.

Let me know if you don’t agree with something (especially in my presentation) or have ideas regarding how to improve the methods in (especially health) psychology research!