CARMA: Critical Appraisal of Research Methods and Analysis

This is the syllabus for my University of Helsinki course. Target audience is non-mathematical students in social sciences. The 2019 class consisted of social psychologists, social workers, sociologists and political scientists, so it’s quite a mishmash of topics I considered of high importance in life, research and everything.

UPDATE: Some people have been asking about how to cite this; OSF page with DOI, which includes the materials, is here

Critical Appraisal of Research Methods and Analysis (CARMA) – Evaluating and not getting fooled by data in scientific and practical research contexts


marta horror

(the violence is real, though)


Description: Research claims in news, science, and business can mislead people, either purposefully or inadvertently. How and why does this happen, and what mistakes, misconceptions and pitfalls should one avoid when evaluating data? This course will help participants assess data-based statements, and offer some tools to avoid getting fooled by them. It is meant for students who aspire to future careers, which involve undertaking, interpreting or commissioning research. This could include science in academic or other institutions, consumer/marketing research in business settings, evidence-based decision making as policy makers or journalists, among others. The course does not require specialising in quantitative methods, although basic familiarity can be useful.

Note: a lot of slides contain “animation” that doesn’t work if you watch the presentation in a scrolling mode instead of having one full slide on the screen at a time. So, download or zoom in.


  • The crisis of confidence in social and life sciences: State of affairs (4 September 2019) – slides

Learning objectives: Become acquainted with the recent developments regarding the so-called “replication crisis”.

        1. Replication crisis: how it all started (this time around).
        2. Medicine, you were supposed to be the best of us!
        3. Consequences of problematic practices.
        4. You’re not alone in misinterpreting p-values.


  • From questionable research practices and biased stories, to better evidence and/or decisions (11 September 2019) – slides

Learning objectives: Understand what the research community is doing to improve the quality of published research. Extrapolate to non-academic settings.

        1. Transparency and Openness Promotion (TOP) guidelines to fight bad science.
        2. Transforming publication practices with pre-prints
        3. Disentangling confirmatory and exploratory research.
        4. Tricky rule-of-thumb questions to ask when being presented research (1/2: “null findings”).


  • Magnificent mistakes and where to find them (18 September 2019) – slides

Learning objectives: Recognise some particular pitfalls in evidential statements. Understand that decisions in the field do not need to rely on correct predictive statements, let alone scientific evidence.

        1. Tricky rule-of-thumb questions to ask when being presented research (2/2: “statistically significant” findings).
        2. Ways tests can fail: Type I/II mistakes. Type M and Type S mistakes.
        3. The difference between evidence of absence and absence of evidence: Black Swans and the Turkey Problem.
        4. When you don’t need to be right: green lumber, and a first taste of convexity.
        5. Heuristics: Simple rules that make us smart.


  • On interpreting data nudes instead of summary tables (25 September 2019) – slides

Learning objectives: Understand the rationale for visualising data, and what can be hidden when reporting summary statistics only. Learn to spot some common tricks used to visualise data in a favourable way to the presenter.

        1. A crude redux to evidence of absence.
        2. Data Nudes vs. Shitty Tables.
        3. The End of Average.
        4. What gets lost in looking at numbers alone: Uncertainty hidden in the absence of distributions.
        5. Demons with(in) axes: Slaying or summoning effects with presentation tricks.
        6. Dose-response effects masked by averages.


  • Complex systems and why they ruin everything straightforward (2 October 2019) – slides

Learning objectives: Become familiar with general features of so-called complex systems. Understand how they can be thought of in the context of practical interventions.

        1. Intro to complexity, and general features of complex systems. 
        2. Interaction vs. component dominant systems.
        3. Don’t camp at 1st order effects in dragon season.
        4. Navigating the Four Quadrants


  • Never cross Heraclitus’ river, if it’s on average 1 meter deep: Interventions and their offspring (9 October 2019) – slides

Learning objectives: Understand the rationale behind interventions and experimenting/intervening in complex systems, as well as some limitations of big data.

        1. Change comes in a triad.
        2. Sales tricks to counter, use and abuse.
        3. Pathway thinking & complexity thinking in behaviour change science.
        4. Failures and unexpected effects of social interventions.
        5. When is it safe(r) to intervene?


  • Dynamic/idiographic phenomena, and hidden assumptions (16 October 2019) – slides

Learning objectives: Describe the concepts of ergodicity and stationarity. Understand how they can mislead when not taken into account when e.g. assessing risks.

      1. Assumptions, schmassumptions; mind your foundations!
      2. Damned world not sitting still: Ergodicity & stationarity
      3. The idiographic approach to science
      4. The best map fallacy
      5. The precautionary principle for policy and interventions
      6. Frequency vs. consequences of being wrong: What matters more?
      7. Recap on the course: The Fourth Quadrant will find you, so better put your house in order


Student evaluations, comments, and feedback

Some students provided spontaneous feedback, and I gave everyone an opportunity to give evaluations. These are all of the answers, i.e. there is no publication bias or selective reporting here!

Great course!! Even if the statistics are not exactly your thing, this course will give you a lot of useful information and a better look on the research field. I feel that I did benefit a lot from this course. The teaching was great and got me interested in the things that haven’t interest me before.

  • Anonymous student

Thank you Matti for this exiting and engaging course! I enjoyed substantially ambitious and well-prepared lectures. Even though I’m focusing on qualitative methodology in my own work, I found this course important and highly interesting.

  • Valter, a Social Work major

Can highly recommend the course. It shows that the teacher knows what he is talking about and is interested in the topics presented. The course can be a bit difficult but it’s teached in a fun way with concrete examples. Definitely not a boring course. The teacher is not boring either.

  • Anonymous student

A great course, I learned a lot. After the course I find two learning outcomes especially important; learning to better evaluate research, but especially learning to treat academy as an institution.

The course had A LOT of stuff, and was sometimes a bit difficult to follow and keep up the connections between topics. With some improving for the structure and creating clear bridges from one topic to another, this course will be even more beneficial.

  • Anonymous participant

This course is an eye opener, it makes you have a different but more clear understanding of research particularly and the world in general. The teaching style was excellent and the content was practical. Personally, I found it easy to relate to my field of study and i’m sure anyone else would find it very practical too, regardless of their research being qualitative or quantitative.

  • Selestino, Public Policy major

Highly-stimulating overview of a range of interrelated complex topics. Presented in an engaging manner and involving multiple interdisciplinary perspectives, this course can change how you think.

  • Antti, social psychology major

The course shows and discusses many issues of contemporary quantitative research methods and provides tools and tips how to become a better researcher. Not a critical course towards quantitiative research methods though, so don’t think about taking the course as an excuse for not learning the methods!

I would recommend the course for first year master students who have some prior knowledge of quantitative research methods. You don’t have to know how to use them though as there are no quantitative excercises in the course.

In summary, a great remedy for any traumas that you might have from trying to learn quantitative research methods. The course itself doesn’t heal the wounds though as those skills are not teached but the lecturer does provide great sources where you can hone those skills on your own time. Hopefully, a second course where those skills are excercised will soon follow.

  • Aku, a 6th year social psychology student

The course was well designed and teacher’s enthusiasm and expertise motivated me to do my best. Altough, I was suprised to notice that the evaluation of the course was based on the assessment scale of pass and fail. After investing a lot of time and effort in doing the assignments, it would have been instructive to know in what scale did I performed. Nonetheless, I learned a lot in this course and it opened new perspectives which I can utilize in my masters thesis.

  • Henna, a sociology major

Big thanks for the course! It was a very interesting and fun set even though it included a lot of new things to be learned in a fast phase. You are very skilled at explaining things very clearly and in an entertaining way by using (for some reason often fatal :D) examples. Not that entertainment is the most essential aspect of a course but at least it helps to concentrate and remember the content. The course had a good balance of lecturing and group discussions, albeit it wasn’t always easy to come up with discussion points since there was so much to take in. Still, it was nice to hear what materials others had been reading or what they remember from the lectures. You were also very good at taking and answering questions in many different ways to ensure everyone understood the underlying point, and I never felt that I could not ask something I did not understand no matter “simple” the question.

CARMA for the win!

  • Social Psychology Master’s Student

I would recommend this course for every student because it gives you many new viewpoints concerning validity of scientific methods.

  • Anonymous participant

Thank you for the lecture course, Matti. Your passion to these topics really shows with the enthusiasm you presented the numerous examples in class, with the blog and tweets and with the breathtaking slideshows sometimes consisting of 100 slides or more. I appreciate you bringing up the importance of open science and “hacks”, with which it is possible to take the other direction with science. And honestly, without all of the examples with which you tied the topics to real life, I probably wouldn’t have had the slightest idea what this course was about. The in-class discussions didn’t work that well, and I think that was because it was hard to tie our thoughts together (and present them in class) because everyone had done assignments in different topics. Discussing itself was alright, though. I liked that the at-home assignments balanced the theory-heavy lectures also, where we could think of the topics more concretely, if we wanted to. All in all, I think this was a rather “easy” course to complete, but I like that, since studying is done for our own sake and for our education, not for teachers. Like critical thinking. And as I stated in the last assignment, during this course I learned that before, I wasn’t at all as critical as I thought I was. So, thanks for that!

  • A 4th year student

Thank you for an excellently organised course! Your effort in the implementation and enthusiasm toward the subject, as well as goals aiming to expand students’ understanding were very visible during the course. This motivated to do the intensive work required by internalising difficult topics.

  • Henna, a sociology major

Thank you for this course, I really liked it! I feel that I now have a deeper understanding of research methodology and am able to do more critical judgments than before. I also wish there would be a second Carma course.

  • A social psychology major

Pathways and complexity in behaviour change research

These are slides of a talk given at the Aalto University Complex Systems seminar. The talk contrasts two views on changing behaviour: the pathway view and the complexity view, the latter being in its infancy. It presents some Secret Analysis Arts of Recurrence, which Fred Hasselman doesn’t want you to know about, and includes links to resources. If someone perchance saw my mini-moocs (1, 2) and happened to find them useful, drop me a line and I’ll make one of this.

Lifestyle factors are hugely relevant in preventing disease in modern societies; unfortunately people often fail in their attempts to change health behaviour, both their own and that of others. In recent years, behaviour change design has been conceived as a process where one identifies deficiencies in factors influencing the behaviours (commonly called “determinants”). Complexity thinking suggests putting emphasis on de-stabilisation instead.

The perspective taken here is mostly at the idiographic level. At the time of writing, we have behaviour change methods to affect e.g. skills, perceived social norms, attitudes and so forth – but very little on general de-stabilisation of the motivational system as an important predictor of change.

Perspectives are welcome!

ps. Those of you who worry about brainwashing and freedom of thought: Chill. Stuff that powerful doesn’t really exist, and if it did, marketers would know about it and probably rule the world. [No, they don’t rule the world, I’ve been there]

pps. Forgot to put it in the slides, but this guy Merlijn Olthof will perhaps one day tweet about his work about destabilisation in psychotherapy contexts. Meanwhile, you can e.g. be his 10th Twitter follower, or keep checking his Google Scholar profile, as there’s a new piece coming out soon!

Why you should share Data Nudes instead of just Shitty Tables

This post summarises what I wanted to say with a recent paper published in Health Psychology and Behavioural Medicine, which includes an RMarkdown website supplement with code. A related slideshow and a video walkthrough are available here. Note: If it’s not obvious, these are my opinions as the first author, and may or may not be shared with collaborators who are nice people and surely wouldn’t use such foul language in public.

Some Problems in Summarising and Presenting Data

Many research reports include lots of variables, presented in tables comparing two or more groups, say an intervention and a control, or males and females. Readers often look at the means and standard deviations, looking for statistically significant differences between the two. What’s the problem?

1. It’s often not clear what significance even means, or whether some correction for multiple testing has been applied.

First of all, following the logic of Neyman-Pearson hypothesis testing, to keep error rate under the alpha level, one would have to correct for multiple testing, and it is unclear how many tests one should correct for when hypotheses are not pre-specified. Ignoring this – especially, where it is unclear how to heed the recommendation to justify one’s alpha level – error rates can become surprisingly high, much more than the conventionally assumed 5%.
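To make the arithmetic behind this concrete, here is a minimal sketch. It assumes the tests are independent and that every null hypothesis is true, which is a simplification; the function names are made up for illustration and do not come from the paper or its supplement.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n_tests independent tests,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test alpha that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    for m in (1, 5, 20, 50):
        print(
            f"{m:>3} tests: familywise error rate "
            f"{familywise_error_rate(m):.3f}, "
            f"Bonferroni per-test alpha {bonferroni_alpha(m):.4f}"
        )
```

With 20 uncorrected comparisons in a table, the chance of at least one spurious "significant" difference under these assumptions is roughly 64%. The catch described above remains: when hypotheses are not pre-specified, it is unclear what number to plug in as `n_tests`.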

2. In the absence of randomisation, increased sample size leads to detecting more and more tiny differences.

When there has not been randomisation (as in the case of genders or baseline cohort descriptions), the null hypothesis of zero difference is never true, and its rejection only depends on statistical power. We are pretty much never interested in whether the populations differ by any arbitrarily small amount on any of the presented variables. What usually matters is whether this difference is large enough to make a difference, that is, how big the effect size is. Two caveats follow: Firstly, in behavioural field trials, your participants are rarely independent from each other, but come clustered in e.g. classrooms (students), hospitals (patients) or offices (9-to-5 mental patients). Secondly, you almost always need to randomise clusters instead of individuals (here’s why), which gives statistical power a huge ass-whooping.
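Both points can be illustrated with a few lines of arithmetic. This is a rough sketch, not the analysis from the paper: it uses a normal (z-test) approximation for power, the standard design effect 1 + (m − 1) × ICC for equally sized clusters, and entirely hypothetical values for the effect size, cluster size and intraclass correlation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_groups(d: float, n_per_group: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two group means,
    where d is the true standardised mean difference."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation caused by sampling whole clusters of equal size."""
    return 1 + (cluster_size - 1) * icc


def effective_n(n: float, cluster_size: float, icc: float) -> float:
    """Number of independent observations that n clustered ones are worth."""
    return n / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # A tiny difference (d = 0.05) gets "detected" once the sample is big enough.
    for n in (100, 1_000, 10_000, 100_000):
        print(f"n per group {n:>7}: power {power_two_groups(0.05, n):.3f}")

    # Hypothetical classrooms of 25 with ICC = 0.10 shrink the effective sample.
    n = 1_000
    n_eff = effective_n(n, cluster_size=25, icc=0.10)
    print(f"{n} clustered participants count as about {n_eff:.0f} independent ones")
    print(f"power for d = 0.3: {power_two_groups(0.3, n):.3f} if independent, "
          f"{power_two_groups(0.3, n_eff):.3f} with clustering")
```

The first loop shows why rejecting a zero-difference null in non-randomised comparisons says more about sample size than about the groups. The second part shows how quickly clustering eats into the sample you thought you had.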

Not accounting for the multilevel structure of the data when calculating effect sizes inflates the standard errors, possibly even making zero effects appear as medium-sized ones. But it is not a trivial task to derive trustworthy effect sizes for nested data (Lai & Kwok 2016). Although some solutions exist, they have not yet been empirically validated for finite populations in the second or third levels, nor is there currently a straightforward software implementation available – to my knowledge, that is. Therefore, a sensible option may be to present the means with their corresponding confidence intervals, encouraging the readers to refrain from merely considering non-overlapping intervals between groups as dichotomous hypothesis tests. In Shitty Table 1 you can see how this is done. That seem clear to you? Don’t worry, there are alternatives!

Shitty Table 1. Means and confidence intervals for lots of things. Source.

3. The shape of the distribution may matter much, much more than the simple arithmetic mean.

The difference between two means is fun and neat, but only informative for approximately normal or symmetric distributions, which are not the norm in social and life sciences (see the reading list at the end). But hey, surely everyone reports things like skewness and kurtosis? [Of course they don’t, and even if they did, only a minority of social scientists could actually interpret the numbers.] Look at Shitty Table 2 to see for yourself whether you consider this a good way to convey information.

Shitty Table 2. Means, standard deviations and some distributional properties of a single variable in different educational tracks the participants were nested in. Nur = Practical nurse, HRC = Hotel, restaurant and catering studies, BA = Business and administration, IT = Business information technology. Source.

An aside as regards the means: Few individual participants are described by the group-level summary statistics. In fact, using Daniels’ definition of an ‘approximately average individual’ as falling in the middle 30% of the range of values, only 1.50% of participants can be considered ‘average’ on all of the primary outcome measures (see the supplementary website). Also see this and this blog post, as well as the papers listed at the end.
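For readers who want to try the ‘approximately average individual’ check on their own data, here is a minimal sketch. It takes the definition literally as stated above (a value counts as average if it lies in the middle 30% of the observed range of that variable); the function name and the simulated, independent, normally distributed data are assumptions made purely for illustration and are not the trial data.

```python
import random


def share_average(rows, width=0.30):
    """Fraction of rows lying in the middle `width` share of the observed
    range on every column simultaneously."""
    n_cols = len(rows[0])
    bands = []
    for j in range(n_cols):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        margin = (1 - width) / 2 * (hi - lo)
        bands.append((lo + margin, hi - margin))
    hits = sum(
        all(low <= value <= high for value, (low, high) in zip(row, bands))
        for row in rows
    )
    return hits / len(rows)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    for n_measures in (1, 3, 6, 10):
        data = [[rng.gauss(0, 1) for _ in range(n_measures)] for _ in range(2000)]
        print(f"{n_measures:>2} measures: share 'average' on all of them "
              f"= {share_average(data):.3f}")
```

The share of people who are average on everything drops quickly as measures are added, even in this friendly simulated case where every variable is symmetric and bell-shaped.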

Data Wants to be Seen Naked


In our paper, we present some ways behaviour change researchers could visualise their data, discuss some limitations and provide links to R code. Many, many other dedicated sources do this better, so feel free to check out this or this, for example. A principle I particularly like is to, whenever possible, include the raw data in the visualisation. This is because in abstractions, I personally have a hard time keeping in mind that I’m dealing with individuals operating in the world (complex dynamic systems in complex dynamic systems), and the raw data tends to ground me to some reality.

Pretty Picture 1. Visualising the information in Shitty Table 1 with raw data.

Data-visualisation and data exploration techniques (e.g. network analysis) can help reveal the dynamics involved in complex multi-causal systems – a challenging task with Shitty Tables. Data visualisations are crucial supplements to large numerical tables of descriptive statistics. With visualisations, researchers can communicate large amounts of information – including the associated uncertainty – in an accessible format, without requiring extensive mathematical expertise from the reader. This is important for researchers who intend to build on previous results, and in the paper we argue that such practices may also reduce problems that have led to the recent loss of confidence in the reproducibility and replicability of research findings in social and life sciences. Fully open data sharing would be ideal, but this is not always possible due to privacy concerns and, at the time of writing, remains a lamentably rare practice. In addition, open data does not necessarily accommodate stakeholders with low technical expertise in data analysis and visualisation, such as clinicians, patients and policy makers.

The benefits of presenting complex data visually should encourage researchers to publish extensive analyses and descriptions as website supplements, which would increase the speed and quality of scientific communication, as well as help to address the crisis of reduced confidence in research findings.

Pretty Picture 2. Visualising the information in Shitty Table 2. Shows hours of accelerometer-measured moderate-to-vigorous physical activity for different educational tracks. Midpoints of diamonds indicate means, endpoints 95% credible intervals. Individual observations are presented under the density curves, with random scatter on the y-axis to ease inspection. Nur = Practical nurse, HRC = Hotel, restaurant and catering, BA = Business and administration, IT = Information and communications technology.

In Pretty Picture 2, looking closely you can observe that boys did more moderate-to-vigorous physical activity (x-axis is average daily hours) in every educational track. In spite of this, girls appeared more active when the educational tracks (shown as rows in the figure) were combined, because there are many more people in the practical nurse track, and those people are mostly girls. This is known as Simpson’s paradox, and is best investigated by visualising data.
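The reversal is easy to reproduce with made-up numbers. The sketch below uses only two tracks and invented group sizes and means (they are not the values from the trial), chosen so that the most active track is also the one dominated by girls.

```python
# Hypothetical (group size, mean daily activity hours) per track and gender.
# These numbers are invented for illustration; they are not the trial data.
groups = {
    ("Nur", "girls"): (90, 1.2),
    ("Nur", "boys"): (10, 1.3),
    ("IT", "girls"): (10, 0.6),
    ("IT", "boys"): (90, 0.7),
}


def pooled_mean(gender):
    """Mean over all tracks for one gender, weighted by group size."""
    cells = [(n, m) for (_, g), (n, m) in groups.items() if g == gender]
    total_n = sum(n for n, _ in cells)
    return sum(n * m for n, m in cells) / total_n


if __name__ == "__main__":
    for track in ("Nur", "IT"):
        boys = groups[(track, "boys")][1]
        girls = groups[(track, "girls")][1]
        print(f"{track}: boys {boys:.2f} h, girls {girls:.2f} h")
    print(f"All tracks combined: boys {pooled_mean('boys'):.2f} h, "
          f"girls {pooled_mean('girls'):.2f} h")
```

Within each track boys are ahead by 0.1 hours, yet the combined means favour girls, because the combined mean mostly reflects which track each gender is concentrated in.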

Pretty Picture 3. See paper for elaboration.

Conventional approaches would have e.g. left the reader with an impression that the means of the multimodal or skewed variables (see Pretty Picture 1) are interpretable as central tendencies, and that the sample is homogenous (see Pretty Picture 2). Transparent and accessible sharing of data characteristics, analyses and analytical choices is imperative for increasing confidence in research findings; if nothing else, the elaborate supplements can act as a platform to present robustness tests and assumption explorations in.

Pretty Picture 4. See paper for elaboration.

Reading list

The paper described in this post:

  • Heino, M. T. J., Knittle, K., Fried, E., Sund, R., Haukkala, A., Borodulin, K., … Hankonen, N. (2019). Visualisation and network analysis of physical activity and its determinants: Demonstrating opportunities in analysing baseline associations in the let’s move it trial. Health Psychology and Behavioral Medicine, 7(1), 269–289.
  • Supplementary website: Link

On data visualisation:

  • Tay, L., Parrigon, S., Huang, Q., & LeBreton, J. M. (2016). Graphical descriptives: A way to improve data transparency and methodological rigor in psychology. Perspectives on Psychological Science, 11(5), 692–701.

On hypothesis testing for non-prespecified comparisons:

  • de Groot AD. The meaning of “significance” for different types of research [translated and annotated by Eric-Jan Wagenmakers, Denny Borsboom, Josine Verhagen, Rogier Kievit, Marjan Bakker, Angelique Cramer, Dora Matzke, Don Mellenbergh, and Han L. J. van der Maas]. Acta Psychologica. 2014;148:188–94.
  • Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. The preregistration revolution. Proceedings of the National Academy of Sciences. 2018;201708274.

On effect sizes for cluster randomised situations:

  • Lai MHC, Kwok O-m. Estimating Standardized Effect Sizes for Two- and Three-Level Partially Nested Data. Multivariate Behavioral Research. 2016;51:740–56.
  • Lai MHC, Kwok O-m, Hsiao Y-Y, Cao Q. Finite population correction for two-level hierarchical linear models. Psychological methods. 2018;23:94.

On distributional shapes:

  • Choi, S. W. (2016). Life is lognormal! What to do when your data does not follow a normal distribution. Anaesthesia, 71(11), 1363–1366.
  • Saxon, E. (2015). Beyond bar charts. BMC Biology, 13(1), 60. doi: 10.1186/s12915-015-0169-6
  • Taleb, N. N. (2007). Black swans and the domains of statistics. The American Statistician, 61(3), 198–200.
  • van Rooij, M. M., Nash, B., Rajaraman, S., & Holden, J. G. (2013). A fractal approach to dynamic inference and distribution analysis. Frontiers in Physiology, 4, 1.
  • Weissgerber, T. L., Garovic, V. D., Savic, M., Winham, S. J., & Milic, N. M. (2016). From static to interactive: Transforming data visualization to improve transparency. PLOS Biology, 14(6), e1002484. doi: 10.1371/journal.pbio.1002484
  • Weissgerber, T. L., Milic, N. M., Winham, S. J., & Garovic, V. D. (2015). Beyond bar and line graphs: Time for a new data presentation paradigm. PLOS Biology, 13(4), e1002128. doi: 10.1371/journal.pbio.1002128

On averages:

  • Daniels, G. S. (1952). The “average man”? Wright-Patterson Air Force Base, OH: Air Force Aerospace Medical Research Lab.
  • Rose, T. (2016). The end of average: How to succeed in a world that values sameness. Penguin UK.
  • Rousselet, G. A., Pernet, C. R., & Wilcox, R. R. (2017). Beyond differences in means: Robust graphical methods to compare two groups in neuroscience. European Journal of Neuroscience, 46(2), 1738–1748. doi: 10.1111/ejn.13610
  • Trafimow, D., Wang, T., & Wang, C. (2018). Means and standard deviations, or locations and scales? That is the question! New Ideas in Psychology, 50, 34–37. doi: 10.1016/j.newideapsych.2018.03.001

Randomised experiments (mis?)informing social policy in complex systems

In this post, I vent about anti-interdisciplinarity, introduce some basic perspectives of complexity science, and wonder whether decisions on experimental design actually lead us to end up in a worse place than where we were, before we decided to use experimental evidence to inform social policy.

People in our research group recently organised a symposium, Interdisciplinary perspectives on evaluating societal interventions to change behaviour (talks watchable here), as part of a series called Behaviour Change Science & Policy (BeSP). The idea is to bring together people from various fields from philosophy to behavioural sciences, medicine and beyond, in order to better tackle problems such as climate change and lifestyle diseases.

One presentation touched upon Finland’s randomised controlled trial to test the effects of basic income on employment (see also report on first year results). In crude summary, they did not detect effects of free money on finding employment. (Disclaimer: They had aimed for 80% statistical power, meaning that if all your assumptions regarding the size of the effect are correct, in the long term, 20% of the time you’d get no statistically significant effect in spite of there being a real effect.)
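What "80% power" means in the long run can be checked by brute force. The following is a small simulation sketch, not a model of the basic income trial: it assumes a continuous outcome with a known standard deviation of 1, a true standardised effect of 0.5, and 63 participants per group (roughly what a z-test needs for 80% power at that effect size). All of these values are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def miss_rate(d=0.5, n_per_group=63, alpha=0.05, n_trials=2000, seed=2019):
    """Share of simulated two-group trials that fail to reach statistical
    significance (two-sided z-test, known SD = 1) when the true effect is d."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(2 / n_per_group)  # standard error of the difference in means
    misses = 0
    for _ in range(n_trials):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(d, 1) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(diff / se) < z_crit:
            misses += 1
    return misses / n_trials


if __name__ == "__main__":
    print(f"Real effect present, non-significant result in "
          f"{miss_rate():.1%} of simulated trials")
```

Under these assumptions about one simulated trial in five comes out non-significant even though the effect is real by construction. If the true effect is smaller than the one the sample size was planned for, the miss rate is higher still.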

During post-symposium drinks, I spoke with an economist about the trial. I wondered why they had used individual instead of cluster randomisation – randomising neighbourhoods, for example. The answer was resource constraints; much larger sample sizes are needed for the statistics to work. To me it seemed clear that it’s a very different situation if one person in a network of friends gets free money, compared to everyone getting it. The economist wondered: “How come there could be second-order effects when there were no first-order effects?” The conversation took a weird turn. Paraphrasing:

Me: Blahblah compelling evidence from engineering and social sciences to math and physics that “more is different”, i.e. phenomena play out differently depending on the scale at consideration… blahblah micro-level interactions create emergent macro-level patterns blahblah.

Economist: Yeah, we’re not having that conversation in our field.

Me: Oh, what do you mean?

Economist: Well, those are not things discussed in our top journals, or considered interesting subjects to research.

Me: I think they have huge consequences, and specifically in economics, this guy in Oxford just gave a presentation on what he called “Complexity economics“. He had been doing it for some decades already, I think he originally had a physics background…

Economist: No thanks, no physicists in my economics.

Me: Huh?

Economist: [exits the conversation]

Now, wasn’t that fun for a symposium on interdisciplinary perspectives.

I have a lot of respect for the mathematical prowess of economists and econometricians, don’t get me wrong. One of my favourites is Scott E. Page, though I only know him due to an excellent course on complexity (also available as an audio book). I do probably like him, because he breaks out of the monodisciplinary insulationist mindset economists are often accused of. Page’s view of complexity actually relates to our conversation. Let’s see how.

First off, he describes complexity (and most social phenomena of interest) as arising from four factors, which can be thought of as tuning knobs or dials. Complexity arises when each dial is tuned not to either of the extremes, which is where equilibria arise, but somewhere in the middle. And complex systems tend to reside far from equilibrium, permanently.

To dig more deeply into how the attributes of interdependence,
connectedness, diversity, and adaptation and learning generate
complexity, we can imagine that each of these attributes is a dial that
can be turned from 0 (lowest) to 10 (highest).

Scott E. Page

  • Interdependence means the extent to which one person’s actions affect those of another. This dial ranges from complete independence, where one person’s actions do not affect others’ at all, to complete dependence, where everyone observes and tries to perfectly match all others’ actions. In real life, we see both unexpected cascades (such as the US decision makers’ ethanol regulations, leading to the Arab Spring), as well as some, but never complete, independence – that is, manifestations that do not fit into either extreme of the dial, but lie somewhere in between.
  • Connectedness refers to how many other people a person is connected to. The extremes range from people living in a cabin in the woods all alone, to hypersocial youth living in Instagram trying to keep tabs on everyone and everything. The vast majority of people lie somewhere in between.
  • Diversity is the presence of qualitatively different types of actors: If every person is a software engineer, mankind is obviously doomed… But the same happens if there’s only one engineer, one farmer etc. Different samples of real-world social systems (e.g. counties) consist of intermediate amounts of diversity, lying somewhere in between.
  • Adaptation and learning refer to the extent of the actors’ smartness. This ranges from following simple, unchanging rules, to being perfectly rational and informed, as assumed in classical economics. In actual decision making, we see “bounded rationality”, reliance on rules of thumb and tradition, as well as both optimising and satisficing behaviours – the “somewhere in between”.

The complexity of complex systems arises, when diverse, connected people interact on the micro-level, and by doing so produce “emergent” macro-level states of the world, to which they adapt, creating new unexpected states of the world.

You might want to read that one again.

Back to basic income: When we pick 2000 random individuals around the country and give them free money, we’re implicitly assuming they are not connected to any other people, and/or that they are completely independent of the actions of others. We’re also assuming that they are either the same, or that it’s not interesting that they are of different types. And so forth. If we later compare their employment data to that of those who were not given basic income, the result we get is an estimate of the causal effect in the population, provided that all these assumptions hold.

But consider how these assumptions may fail. If the free money was perceived as a permanent thing, and given to people’s whole network of unemployed buddies, it seems quite plausible that they would adapt their behaviour in response to the changing dynamics of their social network. This might even be different in cliques of certain people, who might use the safety net of basic income to collectively found companies and take risks, and cliques of other people, who might alter their daily drinking behaviour to match the costs with the predictable income – for better or worse. But when you randomise individually and ignore how people cluster in networks, you’re studying a different thing. Whether it’s an interesting thing or a silly thing, is another issue.
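To make the "different thing" concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration: the effect sizes, the clique sizes, and the idea that spillover depends linearly on the share of one's clique receiving the money. It is not a model of the actual experiment, and there is no noise in it on purpose.

```python
# Toy illustration: individual vs. clustered randomisation when people influence each other.
# All numbers below are hypothetical.
DIRECT_EFFECT = 1.0      # effect of receiving the money yourself
SPILLOVER_EFFECT = 2.0   # extra effect when your whole clique receives it
CLIQUE_SIZE = 10
N_CLIQUES = 20


def outcome(treated, clique_share_treated):
    """Outcome of one person, given own treatment and the treated share of their clique."""
    return DIRECT_EFFECT * treated + SPILLOVER_EFFECT * clique_share_treated


def mean(values):
    return sum(values) / len(values)


def estimate_effect(assignment):
    """Difference in mean outcomes between treated and untreated people.

    assignment: list of cliques, each a list of 0/1 treatment indicators.
    """
    treated_outcomes, control_outcomes = [], []
    for clique in assignment:
        share = sum(clique) / len(clique)
        for treated in clique:
            target = treated_outcomes if treated else control_outcomes
            target.append(outcome(treated, share))
    return mean(treated_outcomes) - mean(control_outcomes)


# Design 1: one person per clique gets the money (individual-level assignment).
individual_design = [[1] + [0] * (CLIQUE_SIZE - 1) for _ in range(N_CLIQUES)]

# Design 2: every member of every second clique gets the money (clustered assignment).
cluster_design = [
    [1] * CLIQUE_SIZE if i % 2 == 0 else [0] * CLIQUE_SIZE
    for i in range(N_CLIQUES)
]

individual_estimate = estimate_effect(individual_design)
cluster_estimate = estimate_effect(cluster_design)
print(individual_estimate, cluster_estimate)
```

In this toy world the individual design recovers only the direct effect, because treated and untreated people sit in the same, mostly untreated cliques. The clustered design picks up the direct effect plus the spillover. Neither number is wrong; they answer different questions.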

Now, it’s easy to come up with these kinds of assumption-destroying scenarios, but a whole different ordeal to study them empirically. We need to simplify reality in order to deal with it. The question is this: How much of an abstraction can a map (i.e. a model in a research study, making those simplified assumptions) be, in order to still represent reality adequately? This is also an ontological question, because if you take the complexity perspective seriously, you say bye-bye to the kind of thinking that allows you to dream up predictable effects a button-press (such as a policy change) has over the state of a system. People who act in—or try steering—complex systems, control almost nothing but influence almost everything.

An actor in a complex system controls almost nothing but influences almost everything.

Scott E. Page

Is some information, some model, still better than none? Maybe. Maybe not. In Helsinki, you’re better off without a map, than with a map of Stockholm – the so-called “Best map fallacy” (explained here in detail). Rare, highly influential events drive the behaviour of complex systems: the Finnish economy was not electrified by average companies starting to sell more, but by Nokia hitting the jackpot. And these events are very hard, if not impossible, to predict✱.

Ok, back to basic income again. I must say that the people who devised the experiment were not idiots, and included e.g. interviews of people to get some idea about unexpected effects. I think that this type of an approach is definitely necessary when dealing with complexity, and all social interventions should include qualitative data in their evaluation. But, again, unless the unemployed don’t interact, with randomisation done individually you’re studying a different thing than when it’s done in clusters. I do wonder if it would have been possible to include some matched clusters, to see if any qualitatively different dynamics take place, when you give basic income to a whole area instead of randomly picked individuals within it.

[Figure: Complex systems organizational map]
The society is a complex system, and must be studied as such. Figure: Hiroki Sayama

But, to wrap up this flow of thought, I’m curious if you think it is possible to randomise a social intervention individually AND always keep in mind that the conclusions are only valid if there are no interactions between people’s behaviour and that of their neighbours. Or is it inevitable that the human mind smoothes out the details?

Importantly: Is our map better now, than it was before? Will this particular experiment go down in history as showing that basic income has no effect on job seeking, as in the economist’s statement that “there were no first-order effects”? (Remember, the aim was only 80% statistical power.) Lastly, I want to say I consider it unforgivable to only work within one discipline and disregard the larger world you’re operating in: When we bring science to policy making, we must be doubly cautious of the assumptions our conclusions stand on. Luckily, transparent scientific methodology allows us to be explicit about them.

Let me hear your thoughts, and especially objections, on Twitter, or by email!

✱ One solution is to harness convexity, which can be oversimplified like this:

  1. Unpredictable things will happen, and they will make you either better or worse off.
  2. Magnitude of an event is different from its effect on you, i.e. there are huge events that don’t impact you at all, and small events that are highly meaningful to you. Often that impact depends on the interdependence and connectedness dials.
  3. To an extent, you can control the impact an event has on you.
  4. You want to control exposure in such a way, that surprise losses are bounded, while surprise gains are as limitless as possible.
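The four points above can be sketched with a tiny numerical example. The shocks and the loss floor are made-up numbers, and "bounded losses" is modelled here simply as a floor on how much a surprise can cost you; real exposures are of course messier.

```python
# Toy illustration of convex exposure: same surprises, different impact.
# The shocks and the floor are hypothetical numbers.
shocks = [-10, -1, 0, 1, 10]   # unpredictable events, symmetric around zero
LOSS_FLOOR = -1                # the most a surprise is allowed to cost you


def linear_exposure(shock):
    """You absorb every surprise in full, good or bad."""
    return shock


def convex_exposure(shock):
    """Losses are capped at LOSS_FLOOR, gains are left unlimited."""
    return max(shock, LOSS_FLOOR)


def mean(values):
    return sum(values) / len(values)


linear_payoffs = [linear_exposure(s) for s in shocks]
convex_payoffs = [convex_exposure(s) for s in shocks]

print(mean(linear_payoffs), min(linear_payoffs))
print(mean(convex_payoffs), min(convex_payoffs))
```

With the same symmetric surprises, the linear exposure averages out to zero and can lose 10, whereas the capped exposure averages 1.8 and never loses more than 1. You did not need to predict any single event to get there.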

Idiography illustrated: Things you miss when averaging people

This post contains slides I made to illustrate some points about phenomena, which will remain forever out of reach, if we continue the common practice of always averaging individual data. For another post on perils of averaging, check this out, and for an overview of idiographic research with resources, see here.  

(Almost the same presentation with some narration is included in this thread, in case you want more explanation.)

Here’s one more illustration of why you need the right sampling frequency for whatever it is you study – and the less you know, the denser sampling you need initially. From a paper I’m drafting:


The figure illustrates a hypothetical percentage of a person’s maximum motivation (y-axis) measured on different days (x-axis). Panels: 

  • A) Measurement on three time points (representing conventional evaluation of baseline, post-intervention and a longer-term follow-up) shows a decreasing trend.
  • B) Measurement on slightly different days shows an opposite trend.
  • C) Measuring 40 time points instead of three would have accommodated both phenomena.
  • D) New linear regression line (dashed) as well as the LOESS regression line (solid), with potentially important processes taking place during the circled data points.
  • E) Having measured 400 time points would have revealed a process of “deterministic chaos” instead. Not knowing the equation and the starting points, it would be impossible to predict accurately, but this doesn’t mean regression is helpful.
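The flip between panels A and B is easy to reproduce with a toy series. The sketch below assumes the logistic map as the source of "deterministic chaos"; that choice, the starting value and the measurement days are my assumptions for illustration, not the equation behind the figure.

```python
# Toy illustration: three-point designs on a chaotic series can show opposite trends.
# The logistic map, r, x0 and the chosen days are assumptions for this sketch.

def logistic_map(n_days, r=4.0, x0=0.2):
    """Return n_days values of x[t+1] = r * x[t] * (1 - x[t]), all within [0, 1]."""
    series = [x0]
    for _ in range(n_days - 1):
        series.append(r * series[-1] * (1.0 - series[-1]))
    return series


def ols_slope(days, values):
    """Slope of the ordinary least squares line through (day, value) points."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    sxx = sum((x - mean_x) ** 2 for x in days)
    return sxy / sxx


motivation = logistic_map(400)   # share of maximum motivation, one value per day

design_a = [0, 1, 2]             # "baseline, post-intervention, follow-up"
design_b = [1, 2, 3]             # the same design, shifted by a single day

slope_a = ols_slope(design_a, [motivation[d] for d in design_a])
slope_b = ols_slope(design_b, [motivation[d] for d in design_b])
print(slope_a, slope_b)
```

With these settings the first design shows an increasing trend and the second a decreasing one, even though both sample the same fully deterministic process. Shifting the measurement days by one is enough to reverse the story.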

During the presentation, a question came up: How much do we need to know? Do we really care about the “real” dynamics? Personally, I mostly just want information to be useful, so I’d be happy just tinkering with trial and error. Thing is, tinkering may benefit from knowing what has already failed, and where fruitful avenues may lie. My curiosity ends, when we can help people change their behaviour in ways that fulfill the spirit of R.A. Fisher’s criterion for an empirically demonstrable phenomenon:

In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result. (Fisher 1935b/1947, p. 14; see Mayo 2018)

So, if I was a physiology researcher studying the effects of exercise, I would have changed fields (to e.g. PA promotion) when the negative effects of low activity became evident, whereas other people want to learn the exact metabolic pathways by which the thing happens. And I will quit intervention research when we figure out how to create interventions that fail to work <5% of the time.
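One rough way to read "rarely fail" is as statistical power. The sketch below uses the standard normal-approximation formula for comparing two group means; the effect size of 0.5 standard deviations is a hypothetical number, and equating Fisher's criterion with 95% power is my own loose interpretation.

```python
# Rough sketch: how many participants per group for a given power?
# Normal approximation for a two-sided, two-group comparison of means.
import math
from statistics import NormalDist


def n_per_group(effect_size, power, alpha=0.05):
    """Approximate sample size per group for a standardised mean difference."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


HYPOTHETICAL_EFFECT = 0.5   # assumed true effect, in standard deviations

n_conventional = n_per_group(HYPOTHETICAL_EFFECT, power=0.80)
n_rarely_fails = n_per_group(HYPOTHETICAL_EFFECT, power=0.95)
print(n_conventional, n_rarely_fails)
```

Under these assumptions, going from the conventional 80% to 95% power moves the requirement from 63 to 104 people per group. This only covers sampling error; it says nothing about whether the intervention works the same way in the next context.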

Some people say we’re dealing with human phenomena that are so unpredictable and turbulent, that we cannot expect to do much better than we currently do. I disagree with this view, as all the methods I’ve seen used in our field so far are designed for ergodic, stable, linear systems. But there are other kinds of methods, which physicists started using when they left behind the ones that stuck with us, around maybe the 19th century. I’m very excited about learning more at the Complexity Methods for Behavioural Science summer school (here are some slides on what I presume will be among the topics).

Additional resources:

I don’t have examples on e.g. physical activity, because nobody’s done that yet, and lack of good longitudinal within-individual data is a severe historical hindrance. But some research groups are gathering longitudinal continuous data, and one that I know of, has very long time series of machine vision data on school yard physical activity (those are systems, too, just like individuals). Plenty has already been done in the public health sphere.

Hell do I know, this might turn out to be a dead-end, like most new developments tend to be.

But I’d be happy to be convinced that it is an inferior path to our current one 😉


Correlation pitfalls – Happier times with mutual information?

I’ve become increasingly anxious about properties of correlation I never knew existed. I collect resources and stuff on the topic in this post, so that I have everything in one place. Some resources for beginners at the end of the post.

Correlation isn’t causation, and causation doesn’t require correlation. Ok. But have you heard that correlation is not correlation? In other words, things can be dependent without being correlated, and independent though correlated. Ain’t that fun. As Shay Allen Hill describes visually in his excellent, short blog (HIGHLY RECOMMENDED):

[C]ovariance doesn’t actually measure “Does y increase when x increases?” it only measures “Is y above average when x is above average (and by how much)?” And when covariance is broken [i.e. mean doesn’t coincide with median], our correlation function is broken.
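A minimal sketch of dependence without correlation, using made-up data: y is completely determined by x, yet the Pearson correlation is exactly zero, because y is above its average equally often when x is below and above its own average.

```python
# Toy data: y is a deterministic function of x, but the correlation is zero.

def pearson(xs, ys):
    """Pearson correlation computed from population covariance and standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = (sum((x - mean_x) ** 2 for x in xs) / n) ** 0.5
    sd_y = (sum((y - mean_y) ** 2 for y in ys) / n) ** 0.5
    return cov / (sd_x * sd_y)


x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]   # perfectly dependent on x

r_xy = pearson(x, y)
print(r_xy)
```

Knowing x tells you y with certainty here, and the correlation coefficient still reports nothing.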

So there may well be situations, where only 20% of people in the sample show dependence between two variables, and this shows up as a correlation of 37% at minimum. Or when a correlation of 0.5 carries ~4.5 times (and a correlation of 0.75 carries ~12.8 times) more information than a correlation of 0.25. As you may know, in psychology, it’s quite rare to see a correlation of 0.5. But even a correlation of 0.5 only gives 13% more information than random. This prompted the following conversation:

How can we interpret a result without in-depth knowledge of the field as well as the data in question? A partial remedy apparently is using mutual information instead (see this paper draft for more information). I know nothing about it, so like always, I just started playing around with things I don’t understand. Here’s what came out:


The first four panels are Anscombe’s Quartet. The fifth illustrates Taleb’s point about intelligence. Data for the last two panels are from this project. First four and last two panels have the same mean and standard deviation. Code for creating the pic is here.

MIC and BCMI were new to me, but I thought they were easy to implement, which doesn’t of course mean they make sense. But see how they catch the dinosaur?

  • MIC is the Maximal Information Coefficient, from maximal information-based nonparametric exploration (documentation)
  • BCMI stands for Jackknife Bias Corrected MI estimates (documentation)
  • DCOR is distance correlation (see comments)
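As an aside, the information ratios quoted earlier in this post (0.5 carrying ~4.5 times and 0.75 carrying ~12.8 times the information of 0.25) can be reproduced with the textbook formula for mutual information between two jointly Gaussian variables, I = -½·ln(1 − ρ²). That the quoted numbers were derived this way is my assumption, as is reading "13% more information than random" as the share of standard deviation removed by knowing the other variable.

```python
# Sketch: reproducing the quoted information ratios under a bivariate Gaussian assumption.
import math


def gaussian_mutual_information(rho):
    """Mutual information (in nats) of two jointly Gaussian variables with correlation rho."""
    return -0.5 * math.log(1.0 - rho ** 2)


baseline = gaussian_mutual_information(0.25)
ratio_050 = gaussian_mutual_information(0.50) / baseline
ratio_075 = gaussian_mutual_information(0.75) / baseline

# Share of y's standard deviation that is removed once x is known, for rho = 0.5
# (assumed interpretation of the "13%" figure).
sd_reduction_050 = 1.0 - math.sqrt(1.0 - 0.50 ** 2)

print(round(ratio_050, 2), round(ratio_075, 2), round(sd_reduction_050, 3))
```

The point survives the arithmetic: information grows much faster than the correlation coefficient, so differences between small correlations mean very little and differences between large ones mean a lot.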

I’d be happy to hear thoughts and caveats regarding the use of entropy-based dependency measures in general, and these in particular, from people who actually know these methods. Here’s a related Twitter thread, or just email me!

ps. If this is your first brush with uncertainties related to correlations, and/or you have little or no statistics background, you may not know how correlation can vary spectacularly in small samples. Taleb’s stuff (mini-moocs [1, 2]) can sometimes be difficult to grasp without math background, so perhaps get started with this visualisation, or these Excel sheets. A while ago I animated some elementary simulations of p-value distributions for statistical significance of correlations; selective reporting makes things a lot worse than what’s depicted there. If you’re a psychology student, also be sure to check out the p-hacker app. If you haven’t thought about distributions much lately, check this out for a fun read by a math student.

⊂This post has been a formal sacrifice to Rexthor.⊃

Statistical tests for social science

These are slides from my lecture on significance testing, which took place in a course on research methods for social scientists. Some thoughts:

  • I tried to emphasise that this stuff is difficult, that people shouldn’t be afraid to say they don’t know, and that academics should try doing that more, too.
  • I tried to instill a deep memory that many uncertainties are involved in this endeavour, and that mistakes are ok as long as you report the choices you made transparently.
  • Added a small group discussion exercise at about 2/3 of the lecture: What was the most difficult part to understand so far? I think this worked quite well, although “Is this what an existential crisis feels like?” was not an uncommon response.

I really think statistics is mostly impossible to teach, and people learn when they get interested and start finding things out on their own. Not sure how successful this attempt was in doing that. Anyway, slides are available here.

TLDR: If you’re a seasoned researcher, see this. If you’re an aspiring one, start here or here, and read this.


Complexity considerations for intervention (process) evaluation

For some years, I’ve been partly involved in the Let’s Move It intervention project, which targeted dysfunctional physical activity and sedentary behaviour patterns of older adolescents, by affecting their school environment as well as social and psychological factors.

I held a talk at the closing seminar; it was live streamed and is available here (on stage starting from about 1:57:00 in the recording). But if you were there, or are otherwise interested in the slides I promised, they are now here.

For a demonstration of non-stationary processes (which I didn’t talk about but which are mentioned in these slides), check out this video and an experimental mini-MOOC I made. Another blog post touching on some of the issues is found here.



Misleading simplifications and where to find them (Slides & Mini-MOOC 11min)

The gist: to avoid getting fooled by them, we need to name our simplifying assumptions when modeling social scientific data. I’m experimenting with this visual approach to delivering information to those who think modeling is boring; feedback and improvement suggestions very welcome! [Similar presentation with between-individual longitudinal physical activity networks, presented at the Finnish Health Psychology conference: here]

I’m not as smooth as those talking heads on the interweb, so you may want just the slides. Download by clicking on the image below or watch at SlideShare.


misleading assumptions 1st slide



Note: Jan Vanhove thinks we shouldn’t become paranoid with model assumptions; check his related blog post here!