In this post, I present a property of averages I found surprising. Undoubtedly this is self-evident to statisticians and people who can think multivariately, but I personally needed to see it to get a grasp of it. If you’re a researcher, make sure you do the single-item quiz before reading, to see how well your intuitions compare to those of others!
Ooo-oh! Don’t believe what they say is true
Ooo-oh! Their system doesn’t work for you
Ooo-oh! You can be what you want to be
Ooo-oh! You don’t have to join their f*king army
– Anti-Flag: Their System Doesn’t Work For You
In his book “The End of Average”, Todd Rose relates a curious story. In the late 1940s, the US Air Force saw a lot of planes crashing, and those crashes couldn’t be attributed to pilot error or equipment malfunction. On one particularly bad day, 17 pilots crashed without an obvious reason. As everything from cockpits to helmets had been built to conform to the average pilot of 1926, the air force brought in Lt. Gilbert Daniels to see if pilots had gotten bigger since then. Daniels measured 4,063 pilots (who had been preselected not to deviate too much from the average) on ten dimensions: height, chest circumference, arm length, thigh circumference, and so forth.
Before Daniels began, the general assumption was that these pilots were mostly, if not exclusively, average, and that Daniels’ task was to find the most accurate point estimate. But he had a more fundamental idea in mind. He defined “average” generously, as a person who falls within the 30% band around the middle (i.e. the median ±15%), and looked at whether each individual fulfilled that criterion on all ten bodily dimensions.
So, how big a proportion of pilots were found to be average by this metric?
The answer: zero. Not a single one of the 4,063 pilots fell within the average band on all ten dimensions. This may be surprising, until you realise that each additional dimension brings with it a new “objective”, making it less likely that someone achieves all of them. In fact, only a fourth were average on a single dimension, and already fewer than ten percent were average on two dimensions.
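Under an independence assumption, the arithmetic is quick to sketch in R (the simulations later in this post use R as well):

```r
# Chance of being "average" (the middle 30%) on all k independent dimensions
k <- 1:10
round(0.30^k, 4)
#> [1] 0.3000 0.0900 0.0270 0.0081 0.0024 0.0007 0.0002 0.0001 0.0000 0.0000
```

By ten dimensions, the expected proportion is about six in a million, so finding no fully average pilot among 4,063 is hardly a fluke (real body dimensions correlate, which raises the odds somewhat, as the simulations below show, but not nearly enough).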
As you saw in the quiz, I wanted to figure out how big a proportion of our intervention participants could be described as “average” by Daniels’ definition, on four outcome measures. The answer?
A lousy 1.98 percent.
I’m a bit slow, so I had to do a bit of simulation to get a better grasp of the phenomenon (code here). First, I simulated 700 intervention participants, hypothetically measured on four random, uncorrelated, normally distributed variables. In this sample, 0.86% were “average” by the same definition as before. But what if we changed the definition?
Here’s what happens:
As you can see, the definition describes more than half of the sample only when you extend “average” to about the middle 85% (i.e. the median ±42.5%).
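In case the linked code isn’t at hand, here is a minimal sketch of the simulation, as my own reconstruction (assuming uncorrelated standard-normal variables; exact percentages depend on the seed):

```r
set.seed(2018)
n <- 700  # simulated intervention participants
k <- 4    # outcome measures

dat <- matrix(rnorm(n * k), nrow = n)

# Proportion of people falling inside the middle `band` (e.g. 0.30,
# i.e. the median ±15%) on ALL k variables simultaneously
prop_average <- function(dat, band) {
  lims <- apply(dat, 2, quantile, probs = c(0.5 - band / 2, 0.5 + band / 2))
  in_band <- sapply(seq_len(ncol(dat)), function(j) {
    dat[, j] >= lims[1, j] & dat[, j] <= lims[2, j]
  })
  mean(rowSums(in_band) == ncol(dat))
}

prop_average(dat, 0.30)  # close to 0.3^4 = 0.0081, i.e. under one percent

# Sweep the band width to see when "average" describes most of the sample
bands <- seq(0.05, 0.95, by = 0.05)
data.frame(band = bands, prop = sapply(bands, prop_average, dat = dat))
```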
But what if the variables were highly correlated? I also simulated 700 independent participants with four variables that correlated almost perfectly (within-individual r = 0.99). Still, only 22.9% of participants were captured by defining average as the middle 30% around the median. For other definitions, see the plot below.
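A corresponding sketch for the correlated case, reusing `n`, `k` and `prop_average()` from above (the MASS package is assumed for drawing multivariate normals):

```r
library(MASS)

set.seed(2018)
Sigma <- matrix(0.99, k, k)  # within-individual r = 0.99 between variables
diag(Sigma) <- 1
dat_cor <- mvrnorm(n, mu = rep(0, k), Sigma = Sigma)

prop_average(dat_cor, 0.30)  # much higher than the uncorrelated ~1%, yet still a minority
```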
What have we learned? First of all: when you see averages, do not assume that they describe individuals. If you’re designing an intervention, you don’t just want to see which determinants correlate highly with the target behaviour on average, or which seem changeable in the sense that their mean is not very high to begin with in your target group (see the CIBER approach, if you’re starting from scratch and want a preliminary handle on the data). This is because a single individual is unlikely to have the average standing on more than, say, two of the determinants, and individuals are who you’re generally looking to target. One thing you could do is a cluster analysis, looking for the determinant profile best associated with e.g. hospital visits (or attitude/intention), and trying to target the changeable determinants within that profile.
As a corollary: if you, your child, or your relationship doesn’t seem to conform to the dimensions of the average person in your city, your age group, or whatever, this is completely normal! Whenever you see yourself falling behind the average, remember that there are plenty of dimensions where you land above it.
But wait, what happened to the USAF’s problem of crashing planes? Well, the air force told the plane manufacturers to fix the problem of cockpits that fit no individual. The manufacturers said it was impossible and would be extremely costly. But when the air force refused to listen to excuses, cheap and easy solutions appeared quickly. Adjustable seats, now standard equipment in cars, are an example of the new design philosophy of individual fit: instead of fitting the individual to the system, we fit the system to the individual.
Let us conclude with Daniels’ introduction section:
Three additional notes about the average:
Note 1: I’m taking it for granted that we already know the average is a useless statistic to begin with unless you know the variation around it, so I won’t pound on that further. But remember that variables generally aren’t perfectly normally distributed, as they were in the above simulations; my guess is that the situation would be even worse in those cases. Here’s a blog post you may want to check out: On Average, You’re Using the Wrong Average.
Note 2: There’s a curious tendency to think that deviations from the average represent “error” regardless of domain, whereas it’s self-evident that individuals can survive both by being e.g. big and bulky, or small and fast. With psychological measurement, is it not madness to think all participants have an attitude score drawn from a normal distribution with a common mean? To inject some reality into the situation, each participant may have their own mean, which changes over time. But that’s a story for another post.
Note 3: Did I already say that you generally shouldn’t make individual-level conclusions based on between-individual data, unless ergodicity holds (which, in psychology, would be quite weird)?
I recently had a great experience with a StackOverflow question while thinking about how to visualise ordinal data. This post shows one option for doing that. Code for the plots is at the end of this post.
Update: here’s an FB discussion, which mentions e.g. the good idea of making stacked % graphs (though I like to see the individuals, so they won’t sneak up behind me) and using the package TraMineR to visualise and analyse change.
Update 2: Although they have other names too, I’m going to call these things flamethrower plots, just because the name reflects the fact that, even though you have the opportunity to use them, it may not always be the best idea.
Say you have scores on some Likert-type questionnaire items, like motivation, at two time points, and would like to visualise them. You’re especially interested in whether you can see detrimental effects, e.g. due to an intervention. One option would be to make a plot like the one below: each line is one person, lighter lines indicate bigger increases in motivation scores, and darker lines indicate iatrogenic development. The data are simulated so that the highest increases take place in the leftmost item, the middle one is pure randomness, and the rightmost one shows iatrogenics.
I have two questions:
Do these plots have a name, and if not, what should we call them?
How would you go about superimposing model-implied changes, i.e. lines showing where someone who starts off at, for example, a score of four at T1 is likely to end up at T2?
The code below first simulates 500 participants at two time points, then draws the plot. If you want to use it on your own data, name the variables in the form scaleName_itemNumber_timePoint (e.g. “motivation_02_T1”).
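The full code lives at the end of the original post; here is only a minimal sketch of the idea for a single item (ggplot2/tidyr/dplyr assumed; the simulated drift is invented for illustration):

```r
library(ggplot2)
library(tidyr)
library(dplyr)

set.seed(1)
n <- 500
d <- tibble(
  id = seq_len(n),
  motivation_01_T1 = sample(1:7, n, replace = TRUE)
) %>%
  mutate(
    # T2 = T1 plus some random change, clamped to the 1-7 scale
    motivation_01_T2 = pmin(pmax(motivation_01_T1 + sample(-2:2, n, replace = TRUE), 1), 7),
    change = motivation_01_T2 - motivation_01_T1
  )

d_long <- d %>%
  pivot_longer(starts_with("motivation"),
               names_to = c("item", "time"),
               names_pattern = "motivation_(\\d+)_(T\\d)",
               values_to = "score") %>%
  group_by(id) %>%
  # Offset each person's whole line by a constant, so overlapping ordinal
  # scores stay distinguishable without breaking the lines apart
  mutate(score_jittered = score + runif(1, -0.25, 0.25)) %>%
  ungroup()

ggplot(d_long, aes(time, score_jittered, group = id, colour = change)) +
  geom_line(alpha = 0.3) +
  scale_colour_gradient(low = "black", high = "orange") +
  labs(y = "score (jittered)", colour = "T2 - T1")
```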
It was recently brought to my attention that there exist such things as time and context, the flow of which affects human affairs considerably. Then there was this Twitter conversation about what habits actually are. In this post, I try to make sense of how to view health behavioural habits from the perspective of dynamical systems / complexity theory. I mostly draw from this article.
Habits are integral to human behaviour, and arguably necessary to account for in intervention research 1–3. Gardner 1 proposes a definition of habit as not a behaviour but “a process by which a stimulus generates an impulse to act as a result of a learned stimulus-response association”. Processes being seldom stable for all eternity, a complex dynamical systems perspective would propose some consequences of this definition.
What does it mean, when a process—such as habit—is stable? One way of conceiving this is considering the period of stability as a particular state a system can be in, while being subject to change. Barrett 4 proposes four features of dynamic system stability, in which a system’s states depend on the interactions among its components, as well as the system’s interactions with its environment.
First of all, stability always has a time frame, and stabilities at different time frames (such as stability over a month versus over a year) are interdependent. We ought to consider how these time scales interact. For example, some factors which determine one’s motivation to go to the gym, such as mood, fluctuate on the scale of minutes to hours. Others fluctuate on the daily level and can be influenced by how much one slept the previous night or how stressful one’s workday was, whereas yet others fluctuate weekly. Then again, some (which increasingly resemble dispositions or personality factors) may be quite stable across decades. When inspecting a health behaviour, we ought to look, at minimum, at the processes taking place on a time scale one level faster and one level slower than the one we are purportedly interested in 4. For example, how do daily levels of physical activity relate to weekly ones, and how do monthly fluctuations affect the weekly fluctuations? Health psychologists could also classify each determinant of a health behaviour based on the time scale it is thought to operate on. For example, if autonomous forms of motivation 5 seem to predict physical activity quite well cross-sectionally, we could attempt to measure it for a hundred days and investigate what the relevant time scales of fluctuation are, in relation to those of the target behaviour. Such an exercise could also help in deciding on the sampling frequency of experience sampling studies.
Second, processes in systems such as people have their characteristic attractor landscapes, and these landscapes can possibly be spelled out, along with the criteria associated with them. By attractors I mean here behaviours a person is drawn to, and an attractor landscape is the conglomerate of these behaviours. The cue structure of the behaviours can be quite elaborate. For example, a person may smoke only when they have drunk alcohol (1) in a loud environment (2), among a relatively large group (3) of relatively unfamiliar people (4), one or two of whom are smokers (5); a situation where it is easier to have a private conversation if one joins another to go out for a cigarette. This highlights how the process of this person’s smoking habit can be very stable (mapping to the traditional conception of “habitual”), while also being highly infrequent.
Note: each of the aforementioned conditions is insufficient by itself, although all are needed to trigger smoking in this context. As a whole, they are sufficient to cause the person to smoke, but not strictly necessary, because the person may also smoke in some other, more-or-less limited, conditions. Such conditions are known as INUS conditions (Insufficient but Necessary parts of an Unnecessary but Sufficient context for the behaviour) 6. Let that sink in a bit. As a corollary, if a criterion really is necessary, it may be an attractive target for intervention.
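To make the logic concrete, here is a toy sketch (the predicates are hypothetical, not a model of anyone’s actual smoking):

```r
# Hypothetical sketch of the INUS structure for the smoking example above.
party_context <- function(drank_alcohol, loud, large_group,
                          unfamiliar_people, smokers_present) {
  # Each cue alone is insufficient; all are necessary for THIS context.
  drank_alcohol && loud && large_group && unfamiliar_people && smokers_present
}

smokes <- function(in_party_context, other_context = FALSE) {
  # The party context is sufficient but unnecessary: some other
  # (hypothetical) context could also trigger the behaviour.
  in_party_context || other_context
}

smokes(party_context(TRUE, TRUE, TRUE, TRUE, TRUE))   # TRUE
smokes(party_context(TRUE, TRUE, TRUE, TRUE, FALSE))  # FALSE: one cue missing
```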
Third, the path through which change happens matters, a lot. Even when all determinants of behaviour are at the same values, the outcome may be very different depending on previous values of the outcome. This phenomenon is known as hysteresis, and it has been observed in various fields, from physics (e.g. the form of a magnetic field depends on its past) to psychology (e.g. once a person becomes depressed due to excess stress, the stress level must drop much lower for them to switch back to the normal state than was needed for the shift into depression; 7). As a health behaviour example, just imagine how much easier it is to switch from a consistent training regime to doing no exercise at all than the other way around. Another way to think about it is to consider that systems are “influenced by the residual stability of an antecedent regime” 4. Since stability is “just” a particular type of path-dependent dynamic process 4,8, we need to consider the history leading up to the period where a habit is active. This forces investigators to consider attractor patterns and sensitivity to initial conditions: when did this stable (or attractor) state come about? If interactions in a system create the state of the system, which bio-psycho-social interactions are contributing to the stable state in question?
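A toy sketch of hysteresis, with invented thresholds: the state flips at a high stress level but only flips back at a much lower one, so the same intermediate stress value yields different outcomes depending on the path taken:

```r
simulate_state <- function(stress_series) {
  state <- "normal"
  sapply(stress_series, function(s) {
    if (state == "normal" && s > 7) state <<- "depressed"
    if (state == "depressed" && s < 3) state <<- "normal"
    state
  })
}

up_down <- c(1, 4, 6, 8, 6, 4, 2)  # stress rises past 7, then falls below 3
simulate_state(up_down)
# At stress = 4 on the way up the state is "normal"; at stress = 4 on the
# way down it is still "depressed": same input, path-dependent outcome.
```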
Fourth, learning processes such as those happening due to interventions usually affect a cluster of variables’ stabilities, not just one of them. To change habits, we naturally need to consider which changeable processes should be targeted, but it is probably impossible to manipulate these processes in isolation. This has been dubbed the “fat finger problem” (Borsboom 2018, personal communication); trying to change a specific variable, like attempting to press a specific key on the keyboard with gloves on, almost invariably ends up affecting neighbouring variables. Our target is dynamic and interconnected, often calling for coevolution of the intervention and the intervened.
It is obvious that people can relapse to their old habitual (attractor) behaviour after an intervention, and likely that extinction, unlearning and overwriting of cue-response patterns can help in breaking habits, whatever the definition. But the complex dynamics perspective puts a special emphasis on understanding the time scale and history of the intervenable processes, as well as highlighting the difficulty of changing one process while holding others constant, as the classical experimental setup would propose.
I would be curious to hear thoughts about these clearly unfinished ideas.
Gardner, B. A review and analysis of the use of ‘habit’ in understanding, predicting and influencing health-related behaviour. Health Psychol. Rev. 9, 277–295 (2015).
Wood, W. Habit in Personality and Social Psychology. Personal. Soc. Psychol. Rev. 21, 389–403 (2017).
Wood, W. & Rünger, D. Psychology of Habit. Annu. Rev. Psychol. 67, 289–314 (2016).
Barrett, N. F. A dynamic systems view of habits. Front. Hum. Neurosci. 8, (2014).
Ryan, R. M. & Deci, E. L. Self-determination theory: Basic psychological needs in motivation, development, and wellness. (Guilford Publications, 2017).
Mackie, J. L. Causes and Conditions. Am. Philos. Q. 2, 245–264 (1965).
Cramer, A. O. J. et al. Major Depression as a Complex Dynamic System. PLoS ONE 11, (2016).
Roe, R. A. Test validity from a temporal perspective: Incorporating time in validation research. Eur. J. Work Organ. Psychol. 23, 754–768 (2014).
When Roger Giner-Sorolla lamented to me three years ago how annoying it can be to dig interesting methods/results information out of a manuscript with a carefully crafted narrative, I wholeheartedly agreed. When I saw the 100%CI post on reproducible websites a year ago, I thought it was cool but way too tech-y for me.
Well, it turned out that once you learn a tiny bit of elementary R Markdown, you can follow idiot-proof instructions on how to make cool websites out of your analysis code. I was also working on the manuscript version of my Master’s thesis, and realised several commenters thought much of the methods stuff I considered interesting was just unnecessary and/or boring.
So I made this thing of what I thought was the beef of the paper (also, to motivate me to finally submit that damned piece):
It got me thinking: perhaps we could create a parallel form of literature, where (open) highly technical and (closed) traditionally narrated documents coexist. The R Markdown research notes could be read with only a preregistration or a blog post to guide the reader, while the journals could continue with business as usual. The great thing is that, as Ruben Arslan pointed out in the 100%CI post, you can present a lot of results and analyses, which is nice if you’d do them anyway and data sharing is a no-no in your field. In general, if there’s too much conservative inertia in your field, this could be a way around it: let the to-be-extinct journals build paywalls around your articles, but make the important things openly available. The people who get pissed off by that sort of stuff rarely look at technical supplements anyway 🙂
I’d love to hear your thoughts on the feasibility of the approach, as well as on how to improve such supplements!
After some insightful comments by Gjalt-Jorn Peters, I started thinking about how this could be abused. We’ve already seen how e.g. preregistration can be used as a signal of illusory quality (1, 2), and supplements like this could do the same. Someone could bluff by cramming the thing full of difficult-to-interpret analyses and claiming “hey, it’s all there!”. One helpful norm is to expect heavy use of visualisations, which are less morbid to look at than numeric tables and raw R output. Another option would be creating a wonderful Shiny app, like Emorie Beck did.
Actually, let’s take a moment to marvel at how super awesomesauce that thing is.
So, to continue: I don’t know how difficult it really is to make such a thing. I’m sure a lot of tech-savvy people will readily say it’s the simplest thing in the world, and I’m sure a lot of people will see the supplements I presented here as a shitton of learning to do. I don’t have a solution. But if you’re a PI, you can do both yourself and your doctoral students a favour by nudging them towards learning R; maybe they’ll make a Shiny app (or whatever’s in season then) for you one day!
ps. If I were to do the R Markdown all over again, I’d make more and better plots, and put more emphasis on readability, including better annotation of my code and decisions. Some of that code is from when I first learned R, and it’s a bit … rough. (At the last moment before submitting my Master’s thesis I decided, in a small state of frustrated fury, to re-do all the analyses in R so that I needn’t mention SPSS or Excel in the thesis…)
pps. In the manuscript, I link to the page via a GitHub Pages URL shortener, but provide a permalink (the web page stored with the Wayback Machine) in the references. We’ll see what the journal thinks of that.
ppps. There are probably errors lurking around, so please notify me when you see them 🙂
After half a century of talk, the research community is putting forth genuine efforts to improve social scientific practices in 2018. This is a presentation for the University of Helsinki Faculty of Social Sciences on recent developments in statistical practices and publishing reforms. Update: a slightly modified version of the presentation, held in Aberdeen, is here!
Nota bene: If the embedded slide deck below doesn’t work, download a pdf here.
With the realisation that even linked data may not be enough for scientists (1), and with the European Union deciding to embrace open access and best practices in data management (2–4), many psychologists find themselves treading on unfamiliar terrain. Given that an estimated ~85% of health research is wasted, this is nothing short of a pressing issue in related fields.
Here, I comment on the FAIR Guiding Principles for scientific data management and stewardship (5) for the benefit of myself and perhaps others, who have not been involved with data management best practices.
[Note: all this does NOT mean that you are forced to share sensitive data. But if your work cannot be checked or reused (even after anonymisation), calling it scientific might be a stretch.]
What goes in a data management plan?
A necessary document to accompany any research plan is the data management plan. This plan should first of all specify the purpose of the data collection and how it relates to the objectives of the research project. It should state which types of data are collected; for example, in an intervention to promote physical activity, one might collect survey data as well as accelerometer and body composition measures. The steps taken to assure the quality of the data can be described, too.
Next, the file formats for the data should be specified, along with which parts of the data will be made openly available, if the whole data is not made so. When and where will the data be made available, and what software is needed to read it? Will there be restrictions on access? Will there be an embargo, and if so, why?
The data management plan should also state whether existing data is being re-used. The researcher should clarify the origin of the data, whether existing or new, comment on its size (if known), and outline for whom the data will be useful (4).
Bad practices leading to unusable data are still common, and adopting proper data management practices can incur costs. The data management plan should explicate these costs, state how they are covered, and name who is responsible for the data management process.
The importance of collecting original data in psychology cannot be overstated. Data are a conditio sine qua non for any empirical science. Anyone who generates data and shares them publicly should be adequately recognized. (6)
Note: metadata means any information about the data. For example, descriptive metadata increases discovery and identification, and includes elements such as keywords, title, abstract, and author. Administrative metadata informs the management of the data: creation dates, file types, version numbers.
The FAIR principles for data management
The FAIR principles have been composed to help both machines and humans (such as meta-analysts) find and use existing data. The principles consist of four requirements: Findability, Accessibility, Interoperability and Reusability. Note that adherence to these principles is not a yes-no question, but a gradient along which data stewards should aspire to increased uptake.
Below, the exact formulation of the (sub-)principles is in italics, my comments in bullet points.
F1. data are assigned a globally unique and eternally persistent identifier.
This is mostly handled in psychological research by making sure the research output is supplied with a DOI (Digital Object Identifier; 7). In addition to journals (for published research), most repositories where one can deposit materials (such as Figshare or Zenodo) or preprints (such as PsyArXiv) assign the work a DOI automatically.
F2. data are described with rich metadata.
This relates to R1 below. There should be data about the data, telling you what the data is. Also: what is your approach to making versioning clear? In the Open Science Framework (OSF), you can upload new versions of a document, and it automatically saves the previous version behind the new one, given that the new file has the same name as the old one.
Your data archive can also help you with metadata; e.g. the Finnish Social Science Data Archive (FSD) uses the DDI 2.1 metadata standard.
F3. data are registered or indexed in a searchable resource.
The researcher should deposit the data in a searchable repository. Your own website, or the website of your research group, is unfortunately not enough.
F4. metadata specify the data identifier.
Make sure your data actually shows its DOI somewhere, and include a link to the dataset in the metadata. As far as I know, repositories such as the OSF do this for you.
A1. data are retrievable by their identifier using a standardized communications protocol.
A1.1 the protocol is open, free, and universally implementable.
A1.2 the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
From what I understand, these are not too relevant to individual researchers. Basically, if your work can be accessed via “http://”, you are complying. You should also be mindful of storing your data in one repository only, to avoid having multiple DOIs. Regarding A2: if your data is sensitive and you cannot share it openly, the description of the data should still be accessible to researchers. I am not certain how repositories deal with accessibility after the data has been taken offline.
Behind these items (and the FAIR principles in general) is the idea that machines could read the data and mine it for e.g. meta-analyses. I am blissfully unaware of the intricacies related to that endeavour, so I just comment from the perspective of a common researcher here.
I1. data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
It is better to prefer simple formats that can be opened without special software (e.g. comma-separated values, “file.csv”) over proprietary ones (e.g. SPSS, “file.sav”).
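For instance, a minimal sketch in R (the haven package and the file names are assumptions for illustration):

```r
library(haven)

dat <- read_sav("file.sav")                    # proprietary SPSS format
write.csv(dat, "file.csv", row.names = FALSE)  # open, software-agnostic format
```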
I2. data use vocabularies that follow FAIR principles.
This principle may seem somewhat vague and hard to grasp for anyone but computer scientists. It relates to the index terms or glossaries used. In psychology, one possibility would be the APA thesaurus used by PsycINFO.
I3. data include qualified references to other (meta)data.
This should be a given, and the citation culture of psychology seems well-equipped to follow it. But it is still important to cite the original source of questionnaires, accelerometer algorithms, etc.
R1. data have a plurality of accurate and relevant attributes.
This means that the research should be accompanied by e.g. tags or a description that gives information seekers sufficient information to determine its reuse value.
R1.1. data are released with a clear and accessible data usage license.
You should state what licence the work is under. It is commonly recommended to use “CC0”, which allows all reuse, even without attribution. The second-best alternative, “CC-BY” (which requires attribution), can lead to interpretation problems of attribution stacking, when licences pile on top of each other (see chapter 10.4 in reference 8). Citing others’ work is an accepted practice in psychology anyway, so CC0 seems a reasonable option, though I sympathise with the (almost invariably unfounded) fear of being scooped.
R1.2. data are associated with their provenance.
This means that the source of the data is clear, so that the data can be cited.
R1.3. data meet domain-relevant community standards.
In psychology, there are not many well-known community standards, but e.g. the DFG guidelines (6) are showing the way.
The FAIR principles can be hard to comply with exhaustively, as they are sometimes difficult to interpret (even by people who work in data archives) and take a lot of effort to implement. Hence, everyone should consider whether their data is FAIR enough. As with open data in general, one should be able to describe why best practices could not be followed, when that is the case. But, for the sake of ethics if nothing else, we should aim to do the best we can.
Additional information on the FAIR principles can be found here, and some difficulties in assessing the adherence to them in (9). A 20min webinar in Finnish is available here.
Bechhofer S, Buchan I, De Roure D, Missier P, Ainsworth J, Bhagat J, et al. Why linked data is not enough for scientists. Future Gener Comput Syst. 2013;29(2):599–611.
Wilkinson MD, Dumontier M, Aalbersberg IjJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016 Mar 15;3:160018.
Schönbrodt F, Gollwitzer M, Abele-Brehm A. Data Management in Psychological Science: Specification of the DFG Guidelines [Internet]. 2017 [cited 2017 Mar 29]. Available from: https://osf.io/preprints/psyarxiv/vhx89
International DOI Foundation. Digital Object Identifier System FAQs [Internet]. [cited 2017 Mar 29]. Available from: https://www.doi.org/faq.html