3: Research Meta-Methods#

  • As I was outlining this chapter, I fell prey to one of the misperceptions about research I warned you about earlier in this guide. I kept having this diffuse, low-level anxiety about whether I’d listed all of the available meta-methods. “Are there really only 3 categories of methods?”, I kept asking myself. Then I realized: I was doing what I warned you about; I was making research into this big, opaque, un-knowable domain of massive complexity.

  • However, when you read a bunch of research papers across a variety of domains, you pretty quickly find that there are only 3 categories of methods, or meta-methods as I’m calling them here. There are many more specific methods (surveys vs. interviews vs. direct observation, and so on). But there really are just 3 high-level buckets that describe the overall method, or approach, that most research efforts are based on. In this chapter, we’ll come to understand these meta-methods. One of the 3 works better if we subdivide it into 2, so you’ll see 4 meta-methods described below.

  • When you’re designing a SSR initiative, you might think of your choice of meta-method as a research strategy. Posed as a question, you’re asking yourself, “Which of these 4 meta-methods is the best approach to help me answer my research question?”

  • Overall:

    • The four research meta-methods are:

      • Measurement

        • Measuring prevalence

        • Measuring the under-measured

      • Testing a hypothesis, often done with a toy version of a more complex system

      • Structured observation

    • Each of these meta-methods has its place. Each one can be used well or badly. None of them is a “research superpower”, though if we were arbitrarily limited to one method, I’d propose that structured observation is our “desert island meta-method”. A SSR effort may involve combining two or more of these meta-methods. For example, you might begin with structured observation and use that meta-method to develop a theory that you then stress-test by building a toy and using that toy to run experiments to see if your theory holds up. Your choice of which meta-method(s) to use will be driven partially by your level of comfort with the meta-method and mostly by the specifics of your research question and the system you are trying to learn more about.

1: Measuring Prevalence#

  • Here are some questions where measuring prevalence is the meta-method likely to be chosen:

    • “We think employee morale is a problem, but don’t know how big a problem.”

    • “We’d like to know how happy or unhappy our customers are with our business.”

    • “We think we can claim our marketing software is the #1 in the marketplace, but we’re not sure yet.”

    • “We’d like to improve our service or product, but don’t know which of these 6 ideas to prioritize.”

  • The meta-method of measuring prevalence is simply measuring how often something occurs within the group (known as the sample population, or simply as the sample) that you measure. The something that you are measuring could be:

    • A quality (“I am mostly dissatisfied with my manager.”)

    • A state (“I am currently using Mailchimp for email marketing; last year I used Active Campaign.”)

    • A preference (“I prefer you improve your product using idea 3.”)
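  • To make the arithmetic concrete, here is a minimal Python sketch of measuring prevalence for a yes/no question. The survey numbers are hypothetical, and the Wilson score interval is my addition rather than something this chapter prescribes: it's one standard way to express how uncertain a prevalence figure from a small sample is.

```python
import math

def prevalence(responses: list[bool]) -> float:
    """Share of respondents for whom the measured thing is true."""
    return sum(responses) / len(responses)

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical survey: 12 of 40 employees say "I am mostly dissatisfied with my manager."
responses = [True] * 12 + [False] * 28
low, high = wilson_interval(sum(responses), len(responses))
print(f"prevalence {prevalence(responses):.0%}, 95% interval {low:.0%} to {high:.0%}")
```

  • With these hypothetical numbers the point estimate is 30%, and the interval runs from roughly 18% to 45%, a useful reminder that a dashboard number built from 40 responses is fuzzier than it looks.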

  • This meta-method tends to produce a dashboard-level number that gets further abstracted into “great/good/OK/bad/emergency”. It’s tempting to be dismissive of this kind of collapsing of nuance into a single number, but such approaches have their place! If we regarded every aspect of the world with a full-resolution view at all times, we’d be constantly overwhelmed. While driving an internal combustion engine car, we don’t need to know exactly how many ounces or liters of fuel are in the tank, how many amps are being drawn from the battery, or the precise temperature of the engine oil and coolant. Simple “OK”, “get this checked soon”, and “pull over and turn off the engine NOW” indicators are the right level of resolution for the job we’re doing as motorists. The mechanic will pull out tools that measure with higher resolution in order to accomplish their job.

  • The same is true with SSR. Sometimes a simple number helps prioritize resources effectively (“Is this really a big enough problem to spend $X on?”) or provides a sufficient heads-up that more attention, detail, and nuance are needed in the area being measured.

  • As I write this section, I notice this little devil on my shoulder. It wants to subtly criticize the value of the measuring prevalence meta-method. My personal bias towards over-engineering is why this little devil is there, and I bet I’m not the only one who has this bias. So let’s end this description of this meta-method with a reminder of how almost all of us labor under a distorted view of the world, and how simple measures of prevalence can give us a much more accurate picture of reality.

  • A market research outfit called YouGov, which claims to have a panel with 9 million members, asked respondents to “guess the percentage (ranging from 0% to 100%) of American adults who are members of 43 different groups, including racial and religious groups, as well as other less frequently studied groups, such as pet owners and those who are left-handed.” In other words, they asked people to measure the prevalence of these 43 groups by referencing their internal mental model of the world. This isn’t a direct measurement of prevalence, but rather an assessment of how distorted our mental model of reality is.

  • Certainly some of the groups in YouGov’s survey are groups that can’t be precisely defined. As a trivial example, my wife is mostly left-handed, but does some tasks with her right hand. Is she left-handed or not? Nevertheless, YouGov’s findings show significant distortions between our mental model of reality and reality itself:

  • When people’s average perceptions of group sizes are compared to actual population estimates, an intriguing pattern emerges: Americans tend to vastly overestimate the size of minority groups. This holds for sexual minorities, including the proportion of gays and lesbians (estimate: 30%, true: 3%), bisexuals (estimate: 29%, true: 4%), and people who are transgender (estimate: 21%, true: 0.6%).

  • It also applies to religious minorities, such as Muslim Americans (estimate: 27%, true: 1%) and Jewish Americans (estimate: 30%, true: 2%). And we find the same sorts of overestimates for racial and ethnic minorities, such as Native Americans (estimate: 27%, true: 1%), Asian Americans (estimate: 29%, true: 6%), and Black Americans (estimate: 41%, true: 12%).

  • A parallel pattern emerges when we look at estimates of majority groups: People tend to underestimate rather than overestimate their size relative to their actual share of the adult population. For instance, we find that people underestimate the proportion of American adults who are Christian (estimate: 58%, true: 70%) and the proportion who have at least a high school degree (estimate: 65%, true: 89%).

  • Although there is some question-by-question variability, the results from our survey show that inaccurate perceptions of group size are not limited to the types of socially charged group divisions typically explored in similar studies: race, religion, sexuality, education, and income. Americans are equally likely to misestimate the size of less widely discussed groups, such as adults who are left-handed. While respondents estimated that 34% of U.S. adults are left-handed, the real estimate lies closer to 10-12%. Similar misperceptions are found regarding the proportion of American adults who own a pet, have read a book in the past year, or reside in various cities or states. This suggests that errors in judgment are not due to the specific context surrounding a certain group.

  • Source: https://today.yougov.com/topics/politics/articles-reports/2022/03/15/americans-misestimate-small-subgroups-population
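  • One quick way to see the pattern in these numbers is to divide each average estimate by the true figure: a ratio above 1 means the group was overestimated, and a ratio below 1 means it was underestimated. This short Python sketch does only that arithmetic on the figures quoted above (left-handedness is left out because the article gives a range rather than a single true value).

```python
# (average estimate %, actual %) pairs, copied from the YouGov figures quoted above.
groups = {
    "gays and lesbians": (30, 3),
    "bisexuals": (29, 4),
    "transgender": (21, 0.6),
    "Muslim Americans": (27, 1),
    "Jewish Americans": (30, 2),
    "Native Americans": (27, 1),
    "Asian Americans": (29, 6),
    "Black Americans": (41, 12),
    "Christian": (58, 70),
    "at least a high school degree": (65, 89),
}

def misestimate_ratio(estimate: float, actual: float) -> float:
    """How many times larger the perceived share is than the actual share."""
    return estimate / actual

# Print the groups from most overestimated to most underestimated.
for name, (estimate, actual) in sorted(
    groups.items(), key=lambda item: misestimate_ratio(*item[1]), reverse=True
):
    print(f"{name:<30} {estimate:>4}% vs {actual:>4}%  ratio {misestimate_ratio(estimate, actual):.2f}")
```

  • The two majority groups are the only ones with ratios below 1, and the smallest groups (transgender people, Muslim Americans, and Native Americans) show the largest distortions.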

  • Another fun example for you: take a moment to guess the American city with the highest per-capita number of millionaire households. Which city is #1?

  • I wonder how many of you guessed Los Alamos, NM? Here’s a photo I made of Los Alamos on a recent trip to the @TK:science museum. Doesn’t look much like a city full of millionaires, does it?

  • @TODO:pic

  • Before you feel good or bad about how accurate your mental model of the world of millionaires is, notice that I used the word city. If I’d used the word metro area in my question, then the New York metro area becomes the #1 per capita concentration of millionaires, which is what I bet most of you guessed. This points out another important aspect of measurement: how you define the thing you are measuring (where are the edges? What’s in and what’s outside of those edges?) matters quite a lot.

  • Source: https://www.phoenixmi.com/learn/do-you-live-among-millionaires/ (also: https://web.archive.org/web/20190418124333/https://www.lamonitor.com/content/los-alamos-ranked-highest-wealthy-households)

  • A final example: for years, the foundation of my revenue model was selling educational services to self-employed tech people, primarily software developers and indie consultants. From my perspective at that time, most self-employed tech people I knew invested hundreds or thousands of dollars per year in their professional development (this is a form of sample bias, which I’ll discuss in the next chapter). I ran a SSR project to answer the question: how do self-employed software developers invest in their career? (Yes, maybe I should have thought to pose this question before building a services business around it… :) ) Here’s what my SSR uncovered:

  • My study participants – self-employed software developers – strongly prefer individual, asynchronous learning styles (books and courses, primarily) over group, synchronous styles (realtime courses or mentoring). IRL conferences are the primary exception to this pattern. New business opportunity for this group comes from relationships (networking and past clients), paying attention to changes in the market, and persistence in pursuing opportunity.

  • Source: https://researchnotes.philipmorganconsulting.com/philip-morgan-research-notes/past-research/self-employed-devs-career-investment/

  • In other words, the prevalence of the kind of people who would eagerly pay for the services my business sold was very low. A small niche of a huge market is not necessarily a too-small market, but having a somewhat more accurate mental model of the market I was selling to would have been advantageous.

  • The meta-method of measuring prevalence seems simple, but it contains some complexity! It might seem low-value if you’re prone to over-engineering, but it can offer significant value when it comes to enhancing the accuracy of our mental model of the world! Measuring prevalence is often a good starting point for SSR, but we don’t want this meta-method to impose a ceiling on our ability to understand detail and nuance. We’ll look at one more measurement-centric meta-method before we move on to the 2 meta-methods that surface more detail and nuance.

2: Measuring The Under-Measured#

  • @NOTE: I’m struggling to make this section clear and coherent.

  • Measuring the under-measured is basically Douglas Hubbard’s jam. I discussed Doug in the previous chapter and won’t re-introduce him here, but I want to remind you that his work focuses on making this meta-method useful in a business context. Measuring the under-measured is similar to measuring prevalence, but it’s worth calling out separately because choosing this meta-method trickles down into somewhat or very different research designs and methods.

  • Measuring the under-measured is different from measuring prevalence. To compare the two:

    • Measuring Prevalence

      • You’re asking “How common or uncommon is this thing?”

      • Your answer will usually be expressed as a percentage, which is a comparison of the relative size of the measured thing to the whole. For example, if your measuring prevalence question is “How many of our employees are unhappy with their manager?”, the answer will almost certainly be expressed as a percentage, or in the form of x out of y employees.

      • Percentages make the midpoint of 50% artificially important. It’s easy to see 40% of employees being unhappy as significantly different than 60% being unhappy, even though both situations are essentially the same level of severity for the business. Percentages get us thinking in terms of majorities and minorities, which can be a dangerously un-nuanced way to view the world.

    • Measuring The Under-Measured

      • You are usually trying to answer a more specific question, or to estimate a second-order consequence of something. For example:

        • How big/small is this?

        • How likely/unlikely is this?

        • How much would this cost?

        • What does this component that’s ordinarily integrated into or concealed inside a larger whole look like when pulled out of the whole?

      • Your answer will typically be a raw measurement that doesn’t need a comparison point to be useful.

  • Here are some questions where measuring the under-measured is the meta-method likely to be chosen:

    • What would it cost us if supply chain disruptions shut down our production for a day, week, month, etc.?

    • Are the cost and likelihood of that potential supply chain shutdown big enough that we should invest in redundancy?

    • Of the 10 steps in this process, which step is the least efficient or most failure-prone?

    • TODO MOAR examples
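  • The redundancy question above is, at its core, an expected-value comparison. Here is a minimal Python sketch of that arithmetic. Every probability and dollar figure is a hypothetical placeholder, and it makes two simplifying assumptions worth stating out loud: the outage scenarios are treated as separate events whose expected costs can be added, and redundancy is assumed to eliminate the loss entirely.

```python
# Hypothetical inputs: (annual probability of the disruption, cost if it happens).
scenarios = {
    "1 day": (0.20, 50_000),
    "1 week": (0.05, 400_000),
    "1 month": (0.01, 2_000_000),
}
REDUNDANCY_COST_PER_YEAR = 60_000  # hypothetical

def expected_annual_loss(scenarios: dict[str, tuple[float, float]]) -> float:
    """Sum of probability x impact across the scenarios."""
    return sum(probability * cost for probability, cost in scenarios.values())

loss = expected_annual_loss(scenarios)
worth_it = loss > REDUNDANCY_COST_PER_YEAR
print(f"expected annual loss ${loss:,.0f}; redundancy worth it on expected value alone: {worth_it}")
```

  • With these made-up inputs the expected loss is $50,000, so a $60,000 redundancy investment doesn't pay for itself on expected value alone. The interesting part is how sensitive that answer is to the probabilities, which is exactly the kind of uncertainty this meta-method tries to reduce.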

  • The main value of measuring the under-measured is to replace “gut feel” or intuition with data. Specifically, this looks like:

    • Increasing the precision of a measurement. Ex: Replacing “this system crashes a lot” with “this system crashes between 3 and 8 times per week.”

    • Establishing a probability range. Ex: Moving from “we’re pretty worried that our front-line employees don’t know how to deal with a phishing attempt” to “most front-line employees spot phishing attempts about 7 out of 10 times, but 30% of them spot phishing attempts only 0 or 1 out of 10 times.”

    • Reducing uncertainty in some important way. Ex: Upgrading from “we have no idea how much it would cost if our e-commerce site was down for a week” to “the immediate cost would be between $10,000 and $50,000 and the long-term cost via brand damage would be between $500,000 and $5MM.”
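  • Ranges like these become much more useful once you can combine them. A common approach (my choice of illustration, not the only option) is to treat each range as a 90% confidence interval and run a Monte Carlo simulation. The Python sketch below assumes each cost follows a lognormal distribution, which keeps costs positive and suits ranges that span an order of magnitude; that distributional choice is an assumption, not something stated above.

```python
import math
import random
import statistics

Z90 = 1.645  # z-score bounding the central 90% of a normal distribution

def lognormal_from_90ci(low: float, high: float) -> tuple[float, float]:
    """Return (mu, sigma) of a lognormal whose 5th and 95th percentiles are low and high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma

def simulate_outage_cost(trials: int = 10_000, seed: int = 7) -> list[float]:
    """Simulated total cost of a week-long outage: immediate cost plus brand damage."""
    rng = random.Random(seed)
    immediate = lognormal_from_90ci(10_000, 50_000)
    brand_damage = lognormal_from_90ci(500_000, 5_000_000)
    return [
        rng.lognormvariate(*immediate) + rng.lognormvariate(*brand_damage)
        for _ in range(trials)
    ]

totals = simulate_outage_cost()
q = statistics.quantiles(totals, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentiles
print(f"5th pct ${q[0]:,.0f}  median ${q[9]:,.0f}  95th pct ${q[18]:,.0f}")
```

  • The output is itself a range for the total cost of the outage, which is the kind of answer that can feed directly into a decision about, say, how much to spend on hosting redundancy.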


  • Again you’ll notice that many of my examples for the measuring prevalence meta-method have a sort of “health check” or “good/bad” feel to them, while the examples I chose for measuring the under-measured are more linked to a specific business decision someone might be considering. This is consistent with how these meta-methods tend to be used in practice.

3: Toy Version Of A More Complex System#

  • When I was a child growing up in the US Virgin Islands, I decided I wanted to build a steam locomotive. I had a plastic toy car of some kind – about 10” long, with 4 wheels. I figured if I put an open can of Sterno on top of the car and lit the Sterno, it would function like a steam locomotive and the car would move forward. It had the key elements, right? Wheels and fire!1

  • As an adult, I have a slightly better understanding of correlation and causation, but the idea that we can create “toys” for experimentation and learning has stuck with me, and so we’ll refer to the third meta-method as creating a “toy” version of a more complex system. Toy, by the way, is not a dismissive term. It just means we are creating an inexpensive and simplified version of the real thing; one that features lower resolution and smaller scope. But, critically, it’s usable. It’s not a “display model”. It works in a way that (we hope) resembles the real thing. Also… you can play with it! Meaning, you can try lots of things in a low-stakes way to observe what happens and, sometimes, be surprised by what happens. Some examples will help clarify.

  • Engineered context: We have a hypothesis that after a person forms a negative or positive impression of another person based on some easily perceptible attribute of that other person, the negative or positive impression will “overflow” into the person’s evaluation of the other person’s invisible or uncorrelated attributes. We test this hypothesis by getting a friend of ours, a fellow professor who grew up in Europe but now works in America, to interact with two groups of American students. He behaves in a warm, friendly way with one group of students and a cold, distant way with the other, and he speaks with his natural heavy Belgian accent to both. We ask both groups of students to rate the professor on a variety of attributes, one of which is “How irritating is his accent?”

  • Source: https://psycnet.apa.org/record/1979-23612-001

  • Digital toy: We have seen that spaced repetition can help people memorize or learn things more effectively. We have a hunch that this technique can “scale up” to very complex topics, but we’re not sure. Nobody has really tested the technique’s ability to scale up. So, we build a software tool that uses the approach to explain quantum mechanics – a notoriously difficult topic – and we invite people from our social graph to use it. As they use it, the software applies the spaced repetition technique and measures each user’s ability to recall and understand the material.
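  • For readers who haven't met spaced repetition before, the core idea is that each successful recall pushes the next review further into the future, while a failed recall brings it back. The Python sketch below uses a hypothetical doubling rule chosen only for illustration; it is not the scheduling algorithm of any particular tool, including the one described above.

```python
from dataclasses import dataclass

@dataclass
class Card:
    prompt: str
    interval_days: int = 1  # gap before the next review
    due_day: int = 0        # day number on which the card is next due

def review(card: Card, today: int, recalled: bool) -> None:
    """Hypothetical rule: double the interval after a success, reset it after a failure."""
    card.interval_days = card.interval_days * 2 if recalled else 1
    card.due_day = today + card.interval_days

def simulate(outcomes: list[bool]) -> list[int]:
    """Review one card each time it comes due and return the sequence of due days."""
    card = Card(prompt="example prompt")
    today, due_days = 0, []
    for recalled in outcomes:
        review(card, today, recalled)
        due_days.append(card.due_day)
        today = card.due_day
    return due_days

print(simulate([True, True, False, True]))
```

  • Two successes stretch the gap to 2 and then 4 days, a lapse resets it to 1 day, and the next success starts the stretch again, so the due days come out as [2, 6, 7, 9].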

  • Engineered context: This paper’s abstract does a good job of summarizing:

    • Discrimination has persisted in our society despite steady improvements in explicit attitudes toward marginalized social groups. The most common explanation for this apparent paradox is that due to implicit biases, most individuals behave in slightly discriminatory ways outside of their own awareness (the dispersed discrimination account). Another explanation holds that a numerical minority of individuals who are moderately or highly biased are responsible for most observed discriminatory behaviors (the concentrated discrimination account). We tested these 2 accounts against each other in a series of studies at a large, public university (total N = 16,600). In 4 large-scale surveys, students from marginalized groups reported that they generally felt welcome and respected on campus (albeit less so than nonmarginalized students) and that a numerical minority of their peers (around 20%) engage in subtle or explicit forms of discrimination. In 5 field experiments with 8 different samples, we manipulated the social group membership of trained confederates and measured the behaviors of naïve bystanders.

    • Source: https://psycnet.apa.org/record/2020-75139-001

  • Digital toy: We’d like to understand how people deploy their attention to ads in a social media environment. We know that Facebook et al. won’t give us the access we need, and we also know there’s evidence that if you can track where someone’s eyes are gazing, you have a reasonable proxy measurement for where their attention is focused. So we build a smartphone app that simulates the Facebook app and uses the phone’s front-facing camera to build eye-tracking technology into the app. We recruit a bunch of people to participate in our experiments, inform them they are being tracked, promise them privacy and anonymity, and use our digital toy to run a bunch of experiments to learn about how people deploy their attention to ads in a social media environment.

  • Source: https://www.amplifiedintelligence.com.au/attention-plan/

  • Engineered context: We’d like to understand how people perceive differences between products, even if those products are identical. From the paper, published in 1978, that describes the experiment design:

  • The experiment took place in a large bargain store in a shopping center. Card tables were set up with signs attached to the tables that read, “Institute for Social Research – Consumer Evaluation Survey – Which is the Best Quality?” Four identical nylon stocking pantyhose (Agilon (r), cinnamon color) were hung from racks on the tables, approximately three feet (.91m) apart, with the top of the stockings just below eye level.

    Subjects were passersby who voluntarily approached the stocking display and made a judgment as to which one they thought was the best quality. A total of 52 people participated in the study, 50 of whom were female. Both the scent and the position of the stockings were counterbalanced. When subjects approached the display, one of the two male experimenters asked which stocking they thought was the best quality. When the choice was made, the experimenter recorded it and said, “Could you tell me why you chose that one?” After responding, subjects were asked if there were any other reasons for their choice. All reasons were recorded.

  • I find this experiment so fascinating. The results:

  • The scents had no effect on stocking choice, but the position of the stockings on the table had a substantial effect. The further their position was to the right, the more likely the stockings were to be chosen as being of the best quality. Stockings in position A, the left-most position, were chosen by 12% of the subjects, stockings in position B by 17% of the subjects, in position C by 31% of subjects, and in position D by 40% of the subjects (χ² = 10.62, 3 df, p < .025). Despite the fact that stockings were identical, few subjects appeared to experience difficulty in making a choice. Only two subjects voiced a suspicion that the stockings were identical.
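  • The test statistic reported in this excerpt comes from a chi-square goodness-of-fit test: if position made no difference, each of the four stockings would be picked by about a quarter of the 52 participants. The Python sketch below reproduces the reported statistic. The per-position counts are my back-calculation from the rounded percentages (6, 9, 16, and 21 of 52), not numbers printed in the excerpt, and the p-value formula is a closed form that holds only for exactly 3 degrees of freedom.

```python
import math

# Counts back-calculated (an assumption) from the reported percentages with N = 52:
# 6/52 ≈ 12%, 9/52 ≈ 17%, 16/52 ≈ 31%, 21/52 ≈ 40%.
observed = {"A": 6, "B": 9, "C": 16, "D": 21}

def chi_square_uniform(counts: list[int]) -> float:
    """Pearson chi-square statistic against equal expected counts in every category."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)

def chi2_sf_3df(x: float) -> float:
    """Upper-tail probability of a chi-square distribution with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

stat = chi_square_uniform(list(observed.values()))
p = chi2_sf_3df(stat)
print(f"chi-square = {stat:.2f}, df = 3, p = {p:.3f}")
```

  • This gives a statistic of about 10.62, matching the paper, and a p-value of roughly 0.014, consistent with the reported p < .025.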

  • Digital toy: We’d like to test our hypothesis that biased examples of success will skew people’s judgement, even when those people have money on the line. So we ask study participants to make a small bet on whether a startup company founded by a college graduate or a college dropout is more likely to become a billion-dollar “unicorn”. Before they make their bet, we show participants examples of successful startups; 1/3rd are biased towards college dropout founders, 1/3rd are biased away from college dropout founders, and 1/3rd are un-biased. We tell the study participants that they have seen biased data. Then, we ask them to place small bets on a group of fictitious startups, some with dropout founders, and some with founders who completed a college degree. We create this whole experience for participants using something like a sophisticated digital survey tool, and we recruit paid participants for the study using Amazon Mechanical Turk.

  • Study findings, excerpted from the abstract:

  • Despite acknowledging biases in the examples, participants’ decisions were very strongly influenced by them. People shown dropout founders were 55 percentage points more likely to bet on a dropout-founded company than people who were shown graduate founders. Most reported medium to high confidence in their bets, and many wrote causal explanations justifying their decision. In light of recent concerns about false information, our findings demonstrate how true but biased information can strongly alter beliefs and decisions.

  • Source: https://journal.sjdm.org/21/210225/jdm210225.pdf

  • All these examples have several things in common:

    • They are a simplified (reduced resolution and reduced scope) version of a real system we are interested in learning about. Think about how wind tunnels have been used to test scale models of real airplanes, especially before powerful computer simulation reduced the need for them.

    • They are generally single-hypothesis-driven, meaning they are attempting to resolve a narrow question in the form: “I think X will produce or fail to produce output/effect Y.” The toy version of Facebook is a pretty sophisticated model and doesn’t rely so much on a single hypothesis because it accommodates playing with a range of hypotheses, but the other examples are closer to this single-hypothesis-driven approach.

  • Toys don’t have to be built out of atoms or bits; they can be a system of rules (perhaps combined with small amounts of real money, etc.), as has been done with prospect theory research (“how much will people risk in this hypothetical situation?”) or multi-armed bandit experiments. In other words, a toy can be an “engineered context” that you invite real humans into to see what happens. Lots of social science research tests a hypothesis by putting students or Mechanical Turk users in toy situations to see whether they behave according to the hypothesis.

  • The main limitation of many toy experiments is that they happen apart from real life. Posed instead as a challenging question: how well does the behavior of college students sitting in a classroom – who have been informed they are participating in research – translate to other, more “real” situations? Andy Matuschak’s spaced repetition digital toy is an exception here because of how integrated with “real life” it can be. So not all digital toy experiments happen in artificial, controlled environments, but the fact that many do is a notable feature and potential limitation of this meta-method.

4: Structured Observation#

  • The final meta-method is, in many ways, the best suited to SSR. That’s not to say the other meta-methods won’t be used in SSR. It’s common to mix methods, but if we were allowed only one meta-method for any SSR project, then structured observation would be our “desert island meta-method”.

  • Structured observation seeks to comprehend and make sense of the behavior, nuance, context, and variation within a phenomenon, system, or form of thinking. Structured observation is simply: being as rigorous as possible about using observation to identify patterns. For example:

    • Instead of relying only on your experience working with clients to learn about X, which is a somewhat random sample of the world, you set up interviews where you specifically recruit interviewees who you hope will balance out the bias inherent in your client experience. If your client experience is mostly with small companies, for example, you might try to correct this bias by interviewing people at larger companies. This is adding rigor to your sampling, or the way you find people to learn from.

    • Or instead of relying only on experience gained from the scopes of work you are typically involved in, you interview people both upstream and downstream from your area of involvement to better understand the full project context and lifecycle.
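  • The sampling correction in the first example can be made concrete with a simple quota rule: cap how many interviewees you take from each segment so that the segment you over-represent by default can't crowd out the others. This Python sketch uses a hypothetical candidate pool and hypothetical segment labels.

```python
from collections import defaultdict
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def quota_sample(candidates: Iterable[T], stratum: Callable[[T], str], per_stratum: int) -> dict[str, list[T]]:
    """Take candidates in order, keeping at most `per_stratum` from each stratum."""
    picked: dict[str, list[T]] = defaultdict(list)
    for candidate in candidates:
        bucket = picked[stratum(candidate)]
        if len(bucket) < per_stratum:
            bucket.append(candidate)
    return dict(picked)

# Hypothetical pool skewed toward small companies, like a client list would be.
pool = [
    ("Ana", "small"), ("Ben", "small"), ("Cy", "small"), ("Dee", "small"),
    ("Eli", "mid"), ("Fay", "large"), ("Gus", "mid"), ("Hal", "large"),
]
sample = quota_sample(pool, stratum=lambda person: person[1], per_stratum=2)
print(sample)
```

  • Even though half the pool comes from small companies, the resulting interview list has two people from each segment.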

  • As we’ll discuss later, you can’t eliminate bias from the kind of research we do2, but you can work to gain a mostly balanced, objective understanding of many kinds of phenomena using structured observation. This is what “being as rigorous as possible” means.

  • The observation you base your research on can be:

    • Observing real people directly in their real environment. This is known as ethnographic research.

    • Or, your observation can be partially removed from those real environments. Often when you interview someone you are asking them to verbally convey what happened in their real environment in the past to you during the interview. This is not direct observation, of course, but it’s often as close as we can get and it can be a far richer, more detailed, and nuanced way of observing than something like a survey.

  • One of the big questions that touches on every kind of research and every meta-method is this: “when are we sure?” In other words, when have we collected enough data in the right way and analyzed it properly; when is there enough of that kind of rigor and enough data to be able to say “we’re sure about X”? Or said even more simply, what does it take to reach certainty?

  • This is a bedeviling question because the questions that are easy to answer with complete certainty (ex: at sea level, what is the boiling point of water?) have already been resolved, leaving only questions that cannot be answered with complete certainty. Additionally, this question is bedeviling because:

    • Humanity’s first hundred years of experience with science got us expecting that the usual outputs of science are complete certainty and easy-to-use technology. So research that merely reduces uncertainty rather than achieving complete certainty seems flawed somehow. It’s not; more often it means we are operating in a domain ruled by probability rather than a deterministic domain, and reduced uncertainty is the best we can do in such domains. Most domains where we ask business strategy questions are probabilistic rather than deterministic.

    • The basic mental model we use when thinking about large-scale academic or scientific research is: more data leads to more certainty. Additionally, there is both a way of using statistical methods to measure certainty and a commonly accepted threshold of what constitutes enough certainty:

      • The method involves calculating how likely it is that a result at least as extreme as yours would occur by random chance alone, that is, if the effect you’re testing for didn’t actually exist. This is known as the p-value. A p-value very close to zero is usually read as “the effect we observed in this experiment is very unlikely to be a fluke of random chance.”

      • The commonly accepted threshold for what is known as a statistically significant p-value is less than 0.05 (meaning that, if there were no real effect, random chance would produce a result at least as extreme as yours less than 5% of the time).

    • Again, our (very much over-simplified) model of large-scale research is that more data leads to more certainty, just as pouring more water into a glass leads to a fuller glass. This might be true in a general sense, but there are plenty of specific instances where it is not. Imagine a researcher has run an experiment that involves gathering 1,000 data points. The p-value at this point is 0.06. Very close to the threshold of statistical significance, right? (That’s not a trick question; it is indeed very close to that threshold.) Here’s where our intuition leads us astray, though. Most of us (myself included, before I learned what I’m about to share with you) assume that if our researcher gathers more data, say another 250 or 500 data points, their p-value will march steadily toward 0.05 and then perhaps onward to 0.04, and they will have achieved a statistically significant result. In reality, there is no guarantee this is what will happen! (@NOTE: another potential example: you want to predict the next month’s weather. You have gathered data from Jan - September. Of course, the trendline is a somewhat steady increase. On the basis of ONLY that data, what would you predict for October?) It’s a somewhat cartoonish example, but imagine that our research question is about soil composition, and our experimental design involves taking core samples of the soil at 1km intervals in a straight line starting at some point and headed in some direction. If this line runs through a lake or river, what do you think will happen to our data? What if we had a p-value of 0.06 right before we started sampling the “soil” that’s actually a river or lake? What if this new data took our p-value to 0.07? Of course, in this extremely obvious example, we’d throw out the data that was sampled from a waterway rather than from solid ground. But what if the differences in the data were not so obvious? What if the new data was obtained using the exact same sampling, measurement, and experimental methods but had the effect of pushing our p-value away from statistical significance? If we are curious, humble, dedicated researchers, we accept what our data is telling us, get over the disappointment of not reaching statistical significance, reorient, and move forward again with better methods next time. However, most scientific and academic researchers are good people laboring within a system with flawed incentives that might elicit different behavior. Anyway, the point here is that gathering more data for an almost-statistically-significant experiment does not automatically push the result closer to statistical significance. More data does not always create more certainty, even in the highly rigorous world of academic and scientific research.
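  • You can watch this happen with a small simulation. The Python sketch below is an illustration I’ve added, not part of any study mentioned here: it simulates a yes/no outcome with a small real effect (a hypothetical 52% rate against a 50% baseline), adds data in batches, and recomputes a two-sided p-value (a normal-approximation z-test) after each batch.

```python
import math
import random

def z_test_p_value(successes: int, n: int, baseline: float = 0.5) -> float:
    """Two-sided p-value for a one-sample proportion z-test (normal approximation)."""
    standard_error = math.sqrt(baseline * (1 - baseline) / n)
    z = (successes / n - baseline) / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

def p_value_path(true_rate: float, batches: int, batch_size: int, seed: int) -> list[float]:
    """Recompute the p-value after each new batch of simulated yes/no observations."""
    rng = random.Random(seed)
    successes = n = 0
    path = []
    for _ in range(batches):
        successes += sum(rng.random() < true_rate for _ in range(batch_size))
        n += batch_size
        path.append(z_test_p_value(successes, n))
    return path

# Hypothetical setup: a real but small effect, with data arriving 250 observations at a time.
for seed in (1, 2, 3):
    path = p_value_path(true_rate=0.52, batches=12, batch_size=250, seed=seed)
    print(seed, [round(p, 3) for p in path])
```

  • Running it with a few different seeds typically shows the p-value rising as well as falling from one batch to the next, even though the underlying effect never changes.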

  • With small sample-size research, we use a different way of deciding when we have enough data. We’re playing a different game with different rules and a different endgame: we don’t even fantasize that complete certainty – or statistical significance – is a possible outcome of small-scale research. We embrace that a meaningful reduction in uncertainty is a win for us. Quite often, we are developing a theory or model (an understanding or explanation of how something works) rather than testing a hypothesis (a testable guess about the relationship between two variables). Because of these fundamental differences in approach, we use a different way of defining “enough”.

  • The notion of saturation determines when a small-scale researcher who is using qualitative methods has gathered enough data. There are minor differences between the 4 main ways[3] that expert researchers understand saturation, but for our purposes, we can think of saturation as the point at which we start seeing diminishing returns on additional data collection. With the structured observation meta-method, our data collection often (though not exclusively) takes place through interviews, so saturation means that we’re learning less from each new interview. Learning less might look like discovering fewer new themes. (For example, if we’re trying to understand why our clients build software rather than buying an off-the-shelf option, themes might be the reasons they give for building or buying, the situations that prompt building rather than buying, the business goals that prompt building vs. buying, and so forth.) Or learning less might look like uncovering less nuance, variation, and detail within an established set of themes that we’re exploring more deeply. Reaching saturation feels like being almost able to predict what an interviewee will say in response to your question. Once you reach saturation, you can move on to the next phase of your research. That might mean developing a theory or model that attempts to explain how the thing you are investigating works; revising, narrowing, or expanding your question; or using different meta-methods to generate qualitative data that makes your theory or model richer and more robust (or disproves it!).

    • LATER Research [[Research/methodology/grounded theory]] themes

  • The academic research on saturation[4] suggests that when the group of people you are interviewing is sufficiently homogeneous, you will usually reach saturation after 6 to 12 interviews. “Sufficiently homogeneous” means that your interviewees are similar enough in a way that matters to your research, even if they are quite different in other ways. @TODO example.

    • TODO Research into homogeneity and [[Research/methodology/saturation]]

  • The structured observation meta-method is useful in a wide range of situations, but it’s especially useful in the following:

    • You’re somewhat of an outsider to the world you are researching.

    • You don’t yet have a good mental model of how the system you are investigating works.

    • The system you are investigating is new, emergent, or disordered. (These qualities make structured observation more difficult, but they make the other meta-methods nearly impossible, which is why structured observation is especially well-suited to this context.)
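To make the earlier point about p-values concrete, here is a small Python sketch. It swaps in a hypothetical coin-flipping experiment. The counts below are invented for illustration and are not taken from any real study. It computes a two-sided p-value with the normal approximation to the binomial. The first 1,000 flips give a p-value just under 0.06. Adding 250 more flips, gathered the same way and landing on a perfectly ordinary 50/50 split, pushes the p-value up rather than down.

```python
import math


def two_sided_p(heads: int, flips: int) -> float:
    """Two-sided p-value for "is this coin fair?", using the normal
    approximation to the binomial: z = (heads - n/2) / sqrt(n/4)."""
    z = (heads - flips / 2) / math.sqrt(flips / 4)
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: 530 heads in the first 1,000 flips.
p_first = two_sided_p(530, 1000)

# Hypothetical follow-up: 250 more flips collected with exactly the same
# method, which happen to come up 125 heads (an unremarkable 50/50 split).
p_more = two_sided_p(530 + 125, 1000 + 250)

print(f"after 1,000 flips: p = {p_first:.3f}")
print(f"after 1,250 flips: p = {p_more:.3f}")
```

Running it should show the p-value rising from about 0.058 to about 0.09. Nothing about the new flips is suspicious. They simply dilute the earlier excess of heads. The normal approximation is a simplification that keeps the sketch free of dependencies. An exact binomial test would give slightly different numbers.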
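The saturation idea discussed above can also be sketched in code. The Python below is a hypothetical illustration, not a standard procedure. It assumes each interview has already been coded into a set of theme labels. The build-vs-buy themes here are invented. For each interview, it counts how many themes appear that earlier interviews didn't surface. It then declares saturation once a chosen number of consecutive interviews (`run_length`, an arbitrary assumption) add nothing new.

```python
def new_themes_per_interview(coded_interviews):
    """For each interview (a set of theme labels), count the themes not
    seen in any earlier interview."""
    seen = set()
    counts = []
    for themes in coded_interviews:
        fresh = set(themes) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def saturation_point(coded_interviews, run_length=3):
    """Return how many interviews it took to observe `run_length`
    consecutive interviews with no new themes, or None if that never
    happened. This stopping rule is an illustrative assumption."""
    streak = 0
    for i, n in enumerate(new_themes_per_interview(coded_interviews)):
        streak = streak + 1 if n == 0 else 0
        if streak == run_length:
            return i + 1
    return None


# Hypothetical coded interviews about why clients build rather than buy.
interviews = [
    {"cost", "control", "integration"},
    {"cost", "speed-to-market"},
    {"control", "vendor lock-in"},
    {"cost", "compliance"},
    {"integration", "control"},
    {"vendor lock-in", "cost"},
    {"speed-to-market", "compliance"},
    {"control"},
]

print(new_themes_per_interview(interviews))
print(saturation_point(interviews))
```

This captures only one face of saturation: new themes. It says nothing about the other signal described above, which is diminishing nuance and variation within established themes. That signal is much harder to count and usually comes down to researcher judgment.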


I know where you’re expecting this story to go. Years earlier, I had already found a can of gasoline and a box of matches and lit the yard on fire, so I’d gotten the burning-things-down impulse out of my system… mostly. I actually don’t remember whether I got ahold of a can of Sterno or not. I do clearly remember my hypothesis that simply placing it atop the toy car and lighting the Sterno would be enough to propel the car forward. Given my history with fire, if there were cans of Sterno in the house, they were probably stored well out of my reach.


If you ever encounter someone using relatively small sample sizes or other SSR methods who also claims their findings are completely unbiased, please downgrade your confidence in both their findings and their research ability.


@TODO summarize and link to https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5993836/