Podcast editing, algorithmic decision making, and what it means to be human
Why and how humans and algorithms can be complementary
During our most recent Across The Lines team sync, my co-host Jay and I had an interesting debate about whether we should continue editing audio in-house or outsource it to someone on Fiverr.
For context, I still lovingly edit every single episode of our podcast (though Descript has made post-production SO much easier), and it takes me between 2 and 2.5 hours to edit each one. Though editing is a significant time investment, I was adamant that we weren’t yet ready to outsource it. My rationale at the time was something along the lines of a) there were so many bespoke and situationally dependent adjustments I made and b) there was an “x-factor” in the way I edited that was difficult to codify.
I’ve been reflecting on this conversation, particularly how the difficulty of explaining this “x-factor” was mirrored in the impossibility of defining it. As someone whose first instinct is to try and create structure out of ambiguity, I was struggling to put the rich subtleties of intuition and instinct into words. How could I systematically document the decision criteria I used when snipping out a segment in a guest’s response or deciding to keep a filler word instead of removing it? And how could a contractor on Fiverr, or an audio editing algorithm, accurately replicate my decision-making if there could be decision criteria I wasn’t even consciously aware of?
Moreover, widening the aperture a bit, what should be said of creativity and the “human touch” if it could easily be turned into an infinitely scalable set of Boolean operators?
This train of thought transported me back to Penn, where regularly nerding out over questions of cognition and decision-making was a welcome and necessary part of my day-to-day. Fast-forwarding to the present day, I now frequently encounter prescriptive applications of these questions in the form of AI-enabled tools (e.g. AllyO for recruiting, Gong.io for sales). These tools illuminate how humans and algorithms could work together, but I’m more curious about why and in what circumstances they should coexist. As such, I wanted to revisit some of my favorite research papers that shed light on the topic.
Setting the stage
A few questions around which I anchored my trip down memory lane:
Where do humans excel, and where do algorithms excel? Conversely, what are the shortcomings each faces?
Why should humans and algorithms work together? In what circumstances?
What are barriers to humans and algorithms working together?
And a couple of things I wanted to note before diving in:
I use algorithm, model, and AI interchangeably (and admittedly loosely) — essentially, I’m referring to anything that generates a prediction given input variables (can be as simple as linear regression)
The cutting edge of AI (e.g. GPT-3) is obviously far more complex than a simple linear regression. However, the atomic unit in both cases is still prediction and forecasting, which subsequently enables more sophisticated relevance and recommendation
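To make "anything that generates a prediction given input variables" concrete, here is a minimal sketch of the simplest case: a one-variable linear regression. The data (raw recording length versus editing time) and the function names are hypothetical, made up purely for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one input variable; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

def predict(slope, intercept, x):
    """Generate a prediction for a new input value."""
    return slope * x + intercept

# Hypothetical data: raw recording length vs. editing time, both in minutes.
raw_minutes = [40, 50, 60, 70]
edit_minutes = [115, 130, 140, 155]

slope, intercept = fit_line(raw_minutes, edit_minutes)
print(predict(slope, intercept, 55))  # predicted editing time for a 55-minute recording
```

Everything I call an algorithm in this piece has the same shape: inputs go in, a prediction comes out.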
Back to the basics — foundational research in human and algorithmic decision-making
“It is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility” — Moravec, 1988
As early as the 1950s, decades before Moravec’s paradox was coined (the observation that high-level reasoning takes far less computation than basic perception and motor skills), and before computers were widely available, psychology researchers noticed that statistical prediction was often more accurate than clinical judgment (Meehl, 1954). That is, a simple linear model often beats the gut sense of experts. Meehl’s conclusion has stood up well for more than half a century: a 2006 meta-analysis of 67 studies found a 13% increase in accuracy using statistical versus clinical methods (Ægisdóttir et al., 2006).
Several studies in the 1970s illuminated where humans excel and algorithms falter, and vice versa. In a study of judges dealing with multidimensional information, Einhorn (1974) found that humans a) tend to cluster variables in the same way when identifying and organizing cues, and b) are good at subjectively evaluating attributes that are difficult to measure objectively. On the flip side, Dawes (1979) found that even non-optimal models (i.e. models whose variables were equally or randomly weighted) performed better than human intuition. He argued that linear models win because people are much better at choosing which variables belong in a model than they are at combining those variables into a prediction.
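As a rough illustration of what an equally weighted (“improper”) linear model looks like, here is a minimal sketch. It assumes every cue has already been oriented so that higher means better, and the applicant data and function names are hypothetical; this is not Dawes’s own procedure or data.

```python
from statistics import mean, pstdev

def standardize(column):
    """Convert raw cue values to z-scores so cues on different scales are comparable."""
    mu, sigma = mean(column), pstdev(column)  # assumes the cue is not constant
    return [(x - mu) / sigma for x in column]

def unit_weighted_scores(cues):
    """Equally weighted linear model: add up the standardized cues, each with weight 1.

    `cues` maps a cue name to a list of values, one per case.
    Assumption: every cue is oriented so that a higher value is better.
    """
    z_columns = [standardize(column) for column in cues.values()]
    return [sum(case) for case in zip(*z_columns)]

# Hypothetical applicants, scored on the cues a human expert chose to look at.
applicants = {
    "gpa": [3.1, 3.8, 3.4, 3.6],
    "test_score": [160, 155, 168, 150],
    "interview_rating": [4, 3, 5, 2],
}
print(unit_weighted_scores(applicants))  # one score per applicant; higher ranks first
```

Note the division of labor: a person picks the cues and decides which direction is “good”, and the model does nothing cleverer than add them up the same way every time.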
In short, humans struggle with …
Integrating information, i.e. combining and assigning appropriate weight to disparate data inputs to make a prediction. Models, on the other hand, are able to objectively weight input variables and crunch numbers efficiently
Applying decision rules consistently. Even if we’re given the same data, we may make different decisions due to irrelevant factors. Our preferences are often fickle, and our judgment can be affected by order of information, choice overload, biases, etc. Models can sidestep these contextual factors that impair human judgment
But humans excel at …
Selecting and coding which information to incorporate and which hypotheses to prioritize. Models can make inferences from structured hypotheses but lack the intuition to identify which hypothesis to test in the first place
Adapting our decision-making to account for new cues and environments. Models often don’t perform as required or expected when they encounter novel situations and edge cases
Paradoxically, what is easy for humans is difficult for algorithms, and what is difficult for humans seems rather easy for algorithms.
AI replacing humans is quite a common and sensationalized headline nowadays, but I’m in the camp that the two are more complementary than competitive — there’s an opportunity for humans and algorithms to not only complement but also augment each other in dimensions where one excels and the other has shortcomings.
This might initially seem like a Pollyanna-ish perspective, but there’s a slew of research showing how a combination of algorithmic and human decision-making trumps either standalone. Here are a few that come to mind:
Einhorn (1972) demonstrated this in a study with physicians who coded biopsies of patients with Hodgkin’s disease. Physicians’ individual ratings were poor at predicting survival rate. However, the variables that the physicians chose to code did predict survival time when optimal weights were determined with a multiple regression model. Physicians knew what information to consider, and models helped integrate this information consistently into accurate predictions
Blattberg and Hoch (1990) showed that, in five different business forecasting situations, a simple equally weighted combination of a model and a manager’s judgment outperformed either decision method alone. Moreover, managers picked up almost 25% of the variance left unexplained by models, illuminating the human ability to identify and account for edge cases. By integrating decision inputs, we can compensate for the weaknesses of people being too flexible and models being too consistent
Graefe et al. (2014) applied the “better together” framework to election results. They examined predictions of six U.S. Presidential elections from 1992 through 2012 and were able to improve accuracy by combining forecasts across four forecasting methods: poll projections, expert judgment, quantitative models, and the Iowa Electronic Markets. Hybrid methods that incorporated both human and algorithmic input yielded error reductions of between 16% and 59% and were on average more accurate than each component method
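The combining idea in these studies can be sketched in a few lines. This is a minimal illustration, assuming just two forecast sources and made-up numbers; the function names and the data are hypothetical and do not come from any of the papers above.

```python
def combine(model_forecasts, human_forecasts, model_weight=0.5):
    """Weighted average of two forecast series; 0.5 gives the 50/50 blend."""
    return [
        model_weight * m + (1 - model_weight) * h
        for m, h in zip(model_forecasts, human_forecasts)
    ]

def mean_absolute_error(forecasts, actuals):
    """Average size of the forecasting miss, ignoring direction."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

# Hypothetical weekly sales: the model and the manager miss in different directions.
actual_sales = [100, 200]
model_forecast = [110, 190]
manager_forecast = [95, 215]

blend = combine(model_forecast, manager_forecast)
for name, forecast in [("model", model_forecast),
                       ("manager", manager_forecast),
                       ("50/50 blend", blend)]:
    print(name, mean_absolute_error(forecast, actual_sales))
```

With a 50/50 blend, the combined error can never be worse than the average of the two individual errors, and it gets noticeably better whenever the model and the human miss in opposite directions. That is one way to read the research above: the two are most useful together when their mistakes are different.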
Increasing human amenability to algorithms
Humans are inherently irrational creatures. So unsurprisingly, despite empirical evidence that a) algorithms are often more accurate than humans in forecasting and prediction and b) algorithms can aid human decision making, people are often averse to using algorithms and would instead opt for less accurate human decisions.
Fildes and Goodwin (2007) conducted a survey of 149 professional forecasters from a wide variety of domains (e.g., cosmetics, banking, and manufacturing) and found that many professionals either did not use algorithms in their forecasting process or failed to give them sufficient weight. These forecasters preferred to rely on their perceived intuition and expertise instead.
Attitudes toward algorithmic versus human error highlight a preference for human decision-making as well, even when it’s faulty. Dietvorst et al. (2015) found that people were less tolerant of algorithms’ (smaller) mistakes than of humans’ (larger) mistakes, a phenomenon they dubbed “algorithm aversion”. People give less leniency to algorithms and will avoid any algorithm that they recognize to be imperfect, even when it is less imperfect than its human counterpart.
Interestingly, in follow-up work, Dietvorst and colleagues also found that participants were considerably more likely to choose to use an imperfect algorithm when they could modify its forecasts, helping them perform better as a result. The implication for me here is that being able to (or at least perceiving that you’re able to) influence what an algorithm produces goes a long way in making us more amenable to working alongside it. This principle would explain why LinkedIn allows me to dismiss recommendations for “people I may know”, or why Instagram gives me the option to respond with “not interested” to recommended posts that appear on the Explore tab.
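One way to picture “letting people modify an algorithm’s forecast” is a bounded adjustment: the person can nudge the forecast, but only by a limited amount. This is a minimal sketch under that assumption; the function name, the cap, and the numbers are hypothetical and are not taken from the study’s materials.

```python
def adjusted_forecast(algorithm_forecast, human_forecast, max_adjustment):
    """Move the algorithm's forecast toward the human's, by at most max_adjustment."""
    delta = human_forecast - algorithm_forecast
    bounded_delta = max(-max_adjustment, min(max_adjustment, delta))
    return algorithm_forecast + bounded_delta

# Hypothetical case: the algorithm says 70, the person believes 90,
# and the tool allows a nudge of up to 5 points in either direction.
print(adjusted_forecast(70, 90, max_adjustment=5))
```

The person keeps a say in the final answer, while the cap keeps the result close to the (usually more accurate) algorithm.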
Algorithms, especially in their simplest form, implore us to think in a multidisciplinary way about what it means to be human.
Though algorithms are able to transcend human capabilities in multiple domains, the fallibility of human intuition is precisely (and paradoxically) what best illuminates the essence of our humanity. To be human is to have bounded rationality and to frequently err … but it is also to create, to imagine, and to adapt, all things that even the most advanced AI still struggles to do.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence.
Moravec, H. (1988). Mind children: The future of robot and human intelligence. Harvard University Press.
Ægisdóttir, S., et al. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist, 34(3), 341-382.
Einhorn, H. J. (1974). Expert judgment: Some necessary conditions and an example. Journal of Applied Psychology, 59(5), 562.
Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34(7), 571.
Einhorn, H. J. (1972). Expert measurement and mechanical combination. Organizational Behavior and Human Performance, 7(1), 86-106.
Blattberg, R. C., & Hoch, S. J. (2010). Database models and managerial intuition: 50% model + 50% manager. In Perspectives on Promotion and Database Marketing: The Collected Works of Robert C. Blattberg (pp. 215-227). (Original work published 1990.)
Graefe, A., Armstrong, J. S., Jones Jr, R. J., & Cuzán, A. G. (2014). Combining forecasts: An application to elections. International Journal of Forecasting, 30(1), 43-54.
Fildes, R., & Goodwin, P. (2007). Against your better judgment? How organizations can improve their use of management judgment in forecasting. Interfaces, 37(6), 570-576.
Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114.