Does AI make you stupid?

There is a study making the rounds that is raising some concern about the use of AI chatbots. Headlines such as “Cognitive scientists found using AI for just 10 minutes impairs brain performance” make the rather alarming suggestion that chatbots can cause harm. Indeed, in the study itself, the authors conclude:

…we find that AI assistance improves immediate performance, but it comes at a heavy cognitive cost: after just ∼10 minutes of AI-assisted problem-solving, people who lost access to the AI performed worse and gave up more frequently than those who never used it. These findings raise urgent questions about the cumulative effects of daily AI use on human persistence and reasoning. We caution that if such effects accumulate with sustained AI use, current AI systems — optimized only for short-term helpfulness — risk eroding the very human capabilities they are meant to support.

The study, AI Assistance Reduces Persistence and Hurts Independent Performance, measures the performance of subjects in solving math fraction problems. Some participants had access to a chatbot (AI-assisted group), while some did not (control group). Later in the test, the chatbot was removed from the AI-assisted group. The participants’ ability to solve the problems was assessed, and it was observed that when the chatbot was removed from the AI-assisted group, their ability to solve the problems decreased, and the tendency to skip problems increased. The provocative part of the data is that the AI-assisted group’s performance did not return to that of the control group – instead, the (formerly) AI-assisted group’s solve rate was worse than the control group’s, and the skip rate was higher.

Taken at face value, the data appear to show what the study’s authors suggest: that “AI assistance reduces persistence and impairs independent performance.” This is quite a claim, and while there is no reason to dispute the data collected, the authors fail to discuss or propose any mechanism by which AI assistance caused the reduced persistence or independent performance. While a causal mechanism is not strictly required, identifying one greatly strengthens the case that a finding is causal rather than merely correlational. It is a factor in frameworks such as the Bradford Hill criteria for establishing whether an association between a presumed cause and effect is causal or merely correlative. A mechanism explains why A caused B, and allows findings to be extrapolated to conditions outside those of the study (which may be controlled and somewhat artificial). In the absence of any identified mechanism, it is necessary to carefully and critically evaluate the data in light of the study’s design and the behavior of subjects.

Dissection

Let’s first take a look at the study. It uses the term “learning” to label the part of the experiment where the AI-assisted group had the chatbot available, and “test” for the part where the chatbot was removed. This labeling may be somewhat provocative, as it is unlikely that the subjects were actually “learning” during that phase. (If they were, then there is a host of other questions regarding the study.) It is more probable that the subjects had learned how to do math fraction problems as part of their general education.

A less provocative term for the phase might be “baselining”, “preparation”, or “priming”. The goal was presumably not to see how the subjects learned, but to establish a baseline of performance. Priming may describe a phenomenon occurring during that phase of the test – the subjects entered a cognitive state for solving fraction problems. That state then primes them for how they behave in the next phase of the study. In the AI-assisted case, the priming results in a state where some subjects ask the chatbot to solve the problem, others ask it to provide hints, and still others slog through without assistance. The non-AI cohort had only the last option.

The study also uses self-reporting to determine how the AI-assisted group used the chatbot. Presumably, the interaction with the chatbot was (or could have been) available to the researchers. In such cases, reviewing and scoring the interaction for the level and type of assistance would provide more solid data than self-reporting, which relies on memory and may be tainted by personal biases. Understanding what the interaction was could be a key factor in understanding how the cognitive processes of the AI-assisted cohort differed from those of the control cohort.

A more detailed description of the “fraction problems” would also be illuminating – the study provides an example as part of a diagram showing a multiplication problem. This is one of the easier types of fraction problems. Fractional division problems can be reduced to fraction multiplication problems – if the subject remembers that. (The author of this post had to think a bit to recall that nugget of information… the author is a firm believer in the use of calculators.) Fractional addition and subtraction problems can prove more challenging, and once again require the subject to recall the method for solving these types of problems.
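To make the difference between these problem types concrete, here is a minimal Python sketch of the two recalled methods mentioned above: dividing by multiplying by the reciprocal, and adding by first finding a common denominator. The function names and the sample fractions are illustrative assumptions and are not taken from the study.

```python
from fractions import Fraction


def divide_by_reciprocal(a: Fraction, b: Fraction) -> Fraction:
    """Divide a by b by turning the division into a multiplication."""
    if b == 0:
        raise ZeroDivisionError("cannot divide by a zero fraction")
    # "Flip" the divisor: the reciprocal of n/d is d/n.
    reciprocal = Fraction(b.denominator, b.numerator)
    return a * reciprocal


def add_with_common_denominator(a: Fraction, b: Fraction) -> Fraction:
    """Add a and b by rewriting both over a shared denominator."""
    # The product of the denominators always works as a common denominator
    # (it is not necessarily the least one).
    common = a.denominator * b.denominator
    numerator = a.numerator * b.denominator + b.numerator * a.denominator
    # Fraction reduces the result to lowest terms automatically.
    return Fraction(numerator, common)


if __name__ == "__main__":
    # Illustrative values only (not from the study).
    print(divide_by_reciprocal(Fraction(2, 3), Fraction(4, 5)))
    print(add_with_common_denominator(Fraction(1, 2), Fraction(1, 3)))
```

Multiplication needs no recalled trick beyond multiplying across, whereas division and addition each depend on remembering an extra step, which is the sense in which they are harder for someone whose skills are rusty.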

There is also the question of the cognitive processes that occurred when the chatbot was removed. Was this a case of violation of expectation? When one comes to expect something, and then something else happens, the result is some amount of confusion and dissonance. Such a state would likely affect performance.

Priming and similar phenomena are well-known aspects of cognitive science that many of us can see in our daily lives. They are the foundation of adages like “You never forget how to ride a bicycle” or the need to “brush up on” [something]. Skills that are learned are not always immediately available at the same level as they were when first acquired. They may seem completely “forgotten”, when they are better thought of as dormant and imperfectly recalled (though over time some amount of actual “forgetting” may occur). However, the initial skill level can be reacquired – or exceeded – by performing the skill. This generally happens much faster than initial acquisition of the skill – a testament to its latent presence – and the skill may be actively retained to a greater degree afterwards as it is associated with new contexts (in connectionist theory, there are more connections).

Once you learn to ride a bicycle you never forget

There are some interesting findings from this study. First, “Participants who used AI for hints showed no significant impairments relative to control.” The impaired subjects were those who asked the chatbot to solve the fraction problem and provide the answer. While we can only guess why this happened, one potential reason is that the subjects who asked for hints were reminded by the AI of the process for solving the problem. Having “remembered” how to solve certain types of fraction problems, these subjects would be better equipped to tackle the test-phase problems. In an educational context, this correlation suggests that chatbots should be constrained to assisting students in solving problems (teaching them how to solve a problem) rather than solving the problems for them. This is probably not a great surprise to most teachers, and it recalls the adage “Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime.”

The second interesting thing about this study is the association between access to the AI chatbot and superior performance while the AI is present. In a work context, productivity is a major concern. To the extent that a chatbot can enable an employee to be more productive, it is a great asset. The question that must be asked is whether there is permanent harm from using the chatbot.

Regrettably, the study suggests that there may be permanent harm, but does not provide any causal mechanism by which the harm occurs. Without that, there is no way to either further support or counter the assertion.

Reflections

The study does not provide any indication of the cognitive processes involved and whether the noted effect of “harm” is persistent, permanent, or transient. “Common sense” suggests that the effect is transient in nature and directly related to a cognitive state of having an aid (the chatbot) perform work rather than doing it oneself. Some amount of dissonance resulting from the unexpected withdrawal of the chatbot may also play into the performance deficit. Given the variation in results during the “learning” phase, three test problems are not sufficient to see any trend in the “test” phase. However, it would be interesting to see if both cohorts’ performance eventually converged, how long that took, and also to assess the subjects’ cognitive processes.

Regardless, it is unlikely that the knowledge subjects had about fraction problems before the study was “erased” by the study. It is likely that, given time to “reset”, their unaided performance would mirror that of the non-AI cohort. (It is also very likely that the non-AI cohort would perform better on a second round, even if a few days or a week elapsed before the retest, due to the lasting effects of relearning and refreshing the skill.)

One suggestion made by those who do use AI tools is that it is valuable to periodically do a task without those tools. This generally helps you to stay grounded in what the tool is actually doing and also keeps your ability to do the task at a maintenance level. While someone may still have a book-knowledge understanding of what a tool does when using it, there is a deeper experiential knowledge gained by performing the task. This deeper knowledge conveys a greater appreciation for where and how the steps to completing a task must be altered in different contexts, and how the tool could be applied to different domains with suitable adaptations. At the extreme, it differentiates mere users of a tool from experts who both use and have a deep understanding of the tool.

Having something more than “book knowledge” is particularly valuable at a time when AI tools are relatively new and are still being refined. They may hallucinate or otherwise behave aberrantly, and it is valuable to be able to spot those errors. Eventually, AI tools will mature – and our understanding of how we should use AI in general will evolve – to a point where using AI tools will be like using a calculator today: a once-controversial tool that most now find indispensable for doing moderately complex math. But we’re not quite there yet.
