Chatbot'ing at Your Own Peril
Large language models (LLMs) and their consequences for learning.
"Responsibility to yourself means refusing to let others do your thinking, talking, and naming for you; it means learning to respect and use your own brains and instincts; hence, grappling with hard work."
Adrienne Rich
The New Normal in the Classroom
Since the release of ChatGPT a couple of years ago, large language models (LLMs) — and generative AI (GenAI) more broadly — have forced the world to grapple with an avalanche of lightning-fast changes, from people using these technologies to write their cover letters when applying for jobs, to social media users asking them to write their content for them. Students and professionals alike have used — and are using — these models for a wide range of purposes, from learning new concepts to solving practical problems; the pervasiveness of this new technology is hard to overstate, and its daily use has become the new reality. But to say that we were not prepared for the ripple effects of such a disruptive technology would be a severe understatement.
There are many areas that could be argued to be rife with ethical and other considerations when using LLMs, but the one I am personally interested in is Education. Students' learning is, effectively, a key responsibility of those doing the teaching, and most young students — kids, in essence — do not have the maturity required to understand the implications of their ongoing formation; to understand, for example, that maths is not just about arithmetic, geometric, or other rules, but about abstract thinking, or that science is not just about conducting experiments with fancy equipment and sometimes uncommon chemicals, but about thinking critically and logically about the world we live in.
The job of the teacher in this sense is not just one of knowledge transfer, but one of guidance and formation in the thinking space. A good teacher does not simply bestow knowledge in the same manner a book does; they also instil a sense of curiosity, challenge the student to come up with the answers, and ensure the student has actually learnt. From this perspective, for better or for worse, students are at the mercy of their tutors, insofar as they have not yet developed their own ways of building the skills required to think critically about the content and ideas their tutors lecture about.
Students, on the other hand, are required to have at least the basic disposition to attend the relevant lectures, pay attention, and do their homework, so they can learn and reinforce that learning. But students everywhere are already using LLMs to do their homework, write their essays, and complete their assignments, especially given the high levels of accuracy and knowledge displayed by LLMs. In response, many educational institutions have simply opted to ban the use of these chatbots, but such approaches look more like knee-jerk reactions than carefully balanced policy: how could they ever effectively ban the use of LLMs when students get to take many assignments home and there are no effective, fail-proof ways to distinguish AI-generated text from human-written text?
Getting the balance right for policy also appears critical: parents and guardians will want to know how the educational institution of their choice navigates newer technology and its impact on their children's learning. So, what should be the position of primary and secondary schools, as well as universities, regarding the usage of LLMs? Should students (and people learning in general) have access to LLMs while forming themselves or exploring new knowledge domains? We are still in the early days of conducting experiments and carrying out more formal studies, but the answer appears to be converging on something rather logical for technology in general: it will depend on how people, students or otherwise, use it.
Responsible Usage is Key
A Microsoft Research team recently published a study with a small sample of knowledge workers (think scientists, engineers, lawyers, etc.) concluding that “higher confidence in GenAI is associated with less critical thinking, while higher self-confidence is associated with more critical thinking” (emphasis added). The study further finds that
[…] GenAI tools reduce the perceived effort of critical thinking while also encouraging over-reliance on AI, with confidence in the tool often diminishing independent problem-solving. As workers shift from task execution to AI oversight, they trade hands-on engagement for the challenge of verifying and editing AI outputs, revealing both the efficiency gains and the risks of diminished critical reflection.
The key takeaway from this study is, effectively, that the impact on critical thinking depends on how much the professional offloads her thinking to the LLM and relies on its output, as opposed to taking the time to parse and think about the information the model produces. Interestingly, this affected all levels of Bloom’s pyramid (taxonomy) of learning:
In the majority of examples, knowledge workers perceive decreased effort for cognitive activities associated with critical thinking when using GenAI compared to not using one — examples that were reported as “much less effort” or “less effort” comprise 72% in Knowledge, 79% in Comprehension, 69% in Application, 72% in Analysis, 76% in Synthesis, and 55% in Evaluation dataset. Moreover, knowledge workers tend to perceive that GenAI reduces the effort for cognitive activities associated with critical thinking when they have greater confidence in AI doing the tasks and possess higher overall trust in GenAI.
While the aforementioned study focuses on professionals, the effects on students appear similar in essence. Another recent study shows that LLM usage amongst students can likewise have positive or negative effects depending on how the students engage with the model:
Students who use LLMs as personal tutors by conversing about the topic and asking for explanations benefit from usage. However, learning is impaired for students who excessively rely on LLMs to solve practice exercises for them and thus do not invest sufficient own mental effort. Those who never used LLMs before are particularly prone to such adverse behavior.
Interestingly, the latter study also finds that using LLMs leads to students overestimating their own learning progress, and that AI can also harm learning when individuals become too reliant on the technology and reduce their engagement with the focal problem.
Notwithstanding the important limitations of the aforementioned studies (and there are many we should take note of, starting with the sample sizes and the issues with self-reports, for example), the conclusions gathered so far are rather logical in my view, and very much akin to having an on-demand tutor or scholarly friend who can help you with homework or with things you do not know. Should you ask him to help you with difficult concepts, explain things to you, and interact in a constructive manner, you will definitely learn about the subject matter at hand. Should you ask him to do your homework or your work for you, so you can pretend you did it or move on with life, what is it that you learnt? Similarly, if you engage in a debate with your tutor about a topic, you are likely to learn much more than if you just copy-pasted an argument verbatim from a lecture so you could get a passing grade, regardless of whether you understood the argument.
There seem to be at least two main axes determining how much learning happens — or, alternatively, how much critical thinking is fostered or hindered — when students interact with LLMs. I have tried to summarise these two axes, and the quadrants they create, in the picture below. For critical thinking to be fostered and learning to be maximised, students and users in general need to (i) engage with LLMs with skeptical eyes, wanting to learn, and (ii) ask for references and explanations for the topics they are engaging with, thereby placing themselves in the top-right quadrant. There is little learning — and research now shows that critical thinking is dulled as well — when people move towards the lower-left quadrant, blindly trusting (sometimes wrong) outputs that they simply copy-paste wherever they need them, be that as part of their assignments or their professional output at work, and offloading their work to the LLM so they can use the time for something else. From this perspective, the diagram seems a useful first conceptual summary for organising our thoughts about using LLMs in an educational setting.
The Bases of Good Policy
What can be done at the policy level, though? Before getting into this, I believe there is value in addressing the complexity of the landscape. Some salient points include but are not limited to:
students come with different levels of disposition for learning (and cheating) due to natural and social factors, and therefore there will never be a one-size-fits-all approach when it comes to policy;
no matter the academic prowess and motivation of a student, time is limited, and therefore there will always be a trade-off between learning as a standalone process and producing the required outputs for academic purposes (i.e., turning in essays or completing homework) while also managing a normal life. Academic workload therefore needs to be balanced to ensure it does not create incentives for students to take shortcuts that lead to impaired learning or dulled critical thinking — basically, not to push the student into cheating with LLMs just to cope with the workload;
the outcome of an educational process is in much larger part a consequence of the right environment at home than of simply having the best tutors with the fanciest tools, so educational institutions should give parents the relevant information about LLMs in Education to ensure appropriate learning, and encourage parents to enforce some basic rules for responsible usage;
there are innumerable ways to bypass tech bans, and educational institutions do not have the capacity to enforce these policies outside their campuses, so we should perish the thought of an effective GenAI ban right away. Furthermore, banning LLMs is throwing the baby out with the bathwater — as shown above, there are benefits to LLMs when used responsibly.
The right policies regarding LLMs in educational settings, therefore, are ones where:
both tutors and students are appropriately aware of the issues pervading the usage of these models, and of the best practices surrounding their use;
tutors are connected with peers and supported by their institutions to create an environment where the workload is manageable for students and tutors alike, to minimise reliance on these models just to be able to get a break;
tutors are encouraged and supported to create evaluation tools and exams that students cannot cheat on using LLMs (e.g., coding or reading/writing assignments completed on offline computers);
there is a whole-institution approach to educating parents on the appropriate ways to encourage and supervise learning while using LLMs.
Carte blanche on LLM use is as bad a policy as a total ban, and this should be the first thing to understand. There are, after all, benefits to using LLMs in education if one takes care to create an environment where the student can learn the right way, and this is a job shared between tutors and parents. The Microsoft Research study notes, for example, that “early studies suggest that AI-generated feedback can improve writing quality and logical structure, especially for lower-performing students and less confident English learners”, so tutors could also suggest LLMs to students who may be lagging behind, for example, instead of simply encouraging or discouraging everyone from using them. Not everyone needs them to improve their learning or productivity, after all.
Learning is much more than the technology we use to achieve it, and much more than the institutions we choose to send our kids to and the policies they might have in place. The biggest factor is the home environment, and if we do not create the right environment there, not even the best tutors with the best technologies will be able to do much about it. Sadly, those who are motivated to cheat to avoid punishment, get ahead of the competition, or simply do away with their responsibilities now have it much easier by tapping into LLMs. Technology has evolved to let us pretend we did all our homework and assignments without having to copy them from a friend: LLMs are our responsible, responsive, and all-knowing friends, and pretending to be a good student has simply become easier. One thing has remained constant, though: the only people who actually learn are those who do the work instead of just copying responses, and this applies to young people and adults alike.
If we want people — be they high school kids, uni students, or professionals — to learn as they should, we need to think bigger than technology. LLMs are just the latest tool to be used and abused, but the incentive structures and social factors should be our main concern if we want to maximise learning and critical thinking.