Barmy in Wonderland

Or, Chatting With a Chatbot

By Kenneth Norrie - Posted on 21 April 2023

Protecting Our Honest Students

I spent more than a dozen years either chairing or being a member of Strathclyde Law School’s Student Affairs Committee, which deals with allegations of plagiarism and academic dishonesty, and gave six years’ service to the Senate Discipline Committee.  A mental Cheshire Cat kept me sane by constantly reminding me that the number of our students who try to cheat their way to a law degree is tiny.  Most students have enough integrity to resist the temptation to submit other people’s work as their own.

But a far higher number of students, for what are perhaps more forgivable reasons, seek shortcuts, especially when they have numerous assignments to submit in a short space of time.  I have never been particularly bothered if, in answering a question about, say, Donoghue v Stevenson, the student types that case name into Wikipedia instead of Westlaw Journals.  Though errors creep into Wikipedia, it is constantly edited by real people and the information it offers is often a good starting point to help a student's understanding.  A student who summarises (in their own words) what they have learnt from that non-legal source may well be able to show enough understanding to gain a pass mark.  If they simply cut and paste it, Turnitin (a tool that checks student submissions for similarity with existing sources) will help to identify them as plagiarists.  Higher marks, of course, require more work from primary sources.  But a new development is threatening to shake our complacency.

A New Way of Cheating

There has been a lot of writing in the media of late about what I understand is called "generative AI", that is to say computer programmes capable of generating text algorithmically.  OpenAI, an organisation that describes itself as a "capped profit company" with a self-proclaimed mission "to ensure that artificial general intelligence benefits all of humanity", offers to the world "ChatGPT".  One of the abilities of this programme is to generate text, with remarkable speed, on any topic it is asked about, and its major selling point is that it does so in a conversational (or chatty) style, indistinguishable from that of a real person.  It offers to produce text that sounds human, if without grammatical errors.

Universities across the world, which rely on students producing their own text, are horrified – and not just at the thought of having to become suspicious of students with good grammar.  It will benefit no-one, far less all of humanity, if universities give out degrees to people much (or even some) of whose assessed work was generated by ChatGPT and the like.  Even if only a tiny percentage of students would deliberately cheat in this way, the mere fact that they are able to do so demeans the worth of degrees for all students.  Students and staff have a mutual interest in preventing this.

But Does it Work?

I wondered how genuine the risk is, and so, with a few idle hours to spare on one of those rainy afternoons that so characterise our own little corner of the globe, I set ChatGPT some tasks similar to those we might ask law students to do.

"Write me an essay", I typed into the system, "of 500 words in length, with full citation of authority, on the 1932 House of Lords decision in Donoghue v Stevenson".  Around three seconds later – three seconds! – I had my essay.  It was nicely structured, with an introduction, substance and conclusion, and ended with references.  It was more or less accurate, if lacking anything in the way of insight, but a good general description of the case and its importance.  No grammatical errors: an easy read that made sense.  Had it been submitted for assessment by a student in a mid-degree class, I would have given it a solid pass mark, and if submitted by a student for a first-semester class, perhaps an even higher mark.

I typed the same question in a second time and, interestingly, the machine gave me another essay, saying much the same thing but in different words, as if written by a different person faced with the same task.  Turnitin, which we presently use to pick up plagiarism in the form of copying, might not identify the two as coming from the same source.

This is worrying, but perhaps not unduly so, since with assessments in the real world we very seldom ask students to do no more than give us pure facts.  We are far more likely to ask for some sort of evaluation and – at least at the moment – Chatbots either struggle to provide this or simply cannot.  I asked the machine "Is Donoghue v Stevenson a good decision?".  After a disarming pause, it replied that the case was considered by many to be a good decision for the following reasons (and it gave three, quite sensible, reasons for holding that view).  As markers, we will have to look out for phraseology like "considered by many" because the machine seems to use this a lot in an attempt to distance itself from anything like an opinion.  Oddly, when I repeated this question the response was structured very differently (and more revealingly): "As an AI language model, I am programmed to be neutral and do not have the ability to form opinions on legal cases.  However, here is some background to the case…"  That "I am programmed to be neutral", with its implication that the system could be otherwise programmed, is deeply worrisome.  But the "However…" is a clear marker of that easily identified (bad) tactic, common amongst the weaker students, of answering a question that was not asked: such answers never gain pass marks.

Fighting Back

Clearly, in setting assignments (even in first-year modules) we are going to have to ask for more sophisticated evaluations.  "Is Donoghue a better decision than Caparo Industries?" generated the fairly meaningless answer that the cases were difficult to compare, "but here are the facts of each…"  An answer like that would not receive a pass mark for a question of that sort.

At the moment, the best protection against misuse in assignments is probably that the machine often generates major factual errors, and if a student spends time checking everything ChatGPT says against formal legal sources then they will have learnt something and may well deserve credit for doing so.  Sometimes, the answers were those that might have been given by the Hatter (subsequently called the Mad): structured in technically perfect sentences and offered with confidence, but with a substance of palpable nonsense.  My next question was: “What amendments have been made to the Children’s Hearings (Scotland) Act 2011?”  (You can get an accurate list from Westlaw).

ChatGPT’s answer was that “Several amendments have been made to this legislation, and the following are some representative examples”.  It then listed some SSIs, and offered what might have been a helpful explanation of the effect they had on the 2011 Act.  But in no case was it remotely accurate.  The first listed Statutory Instrument was the 2013 Modification Order (which has the merit of existing in the real world).  ChatGPT suggested this 2013 Order was made as a consequence of the Children and Young People (Scotland) Act 2014 (seeming not to notice the dissonance in dates).  The next on the list was the “Children’s Hearings Review of Decisions Amendment Rules 2014”, which do not exist – and it made up some rubbish about how this now required reviews of decisions from children’s hearings to be conducted by three panel members rather than, as before, just one panel member (!?!).

I then wondered, somewhat immodestly, what ChatGPT thinks it knows about me.  So I asked the machine, "Tell me all you know about Professor Kenneth Norrie".  Here it offered some assessments, or at least it used evaluative words with which no-one could possibly disagree – "renowned", "highly regarded", "eminent", "world-leading", and the like.  But it also adopted Humpty Dumpty's approach to meaning.  It told me that my degrees are from the Universities of Edinburgh and Glasgow (in fact, they are from Dundee and Aberdeen); that I joined Strathclyde in 1996 (in fact, 1990) – and, most bemusingly, that I was awarded an OBE for services to gender equality and LGBT rights in 2017!  (There is no OBE in my CV, though much of my work is indeed on gender equality and LGBT rights).

Again I repeated the question, but this time there was no mention of a phantom OBE.  The machine now suggested, however, that I had won the "Outstanding Contribution to the Scottish Legal Profession" award at The Herald's Law Awards 2021.  (In the real world, away from Wonderland, that award went to Mike Dailly of the Govan Law Centre: a most worthy recipient and a Strathclyde graduate of whom the Law School is inordinately proud).  Really, how can the machine generate such nonsense?  You might as well ask the Hatter's question: why is a raven like a writing desk?

Implications for Assessment Strategy

Where does all this leave university examiners?  At the moment, we can, I think, reduce the risks of misuse of Chatbots by changing the nature of what we ask our students to do in assessed work.  But ChatGPT is clearly becoming more and more sophisticated, and we cannot rely on it continuing to offer perfectly written essays riddled with basic errors and eschewing anything that looks like an opinion or evaluation.  As competitor programmes enter the field, accuracy and cross-checking may well become features, as may the offering of rational opinions based on factually accurate justifications.

The Law School (in common with schools across the HE sector) may well have to radically rethink how students are assessed, to ensure that their legal knowledge, understanding, skills and indeed values are being tested rather than their typing ability.  We might have to increase, for example, the extent of assessed presentations, or revert to hand-written, invigilated, closed-book examinations.  We may have to revisit our resistance to assessing performance in tutorials.  We might work out how to offer credit for activities such as mooting and other inter-varsity competitions.  Doubtless Turnitin and other similarity-detection systems are already being recalibrated to meet the challenges created by Chatbots.  Every student who cares about the worth of their degree certificate has an interest in ensuring the integrity of the assessment system, and whatever approach the Law School takes in response to the very real threats from Chatbots, our students' views on what is practicable and acceptable will be an invaluable resource.

Finally, the White Rabbit pushed me down an ever-more disorienting rabbit hole: I asked ChatGPT what its own ideas were on how to prevent students from using Chatbots in preparing assessed work.  Its responses ranged from Dormouse-like naivety (simply tell the students the importance of learning things rather than pretending to know things) to Queen of Hearts-like harshness (substantially increase the penalties for those caught, including termination of studies, to make the risk not worth taking).  None of its ideas can be rejected out of hand: all of them will require a human judgment founded on experience before adoption.

What else might we do?  Answers on a (handwritten) postcard, please. 

Kenneth Norrie