Collections: On ChatGPT

So I stirred up a bit of conversation on Twitter last week when I noted that I had already been handed ChatGPT-produced assignments.1 For those who are unaware, ChatGPT is an ‘AI’ chatbot that, given a prompt, can produce texts; it is one of the most sophisticated bots of this sort yet devised, trained on a massive amount of writing (along with substantial human input in the training process, something we’ll come back to). And its appearance has made a lot of waves and caused a fair bit of consternation.

Now I should note at the outset that while I am going to argue that ChatGPT is – or at least ought to be – basically useless for doing college assignments, it is also wrong to use it for this purpose. Functionally all university honor codes prohibit something like ‘unauthorized aid or assistance’ when completing an assignment. Having a chatbot write an assignment – or any part of that assignment – for you pretty clearly meets that definition. Consequently, using ChatGPT on a college essay is an impermissible outside aid – that is to say, ‘cheating.’ At most universities, this sort of cheating is an offense that can lead to failing classes or expulsion. So however irritating that paper may be, it is probably not worth getting thrown out of college, money wasted, with no degree to show for it. Learn. Don’t cheat.

That said, I want to move through a few of my basic issues: first, what ChatGPT is in contrast to what people seem to think it is. Second, why I think that functionality serves little purpose in essay writing – or more correctly, why I think folks who think it ‘solves’ essay writing misunderstand what essay writing is for. Third, why I think that same functionality serves little purpose in my classroom – or more correctly, why I think that folks who think it solves issues in the classroom fundamentally misunderstand what I am teaching and how.

Now I do want to be clear at the outset that I am not saying that this technology has no viable uses (though I can’t say I’ve yet seen an example of a use I would consider good rather than merely economically viable for ChatGPT in particular) and I am certainly not saying that future machine-learning based products, be they large language models or other products, will not be useful (though I do think that boosters of this technology frequently assume applications in fields they do not understand). Machine learning products are, in fact, already useful and in common use in ways that are good. But I think I will stipulate that much of the boosterism for ChatGPT amounts to what Dan Olson (commenting on cryptocurrency) describes as, “technofetishistic egotism,” a condition in which tech creators fall into the trap where, “They don’t understand anything about the ecosystems they’re trying to disrupt…and assume that because they understand one very complicated thing, [difficult programming challenges]…that all other complicated things must be lesser in complexity and naturally lower in the hierarchy of reality, nails easily driven by the hammer that they have created.”

Of course that goes both ways, which is why I am not going to say what capabilities machine learning may bring tomorrow. It is evidently a potentially powerful technology and I am not able to assess what it may be able to do in the future. But I can assess the observed capabilities of ChatGPT right now and talk about the implications those capabilities have in a classroom environment, which I do understand.2 That means – and I should be clear on this – this is a post about the capabilities of ChatGPT in its current form; not some other machine learning tool or AI that one imagines might exist in the future. And in that context what I see does not convince me that this technology is going to improve the learning experience; where it is disruptive it seems almost entirely negatively so and even then the disruption is less profound than one might think.

Now because I am not a chatbot but instead a living, breathing human who in theory needs to eat to survive, I should remind you that if you like what you are reading here you can help by sharing what I write (for I rely on word of mouth for my audience) and by supporting me on Patreon. And if you want updates whenever a new post appears, you can click below for email updates or follow me on twitter (@BretDevereaux) for updates as to new posts as well as my occasional ancient history, foreign policy or military history musings, assuming there is still a Twitter by the time this post goes live.

The Heck is a ChatGPT?

But I think we want to start by discussing what ChatGPT is and what it is not; it is the latter actually that is most important for this discussion. The tricky part is that ChatGPT and chatbots like it are designed to make use of a very influential human cognitive bias that we all have: the tendency to view things which are not people as people or at least as being like people. We all do this; we imagine our pets understand more than they can, have emotions more similar to ours than they do,3 or that inanimate objects are not merely animate but human in their feelings, memories and so on. We even imagine that the waves and winds are like people too and assign them attributes as divine beings with human-like emotions and often human-like appearances. We beg and plead with the impersonal forces of the world like we would with people who might be moved by those emotions.

The way ChatGPT and other chatbots abuse that tendency is that they pretend to be like minds – like human minds. But it is only pretend; there is no mind there, and that is the key to understanding what ChatGPT is (and thus what it is capable of). Now I can’t claim to understand the complex computer science that produced this program (indeed, with machine learning programs, even the creators sometimes cannot truly understand ‘how’ the program comes to a specific result), but enough concerning how it functions has been discussed to get a sense of what it can and cannot do. Moreover, its limitations (demonstrated in its use and thus available for interrogation by the non-specialist) are illustrative of its capabilities.

ChatGPT is a chatbot (a program designed to mimic human conversation) that uses a large language model (a giant model of the probabilities of what words will appear and in what order). That large language model was produced from a giant text base (some 570GB, reportedly), though I can’t find that OpenAI has been transparent about what was and was not in that training base (though no part of that training data is post-2021, apparently). The program was then trained by human trainers who either gave the model a prompt and an appropriate output to that prompt (supervised fine-tuning) or had the model generate several responses to a prompt and then ranked those responses from best to worst (the reward model). At each stage the model is refined (CGP Grey has a very accessible description of how this works) to produce results more in keeping with what the human trainers expect or desire. This last step is really important whenever anyone suggests that it would be trivial to train ChatGPT on a large new dataset; a lot of human intervention was in fact required to get these results.

It is crucial to note, however, what the data is that is being collected and refined in the training system here: it is purely information about how words appear in relation to each other. That is, how often words occur together, how closely, in what relative positions and so on. It is not, as we do, storing definitions or associations between those words and their real world referents, nor is it storing a perfect copy of the training material for future reference. ChatGPT does not sit atop a great library it can peer through at will; it has read every book in the library once and distilled the statistical relationships between the words in that library and then burned the library.

ChatGPT does not understand the logical correlations of these words or the actual things that the words (as symbols) signify (their ‘referents’). It does not know that water makes you wet, only that ‘water’ and ‘wet’ tend to appear together and humans sometimes say ‘water makes you wet’ (in that order) for reasons it does not and cannot understand.

In that sense, ChatGPT’s greatest limitation is that it doesn’t know anything about anything; it isn’t storing definitions of words or a sense of their meanings or connections to real world objects or facts to reference about them. ChatGPT is, in fact, incapable of knowing anything at all. The assumption so many people make is that when they ask ChatGPT a question, it ‘researches’ the answer the way we would, perhaps by checking Wikipedia for the relevant information. But ChatGPT doesn’t have ‘information’ in this sense; it has no discrete facts. To put it one way, ChatGPT does not and cannot know that “World War I started in 1914.” What it does know is that “World War I,” “1914” and “start” (and its synonyms) tend to appear together in its training material, so when you ask, “when did WWI start?” it can give that answer. But it can also give absolutely nonsensical or blatantly wrong answers with exactly the same kind of confidence because the language model has no space for knowledge as we understand it; it merely has a model of the statistical relationships between how words appear in its training material.
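
To make that ‘statistical relationships’ point concrete, here is a deliberately tiny sketch in Python – a toy of my own for illustration, nothing like the actual neural-network architecture behind ChatGPT, which works over a vast vocabulary and long stretches of context – of a program that stores nothing but counts of which words follow which in its training text and then emits the likeliest chain of words:

```python
# A toy illustration only: real large language models use neural networks over
# enormous vocabularies and long contexts, not simple word-pair counts like this.
from collections import defaultdict, Counter

corpus = (
    "world war i started in 1914 . "
    "world war i ended in 1918 . "
    "the war started in europe ."
).split()

# "Training": record how often each word follows each other word.
# This table of counts is the only thing the program keeps; the text itself
# is thrown away (the library is burned, so to speak).
following = defaultdict(Counter)
for word, next_word in zip(corpus, corpus[1:]):
    following[word][next_word] += 1

def continue_text(prompt_word, length=8):
    """Chain together the statistically likeliest words after prompt_word."""
    words = [prompt_word]
    for _ in range(length):
        options = following.get(words[-1])
        if not options:
            break
        # Always take the most frequent follower; a real model samples from a
        # probability distribution instead, but the principle is the same.
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(continue_text("war"))
# Prints something like: "war i started in 1914 . world war i"
# It looks vaguely like a statement about the First World War, but the program
# has no idea that 1914 is a date or that a war is a thing; it only knows which
# words sat next to each other in its training text.
```

Scale that idea up by many orders of magnitude, swap the word-pair counts for a neural network trained over a vast vocabulary and long stretches of context, and add the human feedback training described above, and you have – very roughly and very much simplified – the kind of machine we are dealing with.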

In artificial intelligence studies, this habit of manufacturing false information gets called an “artificial hallucination,” but I’ll be frank: I think this sort of terminology begs the question.4 ChatGPT gets called an artificial intelligence by some boosters (the company that makes it has the somewhat unearned name of ‘OpenAI’) but it is not some sort of synthetic mind so much as it is an extremely sophisticated form of the software on your phone that tries to guess what you will type next. And ChatGPT isn’t suffering some form of hallucination – which is a distortion of sense-perception. Even if we were to say that it can sense-perceive at all (and this is also question-begging), its sense-perception has worked just fine: it has absorbed its training materials with perfect accuracy, after all; it merely lacks the capacity to understand or verify those materials. ChatGPT isn’t a mind suffering a disorder but a program functioning perfectly as it returns an undesired output. When ChatGPT invents a title and author of a book that does not exist because you asked it to cite something, the program has not failed: it has done exactly what was asked of it, putting words together in a statistically probable relationship based on your prompt. But calling this a hallucination is already ascribing mind-like qualities to something that is not a mind or even particularly mind-like in its function.

Now I should note the counter-argument here is that by associating words together ChatGPT can ‘know’ things in some sense because it can link those associations. But there are some major differences here. First, human minds assess the reliability of those associations: how often, when asked a question, does an answer pop into your mind that you quickly realize cannot be right, or do you realize you don’t know the answer at all and must look it up? Part of that process, of course, is that the mental associations we make are ‘checked’ against the real world realities they describe. In fancy terms, words are merely symbols of actual real things (their ‘referents’ – the things to which they refer) and so the truth value of words may be checked against the actual status of their referents. For most people, this connection is very strong. Chances are, if I say ‘wool blanket’ your mind is going to not merely play word association but also conjure up some memories of actual wool blankets – their sight, touch or smell. ChatGPT lacks this capability; all it has are the statistical relationships between words, stripped entirely of their referents. It will thus invent descriptions of scientific phenomena that aren’t real, embellish descriptions of books that do not exist and, if asked to cite things, invent works to cite, because none of those things is any more or less real to ChatGPT than actually existing things.

All it knows – all it can know – are the statistical relationships of how words appear together, refined by the responses that its human trainers prefer. Thus the statement that ChatGPT doesn’t know anything about anything or, more correctly, that it cannot know anything about the topics it is asked to write about.

All of that is important for understanding what ChatGPT is doing when you tell it to, say, write an essay. It is not considering the topic, looking up references, thinking up the best answer and then mobilizing evidence for that answer. Instead it is taking a great big pile of words, picking out the words which are most likely to be related to the prompt and putting those words together in the order-relationships (but not necessarily the logical relationships) that they most often have, modified by the training process it has gone through to produce ‘better’ results. As one technical writer, Ted Chiang, has put it, the result is merely a ‘very lossy’ (that is, not very faithful) reproduction of its training materials, rather than anything new or based on any actual understanding of the underlying objects or ideas. But, because it is a chatbot, it can dole those words out in tremendous quantity, with flawless spelling and grammar, and follow whatever formula (more or less) the prompt asks for. But it doesn’t know what those words mean; indeed, coming from the chatbot, in a sense they mean nothing.

I stress this functionality at the beginning because I want readers to understand that many of the mental processes – analysis, verification, logical organization – that we take for granted from a thinking person are things ChatGPT does not do and is entirely incapable of, in the same way that an electric can-opener cannot also double as a cell phone. Those capabilities are both entirely outside of the structure of the current iteration of ChatGPT and also entirely outside of the processes that the training procedures which produced ChatGPT will train. Incremental improvements in the can-opener will not turn it into a cell phone either; the cell phone is an entirely different sort of machine. Thus the confidence among some that the ‘hallucination’ problem will inevitably be solved seems premature to me. It may well be solved, but it may well not; doing so will probably require the creation of an entirely new sort of machine of a type never before created. That eventuality cannot be taken for granted; it is not even something that we know is possible (though it may well be!). It most certainly will not happen on its own.

The Heck Is an Essay?

So that is what ChatGPT does: in response to a prompt, it puts together an answer that is composed of words in its training material organized based on the statistical probability that those words appear together and the degree to which they are related to the prompt (processed through an extremely complex language model). It thus assembles words from its big bag of words in a way that looks like the assemblages of words it has seen in its training and which its human trainers have ranked highly. And if all you want ChatGPT to do is precisely that: somewhat randomly assemble a bunch of words loosely related to a topic in a form that resembles communication, it can do that for you. I’m not sure why you would want it to do that, but that is the one and only thing it can do.

But can ChatGPT write an essay?

It has been suggested that this endangers or even makes obsolete the essay, and particularly the ‘college essay,’ and I think this misunderstands what the purpose of an essay is. Now the definition of an essay is somewhat nebulous, especially when it comes to length; essays are shorter than books but longer than notes, though these too are nebulously defined. Still we can have a useful definition:

An essay is a piece of relatively short writing designed to express an argument – that is, it asserts a truth about something real outside of the essay itself – by communicating the idea of the argument itself (the thesis) and assembling evidence chosen to prove that argument to a reader. Communication is thus part of writing an essay, but not the only part or even necessarily the most important. Indeed, the communication element may come in entirely different forms from the traditional essay. Consider video essays or photo essays: both have radically changed the form of communication but they remain essays because the important part – the argument asserting a truth about something, supported by assembled evidence – remains the same, even as the nature of the evidence and communication has changed.

Writing an essay thus involves a number of steps, of which communication is merely the last. Ideally, the essay writer has first observed their subject, then drawn some sort of analytical conclusion about that subject,5 then organized their evidence in a way that expresses the logical connections between various pieces of evidence, before finally communicating that to a reader in a way that is clear and persuasive.

ChatGPT is entirely incapable of the first two steps (though it may appear to do either of them) and incompetent at the third; its capabilities are entirely on the last step (and even there generally inferior to a well-trained human writer at present).

When it comes to observing a subject, as noted, ChatGPT is not capable of research, so the best it can do, to borrow Ted Chiang’s phrasing again, is provide a ‘lossy’ replica of the research of others, and only if that research has somehow found its way into ChatGPT’s training materials. Even when the necessary information is contained within the works in ChatGPT’s training material, it can’t actually understand those things; it can only reproduce them, so if they do not explicitly draw the conclusion it needs in as many words, ChatGPT can’t do so either. We can demonstrate this by asking ChatGPT an almost trivially easy research question, like, “What is the relationship between Edward Luttwak’s Grand Strategy of the Roman Empire and Benjamin Isaac’s The Limits of Empire?” And so we did:

If you know nothing about either book, this answer almost sounds useful (it isn’t).6 Now this is a trivial research task; simply typing ‘the limits of empire review’ into Google and then clicking on the very first non-paywalled result (this review of the book by David Potter from 1990) and reading the first paragraph makes it almost immediately clear that the correct answer is that Isaac’s book is an intentional and explicit rebuttal of Luttwak’s book, or as Potter puts it, “Ben Isaac’s The Limits of Empire offers a new and formidable challenge to Luttwack.” A human being who understands the words and what they mean could immediately answer the question, but ChatGPT, which doesn’t, cannot: it can only BS around the answer by describing both books and then lamely saying they “intersect in some ways.” The information ChatGPT needed was clearly in its training materials (or it wouldn’t have a description of either book to make a lossy copy of),7 but it lacks the capacity to understand that information as information (rather than as a statistically correlated sequence of words).8 Consequently it cannot draw the right conclusion and so talks around the question in a convincing, but erroneous, way.

Note that no analysis was required for the above question! It was a pure reading comprehension question that could be solved by merely recognizing that something in the training set already said the answer and copying it, but ChatGPT wasn’t even capable of that because while it has a big bag of words related to both books, it lacks the capability to understand and grab the relevant words. This is an example of the not at all uncommon situation where Google is a far better research tool than ChatGPT, because Google can rely on your reading comprehension to understand the places it points you to which may have the answer you seek.

So research and observation are out; what about analysis? Well, if you have been following along you’ll realize that ChatGPT is incapable of doing that too. What it can do is find something that looks like analysis (though it may not be analysis or it may be quite bad analysis) and then reproduce it (in a lossy form) for you. But the point of analysis is to be able to provide novel insight, that is, either to suggest a conclusion hitherto unconsidered for a given problem or, equally importantly, to come up with a conclusion for a problem that is only being encountered for the very first time. ChatGPT, limited entirely to remixing existing writings, cannot do either.

As a system to produce essays, this makes ChatGPT not very useful at all. Generally when people want an essay, they don’t actually want the essay; the essay they are reading is instead a container for what they actually want, which is the analysis and evidence. An essay in this sense is a word-box that we put thoughts in so that we can give those thoughts to someone else. But ChatGPT cannot have original thoughts; it can only remix writing that is already in its training material; it can only poorly copy writing someone else has already done better somewhere.9 ChatGPT in this sense is like a friendly, if somewhat daft, neighbor who noticed one day that every so often you get a box from Amazon and that you seem quite happy to get it, and so decides to do you a favor by regularly ordering empty Amazon boxes to your house. The poor fellow does not know and cannot understand that it was the thing in the box – in this case, the thoughts (original observations, analysis, evidence) in the essay – that you actually wanted. ChatGPT doesn’t have any thoughts to give you (though it can somewhat garble someone else’s thoughts), but it sure can order you up a bunch of very OK boxes.

In a very real sense then, ChatGPT cannot write an essay. It can imitate an essay, but because it is incapable of the tasks which give an essay its actual use value (original thought and analysis), it can only produce inferior copies of other writing. That quite a few people, including some journalists, have supposed that ChatGPT can write an essay suggests to me that they have an impoverished idea of what an essay is, viewing it only as ‘content’ rather than as a box that thoughts go into for delivery, or haven’t really scrutinized what ChatGPT outputs closely enough.

Now there are, in that previous analogy, box-sellers online: outlets who really do not care about the thoughts in the essay but merely want units of text to throw up to generate clicks. Few reputable publications function this way – that’s why they have editors whose job is to try to figure out if your essay has a thought in it actually worth sharing and then if so to help guide you to the most effective presentation of that thought (that’s the editing process). But there are a lot of content mills online which are really looking to just supply large amounts of vaguely relevant text at the lowest possible cost hoping to harvest views from gullible search engines. For those content mills, ChatGPT potentially has a lot of value but those content mills provide almost no value to us, the consumer. Far from it, they are one of the major reasons why folks report declining search engine quality, as they crowd out actually useful content.10

That said, I don’t want to rule out ChatGPT’s ability to produce functional formulaic documents entirely. I’ve heard it suggested that it could massively reduce the cost of producing formula-driven legal and corporate documents, and perhaps it can. It’s also been suggested it could be trained to write code, though my understanding is that as of now, most of the code it produces looks good but does not work well. I don’t write those sorts of things, though, so I can’t speak to the question. I would be concerned, though, because ChatGPT can make some very bad mistakes and has no way of catching those mistakes, so very high-stakes legal or corporate documents seem like a risky use of ChatGPT. ChatGPT can’t write a good essay, but a bad essay only wastes a few minutes of your time; a bad contract can cost a company millions and a single bad line of code can crash an entire program (or just cause it to fail to compile and in either case waste hours and hours of bug-hunting to determine what went wrong).

But the core work of the essay? This ChatGPT cannot do. And importantly it is not some capacity which merely requires iterative improvements on the product. While ChatGPT can fake an original essay, the jump from faking that essay to writing an actually original thought certainly looks like it would require a completely different program, one capable of observing the real world, analyzing facts about it and then reaching conclusions.

The Heck is the Teaching Essay For?

That leaves the role of ChatGPT in the classroom. And here some of the previous objections do indeed break down. A classroom essay, after all, isn’t meant to be original; the instructor is often assigning an entire class to write essays on the same topic, producing a kaleidoscope of quite similar essays using similar sources. Moreover classroom essays are far more likely to be about the kind of ‘Wikipedia-famous’ people and works which have enough of a presence in ChatGPT’s training materials for the program to be able to cobble together a workable response (by quietly taking a bunch of other such essays, putting them into the blender and handing out the result, a process which in the absence of citation we probably ought to understand as plagiarism). In short, many students are often asked to write an essay that many hundreds of students have already written before them. And so there were quite a few pronouncements that ChatGPT had ‘killed’ the college essay. And indeed, in my own experience in the Twitter discourse around the system, one frequent line of argument was that ChatGPT was going to disrupt my classroom, so shouldn’t I just go ahead and get on board with the new technology?

This misunderstands both what the college essay is for and the role of disruption in the classroom. Let’s start with the first question: what is the teaching essay (at any level of schooling) for? It’s an important question and one that arises out of a consistent problem in how we teach students, which is that we rarely explain our pedagogy (our ‘teaching strategy’) to the students. That tends to leave many assignments feeling arbitrary even when teachers have in fact put a great deal of thought into why they are assigning what they are and what skills they are supposed to train. So let’s talk about why we assign essays, what those assignments are supposed to accomplish and why ChatGPT has little to offer in that realm.

In practice there are three things that I am aiming for an essay assignment to accomplish in a classroom. The first and probably least important is to get students to think about a specific historical topic or idea, since they (in theory) must do this in order to write about it. In my own planning I sometimes refer to these assignments as ‘pedagogical’ essays (not a perfect term) where the assignment – typically a ‘potted’ essay (a short essay with pre-chosen sources handed to students, as opposed to a ‘research’ essay) – is meant to have students ponder a specific question for the value of that question. One example is an essay prompt I sometimes use in my ancient history survey asking students, “On what basis do we consider Alexander to be ‘great’? Is this a sound basis to apply this title?” Obviously I want students here both to understand something about Alexander and to think about the idea of greatness and what that means; does successfully killing a lot of people and then failing to administer what remains qualify as greatness and, if so, what does that say about what we value? Writing the essay forces them to ponder the question. That value is obviously lost if they just let ChatGPT copy some other essay for them.

That said, this first sort of goal is often the least important. While of course I think my course material matters, the fact is few students will need to recall from memory the details of Alexander the Great at some point in their lives. They’ll be able to look him up and, hopefully, with the broad knowledge framework and the research and analysis skills I’ve given them, be able to reach these same conclusions. Which brings us to:

The second goal, and middle in importance, is training the student in how to write essays. I’ve made this element of my approach more explicit in recent years, making the assignments more closely resemble the real-world writing forms they train for. Thus the classic 3-5 page paper becomes the c. 1,000-word think-piece (though I do require a bit more citation than a print publication would, in a ‘show your work’ sort of way), the short paper becomes a 700-800 word op-ed, etc. The idea here is to signal to students more clearly that they are training to write real things that exist in the world outside of the classroom. That said, while a lot of students can imagine situations in which they might want to write an op-ed or a think piece or a short speech, many of them won’t ever write another formal essay after leaving college.

Thus the last and most important thing I am trying to train is not the form of the essay nor its content, but the basic skill, outlined earlier, of having a thought and putting it in a box. Even if your job or hobbies do not involve formal writing, chances are (especially if your job requires a college degree) you are still expected to observe something real, make conclusions about it and then present those conclusions to someone else (boss, subordinates, co-workers, customers, etc.) in a clear way, supported by convincing evidence if challenged. What we are practicing then is how to have good thoughts, put them in good boxes and then effectively hand that box to someone else. That can be done in a formal written form (the essay), in informal writing (emails, memos, notes, Slack conversations), or verbally (speeches, but also arguments, debates and discussions). The skills of having the idea, supporting it with evidence, organizing that evidence effectively to be understood and then communicating that effectively are transferable and the most important skills that are being practiced when a student writes an essay.

Crucially – and somehow this point seems to be missed by many of ChatGPT’s boosters I encountered on social media – at no point in this process do I actually want the essays. Yes, they have to be turned in to me, graded and commented on, because that feedback in turn is meant both to motivate students to improve and to signal where they need to improve.11 But I did not assign the project because I wanted the essays. To indulge in an analogy, I am not asking my students to forge some nails because I want a whole bunch of nails – the nails they forge on early attempts will be quite bad anyway. I am asking them to forge nails so that they learn how to forge nails (which is why I inspect the nails and explain their defects each time) and by extension also learn how to forge other things that are akin to nails. I want students to learn how to analyze, organize ideas and communicate those ideas.

What one can immediately see is that a student who uses ChatGPT to write their essay for them has simply cheated themselves out of the opportunity to learn (and also wasted my time in providing comments and grades). As we’ve seen above, ChatGPT cannot effectively replace the actual core tasks we are training for, so this is not a case where the existence of spinning jennies renders most training in hand-spinning obsolete. And it certainly doesn’t fulfill the purpose of the assignment.

To which some boosters of the technology respond that what I should really be doing is training students on how to most effectively use ChatGPT as a tool. But it is not clear to me that ChatGPT functions well as a tool for any part of this process. One suggestion is to write an outline and then feed that into ChatGPT to generate a paper, but that fails to train the essential communication component of the assignment and in any case, ChatGPT is actually pretty bad at the nuts and bolts of writing paragraphs. Its tendency in particular to invent facts or invent non-existent sources to cite makes it an enormous liability here; it is a very bad research tool because it is unreliable. Alternately, the suggestion is that students could use ChatGPT to produce an essay they edit to fit or an outline they fill in; both suggestions run into the problem that the student is now trying to offload the most important part of the task for them to learn: the actual thinking and analysis. And the crucial thing to note is that the skill that is not being trained in both cases is a skill that current large language models like ChatGPT cannot perform or perform very poorly.12

I suspect this argument looks plausible to people because they are not thinking in terms of being trained to think about novel problems, but in terms of the assignment itself; they are thinking about the most efficient way to produce ‘one unit of essay.’ But what we’re actually doing is practicing a non-novel problem (by treating it as a novel problem for the purpose of the assignment), so that when we run into novel problems, we’ll be able to apply the same skills. Consequently they imagine that ChatGPT, trained as it is on what seems to be an awful lot of mediocre student essays (it mimics the form of a bad student essay with remarkable accuracy), can perform the actual final task in question, but it cannot.

Conclusion: Preparing to Be ‘Disrupted.’

The reply that all of this gets has generally been some combination of how this technology is ‘the future,’ that it will make essay writing obsolete so I should focus on training for it,13 and most of all that the technology will soon be so good, if it is not already, that any competent student will be able to use it to perfectly fake good papers. Thus, I am told, my classroom is doomed to be ‘disrupted’ by this technology, so I should preemptively surrender and get on board.

And no. No, I don’t think so.

I do think there are classrooms that will be disrupted by ChatGPT, but those are classrooms where something is already broken. Certainly for a history classroom, if ChatGPT can churn out a decent essay for your assignment, chances are the assignment is poorly designed. ChatGPT after all cannot analyze a primary source (unless it has already been analyzed many times in its training materials), it struggles to cite scholarship (more often inventing fake sources) and it generally avoids specific evidence. Well-designed assignments which demand proper citation, specific evidence to support claims (rather than general statements) and a clear thesis are going to be beyond ChatGPT and indeed require so much editing to produce from a ChatGPT framework as to make it hardly worth the effort to cheat. If your essay prompt can be successfully answered using nothing but vague ChatGPT-generated platitudes, it is a bad prompt.14

Meanwhile, ChatGPT responses seem to be actually pretty easy to spot once you know how to look for the limitations built into the system. There are already programs designed to detect if a piece of writing is machine-written; they’re not fully reliable yet but I suspect they will become more reliable over time mostly because it is in the interests of both AI-developers (who do not want their models trained on non-human produced writing) and search engines (who want to be able to exclude from search results the veritable river of machine-produced content-mill garbage we all know is coming) to develop that capability. But because of the ways ChatGPT is limited, a human grader should also be able to flag ChatGPT generated responses very quickly too.

It should be trivially easy, for instance, for a grader to confirm that the sources a paper cites exist.15 A paper with a bunch of convincing-sounding but entirely invented sources is probably machine-written because humans don’t tend to make that mistake. If instead, as is its wont, the paper refers only vaguely to works written by a given author or on a given topic, insist the student produce those works (and require citation on all papers) – this will be very hard for the student with the ChatGPT paper, as those works will not, in fact, exist.16 ChatGPT also has a habit of mistaking non-famous people for famous people with similar names; again, for a grader familiar with the material this should be quite obvious.

And then of course there are the errors. ChatGPT makes a lot of factual mistakes, especially as it gets into more technical questions where the amount of material for it to be trained on is smaller. While the text it produces often looks authoritative to someone with minimal knowledge in that field, in theory the person grading the paper should have enough grounding to spot some of the obvious howlers that are bound to sneak in over the course of a longer research paper.17 By way of example, I asked ChatGPT to write on, “the causes of Roman military success in the third and second centuries BCE.” Hardly a niche topic.18 The whole thing was sufficiently full of problems and errors that I’m just going to include an annotated Word document pointing them all out here:

Needless to say, this would not be a passing (C or higher) paper in my class. Exact counting here will vary, but I identified 38 factual claims, of which 7 were correct, 7 were badly distorted and 24 were simply wrong. A trainwreck this bad would absolutely have me meeting with a student and raising questions which – if the paper was machine-written – might be very hard for the student to answer. Indeed, a research paper with just three or four of these errors would probably prompt a meeting with a student to talk about their research methods. This is certainly also an error rate which is going to draw my attention and cause me to ask questions about who exactly wrote the essay and how.19

And that’s the thing: in a free market, a competitor cannot simply exclude a disruptive new technology. But in a classroom, we can absolutely do this thing. I am one of those professors who doesn’t allow laptops for note-taking (unless it is a disability accommodation, of course) because there’s quite a bit of evidence that laptops as note-taking devices lower student performance (quite apart from their potential to distract) and my goal is to maximize learning. This isn’t me being a Luddite; I would ban, say, classroom firecrackers or a live jazz band for the same reason, and if laptops improved learning outcomes somehow (again, the research suggests they don’t), I’d immediately permit them. Given that detecting machine-writing isn’t particularly hard and that designing assignments that focus on the skills humans can learn that the machines cannot (and struggle to fake) is good pedagogical practice anyway, excluding the technology from my classroom is not only possible, it is indeed necessary.

Now will this disrupt some classrooms? Yes. Overworked or indifferent graders will probably be fooled by these papers or, more correctly, they will not care who wrote the paper because those instructors or graders are either not very invested in learning outcomes or not given the time and resources to invest however much they might wish to. I think schools are going to need to think particularly about the workload on adjuncts and TAs, who are sometimes asked to grade through absurdly high numbers of papers in relatively little time and thus will simply lack the time to read carefully enough. Of course given how much students are paying for this, one would assume that resources could be made available to allow for the bare minimum of scrutiny these assignments deserve. Schools may also need to rethink the tradeoffs of hiring indifferent teachers ‘for their research’ or for the prestige of their PhD institutions, because the gap between good, dedicated teachers and bad, indifferent ones is going to grow wider as a result of this technology.

Likewise, poorly designed assignments will be easier for students to cheat on, but that simply calls on all of us to be more careful and intentional with our assignment design (though in practice in my experience most professors, at least in history and classics, generally are). I will confess every time I see a news story about how ChatGPT supposedly passed this or that exam, I find myself more than a little baffled and quite concerned about the level of work being expected in those programs. If ChatGPT can pass business school, that might say something rather concerning about business school (or at least the bar they set for passing).

The final argument I hear is that while ChatGPT or large language models like it may not make my job obsolete now, they will inevitably do so in the future, that these programs are inevitably going to improve to the point where all of the limitations I’ve outlined will be surpassed. And I’ll admit some of that is possible but I do not think it is by any means certain. Of the processes we’ve laid out here, observing, analyzing those observations, arranging evidence to support conclusions and then communicating all of that, ChatGPT only does (or pretends to do) the last task. As I noted above, an entirely new machine would be necessary for these other processes and it is not certain that such a machine is possible within the limits of the computing power now available to us. I rather suspect it is, but it doesn’t seem certain that it is.

More broadly, as far as I can tell it seems that a lot of AI research (I actually dislike a lot of these terms, which seem to me to imply that what we’ve achieved is a lot closer to a synthetic mind than it really is, at least for now) has proceeded on a ‘fake it till you make it’ model. It makes sense as a strategy: we want to produce a mind, but we don’t really know how a mind works at full complexity, so we’ve chosen instead to try to create machines which can convincingly fake being a mind, in the hopes that a maximally convincing fake will turn out to be a mind of some sort. I have no trouble imagining that strategy could work, but what I think AI-boosters need to consider is that it also may not. It may in fact turn out that the sort of machine learning we are doing is a dead end.

It wouldn’t be the first time! Early alchemists spent a lot of time trying to transmute lead into gold; they ended up pioneering a lot of chemistry, exploring chemical reactions to try to achieve that result. Important things were learned, but you know what no amount of alchemical proto-chemistry was ever going to do? Turn lead into gold. As a means of making gold those experiments were dead ends; if you want to turn lead into gold you have to figure out some way of ripping three protons off of a lead atom which purely chemical reactions cannot do. The alchemist who devised chemical reactions aiming to produce progressively more convincing fakes of gold until he at last managed the perfect fake that would be the real thing was bound to fail because that final step turns out to be impossible. The problem was that the alchemist had to experiment without knowing what made some things (compounds) different from other things (elements) and so couldn’t know that while compounds could be altered in chemical reactions, elements could not.

In short, just as the alchemist labored without really knowing what gold was or how it worked, but was only able to observe its outward qualities, so too our AI engineers are forced to work without really knowing what a mind is or how it works. This present research may turn out to be the way that we end up learning what a mind really is and how it really works, or it may be a dead end. We may never turn ChatGPT into gold. It may be impossible to do so. Hopefully even if that is the case, we’ll have developed some useful tools along the way, just like those alchemists pioneered much of chemistry in the pursuit of things chemistry was incapable of doing.

In the meantime, I am asking our tech pioneers to please be more alive to the consequences of the machines you create. Just because something can be done doesn’t mean it should be done. We could decide to empirically test if 2,000 nuclear detonations will actually produce a nuclear winter,20 but we shouldn’t. Some inventions – say, sarin gas – shouldn’t be used. Discovering what we can do is always laudable; doing it is not always so. And yet again and again these new machines are created and deployed with vanishingly little concern about what their impacts might be. Will ChatGPT improve society, or just clutter the internet with more junk that will take real humans more time to sort through? Is this a tool for learning or just a tool to disrupt the market in cheating?

Too often the response to these questions is, “well, if it can be done, someone will do it, so I might as well do it first (and become famous or rich),” which is both an immorally self-serving justification and a suicidal rule of conduct to adopt for a species which has the capacity to fatally irradiate its only biosphere. The amount of power our species has to create and destroy long ago exceeded the point where we could survive on that basis.

And that problem – that we need to think hard about the ethics of our inventions before we let them escape our labs – that is a thinking problem and thus one in which ChatGPT is entirely powerless to help us.

  1. And I should be clear right here ahead of time that nothing that follows is particular to any paper(s) I may have received. Do not ask “what happened to the student(s)?” or “how did you know?” or “what class was this in?” because I can’t tell you. Student privacy laws in the United States protect that sort of information and it is a good thing they do. The observations that follow are not based on student papers, instead they are based on a number of responses I had ChatGPT produce for me to get a sense of what such an effort at cheating might look like and how I might detect it.
  2. After all I may not have experience as a creator of large language models, but I am a fully qualified end user. I cannot and indeed will not critique how ChatGPT was created, but I am perfectly qualified to say, “this product as delivered does not meet any of my needs.”
  3. Not that pets don’t have emotions or some kind of understanding, but we anthropomorphize our pets a lot as a way of relating to them.
  4. Since I am going to use this phrase a lot I should be clear on its meaning. To ‘beg the question’ is not to ask someone to ask you something, but rather to ask your interlocutor in a debate or discussion to concede as a first step the very thesis you wanted to prove. If we were, say, debating the value of Jane Austen’s writing and I led by saying, “well, you must first concede she writes extremely well!” that would be question begging. It’s more common to see actual question begging occur as a definitional exercise; an attorney who defines the defendant at a trial as a ‘criminal’ has begged the question, assuming the guilt of a person whose guilt has not yet been judged in the proceeding where that is the primary concern.
  5. In our previous definition this conclusion is an argument, but we could easily expand our definition to also include descriptive essays (which aim not to make a new conclusion about something but merely to assemble a collection of generally accepted facts). There is still an analytical process here because the writer must determine what facts to trust, which are important enough to include and how they ought to be arranged, even though no explicit argument is being made. Indeed, such a descriptive essay (like a Wikipedia article) makes an implicit argument based on what is considered important enough to be included (e.g. on Wikipedia, what exactly is ‘notable’).
  6. The description of The Limits of Empire in particular is poor and mostly misses the book’s core argument that there was no Roman ‘grand strategy’ because the Romans were incapable of conceiving of strategy in that way.
  7. I’m pretty sure from the other responses I have seen (but cannot be 100% confident) that the BMCR, which is open and available to all, was included in ChatGPT’s corpus.
  8. While we’re here I should note that I think The Limits of Empire is hardly the last word on this question. On why, you want to read E. Wheeler, “Methodological Limits and the Mirage of Roman Strategy” JMH 57.1 and 57.2 (1993); Wheeler systematically destroys nearly all of Isaac’s arguments. I also asked ChatGPT to tell me what Wheeler’s critiques were, but since Wheeler isn’t in its training corpus, it couldn’t tell me. When I asked for a list of Isaac’s most prominent critics, it didn’t list Wheeler because, I suppose, no one in its corpus discussed his article, despite it being (to the best of my knowledge) generally understood that Wheeler’s critique has been the most influential, as for instance noted by J.E. Lendon in this review of the topic for Classical Journal back in 2002. ChatGPT can’t tell you any of that because it can only tell you things other people have already written in its training corpus. Instead, it listed Adrian Goldsworthy, Jeremy Armstrong, John W.I. Lee and Christopher S. Mackay because they all wrote reviews of the book; none of these scholars (some of whom are great scholars) are particularly involved in the Roman strategy debate, so all of these answers are wrong. The latest in this debate is James Lacey’s Rome: Strategy of Empire (2022), which is a solid reiteration of the Luttwakian side of the debate (valuable if only because Luttwak himself is a poor interlocutor in all of this) but seems unlikely to end it. It is possible I am working on trying to say something useful on this topic at some point in the future.
  9. It also isn’t very good at discoverability. It can’t tell you who or where that better idea is from if you find yourself wanting more explanation or context. Once again, as a research tool, Google is pretty clearly superior.
  10. This is painfully obvious when it comes to trying to get information about video games. In ye days of yore, Google would swiftly send you to the GameFaqs page (remember those!?) or the helpful fan Wiki, but more recently it becomes necessary to slog through a page or two of overly long (because Google prefers pages with at least a certain amount of text) answers to very simple questions in order to find what you are looking for (which usually ends up being a helpful response to someone’s question on Reddit or a Steam guide or, because I still like to live in 2004, an actual GameFaqs page).
  11. And thus, dear students, if you are not reading the comments, you are not getting what you paid tens of thousands of dollars for when you paid tuition. Read the comments. You are in college to learn things, not to prove what you already know or how smart you already are. We know you are smart, that’s why you got admitted to college; the question now is about drive and willingness to learn.
  12. There is thus a meaningful difference between this and the ‘why did I need to learn math without a calculator’ example that gets reused here, in that a calculator can at least do basic math for you, but ChatGPT cannot think for you. That said, I had quite a difficult time learning that sort of thing as a kid, but (with some extra effort from my parents) I did learn it and I’ve found it tremendously useful in life. Being able to calculate a tip in my head or compare the per-unit price of, say, a 3-for-whatever sale on 12-pack sodas vs. a 24-pack of the same brand without having to plug it into my phone is really handy. I thus find myself somewhat confused by folks I run into who are bitter they were forced to learn mathematics first without a calculator.
  13. A point we have already addressed.
  14. The one exception here is online courses using ‘closed book’ online essay tests. That is an exam model which will be rendered difficult by this technology. I think the answer is clever prompt writing (demand the students do things – be specific in evidence or reference specific works – that ChatGPT is bad at) or alternative assignments (a capstone project or essay instead). For in-person classes, the entire problem is obviated by the written in-class essay.
  15. And if they don’t, that’s academic dishonesty regardless of who wrote the paper.
  16. And a student that cannot or will not cite their sources has plagiarized, regardless of who wrote their paper. ChatGPT is such a mess of academic dishonesty that it isn’t even necessary to prove its products were machine-written because the machine also does the sort of things which can get you kicked out of college.
  17. And if the student has gone back and done the research to be able to correct those errors and rewrite those sentences in advance…at this point why not just write the paper honestly and not risk being thrown out of college?
  18. In the event, I asked for 8,000 words because I wanted to see how it would handle organizing a larger piece of writing. Now in the free version it can’t write that many words before it runs out of ‘tokens,’ but I wanted to see how the introduction would set up the organization for the bits it wouldn’t get to. In practice it set up an essay in three or four chunks, the first of which was 224 words; ChatGPT doesn’t seem to be able to even set up a larger and more complex piece of writing. Nor does it plan for a number of words limited by how many it can get to before running out of tokens, in case anyone thinks that’s what it was doing: to get to the end of the essay with all of the components it laid out in the introduction, I had to jog it twice.
  19. Of course if the student has just tried honestly and failed, they’ll be able to document that process quite easily, with the works they read and where each wrong fact came from, whereas the student who has cheated using ChatGPT will be incapable of doing so.
  20. A hotly debated topic, actually!

369 thoughts on “Collections: On ChatGPT”

  1. ChatGPT is useful as a form of autocorrect on steroids. You can type your own ideas in rough form, directly into ChatGPT, and get functional, readable sentences out. The style will not be great, and ChatGPT won’t supply any content you didn’t give it. But it makes it much easier to turn a stream of consciousness infodump into well-structured and readable paragraphs.

    Perhaps in history, good writing is commonplace. I’m in engineering, and terrible writing is all over the place. I wish my colleagues would use ChatGPT more often, not to come up with their ideas, but to make my life easier when I have to read the ideas they came up with.

    1. That would be doing them a disservice: if your engineering colleagues are routinely unable to produce functional, well-structured, readable sentences or paragraphs, they should definitely learn to do so; it will also help them in their day-to-day life! (Well, maybe unless the document is written in some obscure language I guess…)
      Not to mention that this is borderline unprofessional behavior that shows a lack of respect for the readers, wastes their time, potentially introduces costly errors, &c. (heh, somewhat like carelessly using ChatGPT?!)

      This is related to another issue I am concerned with: would these neural network programs cause the elimination of entry-level jobs that are also used to teach new hires the basics of their profession? (Something that at least partially happens in the university classroom in the case of our host?) Think jobs like translator, writer of sports columns (later «leveling up» to a journalist tasked with things requiring more thought and skill), &c.
      Since a lot of companies now find it unfashionable to spend effort to teach new hires, I am afraid they won’t realize the shortsightedness of eliminating these entry-level jobs before a lot of damage has been done.

      1. “Not to mention that this is borderline unprofessional behavior that shows a lack of respect for the readers, wastes their time, potentially introduces costly errors, &c.”

        I think this is really harsh.

        Part of the problem is that what constitutes good writing isn’t universal. What’s good writing in a technical document would be considered horrible in an academic publication, and neither would fly in a newspaper. A blog post has different criteria than a typical essay; an email has different criteria if it’s internal vs external. A lot of the criticisms I’ve seen leveled against engineers (and I grew up in a family of engineers, and routinely work with them, so I’ve heard a bunch) really boil down to “They use different standards than I do in my job, ergo their writing is bad and they should feel bad.”

        In other words: You’re not too far off saying that these documents are written in an obscure language. It absolutely is not normal English. Writing in these contexts is a VERY technical skill, one that requires a fair degree of training.

        Secondly, writing is only one way to convey information. Many of the engineers I work with are more inclined to think in terms of diagrams. They can draft perfectly clear spec sheets, but struggle to provide a narrative for those sheets because that’s not what they’re trained to do. This is a reasonable division of labor–we don’t expect a plumber to handle electrical work, or an auto mechanic to be an expert at nuclear submarines! If you train extensively in one area of course other areas are going to lag. And to be clear, there’s a reason for this. Visual thinking and mathematics are simply better ways to address the questions engineers deal with.

        It’s also worth noting that what engineers deal with is complicated. I’ve been a construction manager, and the sheer volume of data and number of decisions necessary is overwhelming. The risks are tremendous as well–there’s often a body count associated with engineering screwups, as evidenced by the fact that there are multiple agencies that provide oversight to prevent that. Trying to put even a simple construction project–like “I dug out the contaminated dirt and put in clean dirt”–into a publication that non-experts can understand is tremendously difficult (the tracking sheets alone tend to give people nightmares). If a normal essay is a typical jog, the type of writing most engineers need to do is a marathon through No Man’s Land at the height of WWI while wearing a weighted vest and chains around your ankles (that may be somewhat hyperbolic, but not too much).

        In either case having a pseudo-AI program take a first stab at correcting things isn’t significantly different from having spell/grammar checks and an editor; it seems like an extra step, perhaps warranted, perhaps not, depending on the situation.

        1. I actually agree with all of this!

          (Except for the «obscure language» bit, as I was being literal: say the engineer is the only one in the company fluent in that language, and for some reason or another it is not worth the effort to hire a full-blown translator for that.)

          I even agree that I was too harsh… but only because he’s not the only one who has had to suffer through the writing of his peers! (That the level of care in writing was insufficient *for that specific context* seemed implied to me?) Also because quite often the deadlines are tight, which might or might not point to an issue with the management.

          Still, I insist that it’s extremely worthwhile to learn to express yourself well… though now I wonder whether tools like ChatGPT might actually be helpful at least at the lowest levels of proficiency: by showing how to write well (enough)?

          (But again, with the same caveat as with students: at some point you have to take off the training wheels… and if you were already constantly under pressure, and don’t have the equivalent of student exams where you cannot «cheat»… progress will be hard!)

          But that’s for the people studying linguistics to do experiments on.

          1. “Except for the «obscure language» bit, as I was actually literal”

            The boundary between jargon and language is fuzzy. Engineers require a jargon that’s well on its way to being a new language; at the very least it’s different enough to be nearly if not totally incomprehensible by others. And to be clear, engineers understand one another fairly well (at least, those within the same field of engineering); it’s in communicating to non-engineers that they struggle, in part because of the difficulty in translating their jargon to vernacular.

            More to the point though, engineers (at least the ones I work with) think differently than essayists. The essay is a powerful tool for training rational thinking, but not the only one. Design specs, redlines, tracking sheets, and the like are not essays in any sense of the term, yet are often used to gather data, present arguments, and even organize thinking about problems–often to levels of precision that are simply not available to verbal formats (our host doesn’t argue his points to within 1/1000th of an inch). In this context every engineer I’ve met can express themselves extremely well, whether they can present the arguments in verbal or written format or not.

            Put a different way: I could spend several days writing an essay about two different design options, carefully crafting the terms so that the costs and benefits of each are clear. Or I can take two hours and make a drawing that shows the same data in a more useful and more accurate fashion. Ignoring costs, the latter wins hands-down. (To tie this back to the blog post, the reason is that the picture addresses the real-world referents in a more straightforward way in this context than the essay.)

            Is it useful to express these ideas verbally? Sure. Useful enough that most firms hire writers (someone who’s got the technical chops to understand what’s going on, and the writing chops to present it accurately). But that proves that we’ve already established a way to deal with this problem. The question you’re attempting to answer is “Is it useful for engineers to express these ideas verbally?” An emphatic “No!” is not an unreasonable answer here. Because remember, every hour my engineer spends learning and utilizing a skill that’s secondary to his job (because I’ve already got a way to handle that job) is an hour he’s NOT spending doing the thing he’s good at.

            A counter-argument will be that ChatGPT or some other program can replace the writer, but that’s not true either. Writing of this sort is a back-and-forth thing–I don’t just hand a pile of documents to the writer and say “Okay, monkey, write!” The relative importance of various things changes over the course of the project and the writing process, and thus a back-and-forth is required between the writer and the doer. It requires deep understanding on the part of both parties, and excellent communication between them. Not something that chat bots can replace.

          2. > In this context every engineer I’ve met can express themselves extremely well

            And this is great, and I have no reason to doubt you… for the engineering fields where screwups have quite direct and deadly consequences !

            (Looks like the «engineering» term might have been the wrong one to use here, as bskubi seems to have had something more akin to research in mind instead ?)

        2. I disagree: that’s not harsh enough. A significant chunk of engineering training is invested in communicating ideas to other engineers and to non-engineers, because that is what they spend all of their time doing. It is important. If they cannot do it, they are professionally unfit.

          1. Then an enormous number of professional engineers are professionally unfit.

            That aside, however, the main point of my argument isn’t that using ChatGPT as a form of autocorrect will allow people who *can’t* communicate effectively on their own to do so. The point is that it will lower the amount of effort it requires for *anyone* to communicate their ideas effectively in the specific forms of writing I am talking about.

            Think of it as analogous to a linguistic power saw. If a carpenter did not know how to use a manual saw, that would be reason to doubt their professionalism. Sometimes, a manual saw is the right tool for the job. Some people choose to exclusively use hand tools in a hobbyist context.

            Furthermore, power tools are more dangerous than manual tools in most cases. People can cut their fingers off, or hurt the people around them, if they use them improperly.

            Nevertheless, in modern times, a carpenter who refused to use power tools on the job would not be compatible with most work environments. We recognize that despite the risks, power tools are extremely useful for specific uses, and we embrace them in those cases.

            If you can explain to me why ChatGPT is fundamentally different from a power tool for writing, used in the ways I’ve described, as a form of “autocorrect” to improve sentence and paragraph structure when the user has supplied not just a prompt but a complete expression of the idea, then I will be interested to hear your argument.

            But the reason I am pushing back on your comment, and others here, is precisely because of the unfavorable heat/light ratio I see there. People like myself figuring out how to use ChatGPT in a useful, professional manner for our jobs should not have to endure people like yourself calling us professionally unfit for doing so. Not only is it degrading to us, it is also setting a bad example for younger and more impressionable people, creating a cloud of confusion and judgment that they will have to endure as they attempt to establish their professional skillset.

      2. Here’s a simple example: not all engineers and scientists are native English speakers. Plenty of people have a diagnosable issue bearing on their ability to write.

        We don’t complain about translation software. Why would we complain about using ChatGPT to polish up solid ideas to make it easier for readers to process them?

        The goal of writing in an engineering and science context is to convey ideas accurately and efficiently to the reader, not to display the writer’s verbal prowess. Perhaps this is different in the humanities and social sciences, but I defend an engineer’s right to use engineered tools to enhance both their engineering and their engineered communication. Explain why in this one very narrow piece of the overall workflow there should be an exception where we do not use tools. Using tools is what humans do.

        1. I would know: I’m not a native English speaker myself!

          Our host has already explained why over-reliance on tools can be… «harmful» (?) (sub-optimal?) by using the example of the calculator.

          And I didn’t get proficient in English thanks to translation software (or at least not more than a tiny fraction), but thanks to (after basic school education that I muddled through) reading ever more complicated texts with easy access to a digital English-English dictionary (the name of which escapes me now since it has been years; Linguee DOT com and its offline equivalents are great for that these days) (and, from some point on, English-speaking & English-subtitled video with the subtitles delayed by one second).

          And I even learned today that this method has a name!
          https://en.wikipedia.org/wiki/Input_hypothesis «also known as the monitor model»

          (Also you’re being disingenuous when you assume that I would bar disabled people from using whatever tools they might need to.)

          1. The example of the calculator is precisely why we would want to encourage the use of ChatGPT in appropriate engineering contexts. I would not want engineers relying on manual calculations, although it’s tolerable in some specific circumstances. ChatGPT is not a direct equivalent of a calculator, because the function I am describing is not to perform an exact calculation, but to produce a more readable product. Evaluating readability can probably be done quantitatively in some contexts, but in practice is mostly a qualitative judgment because it depends on the specific use-case.

            It is obvious to all that for the purpose of learning how to construct English sentences, writing or speaking English is the only effective way to learn that skill. But this is completely unrelated to the use-case I am advocating, which is not to teach English composition skills, but to produce useful engineering documents.

            The reason it’s relevant to bring this up here is that, currently, there is a big dark cloud being conjured by academics over the use of ChatGPT in a professional context, as you illustrated in your first reply to me (“Not to mention that this is borderline unprofessional behavior that shows a lack of respect for the readers, wastes their time, potentially introduces costly errors, &c.”). I want to emphasize that there is a real and highly beneficial use case for ChatGPT in the context of producing written documents, including academic documents, and I’d personally like to see future conversation centered on this topic.

            Bret is not conjuring up this big dark cloud, at least not deliberately. He is expressing his thoughts on why ChatGPT is an inappropriate tool for an entirely different use-case: producing the content of student essays on history topics.

            Let’s set aside the topic of idea generation, and focus on the topic of the construction of the sentences in which those ideas are conveyed. This is the primary topic of our discussion here.

            Sometimes, essays will partly be meant to teach the specific skill of constructing and polishing English sentences manually. ChatGPT would subvert this goal.

            Other times, essays will be meant to teach the specific skill of producing a document that clearly expresses an idea, and be partially or entirely agnostic about the specific set of skills the author uses to craft that essay.

            As Bret advocates, it’s important for teachers to be specific and thoughtful about what aspects of the process of essay production they’re intending for students to practice. It is clear to me that ChatGPT, and its future iterations, will be as integral to the production of writing as the calculator is to doing math in a professional engineering context. Insofar as teachers are intending to teach students skills that will allow them to participate effectively in the workforce, they should teach them appropriate ways to use ChatGPT in the classroom. To neglect this would be to do the students a disservice.

            As an example, the economist Tyler Cowen is currently teaching his students how to produce academic essays using ChatGPT as part of his normal curriculum.

            That doesn’t mean ChatGPT needs to be allowed on every assignment, or that it’s always the right tool for the job. But teachers should be complaining that students are using this powerful tool inappropriately, not that it’s a useless tool.

            This is also why I refer to people with disabilities and non-native English speakers as two obvious cases where having a tool to polish their English prose for functional purposes could be exceptionally useful. But I would go further and say that many people have to produce written documents in a professional context when writing is neither their passion nor their talent. For them, ChatGPT will be a way to produce a better output more quickly and with less pain, and I think that is an excellent form of technological progress that we should celebrate.

          2. > But teachers should be complaining that students are using this powerful tool inappropriately, not that it’s a useless tool.

            Hence why I added the “carelessly” in the «somewhat like carelessly using ChatGPT?!» bit.
            (Also, just in case of a mistaken reading, I want to reiterate that the «borderline unprofessional» bit was about being unable to express themselves properly, while the bit about ChatGPT was an association that I found somewhat funny and added later, though I get your point about why you are not pleased with this association.)

            Also, there might have been some miscommunication here: from your first post I definitely got the idea that you were suggesting giving (non-disabled) people the equivalent of a calculator before they have even learned to do basic math properly.

            I mean… it’s not like once you’ve mastered relatively basic writing skills like orthography, grammar, «functional, readable sentences», and *maaaybe* even a more advanced one like «well-structured and readable paragraphs» (?) (but I see your point that this takes more effort)… – that you can easily forget them (assuming you are at least *sometimes* using these skills) ?!?

            Also I would guess that the answers to this overall question might be, like you had first pointed out, somewhat similar to the usage of autocorrect and thesauruses, for which we already have quite some years of data…
            *Supposedly* grammar and especially orthography have gotten worse since the widespread adoption of autocorrect with cellphones… but I’m not sure how much this is real and how much an urban legend..?
            (Also it might be almost completely unrelated, and instead coming from the usefulness of using shortened slang in fast text communication ?)

          3. I don’t think reliance on a tool is inherently problematic, so long as that tool is reliably available. Calculators are an excellent example of this. I’m quite good at math by most standards, and have far more education in the topic than average, but compared to a calculator my arithmetic is very slow and has a high error rate. Calculators are everywhere. I have three in easy reach right now (my phone, a keypad calculator, and my desktop computer). I can be much more productive and use my strengths much better if I don’t have to waste hours manually calculating and checking basic sums.

            To extend the analogy, not everyone needs strong arithmetic skills on an everyday basis, nor does everyone have a chance to practice it regularly. Many of the people I work with are primarily manual laborers who need to do calculations on shipping weights or similar once or twice per month. Calculators are universally available and make this task trivially easy, where without a tool it would be quite difficult for many people who otherwise excel at the job. I have known many clever, creative, competent people who have a hard time with basic math. Why should they beat their head against the wall struggling to do it by hand when the tool exists? Should we also avoid using electric drills, pallet jacks, and circular saws? What about spreadsheets for tabulating and analyzing data?

            Similarly, many people have a lifelong struggle with written communication. I’ve known plenty of skilled experts with great ideas, excellent analytical abilities, who can explain something perfectly well in person… but who simply cannot communicate well in writing. They have the education, they practice regularly at work, but they’re just BAD at it. Much of the work I do is rewriting documents produced by other people on my team to make them clear enough for an external audience. Everything from clarifying ambiguous grammar to rewriting paragraphs so they convey the same information at a lower reading level to restructuring an entire document for more coherent flow. If AI can do that as competently as I can, I would love to see them use it… and free me up to spend more time on other valuable tasks a machine can’t do!

            I think the negative reaction to AI-polished writing comes more from our emotional associations of writing as a human voice than from a rational evaluation of tool use. Every occupation uses tools and technology to reduce the burden of repetitive or difficult tasks, so workers can use their time, skill, and strengths in the most efficient way possible. ChatGPT is just another tool… not suitable for all applications, but where it’s useful why not use it?

        2. “We don’t complain about translation software.”

          “We” who? At least over in the anime/manga world, complaints about translation software are widespread and loud. Professional translators lead the howling pack.

          Why?

          Because translation software, at least the easily publicly accessible programs, aren’t significantly different from ChatGPT. They produce something that *looks* right so long as you don’t look at it with any degree of critical thinking… but which actually contains errors ranging from the subtle to outright showstoppers. Translation is a difficult art, not a precise science.

          I agree with what the other poster is saying about jargon though. (It’s true of dialect too.) I speak fluent Submariner, and I sometimes have a hard time translating it understandably. Some of it is variant meanings of common words, some of it is that there’s a cultural background the listener has no experience with or referent for.

          1. That’s a fair point. Likewise, I’ve found that ChatGPT is useless for trying to turn basically competent prose into stylistically elegant material. It has no stylistic nuance. It can distinguish between negative and positive affect, and between very rough gradations of negative affect, but it really doesn’t have the deeper sense of empathy that would allow it to get the words just right, from an artistic standpoint.

            As an example, there’s an essay by Eliezer Yudkowsky that channels the mindset of a parent who does not feel attached to their infant. In the essay, Yudkowsky describes the baby as a “voiceless pink blob.” I tried using ChatGPT to see if it was able to reword the section in a way that preserved the original content and affect, but expressed it in different language. ChatGPT simply failed here, either switching to positive affect (e.g. “delightful rosy cherub”), using a negative affect that didn’t accurately represent the sort of limp emotional distance Yudkowsky was really trying to convey (e.g. “mewling slobbery monster”), or producing something that was just not interesting to read (e.g. “silent uninteresting entity”). Sometimes it would complain about being put to this task, as it is programmed to do whenever it detects that it’s being asked to say something potentially derogatory.

            What I meant by “you don’t complain about translation software” was that people don’t complain that it’s somehow plagiaristic or morally inappropriate to use this software. They complain about the quality of the output, not typically the fact that somebody chose to try and use it. And translation software continues to improve, and find an increasing number of use cases, such as facilitating communication on the fly between a tourist and local who don’t speak the same language.

        3. With regard to the applicability to a science context: the core issue is, once again, that ChatGPT has no idea what an ‘idea’ even is. As an example adapting from my professional (biochemistry/toxicology) work:

          Consider a scientific paper in which someone wishes to communicate the relationship between a disease (e.g. COVID) and clotting of blood. They may have a lot of good data on both the direct measure of a clotting factor that’s produced (Evidence A), along with some data on the lipids in the red blood cell membrane (Evidence B). In my experience, poor scientific communication comes from a failure to synthesise Evidence A and B in such a way that a reader can follow the experiment. So you jump into “Evidence A (p-values here) was observed…” transitioning directly to “Evidence B (more p-values) was observed…”, without each piece of evidence’s contribution being clearly marked.

          In order to fix this problem we would need a tool that can first extract the argument (Evidence A and B individually support the argument, but together they provide more support than either alone), and then refine how it’s being communicated. But since ChatGPT can’t determine the central idea of a paper (since it doesn’t ‘know’ what an ‘idea’ is), it can’t reorganize the text around making that idea clear.

          I can’t speak for engineering, but the struggle in scientific communication is fundamentally one of communicating *novel* ideas and findings. Thus a system that simply reproduces that which it has seen before is totally inept at tackling these new situations.

          1. Let me give you a specific example. I wrote a response to your comment, which is reproduced below. I then went to ChatGPT, typed “Polish this up,” followed by the entire comment in quotation marks, and ChatGPT produced a new version.

            Here is the ChatGPT version, which I personally prefer:

            I agree with your point that ChatGPT may not be effective in generating original insights, but I maintain that it can be useful for rewording existing insights to make them more readable. Therefore, I consider it an “autocorrect” tool rather than an “autoinsight” tool. While it may not be as valuable as a system that can generate novel insights, it still has potential for certain use cases.

            For instance, writers often find it challenging to edit their own work without taking a break from it first. However, with ChatGPT, they can get a polished version of their writing that feels like a new version without having to wait for a few days. This is especially useful when they need to send off their writing immediately, such as in the case of an email.

            While these use cases may not be relevant to everyone, I believe they are helpful for many people. It’s important to identify how to use this technology effectively and avoid using it in ways that could be problematic. Thus, rather than debating whether ChatGPT is good or bad, we should focus on how to use it to best effect.

            In my opinion, the statement “ChatGPT can function as autocorrect on steroids, and many people will find that useful in their workflow” is a fact. Similarly, the statement “ChatGPT can’t effectively generate novel insight” is also a fact. It’s important to acknowledge both statements without treating them as counterarguments to each other.

            —————————————————

            Here is my original version:

            I’d give a similar response to your example as I gave to others here, which is that you’re arguing that ChatGPT is inappropriate for a use case different from the one I am advocating.

            I agree with you that ChatGPT is not able to effectively come up with original insights. I am simply saying that it is able to reword insights that you have already provided to it to be more readable, which is why I refer to it as “autocorrect,” not “autoinsight.” And while I agree that this is much less useful than a system that was genuinely capable of coming up with novel insights, that doesn’t diminish the fact that “autocorrect on steroids” can be a highly useful feature for the right use case.

            Here is a second example to highlight another area where “autocorrect on steroids” is of clear utility to the writer, in a context that is probably familiar to many writers.

            Most people who’ve tried to edit their own writing know that it’s very difficult to do so, unless you’ve walked away from it for a day or two. It becomes psychologically difficult to approach the language from an outside perspective when you’ve just written it down.

            With ChatGPT, you can take material you’ve just written, and immediately get back a polished new version expressing the same thing, but in language that’s sufficiently different as to feel like an entirely new version. This provides you with an experience very similar to coming back to your work a couple days later, except you don’t have to wait. This is especially useful when you would like to be able to send off your writing immediately, for example if you’re writing an email. The workflow feels very different when you approach it in this way.

            Now, these use cases may not seem interesting or useful to you, and that is OK. They are interesting and useful to me, and I suspect to many others. The important thing is that we invite people to make best use of this technology. Part of that is identifying ways in which it can be used badly, but we should always approach that conversation from a perspective not of “Is ChatGPT good or bad,” but of “how can we use ChatGPT to best effect, and what should we avoid?”

            To me, “ChatGPT can function as autocorrect on steroids, and many people will find that useful in their workflow” is an epistemically positive statement – a statement of fact. “ChatGPT can’t effectively generate novel insight” is also a statement of fact. What is confusing to me is that your comment, perhaps unintentionally, comes off to me rhetorically as a counterargument to “ChatGPT can function as autocorrect on steroids,” but it isn’t. Both statements are true.

          2. Reply to bskubi:

            Well, the ChatGPT version is certainly more concise, I’ll give it that. However, it also strikes me as… really bland. The overly-formal tone feels out of place for a comment on a blog (e.g. I much prefer your original, more conversational phrasing of “which is why I refer to it as ‘autocorrect,’ not ‘autoinsight’” to ChatGPT’s oddly formal “Therefore, I consider it an…”).

            I guess part of the reason I dislike ChatGPT’s version so much is because I genuinely enjoy all the idiosyncrasies of communication of the different commenters on this blog (e.g. I’ll take a commenter who calls something an “epistemically positive statement” any day over one who’ll call it a “fact”).

            I can see ChatGPT being useful in cases where concision matters more than style (corporate memos, etc), or to rein in an overly-verbose writer (looks guiltily in the mirror), but in most cases, I’d prefer to read something written by a human.

          3. @Ben L,

            Thanks for taking the time to read and consider!

            The key point here is that whether you prefer ChatGPT’s version or mine comes down to a matter of taste. Each may have its own virtues, and be appropriate for different contexts, as you point out. If I’d wanted to, I could have selected the stylistic flourishes from my original post and incorporated them into ChatGPT’s version, either by telling it what to do via a prompt or by manually editing them in.

            For example, I could have lightly edited the first paragraph of the ChatGPT version to read a little more conversationally:

            __________________________

            I agree with your point that ChatGPT may not be effective in generating original insights, but I maintain that it can be useful for rewording existing insights to make them more readable, which is why I called it “autocorrect,” not “autoinsight.” It may not be as valuable as a system that can generate novel insights, but it still has potential for certain use cases.

            __________________________

            One thing to consider here is that I’m a native English speaker and have been blogging and writing for a very long time. I’m not a great writer, but I can write with enough style that there is actually something to lose to ChatGPT’s tendency to neutralize stylistic flair. But when I write for a formal context, as in my research, or when I don’t have the spoons to be stylish but still want to participate in the conversation, I’ve found that ChatGPT allows me to be more readable, generous in debate, and efficient in writing what I have to say.

            My last point is that ChatGPT strictly adds options to the writer’s menu. Nobody has to use it. What we should mainly focus on is a nuanced conversation about how to use it effectively, and what the tradeoffs are, which is exactly what you and I are doing here. This makes me pleased. Bret’s original post is necessary, because a lot of students are going to make the mistake of trying to use ChatGPT as “autoinsight.” I’d encourage him and other teachers to make a substantial effort to identify ways students can use ChatGPT appropriately on their assignments, and encourage them to explore this tool. Right now, it feels like they are creating a climate of fear, in which even a student motivated by a genuine interest in using ChatGPT in a positive way would be afraid to admit they are even trying, lest they be accused of some combination of unprofessionalism, plagiarism, laziness and stupidity.

      3. Is the number of translators lower now because of Google Translate? Not really, it has just made the capability for translation more available to people who couldn’t pay for one. GT after all is best for things like single words, and it doesn’t give you translations for (many) slang words or idiomatic phrases (or it gives you the literal translation of each word, which is not the point).
        As for writers of sports columns – I suppose a column giving you the box scores (or equivalent) could be done by machine – but then, if you’re just looking for the scores, a ten-line program can do that, while our host has already shown just how far off ChatGPT is from actually providing any sort of analysis.
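        Something like that hypothetical ten-line program might look like this (the teams and scores below are made up, standing in for whatever feed or spreadsheet the numbers actually come from):

        # A minimal sketch of the "ten-line program" idea: turn box scores into
        # one-line reports with plain string formatting, no language model needed.
        games = [
            {"home": "Carolina", "home_pts": 78, "away": "Duke", "away_pts": 74},
            {"home": "State", "home_pts": 66, "away": "Wake", "away_pts": 70},
        ]
        for g in games:
            winner = g["home"] if g["home_pts"] > g["away_pts"] else g["away"]
            print(f'{g["away"]} {g["away_pts"]}, {g["home"]} {g["home_pts"]} ({winner} win)')
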
        A journalist who can be replaced by ChatGPT looks like someone who takes press releases and reprints them (occasionally changing some words around). There’s a number of those around (this is famously what one Obama staffer said). But someone who does that is not getting any training at all.

        1. Feature articles get all the attention in journalism, but news outlets spend a ridiculous amount of money producing straightforward content. Business outlets are already automating article production for their initial coverage of routine financial announcements — earnings announcements, government jobs numbers, etc. That gets the base article out faster so the journalists can spend their time on the contextual stuff with subsequent revisions.

        2. “Is the number of translators lower now because of Google Translate? Not really, it has just made the capability for translation more available to people who couldn’t pay for one.”

          This is untrue, unfortunately. As someone who works in media translation, I can tell you that DeepL and Google Translate do in fact depress translator wages as well as reduce the number of available jobs. A major reason is that many executives and middle managers cannot tell the difference between the outcomes of machine translation and manual translation, but can tell the difference between “free” and “costs money”.

          This is particularly present in the video game and web novel subfields, where people will run the source through machine translation, see that it looks vaguely right (often only to someone who doesn’t speak the language) and ship it, often with the idea that errors can be “cleaned up” once they’re noticed afterwards (Anyone who works in translation can tell you that cleaning up machine-translated text takes at best just as long as translating it from scratch, and at worst twice as long, but is paid significantly less. I outright tell people that if they have a machine-translated version, I don’t want to see it.).

          And the damage it does to the industry isn’t just financial – maybe it’d be okay if it actually worked, but it just doesn’t. It lacks the same emotional beats as proper writing, and it frequently misses context or subtleties we would see… I’ve seen novels I quite enjoyed have their translation completely botched by machine translation, causing plot-critical or character-relationship-critical information to be lost.

          Maybe compared to Google Translate and DeepL, ChatGPT will be somewhat better at this. Perhaps it’ll function better if one adds in the prompt “Make sure to write each character with a unique voice. Use source-language quirks of speech and favourite phrases for a character, but don’t go overboard on it if there’s no good equivalent in English. When you encounter an idiom or a pun, try to substitute an equivalent one in English, but don’t force it- you can make up for a missing joke by putting another one elsewhere if you like, etc.”. Yet as Bing is amusingly showing us, writing goals and directives in a prompt isn’t the same as it having the capability to carry them out. Interestingly, if it is capable of that, its process is entirely different from how I as a human do it, which is by imagining myself as a bilingual version of each character and thinking how I’d say something.

          Ironically, for a variety of theoretically soluble technical reasons, short of medical and legal translation or other high-liability situations, video games are actually one of the *worst* things to translate via machine translation – in my experience novellas are handled much better (well, less badly, anyway).

          As far as I’m concerned, language model machine translation is useful only for pasting a sentence in and seeing if the model has an interesting turn of phrase it comes up with (it almost never does, but pretending it’s a novice translator you’re mentoring can occasionally provide help – a reverse version of Rubber Duck Engineering).

      1. This is actually an area where neural networks have a lot of potential. When you’re looking at large bodies of human-written text, you can start to make inferences about which word normally goes in that place in that sentence – inferences that are much better than what you’d get from trying to guess which word was meant by a particular sequence of keystrokes.
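        As a rough sketch of that “which word goes here” idea – assuming the Hugging Face transformers library and an arbitrary choice of masked language model – something like this asks a model to rank candidates for a blank based on the surrounding sentence rather than on keystrokes:

        from transformers import pipeline

        # A masked language model fills in the blank from sentence context,
        # not from whatever keys happen to have been pressed.
        suggest = pipeline("fill-mask", model="distilbert-base-uncased")

        for candidate in suggest("I am on the way [MASK], see you soon."):
            print(f"{candidate['token_str']:>12}  (score {candidate['score']:.3f})")

        By default the pipeline returns a handful of the highest-scoring candidates, which is essentially the phone-keyboard suggestion strip, just driven by the sentence instead of the keystrokes.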

        (That’s the quibble I’d make with Bret’s “using AI in any academic work is plagiarism” stance (and one I suspect he would modify if this became a thing): a model that’s tuned for this sort of revision might make an excellent revision pass. You write your first draft, run it through EditGPT to get a second draft, then go over it yourself to get a final.)

        1. When I’m writing and the software suggests the next word (this happens on my iphone and on some blog comment pages), I usually figure that I’ve become hackneyed, stop, and look for a different word. (Obviously not in simple cases, like when I’m texting my wife and the iphone wants to insert “home” after “On the way.”)

          1. I have never personally used autocomplete in this fashion, but it sounds like you have identified another potentially useful function of this software as a way to stimulate more creative expression in writing. It’s these sorts of insights that motivate me to keep raising the topic of how to use these software tools most effectively. Thanks for sharing.

    2. Can it really “autocorrect” ideas though?
      I often use Google Translate for basic translation of documents, then check and fix it until it is readable. But that is for translating language.
      Translating ideas into better writing seems to be much more difficult, though.

      1. I don’t mean it autocorrects ideas. I mean it autocorrects language, in a directly analogous way to normal autocorrect. Instead of correcting typos and local grammatical mistakes, it can reorganize the structure of sentences and paragraphs to make them more readable, while preserving the ideas you have already supplied intact.

        While it is imperfect at this, you can regenerate responses and check to ensure the idea was indeed expressed accurately and clearly, as you should be doing no matter how the draft is produced.

          1. It can. I have written many pages of complex engineering writing to explain my own research methods that I use daily in the lab, and it performs nearly flawlessly at preserving the specific content I provide intact. When it screws up, I can spot-check it, which is the specific use case I am advocating. In that process, I sometimes discover places where I myself had screwed up (e.g. substituting “homogeneous” for “heterogeneous”). Because ChatGPT frees up time and mental bandwidth for this spot-checking process, I believe that the resulting output is more correct than it would be if I handcrafted this documentation without ChatGPT. It is also certainly more readable.

        1. Then I think there are already some websites and online converters that can do it (making an awkward sentence sound better).

    3. From one engineer to another, please do not wish for this.

      The very best possible outcome, if your colleagues started using ChatGPT for documentation, would be that they thoroughly read and analyzed the output and manually corrected it for errors and poor communication. But people who (on your description) aren’t good writers to begin with probably lack the sensitivity to do a good job of editing the output, or even realize the ways in which it needs to be done.

      Set aside the issue of confidently, subtly wrong information–one of the big tells I’ve seen in ChatGPT output is that it is colossally wordy. Without careful coaching, it’ll keep repeating the same (under-specific) idea in different ways for paragraphs, producing the prose equivalent of junk food. That is also definitely not going to lead to accessible documentation that clearly communicates the core ideas of a technical subject. Instead it’ll give you ten pages of fuzzy circumlocutions without ever telling you what you actually want to know.

      1. Using ChatGPT to edit writing as I propose is a skill, much like writing itself. My direct experience is that it is extremely useful in this regard.

        Perhaps there is a greater risk of inaccuracies slipping through if engineers use it than if they are forced to craft the precise wording of their technical documentation themselves. It may be a bad idea to use it in something like an electronics data sheet. The use case I am imagining is the wordier sections of a scientific paper – probably not the methods, but the abstract, introduction, results and discussion section. It may also be useful in intermediate drafting stages.

        But I’m generally against the “safeguards” argument you are making, especially as this software continues to improve. I acknowledge the risk, but the benefits are also important, and I think the benefits will continue to grow while the risks will shrink over time. My main goal here is to explain the professional use case, not to say that every piece of engineering-related writing ought to be put through ChatGPT.

      2. Oh, forgot to mention: you seem to have interpreted my original as calling for engineers to supply a general sort of prompt, then have ChatGPT produce an output much longer and more information-dense than the prompt supplied.

        That is not what I said. Instead I am pointing out that the engineer can put a paragraph in containing all the relevant information, then get a polished and readable version of roughly equal length back out. It is possible to use ChatGPT to shorten the prose (i.e. tell it to “polish this up, making it 90% of the original length”). It usually hits the requested length to within about a 20% margin when you do this, up to a point.
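        For what it’s worth, that workflow can also be scripted rather than pasted into the web interface. The sketch below is only illustrative – it assumes the official OpenAI Python client, an API key in the environment, and placeholder model and draft text – but it shows the shape of the request:

        from openai import OpenAI

        client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

        draft = "Paste the paragraph you want polished here."  # placeholder text

        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # placeholder; any chat-capable model works
            messages=[{
                "role": "user",
                "content": f'Polish this up, making it about 90% of the original length: "{draft}"',
            }],
        )
        print(response.choices[0].message.content)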

        Again, it’s absolutely necessary to check to make sure the information is preserved intact and is actually clear, and you’d only do this in certain documents and not others. But for the places where it works, it will be an enormous boon to sharing information with greater clarity, and I suspect it will boost accuracy overall as well by letting authors focus more time on that and less time on sentence construction, while lowering the risk that bad writing causes readers to misinterpret the author’s intentions.

      3. One last note: I have produced extensive academic writing (as yet unpublished) using ChatGPT in the manner I describe here. Again, I supply all the ideas, and the output is just a reorganizing of my structure and my ideas on the sentence and paragraph level. The description you give of output being “fuzzy circumlocutions” is just inaccurate using the workflow I describe, although it’s certainly accurate if you just supply short prompts and hope that ChatGPT will supply the substance.

  2. As it happens, I’m (re-)reading Mackay’s EXTRAORDINARY POPULAR DELUSIONS AND THE MADNESS OF CROWDS. I’m on the alchemy chapter now, and I found myself thinking of the exact analogy you drew between alchemy and AI research. It’s very exciting to be in a field that is making great strides on local problems—but that’s no guarantee that the guiding star of the overall research program is within the grasp of current methods!

    And similarly, when we get to those guiding stars, we often find that the science involved has quite different pragmatic implications than we thought, centering on different questions than the ones we first asked. Nuclear physics’s ability to turn lead into gold is a peripheral factoid of no practical concern, wildly overshadowed by a dozen other more meaningful applications. Powered flight’s impact on society is dramatically more concentrated in its ability to move a jetliner’s worth of mass hundreds of miles in an hour than in the (at this point still roughly implausible) ability to let an individual person flutter around a city like a sparrow. And so on.

  3. Stupid technical question:

    How do you get the comments on the download in an easily readable form? My computer opens it with Open Office 4 and refuses to show more than 1.5 lines of a comment at a time. I can’t find a way to expand the whole comment, or even tell it to hide all the “Bret Devereaux Yesterday 16:13” part that seems to be considered more important by my computer.

    What’s the right tool for reading this?

    1. I suspect that they would show correctly in Microsoft Word. If you don’t have that, I’ve found that if you upload it to Google Drive, the editor they have for Word documents shows the comments in a very readable fashion.

    2. I thought the development of Open Office had been stopped for like a decade now?

      Anyway, I have the same issue with Libre Office… but at least they are aware of it:
      https://ask.libreoffice.org/t/change-width-of-comments-column/39784

      And using MS Office or Google Docs is out of the question.

      Thankfully, it’s relatively easy to extract the comments themselves: a .docx is really just a .zip archive, so extract it and the comments are under word/comments.xml. Any text viewer can open this, though you probably want one that can show the structure of the XML for easier reading, like Firefox.

      (The biggest remaining annoyance is that some of these comments are split across multiple XML tags for some reason.)
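      For anyone who would rather script it, here is a rough sketch of that zip-and-XML route in Python (the file name is a placeholder, and the element names follow the standard WordprocessingML namespace used by word/comments.xml):

      import zipfile
      import xml.etree.ElementTree as ET

      W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

      # A .docx file is just a zip archive; the comments live in word/comments.xml.
      with zipfile.ZipFile("collections-on-chatgpt.docx") as docx:  # placeholder name
          root = ET.fromstring(docx.read("word/comments.xml"))

      for comment in root.iter(W + "comment"):
          author = comment.get(W + "author", "unknown")
          # One comment can be split across several <w:p>/<w:t> tags; join them.
          text = "\n".join(
              "".join(t.text or "" for t in p.iter(W + "t"))
              for p in comment.iter(W + "p")
          )
          print(author + ": " + text + "\n")

      That sidesteps the comment-column problem entirely, at the cost of losing where in the main text each comment was anchored.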

    3. Microsoft Word did a good job with that, but if you don’t have access to that, it was readable when opened in Google Docs.

  4. My one experience with ChatGPT was in trying to find a citation to a paper I had read ages ago but didn’t remember enough about to successfully find in a search engine. I assumed that, by asking pointed enough questions, I could lead it along the garden path far enough that its word-association with the correct citation would be very strong.

    I was successfully able to lead it to the point where asking vague questions got correct (if a bit vague themselves) answers, but that final step–asking it to find a citation which *must* have existed in the works it was trained on–failed. I couldn’t get it to produce a citation to a paper that actually existed.

    The most frustrating part was that the paper it produced had a very plausible title, publication year, journal, and author list. The description it gave for the paper listed methods which were so reasonable and precise (as opposed to vague) that I suppose that some paper in its corpus must have actually *performed* those methods. If I could simply ask it “what document in your corpus most influenced your last answer?” I’m sure it would have been a great research tool!

    But unfortunately, the one task where I thought that a pile of linear algebra on word associations would be useful–digging up a half-remembered fact–is beyond the current iteration of this tool. Fortunately, I don’t think the task I had in mind requires true intelligence, and chatgpt 2.0 might well be capable of it. But for now, I’m back to scratching my head wondering where that paper I read a year ago is.

  5. Interesting article!

    A disagreement:

    “It does not know that water makes you wet, only that ‘water’ and ‘wet’ tend to appear together and humans sometimes say ‘water makes you wet’ (in that order) for reasons it does not and cannot understand.”

    I was with you until that part, but this statement is more controversial than you claim. Many ML experts would argue that for most useful definitions of knowledge, GPT-3 does indeed know that water is wet, and does ‘understand’ what it means for water to be wet. (Getting that knowledge out of the model is another issue, since it doesn’t ‘care’ about saying true things – hence prompt engineering, RLHF, etc.)

    For some weak evidence of this idea we can look at the arithmetic capabilities of LLMs [1-2]: at the outset, many claimed that models didn’t ‘understand’ addition, but simply had memorized sequences of numbers like ‘1 + 2 = 3’. Except that the models would get some arithmetic operations right that were not in the dataset. Later, partly through prompt engineering, we saw clear examples of correct addition which could not have been memorized, which implies that the model must have learned an arithmetic-like circuit for this operation.

    [1] https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html
    [2] https://mobile.twitter.com/stanislavfort/status/1599415657556508673

    This – combined with the fact that most things we consider ‘understanding’ can be related to operations on symbols, and that it appears that, at scale, learning the actual underlying symbolic relationships, rather than memorizing or statistically associating outputs, can end up being more efficient and possible for gradient descent to do [3] – leads many researchers to believe that LLMs do in fact have similar models of how water is wet: certainly less detailed, missing links to things like latent embeddings of physical experiences with water, etc., but nonetheless good enough to pass the Turing test on water-being-wet-related subjects, in the limit of scale.

    [3] https://www.lesswrong.com/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking

    I really appreciate your post, and agree with the general sentiment, but there are quite a few statements like the one above where we ought to have a little more epistemic humility.

    I think many of these ‘only human brains actually know things’ statements will age poorly, and will end up sounding like this (satirical, but entertaining) paper[4] – not just in the distant, but also in the near future.

    [4] https://arxiv.org/abs/1703.10987

    1. “Understanding” is a concept that our teacher in philosophy class (yes, I study a CS related study but we had a philosophy class) had trouble defining and explaining. Ironic.

      And that makes it a huge pain to discuss whether computers “understand” something or “just seem to.” Though that was also partially on purpose, I think.

    2. You are not understanding the human concept of ‘wetness’. It is not a word, but a sensation. Give ChatGPT fingers and some sensory analogue to the nerves that detect temperature, viscosity and the rest, and it might make it. We are not brains riding a body, but bodies inhabiting a world.

    1. Buddy, this is a blog. Here I write how I want and if that means being appallingly verbose, so be it; it’s an informal register. If you want to read my formal writing, you’re welcome to click over to my CV and go track it down.

      That said, I invite you to look at all of the comments around you, maybe glance at the attached Patreon, draw an inference about the total readership here and consider if perhaps as an exercise in communication my writing is quite a bit more effective than you thought.

  6. I think the hubbub around ChatGPT in academia mainly exposes how our priorities around education have shifted, especially in basic education.

    A lot of people (unfortunately) don’t see education as actual education, but rather as a series of trials to determine who is “worthy” of the fancy paper that says “you’re smart and deserve a good job!” That’s the context which leads to regarding assignments like essays as just arbitrary hurdles on the way to proving oneself rather than opportunities to learn.

    Consequently, the real challenge ChatGPT and potentially more refined future versions pose isn’t that they will make certain forms of learning obsolete, but rather that they will make obsolete the ways in which we test that learning has been done (admittedly more so in basic education than at university). The real challenge we need to deal with is shifting focus away from examination and back towards the value of learning itself.

    1. I think it exposes the problem of a lot of academia having lost a lot of its credibility (I don’t want to argue rightly or wrongly here, but that is a topic that’s up for debate). I’m a Uni dropout who’s met a lot of people who succeeded where I’ve failed and seen that some departments seem to churn out a lot of people who know how to regurgitate what they’ve been told, but don’t understand their material.

      I was surprised at how similar Bret’s essay is to a lot of academic writing. It carries the same tone. You could easily convince me that it was written by somebody who didn’t know what they were talking about. And answering with bullshit instead of saying you don’t know an answer is very human.

      I think talking about ChatGPT for essay writing is missing the point. It can write in a very human way. But Bret doesn’t want essays written the way a normal human would write them.

    2. Yep, a lot of higher education has become essentially an initiatory ritual, where you are “taught” things you will never need for your actual future work and have to prove you’ve “learned” them simply as a multi-year trial to allow you access to the better segment of the job market.

      1. What gave you the idea that the main purpose of higher (undergraduate) education was to prepare you for work? That’s what vocational schools, graduate projects, PhDs, and especially on-the-job training are for. Undergraduate education is there first to make you an independent and educated citizen, one who knows how to learn and is fully able to participate in the democratic process. (And for networking, for better or worse.)

        And don’t forget that this is a time of (self-)discovery when most people aren’t even sure what they want to do in the future, so specializing them too early would be counter-productive anyway.

        Also, isn’t the great thing about Anglo-American companies that they don’t care about your credentials, and (some safety-regulated professions aside) will give you a job regardless of your diploma, as long as you’re able to make a positive contribution to the company? (Of course, this doesn’t mean that you won’t have a hard time *proving* that to them, if the competition is tough.)

        1. Colleges sell the job aspect. They do not get to back out.

          Also, lawyers do everything they can to prevent people from being allowed to prove themselves.

        2. “What gave you the idea that the main purpose of higher (undergraduate) education was to prepare you for work ? ”

          Well, the fact that nearly any job has an undergrad degree as a requirement.

          For a lot of people, college is now no different from vocational school, except for a different segment of the job market.

        3. I took a few senior-level history courses full of students who were about to get an undergraduate history degree. Sometimes a job recruiter would come in looking for people to hire, but the job never had anything to do with history; they just wanted a bachelor’s degree, any bachelor’s degree. And that’s exactly the kind of job you expect to get if you get a liberal arts bachelor’s degree without moving on to post-grad.
          This is what gives people the idea that the purpose of higher education is to prepare for work. As for Anglo-American companies not caring about credentials, that has not been my experience or the experience of people I know.

          1. Heh, I guess that I should have been upfront about my tone in the first half being half-ideal / half-sarcasm about how the education is often presented, and in the second part half-wondering about how it *really* is in the US / half-sarcasm about the people afflicted with «the grass is greener elsewhere» syndrome. (Also being more aware of the STEM fields.)

          2. “half-ideal/half-sarcasm”–Poe’s Law tells us that comments in that mode definitely won’t work.

          3. @ey81 :

            To be extra-clear, I let my «wishful thinking» (is that the appropriate term here? a «playing dumb» reflex persisting since middle school?? «performative activism» with myself (my super-ego?) as the intended spectator???) get in the way of my knowledge that things are *definitely* not as rosy…

      2. I love higher education.

        But this has always been true. You can tell what higher education is for by who it has historically tried to exclude (and still, in fact, spends a lot of time excluding).

    3. The problem is that college is sold for the degree. Complaining that people don’t want what they didn’t come for is pointless.

      What is really needed is Statistics 101 for the practice of law. Then disparate impact flies out the window, companies go back to testing, and a lot of colleges close because the market for learning is less than for degrees.

      1. This would require a lot of consent decrees to be struck down and regulatory agencies to reverse themselves on their rulings.

  7. Superb. Thanks very much for this Bret. I particularly loved the Amazon box analogy. So lucid and intuitive.

  8. Of course ChatGPT is useless for writing good essays. Of course it will make stuff up. Its model is nowhere near big enough to hold a fraction of all the information in the world, let alone close to all of it.

    If you read the papers on the underlying technology, the creators of ChatGPT know all this. Their estimates of the accuracy of the underlying technology are 10 to 40% (the so-called “Zero Shot” approach).

    But while ChatGPT is meant to be the “attract mode” of their world, the underlying tools they’ve built (and sell access to) are far more useful. Should you move from a generic application like ChatGPT into a bespoke one that “knows” a lot more about a specific topic, the quality of the answers will go up. We’re not at a point where these tools are readily usable by history students (unless they’re CS double majors), but that day will come.

    A small anecdote from my personal life. When I was an undergraduate, word processors and spell checkers did not exist. But as a CS major, I was able to cobble together crude tools to ensure my essays were nicely formatted and my text free of blatant spelling errors. At a time when almost all students were still using typewriters, I had a real advantage because I could use $100,000 computers to do my typing. I suspect that these tools, too, will find an adoption curve of some sort, starting with the technologically advanced.

    1. But the writing quality of these tools will increase as you add more data. Sure, you could decrease the amount of training data to make the tool more specific to your topic, but then it will probably also get worse at predicting the next word in general.

  9. I agree with all of the conclusions you have reached here. However, I wanted to clear up what seems to be a misunderstanding of terminology.

    > ChatGPT gets called an artificial intelligence by some boosters […] but it is not some sort of synthetic mind

    > (I actually dislike a lot of these terms which seem to me to imply that what we’ve achieved is a lot closer to a synthetic mind than it really is, at least for now)

    As a former professional AI researcher, I can tell you that ChatGPT absolutely does qualify as artificial intelligence. The reason for my confidence is that “artificial intelligence” is a piece of jargon whose modern meaning is quite different from its meaning fifty years ago. These days, if you wanted to talk about a program designed to do something comparable to the capabilities of human intelligence (like reasoning or problem-solving or moral intuition), you would use one of a number of much more specific terms, like “cognitive systems”.

    ChatGPT transforms natural language content into an internal representation and then interacts with a human user via natural language to produce from that representation a different natural language content, and that puts it squarely within the current technical meaning of “artificial intelligence” in research/industry. Saying otherwise would be like me saying “the Roman legions get called a military force by some boosters”.

    In the same vein, you say

    > More broadly, as far as I can tell it seems that a lot of AI research […] has proceeded on a ‘fake it till you make it’ model […]
    > we don’t really know how a mind works at full complexity, so we’ve chosen instead to try to create machines which can convincingly fake being a mind in the hopes that a maximally convincing fake will turn out to be a mind of some sort. […]
    > so too our AI engineers are forced to work without really knowing what a mind is or how it works

    But this implies that the singular goal of the (enormously broad) modern field of artificial intelligence is to produce a mind or something like one. That was pretty much true at the field’s inception, but is now only true for a few of its subfields! To say that is the ultimate goal of work in the whole field of artificial intelligence is like saying that the ultimate goal of work in the whole field of history is to analyse all the social connections between historical individuals.

    I suspect this misunderstanding of key jargon is quite common outside of compsci because e.g. lots of popular science fiction uses “artificial intelligence” in that narrower and more grandiose original sense. Personally I would have preferred to keep the original meaning and develop new terms for work that’s in the field and has nothing to do with mind or cognition… But that’s not what happened.

    1. I’m surprised that you haven’t directly referenced the AI effect (https://en.wikipedia.org/wiki/AI_effect): «The AI effect occurs when onlookers discount the behavior of an artificial intelligence program by arguing that it is not *real* intelligence.» (And that I had forgotten how appropriate it is here.)

      Because of this I prefer to just use the term «(computer) program».

      Or, for the recent brand of «AIs»: «neural network» (or «machine learning», if/when neural networks stop being the only effective way we have to do machine learning). This is radically different in the sense that we aren’t dealing any more with programmers starting from an algorithm on paper (or just in their minds), but with a bunch of (these days simulated) neuron-like entities that are first trained on a task, at the end of which there are probably some digitized algorithms in there, but extracting and understanding them is at this point almost hopeless. (Of course you still need to program the overall architecture of the network, and various interfaces, but I would say that this is more akin to what the creators of programming languages, compilers, and development environments do for «classical» programmers.)

  10. This is actually frightening as it displays a lack of, well, thinking (a simplification for all the complex workings of the human mind) in the humans who use the AI chatbot and who evaluate its working.
    I looked up your links about AI hallucination (this is all rather new to me) and I think the emotive language used is really, really dangerous: bots “>>sociopathically<< and pointlessly embed plausible-sounding random falsehoods”; “It embellished with abandon, even when it didn’t have to”; “the computer still lies” – these imply intention and malice, which is even more anthropomorphisation than the underlying argument that the chatbot “knows the truth” but lies rather than admit it. The point is your core statement: the bot has no concept of truth and lies, or of whether something is real or not. A fake reference fulfils the (formal) criteria of “reference” it was given.
    I’m rather afraid that people get the sort of artificial intelligence their intelligence deserves…

    1. There’s IMHO important context missing here: chatbots aren’t the only application of neural networks; in fact, IIRC the first really promising ones worked with pictures. And the kind of hallucinations they are susceptible to there are AFAIK much closer (and maybe even identical in the way they work?) to at least some of the hallucinations that people can have, which happen at quite «low» levels in our visual systems: https://en.wikipedia.org/wiki/DeepDream

  11. I’m generally an AI skeptic, and I agree that students using this will only be hurting their education. But you’re going too far in claiming that no understanding is or ever could happen, unless you have a theory of what understanding actually is that shows a clear line between what ChatGPT does and what humans do. I’ve certainly seen humans write superficially convincing sentences that completely miss the point of the thing they purport to describe, or write essays that merely regurgitate, poorly, a mishmash of existing essays on the subject (or at least, are indistinguishable from that). And I’ve seen ChatGPT-like tools produce legitimate insights – you may say it stumbled on it randomly, which may be true, but we could say the same of when a human does the same thing. Talking about how it’s “just statistical correlations” is missing the point – our brains are just electrical impulses firing according to some very simple rules, but somehow thought – or something that looks like it – emerges.

  12. Some notes on coding: There are very different kinds of bugs, of different severity and associated costs of fixing them.

    Code editors for the most popular frameworks are tightly integrated with their supported languages (an Integrated Development Environment, IDE) and will precisely point out syntactic mistakes (those that would prevent the code from compiling at all) as you type them, costing you practically no time. This is the easiest kind of bug to spot and fix, whether or not the code was written by a language model. It is also the kind of bug I would least expect a language model to make, because its training data probably contains very little code that doesn’t even compile.

    Logical errors that cause a crash at runtime can be more tricky. The IDE will (under most circumstances) point out exactly where the error happened and let you inspect the state of the program just before it happened, eliminating most of the guesswork as to the underlying cause. This is more expensive to fix than syntactic bugs, but identifying the cause usually doesn’t take more than a few minutes either.

    The most expensive kinds of bugs are those that are also the least amenable to automatically assisted fixes, i.e. when the program runs nominally correctly but produces some undesirable output or, relevant to gaming especially, crashes only under specific configurations. These have a basically unlimited timeframe for fixing, ranging from minutes to years. I would expect this to be the most common kind of bug in code from a language model, not least because a language model can’t really know what output counts as desirable.
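    To make the three categories concrete, here is a minimal illustrative sketch in Python (hypothetical code, not tied to any particular project):

    ```python
    # Syntax error: flagged by the IDE/compiler before the program ever runs.
    # print("hello"            # <- missing closing parenthesis, caught as you type

    # Runtime error: calling mean([]) raises ZeroDivisionError and the traceback
    # points straight at this line.
    def mean(values):
        return sum(values) / len(values)

    # Logic error: mean_buggy([2, 4]) returns 2.0 instead of 3.0; the program runs
    # without complaint, and only a human who knows the intended output will notice.
    def mean_buggy(values):
        return sum(values) / (len(values) + 1)
    ```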

    In summary, I don’t expect language models to significantly help software development or even replace developers any time soon. The bugs they are most likely to produce are those that are the most expensive to fix, and the cheapest to fix bugs are cheap because existing tools are very good at analyzing them. I’m ambivalent about being able to generate a lot of boilerplate code, even when it’s entirely correct, because it can make economic sense but also speaks of poor code quality that incurs technical debt.

    1. “The bugs they are most likely to produce are those that are the most expensive to fix, and the cheapest to fix bugs are cheap because existing tools are very good at analyzing them.”

      Yes, that’s it in a nutshell.

      So far, nobody has shown me an example of a LLM writing code for me that I wouldn’t have done better to just look up on StackOverflow, which exposes how answers are rated and the discussions surrounding them. And the Large in LLM is a problem; if I want to do something common, it’s almost certain to be already available in my library. More commonly, I notice that we have done something repetitive that justifies factoring out; for that I would need a Small Language Model.

    2. A tool to “generate a lot of boilerplate code” is called a compiler. I would say that the problem indicated by the existence of copious boilerplate code is not so much a matter of low quality (at least, not directly), but that the project is written in too low-level a language.

  13. I’ve been mulling over “AI” for a while now, unconvinced by the qualities that its proponents claim it possesses. But I was struggling to articulate WHY, and you’ve outlined it here perfectly. It seems wrong to claim that ChatGPT, and the various other “AI” services out there, are actually AI, because although they’re artificial, there is no intelligence involved in the process, as there is no thought or consideration. It seems it’s essentially a highly advanced form of mimicry, but it no more understands what it’s imitating than a lyrebird understands what chainsaws and car alarms are.

    The proponents of AI definitely don’t seem to understand what an essay is actually for, either because they haven’t been in the education system in a very long time, or they never understood the purpose of essays in the first place. I’ve recently gone back to college as a mature student and it was immediately obvious to me that the point of the essays was to demonstrate not only that I understood the subject, but that I could communicate my ideas and analysis in a way that could be understood by others. It’s not really about the essay itself. Even if I could successfully cheat the teacher (and I don’t think I could), I would be cheating myself as well, which is far more important.

    I’ve got an essay due in 6 weeks. The prompt strikes me as well-written: it’s about investigating the relationship between two groups of artists, one of them quite obscure but local to my geographic area. I’m not going to use ChatGPT… yet. Once I’ve written the essay myself and submitted it, I’d like to submit the prompt to ChatGPT just to see what it spits out. Might be giggle-worthy.

    As specifically an art student, I have at times been worried about the art-focused “AI”, but… I think it’s the same kind of situation as with ChatGPT. Its proponents think it will replace artists because everyone will be able to just type in a prompt and get something pretty – because art is another of those sectors that “AI” enthusiasts think is simple, but which they actually have no understanding of whatsoever. Just as “AI” cannot understand what words mean, it doesn’t know WHY it’s creating a specific image, both on a practical level (why certain colours or compositions work) and on a spiritual/emotional one. As a result it will never be able to create anything genuinely new, and without more artwork going into the training datasets (which with or without regulation, is going to become more difficult as professional artists become more guarded with their work), “AI” generated art will not be able to keep pace with changing trends and tastes.

    1. Funnily enough, there is a valid concern about art-focused AI… which is that it will disrupt the “lucrative” business of producing commissioned fan and/or fetish art (spoiler: it isn’t actually all that lucrative, but it can pay your rent if you juggle enough of them at once).

      But that’s mostly because a non-negligible portion of the people purchasing that stuff don’t actually care that much about the quality of the work they’re commissioning, as long as the end result isn’t outright bad. To indulge in a metaphor, they actually want the essay, and are probably over the moon about the fact that they can get more essays for less money now.

      1. Commercial art often has criteria like “fits the color scheme of the room it’s in, subject matter is work-safe, does not break the budget”.
        Artistic merit may not be included.

    1. He’s arguably the best short story writer currently working in the field. That could be damning with faint praise given that short work isn’t as dominant in science fiction as it was in the Golden Age, but he’s very good and I only wish he were more prolific.

      One of his stories, “Story of Your Life”, was adapted into the film “Arrival”. “The Truth of Fact, The Truth of Feeling” is an interesting science-fictional spin on the effect of writing on traditional oral history.

      1. He is absolutely one of the top science fiction authors writing today, and I was a bit shocked to see him described as a “technical writer”.

  14. While I agree with the main points, that (current) technology can’t do proper research, and if you use it you, at best, cheat (yourself), I also hold the point that if assignments can be faked using ML tools, they should be faked.

    1. I would say that even for an assignment vulnerable to them, there is value in actually doing the assignment; the real purpose of the assignment is to develop skills that will be applicable to tasks other than completing assignments.

      This seems to be a key point that the people who are saying the essay should stop being taught are missing; the point is for the student to do the thinking.

      1. No, even better: “If you can cheat *yourself*, you should?”

        The assumption seems to be that essays exist for the satisfaction of the teacher, rather than for the improvement of the student. Which is probably true in some schools, unfortunately.

    2. The number of assignments that can be faked with these tools is close to non-existent and (because of what he said above) will remain that way.

      The number of instructors who will be too tired to spend the time to tell whether assignments have been faked is another thing, and will occasionally be an issue if the tools get good enough. The number of students who won’t be able to tell the difference between a successful fake and a wildly unsuccessful one is a very, (very, very, very) different thing than either of those, and is already an issue even without any functional version of this stuff. (Ask any college instructor for their best plagiarism story; there are some magnificent ones out there.)

  15. Do you have any thoughts on how some professors are using ChatGPT outside of the essay context as a learning tool (e.g., having the student “teach” ChatGPT about the errors it is making)? (E.g. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4300783&utm_source=substack&utm_medium=email)

    Also, it seems like integrating search into ChatGPT shows some promise for resolving the “no citations / makes stuff up” problems ChatGPT has – not there yet, but it seems not improbable that it will be able to make a semi-convincing facsimile within a few years. https://twitter.com/emollick/status/1626021146406662146?t=KQCS2Tq4CBr44QhAz2h6xg&s=19

  16. I’m not a historian, nor do I write essays, and I was pretty skeptical about AI… but that began to change. I’m a software developer (I don’t know much more about AI than the average reader; it’s a totally different domain), and I started integrating ChatGPT into my workflow. I’m excited; it’s almost the same joy as when I first started programming.

    So ChatGPT will not write whole programs for you, and it often lacks domain knowledge. But it can take care of a lot of boilerplate and other routine stuff. For example, I often need parsers for archaic domain-specific data; the inner workings of that data are domain knowledge (byte format, endianness, layout, string format, etc.) and ChatGPT will certainly make lots of errors there. But it gets the stuff around it right: creating the ArgumentParser (input file, output file, other flags, etc.), opening files for read/write, error checking…

    This lets me concentrate on the domain-specific parts instead of rechecking framework documentation (which I had forgotten, since I hadn’t used it for half a year).
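    For illustration, a minimal sketch of that kind of boilerplate (hypothetical names, assuming Python’s argparse; the domain-specific parsing is stubbed out, since that is the part the model tends to get wrong):

    ```python
    # Hypothetical skeleton of the boilerplate described above: argument parsing,
    # file handling, and error checking around a domain-specific parser.
    import argparse
    import sys

    def parse_record(raw: bytes) -> dict:
        # Domain-specific part (byte format, endianness, layout, string format):
        # exactly where a language model is likely to get the details wrong.
        raise NotImplementedError

    def main():
        parser = argparse.ArgumentParser(description="Convert archaic data files")
        parser.add_argument("inputfile")
        parser.add_argument("outputfile")
        parser.add_argument("--verbose", action="store_true")
        args = parser.parse_args()

        try:
            with open(args.inputfile, "rb") as src, open(args.outputfile, "w") as dst:
                record = parse_record(src.read())
                dst.write(str(record))
        except OSError as err:
            print(f"file error: {err}", file=sys.stderr)
            sys.exit(1)

    if __name__ == "__main__":
        main()
    ```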

    So with a little bit of tweaking, I now have a workflow where I’m excited to program again. Currently I’m looking forward to what comes next.

  17. Good points, all of them! I hope you are done with discussing this tiring topic now so we can all move on to other things.

  18. > an entirely new machine

    I agree; I have long thought that what passes for AI these days is simply not intelligence, and will never turn into intelligence.

    But you question whether a more-real type of automated intelligence is even possible. I agree that you can’t simulate human intelligence well without understanding human intelligence. But I think an “intelligence” that can marshal facts and the relationships between them, and use them to construct arguments, is perfectly feasible. Of course, it would require a lot of research work (and time); you get quicker results by concentrating on large language models. But an AI based on knowledge would actually be useful.

    1. Well, at the extreme you could in theory use a sufficiently powerful computer to create an exact simulation of how neurons behave, then string enough of these together to replicate a brain. That would require radical changes in computing power, but it’s hypothetically possible.

      1. An “exact model of how neurons behave” might equally result in a skilled plagiarist or bullshitter; real collections of neurons behave like that. In addition to modelling the brain, we’d also have to model human personality, and we haven’t begun to understand human personality.
        So yeah, hypothetically possible, but so remote it’s like plans to land a human on Pluto – there are no such plans, because we don’t even know how to begin.

      2. “Well, at the extreme you could in theory use a sufficiently powerful computer to create an exact simulation of how neurons behave, then string enough of these together to replicate a brain.”

        In theory you can use a sufficiently powerful computer to create an exact simulation of how the atmosphere moves around a planet, thus producing perfect weather forecasting.

        In practice, you can’t. It has been demonstrated (possibly proven, in the mathematical sense, though I’d have to look it up) that even if you replicated every atom and molecule in the Earth and atmosphere, perfect weather forecasting would still be impossible. It’s not an issue of computational power, but rather of the system itself not being amenable to such replication. This is one of the ways in which nonlinear equations and chaos mathematics were discovered.

        There’s also the concept of emergent properties–properties of a system which are not predictable by study of the components of that system. No amount of study in molecular biology can teach someone ecology, for example. There are important insights from molecular biology for ecology, and biochemistry constrains ecology, but ecology is fundamentally different from biochemistry. You can dissect all the humans you want, you’ll never learn why some cultures eat goat while others don’t.

        In the same way, I believe we’ll find that minds cannot be reproduced by merely simulating the smallest bits of the brain. If we look at this from a materialistic perspective, the mind can be viewed as an emergent property of the brain, which is itself a collection of interacting chaotic systems. From that perspective, given what we know about the limitations of analysis (literally the taking apart of things to learn how they work), we should assume that producing a mind via simulating cells is doomed to failure. (If you want a non-materialistic view of this question, listen to Leslie Fish’s song “The Squatter”. Loki is not the worst ghost to have in your machine, but he’s certainly not ideal.)

        1. Perfect simulation as in perfect replication is also impossible because of Heisenberg’s uncertainty principle limiting our accuracy of measurement.

          And quantum simulations on classical computers are ridiculously inefficient.

          But yeah, even «before» that, the consequences of chaos theory, where small changes to initial conditions quickly spiral out into extreme differences, wreck the hopes of an exact simulation.

          But the question is how much this matters. We can’t know the exact position of every cloud droplet, but we can be pretty certain about what the average monthly temperature will be in some place in January 2024.

          And one nice thing about emergent properties is that they can emerge from systems at wildly different scales: probably the most obvious example is phase transitions, where we can differentiate ice from water from steam across *many* orders of magnitude. The same goes for very different materials: while solids/liquids/gases vary in behavior, the very fact that these designations exist shows that they have common emergent properties. Hard to say what conditions of scale and substrate are necessary and sufficient for «minds»…

        2. I’m not sure the analogy with weather prediction fits. For a weather model to be useful in predicting our weather it needs to replicate our world exactly. It’s perfectly capable of simulating a different weather system.

          The same isn’t true of our neuron-brain model. It doesn’t need to model my mind exactly to be a mind. It could model something completely different but still be a mind. Case in point: every human alive has a mind made up of roughly the same bits but parsing differently (we think anyway).

    2. I occasionally have fun by prompting ChatGPT to talk about various fictional works. It’s pretty good at summaries, but falls apart when asking specific questions.

      It’s interesting to see how it falls apart. If asking about groups of characters, it’ll tend to mix in the main characters of the work who aren’t members, much more often than inventing characters or bringing in ones from other works. When it gets things wrong it’ll stay in genre: asking about nations in the Stormlight Archive had it say the Horneater Peaks were home to dwarves and giants, and asking about characters’ superpowers gets you powers that aren’t that character’s, or aren’t in the work at all, but are at least superpowers.

  19. A few notes:
    1. As far as animal emotions go, I note that Our Host is a cat person. As any dog person would say, cats aren’t human.
    2. I think that something like ChatGPT would already be useful in many real-world communications, especially those put out by HR Departments. Fluffy non-communications are a staple in corporate America. And many corporate bureaucrats could not write 200 coherent words to save their lives, but are still expected to produce paragraph-shaped things. (Their real work consists of interpersonal coordination.)
    3. As far as legal writing goes, most complex contracts are boilerplate: adopting language in previous similar contracts without much thought. A lot of language in complex contracts serves no real purpose. Some language even calls on legal impossibilities, such as an irrevocable agency not coupled with an interest. I wouldn’t trust ChatGPT, but I’d be enthusiastic about an AI “expert” that reviews and suggests boilerplate.
    4. AI is mindless. Are humans really all that better? Our mindlessness is usually well-suited to the world we evolved in and subsequently constructed. But Daniel Kahneman won a Nobel Prize by finding mismatches between our mental model of reality and the world we actually live in.

    1. “AI is mindless. Are humans really all that better?”

      Depends on the human. I guess this is one of the reasons ChatGPT has raised so many eyebrows: quite a few people see it producing the same things they’re producing day-in-day-out, and they’re worried. It’s the fluffy non-communications that you mention.

      When I first saw some of the essays ChatGPT produced, I immediately thought of some legal opinions or assessments I occasionally get from colleagues – lots of general statements and dancing-around-the-issue language, few to no clear and credible statements on the actual matter. To be clear, these are the BAD legal opinions, which, if I can, I absolutely throw back with some more or less polite form of “cut the bull and answer the damned questions”.

      And I’d guess it’s also no coincidence that it’s getting so much traction in the journalistic world. I fear that our host is giving the profession way too much credit here – maybe I’m biased, but my observation is that the vast majority of what you get nowadays (in online papers at least) is the content-poor clickbait / outrage bait of the kind that ChatGPT would be perfectly able to produce.

    2. >1. As far as animal emotions go, I note that Our Host is a cat person. As any dog person would say, cats aren’t human.

      Well, duh. Obviously. Cats are superior after all.

    3. The point about contracts, and legal documents in general, is on point. Yes, there is a lot of filler that doesn’t matter. Why is it there? Because it has worked in the past. While it might be possible to perform an analysis that leads to taking it out, that would take time, which is money. Worse, the analysis might be wrong. Inserting standard verbiage, on the other hand, is cheap and risk free. That being said, the parts that do matter matter a lot. ChatGPT, as it currently stands, clearly is not up to that. So what we have is a technology that can reproduce the boilerplate that you would usually simply copy and paste in, while also being entirely likely to get some critical part entirely wrong. Yes, the document could be carefully reviewed and edited, but how is this either easier or more reliable than the usual way? I’m sure people will try this. I eagerly await the verdict of legal malpractice insurance carriers.

      1. That makes sense. If you have boilerplate that does what you want, you put the whole thing into each subsequent contract.
        The alternative is to do a lot of very hard thinking (by lawyers charging by the hour) to produce a smaller, clearer boilerplate that may actually do what you want. You would have to test it in court (even more hours charged). The upside is that some customers may like you better, but probably not enough to pay for the lawyers.

      2. I just reread “Practical Polytheism”, and… whoa.

        “Why do we do this ritual?”
        “Because it works.”
        “How do we know it works?”
        “Because we’ve done it for thousands of years, and we’re still here.”
        “What happens if we change this part?”
        “Maybe nothing. Maybe the gods smite us. Do you want to find out?”

      3. A lot of legal boilerplate is there to deal with very remote contingencies, or to forestall far-fetched interpretations that might tempt a desperate litigant and a sympathetic court, or to make explicit common understandings that get challenged one time in a million. That being so, it tends to be repeated verbatim document after document, and inviting a Chat GPT-style artificial intelligence to reconsider it or synthesize new clauses would be foolish. Remote contingencies and far-fetched interpretations are exactly what a LLM won’t find, and it has no understanding of common understandings.

        Every once in a while, a court case or statute comes along that is contrary to the industry understanding, and new boilerplate gets written. But this is not normally a work of synthesis, but rather a specific response to a specific situation, so again a LLM doesn’t seem like the right device to draft the language. (In some cases, where standard language is prescribed by some authority and used verbatim in more than 90% of all contracts, a LLM might find it, but that is using a sledgehammer to drive a tack.)

      4. I, not being a lawyer, am overjoyed by imagining the level of chaos that could be achieved by having big serious legal contracts written by ChatGPT.

        Imagine having to argue about whether something had satisfied a condition that was weird, unclear, and referred ambiguously to two or three other parts of the contract, but with no logical possibility of there being any intent behind it or understanding of what it could have been for at the time it was written. Now imagine if there were millions of dollars at stake!

  20. I was going to make a comment about the dissimilar incentive structures of teachers on one hand and (some) students on the other, but Ilmryn made that point first and quite honestly better than I would likely do.

    So instead I’m just going to link to someone who tried to make ChatGPT play a chess game against Stockfish, because it’s hilarious and shows another aspect of how ChatGPT doesn’t really understand things. https://www.youtube.com/watch?v=rSCNW1OCk_M

    1. OMG, that is the funniest thing I have seen in a long time. I tried to explain to my wife why I was sitting at the computer laughing hysterically; she just shook her head and went to bed.

  21. But what is knowledge? Aren’t all laws of science that we know of essentially forms of statistical correlation? If a physics textbook can contain real knowledge, then so can a chatbot AI that ingested the said textbook.

    Or does knowledge only manifest as an action of a physical being? Do I only possess knowledge of physics if I can apply it to build a nuclear reactor (or something else that depends on the physics I know being correct)? Does the Tesla autopilot AI possess knowledge (perhaps imperfect, but certainly not worse than some humans’) of how to drive?

    1. Knowledge-as-statistics is a summary of *inductive* knowledge – Hume noticed this long ago. But there are also *deductive* and *inferential* modes of knowledge, which are certainly correct given their premises and not a good fit for a Bayesian epistemology. Even when you have a technical hack that can cope with certainty, a particular deduction may not leave a large trace in the corpus of learning material and requires intervention to mark as certain.

      1. I think *deductive* and *inferential* modes of knowledge are also based on statistics; it’s just that these statistics are performed over the results of experiments. If I observe that event B always follows event A in time and never occurs without event A, I can deduce a causal link, and the more I observe this, the more certain I become. Now, naturally a chatbot can’t perform experiments, it has no body, but nothing prevents it from digesting the results of experiments done by other agents and reflected in the training data corpus, and gaining knowledge this way.

  22. I pray that you are right about the college essays. But you are definitely wrong about college *admission* essays, where the readers don’t seem to care much about the facts, but rather their form.

    Speaking of the “real world”, do not discount Gell-Mann amnesia (https://en.m.wikipedia.org/wiki/Michael_Crichton#GellMannAmnesiaEffect): how often do you need to persuade people about facts they know little about in real life, rather than in college, where you have both an incentive and the means to check your students’ facts?

  23. I would like to point out that GPT and most other Large Language Models haven’t been trained to produce true statements or even “good” text more broadly, but to predict the next token(*), a task at which they are already superhuman (https://www.lesswrong.com/posts/htrZrxduciZ5QaCjw/language-models-seem-to-be-much-better-than-humans-at-next). Consequently, you can’t necessarily gauge what the model “knows” (facts, but also procedures) by asking questions. If for example you asked it a question that most humans would get wrong in some specific way on the Internet, it would be functioning exactly as intended giving that same incorrect answer.

    In other words, correct and well-formulated answers to questions can demonstrate the presence of some bit of knowledge or understanding, but failure to get a good answer doesn’t demonstrate their absence: a different kind of prompt, one for which the correct answer ought to be a likely continuation, might cause the model to tap into its pool of knowledge and retrieve the correct facts. For example, it has been observed that GPT produces more accurate answers if you prompt it to answer the questions as Einstein, versus not specifying the nature of the answering entity, or prompting it to answer as a village idiot (ChatGPT has been fine-tuned to continue prompts modeling a “nice helpful chatbot” persona, so it will take more to “deprogram” it to model a more competent essayist, say).

    That isn’t to say there aren’t limitations to the model, its limited amount of memory for instance is a fundamental constraint (although it can be circumvented somewhat by, for example, prompting the model to summarize previous paragraphs, and including the summaries, not whole paragraphs, in the prompt to produce the next paragraph). And if you naively give it a prompt along the lines of examples here, an unsuccessful prompt will likely continue to produce poor answers. But that also doesn’t mean that a student with more knack for prompt-engineering than writing essays couldn’t leverage even the extant LLMs a lot more than a naive attempt might suggest. And it definitely doesn’t mean that models trained differently, or trained to do different things (such as “giving truthful answers” vs “predicting the next token”), couldn’t achieve stronger capabilities. Even on topic of the current paradigm of next-token-predictors, at the limit the most accurate way to predict tokens contained in Einstein’s unpublished article about quantum gravity, say, would be to model Einstein. That language models don’t yet do that is in principle only a matter of training (although sufficient amount of training data and compute necessary to converge to that point might not in reality be available), just like the sweeping advances during last couple of years have been achieved primarily on back of simply spending more compute in training.

    *) GPT 3.5/ChatGPT has also undergone additional RLHF (Reinforcement Learning from Human Feedback) with the goal of making it act more polite (as opposed to e.g. completing racist prompts with racist continuations, because that’s how they have always continued in its training dataset), but that hasn’t shifted its goal to producing truthful answers, but rather to the kind of answers that humans at Amazon Mechanical Turk or similar would rank highly in categories such as helpfulness (which is partly why it sounds so confident spouting bullshit: it has been trained to give clear, confident replies).
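    For readers who want to see what “predicting the next token” means mechanically, here is a minimal sketch (an illustration only, using the publicly released GPT-2 model via the Hugging Face transformers library as a stand-in; ChatGPT itself is not exposed this way):

    ```python
    # Greedy autoregressive decoding: at each step the model scores every token in its
    # vocabulary as a possible continuation, and we append the highest-scoring one.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    input_ids = tokenizer("The point of writing an essay is", return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(20):
            logits = model(input_ids).logits        # one score per vocabulary token
            next_id = logits[0, -1].argmax()        # pick the single most probable token
            input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(input_ids[0]))
    ```

    Nothing in that loop checks whether the continuation is true; “more probable” is the only criterion, which is the point being made above.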

  24. Tom Scott made a good video about why ChatGPT scares him. It includes an example of a useful program ChatGPT wrote for him, but also its limits.
    https://www.youtube.com/watch?v=jPhJbKBuNnA

    This is a great watch, for more than the single example of where ChatGPT is useful.

    ChatGPT is useful for finding esoteric pieces of documentation, the kind where you would need to wade through two pages of Google results, read a lot of pages, and rephrase your search query five times until you find the correct small piece of information.

    Then it can churn out a rough program, and somebody who is able to program can fix the errors and make it usable.

    This has limitations, though:
    A friend of mine called me about a problem a few weeks ago that he couldn’t solve. The help from ChatGPT didn’t solve it either. I then solved it from memory within half an hour.
    I told my friend to do something ChatGPT had explicitly told him not to do.
    And then it worked. Why did ChatGPT tell him that?
    Because the documentation says it is not needed.
    The documentation is wrong!
    I never bothered to figure out why it is wrong; I was simply happy that the problem was solved.
    So ChatGPT lacks any experience with which to judge what it reads.

    Another example:
    I tried to program something via ChatGPT in a language/framework I’m not familiar with.
    It was a hobby idea, nothing work-related, so I figured I could goof around.
    I thought I should be able to speak programmer, that is, have the right words to instruct ChatGPT to make a functional program.
    The start was very promising. But very soon I ran into the problem that the code became too long, and it only printed part of it on the screen.
    When I told ChatGPT to only print such-and-such section, that section was output, but in a completely different structure and implementation.
    I couldn’t work around its length limitation by telling it to only give me parts, because the parts would change. That’s when I understood: it has no sense of structure at all. It does not generate a program and add to it piece by piece, like humans do. It generates the whole thing on the fly.

    1. Another use case of this sort is interpreting decompiled machine code, something that is notoriously very difficult (or at least very high-effort) for humans to do. Give it a prompt of decompiled code and ask it to give it human-readable function/parameter names and add comments to explain what the code does, and it tends to get it correct, at least if the code is about something relatively commonplace like a known hashing function or parsing a configuration file.

  25. I am much more concerned about programs faking emotions (a big chunk of that being art). Humans also have emotional needs (whether as active (co-)creators or passive «consumers»), and your median artist already cannot live off their art. And it seems to me that this is much easier to fake? (See the recent, quite convincing «emotional outbursts» from the new Bing chatbot.)

    And while I can no longer see a difference between real comments and GPT-produced ones (unless I specifically look for it), and this will only get worse, I doubt that spammers will be able to properly exploit this (in that specific context) without quickly showing their hand: the most sophisticated attempts that I have noticed were YouTube threads where (almost?) all of the comments were praising some specific product, and I only got fooled the first time.

    But yeah, Web search might be screwed… will we have to move to search internal to domain-specific websites? This might worsen our already bad platformization issues, though, and I *already* wish that non-Web† programs like Discord, and even websites (but platformized and fundamentally netiquette-violating ones) like Reddit, would burn to the ground…

    † P.S.: I now realize that I have a potentially circular definition here, since I define «Web» as «indexed by an (outside) search engine». AFAIK Discord *does* allow links to specific messages, though you might need access to the (not-)«server» first?

  26. The concern for me would be that ChatGPT may not be able to produce good essays, but it can at least generate convincingly mediocre essays (the examples this post gives are still more coherent than some essays I regularly receive), and a lot of students have the attitude that Cs get degrees, so I can imagine the temptation of having the robot do your homework will be too great. I am not confident I would be able to tell apart the lackluster human essays from the lackluster robot essays in the sea of papers I usually grade. But ultimately, my attitude to cheating is that the student is hurting themselves more than anyone else, so I try not to worry overmuch.

    In terms of the future potential of AI technology, personally I don’t think creating a mind out of 1s and 0s is possible. In a strictly materialist sense, we may not know the specifics of consciousness, but we do know that the physical processes underlying it are your brain’s biochemistry. Or rather, said biochemistry is not the cause of consciousness, but consciousness itself. A numerical construct is definitionally not conscious but a simulacrum of consciousness. And to the usual counterargument, the difference between a conscious being and an indistinguishable imitation of one is that between an actor pretending to be sad and someone actually being sad.

  27. I agree with 99% of what you wrote. People really do misunderstand both the purpose of writing a college essay and the capabilities of a machine learning chat bot.

    I did get ChatGPT to write a decent enough essay regarding your Alexander the Great essay prompt. All it took was to be a little more specific and give the bot a side of the argument. I typed in “Write a critique arguing that Alexander the Great was not ‘great’.” The prompt, as well as even a passing familiarity with history, makes it pretty obvious that you are looking for a critique of his greatness. I would agree with a comment that user Gwydden made about the ability of this bot to put out passable essays about general topics. It might be hard to catch those papers in a high school or general-level history or English class.

    Full disclosure, I’m a math teacher so we are very used to the apparent disruption that new technology can bring, like calculators and online solvers. We had to decide what value the new tech brings and what about it could be detrimental to a student’s learning. At least personally, I compromised and rethought some of my assignments and assessments. It took some time and effort, but it did force me to understand the implications and capabilities of the new tech and question and rework some of my methods. Humanities teachers will have to do the same.

  28. I think this is an important corrective, and I completely agree with you that ChatGPT will not (and should not) revolutionize the classroom. I also agree ChatGPT is completely inadequate at writing essays.

    I do think though that you are underestimating GPT’s core capabilities. The way you put it is that it assembles words based on their “statistical probability” of occurring together. I don’t think that’s entirely correct. Or at least, I think it is a misleading way of putting it.

    I don’t think what GPT does (or other large language models do) ought to be described as statistics. The nature of statistics is to radically simplify the domain such that a basic mathematical expression can probabilistically model the space. An example of this for text would be something like a Markov chain algorithm, which really does just model which words tend to follow others. Markov chain algorithms are capable of emitting strings of words that sound vaguely legible, but they are nothing like what GPT is able to do.
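    For concreteness, the word-level Markov chain approach described above fits in a few lines: it records, for each word, which words have been observed to follow it, and nothing more (a generic illustration, not GPT’s mechanism):

    ```python
    # A toy word-level Markov chain text model: it only knows which word follows which.
    import random
    from collections import defaultdict

    def train(text):
        table = defaultdict(list)
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            table[prev].append(nxt)            # record every observed successor
        return table

    def generate(table, start, length=15):
        word, out = start, [start]
        for _ in range(length):
            followers = table.get(word)
            if not followers:                  # dead end: no observed successor
                break
            word = random.choice(followers)    # sample by observed frequency
            out.append(word)
        return " ".join(out)

    corpus = "the cat sat on the mat and the dog sat on the rug"
    print(generate(train(corpus), "the"))
    ```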

    What GPT has done instead is built an enormously complex web that relates various tokens to each other in highly complex ways.

    It is true that GPT does not have the human experience of cuddling up in a blanket. But it does clearly know enough English grammar to know that “blanket” is a noun, not a verb or adjective. And it knows what adjectives might be applied to it – warm, fuzzy, soft, etc. It also has deep associations for all those words – it knows, for instance, that warm may also be associated with a fire in a fireplace, is abstractly connected to safety, and could even be used to metaphorically describe a human social interaction.

    I think many AI researchers implicitly believe that the process of gradient descent hasn’t just taught GPT how to string words together. The iterative optimization process of gradient descent has forced it to, in some sense, reverse-engineer a plausible-ish simulation of the world.

    The result is that the AI is, in a sense, wildly overengineered for its current application. If this theory is correct, the AI is essentially running a low-fidelity universe simulator just to auto-complete some text.

    It’s worth noting that this theory is essentially the basis of ChatGPT (as opposed to just vanilla GPT-3). GPT-3 is truly just a prompt completer. For instance, when given a question, it is likely to just spit out additional questions in the same vein as the first question, rather than attempt to answer the question.

    The idea of ChatGPT was to harness some of the model’s hidden capabilities to make it behave as a chatbot. This is the RLHF process (reinforcement learning from human feedback). The idea is that the internal model already has all the capabilities needed to be a chatbot, but those capabilities are hidden from view. The RLHF process is intended to bring these latent capabilities out front and center. RLHF is nowhere near powerful enough to actually create new capabilities from scratch.

    The fact that this works at all is sort of remarkable, and indicates that the theory has at least some merit. A little fine-tuning really did surface a huge amount of latent capability.

    Now, I don’t think anyone thinks GPT is especially human in how it thinks or functions. If it is intelligent, it is a profoundly alien sort of intelligence. And it is very likely that its simulation of the universe is quite low fidelity: good enough to vaguely understand how certain concepts relate, but not really sophisticated enough to actually produce novel thoughts worth having. But I do think it is a disservice to describe it as nothing more than a simple statistical text model. What’s going on behind the scenes is much weirder than that.

    1. Grammar nitpicking – just because it’s fun.
      “Blanket” most assuredly is an adjective as well as a noun. And the adjective relates to the noun only on a fairly deep level. E.g., “there is no blanket prohibition on parking here.” “Blanket” here means “sweeping or complete.” But the deep root between the two meanings is “to cover.”

  29. I figured it’d be fun to have ChatGPT weigh in on the discussion. I gave it the prompt “can you describe the limits of using you to create analytical essays on historical topics?” and it provided the following answer. I wanted to have it try to read this whole post and give a reply but it was too long sadly. That would’ve been very funny to me.


    As an AI language model, I can assist with the research and data gathering stages of creating an analytical essay on a historical topic, but there are limits to my ability to write analytical essays on such topics.

    Here are some of the limitations to keep in mind:

    Limited subject matter expertise: While I have access to a vast amount of information on historical topics, I do not have the same depth of subject matter expertise as a human historian or expert in the field. This means that while I can provide information and generate text based on a given prompt, I may not have the same level of understanding and critical thinking skills as a human writer with subject matter expertise.

    Lack of context and nuance: Analytical essays on historical topics often require an understanding of the context and nuance of the time period being studied. While I can certainly provide information on historical events and figures, I may not have the same level of contextual understanding and nuance as a human historian.

    Limitations in detecting bias and accuracy: Historical accounts can often be biased, and it takes a critical eye to evaluate sources and determine their accuracy. While I can certainly provide information on a wide range of sources, I may not be able to detect bias or evaluate accuracy in the same way a human historian would.

    Lack of creativity and originality: Analytical essays often require a high degree of creativity and originality to provide fresh insights and perspectives. While I can generate text based on a given prompt, I may not be able to provide the same level of creativity and originality as a human writer.

    Overall, while I can be a helpful tool in the research and data gathering stages of creating an analytical essay on a historical topic, it may be necessary to have a human writer with subject matter expertise to provide critical analysis, interpretation, and original insights.

    1. I wish it would not insist on using “I” like it bloody means something.

      I hate it as much as when I get an app notification like “we miss you!”
      Fudge off, you ersatz being!

      1. Well, the prompt was to describe the limits of “using you”, in which case the use of first person is only reasonable: If someone asked you what your favorite pizza topping is, you wouldn’t say “ktm’s favorite pizza topping is x”, but “my favorite pizza topping is x”, and GPT’s training data is full of this first-person use.

        I switched the prompt to “Describe the limits of using ChatGPT to create analytical essays on historical topics” and, as expected, there’s nothing resembling a self-reference (for the record, the model appears to “know” it’s called ChatGPT, in a more fundamental, world-model kind of way than the fine-tuned responses it gives when the user asks it about e.g. something unlawful, but really teasing out that information is trickier than just asking).

      2. We have to admit it was a good essay though 😀 if a bit lacking in humility with all that “I may not be able”.

    2. I’m pretty sure that was either hardcoded or very specifically trained for. It has some pretty formulaic standard responses to stuff it’s not supposed to answer and to questions about itself.

      One funny one that I’ve seen in a couple of separate conversations (so it’s not because it was earlier in the conversation) is that it refuses to settle inter-franchise versus debates, saying that the characters wouldn’t meet because they’re in separate fictional universes, and that if they did, the outcome would be controlled by the circumstances of the fight.

    3. How is it useful for research if it doesn’t cite the references it’s using to write its response? If I want a summary of 15 books on ancient plumbing and ChatGPT can’t tell me which section it’s writing came from which book, then its only value has been to tell me that there’s _something_ worthwhile in that stack of texts. But obviously I think there is something worthwhile or I would not have fed that particular stack of texts into ChatGPT!

    4. One thing that’s interesting about what it generated is that assisting “with the research and data gathering stages” is something it absolutely cannot do, and it does not really have access to “a vast amount of information on historical topics”. It has access to a lot of statistical information about words, some of which come from texts about historical topics, but that’s as far as it goes.

      A search engine like google could help with those stages, but ChatGPT is not looking up information or finding data for you, it is generating predictive text that, statistically speaking, is probably similar to stuff you might find in some ways but not in others.

  30. I think a big part of the handwringing about it is just that most teachers (at least prior to college, as I have not yet completed a fourth college class and thus cannot speak to that) obscure the purpose of their assignments, and so students are trained not to think but to produce rote responses to prompts with no clear meaning or purpose. In my experience, I have only heard of the purpose of essay writing once or twice in my academic career, and it doesn’t really stick for me. In turn, that makes ChatGPT more feasible to my mind (and I suspect many others) as a replacement for essay writing; it is also producing rote responses to prompts with no clear meaning or purpose, it is also focused more on the appearance of aptitude than aptitude itself (as a student may learn to write in ways that /look/ good rather than /are/ good, because of the referenced lack of resources and care in grading), and perhaps most importantly it purports to free up time for things that the student using it actually finds useful or interesting. If it was customary to say what the purpose of each given type of assignment is at least once per class, rather than allowing it to seem to be busywork, I think it would be more common to recognize ChatGPT and similar as insufficient; it would also encourage feedback, as the students could say whether it was fulfilling its purpose based on its effects, and therefore more and better learning if the teacher can adjust assignments.

    Sorry if this isn’t the most clear, I’m pretty tired right now and my brain works weird at the best of times.

    1. There’s a reflection on this in Zen and the Art of Motorcycle Maintenance: Pirsig’s main character presents student essays to his class and is shocked at how nearly unanimous the opinion is about which one is “better”.

      In this context I think what matters is that, for us, seeming and being are much more tightly bound than for a machine (which has not even the basis for making the distinction.) So, as a human, it would be difficult to learn how to make your writing look good without improving it.

      On a more personal note: when I was younger, perhaps around your age, I thought much the same thing. I greatly regret how long it took me to realize my own folly.

  31. I’ve found some great stuff on Gamefaqs. A user called TheSocialBunny wrote guides to Sims 2 expansions that were things of twisted hilarious beauty: https://gamefaqs.gamespot.com/community/TheSocialBunny.

    But the Gamefaqs UI is one of the most confusing, aggravating, frustrating things I’ve ever seen.

    I’m trying to post a guide for Half Earth Socialism (https://play.half.earth/). It’s one of the most addictive games imaginable because it’s a) fairly fast-cycling and b) stimulates the recurring thought that utopia can always be better. 🙂

    Whether my strategy ever sees the light of day will depend on if I ever find out what ‘Select the primary product for this content’ means. And it’s not (in the order I tried them) the format, the name of the game, or the name of the game followed by the format in brackets.

    1. Okay, it turns out ‘primary product for this content’ did mean the name of the game followed by the format in brackets, but only if it’s listed in the website’s database. Presumably so the system doesn’t think it’s a different game every time somebody makes a spelling mistake. So now that’s sorted I’ve submitted two guides (utopia is so, so addictive…) and I’m waiting to see if they get accepted.

  32. The conclusion here reminds me a lot of the (albeit provisional) definition of knowledge in Theaetetus: “right opinion plus explanation.” ChatGPT is actually a great demonstrator of the limits of the material metaphors for knowledge in that dialogue.

    I’m also reminded of my rage and frustration with the sheer volume of bullplop essays advertised when you have the misfortune of looking on google for an opinion on some “overused”* source material, say Robinson Crusoe or Moby Dick.

    *The scare quotes are because I would absolutely hate to come across as suggesting that these texts are or could be overused in education — rather, what we’re dealing with is a sort of pedagogical “tragedy of the commons”.

  33. Our host’s conclusions about ChatGPT match my own. I expect that content mills are already using it, because the output of a content mill by definition does not matter. I can imagine someone with a solid command of a subject but terrible writing skills might find it helpful, but frankly I doubt that someone with terrible writing skills has the editing skills required. But mostly, this piece gave me some insight into a minor mystery. My testing of ChatGPT included asking it basic questions about the rules of baseball. When asked about the infield fly rule, the answer began correctly. Only at the end, when it claimed that runners cannot advance if an infield fly is called, did it go wrong. I also asked it to explain force plays. The response was hilariously wrong from start to finish. I wondered about the disparity, but now I think it is the result of the training corpus. Baseball fans typically internalize at an early age how force plays work, so it isn’t a frequent topic of discussion. Many fans, however, find the infield fly rule mysterious, so there are a lot of explainers out there. Hence the AI’s having a harder time with what intuitively seems the easier question.

    1. Oh, they definitely are. Baseball is an especially fun example, because with the lower-rent writebots, you can find them very easily when they discuss the Baltimore Orioles. The Orioles are a baseball team, but Baltimore orioles are a type of bird known for liking grapes. Any time a baseball site starts talking about feeding the Orioles grapes, you can be sure you’re looking at a writebot.

  34. I haven’t done much with ChatGPT, but I’ve experimented with its fiction-writing cousin SudoWrite. In short, it’s as you describe – it produces interesting but often bizarre content. SudoWrite is usable if you’re willing to cherry-pick the best parts from the suggestions it gives you and rewrite them in natural language. For fiction, this actually can produce usable material. If I prompt it with “The doors burst open, and I stared in horror as”, it can provide several continuations and I can pick which one I want – none is “more correct” than any other. A dragon, a robot army, and the FBI are all possible reasons for the door to be bursting open. (Depending on setting, of course.)

    But in nonfiction, you’re less likely to have the problem of “I just don’t know what ought to happen next” – as you say, anyone who can pick the workable parts and rewrite them to the point where the essay passes muster can probably just write the whole thing themselves.

    (Also, yes, the plagiarism issue. You’re less likely to get caught for it in fiction, since we don’t generally deal in facts that need to be cited to their original source, but that doesn’t mean it’s not a problem.)

  35. I agree that this will not and should not kill the College Essay. But what I think ChatGPT et al. might do is kill the HIGH SCHOOL or middle school essay. “Write three paragraphs about George Washington” or “write five paragraphs about Old Yeller” at a tween-teen level seems to be ChatGPT’s top-level capability. That to me is more pernicious than screwing with higher education. Pre-college educators at all levels tend to be more overworked than their college-level peers. The essays require less in the way of citations and facts and even proper flow, tend to focus on well-known events/people/things, and need (indeed only have the length for) a surface-level overview of the topic. You cannot fundamentally change these sorts of assignments, due to the nature of the students (for example, you can’t expect a 14-year-old to research an obscure historical topic, and at that length and breadth a look at Wikipedia’s citation section would cover whatever citations they have “used”). I’ve had numerous friends who teach middle/high school students (history, social studies, and English) who have all said ChatGPT essays, while not perfect, would certainly pass. It seems to average a borderline B or C, which for a cheater is certainly worth the effort. But it will destroy the student’s ability to learn how to write before they’ve even properly begun. So when they hit college and suddenly the topics and standards are such that they’re instantly caught out, there’s very little that can be done to fix the underlying problem – which is that the kid never learned to write.

    1. Yes, a lot of grade school writing assignments are designed not to elicit thought, but to (i) check that the students have done the reading (or force them to do so) and (ii) give them practice in proper grammar and syntax. ChatGPT has (in some sense) done the reading, and it produces proper grammar and syntax, so its essays should earn passing grades.

  36. The issue with getting ChatGPT to do your research for you is that it doesn’t know what it doesn’t know. There are, as far as I know, no ways to stop it from making guesses whenever its knowledge base doesn’t cover a subject. These guesses will be as well supported as any other part of the essay to a random outsider with no knowledge of the subject (or a particularly lazy student), but will be obvious to anyone genuinely familiar with the subject.

    Random example (I’m sure others can contribute more comical/convincing AI answers) – I recently asked ChatGPT about the contrast between European and traditional Japanese swordsmanship. I know Europeans have theory-crafted a great deal about their use of the sword point being unfamiliar and lethal when facing a samurai. Was that theory supported by any actual events?

    Per the AI, while stationed in 19th century Japan, French soldier Nicolas Léonard Sadi Carnot had in fact studied Kenjutsu, and combined it with traditional European swordsmanship. He anecdotally managed to face down several samurai, but this isn’t strongly supported by historical data.

    Had I never heard of Carnot, it’s fully possible that I would have accepted this reasonable sounding historical tidbit without bothering to google.

    But! Research aside, ChatGPT is really quite good at formatting essays, dealing with all manner of pesky footnotes, endnotes, stylistic questions and proper structure. I always found essays annoying less in terms of finding and repeating the relevant data, and more in terms of constructing them according to the technical specifications of an individual instructor. That is genuinely useful, and exactly the role an AI is meant to play.

    1. The problem is that “I don’t know” very, very rarely comes up in datasets (both because people are also inclined to bullshit their way through, and because if a person doesn’t know something, they will likely just not comment). Because GPT and similar language models work by trying to predict the next token, “I don’t know” therefore has low probability. The model is doing exactly what it has been trained to do when it gives wrong answers confidently.

      You can try to work around this by prompt engineering, however: give the model examples of nonsense or unanswerable questions with appropriate “I don’t know”-style responses in the prompt (a toy sketch of such a prompt follows the link below). It turns out the model does in fact seem to have an internal model of when its responses are bullshit or unreliable, but you have to tease it out a bit.

      https://arr.am/2020/07/25/gpt-3-uncertainty-prompts/
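
      A minimal sketch of that idea, assuming nothing about any particular API – the example questions and the `generate` call are purely illustrative stand-ins, not code from the linked post:

```python
# Toy sketch of an "uncertainty prompt": prime the model with examples in which
# the honest answer is "I don't know", so that admitting uncertainty becomes a
# statistically plausible continuation. The example questions are illustrative.

EXAMPLES = [
    ("Who was the first person to walk on the Moon?",
     "Neil Armstrong."),
    ("How many angels can dance on the head of a pin?",
     "I don't know; the question has no factual answer."),
    ("What did Julius Caesar eat for breakfast on his 30th birthday?",
     "I don't know; that detail is not recorded anywhere."),
]

def build_prompt(question: str) -> str:
    """Prepend the few-shot examples to the user's question."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES)
    return f"{shots}\n\nQ: {question}\nA:"

# `generate` below is a stand-in for whatever completion API is being used;
# it is not a real function from any particular library.
# reply = generate(build_prompt("Which French officer combined kenjutsu with European fencing?"))
print(build_prompt("Which French officer combined kenjutsu with European fencing?"))
```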

      1. I feel like there’s a pretty decent distance between “this question doesn’t make sense” and “I’m not sure I know the correct answer to this question”. And being able to recognize the latter would make ChatGPT a superior alternative to Google at the very least (which I should have mentioned in my original answer).

        There are plenty of things to take issue with regarding Google, but it works pretty hard to rarely feed you total bullshit unless you go out and look for it.

  37. “Chances are, if I say ‘wool blanket’ your mind is going to not merely play word association but also conjure up some memories of actual wool blankets – their sight, touch or smell. ChatGPT lacks this capability; all it has are the statistical relationship between words stripped entirely of their referents. It will thus invent descriptions for scientific phenomenon that aren’t real, embellish descriptions of books that do not exist and if asked to cite things it will invent works to cite, because none of those things is any more or less real to ChatGPT than actual real existing things.”

    People invent descriptions of lares and penates, embellish biographies of Romulus and Remus, and if asked to justify their preferred reforms will claim that it’s the way to return to the mos maiorum. They will strongly insist that these are real. We do not say that their failure at observation or analysis implies that they lack minds.

    Surprisingly many people live with non-debilitating sensory deficiencies for quite a while, and nobody — neither themselves nor the people around them — notices that they are, say, colorblind or cannot smell. It is well-known that there are quite a lot of people who are unbothered by e.g. astrology lacking in content. A more worrying question is, how many are there who think (or fail to think, if you will) in this way about, say, medicine? I don’t mean people who endorse e.g. homeopathy; I mean laypeople who endorse “mainstream” medicine.

    There are two parallel epistemologies here. One starts from a prototypical question like “what do I expect to experience if I mix these two liquids?”, packs it into “what will happen”, and takes it for granted that final answers are given by an observer-independent reality. The other starts from a superficially barely different question like “what do I expect to experience if I wear this item of clothing?”, packs it into “what is popular”, and takes it for granted that final answers are given by “social reality” that consists of the consensus of observers. “All the world’s a stage, all the men and women merely players”, says the latter in full earnestness. To be an expert means for other people to acknowledge one as an expert; being an expert, and successfully pretending to be an expert, are the same thing. For the first job interview and for exams, parents helpfully offer the advice to never say “I don’t know”.

    On what basis do we consider Alexander to be ‘great’? Is this a sound basis to apply this title?

    We consider Alexander to be ‘great’ because other people consider Alexander to be ‘great’. Because documentaries, textbooks, games, etc. all used the same epithet. If we had seen others refer to him as Alexander the Wicked, we would call him Alexander the Wicked, and would give the matter just as little thought as we give to ‘great’. This is not an answer on the level of society, but for most people it is the answer on the individual level.

    Skills of observation and analysis need to be taught and practiced (or cynically, filtered for) because they are not especially common. One could very well decide not to praise ChatGPT by claiming that it is a mind and on par with humans, but instead to damn humans by pointing out that a lot of the time, they are as mindless as ChatGPT.

  38. Some observations, which may already have been made:
    1. The output of ChatGPT when asked to produce an essay sounds a lot like that of a student who didn’t actually do the reading (or skimmed the Cliff’s Notes really quickly) and is taking a test. Lots of talking around the question and attempting to ‘cold read’ the test marker, but specifics are guessed (and are correct only by chance).
    2. The definition of information (related to that of entropy) is built around “surprise”. Given a stream of bits (zeros and ones), the information you get from the next bit depends on how well you could guess which it would be – zero or one (see the short numerical sketch after this list). ChatGPT’s output has information in it only to the extent that a reader happens not to have read and absorbed the corpus it was trained on.
    3. Bruce Schneier in his latest newsletter has an item (https://www.schneier.com/blog/archives/2023/01/ai-and-political-lobbying.html) on how these engines could be used to generate lots of lobbying effort (and more to the point simultaneous effort in many jurisdictions and areas) more cheaply. It may be true, but it depends on the reliability of the observations that it can generate, e.g. that legislator X is the key one to lobby on area Y. Since ChatGPT seems to just make stuff up, this doesn’t really sound like a solution.
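
    A small numerical illustration of the “information as surprise” point in item 2 above – this is just the standard Shannon self-information formula, nothing specific to ChatGPT:

```python
import math

def surprisal_bits(p: float) -> float:
    """Self-information of an outcome with probability p, in bits: -log2(p)."""
    return -math.log2(p)

# A fair coin carries 1 bit per flip; an outcome you could already guess
# carries almost no information, while a rare surprise carries a lot.
print(surprisal_bits(0.5))   # 1.0    (fair coin)
print(surprisal_bits(0.99))  # ~0.014 (the outcome you expected anyway)
print(surprisal_bits(0.01))  # ~6.64  (the rare surprise)
```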

  39. Technocracy, rule by experts, may be a more concise and appropriate term than ‘technofetishistic egotism’.

    Industrial experts were doing amazing things during the industrial revolution. Even some of the experts did not fully understand why those specific methods were effective at producing results. Very many people then were inspired to invent magical theories of ‘the expert’, and to try to get similarly amazing results by using similar methods on wildly inappropriate problem domains.

    The last thirty to sixty years of engineering research has actually filled in a lot of the then unknown theoretical explanations for the successes of the industrial revolutions. Microscopic defects connect to statistics, and to inspection methodologies, and so forth.

    One of the currently unpopular implementations of technocracy, one that is fairly testably invalid, was the effort to create a biologically superior cohort of future humans by ensuring that the best genes were passed down – i.e., eugenics. There were several issues there, but biology may be quite a bit weaker than other factors, and we don’t know what superior genes would be, or how to measure them.

    A more popular implementation right now is the mainstream scheme for primary and secondary schools, and the research in the field of Education based on that. We break down by grades and subjects, like stations on a production line. We presume that we have a specialist worker expert in doing the work of that station. Then the research is like doing a set of industrial experiments to optimize process in a repeatable way with implications at scale. The issue is, modern industrial experiments that successfully scale do so because the information about defects in widgets is a very simple thing. The ‘defects’ in students are not simple, and are potentially correctable in ways that cannot work for widgets. A lot of Education research takes as a premise that it is looking for ‘defects’ in human beings that are not at all simple, and then immediately moves back to assuming that humans are very simple things where the defects are concerned. If your university is promoting reforms that are ‘experience based’ improvements in the learning methodology, these are very likely coming out of the academic field of Education. I am skeptical that these reforms are improvements, for more reasons than simple skepticism of Education’s experimental validity.

    I am broadly skeptical of quite a lot of things at current universities.

    I was a little surprised to look up the University of North Carolina at Chapel Hill school of law, and not find that it is one of the schools of law that I believe are outright working against the public good. Possibly this is a result of only doing a superficial inspection.

    I absolutely do not believe that the technocrats in the Big Tech industry, as well as programming technocrats in other industries, are only doing good things. To me, it looks like there is a great amount of gross mismanagement of programming, and very stupid choices of programming project, in a very wide range of industries.

    I think you may be a little above what I would expect of humanities faculty, or of faculty working way outside of their specialty. I have my usual concern with academics about your grasp of the concepts that you borrow from other academics in other fields, who developed those concepts with tools that are very far outside of your field.

    I am not more impressed with you than I am with the folks managing big tech. I think you are living in a glass house, and do not have a moral right to throw stones when it comes to people doing things that are profoundly unwise, and should probably not be done.

  40. As a Machine Learning practitioner and mathematician, as well as history buff and fan of this blog, I think this essay actually misses the point _somewhat_. I’d summarise my argument thus: if you imagine a successor to ChatGPT, which works exactly the same but is much better, then it would indeed be intelligent in the same way that humans are. Arguably, ChatGPT is already there, just with a low-ish IQ score (or whatever metric you choose).

    The crux of the Author’s argument is that a language AI cannot be intelligent, since all it does is stitch together words that statistically fit together – and that description is correct. However, the process of figuring out _which_ words go well together is complex and involves processes we’d call thinking. Clearly, it doesn’t take much poking at ChatGPT to conclude it is not very knowledgeable or intelligent! But that’s somewhat beside the point.

    The point is, the representation of “facts” and “thinking processes” does indeed reside somewhere inside ChatGPT’s billions of parameters (where and how – impossible to say, just as it is impossible to pinpoint where such nuggets of information sit in our brains). The process of ChatGPT reading a sentence and working out a plausible sequence of words is not merely some statistical look-up, but involves analysing the sentence, the meaning of words, their co-occurrence in source materials and so on. And the sought patterns are complex too – it is not merely that “king” and “queen” tend to occur together, but could be that if someone “crossed the Rubicon” and “conquered Gaul”, then the “missing word” is likely “Julius Caesar”. Or that if Emperors come in sequence, then the “successor” to “Tiberius” is “Caligula”, and his “successor” in turn is “Claudius”. Now is that knowledge or not? Or a statistical pattern that spots words describing numbers and mathematical operations, and understands that the next likeliest word is one that describes the result of the calculation – is that knowledge of arithmetic or not? Indeed ChatGPT is not terrible at addition.

    The key difference is that we have an internal monologue that comments on this process as we do it. We can remember discrete books, go step-by-step through an addition process and so on. For ChatGPT, this process is contained in a black box algorithm and carries no such narrative (at least one that’s visible to us). Ironically, the fundamental building block for both ChatGPT and human intelligence is the same – a simple neuron. In both cases, intelligent (or apparently intelligent) behaviour emerges from having a collection of connected neurons, without humans having a _definitive_ understanding of how exactly it emerges.

    For sure, ChatGPT is much better at constructing grammatically-plausible sentences than actual knowledge. Risking an ancient history analogy, it is like a skilled orator without much knowledge! This imbalance perhaps helps conclude that ChatGPT cannot think in the way we humans do.

    No contest as to the rest of the essay arguing about the “value” of machine-written essays, or worse still, students using it to fake homework.

  41. Minor note about detecting AI-generated texts: it doesn’t seem obvious to me that tools recognising AI texts will improve faster than the AIs generating them. And since the goal is to imitate human writing, there might come a point where the imitation is successful enough to make it impossible, or extremely hard, to differentiate.

  42. ChatGPT seems ripe to disrupt one classroom: language instruction. Not the whole of it, of course, but a large part of learning a language is that you have to talk to someone in that language. This is an area that classrooms already know they do poorly (one-on-one tutors do well, but this is expensive and so not viable for most people). Nobody cares what you talk about, just that you are holding a conversation. ChatGPT likely knows the right words to use in the topic you are interested in, and so can keep you talking (or at least give you a list of words that are interesting to you to look up for next time).

    There is still a lot to figure out about how to put this into a classroom, but it seems like useful disruption. This is really the exception that proves the rule, though; mostly it isn’t useful in class.

    1. I agree with your main point, but I’d reframe your last sentence.

      By analogy, a screwdriver “mostly isn’t useful” in construction projects, because you’re only using it a minority of the time. Even your toolbox “mostly isn’t useful,” because so much of the work is using power tools, planning, shopping for parts, carrying things around, and so on.

      But if we had just invented power tools, the sensible thing to do would be to have almost the entire discussion dominated by what each specific tool *is* useful for, not to have a discussion along the lines of “jackhammers are useless for driving nails.”

      However, I like and agree with the substance of your comment, which points out yet another case in which ChatGPT and its successors may be transformative in an extremely important aspect of education. I’m sure others will come up with 100 more such examples!

    2. Language learning programs already exist (Babbel; Duolingo) and they are much better than ChatGPT as it currently stands.
      A frequent confusion is statements that ChatGPT “knows everything in the world”, whereas its data set is much, much smaller than that. Specific AI applications such as language learning programs, and tax prep programs, are already excellent at what they do and far cheaper and more efficient than human providers of these services.

  43. I’m a machine learning engineer; before that I was focused on complex algorithms and full stack development. I’m a bit dyslexic and reading has often given me a headache. When I grew up I thought I was dumb because teachers like you constantly told me I wasn’t reproducing your subjective expectations. I’ve always enjoyed Math and Science because the answer is the answer, and when you get the answer you know you’re done. I’m 40 now and I love consuming books, though I still don’t read; I listen to audiobooks. Just this year I’ve read the first 9 books from Lex Friedman’s reading plan alongside my wife.

    Here is the point…
    I’m sure there is nuance I’m missing from the ability of ChatGPT to generate the kinds of responses you want from a question like you asked about Alexander the Great, but I fail to see it. I also don’t value that nuance. I see it as some kind of twisted ivory tower of subjective reality. To me this looks like a correct answer. It obviously shouldn’t mean that ChatGPT replaces my own thoughts, but I don’t see why I shouldn’t ask for its perspective…

    Let’s talk about that a second. You sorta miss the mark on describing what is going on here. I describe it as the regurgitation of human understanding. We have the ability to digest all of human writing and generate consensus around it. This is just in its infancy and it’s amazing. This will probably not ever generate new conclusions, but it is remarkable in its ability to synthesize the conclusions that came before and summarize them.

    I believe we would be foolish to ignore this. I don’t begrudge ChatGPT because it can’t do my job better than I can, for it is already much better than me at writing. Instead I can lean on it as a tool, as I would ask an intern to get a job started for me or to give me ideas on how to rephrase my own words to imbue a less offensive tone.

    Here, as an example:
    As a machine learning engineer with a background in complex algorithms and full stack development, I’ve found that reading can sometimes be a challenge for me due to my dyslexia. However, I’ve always found solace in math and science, where the answer is clear-cut and definitive. Recently, I’ve discovered the joy of consuming books through audiobooks, and have even gone through the first nine books of Lex Friedman’s reading plan with my wife this year.

    With regards to ChatGPT’s ability to generate responses, I must admit that I may not fully understand the nuances of its capabilities. However, while some may argue that there is a level of subjective reality that cannot be captured by a machine, I personally find the consensus generated by ChatGPT to be a valuable tool in synthesizing and summarizing the conclusions of human writing. Although it may not necessarily lead to new insights, I believe that we would be remiss to ignore its potential.

    Rather than seeing ChatGPT as a threat to my own expertise, I view it as a useful tool that can assist me in my work, much like an intern might help me get a job started or give me ideas on how to rephrase my own words to convey a more diplomatic tone. And while I don’t believe that ChatGPT can replace my own thoughts, I see no harm in asking for its perspective as a way to expand my own understanding.

    Which of these responses would you have preferred to receive?

    ChatGPT:
    Alexander the Great, also known as Alexander III of Macedon, is widely considered to be one of the greatest military leaders and conquerors in history. He lived from 356 BCE to 323 BCE and was the king of Macedon from 336 BCE until his death at the age of 32.

    The reasons why Alexander is considered great are numerous. Here are a few factors:

    Military conquests: Alexander was a highly successful military leader who conquered a vast empire, stretching from Greece to Egypt and the Persian Empire. He won many battles against powerful enemies, including the Persians, and expanded his empire beyond what any other Greek had accomplished before him.

    Legacy and impact: Alexander’s legacy has been felt for thousands of years. He established Greek culture and ideas in many parts of the world, which had a significant impact on the development of Western civilization. He also founded many cities, including Alexandria in Egypt, which became major cultural and economic centers in their own right.

    Personal charisma and leadership skills: Alexander was a charismatic leader who was able to inspire his troops and win their loyalty. He was also an excellent strategist who could adapt his tactics to the circumstances he faced.

    However, while there are many reasons why Alexander is considered great, it is worth noting that some of his actions, such as the destruction of Thebes and the massacre of the inhabitants of Tyre, were brutal and controversial. Additionally, his reign was relatively short, and his empire did not last long after his death, which has led some historians to question the long-term impact of his achievements.

    In conclusion, while Alexander the Great’s military prowess and cultural legacy have earned him a place in history as one of the greatest leaders of all time, it is important to recognize that his legacy is complex and not without controversy. Therefore, whether or not to apply the title of “Great” to him is ultimately a matter of personal judgment.

    1. *With regards to ChatGPT’s ability to generate responses, I must admit that I may not fully understand the nuances of writing as you do.

      I would have corrected this line but I felt it might be cheating. 🙂

    2. I’m a little confused: Was that at the end supposed to be one response, or several? It looks like a single response.

      Anyway, my quick grading response (as, admittedly, a software engineer myself) is that it only answers half the question: It provides several potential explanations, but it expresses no opinions on whether any of these possible reasons to call him “the Great” are justified. Put another way, the essay ends, “Therefore, whether or not to apply the title of “Great” to him is ultimately a matter of personal judgment,” and if I were a grader, I’d say, “Yes, and what do you think?”

    3. “…I’ve always found solace in math and science, where the answer is clear-cut and definitive.”

      Well, I’m certainly willing to believe you’re an engineer.

  44. This post reminds me of two things:

    1) When I was in undergrad, I took a two-part Abstract Algebra course that had a bit of an interesting structure — we would have weekly homework that was due on Wednesday, and then on Friday the professor would have four or five of us present our proof of one of the problems, and he would ask questions about why we made such-and-such conclusion or where a given leap of logic came from. Since mathematical proofs are effectively just a very particular form of persuasive essay, doing so makes quite a bit of sense — it might be uncomfortable to stand in front of a classroom full of students and admit to making mistakes, but it (theoretically) improves your arguments. Most importantly for this story, however, the content of those presentations wasn’t actually graded — you were already graded on your proof when you handed in your homework, so this was just class participation.

    What became obvious over time, however, was that a large portion of the class produced beautiful, “correct” proofs, but couldn’t explain their work or answer the professor’s questions. Eventually, we came in one Friday and, instead of calling people up to the board to start writing their proofs on the blackboard, the professor stood in front of the class and told us that he wasn’t going to assign any more homework for the rest of the semester, because doing so was a waste of both his time and ours. And, notably, the people who complained the most about this decision were the people who caused it in the first place — the people who scored really well on the homework but who couldn’t explain their proofs to save their lives.

    In retrospect, it seems pretty inevitable that this kind of thing would happen — the class was rather large for the math department (20+ people), so it was pretty likely that a given student could cheat, copy a proof, and get away with it. And the payoff was pretty good — you’d get a perfect score, and you’d end up with an excellent grade in the course… meaning that you’d have a better-looking transcript.

    The second part of the course only had 6 students in it (because it wasn’t required by the degree), and the professor went back to the original “assign homework on Wednesday, ask for presenters on Friday” structure. And it worked FAR better (because the students who were in that class were taking it because they were interested in the topic, not because it was required to graduate, so the professor’s incentives and the students’ incentives ended up aligning), and the professor was FAR happier with us the second time around.

    (For the record, I wasn’t one of the people who copied proofs. Before the cheating was exposed, the whole situation was actually really nerve-wracking, because everyone else’s proofs were WAY better than mine.)

    2) ChatGPT reminds me heavily of how videogames have been using “Roguelike elements” as a selling point in the last few years (I could write an essay about how saying a game with some procedural generation has “Roguelike elements” is like saying that a game with 3D movement has “Doom clone elements”, but this comment section is NOT the place for that). At first, I was super excited about it — I dabbled in Nethack when I was younger, so that style of game becoming more mainstream seemed like a good thing. At this point, though, I’m thoroughly disillusioned by it, because “Roguelike elements” being a code-phrase for “we didn’t have enough money to hire a dedicated level designer” is distressingly common.

    And you can usually tell, for essentially the same reasons that you can tell that a ChatGPT-generated essay is full of malarkey — the results make sense locally (for a certain value of “make sense”), but lack the density of high-level structure that typifies how humans do stuff. It’s why AI-generated text feels rambly — there isn’t a “point” that it’s working towards, it can’t really foreshadow anything, and it has no problem with introducing or dropping an element from the text seemingly at random.

    It’s a shame, really, because there are situations where tools like ChatGPT excel… it’s just that those aren’t the situations that proponents are hollering about disrupting.

  45. We’ve had this conversation already on Twitter, but to get it down here (as a software person who is not a specialist in ML models, but has some education in the sub-field):

    Whether ChatGPT can “know” or merely is able to imitate is an uninteresting philosophical question as far as I’m concerned. What’s interesting is what, precisely, it “knows”.

    It “knows” how to manipulate language, how sentences and paragraphs and questions and answers work. This has been a very active field of research, because it’s one of those things humans do nigh-effortlessly but can only with great difficulty build machines to do. As a research project in that challenge alone, it is an enormous advance.

    However, because of that focus, it has none of the more structured, facts-based knowledge that machines are traditionally good at. As you noted, it has read 570GB… once, and then forgotten most of it. In theoretical terms: understanding structure is equivalent to lossy compression in deep information-theoretic terms that probably are relevant to human cognition as well. What we want and expect of machines, however, is the lossless, high-volume information storage of a normal database.

    When someone figures out how to connect the two – build a ChatGPT-equivalent language model that knows how to query a Wikidata-like structured fact base – we’ll have something truly earthshaking. I am not qualified to say whether that task is trivial or another decades-long piece of work, but until then ChatGPT is mostly useful as a script doctor or as a reformulator of facts the user feeds into its short-term memory.
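
    To make that “language model in front, structured fact base behind” idea concrete, here is a deliberately toy sketch – the dictionary stands in for a Wikidata-like store, the facts are the standard dates for Alexander, and `generate` is a placeholder for a model call, not a real API:

```python
# Toy sketch of the "look the facts up first, let the language model only
# phrase them" pattern. FACTS stands in for a Wikidata-like structured store;
# generate() would be whatever language-model completion call is actually used.

FACTS = {
    ("Alexander III of Macedon", "born"): "356 BCE",
    ("Alexander III of Macedon", "died"): "323 BCE",
    ("Alexander III of Macedon", "title"): "king of Macedon",
}

def lookup(subject: str) -> list[str]:
    """Pull every stored fact about a subject from the structured store."""
    return [f"{prop}: {value}" for (s, prop), value in FACTS.items() if s == subject]

def build_prompt(question: str, subject: str) -> str:
    """Compose a prompt in which the model may only rephrase retrieved facts."""
    facts = "\n".join(lookup(subject))
    return (
        "Using only the facts below, answer the question.\n"
        f"Facts about {subject}:\n{facts}\n\n"
        f"Question: {question}\nAnswer:"
    )

# prompt = build_prompt("When did Alexander rule Macedon?", "Alexander III of Macedon")
# response = generate(prompt)  # hypothetical model call
print(build_prompt("When did Alexander rule Macedon?", "Alexander III of Macedon"))
```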

  46. Bret Devereaux wrote:

    “I’ve heard it suggested that it could massively reduce the cost of producing formula-driven legal and corporate documents and perhaps it can. … I don’t write those sorts of things, though, so I can’t speak to the question. I would be concerned though, because ChatGPT can make some very bad mistakes and has no way of catching those mistakes, so very high stakes legal or corporate documents seems like a risky use of ChatGPT. ChatGPT can’t write a good essay, but a bad essay only wastes a few minutes of your time; a bad contract can cost a company millions and a single bad line of code can crash an entire program (or just cause it to fail to compile and in either case waste hours and hours of bug-hunting to determine what went wrong).”

    I work in a role that partners with corporate communications. Most of the content is not high stakes documents where a change of phrase will cost millions. The bulk is routine information employees need to know to do their jobs or access their benefits — think things like “make sure to sign up for a health care plan by such and such deadline” or “please attend the all hands meeting with the CEO.”

    Much of this work was already being offshored to low labor cost countries. This was made possible, in part, because some countries developed national economic strategies centered on attracting shared service centers from big multinationals by investing in infrastructure and educating students in the languages of the target companies they hoped to attract. Native speakers complained in the beginning that offshored writing lacked the polish of a native speaker — and this could at times be true — but it didn’t matter. Companies in many cases concluded that the reduced labor costs exceeded whatever benefits there might be from the slightly more polished writing a native speaker might have (and to be fair, the writing of offshore authors is often indistinguishable from the writing of native speakers). Corporate communications is not a revenue-generating activity, and so there’s great temptation to slash costs whenever possible.

    All this is to say that companies have already commoditized writing and developed processes to make it cheap and easy to do so. McKinsey, Gartner and the like have whole playbooks around how to do this. More importantly, companies have internalized “good enough writing”, which is exactly what ChatGPT-like tools provide. Automation isn’t a revolutionary leap from the perspective of businesses like these. Rather, it’s a logical, incremental extension of a trend already growing — namely, get the words out as cheaply as possible instead of aiming for the highest quality writing possible. You see a comparable trend in translation, where many companies reserve more-costly human translation for sensitive topics where the writing must be nuanced and use machine translation for routine documents.

    Further, the fact that businesses will be evaluating this technology alongside relatively uncontroversial process automation efforts will make them predisposed to test it. Vendors already sell ChatGPT-like tools for corporate communications. I’d be very surprised if those vendors didn’t already have customers who had reduced headcount based on the tools’ capabilities, even if reductions happened through attrition or slower-than-planned expansions of communications departments. It’s not a question of whether or when that’s going to happen; those processes are underway right now. I suspect it’s going to be increasingly hard for newly graduated communications professionals to get a foot in the door as they once could through jobs that have them writing low-stakes copy.

  47. “at no point in this process do I actually want the essays”

    This is the point at which I, a former adjunct college composition instructor, snorted into laughter.
