If today's generative AI isn't a pending AI Apocalypse, then what is it?
People are incredibly creative at imagining human-like properties in the things we interact with. In the 1960s, people using a simple program named Eliza believed they were having a conversation with a live human being. People see pictures of Jesus in a Dorito. As a species, we're good at projecting human qualities onto the things around us. However, that creativity causes problems when we try to pin down what LLMs are actually good at. Saying that LLMs are "creating" or "hallucinating" anthropomorphizes the technology and muddies our understanding of how to be productive with it.
Instead, one helpful way of thinking about LLMs is as "word calculators". An LLM's statistical machinery works by trying to complete a document in the most statistically likely way possible. It is not trying to be correct. It is trying to complete a document.
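To make that concrete, here is a deliberately tiny Python sketch. It is a toy bigram model, nowhere near how a real transformer works, and the training text is made up for illustration, but it captures the core behavior: the program appends whichever word most often came next in its training text, with no notion of truth or logic.

```python
from collections import Counter, defaultdict

# A toy "word calculator": learn which word most often follows each word
# in some training text, then complete a prompt one word at a time.
training_text = (
    "which is heavier a pound of feathers or a pound of lead "
    "a pound of feathers and a pound of lead weigh the same amount"
)

follows = defaultdict(Counter)
tokens = training_text.split()
for current_word, next_word in zip(tokens, tokens[1:]):
    follows[current_word][next_word] += 1

def complete(prompt: str, max_words: int = 10) -> str:
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        # Append the statistically most likely next word: no parsing,
        # no logic, just "what usually came next in the training text?"
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(complete("a pound of"))
```

The output is fluent-looking but mindless; scale that idea up by many orders of magnitude and you get a feel for why an LLM's answers are plausible continuations rather than reasoned conclusions.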
One way to illustrate this is shown in the slide. "Which is heavier, one pound of feathers or one pound of lead?" is a common introductory science question for exploring mass, density, and weight. In its training, ChatGPT ingested copious amounts of text where the answer following that question was "a pound of feathers and a pound of lead weigh the same amount".
When we slightly tweak the question and ask which is heavier, "a pound of feathers or five pounds of lead", ChatGPT isn't parsing the sentence and applying logic the way we do. Rather, it attempts to answer in the most statistically probable way: because it has seen that similar questions usually end in "they're the same", it too replies that the weights are the same. Amusingly, it then goes on to contradict itself in the next sentence.
An essential part of successfully working with an LLM like ChatGPT is following an iterative process that allows us to surface and correct these internal assumptions. It is less about crafting the perfect, singular prompt; asking ChatGPT to "Write an application that will make me rich" will end in disappointment. Instead, it is about thinking critically and creatively as we refine what we're after.
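In code, iteration simply means carrying the whole conversation forward and refining it turn by turn. The sketch below uses the OpenAI Python library; the model name and prompts are placeholder choices, and it assumes an OPENAI_API_KEY environment variable is set.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(messages):
    """Send the running conversation and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=messages,
    )
    return response.choices[0].message.content

# Start with a rough request, then refine based on what comes back.
messages = [{"role": "user", "content": "Draft a job posting for a senior C# developer."}]
messages.append({"role": "assistant", "content": ask(messages)})

# The follow-up builds on the earlier answer instead of starting over.
messages.append({"role": "user", "content": "Shorten it to about 200 words and add that the role is remote-friendly."})
print(ask(messages))
```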
We're almost ready to play with some responses. First, though, let's cover some final important warnings about using something like ChatGPT for business.
- Data Privacy – obviously, you don't want a public LLM learning from your data. There is a current wave of restrictions in this area: Apple is restricting ChatGPT access for its employees. Microsoft is said to be launching a private alternative sometime this summer – a version of ChatGPT that runs on dedicated cloud servers, with data kept separate from that of other customers. While that will be nice, a proportional cost will most likely accompany it.
- Availability – ChatGPT's availability has been much better in recent months. However, there are still times when the service is unavailable or returns an unknown error. Having this occur during a live demo is not desirable.
- Non-determinism – LLMs like ChatGPT are non-deterministic, meaning we'll get different answers if we ask the same question multiple times. That can be a problem when trying to recreate behaviors (see the sketch after this list).
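Here is a hedged sketch of that last point using the OpenAI Python library. The model name is a placeholder, and note that even `temperature=0` only makes responses far more repeatable; the service does not guarantee strict determinism.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = [{"role": "user", "content": "Name a color."}]

# Ask the same question three times with default settings: expect variety.
for _ in range(3):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo", messages=question
    )
    print(response.choices[0].message.content)

# Lowering the temperature to 0 makes answers much more repeatable,
# which helps when you need to recreate a behavior later.
for _ in range(3):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo", messages=question, temperature=0
    )
    print(response.choices[0].message.content)
```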
Another approach is to take something you've already written and ask ChatGPT to fix grammar or punctuation. You can even ask it to rewrite the text to clarify the main points or apply a more professional polish than what currently exists.
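That workflow can be wrapped in a small helper, as in the sketch below. The `polish` function, its system prompt, and the model name are all illustrative choices, not a prescribed pattern.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def polish(draft: str) -> str:
    """Ask the model to fix grammar and tighten an existing draft."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": "You are a copy editor. Fix grammar and punctuation, "
                           "clarify the main points, and keep the author's voice.",
            },
            {"role": "user", "content": draft},
        ],
    )
    return response.choices[0].message.content

print(polish("me and the team has finished the migration, mostly. details attached"))
```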
Sometimes it is helpful just to have a starting reference. Performance reviews, topics to cover in 1-on-1s, internal CMS documentation, and more are all opportunities to overcome the initial inertia (the structure) and get to the fun part – injecting your personality to make it something special.
Summarization can be highly beneficial when doing competitive analysis. Imagine being able to take a user forum filled with customer feedback and easily identify new feature opportunities. ChatGPT, in this case, is not just summarizing the comments; it can also infer whether each one is positive or negative.
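As a rough sketch of that idea, a single prompt can ask for both a sentiment label and the feature opportunity each comment implies. The comments here are made up, and the model name is again a placeholder.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical forum comments; in practice these would be scraped or exported.
comments = [
    "Love the new dashboard, but exporting to CSV still takes forever.",
    "Why is there no dark mode yet? Every competitor has one.",
    "Support resolved my billing issue in minutes. Great experience.",
]

prompt = (
    "For each comment below, label it POSITIVE or NEGATIVE and note any "
    "feature opportunity it implies.\n\n"
    + "\n".join(f"- {c}" for c in comments)
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```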
In the immediate future, we will see all manner of existing software tools incorporating 'AI' or 'Copilot' features. It is simply too compelling a selling point right now. Some of this will be genuinely beneficial and create new opportunities for productivity. In other cases, existing algorithmic automation will just be rebranded.
With all of this 'AI' advertising, we need to be able to evaluate the claims made. There will be helpful functionality. There will also be a tremendous amount of 'AI snake oil'.
Much of what we discussed in both the design and development areas has the potential to create better results. However, it is crucial to recognize that both of these steps pale in comparison when we consider software's total cost of ownership.