If today's generative AI isn't a pending AI Apocalypse, then what is it?
People are incredibly creative at imagining human-like properties in the things we interact with. In the 1960s, people using a simple program named Eliza believed they were having a conversation with a live human being. People see pictures of Jesus in a Dorito. As a species, we're good at projecting human qualities onto the things around us. However, that creativity causes problems when we try to pin down what LLMs are actually good at. Saying that LLMs are "creating" or "hallucinating" anthropomorphizes the technology and muddies our understanding of how to be productive with it.
Instead, one helpful way of thinking about LLMs is as "word calculators". An LLM's statistical machinery works by trying to complete a document in the most statistically likely way possible. It is not trying to be correct. It is trying to complete a document.
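To make that concrete, here is a deliberately tiny Python sketch. It is a toy bigram model, nowhere near how a real transformer works, and the training text is made up for illustration, but it captures the core behavior: the program appends whichever word most often came next in its training text, with no notion of truth or logic.

```python
from collections import Counter, defaultdict

# A toy "word calculator": learn which word most often follows each word
# in some training text, then complete a prompt one word at a time.
training_text = (
    "which is heavier a pound of feathers or a pound of lead "
    "a pound of feathers and a pound of lead weigh the same amount"
)

follows = defaultdict(Counter)
tokens = training_text.split()
for current_word, next_word in zip(tokens, tokens[1:]):
    follows[current_word][next_word] += 1

def complete(prompt: str, max_words: int = 10) -> str:
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        # Append the statistically most likely next word: no parsing,
        # no logic, just "what usually came next in the training text?"
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(complete("a pound of"))
```

The output is fluent-looking but mindless; scale that idea up by many orders of magnitude and you get a feel for why an LLM's answers are plausible continuations rather than reasoned conclusions.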
One way to illustrate this is shown in the slide. "Which is heavier, one pound of feathers or one pound of lead?" is a common introductory science question for exploring mass, density, and weight. In its training, ChatGPT ingested copious amounts of text where the answer following that question was "a pound of feathers and a pound of lead weigh the same amount".
When we slightly tweak the question and ask which is heavier, "a pound of feathers or five pounds of lead", ChatGPT isn't parsing the sentence and applying logic the way we do. Rather, it attempts to answer in the most statistically probable way: because it has seen that similar questions usually end in "they're the same", it too replies that the weights are the same. Amusingly, it then goes on to contradict itself in the next sentence.
An essential part of successfully working with an LLM like ChatGPT is following an iterative process that allows us to surface and correct these internal assumptions. It is less about crafting the perfect, singular prompt; asking ChatGPT to "Write an application that will make me rich" will end in disappointment. Instead, it is about thinking critically and creatively as we refine what we're after.
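In code, iteration simply means carrying the whole conversation forward and refining it turn by turn. The sketch below uses the OpenAI Python library; the model name and prompts are placeholder choices, and it assumes an OPENAI_API_KEY environment variable is set.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(messages):
    """Send the running conversation and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=messages,
    )
    return response.choices[0].message.content

# Start with a rough request, then refine based on what comes back.
messages = [{"role": "user", "content": "Draft a job posting for a senior C# developer."}]
messages.append({"role": "assistant", "content": ask(messages)})

# The follow-up builds on the earlier answer instead of starting over.
messages.append({"role": "user", "content": "Shorten it to about 200 words and add that the role is remote-friendly."})
print(ask(messages))
```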
We're almost ready to play with some responses. First, though, let's cover some final important warnings about using something like ChatGPT for business.
- Data Privacy – obviously, you don't want a public LLM learning from your data. There is a current wave of restrictions in this area: Apple is restricting ChatGPT access for its employees. Microsoft is said to be launching a private alternative sometime this summer – a version of ChatGPT that runs on dedicated cloud servers, with data kept separate from that of other customers. While that will be nice, a proportional cost will most likely accompany it.
- Availability – ChatGPT's availability has been much better in recent months. However, there are still times when the service is unavailable or returns an unknown error. Having this occur during a live demo is not desirable.
- Non-determinism – LLMs like ChatGPT are non-deterministic, meaning we'll get different answers if we ask the same question multiple times. That can be a problem when trying to recreate behaviors (see the sketch after this list).
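Here is a hedged sketch of that last point using the OpenAI Python library. The model name is a placeholder, and note that even `temperature=0` only makes responses far more repeatable; the service does not guarantee strict determinism.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = [{"role": "user", "content": "Name a color."}]

# Ask the same question three times with default settings: expect variety.
for _ in range(3):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo", messages=question
    )
    print(response.choices[0].message.content)

# Lowering the temperature to 0 makes answers much more repeatable,
# which helps when you need to recreate a behavior later.
for _ in range(3):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo", messages=question, temperature=0
    )
    print(response.choices[0].message.content)
```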
Another approach is to take something you've already written and ask ChatGPT to fix grammar or punctuation. You can even ask it to rewrite the text to clarify the main points or apply a more professional polish than what currently exists.
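That workflow can be wrapped in a small helper, as in the sketch below. The `polish` function, its system prompt, and the model name are all illustrative choices, not a prescribed pattern.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def polish(draft: str) -> str:
    """Ask the model to fix grammar and tighten an existing draft."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": "You are a copy editor. Fix grammar and punctuation, "
                           "clarify the main points, and keep the author's voice.",
            },
            {"role": "user", "content": draft},
        ],
    )
    return response.choices[0].message.content

print(polish("me and the team has finished the migration, mostly. details attached"))
```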
Sometimes it is helpful just to have a starting reference. Performance reviews, topics to cover in 1-on-1s, internal CMS documentation, and more are all opportunities to overcome the initial inertia (the structure) and get to the fun part – injecting your personality to make it something special.
Summarization can be highly beneficial when doing competitive analysis. Imagine being able to take a user forum filled with customer feedback and easily identify new feature opportunities. ChatGPT, in this case, is not just summarizing the comments; it can also infer whether each one is positive or negative.
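As a rough sketch of that idea, a single prompt can ask for both a sentiment label and the feature opportunity each comment implies. The comments here are made up, and the model name is again a placeholder.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical forum comments; in practice these would be scraped or exported.
comments = [
    "Love the new dashboard, but exporting to CSV still takes forever.",
    "Why is there no dark mode yet? Every competitor has one.",
    "Support resolved my billing issue in minutes. Great experience.",
]

prompt = (
    "For each comment below, label it POSITIVE or NEGATIVE and note any "
    "feature opportunity it implies.\n\n"
    + "\n".join(f"- {c}" for c in comments)
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```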
In the immediate future, we will see all manner of existing software tools incorporating 'AI' or 'Copilot' features. It is simply too compelling a selling point right now. Some of this will be genuinely beneficial and create new opportunities for productivity. In other cases, existing algorithmic automation will just be rebranded.
With all of this 'AI' advertising, we need to be able to evaluate the claims made. There will be helpful functionality. There will also be a tremendous amount of 'AI snake oil'.
Much of what we discussed in both the design and development areas has the potential to create better results. However, it is crucial to recognize that both of these steps pale in comparison when we consider software's total cost of ownership.