AI tools behave differently from classical computer tools, reflecting the technology they are built on. One way to think of classical computer programs is as a series of “If X is true then do Y, otherwise do Z” steps, where Y and Z are in turn “if-then-else” constructs. Reducing a problem to a program means breaking it down into “if-then-else” steps until Y and Z are things the computer hardware can do directly. While programs can be incredibly complex, if you have the program’s source code and know what it is acting on, you can work out what it will do.
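To make the “if-then-else” picture concrete, here is a minimal sketch in Python. The function name, the weight threshold, and the fees are all made up for illustration; the point is only that every decision the program can make is visible in its source.

```python
def shipping_fee(weight_kg: float, express: bool) -> float:
    """Hypothetical rule set: each branch is an explicit if-then-else."""
    if weight_kg <= 1.0:
        # Light parcel: the "Y" branch, which is itself an if-then-else.
        if express:
            return 12.0
        else:
            return 5.0
    else:
        # Heavy parcel: the "Z" branch, again an if-then-else.
        if express:
            return 20.0
        else:
            return 9.0


light_standard = shipping_fee(0.5, express=False)
heavy_express = shipping_fee(3.0, express=True)
```

Reading the code is enough to predict the result for any input, and the same input always produces the same fee.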
AI tools are built on neural networks, which, as one might guess, are inspired by how the human brain works (a “network” of neurons). A neural net is “trained” by repeatedly presenting it with “training data” – a collection of input/output (or stimulus/response) pairs. During training, the training algorithm adjusts the network’s parameters so that, for a given input/stimulus, the output/response gets progressively closer to the desired output. Neural networks are themselves programs, but the “program” of an AI model really lives in its parameters (weights). Unlike a classical program, whose algorithm is designed by the programmer and then laid down in code, a neural network “programs” itself during the training process. This relieves the programmer of coming up with algorithms, but it also means that for networks of even moderate complexity, the actual algorithm is not known. The operation of the network can be observed and traced, yet its “algorithm” generally cannot be expressed in a manner meaningful to humans.
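As a rough illustration of training, the sketch below fits the smallest possible “network”: one weight and one bias. It is a deliberately simplified stand-in, not how production models are built, and the function name, learning rate, and training pairs are assumptions chosen for the example. The rule behind the data (output = 2 × input + 1) is never written into the code; the training loop finds it by nudging the parameters after each example.

```python
def train(pairs, steps=500, learning_rate=0.05):
    """Fit output = weight * input + bias to (input, desired output) pairs."""
    weight, bias = 0.0, 0.0  # the "program" starts out knowing nothing
    for _ in range(steps):
        for x, target in pairs:
            # How far is the current output from the desired output?
            error = (weight * x + bias) - target
            # Nudge each parameter in the direction that shrinks the error.
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias


# Hypothetical training data: input/output pairs only, no explicit rule.
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weight, bias = train(training_data)
prediction_for_4 = weight * 4.0 + bias  # an input that was not in the data
```

After training, the learned weight and bias sit very close to 2 and 1. With two parameters the result is easy to interpret; with millions or billions of them, it is not.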
Neural networks and traditional programs operate in fundamentally different ways. One way to think about neural networks is that they operate on patterns, where a pattern could be an image (a pattern of pixels) or some text (a pattern of words). Their outputs can be thought of as probabilities. If you present an image to a neural network that has been trained to detect dogs, its output might be the probability that there is a dog in the picture.
While classical programs excel at problems that can be reduced (by their coders) to binary logic and if-then-else steps, the current generation of neural network AI excels at problems that can be reduced to patterns and probabilities. Typically, the latter are problems that classical programs struggle with, because patterns can be “fuzzy”. Writing a program to detect dogs from a pattern of pixels, for example, is incredibly difficult. There are many types of dogs, and even the same dog can be represented by quite different patterns of pixels in different photos. No two pictures are identical, and pictures of the same subject can differ considerably. Neural network training hands the problem of finding a “dog detection” algorithm over to the training process. The “hard part” becomes presenting the network with enough examples of dogs (and non-dogs) that it “learns” what a dog is.
This all sounds great, right? Neural networks effectively “program” themselves! Well, there’s a downside. While neural networks excel at patterns and “fuzzy” inputs, they also produce “fuzzy” outputs – the outputs of a network are probabilities. Let’s go back to our dog-detector neural network. You train it on a bunch of photos with dogs, birds, and people. Since this is supposed to be a dog detector, the desired output for dog pictures is 1.0, and the desired output for photos without dogs is 0.0. After training, you test the network. The dogs tend to get outputs close to 1.0 (“dog detected”). The birds and people tend to get outputs close to 0.0 (“not a dog”). Then you feed it a bunch of photos with cats. The outputs are close to 0.7. What does this mean? The network’s training data didn’t include cats. Its world consists of dogs, birds, and people. A cat is closer to a dog than to anything else the network has seen, so the output leans in that direction, but with lower confidence.
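The dog-detector behaviour can be reproduced in miniature. The sketch below is a toy under loud assumptions: each photo is boiled down to a single made-up number (real detectors work on pixels), the training set contains only dogs and people, and the “network” is one weight and one bias passed through a sigmoid so the output lies between 0 and 1. All names and values are hypothetical.

```python
import math


def sigmoid(z):
    """Squash any number into the range 0..1 so it reads like a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_detector(examples, epochs=300, learning_rate=0.1):
    """Fit a one-feature detector to (feature, label) pairs by gradient descent."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for feature, label in examples:
            error = sigmoid(weight * feature + bias) - label
            grad_w += error * feature
            grad_b += error
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Hypothetical one-number summary of each photo (an assumption for this toy).
dogs = [1.5, 2.0, 2.5]        # desired output 1.0
people = [-2.5, -2.0, -1.5]   # desired output 0.0
examples = [(f, 1.0) for f in dogs] + [(f, 0.0) for f in people]
weight, bias = train_detector(examples)


def dog_score(feature):
    return sigmoid(weight * feature + bias)


cat_feature = 0.3  # never seen in training; sits nearer the dogs than the people
cat_score = dog_score(cat_feature)
```

The dog and person examples score near 1.0 and 0.0, while the cat lands in between, on the “dog” side of 0.5. The detector has no way to say “this is something I was never shown”; it can only place the cat somewhere on the one scale it learned.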
From here, you can guess how neural networks can “hallucinate”. It’s not that the network has gone totally off the rails; it can only do as well as its training allows. Things can get increasingly convoluted with large neural networks, which can be thought of as multiple neural networks feeding into each other. One low-confidence “conclusion” can ripple through the network, with the inaccuracy growing at each stage.
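A back-of-the-envelope way to see the ripple effect is to treat each stage as being right with some probability and multiply the probabilities along the chain. This is a simplifying assumption made for illustration (it treats the stages as independent, which real networks are not), and the function name and numbers are invented.

```python
def chained_confidence(stage_confidences):
    """Combined confidence of a chain, assuming independent stages."""
    combined = 1.0
    for confidence in stage_confidences:
        combined *= confidence
    return combined


# Five stages that are each 90% reliable on their own.
five_good_stages = chained_confidence([0.9] * 5)

# The same chain with one shaky 0.7 "conclusion" in the middle.
one_weak_stage = chained_confidence([0.9, 0.9, 0.7, 0.9, 0.9])
```

Under this assumption, five 90% stages leave roughly 59% confidence in the final answer, and a single weak stage pulls it lower still.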
AI tools are usually a mixture of neural networks and classical programs. Users interact with the program, which turns around and uses neural networks for certain tasks. It then takes the neural network’s results and presents them to the user.
But that neural network component means that we shouldn’t interact with AI tools in the same way as regular computer programs. We implicitly trust computer programs to do what they’re supposed to do. And, in general, their actions are predictable and repeatable. When you take your dog to the vet, and the receptionist uses a program to look up your dog’s records, everyone trusts that all available records for your dog are brought up. Not someone else’s dog, and not only a subset of those records (unless the receptionist explicitly directed the program to filter them). On the other hand, if you ask a chatbot-type LLM (large language model, a rather evolved form of neural network) to display all of your dog’s records, it may or may not give the right answer. It may “forget” one or two. It may hallucinate some records. It may present some records inaccurately. How often this happens depends on the exact model, training, and data sets the chatbot is based on.
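For contrast, here is roughly what the receptionist’s lookup might boil down to in a classical program. The record layout, names, and sample entries are hypothetical; the sketch only shows why the result is complete and repeatable: the filter is an exact comparison, not a judgment call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VetRecord:
    dog_id: int
    visit_date: str
    note: str


def records_for_dog(all_records, dog_id):
    """Return every record whose dog_id matches, in stored order."""
    return [record for record in all_records if record.dog_id == dog_id]


# Made-up sample data for illustration only.
clinic_records = [
    VetRecord(1, "2023-03-14", "vaccination"),
    VetRecord(2, "2023-04-02", "dental cleaning"),
    VetRecord(1, "2024-01-20", "annual checkup"),
]

your_dogs_records = records_for_dog(clinic_records, dog_id=1)
```

Run it a thousand times and it returns the same two records for dog 1, never one fewer and never one belonging to dog 2.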
It is better to think of an AI tool as a human assistant (like a secretary or student helper). When first working with the tool, you supervise and monitor its results frequently. Over time, trust is developed for certain types of tasks. You gain an understanding of what it’s good at – and what it’s not.