What Is AI?

The quest for Artificial Intelligence, or AI, has a history that stretches as far back as the earliest computing machines (the mid-to-late 1940s). Its goal has been to create an “intelligent” machine – a rather nebulous endeavor, given that what it means to be intelligent has varied over time and across different circles.

Many approaches have been taken over the years, each shaped by a different view of what intelligence is. For many years, the popular focus was on increasingly complex algorithms. This branch of AI focused on things like “search spaces” and avoiding “local maxima” (candidates that beat all of their near neighbors but are not the best overall) when finding a “solution” in the search space. This approach treated logical behavior as a cornerstone of intelligence. At one time, a program that could play champion-level chess would have been considered a crowning achievement.
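To make the “local maxima” problem concrete, here is a minimal sketch of hill climbing, one of the simplest search strategies. The landscape values and the function name are made up for illustration; real search spaces are vastly larger and have many more dimensions.

```python
# Hypothetical one-dimensional "search space": each position has a score.
# Position 2 (score 5) is a local maximum; position 6 (score 9) is the best overall.
LANDSCAPE = [1, 3, 5, 4, 2, 6, 9, 7]


def hill_climb(heights, start):
    """Keep stepping to the better neighbor until no neighbor is better."""
    position = start
    while True:
        neighbors = [i for i in (position - 1, position + 1)
                     if 0 <= i < len(heights)]
        best = max(neighbors, key=lambda i: heights[i])
        if heights[best] <= heights[position]:
            return position  # no neighbor improves on where we are
        position = best


stuck_at = hill_climb(LANDSCAPE, start=0)   # climbs the small hill and stops
found_at = hill_climb(LANDSCAPE, start=4)   # happens to start near the big hill
print(stuck_at, LANDSCAPE[stuck_at])
print(found_at, LANDSCAPE[found_at])
```

Starting from the left, the search settles on the small hill because every step away from it looks worse. Much of the classical work went into strategies for escaping traps like this.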

In recent years, as computers and computer programs have pervaded our lives, attention has turned to more practical matters. Can we create machines that can do what we do – that can serve as our assistants? Can we create a machine that we can converse with in natural language? Can we ask a machine to prepare dinner for two – meeting certain dietary restrictions, but also the palates of the diners? These tasks may not seem as grand as playing champion chess. But they are arguably more helpful. As it turns out, it is also difficult to create programs that deal with the fuzzy nature of our everyday lives.

ChatGPT, which launched on the GPT-3.5 model, can be considered the poster child of the current brand of AI. Since its public release in November 2022, there has been immense public interest in what AI can do. ChatGPT – like most modern AIs – was not created by programmers writing algorithms to cover everything we might ask it to do. It is based on a different approach: neural networks. Neural networks are loosely modeled on how brains work – as a cluster of interconnected neurons. Intelligence becomes an emergent property of those neurons in action. Also, in contrast to traditional computer programs, neural networks are trained rather than programmed. In classical computer programming, one develops algorithms that are then reduced to computer code. With neural networks, a network is presented with various input and output (or stimulus and response) pairs. At each training iteration, the network’s learning algorithm tweaks the network (strictly speaking, its “weights”). Over many training iterations, the network’s output/response gets closer to the desired/ideal output for a given input/stimulus.
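The “trained rather than programmed” idea can be sketched in a few lines. The example below is a deliberately tiny illustration, not how ChatGPT is built: a single artificial neuron with one weight and one bias, nudged by a simple error-correcting rule. The data, names, and learning rate are all assumptions chosen for the demonstration.

```python
# Hypothetical training data: (input, desired output) pairs.
# They happen to follow y = 2x + 1, but the code is never told that rule.
PAIRS = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def train(pairs, iterations=2000, learning_rate=0.05):
    """Train one linear 'neuron' whose output is weight * x + bias."""
    weight, bias = 0.0, 0.0  # the network starts out knowing nothing
    for _ in range(iterations):
        for x, desired in pairs:
            output = weight * x + bias
            error = output - desired
            # Tweak each parameter slightly in the direction that shrinks the error.
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias


weight, bias = train(PAIRS)
print(f"weight={weight:.3f}, bias={bias:.3f}")
```

Nobody wrote “multiply by two and add one” into the program. The weight and bias drift toward values close to 2 and 1 simply because those values make the outputs match the examples. Modern networks do the same thing with billions of weights instead of two.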

While less visible than classical AI for many years, the neural network approach stretches just as far back. In fact, it can be traced to the 1943 paper “A Logical Calculus of the Ideas Immanent in Nervous Activity” by Warren McCulloch and Walter Pitts. In the 1950s, the perceptron gave the idea of a network of artificial neurons a machine embodiment. In the decades since, much research has gone into neural networks – including what the shape (topology) of networks should look like and how they should be trained.

Early neural networks were limited by technology. The human brain has about 86 billion neurons, with each neuron having, on average, 7,000 synapses (connections to other neurons). (It should be noted that the author’s brain is generally thought to consist of two neurons – but with about 50 quadrillion connections between them.) In the ’80s, a neural network running on a Sun workstation could have from a few dozen to a few hundred nodes (neurons). Training such a network could take hours to days. Needless to say, these networks could only perform rudimentary tasks. But they demonstrated the potential for this form of computing.

Over the years, as technology advanced, the scale of neural networks increased. By the mid-2010s, they started becoming more “interesting”. Image analysis and natural language processing were early targets: they were areas where we struggled to create algorithms, but they were natural fits for training neural networks. “Large language models”, or LLMs – neural-network-based models designed with language processing in mind – started to appear.

Neural networks at this time were software that often ran on hardware accelerators. Graphics processing units (GPUs) became a favored accelerator. GPU shaders are designed to do a large number of operations in parallel. And, as it turns out, at a basic level, neural networks operate by doing a very large number of matrix math operations. Through its CUDA architecture, Nvidia had, for quite some time, promoted its GPUs as more than just graphics engines. This gave it a leg up in the neural network AI space. (Nvidia has since created a whole software ecosystem for AI, often referred to as its “AI moat,” which protects it to an extent from competitors.)
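To see why this is matrix math, and why it parallelizes so well, consider a rough sketch of one network layer in plain Python. The weights, inputs, and the choice of the ReLU activation (clamping negatives to zero) are assumptions made for illustration; real frameworks hand this work to optimized GPU libraries rather than Python loops.

```python
def layer_forward(weights, inputs, biases):
    """Compute one layer: a matrix-vector product, plus biases, then ReLU."""
    outputs = []
    for row, bias in zip(weights, biases):
        # Each output neuron is an independent dot product of its weight row
        # with the inputs, so all rows could be computed at the same time.
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, total))  # ReLU: negative sums become 0
    return outputs


# Hypothetical layer with 3 inputs and 2 output neurons.
WEIGHTS = [[0.5, -1.0, 2.0],
           [1.0, 1.0, 1.0]]
BIASES = [0.0, -10.0]
INPUTS = [2.0, 1.0, 0.5]

activations = layer_forward(WEIGHTS, INPUTS, BIASES)
print(activations)
```

No row of the calculation depends on any other row. That independence is what lets a GPU spread the work across thousands of small processing units, and a large model repeats this pattern with matrices that have thousands of rows and columns.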

The public release of ChatGPT in November 2022 generated huge public interest in neural-network-based AI. As a result of that interest, tremendous amounts of capital started flowing into AI R&D and data centers. This has resulted in a somewhat “unnatural” rate of advancement. Under normal circumstances, a number of factors would limit the rate at which AI tools can advance. Aside from the capital required to fund R&D, chips are only so big and so fast, and they can only be produced at a certain rate. AI models such as ChatGPT and Claude also require vast amounts of memory and storage. While the normal pace of advance would allow more complex chips and denser memories and storage over time, the massive influx of capital has diverted resources across the industry towards powering AI. This is evident in the cost of computer memory, as both the memory chips and the fabrication capacity used to make them are diverted to AI. This memory shortage is affecting not only computers, but also gaming consoles, smartphones, and televisions.

Further evidence of the “unnatural” rate of AI advancement is the demand that AI data centers are placing on power grids. In normal advancement, this would be tempered because, over time, the chips that power AI would consume less power (advances in process technology allow more transistors to be crammed into the same area, for them to be run faster, to take less power, or some combination of the three). But to meet demand, we are cramming more AI processors into data centers, resulting in massive power requirements. AI capacity is now being quoted in gigawatts. Nuclear micro reactors are being considered to power AI data centers, and a number of methane gas turbines were “temporarily” brought online to power

With this new form of computing comes the opportunity for new tools that take advantage of its strengths. Chatbots are one class of tools. Generative tools that can produce images and videos are another. AI agents that can act autonomously are starting to come to the fore.

Tools have also been created that are more focused in their scope. For example, tools to examine CT scans and MRIs have been developed. (It was once thought that the advent of AI tools in radiology would mark the end of the radiologist. After all, they could spot tumors or other abnormalities, right? Those tools are here, but rather than replacing the radiologist, they are aiding them. Someone still has to be there to sign off on the results, and more importantly, to verify results, particularly those that indicate an abnormality. And instead of putting radiologists out of work, we now have a shortage of radiologists!)