ExtremeTech Explains: What is a Neural Net?

Photo: Mike MacKenzie, Flickr
As Moore’s Law approaches its endgame, the technosphere has looked to different and more diverse approaches to computing. In order to continue to increase the computational power of our systems, we can’t just depend on driving clock speeds higher and higher, nor can we continue making transistors ever smaller. To truly move forward, a new paradigm must be considered, and when peering far into the future, comparisons abound between computers and the human brain. After all, it’s all about computational power, and that’s one area where our brains are seemingly still better than computers. Therefore, the next step forward is clearly to try to make a computer that functions similarly to our brains.

As we predicted back in 2016, artificial intelligence (AI) is powering the current technological revolution in human healthcare. It also appears that neural nets may be the next frontier in the advance of computing technology. As we increase our understanding of the human connectome and the way our brains handle information, our ability to understand the brain’s phenomenal information throughput grows in tandem. This has led to neuromorphic computing tools blossoming in popularity.

This guide is intended to be a brief but thorough primer on neural networks. If you’re left with questions after reading, please let us know in the comments: we may address your questions in an update or a future article.

So, without further ado…

What is a Neural Network?

A neural network, also known as a neuromorphic system, is a system of hardware and/or software that is designed to emulate some aspect of the human brain. There are many types of artificial neural networks, or ANNs, and we’ll cover them in a moment. For now, let’s consider the hardware.

The comparison of computers to the brain runs both broad and deep. That’s because both brains and computers are similar on both a semantic and structural level. They are both convoluted mesh networks of multiple layers, responsible for high-volume information processing. They both have working memory whose buffer can falter or overflow. They both have “cold storage,” which is organized in a rough semantic hierarchy across the surface of the brain, similar to how data is stored in different places across a hard drive platter. They both consist of a network of nodes, whose importance is individually expressed by the “weights” of their connections. Their nodes communicate using a series of electrical spikes that has data embedded in the sequence.

The analogy breaks down when you compare the brain to a CPU, however, because of the brain’s enduring habit of “parallelize ALL THE THINGS.” CPUs are beholden to a data pipeline, in a way brain regions are not. A closer analogy would be comparing the brain, as the sum of its connectome, to an FPGA. FPGAs are designed to be customized by the user after manufacturing. Neural plasticity means that even after the brain’s hardware is “fully matured,” its synaptic and electrical software can still receive updates. As the environment changes, the brain adapts.

The eye is not a camera, except that in an important sense, it is; the structure of the retina is arranged in an orderly map of the visual field which persists the whole way along the optic nerve to the visual cortex. We can diagnose where an injury to the optic nerve must have occurred, based on the part of the visual field that has problems. Each neuron in the processing layers of the retina reports upstream. You can compare the retina, the capture device of the eye, to the grid of sensors on a CCD camera, like the one Hubble uses to capture its images. Mirrors, CCDs and retinas all just capture the two-dimensional photon stream they receive. A two-dimensional photon stream is also called a video.

Similarly, the brain is not a computer, except that it kind of is. We even have an internal clock that organizes the brain’s many firing rhythms in time, nudging them into an orderly and layered composition not unlike a song.

In summary, neural nets typically elaborate on three core components: A neural network is a system composed of neurons or nodes, plus their connections and weights. It must also possess some kind of propagation function. Some neural networks are capable of observing their own results, and changing their approach to a task.
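
To make those components concrete, here is a minimal sketch in Python of a tiny network: nodes, weighted connections, and a propagation function that passes values forward. The layer sizes, the weight values, and the choice of a sigmoid activation are all illustrative assumptions, not taken from any particular system.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def propagate(inputs, weights, biases):
    """Propagation function for one layer: weighted sum, then activation.

    weights[j][i] is the strength of the connection from input i to node j.
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(sigmoid(total))
    return outputs

# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
# These weights are made up for illustration; a real net would learn them.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

def forward(inputs):
    """Run the inputs through the hidden layer, then the output layer."""
    hidden = propagate(inputs, hidden_weights, hidden_biases)
    return propagate(hidden, output_weights, output_biases)

print(forward([1.0, 1.0]))
```

Changing any single weight changes how much one node listens to another, which is the sense in which a connection's weight expresses its importance.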

Why Mimic the Brain?

In short, we want to mimic the brain because brains are really good at what they do. They’re fast, light, and very low-power. The human brain is thought to have a petascale information throughput — much greater than any single PC we’ve ever built. (Distributed supercomputers don’t count.) And the brain operates on about twenty watts. This winning combination is partially because of its physical structure, and partially because of the unique way information travels through the nervous system. Brains work in parallel and serial, binary and analog, all at the same time.

When neurons fire, those impulses travel down axons in series. But neurons are arranged in cortical columns, and cortical columns are arranged into brain regions. In this way, whole brain regions of many cells can work on the same task, in parallel.

Neurons communicate with one another by receiving, interpreting, and then propagating a tiny wave of electricity down the length of the axon. It’s extremely low-power, because that little sizzle of depolarization across the membrane is performed by simply cooperating with the electrochemical gradient outside the cell. Cortical neurons, in particular, have a profoundly branching structure. They connect in a “many-to-one” fashion to their neighbors who are physically next to them, and to “upstream” neurons who come before them in the information flow. They also connect in a “one-to-many” manner when communicating to downstream colleagues. To manage all these surfaces, they analyze the “weight” of connections in order of importance by building synapses.

Neurons also process information in an analog sense: Neurons do analog uptake by tallying up all the overlapping, real-time flickers of input from dendrites. It’s a physiological Fourier transform. But neurons are also strongly binary when it comes to how they convey their messages. A neuron’s signal is made of electrical spikes organized in the time domain, and the spikes don’t vary in amplitude. A neuron is either spiking or it is not. Spikes are composed of a tiny traveling zap. Spike trains are time-delineated sequences of electrical waves that contain fragments of data. It’s almost like a Morse code receiver.

Hardware vs. Software Neural Nets

Approaching brain function as an emergent property of its physical structure leads us to a hardware interpretation of a “neural network.” IBM’s TrueNorth, produced in 2014, was a manycore neuromorphic CMOS chip hosting a convolutional neural net. TrueNorth had its own software ecosystem, including a bespoke programming language, libraries, and a whole IDE. Similarly, Intel’s Loihi ecosystem is a hardware neural net with an associated software framework. Loihi is a neuromorphic chip, and Lava is the software access path to Loihi’s powers.

Loihi’s physical architecture mimics the physical organization of the brain. While it is made using conventional semiconductor materials and will be manufactured in the future on an Intel 4 process node, Loihi is organized very differently than the silicon we’re all used to. Loihi has up to a million neurons: individual entities in the network, each with 128kB of memory attached. That pool of information is the chip’s analog for synapses. It reflects the state of the neuron’s connectivity at any given time. It is supervised by adjacent x86 cores, which impose an external clock to correct the neurons’ firing rhythms. The supervisor cores also periodically force the neurons to check their memory against the rest of the group, or recalculate the strength of their connections.

Intel Loihi 2

This structure matches the hierarchical and parallel aspects of the brain’s organization. In addition, Loihi 2 revised its approach to firing. Loihi 1 fired its neurons in binary: one or zero, nothing in between. Loihi 2 encodes its spikes as integers, allowing them to carry metadata. It also means that they can exert some influence on neurons downstream. Spikes with integer values can emulate the catalogue of different electrochemical signals a neuron can send or receive.

Developers can interface with Loihi and Lava using Python. (This is starting to sound like quite a tropical adventure.) The Loihi system will eventually be available to researchers, but consumer applications are a low priority.

Major Types of Neural Nets

Neural nets, with their layers and redundancy, excel at handling highly parallel tasks. They also help with tasks that require the user to ingest a huge amount of data in order to identify patterns within it — this is often called “drinking from the firehose.” To get the benefits of big data, we have to be able to process it at a useful speed. Neural nets also excel at manipulating data with metadata or many dimensions.

There are many different individual neural net projects, but they all fall within a few different families of function. Each algorithm is built for a different type of problem, and they all engage in subtly different kinds of machine learning. Here, we’ll discuss four major subtypes of neural networks: convolutional, recurrent, generative adversarial, and spiking.

Convolutional Neural Nets: Convolutional neural nets (CNNs) are “feed-forward” systems, which means that the flow of information is restricted to one direction through the network. This type of neural network was mentioned in media reports of AIs / neural nets that can perform operations, but can’t explain how they arrived at their answers. This is because convolutional neural nets simply aren’t built to show their work. They consist of an input layer, one or more hidden layers, and an output layer or node.

CNNs are often used for processing images. Because they’re feed-forward, mathematically, CNNs do great work on data that comes in a grid format. They are robust when applied to two-dimensional arrays of data, such as images and other matrices. Under the hood, they are applying a long mathematical formula that directs the algorithm in how to perform operations on not just two numbers or terms or equations, but a whole body of data, like upscaling an image.
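
The grid operation at the heart of a CNN can be sketched in a few lines of Python. This is a simplified illustration, assuming a single hand-picked 2x2 kernel and no padding or stride; in a real CNN the kernel values are learned during training, and many kernels run in each layer.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (both lists of lists), with no padding.

    Like most deep-learning libraries, this computes cross-correlation:
    the kernel is not flipped before it is applied.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A hand-picked kernel that responds to left-to-right brightness changes.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is nonzero only in the middle column, where the dark-to-bright edge sits. The same small kernel is reused at every position, which is why this approach suits data laid out in a grid.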

Feed-forward also entails a certain amount of dead reckoning. One way this shows itself is in the way a CNN can perform image recognition and then use its newfound understanding to produce distorted, trippy images derived from its training dataset. In 2016, MIT released an AI that could harness this runaway behavior to “spookify” images, producing a torrent of nightmare fuel just in time for Halloween.

Recurrent Neural Nets: In contrast to the feed-forward approach, recurrent neural nets (RNNs) contain feedback loops that relay information from deeper to shallower levels in the neural net, or from one time step to the next. (This feedback is often confused with back-propagation, which is a training method that passes error signals backward through a network and is used on feed-forward nets too.) Because its earlier output becomes part of its later input, this type of network keeps a running memory of what it has already seen, and it can refine its results as it goes.

Recurrent neural nets build these loops by making connections to other neurons in the system, on a scale up to and including having every neuron connected to every other neuron. This redundancy can produce highly accurate results, but there is a ceiling of diminishing returns. It is not unlike super-sampling anti-aliasing (SSAA). As the algorithm makes more and more passes over data it’s already processed, there’s less and less it can do. Going from 2xAA to 4xAA can produce clearly noticeable results, but it’s tough to tell the difference between 8x and 16xAA without a practiced eye.
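
As a rough illustration of the every-neuron-to-every-neuron wiring described above, here is a minimal Python sketch of one recurrent layer. The two-unit size, the weight values, and the tanh activation are assumptions chosen for readability, not a description of any specific RNN.

```python
import math

def rnn_step(x, hidden, w_in, w_rec):
    """One time step: each unit sees the input and every unit's previous state."""
    new_hidden = []
    for j in range(len(hidden)):
        total = w_in[j] * x
        total += sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
        new_hidden.append(math.tanh(total))
    return new_hidden

def run_sequence(xs, w_in, w_rec):
    """Feed a sequence of numbers through the layer, carrying state forward."""
    hidden = [0.0] * len(w_in)
    for x in xs:
        hidden = rnn_step(x, hidden, w_in, w_rec)
    return hidden

# Hypothetical weights: w_rec[j][k] connects unit k's old state to unit j.
w_in = [1.0, 0.5]
w_rec = [[0.5, -0.3], [0.2, 0.4]]

# Both sequences end with the same input (0.0), but the final states differ
# because the layer still "remembers" the earlier 1.0.
print(run_sequence([1.0, 0.0], w_in, w_rec))
print(run_sequence([0.0, 0.0], w_in, w_rec))
```

The second call returns all zeros, while the first does not, even though the last input is identical. That leftover state is the memory that a purely feed-forward net lacks.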

This type of neural network can be trained using gradient descent, a method of analysis that makes a three-dimensional landscape of possibilities. Desired or undesired results can be represented as “terrain” in the landscape, in places that correspond to their statistics. As we’ve said before, gradient descent isn’t the best neural network training method, but it is a powerful tool. Recurrent neural nets can give gradient descent a boost by maintaining some memory of what the changing landscape used to be.
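
The landscape idea can be shown with a deliberately simple example. The sketch below assumes a made-up bowl-shaped loss with two parameters, so the "terrain" is a surface in three dimensions; a real network has far more parameters and a much bumpier landscape. The learning rate and step count are arbitrary choices.

```python
def loss(x, y):
    """Height of the landscape; the lowest point is at (3, -1)."""
    return (x - 3) ** 2 + (y + 1) ** 2

def gradient(x, y):
    """Slope of the landscape in the x and y directions."""
    return 2 * (x - 3), 2 * (y + 1)

def gradient_descent(x, y, learning_rate=0.1, steps=100):
    """Repeatedly take a small step downhill from the starting point."""
    for _ in range(steps):
        dx, dy = gradient(x, y)
        x -= learning_rate * dx
        y -= learning_rate * dy
    return x, y

x, y = gradient_descent(0.0, 0.0)
print(x, y, loss(x, y))  # x ends very close to 3, y very close to -1
```

Each step moves the parameters a little way in whichever direction lowers the loss fastest. Training a neural net applies the same recipe, with the weights as the coordinates.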

Spiking Neural Nets: As we’ve seen above, neuromorphic design comes in both physical and digital formats. Instead of a stream of binary data running constantly through a single CPU, spiking neural nets can be software, hardware or both. They are made of decentralized cores, physical or logical, which fire in a cadence called a “spike train” to convey their signal. These arrays of cores are united by a common network structure, and each neuron is a node within that network.

Attempts to model the way neurons manage their many I/O channels have taught us that there is information encoded in many different channels. There is data related to whether the neuron is firing, and also in the sequence of spikes. It also matters which of its networked colleagues were firing at the same time.

One important tool in neuromorphic computing is the “leaky integrate-and-fire” model. Communication between neurons is modeled by a set of differential equations that describe each neuron’s different I/O. (Differential equations are excellent tools for comparing rates. In some contexts, you can use diffEQs to model an arbitrary n-dimensional set of comparators.)

Each neuron in a spiking net has a weight. The weight represents a rolling average of the neuron’s recent activity. More activations push the value higher. But the weight is “leaky,” in the sense of having a hole in a bucket. As time passes, there’s a decay function that slowly decreases every neuron’s network weight. This accounts for the idea that biologically, not every neuron is active at all times. You can contrast it against the permanently saturated wreath of connections in a recurrent neural net.
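
The leaky bucket can be sketched as a discrete-time simulation. In the usual textbook formulation of leaky integrate-and-fire, the leaking quantity is called the neuron's membrane potential; the leak factor, the threshold, and the reset-to-zero rule below are illustrative assumptions, and this simplified loop stands in for the differential equations a full model would use.

```python
def simulate_lif(input_current, leak=0.9, threshold=1.0):
    """Discrete-time leaky integrate-and-fire neuron.

    Each step the stored potential decays by `leak`, the new input is
    added, and the neuron emits a spike (1) and resets to zero if the
    potential reaches `threshold`.
    """
    potential = 0.0
    spikes = []
    for current in input_current:
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # the bucket empties after firing
        else:
            spikes.append(0)
    return spikes

# Three strong inputs in a row fill the bucket faster than it leaks.
print(simulate_lif([0.5, 0.5, 0.5, 0.0, 0.0]))  # [0, 0, 1, 0, 0]

# A weak, steady trickle leaks away as fast as it arrives: no spikes.
print(sum(simulate_lif([0.05] * 50)))  # 0
```

The output is a spike train: a sequence of ones and zeros in which the timing of each spike, rather than its size, carries the information.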

Spiking nets are not so good at gradient descent, or at optimization problems of that kind. However, they may be uniquely suited to modeling biological functions. As spiking nets become more complex, they may become able to encode more information within a series of spikes. This will enable a much closer computational pass over the diverse functions of the nervous system. We’ve already simulated the nervous systems of C. elegans roundworms and Drosophila fruit flies, and researchers are attempting to simulate a human cortical column in real time.

Another possible direction for spiking neural network research is into additional levels of abstraction. Researchers are working on creating a spiking net wherein each individual neuron is itself made up of a neural network.

Generative Adversarial Neural Networks (GANs): One type of neural net with rising popularity is the generative adversarial neural network, or GAN. GANs are another evolution of artificial intelligence, frequently used to alter or generate images. The “adversarial” part means that these neural nets are built to compete with themselves.

Just as Cerberus had three heads, within a GAN there are often two separate neural nets with their own intentions, one generative and one discriminative. The generative model produces a result, often an image. Then the generative side tries to “fool” the discriminative model, to see how close it can get to a desired output. If the discriminative side isn’t fooled, the result is discarded. The results of this trial, both the success of the generative side and also the content it made, are filed away; sometimes the learning is supervised, and sometimes it is not. But in either case, after each round of judgment, the GAN goes back to the drawing board and tries again. This is how the pair iterate toward success.
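
The back-and-forth between the two sides can be sketched with a toy example in Python. This is a heavily simplified illustration, assuming the common gradient-based formulation rather than literally discarding rejected results: the "images" are single numbers, the generator is just noise plus one learnable offset `b`, and the discriminator is a one-input logistic model with parameters `w` and `c`. All of these names and values are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def discriminate(x, w, c):
    """Discriminator's estimated probability that sample x is real."""
    return sigmoid(w * x + c)

def discriminator_step(w, c, real, fake, lr):
    """Nudge (w, c) to score `real` higher and `fake` lower."""
    d_real = discriminate(real, w, c)
    d_fake = discriminate(fake, w, c)
    # Gradient ascent on log D(real) + log(1 - D(fake)).
    grad_w = (1 - d_real) * real - d_fake * fake
    grad_c = (1 - d_real) - d_fake
    return w + lr * grad_w, c + lr * grad_c

def generator_step(b, w, c, noise, lr):
    """Nudge the generator's offset b so its sample looks more real."""
    fake = noise + b
    # Gradient ascent on log D(fake) with respect to b.
    grad_b = (1 - discriminate(fake, w, c)) * w
    return b + lr * grad_b

def train(steps=2000, lr=0.01, real_mean=4.0, seed=0):
    """Alternate one discriminator update with one generator update."""
    rng = random.Random(seed)
    w, c, b = 0.0, 0.0, 0.0
    for _ in range(steps):
        real = rng.gauss(real_mean, 1.0)      # a sample of "real" data
        fake = rng.gauss(0.0, 1.0) + b        # the generator's attempt
        w, c = discriminator_step(w, c, real, fake, lr)
        b = generator_step(b, w, c, rng.gauss(0.0, 1.0), lr)
    return b

print(train())  # the generator's learned offset after training
```

Each round, the discriminator gets slightly better at telling real from fake, and the generator uses the discriminator's own judgment to adjust its next attempt. GAN training is known to oscillate, so even this toy version is not guaranteed to settle exactly on the real data's average.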

GANs are capable of producing unique, photorealistic images of people who don’t exist in the real world. To do this, they look through many photos of real humans, to gather data on how we differ from one another, and on the ways in which we resemble one another. In effect, this is brute-force phenotyping. Then, once the GAN is sufficiently educated, its generative side can start trying to produce its own original work. One example is Nvidia’s StyleGAN, which can produce images of startling, deceptive realism. There’s even a derivative project that challenges viewers to identify whether a given StyleGAN picture of a person is real or fake.

The person in this image does not exist. This is a deepfake image created by StyleGAN, Nvidia’s generative adversarial neural network.

The results of a GAN’s labor can be so realistic, in fact, that in 2019 the state of California (the home of both 2257 forms and Hollywood) instituted a law banning the use of technology like GANs to create deepfake pornography of a person without their consent. The state also outlawed the distribution of manipulated video of political candidates within two months of an election. DARPA is trying to keep abreast of this A/V arms race by instituting an entire division to study both GANs, and ways to defeat them.

While this all sounds very stressful, there are uses for GANs that don’t involve scraping the internet for public-facing profile photos. One application is particle physics; physicists require exquisite certainty in their measurements before they are willing to say they’ve found a new particle or explained a phenomenon.

Another place GANs excel is game theory. Presented with a list of rules and priorities, GANs can assess the likely choices of participants, and use that probability spread to predict the endgame. This type of neural net is also under study for use improving astronomical images, and predicting gravitational lensing.

Summer 2021 saw the release of CodexAI, a generative neural net capable of improving its own software. The model can translate natural language to code. It can also generate snippets of intelligible code after being fed all of GitHub.

While CodexAI can be considered a fully fledged neural net on its own merits, it also looks like it could easily be part of a much larger, hierarchical system. CodexAI’s behavior resembles the first faltering sparks of a lone neuron as it establishes its first synapses. It also shows the limitations of the technology. Neural networks can learn to correct their assumptions, but the reach of AI still exceeds its grasp. Integrating multiple diverse models is the path of the future.

Where Do Neural Networks Fall Down?

Neural networks are great at fulfilling specific and well-constrained requests, but they can be overeager. The great strength of computers is the speed at which they can perform repetitive operations. These rapid iterations also make it possible to over-train a neural net. When that happens, its dead reckoning goes totally awry. An over-trained AI can produce some remarkably strange images, making it not very useful for predictive purposes like weather forecasting.

Ultimately, though, these sundry weaknesses are minor concerns compared to the problem of growth. To get more powerful a neural net has to get bigger, and therein lies the rub. Neural nets can’t scale infinitely. Their scaling efficiency is actually worse than a regular datacenter, because of the very thing that makes neural nets so capable. The central concept of a layered neural network, its layered depth and redundancy, demands an exponentially increasing amount of power. Thus far we’ve been using brute force to achieve our ends, which works — to a point.

This problem with power scaling is why Intel is using Loihi’s low power consumption as a primary selling point. Eventually, the combined challenges of power use and thermal dissipation will put a hard limit on our ability to just link up more of these chips to make bigger and more sophisticated AIs.

Final Thoughts

The difference between a neural network and an artificial intelligence is largely a matter of opinion. One school of thought considers a neural net to be an artificial intelligence in and of itself. Others consider an artificial intelligence to be made of subordinate neural networks. The only difference is the level of abstraction at which the speaker chooses to make the distinction.

One thing everyone seems to agree on is that neural nets can’t do what they do without data. Big data. As edge computing and data science take off, a whole new realm of information is opened to our analysis. There is a staggering amount of raw data produced every day. It is up to us to find creative and clever ways to use it.
