July 06, 2023
With smart sensors creating data at an ever-increasing rate, it is becoming exponentially more difficult to “consume” and make sense of the data to extract relevant insight. This is providing the impetus behind the rapidly developing field of artificial intelligence (AI). When most of us think about AI we picture video feeds with green boxes drawn around people, suspicious objects or faces to alert monitoring personnel of a situation that needs further attention. This may indeed be where AI got its first foothold but increasingly, it is evolving and being used in everything from language translation to DNA sequencing. In this blog, we will discuss the role that FPGAs have to play in this rapidly evolving field.
In discussing AI we must first draw a distinction between training an AI model and using the trained model for inference on a real-world data stream. Training is many orders of magnitude more compute intensive than inference and requires the compute resources of a data center to achieve results in a reasonable time frame. The near-unlimited compute capability, large power budget and virtualized, time-sliced business models of the data center mean that the economics permit the use of expensive, highly parallel compute resources for the training task. Here we see entire racks of Graphics Processing Units (GPUs) and high-end FPGAs that specialize in massively parallel operations at the expense of power and budget. Given the increasing complexity of AI models and the massively asymmetric use case of training versus inference, it is likely that training will remain firmly in the data center, although incremental training algorithms are being deployed that reduce the asymmetry and place some of the burden of training out at the network edge.
Inference, on the other hand, while still extremely computationally intensive, can be performed outside of a data center and closer to the network edge where the data is being created. Here, power, physical space and budgets are constrained. If AI inference is to be deployed at the edge, at scale, it must be cost effective, run on the available power and operate in the harsh environments found at the network's edge. These constraints are seemingly at odds with each other and severely limit the choices of a designer seeking to deploy a sufficiently performant AI capability at the edge.
AI architectures are evolving rapidly. What seemed impossibly complex and the subject of academic research just a few years ago, today seems mundane. Efforts to provision sufficient compute resources to run state of the art AI models have largely been successful only to find that the AI models have progressed to new levels of complexity and accuracy. The following is a very high-level discussion of three major categories of AI architectures and the challenges they present.
Convolutional neural networks (CNNs) have been the workhorse of AI for many years. In their simplest form they take a frame of input data and step a smaller analysis window across the data in two dimensions, performing progressive mathematical convolutions. Subsequent steps that normalize the result and reduce the resolution of the input data before repeating the convolution process have the effect of reducing the data to a mathematical abstraction. With sufficient convolution parameters, repeated steps and the use of fully connected and hidden neural network layers, the presence of features within the input data frame can be numerically extracted. CNNs lend themselves to data sets such as video, where a two-dimensional array of input data (and subsequent video frames) can be exploited to expose massive parallelism in the algorithms and greatly increase performance. CNNs therefore benefit from processing architectures that can exploit parallelism, such as FPGAs and GPUs, rather than sequential CPU and MCU architectures.
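As a rough sketch of that sliding-window operation (illustrative only, with made-up array sizes rather than any particular network), the core of a single convolution can be written in a few lines of Python:

```python
# Minimal sketch of a 2-D convolution: step a small analysis window
# across an input frame and accumulate element-wise products.
import numpy as np

def convolve2d(frame, kernel):
    kh, kw = kernel.shape
    out_h = frame.shape[0] - kh + 1
    out_w = frame.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            window = frame[y:y + kh, x:x + kw]
            out[y, x] = np.sum(window * kernel)   # one convolution result
    return out

frame = np.random.rand(8, 8)     # e.g. one channel of a small input frame
kernel = np.random.rand(3, 3)    # learned convolution parameters
feature_map = convolve2d(frame, kernel)   # shape (6, 6)
```

Note that every output position is computed independently of the others, which is exactly the parallelism that FPGA and GPU architectures are able to exploit.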
While CNNs perform well on arrays of data that have a limited time dependency between sets, there are applications such as language translation, mathematical biology and financial analysis where insight can be gained from the order of data elements over time. For these applications we need to include some form of "memory" in the AI algorithm so that past data can be used to gain insight from subsequent readings. This is the realm of recurrent neural networks (RNNs). RNNs include a feedback path in their convolution process so that past state can be carried forward to alter the calculation result for the current state. The content of the feedback path can be used to change the characteristics of the model and make it more or less "forgetful" of its past. The sequential nature of the RNN's data structure, along with the need to process each data element in turn with regard to the past, means that it is more difficult to exploit parallelism in RNN models, and sequential architectures such as CPUs and MCUs can be applied.
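A minimal sketch of that feedback path, again with illustrative weight names and sizes rather than any specific library's implementation, looks like this:

```python
# Minimal sketch of a simple recurrent cell: the hidden state h is the
# "memory" carried forward from one time step to the next.
import numpy as np

hidden_size, input_size = 16, 8
W_x = np.random.rand(hidden_size, input_size) * 0.1   # input weights
W_h = np.random.rand(hidden_size, hidden_size) * 0.1  # feedback (recurrent) weights
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # Combine the current input with the carried-forward state.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(hidden_size)                              # initial, empty "memory"
sequence = [np.random.rand(input_size) for _ in range(5)]
for x_t in sequence:            # each element must be processed in turn,
    h = rnn_step(x_t, h)        # which limits the parallelism available
```

Because each step depends on the result of the previous one, the loop cannot simply be unrolled across parallel hardware the way a convolution can.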
The memory capability of RNNs makes them good at processing sequential data. The feedback mechanism, though, makes for a "leaky" memory where the further back in time an event occurred, the less likely it is to have an impact on the current calculation. This is particularly problematic for tasks such as natural language processing or translation. It is the nature of translation, for example, that not every word in one language can be translated, in sequence, into an equivalent word in another. In most cases, the context of an entire sentence must be known before an equivalent can be successfully derived. This is the realm of transformers.
Transformers were introduced to the world in a 2017 Google paper, "Attention Is All You Need," and have all but displaced the other techniques since. Transformer algorithms have a concept of self-attention. In language applications, for example, each word of an input sentence is analyzed in parallel. Its position within the sentence is recorded, as well as the potential importance of the word to the context of the sentence. Words that are critical to the context of the sentence are given "attention". This mirrors the way a human brain might consider a sentence to be translated, and it results in a translation that correctly carries the context and meaning of the original rather than a word-for-word substitution. Clearly transformers contain massive parallelism. They are more complex and have higher compute requirements than their CNN predecessors and benefit from the massive compute and parallel architectures of FPGAs and GPUs.
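For the curious, the heart of that self-attention mechanism can be sketched in a few lines of Python. The dimensions and weight matrices below are illustrative placeholders, not a production transformer:

```python
# Minimal sketch of scaled dot-product self-attention.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

seq_len, d_model = 6, 32               # e.g. a six-word sentence
X = np.random.rand(seq_len, d_model)   # one embedding vector per word
W_q = np.random.rand(d_model, d_model) * 0.1
W_k = np.random.rand(d_model, d_model) * 0.1
W_v = np.random.rand(d_model, d_model) * 0.1

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_model)    # how much "attention" each word pays to every other word
attention = softmax(scores)            # each row sums to 1
output = attention @ V                 # every word's output is computed independently
```

Unlike the recurrent loop above, every word attends to every other word in one set of matrix multiplications, which is why transformers map so naturally onto parallel hardware.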
Transformers have revolutionized language processing but find use across a myriad of applications where the context of data in a time-sensitive stream is critical. Their ability to extract context and propose additional content with similar intent has led to generative transformers that specialize in creating "original" content. Generative Pre-trained Transformers (GPTs) are being made commercially available in a wide variety of applications. Some specialize in human-style interactions, most notably, recently, Chat GPT, which has an almost uncanny ability to propose original, human-like content in response to a spoken or written input.
As previously mentioned, solutions for the edge must have low power consumption, be cost effective and be able to operate in harsh environments. Faced with an increasing need for parallel compute at near-hardware speeds, the options for designers are becoming increasingly sparse.
[Table: relative comparison of an Efinix FPGA, a traditional FPGA, custom silicon, a microcontroller, a processor and a GPU across compute, power, flexibility, time to market, development cost, part price and size.]
GPUs and traditional FPGAs offer a parallel compute capability but consume too much power and are too expensive. Microcontrollers and microprocessors are sequential in nature and lack the performance for full AI models. Custom silicon is prohibitively expensive to develop and comes with design times longer than entire generations of AI architectures. The only viable option for a designer is an Efinix FPGA.
It is worth an aside here to mention that the open source community has put a lot of work into developing tools that can quantize AI models and reduce their complexity to a point where running on a microcontroller can, in some cases, deliver the desired performance. Tools such as TensorFlow Lite can create models that, with the aid of runtime libraries, can execute on microcontroller architectures. Efinix has embraced this effort and has developed a series of custom instruction libraries that run on the Sapphire embedded RISC-V SoC inside Efinix FPGAs. In this way, quantized models can be made to run many hundreds of times faster, delivering a high-performance AI capability in a small, low-power footprint. This is the basis of the Efinix TinyML platform and, in the spirit of the open source community, it is available free of charge on the Efinix GitHub.
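As a hedged illustration of what that quantization step looks like in practice, TensorFlow Lite's post-training quantization can be invoked in just a few lines ("saved_model_dir" is a placeholder path; the specific Efinix TinyML flow and custom-instruction libraries are documented on the Efinix GitHub):

```python
# Sketch of post-training quantization with TensorFlow Lite.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enable default quantization
tflite_model = converter.convert()                     # returns the serialized model

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting compact model is the kind of artifact that can then be accelerated by the custom instructions running on the Sapphire RISC-V core.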
As AI models increasingly migrate from traditional CNN architectures to transformers, the compute requirements and need for parallelism will push them out of the reach of microcontrollers and will leave the ultra-efficient Efinix architecture as the only viable option for state-of-the-art AI at the edge.
The utility, accuracy and complexity of AI models will continue to increase at an exponential rate. As they do, the efficiency of the revolutionary Efinix FPGA architecture will come into its own. The parallelism and compute capability of the FPGA will continue to deliver the performance needed, with ultra-low power consumption, to provide the platform for this exciting field of innovation.
In conclusion, it seemed only fitting for AI to have the last word. The author asked Chat GPT the following question…
Question:
“Why are FPGAs Important to Generative AI and Transformers”
Since the response is itself a product of generative AI, the author cannot vouch for its origin or how original it is, so it is included here in its entirety, without edit and without any claim to be the original work of the author. That said, it did seem like a fitting summary for the blog…
Chat GPT Response:
Field-Programmable Gate Arrays (FPGAs) play a significant role in Generative AI and Transformers due to their unique characteristics that make them well-suited for these tasks. Here are a few reasons why FPGAs are important in these domains:
Overall, FPGAs bring unique advantages to the fields of generative AI and Transformers, including parallel processing, customization, power efficiency, low latency, and scalability. These characteristics make FPGAs an important technology for accelerating and optimizing the performance of these AI tasks.