Nvidia’s Jetson AGX Orin Packs an AI Punch in a Small Package

If there is any company that’s on a roll when it comes to providing more compute power in smaller packages, it’s Nvidia. Its Jetson product line, which provides AI and other forms of accelerated computing, is a great example. I’ve been able to spend some time with the company’s latest embeddable “robot brain” offering, the Nvidia Jetson AGX Orin (starts at $399 when available for production applications later this year). It has enough GPU power for some of the most demanding robot applications, while still fitting in the same form factor as Xavier, its predecessor. It consumes from 15 to 60 watts, depending on the power profile used.

What we’re reviewing here is a developer kit ($1,999) that comes complete with an enclosure and some accessories. It is available now (in theory, but back orders have been piling up) so that developers can get a head start, but volume quantities of the Jetson Orin modules suited for commercial deployment aren’t expected until later this year.

Nvidia Jetson Orin by the Numbers

The Orin System-on-Chip (SoC) is based on the Nvidia Ampere GPU architecture and has up to 2,048 CUDA cores, 64 Tensor Cores, and 2 Deep Learning Accelerator (DLA) engines. It can deliver an astonishing 275 TOPS of raw AI performance for models that have been optimized for 8-bit integer math.
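To make "optimized for 8-bit integer math" a little more concrete, here is a minimal sketch of symmetric INT8 quantization in plain Python. It is an illustration only: the function names and sample weights are made up for this example, and production toolchains choose their scales through calibration on real data rather than from a single maximum.

```python
# Illustrative only: symmetric per-tensor INT8 quantization in plain Python.
# The helper names and the sample weights below are hypothetical.

def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the 8-bit integers."""
    return [q * scale for q in quantized]


weights = [0.02, -0.51, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round trip loses at most half a quantization step per value.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The trade-off is a small rounding error per value in exchange for arithmetic that integer hardware can run much faster than floating point.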

For current Jetson customers, Orin features the same pin-out and footprint as the Jetson AGX Xavier.

In raw inferencing performance, Jetson Orin can be as much as 8x faster than Jetson AGX Xavier.

This is not your Father’s (or Mother’s) Jetson Dev Kit

The first time I reviewed a Jetson Dev Kit, it arrived as a board, a daughterboard, and some screws. I think I had to buy my own small fan and an appropriate power supply, and I 3D printed a cheesy enclosure. The Orin Dev Kit is an ode to design: a machined metal enclosure with internal fans and a magnetically attached cover for a PCIe slot. It looks cool, and can draw power either from a barrel connector or a USB-C port.

There are several versions of developer kits available to order. The review unit we have includes both Wi-Fi and a 1Gb/s Ethernet port, as well as four USB 3.2 and two USB-C ports. There is a DisplayPort 1.4a output for video as well.

The Jetson Orin module in the dev kits provided to reviewers features 32GB of 256-bit LPDDR5 RAM and an embedded 64GB boot drive. Commercial units will be available with several different options. In addition to a microSD slot, there is also an M.2 slot, allowing for high-speed additional storage.

Nvidia’s Jetson AGX Orin Dev Kit and system board (Image Credit: Nvidia)

Nvidia’s JetPack 5.0 SDK

For starters, JetPack 5.0 updates Ubuntu to 20.04 LTS and the kernel to 5.10, both welcome changes. CUDA and TensorRT have also been updated, to versions 11 and 8 respectively. UEFI is now used for the bootloader, and Over-The-Air (OTA) updates will be possible for deployed Jetson devices.

One of the features I really like about JetPack 5.0 is the easy integration with Nvidia’s DeepStream imaging and vision toolbox. Once you have a model, for example, you can simply point DeepStream at it, give it one or more data sources, and let it run. The process is simpler than when I’ve needed to couple a model to cameras using previous versions of JetPack.
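As a rough idea of what "pointing DeepStream at a model" can look like, the sketch below assembles a gst-launch-1.0 command line for a single-stream pipeline. It is not the setup used in this review. The element names (nvv4l2decoder, nvstreammux, nvinfer, nvvideoconvert, nvdsosd) are DeepStream GStreamer plugins, but the helper function, the file names, and the choice of fakesink as the output are assumptions made for illustration, and the input is assumed to be a raw H.264 elementary stream.

```python
import shlex


def build_deepstream_pipeline(video_path, infer_config_path,
                              width=1280, height=720, sink="fakesink"):
    """Return a gst-launch-1.0 command line for a one-stream inference pipeline.

    Hypothetical helper: it only builds a string and does not run anything.
    """
    elements = [
        f"filesrc location={shlex.quote(video_path)}",
        "h264parse",
        "nvv4l2decoder",  # hardware video decode
        # Batch frames from the source(s) into a single stream for inference.
        f"m.sink_0 nvstreammux name=m batch-size=1 width={width} height={height}",
        # The model itself is described in a separate nvinfer config file.
        f"nvinfer config-file-path={shlex.quote(infer_config_path)}",
        "nvvideoconvert",
        "nvdsosd",  # draws bounding boxes and labels on the frames
        sink,       # assumption: swap in a display sink to view the output
    ]
    return "gst-launch-1.0 " + " ! ".join(elements)


# Both file names are placeholders.
command = build_deepstream_pipeline("sample_720p.h264", "config_infer_primary.txt")
print(command)
```

The model and the data source are just two parameters here; everything else in the pipeline is plumbing that stays the same.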

Nvidia provides plenty of sample code, but there are some tasks that seem like they could be automated instead of requiring housekeeping code like this.

Nvidia’s Upgraded TAO and Why it Matters — A Lot

As neural networks have become more sophisticated, and have been trained on larger and larger datasets to achieve unprecedented accuracy, they require unprecedented amounts of compute power and training time. That makes competitive, trained-from-scratch networks a highly sought-after asset created mostly by large corporations with enough time and money, and takes training out of the hands of most. Fortunately, it turns out that networks trained on a fairly general dataset (like faces, images, GitHub code, or Reddit text) acquire a sort of general knowledge that can be re-purposed.

Specifically, the features they have learned to extract and score can be very useful in other domains. For example, the features extracted from color images can also be valuable in evaluating IR images. Personally, I used network tuning to help Joel (ExtremeTech’s Managing Editor) with an AI-based upscaler for DS9 (this was a fascinating experiment – Ed), and to create an ET article generator based on GPT-2. More recently, I used an Nvidia-trained face detector on a Jetson that I adapted using several masked-face datasets from the web to teach it how to identify people with and without masks.
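The core idea of reusing learned features can be sketched in a few lines of plain Python. This is a toy, not how TAO or any Nvidia tool works internally: a fixed function stands in for the pre-trained feature extractor, and only a small new classification head is trained on a synthetic dataset. All names and data here are invented for the example.

```python
import math
import random


def frozen_features(x):
    """Stand-in for a pre-trained backbone: fixed, never updated in training."""
    a, b = x
    return [a, b, a * b, 1.0]  # last entry acts as a bias term


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=20, lr=0.5):
    """Train only the new classification head on top of the frozen features."""
    weights = [0.0] * 4
    history = []
    for _ in range(epochs):  # one epoch = one pass over the new dataset
        total_loss = 0.0
        for x, y in zip(samples, labels):
            feats = frozen_features(x)
            p = sigmoid(sum(w * f for w, f in zip(weights, feats)))
            p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() well defined
            total_loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            for i, f in enumerate(feats):  # gradient step on the head only
                weights[i] -= lr * (p - y) * f
        history.append(total_loss / len(samples))
    return weights, history


# Synthetic "new domain" data: label is 1 when the two inputs sum to a positive value.
random.seed(0)
samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
labels = [1 if a + b > 0 else 0 for a, b in samples]

weights, history = train_head(samples, labels)
accuracy = sum(
    (sigmoid(sum(w * f for w, f in zip(weights, frozen_features(x)))) > 0.5) == bool(y)
    for x, y in zip(samples, labels)
) / len(samples)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}, accuracy: {accuracy:.2f}")
```

Because the expensive part (the feature extractor) is left untouched, the new task needs far less data and compute than training from scratch would.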

Realizing the crucial importance of this approach to training and fielding robots and other “Edge”-based AI solutions, Nvidia has really upped its game here. My first attempts at using their cross-training (TLT) package were fairly painful and limited. Now, TAO (Train, Adapt, Optimize) has been packaged into an easy-to-use system. It still requires either writing some code or adapting one of Nvidia’s examples, but the actual logic doesn’t need to be too complex. Just as importantly, the “Adapt” and “Optimize” pieces are now much more automated.

As part of the review kit for Orin, Nvidia sent us an example application where we could deploy a pre-trained version of PeopleNet, or adapt our own with additional data. As expected, the pre-trained network achieved excellent performance at detecting people, their faces, and bags. What was impressive was the ability to throw an additional dataset of people with and without helmets at it and have it tune itself to learn how to distinguish between them.

Adapting PeopleNet to recognize construction helmets was a fairly simple process with reasonably good results. One epoch is one run through the additional helmet dataset.

I didn’t have time to do it for this review, but I’m planning to do a larger project using TAO to cross-train an existing network on some type of novel automotive camera design. That’s an important use case, as developers of new camera systems by definition have limited datasets actually captured with their cameras. That makes it hard to train a model from scratch with just their own data. Adapting pre-trained models has become a necessity.

NGC and Docker Images are Key to Jetson Development

The last time I reviewed an Nvidia embedded processor it was all about Docker images. That seemed like a very powerful innovation for an embedded device. With Orin, while there are still some Docker images in the mix, most of the SDK and the models have a more direct means for downloading and running them.

Fortunately, Nvidia’s own NGC has an increasing number of models that are free to use on Nvidia GPUs, and easy to download. TAO already “knows” how to work with them, once you feed it your data in a format that the underlying ML engine understands.

The PeopleNet demo uses TensorFlow running on Azure for training, although of course it could also be run locally if you have enough GPU horsepower. The adapted model weights are then downloaded to the Jetson GPU and run locally. The high-level examples I worked through are written in Python and stored in Jupyter notebooks, but the JetPack dev kit also comes with plenty of C++ examples showing how to use the various individual Nvidia libraries.

Overall Impressions

Unless you’re Tesla, or a company with similar resources to develop your own AI stack, it’s hard to argue with the choice of Nvidia’s Jetson platform for robot and similar applications. Its hardware offerings have progressed rapidly, while maintaining good software compatibility. And no company has a bigger developer ecosystem than Nvidia. The GPU is definitely the star of the Jetson show, so if your application is heavily CPU-dependent, that could be an issue.
