The components I’m using are:
- Jetson Nano 2GB
- Raspberry Pi USB-C power supply
- Logitech wireless USB keyboard
- 64GB SD card
- PiHut USB switch (so you can switch the Nano’s power on and off without having to use the plug socket)
The setup instructions from NVIDIA are pretty good:
https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit
Note that the Nano comes in two versions – 4GB and 2GB – and the first download link for the operating system is the 4GB version, so if you are using the 2GB version like me, make sure you click the second download link. The download is a 6GB zip which took around an hour. After that it needs to be unzipped, which produces a 14GB disc image file to be written to your SD card. I’m using an Apple Mac, so as per the NVIDIA instructions I used Etcher to copy the image to the SD card, which took around 10 minutes to write and another 3 minutes to validate. The operating system is an Ubuntu variant called Linux4Tegra, which runs the Lightweight Desktop Environment (http://www.lxde.org/).
Note that the Jetson Nano does not include a wireless network card. In my case, I have a wireless range extender plugged in, and I’m using a wired connection to that. Alternatively, the Nano supports Wi-Fi via a USB adapter. For example:
https://learn.sparkfun.com/tutorials/adding-wifi-to-the-nvidia-jetson/all
After getting the Nano up and running, I installed Python 3 and TensorFlow by following these instructions:
https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html
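For reference, the install boils down to adding a few system packages and then pointing pip at NVIDIA’s Jetson package index. The exact commands change with each JetPack/L4T release, so treat the following as a rough sketch (the v44 in the URL corresponds to JetPack 4.4 – check the linked page for the commands matching your image):

sudo apt-get update
sudo apt-get install -y python3-pip libhdf5-serial-dev hdf5-tools
sudo pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 tensorflow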
I added an alias to .bash_aliases so that “python” is aliased to “python3”. (If you add an alias like this, it is recommended you add it for your user, not as a system-wide alias, as a system-wide change could break operating system functionality that still requires Python 2.)
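The alias itself is a single line in ~/.bash_aliases (assuming the default bash shell):

alias python=python3

Open a new terminal, or run “source ~/.bash_aliases”, for it to take effect.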
Then I ran through the TensorFlow quickstart tutorial, which builds an image classification neural network using the MNIST dataset of handwritten digits from 0-9:
https://www.tensorflow.org/tutorials/quickstart/beginner
Although the code is all included in the above link, I want to inline it here to show just how short it is:
#!/usr/bin/python3

import tensorflow as tf

# load standard sample data from the MNIST dataset
# the dataset is images of the digits 0-9
mnist = tf.keras.datasets.mnist

# load both training and test data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# the images have colour values from 0-255. these need to be scaled to be from 0-1
x_train, x_test = x_train / 255.0, x_test / 255.0

# now define our model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

predictions = model(x_train[:1]).numpy()
tf.nn.softmax(predictions).numpy()

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()

model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
model(x_test[:5])
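Note that the last line, model(x_test[:5]), returns raw logits rather than probabilities. As a small follow-up (not part of the tutorial listing above), you can wrap the output in softmax and take the argmax to get the predicted digit for each image – a handy sanity check on the trained model:

import numpy as np

# run the trained model on the first five test images
logits = model(x_test[:5])

# convert the raw logits into probabilities, one row per image
probabilities = tf.nn.softmax(logits).numpy()

# the predicted digit is the class with the highest probability
predicted_digits = np.argmax(probabilities, axis=1)

print(predicted_digits)  # predicted labels for the first five test images
print(y_test[:5])        # the true labels, for comparison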
I think it is amazing that in around 20 lines of Python we can load a dataset, define a neural network, train it and test it! Of course, the Python is orchestrating the process, not doing the heavy lifting – the hard work is done by compiled CUDA code running on the GPU. Running this on the Nano took a couple of minutes. The output shows the work being sent to the GPU, as we expect. Here is an edited excerpt from the output:
2020-12-30 11:46:56.368109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1884] Adding visible gpu devices: 0
2020-12-30 11:48:03.136928: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1428] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
Epoch 1/5
1875/1875 [==============================] - 17s 9ms/step - loss: 0.2918 - accuracy: 0.9158
Epoch 2/5
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1385 - accuracy: 0.9584
Epoch 3/5
1875/1875 [==============================] - 16s 9ms/step - loss: 0.1049 - accuracy: 0.9678
Epoch 4/5
1875/1875 [==============================] - 17s 9ms/step - loss: 0.0864 - accuracy: 0.9733
Epoch 5/5
1875/1875 [==============================] - 16s 9ms/step - loss: 0.0738 - accuracy: 0.9774
2020-12-30 11:51:36.564930: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 31360000 exceeds 10% of free system memory.
313/313 - 2s - loss: 0.0733 - accuracy: 0.9774
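Incidentally, a quick way to confirm that TensorFlow can see the Nano’s GPU before kicking off a training run is to list the physical devices from a Python prompt:

import tensorflow as tf

# list the GPUs TensorFlow has registered – on the Nano this should report
# one entry for the integrated Tegra GPU (an empty list means TensorFlow
# will silently fall back to the CPU)
print(tf.config.list_physical_devices('GPU'))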