
How to easily Detect Objects with Deep Learning on Raspberry Pi

The real world poses challenges like limited data and tiny hardware like mobile phones and Raspberry Pis that can’t run complex deep learning models. This post demonstrates how you can do object detection on a Raspberry Pi: cars on a road, oranges in a fridge, signatures in a document, and Teslas in space.

If you’re impatient, scroll to the bottom of the post for the GitHub repos.

Why Object Detection? Why Raspberry Pi?

The Raspberry Pi is a neat piece of hardware that has captured the hearts of a generation, with over 15M devices sold and hackers building even cooler projects on it. Given the popularity of deep learning and the Raspberry Pi camera, we thought it would be nice if we could detect any object using deep learning on the Pi. Now you will be able to detect a photobomber in your selfie, someone entering Harambe’s cage, where someone kept the Sriracha, or an Amazon delivery guy entering your house.

What is Object Detection?

20M years of evolution have made human vision fairly advanced. The human brain has 30% of its neurons working on processing vision (compared with 8 percent for touch and just 3 percent for hearing). Humans have two major advantages over machines: one is stereoscopic vision; the second is an almost infinite supply of training data (an infant of 5 years has sampled approximately 2.7B images at 30 fps).

Use of deep learning for image classification, localization, detection and segmentation

To mimic human-level performance, scientists broke down the visual perception task into four different categories:

Classification assigns a label to an entire image.

Localization assigns a bounding box to a particular label.

Object detection draws multiple bounding boxes in an image.

Image segmentation creates precise segments showing where objects lie in an image.

Object detection has been good enough for a variety of applications. Even though image segmentation produces a much more precise result, it suffers from the complexity of creating training data: it typically takes a human annotator roughly 12x more time to segment an image than to draw bounding boxes (though this figure is anecdotal and lacks a source). Also, after detecting objects, it is separately possible to segment the object from the bounding box.

Using Object Detection

Object detection is of significant practical importance and has been used across a variety of industries. Some examples are mentioned below:

How do I use Object Detection to solve my own problem?

Object Detection can be used to answer a variety of questions. These are the broad categories:

Is an object present in my image or not? E.g. is there an intruder in my house?

Where is an object in the image? E.g. when a car is trying to navigate its way through the world, it’s important to know where an object is.

How many objects are there in an image? Object detection is one of the most efficient ways of counting objects. E.g. how many boxes are in a rack inside a warehouse?

What are the different types of objects in the image? E.g. which animal is in which part of the zoo?

What is the size of an object? Especially with a static camera, it is easy to figure out the size of an object. E.g. what is the size of the mango?

How are different objects interacting with each other? E.g. how does the formation on a football field affect the result?

Where is an object with respect to time (tracking an object)? E.g. tracking a moving object like a train and calculating its speed.

Object Detection in under 20 Lines of Code

YOLO Algorithm Visualized

There are a variety of models/architectures used for object detection, each with trade-offs between speed, size, and accuracy. We picked one of the most popular ones, YOLO (You Only Look Once), and have shown how it works below in under 20 lines of code (if you ignore the comments).

Note: This is pseudocode, not intended to be a working example. It treats the CNN as a black box; the network itself is fairly standard and is shown in the image below.
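The listing itself did not survive in this copy of the post, so here is a rough Python sketch of YOLO-style inference under the same black-box assumption. The grid size, box count, class count, and threshold below are illustrative, and `run_cnn` is a stand-in for the network in the figure below:

```python
import numpy as np

S, B, C = 7, 2, 20          # grid cells, boxes per cell, classes (illustrative)
CONF_THRESHOLD = 0.3        # illustrative confidence cutoff

def run_cnn(image):
    # Black box: the convolutional network maps an image to an
    # S x S x (B*5 + C) prediction tensor. Random values stand in here.
    return np.random.rand(S, S, B * 5 + C)

def detect(image):
    preds = run_cnn(image)
    detections = []
    for row in range(S):
        for col in range(S):
            cell = preds[row, col]
            class_probs = cell[B * 5:]          # shared class scores for this cell
            for b in range(B):
                # Each box encodes x, y, w, h (relative to the cell) + objectness
                x, y, w, h, objectness = cell[b * 5 : b * 5 + 5]
                scores = objectness * class_probs
                best = int(np.argmax(scores))
                if scores[best] > CONF_THRESHOLD:
                    detections.append(((row, col), (x, y, w, h), best, float(scores[best])))
    # In practice, non-max suppression would be applied to `detections` here
    return detections
```

Real YOLO additionally applies non-max suppression and decodes box coordinates to image space; this sketch only shows the grid-plus-threshold core of the idea.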

Architecture of the Convolutional Neural Network used in YOLO


Phase 1 — Gather Training Data

Step 1. Collect Images (at least 100 per object):

For this task, you probably need a few hundred images per object. Try to capture data as close as possible to the data you will finally make predictions on.

Step 2. Annotate (draw boxes on those Images manually):

Draw bounding boxes on the images. You can use a tool like labelImg. You will typically need a few people to annotate your images; this is a fairly intensive and time-consuming task.
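For reference, labelImg saves annotations in Pascal VOC XML format; a typical file looks roughly like this (the file name and coordinates are made up for illustration):

```xml
<annotation>
  <filename>orange_001.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>orange</name>
    <bndbox>
      <xmin>120</xmin>
      <ymin>85</ymin>
      <xmax>260</xmax>
      <ymax>210</ymax>
    </bndbox>
  </object>
</annotation>
```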

Phase 2 — Training a Model on a GPU Machine

Step 3. Finding a Pretrained Model for Transfer Learning:

You need a pretrained model so you can reduce the amount of data required for training; without one, you might need a few hundred thousand images to train the model. A number of pretrained object detection models are available online.

Step 4. Training on a GPU (cloud service like AWS/GCP etc or your own GPU Machine):

Docker Image

The process of training a model is unnecessarily difficult. To simplify it, we created a Docker image that makes training easy. To start training the model, you can run:
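The actual command is missing from this copy of the post; an invocation would look something along these lines, where the image name, script, and flags are placeholders rather than the real ones:

```shell
# Hypothetical example — substitute the real image name, paths, and flags
sudo nvidia-docker run -p 8000:8000 \
    -v $(pwd)/data:/data \
    <training-image-name> \
    ./train.sh --data-dir /data
```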

The Docker image has a script that can be called with the following parameters:

To train a model, you need to select the right hyperparameters.

Finding the right parameters

The art of “Deep Learning” involves a little trial and error to figure out which parameters get the highest accuracy for your model. There is some level of black magic associated with this, along with a little bit of theory.

Quantize Model (make it smaller to fit on a small device like the Raspberry Pi or Mobile)

Small devices like mobile phones and the Raspberry Pi have very little memory and computation power. Training neural networks is done by applying many tiny nudges to the weights, and these small increments typically need floating-point precision to work (though there are research efforts to use quantized representations here too). Taking a pre-trained model and running inference is very different: one of the magical qualities of deep neural networks is that they tend to cope very well with high levels of noise in their inputs.

Why Quantize?

Neural network models can take up a lot of space on disk; the original AlexNet, for example, is over 200 MB in float format. Almost all of that size is taken up by the weights for the neural connections, since there are often many millions of these in a single model. The nodes and weights of a neural network are originally stored as 32-bit floating-point numbers. The simplest motivation for quantization is to shrink file sizes by storing the min and max for each layer, and then compressing each float value to an eight-bit integer. This reduces file size by 75%.
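The min/max scheme described above can be sketched in a few lines of NumPy. This illustrates the idea rather than TensorFlow's actual implementation:

```python
import numpy as np

def quantize_layer(weights):
    """Map float32 weights to uint8 using the layer's min and max."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0 or 1.0   # avoid divide-by-zero for constant layers
    q = np.round((weights - w_min) / scale).astype(np.uint8)
    return q, w_min, scale

def dequantize_layer(q, w_min, scale):
    """Recover approximate float weights from the 8-bit representation."""
    return q.astype(np.float32) * scale + w_min

weights = np.random.randn(1000).astype(np.float32)
q, w_min, scale = quantize_layer(weights)

# 32-bit floats -> 8-bit ints: storage shrinks by 75%
print(weights.nbytes, "->", q.nbytes)        # 4000 -> 1000

# Reconstruction error is bounded by half a quantization step
error = np.abs(dequantize_layer(q, w_min, scale) - weights).max()
assert error <= scale / 2 + 1e-5
```

Real toolchains also quantize activations and fold batch norms, but the storage win comes from exactly this 32-bit-to-8-bit mapping.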

Code for Quantization:

curl -L «» | tar -C tensorflow/examples/label_image/data -xz

bazel build tensorflow/tools/graph_transforms:transform_graph

bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=tensorflow/examples/label_image/data/inception_v3_2016_08_28_frozen.pb \
  --out_graph=/tmp/quantized_graph.pb \
  --inputs=input \
  --outputs=InceptionV3/Predictions/Reshape_1 \
  --transforms='add_default_attributes strip_unused_nodes(type=float, shape="1,299,299,3")
    remove_nodes(op=Identity, op=CheckNumerics) fold_constants(ignore_errors=true)
    fold_batch_norms fold_old_batch_norms quantize_weights quantize_nodes
    strip_unused_nodes sort_by_execution_order'

Note: Our Docker image has quantization built into it.

Phase 3: Predictions on New Images using the Raspberry Pi

Step 5. Capture a new Image via the camera

You need the Raspberry Pi camera live and working. Then capture a new Image

For instructions on how to install the camera, check out this link.

Code to Capture a new Image

Step 6. Predicting a new Image

Download Model

Once you’re done training the model, you can download it onto your Pi. To export the model, run:

Then download the model onto the Raspberry Pi.

Install TensorFlow on the Raspberry Pi

Depending on your device you might need to change the installation a little

Run model for predicting on the new Image

Performance Benchmarks on Raspberry Pi

The Raspberry Pi has constraints on both memory and compute (a version of TensorFlow compatible with the Raspberry Pi GPU is still not available). Therefore, it is important to benchmark how much time each model takes to make a prediction on a new image.
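A minimal way to produce such benchmarks is to average the wall-clock time of several forward passes; `predict` below is a stand-in for whatever inference call your model exposes:

```python
import time

def benchmark(predict, image, warmup=1, runs=5):
    """Average wall-clock time of predict(image) over several runs."""
    for _ in range(warmup):          # first run often pays one-time setup costs
        predict(image)
    start = time.perf_counter()
    for _ in range(runs):
        predict(image)
    return (time.perf_counter() - start) / runs

# Usage with a stand-in "model" (replace with your real inference call):
avg = benchmark(lambda img: sum(img), list(range(10000)))
print(f"average prediction time: {avg * 1000:.2f} ms")
```

The warmup pass matters on the Pi, since the first prediction typically includes graph loading and cache warming and would otherwise skew the average.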

Benchmarks for different Object Detection Models running on Raspberry Pi

Workflow with NanoNets:

We at NanoNets have a goal of making working with Deep Learning super easy. Object Detection is a major focus area for us and we have made a workflow that solves a lot of the challenges of implementing Deep Learning models.

How NanoNets Makes the Process Easier:

1. No Annotation Required

We have removed the need to annotate images; we have expert annotators who will annotate your images for you.

2. Automatic Best Model and Hyperparameter Selection

We automatically train the best model for you. To achieve this, we run a battery of models with different parameters and select the best one for your data.

3. No Need for Expensive Hardware and GPUs

NanoNets runs entirely in the cloud without using any of your hardware, which makes it much easier to use.

4. Great for Devices like the Raspberry Pi and Mobile Phones

Since devices like the Raspberry Pi and mobile phones were not built to run complex, compute-heavy tasks, you can outsource the workload to our cloud, which does all of the compute for you. Here is a simple snippet to make a prediction on an image using the NanoNets API.

Code to make a prediction on a new Image using NanoNets

Build your Own NanoNet

You can try building your own model as follows:

Step 1: Clone the Repo

Step 2: Get your free API Key

Get your free API key from the NanoNets website.

Step 3: Set the API key as an Environment Variable

Step 4: Create a New Model

python ./code/

Note: This generates a MODEL_ID that you need for the next step

Step 5: Add Model Id as Environment Variable

Step 6: Upload the Training Data

Collect the images of the objects you want to detect.

You can annotate them either using our web UI or using an open-source tool like labelImg.

Once you have the dataset ready in two folders, images (the image files) and annotations (annotations for the image files), start uploading the dataset.

python ./code/

Step 7: Train Model

Once the images have been uploaded, begin training the model.

Step 8: Get Model State

The model takes ~2 hours to train. You will get an email once the model is trained. In the meanwhile, you can check the state of the model:

watch -n 100 python ./code/

Step 9: Make Prediction

Once the model is trained, you can make predictions using the model:

python ./code/ PATH_TO_YOUR_IMAGE.jpg

Code (GitHub Repos)

GitHub repos to train a model:

GitHub repos for the Raspberry Pi to make predictions (i.e. detecting new objects):

Datasets with Annotations:


Deep learning on the Raspberry Pi with OpenCV

by Adrian Rosebrock on October 2, 2017

I’ve received a number of emails from PyImageSearch readers who are interested in performing deep learning on their Raspberry Pi. Most of the questions go something like this:

Hey Adrian, thanks for all the tutorials on deep learning. You’ve really made deep learning accessible and easy to understand. I have a question: Can I do deep learning on the Raspberry Pi? What are the steps?

And almost always, I have the same response:

The question really depends on what you mean by “do”. You should never be training a neural network on the Raspberry Pi — it’s far too underpowered. You’re much better off training the network on your laptop, desktop, or even GPU (if you have one available).

That said, you can deploy efficient, shallow neural networks to the Raspberry Pi and use them to classify input images.

Again, I cannot stress this point enough:

You should not be training neural networks on the Raspberry Pi (unless you’re using the Pi to do the “Hello, World” equivalent of neural networks — but again, I would still argue that your laptop/desktop is a better fit).

With the Raspberry Pi there just isn’t enough RAM.

The processor is too slow.

And in general it’s not the right hardware for heavy computational processes.

Instead, you should first train your network on your laptop, desktop, or deep learning environment.

Once the network is trained, you can then deploy the neural network to your Raspberry Pi.

In the remainder of this blog post I’ll demonstrate how we can use the Raspberry Pi and pre-trained deep learning neural networks to classify input images.


Deep learning on the Raspberry Pi with OpenCV

When using the Raspberry Pi for deep learning we have two major pitfalls working against us:

  1. Restricted memory (only 1GB on the Raspberry Pi 3).
  2. Limited processor speed.

This makes it near impossible to use larger, deeper neural networks.

Instead, we need to use more computationally efficient networks with a smaller memory/processing footprint such as MobileNet and SqueezeNet. These networks are more appropriate for the Raspberry Pi; however, you need to set your expectations accordingly — you should not expect blazing fast speed.

In this tutorial we’ll specifically be using SqueezeNet.

What is SqueezeNet?

One of the smaller Convolutional Neural Networks used for image classification is GoogLeNet, at 25-50MB (depending on which version of the architecture is implemented).

The real question is: Can we go smaller?

As the work of Iandola et al. demonstrates, the answer is: Yes, we can decrease model size by applying a novel usage of 1×1 and 3×3 convolutions, along with no fully-connected layers. The end result is a model weighing in at 4.9MB, which can be further reduced with model compression.

Figure 2: Deep Learning for Computer Vision with Python book

If you’re interested in learning more about SqueezeNet, I would encourage you to take a look at my new book, Deep Learning for Computer Vision with Python.

Inside the ImageNet Bundle, I:

  1. Explain the inner workings of the SqueezeNet architecture.
  2. Demonstrate how to implement SqueezeNet by hand.
  3. Train SqueezeNet from scratch on the challenging ImageNet dataset and replicate the original results by Iandola et al.

Go ahead and take a look — I think you’ll agree with me when I say that this is the most complete deep learning + computer vision education you can find online.

Running a deep neural network on the Raspberry Pi

The source code from this blog post is heavily based on my previous post, Deep learning with OpenCV.

I’ll still review the code in its entirety here; however, I would like to refer you over to the previous post for a complete and exhaustive review.

To get started, create a new file named , and insert the following source code:

Lines 2-5 simply import our required packages.

From there, we need to parse our command line arguments:

As is shown on Lines 9-16 we have four required command line arguments:

  • --image : The path to the input image.
  • --prototxt : The path to a Caffe prototxt file, which is essentially a plaintext configuration file following a JSON-like structure. I cover the anatomy of Caffe projects in my PyImageSearch Gurus course.
  • --model : The path to a pre-trained Caffe model. As stated above, you’ll want to train your model on hardware which packs much more punch than the Raspberry Pi — we can, however, leverage a small, pre-existing model on the Pi.
  • --labels : The path to class labels, in this case ImageNet “syn-sets” labels.

Next, we’ll load the class labels and input image from disk:

Go ahead and open synset_words.txt found in the “Downloads” section of this post. You’ll see on each line/row there is an ID and class labels associated with it (separated by commas).

Lines 20 and 21 simply read in the labels file line-by-line (rows) and extract the first relevant class label. The result is a classes list containing our class labels.
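The parsing amounts to something like the following sketch (the two sample rows mimic the synset format: an ID, a space, then comma-separated labels):

```python
# Sample rows in the synset_words.txt format
rows = [
    "n01440764 tench, Tinca tinca",
    "n01443537 goldfish, Carassius auratus",
]

# Drop the ID, then keep only the first of the comma-separated labels
classes = [r[r.find(" ") + 1:].split(",")[0] for r in rows]
print(classes)   # ['tench', 'goldfish']
```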

Then, we utilize OpenCV to load the image on Line 24.

Now we’ll make use of OpenCV 3.3’s Deep Neural Network (DNN) module to convert the image to a blob as well as to load the model from disk:

Be sure to make note of the comment preceding our call to cv2.dnn.blobFromImage on Line 31 above.

Common choices for width and height image dimensions inputted to Convolutional Neural Networks include 32 × 32, 64 × 64, 224 × 224, 227 × 227, 256 × 256, and 299 × 299. In our case we are pre-processing (normalizing) the image to dimensions of 227 x 227 (which are the image dimensions SqueezeNet was trained on) and performing a scaling technique known as mean subtraction. I discuss the importance of these steps in my book.
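For intuition, the blob construction can be approximated in plain NumPy: resize, subtract a per-channel mean, and reorder HWC to NCHW. The mean values below are illustrative placeholders, not the exact ones the model was trained with:

```python
import numpy as np

def make_blob(image, size=227, mean=(104.0, 117.0, 123.0)):
    """Rough NumPy sketch of what cv2.dnn.blobFromImage does:
    nearest-neighbor resize, per-channel mean subtraction,
    and HWC -> NCHW reordering with a batch dimension."""
    h, w, _ = image.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows][:, cols].astype(np.float32)   # nearest-neighbor resize
    resized -= np.array(mean, dtype=np.float32)         # mean subtraction
    return resized.transpose(2, 0, 1)[np.newaxis, ...]  # (1, 3, size, size)

image = np.random.randint(0, 256, (480, 640, 3)).astype(np.uint8)
blob = make_blob(image)
print(blob.shape)   # (1, 3, 227, 227)
```

In the real script you would simply call cv2.dnn.blobFromImage; this sketch just makes the individual pre-processing steps visible.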

Note: You’ll want to use 227 × 227 for the blob size when using SqueezeNet and 224 × 224 for GoogLeNet, to be consistent with the prototxt definitions.

We then load the network from disk on Line 35 by utilizing our prototxt and model file path references.

In case you missed it above, it is worth noting here that we are loading a pre-trained model. The training step has already been performed on a more powerful machine and is outside the scope of this blog post (but covered in detail in both PyImageSearch Gurus and Deep Learning for Computer Vision with Python).

Now we’re ready to pass the image through the network and look at the predictions:

To classify the query blob , we pass it forward through the network (Lines 39-42) and print out the amount of time it took to classify the input image (Line 43).

We can then sort the probabilities from highest to lowest (Line 47) while grabbing the top five predictions (Line 48).
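Sorting and grabbing the top five is a one-liner with NumPy's argsort; `preds` below is a toy probability vector standing in for the network output:

```python
import numpy as np

preds = np.array([0.02, 0.60, 0.05, 0.25, 0.08])   # toy class probabilities

# argsort gives ascending order; [::-1] flips to descending, [:5] keeps top five
idxs = np.argsort(preds)[::-1][:5]
print(idxs)                 # indices of the most probable classes first
print(preds[idxs])          # probabilities, highest to lowest
```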

The remaining lines (1) draw the highest predicted class label and corresponding probability on the image, (2) print the top five results and probabilities to the terminal, and (3) display the image to the screen:

We draw the top prediction and probability on the top of the image (Lines 53-57) and display the top-5 predictions + probabilities on the terminal (Lines 61 and 62).

Finally, we display the output image on the screen (Lines 65 and 66). If you are using SSH to connect with your Raspberry Pi this will only work if you supply the -X flag for X11 forwarding when SSH’ing into your Pi.

To see the results of applying deep learning on the Raspberry Pi using OpenCV and Python, proceed to the next section.

Raspberry Pi and deep learning results

We’ll be benchmarking our Raspberry Pi for deep learning against two pre-trained deep neural networks:

As we’ll see, SqueezeNet is much smaller than GoogLeNet (5MB vs. 25MB, respectively) and will enable us to classify images substantially faster on the Raspberry Pi.

To run pre-trained Convolutional Neural Networks on the Raspberry Pi use the “Downloads” section of this blog post to download the source code + pre-trained neural networks + example images.

From there, let’s first benchmark GoogLeNet against this input image:

Figure 3: A “barbershop” is correctly classified by both GoogLeNet and SqueezeNet using deep learning and OpenCV.

As we can see from the output, GoogLeNet correctly classified the image as “barbershop” in 1.7 seconds:

Let’s give SqueezeNet a try:

SqueezeNet also correctly classified the image as “barbershop”

…but in only 0.9 seconds!

As we can see, SqueezeNet is significantly faster than GoogLeNet — which is extremely important since we are applying deep learning to the resource constrained Raspberry Pi.

Let’s try another example with SqueezeNet:

Figure 4: SqueezeNet correctly classifies an image of a cobra using deep learning and OpenCV on the Raspberry Pi.

However, while SqueezeNet is significantly faster, it’s less accurate than GoogLeNet:

Figure 5: A jellyfish is incorrectly classified by SqueezeNet as a bubble.

Here we see the top prediction by SqueezeNet is “bubble”. While the image may appear to have bubble-like characteristics, the image is actually of a “jellyfish” (which is the #2 prediction from SqueezeNet).

GoogLeNet on the other hand correctly reports “jellyfish” as the #1 prediction (with the sacrifice of processing time):

What’s next? I recommend PyImageSearch University.

I strongly believe that if you had the right teacher you could master computer vision and deep learning.

Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?

That’s not the case.

All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.

If you’re serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.

Inside PyImageSearch University you’ll find:

  • 53+ courses on essential computer vision, deep learning, and OpenCV topics
  • 53+ Certificates of Completion
  • 57+ hours of on-demand video
  • Brand new courses released regularly, ensuring you can keep up with state-of-the-art techniques
  • Pre-configured Jupyter Notebooks in Google Colab
  • Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
  • Access to centralized code repos for all 450+ tutorials on PyImageSearch
  • Easy one-click downloads for code, datasets, pre-trained models, etc.
  • Access on mobile, laptop, desktop, etc.


Today, we learned how to apply deep learning on the Raspberry Pi using Python and OpenCV.

In general, you should:

  1. Never use your Raspberry Pi to train a neural network.
  2. Only use your Raspberry Pi to deploy a pre-trained deep learning network.

The Raspberry Pi does not have enough memory or CPU power to train these types of deep, complex neural networks from scratch.

In fact, the Raspberry Pi barely has enough processing power to run them — as we’ll find out in next week’s blog post, you’ll struggle to get a reasonable frame rate for video processing applications.

If you’re interested in embedded deep learning on low cost hardware, I’d consider looking at optimized devices such as NVIDIA’s Jetson TX1 and TX2. These boards are designed to execute neural networks on the GPU and provide real-time (or as close to real-time as possible) classification speed.

In next week’s blog post, I’ll be discussing how to optimize OpenCV on the Raspberry Pi to obtain performance gains by upwards of 100% for object detection using deep learning.



About the Author

Hi there, I’m Adrian Rosebrock, PhD. All too often I see developers, students, and researchers wasting their time, studying the wrong things, and generally struggling to get started with Computer Vision, Deep Learning, and OpenCV. I created this website to show you what I believe is the best possible way to get your start.