ESP32-CAM YOLO Object and Animal Detection - Open Source Embedded Project

A real-time object and animal detection system that combines ESP32-CAM hardware with the YOLOv3 computer vision model. It includes Arduino firmware for wireless image streaming and a Python-based desktop application for processing detections using OpenCV.

Overview

The esp32-cam-yolo project is an integrated hardware and software solution designed for real-time object and animal detection. By leveraging the affordable ESP32-CAM module and the powerful YOLOv3 (You Only Look Once) deep learning model, the project enables users to deploy a wireless smart camera system capable of identifying a wide range of subjects. The system is split into two main components: firmware running on the microcontroller and a processing application running on a host computer.

Hardware and Firmware

The project centers around the ESP32-CAM, a small-form-factor development board that includes an ESP32 SoC, an OV2640 camera, and Wi-Fi capabilities. The repository provides a dedicated Arduino sketch, WifiCam.ino, which configures the device as a wireless camera server. This firmware handles the initialization of the camera sensor and streams JPEG frames over the local network.

To complement the hardware, the project also references a custom 3D-printed case design. This allows for a professional and protected deployment of the camera module in various environments, making it suitable for monitoring pets, wildlife, or general security applications.

Software Architecture

The heavy lifting of image processing and object detection is performed on a host machine (Linux or Windows) using a Python-based application. This application connects to the ESP32-CAM’s stream and applies the YOLOv3 model to each frame.

Key technical components include:

OpenCV: Used for image manipulation and as the backend for running the YOLO neural network.
Poetry: Utilized for Python dependency management, ensuring a consistent environment for libraries like opencv-python and pyserial.
YOLOv3 Weights: The system uses the standard YOLOv3 weights, which are capable of detecting 80 different classes of objects out of the box.
Inno Setup: For Windows users, the project includes configuration files to package the Python application into a standard installer, simplifying deployment for non-technical users.

Technical Workflow

Once the ESP32-CAM is powered on and connected to the network, it begins serving a video stream. The Python application, configured via environment variables, captures this stream. As frames arrive, the YOLOv3 model performs inference to identify objects. The application then draws bounding boxes and labels around detected items in real-time. This architecture offloads the computationally expensive AI inference from the low-power ESP32 to a more capable processor, while maintaining the flexibility of a wireless sensor.

Getting Started

Setting up the project involves flashing the ESP32-CAM using the Arduino IDE and setting up the Python environment. On Linux, this is handled via Poetry commands to install dependencies and download the necessary neural network weights. Windows users have the option of using a pre-built installer which includes the necessary USB drivers for the camera hardware. The project is designed to be accessible, serving as an educational resource within the SciCrop-Academy learning series.