Zhilly AI Pentester Assistant
Zhilly is an AI-powered portable cybersecurity tool for ESP32-S3 devices that supports voice-controlled pentesting, RF/IR signal manipulation, and HID emulation. Built on the ESP-IDF framework, it uses FreeRTOS, LVGL, and NimBLE to give security researchers a hands-free way to operate the device.
Zhilly combines an AI voice assistant with physical hardware hacking tools. Designed as a portable “cyber weapon” or assistant, it turns ESP32-S3 based hardware, specifically the LilyGO T-Embed CC1101 and the T-Watch S3, into a conversational companion that can carry out technical tasks on request. Built on the Xiaozhi-esp32 infrastructure, Zhilly moves beyond traditional command-line interfaces by letting users operate their security tools through natural language.
Conversational Hardware Control
The core of the Zhilly experience is its voice-activated interface. After saying the wake word “Nihao Miaoban,” users can trigger a variety of actions without touching the device. Eye animations on an ST7789 display provide emotional feedback, showing whether the assistant is listening, thinking, or speaking. This personality is backed by an AI role introduction that defines Zhilly as an expert cybersecurity assistant fluent in both Turkish and English.
Pentesting and RF Capabilities
Equipped with a CC1101 Sub-GHz radio, Zhilly can work with radio frequencies between 300 MHz and 928 MHz (the CC1101 itself covers this range in three bands: 300-348 MHz, 387-464 MHz, and 779-928 MHz). Users can initiate RF jamming, capture raw signals with microsecond timing resolution, or replay stored .sub files from an SD card. The device also includes an infrared (IR) suite with a TV-B-Gone universal remote, an IR jammer, and the ability to record and re-transmit IR signals for testing remote-controlled devices.
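As an illustration of what replaying a stored capture involves, the sketch below parses a raw .sub file into a list of signed microsecond timings. It assumes the files follow the Flipper Zero style raw Sub-GHz text layout (`Frequency:`, `Preset:`, and `RAW_Data:` lines, with positive values meaning carrier on and negative values meaning carrier off); the source does not state which .sub dialect Zhilly reads. The names `RawCapture`, `parse_sub`, and `in_cc1101_band` are hypothetical, and this is host-side Python, not the firmware's code.

```python
from dataclasses import dataclass, field


@dataclass
class RawCapture:
    frequency_hz: int = 0
    preset: str = ""
    # Signed durations in microseconds: positive = carrier on, negative = off.
    timings_us: list = field(default_factory=list)


def parse_sub(text: str) -> RawCapture:
    """Parse an assumed Flipper-style raw .sub file into a RawCapture."""
    capture = RawCapture()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        key, value = key.strip(), value.strip()
        if key == "Frequency":
            capture.frequency_hz = int(value)
        elif key == "Preset":
            capture.preset = value
        elif key == "RAW_Data":
            # A capture may span several RAW_Data lines; concatenate them.
            capture.timings_us.extend(int(tok) for tok in value.split())
    return capture


def in_cc1101_band(frequency_hz: int) -> bool:
    """Check a frequency against the three bands listed in the CC1101 datasheet."""
    bands = (
        (300_000_000, 348_000_000),
        (387_000_000, 464_000_000),
        (779_000_000, 928_000_000),
    )
    return any(lo <= frequency_hz <= hi for lo, hi in bands)


# Made-up sample data for illustration only.
SAMPLE = """Filetype: Flipper SubGhz RAW File
Version: 1
Frequency: 433920000
Preset: FuriHalSubGhzPresetOok650Async
Protocol: RAW
RAW_Data: 350 -1050 350 -1050 1050 -350
RAW_Data: 350 -10000
"""

capture = parse_sub(SAMPLE)
total_us = sum(abs(t) for t in capture.timings_us)
print(capture.frequency_hz, len(capture.timings_us), total_us)
```

A replay routine would walk `timings_us` in order, keying the transmitter on for each positive duration and off for each negative one. Checking the frequency first is a cheap way to reject files the radio cannot tune to.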
For network security tasks, the project includes tools written specifically for the LilyGO T-Watch S3: a high-speed ARP scanner that uses Layer-2 inspection to find devices on a subnet that would otherwise stay hidden, a concurrent port scanner that targets commonly exposed administration ports, and a hardware-level DNS resolver.
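To make the Layer-2 idea concrete, here is a minimal sketch of the Ethernet frame an ARP scanner broadcasts for each address in a subnet. The field layout follows the standard ARP request format; the function names `build_arp_request` and `sweep_frames`, the MAC address, and the IP addresses are hypothetical, and actually transmitting the frames (on the device this would go through the network stack) is not shown.

```python
import ipaddress
import struct

ETHERTYPE_ARP = 0x0806
BROADCAST_MAC = b"\xff" * 6


def build_arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Build a 42-byte Ethernet frame carrying an ARP 'who-has' request."""
    ethernet = struct.pack("!6s6sH", BROADCAST_MAC, src_mac, ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IPv4
        6,        # hardware address length
        4,        # protocol address length
        1,        # operation: request
        src_mac,
        ipaddress.IPv4Address(src_ip).packed,
        b"\x00" * 6,  # target MAC is unknown, so it is left as zeros
        ipaddress.IPv4Address(target_ip).packed,
    )
    return ethernet + arp


def sweep_frames(src_mac: bytes, src_ip: str, cidr: str) -> list:
    """Return one ARP request per usable host in the subnet, skipping ourselves."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [
        build_arp_request(src_mac, src_ip, str(host))
        for host in network.hosts()
        if str(host) != src_ip
    ]


# Hypothetical addresses for illustration.
my_mac = bytes.fromhex("024c494c5901")
frame = build_arp_request(my_mac, "192.168.1.2", "192.168.1.10")
frames = sweep_frames(my_mac, "192.168.1.2", "192.168.1.0/29")
print(len(frame), len(frames))
```

Because a host has to answer ARP in order to communicate on the local network at all, devices that ignore ping or have every port filtered will usually still reply to these requests, which is why a scan at this layer tends to find more hosts than one at a higher layer.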
AI-Powered BadUSB
One of Zhilly's most distinctive features is voice-driven BadUSB. Traditional HID (Human Interface Device) injection tools require manual configuration, whereas Zhilly runs DuckyScript payloads in response to voice commands. When connected to a computer, the assistant presents itself as a standard USB keyboard. A user can say, “Run BadUSB script ‘payload.txt’” or “Type this: ‘Hello World’,” and the assistant handles the HID emulation: it enables the mode, sends the keystrokes with appropriate delays, and disables the mode once finished. It supports over a dozen keyboard layouts, including US/UK English, Turkish, German, French, and Spanish.
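The path from a DuckyScript payload to keystrokes can be sketched as a small compiler that turns each script line into delays and 8-byte boot-keyboard HID reports. The example below is a simplified illustration, not Zhilly's implementation: it handles only a small subset of commands (`REM`, `DELAY`, `STRING`, a few modifiers and named keys), and `compile_script`, `US_LAYOUT`, and the other names are hypothetical. The keycodes and modifier bits are the standard USB HID keyboard usage values.

```python
import string

MOD_CTRL, MOD_SHIFT, MOD_ALT, MOD_GUI = 0x01, 0x02, 0x04, 0x08
MODIFIERS = {"CTRL": MOD_CTRL, "SHIFT": MOD_SHIFT, "ALT": MOD_ALT, "GUI": MOD_GUI}
NAMED_KEYS = {"ENTER": 0x28, "ESC": 0x29, "TAB": 0x2B, "SPACE": 0x2C}


def build_us_layout() -> dict:
    """Map characters to (modifier, keycode) pairs for a US keyboard (partial)."""
    layout = {}
    for i, ch in enumerate(string.ascii_lowercase):
        layout[ch] = (0, 0x04 + i)
        layout[ch.upper()] = (MOD_SHIFT, 0x04 + i)
    for i, digit in enumerate("1234567890"):
        layout[digit] = (0, 0x1E + i)
    layout[" "] = (0, 0x2C)
    return layout


US_LAYOUT = build_us_layout()
RELEASE = bytes(8)  # an all-zero report means "no keys pressed"


def report(modifier: int, keycode: int) -> bytes:
    """Boot keyboard report: modifier byte, reserved byte, six keycode slots."""
    return bytes([modifier, 0, keycode, 0, 0, 0, 0, 0])


def press(modifier: int, keycode: int) -> list:
    """A key press followed by a release."""
    return [("report", report(modifier, keycode)), ("report", RELEASE)]


def compile_script(script: str, layout: dict = US_LAYOUT) -> list:
    """Translate a DuckyScript subset into ('delay', ms) and ('report', bytes)."""
    actions = []
    for raw in script.splitlines():
        command, _, arg = raw.strip().partition(" ")
        if not command or command == "REM":
            continue  # blank line or comment
        if command == "DELAY":
            actions.append(("delay", int(arg)))
        elif command == "STRING":
            for ch in arg:
                actions.extend(press(*layout[ch]))
        elif command in MODIFIERS:
            keycode = layout[arg.lower()][1] if arg else 0
            actions.extend(press(MODIFIERS[command], keycode))
        elif command in NAMED_KEYS:
            actions.extend(press(0, NAMED_KEYS[command]))
        else:
            raise ValueError(f"unsupported command: {command}")
    return actions


SCRIPT = """REM open the Run dialog and type a greeting
GUI r
DELAY 500
STRING Hi
ENTER
"""

actions = compile_script(SCRIPT)
for kind, value in actions:
    print(kind, value if kind == "delay" else value.hex())
```

Passing the layout as a parameter shows where keyboard layout support fits in: the same character can map to a different keycode and modifier on a Turkish or German layout, so only the lookup table has to change while the script stays the same.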
Technical Foundation
Under the hood, Zhilly is an ESP-IDF application. It relies on FreeRTOS for multitasking and system management, so that voice processing and hardware control can run concurrently. The graphical interface is built with the LVGL library, which suits the constraints of wearable and embedded displays. Bluetooth connectivity is handled by the NimBLE stack and networking tasks by the lwIP stack. The project also integrates TinyUSB for its HID and CDC functionality, which lets the ESP32-S3 present itself to host computers as a USB peripheral.
Visual Feedback and System Management
To provide immediate status updates, Zhilly controls a ring of eight WS2812 RGB LEDs. Users can adjust colors, brightness, and animations (such as blinking or scrolling) through voice commands. The system also includes power management with deep sleep modes and battery status monitoring. Firmware can be updated over the air (OTA), so the assistant can gain new scripts and capabilities without a physical connection to a development machine.
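A brightness setting and a scrolling animation on such a ring come down to simple arithmetic on the pixel buffer. The sketch below is a host-side illustration under stated assumptions: brightness is taken as a 0-255 scale factor, the names `scale`, `scroll`, and `frame_bytes` are hypothetical, and the step that clocks the bytes out to the LEDs is not shown. The one hardware detail it relies on is that WS2812 LEDs expect each pixel in green, red, blue byte order.

```python
LED_COUNT = 8


def scale(color: tuple, brightness: int) -> tuple:
    """Scale an (r, g, b) color by a brightness value in the range 0..255."""
    return tuple(channel * brightness // 255 for channel in color)


def scroll(pixels: list, step: int) -> list:
    """Rotate the ring by `step` positions, wrapping around the end."""
    step %= len(pixels)
    return pixels[-step:] + pixels[:-step]


def frame_bytes(pixels: list, brightness: int = 255) -> bytes:
    """Serialize pixels into the byte stream a WS2812 chain expects (GRB order)."""
    out = bytearray()
    for color in pixels:
        r, g, b = scale(color, brightness)
        out += bytes((g, r, b))
    return bytes(out)


# One red pixel on an otherwise dark ring, moved one position at half brightness.
ring = [(255, 0, 0)] + [(0, 0, 0)] * (LED_COUNT - 1)
frame = frame_bytes(scroll(ring, 1), brightness=128)
print(frame.hex())
```

Calling `scroll` with an increasing step on each animation tick makes the lit pixel travel around the ring, and a blink effect can be produced the same way by alternating the brightness between zero and the chosen level.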