Autonomy Project
A conversational drone control stack that converts spoken intent into either direct flight commands or continuous target tracking.
It combines voice input, LLM intent routing, and real-time computer vision in a single control loop.
Demo Preview
The preview links to a walkthrough of the voice interface, intent routing, and live tracking behavior.
Overview
The system uses voice transcription, LLM intent parsing, and a vision loop to decide whether the drone should execute a discrete movement command or enter a continuous tracking mode.
Whisper transcribes commands so the drone can be controlled through conversational language instead of a rigid command console.
An OpenAI-powered agent routes intent into either discrete flight actions or longer-running follow behavior.
YOLOv8 and OpenCV work together to maintain a target lock and recover through occlusion during follow mode.
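A minimal sketch of how follow mode can recover through occlusion. The detection input stands in for a per-frame YOLOv8 result (None when the target is occluded) and the returned action for a steering decision; the state machine, thresholds, and names here are illustrative assumptions, not the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, w, h of the target bounding box

@dataclass
class FollowState:
    lock: Optional[Box] = None   # last confirmed target box
    missed_frames: int = 0       # consecutive frames without a detection
    MAX_MISSES: int = 30         # tolerate roughly 1 s of occlusion at 30 fps

def step(state: FollowState, detection: Optional[Box]) -> str:
    """Advance the follow loop by one frame and return the action to take.

    `detection` stands in for a YOLOv8 inference result on this frame.
    """
    if detection is not None:
        state.lock = detection
        state.missed_frames = 0
        return "follow"          # steer toward the fresh detection
    state.missed_frames += 1
    if state.lock is not None and state.missed_frames <= state.MAX_MISSES:
        return "coast"           # briefly occluded: hold course on last box
    state.lock = None            # lock lost: stop and wait to re-acquire
    return "hover"
```

Through a short occlusion the drone coasts on the last known box and re-acquires on the next detection; only a prolonged loss drops the lock.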
Pipeline
The architecture is easiest to understand as four cooperating scripts that hand control from transcription to intent parsing, then into flight execution or follow behavior.
Transcription: handles speech-to-text using Whisper.
Intent parsing: uses an OpenAI-backed agent and custom tools to parse user intent.
Flight execution: executes discrete movement instructions such as turns, altitude changes, and repositioning.
Follow mode: runs the live YOLOv8 and OpenCV tracking loop.
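The handoff from intent parsing to flight execution or follow behavior can be sketched as a small dispatcher. The JSON intent schema, handler names, and return strings below are assumptions for illustration; the real agent's tool-call format may differ.

```python
import json
from typing import Callable, Dict

def route_intent(raw: str, handlers: Dict[str, Callable[[dict], str]]) -> str:
    """Dispatch a parsed intent to a discrete command or a long-running mode."""
    intent = json.loads(raw)
    mode = intent.get("mode")
    if mode not in handlers:
        return "rejected: unknown mode"
    return handlers[mode](intent)

def fly(intent: dict) -> str:
    # Discrete movement: turns, altitude changes, repositioning.
    return f"execute {intent['action']} {intent.get('value', '')}".strip()

def follow(intent: dict) -> str:
    # Long-running behavior: hand off to the tracking loop.
    return f"start tracking {intent['target']}"

handlers = {"command": fly, "follow": follow}
```

For example, `route_intent('{"mode": "command", "action": "ascend", "value": 2}', handlers)` yields a discrete action, while a `"follow"` intent hands control to the tracking loop instead.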
Engineering Challenges
Project Links
The live demo and source repository show the voice-to-action loop, tracking behavior, and overall control stack in full.