Autonomy Project
A conversational drone control stack that converts spoken intent into either direct flight commands or continuous target tracking.
It combines voice input, LLM intent routing, and real-time computer vision in a single control loop.
Demo Preview
The preview links to a walkthrough of the voice interface, intent routing, and live tracking behavior.
Overview
The system uses voice transcription, LLM intent parsing, and a vision loop to decide whether the drone should execute a discrete movement command or enter a continuous tracking mode.
Whisper transcribes commands so the drone can be controlled through conversational language instead of a rigid command console.
An OpenAI-powered agent routes intent into either discrete flight actions or longer-running follow behavior.
YOLOv8 and OpenCV work together to maintain a target lock and recover through occlusion during follow mode.
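A minimal sketch of how follow mode can recover through occlusion. The detection input stands in for a per-frame YOLOv8 result (None when the target is occluded) and the returned action for a steering decision; the state machine, thresholds, and names here are illustrative assumptions, not the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, w, h of the target bounding box

@dataclass
class FollowState:
    lock: Optional[Box] = None   # last confirmed target box
    missed_frames: int = 0       # consecutive frames without a detection
    MAX_MISSES: int = 30         # tolerate roughly 1 s of occlusion at 30 fps

def step(state: FollowState, detection: Optional[Box]) -> str:
    """Advance the follow loop by one frame and return the action to take.

    `detection` stands in for a YOLOv8 inference result on this frame.
    """
    if detection is not None:
        state.lock = detection
        state.missed_frames = 0
        return "follow"          # steer toward the fresh detection
    state.missed_frames += 1
    if state.lock is not None and state.missed_frames <= state.MAX_MISSES:
        return "coast"           # briefly occluded: hold course on last box
    state.lock = None            # lock lost: stop and wait to re-acquire
    return "hover"
```

Through a short occlusion the drone coasts on the last known box and re-acquires on the next detection; only a prolonged loss drops the lock.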
Pipeline
The architecture is easiest to understand as four cooperating scripts that hand control from transcription to intent parsing, then into flight execution or follow behavior.
Transcription: handles speech-to-text using Whisper.
Intent parsing: uses an OpenAI-backed agent and custom tools to parse user intent.
Flight execution: executes discrete movement instructions such as turns, altitude changes, and repositioning.
Follow mode: runs the live YOLOv8 and OpenCV tracking loop.
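The handoff from intent parsing to flight execution or follow behavior can be sketched as a small dispatcher. The JSON intent schema, handler names, and return strings below are assumptions for illustration; the real agent's tool-call format may differ.

```python
import json
from typing import Callable, Dict

def route_intent(raw: str, handlers: Dict[str, Callable[[dict], str]]) -> str:
    """Dispatch a parsed intent to a discrete command or a long-running mode."""
    intent = json.loads(raw)
    mode = intent.get("mode")
    if mode not in handlers:
        return "rejected: unknown mode"
    return handlers[mode](intent)

def fly(intent: dict) -> str:
    # Discrete movement: turns, altitude changes, repositioning.
    return f"execute {intent['action']} {intent.get('value', '')}".strip()

def follow(intent: dict) -> str:
    # Long-running behavior: hand off to the tracking loop.
    return f"start tracking {intent['target']}"

handlers = {"command": fly, "follow": follow}
```

For example, `route_intent('{"mode": "command", "action": "ascend", "value": 2}', handlers)` yields a discrete action, while a `"follow"` intent hands control to the tracking loop instead.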
Engineering Challenges
Project Links
The live demo and source repository show the voice-to-action loop, tracking behavior, and overall control stack in full.