We're Open Sourcing Gaze Control — Hands-Free Accessibility for Everyone

Gaze Control is a real-time, on-device eye-gaze tracking engine for Android that lets anyone scroll social feeds without touching their phone. Today we're making it open source.

The codebase is functional and usable day to day, but some features are still rough: calibration on certain device configurations needs more work, and the blink detector has a handful of known edge cases. We're shipping it now because waiting for perfect would mean waiting forever.

For the past several months, we've been quietly building something that we believe can genuinely change how people with motor disabilities interact with their smartphones. Today, we're making it available to everyone.

Gaze Control is a hands-free Android accessibility utility that lets users scroll Instagram, TikTok, YouTube Shorts, Twitter/X, and Reddit using nothing but their eyes, head nods, or palm gestures — processed entirely on-device, in real time, with no data ever leaving the phone.

Deep-SkyLabs/gaze-control

Star on GitHub to follow development


Why We Built This

Accessibility on mobile is often treated as an afterthought. Switch controls exist, voice navigation exists — but for the specific, very human act of scrolling through a social feed while your hands are occupied (or unavailable), there's almost nothing.

We kept asking: what would it feel like if you could casually browse your feed while eating, holding a baby, recovering from surgery, or living with a condition that limits hand mobility? The answer shouldn't be "impossible." It should just work.

That's where Gaze Control started.


What It Actually Does

At its core, Gaze Control is a real-time computer vision pipeline that translates physical signals from your face and hands into system-wide scroll gestures via the Android Accessibility Service API.

Three Interaction Modes

Eye Gaze Tracking — The primary mode. Look up or down and hold your gaze past a configurable dwell threshold. The system fires a scroll action. No hardware beyond your front camera is required.

Head Nodding State Machine — A faster, lower-fatigue alternative. Subtle vertical or horizontal head nods trigger instant scroll actions using a gyroscope-fused state machine. Calibrated to feel natural, not robotic.

Palm Gestures — For media control and navigation. An open palm plays/pauses, a closed fist stops, a thumbs up selects. Recognized via MediaPipe's Hand Landmarker model.

All three modes can be active simultaneously. The engine arbitrates between them using confidence scores and cooldown windows to prevent conflicting actions.
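
To make the arbitration idea concrete, here is a minimal sketch of picking one action from several simultaneous candidates. The names (Source, Candidate, ActionArbiter) and the default thresholds are assumptions for illustration, not the engine's actual types or tuning.

```kotlin
// Hypothetical types; the real engine may model candidates differently.
enum class Source { GAZE, HEAD_NOD, PALM }

data class Candidate(
    val source: Source,
    val action: String,     // e.g. "SCROLL_UP"
    val confidence: Float,  // 0.0..1.0
)

class ActionArbiter(
    private val minConfidence: Float = 0.6f, // assumed threshold
    private val cooldownMs: Long = 800L,     // assumed cooldown window
) {
    private var lastDispatchMs: Long? = null

    /** Returns the winning candidate, or null if nothing should fire this frame. */
    fun arbitrate(candidates: List<Candidate>, nowMs: Long): Candidate? {
        // Inside the cooldown window every candidate is dropped.
        val last = lastDispatchMs
        if (last != null && nowMs - last < cooldownMs) return null

        // Otherwise the most confident candidate above the threshold wins.
        val winner = candidates
            .filter { it.confidence >= minConfidence }
            .maxByOrNull { it.confidence }
            ?: return null

        lastDispatchMs = nowMs
        return winner
    }
}
```

A single shared cooldown is the simplest policy; per-source cooldowns would be a natural variation if, say, palm gestures should not block scrolling.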


The Engineering Underneath

Building this correctly was significantly harder than building it quickly. Here's what's actually happening at each layer.

The Native Gaze Engine (Kotlin)

The entire perception pipeline runs as a sticky Android Foreground Service (GazeForegroundService.kt), so it survives app backgrounding and keeps feeding gestures to GazeAccessibilityService.kt for system-wide dispatch.

Camera pipeline: Front camera frames are captured via Camera2 API at throttled rates. Resolution and frame rate are dynamically adjusted by ThermalPerformanceManager based on device temperature and current tracking confidence, keeping the CPU load proportional to actual need.
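
As an illustration of that kind of throttling policy, the sketch below maps a thermal level and a tracking confidence to a capture configuration. ThermalLevel, CaptureConfig, chooseCaptureConfig, and every number in it are assumptions, not values taken from ThermalPerformanceManager. It also assumes that a high-confidence lock needs fewer frames, which is only one possible reading of "proportional to actual need". On a real device the thermal input could come from PowerManager.getCurrentThermalStatus() (API 29+).

```kotlin
// Hypothetical policy: thermal state sets the ceiling, confidence trims the frame rate.
enum class ThermalLevel { NONE, LIGHT, MODERATE, SEVERE }

data class CaptureConfig(val width: Int, val height: Int, val fps: Int)

fun chooseCaptureConfig(thermal: ThermalLevel, confidence: Float): CaptureConfig {
    val c = confidence.coerceIn(0f, 1f)

    // Hotter device -> lower resolution and frame-rate ceiling.
    val base = when (thermal) {
        ThermalLevel.NONE, ThermalLevel.LIGHT -> CaptureConfig(640, 480, 30)
        ThermalLevel.MODERATE -> CaptureConfig(640, 480, 20)
        ThermalLevel.SEVERE -> CaptureConfig(320, 240, 10)
    }

    // Assumption: a confident lock tolerates about two thirds of the frame rate,
    // but never fewer than 10 fps.
    val fps = if (c >= 0.8f) maxOf(10, base.fps * 2 / 3) else base.fps
    return base.copy(fps = fps)
}
```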

MediaPipe + L2CS-Net: MediaPipe's Face Landmarker extracts the 478-point face mesh and isolates eye Region of Interest crops. Those crops are then passed to an integrated L2CS-Net ONNX model that predicts high-resolution pitch and yaw gaze angles. Native inference runs via OnnxReflectiveRunner with NNAPI hardware acceleration and a mathematical fallback for devices without NPU support.

Sensor fusion decoupling: This is the piece that makes the gaze tracking actually usable in real life. When you tilt your head, your pupils move in the camera frame — but that's head rotation, not a gaze intent. GravityRollDecoupler fuses accelerometer gravity vectors with gyroscope data to separate head rotation from eyeball movement, so gaze coordinates stay stable regardless of how you're holding your phone.
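
The geometric core of roll compensation can be sketched in a few lines. This is a gravity-only simplification: it estimates roll about the screen normal from the accelerometer's gravity vector and counter-rotates a 2D gaze offset, leaving out the gyroscope fusion entirely. Vec2, rollFromGravity, and decoupleRoll are hypothetical names, and the sign convention depends on how the camera image is oriented, so treat it as an assumption.

```kotlin
import kotlin.math.atan2
import kotlin.math.cos
import kotlin.math.sin

data class Vec2(val x: Float, val y: Float)

/**
 * Roll about the screen normal, in radians, from a gravity vector expressed in
 * device coordinates. An upright portrait phone reads roughly (0, 9.81) and yields 0.
 */
fun rollFromGravity(gx: Float, gy: Float): Float = atan2(gx, gy)

/** Rotates a 2D gaze offset by -roll so that "up" follows gravity, not the screen edge. */
fun decoupleRoll(gaze: Vec2, roll: Float): Vec2 {
    val c = cos(-roll)
    val s = sin(-roll)
    return Vec2(gaze.x * c - gaze.y * s, gaze.x * s + gaze.y * c)
}
```

Gravity alone is noisy while the phone is moving, which is presumably why the real decoupler brings in gyroscope data as well.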

Filtering stack: Raw gaze predictions are noisy. The pipeline applies:

  • CropStabilizer — temporal EMA + affine alignment to stabilize eye crops across frames
  • ConfidenceEngine — combines landmark visibility, eyelid closure blendshapes, and gyro motion to generate a 0.0–1.0 tracking quality score
  • BlinkDetector — Eye Aspect Ratio (EAR) combined with blendshape inputs, freezes cursor coordinates during involuntary blinks so they don't trigger unintended scrolls
  • DirectionHysteresis — separate entry/exit thresholds eliminate boundary flickering near directional state transitions
  • One Euro Filter — adaptive smoothing tuned per-application via InteractionProfileManager
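
Two of the filters above are simple enough to sketch in isolation. The first function is the standard six-landmark Eye Aspect Ratio formula; the class shows hysteresis with separate entry and exit thresholds. The landmark ordering, the threshold values, and the names (Point, eyeAspectRatio, DirectionHysteresisSketch) are assumptions for illustration, and the blendshape inputs the real BlinkDetector combines with EAR are omitted.

```kotlin
import kotlin.math.hypot

data class Point(val x: Float, val y: Float)

fun dist(a: Point, b: Point): Float = hypot(a.x - b.x, a.y - b.y)

/**
 * EAR for six eye landmarks ordered p1..p6: p1 and p4 are the eye corners,
 * p2/p3 lie on the upper lid and p6/p5 on the lower lid. The ratio drops toward 0 as the eye closes.
 */
fun eyeAspectRatio(p: List<Point>): Float {
    require(p.size == 6) { "expected 6 landmarks" }
    val horizontal = dist(p[0], p[3])
    if (horizontal == 0f) return 0f
    return (dist(p[1], p[5]) + dist(p[2], p[4])) / (2f * horizontal)
}

enum class Direction { NEUTRAL, UP, DOWN }

/** Entering a direction needs a larger offset than staying in it, which suppresses flicker. */
class DirectionHysteresisSketch(
    private val enterThreshold: Float = 0.30f, // assumed
    private val exitThreshold: Float = 0.15f,  // assumed
) {
    init { require(exitThreshold < enterThreshold) }

    var state = Direction.NEUTRAL
        private set

    /** pitch: normalized vertical gaze offset, positive meaning up. */
    fun update(pitch: Float): Direction {
        state = when (state) {
            Direction.NEUTRAL -> when {
                pitch >= enterThreshold -> Direction.UP
                pitch <= -enterThreshold -> Direction.DOWN
                else -> Direction.NEUTRAL
            }
            Direction.UP -> if (pitch < exitThreshold) Direction.NEUTRAL else Direction.UP
            Direction.DOWN -> if (pitch > -exitThreshold) Direction.NEUTRAL else Direction.DOWN
        }
        return state
    }
}
```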

The state machine: Gaze control isn't just "eyes up = scroll up." The GazeStateMachine implements full state-driven navigation: NO_FACE → SEARCHING → TRACKING → UNSTABLE → STABLE_LOCK → ACTION_READY → COOLDOWN. Actions are only dispatched from ACTION_READY, preventing the false positives that would make the system feel broken.
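
The sketch below walks the same seven states in the order listed. Only the state names come from the description above; the Frame fields, the transition conditions, and the dwell and cooldown defaults are guesses made for illustration, so the real GazeStateMachine may well branch differently.

```kotlin
enum class GazeState { NO_FACE, SEARCHING, TRACKING, UNSTABLE, STABLE_LOCK, ACTION_READY, COOLDOWN }

// Hypothetical per-frame input.
data class Frame(
    val faceVisible: Boolean,
    val confidence: Float,      // 0.0..1.0 tracking quality
    val directionHeld: Boolean, // gaze is outside the deadzone in one direction
    val timestampMs: Long,
)

class GazeStateMachineSketch(
    private val minConfidence: Float = 0.6f, // assumed
    private val dwellMs: Long = 500L,        // assumed dwell threshold
    private val cooldownMs: Long = 700L,     // assumed cooldown
) {
    var state = GazeState.NO_FACE
        private set
    private var enteredAtMs = 0L

    /** Advances one frame; returns true only on the frame that enters ACTION_READY. */
    fun step(f: Frame): Boolean {
        val elapsed = f.timestampMs - enteredAtMs
        val confident = f.confidence >= minConfidence
        val next = if (!f.faceVisible) GazeState.NO_FACE else when (state) {
            GazeState.NO_FACE -> GazeState.SEARCHING
            GazeState.SEARCHING -> if (confident) GazeState.TRACKING else GazeState.SEARCHING
            GazeState.TRACKING -> if (f.directionHeld) GazeState.UNSTABLE else GazeState.TRACKING
            GazeState.UNSTABLE -> when {
                !f.directionHeld -> GazeState.TRACKING
                confident -> GazeState.STABLE_LOCK
                else -> GazeState.UNSTABLE
            }
            GazeState.STABLE_LOCK -> when {
                !f.directionHeld -> GazeState.TRACKING
                !confident -> GazeState.UNSTABLE
                elapsed >= dwellMs -> GazeState.ACTION_READY
                else -> GazeState.STABLE_LOCK
            }
            GazeState.ACTION_READY -> GazeState.COOLDOWN
            GazeState.COOLDOWN ->
                if (elapsed >= cooldownMs) GazeState.TRACKING else GazeState.COOLDOWN
        }
        val entered = next != state
        if (entered) {
            state = next
            enteredAtMs = f.timestampMs
        }
        return entered && next == GazeState.ACTION_READY
    }
}
```

Because the dwell timer restarts whenever the state changes, a glance that leaves the target or loses confidence has to start its dwell again, which is what keeps stray eye movements from firing scrolls.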

Calibration system: A 9-stage guided calibration flow establishes the user's personal neutral anchor, gaze extents, deadzones, and directional biases. The resulting profile is stored as a JSON blob in SharedPreferences and loaded into the native engine on service start.
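
To show how such a profile might be applied at runtime, here is a minimal sketch that normalizes a raw pitch angle against a neutral anchor, per-direction extents, and a deadzone. The field names and units are assumptions, not the actual JSON schema, and SharedPreferences persistence is left out to keep the example free of Android dependencies.

```kotlin
import kotlin.math.abs

// Hypothetical subset of a calibration profile; all angles in degrees.
data class CalibrationProfile(
    val neutralPitchDeg: Float, // where the user looks when at rest
    val upExtentDeg: Float,     // pitch at the comfortable upper limit
    val downExtentDeg: Float,   // pitch at the comfortable lower limit
    val deadzone: Float,        // fraction of the range (0..1) treated as neutral
)

/**
 * Maps a raw pitch to -1..1 relative to the neutral anchor. Up and down use
 * separate ranges, so an asymmetric user still reaches +/-1 in both directions.
 * Values inside the deadzone collapse to 0.
 */
fun normalizePitch(rawPitchDeg: Float, p: CalibrationProfile): Float {
    val offset = rawPitchDeg - p.neutralPitchDeg
    val range = if (offset >= 0f) {
        p.upExtentDeg - p.neutralPitchDeg
    } else {
        p.neutralPitchDeg - p.downExtentDeg
    }
    if (range <= 0f) return 0f // degenerate profile: report neutral
    val n = (offset / range).coerceIn(-1f, 1f)
    return if (abs(n) < p.deadzone) 0f else n
}
```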

The Flutter Application Shell

The Flutter layer handles everything the user directly interacts with: the calibration wizard, the real-time diagnostic dashboard, and the settings interface.

State is managed via Riverpod NotifierProviders. Real-time telemetry (confidence scores, internal state transitions, active gesture events) streams from the native layer via platform method channels and surfaces in the dashboard without any polling.

GoRouter handles path-based navigation between the five primary screens. Forui — a shadcn/ui-inspired component library for Flutter — handles the visual design system, keeping every spacing value, color, and typography choice bound to design tokens rather than hardcoded values.

gaze/
 ├── android/
 │    └── app/src/main/kotlin/com/example/gaze/
 │         ├── MainActivity.kt               # Method Channel bindings
 │         ├── GazeAccessibilityService.kt   # System-wide gesture dispatching
 │         └── GazeForegroundService.kt      # CV engine, filters, state machine
 ├── lib/
 │    └── src/
 │         ├── core/
 │         │    ├── platform/                # Platform channel adapters
 │         │    ├── services/                # Background service controllers
 │         │    └── theme/                   # Forui customization tokens
 │         └── features/
 │              ├── dashboard/               # Real-time telemetry visualization
 │              ├── gaze_tracking/           # Calibration UI & diagnostics
 │              ├── onboarding/              # Welcome carousel
 │              ├── permissions/             # Dependency checks
 │              └── settings/                # Threshold & whitelist controllers
 └── pubspec.yaml

Privacy by Architecture

Gaze Control processes everything locally. Camera frames never leave the device — they're captured, processed in native memory, and discarded. No frames are serialized to disk, no coordinates are transmitted to any server, and no user identity is involved at any layer of the stack.

The only data that persists is the calibration profile (stored in Android SharedPreferences as a local JSON blob) and user-configured thresholds. Both are scoped entirely to the local device.

This wasn't a business decision — it was the only design that made sense. Gaze data is deeply personal. Processing it anywhere else would be a fundamental breach of trust.


Why Open Source, Why Now

We've reached a stable enough state that we believe the codebase is genuinely useful as both a working application and a reference implementation for real-time gaze tracking on Android. The CV pipeline, sensor fusion approach, and state machine architecture represent months of iteration that shouldn't have to be rediscovered.

By open sourcing it, we're hoping for a few things:

  1. More platforms — The core gaze engine logic is portable. We'd love to see iOS, Windows, and Linux ports emerge from the community.
  2. More input modalities — The interaction model is extensible. Tongue detection, facial expression triggers, breath sensing — the architecture supports new signal sources.
  3. More use cases — Scrolling social feeds is the consumer hook, but the underlying accessibility dispatch layer works for anything that accepts swipe gestures. Gaming, reading, navigation.
  4. Better calibration — The current 9-stage guided calibration works well but requires patience. Adaptive, faster approaches would meaningfully improve first-run experience.

The project requires a physical Android device with a front camera. On emulators, where Camera2 functionality is unavailable, the app falls back to heuristic engines with significantly degraded tracking quality.


Getting Started

Prerequisites

  • Flutter SDK ^3.11.5
  • Dart SDK ^3.11.5
  • Android Studio with SDK Platform Tools 33+
  • A physical Android device (front camera required)

Running Locally

# Clone the repository
git clone https://github.com/Deep-SkyLabs/gaze-control.git
cd gaze-control

# Install dependencies and run code generators
flutter pub get
dart run build_runner build --delete-conflicting-outputs

# Deploy to connected device
flutter run --release

The --release flag matters here. The CV pipeline is performance-sensitive and will feel sluggish in debug mode due to assertion overhead and unoptimized native calls.


What's Next

We're actively working on three near-term improvements:

Smart App Detection — Auto-waking the camera service when a supported application enters the foreground, eliminating the manual toggle step. This requires Accessibility Service event filtering and app package detection, which we've prototyped but haven't yet shipped.
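
The package-detection half of that feature reduces to a small pure function, sketched here. The function and set names are hypothetical, and the package IDs are the commonly published ones for each app; verify them against the target devices, since some apps ship regional variants. In an AccessibilityService the foreground package would typically come from the packageName of TYPE_WINDOW_STATE_CHANGED events.

```kotlin
// Assumed package IDs for the supported apps; verify before relying on them.
val supportedPackages: Set<String> = setOf(
    "com.instagram.android",      // Instagram
    "com.zhiliaoapp.musically",   // TikTok
    "com.google.android.youtube", // YouTube (Shorts)
    "com.twitter.android",        // Twitter/X
    "com.reddit.frontpage",       // Reddit
)

/** Decides whether the camera pipeline should be running for the current foreground app. */
fun shouldCameraRun(foregroundPackage: String?, userEnabled: Boolean): Boolean =
    userEnabled && foregroundPackage != null && foregroundPackage in supportedPackages
```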

Context-Aware Smoothing — Dynamically adjusting One Euro Filter coefficients based on the user's active scroll velocity. Fast flick-scrolling (TikTok) wants different smoothing characteristics than careful precision scrolling (reading a Reddit thread).
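
For readers unfamiliar with the filter, the sketch below follows the published One Euro Filter algorithm for a single coordinate, with minCutoff and beta exposed so they can be changed at runtime. The retune function is a purely hypothetical rule with made-up thresholds and coefficients; it only illustrates where a velocity-based adjustment would plug in.

```kotlin
import kotlin.math.PI
import kotlin.math.abs

class OneEuroFilter(
    var minCutoff: Double = 1.0, // Hz; lower means smoother at rest
    var beta: Double = 0.0,      // higher means less lag during fast motion
    private val dCutoff: Double = 1.0,
) {
    private var xPrev: Double? = null
    private var dxPrev = 0.0
    private var tPrev = 0.0

    // Smoothing factor of a first-order low-pass filter for a given cutoff and time step.
    private fun alpha(cutoff: Double, dt: Double): Double {
        val tau = 1.0 / (2.0 * PI * cutoff)
        return 1.0 / (1.0 + tau / dt)
    }

    fun filter(x: Double, tSeconds: Double): Double {
        val prev = xPrev
        if (prev == null) { // first sample passes through unchanged
            xPrev = x
            tPrev = tSeconds
            return x
        }
        val dt = tSeconds - tPrev
        if (dt <= 0.0) return prev // ignore out-of-order samples

        // Smooth the derivative, then let speed raise the cutoff frequency.
        val aD = alpha(dCutoff, dt)
        val dxHat = aD * ((x - prev) / dt) + (1 - aD) * dxPrev
        val a = alpha(minCutoff + beta * abs(dxHat), dt)
        val xHat = a * x + (1 - a) * prev

        xPrev = xHat
        dxPrev = dxHat
        tPrev = tSeconds
        return xHat
    }
}

/** Hypothetical retuning rule: frequent scrolling trades some jitter for lower lag. */
fun retune(filter: OneEuroFilter, scrollsPerMinute: Double) {
    val fast = scrollsPerMinute >= 20.0
    filter.minCutoff = if (fast) 2.0 else 0.8
    filter.beta = if (fast) 0.05 else 0.007
}
```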

Ambient Dynamic Theming — Mirroring the color palette of the active foreground app in the Gaze Control overlay, so the UI feels native to whatever the user is doing.

Longer-term, we're interested in exploring on-device model fine-tuning to let the calibration system improve itself passively over time as it accumulates data about a specific user's gaze patterns.


Contributing

The project follows standard Flutter/Kotlin conventions with a few additional constraints:

  • All colors, typography, and spacing must use context.theme design tokens — no hardcoded hex values
  • State management uses explicit NotifierProvider implementations — no implicit state leakage
  • Strict linting via analysis_options.yaml (prefer_const_constructors, prefer_final_locals)

Pull requests, issues, and architectural discussions are all welcome. See CONTRIBUTING.md for the full guidelines.

The full source is on GitHub. Issues, PRs, and forks are all welcome.


If you're building something in the accessibility space and want to collaborate or just talk through the architecture, reach out directly at hello@deepskylabs.in.

Client Partner: Self
Timeline Completion: 2026-06-02