
ProxiPlay: An Adaptive UI for Hands-Occupied Cooking

When voice and gestures fail, the screen adapts to your elbow.

Scroll to explore the full case study
Role
Interaction Designer
Prototyper & Researcher
Timeline
8 Weeks
Concept → Validated Prototype
Domain
Cooking + IoT
Accessibility · Adaptive UI
Tools
Figma · ProtoPie
TensorFlow Lite · Swift
-91%
Hand-wash interruptions
-78%
Step transition time
97%
Elbow-tap success rate
+34%
Recipe completion

The Problem

72% of cooking video users pause to wash hands before tapping "Next"

Recipe video apps assume clean, dry fingers. But in reality, users are elbow-deep in flour, raw meat, or wet ingredients. Every interaction requires a hand-wash → dry → tap → resume cycle that breaks cooking flow and adds 12-18 seconds per step transition.

Live Scenario
User tries voice
"Hey, next step"
Range hood interference
Ambient noise: 72dB — above 50dB voice threshold
Voice command failed
"Sorry, I didn't catch that. Try again?"
User falls back to manual
1. Stop cooking
2. Wash hands
3. Dry hands
4. Tap screen
+15s wasted
72%
wash hands before tapping
~15s
avg. wasted per step change

The Input Failure Cascade

Scenario: A cook is mid-recipe with flour-covered hands and a running range hood. Here's what happens when they try to advance to the next step.

Voice
"Hey, next step"
FAILED · 72dB noise
Gesture
Waves flour-covered hand
CONF: 34% · occluded
Proximity
Leans toward iPad
DETECTED · <30cm
Mode Switch
Elbow-Bump Mode
ACTIVE · 50% screen

Investigation

Understanding hands-occupied interaction patterns

To understand when and why standard inputs fail during cooking, I combined three methods: in-home observation to capture real behavior, sensor data to quantify environmental barriers, and diary entries to surface moments users couldn't articulate in interviews.

Contextual Inquiry
12 home cooks observed in-kitchen over 3-hour cooking sessions
36 hours recorded
Sensor Data Analysis
Front camera proximity + ambient noise levels across cooking phases
2,400+ data points
Diary Study
7-day logging of "moments I wanted to tap but couldn't"
84 diary entries

Key Findings

Three patterns emerged consistently across all 12 participants. Voice commands failed most during the noisiest cooking phases — exactly when users needed hands-free control the most. Gesture recognition degraded dramatically with occluded or messy hands. But the most surprising finding was behavioral: most participants had already invented their own workaround.

🎤
65-80dB
Range hood noise
Above 50dB voice threshold
👋
<40%
Gesture confidence
Flour, mitts, utensils
💪
9/12
Tried elbow taps
But 44px targets too small

Kitchen Noise vs. Voice Recognition Threshold

Idle
35dB
Prep
45dB
Chop
55dB
Sauté
65dB
Hood On
72dB
Frying
80dB
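The noise levels above can be read as a simple gate on voice input. This is an illustrative Python sketch (not the shipped Swift implementation); the phase names and decibel values come from the chart, and the ~50dB threshold from the recognition failures observed in the study.

```python
# Sketch: which cooking phases leave voice input viable, given the
# ~50dB recognition threshold observed in testing.

VOICE_THRESHOLD_DB = 50  # above this, voice recognition failed in the study

PHASE_NOISE_DB = {
    "idle": 35,
    "prep": 45,
    "chop": 55,
    "saute": 65,
    "hood_on": 72,
    "frying": 80,
}

def voice_viable(phase: str) -> bool:
    """True if ambient noise for this phase is below the threshold."""
    return PHASE_NOISE_DB[phase] < VOICE_THRESHOLD_DB

viable_phases = [p for p, db in PHASE_NOISE_DB.items() if db < VOICE_THRESHOLD_DB]
# Only idle and prep stay below 50dB; every active cooking phase
# drowns out voice input.
```

This is why voice failed exactly when hands-free control mattered most: the threshold is crossed as soon as real cooking begins.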

Diagram & Strategy

Adaptive UI State Machine

A multi-signal trigger system that gracefully degrades from standard input to proximity-based macro interaction.

State Transition Diagram
TRIGGER CONDITIONS
🎤 Voice input: FAILED
👋 Gesture confidence: < 40%
AND
SENSOR INPUT
📷 Front camera proximity: < 30cm
ALL MET? YES → state change · NO → stay in Default UI
STATE CHANGE
Default UI (standard controls) → Macro UI (Elbow-Bump Mode)
Visual: bottom 50% of screen becomes a giant semi-transparent "Next Step" button
Transition: 400ms ease-out scale
Trigger (Failed Input)
Sensor Signal
State Change
Fallback (No Change)
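The AND-gate in the state diagram can be sketched in a few lines. This is illustrative Python, not the production Swift/TensorFlow Lite code; the thresholds (gesture confidence < 40%, proximity < 30cm) and the signal names follow the diagram, while the `Signals` container is my own framing.

```python
# Sketch of the multi-signal AND-gate that triggers Elbow-Bump Mode.

from dataclasses import dataclass

GESTURE_CONF_MIN = 0.40   # below this, gesture input counts as failed
PROXIMITY_MAX_CM = 30     # user leaning within 30cm of the front camera

@dataclass
class Signals:
    voice_failed: bool        # last voice command was not recognized
    gesture_confidence: float # 0.0-1.0 from the gesture model
    proximity_cm: float       # distance estimate from the front camera

def should_enter_elbow_bump(s: Signals) -> bool:
    """All three conditions must hold (AND-gate) to switch modes.
    Requiring every signal is what dropped false activations
    from 23% to under 3%."""
    return (
        s.voice_failed
        and s.gesture_confidence < GESTURE_CONF_MIN
        and s.proximity_cm < PROXIMITY_MAX_CM
    )

# The cascade from the scenario: drowned-out voice, flour-occluded
# gesture at 34% confidence, user leaning in -> mode switch fires.
fires = should_enter_elbow_bump(Signals(True, 0.34, 25.0))
# Any single failure alone is not enough:
stays = should_enter_elbow_bump(Signals(True, 0.80, 25.0))
```

Note that a single missing condition keeps the default UI, which is exactly the "NO → Stay Default" branch in the diagram.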

Try It: Trigger the Mode Switch

Click each condition to simulate the multi-signal evaluation. All three must be active to trigger Elbow-Bump Mode.

🎤
Voice Input
Click to simulate failure
👋
Gesture Confidence
Click to drop below 40%
📷
Proximity Sensor
Click to detect < 30cm
Default UI Active
Recipe Step 3/8
Elbow-Bump Mode
Next Step →
Tap anywhere

Design Rationale

Four core design decisions shaped the final system, each driven by a specific failure observed in the field: the multi-signal gate came from false activations in early prototypes, the oversized target from measuring real elbow contact areas, the semi-transparent overlay from users needing to see the recipe beneath, and the auto-timeout from preventing accidental lock-in.

🔀
Multi-Signal AND-Gate
Single trigger
23% false positive
Multi-signal
<3% false positive
👆
Touch Target: 120x Larger
Fingertip
8-12mm
vs
Elbow zone
40-60mm
Macro button: 50% of screen
👁
Semi-Transparent Overlay
Video visible
85% gradient → recipe still visible beneath button
Auto-Revert: 8s Timeout
Macro mode active + 8s with no proximity → revert to default
Or double-tap to force exit
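The auto-revert decision can be sketched as a tiny state object. This is a hedged illustration, not the shipped code: the 8s timeout and double-tap exit come from the design above, while passing timestamps in explicitly (rather than wiring up sensor callbacks) is a simplification for testability.

```python
# Sketch: Elbow-Bump Mode reverts after 8s with no proximity signal,
# or immediately on a double-tap.

REVERT_TIMEOUT_S = 8.0

class ElbowBumpMode:
    def __init__(self, now: float):
        self.active = True
        self.last_proximity = now  # entering the mode implies proximity

    def on_proximity(self, now: float):
        self.last_proximity = now  # any proximity event resets the timeout

    def on_double_tap(self):
        self.active = False        # user forces an exit

    def tick(self, now: float):
        # Auto-revert to the default UI once the timeout elapses.
        if now - self.last_proximity >= REVERT_TIMEOUT_S:
            self.active = False

mode = ElbowBumpMode(now=0.0)
mode.tick(now=5.0)       # 5s since proximity -> still active
mode.on_proximity(now=5.0)
mode.tick(now=12.9)      # 7.9s since last proximity -> still active
mode.tick(now=13.0)      # 8.0s elapsed -> reverts to default UI
```

Resetting the timer on every proximity event is what prevents the "accidental lock-in" the timeout was designed against: as long as the cook keeps leaning in, the macro UI stays.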

Validation

Testing with flour-covered hands

We ran a controlled usability study with 18 participants cooking actual recipes. Each participant completed the same 8-step recipe twice: once with standard UI, once with ProxiPlay adaptive mode.

Control: Standard UI
1
Standard 44px touch targets
2
Voice command only (no adaptive)
3
Must wash hands to tap screen
Test: ProxiPlay Adaptive
1
Multi-signal trigger detection
2
Auto Elbow-Bump Mode activation
3
50% screen macro touch target

Results

-91%
Hand-wash interruptions
From avg. 6.2 → 0.5 per recipe
-78%
Step transition time
From 15.2s → 3.3s avg.
97%
Elbow-tap success rate
On 50% screen macro button
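The headline percentages follow directly from the before/after averages reported above; a quick arithmetic check:

```python
# Verify the reported reductions from the raw before/after averages.

def pct_reduction(before: float, after: float) -> float:
    return (before - after) / before * 100

interruptions = pct_reduction(6.2, 0.5)  # hand-wash interruptions per recipe
transition = pct_reduction(15.2, 3.3)    # step transition time in seconds

print(int(interruptions))  # 91  -> reported as -91%
print(int(transition))     # 78  -> reported as -78%
```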

I literally forgot I was using my elbow after the second step. It just felt natural—lean in, bump, done.

P7 · Home cook, 3 years experience

The big button appearing when I got close was like the phone reading my mind. I didn't have to think about how to interact.

P12 · Cooking instructor

Outcome

From concept to shipped feature

After three rounds of prototyping and the usability study with 18 participants, the adaptive mode was refined into a production-ready feature. The key transition — from standard controls to a half-screen elbow target — happens in under 400ms and feels invisible when it works well.

UI Transition: Default → Macro Mode

Default UI
Kimchi Jjigae
Step 3/8
Proximity
triggered
Elbow-Bump Mode
Elbow-Bump Mode
Next Step →
Tap anywhere

Scale Impact

Beyond the controlled study, we tracked engagement metrics after rolling out the adaptive mode to a 500-user beta group over 4 weeks. The data showed that removing the hand-wash friction didn't just save time — it fundamentally changed how people completed recipes.

Engagement Lift
Recipe completion rate+34%
Session duration+22%
Return rate (7-day)+41%
Technical Performance
Trigger accuracy97.2%
False activation rate2.8%
Mode switch latency< 400ms

Reflection

What I learned

The real breakthrough came not from adding more input modalities, but from listening to what was already failing and responding to that context.
Context is the best input — When voice fails and gestures fail, the system already knows the user needs help. Reading failure signals beats adding new modalities.
Body mechanics drive digital design — The 50% screen button only works because we measured real elbow contact areas (4.2 × 6.8cm) and forearm approach angles first.
Multi-signal > single threshold — Adding voice failure and gesture confidence as co-signals dropped false activations from 23% to under 3%. The AND-gate pattern is now my default.
Design Philosophy Shift

The conventional approach to input failure is to add more modalities — voice doesn't work? Add gesture. Gesture fails? Add gaze tracking. Each new channel adds complexity, edge cases, and cognitive load. ProxiPlay flipped this: instead of adding inputs, we read what's already failing and use that as the trigger to adapt the entire interface.

Old Approach
🎤
+
👋
+
👆
+
?
Add more input modalities

Stack voice, gesture, gaze, touch — hope one works. Each new layer adds latency, false positives, and user confusion.

New Approach
🎤
+
👋
⚙️
Listen to failures, then adapt

Failure signals are the input. Two failed modalities + proximity = the system already knows what to do. Zero new learning curve.

Where This Pattern Applies

Graceful degradation via multi-signal triggers isn't kitchen-specific. Any environment where users are physically separated from the interface — or where their hands, voice, or attention are compromised — benefits from the same AND-gate logic.
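The same AND-gate generalizes by swapping out which signals feed it. The sketch below is an assumption-laden illustration (the condition sets and the 200cm "walk-up" distance for the living-room case are mine, drawn loosely from the scenarios in this section), showing how each environment is just a different configuration of the same all-must-hold trigger.

```python
# Generalized AND-gate: a trigger is a set of per-signal checks, and the
# adaptive mode fires only when every check holds.

from typing import Callable, Dict

SignalMap = Dict[str, float]

def and_gate(*conditions: Callable[[SignalMap], bool]) -> Callable[[SignalMap], bool]:
    """Combine per-signal checks into a single all-must-hold trigger."""
    return lambda s: all(cond(s) for cond in conditions)

# Kitchen: failed voice + low gesture confidence + close proximity
kitchen_trigger = and_gate(
    lambda s: s["voice_failed"] == 1,
    lambda s: s["gesture_conf"] < 0.40,
    lambda s: s["proximity_cm"] < 30,
)

# Living room (10-ft UI): failed voice + user walking toward the screen
# (200cm approach distance is a hypothetical threshold for illustration)
living_room_trigger = and_gate(
    lambda s: s["voice_failed"] == 1,
    lambda s: s["approach_cm"] < 200,
)

fires = kitchen_trigger({"voice_failed": 1, "gesture_conf": 0.34, "proximity_cm": 25})
```

The environment changes; the gate logic does not, which is what makes the pattern portable to the industrial-floor case as well.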

🍳
Kitchen Counter

Hands occupied, device 1-2ft away. Voice drowned out by 65-80dB range hood. Elbow is the only clean contact point.

Voice ✗ Touch ✗ Elbow ✓
📺
10-ft Living Room

Remote misplaced, voice misheard. Proximity of approach + failed voice triggers simplified on-screen controls — same AND-gate pattern.

Remote ✗ Voice ✗ Walk-up ✓
🏭
Industrial Floor

Heavy gloves block touch, machinery noise kills voice. Worker proximity + failed inputs triggers oversized panel controls.

Touch ✗ Voice ✗ Proximity ✓