Try an interactive version of this dialogue: sign up at solve.it.com, click Upload, and paste in this URL.
This is part of a series of blog posts showcasing how to build with in-browser AI models and explaining a bit about the underlying technology.
Nearly everyone has a mobile phone, and phones are packed with sensors:
- Cameras
- Microphones
- GPS
- Accelerometers
- Gyroscopes
- Proximity sensors
All of these are great for building useful apps.
There are other benefits to running models on-device:
- Sending data to the cloud isn't always desirable. **Privacy is important.**
- We want low latency.
- We want our software to work when we're not connected to the internet.
There are also problems with targeting mobile phones directly (I'm pointing at Android and Apple):
- The app stores are closed systems (it's how they make money).
- This introduces restrictions compared to developing for Windows, Linux, or macOS (see the restrictions section for more information).
The underlying tech:
- GPU execution from within the web browser vs. the CPU (a small backend-selection sketch follows below)
- Links to useful resources
The aim of this dialogue is to show HUE Vision and the underlying technologies it uses.
The application: HUE Vision is a web-based application that brings real-time eye tracking and gaze prediction directly into the browser, powered by TensorFlow.js and MediaPipe FaceMesh. It showcases how on-device (mobile phone and web browser) computer vision and machine learning can work together for intuitive, privacy-friendly gaze interaction.
Why it matters: Mobile phones and browsers have sensors (camera, mic, GPS, etc.) that can feed into AI models. Running AI on-device gives you privacy, low latency, and offline capability.
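Whether the model runs on the GPU or the CPU comes down to which TensorFlow.js backend is active. Here's a minimal, illustrative sketch of checking and selecting the backend (this snippet is not part of the HUE Vision code):
// Illustrative only: pick a TensorFlow.js backend before loading or training a model.
// 'webgl' runs ops on the GPU; 'cpu' is the plain JavaScript fallback.
async function pickBackend() {
  const gotWebGL = await tf.setBackend('webgl'); // resolves to false if WebGL is unavailable
  if (!gotWebGL) {
    await tf.setBackend('cpu');
  }
  await tf.ready();
  console.log('Using TensorFlow.js backend:', tf.getBackend());
}
pickBackend();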
Overview of the architecture:
- Python/FastHTML — Serves the HTML shell and static files
- JavaScript — Does all the heavy lifting:
| File | Role |
|---|---|
| globals.js | Browser compatibility polyfills |
| facetracker.js | MediaPipe FaceMesh detects face & extracts eye region |
| mouse.js | Tracks cursor position (ground truth labels) |
| dataset.js | Captures eye images + mouse positions as training samples |
| training.js | Builds & trains a CNN model (Conv2D → Dense) |
| ui.js | Manages UI state, calibration mode, user feedback |
| heat.js | Draws heatmap of where you looked |
| main.js | Glue code — wires buttons, keyboard shortcuts, tracking loop |
The original version used plain .py files. To break it down and explain it, I converted it to FastHTML so I could use it within SolveIt (a Jupyter-like notebook experience on steroids).
- Server setup — Creates a FastHTML app and tries to start it with `JupyUvi` (though it's failing because port 8000 is already in use)
- Static files — Mounts the CSS and JS files so the browser can access them
- Main route (`/`) — Returns a full HTML page with:
  - A Training Panel for calibrating the eye tracker (collecting data points, training the model)
  - A Session Panel for running the tracker
  - A Heatmap Panel for visualizing where you looked
  - Video/canvas elements for the webcam feed
  - External scripts for TensorFlow.js, MediaPipe FaceMesh, and the custom JS modules
The actual eye tracking logic lives in the JS files (facetracker.js, training.js, etc.) — the Python just serves the HTML shell.
@rt('/')
def get():
return Html(
Head(
Meta(charset='utf-8'),
Meta(http_equiv='x-ua-compatible', content='ie=edge'),
Title('HUE Vision'),
Meta(name='description', content='A web-based application for eye tracking using facial recognition and machine learning.'),
Meta(name='viewport', content='width=device-width, initial-scale=1, shrink-to-fit=no'),
Link(rel='favicon', type='image/ico', href='favicon.ico'),
Link(href='https://fonts.googleapis.com/css?family=Roboto|Source+Code+Pro', rel='stylesheet'),
Link(rel='preconnect', href='https://fonts.gstatic.com'),
Link(href='https://fonts.googleapis.com/css2?family=KoHo&display=swap', rel='stylesheet'),
Link(rel='stylesheet', href='https://cdnjs.cloudflare.com/ajax/libs/normalize/8.0.1/normalize.min.css'),
Link(rel='stylesheet', href='https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css'),
Link(rel='stylesheet', href='style.css')
),
Body(
Div(id='modal-overlay', cls='hidden'),
Div(id='help-modal', cls='hidden'),
Canvas(id='heatMap'),
Div(
H3('Hello!'),
'Welcome to',
Strong('HUE Vision'),
'👀',
Br(),
'To continue, Please grant access to your webcam. 📷',
id='info',
data_content='info'
),
Div(
Div(
Div(
Div(
H3('Training Panel'),
Div(
Button('-', id='toggle-panel-button', title='Collapse Panel', cls='icon-button'),
Button('?', id='help-button', title='Show Help', cls='icon-button'),
cls='panel-header-buttons'
),
cls='panel-header'
),
Div(
Div(
Table(
Tr(
Td('Webcam'),
Td('Disconnected', data_content='webcam-status')
),
Tr(
Td('Face Detected'),
Td('No', data_content='face-detected')
)
),
cls='status-panel'
),
Div(
Table(
Tr(
Td('Data Points Collected'),
Td('0', data_content='n-train')
),
Tr(
Td('Test Points Collected'),
Td('0', data_content='n-val')
),
Tr(
Td('Training Cycles Completed'),
Td('0', data_content='n-epochs')
),
Tr(
Td('Model Accuracy'),
Td('?', data_content='train-loss')
),
Tr(
Td('Test Accuracy'),
Td('?', data_content='val-loss')
)
),
cls='training-data-panel'
),
Div(
Div(
Div(
Button('Start Detection', id='start-detection'),
cls='buttonrow'
),
Div(
Button('Calibration', id='start-calibration', disabled=''),
Button('Start Training', id='start-training', disabled=''),
cls='buttonrow'
),
cls='button-group'
),
Div(
Div(
Button('Reset Model', id='reset-model', disabled=''),
Button('Customize Target', id='customize-target', disabled=''),
cls='buttonrow'
),
Div(
Button('Save Dataset', id='store-data', disabled=''),
Button('Load Dataset', id='load-data'),
Input(type='file', id='data-uploader'),
cls='buttonrow'
),
Div(
Button('Save Model', id='store-model', disabled=''),
Button('Load Model', id='load-model'),
Input(type='file', id='model-uploader', multiple=''),
cls='buttonrow'
),
cls='button-group'
),
cls='buttonwrap'
),
cls='panel-content'
),
id='training',
cls='panel'
),
Div(
Button('Start Session', id='start-session', disabled='', cls='emph'),
id='session-start-container'
),
id='training-phase'
),
Div(
Div(
Div(
H3('Session Panel'),
cls='panel-header'
),
Div(
Div(
Button('Start Tracking', id='start-tracking'),
Button('Stop Tracking', id='stop-tracking', disabled=''),
cls='buttonrow'
),
Div(
Button('Draw Heatmap', id='draw-heatmap', disabled='', cls='emph'),
cls='buttonrow'
),
cls='buttonwrap'
),
id='session',
cls='panel'
),
id='session-phase',
cls='hidden'
),
Div(
Div(
Div(
H3('Heatmap Panel'),
cls='panel-header'
),
Div(
Div(
Button('New Session', id='new-session'),
Button('Retrain Model', id='retrain-model'),
cls='buttonrow'
),
cls='buttonwrap'
),
id='heatmap',
cls='panel'
),
id='heatmap-phase',
cls='hidden'
),
id='main-content'
),
Video(id='webcam', width='400', height='300', autoplay=''),
Canvas(id='overlay', width='400', height='300'),
Footer(
'Made with',
I(cls='fas fa-heart'),
'by',
A('Suvrat Jain', href='https://simplysuvi.com/', target='_blank'),
'.'
),
Canvas(id='eyes', width='55', height='25'),
Div(id='target'),
Div(id='spinner', cls='hidden'),
Div(
Div(
H3('Target Settings'),
Button('×', id='close-settings', title='Close Settings', cls='icon-button'),
cls='panel-header'
),
Div(
Div(
Label('Size:', fr='target-size'),
Input(type='range', id='target-size', min='20', max='80', value='40', cls='slider'),
cls='setting-group'
),
Div(
Label('Color:', fr='target-color'),
Select(
Option('Default Orange', value='default'),
Option('Blue', value='blue'),
Option('Green', value='green'),
Option('Purple', value='purple'),
Option('Red', value='red'),
id='target-color'
),
cls='setting-group'
),
Div(
Label('Shape:', fr='target-shape'),
Select(
Option('Circle', value='circle'),
Option('Square', value='square'),
Option('Triangle', value='triangle'),
Option('Star', value='star'),
id='target-shape'
),
cls='setting-group'
),
cls='settings-content'
),
id='settings-panel',
cls='hidden'
),
Script(src='https://code.jquery.com/jquery-3.3.1.min.js', integrity='sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=', crossorigin='anonymous'),
Script(src='https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@4.15.0/dist/tf.min.js'),
Script(src='https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/face_mesh.js'),
Script(src='https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js'),
Script(src='https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js'),
Script(src='js/globals.js'),
Script(src='js/ui.js'),
Script(src='js/facetracker.js'),
Script(src='js/mouse.js'),
Script(src='js/dataset.js'),
Script(src='js/training.js'),
Script(src='js/heat.js'),
Script(src='js/main.js')
),
lang='',
cls='no-js'
)
Here's how the JS files work together:
- globals.js — Video codec detection utilities and polyfills for `getUserMedia`
- ui.js — State machine for the UI. Manages phases (training → session → heatmap), shows info messages, handles calibration mode and auto-collection of samples
- facetracker.js — Uses MediaPipe FaceMesh to detect faces and track iris positions. Extracts eye regions and draws the face mesh overlay on the webcam feed
- mouse.js — Tracks mouse position (normalized 0-1), used as ground truth labels for training
- dataset.js — Manages training/validation data. Captures eye images as tensors, converts them to grayscale, and stores them alongside mouse position targets
- training.js — Builds a TensorFlow.js CNN model:
  - Conv2D → MaxPool → Flatten → Dropout → Concatenate with eye position metadata → Dense output
  - Predicts (x, y) gaze coordinates
- heat.js — Draws a colored heatmap visualization of where the model predicted you were looking
- main.js — Glue code: keyboard shortcuts, button handlers, tracking loop (calls `getPrediction()` every 100 ms and moves the target dot)
The flow:
- User grants webcam → FaceMesh detects face/eyes
- User moves mouse while looking at cursor, pressing Space to capture samples
- Samples = eye image + mouse position (label)
- Train CNN model on samples
- Model predicts gaze → moves target dot to predicted location
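Condensed into code, that cycle looks roughly like this (a simplified sketch built from the functions defined in the modules below; the real wiring lives in main.js and ui.js):
// Simplified sketch of the capture → train → predict cycle.
// 1. While the user follows the cursor with their eyes, Space stores an
//    (eye image, mouse position) pair in the dataset.
document.addEventListener('keyup', (e) => {
  if (e.code === 'Space') dataset.captureExample();
});
// 2. Once enough samples are collected, train the CNN:
//    training.fitModel();
// 3. During a session, predict gaze every 100 ms and move the target dot.
setInterval(async () => {
  if (!training.currentModel || training.inTraining) return;
  const [x, y] = await training.getPrediction(); // normalized [0, 1]
  const target = document.getElementById('target');
  target.style.left = (x * window.innerWidth) + 'px';
  target.style.top = (y * window.innerHeight) + 'px';
}, 100);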
heat.js creates a visual heatmap showing where you looked during a session.
Key functions:
- `getHeatColor(value, alpha)` — Converts a value (0-1) to a color using HSL. Lower values = green, higher = red. Classic "heat" gradient.
- `fillHeatmap(data, ctx, width, height, radius)` — Loops through all recorded gaze points and draws semi-transparent circles at each (x, y) location on the canvas.
- `drawHeatmap(dataset)` — Sets up the canvas to match the window size, then calls `fillHeatmap` with the session data. Shows "In Progress..." while drawing.
- `clearHeatmap()` — Wipes the canvas clean.
How it works: After tracking, the dataset.session object contains arrays of normalized x/y coordinates. The heatmap draws overlapping translucent circles at those spots — areas with more gaze points appear more saturated/opaque.
window.heatmap = {
getHeatColor: function(value, alpha) {
// Adapted from https://stackoverflow.com/a/17268489/1257278
if (typeof alpha == 'undefined') {
alpha = 1.0;
}
const hue = ((1 - value) * 120).toString(10);
return 'hsla(' + hue + ',100%,50%,' + alpha + ')';
},
fillHeatmap: function(data, ctx, width, height, radius) {
// Go through a dataset and fill the context with the corresponding circles.
let pointX, pointY;
for (let i = 0; i < data.n; i++) {
pointX = Math.floor(data.x[i] * width);
pointY = Math.floor(data.y[i] * height);
ctx.beginPath();
ctx.fillStyle = this.getHeatColor(0.5, 0.5);
ctx.arc(pointX, pointY, radius, 0, 2 * Math.PI);
ctx.fill();
}
},
drawHeatmap: function(dataset) {
this.clearHeatmap();
$('#draw-heatmap').prop('disabled', true);
$('#draw-heatmap').html('In Progress...');
const heatmap = $('#heatMap')[0];
const ctx = heatmap.getContext('2d');
const width = $('body').width();
const height = $('body').height();
heatmap.width = width;
heatmap.height = height;
this.fillHeatmap(dataset.session, ctx, width, height, 15);
$('#clear-heatmap').prop('disabled', false);
$('#draw-heatmap').prop('disabled', false);
$('#draw-heatmap').html('Draw Heatmap');
},
clearHeatmap: function() {
const heatmap = $('#heatMap')[0];
const ctx = heatmap.getContext('2d');
ctx.clearRect(0, 0, heatmap.width, heatmap.height);
},
};
main.js is the glue that wires everything together. Here's what it does:
- Tracking object — Manages the active tracking state. When started, it clears session data and enables/disables buttons appropriately.
- `moveTarget()` loop — Runs every 100 ms. If tracking is active and a model exists, it calls `getPrediction()` to get the predicted gaze (x, y), stores it in `dataset.session`, and moves the target dot to that position on screen.
- Keyboard shortcuts — Space captures a sample, A toggles auto-collection, C starts calibration, T trains, H toggles the heatmap, R resets the model, ? shows help.
- Button handlers — Connects all the UI buttons to their respective functions (start detection, calibration, training, tracking, save/load data and models, target customization).
- File I/O — Handles saving datasets as JSON and models via TensorFlow.js's `save()` / `loadLayersModel()`.
In short: main.js is the event dispatcher — it doesn't do the heavy lifting itself, but orchestrates when facetracker, dataset, training, heatmap, and ui get called.
const tracking = {
active: false,
interval: null,
start: function() {
dataset.clearSession();
this.active = true;
$('#start-tracking').prop('disabled', true);
$('#stop-tracking').prop('disabled', false);
$('#draw-heatmap').prop('disabled', true);
},
stop: function() {
this.active = false;
$('#start-tracking').prop('disabled', false);
$('#stop-tracking').prop('disabled', true);
$('#draw-heatmap').prop('disabled', false);
ui.showInfo(
'<h3>Tracking Stopped</h3>' +
'You can now <strong>Draw Heatmap</strong> to visualize the session, or <strong>Start Tracking</strong> again.',
true
);
},
};
$(document).ready(function() {
const $target = $('#target');
const targetSize = $target.outerWidth();
function moveTarget() {
// Move the model target to where we predict the user is looking to
if (training.currentModel == null || training.inTraining || !tracking.active) {
return;
}
training.getPrediction().then(prediction => {
dataset.session.n += 1;
dataset.session.x.push(prediction[0]);
dataset.session.y.push(prediction[1]);
const left = prediction[0] * ($('body').width() - targetSize);
const top = prediction[1] * ($('body').height() - targetSize);
$target.css('left', left + 'px');
$target.css('top', top + 'px');
});
}
setInterval(moveTarget, 100);
function download(content, fileName, contentType) {
const a = document.createElement('a');
const file = new Blob([content], {
type: contentType,
});
a.href = URL.createObjectURL(file);
a.download = fileName;
a.click();
}
// Map functions to keys and buttons:
$('body').keyup(function(e) {
// Escape key - Close help modal
if (e.keyCode === 27) {
ui.hideHelp();
e.preventDefault();
return false;
}
// Space key - Capture example
if (e.keyCode === 32 && ui.readyToCollect) {
dataset.captureExample();
e.preventDefault();
return false;
}
// A key - Toggle auto-collection
if (e.keyCode === 65 && ui.readyToCollect) { // 'A' key
ui.toggleAutoCollect();
e.preventDefault();
return false;
}
// C key - Start calibration mode
if (e.keyCode === 67 && ui.readyToCollect) { // 'C' key
ui.startCalibration();
e.preventDefault();
return false;
}
// T key - Start training
if (e.keyCode === 84 && !$('#start-training').prop('disabled')) { // 'T' key
training.fitModel();
e.preventDefault();
return false;
}
// H key - Toggle heatmap
if (e.keyCode === 72 && !$('#draw-heatmap').prop('disabled')) { // 'H' key
if ($('#heatMap').css('opacity') === '0') {
heatmap.drawHeatmap(dataset, training.currentModel);
} else {
heatmap.clearHeatmap();
}
e.preventDefault();
return false;
}
// R key - Reset model
if (e.keyCode === 82 && !$('#reset-model').prop('disabled')) { // 'R' key
training.resetModel();
e.preventDefault();
return false;
}
// ? key - Show help
if (e.keyCode === 191 && e.shiftKey) { // '?' key (Shift + /)
ui.displayHelp();
e.preventDefault();
return false;
}
});
$('#start-detection').click(function(e) {
facetracker.startDetection();
$(this).prop('disabled', true);
});
$('#start-calibration').click(function(e) {
ui.startCalibration();
});
$('#start-training').click(function(e) {
training.fitModel();
});
$('#start-tracking').click(function(e) {
tracking.start();
});
$('#stop-tracking').click(function(e) {
tracking.stop();
});
$('#reset-model').click(function(e) {
training.resetModel();
});
$('#draw-heatmap').click(function(e) {
ui.showPhase('heatmap');
heatmap.drawHeatmap(dataset);
});
$('#clear-heatmap').click(function(e) {
heatmap.clearHeatmap();
});
$('#store-data').click(function(e) {
const data = dataset.toJSON();
const json = JSON.stringify(data);
download(json, 'dataset.json', 'text/plain');
});
$('#load-data').click(function(e) {
$('#data-uploader').trigger('click');
});
$('#data-uploader').change(function(e) {
const file = e.target.files[0];
const reader = new FileReader();
reader.onload = function() {
const data = reader.result;
const json = JSON.parse(data);
dataset.fromJSON(json);
};
reader.readAsText(file); // read the uploaded dataset JSON as text
});
$('#store-model').click(async function(e) {
await training.currentModel.save('downloads://model');
});
$('#load-model').click(function(e) {
$('#model-uploader').trigger('click');
});
$('#model-uploader').change(async function(e) {
const files = e.target.files;
training.currentModel = await tf.loadLayersModel(
tf.io.browserFiles([files[0], files[1]]),
);
ui.onFinishTraining();
});
// Help button event handler
$('#help-button').click(function() {
ui.displayHelp();
});
// Toggle panel button event handler
$('#toggle-panel-button').click(function() {
const $panelContent = $('.panel-content');
$panelContent.toggleClass('collapsed');
if ($panelContent.hasClass('collapsed')) {
$(this).text('+');
} else {
$(this).text('-');
}
});
// Target customization
$('#customize-target').click(function() {
$('#settings-panel').removeClass('hidden');
});
$('#close-settings').click(function() {
$('#settings-panel').addClass('hidden');
});
// Target size slider
$('#target-size').on('input', function() {
const size = $(this).val();
$('#target').css({
width: size + 'px',
height: size + 'px'
});
});
// Target color selector
$('#target-color').change(function() {
const color = $(this).val();
// Remove all color classes
$('#target').removeClass('color-default color-blue color-green color-purple color-red');
// Add selected color class
if (color !== 'default') {
$('#target').addClass('color-' + color);
} else {
// Default gradient is already in the base CSS
$('#target').css('background', 'linear-gradient(135deg, #f9a66c, #f27121)');
}
});
// Target shape selector
$('#target-shape').change(function() {
const shape = $(this).val();
// Remove all shape classes
$('#target').removeClass('target-circle target-square target-triangle target-star');
// Reset any custom styles that might have been applied
$('#target').css({
'clip-path': '',
'border-radius': '',
'width': $('#target-size').val() + 'px',
'height': $('#target-size').val() + 'px',
'border-left': '',
'border-right': '',
'border-bottom': ''
});
// Add selected shape class
if (shape !== 'circle') {
$('#target').addClass('target-' + shape);
// Special handling for triangle
if (shape === 'triangle') {
const size = $('#target-size').val();
const halfSize = size / 2;
$('#target').css({
'border-left': halfSize + 'px solid transparent',
'border-right': halfSize + 'px solid transparent',
'border-bottom': size + 'px solid',
'border-bottom-color': $('#target').css('background-color')
});
}
} else {
// Circle is default
$('#target').css('border-radius', '50%');
}
});
});
ui.js manages the application's user interface state and user feedback.
Key responsibilities:
- State tracking — Tracks the current phase (training → session → heatmap), whether a face is detected, and whether we're ready to collect samples
- Info messages — `showInfo()` displays status messages to guide the user, with optional sound/flash effects
- Event callbacks — Functions like `onWebcamEnabled()`, `onFoundFace()`, `onAddExample()`, and `onFinishTraining()` update the UI when things happen
- Auto-collection — `toggleAutoCollect()` captures samples every 1.5 seconds automatically
- Calibration mode — `startCalibration()` shows 9 points around the screen and collects samples at each position automatically
- Phase switching — `showPhase()` hides/shows the appropriate panel (training, session, or heatmap)
In short: It's the "presenter" layer — it doesn't do ML or face detection itself, but responds to those systems and keeps the user informed of what's happening and what to do next.
window.ui = {
state: 'loading',
phase: 'training', // training, session, heatmap
readyToCollect: false,
nExamples: 0,
nTrainings: 0,
autoCollectMode: false,
autoCollectInterval: null,
calibrationMode: false,
calibrationPoints: [],
currentCalibrationPoint: 0,
calibrationInterval: null,
setContent: function(key, value) {
// Set an element's content based on the data-content key.
$('[data-content="' + key + '"]').html(value);
},
showInfo: function(text, dontFlash) {
// Show info and beep / flash.
this.setContent('info', text);
if (!dontFlash) {
$('#info').addClass('flash');
new Audio('hint.mp3').play();
setTimeout(function() {
$('#info').removeClass('flash');
}, 1000);
}
},
onWebcamEnabled: function() {
this.state = 'waiting for detection';
this.setContent('webcam-status', 'Connected');
$('[data-content="webcam-status"]').removeClass('disconnected').addClass('connected');
this.showInfo("Webcam connected. Press <strong>Start Detection</strong> to begin.", true);
},
onFoundFace: function() {
$('#spinner').addClass('hidden');
$('#eyes').show();
$('#overlay').show();
this.setContent('face-detected', 'Yes');
$('[data-content="face-detected"]').removeClass('not-detected').addClass('detected');
this.readyToCollect = true;
$('#start-calibration').prop('disabled', false);
if (dataset.train.n >= 2) {
$('#start-training').prop('disabled', false);
}
if (this.state == 'waiting for detection') {
this.state = 'collecting';
this.showInfo(
"<h3>Let's start!</h3>" +
'Collect data points by moving your mouse and following the cursor with your eyes and hitting the space key repeatedly.<br><br>' +
'You can also toggle automatic collection mode by pressing "A" on your keyboard.',
true,
);
}
},
onFaceNotFound: function() {
this.setContent('face-detected', 'No');
$('[data-content="face-detected"]').removeClass('detected').addClass('not-detected');
this.readyToCollect = false;
$('#start-calibration').prop('disabled', true);
$('#start-training').prop('disabled', true);
},
toggleAutoCollect: function() {
this.autoCollectMode = !this.autoCollectMode;
if (this.autoCollectMode) {
// Start auto-collection
this.showInfo(
'<h3>Auto-collection enabled</h3>' +
'Move your cursor around and follow it with your eyes. Samples will be collected automatically every 1.5 seconds.<br><br>' +
'Press "A" again to disable auto-collection.',
true
);
this.autoCollectInterval = setInterval(function() {
if (ui.readyToCollect && facetracker.currentPosition) {
dataset.captureExample();
}
}, 1500);
} else {
// Stop auto-collection
clearInterval(this.autoCollectInterval);
this.showInfo(
'<h3>Auto-collection disabled</h3>' +
'Switched back to manual collection. Press space to collect samples.',
true
);
}
},
onAddExample: function(nTrain, nVal) {
// Call this when an example is added.
this.nExamples = nTrain + nVal;
this.setContent('n-train', nTrain);
this.setContent('n-val', nVal);
if (nTrain >= 2) {
$('#start-training').prop('disabled', false);
}
if (this.state == 'collecting' && this.nExamples == 5) {
this.showInfo(
'<h3>Keep going!</h3>' +
'You need to collect at least 20 data points to start seeing results.',
);
}
if (this.state == 'collecting' && this.nExamples == 25) {
this.showInfo(
'<h3>Great job! 👌</h3>' +
"Now that you have a handful of samples, let's train the machine learning model!<br><br> " +
'Hit the <strong>Start Training</strong> button.',
);
}
if (this.state == 'trained' && this.nExamples == 50) {
this.showInfo(
'<h3>Fantastic! 👏</h3>' +
"You've collected lots of data points. Let's try training our model again!",
);
}
if (nTrain > 0 && nVal > 0) {
$('#store-data').prop('disabled', false);
}
},
onFinishTraining: function() {
// Call this when training is finished.
this.nTrainings += 1;
$('#target').css('opacity', '0.9');
$('#session-start-container').show();
$('#start-session').prop('disabled', false);
$('#customize-target').prop('disabled', false);
$('#reset-model').prop('disabled', false);
$('#store-model').prop('disabled', false);
$('#training-progress').hide();
if (this.nTrainings == 1) {
this.state = 'trained';
this.showInfo(
'<h3>Awesome!</h3>' +
'The model has been trained. Click the <strong>Start Session</strong> button at the bottom of the screen to begin eye tracking.<br><br>' +
"You can continue to collect more data and retrain the model to improve accuracy.",
);
} else if (this.nTrainings == 2) {
this.state = 'trained_twice';
this.showInfo(
'<h3>Getting better! 🚀</h3>' +
'Keep collecting and retraining!<br>' +
'You can also draw a heatmap that shows you where your ' +
'model has its strong and weak points.',
);
} else if (this.nTrainings == 3) {
this.state = 'trained_thrice';
this.showInfo(
'If your model is overfitting, remember you can reset it anytime.',
);
} else if (this.nTrainings == 4) {
this.state = 'trained_thrice';
this.showInfo(
'<h3>Have fun!</h3>' +
'Check this space for more! 😄',
);
}
},
showPhase: function(phase) {
this.phase = phase;
$('#training-phase').addClass('hidden');
$('#session-phase').addClass('hidden');
$('#heatmap-phase').addClass('hidden');
$('#' + phase + '-phase').removeClass('hidden');
},
initSessionControls: function() {
$('#start-session').click(() => {
this.showPhase('session');
this.showInfo(
'<h3>Session Started</h3>' +
'Click <strong>Start Tracking</strong> to see the model in action!',
true
);
});
$('#new-session').click(() => {
this.showPhase('session');
heatmap.clearHeatmap();
});
$('#retrain-model').click(() => {
this.showPhase('training');
heatmap.clearHeatmap();
});
},
showTrainingProgress: function(epoch, totalEpochs, loss, valLoss) {
if (!$('#training-progress').length) {
$('body').append('<div id="training-progress"></div>');
$('#training-progress').css({
position: 'fixed',
bottom: '60px',
left: '50%',
transform: 'translateX(-50%)',
background: 'white',
padding: '15px',
borderRadius: '10px',
boxShadow: '0 4px 24px rgba(0, 0, 0, 0.1)',
zIndex: 1000,
textAlign: 'center',
width: '300px'
});
}
const percent = Math.round((epoch / totalEpochs) * 100);
$('#training-progress').html(`
<div>Training Progress: ${epoch}/${totalEpochs} epochs (${percent}%)</div>
<div style="background: #f0f0f0; height: 10px; border-radius: 5px; margin: 10px 0;">
<div style="background: linear-gradient(135deg, #f9a66c, #f27121); width: ${percent}%; height: 100%; border-radius: 5px;"></div>
</div>
<div>Loss: ${loss.toFixed(5)} | Validation Loss: ${valLoss.toFixed(5)}</div>
`);
$('#training-progress').show();
},
displayHelp: function() {
const helpContent =
'<button id="close-help" class="icon-button">×</button>' +
'<h3>Keyboard Shortcuts</h3>' +
'<ul style="list-style-type: none; padding-left: 0;">' +
'<li><strong>Space</strong> - Capture training sample</li>' +
'<li><strong>A</strong> - Toggle automatic data collection</li>' +
'<li><strong>C</strong> - Start calibration mode</li>' +
'<li><strong>T</strong> - Start training (when enabled)</li>' +
'<li><strong>H</strong> - Toggle heatmap (when enabled)</li>' +
'<li><strong>R</strong> - Reset model (when enabled)</li>' +
'<li><strong>?</strong> - Show this help</li>' +
'</ul>' +
'<h3>Features</h3>' +
'<ul style="list-style-type: none; padding-left: 0;">' +
'<li><strong>Calibration</strong> - Guided data collection at specific points</li>' +
'<li><strong>Auto-collection</strong> - Automatically collect samples while you look around</li>' +
'<li><strong>Target Customization</strong> - Change the size, color, and shape of the target</li>' +
'<li><strong>Heatmap</strong> - Visualize model accuracy across the screen</li>' +
'</ul>';
$('#help-modal').html(helpContent);
$('#modal-overlay, #help-modal').removeClass('hidden');
$('#close-help').click(function() {
ui.hideHelp();
});
},
hideHelp: function() {
$('#modal-overlay, #help-modal').addClass('hidden');
},
startCalibration: function() {
if (!this.readyToCollect || this.calibrationMode) {
return;
}
// Stop auto collection if it's running
if (this.autoCollectMode) {
this.toggleAutoCollect();
}
this.calibrationMode = true;
// Define calibration points (9-point calibration)
const width = $('body').width();
const height = $('body').height();
const padding = 100; // Padding from edges
this.calibrationPoints = [
{ x: padding, y: padding }, // Top-left
{ x: width / 2, y: padding }, // Top-center
{ x: width - padding, y: padding }, // Top-right
{ x: padding, y: height / 2 }, // Middle-left
{ x: width / 2, y: height / 2 }, // Center
{ x: width - padding, y: height / 2 }, // Middle-right
{ x: padding, y: height - padding }, // Bottom-left
{ x: width / 2, y: height - padding }, // Bottom-center
{ x: width - padding, y: height - padding } // Bottom-right
];
this.currentCalibrationPoint = 0;
// Create calibration target if it doesn't exist
if (!$('#calibration-target').length) {
$('body').append('<div id="calibration-target"></div>');
$('#calibration-target').css({
position: 'absolute',
width: '20px',
height: '20px',
borderRadius: '50%',
background: 'radial-gradient(circle, rgba(249,166,108,1) 0%, rgba(242,113,33,1) 70%)',
border: '2px solid white',
boxShadow: '0 0 10px rgba(0,0,0,0.2)',
transform: 'translate(-50%, -50%)',
zIndex: 1000,
display: 'none'
});
}
// Create calibration instructions
if (!$('#calibration-instructions').length) {
$('body').append('<div id="calibration-instructions"></div>');
$('#calibration-instructions').css({
position: 'fixed',
bottom: '120px',
left: '50%',
transform: 'translateX(-50%)',
background: 'white',
padding: '15px',
borderRadius: '10px',
boxShadow: '0 4px 24px rgba(0,0,0,0.1)',
zIndex: 1000,
textAlign: 'center',
width: '400px',
fontSize: '16px'
});
}
this.showInfo(
'<h3>Calibration Mode</h3>' +
'Follow the orange dot with your eyes as it moves around the screen.<br><br>' +
'The system will automatically collect samples at each position.',
true
);
// Show first calibration point
this.showCalibrationPoint();
// Start calibration sequence
this.calibrationInterval = setInterval(() => {
// Collect sample at current point
if (facetracker.currentPosition) {
// Manually set mouse position to current calibration point
const point = this.calibrationPoints[this.currentCalibrationPoint];
mouse.mousePosX = point.x / $('body').width();
mouse.mousePosY = point.y / $('body').height();
// Capture example
dataset.captureExample();
// Move to next point
this.currentCalibrationPoint++;
// Update progress
$('#calibration-instructions').html(
`Calibration progress: ${this.currentCalibrationPoint} of ${this.calibrationPoints.length} points`
);
// Check if calibration is complete
if (this.currentCalibrationPoint >= this.calibrationPoints.length) {
this.stopCalibration();
return;
}
// Show next point
this.showCalibrationPoint();
}
}, 2000); // 2 seconds per point
},
showCalibrationPoint: function() {
const point = this.calibrationPoints[this.currentCalibrationPoint];
$('#calibration-target').css({
left: point.x + 'px',
top: point.y + 'px',
display: 'block'
});
// Animate the target to draw attention
$('#calibration-target').animate({
width: '30px',
height: '30px'
}, 500, function() {
$(this).animate({
width: '20px',
height: '20px'
}, 500);
});
},
stopCalibration: function() {
clearInterval(this.calibrationInterval);
this.calibrationMode = false;
// Hide calibration elements
$('#calibration-target').hide();
$('#calibration-instructions').hide();
this.showInfo(
'<h3>Calibration Complete!</h3>' +
`Collected ${this.calibrationPoints.length} calibration points.<br><br>` +
'Now you can train the model by clicking the "Start Training" button.',
true
);
// Enable training if we have enough samples
if (dataset.train.n >= 2) {
$('#start-training').prop('disabled', false);
}
}
};
$(document).ready(function() {
ui.initSessionControls();
});
mouse.js tracks the cursor position as ground truth labels for training.
- Stores normalized (0-1) coordinates in `mousePosX` and `mousePosY`
- Updates on every mouse move event by dividing the pixel position by the window dimensions
- `getMousePos()` returns `[x, y]` for use by dataset.js when capturing training samples
It's the simplest file — just records where the mouse is so the model learns to map eye images → screen position.
$(document).ready(function() {
window.mouse = {
mousePosX: 0.5,
mousePosY: 0.5,
handleMouseMove: function(event) {
mouse.mousePosX = event.clientX / $('body').width();
mouse.mousePosY = event.clientY / $('body').height();
},
getMousePos: function() {
return [mouse.mousePosX, mouse.mousePosY];
},
};
document.onmousemove = mouse.handleMouseMove;
});
facetracker.js handles face detection and eye tracking using MediaPipe FaceMesh.
Key parts:
- Webcam setup — Requests camera access, adjusts video proportions, handles success/failure
- FaceMesh initialization — Loads MediaPipe's FaceMesh model with iris tracking enabled (`refineLandmarks: true`)
- `onResults()` callback — Called each frame with the detected face landmarks:
  - Draws the mesh overlay (tesselation, face oval, lips, eyes)
  - Calculates iris centers from landmarks 468-472 (left) and 473-477 (right)
  - Computes a bounding box around both eyes
  - Crops the eye region from the video and draws it to the `#eyes` canvas
- Stores state — `currentPosition` (all 478 landmarks) and `currentEyeRect` (eye crop coordinates) are used by dataset.js for training samples
The eye crop in the #eyes canvas becomes the input image for the CNN model.
// ==============================================
// facetracker.js (MediaPipe FaceMesh version)
// ==============================================
$(document).ready(function () {
const video = document.getElementById("webcam");
const overlay = document.getElementById("overlay");
const eyesCanvas = document.getElementById("eyes");
const eyesCtx = eyesCanvas.getContext("2d");
window.facetracker = {
video,
overlay,
overlayCC: overlay.getContext("2d"),
videoWidthExternal: video.width,
videoHeightExternal: video.height,
videoWidthInternal: video.videoWidth,
videoHeightInternal: video.videoHeight,
trackingStarted: false,
currentPosition: null,
currentEyeRect: null,
adjustVideoProportions: function () {
facetracker.videoWidthInternal = video.videoWidth;
facetracker.videoHeightInternal = video.videoHeight;
const proportion =
facetracker.videoWidthInternal / facetracker.videoHeightInternal;
facetracker.videoWidthExternal = Math.round(
facetracker.videoHeightExternal * proportion
);
facetracker.video.width = facetracker.videoWidthExternal;
facetracker.overlay.width = facetracker.videoWidthExternal;
},
gumSuccess: function (stream) {
ui.onWebcamEnabled();
if ("srcObject" in facetracker.video) {
facetracker.video.srcObject = stream;
} else {
facetracker.video.src = window.URL.createObjectURL(stream);
}
facetracker.video.onloadedmetadata = function () {
facetracker.adjustVideoProportions();
facetracker.video.play();
};
facetracker.video.onresize = function () {
facetracker.adjustVideoProportions();
};
},
startDetection: function() {
$('#spinner').removeClass('hidden');
initFaceMesh();
},
gumFail: function () {
ui.setContent('webcam-status', 'Disconnected');
$('[data-content="webcam-status"]').removeClass('connected').addClass('disconnected');
ui.showInfo(
"There was some problem trying to fetch video from your webcam 😭",
true
);
},
};
// =====================================================
// NEW: MediaPipe FaceMesh integration
// =====================================================
let faceMesh;
let camera;
async function initFaceMesh() {
faceMesh = new FaceMesh({
locateFile: (file) =>
`https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`,
});
faceMesh.setOptions({
maxNumFaces: 1,
refineLandmarks: true, // enables iris landmarks
minDetectionConfidence: 0.5,
minTrackingConfidence: 0.5,
});
faceMesh.onResults(onResults);
camera = new Camera(video, {
onFrame: async () => {
await faceMesh.send({ image: video });
},
width: 640,
height: 480,
});
camera.start();
facetracker.trackingStarted = true;
}
function onResults(results) {
const ctx = facetracker.overlayCC;
ctx.clearRect(
0,
0,
facetracker.videoWidthExternal,
facetracker.videoHeightExternal
);
if (!results.multiFaceLandmarks || results.multiFaceLandmarks.length === 0) {
ui.onFaceNotFound();
facetracker.currentPosition = null;
return;
}
const landmarks = results.multiFaceLandmarks[0];
facetracker.currentPosition = landmarks;
// elegant, balanced mesh lines — visible but not harsh
drawConnectors(ctx, landmarks, FACEMESH_TESSELATION,
{ color: 'rgba(255, 255, 255, 0.4)', lineWidth: 0.4 });
// enhance facial structure subtly
drawConnectors(ctx, landmarks, FACEMESH_FACE_OVAL,
{ color: 'rgba(255, 255, 255, 0.5)', lineWidth: 0.5 });
drawConnectors(ctx, landmarks, FACEMESH_LIPS,
{ color: 'rgba(255, 255, 255, 0.25)', lineWidth: 0.4 });
// make eyes pop clearly without oversaturation
drawConnectors(ctx, landmarks, FACEMESH_LEFT_EYE,
{ color: 'rgba(0, 255, 100, 0.6)', lineWidth: 0.9 });
drawConnectors(ctx, landmarks, FACEMESH_RIGHT_EYE,
{ color: 'rgba(255, 80, 80, 0.6)', lineWidth: 0.9 });
// Get iris centers (more stable for gaze)
const LEFT_IRIS = [468, 469, 470, 471, 472];
const RIGHT_IRIS = [473, 474, 475, 476, 477];
function irisCenter(indices) {
let x = 0,
y = 0;
for (const i of indices) {
x += landmarks[i].x;
y += landmarks[i].y;
}
return { x: x / indices.length, y: y / indices.length };
}
const left = irisCenter(LEFT_IRIS);
const right = irisCenter(RIGHT_IRIS);
const eyeCenterX = (left.x + right.x) / 2;
const eyeCenterY = (left.y + right.y) / 2;
const eyeWidth = Math.abs(right.x - left.x) * video.videoWidth * 1.5;
const eyeHeight = eyeWidth * 0.6;
const cropX = eyeCenterX * video.videoWidth - eyeWidth / 2;
const cropY = eyeCenterY * video.videoHeight - eyeHeight / 2;
facetracker.currentEyeRect = [cropX, cropY, eyeWidth, eyeHeight];
// Draw red bounding box on overlay
// ctx.strokeStyle = "red";
// ctx.lineWidth = 2;
// ctx.strokeRect(cropX, cropY, eyeWidth, eyeHeight);
// Draw eye crop into #eyes canvas
eyesCtx.drawImage(
video,
cropX,
cropY,
eyeWidth,
eyeHeight,
0,
0,
eyesCanvas.width,
eyesCanvas.height
);
ui.onFoundFace();
}
// =====================================================
// Video setup (same as before)
// =====================================================
if (navigator.mediaDevices) {
navigator.mediaDevices
.getUserMedia({ video: true })
.then(facetracker.gumSuccess)
.catch(facetracker.gumFail);
} else if (navigator.getUserMedia) {
navigator.getUserMedia(
{ video: true },
(stream) => {
facetracker.gumSuccess(stream);
},
facetracker.gumFail
);
} else {
ui.showInfo(
"Your browser does not seem to support getUserMedia. 😭 This will probably only work in Chrome or Firefox.",
true
);
}
});
training.js builds and trains the CNN model that predicts gaze position.
Key parts:
- `createModel()` — Builds a TensorFlow.js model:
  - Takes two inputs: the eye image (25×55×3) + metadata (4 values: eye-rectangle position and size)
  - Conv2D (5×5, 20 filters) → MaxPool → Flatten → Dropout → Concatenate with metadata → Dense output (2 units for x, y)
  - Uses `tanh` activation (outputs -1 to 1)
- `fitModel()` — Trains for 20 epochs with the Adam optimizer and MSE loss. Saves the best model (by validation loss) to localStorage and restores it at the end.
- `getPrediction()` — Captures the current eye image, runs it through the model, and returns the predicted (x, y) screen coordinates (shifted from [-0.5, 0.5] to [0, 1])
- `resetModel()` — Clears the model to start fresh
window.training = {
currentModel: null,
inTraining: false,
epochsTrained: 0,
createModel: function() {
const inputImage = tf.input({
name: 'image',
shape: [dataset.inputHeight, dataset.inputWidth, 3],
});
const inputMeta = tf.input({
name: 'metaInfos',
shape: [4],
});
const conv = tf.layers
.conv2d({
kernelSize: 5,
filters: 20,
strides: 1,
activation: 'relu',
kernelInitializer: 'varianceScaling',
})
.apply(inputImage);
const maxpool = tf.layers
.maxPooling2d({
poolSize: [2, 2],
strides: [2, 2],
})
.apply(conv);
const flat = tf.layers.flatten().apply(maxpool);
const dropout = tf.layers.dropout({ rate: 0.2 }).apply(flat); // dropout expects a config object with a rate
const concat = tf.layers.concatenate().apply([dropout, inputMeta]);
const output = tf.layers
.dense({
units: 2,
activation: 'tanh',
kernelInitializer: 'varianceScaling',
})
.apply(concat);
const model = tf.model({
inputs: [inputImage, inputMeta],
outputs: output,
});
return model;
},
fitModel: function() {
this.inTraining = true;
const epochs = 20; // Increased from 10 to 20 for better training
let batchSize = Math.floor(dataset.train.n * 0.1);
batchSize = Math.max(2, Math.min(batchSize, 64));
$('#start-training').prop('disabled', true);
$('#start-training').html('In Progress...');
if (training.currentModel == null) {
training.currentModel = training.createModel();
}
console.info('Training on', dataset.train.n, 'samples');
ui.state = 'training';
let bestEpoch = -1;
let bestTrainLoss = Number.MAX_SAFE_INTEGER;
let bestValLoss = Number.MAX_SAFE_INTEGER;
const bestModelPath = 'localstorage://best-model';
training.currentModel.compile({
optimizer: tf.train.adam(0.0005),
loss: 'meanSquaredError',
});
training.currentModel.fit(dataset.train.x, dataset.train.y, {
batchSize: batchSize,
epochs: epochs,
shuffle: true,
validationData: [dataset.val.x, dataset.val.y],
callbacks: {
onEpochBegin: async function(epoch) {
// Show progress at the beginning of each epoch
ui.showTrainingProgress(epoch, epochs,
epoch > 0 ? bestTrainLoss : 0,
epoch > 0 ? bestValLoss : 0);
},
onEpochEnd: async function(epoch, logs) {
console.info('Epoch', epoch, 'losses:', logs);
training.epochsTrained += 1;
ui.setContent('n-epochs', training.epochsTrained);
ui.setContent('train-loss', (100 * (1 - logs.loss)).toFixed(2) + '%');
ui.setContent('val-loss', (100 * (1 - logs.val_loss)).toFixed(2) + '%');
// Update progress bar
ui.showTrainingProgress(epoch + 1, epochs, logs.loss, logs.val_loss);
if (logs.val_loss < bestValLoss) {
// Save model
bestEpoch = epoch;
bestTrainLoss = logs.loss;
bestValLoss = logs.val_loss;
// Store best model:
await training.currentModel.save(bestModelPath);
}
return await tf.nextFrame();
},
onTrainEnd: async function() {
console.info('Finished training');
// Load best model:
training.epochsTrained -= epochs - bestEpoch;
console.info('Loading best epoch:', training.epochsTrained);
ui.setContent('n-epochs', training.epochsTrained);
ui.setContent('train-loss', (100 * (1 - bestTrainLoss)).toFixed(2) + '%');
ui.setContent('val-loss', (100 * (1 - bestValLoss)).toFixed(2) + '%');
training.currentModel = await tf.loadLayersModel(bestModelPath);
$('#start-training').prop('disabled', false);
$('#start-training').html('Start Training');
training.inTraining = false;
ui.onFinishTraining();
},
},
});
},
resetModel: function() {
$('#reset-model').prop('disabled', true);
training.currentModel = null;
training.epochsTrained = 0;
ui.setContent('n-epochs', training.epochsTrained);
ui.setContent('train-loss', '?');
ui.setContent('val-loss', '?');
$('#reset-model').prop('disabled', false);
},
getPrediction: async function() {
// Return relative x, y where we expect the user to look right now.
const rawImg = dataset.getImage();
const img = await dataset.convertImage(rawImg);
const metaInfos = dataset.getMetaInfos();
const prediction = training.currentModel.predict([img, metaInfos]);
const predictionData = await prediction.data();
tf.dispose([img, metaInfos, prediction]);
return [predictionData[0] + 0.5, predictionData[1] + 0.5];
},
drawSingleFilter: function(weights, filterId, canvas) {
const canvasCtx = canvas.getContext('2d');
const kernelSize = weights.shape[0];
const pixelSize = canvas.width / kernelSize;
let x, y;
let min = 10000;
let max = -10000;
let value;
// First, find min and max:
for (x = 0; x < kernelSize; x++) {
for (y = 0; y < kernelSize; y++) {
value = weights.arraySync()[x][y][0][filterId];
if (value < min) min = value;
if (value > max) max = value;
}
}
for (x = 0; x < kernelSize; x++) {
for (y = 0; y < kernelSize; y++) {
value = weights.arraySync()[x][y][0][filterId];
value = ((value - min) / (max - min)) * 255;
canvasCtx.fillStyle = 'rgb(' + value + ',' + value + ',' + value + ')';
canvasCtx.fillRect(x * pixelSize, y * pixelSize, pixelSize, pixelSize);
}
}
},
visualizePixels: function(canvas) {
const model = training.currentModel;
const convLayer = model.layers[1];
const weights = convLayer.weights[0].read();
const bias = convLayer.weights[1].read();
const filterId = 1;
training.drawSingleFilter(weights, filterId, canvas);
},
};
dataset.js manages training data collection and storage.
Key parts:
- `getImage()` — Captures the eye canvas as a tensor and normalizes pixel values to [-1, 1]
- `getMetaInfos()` — Extracts metadata: eye rectangle center (x, y) and size, normalized relative to the video dimensions
- `convertImage()` — Converts RGB to grayscale (with gamma correction) and adds spatial coordinates as extra channels (3 channels total)
- `captureExample()` — Main entry point: grabs the current eye image + mouse position, calls `addExample()`
- `addExample()` / `addToDataset()` — Randomly assign samples to the train (80%) or val (20%) set and concatenate tensors
- `toJSON()` / `fromJSON()` — Serialization for saving/loading datasets
The train and val objects store x (image + metadata tensors) and y (target coordinates) for model training.
window.dataset = {
inputWidth: $('#eyes').width(),
inputHeight: $('#eyes').height(),
train: {
n: 0,
x: null,
y: null,
},
val: {
n: 0,
x: null,
y: null,
},
session: {
n: 0,
x: [],
y: [],
},
clearSession: function() {
this.session = {
n: 0,
x: [],
y: [],
};
},
getImage: function() {
// Capture the current image in the eyes canvas as a tensor.
return tf.tidy(function() {
const image = tf.browser.fromPixels(document.getElementById('eyes'));
const batchedImage = image.expandDims(0);
return batchedImage
.toFloat()
.div(tf.scalar(127))
.sub(tf.scalar(1));
});
},
getMetaInfos: function(mirror) {
// Get some meta info about the rectangle as a tensor:
// - middle x, y of the eye rectangle, relative to video size
// - size of eye rectangle, relative to video size
// - angle of rectangle (TODO)
let x = facetracker.currentEyeRect[0] + facetracker.currentEyeRect[2] / 2;
let y = facetracker.currentEyeRect[1] + facetracker.currentEyeRect[3] / 2;
x = (x / facetracker.videoWidthExternal) * 2 - 1;
y = (y / facetracker.videoHeightExternal) * 2 - 1;
const rectWidth =
facetracker.currentEyeRect[2] / facetracker.videoWidthExternal;
const rectHeight =
facetracker.currentEyeRect[3] / facetracker.videoHeightExternal;
if (mirror) {
x = 1 - x;
y = 1 - y;
}
return tf.tidy(function() {
return tf.tensor1d([x, y, rectWidth, rectHeight]).expandDims(0);
});
},
whichDataset: function() {
// Returns 'train' or 'val' depending on what makes sense / is random.
if (dataset.train.n == 0) {
return 'train';
}
if (dataset.val.n == 0) {
return 'val';
}
return Math.random() < 0.2 ? 'val' : 'train';
},
rgbToGrayscale(imageArray, n, x, y) {
// Given an rgb array and positions, returns a grayscale value.
// Inspired by http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0029740
let r = (imageArray[n][x][y][0] + 1) / 2;
let g = (imageArray[n][x][y][1] + 1) / 2;
let b = (imageArray[n][x][y][2] + 1) / 2;
// Gamma correction:
const exponent = 1 / 2.2;
r = Math.pow(r, exponent);
g = Math.pow(g, exponent);
b = Math.pow(b, exponent);
// Gleam:
const gleam = (r + g + b) / 3;
return gleam * 2 - 1;
},
convertImage: async function(image) {
// Convert to grayscale and add spatial info
const imageShape = image.shape;
const imageArray = await image.array();
const w = imageShape[1];
const h = imageShape[2];
const data = [new Array(w)];
const promises = [];
for (let x = 0; x < w; x++) {
data[0][x] = new Array(h);
for (let y = 0; y < h; y++) {
const grayValue = dataset.rgbToGrayscale(imageArray, 0, x, y);
data[0][x][y] = [grayValue, (x / w) * 2 - 1, (y / h) * 2 - 1];
}
}
await Promise.all(promises);
return tf.tensor(data);
},
addToDataset: function(image, metaInfos, target, key) {
// Add the given x, y to either 'train' or 'val'.
const set = dataset[key];
if (set.x == null) {
set.x = [tf.keep(image), tf.keep(metaInfos)];
set.y = tf.keep(target);
} else {
const oldImage = set.x[0];
set.x[0] = tf.keep(oldImage.concat(image, 0));
const oldEyePos = set.x[1];
set.x[1] = tf.keep(oldEyePos.concat(metaInfos, 0));
const oldY = set.y;
set.y = tf.keep(oldY.concat(target, 0));
tf.dispose([oldImage, oldEyePos, oldY, target]);
}
set.n += 1;
},
addExample: async function(image, metaInfos, target, dontDispose) {
// Given an image, eye pos and target coordinates, adds them to our dataset.
target[0] = target[0] - 0.5;
target[1] = target[1] - 0.5;
target = tf.keep(
tf.tidy(function() {
return tf.tensor1d(target).expandDims(0);
}),
);
const key = dataset.whichDataset();
const convertedImage = await dataset.convertImage(image);
dataset.addToDataset(convertedImage, metaInfos, target, key);
ui.onAddExample(dataset.train.n, dataset.val.n);
if (!dontDispose) {
tf.dispose([image, metaInfos]); // dispose takes a single container, so wrap both tensors in an array
}
},
captureExample: function() {
// Take the latest image from the eyes canvas and add it to our dataset.
// Takes the coordinates of the mouse.
tf.tidy(function() {
const img = dataset.getImage();
const mousePos = mouse.getMousePos();
const metaInfos = tf.keep(dataset.getMetaInfos());
dataset.addExample(img, metaInfos, mousePos);
});
},
toJSON: function() {
const tensorToArray = function(t) {
const typedArray = t.dataSync();
return Array.prototype.slice.call(typedArray);
};
return {
inputWidth: dataset.inputWidth,
inputHeight: dataset.inputHeight,
train: {
shapes: {
x0: dataset.train.x[0].shape,
x1: dataset.train.x[1].shape,
y: dataset.train.y.shape,
},
n: dataset.train.n,
x: dataset.train.x && [
tensorToArray(dataset.train.x[0]),
tensorToArray(dataset.train.x[1]),
],
y: tensorToArray(dataset.train.y),
},
val: {
shapes: {
x0: dataset.val.x[0].shape,
x1: dataset.val.x[1].shape,
y: dataset.val.y.shape,
},
n: dataset.val.n,
x: dataset.val.x && [
tensorToArray(dataset.val.x[0]),
tensorToArray(dataset.val.x[1]),
],
y: tensorToArray(dataset.val.y),
},
};
},
fromJSON: function(data) {
dataset.inputWidth = data.inputWidth;
dataset.inputHeight = data.inputHeight;
dataset.train.n = data.train.n;
dataset.train.x = data.train.x && [
tf.tensor(data.train.x[0], data.train.shapes.x0),
tf.tensor(data.train.x[1], data.train.shapes.x1),
];
dataset.train.y = tf.tensor(data.train.y, data.train.shapes.y);
dataset.val.n = data.val.n;
dataset.val.x = data.val.x && [
tf.tensor(data.val.x[0], data.val.shapes.x0),
tf.tensor(data.val.x[1], data.val.shapes.x1),
];
dataset.val.y = tf.tensor(data.val.y, data.val.shapes.y);
ui.onAddExample(dataset.train.n, dataset.val.n);
},
};
globals.js is a small utility file that provides:
- Video codec detection — Functions to check whether the browser supports video playback, H.264, and WebM formats
- getUserMedia polyfill — Normalizes the `navigator.getUserMedia` API across different browsers (webkit, moz, ms prefixes)
- URL polyfill — Normalizes `window.URL` across browsers
It's essentially a compatibility layer ensuring the webcam access works across older browsers.
// video support utility functions
window.supports_video = function() {
return !!document.createElement('video').canPlayType;
};
window.supports_h264_baseline_video = function() {
if (!supports_video()) {
return false;
}
const v = document.createElement('video');
return v.canPlayType('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
};
window.supports_webm_video = function() {
if (!supports_video()) {
return false;
}
const v = document.createElement('video');
return v.canPlayType('video/webm; codecs="vp8"');
};
navigator.getUserMedia =
navigator.getUserMedia ||
navigator.webkitGetUserMedia ||
navigator.mozGetUserMedia ||
navigator.msGetUserMedia;
window.URL = window.URL || window.webkitURL || window.msURL || window.mozURL;