Try an interactive version of this dialog: Sign up at solve.it.com, click Upload, and pass this URL.

Note: 4

Intro

Note: 327

This is a series of blog posts aimed at showcasing how to build with in-browser AI models and explaining a bit about the underlying technology.

Nearly everyone has a mobile phone, and those devices carry multiple sensors:

  • Cameras
  • Microphones
  • GPS
  • Accelerometers
  • Gyroscopes
  • Proximity sensors

all great for building useful apps.

There are other benefits to running AI on-device:

  • Sending data to the cloud isn't always desirable. Privacy is important
  • We want low latency
  • We also want our software to work when we're not connected to the internet

There are also problems with building for mobile phones (specifically Android and Apple's iOS):

  • The app stores are closed systems (it's how they make money)
  • This introduces restrictions compared to developing for Windows, Linux, or macOS (see the restrictions section for more information)

The website below isn't AI, but it shows the raw data available on your device (accelerometer, gyroscope, GPS, etc.) that you can feed into an AI model through HTML5 APIs. Try it.
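For example, here is a minimal sketch of reading those sensors with standard browser APIs (a sketch only: it assumes a secure HTTPS context and that the user grants permission, and some platforms such as iOS additionally require an explicit permission request for motion events):

// Accelerometer readings via the DeviceMotion API
window.addEventListener('devicemotion', (e) => {
  const a = e.accelerationIncludingGravity;
  if (a) console.log('accelerometer:', a.x, a.y, a.z);
});

// Current GPS position via the Geolocation API
navigator.geolocation?.getCurrentPosition(
  (pos) => console.log('GPS:', pos.coords.latitude, pos.coords.longitude),
  (err) => console.warn('geolocation error:', err.message)
);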

Note: 130 ()

The underlying tech

  • GPU execution from within the web browser vs CPU (see the sketch after the links below)

Links to useful resources

Link to WebGPU

Link to TensorFlow.js

Link to MediaPipe FaceMesh
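As a quick illustration of the GPU-vs-CPU point above, here is a minimal sketch you could run in the browser console once the TensorFlow.js CDN script (loaded later in this page) is present. It only reports what the browser exposes and which backend TF.js ends up using; other backends (such as a true WebGPU backend) need extra packages:

// Does this browser expose the WebGPU API at all?
console.log('WebGPU available:', 'gpu' in navigator);

// Ask TF.js for its GPU-accelerated WebGL backend, then report what it actually uses
tf.setBackend('webgl')
  .then(() => tf.ready())
  .then(() => console.log('TF.js backend:', tf.getBackend())); // 'webgl', or 'cpu' as fallback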

Note: 4

The aim

Note: 180

The aim of this dialogue is to show HUE Vision and the underlying technologies it uses.

The application: HUE Vision is a web-based application that brings real-time eye tracking and gaze prediction directly into the browser, powered by TensorFlow.js and MediaPipe FaceMesh. It showcases how on-device (mobile phone and web browser) computer vision and machine learning can work together for intuitive, privacy-friendly gaze interaction.

This SolveIt dialogue runs the application using FastHTML.

Note: 331 ()

Why it matters: Mobile phones and browsers have sensors (camera, mic, GPS, etc.) that can feed into AI models. Running AI on-device gives you privacy, low latency, and offline capability.

Overview of the architecture:

  • Python/FastHTML — Serves the HTML shell and static files
  • JavaScript — Does all the heavy lifting:
File             Role
globals.js       Browser compatibility polyfills
facetracker.js   MediaPipe FaceMesh detects face & extracts eye region
mouse.js         Tracks cursor position (ground truth labels)
dataset.js       Captures eye images + mouse positions as training samples
training.js      Builds & trains a CNN model (Conv2D → Dense)
ui.js            Manages UI state, calibration mode, user feedback
heat.js          Draws heatmap of where you looked
main.js          Glue code — wires buttons, keyboard shortcuts, tracking loop

Note: 4 ()

The code

Note: 96

The original version used .py files. To be able to break it down and explain it, I converted it to FastHTML so I could use it within SolveIt (a Jupyter-like notebook experience on steroids).

View the original HUE Vision GitHub repository here

Note: 6

Imports and setup

Code: 34 ()

from fasthtml.common import *
from fasthtml.jupyter import *
from monsterui.all import *
import os

Code: 6 ()

from pathlib import Path

Code: 33 ()

app = FastHTML(static_path='.',)
rt = app.route
srv = JupyUvi(app)

Code: 48 ()

from starlette.staticfiles import StaticFiles
from starlette.responses import FileResponse

app.mount("/js", StaticFiles(directory="js"), name="js")

Code: 24 ()

@rt('/style.css')
def get(): return FileResponse('style.css')

Note: 12 ()

The route (index page)

Note: 279

  1. Server setup — Creates a FastHTML app and starts it with JupyUvi (this will fail if port 8000 is already in use, e.g. from a previous run)

  2. Static files — Mounts CSS and JS files so the browser can access them

  3. Main route (/) — Returns a full HTML page with:

    • A Training Panel for calibrating the eye tracker (collecting data points, training the model)
    • A Session Panel for running the tracker
    • A Heatmap Panel for visualizing where you looked
    • Video/canvas elements for the webcam feed
    • External scripts for TensorFlow.js, MediaPipe face mesh, and custom JS modules

The actual eye tracking logic lives in the JS files (facetracker.js, training.js, etc.) — the Python just serves the HTML shell.

Code: 2,611 ()

@rt('/')
def get(): 
    return Html(
        Head(
            Meta(charset='utf-8'),
            Meta(http_equiv='x-ua-compatible', content='ie=edge'),
            Title('HUE Vision'),
            Meta(name='description', content='A web-based application for eye tracking using facial recognition and machine learning.'),
            Meta(name='viewport', content='width=device-width, initial-scale=1, shrink-to-fit=no'),
            Link(rel='favicon', type='image/ico', href='favicon.ico'),
            Link(href='https://fonts.googleapis.com/css?family=Roboto|Source+Code+Pro', rel='stylesheet'),
            Link(rel='preconnect', href='https://fonts.gstatic.com'),
            Link(href='https://fonts.googleapis.com/css2?family=KoHo&display=swap', rel='stylesheet'),
            Link(rel='stylesheet', href='https://cdnjs.cloudflare.com/ajax/libs/normalize/8.0.1/normalize.min.css'),
            Link(rel='stylesheet', href='https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css'),
            Link(rel='stylesheet', href='style.css')
        ),
        Body(
            Div(id='modal-overlay', cls='hidden'),
            Div(id='help-modal', cls='hidden'),
            Canvas(id='heatMap'),
            Div(
                H3('Hello!'),
                'Welcome to',
                Strong('HUE Vision'),
                '👀',
                Br(),
                'To continue, Please grant access to your webcam. 📷',
                id='info',
                data_content='info'
            ),
            Div(
                Div(
                    Div(
                        Div(
                            H3('Training Panel'),
                            Div(
                                Button('-', id='toggle-panel-button', title='Collapse Panel', cls='icon-button'),
                                Button('?', id='help-button', title='Show Help', cls='icon-button'),
                                cls='panel-header-buttons'
                            ),
                            cls='panel-header'
                        ),
                        Div(
                            Div(
                                Table(
                                    Tr(
                                        Td('Webcam'),
                                        Td('Disconnected', data_content='webcam-status')
                                    ),
                                    Tr(
                                        Td('Face Detected'),
                                        Td('No', data_content='face-detected')
                                    )
                                ),
                                cls='status-panel'
                            ),
                            Div(
                                Table(
                                    Tr(
                                        Td('Data Points Collected'),
                                        Td('0', data_content='n-train')
                                    ),
                                    Tr(
                                        Td('Test Points Collected'),
                                        Td('0', data_content='n-val')
                                    ),
                                    Tr(
                                        Td('Training Cycles Completed'),
                                        Td('0', data_content='n-epochs')
                                    ),
                                    Tr(
                                        Td('Model Accuracy'),
                                        Td('?', data_content='train-loss')
                                    ),
                                    Tr(
                                        Td('Test Accuracy'),
                                        Td('?', data_content='val-loss')
                                    )
                                ),
                                cls='training-data-panel'
                            ),
                            Div(
                                Div(
                                    Div(
                                        Button('Start Detection', id='start-detection'),
                                        cls='buttonrow'
                                    ),
                                    Div(
                                        Button('Calibration', id='start-calibration', disabled=''),
                                        Button('Start Training', id='start-training', disabled=''),
                                        cls='buttonrow'
                                    ),
                                    cls='button-group'
                                ),
                                Div(
                                    Div(
                                        Button('Reset Model', id='reset-model', disabled=''),
                                        Button('Customize Target', id='customize-target', disabled=''),
                                        cls='buttonrow'
                                    ),
                                    Div(
                                        Button('Save Dataset', id='store-data', disabled=''),
                                        Button('Load Dataset', id='load-data'),
                                        Input(type='file', id='data-uploader'),
                                        cls='buttonrow'
                                    ),
                                    Div(
                                        Button('Save Model', id='store-model', disabled=''),
                                        Button('Load Model', id='load-model'),
                                        Input(type='file', id='model-uploader', multiple=''),
                                        cls='buttonrow'
                                    ),
                                    cls='button-group'
                                ),
                                cls='buttonwrap'
                            ),
                            cls='panel-content'
                        ),
                        id='training',
                        cls='panel'
                    ),
                    Div(
                        Button('Start Session', id='start-session', disabled='', cls='emph'),
                        id='session-start-container'
                    ),
                    id='training-phase'
                ),
                Div(
                    Div(
                        Div(
                            H3('Session Panel'),
                            cls='panel-header'
                        ),
                        Div(
                            Div(
                                Button('Start Tracking', id='start-tracking'),
                                Button('Stop Tracking', id='stop-tracking', disabled=''),
                                cls='buttonrow'
                            ),
                            Div(
                                Button('Draw Heatmap', id='draw-heatmap', disabled='', cls='emph'),
                                cls='buttonrow'
                            ),
                            cls='buttonwrap'
                        ),
                        id='session',
                        cls='panel'
                    ),
                    id='session-phase',
                    cls='hidden'
                ),
                Div(
                    Div(
                        Div(
                            H3('Heatmap Panel'),
                            cls='panel-header'
                        ),
                        Div(
                            Div(
                                Button('New Session', id='new-session'),
                                Button('Retrain Model', id='retrain-model'),
                                cls='buttonrow'
                            ),
                            cls='buttonwrap'
                        ),
                        id='heatmap',
                        cls='panel'
                    ),
                    id='heatmap-phase',
                    cls='hidden'
                ),
                id='main-content'
            ),
            Video(id='webcam', width='400', height='300', autoplay=''),
            Canvas(id='overlay', width='400', height='300'),
            Footer(
                'Made with',
                I(cls='fas fa-heart'),
                'by',
                A('Suvrat Jain', href='https://simplysuvi.com/', target='_blank'),
                '.'
            ),
            Canvas(id='eyes', width='55', height='25'),
            Div(id='target'),
            Div(id='spinner', cls='hidden'),
            Div(
                Div(
                    H3('Target Settings'),
                    Button('×', id='close-settings', title='Close Settings', cls='icon-button'),
                    cls='panel-header'
                ),
                Div(
                    Div(
                        Label('Size:', fr='target-size'),
                        Input(type='range', id='target-size', min='20', max='80', value='40', cls='slider'),
                        cls='setting-group'
                    ),
                    Div(
                        Label('Color:', fr='target-color'),
                        Select(
                            Option('Default Orange', value='default'),
                            Option('Blue', value='blue'),
                            Option('Green', value='green'),
                            Option('Purple', value='purple'),
                            Option('Red', value='red'),
                            id='target-color'
                        ),
                        cls='setting-group'
                    ),
                    Div(
                        Label('Shape:', fr='target-shape'),
                        Select(
                            Option('Circle', value='circle'),
                            Option('Square', value='square'),
                            Option('Triangle', value='triangle'),
                            Option('Star', value='star'),
                            id='target-shape'
                        ),
                        cls='setting-group'
                    ),
                    cls='settings-content'
                ),
                id='settings-panel',
                cls='hidden'
            ),
            Script(src='https://code.jquery.com/jquery-3.3.1.min.js', integrity='sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=', crossorigin='anonymous'),
            Script(src='https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@4.15.0/dist/tf.min.js'),
            Script(src='https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/face_mesh.js'),
            Script(src='https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js'),
            Script(src='https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js'),
            Script(src='js/globals.js'),
            Script(src='js/ui.js'),
            Script(src='js/facetracker.js'),
            Script(src='js/mouse.js'),
            Script(src='js/dataset.js'),
            Script(src='js/training.js'),
            Script(src='js/heat.js'),
            Script(src='js/main.js')
        ),
        lang='',
        cls='no-js'
    )

Note: 10 ()

High-level overview of the JS files

Note: 505

Here's how the JS files work together:

  1. globals.js — Video codec detection utilities and polyfills for getUserMedia

  2. ui.js — State machine for the UI. Manages phases (training → session → heatmap), shows info messages, handles calibration mode, and auto-collection of samples

  3. facetracker.js — Uses MediaPipe FaceMesh to detect faces and track iris positions. Extracts eye regions and draws face mesh overlay on webcam feed

  4. mouse.js — Tracks mouse position (normalized 0-1), used as ground truth labels for training

  5. dataset.js — Manages training/validation data. Captures eye images as tensors, converts to grayscale, and stores alongside mouse position targets

  6. training.js — Builds a TensorFlow.js CNN model:

    • Conv2D → MaxPool → Flatten → Dropout → Concatenate with eye position metadata → Dense output
    • Predicts (x, y) gaze coordinates
  7. heat.js — Draws colored heatmap visualization of where the model predicted you were looking

  8. main.js — Glue code: keyboard shortcuts, button handlers, tracking loop (calls getPrediction() every 100ms and moves the target dot)

The flow:

  1. User grants webcam → FaceMesh detects face/eyes
  2. User moves mouse while looking at cursor, pressing Space to capture samples
  3. Samples = eye image + mouse position (label)
  4. Train CNN model on samples
  5. Model predicts gaze → moves target dot to predicted location
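To make that flow concrete, here is a condensed sketch that strings together the project's own module functions (this is not an extra file in the repo; it simply mirrors what main.js and ui.js wire to the buttons and keyboard shortcuts):

// 1. Webcam + FaceMesh: start detection; the current eye crop lands in the #eyes canvas
facetracker.startDetection();

// 2-3. While looking at the cursor, press Space (or enable auto-collection):
//      each call stores the current eye image plus the mouse (x, y) as its label
dataset.captureExample();

// 4. Train the CNN on the collected samples
training.fitModel();

// 5. Predict gaze every 100ms and move the target dot, as moveTarget() does in main.js
setInterval(async () => {
  const [x, y] = await training.getPrediction();
  // ...position the #target element at (x, y) fractions of the window size
}, 100);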

Note: 15

Let's take a deeper dive into the JavaScript files.

Code: 19 ()

js_files = list(Path('js').glob('*'))
js_files

Output: 195

[Path('js/heat.js'),
 Path('js/training.js'),
 Path('js/mouse.js'),
 Path('js/main.js'),
 Path('js/dataset.js'),
 Path('js/facetracker.js'),
 Path('js/globals.js'),
 Path('js/ui.js')]

Code: 81 ()

# commented out to prevent running again 
# for item in js_files:
#     with open(item, 'r') as f:
#         add_msg(f"### {item}", msg_type="note")
#         add_msg(f.read(), msg_type="raw")

Note: 7

js/heat.js

Note: 306

heat.js creates a visual heatmap showing where you looked during a session.

Key functions:

  • getHeatColor(value, alpha) — Converts a value (0-1) to a color using HSL. Lower values = green, higher = red. Classic "heat" gradient.

  • fillHeatmap(data, ctx, width, height, radius) — Loops through all recorded gaze points and draws semi-transparent circles at each (x, y) location on the canvas.

  • drawHeatmap(dataset) — Sets up the canvas to match window size, then calls fillHeatmap with the session data. Shows "In Progress..." while drawing.

  • clearHeatmap() — Wipes the canvas clean.

How it works: After tracking, the dataset.session object contains arrays of normalized x/y coordinates. The heatmap draws overlapping translucent circles at those spots — areas with more gaze points appear more saturated/opaque.

Raw: 655

window.heatmap = {
  getHeatColor: function(value, alpha) {
    // Adapted from https://stackoverflow.com/a/17268489/1257278
    if (typeof alpha == 'undefined') {
      alpha = 1.0;
    }
    const hue = ((1 - value) * 120).toString(10);
    return 'hsla(' + hue + ',100%,50%,' + alpha + ')';
  },

  fillHeatmap: function(data, ctx, width, height, radius) {
    // Go through a dataset and fill the context with the corresponding circles.
    let pointX, pointY;

    for (let i = 0; i < data.n; i++) {
      pointX = Math.floor(data.x[i] * width);
      pointY = Math.floor(data.y[i] * height);

      ctx.beginPath();
      ctx.fillStyle = this.getHeatColor(0.5, 0.5);
      ctx.arc(pointX, pointY, radius, 0, 2 * Math.PI);
      ctx.fill();
    }
  },

  drawHeatmap: function(dataset) {
    this.clearHeatmap();
    $('#draw-heatmap').prop('disabled', true);
    $('#draw-heatmap').html('In Progress...');

    const heatmap = $('#heatMap')[0];
    const ctx = heatmap.getContext('2d');

    const width = $('body').width();
    const height = $('body').height();

    heatmap.width = width;
    heatmap.height = height;

    this.fillHeatmap(dataset.session, ctx, width, height, 15);

    $('#clear-heatmap').prop('disabled', false);
    $('#draw-heatmap').prop('disabled', false);
    $('#draw-heatmap').html('Draw Heatmap');
  },

  clearHeatmap: function() {
    const heatmap = $('#heatMap')[0];
    const ctx = heatmap.getContext('2d');
    ctx.clearRect(0, 0, heatmap.width, heatmap.height);
  },
};

Note: 6

js/main.js

Note: 382

main.js is the glue that wires everything together. Here's what it does:

  1. Tracking object — Manages the active tracking state. When started, it clears session data and enables/disables buttons appropriately.

  2. moveTarget() loop — Runs every 100ms. If tracking is active and a model exists, it calls getPrediction() to get the predicted gaze (x, y), stores it in dataset.session, and moves the target dot to that position on screen.

  3. Keyboard shortcuts — Space captures a sample, A toggles auto-collection, C starts calibration, T trains, H toggles heatmap, R resets model, ? shows help.

  4. Button handlers — Connects all the UI buttons to their respective functions (start detection, calibration, training, tracking, save/load data and models, target customization).

  5. File I/O — Handles saving datasets as JSON and models via TensorFlow.js's save()/loadLayersModel().

In short: main.js is the event dispatcher — it doesn't do the heavy lifting itself, but orchestrates when facetracker, dataset, training, heatmap, and ui get called.

Raw: 2,796

const tracking = {
  active: false,
  interval: null,

  start: function() {
    dataset.clearSession();
    this.active = true;
    $('#start-tracking').prop('disabled', true);
    $('#stop-tracking').prop('disabled', false);
    $('#draw-heatmap').prop('disabled', true);
  },

  stop: function() {
    this.active = false;
    $('#start-tracking').prop('disabled', false);
    $('#stop-tracking').prop('disabled', true);
    $('#draw-heatmap').prop('disabled', false);
    ui.showInfo(
      '<h3>Tracking Stopped</h3>' +
      'You can now <strong>Draw Heatmap</strong> to visualize the session, or <strong>Start Tracking</strong> again.',
      true
    );
  },
};

$(document).ready(function() {
  const $target = $('#target');
  const targetSize = $target.outerWidth();

  function moveTarget() {
    // Move the model target to where we predict the user is looking to
    if (training.currentModel == null || training.inTraining || !tracking.active) {
      return;
    }

    training.getPrediction().then(prediction => {
      dataset.session.n += 1;
      dataset.session.x.push(prediction[0]);
      dataset.session.y.push(prediction[1]);

      const left = prediction[0] * ($('body').width() - targetSize);
      const top = prediction[1] * ($('body').height() - targetSize);

      $target.css('left', left + 'px');
      $target.css('top', top + 'px');
    });
  }

  setInterval(moveTarget, 100);

  function download(content, fileName, contentType) {
    const a = document.createElement('a');
    const file = new Blob([content], {
      type: contentType,
    });
    a.href = URL.createObjectURL(file);
    a.download = fileName;
    a.click();
  }

  // Map functions to keys and buttons:

  $('body').keyup(function(e) {
    // Escape key - Close help modal
    if (e.keyCode === 27) {
      ui.hideHelp();
      e.preventDefault();
      return false;
    }

    // Space key - Capture example
    if (e.keyCode === 32 && ui.readyToCollect) {
      dataset.captureExample();
      e.preventDefault();
      return false;
    }
    
    // A key - Toggle auto-collection
    if (e.keyCode === 65 && ui.readyToCollect) { // 'A' key
      ui.toggleAutoCollect();
      e.preventDefault();
      return false;
    }
    
    // C key - Start calibration mode
    if (e.keyCode === 67 && ui.readyToCollect) { // 'C' key
      ui.startCalibration();
      e.preventDefault();
      return false;
    }
    
    // T key - Start training
    if (e.keyCode === 84 && !$('#start-training').prop('disabled')) { // 'T' key
      training.fitModel();
      e.preventDefault();
      return false;
    }
    
    // H key - Toggle heatmap
    if (e.keyCode === 72 && !$('#draw-heatmap').prop('disabled')) { // 'H' key
      if ($('#heatMap').css('opacity') === '0') {
        heatmap.drawHeatmap(dataset, training.currentModel);
      } else {
        heatmap.clearHeatmap();
      }
      e.preventDefault();
      return false;
    }
    
    // R key - Reset model
    if (e.keyCode === 82 && !$('#reset-model').prop('disabled')) { // 'R' key
      training.resetModel();
      e.preventDefault();
      return false;
    }
    
    // ? key - Show help
    if (e.keyCode === 191 && e.shiftKey) { // '?' key (Shift + /)
      ui.displayHelp();
      e.preventDefault();
      return false;
    }
  });

  $('#start-detection').click(function(e) {
    facetracker.startDetection();
    $(this).prop('disabled', true);
  });

  $('#start-calibration').click(function(e) {
    ui.startCalibration();
  });

  $('#start-training').click(function(e) {
    training.fitModel();
  });

  $('#start-tracking').click(function(e) {
    tracking.start();
  });

  $('#stop-tracking').click(function(e) {
    tracking.stop();
  });

  $('#reset-model').click(function(e) {
    training.resetModel();
  });

  $('#draw-heatmap').click(function(e) {
    ui.showPhase('heatmap');
    heatmap.drawHeatmap(dataset);
  });

  $('#clear-heatmap').click(function(e) {
    heatmap.clearHeatmap();
  });

  $('#store-data').click(function(e) {
    const data = dataset.toJSON();
    const json = JSON.stringify(data);
    download(json, 'dataset.json', 'text/plain');
  });

  $('#load-data').click(function(e) {
    $('#data-uploader').trigger('click');
  });

  $('#data-uploader').change(function(e) {
    const file = e.target.files[0];
    const reader = new FileReader();

    reader.onload = function() {
      const data = reader.result;
      const json = JSON.parse(data);
      dataset.fromJSON(json);
    };

    reader.readAsBinaryString(file);
  });

  $('#store-model').click(async function(e) {
    await training.currentModel.save('downloads://model');
  });

  $('#load-model').click(function(e) {
    $('#model-uploader').trigger('click');
  });

  $('#model-uploader').change(async function(e) {
    const files = e.target.files;
    training.currentModel = await tf.loadLayersModel(
      tf.io.browserFiles([files[0], files[1]]),
    );
    ui.onFinishTraining();
  });
  
  // Help button event handler
  $('#help-button').click(function() {
    ui.displayHelp();
  });

  // Toggle panel button event handler
  $('#toggle-panel-button').click(function() {
    const $panelContent = $('.panel-content');
    $panelContent.toggleClass('collapsed');
    if ($panelContent.hasClass('collapsed')) {
      $(this).text('+');
    } else {
      $(this).text('-');
    }
  });
  
  // Target customization
  $('#customize-target').click(function() {
    $('#settings-panel').removeClass('hidden');
  });
  
  $('#close-settings').click(function() {
    $('#settings-panel').addClass('hidden');
  });
  
  // Target size slider
  $('#target-size').on('input', function() {
    const size = $(this).val();
    $('#target').css({
      width: size + 'px',
      height: size + 'px'
    });
  });
  
  // Target color selector
  $('#target-color').change(function() {
    const color = $(this).val();
    
    // Remove all color classes
    $('#target').removeClass('color-default color-blue color-green color-purple color-red');
    
    // Add selected color class
    if (color !== 'default') {
      $('#target').addClass('color-' + color);
    } else {
      // Default gradient is already in the base CSS
      $('#target').css('background', 'linear-gradient(135deg, #f9a66c, #f27121)');
    }
  });
  
  // Target shape selector
  $('#target-shape').change(function() {
    const shape = $(this).val();
    
    // Remove all shape classes
    $('#target').removeClass('target-circle target-square target-triangle target-star');
    
    // Reset any custom styles that might have been applied
    $('#target').css({
      'clip-path': '',
      'border-radius': '',
      'width': $('#target-size').val() + 'px',
      'height': $('#target-size').val() + 'px',
      'border-left': '',
      'border-right': '',
      'border-bottom': ''
    });
    
    // Add selected shape class
    if (shape !== 'circle') {
      $('#target').addClass('target-' + shape);
      
      // Special handling for triangle
      if (shape === 'triangle') {
        const size = $('#target-size').val();
        const halfSize = size / 2;
        
        $('#target').css({
          'border-left': halfSize + 'px solid transparent',
          'border-right': halfSize + 'px solid transparent',
          'border-bottom': size + 'px solid',
          'border-bottom-color': $('#target').css('background-color')
        });
      }
    } else {
      // Circle is default
      $('#target').css('border-radius', '50%');
    }
  });
});

Note: 6

js/ui.js

Note: 348

ui.js manages the application's user interface state and user feedback.

Key responsibilities:

  1. State tracking — Tracks current phase (training → session → heatmap), whether face is detected, and if we're ready to collect samples

  2. Info messages — showInfo() displays status messages to guide the user, with optional sound/flash effects

  3. Event callbacks — Functions like onWebcamEnabled(), onFoundFace(), onAddExample(), and onFinishTraining() update the UI when things happen

  4. Auto-collection — toggleAutoCollect() captures samples every 1.5 seconds automatically

  5. Calibration mode — startCalibration() shows 9 points around the screen and collects samples at each position automatically

  6. Phase switching — showPhase() hides/shows the appropriate panel (training, session, or heatmap)

In short: It's the "presenter" layer — it doesn't do ML or face detection itself, but responds to those systems and keeps the user informed of what's happening and what to do next.

Raw: 5,265

window.ui = {
  state: 'loading',
  phase: 'training', // training, session, heatmap
  readyToCollect: false,
  nExamples: 0,
  nTrainings: 0,
  autoCollectMode: false,
  autoCollectInterval: null,
  calibrationMode: false,
  calibrationPoints: [],
  currentCalibrationPoint: 0,
  calibrationInterval: null,

  setContent: function(key, value) {
    // Set an element's content based on the data-content key.
    $('[data-content="' + key + '"]').html(value);
  },

  showInfo: function(text, dontFlash) {
    // Show info and beep / flash.
    this.setContent('info', text);
    if (!dontFlash) {
      $('#info').addClass('flash');
      new Audio('hint.mp3').play();
      setTimeout(function() {
        $('#info').removeClass('flash');
      }, 1000);
    }
  },

  onWebcamEnabled: function() {
    this.state = 'waiting for detection';
    this.setContent('webcam-status', 'Connected');
    $('[data-content="webcam-status"]').removeClass('disconnected').addClass('connected');
    this.showInfo("Webcam connected. Press <strong>Start Detection</strong> to begin.", true);
  },

  onFoundFace: function() {
    $('#spinner').addClass('hidden');
    $('#eyes').show();
    $('#overlay').show();
    this.setContent('face-detected', 'Yes');
    $('[data-content="face-detected"]').removeClass('not-detected').addClass('detected');
    this.readyToCollect = true;
    $('#start-calibration').prop('disabled', false);
    if (dataset.train.n >= 2) {
      $('#start-training').prop('disabled', false);
    }
    if (this.state == 'waiting for detection') {
      this.state = 'collecting';
      this.showInfo(
        "<h3>Let's start!</h3>" +
          'Collect data points by moving your mouse and following the cursor with your eyes and hitting the space key repeatedly.<br><br>' +
          'You can also toggle automatic collection mode by pressing "A" on your keyboard.',
        true,
      );
    }
  },

  onFaceNotFound: function() {
    this.setContent('face-detected', 'No');
    $('[data-content="face-detected"]').removeClass('detected').addClass('not-detected');
    this.readyToCollect = false;
    $('#start-calibration').prop('disabled', true);
    $('#start-training').prop('disabled', true);
  },
  
  toggleAutoCollect: function() {
    this.autoCollectMode = !this.autoCollectMode;
    
    if (this.autoCollectMode) {
      // Start auto-collection
      this.showInfo(
        '<h3>Auto-collection enabled</h3>' +
        'Move your cursor around and follow it with your eyes. Samples will be collected automatically every 1.5 seconds.<br><br>' +
        'Press "A" again to disable auto-collection.',
        true
      );
      
      this.autoCollectInterval = setInterval(function() {
        if (ui.readyToCollect && facetracker.currentPosition) {
          dataset.captureExample();
        }
      }, 1500);
    } else {
      // Stop auto-collection
      clearInterval(this.autoCollectInterval);
      this.showInfo(
        '<h3>Auto-collection disabled</h3>' +
        'Switched back to manual collection. Press space to collect samples.',
        true
      );
    }
  },

  onAddExample: function(nTrain, nVal) {
    // Call this when an example is added.
    this.nExamples = nTrain + nVal;
    this.setContent('n-train', nTrain);
    this.setContent('n-val', nVal);
    if (nTrain >= 2) {
      $('#start-training').prop('disabled', false);
    }
    if (this.state == 'collecting' && this.nExamples == 5) {
      this.showInfo(
        '<h3>Keep going!</h3>' +
          'You need to collect at least 20 data points to start seeing results.',
      );
    }
    if (this.state == 'collecting' && this.nExamples == 25) {
      this.showInfo(
        '<h3>Great job! 👌</h3>' +
          "Now that you have a handful of samples, let's train the machine learning model!<br><br> " +
          'Hit the <strong>Start Training</strong> button.',
      );
    }
    if (this.state == 'trained' && this.nExamples == 50) {
      this.showInfo(
        '<h3>Fantastic! 👏</h3>' +
          "You've collected lots of data points. Let's try training our model again!",
      );
    }
    if (nTrain > 0 && nVal > 0) {
      $('#store-data').prop('disabled', false);
    }
  },

  onFinishTraining: function() {
    // Call this when training is finished.
    this.nTrainings += 1;
    $('#target').css('opacity', '0.9');
    $('#session-start-container').show();
    $('#start-session').prop('disabled', false);
    $('#customize-target').prop('disabled', false);
    $('#reset-model').prop('disabled', false);
    $('#store-model').prop('disabled', false);
    $('#training-progress').hide();

    if (this.nTrainings == 1) {
      this.state = 'trained';
      this.showInfo(
        '<h3>Awesome!</h3>' +
          'The model has been trained. Click the <strong>Start Session</strong> button at the bottom of the screen to begin eye tracking.<br><br>' +
          "You can continue to collect more data and retrain the model to improve accuracy.",
      );
    } else if (this.nTrainings == 2) {
      this.state = 'trained_twice';
      this.showInfo(
        '<h3>Getting better! 🚀</h3>' +
          'Keep collecting and retraining!<br>' +
          'You can also draw a heatmap that shows you where your ' +
          'model has its strong and weak points.',
      );
    } else if (this.nTrainings == 3) {
      this.state = 'trained_thrice';
      this.showInfo(
        'If your model is overfitting, remember you can reset it anytime.',
      );
    } else if (this.nTrainings == 4) {
      this.state = 'trained_thrice';
      this.showInfo(
        '<h3>Have fun!</h3>' +
          'Check this space for more! 😄',
      );
    }
  },

  showPhase: function(phase) {
    this.phase = phase;
    $('#training-phase').addClass('hidden');
    $('#session-phase').addClass('hidden');
    $('#heatmap-phase').addClass('hidden');
    $('#' + phase + '-phase').removeClass('hidden');
  },

  initSessionControls: function() {
    $('#start-session').click(() => {
      this.showPhase('session');
      this.showInfo(
        '<h3>Session Started</h3>' +
        'Click <strong>Start Tracking</strong> to see the model in action!',
        true
      );
    });

    $('#new-session').click(() => {
      this.showPhase('session');
      heatmap.clearHeatmap();
    });

    $('#retrain-model').click(() => {
      this.showPhase('training');
      heatmap.clearHeatmap();
    });
  },
  
  showTrainingProgress: function(epoch, totalEpochs, loss, valLoss) {
    if (!$('#training-progress').length) {
      $('body').append('<div id="training-progress"></div>');
      $('#training-progress').css({
        position: 'fixed',
        bottom: '60px',
        left: '50%',
        transform: 'translateX(-50%)',
        background: 'white',
        padding: '15px',
        borderRadius: '10px',
        boxShadow: '0 4px 24px rgba(0, 0, 0, 0.1)',
        zIndex: 1000,
        textAlign: 'center',
        width: '300px'
      });
    }
    
    const percent = Math.round((epoch / totalEpochs) * 100);
    $('#training-progress').html(`
      <div>Training Progress: ${epoch}/${totalEpochs} epochs (${percent}%)</div>
      <div style="background: #f0f0f0; height: 10px; border-radius: 5px; margin: 10px 0;">
        <div style="background: linear-gradient(135deg, #f9a66c, #f27121); width: ${percent}%; height: 100%; border-radius: 5px;"></div>
      </div>
      <div>Loss: ${loss.toFixed(5)} | Validation Loss: ${valLoss.toFixed(5)}</div>
    `);
    
    $('#training-progress').show();
  },
  
  displayHelp: function() {
    const helpContent =
      '<button id="close-help" class="icon-button">×</button>' +
      '<h3>Keyboard Shortcuts</h3>' +
      '<ul style="list-style-type: none; padding-left: 0;">' +
      '<li><strong>Space</strong> - Capture training sample</li>' +
      '<li><strong>A</strong> - Toggle automatic data collection</li>' +
      '<li><strong>C</strong> - Start calibration mode</li>' +
      '<li><strong>T</strong> - Start training (when enabled)</li>' +
      '<li><strong>H</strong> - Toggle heatmap (when enabled)</li>' +
      '<li><strong>R</strong> - Reset model (when enabled)</li>' +
      '<li><strong>?</strong> - Show this help</li>' +
      '</ul>' +
      '<h3>Features</h3>' +
      '<ul style="list-style-type: none; padding-left: 0;">' +
      '<li><strong>Calibration</strong> - Guided data collection at specific points</li>' +
      '<li><strong>Auto-collection</strong> - Automatically collect samples while you look around</li>' +
      '<li><strong>Target Customization</strong> - Change the size, color, and shape of the target</li>' +
      '<li><strong>Heatmap</strong> - Visualize model accuracy across the screen</li>' +
      '</ul>';

    $('#help-modal').html(helpContent);
    $('#modal-overlay, #help-modal').removeClass('hidden');

    $('#close-help').click(function() {
      ui.hideHelp();
    });
  },

  hideHelp: function() {
    $('#modal-overlay, #help-modal').addClass('hidden');
  },
  
  startCalibration: function() {
    if (!this.readyToCollect || this.calibrationMode) {
      return;
    }
    
    // Stop auto collection if it's running
    if (this.autoCollectMode) {
      this.toggleAutoCollect();
    }
    
    this.calibrationMode = true;
    
    // Define calibration points (9-point calibration)
    const width = $('body').width();
    const height = $('body').height();
    const padding = 100; // Padding from edges
    
    this.calibrationPoints = [
      { x: padding, y: padding }, // Top-left
      { x: width / 2, y: padding }, // Top-center
      { x: width - padding, y: padding }, // Top-right
      { x: padding, y: height / 2 }, // Middle-left
      { x: width / 2, y: height / 2 }, // Center
      { x: width - padding, y: height / 2 }, // Middle-right
      { x: padding, y: height - padding }, // Bottom-left
      { x: width / 2, y: height - padding }, // Bottom-center
      { x: width - padding, y: height - padding } // Bottom-right
    ];
    
    this.currentCalibrationPoint = 0;
    
    // Create calibration target if it doesn't exist
    if (!$('#calibration-target').length) {
      $('body').append('<div id="calibration-target"></div>');
      $('#calibration-target').css({
        position: 'absolute',
        width: '20px',
        height: '20px',
        borderRadius: '50%',
        background: 'radial-gradient(circle, rgba(249,166,108,1) 0%, rgba(242,113,33,1) 70%)',
        border: '2px solid white',
        boxShadow: '0 0 10px rgba(0,0,0,0.2)',
        transform: 'translate(-50%, -50%)',
        zIndex: 1000,
        display: 'none'
      });
    }
    
    // Create calibration instructions
    if (!$('#calibration-instructions').length) {
      $('body').append('<div id="calibration-instructions"></div>');
      $('#calibration-instructions').css({
        position: 'fixed',
        bottom: '120px',
        left: '50%',
        transform: 'translateX(-50%)',
        background: 'white',
        padding: '15px',
        borderRadius: '10px',
        boxShadow: '0 4px 24px rgba(0,0,0,0.1)',
        zIndex: 1000,
        textAlign: 'center',
        width: '400px',
        fontSize: '16px'
      });
    }
    
    this.showInfo(
      '<h3>Calibration Mode</h3>' +
      'Follow the orange dot with your eyes as it moves around the screen.<br><br>' +
      'The system will automatically collect samples at each position.',
      true
    );
    
    // Show first calibration point
    this.showCalibrationPoint();
    
    // Start calibration sequence
    this.calibrationInterval = setInterval(() => {
      // Collect sample at current point
      if (facetracker.currentPosition) {
        // Manually set mouse position to current calibration point
        const point = this.calibrationPoints[this.currentCalibrationPoint];
        mouse.mousePosX = point.x / $('body').width();
        mouse.mousePosY = point.y / $('body').height();
        
        // Capture example
        dataset.captureExample();
        
        // Move to next point
        this.currentCalibrationPoint++;
        
        // Update progress
        $('#calibration-instructions').html(
          `Calibration progress: ${this.currentCalibrationPoint} of ${this.calibrationPoints.length} points`
        );
        
        // Check if calibration is complete
        if (this.currentCalibrationPoint >= this.calibrationPoints.length) {
          this.stopCalibration();
          return;
        }
        
        // Show next point
        this.showCalibrationPoint();
      }
    }, 2000); // 2 seconds per point
  },
  
  showCalibrationPoint: function() {
    const point = this.calibrationPoints[this.currentCalibrationPoint];
    $('#calibration-target').css({
      left: point.x + 'px',
      top: point.y + 'px',
      display: 'block'
    });
    
    // Animate the target to draw attention
    $('#calibration-target').animate({
      width: '30px',
      height: '30px'
    }, 500, function() {
      $(this).animate({
        width: '20px',
        height: '20px'
      }, 500);
    });
  },
  
  stopCalibration: function() {
    clearInterval(this.calibrationInterval);
    this.calibrationMode = false;
    
    // Hide calibration elements
    $('#calibration-target').hide();
    $('#calibration-instructions').hide();
    
    this.showInfo(
      '<h3>Calibration Complete!</h3>' +
      `Collected ${this.calibrationPoints.length} calibration points.<br><br>` +
      'Now you can train the model by clicking the "Start Training" button.',
      true
    );
    
    // Enable training if we have enough samples
    if (dataset.train.n >= 2) {
      $('#start-training').prop('disabled', false);
    }
  }
};

$(document).ready(function() {
  ui.initSessionControls();
});

Note: 7

js/mouse.js

Note: 148

mouse.js tracks the cursor position as ground truth labels for training.

  • Stores normalized (0-1) coordinates in mousePosX and mousePosY
  • Updates on every mouse move event by dividing pixel position by window dimensions
  • getMousePos() returns [x, y] for use by dataset.js when capturing training samples

It's the simplest file — just records where the mouse is so the model learns to map eye images → screen position.

Raw: 163

$(document).ready(function() {
  window.mouse = {
    mousePosX: 0.5,
    mousePosY: 0.5,

    handleMouseMove: function(event) {
      mouse.mousePosX = event.clientX / $('body').width();
      mouse.mousePosY = event.clientY / $('body').height();
    },

    getMousePos: function() {
      return [mouse.mousePosX, mouse.mousePosY];
    },
  };

  document.onmousemove = mouse.handleMouseMove;
});

Note: 10

js/facetracker.js

Note: 328

facetracker.js handles face detection and eye tracking using MediaPipe FaceMesh.

Key parts:

  1. Webcam setup — Requests camera access, adjusts video proportions, handles success/failure

  2. FaceMesh initialization — Loads MediaPipe's FaceMesh model with iris tracking enabled (refineLandmarks: true)

  3. onResults() callback — Called each frame with detected face landmarks:

    • Draws mesh overlay (tesselation, face oval, lips, eyes)
    • Calculates iris centers from landmarks 468-472 (left) and 473-477 (right)
    • Computes a bounding box around both eyes
    • Crops the eye region from the video and draws it to the #eyes canvas
  4. Stores state — currentPosition (all 478 landmarks) and currentEyeRect (eye crop coordinates) are used by dataset.js for training samples

The eye crop in the #eyes canvas becomes the input image for the CNN model.

Raw: 2,310

// ==============================================
// facetracker.js (MediaPipe FaceMesh version)
// ==============================================

$(document).ready(function () {
  const video = document.getElementById("webcam");
  const overlay = document.getElementById("overlay");
  const eyesCanvas = document.getElementById("eyes");
  const eyesCtx = eyesCanvas.getContext("2d");

  window.facetracker = {
    video,
    overlay,
    overlayCC: overlay.getContext("2d"),
    videoWidthExternal: video.width,
    videoHeightExternal: video.height,
    videoWidthInternal: video.videoWidth,
    videoHeightInternal: video.videoHeight,

    trackingStarted: false,
    currentPosition: null,
    currentEyeRect: null,

    adjustVideoProportions: function () {
      facetracker.videoWidthInternal = video.videoWidth;
      facetracker.videoHeightInternal = video.videoHeight;
      const proportion =
        facetracker.videoWidthInternal / facetracker.videoHeightInternal;
      facetracker.videoWidthExternal = Math.round(
        facetracker.videoHeightExternal * proportion
      );
      facetracker.video.width = facetracker.videoWidthExternal;
      facetracker.overlay.width = facetracker.videoWidthExternal;
    },

    gumSuccess: function (stream) {
      ui.onWebcamEnabled();

      if ("srcObject" in facetracker.video) {
        facetracker.video.srcObject = stream;
      } else {
        facetracker.video.src = window.URL.createObjectURL(stream);
      }

      facetracker.video.onloadedmetadata = function () {
        facetracker.adjustVideoProportions();
        facetracker.video.play();
      };

      facetracker.video.onresize = function () {
        facetracker.adjustVideoProportions();
      };
    },

    startDetection: function() {
      $('#spinner').removeClass('hidden');
      initFaceMesh();
    },

    gumFail: function () {
      ui.setContent('webcam-status', 'Disconnected');
      $('[data-content="webcam-status"]').removeClass('connected').addClass('disconnected');
      ui.showInfo(
        "There was some problem trying to fetch video from your webcam 😭",
        true
      );
    },
  };

  // =====================================================
  // NEW: MediaPipe FaceMesh integration
  // =====================================================

  let faceMesh;
  let camera;

  async function initFaceMesh() {
    faceMesh = new FaceMesh({
      locateFile: (file) =>
        `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`,
    });

    faceMesh.setOptions({
      maxNumFaces: 1,
      refineLandmarks: true, // enables iris landmarks
      minDetectionConfidence: 0.5,
      minTrackingConfidence: 0.5,
    });

    faceMesh.onResults(onResults);

    camera = new Camera(video, {
      onFrame: async () => {
        await faceMesh.send({ image: video });
      },
      width: 640,
      height: 480,
    });

    camera.start();
    facetracker.trackingStarted = true;
  }

  function onResults(results) {
    const ctx = facetracker.overlayCC;
    ctx.clearRect(
      0,
      0,
      facetracker.videoWidthExternal,
      facetracker.videoHeightExternal
    );

    if (!results.multiFaceLandmarks || results.multiFaceLandmarks.length === 0) {
      ui.onFaceNotFound();
      facetracker.currentPosition = null;
      return;
    }

    const landmarks = results.multiFaceLandmarks[0];
    facetracker.currentPosition = landmarks;

    // elegant, balanced mesh lines — visible but not harsh
    drawConnectors(ctx, landmarks, FACEMESH_TESSELATION,
      { color: 'rgba(255, 255, 255, 0.4)', lineWidth: 0.4 });

    // enhance facial structure subtly
    drawConnectors(ctx, landmarks, FACEMESH_FACE_OVAL,
      { color: 'rgba(255, 255, 255, 0.5)', lineWidth: 0.5 });
    drawConnectors(ctx, landmarks, FACEMESH_LIPS,
      { color: 'rgba(255, 255, 255, 0.25)', lineWidth: 0.4 });

    // make eyes pop clearly without oversaturation
    drawConnectors(ctx, landmarks, FACEMESH_LEFT_EYE,
      { color: 'rgba(0, 255, 100, 0.6)', lineWidth: 0.9 });
    drawConnectors(ctx, landmarks, FACEMESH_RIGHT_EYE,
      { color: 'rgba(255, 80, 80, 0.6)', lineWidth: 0.9 });



    // Get iris centers (more stable for gaze)
    const LEFT_IRIS = [468, 469, 470, 471, 472];
    const RIGHT_IRIS = [473, 474, 475, 476, 477];

    function irisCenter(indices) {
      let x = 0,
        y = 0;
      for (const i of indices) {
        x += landmarks[i].x;
        y += landmarks[i].y;
      }
      return { x: x / indices.length, y: y / indices.length };
    }

    const left = irisCenter(LEFT_IRIS);
    const right = irisCenter(RIGHT_IRIS);

    const eyeCenterX = (left.x + right.x) / 2;
    const eyeCenterY = (left.y + right.y) / 2;
    const eyeWidth = Math.abs(right.x - left.x) * video.videoWidth * 1.5;
    const eyeHeight = eyeWidth * 0.6;


    const cropX = eyeCenterX * video.videoWidth - eyeWidth / 2;
    const cropY = eyeCenterY * video.videoHeight - eyeHeight / 2;

    facetracker.currentEyeRect = [cropX, cropY, eyeWidth, eyeHeight];

    // Draw red bounding box on overlay
    // ctx.strokeStyle = "red";
    // ctx.lineWidth = 2;
    // ctx.strokeRect(cropX, cropY, eyeWidth, eyeHeight);

    // Draw eye crop into #eyes canvas
    eyesCtx.drawImage(
      video,
      cropX,
      cropY,
      eyeWidth,
      eyeHeight,
      0,
      0,
      eyesCanvas.width,
      eyesCanvas.height
    );

    ui.onFoundFace();
  }

  // =====================================================
  // Video setup (same as before)
  // =====================================================

  if (navigator.mediaDevices) {
    navigator.mediaDevices
      .getUserMedia({ video: true })
      .then(facetracker.gumSuccess)
      .catch(facetracker.gumFail);
  } else if (navigator.getUserMedia) {
    navigator.getUserMedia(
      { video: true },
      (stream) => {
        facetracker.gumSuccess(stream);
      },
      facetracker.gumFail
    );
  } else {
    ui.showInfo(
      "Your browser does not seem to support getUserMedia. 😭 This will probably only work in Chrome or Firefox.",
      true
    );
  }
});

Note: 7

js/training.js

Note: 333

training.js builds and trains the CNN model that predicts gaze position.

Key parts:

  1. createModel() — Builds a TensorFlow.js model:

    • Takes two inputs: eye image (25×55×3) + metadata (4 values describing the eye-region rectangle: its centre x/y and its size)
    • Conv2D (5×5, 20 filters) → MaxPool → Flatten → Dropout → Concatenate with metadata → Dense output (2 units for x, y)
    • Uses tanh activation (outputs -1 to 1)
  2. fitModel() — Trains for 20 epochs with Adam optimizer and MSE loss. Saves the best model (by validation loss) to localStorage and restores it at the end.

  3. getPrediction() — Captures current eye image, runs it through the model, returns predicted (x, y) screen coordinates (shifted from [-0.5, 0.5] to [0, 1])

  4. resetModel() — Clears the model to start fresh

Raw: 2,268

window.training = {
  currentModel: null,
  inTraining: false,
  epochsTrained: 0,

  createModel: function() {
    const inputImage = tf.input({
      name: 'image',
      shape: [dataset.inputHeight, dataset.inputWidth, 3],
    });
    const inputMeta = tf.input({
      name: 'metaInfos',
      shape: [4],
    });

    const conv = tf.layers
      .conv2d({
        kernelSize: 5,
        filters: 20,
        strides: 1,
        activation: 'relu',
        kernelInitializer: 'varianceScaling',
      })
      .apply(inputImage);

    const maxpool = tf.layers
      .maxPooling2d({
        poolSize: [2, 2],
        strides: [2, 2],
      })
      .apply(conv);

    const flat = tf.layers.flatten().apply(maxpool);

    const dropout = tf.layers.dropout(0.2).apply(flat);

    const concat = tf.layers.concatenate().apply([dropout, inputMeta]);

    const output = tf.layers
      .dense({
        units: 2,
        activation: 'tanh',
        kernelInitializer: 'varianceScaling',
      })
      .apply(concat);

    const model = tf.model({
      inputs: [inputImage, inputMeta],
      outputs: output,
    });

    return model;
  },

  fitModel: function() {
    this.inTraining = true;
    const epochs = 20; // Increased from 10 to 20 for better training

    let batchSize = Math.floor(dataset.train.n * 0.1);
    batchSize = Math.max(2, Math.min(batchSize, 64));

    $('#start-training').prop('disabled', true);
    $('#start-training').html('In Progress...');

    if (training.currentModel == null) {
      training.currentModel = training.createModel();
    }

    console.info('Training on', dataset.train.n, 'samples');

    ui.state = 'training';

    let bestEpoch = -1;
    let bestTrainLoss = Number.MAX_SAFE_INTEGER;
    let bestValLoss = Number.MAX_SAFE_INTEGER;
    const bestModelPath = 'localstorage://best-model';

    training.currentModel.compile({
      optimizer: tf.train.adam(0.0005),
      loss: 'meanSquaredError',
    });

    training.currentModel.fit(dataset.train.x, dataset.train.y, {
      batchSize: batchSize,
      epochs: epochs,
      shuffle: true,
      validationData: [dataset.val.x, dataset.val.y],
      callbacks: {
        onEpochBegin: async function(epoch) {
          // Show progress at the beginning of each epoch
          ui.showTrainingProgress(epoch, epochs, 
            epoch > 0 ? bestTrainLoss : 0, 
            epoch > 0 ? bestValLoss : 0);
        },
        onEpochEnd: async function(epoch, logs) {
          console.info('Epoch', epoch, 'losses:', logs);
          training.epochsTrained += 1;
          ui.setContent('n-epochs', training.epochsTrained);
          ui.setContent('train-loss', (100 * (1 - logs.loss)).toFixed(2) + '%');
          ui.setContent('val-loss', (100 * (1 - logs.val_loss)).toFixed(2) + '%');
          
          // Update progress bar
          ui.showTrainingProgress(epoch + 1, epochs, logs.loss, logs.val_loss);

          if (logs.val_loss < bestValLoss) {
            // Save model
            bestEpoch = epoch;
            bestTrainLoss = logs.loss;
            bestValLoss = logs.val_loss;

            // Store best model:
            await training.currentModel.save(bestModelPath);
          }

          return await tf.nextFrame();
        },
        onTrainEnd: async function() {
          console.info('Finished training');

          // Load best model:
          training.epochsTrained -= epochs - bestEpoch;
          console.info('Loading best epoch:', training.epochsTrained);
          ui.setContent('n-epochs', training.epochsTrained);
          ui.setContent('train-loss', (100 * (1 - bestTrainLoss)).toFixed(2) + '%');
          ui.setContent('val-loss', (100 * (1 - bestValLoss)).toFixed(2) + '%');

          training.currentModel = await tf.loadLayersModel(bestModelPath);

          $('#start-training').prop('disabled', false);
          $('#start-training').html('Start Training');
          training.inTraining = false;
          ui.onFinishTraining();
        },
      },
    });
  },

  resetModel: function() {
    $('#reset-model').prop('disabled', true);
    training.currentModel = null;
    training.epochsTrained = 0;
    ui.setContent('n-epochs', training.epochsTrained);
    ui.setContent('train-loss', '?');
    ui.setContent('val-loss', '?');
    $('#reset-model').prop('disabled', false);
  },

  getPrediction: async function() {
    // Return relative x, y where we expect the user to look right now.
    const rawImg = dataset.getImage();
    const img = await dataset.convertImage(rawImg);
    const metaInfos = dataset.getMetaInfos();
    const prediction = training.currentModel.predict([img, metaInfos]);
    const predictionData = await prediction.data();

    tf.dispose([img, metaInfos, prediction]);

    // The model is trained on targets centred at 0 (addExample subtracts 0.5),
    // so add 0.5 back to get screen-relative coordinates in [0, 1].
    return [predictionData[0] + 0.5, predictionData[1] + 0.5];
  },

  drawSingleFilter: function(weights, filterId, canvas) {
    const canvasCtx = canvas.getContext('2d');
    const kernelSize = weights.shape[0];
    const pixelSize = canvas.width / kernelSize;

    let x, y;
    let min = 10000;
    let max = -10000;
    let value;

    // First, find min and max:
    for (x = 0; x < kernelSize; x++) {
      for (y = 0; y < kernelSize; y++) {
        value = weights.arraySync()[x][y][0][filterId];
        if (value < min) min = value;
        if (value > max) max = value;
      }
    }

    for (x = 0; x < kernelSize; x++) {
      for (y = 0; y < kernelSize; y++) {
        value = weights.arraySync()[x][y][0][filterId];
        value = ((value - min) / (max - min)) * 255;

        canvasCtx.fillStyle = 'rgb(' + value + ',' + value + ',' + value + ')';
        canvasCtx.fillRect(x * pixelSize, y * pixelSize, pixelSize, pixelSize);
      }
    }
  },

  visualizePixels: function(canvas) {
    const model = training.currentModel;
    const convLayer = model.layers[1];
    const weights = convLayer.weights[0].read();
    const bias = convLayer.weights[1].read();
    const filterId = 1;

    training.drawSingleFilter(weights, filterId, canvas);
  },
};

Note: 7

js/dataset.js

Note: 328

dataset.js manages training data collection and storage.

Key parts:

  1. getImage() — Captures the eye canvas as a tensor, normalizes pixel values to [-1, 1]

  2. getMetaInfos() — Extracts metadata: eye rectangle center (x, y) and size, normalized relative to video dimensions

  3. convertImage() — Converts RGB to grayscale (with gamma correction) and adds spatial coordinates as extra channels (3 channels total)

  4. captureExample() — Main entry point: grabs current eye image + mouse position, calls addExample()

  5. addExample() / addToDataset() — Randomly assigns samples to train (80%) or val (20%) sets, concatenates tensors

  6. toJSON() / fromJSON() — Serialization for saving/loading datasets

The train and val objects store x (image + metadata tensors) and y (target coordinates) for model training.
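
The full source follows, but it may help to see the rescaling conventions in one place first. The sketch below is illustrative only: plain JavaScript with made-up numbers, mirroring the arithmetic that getImage(), getMetaInfos(), addExample() and training.getPrediction() perform.

// Illustrative only - the same rescaling the functions below apply, shown on plain numbers.

// getImage(): canvas pixel values 0..255 -> roughly [-1, 1]
const pixel = 200;
const normPixel = pixel / 127 - 1;               // ~0.57

// getMetaInfos(): eye-rectangle centre in video pixels -> [-1, 1]
const videoWidth = 640;                          // made-up video width
const eyeRect = [220, 180, 120, 40];             // [x, y, width, height], made-up values
const centreX = eyeRect[0] + eyeRect[2] / 2;     // 280
const normX = (centreX / videoWidth) * 2 - 1;    // -0.125

// addExample(): mouse position in 0..1 -> training target in [-0.5, 0.5]
const mouseX = 0.8;
const targetX = mouseX - 0.5;                    // 0.3

// training.getPrediction() undoes that shift: prediction + 0.5 -> back to 0..1
const predictedX = 0.3;
const screenX = predictedX + 0.5;                // 0.8

console.log(normPixel, normX, targetX, screenX);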

Raw: 2,695

window.dataset = {
  inputWidth: $('#eyes').width(),
  inputHeight: $('#eyes').height(),
  train: {
    n: 0,
    x: null,
    y: null,
  },
  val: {
    n: 0,
    x: null,
    y: null,
  },
  session: {
    n: 0,
    x: [],
    y: [],
  },

  clearSession: function() {
    this.session = {
      n: 0,
      x: [],
      y: [],
    };
  },

  getImage: function() {
    // Capture the current image in the eyes canvas as a tensor.
    return tf.tidy(function() {
      const image = tf.browser.fromPixels(document.getElementById('eyes'));
      const batchedImage = image.expandDims(0);
      return batchedImage
        .toFloat()
        .div(tf.scalar(127))
        .sub(tf.scalar(1));
    });
  },

  getMetaInfos: function(mirror) {
    // Get some meta info about the rectangle as a tensor:
    // - middle x, y of the eye rectangle, relative to video size
    // - size of eye rectangle, relative to video size
    // - angle of rectangle (TODO)
    let x = facetracker.currentEyeRect[0] + facetracker.currentEyeRect[2] / 2;
    let y = facetracker.currentEyeRect[1] + facetracker.currentEyeRect[3] / 2;

    x = (x / facetracker.videoWidthExternal) * 2 - 1;
    y = (y / facetracker.videoHeightExternal) * 2 - 1;

    const rectWidth =
      facetracker.currentEyeRect[2] / facetracker.videoWidthExternal;
    const rectHeight =
      facetracker.currentEyeRect[3] / facetracker.videoHeightExternal;

    if (mirror) {
      x = 1 - x;
      y = 1 - y;
    }
    return tf.tidy(function() {
      return tf.tensor1d([x, y, rectWidth, rectHeight]).expandDims(0);
    });
  },

  whichDataset: function() {
    // Returns 'train' or 'val' depending on what makes sense / is random.
    if (dataset.train.n == 0) {
      return 'train';
    }
    if (dataset.val.n == 0) {
      return 'val';
    }
    return Math.random() < 0.2 ? 'val' : 'train';
  },

  rgbToGrayscale(imageArray, n, x, y) {
    // Given an rgb array and positions, returns a grayscale value.
    // Inspired by http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0029740
    let r = (imageArray[n][x][y][0] + 1) / 2;
    let g = (imageArray[n][x][y][1] + 1) / 2;
    let b = (imageArray[n][x][y][2] + 1) / 2;

    // Gamma correction:
    const exponent = 1 / 2.2;
    r = Math.pow(r, exponent);
    g = Math.pow(g, exponent);
    b = Math.pow(b, exponent);

    // Gleam:
    const gleam = (r + g + b) / 3;
    return gleam * 2 - 1;
  },

  convertImage: async function(image) {
    // Convert to grayscale and add spatial info
    const imageShape = image.shape;
    const imageArray = await image.array();
    const w = imageShape[1];
    const h = imageShape[2];

    const data = [new Array(w)];
    const promises = [];
    for (let x = 0; x < w; x++) {
      data[0][x] = new Array(h);

      for (let y = 0; y < h; y++) {
        const grayValue = dataset.rgbToGrayscale(imageArray, 0, x, y);
        data[0][x][y] = [grayValue, (x / w) * 2 - 1, (y / h) * 2 - 1];
      }
    }

    await Promise.all(promises);

    return tf.tensor(data);
  },

  addToDataset: function(image, metaInfos, target, key) {
    // Add the given x, y to either 'train' or 'val'.
    const set = dataset[key];

    if (set.x == null) {
      set.x = [tf.keep(image), tf.keep(metaInfos)];
      set.y = tf.keep(target);
    } else {
      const oldImage = set.x[0];
      set.x[0] = tf.keep(oldImage.concat(image, 0));

      const oldEyePos = set.x[1];
      set.x[1] = tf.keep(oldEyePos.concat(metaInfos, 0));

      const oldY = set.y;
      set.y = tf.keep(oldY.concat(target, 0));

      tf.dispose([oldImage, oldEyePos, oldY, target]);
    }

    set.n += 1;
  },

  addExample: async function(image, metaInfos, target, dontDispose) {
    // Given an image, eye pos and target coordinates, adds them to our dataset.
    target[0] = target[0] - 0.5;
    target[1] = target[1] - 0.5;
    target = tf.keep(
      tf.tidy(function() {
        return tf.tensor1d(target).expandDims(0);
      }),
    );
    const key = dataset.whichDataset();

    const convertedImage = await dataset.convertImage(image);

    dataset.addToDataset(convertedImage, metaInfos, target, key);

    ui.onAddExample(dataset.train.n, dataset.val.n);

    if (!dontDispose) {
      tf.dispose(image, metaInfos);
    }
  },

  captureExample: function() {
    // Take the latest image from the eyes canvas and add it to our dataset.
    // Takes the coordinates of the mouse.
    tf.tidy(function() {
      const img = dataset.getImage();
      const mousePos = mouse.getMousePos();
      const metaInfos = tf.keep(dataset.getMetaInfos());
      dataset.addExample(img, metaInfos, mousePos);
    });
  },

  toJSON: function() {
    const tensorToArray = function(t) {
      const typedArray = t.dataSync();
      return Array.prototype.slice.call(typedArray);
    };

    return {
      inputWidth: dataset.inputWidth,
      inputHeight: dataset.inputHeight,
      train: {
        shapes: {
          x0: dataset.train.x[0].shape,
          x1: dataset.train.x[1].shape,
          y: dataset.train.y.shape,
        },
        n: dataset.train.n,
        x: dataset.train.x && [
          tensorToArray(dataset.train.x[0]),
          tensorToArray(dataset.train.x[1]),
        ],
        y: tensorToArray(dataset.train.y),
      },
      val: {
        shapes: {
          x0: dataset.val.x[0].shape,
          x1: dataset.val.x[1].shape,
          y: dataset.val.y.shape,
        },
        n: dataset.val.n,
        x: dataset.val.x && [
          tensorToArray(dataset.val.x[0]),
          tensorToArray(dataset.val.x[1]),
        ],
        y: tensorToArray(dataset.val.y),
      },
    };
  },

  fromJSON: function(data) {
    dataset.inputWidth = data.inputWidth;
    dataset.inputHeight = data.inputHeight;
    dataset.train.n = data.train.n;
    dataset.train.x = data.train.x && [
      tf.tensor(data.train.x[0], data.train.shapes.x0),
      tf.tensor(data.train.x[1], data.train.shapes.x1),
    ];
    dataset.train.y = tf.tensor(data.train.y, data.train.shapes.y);
    dataset.val.n = data.val.n;
    dataset.val.x = data.val.x && [
      tf.tensor(data.val.x[0], data.val.shapes.x0),
      tf.tensor(data.val.x[1], data.val.shapes.x1),
    ];
    dataset.val.y = tf.tensor(data.val.y, data.val.shapes.y);

    ui.onAddExample(dataset.train.n, dataset.val.n);
  },
};

Note: 7

js/globals.js

Note: 151

globals.js is a small utility file that:

  1. Video codec detection — Provides functions to check if the browser supports video playback, H.264, and WebM formats

  2. getUserMedia polyfill — Normalizes the navigator.getUserMedia API across different browsers (webkit, moz, ms prefixes)

  3. URL polyfill — Normalizes window.URL across browsers

It's essentially a compatibility layer that ensures webcam access works in older browsers.
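
Before the raw source, here is a rough sketch of how these helpers might be wired up. It is illustrative only, not the app's own startup code, and startWebcam is a hypothetical callback standing in for whatever consumes the webcam stream.

// Illustrative usage of the globals.js helpers below - not taken from the app itself.
if (!supports_video()) {
  alert('This browser cannot play video, so the webcam feed will not work.');
} else if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
  // Modern browsers: promise-based API
  navigator.mediaDevices.getUserMedia({ video: true })
    .then(function(stream) { startWebcam(stream); })        // hypothetical callback
    .catch(function(err) { console.error('Webcam access failed:', err); });
} else if (navigator.getUserMedia) {
  // Older browsers: callback-based API normalised by the polyfill
  navigator.getUserMedia(
    { video: true },
    function(stream) { startWebcam(stream); },              // hypothetical callback
    function(err) { console.error('Webcam access failed:', err); }
  );
}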

Raw: 297

// video support utility functions
window.supports_video = function() {
  return !!document.createElement('video').canPlayType;
};

window.supports_h264_baseline_video = function() {
  if (!supports_video()) {
    return false;
  }
  const v = document.createElement('video');
  return v.canPlayType('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
};

window.supports_webm_video = function() {
  if (!supports_video()) {
    return false;
  }
  const v = document.createElement('video');
  return v.canPlayType('video/webm; codecs="vp8"');
};

navigator.getUserMedia =
  navigator.getUserMedia ||
  navigator.webkitGetUserMedia ||
  navigator.mozGetUserMedia ||
  navigator.msGetUserMedia;
window.URL = window.URL || window.webkitURL || window.msURL || window.mozURL;