We used a game, a webcam, and AI to track visual attention—here’s what we discovered
I recently undertook a project using simple equipment—a laptop and its webcam—to predict where a user was looking on a screen and study their visual attention. To achieve this, I trained a convolutional neural network (CNN) to predict gaze direction from webcam images. But first, I needed high-quality training data—not just images of faces, but images paired with precise gaze direction labels.
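To make that concrete, here is a minimal sketch of the kind of network involved. It assumes PyTorch, 128x128 RGB face crops as input, and a normalized (x, y) on-screen gaze point as the regression target; the layer sizes are illustrative, not the exact architecture used in the project.

```python
# Minimal sketch of a gaze-regression CNN (illustrative architecture).
# Input: 128x128 RGB face crop. Output: normalized (x, y) gaze point in [0, 1].
import torch
import torch.nn as nn

class GazeCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 128 -> 64
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 64 -> 32
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 32 -> 16
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, 256), nn.ReLU(),
            nn.Linear(256, 2),                            # predicted (x, y)
            nn.Sigmoid(),                                 # keep output in [0, 1]
        )

    def forward(self, x):
        return self.head(self.features(x))
```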
To gather this data, I started simply: I built a program that displayed a moving object on a dark screen and instructed participants to track it with their eyes while the webcam recorded their faces. However, engagement became a challenge—participants quickly lost interest, which lowered the quality and quantity of data collected.
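A stripped-down version of that recorder might look like the sketch below. It assumes pygame for the fullscreen display and OpenCV for the webcam; the dot's path, the frame rate, and the file layout are illustrative choices, not the originals.

```python
# Sketch of a moving-dot recorder: show a dot, save the webcam frame
# and the dot's screen position each tick (assumed tooling: pygame + OpenCV).
import csv
import math
import os

import cv2
import pygame

os.makedirs("frames", exist_ok=True)
pygame.init()
screen = pygame.display.set_mode((0, 0), pygame.FULLSCREEN)
w, h = screen.get_size()
cam = cv2.VideoCapture(0)
clock = pygame.time.Clock()

with open("labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["frame", "x", "y"])
    for i in range(1800):                      # ~60 s at 30 fps
        pygame.event.pump()                    # keep the window responsive
        t = i / 30.0
        # Dot sweeps the screen along a smooth Lissajous-style curve.
        x = int(w / 2 + 0.4 * w * math.sin(0.5 * t))
        y = int(h / 2 + 0.4 * h * math.sin(0.3 * t))
        screen.fill((0, 0, 0))
        pygame.draw.circle(screen, (255, 255, 255), (x, y), 10)
        pygame.display.flip()
        ok, frame = cam.read()
        if ok:
            cv2.imwrite(f"frames/{i:05d}.jpg", frame)
            writer.writerow([i, x, y])         # gaze label = dot position
        clock.tick(30)

cam.release()
pygame.quit()
```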
So, I took a different approach: I turned the experiment into a game. In this game, an alien would randomly appear on the screen, and the user would aim and "shoot" it using a crosshair controlled by the mouse. Each time the user clicked, the system captured an image from the webcam and recorded both the crosshair location and the distance between the crosshair and the alien's center.
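The capture step reduces to a small event handler. The sketch below is a hypothetical reconstruction, again assuming pygame and OpenCV; `cam`, `alien_pos`, and the CSV `log` writer are placeholder names for state the surrounding game loop would own.

```python
# Sketch of the per-click capture step in the game (illustrative names).
import math

import cv2
import pygame

def handle_click(event, cam, alien_pos, shot_id, log):
    """On a shot, save the webcam frame plus the click's label data."""
    if event.type != pygame.MOUSEBUTTONDOWN:
        return
    crosshair = event.pos                     # where the user was aiming
    ok, frame = cam.read()                    # face image at click time
    if not ok:
        return
    # Distance from the alien's center, kept as a per-sample confidence score.
    dist = math.hypot(crosshair[0] - alien_pos[0],
                      crosshair[1] - alien_pos[1])
    cv2.imwrite(f"shots/{shot_id:06d}.jpg", frame)  # assumes shots/ exists
    log.writerow([shot_id, crosshair[0], crosshair[1], dist])
```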
This additional metric—the distance from the target—became crucial. It served as a confidence score for each data point, enabling a technique called cost-sensitive learning. Data from highly accurate clicks (those close to the alien) were weighted more heavily during model training. This guided the CNN to prioritize learning from the most reliable examples, improving its ability to correlate facial features with gaze direction.
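In practice, this amounts to a per-sample weight in the training loss. The sketch below shows one plausible form in PyTorch; the exponential decay and its scale are assumptions for illustration, not the exact weighting scheme used in the project.

```python
# Sketch of cost-sensitive weighting: accurate clicks contribute more
# to the loss (the decay form and scale are assumed, not the originals).
import torch

def click_weights(dist_px, scale=50.0):
    """Map click-to-alien distances (tensor, pixels) to weights in (0, 1]."""
    return torch.exp(-dist_px / scale)        # close clicks -> weight near 1

def weighted_mse(pred, target, weights):
    """Mean squared error with each sample scaled by its confidence weight."""
    per_sample = ((pred - target) ** 2).mean(dim=1)   # (N,) per-sample errors
    return (weights * per_sample).sum() / weights.sum()
```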
The game proved highly effective, generating millions of labeled data points. With this large dataset, I trained the CNN, which ultimately achieved an average gaze prediction accuracy of approximately one degree of visual angle. Although professional eye trackers can reach sub-0.6-degree accuracy, the system performed well enough for the project's goal: to create a widely accessible, webcam-based eye-tracking tool.
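To put one degree in perspective, here is a quick back-of-the-envelope conversion to on-screen pixels; the viewing distance and pixel density are assumed typical laptop values, not measured ones.

```python
# What a one-degree gaze error means on screen (assumed setup: ~50 cm
# viewing distance, a 1080p 15.6" panel at roughly 55 px/cm).
import math

viewing_distance_cm = 50.0
px_per_cm = 55.0
error_deg = 1.0

error_cm = viewing_distance_cm * math.tan(math.radians(error_deg))
error_px = error_cm * px_per_cm
print(f"~{error_cm:.2f} cm on screen, about {error_px:.0f} px")  # ~0.87 cm, ~48 px
```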
This tool was deployed in an IRB-approved study comparing gaze patterns between children with autism spectrum disorder (ASD) and typically developing (TD) children. Research shows that children with ASD often attend to different visual elements than their peers, such as background objects instead of faces or social cues. The system successfully captured these differences, distinguishing whether participants were looking at faces or at other objects on the screen.
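One simple way to make that face-versus-other call, sketched below under the assumption that each stimulus frame comes annotated with face bounding boxes, is a point-in-box test on the predicted gaze point; the function and its names are illustrative.

```python
# Sketch of classifying a predicted gaze point against face regions
# (assumes per-frame face bounding boxes are available for the stimulus).
def label_gaze(gaze_xy, face_boxes):
    """Return 'face' if the gaze point falls inside any face box, else 'other'.

    gaze_xy:    (x, y) predicted gaze point in screen pixels
    face_boxes: list of (x_min, y_min, x_max, y_max) boxes for this frame
    """
    x, y = gaze_xy
    for x0, y0, x1, y1 in face_boxes:
        if x0 <= x <= x1 and y0 <= y <= y1:
            return "face"
    return "other"
```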
The most important aspect of this project was accessibility. Using only a laptop webcam, families could participate from home without the need for costly specialized equipment or long trips to clinics.
This experience reminded me that AI is a tool, not magic. Success relied on collecting quality data and using creative approaches, such as gamification, to achieve it. As the saying goes: garbage in, garbage out. With thoughtful design and high-quality inputs, AI can drive powerful and meaningful outcomes.