Content and Activity

In this specific sequence, a subject is filmed in a natural kitchen setting performing a "recipe-driven" task. The footage features:
- A high frequency of hand-to-object contact (e.g., opening jars, slicing vegetables, pouring liquids)
- Synchronized gaze data (where the person is looking)

🎥 This video is often cited in papers involving models designed for video understanding, such as Transformers. It serves as a "real-world" challenge because of motion blur, hand occlusions, and the visual complexity of a cluttered kitchen. Common tasks include:
- Gaze anticipation: modeling how a person's eyes move toward an object before their hands touch it
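To make the gaze-data idea concrete, here is a minimal sketch of parsing synchronized gaze samples, assuming a simple `frame,x,y` CSV layout. This layout is an assumption for illustration only; the actual GTEA Gaze+ gaze-file format should be checked against the dataset's documentation.

```python
import csv
import io

def parse_gaze_csv(text):
    """Parse gaze samples from CSV text into (frame, x, y) tuples.

    Assumes a hypothetical 'frame,x,y' layout, NOT the official
    GTEA Gaze+ file format.
    """
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0] == "frame":  # skip blank lines and the header
            continue
        frame, x, y = int(row[0]), float(row[1]), float(row[2])
        samples.append((frame, x, y))
    return samples

# Toy data: gaze drifting toward the right side of the frame.
raw = "frame,x,y\n0,120.5,340.0\n1,180.2,335.5\n2,250.9,330.1\n"
for frame, x, y in parse_gaze_csv(raw):
    print(frame, x, y)
```

Keeping the parser separate from any analysis code makes it easy to swap in the dataset's real gaze format later without touching downstream logic.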
- Action and object recognition: recognizing kitchen tools and ingredients from shifting, shaky camera angles

If you tell me more about your specific project, I can provide:
- Information for this specific timestamp (if available)
- Code snippets for loading GTEA Gaze+ videos in Python
- Related research papers that utilize the Group 4 dataset
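As a sketch of the gaze-anticipation task described above, one way to quantify it is to measure how many frames before hand contact the gaze first lands inside an object's bounding box. All function and variable names here are illustrative, not part of any GTEA Gaze+ toolkit, and real work would use the dataset's actual annotations:

```python
def gaze_lead_frames(gaze_track, obj_box, contact_frame):
    """Return how many frames before `contact_frame` the gaze first
    enters `obj_box` (x0, y0, x1, y1), or None if it never does.

    gaze_track: list of (frame, x, y) samples, assumed sorted by frame.
    All names are hypothetical, for illustration only.
    """
    x0, y0, x1, y1 = obj_box
    for frame, x, y in gaze_track:
        if frame > contact_frame:
            break  # gaze arrived only after the hand touched the object
        if x0 <= x <= x1 and y0 <= y <= y1:
            return contact_frame - frame
    return None

# Toy example: gaze reaches the jar's box at frame 12, hand touches at frame 30.
track = [(10, 50.0, 50.0), (12, 210.0, 120.0), (20, 215.0, 118.0)]
jar_box = (200.0, 100.0, 260.0, 160.0)
print(gaze_lead_frames(track, jar_box, 30))  # → 18
```

A lead time of 18 frames at 30 fps would correspond to the gaze arriving about 0.6 s before contact, which is the kind of eye-before-hand interval such models try to exploit.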