The ability to understand 3D space, coordinates, or physics in a video for robotics and UX.
State-of-the-art multimodal AI with frontier reasoning and massive context