Building an End-to-End Web Agent with Large Multimodal Models

Fairly certain this is what the R1's system is based on, no? I could have sworn I read that somewhere in one of the OG press articles when it first got hyped up.