GitHub - MinorJerry/WebVoyager: Code for “WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models”. You can run this locally and with a Llama engine, so you can use it for free…
That is big. Agentic AI is also approaching (with simple and coordinator agents), and Perplexity AI now has an assistant on Android.
Thanks for creating this post!
@simon when reading the announcement, I saw their performance on a set of different benchmarks for these UI interaction tasks.
That made me wonder how LAM is performing in comparison. Can anything be shared?
Jesse published a video on January 28th, 2025 with a side-by-side comparison of LAM playground and OpenAI Operator: https://www.youtube.com/watch?v=Jjb4lLsoMcE
Here is a TL;DR (or in this case TL;DW - Too Long; Didn’t Watch) summary of the comparisons described in Jesse’s video…
Rabbit LAM Playground and OpenAI Operator are both generic web agent systems that perform tasks on destination websites, but they have differences in their functionality.
The main takeaways:
- User Interface (UI): OpenAI’s Operator has a more refined UI with sample queries and prompts, which is especially useful for first-time users. Rabbit LAM Playground doesn’t have a fancy UI but can be used directly on the R1 by pressing a button to talk to it.
- Task Execution:
  - Restaurant Reservations: When prompted to find a restaurant with high ratings and reservations, OpenAI’s Operator used a specific tag for OpenTable, routing the task directly to that website. Rabbit LAM Playground, however, defaults to using Google search unless specifically instructed to use OpenTable. In testing, Rabbit LAM Playground was faster and did not need manual location input, as it uses the R1’s location. OpenAI’s Operator struggled with this task.
  - Ordering Ingredients: When ordering ingredients from Instacart, OpenAI’s Operator required more manual input, while Rabbit LAM Playground got to the task faster.
  - Multiple Destinations: In a test requiring multiple destinations (Reddit, then Best Buy), OpenAI’s Operator was blocked by network security, highlighting issues with anti-bot detection. Rabbit LAM Playground successfully completed the task, although it added an extra item to the cart.
- Speed: Rabbit LAM Playground was generally faster at completing tasks than OpenAI’s Operator.
- Cookie Management: Rabbit’s cookie management is more user-friendly than OpenAI’s Operator’s.
- Other: Both systems encounter common issues of the agent era, such as logins and CAPTCHAs. Both systems are also continuously evolving.
- Cost: Access to OpenAI’s Operator is part of the Pro plan, which costs $200 a month. Rabbit LAM Playground is included in the R1’s one-off cost of $199.
Rabbit LAM Playground outperformed OpenAI Operator in several aspects:
- Speed: Rabbit LAM Playground was generally faster at completing tasks than OpenAI Operator.
- Task Execution: In testing, Rabbit LAM Playground was faster and more efficient in the following:
  - Restaurant reservations: It did not require manual location input, using the R1’s location instead, whereas OpenAI’s Operator struggled with this task…
  - Ordering ingredients: It was faster in getting to the task of ordering from Instacart, while OpenAI’s Operator required more manual input.
  - Multiple destinations: Rabbit LAM Playground successfully completed a task requiring multiple destinations (Reddit and Best Buy), whereas OpenAI Operator was blocked by network security.
- Cookie Management: Rabbit’s cookie management is more user-friendly than OpenAI Operator’s.
In summary, while OpenAI Operator has a more refined user interface, Rabbit LAM Playground appears to be faster and more efficient in task completion, with better cookie management, and does not appear to have issues with anti-bot detection.
The ROI of the R1 just dropped to one month.
Thanks! Was pretty interesting to watch.
Now I am curious. I know that Operator performs very well in a variety of benchmarks for agent behavior. Can we get some benchmark comparison for LAM? Right now we only have anecdotal evidence (positive, as in the video, and my own anecdotes, for example LAM often failing to post here in the forum on my behalf).
Today I ran my own test and shared the results internally in the forum, for anybody who is interested: Exhaustive analysis video of the current status of LAM Playground when using the forum