I'm perplexed why they would use such a silly example in their demo video (rotating an image of a dog upside down and cropping). Surely they can find more compelling examples of where these skills could be used?
I've been emulating this in Claude Code by manually @tagging markdown files containing guides for common tasks in our repository. Nice to see that this step is now automatic as well.
Why are flight bookings always the go-to example? For most people, booking a flight happens infrequently, is a non-trivial expense (to your point), and is not that burdensome to do yourself.
We agree that, as a demo, flight booking is probably overused.
However, in talking with my AI Labs, their perspective on flight booking is a little different. "Solving" flight booking requires the AI agent to solve a LOT of hard problems: personalization, context, weighing multiple options, interacting with the UI, math, and then wrapping that all up into a coherent response. The thinking is that IF a computer-use agent can solve flight booking well, then we will have developed many other powerful primitives that will scale to other problems.
So as a standalone use case, I'm inclined to agree this might not be where the most agent traction is seen. However, as a research/capability goal, there are some generalizations that could apply to other very important use cases.
It's because most people have done it, and because it's infrequent and expensive enough to be a real pain point, which makes it a good example. Because it's infrequent, most people don't have a rigorous, well-practiced system for getting the optimal ticket for their particular circumstances, and because it can be somewhat expensive, there's an added burden of optimizing for price as well, especially given all the games airlines play with pricing.
If you're rich, you can just look for the ticket at the time you like on your preferred airline and buy a first class ticket, whatever the price, for whenever you want to fly, even if it's tomorrow. For the rest, that's not practical. So the flight search has to begin a few months out, with the burden of doing multiple searches (in incognito mode) across various airlines and/or aggregators, in order to optimize various factors. This takes a non-trivial amount of time. Add in looking for hotels and rental cars, and for some it's fun; for others it's an annoying, burdensome chore that stands in the way of being on vacation.
It's just an example use case, though. Much like a "robot maid" that folds clothes isn't the be-all and end-all of robotics, an AI that can perform this task will have the capabilities needed for a wide variety of other tasks.
I don't know about you, but it takes me hours to book a flight if it's for my family, because I'm usually booking a flight, a car, and a hotel, and I have to constantly min-max the costs between hotels on certain days, flights on certain days, and cars on certain days.
If it's not burdensome for you, then you're either taking very simple trips or you're so rich that you don't care.
> I have to constantly min-max the costs between hotels on certain days, flights on certain days, and cars on certain days.
I agree it's a burdensome chore!
Just wondering - your hotel stay can't be shorter than the days between your flights. For the car, one can manage to cut down with Uber/public transport, but that still turns out to be more expensive than a rental car.
> your hotel stay can't be shorter than the days between your flights.
This is exactly right, and why it's such a pain. Because if I have a bit of flexibility, I have to figure out which flying day is best for prices and seats, and then see if the hotel is more or less expensive between those days.
For example, if I fly on Tuesday I can save $400 vs flying Sunday. But if I want to stay a week, the hotel may not have the following Sunday available. So now I have to look for an alternate hotel, which may not include parking like the first one, and so on and so on. There are so many variables that can all change based on the day of arrival and departure.
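A rough sketch of that search, assuming a week-long stay. Every fare, hotel name, nightly rate, and parking fee below is invented purely for illustration, and `trip_cost` and `best_trip` are hypothetical helper names; the point is only that the cheapest flight day and the cheapest overall trip have to be worked out together.

```python
from dataclasses import dataclass

NIGHTS = 7              # assumed length of stay
PARKING_PER_NIGHT = 30  # assumed fee when a hotel does not include parking

# Hypothetical round-trip fares by arrival day: Tuesday is $400 cheaper than Sunday.
FLIGHTS = {"Sun": 1200, "Tue": 800}


@dataclass(frozen=True)
class Hotel:
    name: str
    nightly_rate: int
    includes_parking: bool
    available_arrivals: frozenset  # arrival days on which the full stay is bookable


# Hypothetical hotels: the first choice cannot cover a Tuesday-to-Tuesday stay.
HOTELS = [
    Hotel("First choice", 150, True, frozenset({"Sun"})),
    Hotel("Alternate", 170, False, frozenset({"Sun", "Tue"})),
]


def trip_cost(arrival, hotel):
    """Total cost of flight + hotel + parking, or None if the hotel is unavailable."""
    if arrival not in hotel.available_arrivals:
        return None
    parking = 0 if hotel.includes_parking else PARKING_PER_NIGHT * NIGHTS
    return FLIGHTS[arrival] + hotel.nightly_rate * NIGHTS + parking


def best_trip():
    """Try every (arrival day, hotel) pair and return the cheapest bookable one."""
    options = []
    for arrival in FLIGHTS:
        for hotel in HOTELS:
            cost = trip_cost(arrival, hotel)
            if cost is not None:
                options.append((cost, arrival, hotel.name))
    return min(options)


print(best_trip())  # (2200, 'Tue', 'Alternate')
```

With these made-up numbers, Tuesday still wins, but only by $50 over Sunday at the first-choice hotel ($2,250), because the alternate hotel's higher rate and parking fee eat most of the $400 flight saving. Add rental cars and seat availability and the number of combinations to check grows quickly.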
We used to have travel agents for this (and still do!). I've used travel agents, and I've used (other people's) personal assistants, but no one ever gets it right. I only trust myself, my wife, and my sister-in-law to get this right.
Having an AI agent that gets this right would be incredible.
> For the car, one can manage to cut down with Uber/public transport, but that still turns out to be more expensive than a rental car.
If I'm getting a car it's usually because it's a place where Lyft and public transport won't work. Otherwise I always default to public transport and then Lyft if necessary.
Recently, I've been travelling for events, so my dates are fixed, and that's why I could not connect with your scenario! Travel bookings and other web searches are ripe for automation. Even if the system only narrows the search results down to a final 3 rather than automating everything, I'd still call it a win. Shameless plug: I'm working on such an agent, although it's currently at a very early stage.
> I only trust myself, my wife, and my sister-in-law to get this right.
Can you please elaborate on that? Do others not look at cost savings across the board, and instead focus on only one item?
I'm a big fan of ActiveJob in Rails. I was considering building a version inspired by it in Node but now it looks like I don't have to. Thank you for building this.
Screenshots on the website would be very helpful. This sounds interesting, but I'd like to see what the actual UI is before setting anything up. It would also be good to link to the GitHub repo for the CLI part - I'm skeptical of giving an unknown tool access to my database without understanding what it's actually doing.
Totally understandable, your concerns are valid. I will have to figure out the CLI part. I would prefer to keep it private (at least for now), but I get that people don't want to provide DB access to a completely new tool.
And yes, you're absolutely right about the lack of screenshots. I realize now how important that is for trust and clarity. I’m working on improving the landing page with proper visuals, and I’m also exploring the idea of a live demo using an in-browser SQLite DB — that should let people try it instantly without setup. It’ll take a bit of time, but it’s now high on the priority list. Appreciate the feedback!
It screams it to me - wording such as "your concerns are valid", "yes, you're absolutely right..." and "I realize now how important...", and then the use of the em-dash.
Last time I was trying to be authentically me, my response triggered a whole sub-thread about me using the word "subvention" and how funny Europeans sound to Americans.
I have this too: many words I carry over from French into English do exist in reality (they sound natural to me), and bam! There you have a snob who wants to show off :)
What is this obsession with the em-dash? For those who have used LaTeX it is second nature, and a -- gets changed to an em-dash by a lot of editors or programs such as AutoHotkey.
The techy people who are the target for something like this aren't just setting up their home internet once; they're setting up the internet for their friends and family when they visit, getting roped in to fix the internet for their in-laws when it breaks, etc.
Because, 18 months from now, you or your executrix wakes up with a hangover, and your Internet connection is down, and you or your executrix begin troubleshooting, and poking through the DNS configuration, your executrix scratches her head and exclaims “who in the world is 2a13:1001::86:54:11:100 and why did we ever add this in here?!”
And then you or your executrix reset it to 8.8.8.8 because that is distinctly memorable and unmistakable.