
You can use git worktrees and run multiple Claude Code terminal instances, one per worktree. That way they don't clash; just delete the worktree when the task is done.
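Roughly, the flow looks something like this (the paths and branch names are just examples, and the claude command here is a placeholder for however you launch Claude Code):

    # create an isolated worktree on its own branch
    git worktree add ../myrepo-task-a -b task-a

    # run a separate Claude Code instance inside it
    cd ../myrepo-task-a && claude

    # repeat with task-b, task-c, ... in other terminals;
    # when a task is done, clean up the worktree and branch
    git worktree remove ../myrepo-task-a
    git branch -d task-a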

I have never leveraged git worktrees... That is such a crazy useful tool that I am almost ashamed of not having researched it before. Git is such a beautiful piece of software.

I built an open source project to make the whole workflow easier: https://github.com/built-by-as/FleetCode

Is Cursor Bench open? I'd like to see an open benchmark for agentic coding.

Unfortunately not, as we used our own internal code for the benchmark. We would also like to see more benchmarks that reflect day-to-day agentic coding use.

Is there any information at all available, anywhere, on what Cursor Bench is testing and how?

It's the most prominent part of the release post - but it's really hard to understand what exactly it's saying.


Roughly, we had Cursor software engineers record real questions they were asking models, and then had them record the PR they made containing the result. We then cleaned these up. That is the benchmark.

Are you able to give a sense of how many questions, which domains they were split over, and how that split looked in % terms?

As a user, I want to know - when an improvement is claimed - whether it’s relevant to the work I do or not. And whether that claim was tested in a reasonable way.

These products aren't just expensive; they also require switching your whole workflow, which is becoming an increasingly big ask in this space.

It's pretty important for me to be able to understand, and subsequently believe, a benchmark. I find it really hard not to read it as ad copy when this information isn't present.


Which programming languages/tools/libraries did the team's questions/code involve?

I built a version of this which wraps multiple CLI sessions locally. I do think the web aspect, and being able to access your CC session from anywhere, is cool.

https://github.com/built-by-as/FleetCode


George Hotz said there are 5 tiers of AI systems: Tier 1 - data centers, Tier 2 - fabs, Tier 3 - chip makers, Tier 4 - frontier labs, Tier 5 - model wrappers. He said Tier 4 is going to eat all the value of Tier 5, and that Tier 5 is worthless. It's looking like that's going to be the case.

That is a common refrain by people who have no domain expertise in anything outside of tech.

Spend a few years in an insurance company, a manufacturing plant, or a hospital, and then the assertion that the frontier labs will figure it out appears patently absurd. (After all, it takes humans years to understand just a part of these institutions, and they have well-functioning memory.)

This belief that tier 5 is useless is itself a tell of a vulnerability: the LLMs are advancing fastest in domain-expertise-free generalized technical knowledge; if you have no domain expertise outside of tech, you are most vulnerable to their march of capability, and it is those with domain expertise who will rely increasingly less on those who have nothing to offer but generalized technical knowledge.


Yeah, but if Anthropic/OpenAI dedicate resources to gaining domain expertise, then any tier 5 is dead in the water. For example, they recently hired a bunch of finance professionals to make specialized models for financial modeling. Any startup in that space will be wiped out.

I don't think the claim is exactly that tier 5 is useless; it's more that tier 5 synergizes so well with tier 4 that all the popular tier 5 products will eventually be made by the tier 4 companies.

George Hotz says a lot of things. I think he's directionally correct but you could apply this argument to tech as a whole. Even outside of AI, there are plenty of niches where domain-specific solutions matter quite a bit but are too small for the big players to focus on.

Tier 5 requires domain expertise until we reach AGI or something very different from the latest LLMs.

I don’t think the frontier labs have the bandwidth or domain knowledge (or dare I say skills) to do tier 5 tasks well. Even their chat UIs leave a lot to be desired and that should be their core competency.


Interesting. I found a reference to this in a tweet [1], and it looks to be a podcast. While I'm not extremely knowledgeable, I'd put it like this: Tier 1 - fabs, Tier 2 - chip makers, Tier 3 - data centers, Tier 4 - frontier labs, Tier 5 - model wrappers.

However, I would think more of elite data centers rather than commodity data centers, because I see Tier 4 being deeply involved in their data centers and thinking of buying the chips to feed them. I wouldn't be so inclined to throw in my opinion immediately if I had found an article showing this ordering of the tiers, but being a tweet about a podcast it might have just been a rough draft.

1: https://x.com/tbpn/status/1935072881425400016


Andrew Ng argued in 2023 (https://www.youtube.com/watch?v=5p248yoa3oE) that the underlying tiers depend on the app tier's success.

That OpenAI is now apparently striving to become the next big app-layer company could hint at George Hotz being right, but only if the bets work out. I'm glad that there is competition on the frontier lab tier.


People were saying the same thing about AWS vs SaaS ("AWS wrappers") a decade ago and none of that came to pass. Same will be true here.

Claude is a model wrapper, no?

Anthropic is a frontier lab, and Claude is a frontier model


Okay, Claude is a _family_ of frontier models then. IMO that's a pedantic distinction in this context.

AI startups are becoming obsolete daily

They've lost on basically all fronts of AI, right?

I'm confused about Meta AI in general. It's _horrible_ compared to every other LLM I use. Customer ingress is weird to me too - do they mainly expect people to use Facebook chat (Messenger) to talk to Meta AI? I've tried it on Messenger and the website, and have run Llama locally.

My (completely uninformed, spitballing) thinking is that Facebook doesn't care that much about AI for end users. The benefit here is for their ads business, etc.

Unclear if they have been successful at all so far.


Too much training on facebook and insta shitposts.

So the biggest issue is having to pull down and manually edit changes? Can't you just @claude on the PR to make any changes?

Yes, but my point is that oftentimes I don't want to. Sometimes there are changes I can make in seconds. I don't want to wait 15+ seconds for an AI that might do it wrong or do too much.

Also it isn't always about editing. It is about seeing the surrounding code, navigating around, and ensuring the AI did the right thing in all of the right places.


Are you using the web UI, the CLI, or both?

Do you use the CLI or the web UI? Or both?

Are researchers scared to just come out and say it because they'll be labeled as wrong if the extreme tail case happens?

No, it’s because of money and the hype cycle.

I mean, you say this, but I haven't touched a line of code as a programmer in months, having been totally replaced by AI.

I mean, sure, I now "control" the AI, but I still think these "no AGI for 2 decades" claims are a bit rough.


I think AI is great and extremely helpful, but if you've been replaced already, maybe you have more time now to make better code and decisions? If you think the AI output is good by default, I think maybe that's a problem. I think general intelligence is something other than what we have now; these systems are extremely bad at updating their knowledge and hopeless at applying understanding from one area to another. For example, self-driving cars are still extremely brittle to the point of every city needing new and specific training, whereas you can just take a car with the controls on the opposite side to you and safely drive in another country.

Yeah, I agree. However, like... I don't understand why you think I am making bad decisions? I'm a self-made (to an extent) millionaire; I am doing OK.

I don't want to sound mean, but c'mon, the reality is that if you haven't touched a line of code in months, you are/were not a programmer. I love Claude Code; it really has its moments. But even for the stuff it is exceptionally good at, I have to regularly fix mistakes it has made. And I only give it the fairly easy stuff I don't feel like doing myself.

It is OK, brother, I can handle the accusation. Perhaps after 2 decades... I really am no longer a programmer?

100% Agree

Let’s see the code

pls no

They are afraid to say it because it may affect the funding. Currently, with all the hype surrounding AI, investors and governments will literally shower you with funding. Always follow the money :) Buy the dream, sell the reality.

Also I think Andrej is just an honest guy.

I don’t think they’re scared, I think they know it’s a lose-tie game.

If you're correct, there's not much reward aside from the "I told you so" bragging rights; if you're wrong though, boy oh boy, you'll be deemed unworthy.

You only need to get one extreme prediction right (stock market collapse, AI taking over, etc.), and then you'll be seen as "the guru", the expert, the one who saw it coming. You'll be rewarded by being invited to boards, panels, and government councils to share your wisdom, and be handsomely paid to explain, in hindsight, why it was obvious to you, and to express how baffling it was that no one else could see what you saw.

On the other hand, if you predict an extreme case and get it wrong, there are virtually zero penalties; no one will hold that against you, and no one even remembers.

So yeah, fame and fortune lie in taking many shots at predicting disasters, not the other way around.

