Not who you asked, but I upgraded NextJS in a couple of repos just by telling Claude Code to do it. I've had it swap out and upgrade libraries successfully in one shot, too. It usually creates good-enough Page Objects for E2E tests and scaffolds the test file, which speeds up development a bit. Same for upgrading Node versions in some Lambda projects: just tell it to go and come back later. Instruct it to run the test and build steps, and it's like having a mini CI system running as well.
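"Run the test and build steps" is essentially a fail-fast pipeline: run each step in order and stop at the first non-zero exit code. A minimal sketch of that loop in TypeScript, using Node's `child_process.spawnSync`, is below. The `runSteps` helper and the `npm test` / `npm run build` commands in the usage comment are hypothetical and assume a repo that defines those npm scripts; this is not a description of how Claude Code runs them internally.

```typescript
import { spawnSync } from "node:child_process";

// One pipeline step: a label plus the command and arguments to execute.
interface Step {
  name: string;
  cmd: string;
  args: string[];
}

interface PipelineResult {
  ok: boolean;
  completed: string[]; // steps that exited with status 0, in order
  failedStep?: string; // first step that failed, if any
  exitCode?: number | null; // its exit status (null if it could not start)
}

// Hypothetical helper: run steps sequentially, stopping at the first failure,
// the way a typical CI job does.
function runSteps(steps: Step[]): PipelineResult {
  const completed: string[] = [];
  for (const step of steps) {
    // stdio "inherit" streams the tool's output straight to this terminal.
    const res = spawnSync(step.cmd, step.args, { stdio: "inherit" });
    if (res.error || res.status !== 0) {
      return { ok: false, completed, failedStep: step.name, exitCode: res.status };
    }
    completed.push(step.name);
  }
  return { ok: true, completed };
}

// Hypothetical usage (Unix-like systems; assumes `test` and `build` npm scripts):
// runSteps([
//   { name: "test", cmd: "npm", args: ["test"] },
//   { name: "build", cmd: "npm", args: ["run", "build"] },
// ]);
```

Stopping at the first failure keeps the output short and makes it obvious which step broke, which is most of what a small local "CI" needs to provide.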
Personally, I think it really shines at doing the boring maintenance and tech debt work. None of these are hard or complex tasks, but they all take up time, and for a buck or two in tokens I can have it doing simple but tedious things while I work on something else.
> Personally, I think it really shines at doing the boring maintenance and tech debt work.
It shines at doing the boring maintenance and tech debt work for the web. My experiences with it as a firmware dev have been the diametric opposite of yours. The only model I've had any luck with as an agent is Sonnet 4 in reasoning mode. At an absolutely glacial pace, it will sometimes write some almost-correct unit tests. That's only valuable because I can have it do that while I'm in a meeting or reading emails. The only reason I use it at all is that it's coming out of my company's pocket, not mine.
For sure. The models have tons of training data for JS and TS and for the specific tasks I outlined, but it's not limited to the web: I have several Node or Bun + TypeScript + SQLite CLI utilities that it also helps with. I definitely pick my battles and lean into what it works best for, though. Anything it appears to struggle with I'll just do manually and develop the way we always did. It's rarely not a net positive for me, but it's very frequently a negligible one. Anything that doesn't pay off in spades I typically don't try again until new models are released or new tools or approaches become available.
If you're doing JS/Python/Ruby/Java, that's probably where it's best. Even with our stack (Elixir), where it's not as good as it is with, say, React/NextJS, it's definitely good enough to implement tons of stuff for us.
And with a handful of good CLAUDE.md or rules files that guide it in the right direction, it's almost as good as React/NextJS for us.
I can see how these things are convenient when they succeed. I struggle because my personal workflow is to always keep two copies of a repo open at once: one for deep-thought work, the other for drone work. I have always just done these kinds of background tasks whenever I'm in meetings, compiling, etc. I have not seen much of a productivity boost from this. Oddly, you would think being able to offload even more during that time would help, but reviewing the agent's output ends up being far more costly (and makes the context switch significantly harder, for some reason). It's just not proving consistently useful for me.
Just off the top of my head (and I exclusively use Claude Code now):
Random Postgres stuff:
- Showed it a couple of Geo/PostGIS queries that were taking up more CPU according to our metrics and asked it to make them faster. It rewrote them in a way that actually used the index (for example, using the <-> operator for proximity). One-shotted. The whole effort was about 5 minutes.
- Regularly asking it for maintenance scripts (like "give me a script that shows me the most fragmented tables" or the ones using the most storage, etc.).
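For context on the `<->` trick: in PostGIS, putting the `<->` distance operator in `ORDER BY` together with a `LIMIT` lets a GiST index answer nearest-neighbour queries, whereas ordering by `ST_Distance(...)` computes a distance for every row before sorting. Below is a minimal sketch, assuming a hypothetical `places` table with a `geom geometry(Point, 4326)` column and a GiST index on it, plus a simple "largest tables" maintenance query over `pg_stat_user_tables`. The queries are returned as `{ text, values }` objects, the shape node-postgres accepts for parameterized queries; the table, column, and function names are illustrative only, and these are not the actual queries from the comment above.

```typescript
// Parameterized query shape accepted by node-postgres (pg) client.query().
interface ParamQuery {
  text: string;
  values: number[];
}

// Hypothetical schema: places(id, name, geom geometry(Point, 4326)) with
//   CREATE INDEX places_geom_idx ON places USING GIST (geom);
function nearestPlacesQuery(lon: number, lat: number, limit: number): ParamQuery {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError("limit must be a positive integer");
  }
  return {
    // ORDER BY geom <-> point with LIMIT lets the GiST index drive the KNN search.
    // The slow variant, ORDER BY ST_Distance(geom, point), scores every row first.
    text: [
      "SELECT id, name",
      "FROM places",
      "ORDER BY geom <-> ST_SetSRID(ST_MakePoint($1, $2), 4326)",
      "LIMIT $3",
    ].join("\n"),
    values: [lon, lat, limit],
  };
}

// Rough maintenance report: biggest tables first, with dead-tuple counts as a
// crude bloat signal (not a precise fragmentation measure).
const largestTablesQuery: ParamQuery = {
  text: [
    "SELECT relname,",
    "       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,",
    "       n_live_tup,",
    "       n_dead_tup",
    "FROM pg_stat_user_tables",
    "ORDER BY pg_total_relation_size(relid) DESC",
    "LIMIT $1",
  ].join("\n"),
  values: [10],
};
```

With node-postgres this would be something like `await client.query(nearestPlacesQuery(lon, lat, 10))`. Running `EXPLAIN` on the generated SQL is the quick way to confirm the plan uses the GiST index. Note that with SRID 4326 geometry, `<->` measures planar distance in degrees, so it only approximates true ground distance.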
CSS:
- Built a whole horizontal logo marquee with CSS animations without writing a single line myself. Then I asked for little things like "have the people's avatars gently pulsate". All of this took about 15 minutes; I would normally have spent 8-16 hours on that kind of pixel pushing.
Elixir App:
- I asked it to look at my GitHub Actions file and make it go faster. In about 2-3 iterations, it cut my build time from 6 minutes to 2 minutes. The effort was about an hour, most of it spent waiting for builds, fiddling with some weird syntax errors, or combining a couple of extra steps, but I didn't have to spend a second on the research; its suggestions were spot on.
- In our repo (900 files) we had created an umbrella app (a particular kind of Elixir project), and I wanted to convert it to a non-umbrella app. This one did require more work and more pushing from me, but I'd been putting off this task for 3 YEARS because it just didn't feel like a priority worth 2-3 days. I got it done in about 2 hours.
- Built a whole discussion board in about 6 hours.
- There are probably 3-6 tickets per week where I just say "implement FND-234" and it one-shots the bug fix or implementation, especially if it's a smaller, well-defined ticket such as "make this list sortable" (it knows to reuse my sortablejs hook and to look at how we implemented it elsewhere).
- With the Appsignal MCP, I've had it summarize the top 5 errors in production and write a bug fix for one I picked (I've only done this once; the MCP is new). That one was one-shotted.
- Rust library (it's just an Elixir binding to a Rust library, and the actual Rust is about 20 lines, so not at all complex): I've never coded a day of Rust in my life, but I have Claude do all my cargo updates and the fixes for occasional syntax/API deprecations. I still don't know how to write any Rust.
NextJS App:
- I haven't fixed a single TypeScript error myself in probably 5 months. I can't be bothered; CC gets it right about 99% of the time.
- Pasted in a Figma file and asked it to implement it. This is rarely one-shotted, but it's still about 10x faster than developing it manually.
The best combination is if you have a robust component library and well documented patterns. Then stuff goes even faster.
All on the $100 plan, on which I've hit the limit only twice in two months. I think if they raised the price to $500, it would still feel like a no-brainer.
I think Anthropic knows this. My guess is that they're going to get us hooked on the productivity gains, and we'll happily pay 5x more if they raise prices, since the gains are that big.