
Glib observation, but this sounds quite generic and AI-written.

Elsewhere I've seen a post from the author talking about how his old articles hit so many of Wikipedia's identified signs of AI-generated text. As somebody whose own style hits many of those same stylistic/rhetorical techniques, I definitely sympathize.

Can we stop repeating this canard, over and over?

Every "classic computing" language mentioned, and pretty much in history, is highly deterministic, and mind-bogglingly, huge-number-of-9s reliable (when was the last time your CPU did the wrong thing on one of the billions of machine instructions it executes every second, or your compiler gave two different outputs from the same code?)

LLMs are not even "one 9" reliable at the moment. Indeed, each token is a freaking RNG draw off a probability distribution. "Compiling" is a crap shoot, a slot machine pull. By design. And the errors compound/multiply over repeated pulls as others have shown.
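Back-of-the-envelope, with made-up per-step success rates just to show how fast the errors compound:

    # Hypothetical per-step success rates; real figures vary by model and task.
    for p in (0.999, 0.99, 0.95):
        for n in (100, 1_000, 10_000):
            print(f"p={p}, n={n:>6}: chance of a flawless run = {p**n:.3e}")
    # Even at 99.9% per step, a 10,000-step run finishes cleanly only ~0.005% of the time.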

I'll take the gloriously reliable classical compute world to compile my stuff any day.


Agreed, yet we will have to keep seeing this take over and over again. As if I needed more reasons to believe the world is filled with morons.

Fun :) but mobile text input was the main challenge.


Been unhappy with the GPT-5 series after daily-driving 4.x for ages (I chat with them through the API): very pedantic, goes off on too many side topics, and stops following system instructions after a few turns (e.g. "you respond in 1-3 sentences" becomes long bulleted lists and multiple paragraphs very quickly).

Much better feel with the Claude 4.5 series, for both chat and coding.


> "you respond in 1-3 sentences" becomes long bulleted lists and multiple paragraphs very quickly

This is why my heart sank this morning. I have spent over a year training 4.0 to just about be helpful enough to get me an extra 1-2 hours a day of productivity. From experimentation, I can see no hope of reproducing that with 5.x, and even 5.x admits as much to me when I discussed it with them today:

> Prolixity is a side effect of optimization goals, not billing strategy. Newer models are trained to maximize helpfulness, coverage, and safety, which biases toward explanation, hedging, and context expansion. GPT-4 was less aggressively optimized in those directions, so it felt terser by default.

Share and enjoy!


> This is why my heart sank this morning. I have spent over a year training 4.0 to just about be helpful enough to get me an extra 1-2 hours a day of productivity.

Maybe you should consider basing your workflows on open-weight models instead? Unlike proprietary API-only models, no one can take these away from you.


I have considered it, and it is still on the docket. I have a local 3090 dedicated to ML. Would be a fascinating and potentially really useful project, but as a freelancer, it would cost a lot to give it the time it needs.


You can’t ask GPT to assess the situation. That’s not the kind of question you can count on an LLM to accurately answer.

Playing with the system prompts, temperature, and max token output dials absolutely lets you make enough headway (with the 5 series) in this regard to demonstrably render its self-analysis incorrect.
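For reference, those "dials" are just request parameters. A minimal sketch using the OpenAI Python SDK (the model id and values here are placeholders, and parameter support varies between models):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-5",  # placeholder model id for this sketch
        messages=[
            {"role": "system",
             "content": "You respond in 1-3 sentences. Never use bulleted lists."},
            {"role": "user", "content": "Is 10 kW of rooftop solar worth it at $0.15/kWh?"},
        ],
        temperature=0.3,            # lower = less rambling; some models ignore or reject this
        max_completion_tokens=150,  # hard cap on output length
    )
    print(resp.choices[0].message.content)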


And how would GPT 5.0 know that, I wonder. I bet it’s just making stuff up.


What kind of "training" did you do?


4.1 is great for our stuff at work. It's quite stable (it doesn't change personality every month, and a one-word difference doesn't change the behaviour). It doesn't think, so it's still reasonably fast.

Is there anything as good in the 5 series? Likely, but doing the full QA testing again for no added business value, just because the model disappears, is a hard sell. The ones we tested were either slower or tried to have more personality, which is useless for automation projects.


Yeah, agreed - the initial latency is annoying too, even with thinking allegedly turned off. It feels like AI companies are stapling on more and more weird routing, summarization, safety layers, etc. that degrade the overall feel of things.


I also found this disturbing, as I used to use GPT for small worked-out theoretical problems. In 5.2, the long, repetitive bulleted lists and fortune-cookie advice were a negative for my use case. I replaced some of that use with Claude and am experimenting with LM Studio and gpt-oss. It seemed like an obvious regression to me, but maybe people weren't using it that way.

For instance, something simple like: "If I put 10 kW of solar on my roof, when is the payback, given xyz price / incentive / usage pattern?"

It used to give a kind of short technical report; now it's a long list of bullets and a very paternalistic "this will never work" kind of negativity. I'm assuming this is the anti-sycophancy tuning at work, but when you're working a problem you have to stay optimistic until you get your answer.
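The math it should be reporting is trivial, something like this (all numbers are hypothetical stand-ins for the xyz above):

    # Hypothetical stand-ins for the price / incentive / usage pattern.
    system_kw = 10
    cost_per_watt = 2.80          # installed $/W
    incentive = 0.30              # e.g. a 30% tax credit
    kwh_per_kw_year = 1400        # annual production per kW, site-dependent
    rate = 0.15                   # $/kWh offset

    net_cost = system_kw * 1000 * cost_per_watt * (1 - incentive)
    annual_savings = system_kw * kwh_per_kw_year * rate
    print(f"net cost ${net_cost:,.0f}, saves ${annual_savings:,.0f}/yr, "
          f"simple payback {net_cost / annual_savings:.1f} years")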

For me this usage was a few times a day, for ideas or working through small problems. For code I've been on Claude for at least a year; it just works.


I can never understand why it is so eager to generate walls of text. I have instructions to always keep the response precise and to the point. It almost seems like it wants to overwhelm you so that you give up and do your own research.


I often use ChatGPT without an account, and ChatGPT 5 mini (which is what you get while logged out) might as well be Mistral 7B + web search. It's that mediocre. Even the original 3.5 was way ahead.


I kinda miss the original 3.5 model sometimes. Definitely not as smart as 4o but wow was it impressive when new. Apparently I have a very early ChatGPT account per the recent "wrapped" feature.


Really? I’ve found it useful for random little things.


It is useful for quick information lookup when you're lacking the precise search terms (which is what I often do). But the conversations I had with the original ChatGPT were better.


Bullshit upon bullshit.


I always like to chime in on these threads to say that I've been a delighted Arch user for about a year now, for similar reasons. It took a lot of setup, but it's dialed in now and just works. My computer belongs to me again for the first time in years.

I should really do more to evangelize. It's not OK to use an OS monopoly to degrade and squeeze what is often your users' primary career and creative tool for your own short-term ends, making their lives worse and worse. And it's such a delight to get out from under it.

Not sure the situation for normies currently, but for power users, definitely dual boot and give it a try.


You might be overestimating the rigor of tool calls - they're ultimately just words the LLM generates. Also, I wonder if "tool stubs" might work better in your case: if an LLM calls give_medical_advice() and there's no permission, just have it do nothing. Either way you're still trusting an inherently random-sampled LLM to adhere to some rules. It's never going to be fully reliable, and nowhere near the determinism we've come to expect from traditional computing. Tool calls aren't some magic that gets around that.
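To make the stub idea concrete, something roughly like this (the tool name and permission check are hypothetical):

    def give_medical_advice(query: str, user_permissions: set) -> str:
        # Hypothetical stub: the tool stays visible to the model, but the call
        # becomes a no-op when the user lacks permission.
        if "medical_advice" not in user_permissions:
            return "This action is not available for this user."
        return f"(would run the real medical-advice pipeline for: {query})"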


You’re totally right: it's ultimately just probabilistic tokens. I’m thinking that by physically removing the tool definition from the context window, we avoid state desynchronization. If the tool exists in the context, the model plans to use it. When it hits a "stub" error, it can enter a retry loop or hallucinate success. By removing the definition entirely, we align the model's world model with its permissions. It doesn't try to call a phone that doesn't exist.
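Roughly what I have in mind (the tool names and permission model here are hypothetical, using OpenAI-style function schemas):

    # Hypothetical permission model; tools use OpenAI-style function schemas.
    ALL_TOOLS = [
        {"type": "function",
         "function": {"name": "lookup_order",
                      "parameters": {"type": "object", "properties": {}}}},
        {"type": "function",
         "function": {"name": "give_medical_advice",
                      "parameters": {"type": "object", "properties": {}}}},
    ]

    def tools_for(user_permissions: set) -> list:
        # Only tools the user may trigger ever reach the context window.
        return [t for t in ALL_TOOLS if t["function"]["name"] in user_permissions]

    # e.g. pass tools=tools_for({"lookup_order"}) in the chat request;
    # give_medical_advice never exists in the model's world, so it can't plan around it.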


I have the feeling this boils down to something really mundane - but the writing is so puffed-up with vague language it's hard to make out. Require human approval for all LLM actions? Log who approved?


You absolutely can remove unnecessary complexity. If your app makes an HTTP request for every result row in a search, you can simplify by fetching them all in one shot.
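A toy version of that (the endpoints are hypothetical):

    import requests

    BASE = "https://api.example.com"  # hypothetical service

    # N+1 pattern: one request per result row.
    def fetch_rows_slow(ids):
        return [requests.get(f"{BASE}/rows/{i}").json() for i in ids]

    # One shot: a single batched request for the whole result set.
    def fetch_rows_fast(ids):
        r = requests.get(f"{BASE}/rows", params={"ids": ",".join(map(str, ids))})
        return r.json()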

Learn what's happening a level or two lower, look carefully, and you'll find VAST unnecessary complexity in most modern software.


I'm not talking about unnecessary (or incidental) complexity. That is a whole other can of worms. I am talking about the complexity required to build a system to spec. If choices are made to introduce unnecessary complexity (e.g. "resume-driven development", or whatever you want to call the proclivity to chase new tech), that is a different problem. Sometimes it can be eliminated through practical considerations. Sometimes organizational politics and other entrenched forces prevent it.


Been so happy with my switch to Linux about 8 months ago. The nvidia gremlins that stopped me in prior years are all smoothed out.

One big plus with Linux, it's more amenable to AI assistance - just copy & paste shell commands, rather than follow GUI step-by-steps. And Linux has been in the world long enough to be deeply in the LLM training corpuses.

