Hacker News | ryandrake's comments

Once the event was underway, I recall Howard Stern providing rather good up-to-the-minute reporting about it, by way of guests calling in. While the mainstream news was floundering around with stale info and generally not really knowing what was going on, you could get pretty decent information from the Stern show. Apart from the occasional caller just shouting "baba booey," his coverage was quite good. Helps to remember this was way before Twitter, and there was not much "instant live reporting from the commoners" back then. All live news came from mainstream news behemoths.

Tangent but I recall /. being one of the only sites that could withstand the onslaught of people trying to follow the news online.

Which is weird because writing this comment made me go glance at /. and it's sad what it ultimately became.


It did! Once I confirmed it was real I wound up bouncing between the two. Such a surreal experience

This was my experience, too. I learned about it on my commute to work, listening to Howard Stern. And, yeah, at first I thought it was another stupid skit.

One thing that impressed me about Stern's broadcast that day is he kept calling for calm. One quote I'll never forget: "Don't go around beating up cab drivers." Not sure why that made an impact on me.


Computers were great when the user was in control and got to decide what gets run and what doesn’t get run. When the user was in the driver’s seat. When software developers asked “what does the user want to do with their computer?” and not “what do we want the user to do?” Now instead of driving the car, users are just passengers going wherever software companies are taking them.

A huge percentage of people who say they are into a particular hobby are really just collectors of that hobby’s gear. Photography is an easy example, but this applies to a LOT of hobbies.

I realized long ago that I cannot buy more gear for my hobbies unless I commit to using it. I want a lot of cool gadgets, but using what I have needs more time than I put into the hobby. (I need to play mandolin several hours a day to get better; in reality I often skip days or only put in 15 minutes.) It is one thing to say "I need X tool for the next step" and buy the tool, but I only buy that tool if I really do use it, not because it is a gadget that looks cool (and then I have the tool). I've also found great fun in asking "how did they do this before modern tools?" Often I can find an alternate path without the gear.

Rowling / Harry Potter comes to mind, too, and Heinlein. You need to be able to separate the artist from the art, the programmer from the program. It’s ok to appreciate a work even if you disagree with its creator’s morals or ethics.

This is one of the reasons I got away from writing commercial software and now only write code as a hobby.

To me, the code itself is the product. I want the code to look like a beautiful painting—the fact that it does something is secondary. I’ll sit there for hours working on things like const correctness, and making sure each class has the bare minimum amount of state/instance variables, making sure function arguments are named and ordered consistently, even though it has no effect on user-visible bugs or runtime performance. I’m the kind of person that paints the back of the cabinet. Even though no user will see it, I will know it is there.

Obviously this mentality is at odds with commercial software’s imperative to shit out barely working spaghetti code as fast and cheaply as possible, so I opted out.


“Paints the back of the cabinet” is a great analogy. LLM-driven production is so far away from this mindset.

Have you ever done research mathematics? To me, the only difference between code and math is that the code can do things, make stuff happen in the world; outside of that, mathematics has a lot more opportunities to be beautiful (not to say that there isn't beautiful code, but the beauty is not central in the way it often is in mathematics).

Yeah, a lot of businesses definitely do push things too far the other way and advocate releasing _anything_ regardless of how well it works.

I'm strongly against the "move fast and break things" mentality. But there is a happy middle ground between architecting works of art, and shipping urinals with faulty plumbing.


Although in this case it's more like using the paint in the tin to paint the tin itself. It's useless and completely missing the point of why the paint exists in the first place.

You do you, I'm sorry if I come across rude and stupid, but I am both things. But "code is the product" is what IMO caused the downfall of this entire profession. No wonder everyone is trying to get rid of us. I wouldn't want a plumber that's obsessed with the tubes themselves and not whether my house has working plumbing in a reasonable time frame and within budget.


Despite the gallons of ink spilled on the subject I have not worked at a single place in my 30-year career where developers sat around perfecting masterpieces.

I have worked at a never-ending list of places where people shipped the first thing that worked, built spaghetti around it, something else got built on top, and the original thing is now critical infrastructure that takes 10x longer to fix bugs or add needed features to than it would have if we’d taken 1.5x longer to ship it in the first place. I have worked at a never-ending list of places where developers beg for time to be set aside to deal with the worst parts that sap their time, energy, or will to continue working at the job. I have worked at a never-ending list of places that eventually set aside a few days to tackle these tasks, when the engineers estimate two or three weeks. I have worked at a never-ending list of places that then use the failure of these momentary diversions as evidence that their engineers don’t know what they’re talking about and should shut up and ship more features.

I sure wish I knew what masterpiece factories you must have spent your career working at.


I’ve been in this profession for two decades as well. As both things.

My take on this is that we need both, because the market is cyclical. It’s just that it’s hard to perceive any of those cycles if you (a) live them (b) are not experienced enough to introspect.

I absolutely would love an obsessed plumber (and got one!) when it comes to deciding that we’re going to do PTFE tubing in our new house. An obsessed electrician in charge to overinvest into our grid, rather than a 3-month timeframe executive. Otherwise our critical infrastructure gets myopically degraded.

I also want the “working within timeframe” outcome.

And we, as an industry, swing wildly in both directions. The Cambrian explosion of shareware was the former. We course-corrected into cathedrals of good software (I still love Windows 2000’s stability, the pinnacle of the NT line), followed by the “reasonable timeframe” 4GB Electron apps, etc.

It will swing. Every complex system from the logistic equation upwards will oscillate.
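For anyone who hasn't played with the logistic equation, here is a rough sketch of the oscillation being referred to. It uses the discrete logistic map x -> r*x*(1-x); the function name, starting value, and the two r values are my own picks for illustration. Worth noting that the map only oscillates once r goes above 3; below that it settles to a fixed point.

```python
def logistic_orbit(r, x0=0.5, steps=1000, keep=4):
    """Iterate x -> r*x*(1-x) and return the last `keep` values."""
    x = x0
    history = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        history.append(x)
    return history[-keep:]


# r = 2.8: the orbit settles onto the fixed point 1 - 1/r.
settled = logistic_orbit(2.8)

# r = 3.2: the orbit never settles; it alternates between two values.
cycling = logistic_orbit(3.2)

print("r=2.8 tail:", [round(v, 4) for v in settled])
print("r=3.2 tail:", [round(v, 4) for v in cycling])
```

Push r higher and the two-value cycle splits into four, then eight, and eventually into chaos, which is why the map is a popular toy model for systems that overshoot and correct.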


It looks like you can't just sit there and heal (or it takes an enormous amount of time to do so) without first taking over an island. So once you get hurt down to nothing, you need to sail around looking for castaways, which heal your ship.

It doesn't seem to do anything when you click Run Live, besides updating the status to "Connecting to DERP relay, exchanging endpoint info..."

You can get it to run by hitting "Edit" at the top - needs a real dev container to run, not a web worker.

Very cute and fun looking game, but I found it very difficult. Enemies have perfect aim, and I can't hit anything. I don't think you can even heal the ship unless you happen to stumble upon just the right treasure. Do you have a difficulty slider/setting? Maybe I missed it. It feels like I'm playing on the hardest mode.

EDIT: After playing around some more, yea it feels less like an action game and more like a "sailing around with nearly zero health" simulator.


I thought it was quite the opposite, too easy. Your ship recharges health. Also you can claim an island and heal there. It's so easy that it's actually boring. I got 10 kills before I quit, and never got close to dying. Hell, you can even claim an island and sit there in healing mode while shooting down enemy ships that don't heal...

A difficulty slider is a good idea though.


Yea, the difficulty seems to hinge on whether or not you manage to capture an island. If you do, then you can heal your ship, at which point I agree with you--the game becomes super easy. If you fail to kill three enemies (prerequisite of capturing an island), then it becomes a "sail around at zero health" game.

A "difficulty" feature could just be a toggle: Start with a claimed island, or start with no claimed island.


There may be a bug: I captured an island with fewer than 3 kills (I didn't have more than 1 kill IIRC).

Always a bit disappointed in the details in these kinds of threads. When you do get answers, they're never specific enough to try out on your own. It'll be something like "I use Qwen 3.5 and get great results!" OK but what quantization are you using? What llama parameters? What context size? What GPU are you running it on, and how much VRAM does it have? Are you hosting it on a separate box, or running it locally on your dev machine? What coding agent tool are you using, and how is it configured / hooked up to the model?

All you get here is some market signal from 1 or 2 posts if you already know how to do it. Most of these responses are garbage.

I have good results with this setup:

Hardware:

- GPU: AMD 7900xtx, 24gb vram

- CPU: AMD 5950x, AM4

- RAM: 64gb DDR4 3600

Software:

- OS: Bazzite (atomic fedora - this machine is running Steam "big picture" mode on my TV when not in use for LLM tasks)

- Virtualization: Podman Quadlets, which allows me to run container images as managed systemd units

- Network: tailscale

- Inference: llama.cpp vulkan (better performance than ROCM, though I'm keeping an eye on it in the future)

- LLM API surface: llama-swap (running as a podman quadlet exposed via tailscale svc) allows running multiple models on a single endpoint.

- Web/Chat Access: open-webui (running as podman quadlet exposed via tailscale svc) allows me to access any of the models I'm using for coding harness access for chat/general purpose queries via web browser. I also have the "conduit" app for my iPhone that allows me to hit the same models from my phone.

Models:

- Qwen3.6-27B-MTP-UD-Q4_K_XL.gguf - Unsloth Q4 quant of the qwen 3.6 27B model weights, with MTP enabled. MTP is important as it improves the speed the model can run at.

- Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf - Unsloth Q4 quant of 35B-A3B. Not MTP right now because I was having some issues with it?

- gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf - Gemma 4, which I use sometimes via open-webui instead of Qwen, but I generally think Qwen does a better job

Flags (specific for Qwen 27b, since that's primary model):

- `-ngl 99` offload all layers to GPU

- `-c 80000` 80K context window. I'd like this to be higher, but since my GPU also has to run the desktop session for the machine, I need to leave some VRAM overhead to keep the desktop from OOM-ing

- `-np 1` single slot (no parallel request handling)

- `--no-context-shift` error instead of silently sliding the context window when full

- `--cache-reuse 256` reuse cached prefix in chunks of 256 tokens (prompt cache)

- `-b 2048` logical batch size (tokens per submission)

- `-ub 1024` physical micro-batch (per GPU pass)

- `--cache-type-k q8_0 --cache-type-v q8_0` symmetric 8-bit K/V cache. Q8 is as low as I've been able to go without getting some issues with tool calling

- `-fa on` flash attention

- `--spec-type draft-mtp` use the model's built-in MTP as the draft model

- `--spec-draft-n-max 3` propose up to 3 draft tokens per step

- `--spec-draft-n-min 0` allow zero drafts if confidence is low

- `--spec-draft-type-k q8_0 --spec-draft-type-v q8_0` KV quant for the draft path

- `--reasoning-format deepseek` parse <think> blocks in proper format

- `--chat-template-kwargs '{"enable_thinking": true}'` turns Qwen's thinking mode on by default (clients can override)

- `--jinja` use the GGUF's Jinja chat template

- `--temp 0.6` moderate randomness (Qwen recommended value for coding)

- `--top-p 0.95` nucleus sampling (Qwen recommended value for coding)

- `--top-k 20` top-20 candidates (Qwen recommended value for coding)

- `--min-p 0.0` disabled (Qwen recommended value for coding)
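Since the flags above are spread over a long list, here is a sketch that collects them into one command line. The flags and values are copied from the list as posted and not re-verified against any particular llama.cpp build. The binary name `llama-server`, the `-m` model flag, and the model directory are assumptions; the post gives the model file name but not where it lives or how llama-swap invokes it.

```python
import json
import shlex

# Hypothetical location; only the file name comes from the post.
MODEL_PATH = "/models/Qwen3.6-27B-MTP-UD-Q4_K_XL.gguf"


def build_llama_server_args(model_path):
    """Return the argv list for the Qwen 27B setup described above."""
    thinking = json.dumps({"enable_thinking": True})
    return [
        "llama-server",            # assumed binary name
        "-m", model_path,          # assumed; not in the posted flag list
        "-ngl", "99",              # offload all layers to GPU
        "-c", "80000",             # 80K context window
        "-np", "1",                # single slot
        "--no-context-shift",      # error instead of sliding the context
        "--cache-reuse", "256",    # prompt cache in 256-token chunks
        "-b", "2048",              # logical batch size
        "-ub", "1024",             # physical micro-batch
        "--cache-type-k", "q8_0",  # 8-bit K cache
        "--cache-type-v", "q8_0",  # 8-bit V cache
        "-fa", "on",               # flash attention
        "--spec-type", "draft-mtp",
        "--spec-draft-n-max", "3",
        "--spec-draft-n-min", "0",
        "--spec-draft-type-k", "q8_0",
        "--spec-draft-type-v", "q8_0",
        "--reasoning-format", "deepseek",
        "--chat-template-kwargs", thinking,
        "--jinja",
        "--temp", "0.6",
        "--top-p", "0.95",
        "--top-k", "20",
        "--min-p", "0.0",
    ]


# shlex.join quotes the JSON argument so the line can be pasted into a shell.
print(shlex.join(build_llama_server_args(MODEL_PATH)))
```

Keeping the flags as a list rather than one long string makes it easier to swap a single value (say, a smaller `-c` on a card with less VRAM) without fighting shell quoting.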

Performance (27b, primary model):

- ~65t/s for token generation

- ~600 t/s for prompt processing.

- If these numbers don't mean much to you, perceptually this feels about on-par with cloud model speed, maybe slightly faster.

- ~30s cold start when swapping from a different model or starting up session from idle via llama-swap.
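To put the figures above in wall-clock terms, a back-of-the-envelope sketch follows. The three rates come straight from the list; the function name and the example token counts are hypothetical. It assumes the rates stay constant across the whole turn and that nothing is served from the prompt cache, so treat the result as a rough upper bound rather than a measurement.

```python
PROMPT_TPS = 600.0    # ~600 t/s prompt processing (from the post)
GEN_TPS = 65.0        # ~65 t/s token generation (from the post)
COLD_START_S = 30.0   # ~30 s model swap or idle wake-up (from the post)


def turn_seconds(prompt_tokens, output_tokens, cold=False):
    """Estimate seconds for one turn: read the prompt, then generate."""
    seconds = prompt_tokens / PROMPT_TPS + output_tokens / GEN_TPS
    if cold:
        seconds += COLD_START_S
    return seconds


# Hypothetical turn: 20,000 tokens of fresh context, 500 tokens of output.
warm = turn_seconds(20_000, 500)
cold = turn_seconds(20_000, 500, cold=True)
print(f"warm: {warm:.1f}s, cold: {cold:.1f}s")
```

On numbers like these, reading a large fresh prompt takes longer than generating the answer, which is one reason prompt caching (`--cache-reuse`) matters for agent workflows that resend the same prefix every turn.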

I have llama-swap set up to unload the model after 10 min of idle, because I sometimes use this machine for gaming as well. A little annoying, but a small price to pay to be able to use the machine for other stuff (gaming) when I'm not using it with coding tasks.

CLI/Harness:

- Crush harness (https://github.com/charmbracelet/crush) less feature rich than Claude Code, but with a smaller system prompt and better built-in LSP support. I point it at the tailnet DNS (https://llama.<tailnet>:<port>)

- Headroom (https://github.com/chopratejas/headroom) to maximize the 80k context window

- Exa MCP for web search (https://exa.ai/) this alone makes the model far more useable. It's shocking how often the official claude code or codex harness get botblocked on web fetches, and the results of a good web fetch can be the difference between a good turn and a bad turn.

A lot of people get hung up on whether Qwen 3.x models are "as smart as" some parallel Anthropic model. Most people seem to agree it's somewhere between Haiku 4.5 and Sonnet 4.5. Personally, I think the biggest thing that makes the Qwen 3.x series of models _feel_ good to use for coding workflows is that it's the first time that tool calling actually works consistently on local models. If tool calling is busted even 5% of the time, it can totally ruin the flow. I think that's also why people tend to say the "harness is more important than the model" or whatever. I have a few other models set up but 27B with MTP is the best compromise of speed and quality that I've found.

This setup works well enough for me that I dropped my personal Claude Code subscription. At work I'm still using frontier models, but personally I don't feel like I need that much power for anything I work on in my personal life. I'm "lucky" that I made the random financially unwise choice to buy a 7900XTX in late 2022 for $1k as a gaming card. I had no clue it would actually be a pretty decent LLM card 3-4 years later.

Edit: sorry for the horrible formatting, I always forget that HN doesn't actually do markdown :(


Now that's what I'm talking about! Very cool, thank you for the detailed response.

In the USA at least, I've found that this kind of "not working means not available" arrangement is easier or harder based on your seniority and the kind of company you work for. I am able to hold the line on this now, 25 years into my career, but it took a long time to get to this point, and I never would have been able to swing it when I was a junior programmer, and when I was working in a hyper-work-obsessed startup.

Back in the early 2000s when I was Junior Engineer Number 32204, and not particularly valuable to my medium sized company in a competitive industry, I could never have gotten away with "Oh, by the way, boss, I am totally unreachable nights and weekends, and don't bring work with me on vacation." But, now, quite a bit more senior in my career and working in a "comfortable" big tech role, it's possible.


> Back in the early 2000s when I was Junior Engineer ...

I tried something like this over July 4th weekend last time I was full-time anywhere (startup; 2010) and it very quickly devolved into an i-quit-you-cant-quit-i-fired-you situation and the company withholding my final paycheck. (New York State employment law does not mess around and I was eventually paid after dragging the deadbeat through Small Claims.)

It traumatized me and is in large part why I've been a freelancer / running my own consultancy ever since. My self-employed situation is better in some ways and worse in others but I can't even imagine what it's like to not have my back against the wall 24/7/365. :(


This was mostly my experience. Once I was very Sr and reporting to the VP my solution was people could get in touch via the VP, his admin or my admin. Worked well (there were some things I really did need to be called for).

It's not a general solution, but with a good manager it can work more broadly. I did see a couple of managers do something similar for their teams, making it clear that if you need emergency attention you contact the on-call, and if for some reason that won’t do, you call the manager. This friction alone deals with most issues.


It's a small number of data points, but neither of my two early-career jobs had any expectations like that. I've never explicitly said "I'm not reachable," I just have never worked or responded to work communications outside of work hours, and no one has ever questioned me on it.
