One of the most frustrating things for me about the potential of an AI bubble was watching a very smart researcher be incredibly bullish on AI on Twitter because, if you extrapolate graphs measuring AI's ability to complete long-duration tasks (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...) or other benchmarks, then by 2026 or 2027 you've basically invented AGI.
I'm going to take his statements at face value and assume that he really does have faith in his own predictions and isn't trying to fleece us.
My gripe is that this prediction is based on proxies for capability that aren't particularly reliable. To elaborate, the latest frontier models score something like 65% on SWE-bench, but I don't think they're as capable as a human who also scored 65%. That isn't to say they're incapable, just that they aren't as capable as an equivalent human. I think there's a very real chance that a model absolutely crushes the SWE-bench benchmark but still isn't quite ready to function as an independent software engineering agent.
So a lot of this bullishness basically hinges on the idea that if you extrapolate some line on a graph into the future, then by next year or the year after all white-collar work can be automated. Terrifying as that is, this all hinges on the idea that these graphs, these benchmarks, are good proxies.
There's a huge disconnect between what the benchmarks show and the day-to-day experience of those of us actually using LLMs. According to SWE-bench, I should be able to outsource a lot of tasks to LLMs by now. But practically speaking, I can't get them to reliably do even the most basic of tasks. Benchmaxxing is a real phenomenon. Internal private assessments are the most accurate source of information that we have, and those seem to be quite mixed for the most recent models.
How ironic that these LLMs appear to be overfitting to the benchmark scores. Presumably these researchers deal with overfitting every day, but they can't recognize it right in front of them.
I'm sure they all know it's happening. But the incentives are all misaligned. They get promotions and raises for pushing the frontier which means showing SOTA performance on benchmarks.
>> by next year or the year after all white-collar work can be automated
Work generates work. If you remove the need for 50% of the work then a significant amount of the remaining work never needs to be done. It just doesn't appear.
The software that is used by people in their jobs will no longer be needed if those people aren't hired to do their jobs. There goes Slack, Teams, GitHub, Zoom, PowerPoint, Excel, whatever... And if the software isn't needed then it doesn't need to be written, by either a person or an AI. So any need for AI coders shrinks considerably.
I think people underestimate the degree to which fun matters when it comes to productivity. If something isn’t fun then I’ll likely put it off. A 15 minute task can become hours, maybe days long, because I’m going to procrastinate on doing it.
If managing a bunch of AI agents is a very un-fun way to spend time, then I don’t think it’s the future. If the new way of working is more work and more tedium, then why the hell have we collectively decided this is how we should work, when historically the approach has been to automate and abstract away tedium so we can focus on what matters?
The people selling you the future of work don’t necessarily know better than you.
I think some people have more fun using LLM agents and generative AI tools. Not my case, but you can definitely read a bunch of comments from people using the tools and having fun, experiencing a state of flow like they have never had before.
I definitely agree with you there. I contracted with a company that had some older engineers in largely managerial roles who really liked using AI for personal projects, and honestly, I kind of get it. Their workflow was basically prompt, get results, prompt again with modifications, rinse and repeat. It's low effort and has a nice REPL-like loop. Paraphrasing a bit, but it basically rekindled the joy of programming for them.
Haven't gotten the chance to ask, but I imagine managing a team of AI agents would feel a little too much like their day job, and consequently, suck the fun out of it.
That said, looking back, I think the reason why generative AI is so fun for so many coders is because programming has become unnecessarily complex. I have to admit, programming nowadays for me feels like a bit of a slog at times because of the sheer effort it can sometimes take to implement the simplest things. Doesn't have to be that way, but I think LLM copy-paste machines are probably the wrong direction.
I think the majority of people I've worked with who have the title of "Software Engineer" do not like coding. They got into it for the money/career, and dream of eventually moving out of coding into management. I can count the coders I've met who actually like coding on one hand.
I've been enjoying seeing my agents produce code while I am otherwise too busy to program, or seeing refined prompts & context engineering get better results. The boring kinds of programming tasks that I would normally put off are now lower friction, and now there's an element of workflow tinkering with all these different AI tools that lets me have some fun with it.
I also recently programmed for a few hours on a plane, with no LLM assistance whatsoever, and it was a refreshing way to reconnect with the joy of just internalizing a problem and fitting the pieces together in realtime. I am a bit sad that this kind of fun may no longer be lucrative in the near future, but I am thankful I got to experience it.
I’ll be that voice I guess - I have fun “vibe coding”.
I’m a professional software engineer in Silicon Valley, and I’m fortunate to have been able to work on household-name consumer products across my career. I definitely know how to do “real” professional work “at scale” or whatever. Point is, I can do real work and understand things on my own, and I can generally review code and guide architecture and all that jazz. I became a software engineer because I love creating things that I and others could use, and I don’t care about “solving the puzzle” type satisfaction from writing code. In engineering school, software had the fastest turnaround time from idea in my head to something I could use, and that’s why I became a software engineer.
LLM-assisted coding accelerates this trend. I can guide an LLM to help me create things quickly and easily. Things I can mostly create myself, of course, but I find it faster for a whole category of easy tasks like generating UIs. It really lowers the “activation energy” to experiment. I think of it like 3D printing, where I can prototype ideas in an afternoon instead of a long weekend or a few weeks.
>because I love creating things that I and others could use, and I don’t care about “solving the puzzle” type satisfaction from writing code.
Please don't take offense to this, but it sounds like you just don't like building software? It seems like the end goal is what excites you, not the process.
I think for many of us who prefer to write code ourselves, the relationship we have with building software is for the craft/intellectual stimulation. The working product is cool of course, but the real joy is knowing how to do something new.
I understand where you're coming from (and I don't take offense), but based on your reply, I don't really feel like my views came across.
When I was a student, I took classes on chip and circuit design. One class, the professor had us work on all these complex circuits to do things like flash lights and produce various signals with analog circuits. The next lesson, he had us replace all that complex work with a microcontroller and 20 lines of C - "the way it's done in industry". The students mourned the loss of the "real" engineering because the circuit that required skill and careful math was replaced by a cheap chip and some trivial software. Their entire concept of the craft was destroyed when they were given a tool that replaced the "fun parts" with some trivial and comparatively boring work. That same concept of replacing circuits with digital logic scaled up is how extremely complex and well engineered circuits like FPGAs work.
Maybe it was just my earlier wording, but I think there is joy in the act of turning your ideas into something real - creation - not just having something real. Shopping is not building. Importantly, it takes careful thought and practice and a learned instinct to engineer and create things correctly, and do it repeatably, as the original article discusses. Craft is about practice, and learning, and trying something new with what you've learned.
If LLMs mean that I'll never have to write another trivial set of methods to store a JSON object in a SQL database, I don't think I'll lose any project-wide joy. Expressing creativity and trying new things is what's great, not typing something that's been done a million times before. It's a tired analogy, but I do think of it more like a level of abstraction, like the LLM is a "compiler" for design docs or specifications. For myself, I usually don't see a difference between a prompt instructing an LLM to write some function and the code for the function itself - in the same way that a method in Java, bytecode, and asm are basically the same (with some caveats here around complexity and originality).
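To be concrete about the kind of trivial persistence code I mean, here's a minimal sketch (hypothetical table and function names, plain sqlite3 purely for illustration) - the hundredth time you write something like this, there's not much joy left in it:

    # Hypothetical example of the boilerplate in question: persist a JSON
    # blob to SQLite and read it back. Names are made up for illustration.
    import json
    import sqlite3

    def init_db(path="app.db"):
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS documents (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )
        return conn

    def save_document(conn, doc_id, payload):
        conn.execute(
            "INSERT OR REPLACE INTO documents (id, body) VALUES (?, ?)",
            (doc_id, json.dumps(payload)),
        )
        conn.commit()

    def load_document(conn, doc_id):
        row = conn.execute("SELECT body FROM documents WHERE id = ?", (doc_id,)).fetchone()
        return json.loads(row[0]) if row else None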
For a lot of folks, the derivation of joy is not as scale-free as seems necessary to move up the hierarchy in this way. The jump in abstraction kills some joy by removing the tangible process. The tactile enjoyment someone gets from knitting is not there when operating a loom, much less when managing someone else who operates the loom.
The change in agency also kills the joy for me. I thrive on abstraction in the language and mathematics sense. But I do not at all enjoy indirection and delegation through unreliable agents. I am not interested in the loss of control and the new risk management task. I would never accept a "stochastic compiler" that offered to optimize my code but with risk of randomly changing the semantics. That determinism in the semantics needs to remain for me to accept a tool as a valid abstraction.
For context, I am a computer scientist by title and a programmer at heart. I got my CS degree from a liberal arts program rather than an engineering school. My temperament is more that of a hands-on artist at an easel or typewriter and not that of a manager of an engineering department. In my long career, I have thrived with peers or betters on collaborative projects. I have zero interest in "advancing" to a managerial role.
But honestly, the loss of control, lack of trust, and associated risk management is a big problem for me. I have rarely delegated work to less skilled or less reliable juniors, and I have never enjoyed that. The scenario of a confidently wrong subordinate is a huge trigger for me. It evokes long term trauma from growing up with a mentally ill family member. It feels like all of the burden of being a caregiver to someone with delusions, but with none of the moral context to make that worth the cost.
There is nothing wrong with finding joy however one finds joy, and that can vary from person to person. Someone may find joy from knitting by hand, but maybe someone else finds joy from experimenting with pattern and material, and a loom lets them focus on the parts that interest them.
As a thought experiment, do you think it would be just as fun if you were given access to an infinite database of apps, where you could search for an existing app that suits your needs and simply be handed it?
Or would it no longer be fun, because it no longer feels like creating?
No, you were clear. I suppose I was interested to see where you drew the distinction between creating and shopping.
For example, let's say LLMs improve to the point where they can reliably one-shot entire apps with no more input than the original prompt. Would you no longer consider that creating? What's the difference between that and typing your prompt into an infinite app store?
> Practically speaking, we’ve observed it maintaining focus for more than 30 hours on complex, multi-step tasks.
Really curious about this, since people keep bringing it up on Twitter. They mention it pretty much offhandedly in their press release, and it doesn't show up at all in their system card. It's only through an article on The Verge that we get more context: apparently they told it to build a Slack clone and left it unattended for 30 hours, and it produced one in 11,000 lines of code (https://www.theverge.com/ai-artificial-intelligence/787524/a...)
I have very low expectations around what would happen if you took an LLM and let it run unattended for 30 hours on a task, so I have a lot of questions as to the quality of the output.
Interestingly the internet is full of "slack clone" dev tutorials. I used to work for a company that provides chat backend/frontend components as a service. It was one of their go-to examples, and the same is true for their competitors.
While it's impressive that you can now just have an llm build this, I wouldn't be surprised if the result of these 30 hours is essentially just a re-hash of one of those example Slack clones. Especially since all of these models have internet access nowadays; I honestly think 30 hours isn't even that fast for something like this, where you can realistically follow a tutorial and have it done.
This is obviously much more than just taking an LLM and letting it run for 30 hours. You have to build a whole environment together with external tool integration and context management, and then tune the prompts and perhaps even set up a multi-agent system. I believe that if someone puts a ton of work into this you can have an LLM run for that long and still produce sellable outputs, but let's not pretend this is something that average devs can do by buying some API tokens and kicking off a frontier model.
Yes, but you need to set up quite a bit of tooling to provide feedback loops.
It's one thing to get an LLM to do something unattended for long durations; it's another to give it the means of verification.
For example, I'm busy upgrading a 500k LoC Rails 1 codebase to Rails 8 and built several DSLs that give it properly authorised sessions in a headless browser, with basic HTML parsing tooling, so it can "see" what effect its fixes have. Then you somehow need to also give it a reliable way to keep track of the past and its own learnings, which sounds simple but I have yet to see any tool or model solve it at this scale... will give Sonnet 4.5 a try this weekend, but yeah, none of the models I tried are able to produce meaningful results over long periods on this upgrade task without good tooling and strong feedback loops.
Btw, I have upgraded the app and am taking it to alpha testing now, so it is possible.
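To give a flavour of what the feedback loop looks like (not my actual DSL, just a minimal Python/Playwright sketch; the URLs, selectors and login flow are placeholders): the agent runs something like this after every change and reads the report back, instead of guessing whether its fix worked.

    # Minimal shape of a "can the agent see what it broke?" check. Selectors,
    # routes and credentials are placeholders for whatever the app actually uses.
    import json
    from playwright.sync_api import sync_playwright

    BASE_URL = "http://localhost:3000"          # the app under upgrade, assumed local
    ROUTES = ["/", "/dashboard", "/settings"]   # pages the agent should verify

    def run_check(report_path="agent_feedback.json"):
        findings = []
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_context().new_page()
            # Collect console errors and 5xx responses so the agent gets signal,
            # not just a screenshot.
            page.on("console", lambda msg: msg.type == "error"
                    and findings.append({"kind": "console", "text": msg.text}))
            page.on("response", lambda r: r.status >= 500
                    and findings.append({"kind": "http", "status": r.status, "url": r.url}))

            # Placeholder login so every route is exercised with a real session.
            page.goto(f"{BASE_URL}/login")
            page.fill("#email", "agent@example.com")
            page.fill("#password", "not-a-real-password")
            page.click("button[type=submit]")

            for route in ROUTES:
                page.goto(f"{BASE_URL}{route}")
                page.screenshot(path=f"shot{route.replace('/', '_') or '_root'}.png")

            browser.close()

        # The agent reads this file after each change instead of guessing.
        with open(report_path, "w") as f:
            json.dump(findings, f, indent=2)
        return findings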
I've tried asking it to log every request and response to a project_log.md but it routinely ignores that.
I've also tried using Playwright for testing in a headless browser and taking screenshots for a blog that can effectively act as a log, but it just seems like too tall an order for it.
It sounds like you're streets ahead of where I am. Could you give me some pointers on getting started with a feedback loop, please?
But then that goes back to the original question, considering my own experience observing the amount of damage CC or Codex can do in a working codebase with a couple of tiny initial mistakes or confusions about intent while being left unattended for ten minutes, let alone 30 hours...
If you had used any of those, you'd know they clearly don't work well enough for such long tasks. We're not yet at the point where we have general purpose fire-and-forget frameworks. But there have been a few research examples from constrained environments with a complex custom setup.
That sounds to me like a full room of guys trying to figure out the most outrageous thing they can say about the update without being accused of lying. Half of them on ketamine, the other half on 5-MeO-DMT. Bat country. 2 months of 007 work.
What they don't mention is all the tooling, MCPs and other stuff they've added to make this work. It's not 30 hours out of the box. It's probably heavily guard-railed, with a lot of validated plans, checklists and verification points they can check. It's similar to 'lab conditions', you won't get that output in real-world situations.
Yeah, I thought about that after I looked at the SWE-bench results. It doesn't make sense that the SWE results are barely an improvement yet somehow the model is a more significant improvement when it comes to long tasks. You'd expect a huge gain in one to translate to the other.
Unless the main area of improvement was tools and scaffolding rather than the model itself.
“30 hours of unattended work” is totally vague and doesn’t mean anything on its own. It depends, at the very least, on the number of tokens you were able to process.
Just to illustrate, say you are running on a slow machine that outputs 1 token per hour. At that speed you would produce approximately one sentence.
(First of all: Why would anyone in their right mind want a Slack clone? Slack is a cancer. The only people who want it are non-technical people, who inflict it upon their employees.)
Is it just a chat with a group or 1on1 chat? Or does it have threads, emojis, voice chat calls, pinning of messages, all the CSS styling (which probably already is 11k lines or more for the real Slack), web hooks/apps?
Also, of course it is just a BS announcement, without honesty, if they don't publish a reproducible setup that leads to the same outcome they had. It's the equivalent of "But it worked on my machine!" or "scientific" papers that prove antigravity with superconductors and perpetual-motion infinite energy, which only worked in a small shed where some supposed physics professor lives.
Their point still stands though? They said the 1 tok/hr example was illustrative only. 11,000 LoC could be generated line-by-line in one shot, taking not much more than 11,000 * avg_tokens_per_line tokens. Or the model could be embedded in an agent and spend a million tokens contemplating every line.
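As a back-of-the-envelope illustration (both constants are purely assumed, not measured anywhere):

    # How much of "30 hours" could be raw generation? Rough, assumed numbers.
    LINES = 11_000
    TOKENS_PER_LINE = 10          # assumed average output tokens per line of code
    DECODE_TOKENS_PER_SEC = 50    # assumed sustained decoding speed

    output_tokens = LINES * TOKENS_PER_LINE                  # ~110,000 tokens
    minutes = output_tokens / DECODE_TOKENS_PER_SEC / 60     # ~37 minutes
    print(f"{output_tokens:,} output tokens ~ {minutes:.0f} min of pure decoding")
    # Under those assumptions the code itself is well under an hour of decoding;
    # the rest of the 30 hours would be re-reading context, tool calls, tests and retries.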
> Apparently they told it to build a Slack clone and left it unattended for 30 hours, and it built a Slack clone using 11,000 lines of code
It's going to be an issue, I think. Now that lots of these agents support computer use, we're at the point where you can install an app, tell the agent you want something that works exactly the same, and just let it run until it produces it.
The software world may find, sooner rather than later, that it has more in common with book authors than it thought, once full clones of popular apps start popping out of coding tools. It will be interesting to see if this results in a war of attrition, with countermeasures and strict ToU that prohibit use by AI agents, etc.
It has been trivial to build a clone of most popular services for years, even before LLMs. One of my first projects was Miguel Grinberg's Flask tutorial, in which a total noob can build a Twitter clone in an afternoon.
What keeps people in are network effects and some dark patterns like vendor lock-in and data unportability.
There's a marked difference between running a Twitter-like application that scales to even a few hundred thousand users, and one that is a global scale application.
You may quickly find that, network effects aside, you'd be crushed under the weight and unexpected bottlenecks of the very network you desire.
Agreed entirely, but I'm not sure that's relevant to what I'm replying to.
> we are at the point where you can install an app, tell the agent you want something that works exactly the same and just let it run until it produces it
That won't produce a global-scale application infrastructure either, it'll just reproduce the functionality available to the user.
Curious about this too – does it use the standard context management tools that ship with Claude Code? At 200K context size (or 1M for the beta version), I'm really interested in the techniques used to run it for 30 hours.
You can use the built-in task agent. When you have a plan and are ready for Claude to implement it, just say something along the lines of “begin implementation, split each step into their own subagent, run them sequentially”.
Subagents are where Claude Code shines and Codex still lags behind. Claude Code can do some things in parallel within a single session with subagents, and Codex cannot.
Yeah, in parallel. They don't call it yolo mode for nothing! I have Claude configured to commit units of work to git, and after reviewing the commits by hand, they're cleanly separated by file. The todos don't conflict in the first place though; e.g. changes to the admin API code won't conflict with changes to submission frontend code, so that's the limited human mechanism I'm using for that.
I'll admit it's a bit insane to have it make changes in the same directory simultaneously. I'm sure I could ask it to use git worktrees and have it use separate directories, but I haven't needed to try that (yet), so I won't comment on how well it would actually do.
Have they released the code for this? Does it work? Or are there X number of caveats and excuses? I'm kind of sick of them (and others) getting a free pass for saying stuff like this.
It really is crazy how much was lost when Apple killed Flash. Absolutely miss Newgrounds. It's still around of course, I'm reflecting more on the vibes when it was in its heyday. Unbelievable the games people were making with Flash back then and how it spawned the careers of a ton of indie darlings. Also, not Flash at all, but does anyone remember Exit Mundi? Absolute gold.
Honestly, I kind of look back on blogging unfavorably. Before that people made websites to showcase their interests and hobbies, and because of that even the most basic looking websites could have a lot of "color" to them. Then blogging became a thing and people's websites became bland and minimalist. Arguably blogging culture is as responsible for the death of creativity on the internet as much as the constraints of mobile-friendly web design and Apple's aforementioned killing of Flash.
> It really is crazy how much was lost when Apple killed Flash.
Steve Jobs published "Thoughts on Flash" [1] in 2010; Flash was discontinued by Adobe in 2017. If Apple supposedly "killed" Flash, they sure took their time doing so.
The iPhone had about 14% marketshare at the time, so it's not like Apple was in a commanding position to dictate terms to the industry.
But if you read his letter, what he said made total sense: Flash was designed for the desktop, not phones—it certainly wasn't power or memory efficient. Apple was still selling the iPhone 3GS at the time, a device with 256MB of RAM and a 600MHz 32-bit processor.
And of course Flash was proprietary and 100% controlled by Adobe.
Jobs made the case for the (still in development) HTML5 stack: HTML, CSS, and JavaScript.
What people don't seem to remember: most of the industry thought the iPhone would fail as a platform because it didn't support Flash, which was wildly popular.
> Steve Jobs published "Thoughts on Flash" [1] in 2010; Flash was discontinued by Adobe in 2017. If Apple supposedly "killed" Flash, they sure took their time doing so.
I’m really surprised anyone could say that. In my view, “'Thoughts on Flash' killed Flash” is about as true as “the sky is blue”. It’s fairly clear to me that without a strong stance, a less principled mobile OS (like Android) would have supported it, and Flash would probably still be around today. Apple’s stance gave Google the path to do the same thing, and this domino effect led to Flash being discontinued 7 years later. You say 7 years as if it’s a long time from cause to effect, but how long would you estimate it would take a single action to fully kill something as pervasive as Flash, which was installed on virtually every machine (I’m sure it was 99%+)? You correctly cite that iOS penetration was low at the time, but mobile Safari grew over the next few years to become the dominant web browser, and that was sufficient.
> You correctly cite that iOS penetration was low at the time, but mobile Safari grew over the next few years to become the dominant web browser, and that was sufficient.
First, there's no way Flash would still be alive today; Apple might have sped up its demise but it had so many disadvantages, it was just a matter of time and it was controlled by one company.
Remember that the web standards movement was kicking into high gear around the same time; we had already dodged a bullet when Microsoft attempted to take over the web with ActiveX, Silverlight, and JScript.
The whole point of the Web Standards movement was to get away from proprietary technologies.
> You correctly cite that iOS penetration was low at the time, but mobile Safari grew over the next few years to become the dominant web browser, and that was sufficient.
Safari has never been the dominant browser; not sure why you think that. Other than the United States, iPhone marketshare is under 50% everywhere else.
Even in 2025, Safari's global marketshare is about 15% [1] and that's after selling 3 billion devices [2].
> Microsoft attempted to take over the web with Active X, Silverlight, JScript.
Silverlight was a response to Flash.
It was also remarkably open for the time, ran on all desktop platforms, and in an alternative universe Silverlight is an open source cross platform UI toolkit that runs with a tiny fraction of the system requirements of electron, using a far superior tool chain.
For Apple, getting rid of Flash was never about HTML5. It was always, obviously, about making battery life better and, of course, adding more experiences to their walled-garden store.
Safari has been lagging far behind Firefox on HTML5 features for a decade. And any feature useful for "PWAs" is just sabotaged, e.g. the Screen Wake Lock API was finally implemented in iOS 16 but is to this day broken on the Home Screen, and isn't exactly obvious to use in Safari either.
Because working web standards support would make cross-platform mobile apps possible outside of the App Store.
I don't think it's lagging behind that much, and you could also argue that you don't need to implement every single feature blindly. A lot of features are strictly not needed, and if you do decide to do them - it needs to be done in an efficient way.
There's a reason why Safari is considered the most energy efficient browser.
> And any features useful for "PWA" is just sabotaged.
From "Every site can be a web app on iOS and iPadOS" [1]
Now, we are revising the behavior on iOS 26 and iPadOS 26. By default, every website added to the Home Screen opens as a web app. If the user prefers to add a bookmark for their browser, they can disable “Open as Web App” when adding to Home Screen — even if the site is configured to be a web app. The UI is always consistent, no matter how the site’s code is configured. And the power to define the experience is in the hands of users.
This change, of course, is not removing any of WebKit’s current support for web app features. If you include a Web Application Manifest with your site, the benefits it provides will be part of the user’s experience. If you define your icons in the manifest, they’re used.
We value the principles of progressive enhancement and separation of concerns. All of the same web technology is available to you as a developer, to build the experience you would like to build. Giving users a web app experience simply no longer requires a manifest file. It’s similar to how Home Screen web apps on iOS and iPadOS never required Service Workers (as PWAs do on other platforms), yet including Service Workers in your code can greatly enhance the user experience.
Simply put, there are now zero requirements for “installability” in Safari. Users can add any site to their Home Screen and open it as a web app on iOS26 and iPadOS26.
> Safari is lagging on HTML5 features for decade far behind Firefox.
Really?
Safari was first to ship :has() in March 2022; Firefox couldn't ship until December 2023.
I listed a bunch of web platform features Safari shipped before Chrome and Firefox [1][2].
Even now, Firefox hasn't shipped Anchor Positioning, Scroll-driven animation, text-wrap: pretty, Web GPU, Cross-document view transitions, etc. but Safari and Chrome have.
> It was always to obviously to make battery life better
I don't think it was about saving battery power. Jobs was smart in convincing people to focus on the web stack for apps - Flash was king of rich app experiences, and Java (incl. applets) was king for corporate apps. Apps went iOS native and batteries got drained in other ways (large video & photos, prolonged use). Just think of the costs, energy and time spent over the next 15 years maintaining multiple codebases to deliver one service. The web remained open, whereas mobile went native and closed-in.
> Apple was still selling the iPhone 3GS at the time, a device with 256Mb of RAM and a 600Mhz 32-bit processor.
That's a ton of RAM. I recall spending a lot of time on Flash websites in the early 2000s in college on a school-issued laptop with maybe 64 MB of RAM (and I think maybe a Pentium III at 650 MHz, so more CPU oomph).
Given Steve Jobs' character, there may be another reason behind it: that year, before the decision to stop supporting Flash, Adobe broke their agreement with Apple by releasing Photoshop for PCs before the Mac version for the first time. This could sound like a conspiracy theory, and I don't know how much evidence we have that this was the reason (or one of the reasons) behind the decision. But, given Jobs' personality, I think it is plausible.
We should also consider that having Flash support would have opened the door to non-Apple-approved apps running on iPhones, something Apple has always strenuously opposed. All in all, at the time I got the feeling that the technical reasons Jobs provided weren't the main reasons behind the decision.
Flash and open-source ActionScript allowed devs to completely circumvent the Apple App Store. That was a direct threat to the iOS business model at the time.
You're right. I did a quick search, and apparently they still considered supporting it [0], but they didn't get the results they were hoping for.
Newgrounds was incredible! At the time, games (of any quality) for free was a biiiiig deal. It was even more amazing that you didn't need to wait hours to download and install them.
I agree w/ your take on blogging... kind of a bland "one-stop-shop" for everything a person thinks of rather than an experience tailored to a specific interest. I used to make Dragonball Z fan sites mostly... even within a single domain I would have multiple websites all linking to each other, each with a different design, and subtly different content, but now I have a bland blog that I don't update regularly lol. Maybe building a retro site is what I really need to do.
I'm working on a revamp of my personal site. I do a lot of creative coding, most of them are throwaway experiments, so I thought I'd showcase more of them there. Besides that though, I have some "rare pepes" that I've been meaning to put somewhere. What I like about these is that they're highly polished, animated gifs that imitate the sort of "holographic" effect you'd find in rare collector's cards, but at the same time you can't track down who originally made them, they aren't part of some professional's online portfolio. In that sense they feel like a special piece of internet folk art, made by some complete rando.
Nowadays we have Pinterest and the like, but I really like the idea of creating my own little online space for images I like.
One of my fondest memories of the early Web was getting awards on Newgrounds for my animations (I had one for 250k views; pretty sure I got into the top ten at one point), and logging in to read and respond to comments, and to see that view counter tick up and up!
Agreed on both points. When blogging became the dominant reason for having a website, we were already on the way to the "content" hell. Any semi popular website had pressure to post more frequently, diluting quality. And pretty soon after that, blogs went from 500 words to 140 characters, but 10x the frequency.
Static websites that were updated only once in a while were far better at showing a cross-section of someone's life. In that respect, StumbleUpon and browser bookmarks were superior to RSS.
In retrospect, I would say that the "blog rush" was kind of a precursor to the rise of influencers. There was even a crowd of "blogging gurus" who would ask a pretty penny for advice on how to advance your blog.
The blogging pressure got so out of hand that even some EU bureaucrat thought it would be a great idea for each FP6-funded project to have a blog besides its static website. At least with the influencer trend they don't ask researchers to do glamour shots with their food.
Ruffle needs more love. The time and effort that's gone into a browser extension to emulate Flash should be receiving that kind of sponsorship from Cloudflare.
It has more to do with HTML5 than the lack of Flash, although it could be argued that Flash's long-prophesied downfall was one of the reasons for HTML5's rapid adoption.
HTML5 is when the web stopped being the web. It has no legitimacy in calling itself "hypertext"; it's an app-delivery mechanism with a built-in compatibility layer. In this regard Flash is just as bad and probably even worse, but since it wasn't in any way standardized or even open source there was a fair amount of pushback from all fronts. HTML5 had no such pushback.
Flash was the first broken site I ever encountered: some restaurant had an all-Flash webpage. Never did end up going to that restaurant. Why bother? If they render content inaccessible behind some needless jank like Flash or JavaScript, why bother? Another fun thing to do was to keep a tally of how many "OMG stop the presses!!" security vulnerabilities Flash had racked up over time, which was lots. Many hundreds. Made even a lolfest like Windows look bad. Flash, it was not killed with fire soon enough. Chrome and other such bloatware arguably also need some sort of fire, or at least a diet or trepanning or something, but that's a different rant, though one very much related to the Old Web or the smolweb.
That argument would have merit if the replacement weren't apps you can only buy in a monopoly store that owns all the rights for licence management (and that conveniently get deprecated at a pace decided by the company selling the hardware to run them).
My thoughts as I reconcile my conflicting feelings on this. I think it should be objectively cool that Meta has finally managed to come out with a pair of smart glasses that come incredibly close to being a practical wearable.
The thing is, it's honestly hard to imagine doing anything cool with them. I think this has less to do with hardware limitations and more to do with vendor restrictions.
I think Meta is fundamentally incapable of making anything cool. Hence why they had to partner with Ray-Ban to make these glasses rather than making their own. I think Meta's failure to realize their version of the metaverse had to do with their inability to recognize coolness and taste as much as anything else. I think any and all apps Meta ships with these glasses are cursed to be a mediocre experience.
I think Apple could do a better job, but at the end of the day the most interesting (not necessarily best) glasses would be the ones with the most developer freedom.
My personal favorite from that time was a website builder called "The Grid", which really overhyped its promises.
It never had a public product, but people in the private beta mentioned that they did have a product, just that it wasn't particularly good. It took forever to make websites, they were often overly formulaic, the code was terrible, etc etc.
10 years later and some of those complaints still ring true
Genuinely technically impressive, but I have a weird issue with calling these world simulator models. To me, they're video game simulator models.
I've only ever seen demos of these models where things happen from a first-person or third-person perspective, often in the sort of context where you are controlling some sort of playable avatar. I've never seen a demo where they prompted a model to simulate a forest ecology and it simulated the complex interplay of life.
Hence, it feels like a video game simulator, or put another way, a simulator of a simulator of a world model.
Also, to drive my point further home, in one of the demos they were operating a jetski during a festival. If the jetski bumps into a small Chinese lantern, it will move the lantern. Impressive. However, when the jetski bumped into some sort of floating structure the structure itself was completely unaffected while the jetski simply stopped moving.
This is a pretty clear example of video game physics at work. In the real world, both the jetski and floating structure would be much more affected by a collision, but in the context of video game physics such an interaction makes sense.
So yeah, it's a video game simulator, not a world simulator.
In the "first person standing in a room" demo, it's cool to see 100% optical (trained from recorded footage from cameras) graphics, including non-rectilinear distortion of parallel lines as you'd get from a wide-angle lens and not a high-FOV game engine. But still the motion of the human protagonist and the camera angle were 100% trained on how characters and controllers work in video games.
Sure, but if you're trying to get there by training a model on video games then you're likely going to wind up inadvertently creating a video game simulator rather than a physics simulator.
I don't doubt they're trying to create a world simulator model, I just think they're inadvertently creating a video game simulator model.
Are they training only on video game data, though? I would be surprised, since it's so easy to generate proper training data for this.
It is interesting to think about. This kind of training and model will only capture macro effects. You cannot use this to simulate what happens in a biological cell or tweak a gravity parameter and see how plants grow etc. For a true world model, you'd need to train models that can simulate at microscopic scales as well and then have it all integrated into a bigger model or something.
As an aside, I would love to see something like this for the human body. My belief is that we will only be able to truly solve human health if we have a way of simulating the human body.
It doesn't feel incredibly far off from demoscene scripts that generate mountain ranges in 10k bytes or something. It is wildly impressive but may also be wildly limited in how it accomplishes it and not extensible in a way we would like.
Somewhat related, but I’ve been feeling as of late what can best be described as “benchmark fatigue”.
The latest models can score something like 70% on SWE-bench verified and yet it’s difficult to say what tangible impact this has on actual software development. Likewise, they absolutely crush humans at sport programming but are unreliable software engineers on their own.
What does it really mean that an LLM got gold on this year’s IMO? What if it means pretty much nothing at all besides the simple fact that this LLM is very, very good at IMO style problems?
As far as I can tell, the actual advancement here is in the methodology used to create a model tuned for this problem domain, and how efficient that method is. Theoretically, then, it makes it easier to build other problem-domain-specific models.
That a highly tuned model designed to solve IMO problems can solve IMO problems is impressive, maybe, but yeah it doesn't really signal any specific utility otherwise.