Hacker News | repeekad's comments

I once asked one of the original YouTube infra engineers “will you ever need to delete the long tail of videos no one watches”

They said it didn’t matter, because the sheer volume of new data flowing in was growing so fast that it made the old data just a drop in the bucket.


Now that they can harvest it all for AI training, that decision was the cheapest and greatest thing they ever did.

Imagine trying to pay for all that content, nobody on earth would be able or willing to supply it.


PeerTube is a thing. I like to think that, without centralized players like YT, P2P-supported federation might have gained a better foothold.

There’s still time


Of course videos disappear for copyright, ToS violations, or when the uploaders remove them. They do not disappear just because nobody watched them.

There’s a whole activity around discovering random 15 year old videos with almost no views. It’s usually some random home video

A friend of mine worked two years in YouTube as a content admin.

Basically being given videos to watch all day, especially coming from the middle east (this was ISIS time so any video from the area had someone watching it as soon as uploaded).

Needless to say, according to him there's endless gold among the no-view videos.

It's also interesting that it was no secret that already in 2018 they were all told that they were essentially training machines to do their job.


I was interested in the same thing and built a search for it

https://ytstalker.mov


They also disappear when the government of Pakistan tells Google to erase them: https://lee-phillips.org/youtube/

I seem to recall reading that the HD variations may get removed leaving only 480p or lower for older unwatched videos.

The original upload would likely still be stored, but not available for viewing.


That would be an odd thing to do. HD is low resolution already, and 480 is noticeably worse.

If they really wanted to compress, take out every other frame, and regenerate those frames with a neural decoder. But I don't know why that would be worth the effort for a stable number of low res files either.


I wonder if that still holds true? The volume of videos increases exponentially, especially with AI slop. I wonder if at some point they will have to limit the storage per user, with a paid model if you surpass that limit. Many people who upload lots of videos have, I guess, some form of income from YouTube, so it wouldn’t be that big of a deal.

What they said only holds true because the growth continues so that the old volume of videos doesn't matter as much since there's so many more new ones each year compared to the previous year. So the question is more about whether or not it will hold true in the long term, not today

The framing here is really weird. The volume of videos increasing isn't 'growth.' Videos are inventory for Youtube. They're only good when people (without adblocks!) actually watch them.

Growth in this context is that there are a larger volume of videos each year. So each year a single video is exponentially a smaller and smaller percentage of the total.

Yeah and the math doesn't check out.

For example, say that in year N YouTube gets f(N) new videos. Let's assume f(N) = cN^2. That's a crazy rate of growth, far better than the real-world YouTube, which grew rather linearly.

But the pool of "videos that are older than 5 years" still grows faster than that, because it is the running sum of f and so would be cubic instead of quadratic. Unless it's really exponential (it isn't), "videos that are older than 5 years" will always surpass "new videos this year" eventually.
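A quick way to sanity-check that argument is to compute the ratio of old videos to this year's uploads under both growth models. This is only a sketch: the constants, the doubling rate used for the exponential case, and the function names are assumptions made up for illustration, not real YouTube numbers.

```python
def new_videos_quadratic(year, c=1.0):
    """Hypothetical upload count for a given year under f(N) = c * N^2."""
    return c * year ** 2


def new_videos_exponential(year, c=1.0, rate=2.0):
    """Hypothetical upload count under f(N) = c * rate^N (rate is an assumption)."""
    return c * rate ** year


def old_to_new_ratio(new_videos, year, age=5):
    """Videos uploaded at least `age` years ago, divided by this year's uploads."""
    old = sum(new_videos(y) for y in range(1, year - age + 1))
    return old / new_videos(year)


for year in (10, 20, 40, 80):
    q = old_to_new_ratio(new_videos_quadratic, year)
    e = old_to_new_ratio(new_videos_exponential, year)
    print(f"year {year:3d}: quadratic {q:8.2f}   exponential {e:.4f}")
```

Under quadratic growth the ratio keeps climbing (the old pile eventually dwarfs new uploads), while under exponential growth it settles at a constant, which is the "drop in the bucket" case.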


Video sensors are continuously getting cheaper, better and more prevalent over time. The trend is towards capturing all angles of everything, everywhere, at increasingly higher resolutions.

> Unless it's really exponential (it isn't), "videos that are older than 5 years" will always surpass "new videos this year" eventually.

Such a weird strawman argument that you are making up. You've overthought this so much that you are missing the forest for the trees.


Yes. A video no one watches is a waste of storage.

Maybe not.

Maybe it could be used to train a neural network. Maybe it contains dirt on a teenager, who might become a politician two decades from now. Maybe it contains an otherwise lost historical event.


Or it just helps to cement YouTube as the go-to place for uploading and sharing videos for almost any purpose which has a long-term positive effect for user engagement and retention

^ This.

I wonder if anyone has ever compiled a list of channels with abnormally large numbers of videos? For example this guy has over 14,000:

https://www.youtube.com/@lylehsaxon


There is a channel with 2 million videos: https://www.youtube.com/@RoelVandePaar/videos One with 4 million videos: https://www.youtube.com/@NameLook

NameLook puts a whole new meaning to "low effort videos"

Lord above. This is the worst garbage I've ever seen:

https://www.youtube.com/shorts/mrOXqgShzI0

This shit is the reason I can't afford a new HDD.


First one has transcribed stack overflow to YT by the look of it

AH! I've stumbled on that first fella's videos before! The videos aren't crazy complex but the sheer volume is impressive in a perverse kind of way.

I guess I should have mentioned I wasn't looking for automated/AI-generated videos.

I assume it's an economics issue. As long as they continue making money off the uploads to a higher extent than it costs for storage, it works out for them.

Do they make a profit nowadays?

Likely yes, with a margin of perhaps 38%

https://news.ycombinator.com/item?id=34268536


> The volume of videos increases exponentially

Source?


One day, it will matter. Not even Google can escape the consequences of infinite growth. Kryder's Law is over. We cannot rely on storage getting cheaper faster than we can fill it, and orgs cannot rely on being able to extract more value from data than it costs to store it. Every other org knows this already. The only difference with Google is that they have used their ad cash generator to postpone their reality check moment.

One day, somebody is going to be tasked with deciding what gets deleted. It won't be pretty. Old and unloved video will fade into JPEG noise as the compression ratio gets progressively cranked, until all that remains is a textual prompt designed to feed an AI model that can regenerate a facsimile of the original.


You can see how Google rolls with how they deleted old Gmail accounts - years of notice, lots of warnings, etc. They finally started deletions recently, and I haven't heard a whimper from anyone (yet).

The problem is that some content creators have already passed away (and others will pass away by then), and their videos will likely be deleted forever.

That may be, but I assume for videos that had some viewership base, there may be a consideration. E.g. if a video was viewed 20 million times, it may be worth more than one that was viewed only 5 times.

I've stumbled upon very valuable content with very low view numbers - the algorithms spiral around spectacularity and provocation, not quality or insight.

Then it's on you to share it !

>videos that had some viewership base, there may be a consideration

Those would be the worst of the lot regarding how valuable they are historically for example. Engaging BS content...


Hopefully the deletion will not affect videos with thousands of views, even if the account is lost.

[flagged]


Goog is 100% not going to delete anything that is driving any advertising at all. The videos are also useful for training AI regardless, so I expect the set of stuff that's deleted will be a VERY small subset. The difference with email is that email can be deduplicated, since it's a broadcast medium, while video is already canonical.

I expect rather than deleting stuff, they'll just crank up the compression on storage of videos that are deemed "low value."


Monuments erode away and memories of those enshrined are lost to time as well; nothing lasts forever.

    I met a user from an antique land
    Who said: Two squares of a clip of video
    Stand in at the end of the search. Near them,
    Lossly compressed, a profile with a pfp, whose smile,
    And vacant eyes, and shock of content baiting,
    Tell that its creator well those passions read
    Which yet survive, stamped on these unclicked things,
    The hand that mocked them and the heart that fed:
    And on the title these words appear:
    "My name is Ozymandias, Top Youtuber of All Time:
    Look on my works, ye Mighty, and like and subscribe!"
    No other video beside remains. Round the decay
    Of that empty profile, boundless and bare
    The lone and level page stretch far away.

This is amazing.

Would've been, once. These days I assume bentcorner asked their favourite LLM to generate a poem parodying Ozymandias about once-popular youtube videos.

It doesn't feel like it at all (I'd never expect an LLM to say 'pfp' like that, or 'lossly[sic] compressed', ASCII instead of fancy quotes) but who knows at this point.

I may have gotten incredibly neurotic about online text since 2022.


Nope, I hand wrote this.

I actually considered using an LLM but in my experience they "warp" the content too much for anything like this. The effort required to get them to retain what I would consider something to my taste would take longer than just writing the poem myself. (Although tbf it's been a while since I've asked an LLM to do parody work, so I could be wrong)


Ah, well, kudos then!

or you could get over it and still enjoy it anyway. Like how Coke Zero tastes.

That is a fair point. Especially since, assuming it was AI-generated, it presumably wouldn't have existed at all otherwise.

Brought to you by Carl's Jr

let's see what will last longer over the ages : engraved stone or google?

Depends on the pH, probably.

Like tears in rain <3

mono no aware

Dropbox seem to be doing the same thing. After years of whining about my being 2TB above the limit, I recently received a mail with a deadline to delete my files or they will.

It depends. At the roughly 2 PB of new data they get a day, that’s about 10 sq ft of physical rack space per day. Each data center is like 500,000 sq feet, so each data center can hold 120 years of YouTube uploads. They’re not going to have to restrict uploads anytime soon.

Not all of the square footage of a data center is usable for racks
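The back-of-envelope estimate above can be written down so the usable-floor-space caveat becomes a knob. This is a sketch only: the 10 sq ft per day and 500,000 sq ft figures come from the comment, while `usable_fraction` and the 50% value tried below are assumptions for illustration.

```python
SQ_FT_PER_DAY = 10            # from the comment: rack space for roughly 2 PB/day
DATA_CENTER_SQ_FT = 500_000   # from the comment: size of one data center


def years_of_uploads(dc_sq_ft, sq_ft_per_day, usable_fraction=1.0):
    """Years until one data center's usable floor space is filled with racks."""
    days = dc_sq_ft * usable_fraction / sq_ft_per_day
    return days / 365


# Every square foot usable (the optimistic reading of the estimate).
print(round(years_of_uploads(DATA_CENTER_SQ_FT, SQ_FT_PER_DAY)))        # 137
# Assumption: only half the floor can hold racks.
print(round(years_of_uploads(DATA_CENTER_SQ_FT, SQ_FT_PER_DAY, 0.5)))   # 68
```

With those inputs the optimistic case comes out near 137 years, the same ballpark as the 120 quoted, and even a pessimistic usable fraction leaves decades of headroom per data center.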

Oh. I noticed in an AI music generation service I use that old pieces were severely degraded to the point that they were crackling really bad... And I remember thinking that it's a good thing I downloaded an mp3 of my favorites. I confirmed that the quality is very different by listening to the downloaded recording with the hosted version side-by-side.

Wouldn't it also be a performance nightmare?

The energy bill for scanning through the terabytes of metadata would be comparable to that of several months of AI training, not to mention the time it would take. Then deleting a few million random 360p videos and putting MrBeast in their place would result in insane fragmentation of the new files.

It might really just be cheaper to keep buying new HDDs.


S3 allows delete and is efficient here. I’m sure Google can figure it out

They allow search by timestamp; I’m sure YouTube can write an algorithm to find videos with <=1 view.


This is why they removed searching for older videos (specific time) and why their search pushes certain algorithmic videos. Other older videos, when found by direct link, are on long-term storage and take a while to start loading.

I’m pretty sure this is the real reason why they changed old unlisted videos to being marked private: https://blog.youtube/news-and-events/update-youtube-unlisted...

Well the time filters (before/after:date) still seem to work, but for controversial / hot topics, somehow, more recent videos tend to still show up at the top. Try "scandal after:2010 before:2012"..

Besides with their search deteriorating to the point where a direct video title doesn't result in a match, nobody can see those videos anyway and they don't have to cache them.

It's not just the search deteriorating. The frontend is littered with bugs. If you write a comment and try to highlight and delete part of that comment, it'll often delete the part you didn't highlight. So apparently they implemented their own textfield for some reason and also fucked it up. It's been like that for years.

The youtube shorts thing is buggy as shit, it'll just stop working a lot of the time, just won't load a video. Sometimes you have to go back and forth a few times to get it to load. It'll often desync the comments from the video, so you're seeing comments from a different video. Sometimes the sound from one short plays over the visuals of another.

It only checks for notifications when you open the website from a new tab, so if you want to see if you have any notifications you have to open youtube in a new tab. Refreshing doesn't work.

Seems like all the competent developers have left.


and if you do a hard refresh on the webapp, it literally takes like 10 seconds for the homepage to load

Yeah, one that I forgot to mention is if you pause a youtube short and go to a different tab, the short will unpause in the background, or it might change to an entirely different short and start playing that.

Careful what you wish for, local cops have already abused cameras and license plate readers to arrest people just for driving by the location and looking similar to doorbell video, over package theft...

https://youtu.be/37fp2n6p19Q


In Poland some moron in a surveillance center visually profiled a random guy as a wanted person (by jacket color), then police took the guy to the police station and beat him to death.

[1] https://en.wikipedia.org/wiki/Death_of_Igor_Stachowiak


Wow. What an infuriating case, again and again. Only 2 years in prison for the perps because the court decided that "excited delirium" killed the guy and not being beaten and tased 4 times.

And some of the protestors that protested this got 4+ years. Clearly police property has more rights than a human being.

The guys who beat Rodney King faced zero justice, American prison guards pretty regularly do horrific things and face little repercussion, and the guys who executed Pretti were working the next day, until that led to backlash and now they are being systemically protected.

In some states in the US, it is not illegal for a cop to have sex with someone they have arrested. Ain't that just dandy.

The Polish situation could be an upgrade.


This is called context rot

I thought context rot was only for long distance queries.

they are not an unbiased news source, they profit from being biased toward what elites with money for a subscription want to hear

Now imagine how big the builds are for Instagram's server side doomscrollable feed algorithm, given their inverse incentives to this project.


The primary differentiator for me is that bsky does not have a central algorithm, you only see content from people you follow or explicitly go looking for. Yes the "top today" overall feed is very left biased, but the default is an empty feed that you have to populate yourself.

You can't complain about content on bluesky because unlike every other platform you must choose which feeds you use.


Something like Postgres embedding search works out of the box, but don’t forget any search engine today also needs reranking

https://github.com/with-logic/intent
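To make the retrieve-then-rerank idea concrete, here is a toy two-stage search in plain Python. It is a sketch under loud assumptions: the documents, the 3-dimensional "embeddings", and the word-overlap reranker are all made up for illustration, and none of this is the API of Postgres or of the linked project. In practice stage 1 would be a vector index query and stage 2 a stronger (slower) relevance model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, docs, k):
    """Stage 1: cheap vector search, keep the k nearest documents."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)[:k]


def overlap_score(query, text):
    """Stand-in reranker (hypothetical): fraction of query words found in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0


def search(query, query_vec, docs, k=3, top_n=2):
    """Stage 2: rescore only the k candidates and return the best top_n."""
    candidates = retrieve(query_vec, docs, k)
    return sorted(candidates, key=lambda d: overlap_score(query, d["text"]), reverse=True)[:top_n]


# Toy corpus with made-up embeddings.
docs = [
    {"text": "postgres vector index tuning", "vec": [0.9, 0.1, 0.0]},
    {"text": "reranking search results with a second model", "vec": [0.8, 0.3, 0.1]},
    {"text": "sourdough starter feeding schedule", "vec": [0.0, 0.1, 0.9]},
    {"text": "embedding search in postgres", "vec": [0.7, 0.2, 0.1]},
]
results = search("postgres embedding search", [1.0, 0.2, 0.0], docs)
print([d["text"] for d in results])
```

Here the vector stage ranks "postgres vector index tuning" first, but the reranker promotes "embedding search in postgres" to the top, which is the kind of reordering the second stage is for.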


This feels like a much better feature than “they can track your realtime location from the mobile app” as implied in the article? Plus employees will have to opt in?

The tracking is still gross, but limited to opt-in on office WiFi seems a lot less dramatic of a headline, especially given the main concern people have is work from home


> Plus employees will have to opt in?

If a company policy says you have to opt in, not opting in means you're breaching the policy and might get fired. Entirely legal in at-will employment places, but potentially not in places with better worker protections.

Saying that, I just got an announcement from my employer that they will not be turning it on for now.


Employees need to join a union


Personally I wouldn’t even start working for an organisation that uses Microsoft …


So how many dozens of organizations can you work for?



> More and more every day.

That's not a bad thing.

But I think it's totally unrealistic and impractical to deal with this kind of thing by being so choosy that you won't work for an org that uses Microsoft. Actually acting that way probably just means choosing to be unemployed (for the vast majority, at least).


Honestly I don’t know. Pretty comfortable where I am now and we would never even consider using any M$ products ever. I know US culture is more about job-hopping every other year but I’ve been at the same place for many years now.


We used to use GSuite then we got acquired and we're a microsoft shop. :(


God that is the one big fear I have xD


unfortunately, my org that used google got acquired by an org that forced MS on us...


My large corp is moving to Google from MS, which doesn’t impact me much (I’m contracted out to another large corp) but I really wonder at the expense (in time) of a migration. What a huge drain on resources in the short term.


They can already do… pretty much any organization uses a VPN or “ZTNA” to provide access to resources so they know where you are.


> Plus employees will have to opt in?

I mean, that's not really how "opt-in" works for features that your company owns; you might have to "opt-in" technically but your company will probably make that a little more mandatory.

I do agree that the blog post, headline, and HN comments are as usual quite an overreaction, but this feature is pretty gross. It's also weird because the controversy/grossness-to-utility ratio seems awful, which either means that Microsoft product management has gotten as bad as everyone thinks it has or there's some future plan to make it more "robust."


My concern is if the employee is aware, at least let me quit before I’m silently opted into my boss realizing I can get the same work done with less time at the desk from home


I can’t remember being more enraged than when I learned my YouTube premium was more expensive per month than it needed to be because I had signed up on iPhone. So many people are wasting money every month, and YouTube isn’t allowed to mention the option to pay on web.

If they weren’t a public company, you’d think they were the mob. I’ll never trust the Apple ecosystem ever again


I called the non-emergency line for the local police department when someone went home with my wallet after I left it on a plane, tracked with an AirTag. 2 hours later an officer said they didn't have probable cause but could knock on the door and ask anyway. I think he basically offered for there to be no trouble if they gave it back, thief claimed they were "going to return it to lost and found", and sure enough I was able to go show my passport at the station and collect it the next day.

