Somewhat related: the question of "what IS a file's 'type', really?" is pretty messy, and AFAIK doesn't have any single satisfying answer [0]. A satisfying solution might look like a (long) feature vector. Libmagic at least goes in the right direction, although the binary-fingerprint-at-the-beginning approach doesn't work well for everything, or past some level of detail.
[0] where different use cases might be satisfied by e.g.
"a text file"
"a file exclusively comprised of alphanumeric UTF16 characters"
"a CSV file"
"a file with tabular data and key-value metadata that calls itself CSV but is not spec-compliant so internally we'll call it CSVprime"
"A file with tabular data written as ASCII text that represents $this_kind_of_measurement"
... etc
If you have any resources on this topic off the top of your head, I'd appreciate it if you shared them.
GNU file doesn't just check magic numbers against libmagic; it also defines a scripting language and a series of tests/printers written in it.
That's what allows it to do complex things, e.g. identify all the flavors of ELF objects even though they share a magic number, or determine whether something is JSON or CSV without one.
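To see those content tests in action (real file(1) flags; the outputs are illustrative, not copied from a run):

    file -b /bin/ls                  # -b drops the filename prefix; e.g. "ELF 64-bit LSB pie executable ..."
    file -b --mime-type data.json    # e.g. "application/json", decided by content tests, not a magic number
    file -k mystery.bin              # --keep-going: report every matching test, not just the first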
The point is that, fundamentally, the concept of a file type is undecidable or not well defined.
Think about it: A JSON file can also be considered a text file. It could also be some higher level type of file, depending on whether it conforms to some application-specific JSON schema. Thus the kind of file it is has more to do with what you want to do with it; it's not some intrinsic property of the file.
Ok? I was replying to a comment asking about how such systems can work. GNU file is an example of a program that makes a best effort to classify file types in a useful way.
This is the author's website. Apparently, yeah, it's not part of the GNU utils; I had no idea. I knew it came with most Linux systems, so I looked for the Debian package and found the site linked above.
A file is "of a type" if there exists some agent (program X or human H) that creates/writes such files, and whereby there exists some agent (program Y or could be Y==X or human I or could be I==H) that reads/modifies such files, and whereby the entire contents of such files can be made use of or understood in a useful/meaningful way; and other files not of the type's contents either can't be made use of meaningfully nor updated. Some file types may be sub- or super-sets of other file types (ex: .C source files are .TXT files, but .TXT source files are not necessarily .C files, they could be .CSV files) And files could be streams.
One might also wish to discuss files containing both type information and data/value information, so that numerous files with different values can all be of the same type.
You can get fancier if you want and talk about valid C source files, or C source files which contain bugs, etc.
You could consider a file "weakly or ad-hoc typed" if there is only one agent or a single pair of agents that uses that type, and "strongly or standard typed" if there are numerous "different/distinguished" agents that "rely" on that file type.
Are you suggesting putting the (I assume versioned) type system into the file's metadata, so that others can make sense of it? Or some type system UUID, in the metadata, that can be used? I think anything like this would require many new concepts, like a move away from files being a collection of arbitrary stored bytes.
Well, in theory, if we are only doing inheritance, we can just combine file extensions hierarchically in the name.
For example, for CSV files we can use a .csv.txt extension: if the shell has no software for CSV files, it will treat them as plain text, and if it has software for viewing CSV files but not for editing or printing them, it will inherit those verbs from the records for .txt files.
This is backwards compatible and would require minimal changes to file managers and graphical shells.
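A minimal sketch of that fallback in bash, where `open_with` stands in for a hypothetical handler lookup in the shell's registry (not a real command):

    # Try the most specific extension component first, then fall back outward:
    # "report.csv.txt" -> try "csv", then "txt".
    f="report.csv.txt"
    base="${f%%.*}"                             # "report"
    IFS=. read -ra exts <<< "${f#"$base".}"     # ("csv" "txt")
    for ext in "${exts[@]}"; do
        open_with "$ext" "$f" && exit 0         # hypothetical: succeeds if a handler exists
    done
    echo "no handler for $f" >&2

Old software that only looks at the final component still sees a plain .txt file, which is exactly the backwards compatibility the scheme is after.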
A C struct can also be addressed as bytes if one really wants to. The structured way of looking at the data is still more valuable, most of the time.
Another example: you can easily grep through a JSON file sometimes. But what if that JSON doesn't contain newlines? In that case it would be great if the system knew how to convert the data to a multiline string that makes sense for grep. This cannot be done if the data is "just bytes".
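For the no-newlines case, the usual workaround is to pretty-print before grepping; either of these works today (jq may need installing):

    python3 -m json.tool data.json | grep '"name"'
    jq . data.json | grep '"name"'

That is the "convert the data to a multiline string that makes sense for grep" step, done manually by a tool that knows the structure.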
Further, JSON data can be represented with fewer bytes if you know the structure.
Back 40 years ago, Data General's AOS/VS defined 255 file types. In many cases, maybe all, there was a corresponding extension for the file, e.g. .exe for executables, .txt for text and so on.
However, there was a trick that I encountered: copies made not by a CLI command or a properly written utility, say by sending a file over email, did not preserve the type in the filesystem. Then you would try to compile or assemble the code somebody sent you and get a baffling message about your file being of the wrong type. If you had encountered this before, you would know how to fix it.
I don't know who specifically invented file extensions, but Raymond Chen's The Old New Thing has some good insight into why they are important on Windows. Edit: In summary, it was to support tape-based drives and to avoid opening a file "unless a user tells you to".
Just to be clear: Bill Gates and Microsoft weren't anywhere near inventing this. They bought what was mostly a clone of CP/M and called it PC-DOS/MS-DOS. That's where they got the file extension scheme from.
Digital Research (the creators of CP/M) also didn't invent this. For more details, see the stackexchange link.
You should realize that people like us who understand file systems and the "files and folders" metaphor are a very small, niche minority.
Most users today don't know what a file even is[1], and even in the 90s and 00s most users still didn't understand files. Ever seen a desktop filled with icons of all sorts because the user just dumps all their files there? I'm sure you have, and most users are like that. Navigating a file system and using it is fucking pig latin nonsense to them.
Couple that also with users not taking backups because it's an inconvenience (until they wish they had one), and cloud storage solves all the problems most users have had. They don't need to manage their files anymore and they're all backed up by the cloud provider; they don't need to worry about their konpoohters anymore as they go about enjoying their lives, if they even have a konpoohter at home at all anymore.
> Ever seen a desktop filled with icons of all sorts because the user just dumps all their files there? I'm sure you have, and most users are like that. Navigating a file system and using it is fucking pig latin nonsense to them.
The first part I'd agree with; the second part I'm not sure about, if only because of my experiences and frustrations using MS OneDrive as part of my work's O365 integration. It unbelievably seems impossible to actually list all files in the web interface: I can only see "Recent" documents (which are documents I've viewed or opened, not new ones I've added), and I end up having to search to find new documents I've added (i.e. saved from Outlook, or scanned docs from the printer), open them, and THEN I can see them in the main doc view.
Maybe I'm doing something wrong, but other colleagues have complained about the same thing, so I'm not sure...
> Local files are even easier to lose and even harder to search than cloud.
Not if you have your own NAS; you can use programs like Everything (1) for an easy quick search, and you are in control of everything, not at the mercy of someone else's policies.
I can answer the Rust question. In the ancient era, Rust source code was split into two types of files: .rc files and .rs files. The latter were "Rust source" files, and the former were "Rust crate" files. The crate files were essentially header files; they defined the public interface of the crate by explicitly listing items to be exported from the crate. Eventually explicit export lists were removed (and replaced with visibility modifiers on items themselves), and so there was no reason for .rc files to stick around, and nobody really ever objected to .rs as the now-universal extension for Rust files, since it's still a valid abbreviation.
Probably because the first release was in 1996, so Java never ran on a platform (like DOS) that needed only 3 characters. Originally it only ran on Solaris and Windows 95/NT.
Maybe I'm just an old curmudgeon but things like "you can use emojis as variables" (or in this case "the extension can be an emoji") scream amateur hour and a lack of seriousness to me. There's zero benefit, IMO, other than "it's quirky".
Which is a shame because the people who created Mojo are definitely not amateurs.
These things seem unrelated—experienced people are allowed to have fun too. I don't see how this correlates with lack of experience at all. This just speaks to your personal bias against having fun.
I looked through the mojo docs and it looks like the tech behind mojo might be great, but they have a really weird marketing strategy. So I'd like to believe that the emoji extension is the annoying marketing half of the really cool tech behind it.
Also valid in CSS class names. CSS modules, or one of the loaders in webpack, could be told to use them when munging class names [it's been a long time and I'm unsure of the webpack terminology and which one it was].
I used to write all my html files with .htm… apparently that went out of fashion.
Edit: I got my htm and html backwards, so the comment didn't make sense. Fixed now, but obviously the folks below were right / responding to an earlier version.
I personally use .htm for anything that's an incomplete chunk of an HTML-ish document, like a Jinja2 template file, and .html for full rendered documents.
That way if I change my site infrastructure from, e.g. static html, to php, to mediawiki, to cgi-bin, to asp, to ruby on rails, all my URLs can stay the same (because cool URLs don't change) throughout, without containing misleading legacy baggage.
Isn't this something you can just configure at the server level, so you don't need to have your files be extensionless? I could swear I've done that before in apache.
> Isn't this something you can just configure at the server level
Oh, yeah. I mean, you have to configure the server to treat extensionless files as html/cgi/php/whatever anyway, so you could just have it load extensionless URLs from .html/.cgi/.php files, but I prefer to keep my server config as "surprise-free" as possible. If `/article/foo` is served from the file `$DOCUMENT_ROOT/article/foo` on the filesystem, I find that less confusing for everyone than if it's served from `$DOCUMENT_ROOT/article/foo.php`.
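For anyone who does want the server-level approach: on Apache, one way is MultiViews (mod_negotiation). A sketch, assuming a Debian-style layout:

    # /article/foo is then served from article/foo.php or article/foo.html
    cat > /etc/apache2/conf-available/extensionless.conf <<'EOF'
    <Directory "/var/www/html">
        Options +MultiViews
    </Directory>
    EOF
    a2enconf extensionless && systemctl reload apache2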
It's a matter of personal preference, for sure. Just like `.htm` over `.html`. (Or `.jpg` over `.jpeg` for that matter.) I wouldn't say the way I do it is "right", but it's a trade-off that works for me.
I guess I'll add to the pile of thoughts by saying that I think the parent comment was making a snide remark about .jsx/.tsx files, i.e. "no one writes HTML anymore".
As a RapidSketch user on my organisation's web ratings board working group, living in a coastal area affected by soil erosion and with a penchant for DNA modelling, I find Rust's choice of file extension perturbing.
.rust.programming.language would make far more sense than pretending Rust will ever need to work in DOS.
Ah, but to make the 3-character limit matter, you need to have the source files on DOS. So it's not enough to compile a binary for DOS; one needs to run the compiler itself on DOS!
You are not looking closely enough. While Java source code has four characters in its file extension (.java), the output of its bytecode compiler has five characters in its file extension (.class).
Considering that Java needs a four-letter acronym (POJO) to describe value classes... I think a fancy acronym for fully spelled-out file extensions is in order.
I remember them on the DEC PDP-8 (I was taking classes at the high school part-time in the 4th grade; the high school had one with those big 8-inch floppy drives) and on RSTS/E on the PDP-11 early on (I spent a lot of time with one at the high school in Milford one summer). They were also on CP/M, a microcomputer OS that was big on high-end microcomputers from 1975 to 1985 or so and was the inspiration for MS-DOS, which also had them.
This is also sort of a flame war topic for some. Divided primarily by those who believe the file’s content should contain the type (or be sampled to infer it) and those who prefer that the file’s name contain the type. More or less.
No, just an explicitly-defined metadata field. Inference would be worse than filename extensions (which are awful).
I think the original Mac OS fucked us all by implementing this pretty much perfectly in the 1980s, but using their weird HFS "resource forks" to do it so that it not only couldn't be implemented on other platforms, but also lent credence to the "ugh fuck it lets just mangle the file name... and then hide it~!!! wooo000!~!" argument..
¯\_(ಠ_ಠ)_/¯
UPDATE: My memory was faulty; the "what kind of file is this?" metadata was actually stored as per-file metadata, the "type" and "creator" fields in the HFS(+) filesystem.
Why are filename extensions awful? I find it incredibly useful to tell what kind of file something is by just glancing at it. Or do you envision tools like `ls` and file browsers also showing metadata at all times?
That reminds me of relying on static type inference so much that the code is hard to follow in Github and you need to pull it into your IDE just so you can keep track of the types as you read it.
While a filename extension could probably be superseded by file metadata, it's also nice when you can read the intention right in the name without additional tooling.
Especially when you consider a hypothetical world without filename extensions, where every UI shows the filetype next to the filename: "readme.md" in our world is just "readme:md" in their world, and it's not obvious what was gained or lost.
Yeah, I envision a future 1995 where not only does the GUI icon indicate what type of file it is, but the CLI tools also do (e.g. right before/after the "drwxr-xr-x@" or whatev)
When I type `ls`, I generally want to see file names and types, and nothing else. It would be really unfortunate to need a full `ls -l` to see file types.
In fact, I always want to see file types alongside their names, so I really don't mind that the two are together.
I think your imagination is failing you. The extension could be stored as a distinct field apart from the name. After all, we don't embed things like the file type (link, device, directory, etc) in the filename, nor things like the owner, group, ACL, access, creation time, etc.
Separately, what `ls` displays with or without switches would be independent of how the extension is stored. Say the extension were a distinct field, `ls` could still display that as `{filename}.{extension}` by default.
It's an accident of history that the extension is part of the filename.
>Separately, what `ls` displays with or without switches would be independent of how the extension is stored. Say the extension were a distinct field, `ls` could still display that as `{filename}.{extension}` by default.
But that would break piping the output of `ls` to other tools that expect a filename.
It's brittle and the output is designed for humans, not machine parsing.
Use a glob or find.
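Spelled out, the robust alternatives look like this (GNU stat/xargs assumed for the second one):

    # Glob: the shell expands it, so weird filenames survive intact
    for f in ./*.jpg; do
        [ -e "$f" ] || continue      # skip the literal pattern when nothing matches
        printf '%s\n' "$f"
    done

    # find with -print0: NUL-delimited, safe for machine consumption
    find . -maxdepth 1 -name '*.jpg' -print0 | xargs -0 -r stat --format='%n %s'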
In a world where extensions were their own distinct metadata, whether or not to show them would be a switch to `ls` (whose output you shouldn't be parsing) and how to retrieve that metadata from a filename w/o parsing `ls` output would be something like:
ext=$(stat --format="%E" -- "$filename")
You'd also have a switch to `find` to select files by extension instead of having to pattern match their names.
This is a reasonable argument against parsing the output of `ls` within a shell script that you're going to save and reuse in arbitrary directories. It's not a reasonable argument against ever parsing the output of `ls`. If it were, it would also be an argument against humans reading the output of `ls`. The example in your first link makes its point by showing output from `ls` that the human reader can't parse. That doesn't demonstrate that the output of `ls` is "designed for humans, not machine parsing"; it demonstrates that in the general case, its output cannot be parsed by either humans or machines.
My rule is that it's ok to use `ls` only in a directory where you know the filenames aren't weird. That rule is the same regardless of whether a human or a computer will be parsing its output. In general, reusable shell scripts shouldn't make assumptions about their environment, so they shouldn't use `ls`; but piping `ls` to another command in a one-off manner at an interactive shell is fine if you know the filenames in the directory aren't weird, just like it would be fine to use `ls` at the same interactive shell and simply read its output yourself.
Instead they went with executables all having unique icons, and icons for non-executable files changing depending on which executable was last granted permission to open them.
I've learned in code not to rely on arbitrary user generated strings to designate the document type as they can be incorrect or non-existent. They're fine as a possibly unreliable indication to users, though.
Frustratingly NTFS has had proper metadata support for 30 years but FAT32 was a bag of rotten hacks, so Microsoft couldn’t fix this either even though their forward-looking OS had everything needed.
I know. I'm getting close to 50 years old, and I have lots of *COMPLAINTS!!!* about how various tech (Atari 2600 ~ iPhone 15 Pro Max) worked out.
But filesystems seem like the biggest species-level fuckup.
Windows had their shit, Mac had HFS then "HFS+" until... tbh, quite breathtakingly, like five minutes ago, when they finally introduced APFS. But like... no checksumming, just Mötley Crüe data integrity, it's all good, iCloud backups FTW amirite!?!
ZFS is good; I use it. But it didn't pan out, because a humanity-saving filesystem has to work on the systems humans actually use... (T_T)
Well, even Sun was wedded to the (shit-the-bed) CDDL, but before Oracle there was the sense that a one-off deal might be made.
Once Oracle bought Sun, it was like "Which is more important, your filesystem or your OS?"
Which isn't a hard question with regards to any filesystem I've ever used except ZFS. Seems insane to use anything else for long-term storage, unless it has some kind of high corporate budget.
Right, that was the original point that I was trying to make (but botched). The 80s Mac solved this near-perfectly, but then regressed to the current "just add a dot and some gibberish to the filename (and perhaps, or perhaps not, hide the gibberish and dot)".
Because when you copy a file to some other system (e.g. USB drive, or email attachment, etc), only the common properties survive. So having a filesystem with awesome capabilities that the others lack becomes worse, not better. (T_T)
> Which are bad because those don't survive filesystem transmissions. A filename with extension will.
It won't, actually, because paths are not portable across file systems. The big reason is case sensitivity (and yes, some file extensions are case sensitive), but there are also issues with ASCII vs UTF-8 vs UTF-16 encoding.
Meanwhile there are multiple standards for exchanging file metadata between file systems (SMB, NFS, FUSE, FTP, whatever). Most of these support arbitrary metadata on files.
A path is not a filename. The filename is part of it. Even with encoding issues, the chances that a filename survives are still much higher than filesystem metadata.
For example: HTTP does not supply that metadata, so a file would lose its "extension". Where would you store the mimetype? Not in the file, obviously.
There's actually quite a bit of metadata that needs to be preserved when moving files between filesystems (file type, permissions, atime/ctime/mtime). Modern file systems support extended attributes for regular files and you can copy files between file systems with xattrs without too much trouble these days.
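For instance, with GNU coreutils, rsync, and the attr utilities, something like:

    setfattr -n user.mime_type -v text/csv data.bin    # attach arbitrary metadata
    getfattr -d data.bin                               # inspect what's attached
    cp --preserve=xattr data.bin /mnt/other/           # xattrs survive the copy
    rsync -aX src/ host:dest/                          # -X carries xattrs across hosts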
In fact it's arguable that metadata is more reliable than file names, because that metadata is more standardized.
> For example: HTTP does not supply that metadata, so a file would lose its "extension". Where would you store the mimetype? Not in the file, obviously.
Not sure what this point is. It's up to the HTTP serving application to determine what the mimetype of another file is however it needs to. A limited design could use file extensions, but a resilient design would just query the file system for unambiguous metadata. Like I said, there are multiple ways to do this.
There's a lot of stuff that's not "in" the file. Like its name.
>There's actually quite a bit of metadata that needs to be preserved when moving files between filesystems (file type, permissions, atime/ctime/mtime). Modern file systems support extended attributes for regular files and you can copy files between file systems with xattrs without too much trouble these days.
Those are not hard requirements. They're nice to have, but if in doubt they will just be whatever the destination filesystem thinks those values should be.
>There's a lot of stuff that's not "in" the file. Like its name.
Correct. And everything not in the file is in danger of being lost. The filename (with extension) is the minimal viable data that has a chance to survive. The extension even more so than the basename.
I remember Mac OS 8 not having file extensions and it was very difficult to change a file's type. I believe hexedit was involved. Extensions IMO are way more reliable.
Apple type/creator was a superior metadata system for sure, but the UI was opaque for non-technical users. Paired with the modern "open with..." settings, it would still be better than how file extensions are used today.
For extra confusion, most files have the info in a header at the beginning, but zip files have the info at the end, so someone made a file that is at the same time a valid .pdf and a valid .zip. (I can't find the link now.)
POC||GTFO does this with their releases, where the file released usually doubles as multiple file formats, and contains a ton of hidden messages all over the place. https://pocorgtfo.hacke.rs/
From the latest edition (#21):
> Technical Note: The electronic edition of this magazine is valid as both PDF and ZIP. Thanks to Ange Albertini, it is also a PCAP-NG packet capture of an experiment by Yannay Livneh. See page 7.
Edit: Ah, and of course, Cosmopolitan (by @jart) that produces an amalgamation of formats bundled into one file (including ZIP) that runs across a bunch of OSes https://news.ycombinator.com/item?id=38101613
There also used to be an exploit where you could make a file both a valid .jar Java executable and a valid .gif, which back in the days of Java applets would potentially let you execute code when someone loaded your image (e.g. if you uploaded it as your profile picture): https://en.wikipedia.org/wiki/Polyglot_%28computing%29#GIFAR...
My favorite is storing a batch file uncompressed as the first entry in the ZIP archive, because you can rename it with .BAT or .CMD and run it. Sprinkle in some PowerShell and you can make a self extracting archive/installer.
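A rough sketch of assembling one; the zip flags are real, while the "rename and run" step relies on cmd.exe tolerating the archive's leading header bytes, as described above:

    zip -0 bundle.zip setup.bat     # -0 stores the script uncompressed, readable as text
    zip -9 bundle.zip payload.dat   # the rest can be compressed normally
    cp bundle.zip installer.bat     # same bytes: an extractable archive and a runnable batch file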
Filenames are the worst place to store metadata. Even a file's specific metadata fields are not always trustworthy. In the realm of media assets, relying on the file's metadata is a giant footgun. I'm not even talking about anything malicious, as that's designed to attempt to fool people. I'm referring to simple mistakes which would cause incorrect processing down the line if solely depending on metadata. For example, marking the video field dominance as progressive when the content is truly interlaced, or vice versa.
Even in the world of image uploading via browser, just checking extensions is discouraged. If the feature is that humans can be allowed to manipulate it, it must be sanitized/verified before accepting. This isn't just for text heading for a database.
Where's the best place? It seems like being able to send a file handle to an app without having to open the file nor read a sidecar/db could be quite convenient?
The problem is that depending on metadata alone is an error-prone concept. This could matter more or less depending on the use case: telling a text editor that a plain-text file is rich text is much less problematic than telling a video app that a file is 16x9 when it is actually 4x3. Saving a tab-delimited file as CSV is also going to cause issues if only the extension is used.
I'm not saying throw away extensions, as they are great hints, but trust without verifying is not just for politics.
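For the interlacing example, verification is straightforward with FFmpeg's tools: the first command reads what the stream claims, the second analyzes what the frames actually are:

    # The declared flag:
    ffprobe -v error -select_streams v:0 -show_entries stream=field_order -of default=nw=1 video.mp4

    # Content analysis via the idet filter (summary is printed at the end):
    ffmpeg -i video.mp4 -vf idet -frames:v 200 -an -f null - 2>&1 | grep -i detection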
The problem with declaring a file's type in the content itself is that you end up rendering all prior file formats obsolete because they lack this information.
The better solution I've seen proposed is to store file type information as metadata in the filesystem itself, but this would lead to compatibility problems with archiving tools and any other scenario where a file is moved between differing file systems.
File extensions are the best way to make determining a file type easy without parsing the whole thing while also maintaining cross and backwards compatibility.
It's getting less relevant as regular users stop using the filesystem directly and instead rely on proprietary services for their data. Google Drive doesn't rely on file extensions. Though for all I know, Microsoft's web stuff does...
What's the web equivalent of using a floppy disk to move data between Mac OS and Windows, I wonder. Downloading to the local computer and uploading, presumably. I guess that is pretty equivalent. Though you can't even download the native representation of a Google (text, spreadsheet, ...) document.
Having files double as a launcher for some program was a mistake, the result of which is that we now have to sit through those awful trainings about not clicking untrusted files.
Just open the program first.
People should fear untrusted programs, but should be at ease viewing data through programs that they trust.
I guess I'm not in a good position to judge how annoying it would be for others. I can't remember the last time I let the OS decide how to handle a file type.
As for (2)... good. Asking your computer to perform actions that you understand so poorly that you can't figure out what program is going to run is a recipe for disaster. That's the behavior we should be training people to avoid, not merely engaging with suspicious data but doing so recklessly.
I understand computers plenty well, but I can't envision any scenario where not knowing the default program for a file type would bring disaster. No one should have to memorize what program opens every file, and there's no good way to even learn that information easily. I want to open a photo. Why would I care whether this system uses Shotwell or GNOME Photos or what have you? How would I even learn what's possible? Especially with Linux, there's so much fragmentation that just trying to open the file manager feels like a guessing game of going through all the ones I've heard of to see which works until I finally break down and Google what fork this distro uses.
That change would make computers completely unusable beyond Chrome and maybe Office for 99% of people.
I'm not the pen tester, I just hang out with him, but the particular disaster I'm thinking of has to do with samba file shares. I guess there's some file extension that will prompt windows to attempt to log in to the server at an address found in the file. So you email that file to your target and when they open it it sends you their password under the auspices that you're an internal samba file share.
As I understand it, a very large portion of the attacker's toolkit has to do with tricking users into running programs they've never heard of by clicking things they think are familiar.
But the real disaster is not the successful attacks, it's the culture that we're creating where users are taught to click things and trust the OS default behavior while simultaneously trained to never click things that seem out of the ordinary.
It creates a paralysis in the user when it comes to exploring their tooling and fails to create a learning gradient. This widens the gap between them and people like you and me.
That's a disaster for them because they get taken advantage of by people in the know, and it's a disaster for you and me because they end up blindly supporting bad behavior (drm, companies mishandling user data, etc) since it's bad in a dimension that they've been locked out of by our failure to pave a path towards competence.
You certainly will care if you double click a .png and suddenly WinRAR opens up, because that .png is actually an ACE archive (application/x-ace-compressed), and guess what ACE stands for - arbitrary code execution! Probably not, but they've had one for 20-ish years: [1]
I find that insane, and that's basically what happens on Linux today.
If it's a .png, it would open up in Microsoft Photos/eog/LXImage for me. Obviously some exploit could be found in there and any untrusted file is a risk, but it won't open WinRAR. That's the whole point of why file extensions are so much better than any alternative.
What happened to me today on Linux was `$ geeqie image.png`. It might be possible to trick geeqie into doing something bad, but it's a heck of a lot harder.
All programs become untrusted after they reach a certain size. It's highly unlikely that a 1,000,000-line monstrosity with a file format so complex no single human can understand it has no exploitable flaws in its file handling.
With "open by extension" the .pdf file can only target vulnerabilities in my PDF reader, but with "open by mimetype" they could be trying to exploit any program that is configured to open files. If you're doing "open by extension" and know that Adobe's exquisite PDF reader will open when you double click a PDF, doing so is not more unsafe than opening up Reader and navigating to the file from within.
That's without mentioning the main obvious advantage of "open by extension" - the fact that I can configure programs that open file formats that haven't been blessed by whoever wrote your `file` utility.
IIRC my KDE has little icon overlays, so a traffic cone on the mp4 preview image or mp4 icon. That seems like a nice way to give that information. But, it's using magic numbers.
A string->string table is not that difficult to get right, I've never been surprised by what happens when I open a file in Windows. I could write 10 paragraphs on the thought process which my Linux install goes through as it decides that when I click a .zip file the correct behavior is to open it in Chromium and immediately download it to ~/Downloads.
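To be fair, on a freedesktop.org-style Linux desktop that string->string table is at least inspectable and fixable from the shell via xdg-utils (the .desktop name below is GNOME's; yours will vary):

    xdg-mime query filetype archive.zip          # -> application/zip
    xdg-mime query default application/zip       # which .desktop entry currently wins
    xdg-mime default org.gnome.FileRoller.desktop application/zip   # reassign the handler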
For those embroiled in this holy war, there has come a mediator that has quelled the discord and brought harmony, and his name is Doom, Doom Emacs.
(Seriously, if you've been a vimmer for a while and wanted to give emacs a go, give Doom Emacs a try and drop hlissner a couple of bucks. It's definitely worth it)
It was called SOS on the Apple ///, originally Sara's Operating System but officially Sophisticated Operating System. It was a very advanced OS for its time, most notably one of the first to include a useful driver model.
The SOS filesystem was reused for ProDOS (and later GS/OS as well), but a lot of the advanced features were removed for ProDOS due to wanting to fit into the smaller RAM of the Apple ][.
I think almost all Apple apps do this, e.g. Photos.app.
Which is super-annoying, even though I agree with it in theory. But generally the only reason I am writing an image to .jpg is to email it to somebody or upload it to some web app, and that principled .jpeg filename extension certainly isn't helping compatibility.
Makes for an intriguing concept in a TV show set in an alternate universe (Fringe?), where file extensions are not in the name but in the attributes of the file.
Using file extensions (or having to open the file and inspect its contents) to determine file type is a slow way to find files. If you have one of the newest hard drives capable of storing 100M+ files (or a NAS made up of multiple smaller drives), then it can take forever to find all files of a certain kind (e.g. photos, documents, etc.).
Just try to write a program that lists all your photos. You have to know all the various extensions that apply to still images (.jpg, .jpeg, .gif, .png, etc.) and do a string comparison against every file in the drive hierarchy. Might as well go to lunch waiting for it to finish if you have many millions of files to search.
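(For reference, that naive scan is essentially this, walking and string-comparing every entry in the tree:

    find /photos -type f \( -iname '*.jpg' -o -iname '*.jpeg' \
        -o -iname '*.png' -o -iname '*.gif' -o -iname '*.tiff' \) -print

with the extension list never quite complete.)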
This was the primary reason why I set out to build a better system that could scan through hundreds of millions of files and find all that matched a given subset in just a few seconds. https://www.Didgets.com
Apparently, your experience with various indexing services has been different than mine. I found it is just too easy for the index to become 'out of sync' with the actual file system, so it has to be occasionally rebuilt. Even with just 1M files, that always took a long time; but with 100M+ files?? If the storage was removable, or if you booted occasionally with another OS (or even just a different version of the same OS), then I found it was near impossible for an indexing service to keep track of all the changes.
We usually can use file extensions to differentiate files that otherwise share the same name. I once worked on a VAX mainframe OS that to my surprise supported completely duplicate filenames including extension!
Tangentially related: I find it weird that relating file extensions to GUI document icons still doesn't work intuitively.
On my Mac, the document icons for file types still correspond to whatever program I've chosen to open that extension type by default.
Which is so strange -- an .mkv file isn't a VLC document; an .mp4 file isn't a QuickTime document, an .mp3 file isn't an Apple Music document, a .jpg isn't a Preview document, and a .tiff isn't a Photoshop document. What the heck?
I wish the OS would just define reasonable, attractive, default document icons for every standard file type, that applications couldn't control/overwrite. Reserve application-defined document icons just for those apps who truly have their own proprietary file format (e.g. .psd or .docx).
I always liked the Classic Mac OS way of using "File Type" and "File Creator" metadata. File extensions were not used at the operating system level, but were useful for cross-platform sharing or as a way to organize files.
The file icon would be assigned by the "Creator" Application. Double-clicking the file would cause it to be opened by the "Creator". You could choose to open a file with anything that advertised that it could work with a "File Type". There were utilities to change the "File Creator".
It was also a royal pain in the ass to change the file type should you want to open the same file with another application, edit it, save it, then go back to the 1st application.
At that point I assume it's easier to open the file from each application rather than the finder. Which I appreciate isn't a great workflow, but probably simpler in that worldview.
On MacOS of that era, the files didn't even show up in the "Open..." dialogs of those other apps. Many of those apps didn't have a "Show all files" option.
Isn't iOS still essentially the same way? It was one reason I stopped using iOS devices as I frequently wanted to open, for instance, pdf or image files from multiple apps.
Does iOS even have a file system? Last time I used it, I had to share files back and forth from one app to others just to open them. It was really annoying...
I have the opposite preference... I want to know if my OS is going to open a PDF in Preview or it was hijacked by Adobe again, or if a .cfg file is going to open up in some useless Microsoft System Configuration app, or a JSON file is gonna open up my super bloated IDE instead of a text editor.
It's an additional piece of information, rather than just mirroring what the extension already tells you.
I'm surprised people here are averse to this. Having both the extension and the graphical icon convey the same information is redundant and wasteful.
It's way easier to recognize an icon at a glance, rather than scan the filename to locate and read the extension.
And the application icons are often super ugly, and often don't distinguish between the filetypes anyways.
Honestly, I'd rather do away with default application handlers altogether, if I've got multiple apps installed for a file type. Every time I double-click on a .jpeg, just ask me if I want to open it in Preview or Photoshop. When it's a .pdf, ask me about Acrobat or Preview. When it's an .mp4, ask me about VLC or QuickTime.
I have different reasons for wanting to use a different app each time (do I want to consume or edit, and edit how?), and trying to remember which one is the default handler and when I need to pick the non-default one is cognitive load I don't want to deal with.
Especially since the application-specific icons sometimes don't even make it clear which app it is. E.g. the IINA player assigns its own dedicated icons to video files, but you'd never guess that they open IINA.
Just give me standardized icons that are recognizable at a glance, and let me pick the application I want to use at the moment.
That's how it generally works under Linux desktop environments - there are generic icons based on file type no matter what application is used to open them.
> Which is so strange -- an .mkv file isn't a VLC document; an .mp4 file isn't a QuickTime document, an .mp3 file isn't an Apple Music document, a .jpg isn't a Preview document, and a .tiff isn't a Photoshop document. What the heck?
Aren't they? Most of these tools can open many specific types of documents, so you might want to know if a document is a movie or a picture in addition to what program will open it.
And of course there's the issue that Preview is actually an editor.
I share your pain. Luckily there are great apps out there (I use IconChamp[1]) that let you take full control of any given extension's associated icon, among other cool things. If your point is more purely about Apple's stance, then I extend a hearty "hear hear".
I think it is intuitive, the icon indicates which application you are opening. If I want to know which file type it is, I turn on 'show file extensions'.
Well, GNU/Linux, as such might not, but the various desktops that run atop it most certainly should care.
Re: ACK! THBBFT!: Just my personal ode to Bloom County which I feel must certainly be regarded as the greatest comic strip ever. (I need to wait for others to start coming on board with the project before I worry about doing things for any other reason than pure self-amusement.)