Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Spinning disks.

If you want raw file access, your ideal filesystem is a giant key-value store that keys on the full path to the file. This choice means doing a directory listing will involve a lot of random access reads across the disk, and in the days before SSDs this would be a performance killer.

So instead, a directory would have a little section of disk to write its file entries to. These would be fairly small, because most directories are small, and if there were more files added than there was room, another block would be allocated. Doing a file listing would be reasonably quick as you're just iterating over a block and following the link to the next block, but random file access would get slower the more files are added.

It's probably a solvable problem in filesystem design, but because current filesystems fall down hard on the million-files-in-a-directory use case, nobody actually does that, so there's no drive to make a filesystem performant in that case.



ext4 uses btrees for the directory index, so accessing files by name is just as fast as manually splitting it into a prefix-based directory tree.

Searching by filename on the other hand can be much faster with a prefix tree because directories only allow linear scans, not range-based ones.


I'm missing something. What difference do you see between "accessing files by name" and "searching by filename"?


    f = open("/path/to/file");
    // do something with file
vs.

    d = opendir("path/to")
    while((dent = readdir(d)) != NULL) {
       if(!dent.name.startsWith("file"))
          continue;
       // do something with first matched entry
    }

The former is O(log n) on ext4, O(n) on older filesystems that use flat lists.

The latter is always O(n)


Understood. Thanks ;-)


Ok but listing 100 directories with 10,000 files each should take the same time than listing one directory with 1,000,000 files (what you are describing is a filesystem with more files vs less files, not with more subdirectories than less subdirectories).


The benefit is with accessing single files and not requiring a 1,000,000 directory listing lookup




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: