- The tags attached to the file (naturally)
- The actual content of the file
Eventually, the special case of the 'name' tag started to create problems in the organization of my code (it was awful), so that kludge, along with a good deal of equally stinky code, was replaced. File names became first-class metadata again and things worked relatively well. While I was working in this new system, though, I realized that some situations were not covered by simply preserving file names.
When I store a file, it gets placed in a sort of bucket for each of its tags. To pick out a file, we have to specify which buckets we want to look in and the name of the file we want to find in them. If the file is missing from any one of those buckets, we return a "not here" value. This generally works by taking a path like "/tag1/tag2/tag3/file_we_want", translating the dirname into a list of buckets to check, ("tag1", "tag2", "tag3"), and checking for "file_we_want" in each bucket. Our problems start when we have files that share a name and some set of tags. There are 3 cases where this happens:
Let Name(File_a) == Name(File_b):
- File_a has tag set A, File_b has tag set B, A is a subset of B
- the same, but B is a subset of A
- File_a has tag set A, File_b has tag set B, and there exists a non-empty tag set C such that C is a subset of both A and B
The case where the tag sets of File_a and File_b are equal isn't a problem, since that implies that they are the same file. In all of these cases, there is a set of buckets where knowing only the buckets and the name isn't enough to get the data we are seeking. We also need to know the order in which the files were put in the buckets and how that order affects which file we see or, in case 1 or case 2, which file still exists.
There are some schemes where most of the user-supplied names would be preserved, and even one I've thought of where files are stored in a stack rather than being overwritten. None of these seem effective or natural, so I don't want to bother with them. After thinking over this problem, I recalled the no-file-name concept and decided to give it another chance.
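The ordering problem in the subset cases can be seen in a small Python sketch of name-keyed buckets. Everything here is an assumption made for illustration, not TagFS's actual code: the class name `NameBuckets`, its `store` and `lookup` methods, the rule that a later store overwrites an earlier one, and the rule that the last bucket in the path decides which file is returned.

```python
import posixpath
from collections import defaultdict


class NameBuckets:
    """Hypothetical name-keyed store: one bucket (dict) per tag."""

    def __init__(self):
        # tag -> {file name -> content}
        self.buckets = defaultdict(dict)

    def store(self, name, tags, content):
        # Assumption: storing under an existing name overwrites it.
        for tag in tags:
            self.buckets[tag][name] = content

    def lookup(self, path):
        dirname, name = posixpath.split(path)
        tags = [t for t in dirname.split("/") if t]
        found = None
        for tag in tags:
            bucket = self.buckets.get(tag, {})
            if name not in bucket:
                return None  # the "not here" value
            # Assumption: the last bucket checked decides the result.
            found = bucket[name]
        return found


# File_a has tags {music}; File_b has tags {music, jazz}; same name.
a_first = NameBuckets()
a_first.store("notes.txt", {"music"}, "A")
a_first.store("notes.txt", {"music", "jazz"}, "B")

b_first = NameBuckets()
b_first.store("notes.txt", {"music", "jazz"}, "B")
b_first.store("notes.txt", {"music"}, "A")

# Same files, same path, different answers depending on store order.
print(a_first.lookup("/music/notes.txt"))
print(b_first.lookup("/music/notes.txt"))
```

In the first store File_a is gone entirely, and in the second the "music" bucket and the "jazz" bucket disagree about what "notes.txt" is, so the answer depends on which bucket the lookup happens to trust.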
Setting my earlier misgivings aside, how does removing file names solve my identity crisis? We still keep identifiers, but we ensure that they are unique. That's easy because every file that's added to TagFS gets a brand-new id number. By identifying files this way we are guaranteed not to have any collisions, so the precondition for my problem cases can't happen (as long as you make fewer files than a 4-byte integer can hold). The only considerations worth weighing, then, are, first, how a human user identifies files based only on tags and an id number and, second and more importantly, how we can do with tags all of the things we did with file names.
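As a rough illustration of the id-based scheme, the sketch below gives each stored file a fresh number and lets a tag query return the set of matching ids. The names `IdBuckets`, `store`, and `find` are hypothetical, and treating the 4-byte limit as an unsigned 32-bit range is an assumption, since the width is all that is stated above.

```python
import itertools
from collections import defaultdict

# Assumption: ids are unsigned 4-byte integers.
MAX_IDS = 2**32


class IdBuckets:
    """Hypothetical id-keyed store: buckets hold file ids, not names."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.buckets = defaultdict(set)  # tag -> set of file ids
        self.contents = {}               # file id -> content

    def store(self, tags, content):
        file_id = next(self._next_id)    # brand-new id for every file
        if file_id >= MAX_IDS:
            raise OverflowError("ran out of 4-byte file ids")
        self.contents[file_id] = content
        for tag in tags:
            self.buckets[tag].add(file_id)
        return file_id

    def find(self, tags):
        # Ids of the files that carry every one of the given tags.
        sets = [self.buckets.get(tag, set()) for tag in tags]
        return set.intersection(*sets) if sets else set()


store = IdBuckets()
id_a = store.store({"music"}, "A")
id_b = store.store({"music", "jazz"}, "B")

# Both files survive, and a query narrows by tags instead of by name.
print(store.find(["music"]))
print(store.find(["music", "jazz"]))
```

Because nothing is ever keyed by a user-supplied name, the order in which files were stored no longer changes what a query returns.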
I won't answer those questions because I doubt that anything I could come up with would be as worthy a solution as whatever people would come up with while actually using TagFS. Different workflows would have to be designed for some tasks, and file management utilities would have to be redesigned. These aren't minor hurdles to overcome (far from it), but they don't present a fundamental challenge to the idea of removing file names.