Monday, August 20, 2012

What I've Learned from TagFS

I started TagFS because, first of all, I thought it would be a good idea. What really drove me to keep working on it, though, was that I was learning things that I wouldn't have even bothered to look into before. Because this project has been a source and guide for my learning, I thought it would be worthwhile to make a list of what I have actually learned over these months. So, without further ado:
  1. Software design (diagramming and statement of expected program function)
  2. The C language
  3. Memory management--and memory leaks
  4. Trie data structures ("staged" directories)
  5. Developing a type system (tag value types)
  6. Using an issue tracker and public code repository (GitHub)
  7. Using software revision control (git--still learning :P)
  8. Software build process (Makefiles)
  9. Logging program execution with levels of verbosity
  10. User level file system operations on Linux
  11. Code generation (very ad-hoc and a pain to work with, but I am using it)
  12. Input parsing (query interface; doesn't follow the formal (tokenize-> parse) pattern, but it works)
  13. Vim-fu
  14. Data serialization and storage (the database file and proposed xattr data storage formats)
  15. Software testing (made some attempts at automation)
  16. Pacing
  17. Exercise regularly and eat right--you can't code effectively if you're tired and out of shape all the time
  18. Your code won't run away overnight--turn off the laptop and get a full night's sleep
Lately, I've felt discouraged by this project because even though it's been months since I started, with several breaks in the middle, it still isn't in a very usable state. Thinking about some of the things I've learned gives me some much-needed perspective and reminds me that there was a point to all of the time I've spent working on TagFS. Most likely, I will abandon TagFS indefinitely because, for all that I've gained, the time I lost on it might have been better spent improving my social life.

Regardless, I hope that this brief listing of my learning experience can encourage some others out there who, like me, have wanted to learn a skill independently but felt like making anything complex was too big of a challenge. The only barriers are motivation and knowledge; having a goal--and this is with anything, not just programming--will do wonders for directing your learning and motivating your actions. Find anything you want to do or make, set your goals and pay attention when you've achieved them. That's really all it takes.

Sunday, August 19, 2012

File names? Who needs them?

While trying to work out a problem with name collisions in TagFS, I recalled a somewhat radical idea I had nearer to the start of the project to completely remove user-specified file names from the system. When I initially thought of the idea it was mostly an extrapolation of the ontology I was building for TagFS. It seemed to me that what defined a file was
  1. The tags attached to the file (naturally)
  2. The actual content of the file
and absolutely nothing more. The file name is a kind of metadata which has at times served as primary data, but I didn't see it as something innate to what it named---it was merely a convenient tag for the data. I back-pedaled from the idea that the system was workable without a dedicated name (à la -booru imageboards) because several situations came to mind where the name is important to how a system works with data: Makefiles, Java source code, C include directives---programming in general really; but also, file extensions, URIs, simple data organization schemes---and the list goes on. Generally, we rely on files having names and on having canonical ways of accessing the files based on their names; and so, I compromised my ideology by making 'name' a tag and storing file names in the 'name' value (possible thanks to an extension to my simple tags that was little used elsewhere). Wherever the file name was required, the name tag was read in behind the scenes. From there, I moved on ahead and let the no-file-name idea drop.

Eventually, the special case of the 'name' tag started to create problems in the organization of my code (it was awful), so that kludge, along with a good deal of equally stinky code, was replaced. File names became first-class metadata again and things worked relatively well. While I was working in this new system though, I realized that some situations were not covered by simply preserving file names. When I store a file, it gets placed in a sort of bucket for each of its tags. To pick out a file, we have to specify which buckets we want to look in and the name of the file that we want to find in these buckets. If the file is missing from any one of those buckets, then we return a "not here" value. This generally works by taking a path like "/tag1/tag2/tag3/file_we_want", translating the dirname into a list of buckets to check, ("tag1", "tag2", "tag3"), and then looking for "file_we_want" in each bucket. Our problems start when we have files that share some set of tags and a name. There are 3 cases where this happens:
Let Name(File_a) == Name(File_b)
  1. File_a has tag set A, File_b has tag set B, A is a subset of B
  2. the same, but B is a subset of A
  3. File_a has tag set A, File_b has tag set B, and there exists a non-empty tag set C such that C is a subset of A and C is a subset of B
The case where the tag sets of File_a and File_b are equal isn't a problem since that implies that they are the same file.
In all of these cases, there is a set of buckets where knowing only those buckets isn't enough to get the data we are seeking. We also need to know the order in which the files were put in the buckets and how that order affects which file we see or, in case 1 or case 2, which file still exists. There are some schemes where most of the user-supplied names would be preserved, and even one I've thought of where files are stored in a stack rather than being overwritten. None of these seem effective or natural, so I don't want to bother with them. After thinking over this problem, I recalled the no-file-name concept and decided to give it another chance.
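
The bucket lookup and the name collision can be sketched in a few lines of Python. This is only an illustration, not TagFS's actual C code: the `buckets` dictionary and the `store` and `lookup` functions are made-up names, and the "later file overwrites the earlier one" behavior is an assumption about how a bucket might handle a repeated name.

```python
import posixpath

# Hypothetical model: each tag maps to a bucket, and each bucket maps
# a file name to file content. Not TagFS's real data structures.
buckets = {}

def store(tags, name, content):
    """Place the file in the bucket of every one of its tags.

    Assumption: a later file with the same name overwrites the earlier one.
    """
    for tag in tags:
        buckets.setdefault(tag, {})[name] = content

def lookup(path):
    """Resolve a path like /tag1/tag2/name; return None for "not here"."""
    dirname, name = posixpath.split(path)
    tags = [part for part in dirname.split("/") if part]
    found = None
    for tag in tags:
        bucket = buckets.get(tag, {})
        if name not in bucket:
            return None  # missing from any one bucket means "not here"
        found = bucket[name]  # the last bucket checked decides the result
    return found

store(["music", "live"], "track", "file_a")  # File_a with tag set A
store(["music"], "track", "file_b")          # File_b with tag set B, a subset of A

in_music = lookup("/music/track")            # file_b replaced file_a in this bucket
in_both = lookup("/music/live/track")        # read from "live", still file_a
in_both_reversed = lookup("/live/music/track")
```

In this sketch "/music/track" only ever shows file_b, and the two orderings of the same tags disagree about which file "track" is, so the answer depends on insertion order and bucket order rather than on the tags alone.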

Setting my earlier misgivings aside, how does removing file names solve my identity crisis? We still keep identifiers, but we just ensure that they are unique. That's easy because every file that's added to TagFS gets a brand-new id number. By identifying files this way we are guaranteed not to have any collisions, so the precondition for my problem cases can't happen (as long as you make fewer files than a 4-byte integer can hold). The only considerations worth discussing, then, are, first, how a human user identifies files based only on tags and an id number and, second and more importantly, how we can do all of the things we did with file names using tags instead.
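
A minimal sketch of the id-based scheme, again in Python rather than TagFS's C, might look like the following. The names `add_file`, `find`, `files`, and `buckets` are hypothetical, and treating the id as an unsigned 4-byte integer is an assumption; the text above only says "4-byte integer".

```python
import itertools

MAX_ID = 2**32 - 1               # assumption: ids are unsigned 4-byte integers
_next_id = itertools.count(1)    # every new file gets the next unused number

files = {}     # id -> file content
buckets = {}   # tag -> set of file ids

def add_file(tags, content):
    """Store a file under a brand-new id and return that id."""
    file_id = next(_next_id)
    if file_id > MAX_ID:
        raise OverflowError("ran out of 4-byte file ids")
    files[file_id] = content
    for tag in tags:
        buckets.setdefault(tag, set()).add(file_id)
    return file_id

def find(tags):
    """Return the ids of all files that carry every one of the given tags."""
    id_sets = [buckets.get(tag, set()) for tag in tags]
    return set.intersection(*id_sets) if id_sets else set(files)

a = add_file(["music", "live"], "file_a")
b = add_file(["music"], "file_b")

all_music = find(["music"])           # both files, told apart by id
live_music = find(["music", "live"])  # only the first file
```

Because ids never repeat, two files with overlapping tag sets simply sit side by side in the shared buckets, and nothing is overwritten.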

I won't answer those questions because I doubt that anything I could come up with would be as worthy a solution as whatever people would come up with while actually using TagFS. Different workflows would have to be designed for some tasks and file management utilities would have to be redesigned. These aren't minor hurdles to overcome (far from it), but they don't present a fundamental challenge to the idea of removing file names.

Tuesday, August 7, 2012

It's actually really easy to get five of the planets and the days of the week in Japanese all in one shot as long as you know the Latin roots for these.

To demonstrate, in French and English the days of the week are
lundi, mardi, mercredi, jeudi, vendredi, Saturday (EN), dimanche

The planets in English:
(__, Mars, Mercury, Jupiter, Venus, Saturn, __)

and in Japanese, the planets:
(__, 火星, 水星, 木星, 金星, 土星, __)

and the days:
火曜日, 水曜日, 木曜日, 金曜日, 土曜日

I should admit, though, that while these help with remembering the connection between these two sets of words in Japanese, recalling this relation probably isn't the best way to learn them unless you know the Latin roots and one of the word sets well enough to use as a basis.