FAMF: Files As Metadata Format

Perma@programming.dev · 4 months ago

Hawk@lemmy.dbzer0.com · 4 months ago

This post misses the entire point of JSON/TOML/YAML and the big advantage they have over databases: readability.

Using a file-based approach sounds horrible. Context gets lost very easily, because I need to browse and match the outputs of a ton of files to get the full picture, whereas the traditional formats let me see it nearly instantly.

I also chuckled at the exact, horribly confusing example you give: upd_at. A metadata file for an object that already inherently has that metadata. It’s metadata on top of metadata, which makes it all the harder to tell what the actual truth about the object is.

Perma@programming.dev · 4 months ago

I know, right?

Some say that since you can use ‘tree’ and tools like ranger to navigate the files, it should work alright. But I guess if you have one giant metadata file for all the posts on your blog, it should be much easier to see the whole picture.

As for upd_at, it does not record when the files were edited, but when the content of the post was meaningfully edited.

So if, for example, I change the formatting of my times from RFC 3339 to another standard, the file metadata changes, but as far as the blog’s readers are concerned, the post content does not. But I get why you chuckled.
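To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical layout of one directory per post and one file per field; the names `my-post`, `write_field` and `read_field` are illustrative, not the post’s actual code. A cosmetic rewrite of `cre_at` moves that file’s filesystem mtime to "now", while the stored `upd_at` value stays at the last meaningful edit.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_field(obj_dir: Path, name: str, value: str) -> None:
    """Store one field as one file (hypothetical FAMF layout)."""
    obj_dir.mkdir(parents=True, exist_ok=True)
    (obj_dir / name).write_text(value + "\n", encoding="utf-8")


def read_field(obj_dir: Path, name: str) -> str:
    return (obj_dir / name).read_text(encoding="utf-8").rstrip("\n")


def demo() -> tuple[datetime, datetime]:
    """Return (filesystem mtime of cre_at, stored upd_at) after a cosmetic edit."""
    with tempfile.TemporaryDirectory() as tmp:
        post = Path(tmp) / "my-post"
        write_field(post, "cre_at", "2024-01-05T10:00:00Z")
        write_field(post, "upd_at", "2024-02-01T09:30:00Z")  # last meaningful content edit
        # Cosmetic change only: same instant, different timestamp style.
        write_field(post, "cre_at", "2024-01-05 10:00:00+00:00")
        mtime = datetime.fromtimestamp((post / "cre_at").stat().st_mtime, tz=timezone.utc)
        upd_at = datetime.fromisoformat(read_field(post, "upd_at").replace("Z", "+00:00"))
        return mtime, upd_at


mtime, upd_at = demo()
print("file mtime:", mtime.isoformat())  # whenever the sketch ran
print("upd_at:", upd_at.isoformat())     # untouched by the reformat
```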

netvor@lemmy.world · 4 months ago

Tip: find -type f | xargs head (but no, it’s not comfy)
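A slightly comfier variant, as a Python sketch (the `summarize` helper is hypothetical): print every file under a directory as `relative/path: first line`, one line per field. As an aside on the shell tip, `xargs` splits its input on whitespace, so file names containing spaces break it; `find . -type f -exec head {} +` avoids that.

```python
import tempfile
from pathlib import Path


def summarize(root: Path) -> list[str]:
    """One 'relative/path: first line' entry per file, in sorted path order."""
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        text = path.read_text(encoding="utf-8", errors="replace")
        first_line = text.split("\n", 1)[0]
        lines.append(f"{path.relative_to(root).as_posix()}: {first_line}")
    return lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for post, created in [("hello", "2024-01-05"), ("world", "2024-03-10")]:
            (root / post).mkdir()
            (root / post / "created_at").write_text(created + "\n", encoding="utf-8")
            (root / post / "title").write_text(post.title() + "\n", encoding="utf-8")
        print("\n".join(summarize(root)))
```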

But I don’t think going to the “one giant metadata file” argument helps; personally, my attention starts splintering far sooner than that. Most of the time, if I’m looking at the metadata of an object, I’m not just looking at that single object; I’m reasoning about it in relation to other data points (maybe other objects in the same collection, maybe not). If at some point I want to shift my focus from created_at to updated_at or back, I need that transition to be as cheap as an eye saccade. So by splitting the data into multiple files, you are already setting the “minimum tax” pretty high.
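With one file per field, that side-by-side view has to be rebuilt by tooling. A hedged Python sketch of such a tool, assuming one directory per object and one file per field (`field_table` and `render` are hypothetical helpers): it gathers the requested fields from every object in a collection into one table, marking missing field files with `-`.

```python
import tempfile
from pathlib import Path


def field_table(collection: Path, fields: list[str]) -> list[list[str]]:
    """Rows = object directories, columns = requested fields ('-' if a field file is missing)."""
    rows = [["object", *fields]]
    for obj in sorted(p for p in collection.iterdir() if p.is_dir()):
        row = [obj.name]
        for field in fields:
            path = obj / field
            row.append(path.read_text(encoding="utf-8").strip() if path.is_file() else "-")
        rows.append(row)
    return rows


def render(rows: list[list[str]]) -> str:
    """Left-align each column to its widest cell."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip() for row in rows
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        posts = Path(tmp)
        (posts / "first").mkdir()
        (posts / "first" / "created_at").write_text("2024-01-05\n", encoding="utf-8")
        (posts / "first" / "updated_at").write_text("2024-02-01\n", encoding="utf-8")
        (posts / "second").mkdir()
        (posts / "second" / "created_at").write_text("2024-03-10\n", encoding="utf-8")
        print(render(field_table(posts, ["created_at", "updated_at"])))
```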

That said, for simple projects where you want as few dependencies as possible, I think it’s fine; it might or might not be better than raw-dogging your own format. I’ve actually implemented pretty much this format multiple times when I was coding predominantly in Bash. (Heck, e.g. my JATS framework pretty much uses FAMF for test run state 😄.) Just be careful: creating and removing files and directories can be a pretty risky operation. Make a typo in (or botch the refactoring of) a shell variable and you might end up rm -rf’ing your own “$HOME”. It might be one of the things you want to do less of, not more.
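On the rm -rf risk: in shell, `rm -rf -- "${STORE:?}/$name"` at least aborts when `STORE` is unset or empty, though it still deletes the whole store if `name` is empty. A Python sketch of a stricter guard (the `remove_object` helper and the store layout are hypothetical) resolves the target and refuses to delete anything that is not strictly inside the store directory. `Path.is_relative_to` needs Python 3.9 or newer.

```python
import shutil
import tempfile
from pathlib import Path


def remove_object(store_root: Path, name: str) -> None:
    """Delete store_root/name, refusing empty names and anything resolving outside the store."""
    root = store_root.resolve()
    target = (root / name).resolve()  # also follows symlinks, so a link pointing outside is refused
    # root / "" is root itself; "..", "../x" or an absolute name escape it.
    if target == root or not target.is_relative_to(root):
        raise ValueError(f"refusing to delete {target} (store is {root})")
    shutil.rmtree(target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "store"
        (store / "post-1").mkdir(parents=True)
        remove_object(store, "post-1")
        try:
            remove_object(store, "")  # e.g. an unset shell variable
        except ValueError as err:
            print("blocked:", err)
```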

BTW, I chuckled because you switch from created_at to cre_at for no apparent reason. (I mean, if you like obscure variable names, fine by me, but then why would you call it created_at in the first file?)

BTWBTW, I love your site; I wish most of the web looked like that. The grey gives me a sort of nostalgia :D Also, you reminded me that I should give Kagi a try…

JackbyDev@programming.dev · 4 months ago

It’s a very interesting idea. I don’t think I’ll use it and I think the downsides outweigh the benefits but it is still an interesting idea.

In all of these cases, the answer is not TOML, YAML or JSON — or FAMF for what it’s worth. It is goddamn database.

I was about to boo and hiss, but if you mean something like sqlite as an application file format I’m more tempted to agree.

Eager Eagle@lemmy.world · 4 months ago

thanks, i hate it

Perma@programming.dev · 4 months ago

Sure thing! Awesome!

VonReposti@feddit.dk · 4 months ago

Dark Arc@social.packetloss.gg · 4 months ago

I’m a bit skeptical about the performance penalty. I know there’s a benchmark but I didn’t see any details of what was actually benchmarked and where. Windows (AFAIK) still has notoriously slow directory traversal operations. God forbid you’re using SSHFS or even NFS. I’ve seen things with hundreds of YAML nodes before.

Benchmarking this is also tricky because the OS file cache will almost certainly make the second time faster than the first (and probably by a lot).

Also, just the usability… I think having to open a file to change one value is extreme. You also still have the problem of documentation… which, sure, you can solve by putting it in another file, but… you can also do that with plain old JSON.

I think in the majority of languages, writing a library to process these files would also be more complicated than writing a JSON parser or using an existing library.
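For a sense of scale, a minimal reader can be small if every value is a string. The following Python sketch assumes directories map to nested objects and files to string values, and skips dotfiles such as `.DS_Store` (all assumptions; `load_famf` is a hypothetical name). What it leaves out is where a real library would grow: typed values, lists, key names that aren’t valid file names, and the whitespace policy.

```python
import tempfile
from pathlib import Path
from typing import Any


def load_famf(path: Path) -> dict[str, Any]:
    """Directories become nested dicts; files become strings minus one trailing newline."""
    result: dict[str, Any] = {}
    for child in sorted(path.iterdir()):
        if child.name.startswith("."):
            continue  # skip OS/editor droppings like .DS_Store or .swp files
        if child.is_dir():
            result[child.name] = load_famf(child)
        elif child.is_file():
            text = child.read_text(encoding="utf-8")
            result[child.name] = text[:-1] if text.endswith("\n") else text
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "post" / "meta").mkdir(parents=True)
        (root / "post" / "title").write_text("Hello\n", encoding="utf-8")
        (root / "post" / "meta" / "lang").write_text("en\n", encoding="utf-8")
        print(load_famf(root))
```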

Also how do you handle trailing whitespace inserted by a text editor? Do you drop it? Keep it? It probably doesn’t matter as long as the configuration is just for a particular program. The program just needs to document it… But then you’ve got ambiguities between programs that you just don’t have to worry about with TOML or JSON.
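The ambiguity is easy to demonstrate. Below are three plausible read policies in Python (the function names are hypothetical); given the same file content, they return three different values, so the chosen policy has to be documented and shared by every program that touches the data. For comparison, shell command substitution such as `$(cat file)` strips all trailing newlines but keeps trailing spaces.

```python
def read_raw(content: str) -> str:
    """Policy 1: the file content is the value, newline and all."""
    return content


def read_drop_final_newline(content: str) -> str:
    """Policy 2: drop exactly one trailing line ending (what most editors append)."""
    for ending in ("\r\n", "\n"):
        if content.endswith(ending):
            return content[: -len(ending)]
    return content


def read_trimmed(content: str) -> str:
    """Policy 3: strip all surrounding whitespace (similar in spirit to Rust's trim or Go's strings.TrimSpace)."""
    return content.strip()


saved_by_editor = "dark-theme  \n"  # a trailing space slipped in before the newline
for policy in (read_raw, read_drop_final_newline, read_trimmed):
    print(f"{policy.__name__}: {policy(saved_by_editor)!r}")
```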

Perma@programming.dev · 4 months ago

OK so, you are very much right. You should definitely benchmark it using a simulation of what your data might look like. It should not be that hard: just make a script that creates a bunch of files similar to your data.

About the trailing whitespace: when I am in the terminal I just use sed to remove the trailing ‘\n’; in Rust I just use .trim(), and in Go there is strings.TrimSpace(). It is honestly not that hard.

Unlike JSON, where you have to parse the whole thing, the data structure and parser don’t require that: you just open the files you need and read their content. It is a bit more difficult at first, since you can’t just translate a whole struct directly, but it pays for itself when you want to migrate the data to a new format. So if your structure never changes, those formats are probably easier.
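Following that suggestion, a rough Python harness might look like this. The post schema, the count of 2000 posts and the helper names are all made up for illustration; swap in something shaped like your real data. It compares opening one file per post against parsing a single JSON file when all you need is one field.

```python
import json
import tempfile
import time
from pathlib import Path


def generate(root: Path, n: int) -> None:
    """Write n synthetic posts twice: as FAMF directories and as one posts.json."""
    posts = {}
    for i in range(n):
        key = f"post-{i:05d}"
        post = {"title": f"Post {i}", "cre_at": "2024-01-05T10:00:00Z",
                "upd_at": "2024-02-01T09:30:00Z", "body": "lorem ipsum " * 50}
        posts[key] = post
        obj_dir = root / "famf" / key
        obj_dir.mkdir(parents=True)
        for name, value in post.items():
            (obj_dir / name).write_text(value + "\n", encoding="utf-8")
    (root / "posts.json").write_text(json.dumps(posts), encoding="utf-8")


def read_field_famf(root: Path, field: str) -> list[str]:
    """Open only the one file per post that holds `field`."""
    return [(d / field).read_text(encoding="utf-8").rstrip("\n")
            for d in sorted((root / "famf").iterdir())]


def read_field_json(root: Path, field: str) -> list[str]:
    """Parse the whole JSON document, then pick `field` out of each post."""
    posts = json.loads((root / "posts.json").read_text(encoding="utf-8"))
    return [posts[key][field] for key in sorted(posts)]


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - start, result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        generate(root, 2000)
        for run in range(3):
            t_famf, a = timed(read_field_famf, root, "upd_at")
            t_json, b = timed(read_field_json, root, "upd_at")
            assert a == b
            print(f"run {run}: FAMF {t_famf * 1e3:.1f} ms, JSON {t_json * 1e3:.1f} ms")
```

Treat the numbers as specific to the machine and filesystem (local disk vs. NFS or SSHFS, Windows vs. Linux). Because the files were just written, every run here is most likely served from the OS page cache; a cold-cache measurement needs the cache dropped between runs (on Linux, writing `3` to `/proc/sys/vm/drop_caches` as root).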

Dark Arc@social.packetloss.gg · 4 months ago

You should definitely benchmark it using a simulation of what your data might look like. It should not be that hard: just make a script that creates a bunch of files similar to your data.

Right, it’s just something to think about. If your program might conceivably be used over sshfs (as an example)… this is probably not a great option for your program’s configuration.

Unlike JSON, where you have to parse the whole thing, the data structure and parser don’t require that: you just open the files you need and read their content. It is a bit more difficult at first, since you can’t just translate a whole struct directly, but it pays for itself when you want to migrate the data to a new format. So if your structure never changes, those formats are probably easier.

Well, a very common pattern is to create a “config” object that lives in the long-running process (and in some cases can be reloaded without restarting the program).

That model also saves you from unnecessary repeated IO operations (without one-off caching and reloading mechanisms) and lets you centralize validation (which also means you can report configuration errors at startup).
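That model doesn’t rule out a file-per-value store as the backing format. A hedged Python sketch: the `Config` fields, defaults and file names are hypothetical, and the directory is assumed to hold one file per setting. Everything is read and validated once, so errors surface at startup; calling `load_config` again is one way to implement the reload-without-restart case.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


class ConfigError(Exception):
    """Raised at startup (or reload) when a setting is missing or invalid."""


@dataclass(frozen=True)
class Config:
    host: str
    port: int
    debug: bool


def load_config(conf_dir: Path) -> Config:
    """Read and validate every setting once; the program then passes the Config object around."""

    def setting(name: str, default: Optional[str] = None) -> str:
        path = conf_dir / name
        if path.is_file():
            return path.read_text(encoding="utf-8").strip()
        if default is None:
            raise ConfigError(f"missing required setting: {path}")
        return default

    host = setting("host")
    port_text = setting("port", "8080")
    try:
        port = int(port_text)
    except ValueError:
        raise ConfigError(f"port is not an integer: {port_text!r}") from None
    if not 0 < port < 65536:
        raise ConfigError(f"port out of range: {port}")
    debug_text = setting("debug", "false").lower()
    if debug_text not in ("true", "false"):
        raise ConfigError(f"debug must be true or false, got {debug_text!r}")
    return Config(host=host, port=port, debug=debug_text == "true")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp)
        (conf / "host").write_text("example.org\n", encoding="utf-8")
        print(load_config(conf))
```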

I do wish various formats were more “streaming” friendly, but configuration isn’t really one of them.

In a lot of languages moving between formats is also fairly trivial because the XYZ markup parser parses things into an object map and the ZYK markup writer can write an object map into ZYK format.

Maybe I’m misunderstanding what you mean by migrating the data to a new format, though.
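Going from a parsed object map to a file tree is short to write in most languages. The Python sketch below (the `dump_famf` name and its conventions are assumptions, not part of FAMF as described) writes nested objects as directories and scalars as files. It also shows what gets lost along the way: numbers and booleans become text (written in their JSON spelling here), lists have no obvious mapping and are rejected, and keys that aren’t valid file names must be refused.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def dump_famf(obj: dict[str, Any], path: Path) -> None:
    """Write a nested object map as directories (objects) and files (scalar values)."""
    path.mkdir(parents=True, exist_ok=True)
    for key, value in obj.items():
        if key in ("", ".", "..") or "/" in key or "\\" in key:
            raise ValueError(f"key cannot be used as a file name: {key!r}")
        if isinstance(value, dict):
            dump_famf(value, path / key)
        elif isinstance(value, list):
            raise TypeError(f"no agreed file layout for lists (key {key!r})")
        elif isinstance(value, str):
            (path / key).write_text(value + "\n", encoding="utf-8")
        else:
            # int, float, bool, None: keep the JSON spelling, e.g. true, 42, null
            (path / key).write_text(json.dumps(value) + "\n", encoding="utf-8")


if __name__ == "__main__":
    document = json.loads('{"post": {"title": "Hi", "views": 42, "draft": false, "meta": {"lang": "en"}}}')
    with tempfile.TemporaryDirectory() as tmp:
        dump_famf(document, Path(tmp) / "store")
        for p in sorted((Path(tmp) / "store").rglob("*")):
            if p.is_file():
                print(p.relative_to(tmp).as_posix(), "=", p.read_text(encoding="utf-8").strip())
```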
