A few years ago I was working on a prototype of a P2P MP3 sharing app for private networks of friends (no narcs allowed!) that would let you recommend stuff to friends, asynchronously download stuff (even swarming it from other nodes that already have it and have better connectivity to you), etc. etc. The whole thing was built on the ID3 metadata that people had in their collections. Instead of clumsily browsing files by filename (as in Napster), you'd see a more attribute-based interface, à la iTunes...

...except ID3 tags are just a hack, so:

1) The schema is not standardized across MP3-playing apps (which makes it harder, but certainly not impossible, to write a library that lets you get at the data in an organized fashion). ID3 is not a formal standard, so there's documentation here and there, but that documentation is just what some dude somewhere decided to propose, and it doesn't agree with what's actually attached to the MP3s you'll find "in the wild". ID3 "in the wild" is whatever some developers decided it should be when they wrote that part of their MP3 player / CD ripper.

2) People don't bother to update the information, probably because it's just too free-form. What's the list of genres? That's program-dependent, or free-form, so either the genre you think a song belongs in is missing, or little Timmy decides it belongs in RAWK instead of Rock.

So the ID3 information was garbage in most of the collections I found during my research. There's just not enough metadata there to work with, and once the songs have already been ripped from CD, CDDB (and FreeDB, which is what I actually tried to use) can't really help you auto-scrub them, since they look things up by the sequence of track lengths on the whole CD. That's nearly useless for identifying an individual song, whose length may not even match the CD exactly. Not that CDDB has correct data either.
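To make the CDDB/FreeDB point concrete, here's a minimal Python sketch of the widely documented FreeDB disc ID calculation (my reconstruction from the public freedb docs, not code from the prototype). The ID is built from the start offset of every track plus the total playing time of the disc, so it's a fingerprint of the whole CD's layout. A lone ripped MP3 only knows its own (possibly slightly off) duration, so there's nothing meaningful to feed into it. The function names and the example table of contents are made up for illustration.

```python
FRAMES_PER_SECOND = 75  # Red Book audio CDs address time in 1/75-second frames


def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n (the 'cddb_sum' helper in the freedb docs)."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def freedb_disc_id(track_offsets: list[int], leadout_offset: int) -> str:
    """Compute a FreeDB/CDDB-style disc ID from a CD table of contents (sketch).

    track_offsets: start of each track in frames, counted from the very start
        of the disc, i.e. including the 2-second (150-frame) lead-in.
    leadout_offset: start of the lead-out area, in the same units.
    """
    # Checksum over the start time (in whole seconds) of every track.
    n = sum(digit_sum(off // FRAMES_PER_SECOND) for off in track_offsets)
    # Total playing time in seconds, from the first track to the lead-out.
    total_secs = leadout_offset // FRAMES_PER_SECOND - track_offsets[0] // FRAMES_PER_SECOND
    # Pack: checksum mod 255 in the top byte, playing time in the middle
    # 16 bits, number of tracks in the low byte.
    disc_id = ((n % 0xFF) << 24) | (total_secs << 8) | len(track_offsets)
    return f"{disc_id:08x}"


# Hypothetical 3-track disc: every offset on the CD goes into the key.
print(freedb_disc_id([150, 15000, 30000], 45000))  # 08025603
```

Nudge the lead-out by a single second and you get a different ID, which is why this works for identifying a whole disc but does nothing for a single song sitting in somebody's MP3 folder.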
You'd think that after Gracenote bought all the user-donated data and surrounded it with LawyersGunsAndMoney, they'd take an interest in scrubbing their precious data. Nope. It's still full of crap. As you mentioned, the Genre field is just a joke. Frequently the title field is wrong, and the correct title has gotta be in some database they could get their hands on. Classical music is the worst (and you'd think it'd have the most anal-retentive listeners, now wouldn'tcha?): artist/performer/composer/conductor confusion, artist/album confusion, album/song confusion. On a little MP3 player screen, it doesn't help me to see "Mozart - Horn Concerto No." when the whole album is Mozart horn concertos. There's no remediation process as far as I know, so the data just sucks until they feel like fixing it, or until we the listeners fix it every time the CD is ripped.

Anyway, any time somebody tries to sell you this "semantic web" nonsense where everybody will just take the time to add all sorts of tags and keywords to their web pages and documents and fit them into a carefully designed ontology, remind them of the lessons of KaZaA and Napster: metadata is extremely expensive to add, so unless everybody can collaborate on it and share the results (as NOT seen in CDDB), only a few people with OCD will actually organize their stuff, each in their own unique way. Not that I'm bitter.