A few years ago I was working on a prototype of a P2P MP3 sharing app
for private networks of friends (no narcs allowed!) that would let you
recommend stuff to friends, download stuff asynchronously (even
swarming it from other nodes that already have it and have a better
connection to you), etc. etc. The whole thing was built on
the ID3 metadata that people had in their collections. Instead of
clumsily browsing files by filename (as in Napster), you'd see an
attribute-based interface à la iTunes...

...except ID3 tags are just a hack, so:

1) The schema is not standardized across MP3-playing apps (which makes
   it harder, but certainly not impossible, to write a library that
   gets at the data in an organized fashion). ID3 is not a formal
   standard: there's documentation here and there, but it's just what
   some dude somewhere decided to propose, and it doesn't agree with
   what's actually attached to the MP3s you'll find "in the wild". ID3
   "in the wild" is whatever some developers decided it should be when
   they wrote that part of their MP3 player / CD ripper.

2) People don't bother to update the information, probably because
   it's just too free-form. What's the list of genres? That's
   program-dependent, or free-form, so either the genre you think a
   song belongs in is missing, or little Timmy decides it belongs in
   RAWK instead of Rock.
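The oldest flavor of the format, ID3v1, shows how thin the schema is: it's a fixed 128-byte block tacked onto the end of the file, with 30-byte text fields and a single genre byte that indexes into a list that players (Winamp most famously) kept extending on their own. ID3v2's genre frame, for its part, accepts free-form text. Here's a minimal Python sketch of reading an ID3v1 tag, assuming a plain ID3v1.0 layout; the function names are made up for illustration, and GENRES holds only the first 18 entries of the standard list.

```python
from typing import Optional

# Only the first 18 of the standard ID3v1 genre codes. Players appended
# many more of their own, which is why the list is program-dependent.
GENRES = [
    "Blues", "Classic Rock", "Country", "Dance", "Disco", "Funk",
    "Grunge", "Hip-Hop", "Jazz", "Metal", "New Age", "Oldies",
    "Other", "Pop", "R&B", "Rap", "Reggae", "Rock",
]


def _text(field: bytes) -> str:
    """Decode a fixed-width, NUL-padded Latin-1 field."""
    return field.split(b"\x00", 1)[0].decode("latin-1").strip()


def parse_id3v1(data: bytes) -> Optional[dict]:
    """Return the ID3v1 fields from the last 128 bytes of an MP3, or None."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None
    genre = tag[127]  # a single byte: an index into a list, not a name
    return {
        "title": _text(tag[3:33]),
        "artist": _text(tag[33:63]),
        "album": _text(tag[63:93]),
        "year": _text(tag[93:97]),
        "comment": _text(tag[97:127]),
        # Codes beyond this truncated list are reported by number only.
        "genre": GENRES[genre] if genre < len(GENRES) else f"genre #{genre}",
    }


def make_id3v1(title: str, artist: str, genre: int) -> bytes:
    """Hypothetical helper that builds a bare ID3v1 tag for testing."""
    def pad(s: str, n: int) -> bytes:
        return s.encode("latin-1")[:n].ljust(n, b"\x00")
    return (b"TAG" + pad(title, 30) + pad(artist, 30) + pad("", 30)
            + pad("", 4) + pad("", 30) + bytes([genre]))


if __name__ == "__main__":
    mp3 = b"\xff\xfb" * 500 + make_id3v1("Some Song", "Some Band", 17)
    print(parse_id3v1(mp3))
```

Note that nothing in the format stops a ripper from writing an empty album and year, or a genre code that only its own player knows how to name.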

So the ID3 information is garbage in most of the collections I looked
at during my research. There's just not enough metadata there to work
with, and once the songs have already been ripped from CD, CDDB (and
FreeDB, which is what I actually tried to use) can't really help you
auto-scrub them, since it identifies a whole disc by its sequence of
track lengths. That's nearly useless for identifying an individual
song: a lone MP3 only carries its own duration, and even that may not
match the CD track exactly.
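The lookup key makes the problem concrete. CDDB/FreeDB identify a disc, not a song: the disc ID is derived from the start offset of every track on the disc (which is to say, the sequence of track lengths) plus the total playing time. Here's a minimal Python sketch of the disc ID calculation as it's commonly documented for FreeDB, assuming you already have the track start offsets in CD frames (75 per second) from the disc's table of contents; the function names are mine.

```python
from typing import List

FRAMES_PER_SECOND = 75  # CD audio sectors ("frames") per second


def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def cddb_disc_id(track_offsets: List[int], leadout: int) -> str:
    """Compute the 32-bit CDDB/FreeDB disc ID as 8 hex digits.

    track_offsets: absolute start frame of every track (the first is usually
    150, i.e. after the 2-second pregap); leadout: absolute frame of the
    lead-out, which marks the end of the last track.
    """
    seconds = [off // FRAMES_PER_SECOND for off in track_offsets]
    checksum = sum(digit_sum(s) for s in seconds)
    total_seconds = leadout // FRAMES_PER_SECOND - seconds[0]
    # The modulus really is 0xFF (255), not 256: a long-standing quirk.
    disc_id = ((checksum % 0xFF) << 24) | (total_seconds << 8) | len(track_offsets)
    return f"{disc_id:08x}"


if __name__ == "__main__":
    # A made-up three-track disc: tracks start at 0:02, 3:22 and 6:42.
    print(cddb_disc_id([150, 15150, 30150], 45150))
```

A single ripped MP3 gives you none of these inputs except, at best, its own approximate length, so there's nothing to compute; and nudging even one track offset by a second changes the checksum byte and therefore the ID.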

Not that CDDB has correct data either. You'd think that after
Gracenote bought all the user-donated data and surrounded it with
LawyersGunsAndMoney, they'd take an interest in scrubbing their
precious data. Nope. It's still full of crap. As you mentioned, the
Genre field is just a joke. The title field is frequently wrong, even
though the right titles have gotta be in some database they could get
their hands on. Classical
music is the worst (and you'd think it'd have the most anal-retentive
listeners, now wouldn'tcha?): artist/performer/composer/conductor
confusion, artist/album confusion, album/song confusion. On a little
MP3 player screen, it doesn't help me to see "Mozart - Horn Concerto
No." when the whole album is Mozart horn concertos.

There's no remediation process that I know of, so the data just sucks
until they feel like fixing it, or until we, the listeners, fix it
ourselves every time the CD is ripped.

Anyway, anytime somebody tries to sell you this "semantic web"
nonsense where everybody will just take the time to add all sorts of
tags and keywords to their web pages and documents and fit them into a
carefully designed ontology, remind them of the lessons of KaZaA and
Napster: metadata is extremely expensive to add, so unless everybody
can collaborate on it and share the results (as NOT seen in CDDB),
only a few people with OCD will actually organize their stuff, each in
their own unique way.

Not that I'm bitter.