OK, the trickiest part of cleaning one’s iTunes library is looking up missing album information. There are a few applications that do this by analyzing the audio of each song and matching it against a reference library. The problem is that this isn’t very accurate. There was an old app that did this (no longer available) called iEatzBrains or something like that. Unfortunately, after testing it, I made the mistake of running it against a bunch of tracks. Only then did I find that its accuracy was about 90%. To be honest, I am still finding tracks that had their artist and album names “revised” because of that. (I’d say that event caused about half of my library errors.)
So I’m really leery of that sort of thing. However, finding a script that could at least guess album names was surprisingly hard. There weren’t any. There was a database of album information called Discogs, but it didn’t really let you search by track info; it was all oriented around already having the correct album information. (It is useful for other metadata, though, as we’ll see in a future post.)
I did some pretty elaborate Google searching and, remarkably, this isn’t a problem that folks have solved. (Or at least they haven’t put solutions up online.) Fortunately, I found a solution from an unusual source: Apple.
Apple actually has a web API for searching the iTunes store. The documentation (oddly labeled “confidential” but publicly available) is quite helpful. What I did was run an artist search that returns only track information, effectively getting a list of all songs by that artist (up to the 200-result limit the script asks for). The returned data is even in JSON rather than XML! This means I can turn all of it into a Python dict with a single call to the simplejson module.
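For anyone on a current Python, here is a minimal sketch of the same query in Python 3 using only the standard library (urllib.parse and json in place of urllib2 and simplejson). It assumes Apple’s present-day public endpoint, https://itunes.apple.com/search, which takes the same term, entity and limit parameters; the script itself uses an older ax.phobos.apple.com host. The sample response is a made-up stand-in shaped like the real reply, so the parsing can be tried without a network connection.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Apple's current public iTunes Search API endpoint.
SEARCH_URL = "https://itunes.apple.com/search"


def build_search_url(artist, limit=200):
    """Artist search restricted to music tracks (the API caps limit at 200)."""
    params = {"term": artist.lower(), "entity": "musicTrack", "limit": limit}
    # urlencode escapes spaces ("+") and non-ASCII characters for us
    return SEARCH_URL + "?" + urlencode(params)


def fetch_tracks(artist):
    """Download the results and turn the JSON body into a dict in one call."""
    with urlopen(build_search_url(artist)) as response:
        return json.load(response)


# Hypothetical response body, shaped like the API's {"resultCount", "results"} reply.
SAMPLE = (
    '{"resultCount": 1, "results": [{"artistName": "Example Artist", '
    '"trackName": "Example Song", "collectionName": "Example Album"}]}'
)

data = json.loads(SAMPLE)
for track in data["results"]:
    print(track["trackName"], "->", track["collectionName"])
```

Calling fetch_tracks("jurassic 5") would perform the live request; everything else runs offline.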
Then all I do is compare the song selected in iTunes (read via appscript) with the songs in the iTunes store. I could make this part a bit more intelligent by normalizing Unicode characters and perhaps removing filler words like “a” or “the.” Thus far I’ve not had any trouble, so I haven’t bothered making it “brighter.”
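If you do want it “brighter,” here is a minimal Python 3 sketch of that normalization, assuming accent stripping via unicodedata and a small hypothetical filler-word list are what you want; the script itself does not do this. Both the iTunes title and the store title would go through normalize_title before being compared.

```python
import unicodedata

# Hypothetical filler-word list; extend it to suit your library.
FILLER_WORDS = {"a", "an", "the"}


def normalize_title(title):
    """Lower-case, strip accents, and drop filler words from a track title."""
    # NFKD splits "é" into "e" plus a combining accent, which we then discard
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = [w for w in no_accents.lower().split() if w not in FILLER_WORDS]
    return " ".join(words)


print(normalize_title("The Café"))
print(normalize_title("Café") == normalize_title("the  cafe"))
```

Dropping filler words can occasionally make two different titles collide, so it is worth keeping the strict comparison around as well.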
To avoid entering wrong album information and then never being able to find the changed files, I put fairly extensive comments in each track. If the song could be from more than one album, I list the alternative albums in the comment. Even if (as is typical) the track is on only one album, I put a note in the comments.
What’s surprising is how short the code is. (Take out all the comments and there really isn’t much to it.) My previous attempt using Discogs’ XML was at least three to four times as long and not nearly as accurate.
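The commenting rules can be pulled out into a small pure function, which makes them easy to test. This is a hypothetical Python 3 helper that mirrors the script’s three comment strings; unlike the script, it also drops duplicate album names while keeping the store’s order.

```python
def album_and_comment(albums):
    """Pick an album and build the track comment, mirroring the script's rules.

    Returns (album_or_None, comment). Duplicate album names are removed while
    preserving order, which the original script does not do.
    """
    unique = list(dict.fromkeys(albums))
    if not unique:
        return None, "No Album Found"
    if len(unique) > 1:
        return unique[0], "Alt Albums: " + "; ".join(unique[1:])
    return unique[0], "Looked up Album"


print(album_and_comment(["First Album", "Greatest Hits", "First Album"]))
```

Because the comments are fixed strings, a smart playlist with a rule such as Comments contains “Alt Albums” should later pull up every track where the guess was ambiguous.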
```python
#!/usr/bin/python
######################################################################################################
## getalbumitunes.py
######################################################################################################
## Fills in album and other missing data from an iTunes track by looking up the data in the
## iTunes store database.
##
## Requires: appscript module
##               appscript.sourceforge.net/py-appscript
##               install via: easy_install appscript
##
##           simplejson module
##               install via: easy_install simplejson
##
import sys
from appscript import *
import urllib2, urllib
import simplejson

######################################################################################################
## do_search - queries iTunes store and returns a large python dictionary
######################################################################################################
#
# Basically we search the iTunes store for the artist and return all music tracks. The store
# answers in JSON, which simplejson turns into a Python dictionary with a single call.

def do_search(artist="jurassic 5"):
    limit = 200
    # urlencode escapes spaces and special characters; encode to UTF-8 first so that
    # artist names with non-ASCII characters don't raise UnicodeEncodeError
    query = urllib.urlencode({'term': artist.lower().encode('utf-8'),
                              'limit': limit,
                              'entity': 'musicTrack'})
    url = 'http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStoreServices.woa/wa/wsSearch?' + query
    try:
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        result = simplejson.load( response )
        if 'Error' in result:
            print "Error"
            return {}
        return result
    except (IOError, ValueError), e:
        # IOError covers network/HTTP failures, ValueError a malformed JSON reply
        print "Error for Requested Artist: " + artist
        print "Limit:" + str(limit)
        print e
        print
        return {}

######################################################################################################
## find_album - searches through an iTunes dict to find a track with a particular name.
######################################################################################################
#
# Given a track dictionary for an artist (see do_search) we iterate through it finding all songs
# that match our desired song name. We then build up a list with all the album names that song
# is found within. To ease error checking I put in a verbose flag.

def find_album(data, trackname, verbose=False):
    # we try and "normalize" the tracks for comparison as much as possible
    # we could probably spruce this up a bit and remove "filler" words
    search_track = trackname.lower().strip()
    albums = []

    # a failed search returns {}; either way, no results means nothing to match
    results = data.get('results', [])
    if len(results) == 0:
        print "No Data"
        return []

    for c in results:
        album = c.get('collectionName')
        if verbose:
            print "Artist ID:" + str(c.get('artistId'))
            print "Artist: " + c.get('artistName', '')
            print "Album: " + (album or '')
            print "Track: " + c.get('trackName', '')
            print

        compare_track = c.get('trackName', '').lower().strip()
        if compare_track == search_track:
            if verbose:
                print "Hit: " + (album or '')
            if album is not None:
                albums.append(album)
    print
    return albums

######################################################################################################
## albums_for_selection - sets album info for all selected songs
######################################################################################################
#
# For each selected song it looks up the artist information getting the list of potential albums
# for that song name. If there is more than one potential album extra album information is stored
# in the comment field otherwise we just note in the comment field that the album was looked up and
# isn't "natively set" (Nice for double checking information)

def albums_for_selection():
    iT = app(u'/Applications/iTunes.app')
    songs = iT.selection.get()    # get list of all selected songs
    for song in songs:
        artist = song.artist()
        name = song.name()
        albums = find_album( do_search(artist), name )
        if len(albums) == 0:
            song.comment.set( "No Album Found")
            continue

        print "Song: " + name
        print "Album: " + albums[0]
        song.album.set( albums[0] )
        # any other albums the track appears on go into the comment for later checking
        alt_albums = albums[1:]
        if len(alt_albums) > 0:
            song.comment.set( "Alt Albums: " + "; ".join(alt_albums))
        else:
            song.comment.set( "Looked up Album")

######################################################################################################
## albums_for_playlist - sets album info for all songs in a particular playlist
######################################################################################################
#
# For each song in a playlist (default = __Bad Album) we look up the artist information getting the
# list of potential albums for that song name. If there is more than one potential album extra album
# information is stored in the comment field otherwise we just note in the comment field that the album
# was looked up and isn't "natively set" (Nice for double checking information)

def albums_for_playlist(playlist="__Bad Album"):
    iT = app(u'/Applications/iTunes.app')
    tracks = iT.playlists[playlist].tracks()
    for song in tracks:
        artist = song.artist()
        name = song.name()
        print "----------------------------------------------------"
        print artist
        print name
        k = do_search(artist)
        albums = find_album( k, name )
        if len(albums) == 0:
            song.comment.set( "No Album Found")
            continue

        print "Song: " + name
        print "Album: " + albums[0]
        song.album.set( albums[0] )
        # slicing always returns a list, so an empty list means no alternatives
        alt_albums = albums[1:]
        if len(alt_albums) > 0:
            song.comment.set( "Alt Albums: " + "; ".join(alt_albums))
        else:
            song.comment.set( "Looked up Album")
        print

if __name__ == '__main__':
    #albums_for_selection()
    albums_for_playlist()
    sys.exit(0)
```
Note that I created two functions, albums_for_selection() and albums_for_playlist(). The first works on the songs selected in iTunes, while the second works on a specified playlist (by default one named “__Bad Album”). I usually use the latter, as it gives me a little better control over which songs are modified.
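Instead of commenting and uncommenting lines in the __main__ block to switch modes, the choice could be made on the command line. Here is a minimal Python 3 sketch with argparse; the --selection and --playlist flags are hypothetical and not part of the original script, and the prints stand in for the real function calls.

```python
import argparse


def parse_args(argv=None):
    """Choose between the selection and playlist modes from the command line."""
    parser = argparse.ArgumentParser(description="Look up missing iTunes album names.")
    parser.add_argument("--selection", action="store_true",
                        help="work on the tracks currently selected in iTunes")
    parser.add_argument("--playlist", default="__Bad Album",
                        help="playlist to process when --selection is not given")
    return parser.parse_args(argv)


args = parse_args(["--playlist", "Needs Albums"])
if args.selection:
    print("would call albums_for_selection()")
else:
    print("would call albums_for_playlist(%r)" % args.playlist)
```

Defaulting to the playlist mode keeps the safer, more controlled behavior the paragraph above recommends.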
From my testing this is pretty accurate. Most of the failures are track names with typos in them or albums that aren’t in the iTunes store. The Beatles, for instance, get albums set wrong because a cover band with “Beatles” in its name catches a lot of songs.
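One possible mitigation for the cover-band problem, which the script does not attempt, is to keep only store results whose artistName matches the artist you searched for. Here is a minimal Python 3 sketch; the helper name and the sample results are hypothetical, and artistName is the same field the script prints in verbose mode.

```python
def tracks_by_exact_artist(results, artist):
    """Keep only results whose artistName equals the searched artist (ignoring case)."""
    wanted = artist.strip().lower()
    return [r for r in results
            if r.get("artistName", "").strip().lower() == wanted]


# Hypothetical results: the real band plus a cover act whose name also matches the search.
results = [
    {"artistName": "The Beatles", "trackName": "Example Song",
     "collectionName": "Real Album"},
    {"artistName": "The Beatles Revival Band", "trackName": "Example Song",
     "collectionName": "Cover Album"},
]

print([r["collectionName"] for r in tracks_by_exact_artist(results, "the beatles")])
```

The trade-off is that an exact match drops legitimate results when your library spells the artist differently from the store (for example “Beatles” versus “The Beatles”), so falling back to the unfiltered list when the filtered one is empty may be wise.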
Edit: If you came here early on Tuesday, note that I’ve changed the program slightly. I was trying to “cheat” in my parsing of the JSON data, and I found that my method didn’t work with Unicode, which led to some subtle bugs. The version above requires the simplejson module, which you can easily install with easy_install. I’ve modified the body text above to reflect those changes.
Edit-2: I’ll leave the above code as is. However, if you want “fuzzy searching,” you might want to swap in the following find_album_fuzzy wherever find_album is called:
```python
def find_album_fuzzy(data, trackname, verbose=False):
    # we try and "normalize" the tracks for comparison as much as possible
    # we could probably spruce this up a bit and remove "filler" words
    # Yes this is a slow method - but I couldn't get translate to work right
    # with unicode and considering how rarely this is run I figured it was
    # good enough
    search_track = trackname.lower()
    search_track = search_track.replace("(","")
    search_track = search_track.replace(")","")
    search_track = search_track.replace(":","")
    search_track = search_track.replace(".","")
    search_track = search_track.replace("-","")
    search_track = search_track.replace("_","")
    search_track = search_track.replace("'","")
    search_track = search_track.replace('"',"")
    albums = []

    # a failed search returns {}; either way, no results means nothing to match
    results = data.get('results', [])
    if len(results) == 0:
        print "No Data"
        return []

    # split() with no argument ignores runs of spaces, so there are no empty "words"
    search_list = search_track.split()
    if len(search_list) == 0:
        return []

    for c in results:
        album = c.get('collectionName')
        if verbose and album is not None:
            print "Artist ID:" + str(c.get('artistId'))
            print "Artist: " + c.get('artistName', '')
            print "Album: " + album
            print "Track: " + c.get('trackName', '')
            print "Original: " + trackname
            print

        compare_track = c.get('trackName', '').lower()
        compare_track = compare_track.replace("(","")
        compare_track = compare_track.replace(")","")
        compare_track = compare_track.replace(":","")
        compare_track = compare_track.replace(".","")
        compare_track = compare_track.replace("-","")
        compare_track = compare_track.replace("_","")
        compare_track = compare_track.replace("'","")
        compare_track = compare_track.replace('"',"")
        compare_list = compare_track.split()

        # fraction of our track's words that also appear in the store's track name
        compare_count = 0.0
        for word in search_list:
            if word in compare_list:
                compare_count = compare_count + 1
        r = compare_count / len( search_list )

        if r > 0.5:
            if verbose:
                print "*****"
                print "Track: " + c.get('trackName', '')
                print "Original: " + trackname
            if album is not None:
                albums.append(album)
    print
    return albums
```
What the above does is first remove a lot of characters that confuse the comparison. As I noted in the comments, I was originally going to use the string translate method, but it was a pain to do with Unicode strings, so I figured it wasn’t worth the effort to get right. (I was in a hurry.) You can add replace lines for any other characters you find in your track names; these seemed to deal with most of the ones I found.
Next, it computes the ratio of the number of words in your track’s name that also appear in the store’s track name to the total number of words in your track’s name; anything above 0.5 counts as a match. This lets you treat tracks named “Stompin’ at the Savoy (Benny Goodman Version)” and “Stompin at the Savoy” as sufficiently close.
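For what it’s worth, the translate approach is painless in Python 3, where every str is Unicode and str.maketrans accepts a third argument listing characters to delete. This sketch strips the same characters as the replace lines in find_album_fuzzy; the names are illustrative. (In Python 2, unicode.translate instead takes a dict mapping each code point to None.)

```python
# Characters the fuzzy matcher strips, taken from its replace() calls.
STRIP_CHARS = "():.-_'\""

# The third argument to str.maketrans lists characters to delete.
STRIP_TABLE = str.maketrans("", "", STRIP_CHARS)


def strip_punctuation(title):
    """Lower-case a title and delete the confusing punctuation in one pass."""
    return title.lower().translate(STRIP_TABLE)


print(strip_punctuation("Stompin' at the Savoy (Benny Goodman Version)"))
```

Adding a character then means adding it to STRIP_CHARS rather than writing another replace line.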
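To make the ratio concrete, here is a small Python 3 sketch of the same word-overlap test applied to the two titles above after punctuation has been stripped. Note that it is asymmetric: the denominator is the word count of your own track’s name. The function name is illustrative; the 0.5 threshold is the one find_album_fuzzy uses.

```python
def word_overlap_ratio(search_title, store_title):
    """Fraction of the words in search_title that also appear in store_title."""
    search_words = search_title.split()
    store_words = set(store_title.split())
    if not search_words:
        return 0.0
    hits = sum(1 for word in search_words if word in store_words)
    return hits / len(search_words)


local = "stompin at the savoy benny goodman version"
store = "stompin at the savoy"
print(word_overlap_ratio(local, store))   # 4 of 7 words, about 0.571: above 0.5
print(word_overlap_ratio(store, local))   # 4 of 4 words: 1.0
```

Either way round, the pair clears the 0.5 threshold, which is why the two versions are treated as the same song.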
I usually keep both functions and just “swap them out” when I’m done with the strict comparisons.
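Rather than swapping the functions by hand, one option is a small wrapper that tries the strict matcher first and falls back to the fuzzy one only when the strict one finds nothing. This hypothetical Python 3 helper takes the two matchers as parameters (in the script they would be find_album and find_album_fuzzy) and reports which one produced the albums, so that could be recorded in the track comment too. The stub matchers below are simplified stand-ins for demonstration only.

```python
def find_album_with_fallback(data, trackname, strict, fuzzy):
    """Try the strict matcher first; use the fuzzy one only if strict finds nothing."""
    albums = strict(data, trackname)
    if albums:
        return albums, "strict"
    return fuzzy(data, trackname), "fuzzy"


# Simplified stand-in matchers; not the script's real functions.
def strict_stub(data, name):
    return [r["collectionName"] for r in data["results"] if r["trackName"] == name]


def fuzzy_stub(data, name):
    return [r["collectionName"] for r in data["results"] if name in r["trackName"]]


data = {"results": [{"trackName": "Song (Live)", "collectionName": "Live Album"}]}
print(find_album_with_fallback(data, "Song", strict_stub, fuzzy_stub))
```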
#1 by Dominik on 2009/08/27 - 5:53 pm
I’m searching for exactly your type of script/app to get missing album information. I have MP3s in my iTunes library that I digitized from MiniDisc or even from tape years ago, and I want to fill in the missing album information so I get all the benefits (cover art, lyrics...).
I’m not really into programming, so how could I use your script? What do I do with it? How do I start?
Thanks for your help,
Dominik
#2 by clark on 2009/08/27 - 9:43 pm
Well, knowing some programming helps with the way I write my scripts, so if you can learn a little Python, it helps.
However, technically all you need to do is cut and paste the program into a text editor and save it as something like itunes_fix_titles.py. I’d create a folder in your home directory called bin and place the file there.
Then at the command line, type:
cd ~/bin
ls
That ls command will list all the files in the bin folder, and you should see your file. You then need to make it executable:
chmod +x itunes_fix_titles.py
That makes it an executable you can run from the command line. I’d then type the following:
./itunes_fix_titles.py
That “./” is necessary because, as a security measure, most Unix systems leave the current directory off the command search path, so you can’t accidentally run a program just because it sits in the directory you are in.
That’s all there is to it. However, I’d suggest going through a Python tutorial if you want to follow all the scripts I put up here. There are a ton of good books out there as well. Learning the terminal is probably a wise idea too. This is a collection of tutorials on learning the terminal in OS X.
The system really does become orders of magnitude more powerful and flexible once you know scripting and the terminal.
What I’ve been doing is assuming the folks reading the blog are familiar with scripting and light programming. (I wanted a blog at that level of information, as that’s an underserved niche in my experience.) However, I’ll try to clean up the scripts and put them in a single, easy-to-use directory with explanations. It won’t happen this week, but hopefully next week, so keep checking back.