Cleaning iTunes Pt. 3: Album Names


OK, the trickiest part of cleaning one’s iTunes library is looking up missing album information. There are a few applications that do this by mathematically analyzing the song’s audio against a library. The problem is that this isn’t that accurate. There was an old app that did this (no longer available) called iEatzBrains or something like that. Unfortunately I made the mistake of running it against a bunch of tracks after testing it. Only then did I find that its accuracy was about 90%. So to be honest I am still finding tracks that had their artist and album names “revised” because of that. (I’d say that event caused about half of my library errors.)

So I’m really leery of that sort of thing. However, trying to find a script that could guess at album names was surprisingly hard. There weren’t any. There was a database of album information called Discogs, but it really didn’t let you search by track info. It was all oriented around correct album information. (Although it is useful for other metadata, as we’ll see in the future.)

I did some pretty elaborate Google searching and remarkably this isn’t a problem that folks have solved. (Or at least they haven’t put solutions up online) Fortunately I found a solution from an unusual source: Apple.

Apple actually has a web API for searching the iTunes store. The documentation (oddly labeled “confidential” but publicly available) is quite helpful. What I did was run an artist search that returns just track information, effectively getting a list of songs by that artist (up to the result limit passed in the query, which I set to 200). The returned data is even in JSON format rather than XML! This means I can turn all the returned data into a Python dict with the simplejson module and a single call.
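
To make the shape of that exchange concrete, here is a minimal sketch. It is written for a current Python 3 with only the standard library (an assumption on my part; the script further down is Python 2 with simplejson). The endpoint is the one used in the script and may have moved since. The sample reply is made up and heavily trimmed, and build_search_url is just an illustrative helper name.

```python
import json
from urllib.parse import urlencode

# Endpoint copied from the script in this post; it may have moved since.
SEARCH_URL = ("http://ax.phobos.apple.com.edgesuite.net/"
              "WebObjects/MZStoreServices.woa/wa/wsSearch")


def build_search_url(artist, limit=200):
    """Return the query URL for music tracks matching an artist."""
    # urlencode escapes the values, turning spaces into '+'.
    query = urlencode({"term": artist.lower(),
                       "limit": limit,
                       "entity": "musicTrack"})
    return SEARCH_URL + "?" + query


# Made-up, trimmed reply: real replies carry many more fields per track.
SAMPLE_RESPONSE = """
{"resultCount": 1,
 "results": [{"artistName": "Jurassic 5",
              "collectionName": "Quality Control",
              "trackName": "The Influence"}]}
"""

data = json.loads(SAMPLE_RESPONSE)   # a single call gives a plain dict
first = data["results"][0]
print(build_search_url("Jurassic 5"))
print(first["trackName"], "->", first["collectionName"])
```

The only keys the script relies on are results, artistName, collectionName and trackName, so that is all the sample includes.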

All I do then is just compare the selected song in iTunes (using Appscript) with the songs in the iTunes store. Now I could make this part a bit more intelligent by normalizing unicode characters and perhaps removing padding words like “a” or “the.” Thus far I’ve not had any trouble so I’ve not bothered making it “brighter.”
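
If you did want the “brighter” comparison, something along these lines might do. This is only a sketch of the idea, not what the script below does. It assumes Python 3, and both the filler word list and the name normalize_title are my own inventions.

```python
import unicodedata

# Hypothetical list of padding words; extend it to suit your own library.
FILLER_WORDS = {"a", "an", "the"}


def normalize_title(title):
    """Lower-case a title, strip accents and drop padding words."""
    # NFKD splits accented letters into base letter + combining mark,
    # so the marks can be filtered out afterwards.
    decomposed = unicodedata.normalize("NFKD", title.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = [w for w in plain.split() if w not in FILLER_WORDS]
    return " ".join(words)


print(normalize_title("The Café  Song"))
```

One catch: a title made up entirely of filler words normalizes to an empty string, so you would want to fall back to the raw title in that case.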

To avoid the problem of entering album information that is wrong and then never being able to find the changed files I put pretty extensive comments in each track. If the song could be from more than one album I put the other alternative albums in. Even if (as is typical) there’s only one album the track is on I put a note in the comments.
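
The comment logic boils down to three cases. Here it is pulled out into a small standalone function as a sketch; album_comment is a name made up for illustration (the script below does the same thing inline), while the comment strings are the ones the script writes.

```python
def album_comment(albums):
    """Return the comment text for a list of candidate album names."""
    if not albums:
        return "No Album Found"
    alternatives = albums[1:]        # everything after the album we pick
    if alternatives:
        return "Alt Albums: " + "; ".join(alternatives)
    return "Looked up Album"


print(album_comment(["Quality Control", "Greatest Hits"]))
```

Because the strings are fixed, a smart playlist that matches on the comment text should be enough to pull up every track the script touched.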

What’s surprising is how short the code is. (Take out all the comments in the code and there really isn’t much to it.) My previous attempt using Discogs’ XML was at least 3 to 4 times as long and not nearly as accurate.

#!/usr/bin/python
 
######################################################################################################
## getalbumitunes.py
######################################################################################################
## Fills in album and other missing data from an iTunes track by looking up the data in the
## iTunes store database.
##
## Requires:    appscript module
##              appscript.sourceforge.net/py-appscript
##              install via:  easy_install appscript
##
##              simplejson module
##              install via: easy_install simplejson
##
 
import sys
from appscript import *
import urllib2, urllib
import simplejson
 
 
######################################################################################################
## do_search - queries iTunes store and returns a large python dictionary
######################################################################################################
#
# Basically we search the iTunes store for the artist and return all music tracks.  The store
# answers with JSON, which simplejson.load turns into a large Python dictionary.
 
def do_search(artist="jurassic 5"):
 
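    # quote_plus turns spaces into '+' and escapes characters like '&' or accented letters
    a = urllib.quote_plus(artist.lower().encode('utf-8'))
    limit = 200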
 
    url = 'http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStoreServices.woa/wa/wsSearch?' + \
                "term="+a + "&limit="+str(limit)+"&entity=musicTrack"
    try:
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        result = simplejson.load( response )
        if 'Error' in result:
            print "Error"
            return {}
 
        return result
    except IOError, e:
        print "Error for Requested Artist: " +artist
        print "Limit:" + str(limit)
        print getattr(e, 'reason', e)    # not every IOError carries a .reason
        print
 
 
    return {}
 
######################################################################################################
## find_album - searches through an iTunes dict to find a track with a particular name.
######################################################################################################
#
# Given a track dictionary for an artist (see do_search) we iterate through it finding all songs
# that match our desired song name.  We then build up a list with all the album names that song
# is found within.  To ease error checking I put in a verbose flag.
 
def find_album(data, trackname, verbose=False):
 
    # we try and "normalize" the tracks for comparison as much as possible
    # we could probably spruce this up a bit and remove "filler" words
 
    search_track = trackname.lower().strip()    
 
    albums = []
 
    if len( data ) == 0:
        print "No Data"
        return []
 
    if len( data['results']) == 0:
        print "No Data"
        return []
 
 
    for c in data['results']:
 
        if verbose and c['collectionName'] != None:
            print "Artist ID:" + str(c['artistId'])
            print "Artist:   " + c['artistName']
            print "Album:    " + c['collectionName']
            print "Track:    " + c['trackName']
            print
 
        compare_track = c['trackName'].lower().strip()
        if compare_track == search_track:
            # collectionName can be None, so check before printing or storing it
            if c['collectionName'] != None:
                if verbose:
                    print "Hit: " + c['collectionName']
                albums.append(c['collectionName'])
 
    print        
    return albums 
 
######################################################################################################
## albums_for_selection - sets album info for all selected songs
######################################################################################################
#
# For each selected song it looks up the artist information getting the list of potential albums
# for that song name.  If there is more than one potential album extra album information is stored
# in the comment field otherwise we just note in the comment field that the album was looked up and
# isn't "natively set"  (Nice for double checking information)
 
def albums_for_selection():
 
 
    iT = app(u'/Applications/iTunes.app')
    songs = iT.selection.get()      # get list of all selected songs
 
    for song in songs:
        artist = song.artist()
        name = song.name()
 
        albums = find_album( do_search(artist), name )
 
        if len(albums) == 0:
            song.comment.set( "No Album Found")
            continue
 
        print "Song:   " + name
        print "Albums: " + albums[0]
 
        song.album.set( albums[0] )
 
        alt_albums = albums[1:]
 
        if len(alt_albums) > 0:
            song.comment.set( "Alt Albums: " + "; ".join(alt_albums))
        else:
            song.comment.set( "Looked up Album")
 
######################################################################################################
## albums_for_playlist - sets album info for all songs in a particular playlist
######################################################################################################
#
# For each song in a playlist (default = __Bad Album) we look up the artist information getting the 
# list of potential albums for that song name.  If there is more than one potential album extra album 
# information is stored in the comment field otherwise we just note in the comment field that the album 
# was looked up and isn't "natively set"  (Nice for double checking information)
 
def albums_for_playlist(playlist="__Bad Album"):
 
    iT = app(u'/Applications/iTunes.app')
 
    tracks = iT.playlists[playlist].tracks()
 
    for song in tracks:
        artist = song.artist()
        name = song.name()
        print "----------------------------------------------------"
        print artist
        print name
 
        k = do_search(artist)
        albums = find_album( k, name )
 
        if len(albums) == 0:
            song.comment.set( "No Album Found")
            continue
 
        print "Song:   " + name
        print "Albums: " + albums[0]
 
 
        song.album.set( albums[0] )
 
        alt_albums = albums[1:]     # a slice is always a list, never None
 
        if len(alt_albums) > 0:
            song.comment.set( "Alt Albums: " + "; ".join(alt_albums))
        else:
            song.comment.set( "Looked up Album")
 
    print
 
 
if __name__ == '__main__':
 
    #albums_for_selection()
    albums_for_playlist()
    sys.exit(0)

Note that I created two functions, albums_for_selection() and albums_for_playlist(). One works off the selected songs while the other works off of a specified playlist. I usually use the latter as I can then control what songs are modified a little better.

From my testing this is pretty accurate. Most of the failures are track names with typos in them or albums not in the iTunes store. The Beatles, for instance, gets albums set wrong because there is a cover band with the name Beatles in it that catches a lot of songs.
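
One possible guard against the cover band problem, which is not in the script above, is to throw out store results whose artist name doesn’t match the library artist before looking at track names. A rough Python 3 sketch follows. The function name filter_by_artist and the sample data, including the tribute band, are made up.

```python
def filter_by_artist(results, artist):
    """Keep only store results whose artistName equals the library artist."""
    wanted = artist.lower().strip()
    return [r for r in results
            if (r.get("artistName") or "").lower().strip() == wanted]


# Made-up results illustrating the cover band problem.
sample = [
    {"artistName": "The Beatles", "collectionName": "Help!",
     "trackName": "Yesterday"},
    {"artistName": "Beatles Tribute Band", "collectionName": "Covers",
     "trackName": "Yesterday"},
]
kept = filter_by_artist(sample, "The Beatles")
print([r["collectionName"] for r in kept])
```

The trade-off is that an exact match is strict: if your library says “Beatles” and the store says “The Beatles” you lose the real albums too, so you’d probably want to normalize both names first.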

Edit: If you came here early on Tuesday I’ve changed the program slightly. I was trying to “cheat” my parsing of the JSON data. I found that my method didn’t work with unicode which led to some subtle bugs. The above requires the simplejson module which you can easily install with easy_install. I’ve modified the body text above to reflect those changes.

Edit-2: I’ll leave the above code as is. However if you want “fuzzy searching” you might want to use the following in place of find_album (it is named find_album_fuzzy, so change the calls in albums_for_selection and albums_for_playlist to match):

def find_album_fuzzy(data, trackname, verbose=False):
 
    # we try and "normalize" the tracks for comparison as much as possible
    # we could probably spruce this up a bit and remove "filler" words
    # Yes this is a slow method - but I couldn't get translate to work right
    # with unicode and considering how rarely this is run I figured it was
    # good enough
 
    search_track = trackname.lower()
    search_track = search_track.replace("(","")
    search_track = search_track.replace(")","")
    search_track = search_track.replace(":","")
    search_track = search_track.replace(".","")
    search_track = search_track.replace("-","")
    search_track = search_track.replace("_","")
    search_track = search_track.replace("'","")
    search_track = search_track.replace('"',"")
 
    albums = []
 
    if len( data ) == 0:
        print "No Data"
        return []
 
    if len( data['results']) == 0:
        print "No Data"
        return []
 
 
    for c in data['results']:
 
        if verbose and c['collectionName'] != None:
            print "Artist ID:" + str(c['artistId'])
            print "Artist:   " + c['artistName']
            print "Album:    " + c['collectionName']
            print "Track:    " + c['trackName']
            print "Original: " + trackname
            print
 
        compare_track = c['trackName'].lower()   
        compare_track = compare_track.replace("(","")
        compare_track = compare_track.replace(")","")
        compare_track = compare_track.replace(":","")
        compare_track = compare_track.replace(".","")
        compare_track = compare_track.replace("-","")
        compare_track = compare_track.replace("_","")
        compare_track = compare_track.replace("'","")
        compare_track = compare_track.replace('"',"")
 
        compare_list = compare_track.split(" ")    
        search_list = search_track.split(" ")
 
        compare_count = 0.0
        for word in search_list:
            if word in compare_list:
                compare_count = compare_count + 1
 
        r = compare_count / len( search_list )
 
        if r > 0.5: 
            if verbose:
                print "*****"
                print "Track:    " + c['trackName']
                print "Original: " + trackname
            if c['collectionName'] != None:
                albums.append(c['collectionName'])
 
    print        
    return albums

What the above does is first remove a lot of characters that are confusing. As I noted in the comments, originally I was going to use the string translate method, but it was a pain to do with unicode strings so I figured it wasn’t worth the effort to get right. (I was in a hurry.) You can add more replace lines for any other characters you might find in your track names. This seemed to deal with most of the ones I found.
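
For what it’s worth, here is roughly what the translate version could look like. This is a sketch in Python 3, where every string is unicode and translate takes a dict keyed by character code; the names STRIP_TABLE and strip_punctuation are made up. I believe the same dict form also works on unicode objects in Python 2, but I haven’t tried it against the script above.

```python
# The same characters the replace() calls strip, each mapped to None,
# which tells translate() to delete that character.
STRIP_TABLE = {ord(ch): None for ch in "():.-_'\""}


def strip_punctuation(title):
    """Lower-case a title and delete the punctuation listed above."""
    return title.lower().translate(STRIP_TABLE)


print(strip_punctuation("Stompin' at the Savoy (Benny Goodman Version)"))
```

Adding another character to strip is then just a matter of adding it to the string that builds the table.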

The next thing it does is get the ratio of words in the library track name that also appear in the store track name. This lets you compare tracks named “Stompin’ at the Savoy (Benny Goodman Version)” with “Stompin at the Savoy” and see them as sufficiently close: anything with a ratio above 0.5 counts as a match.
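
Here is the ratio worked through on that example, as a small Python 3 sketch. The name word_overlap is made up, and the titles are assumed to have had their punctuation stripped already. It splits on any whitespace, where the function above splits on single spaces.

```python
def word_overlap(search_title, store_title):
    """Fraction of words in search_title that also occur in store_title."""
    search_words = search_title.lower().split()
    store_words = set(store_title.lower().split())
    if not search_words:
        return 0.0
    hits = sum(1 for word in search_words if word in store_words)
    return hits / len(search_words)


short = "stompin at the savoy"
long_ = "stompin at the savoy benny goodman version"
print(word_overlap(short, long_))             # 4 of 4 words found
print(round(word_overlap(long_, short), 3))   # 4 of 7 words found
```

Note that the ratio isn’t symmetric. With the short name in your library all four words are found (1.0), while with the long name in your library only four of seven are (about 0.57). Both clear the 0.5 cutoff here, but only just in the second case.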

I usually keep both functions and just “swap them out” when I’m done with the strict comparisons.

  1. #1 by Dominik on 2009/08/27 - 5:53 pm

    I’m searching for exactly your type of script/app to get missing album information. I have MP3s in my iTunes that I digitized from MiniDisc or even from tape years ago, and I want to fill in the missing album information so I get all the benefits (cover art, lyrics...).
    I’m not really in programming, so how could I use your script? What do I do with it? How do I start?
    thanks for your help
    Dominik

  2. #2 by clark on 2009/08/27 - 9:43 pm

    Well, knowing some programming helps for the way I do my scripts. So if you can learn a little Python it helps.

    However technically all you need do is cut and paste the program into a text editor and save it as something like itunes_fix_titles.py. I’d create a folder in your home directory called bin and place the file there.

    Then at the command line do

    cd ~/bin
    ls

    That ls command will list all the files in the bin folder and you should see your named file. You then need to make it an executable.

    chmod a+x itunes_fix_titles.py

    That makes it an executable you can run from the command line. I’d then type the following:

    ./itunes_fix_titles.py

    That “./” is necessary because the current directory normally isn’t on the shell’s search path on Unix systems, a safeguard that keeps you from accidentally running a program that happens to sit in the directory you are in.

    That’s all there is. However I’d suggest going through a Python tutorial if you want to follow all the scripts I put up here. There are a ton of good books out there as well. Learning the terminal is probably a wise idea as well. This is a collection of tutorials on learning the terminal in OSX.

    The system really does become orders of magnitude more powerful and flexible once you know scripting and the terminal.

    What I’ve been doing is assuming the folks reading the blog were familiar with scripting and light programming. (I kind of wanted a blog at that level of information as that’s an underfilled niche in my experience.) However I’ll try and clean up the scripts and put them in a single easy to use directory with explanations. It won’t happen this week, but hopefully next week. So keep checking back.
