Often you get a spreadsheet with some data that needs heavily filtered. You could try Applescript but let’s be honest. For text processing Python is so much easier. So here’s a Python script I use a lot with Excel. It’s pretty flexible following my scripting rules. It’s designed to be more a template that I then modify when the need arises.
First the code and then the comments.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 | #!/usr/bin/python ## cleanexcel.py ## A quick and dirty template to modify for doing filtering of ## Excel data. This could esaily be modified for iWork. import string, sys, time, shutil from appscript import * from osax import OSAX tr = { "FEDEX":"FedEx", \ "USPS":"USPS", \ "LOWE'S":"Lowe's", \ "COSTCO":"Costco", \ } def leftword(data): # return the leftmost word and the location of the # word boundary endword = data.find(" ") return data[0:endword], endword def rightword(data): # return the rightmost word and the location of the # word boundary startword = data.rfind(" ") return data[startword+1:], startword def cleandata(data): d = data firstword, loc = leftword(d) if firstword[0] in string.digits: d = d[loc+1:] firstword, loc = leftword(d) if firstword[0] in string.digits: d = d[loc+1:] lastword, loc = rightword(d) if lastword[0] in string.digits: d = d[:loc] return d def filtercell(data): # iterate through the dictionary of conversions global tr for word, rep in tr.iteritems(): if word in data: return rep # clear numbers on left & right d = cleandata( data ) if d[0:5] == "P.O.S": d = d[16:] return d.title() def handle_spreadsheet(): MS = app('/Applications/Microsoft Office 2008/Microsoft Excel') col = "C" ncol = "D" for row in range(2,3283): # get the field and return filtered data data = MS.cells[ col + str(row) ].value.get() fd = filtercell( data ) # replace the data in excel MS.cells[ ncol + str(row) ].value.set(fd) #MS.cells[ ncol + str(row) ].value.set(data) if __name__ == '__main__': if len(sys.argv) == 1: handle_spreadsheet() # change to 0 for success, 1 for (partial) failure sys.exit(0) |
I run this from within TextMate so I can quickly modify it. The biggest suggestion, which may or may not work for your data, is to save the filtered text into a column beside the one you are filtering. That way you can do a sanity check by quickly scanning down and comparing the two. The column you are filtering is col while the column for the filtered text is ncol.
I found that for large selections that the Applescript for getting the selection bounds with the current version of Excel is a bit buggy. (It basically hangs – maybe it’s just taking a long time but I didn’t want to wait) So I explicitly set the rows to work on with the range() function.
I use Appscript in Python instead of Applescript the language. That part of the code is simple and hopefully self evident.
The key part of the code is the filtercell() function. There I do a variety of filtering. In my case it was simplifying some names by seeing if a particular name occurred in the text. To make the script easy to modify I have a global dictionary whose key is the word to find and whose value is what to change the data to. So, for example, if I find the word “FEDEX” in the cell then I will replace it with “FedEx”.
I then had a more robust function called cleandata() which just removed some numbers from the beginning and end of my strings. (This was some bank data I was filtering) I left that in just so you could see how I did it. But realistically you can just ignore it.
Yeah this is all pretty simple. But I suspect there are some folks out there who’ve not thought about doing this. I can’t imagine having to write this in Applescript proper. (I get nauseous just thinking about it)
Related posts:
.jpg)
#1 by Alex on 2010/01/21 - 5:50 pm
Don’t forget there’s also the PyXL library for directly modifying Excel files using Python, so you don’t have to have the file open in Excel.
#2 by clark on 2010/01/21 - 6:30 pm
Yeah. I find I like using Applescript over modifying the file for various reasons. (For one I can modify it in place) If you were doing something quite regularly there would be advantages to using PyXL as I’m sure it’s faster than Applescripts. Excel in particular doesn’t handle some Applescripts in a terribly efficient fashion.
The other advantage to using Applescript is that it’s fairly trivial to modify to work with say Numbers.