2/6/2006

read an excel file in unix or perl

argh, i wasted about an hour the other day trying to parse an excel file in perl. i thought i could just save the spreadsheet as a "tab-delimited" text file and load the .txt file in perl. that didn't work out, though: perl (and the unix command line) treated the whole .txt file as one long line.

staring at the file using "cat -v" or "more", i saw a bunch of ^M (caret M's) wherever a newline should've been. it took forever for me to figure out those ^M's were actually carriage returns.
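here's a quick way to confirm that carriage returns are the culprit. this is only a sketch: the file name sample_cr.txt is made up for illustration, and the sample is built with printf to mimic a file whose lines end in carriage returns only.

```shell
#!/bin/sh
# build a small sample with carriage-return-only line endings
# (sample_cr.txt is a hypothetical name, not a file from the post)
printf 'a\tb\rc\td\r' > sample_cr.txt

# count newline characters; with CR-only endings there are none,
# which is why line-based tools see a single long line
wc -l < sample_cr.txt

# make control characters visible; carriage returns show up as ^M
cat -v sample_cr.txt
echo

# count the carriage returns themselves
cr_count=$(tr -cd '\r' < sample_cr.txt | wc -c | tr -d ' ')
echo "carriage returns: $cr_count"
```

if the newline count is zero (or far lower than expected) and the carriage return count is roughly the number of rows in the spreadsheet, the line endings are the problem.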

once i realized what was causing the problem, the solution came pretty quickly. just replace each carriage return with a newline character. here's a little script to do just that:

#!/bin/sh
# script to remove carriage returns from files (useful for processing
# excel tab-delimited files). use:
# removeCarriageReturn.sh FILENAME

tr "\r" "\n" < "$1" > "$1.ncr"

(the converted file has .ncr appended to the end of the original filename.) oh, and if you're trying to do this replacement with sed on a mac, forget about it. it seems that the sed that ships with os x doesn't recognize the "\r" escape for a carriage return. bah, wasted a good 10 minutes trying to figure that one out.
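one caveat: if a file uses windows-style endings (a carriage return followed by a newline), swapping every carriage return for a newline leaves a blank line after each row. below is a sketch of a variant that handles both cases by calling perl from the shell. it assumes perl is on the path, and the file names are hypothetical.

```shell
#!/bin/sh
# sketch: normalize both CR-only and CR+LF line endings to plain newlines.
# mac_style.txt and dos_style.txt are made-up sample files.
printf 'a\tb\rc\td\r' > mac_style.txt
printf 'a\tb\r\nc\td\r\n' > dos_style.txt

for f in mac_style.txt dos_style.txt; do
    # replace a carriage return, plus the newline after it if there is one,
    # with a single newline
    perl -pe 's/\r\n?/\n/g' "$f" > "$f.ncr"
done

# both converted files should now contain two newline-terminated lines
mac_lines=$(wc -l < mac_style.txt.ncr | tr -d ' ')
dos_lines=$(wc -l < dos_style.txt.ncr | tr -d ' ')
echo "$mac_lines $dos_lines"
```

the .ncr suffix matches the script above, so either version can feed the same downstream perl code.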

