read an excel file in unix or perl
argh, i wasted about an hour the other day trying to parse an excel file in perl. i thought i could just save the file as a "tab-delimited" file and load the .txt file in perl. things didn't work out however, as perl (and the unix command line) thought the .txt file was just one line long.
staring at the file using the "cat" or "more" command, i saw a bunch of ^M (caret M's) wherever a newline should've been. it took forever for me to figure out those ^M's were actually carriage returns.
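if you'd rather not squint at ^M's, a quicker check is to count the line-ending characters directly. this is just a sketch: count_cr and count_lf are helper names i made up, and data.txt stands in for whatever file excel gave you.

```shell
# count_cr FILE: print how many carriage returns (\r) FILE contains
count_cr() {
    # delete everything except \r, then count the bytes that are left
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# count_lf FILE: print how many newlines (\n) FILE contains
count_lf() {
    # same trick, keeping only \n
    tr -cd '\n' < "$1" | wc -c | tr -d ' '
}
```

if "count_cr data.txt" prints a big number and "count_lf data.txt" prints 0, the file is using carriage returns as its line endings. "od -c data.txt | head" will also spell each one out as \r.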
once i realized what was causing the problem, the solution came pretty quickly. just replace each carriage return with a newline character. here's a little script to do just that:
#!/bin/sh
# script to replace carriage returns with newlines (useful for processing
# excel tab-delimited files). use:
# removeCarriageReturn.sh FILENAME
tr "\r" "\n" < "$1" > "$1.ncr"
(the file with the newlines now has .ncr appended to the end of the filename.) oh, and if you're trying to do this replacement with sed on a mac, forget about it. it seems that the sed that ships with "os x" doesn't recognize "\r" as the carriage return character. bah, wasted a good 10 minutes trying to figure that one out.
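one caveat with the script above: it assumes every carriage return is a line ending. if the file was saved on windows instead, each line probably ends in a carriage return followed by a newline, and the straight swap would leave a blank line after every row. here's a rough sketch that tries to handle both cases. normalize_newlines is a name i made up, and the "does the file already contain newlines?" test is my own guess at a reasonable heuristic, not something excel guarantees.

```shell
# normalize_newlines FILE: write FILE.ncr with plain unix newlines
normalize_newlines() {
    # count the newline characters already in the file
    lf_count=$(( $(tr -cd '\n' < "$1" | wc -c) ))
    if [ "$lf_count" -gt 0 ]; then
        # newlines are already there, so treat the carriage returns
        # as extras (\r\n line endings) and just delete them
        tr -d '\r' < "$1" > "$1.ncr"
    else
        # no newlines at all, so each carriage return is a line ending
        tr '\r' '\n' < "$1" > "$1.ncr"
    fi
}
```

use it like "normalize_newlines FILENAME"; as before, the result lands in FILENAME.ncr. a file that mixes both kinds of line endings would still come out wrong, so check the line count with "wc -l" afterwards.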