How can I strip white space from the start and end of fields using awk?
(there are actually many spaces in this, but the forum seems to remove them automatically)
I would like it to be like this:
Quote:
12345,12 Data Street,Command Deck,Enterprise,Space,17094
So far my code strips out ALL of the white space, so I end up with 12DataStreet etc. I'm OK with replacing the commas with spaces in the output; that's just basic print stuff.
Something else I can't seem to get right is the filename. I cannot strip the .csv part off the original name and replace it with my own string (each filename should retain its uniqueness) without something weird going on: I end up with two output files, one with 0 in front of the filename and the other with 1. The 0 file contains just one line (what was the heading line in the spreadsheet); the 1 file contains the data. What would be the correct way to chop up FILENAME and append my own bit to the end? There are also parts at the beginning of the filename that I do not want, and annoyingly they vary too.
Any help would be much appreciated!
Last edited by jonnymorris; 10-01-2008 at 10:48 AM.
The space problem: your gsub does not distinguish between leading, embedded and trailing spaces. There are different ways to do it:
Code:
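# first example: remove any run of blanks next to a comma (works on the whole line)
gsub(/,[ \t]+/, ",");
gsub(/[ \t]+,/, ",");
# second example: trim leading and trailing blanks field by field
# (assumes FS = ","; set OFS = "," too, because changing $i rebuilds $0 with OFS)
for (i = 1; i <= NF; i++) {
    gsub(/^[ \t]+/, "", $i);
    gsub(/[ \t]+$/, "", $i);
}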
As for your filename problem: I would create the new filename using a function in awk. You have to use gensub (a gawk extension) instead of gsub, because gsub modifies the original string in place and returns only the number of substitutions, whereas gensub returns the modified string:
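The code from this post is not preserved above, so here is only a rough sketch of the idea, not the original function. The names new_name and outname and the "_clean.txt" suffix are made up for illustration. It uses plain sub() on a copy of the name so that it also runs on awks without gensub; the varying prefix mentioned in the question is not handled.

```shell
# Hypothetical helper: new_name, outname and the suffix are illustrative only.
new_name() {
    awk -v name="$1" -v suffix="$2" '
    function outname(fname, sfx,    base) {
        base = fname              # work on a copy; sub() edits its target in place
        sub(/\.csv$/, "", base)   # sub() returns the match count, not the new string
        return base sfx           # build the new name by concatenation
    }
    BEGIN { print outname(name, suffix) }'
}

new_name "orders_2008.csv" "_clean.txt"   # prints orders_2008_clean.txt
```

Inside a real script you would call outname(FILENAME, "_clean.txt") and redirect print to the result. With gawk, gensub(/\.csv$/, "", 1, FILENAME) returns the same stripped name in a single expression.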
The FS="[ \t]*,[ \t]*"; part removes the long runs of blanks after the first field (a reference number); the for loop handles the spaces at the start and end of each field (and also removes all occurrences of ").
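For anyone finding this later, a minimal sketch of how that combination might look. The full script is not shown in the thread, so the function name trim_fields and the sample line are assumptions. It does not cope with quoted fields that contain commas.

```shell
# Hypothetical wrapper; reads CSV lines on stdin, writes cleaned lines to stdout.
trim_fields() {
    awk '
    BEGIN { FS = "[ \t]*,[ \t]*"; OFS = "," }   # blanks around commas belong to the separator
    {
        for (i = 1; i <= NF; i++) {
            gsub(/^[ \t]+|[ \t]+$/, "", $i)     # blanks at either end of the field
            gsub(/"/, "", $i)                   # every double quote in the field
        }
        $1 = $1                                 # force $0 to be rebuilt with OFS
        print
    }'
}

printf '%s\n' '12345      ,"12 Data Street" ,Command Deck , Enterprise,Space ,  17094  ' | trim_fields
# prints 12345,12 Data Street,Command Deck,Enterprise,Space,17094
```

The $1 = $1 assignment matters when no gsub changes anything: without it awk prints the untouched input line, original spacing included.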
Thanks also for the filename fix! That works rather well.
A few more tweaks and this script should be finished. I need to catch some records that have extra fields in the middle and adjust the print statement accordingly, nothing too difficult (one hopes). If I get stuck I'll be back!
Thanks again.
Last edited by jonnymorris; 10-02-2008 at 03:51 AM.
When you use regular-expressions, there are just a couple of features that you need to keep in mind:
The "^" and "$" symbols refer to the start and the end of the string, respectively. So, if you want to remove only leading-blanks, you could look for "start-of-string followed by one-or-more whitespace characters." And so on.
Regular-expression substitution can be specified to occur "globally" (replace all occurrences) or "do it only once."
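In awk terms, the two points above map onto anchored patterns and the choice between sub() (first match only) and gsub() (every match). A small sketch, with a made-up function name and test string:

```shell
# Hypothetical demo; the string is 3 spaces, a, 2 spaces, b, 3 spaces.
anchor_demo() {
    awk 'BEGIN {
        s = "   a  b   "
        lead  = s; sub(/^[ \t]+/, "", lead)    # ^ anchors at the start: leading blanks only
        trail = s; sub(/[ \t]+$/, "", trail)   # $ anchors at the end: trailing blanks only
        once  = s; sub(/ /, "_", once)         # sub() replaces the first match only
        all   = s; gsub(/ /, "_", all)         # gsub() replaces every match
        printf "[%s]\n[%s]\n[%s]\n[%s]\n", lead, trail, once, all
    }'
}

anchor_demo
```

This should print [a  b   ], [   a  b], [_  a  b   ] and [___a__b___], one per line. Note that the embedded blanks between a and b survive both anchored substitutions.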