Download your favorite Linux distribution at LQ ISO.
Go Back > Forums > Non-*NIX Forums > Programming
User Name
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.


  Search this Thread
Old 10-01-2008, 10:46 AM   #1
LQ Newbie
Registered: Sep 2008
Posts: 10

Rep: Reputation: 0
How can I strip white space from the start and end of fields using awk?


I'm hoping some of you know the answer to this, what is probably quite a trivial thing but seems to be eluding me.

I am developing an awk script to pull specific fields from a csv file, but it also has to remove preceeding and trailing white space from each field.

Here is an example of one record:

12345 , , , , , 12, Data Street, Command Deck, Enterprise, Space, 17094
(there are actually many spaces in this, but the forum seems to remove them automatically)

I would like it to be like this:

12345,12 Data Street,Command Deck,Enterprise,Space,17094

So far my code strips off ALL of the white space so I end up with 12DataStreet etc. I'm OK with replacing , with a space in the output, that's just basic print stuff.

BEGIN {FS = ",";
		print "Finding good records..." 

$36 ~ /N/ { 
			gsub(/ +/, "");
			gsub(/"/, "");
			print $1 "," $38 " " $39 "," $40 "," $41 "," $42 "," $43 "," $44 "," $59 "," $55 " " $56 " " $57 > FILENAME "_good.csv"; }

END { print "Finished!  Output sent to '"FILENAME"_good.csv'"}
Something else I seem to be unable to get right is the filename bit - I cannot strip off the .csv part of the original and replace it with my string (because each filename should retain it's uniqueness) without something weird going on - I seem to end up with two output files, one with 0 in front of the filename, the other with 1, 0 contains just one line (what was the heading line in the spreadsheet), 1 contains the data. What would be the correctway to chop up a FILENAME and append my own bit to the end? There are parts of the filename at the beginning that I do not want either, they vary too which is the annoying part.

Any help would be much appreciated!

Last edited by jonnymorris; 10-01-2008 at 10:48 AM.
Old 10-01-2008, 01:15 PM   #2
Registered: Jun 2008
Posts: 235

Rep: Reputation: 46

the space problem: Your gsub does not differ between trailing, embedded and leading spaces. There are different ways to do it:
# first example: replace ", " and " ," with ","
gsub(/, /, ",");
gsub(/ ,/, ",");
# second example: replace leading and trailing space field by field
for (i = 1; i<= NF; i++) {
  gsub(/^ /, "", $i);
  gsub(/ $/, "", $i);
To your filename problem: I would create the new filename using a function in awk. You have to use gensub instead of gsub, because gsub modifies the original string and returns only the number of substitutions:
NR == 1 { new_fname = gensub(/\.csv$/, "", 1, FILENAME) "_good.csv"; }
{ ...
print ... >new_fname;
Old 10-02-2008, 12:43 AM   #3
Mr. C.
Senior Member
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 61
Let awk's regular expression field separator work for you:

$ cat data    
12345 , , , , , 12 , Data Street,   Command Deck   , Enterprise, Space, 17094 

$ cat scr.awk 
    FS="[ \t]*,[ \t]*";

  for (i=1; i <= NF; i++) {
      printf "%2d '%s'\n", i, $i;

$ awk -f scr.awk data    
 1 '12345'
 2 ''
 3 ''
 4 ''
 5 ''
 6 '12'
 7 'Data Street'
 8 'Command Deck'
 9 'Enterprise'
10 'Space'
11 '17094'

Last edited by Mr. C.; 10-02-2008 at 12:44 AM.
Old 10-02-2008, 03:49 AM   #4
LQ Newbie
Registered: Sep 2008
Posts: 10

Original Poster
Rep: Reputation: 0

Thank you both for your excellent suggestions and help. I used a combination of your ideas:

BEGIN {FS="[ \t]*,[ \t]*";
		print "Finding good records..." 
NR == 1 { new_fname = gensub(/\.csv$/, "", 1, FILENAME) "_good.csv"; }

$36 ~ /N/ { 
		for (i = 1; i<= NF; i++) {
			gsub(/^ /, "", $i);
			gsub(/ $/, "", $i);
			gsub(/"/, "", $i);
		print $1 "," $38 " " $39 "," $40 "," $41 "," $42 "," $43 "," $44 "," $59 "," $55 " " $56 " " $57 > new_fname; }

END { print "Finished!  Output sent to '"new_fname"'"}
The FS="[ \t]*,[ \t]*"; part removes the long blank spaces after the first field (a reference number), the for loop does the spaces at start and end of each field (and also removes all occurances of ").

Thanks also for the filename fix! That works rather well.
A few more tweaks and that should be this script finished. I need to catch some records that have extra fields in the middle and adjust the print statement accordingly, nothing too difficult (one hopes). If I get stuck I'll be back!

Thanks again.

Last edited by jonnymorris; 10-02-2008 at 03:51 AM.
Old 10-02-2008, 07:27 PM   #5
Senior Member
Registered: Oct 2003
Location: Bonaire
Distribution: Debian Wheezy/Jessie/Sid, Linux Mint DE
Posts: 4,500

Rep: Reputation: 636Reputation: 636Reputation: 636Reputation: 636Reputation: 636Reputation: 636
awk is not good for CSV strings. Sooner or later you'll find a string complying with the CSV format which crashes the AWK script.

I have succesfully been using this little c program.

Old 10-02-2008, 07:44 PM   #6
LQ Guru
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.9, Centos 7.3
Posts: 17,356

Rep: Reputation: 2367Reputation: 2367Reputation: 2367Reputation: 2367Reputation: 2367Reputation: 2367Reputation: 2367Reputation: 2367Reputation: 2367Reputation: 2367Reputation: 2367
Or Perl with Text::CSV module
Old 10-02-2008, 10:10 PM   #7
LQ Guru
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 8,148
Blog Entries: 4

Rep: Reputation: 2716Reputation: 2716Reputation: 2716Reputation: 2716Reputation: 2716Reputation: 2716Reputation: 2716Reputation: 2716Reputation: 2716Reputation: 2716Reputation: 2716
When you use regular-expressions, there are just a couple of features that you need to keep in mind:
  1. The "^" and "$" symbols refer to the start and the end of the string, respectively. So, if you want to remove only leading-blanks, you could look for "start-of-string followed by one-or-more whitespace characters." And so on.
  2. Regular-expression substitution can be specified to occur "globally" (replace all occurrences) or "do it only once."
Old 10-02-2008, 10:52 PM   #8
Senior Member
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244Reputation: 244Reputation: 244
if you have PHP and your data don't have quotes
$file = "file.csv";
$handle = fopen("file","r");
while ( ($data = fgetcsv($handle,4096,",")) !== FALSE){
    $data = array_values($data);
    echo "data: $data";
# more file.csv
12345 , , ,, , 12, Data Street, Command Deck  , Enterprise, Space, 17094
# php5 test.php
data: 12345 ,12,Data Street,Command Deck  ,Enterprise,Space,17094



Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
awk question on handling *.CSV "text fields" in awk jschiwal Programming 8 05-27-2010 06:23 AM
As I right this, I have a large white strip on my screen from top to bottom Virtual Circuit Linux - Hardware 4 08-23-2008 02:27 AM
modify all fields in awk tostay2003 Programming 16 08-09-2008 01:41 AM
shell command using awk fields inside awk one71 Programming 6 06-26-2008 04:11 PM
Supressing Fields w/ AWK Rv5 Programming 3 10-19-2004 11:06 AM

All times are GMT -5. The time now is 06:52 AM.

Main Menu
Write for LQ is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration