LinuxQuestions.org
10-13-2011, 06:51 PM   #1
scorbett
Member
 
Registered: May 2002
Location: Canada
Distribution: Slackware, Mandriva, RedHat
Posts: 46

Rep: Reputation: 15
How to cut quoted fields out of a text file?


Let's say I have a file structured something like this:

abc 123 "hi there" blah
xyz 456 "yo dude" blah
aaa 111 "uh huhh" blah
How can I tell the "cut" command that the third and fourth columns are actually one field? As a human, you can look at it and see that they're quoted, indicating one value that happens to have a space in it, but when you do something like:
cut -d' ' -f 3
you end up with the following:

"hi
"yo
"uh
which is NOT what I want. Any ideas?
 
10-13-2011, 06:56 PM   #2
scorbett
Member
 
Registered: May 2002
Location: Canada
Distribution: Slackware, Mandriva, RedHat
Posts: 46

Original Poster
Rep: Reputation: 15
I should clarify that I know you can specify multiple fields when invoking cut, like this:
cut -d' ' -f 3,4
but that doesn't work in this case because I have no idea how many spaces (if any) are going to appear in the third column, and it can vary wildly from line to line, so there isn't a specific value I could use here. I just want the quoted value to always be treated as a single column even though it has the delimiter within it.
 
10-13-2011, 06:58 PM   #3
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,540
Blog Entries: 1

Rep: Reputation: 1000
Try:

Code:
awk '{ print $3,$4 }' file
 
10-13-2011, 07:04 PM   #4
scorbett
Member
 
Registered: May 2002
Location: Canada
Distribution: Slackware, Mandriva, RedHat
Posts: 46

Original Poster
Rep: Reputation: 15
Never mind, the answer was right in front of me: as long as there aren't any other quotes on the line (as there aren't in my case), you can just change the delimiter to the quotation mark itself, and change the column you're asking for:
cut -d'"' -f 2
This returns the following when run against my example data from above:

hi there
yo dude
uh huhh
Which is exactly what I was trying for.
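For anyone who wants to try this, here is a minimal sketch of the same approach. The data is the three sample lines from the first post; the helper name `sample` is made up for the example. The last command illustrates the caveat about other quotes on the line.

```shell
# Sample data from the first post, produced by a helper function
# (the name "sample" is made up for this example).
sample() {
    printf '%s\n' \
        'abc 123 "hi there" blah' \
        'xyz 456 "yo dude" blah' \
        'aaa 111 "uh huhh" blah'
}

# Split on the double quote: field 1 is everything before the first
# quote, field 2 is the quoted text, field 3 is the rest of the line.
sample | cut -d'"' -f 2

# Caveat: if an earlier column is quoted too, field 2 is that column.
printf '%s\n' '"a b" 123 "hi there" blah' | cut -d'"' -f 2    # prints: a b
```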
 
10-13-2011, 07:15 PM   #5
Skaperen
Senior Member
 
Registered: May 2009
Location: WV, USA
Distribution: Slackware, CentOS, Ubuntu, Fedora, Timesys, Linux From Scratch
Posts: 1,777
Blog Entries: 20

Rep: Reputation: 115
I take it you want to count the fields by space separation, and not by double-quote separation. Otherwise you could do
Code:
cut -d '"' -f 2
and get the result you wanted for the data example you gave. But in cases where arbitrary columns are quoted, you get results other than what I think you want.

But cut doesn't even split on spaces the way people often expect, since each of two adjacent spaces counts as a delimiter on its own. So data like
Code:
abc 123 "hi there" blah
xyz 456 "yo dude" blah
aaa 111 "uh huhh" blah
can be changed to
Code:
abc  123 "hi there" blah
xyz 456  "yo dude" blah
aaa 111 "uh huhh"  blah
and give very different results. The cut command just doesn't do what is expected with spaces, but can succeed anyway in some cases. It's really more designed for non-space delimiters, such as : as found in /etc/passwd and many other files.
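A quick way to see the difference, using the first line of the modified data (the variable name `line` is only for illustration):

```shell
# One line from the modified data above, with two spaces after "abc".
line='abc  123 "hi there" blah'

# cut treats every single space as a delimiter, so the two adjacent
# spaces enclose an empty field 2 and everything else shifts by one.
printf '%s\n' "$line" | cut -d' ' -f 2    # prints an empty line
printf '%s\n' "$line" | cut -d' ' -f 3    # prints 123

# awk's default field splitting treats a run of blanks as one separator.
printf '%s\n' "$line" | awk '{ print $2 }'    # prints 123
```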

The awk command is more flexible (at the cost of you coding the mechanisms). But what you are asking for (that a quoted string be counted as one token ... and probably also that any number of adjacent space-class characters count as one delimiter) is not easy even in awk.
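As a rough idea of what "coding the mechanisms" could look like, here is a sketch in plain awk that walks each line token by token. It is only one possible approach under simple assumptions: double quotes only, no escaped quotes inside a quoted string, and blanks meaning spaces or tabs. The function name `qfield` is made up for the example.

```shell
# qfield N: print the Nth field of each input line, where fields are
# separated by runs of spaces or tabs and a "double-quoted string"
# counts as a single field. (qfield is a made-up name.)
qfield() {
    awk -v want="$1" '
    {
        rest = $0
        n = 0
        field = ""
        while (rest != "") {
            # Drop leading blanks before the next token.
            sub(/^[ \t]+/, "", rest)
            if (rest == "") break
            if (match(rest, /^"[^"]*"/)) {
                # Quoted token: keep the text between the quotes.
                tok = substr(rest, 2, RLENGTH - 2)
            } else {
                # Bare token: everything up to the next blank.
                match(rest, /^[^ \t]+/)
                tok = substr(rest, 1, RLENGTH)
            }
            rest = substr(rest, RLENGTH + 1)
            if (++n == want + 0) { field = tok; break }
        }
        print field
    }'
}

# Works on the data with doubled spaces as well:
printf '%s\n' 'abc  123 "hi there" blah' | qfield 3    # prints: hi there
```

A line with fewer than N fields prints an empty line, and an unmatched opening quote is simply treated as part of a bare token.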

The colrm command is almost completely useless for things similar to this.

There is such a format as "tab delimited" where 2 adjacent tabs mean a column in between them that just happens to be empty. I hate that format. But in the past, many data sources I've had to use (especially certain government data) come that way :-(
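For what it's worth, this is the one kind of input where cut's one-delimiter-per-field behavior matches the format. A small illustration with a made-up record whose second column is empty (TAB is cut's default delimiter):

```shell
# A tab-delimited record whose second column is empty.
printf 'abc\t\tblah\n' | cut -f 2    # prints an empty line
printf 'abc\t\tblah\n' | cut -f 3    # prints blah

# awk behaves the same way once the separator is set to a single tab.
printf 'abc\t\tblah\n' | awk -F'\t' '{ print $3 }'    # prints blah
```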

I've run into this same need myself many times, and keep thinking I need to write a program to do it. Since cut already exists, I could just make this program only do space class delimiting, and support quotes. But will that be double quotes only? Mixed quotes? And what about cases where some data uses ` and ' paired to make a token string? I suppose I'd need to have some kinds of options to indicate what kinds of quoting to work with. Perhaps -s being a simple option to enable single quote recognition, and -d being a simple option to enable double quote recognition. Then -o and -c could specify specific opening and closing quotes to be recognized. What about nesting?
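To make the idea a bit more concrete, here is a hypothetical sketch of the opening/closing quote part as a shell function around awk. Nothing here is an existing tool: the name `qtok` and its argument order (opening quote, closing quote, field number) are invented for illustration, and it does not attempt nesting or escaped quotes.

```shell
# qtok OPEN CLOSE N: print field N of each line, splitting on blanks and
# treating text between the OPEN and CLOSE characters as one field.
# (qtok and its argument order are hypothetical, not an existing tool.)
qtok() {
    awk -v oq="$1" -v cq="$2" -v want="$3" '
    {
        n = 0; tok = ""; intok = 0; inq = 0; field = ""
        len = length($0)
        for (i = 1; i <= len; i++) {
            c = substr($0, i, 1)
            if (inq) {
                # Inside quotes: the closing character ends the quoted run,
                # anything else (blanks included) belongs to the token.
                if (c == cq) inq = 0
                else tok = tok c
            } else if (c == oq) {
                # Opening character starts or continues a token.
                inq = 1; intok = 1
            } else if (c == " " || c == "\t") {
                # A blank outside quotes ends the current token, if any.
                if (intok) {
                    if (++n == want + 0) field = tok
                    tok = ""; intok = 0
                }
            } else {
                tok = tok c; intok = 1
            }
        }
        # The last token on the line has no trailing blank.
        if (intok && ++n == want + 0) field = tok
        print field
    }'
}

# Double quotes as both opening and closing character:
printf '%s\n' 'abc  123 "hi there" blah' | qtok '"' '"' 3    # prints: hi there

# A backquote paired with a single quote:
printf '%s\n' "abc 123 \`yo dude' blah" | qtok '`' "'" 3     # prints: yo dude
```

Passing the quote characters as arguments sidesteps the question of which quoting styles to hard-code, at the cost of handling only one pair per run.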

Last edited by Skaperen; 10-13-2011 at 07:16 PM.
 
  


All times are GMT -5.
