LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 06-02-2016, 11:28 AM   #16
CJenR
LQ Newbie
 
Registered: Jun 2016
Posts: 5

Original Poster
Rep: Reputation: Disabled

I do not have any literal double-quote or pipe characters in the file, so I think I can use this for now to get data from these files.
Code:
awk -F '"' 'NF>1{ for(i=1;i<=NF;i+=2){gsub(",","|",$i)}}NF==1{gsub(",","|")}1' OFS='"' FileName | awk -F "|" '{ if(NR == 2) print $3 }'
As @rknichols said, it is not the best solution, and maybe I should write my own code, which I do not intend to do right now. Instead, I can work around this to keep it simpler.
 
Old 06-02-2016, 12:08 PM   #17
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,623

Rep: Reputation: 2942
Not really sure why you need 2 awks, but glad you are happy with your solution.
 
Old 06-02-2016, 12:30 PM   #18
CJenR
LQ Newbie
 
Registered: Jun 2016
Posts: 5

Original Poster
Rep: Reputation: Disabled
I know, it is a bit tacky. The thing is, the first awk converts the file data as expected, and the second awk fetches only the specified line's column value. I tried combining the two, but it does not work. Can you help me with it?
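For readers following along, here is a minimal Python sketch of what the two awk stages do when folded into one pass. It is not a combined awk command, and the function names `commas_to_pipes` and `pick` are made up for illustration. It relies on the same assumption as the awk pipeline: the data contains no escaped double quotes and no literal '|' characters.

```python
def commas_to_pipes(line):
    """Replace commas with '|' only outside double-quoted sections."""
    # Splitting on '"' puts the text outside quotes at even indexes
    # (awk's $1, $3, $5, ... with -F '"') and quoted text at odd indexes.
    parts = line.split('"')
    parts[::2] = [p.replace(",", "|") for p in parts[::2]]
    return '"'.join(parts)


def pick(lines, row, col):
    """Return column `col` of line `row` (both 1-based, like NR and $3)."""
    for number, line in enumerate(lines, start=1):
        if number == row:
            return commas_to_pipes(line.rstrip("\n")).split("|")[col - 1]
    return None


sample = [
    "id,name,comment,kind",
    '1234,some text,"some text, with comma",text',
]
print(pick(sample, 2, 3))
```

Passing an open file object instead of the `sample` list works the same way, since `pick` only iterates over lines.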
 
Old 06-02-2016, 01:15 PM   #19
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 368
Hmm... this seems to work with the test cases:
Code:
./csv_split.py '1234,some text,"some text, with comma",text,alphanumeric,"again with, comma",text'
The instring converted to list:
['1234', 'some text', '"some text, with comma"', 'text', 'alphanumeric', '"again with, comma"', 'text']
Print index 2 in the list:
"some text, with comma"

./csv_split.py '111,222,"Amy said, ""Hello""",444,555,666'
The instring converted to list:
['111', '222', '"Amy said, ""Hello"""', '444', '555', '666']
Print index 2 in the list:
"Amy said, ""Hello"""

./csv_split.py '123,foo,"f(x,y)",456'
The instring converted to list:
['123', 'foo', '"f(x,y)"', '456']
Print index 2 in the list:
"f(x,y)"
The actual code (without error handling and so on; this is just a quick implementation and can be improved a lot):
Code:
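#!/usr/bin/env python3
"""
Split a csv line at ',' but keep commas in quoted strings
"""

import sys

inString = sys.argv[1]
outString = ""

# True while we are inside a double-quoted field; a line may start with one
isQuotedString = inString.startswith("\"")

for i in range(len(inString)):
    # Look ahead safely so a trailing comma does not raise IndexError
    nextChar = inString[i+1] if i + 1 < len(inString) else ""
    if isQuotedString:
        # A comma directly after a closing quote ends the quoted field
        if inString[i] == "," and inString[i-1] == "\"":
            outString += "|"
            # Stay in quoted mode if the next field is quoted as well
            isQuotedString = nextChar == "\""
            continue
    if inString[i] == "," and nextChar == "\"":
        # A comma followed by an opening quote starts a quoted field
        isQuotedString = True
        outString += "|"
    elif not isQuotedString and inString[i] == ",":
        # Plain field separator outside quotes
        outString += "|"
    else:
        # Any other character, including commas inside quotes, is kept
        outString += inString[i]

outFileList = outString.split("|")

print("The instring converted to list:")
print(outFileList)
print("Print index 2 in the list:")
print(outFileList[2])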
Also, this will of course fail if there is a literal '|' character in the actual text. One could perhaps choose an even more esoteric character, such as ł or ŋ. But, again, I am not claiming this to be bulletproof.

Best regards,
HMW

Last edited by HMW; 06-02-2016 at 01:27 PM.
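One way to sidestep the placeholder-character problem entirely is Python's standard `csv` module, which parses quoted fields directly. The sketch below is an alternative to the hand-rolled loop, not a cleaned-up version of it, and the helper name `parse_line` is made up for illustration. Note one difference in behavior: the `csv` module strips the surrounding quotes and turns a doubled `""` back into a single `"`.

```python
import csv


def parse_line(line):
    """Parse one CSV line into a list of fields using the csv module."""
    # csv.reader accepts any iterable of lines, so wrap the line in a list.
    return next(csv.reader([line]))


fields = parse_line('111,222,"Amy said, ""Hello""",444,555,666')
print(fields[2])  # prints: Amy said, "Hello"

# A literal '|' inside a field is harmless here, since no placeholder is used.
print(parse_line('a,"b|c, d",e'))
```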
 
Old 06-02-2016, 01:27 PM   #20
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,623

Rep: Reputation: 2942
I believe the awk I posted will deliver what you need, just have to print the line / column you want.
 
Old 06-02-2016, 01:31 PM   #21
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 368
Quote:
Originally Posted by grail View Post
I believe the awk I posted will deliver what you need, just have to print the line / column you want.
Of that I am certain. But I just wanted to see if I could solve this on my own (no, I don't have a life!). Since I don't know awk very well, I threw together something in Python real quick.
 
Old 06-02-2016, 02:04 PM   #22
CJenR
LQ Newbie
 
Registered: Jun 2016
Posts: 5

Original Poster
Rep: Reputation: Disabled
No, @grail, it did not give me the expected value. I think it misses something because of the space and comma!
Code:
awk '{if(NR == 4) print $1 "|" $3}' FPAT="([^,]+)|(\"[^\"]+\")" temp.csv

actual data: 234235,some text,"some text, value",123,"test test, test, test"

Output: 234235,some|text,
Thanks, @HMW. If I continue to work on this, I might need that.
 
Old 06-02-2016, 02:13 PM   #23
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,623

Rep: Reputation: 2942
Using your new input I get:
Code:
$ awk '{print $1"|"$3}' FPAT="([^,]+)|(\"[^\"]+\")" file
234235|"some text, value"
So unless you would like to present yet another data input, it would seem to work?

@HMW - my last post was addressed to the original poster, not a reply to yours
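A possible explanation for the two different results, offered as an assumption and not confirmed anywhere in the thread: FPAT is a GNU awk (gawk) extension, so an awk that does not support it treats FPAT as an ordinary variable and falls back to splitting on whitespace. The Python sketch below reproduces both outputs from the same input line. The regex is the FPAT pattern with the quoted alternative moved first, because Python's `re` takes the first alternative that matches, whereas awk takes the longest match.

```python
import re

line = '234235,some text,"some text, value",123,"test test, test, test"'

# Pattern-based splitting, as FPAT does in gawk: a field is either a
# quoted string or a run of non-comma characters.
FIELD = re.compile(r'"[^"]+"|[^,]+')
fields = FIELD.findall(line)
print(fields[0] + "|" + fields[2])

# Default whitespace splitting, which is what an awk without FPAT
# support would do with the same command.
words = line.split()
print(words[0] + "|" + words[2])
```

The first print matches grail's output and the second matches CJenR's, which is at least consistent with the two commands having been run under different awk implementations.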
 
Old 06-02-2016, 02:42 PM   #24
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 368
Quote:
Originally Posted by grail View Post
@HMW - my last post was to user not in reply to yours
Ah, ok!

Tried the latest sample input:
Code:
./csv_split.py '234235,some text,"some text, value",123,"test test, test, test"'
The instring converted to list:
['234235', 'some text', '"some text, value"', '123', '"test test, test, test"']
Print index 2 in the list:
"some text, value"

Last edited by HMW; 06-02-2016 at 02:49 PM.
 
Old 06-02-2016, 03:46 PM   #25
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,623

Rep: Reputation: 2942
You're gonna hate me HMW, but here is the ruby option:
Code:
$ cat d.rb 
#!/usr/bin/env ruby

puts ARGV[0].gsub(/""/,"@").scan(/"[^"]+"|[^,]+/)[2].gsub(/@/,'""')
$ ./d.rb '1234,some text,"some text, with comma",text,alphanumeric,"again with, comma",text'
"some text, with comma"
$ ./d.rb '111,222,"Amy said, ""Hello""",444,555,666'
"Amy said, ""Hello"""
$ ./d.rb '123,foo,"f(x,y)",456'
"f(x,y)"
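For anyone more at home in Python, the same idea can be written out step by step. This is a rough translation of the Ruby one-liner, with the made-up function name `third_field`, and it shares the same limitations: a literal '@' in the data would be turned into doubled quotes, and empty fields are skipped because the pattern needs at least one character.

```python
import re


def third_field(line):
    """Return the third CSV field, keeping its surrounding quotes."""
    # Step 1: hide escaped quotes ("") behind a placeholder so that a
    # quoted field contains no '"' characters between its outer quotes.
    masked = line.replace('""', "@")
    # Step 2: collect the fields, trying the quoted form first.
    fields = re.findall(r'"[^"]+"|[^,]+', masked)
    # Step 3: take the third field and restore the escaped quotes.
    return fields[2].replace("@", '""')


print(third_field('111,222,"Amy said, ""Hello""",444,555,666'))
```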
 
Old 06-03-2016, 12:39 AM   #26
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 368
Quote:
Originally Posted by grail View Post
You're gonna hate me HMW, but here is the ruby option
Why would I ever do that!?! I think it's great to see solutions in different languages. To me, both awk and ruby are fairly incomprehensible, that's why I stick with Python. There's only so much space left in my internal hard drive (read: brain). I don't think I will ever master any of those languages.

Best regards & nice work!
HMW
 
  


All times are GMT -5. The time now is 06:16 AM.
