LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 03-15-2015, 01:36 PM   #1
speed0212
LQ Newbie
 
Registered: Mar 2015
Posts: 2

Rep: Reputation: Disabled
How to escape a FS in a CSV text and help with formatting


Hi Linux Experts,

I have the following problem to solve:

-Below is the CSV file give

firstname,lastname,password,username,notes,city,phonenumber
fred,smith, notgood1, fredsmith, this user\, is the first in this file, Brighton,345698
Peter, Bloggs, anotherbad,peterbloggs,,London,987123
Jo, cooper, notmuch, jcooper, this user is Jo, Brighton, 456987
john, carter,nearlyempty,jcarter,This note is actually very long\, but really doesn't say anything very useful,,345777
sam,jones,passing, samjones, Not much of a note really, Manchester, 135790

- capitalise the first letter of the two name fields
- sanitise the formatting
- move the username column to the beginning of each line
- the phone number is missing the area code - look up the city in the following table, and add it to the beginning of the phone number column:

City, Area Code
London, 5
Brighton, 6
Manchester, 7

Provide the corrected CSV file.

One of the problems I have is that whenever I use the comma as FS the output for column 5 is the following

cat Test.csv | awk 'BEGIN { FS = "," } {print $5 }'
notes
this user\

this user is Jo
This note is actually very long\
Not much of a note really

It stops in the middle of the entry because it sees the comma but what I am trying to achieve is to produce the full entry for column 5 like this:

this user\, is the first in this file
This note is actually very long\, but really doesn't say anything very useful

I have to probably escape somehow the FS in the text but so far no joy with completing this task. Also can you kindly help out for the rest of the requirements.
I really appreciate your help in advance.

Ivan
 
Old 03-15-2015, 06:40 PM   #2
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,776

Rep: Reputation: 2212Reputation: 2212Reputation: 2212Reputation: 2212Reputation: 2212Reputation: 2212Reputation: 2212Reputation: 2212Reputation: 2212Reputation: 2212Reputation: 2212
To quote Wikipedia on the topic, "'CSV' is not a single, well-defined format." RFC 4180 is the most commonly used standard, and there the only defined quoting method is use of the double quote character (") to enclose entire fields, and that needs to be used for fields that contain line breaks, commas, and double quote characters. A literal double quote character within a field is escaped by preceding it with another double quote character.

Parsing a CSV file in awk is not a trivial undertaking. I've attached an awk script that does parse CSV files and extract fields from them. It does not handle the embedded newline case, but perhaps it will serve as a useful example.
Attached Files
File Type: txt csvextract.txt (2.7 KB, 17 views)

Last edited by rknichols; 03-15-2015 at 06:49 PM. Reason: add attachment
 
Old 03-15-2015, 07:54 PM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120Reputation: 4120Reputation: 4120Reputation: 4120Reputation: 4120Reputation: 4120Reputation: 4120Reputation: 4120Reputation: 4120Reputation: 4120Reputation: 4120
The OP resurrected an old thread that was (very) similar. There were suggestions there for how this could be handled.
 
Old 03-15-2015, 07:58 PM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191
awk does have another variable called FPAT which is often used for csv files but it is generally suited to solving the issue of commas in quotes to not be considered.

This does appear to be an unusual format to have an escape included in input data.
 
Old 03-15-2015, 09:53 PM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191
phew ... it wasn't immediately obvious (well not to me):
Code:
awk 'BEGIN{FPAT="([^,]*)|([^,\\\\]+\\\\,[^,]*)"}{print $5}' test.csv
 
Old 03-16-2015, 01:33 AM   #6
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,801

Rep: Reputation: 7306Reputation: 7306Reputation: 7306Reputation: 7306Reputation: 7306Reputation: 7306Reputation: 7306Reputation: 7306Reputation: 7306Reputation: 7306Reputation: 7306
you can replace \, for example you can use QQQ (or whatever you want), do your job and replace back at the end. You can try sed to do that easily. Other way could be to use perl or other language with better parsing possibilities.
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
need help on csv formatting linuxunix Linux - Newbie 13 03-16-2015 07:23 PM
[SOLVED] CSV formatting scream Programming 11 10-16-2013 02:29 PM
[SOLVED] How to script csv editing? Remove rows from csv file that do not contain certain text ingram87 Linux - Software 9 08-03-2012 12:45 PM
[SOLVED] Pulling text from a text file and append to CSV fishy Linux - Newbie 5 03-02-2012 07:32 AM
Using awk/sed to convert linefeed to csv, with some formatting jaykup Programming 1 04-03-2009 05:18 PM

LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie

All times are GMT -5. The time now is 02:10 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration