How to escape a FS in a CSV text and help with formatting
Hi Linux Experts,
I have the following problem to solve:
- Below is the CSV file given:
firstname,lastname,password,username,notes,city,phonenumber
fred,smith, notgood1, fredsmith, this user\, is the first in this file, Brighton,345698
Peter, Bloggs, anotherbad,peterbloggs,,London,987123
Jo, cooper, notmuch, jcooper, this user is Jo, Brighton, 456987
john, carter,nearlyempty,jcarter,This note is actually very long\, but really doesn't say anything very useful,,345777
sam,jones,passing, samjones, Not much of a note really, Manchester, 135790
- capitalise the first letter of the two name fields
- sanitise the formatting
- move the username column to the beginning of each line
- the phone number is missing the area code - look up the city in the following table, and add it to the beginning of the phone number column:
City, Area Code
London, 5
Brighton, 6
Manchester, 7
Provide the corrected CSV file.
One of the problems I have is that whenever I use the comma as the field separator (FS), the output for column 5 is the following:
this user is Jo
This note is actually very long\
Not much of a note really
It stops in the middle of the entry because it sees the comma, but what I am trying to achieve is to produce the full entry for column 5, like this:
this user\, is the first in this file
This note is actually very long\, but really doesn't say anything very useful
I probably have to escape the FS inside the text somehow, but so far I have had no luck completing this task. Could you also kindly help with the rest of the requirements?
Thanks in advance for your help.
To quote Wikipedia on the topic, "'CSV' is not a single, well-defined format." RFC 4180 is the most commonly used standard, and there the only defined quoting method is use of the double quote character (") to enclose entire fields, and that needs to be used for fields that contain line breaks, commas, and double quote characters. A literal double quote character within a field is escaped by preceding it with another double quote character.
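To make the RFC 4180 quoting rule concrete, here is a minimal Python sketch using the standard csv module, whose default dialect quotes fields in this style. The sample values are only illustrations (one is loosely borrowed from the file above), and this is not the escaping used in the poster's file, which relies on a backslash instead.

```python
import csv
import io

# Write one row whose fields contain a comma and a double quote.
# The default dialect quotes only the fields that need it and
# doubles any embedded double quote, as RFC 4180 describes.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["fredsmith", "this user, is the first", 'said "hi"'])
encoded = buffer.getvalue()

# Reading the text back recovers the original three fields.
decoded = next(csv.reader(io.StringIO(encoded)))

print(encoded, end="")
print(decoded)
```

The first field is written bare because it contains nothing special; the other two are wrapped in double quotes.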
Parsing a CSV file in awk is not a trivial undertaking. I've attached an awk script that does parse CSV files and extract fields from them. It does not handle the embedded newline case, but perhaps it will serve as a useful example.
Last edited by rknichols; 03-15-2015 at 06:49 PM.
Reason: add attachment
GNU awk (gawk) also has a variable called FPAT, which is often used for CSV files, but it is mainly suited to the case where commas inside quoted fields should not be treated as separators.
It does appear unusual for input data to use a backslash escape like this.
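For comparison outside awk, a parser that can be told about the escape character handles this format directly. The following is a sketch using Python's standard csv module, assuming the backslash is the only escape in the file and that no field is quoted; the sample line is the first data row from the question.

```python
import csv

line = r"fred,smith, notgood1, fredsmith, this user\, is the first in this file, Brighton,345698"

reader = csv.reader(
    [line],
    escapechar="\\",         # a backslash makes the next character literal
    quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    skipinitialspace=True,   # drop the blank that follows each comma
)
fields = next(reader)

print(len(fields))  # 7
print(fields[4])    # this user, is the first in this file
```

Note that the reader removes the backslash, so it would have to be put back (or the field quoted) when the row is written out again.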
You can replace each \, with a placeholder such as QQQ (or whatever you like, as long as it does not occur in the data), do the processing, and then change the placeholder back at the end. sed can do the replacement easily. Another option is to use Perl or another language with better parsing facilities.
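The placeholder idea can be sketched in Python as follows, together with one possible reading of the remaining requirements. This is only an illustration: the function names are made up for this example, QQQ is assumed never to occur in the real data, "sanitise the formatting" is taken to mean trimming blanks around each field, and a row with no city simply keeps its phone number unchanged.

```python
AREA_CODES = {"London": "5", "Brighton": "6", "Manchester": "7"}
PLACEHOLDER = "QQQ"  # assumption: this text never appears in the data


def split_fields(line):
    # Hide escaped commas, split on the real ones, trim stray blanks.
    protected = line.replace("\\,", PLACEHOLDER)
    return [field.strip() for field in protected.split(",")]


def join_fields(fields):
    # Rebuild the line and restore the escaped commas.
    return ",".join(fields).replace(PLACEHOLDER, "\\,")


def cap_first(name):
    # Upper-case only the first letter; leave the rest untouched.
    return name[:1].upper() + name[1:]


def fix_header(line):
    first, last, password, username, notes, city, phone = split_fields(line)
    return join_fields([username, first, last, password, notes, city, phone])


def fix_row(line):
    first, last, password, username, notes, city, phone = split_fields(line)
    # Unknown or empty city: no area code is added.
    phone = AREA_CODES.get(city, "") + phone
    return join_fields(
        [username, cap_first(first), cap_first(last), password, notes, city, phone]
    )


def fix_csv(text):
    header, *rows = text.strip().splitlines()
    fixed = [fix_header(header)] + [fix_row(row) for row in rows if row.strip()]
    return "\n".join(fixed)


sample = r"""firstname,lastname,password,username,notes,city,phonenumber
fred,smith, notgood1, fredsmith, this user\, is the first in this file, Brighton,345698
Peter, Bloggs, anotherbad,peterbloggs,,London,987123"""

print(fix_csv(sample))
```

The same replace, split, and restore steps map directly onto a sed and awk pipeline if you prefer to stay in the shell.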