LinuxQuestions.org
Download your favorite Linux distribution at LQ ISO.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 03-03-2012, 10:27 PM   #1
linuxunix
Member
 
Registered: Mar 2010
Location: California
Distribution: Slackware
Posts: 235

Rep: Reputation: 18
need help on csv formatting


I need cleaning and altering the CSV in the following ways:

* capitalise the first letter of the two name fields
* sanitise the formatting
* move the username column to the beginning of each line
* the phone number is missing the area code - look up the city in the following table, and add it to the beginning of the phone number
column

City, Area Code
London, 5
Brighton, 6
Manchester, 7

Tha CSV file look like:

Code:
firstname	lastname	password	username	notes	city	phonenumber	
fred	smith	 notgood1	 fredsmith	 this user\	 is the first in this file	 Brighton	345698
Peter	 Bloggs	 anotherbad	peterbloggs		London	987123	
Jo	 cooper	 notmuch	 jcooper	 this user is Jo	 Brighton	456987	
john	 carter	nearlyempty	jcarter	This note is actually very long\	 but really doesn't say anything very useful		345777
sam	jones	passing	 samjones	 Not much of a note really	 Manchester	135790

I wrote a script but it capitalized all the column:

sed -e "s/\b\(.\)/\u\1/g" test.csv

Please suggest.
 
Old 03-04-2012, 11:44 AM   #2
arizonagroovejet
Senior Member
 
Registered: Jun 2005
Location: England
Distribution: openSUSE, Fedora, CentOS
Posts: 1,078

Rep: Reputation: 195Reputation: 195
Quote:
Originally Posted by linuxunix View Post

Tha CSV file look like:

Code:
firstname	lastname	password	username	notes	city	phonenumber	
fred	smith	 notgood1	 fredsmith	 this user\	 is the first in this file	 Brighton	345698
Peter	 Bloggs	 anotherbad	peterbloggs		London	987123	
Jo	 cooper	 notmuch	 jcooper	 this user is Jo	 Brighton	456987	
john	 carter	nearlyempty	jcarter	This note is actually very long\	 but really doesn't say anything very useful		345777
sam	jones	passing	 samjones	 Not much of a note really	 Manchester	135790
CSV means Comma Separated Values. Those values are not comma separated.


Quote:
Originally Posted by linuxunix View Post
I wrote a script but it capitalized all the column:
sed -e "s/\b\(.\)/\u\1/g" test.csv
Please suggest.
Lose the g.
 
Old 03-04-2012, 12:46 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685
hmmm ... not sure why this is a csv when the c is for comma and there are none?

Personally my advice would be to use awk as it is well suited to a columned input.
 
Old 03-04-2012, 08:50 PM   #4
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324
My personal recommendation would be Perl; this sort of text munging/layout changing is one of its strengths.
 
1 members found this post helpful.
Old 03-04-2012, 10:06 PM   #5
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195
Quote:
Originally Posted by arizonagroovejet View Post
CSV means Comma Separated Values. Those values are not comma separated.
...
Lose the g.
The term "CSV" is commonly (and loosely) applied when separator is something other than a comma such as this example in which it is a tab.

Without the g it will only change firstname, not lastname.

Last edited by catkin; 03-04-2012 at 10:08 PM. Reason: Added "lose the g"
 
Old 03-04-2012, 10:09 PM   #6
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195
Quote:
Originally Posted by chrism01 View Post
My personal recommendation would be Perl; this sort of text munging/layout changing is one of its strengths.
If you don't want to use Perl, awk would be a good second choice.
 
Old 03-05-2012, 01:57 AM   #7
linuxunix
Member
 
Registered: Mar 2010
Location: California
Distribution: Slackware
Posts: 235

Original Poster
Rep: Reputation: 18
I tried attempting them:
* capitalise the first letter of the two name fields

Code:
# cat t|sed -r '2,$s/(^ *|, *)([a-z])/\1\U\2/1' | sed -r '2,$s/(^ *|, *)([a-z])/\1\U\2/2' > t2.csv
It capitalizes the first column not the second. Any idea?


* sanitise the formatting
Code:
Remove leading spaces so that fields are left-aligned.
Also, remove the backspace-comma pair, so that fields are under correct column headers and all lines have same number of fields.
# cat t2.csv|sed -r 's/(^|,)( +)/\1/g' | sed -r 's/\\\,/ /g' > t3.csv
* move the username column to the beginning of each line

Code:
# cat t3.csv| awk -F',' '{print $4, $1, $2, $3, $5, $6, $7 }' OFS="," > t4.csv
* the phone number is missing the area code - look up the city in the following table, and add it to the beginning of the phone number
column

Code:
touch final.csv
while read line
do
city=`echo $line|awk -F',' '{print $6}'`
(This is required, otherwise it ll exit out after a row having a blank city)
if [ "x$city" != "x" ]	
then
code=`grep $city mapping.csv|cut -f2 -d','`
echo $line|sed "s/,/,$code/6" >>final.csv
else
echo $line >>final.csv
fi
done < t4.csv
 
Old 03-05-2012, 03:53 AM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685
May I ask if the data you have shown is what you are actually practising with? I only ask as I am at present unable to find a common delimiter, ie there are obviously no commas and
it appears it may be space based ... the issue I have with space based is how do you know where certain columns start as they seem to have spaces within the column, eg:
Code:
john	 carter	nearlyempty	jcarter	This note is actually very long\	 but really doesn't say anything very useful		345777
If we assume the portion in red is part of the notes column ... there appears no real way to know where it starts or stops (except visually)
 
Old 03-05-2012, 04:13 AM   #9
linuxunix
Member
 
Registered: Mar 2010
Location: California
Distribution: Slackware
Posts: 235

Original Poster
Rep: Reputation: 18
This is how the values are placed actually in the excel format:
Code:
firstname	lastname	password	username	      notes	                         city	     phonenumber	
fred	         smith	        notgood1	fredsmith	 this useris the first in this file	 Brighton	345698
Peter	         Bloggs	 	anotherbad	peterbloggs	 <blank for notes> 			 London		987123	
Jo	 	cooper	 	notmuch	 	jcooper	 	this user is Jo	 			 Brighton	456987	
john		 carter		nearlyempty	jcarter		This note is actually very long	 but really doesn't say anything very useful	<blank for city>   345777
sam		jones		passing	 	samjones	 Not much of a note really	 	Manchester	135790
I have quoted blank status for column which are empty.

Last edited by linuxunix; 03-05-2012 at 04:15 AM.
 
Old 03-05-2012, 07:57 AM   #10
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195
Quote:
Originally Posted by grail View Post
May I ask if the data you have shown is what you are actually practising with? I only ask as I am at present unable to find a common delimiter, ie there are obviously no commas and
it appears it may be space based
All the data I explored in the OP was tab-delimited (and some fields had leading spaces).
 
Old 03-05-2012, 09:27 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685Reputation: 2685
Quote:
Originally Posted by catkin
All the data I explored in the OP was tab-delimited (and some fields had leading spaces).
Must just be me then because the last list of data posted seems to have multiple spaces in some parts and tabs and spaces in others.
So for me at least, I can find no consistency to build on, however, I will assume tabs with maybe spaces and hence the following should work
for the first 3 items:
Code:
awk 'BEGIN{FS=" *\t *";OFS=","}NR>1{sub(/^./,toupper(substr($1,1,1)),$1);sub(/^./,toupper(substr($2,1,1)),$2);gsub(/\\/,"")}$1=$1' file
The last part should be easy.
 
Old 03-15-2015, 03:04 PM   #12
speed0212
LQ Newbie
 
Registered: Mar 2015
Posts: 2

Rep: Reputation: Disabled
The correct CVS file for this exercise looks like this:


firstname,lastname,password,username,notes,city,phonenumber
fred,smith, notgood1, fredsmith, this user\, is the first in this file, Brighton,345698
Peter, Bloggs, anotherbad,peterbloggs,,London,987123
Jo, cooper, notmuch, jcooper, this user is Jo, Brighton, 456987
john, carter,nearlyempty,jcarter,This note is actually very long\, but really doesn't say anything very useful,,345777
sam,jones,passing, samjones, Not much of a note really, Manchester, 135790

I applied for VoIP engineer role with some experience in Linux and they provided me with the same test to solve but as you can see this is the correct file to format.

Good luck
 
Old 03-15-2015, 03:22 PM   #13
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 17,933

Rep: Reputation: 3692Reputation: 3692Reputation: 3692Reputation: 3692Reputation: 3692Reputation: 3692Reputation: 3692Reputation: 3692Reputation: 3692Reputation: 3692Reputation: 3692
Quote:
Originally Posted by speed0212 View Post
The correct CVS file for this exercise looks like this:


firstname,lastname,password,username,notes,city,phonenumber
fred,smith, notgood1, fredsmith, this user\, is the first in this file, Brighton,345698
Peter, Bloggs, anotherbad,peterbloggs,,London,987123
Jo, cooper, notmuch, jcooper, this user is Jo, Brighton, 456987
john, carter,nearlyempty,jcarter,This note is actually very long\, but really doesn't say anything very useful,,345777
sam,jones,passing, samjones, Not much of a note really, Manchester, 135790

I applied for VoIP engineer role with some experience in Linux and they provided me with the same test to solve but as you can see this is the correct file to format.

Good luck
That's the correct output, but the question was how to GET that output...also, this thread had been closed for three years at this point.
 
Old 03-16-2015, 08:23 PM   #14
geolykos
LQ Newbie
 
Registered: Mar 2015
Posts: 1

Rep: Reputation: Disabled
I haven't even applied for a job, I just got a call from an agent about an opportunity with linux etc and I was sent this test. It seems to me that it's at least advanced to expert linux knowledge test. Anyone got the correct answer for the first one?

Don't even know why people provide these tests for non-expert linux jobs.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Map 1 CSV's columns to matching columns in another CSV 2legit2quit Programming 7 10-27-2011 09:53 AM
XML to CSV grob115 Programming 11 04-12-2010 12:58 PM
Using awk/sed to convert linefeed to csv, with some formatting jaykup Programming 1 04-03-2009 06:18 PM
Comparing two csv files and write different record in third CSV file irfanb146 Linux - Newbie 3 06-30-2008 10:15 PM
CSV and CUT wwnexc Linux - Software 11 01-27-2006 12:23 AM


All times are GMT -5. The time now is 09:09 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration