LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 01-25-2012, 10:53 AM   #1
VijayaRaghavanLakshman
LQ Newbie
 
Registered: Jan 2012
Posts: 10

Rep: Reputation: Disabled
How to read CSV data and compare the column values and then write them in new file


Hi All,

I am trying to read a csv file line by line & process each line based on some column values.

have csv with few columns & in column 3 & 5 all records won't be having data in it.


ie,

1. read csv file line by line - i have done that

2. after ready a line, call sub function processLine() - done that

3. in processLine(), need to

check if column 3(Address Town) and column 5(Postcode) are empty, if yes, then don't write the entire line of record into new file, if not then write them in new csv file.


At step 3, am not sure how to read specific column in shell script.


csv file look like this
Code:
Store Ref	Location Name	Address	Town	County		Postcode	Region
200	        Wel Good 	50 Howar gate	Welwyn City	123 ghk		RCOMS
430     	Colchester	5 Cul Square	Col		
4302     	Colchester 1			ster	        ddd 453		RSOTE
43022   	Saffron Walden	2 Kong Street	Saf     	fff 888		RCTE

output file should look like :
Code:
Store Ref	Location Name	Address	Town	County		Postcode	Region
200	        Wel Good 	50 Howar gate	Welwyn City	123 ghk		RCOMS


43022   	Saffron Walden	2 Kong Street	Saf     	fff 888		RCTEOTE

Code:
#!/bin/bash


# User define Function (UDF)
processLine(){


   F1=$(echo $line | awk '{ print $9 }')

  echo $F1

  

}



while IFS="," read line
do
	# use $line variable to process line in processLine() function
	processLine $line

done < C1.csv
exit 1

somehelp please

Thanks,
Vijay

Last edited by VijayaRaghavanLakshman; 01-25-2012 at 10:57 AM.
 
Old 01-25-2012, 11:13 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191
How about we help by asking questions and pointing out some problems:

1. What do you think the following line is doing?
Code:
while IFS="," read line
2. You are passing a variable to your function but then just call the variable anyway???? (why did you bother to pass it)

3. The line below is trying to return the ninth field. Are you sure there are that many in the data you passed in?
Code:
F1=$(echo $line | awk '{ print $9 }')
4. Finally, it appears that you have chosen to say to anything else using your script that when it completes it always fails? (is this what you want)
Code:
exit 1
 
Old 01-25-2012, 11:16 AM   #3
suicidaleggroll
LQ Guru
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573

Rep: Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142
Is that really what the file looks like? CSV stands for "comma separated variable". I see no commas in your file, therefore it's not a csv file.

We need to know more details about the file. How are the fields delimited? It's not csv, so is it delimited by tabs, is it fixed format? The way you read it will depend on how it's actually formatted.


FYI: this is what your file would look like in csv format:
Code:
Store Ref,Location Name,Address Town,County,Postcode,Region
200,Wel Good,50 Howar gate,Welwyn City,123 ghk,RCOMS
430,Colchester,5 Cul Square,Col,,		
4302,Colchester 1,,ster,ddd 453,RSOTE
43022,Saffron Walden,2 Kong Street,Saf,fff 888,RCTE
If it doesn't look like that, it's not csv.

Last edited by suicidaleggroll; 01-25-2012 at 11:27 AM.
 
Old 01-26-2012, 12:06 AM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208
In fairness, the term CSV is commonly used for files in which the delimiter character is not a comma.

@VijayaRaghavanLakshman: is it guaranteed there are no commas in the data, that is the only commas in the input file are field delimiters?
 
Old 01-26-2012, 02:18 AM   #5
VijayaRaghavanLakshman
LQ Newbie
 
Registered: Jan 2012
Posts: 10

Original Poster
Rep: Reputation: Disabled
Thanks for the reply grail,suicidaleggroll and catkin

@ grail,

1. What do you think the following line is doing? -- Internal Field Separator, since the input file is comma separator, i thought we should mentioned this line.

please correct me if i have understood wrong, this line of code, reads the input file as comma separator.

Code:
while IFS="," read line
2. Didn't understand what you are mentioned in the point number 2

3. Yes, there are around 15-18 fields (columns) in the input file, but in first 5 columns (fields) all my above mentioned requirement are present. so i tried to see if am able to fetch the column, if so then i taught will store it in some variable & compare the values with null/empty

4. it was my mistake, thanks for pointing it out.

As a senior memeber please forgive my mistakes and help me out..

@suicidaleggroll and catkin,

As mentioned by suicidaleggroll, input file has comma (,) in b/w the data's

Sorry i didn't mentionined input file with coma separated.

Thanks,
Vijay
 
Old 01-26-2012, 09:00 AM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191
Quote:
1. What do you think the following line is doing? -- Internal Field Separator, since the input file is comma separator, i thought we should mentioned this line.

please correct me if i have understood wrong, this line of code, reads the input file as comma separator.
This is partially correct. As you have set the IFS variable to a comma it will not read in the whole line, as IFS would need to be a new line for that to happen, so it will read up until
it encounters the first comma. Therefore my reason for asking the question is for you to realise that you will only have a single column stored in line.

Quote:
2. Didn't understand what you are mentioned in the point number 2
Yes sorry, this was not very clear ... let me try again.

You have the following line in your code:
Code:
processLine $line
This calls your function and passes the value stored in the variable 'line' as the first positional parameter.
Inside your function though you then refer to the variable 'line' directly:
Code:
F1=$(echo $line | awk '{ print $9 }')
My point here is that you either:

a. Call the function as you have and then refer to the variable passed as a positional parameter, ie in this case it would be $1 (this would be my preferred method to avoid confusion)

b. Call the function without any positional parameters and continue to use the same variable throughout your code (which in more complicated problems will often cause issues)

Hope that is a little clearer

Quote:
3. Yes, there are around 15-18 fields (columns) in the input file, but in first 5 columns (fields) all my above mentioned requirement are present. so i tried to see if am able to fetch the column, if so then i taught will store it in some variable & compare the values with null/empty
This refers back to point 1 above in that you will not have 15 - 18 columns but rather a single column from each line, ie the first column, which of course will not allow the awk command
to return what you are expecting.

Please do not take me the wrong way. I am pointing out the errors so you will learn from them

Please advise if this has helped and you have understood my points?
 
Old 01-26-2012, 09:13 AM   #7
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208
Quote:
Originally Posted by grail View Post
As you have set the IFS variable to a comma it will not read in the whole line, as IFS would need to be a new line for that to happen, so it will read up until it encounters the first comma.
Hello grail

That is not correct (I tested to be sure). The characters in IFS are used for word-splitting the "lines" read by the read builtin, not for line-splitting which is done at newlines unless otherwise set by read's -d option.

Here's testing the effect of IFS=',':
Code:
c@CW8:/tmp$ echo 'a,b,c
> d,e,f' > input.txt
c@CW8:/tmp$ while IFS=',' read line
> do
>    echo "$line"
> done < input.txt
a,b,c
d,e,f
c@CW8:/tmp$ while IFS=',' read f1 f2 rest
> do 
>     echo "f1 is $f1, rest is $rest"
> done < input.txt
f1 is a, rest is c
f1 is d, rest is f
 
Old 01-26-2012, 09:39 AM   #8
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037
Quote:
Originally Posted by grail View Post
This is partially correct. As you have set the IFS variable to a comma it will not read in the whole line, as IFS would need to be a new line for that to happen, so it will read up until it encounters the first comma.
Actually, this is incorrect. The IFS setting determines what characters the line is split on if you read it into multiple variables or an array. But the command itself will still accept up to the next newline in the input. It's the -d and -n options that let you change the stopping point to something else.

So setting IFS to another character is almost pointless when used with a single variable, except for one thing: if the input string contains leading or trailing whitespace, the default IFS will strip them off. You have to set it to a non-whitespace value (or nothing) in order to preserve them.


Edit: catkin, will you please stop doing that?!

Last edited by David the H.; 01-26-2012 at 09:41 AM.
 
Old 01-26-2012, 12:37 PM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191
My bad guys ... thanks for the clarification Must have been on something when writing that bit
 
Old 01-26-2012, 09:02 PM   #10
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208Reputation: 1208
Quote:
Originally Posted by David the H. View Post
So setting IFS to another character is almost pointless when used with a single variable, except for one thing: if the input string contains leading or trailing whitespace, the default IFS will strip them off. You have to set it to a non-whitespace value (or nothing) in order to preserve them.

Edit: catkin, will you please stop doing that?!
Thanks David the H,. that explanation of IFS usage was the missing piece of the jigsaw required for this post.

Great minds think alike, eh? And at the same time
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
awk - rearrange column data in csv file to match columns wolverene13 Programming 9 12-21-2011 04:55 AM
[SOLVED] Read a number x from a file and write a character in the xth column of other file homexylo Programming 7 08-05-2011 05:28 AM
How to redirect the output to a different column in .csv file Sayan Acharjee Linux - General 2 04-25-2011 11:43 AM
Rearrange the column in CSV file.. govi1234 Linux - Newbie 9 08-27-2010 07:15 PM
Comparing two csv files and write different record in third CSV file irfanb146 Linux - Newbie 3 06-30-2008 09:15 PM

LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie

All times are GMT -5. The time now is 12:55 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration