How to read CSV data and compare the column values and then write them in new file

VijayaRaghavanLakshman · 01-25-2012, 10:53 AM

Hi All,

I am trying to read a csv file line by line & process each line based on some column values.

have csv with few columns & in column 3 & 5 all records won't be having data in it.

ie,

1. read csv file line by line - i have done that

2. after ready a line, call sub function processLine() - done that

3. in processLine(), need to

check if column 3(Address Town) and column 5(Postcode) are empty, if yes, then don't write the entire line of record into new file, if not then write them in new csv file.

At step 3, am not sure how to read specific column in shell script.

csv file look like this

Code:

Store Ref	Location Name	Address	Town	County		Postcode	Region
200	        Wel Good 	50 Howar gate	Welwyn City	123 ghk		RCOMS
430     	Colchester	5 Cul Square	Col		
4302     	Colchester 1			ster	        ddd 453		RSOTE
43022   	Saffron Walden	2 Kong Street	Saf     	fff 888		RCTE

output file should look like :

Code:

Store Ref	Location Name	Address	Town	County		Postcode	Region
200	        Wel Good 	50 Howar gate	Welwyn City	123 ghk		RCOMS


43022   	Saffron Walden	2 Kong Street	Saf     	fff 888		RCTEOTE

Code:

#!/bin/bash


# User define Function (UDF)
processLine(){


   F1=$(echo $line | awk '{ print $9 }')

  echo $F1

  

}



while IFS="," read line
do
	# use $line variable to process line in processLine() function
	processLine $line

done < C1.csv
exit 1

somehelp please

Thanks,
Vijay

grail · 01-25-2012, 11:13 AM

How about we help by asking questions and pointing out some problems:

1. What do you think the following line is doing?

Code:

while IFS="," read line

2. You are passing a variable to your function but then just call the variable anyway???? (why did you bother to pass it)

3. The line below is trying to return the ninth field. Are you sure there are that many in the data you passed in?

Code:

F1=$(echo $line | awk '{ print $9 }')

4. Finally, it appears that you have chosen to say to anything else using your script that when it completes it always fails? (is this what you want)

Code:

exit 1

suicidaleggroll · 01-25-2012, 11:16 AM

Is that really what the file looks like? CSV stands for "comma separated variable". I see no commas in your file, therefore it's not a csv file.

We need to know more details about the file. How are the fields delimited? It's not csv, so is it delimited by tabs, is it fixed format? The way you read it will depend on how it's actually formatted.

FYI: this is what your file would look like in csv format:

Code:

Store Ref,Location Name,Address Town,County,Postcode,Region
200,Wel Good,50 Howar gate,Welwyn City,123 ghk,RCOMS
430,Colchester,5 Cul Square,Col,,		
4302,Colchester 1,,ster,ddd 453,RSOTE
43022,Saffron Walden,2 Kong Street,Saf,fff 888,RCTE

If it doesn't look like that, it's not csv.

catkin · 01-26-2012, 12:06 AM

In fairness, the term CSV is commonly used for files in which the delimiter character is not a comma.

@VijayaRaghavanLakshman: is it guaranteed there are no commas in the data, that is the only commas in the input file are field delimiters?

VijayaRaghavanLakshman · 01-26-2012, 02:18 AM

Thanks for the reply grail,suicidaleggroll and catkin

@ grail,

1. What do you think the following line is doing? -- Internal Field Separator, since the input file is comma separator, i thought we should mentioned this line.

please correct me if i have understood wrong, this line of code, reads the input file as comma separator.

Code:

while IFS="," read line

2. Didn't understand what you are mentioned in the point number 2

3. Yes, there are around 15-18 fields (columns) in the input file, but in first 5 columns (fields) all my above mentioned requirement are present. so i tried to see if am able to fetch the column, if so then i taught will store it in some variable & compare the values with null/empty

4. it was my mistake, thanks for pointing it out.

As a senior memeber please forgive my mistakes and help me out..

@suicidaleggroll and catkin,

As mentioned by suicidaleggroll, input file has comma (,) in b/w the data's

Sorry i didn't mentionined input file with coma separated.

Thanks,
Vijay

grail · 01-26-2012, 09:00 AM

Quote:

1. What do you think the following line is doing? -- Internal Field Separator, since the input file is comma separator, i thought we should mentioned this line.

please correct me if i have understood wrong, this line of code, reads the input file as comma separator.

This is partially correct. As you have set the IFS variable to a comma it will not read in the whole line, as IFS would need to be a new line for that to happen, so it will read up until
it encounters the first comma. Therefore my reason for asking the question is for you to realise that you will only have a single column stored in line.

Quote:

2. Didn't understand what you are mentioned in the point number 2

Yes sorry, this was not very clear ... let me try again.

You have the following line in your code:

Code:

processLine $line

This calls your function and passes the value stored in the variable 'line' as the first positional parameter.
Inside your function though you then refer to the variable 'line' directly:

Code:

F1=$(echo $line | awk '{ print $9 }')

My point here is that you either:

a. Call the function as you have and then refer to the variable passed as a positional parameter, ie in this case it would be $1 (this would be my preferred method to avoid confusion)

b. Call the function without any positional parameters and continue to use the same variable throughout your code (which in more complicated problems will often cause issues)

Hope that is a little clearer

Quote:

3. Yes, there are around 15-18 fields (columns) in the input file, but in first 5 columns (fields) all my above mentioned requirement are present. so i tried to see if am able to fetch the column, if so then i taught will store it in some variable & compare the values with null/empty

This refers back to point 1 above in that you will not have 15 - 18 columns but rather a single column from each line, ie the first column, which of course will not allow the awk command
to return what you are expecting.

Please do not take me the wrong way. I am pointing out the errors so you will learn from them

Please advise if this has helped and you have understood my points?

catkin · 01-26-2012, 09:13 AM

Quote:

Originally Posted by grail

As you have set the IFS variable to a comma it will not read in the whole line, as IFS would need to be a new line for that to happen, so it will read up until it encounters the first comma.

Hello grail

That is not correct (I tested to be sure). The characters in IFS are used for word-splitting the "lines" read by the read builtin, not for line-splitting which is done at newlines unless otherwise set by read's -d option.

Here's testing the effect of IFS=',':

Code:

c@CW8:/tmp$ echo 'a,b,c
> d,e,f' > input.txt
c@CW8:/tmp$ while IFS=',' read line
> do
>    echo "$line"
> done < input.txt
a,b,c
d,e,f
c@CW8:/tmp$ while IFS=',' read f1 f2 rest
> do 
>     echo "f1 is $f1, rest is $rest"
> done < input.txt
f1 is a, rest is c
f1 is d, rest is f

David the H. · 01-26-2012, 09:39 AM

Quote:

Originally Posted by grail

This is partially correct. As you have set the IFS variable to a comma it will not read in the whole line, as IFS would need to be a new line for that to happen, so it will read up until it encounters the first comma.

Actually, this is incorrect. The IFS setting determines what characters the line is split on if you read it into multiple variables or an array. But the command itself will still accept up to the next newline in the input. It's the -d and -n options that let you change the stopping point to something else.

So setting IFS to another character is almost pointless when used with a single variable, except for one thing: if the input string contains leading or trailing whitespace, the default IFS will strip them off. You have to set it to a non-whitespace value (or nothing) in order to preserve them.

Edit: catkin, will you please stop doing that?!

grail · 01-26-2012, 12:37 PM

My bad guys ... thanks for the clarification

Must have been on something when writing that bit

catkin · 01-26-2012, 09:02 PM

Quote:

Originally Posted by David the H.

So setting IFS to another character is almost pointless when used with a single variable, except for one thing: if the input string contains leading or trailing whitespace, the default IFS will strip them off. You have to set it to a non-whitespace value (or nothing) in order to preserve them.

Edit: catkin, will you please stop doing that?!

Thanks David the H,. that explanation of IFS usage was the missing piece of the jigsaw required for this post.

Great minds think alike, eh? And at the same time