How to read CSV data and compare the column values and then write them in new file
Linux - NewbieThis Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
How to read CSV data and compare the column values and then write them in new file
Hi All,
I am trying to read a csv file line by line & process each line based on some column values.
have csv with few columns & in column 3 & 5 all records won't be having data in it.
ie,
1. read csv file line by line - i have done that
2. after ready a line, call sub function processLine() - done that
3. in processLine(), need to
check if column 3(Address Town) and column 5(Postcode) are empty, if yes, then don't write the entire line of record into new file, if not then write them in new csv file.
At step 3, am not sure how to read specific column in shell script.
csv file look like this
Code:
Store Ref Location Name Address Town County Postcode Region
200 Wel Good 50 Howar gate Welwyn City 123 ghk RCOMS
430 Colchester 5 Cul Square Col
4302 Colchester 1 ster ddd 453 RSOTE
43022 Saffron Walden 2 Kong Street Saf fff 888 RCTE
output file should look like :
Code:
Store Ref Location Name Address Town County Postcode Region
200 Wel Good 50 Howar gate Welwyn City 123 ghk RCOMS
43022 Saffron Walden 2 Kong Street Saf fff 888 RCTEOTE
Code:
#!/bin/bash
# User define Function (UDF)
processLine(){
F1=$(echo $line | awk '{ print $9 }')
echo $F1
}
while IFS="," read line
do
# use $line variable to process line in processLine() function
processLine $line
done < C1.csv
exit 1
somehelp please
Thanks,
Vijay
Last edited by VijayaRaghavanLakshman; 01-25-2012 at 10:57 AM.
Is that really what the file looks like? CSV stands for "comma separated variable". I see no commas in your file, therefore it's not a csv file.
We need to know more details about the file. How are the fields delimited? It's not csv, so is it delimited by tabs, is it fixed format? The way you read it will depend on how it's actually formatted.
FYI: this is what your file would look like in csv format:
Thanks for the reply grail,suicidaleggroll and catkin
@ grail,
1. What do you think the following line is doing? -- Internal Field Separator, since the input file is comma separator, i thought we should mentioned this line.
please correct me if i have understood wrong, this line of code, reads the input file as comma separator.
Code:
while IFS="," read line
2. Didn't understand what you are mentioned in the point number 2
3. Yes, there are around 15-18 fields (columns) in the input file, but in first 5 columns (fields) all my above mentioned requirement are present. so i tried to see if am able to fetch the column, if so then i taught will store it in some variable & compare the values with null/empty
4. it was my mistake, thanks for pointing it out.
As a senior memeber please forgive my mistakes and help me out..
@suicidaleggroll and catkin,
As mentioned by suicidaleggroll, input file has comma (,) in b/w the data's
Sorry i didn't mentionined input file with coma separated.
1. What do you think the following line is doing? -- Internal Field Separator, since the input file is comma separator, i thought we should mentioned this line.
please correct me if i have understood wrong, this line of code, reads the input file as comma separator.
This is partially correct. As you have set the IFS variable to a comma it will not read in the whole line, as IFS would need to be a new line for that to happen, so it will read up until
it encounters the first comma. Therefore my reason for asking the question is for you to realise that you will only have a single column stored in line.
Quote:
2. Didn't understand what you are mentioned in the point number 2
Yes sorry, this was not very clear ... let me try again.
You have the following line in your code:
Code:
processLine $line
This calls your function and passes the value stored in the variable 'line' as the first positional parameter.
Inside your function though you then refer to the variable 'line' directly:
Code:
F1=$(echo $line | awk '{ print $9 }')
My point here is that you either:
a. Call the function as you have and then refer to the variable passed as a positional parameter, ie in this case it would be $1 (this would be my preferred method to avoid confusion)
b. Call the function without any positional parameters and continue to use the same variable throughout your code (which in more complicated problems will often cause issues)
Hope that is a little clearer
Quote:
3. Yes, there are around 15-18 fields (columns) in the input file, but in first 5 columns (fields) all my above mentioned requirement are present. so i tried to see if am able to fetch the column, if so then i taught will store it in some variable & compare the values with null/empty
This refers back to point 1 above in that you will not have 15 - 18 columns but rather a single column from each line, ie the first column, which of course will not allow the awk command
to return what you are expecting.
Please do not take me the wrong way. I am pointing out the errors so you will learn from them
Please advise if this has helped and you have understood my points?
As you have set the IFS variable to a comma it will not read in the whole line, as IFS would need to be a new line for that to happen, so it will read up until it encounters the first comma.
Hello grail
That is not correct (I tested to be sure). The characters in IFS are used for word-splitting the "lines" read by the read builtin, not for line-splitting which is done at newlines unless otherwise set by read's -d option.
Here's testing the effect of IFS=',':
Code:
c@CW8:/tmp$ echo 'a,b,c
> d,e,f' > input.txt
c@CW8:/tmp$ while IFS=',' read line
> do
> echo "$line"
> done < input.txt
a,b,c
d,e,f
c@CW8:/tmp$ while IFS=',' read f1 f2 rest
> do
> echo "f1 is $f1, rest is $rest"
> done < input.txt
f1 is a, rest is c
f1 is d, rest is f
This is partially correct. As you have set the IFS variable to a comma it will not read in the whole line, as IFS would need to be a new line for that to happen, so it will read up until it encounters the first comma.
Actually, this is incorrect. The IFS setting determines what characters the line is split on if you read it into multiple variables or an array. But the command itself will still accept up to the next newline in the input. It's the -d and -n options that let you change the stopping point to something else.
So setting IFS to another character is almost pointless when used with a single variable, except for one thing: if the input string contains leading or trailing whitespace, the default IFS will strip them off. You have to set it to a non-whitespace value (or nothing) in order to preserve them.
Edit: catkin, will you please stop doing that?!
Last edited by David the H.; 01-26-2012 at 09:41 AM.
So setting IFS to another character is almost pointless when used with a single variable, except for one thing: if the input string contains leading or trailing whitespace, the default IFS will strip them off. You have to set it to a non-whitespace value (or nothing) in order to preserve them.
Edit: catkin, will you please stop doing that?!
Thanks David the H,. that explanation of IFS usage was the missing piece of the jigsaw required for this post.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.