Bash: How to remove rows in CSVs where Price column is less than 30

guest · 02-23-2009, 12:22 PM

I have tons of CSV/TXT files (~5 GB across 300 files). Is there a way to edit all of them so that any row with a price less than 30 is simply removed?

Say I have this in my csv:

Code:

ID,NAME,PRICE
1,BLUE,25
2,GREEN,36
3,TEAL,37
4,BLACK,0
5,PURPLE,10

After the command, the csv will have this:

Code:

ID,NAME,PRICE
2,GREEN,36
3,TEAL,37

Thanks in advance

colucix · 02-23-2009, 12:29 PM

Code:

awk -F, '$NF >= 30' file.csv

guest · 02-23-2009, 12:35 PM

The problem is the actual file looks like this:

Code:

ID,NAME,PRICE,SHIPPING
1,BLUE,25,5.99
2,GREEN,36,2.99
3,TEAL,37,6.99
4,BLACK,0,1.99
5,PURPLE,10,9.99

so the desired output is like this:

Code:

ID,NAME,PRICE,SHIPPING
2,GREEN,36,2.99
3,TEAL,37,6.99

I don't see how your code identifies the PRICE column explicitly. Thanks though, colucix!!

colucix · 02-23-2009, 12:52 PM

Code:

awk -F, '$(NF-1) >= 30' file.csv

The -F, option tells awk to use a comma as the field separator. NF is the number of fields on the line, so $(NF-1) is the column just before the last one. That is, I count the columns backward from the last one. If you want to count from the first column onward, just do

Code:

awk -F, '$3 >= 30' file.csv
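
If the position of the price column differs between files, one option is to look the column up by its header name instead of hard-coding $3. This is only a rough sketch: the helper name filter_min_price and the file sample.csv are made up for illustration, and it assumes every file has a header line and no quoted fields containing commas.

```shell
# Work in a scratch directory so nothing real gets touched.
cd "$(mktemp -d)" || exit 1

# Sample data copied from the post above (sample.csv is a made-up name).
cat > sample.csv <<'EOF'
ID,NAME,PRICE,SHIPPING
1,BLUE,25,5.99
2,GREEN,36,2.99
3,TEAL,37,6.99
4,BLACK,0,1.99
5,PURPLE,10,9.99
EOF

# filter_min_price FILE COLUMN MIN  (hypothetical helper, not a standard tool)
filter_min_price() {
    awk -F, -v col="$2" -v min="$3" '
        # Header line: remember which field is named col, print it, move on.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; print; next }
        # Data lines: keep the row only if that field is numerically >= min.
        # If the column was never found, c is 0 and no data row is printed.
        c && $c + 0 >= min
    ' "$1"
}

filter_min_price sample.csv PRICE 30
```

With the sample data this should print the header followed by the GREEN and TEAL rows.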

guest · 02-23-2009, 12:58 PM

This is genius.. because it's so simple! thanks again colucix

guest · 04-11-2009, 03:06 PM

Is there a way to execute the command: awk -F, '$3 >= 30' file.csv

and have the output saved as file.csv as well?

I have many files and wish to edit the existing file instead of making new ones.

Code:

awk -F, '$3 >= 30' file.csv > file.csv

doesn't seem to work

colucix · 04-11-2009, 04:39 PM

Nope. Awk cannot edit files in place. You have to use sed with the -i option, but since sed cannot test numeric expressions, you have to find a regular expression that matches prices less (or more) than 30. You can try something like:

Code:

sed -i.bck '/^[^,]*,[^,]*,[0-9],/d;/^[^,]*,[^,]*,[0-2][0-9],/d' file.csv

That is, you spell out the comma-separated fields explicitly: ^[^,]*,[^,]*, skips the first two fields, so the rest of the pattern tests the third one. The third field will be either a one-digit number between 0 and 9 or a two-digit number between 00 and 29, so lines with a price less than 30 are deleted. Note that this only matches whole-number prices; a price such as 29.99 would not be caught.

The -i.bck option makes a backup copy of the original file with the .bck extension added. This is for safety. Once you've checked the result, you can easily remove the backup files with rm *.bck.

bigearsbilly · 04-11-2009, 05:19 PM

yes,
when you do something like,

Code:

command infile > outfile

the output file is created and/or truncated initially.
hence, if it's the same name, you destroy the file before you open it.

still never mind eh?

one to remember.
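
The truncation described above is easy to see on a throwaway file. A small sketch, assuming a scratch directory and the made-up names file.csv, copy.csv and file.csv.tmp; the NR == 1 test is added here only to keep the header line explicitly.

```shell
# Work in a scratch directory so nothing real gets touched.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'ID,NAME,PRICE,SHIPPING' '1,BLUE,25,5.99' '2,GREEN,36,2.99' > file.csv
cp file.csv copy.csv

# Broken: the shell truncates copy.csv before awk ever opens it,
# so awk reads an empty file and writes nothing back.
awk -F, 'NR == 1 || $3 >= 30' copy.csv > copy.csv
wc -c < copy.csv

# Safer: write to a different name first, and replace the original
# only if awk exited successfully.
awk -F, 'NR == 1 || $3 >= 30' file.csv > file.csv.tmp && mv file.csv.tmp file.csv
cat file.csv
```

The byte count for copy.csv should come out as zero, while file.csv keeps its header and the GREEN row.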

guest · 04-11-2009, 06:15 PM

For now, this is what I did:

Code:

for file in *
do
awk -F, '$3 >= 30' "$file" > edited."$file"
done

But what's an easy way to rename edited.filename.csv back to filename.csv?
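
One possible way to do the rename is the shell's ${var#prefix} expansion, which strips a leading string from a variable. A minimal sketch, assuming the edited. prefix from the loop above; edited.colors.csv is a made-up file name used only for the demonstration.

```shell
# Work in a scratch directory so nothing real gets touched.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'ID,NAME,PRICE' '2,GREEN,36' > edited.colors.csv

for f in edited.*; do
    # Skip the literal pattern if no file matched the glob.
    [ -e "$f" ] || continue
    # ${f#edited.} is $f with the leading "edited." removed.
    mv -- "$f" "${f#edited.}"
done
```

Note that mv overwrites an existing file of the same name, which is the intent here but worth knowing before running it on the real data.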

bigearsbilly · 04-11-2009, 06:20 PM

Code:

for file in *
do
mv "$file" "$file.old"
awk -F, '$3 >= 30' "$file.old" > "$file"
done
rm *.old

laterally?

guest · 04-11-2009, 06:44 PM

That works

I did something similar.. just renaming operations after the awk command
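
For the whole directory, the rename-after-awk idea could be wrapped up roughly as below. This is a sketch under assumptions, not a tested recipe for the real data: it only touches names matching *.csv, assumes the price is always the third column, and uses made-up sample files a.csv and b.csv. Each original is replaced only if awk succeeded on it.

```shell
# Work in a scratch directory so nothing real gets touched.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'ID,NAME,PRICE,SHIPPING' '1,BLUE,25,5.99' '2,GREEN,36,2.99' > a.csv
printf '%s\n' 'ID,NAME,PRICE,SHIPPING' '3,TEAL,37,6.99' '4,BLACK,0,1.99' > b.csv

for file in *.csv; do
    # Skip the literal pattern if nothing matched, and anything that is not a regular file.
    [ -f "$file" ] || continue
    # Temp file next to the original, so the final mv is a plain rename.
    tmp=$(mktemp "$file.XXXXXX") || exit 1
    if awk -F, 'NR == 1 || $3 >= 30' "$file" > "$tmp"; then
        mv -- "$tmp" "$file"
    else
        # Leave the original untouched if awk failed.
        rm -f -- "$tmp"
        echo "skipped $file" >&2
    fi
done
```

One thing to be aware of: mktemp creates its file with restrictive permissions, so the replaced files may end up readable only by their owner.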