remove the extra numeric field in a text file

powah · 01-05-2008, 09:38 AM

How to remove the extra numeric field in a text file?
e.g. for the following text file, I want to remove $2,299.11 from the
first line, $2,292.86 from the second line, $2,170.08 from the fourth
line and $2,286.08 from the last line.

Oct. 01, 2007 CITY OF VANCOUVER $456.48 $2,299.11 Property taxes
Oct. 02, 2007 TD GEN INS $66.25 $2,292.86 Car insurance
Oct. 09, 2007 BELAIR HABITAT. BELAIR INS/ASS. $40.95 Home insurance
Oct. 09, 2007 ENBRIDGE ENBRIDGE $45.91 $2,170.08 Home heating
Nov. 05, 2007 CANADIAN TIRE # $22.74 Home general maintenance
Dec. 10, 2007 CANADIAN TIRE # $44.21 Home general maintenance
Dec. 19, 2007 CANADIAN TIRE # $71.48 $2,286.08 Home general maintenance

Telemachos · 01-05-2008, 10:39 AM

Here is a small Perl script that seems to work (at least on the data you gave). At the moment it just prints the changed version to your screen, but it should be easy enough to redirect that into a new file (or edit the script so that it saves the changed version itself). I just wanted you to be able to test it safely before saving anything or changing your data. I hope that this helps:

Code:

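#!/usr/bin/perl
use strict;
use warnings;

# Read each line of the file(s) named on the command line (or STDIN).
while (<>) {
    # Two dollar amounts separated only by whitespace: keep just the first.
    s/(\$\d+,?\d+\.\d{2})\s+(\$\d+,?\d+\.\d{2})/$1/;
    print;
}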

Save this as "delete_numbers" and then run it with, say,

Code:

perl delete_numbers file_to_change

That way you can check whether it works across a whole real file before deciding what to do. Note that this will only work if the two numbers are next to one another, separated by only spaces. If the file gets more complicated (e.g., with words between the two numbers, or the numbers on separate lines), this won't work.

pixellany · 01-05-2008, 12:37 PM

Life would not be complete without a solution using SED....

Code:

sed 's/\$[0-9]\+,\?[0-9]\+\.[0-9]\{2\}//2' filename > newfilename

Disclosure: I looked at the Perl solution, and also had to learn that the "+" has to be escaped in sed.

EDIT: fixed omitted "\"
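
A quick way to check the command before trusting it with a real file is to feed it a few of the sample lines. The sketch below wraps it in a function, `strip_second_amount` (a made-up name), and makes two assumptions beyond the command above: it uses `sed -E` (extended regex, where `+` and `?` need no backslash), and it also consumes the spaces in front of each amount so that no double space is left behind.

```shell
strip_second_amount() {
    # " *" swallows the spaces before each amount; the trailing 2 means
    # "replace only the second match on each line".
    sed -E 's/ *\$[0-9]+,?[0-9]+\.[0-9]{2}//2'
}

printf '%s\n' \
  'Oct. 01, 2007 CITY OF VANCOUVER $456.48 $2,299.11 Property taxes' \
  'Nov. 05, 2007 CANADIAN TIRE # $22.74 Home general maintenance' |
  strip_second_amount
```

Lines with only one amount have no second match, so they pass through unchanged.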

ghostdog74 · 01-06-2008, 02:54 AM

In bash, tested only on your sample file.

Code:

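while IFS= read -r line
do
    # $line is left unquoted on purpose so it is split into words
    # (this would also expand glob characters such as "*" in the data).
    for item in $line
    do
        case $item in
        # Any word that starts with "$" and contains a comma is dropped,
        # together with the space in front of it.
        "$"*","* ) line=${line/ "$item"/};;
        esac
    done
    printf '%s\n' "$line"
done < "file"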

yawe_frek · 01-06-2008, 10:32 AM

hi pixellany,

For people like me still learning sed, could you kindly explain

sed 's/$[0-9]\+,\?[0-9]\+\.[0-9]\{2\}//2' filename > newfilename

Thanks

pixellany · 01-06-2008, 11:42 AM

Quote:

Originally Posted by yawe_frek

hi pixellany,

For people like me still learning sed, could you kindly explain

sed 's/$[0-9]\+,\?[0-9]\+\.[0-9]\{2\}//2' filename > newfilename

Thanks

The basic syntax for sed's "s" command, in this context, is:
sed 's/thingtofind//2' filename > newfilename
This means "using filename, find thingtofind and replace the 2nd occurrence on each line with nothing. Write the result to newfilename."

Now to translate "thingtofind":

EEK!!!! I somehow posted the wrong code. It should be:

Quote:

sed 's/\$[0-9]\+,\?[0-9]\+\.[0-9]\{2\}//2' filename > newfilename

That first "\" makes all the difference.....

Translation (description first, then the code):
literal "$": \$
any digit, one or more times: [0-9]\+
an optional comma: ,\? (This means there can be a comma or nothing at all, but not any other character.)
any digit, one or more times: [0-9]\+
literal ".": \.
any digit, exactly two times: [0-9]\{2\}

One of the big tricks with something like this is to keep the regular expression from being greedy, i.e., matching more than was intended.
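
To see what the pattern actually picks up, it can help to print only the matches. This is a minimal sketch, assuming a `grep` that supports `-o` and `-E` (GNU and BSD grep both do) and the extended-regex spelling of the same pattern. The function names and the second sample line are made up for illustration. It also shows one limitation worth knowing: because both `[0-9]+` parts need at least one digit each, an amount under $10 such as $5.00 is not matched. The second function is one possible looser variant, not part of the original answer.

```shell
list_amounts() {
    # -o prints each match on its own line instead of the whole input line.
    grep -oE '\$[0-9]+,?[0-9]+\.[0-9]{2}'
}

list_amounts_variant() {
    # Assumed alternative: digits, then any number of ",ddd" groups.
    grep -oE '\$[0-9]+(,[0-9]{3})*\.[0-9]{2}'
}

# Both amounts on a line from the sample data are found.
printf '%s\n' 'ENBRIDGE ENBRIDGE $45.91 $2,170.08 Home heating' | list_amounts

# Hypothetical line: the original pattern skips $5.00, the variant does not.
printf '%s\n' 'PARKING $5.00 $1,234.56 Misc' | list_amounts
printf '%s\n' 'PARKING $5.00 $1,234.56 Misc' | list_amounts_variant
```

If small amounts can appear in the real file, the same change would be needed in the sed command before relying on the "second match" logic.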

My favorite SED tutorial here: http://www.grymoire.com/Unix

yawe_frek · 01-06-2008, 12:27 PM

Thanks so much for taking the time to explain this to me, I am really glad. Before I forget, could you kindly send me the special characters that need to be escaped?

These are the ones I know:

.*^$[]\

THANKS

pixellany · 01-06-2008, 01:09 PM

The tutorial I linked has all that stuff--and more.

The really complete reference is the Advanced Bash Scripting Guide (ABS)---at http://tldp.org

Here's one basic definition of when an escape is needed:
"Whenever the meaning of the character needs to be changed from what it normally would be in the context." Escaping can be used to make a character be special---or to stop it from being special.

Examples:
sed 's/?/C/g' filename changes all "?" to "C". In sed's default (basic) regex mode, "?" is not special unless it is escaped.
sed 's/./C/g' filename changes every character to "C", so for a literal "." we need "\."
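
These examples can be tried on a short string to see the effect. This is a sketch assuming GNU or BSD sed; the input `a?b.c` is made up. The last command is an addition to the examples above: with `-E` (extended regex) the roles flip, so a bare `?` becomes special and `\?` is the literal question mark.

```shell
s='a?b.c'

printf '%s\n' "$s" | sed 's/?/C/g'      # aCb.c   "?" is literal in basic regex
printf '%s\n' "$s" | sed 's/./C/g'      # CCCCC   "." matches any character
printf '%s\n' "$s" | sed 's/\./C/g'     # a?bCc   "\." is a literal dot
printf '%s\n' "$s" | sed -E 's/\?/C/g'  # aCb.c   with -E, "\?" is the literal
```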

Disclosure: The only way I learned this stuff was a mix of reading and trial and error. The power of BASH and Regular expressions unfortunately comes with a lot of stuff that is not intuitive.

yawe_frek · 01-06-2008, 01:46 PM

Thanks once again for the sites.

ghostdog74 · 01-06-2008, 07:53 PM

@OP
Just FYI, regexp can be a tool for you to learn and use, but not every solution to a problem needs a regexp. Here's a link for you to get started on sed/awk and shell.

Telemachos · 01-07-2008, 08:29 PM

Quote:

Originally Posted by ghostdog74

@OP
Just FYI, regexp can be a tool for you to learn and use, but not every solution to a problem needs a regexp. Here's a link for you to get started on sed/awk and shell.

I was curious about the link, but it seems dead. Let us know if it's just a typo, please.

ghostdog74 · 01-07-2008, 08:49 PM

Quote:

Originally Posted by Telemachos

I was curious about the link, but it seems dead. Let us know if it's just a typo, please.

It's searchable from Google if you know how to. Then how about this and this.

Telemachos · 01-08-2008, 10:15 AM

Quote:

Originally Posted by ghostdog74

It's searchable from Google if you know how to. Then how about this and this.

Google does searches? Wow, I hadn't known that. Thanks for the tip. You should have checked your link before you posted it. I was simply pointing out the link to nowhere, which you should still edit out of your original post to save other people the trouble of clicking on a dead link. The first link in the post above is to a file that seems corrupt, or at least it fills my browser with nonsense characters. You really should check before you post a link.

ghostdog74 · 01-08-2008, 08:24 PM

Quote:

Originally Posted by Telemachos

You should have checked your link before you posted it.

The link is ok from my side. Some areas cannot get through, from what I know.

Quote:

at least it fills my browser with nonsense characters. You really should check before you post a link.

It depends on whether you know how it's done. You can just right click and save to your system.