LinuxQuestions.org
Old 12-12-2009, 12:00 PM   #1
snakefact
LQ Newbie
 
Registered: Dec 2009
Posts: 11

Rep: Reputation: 0
Sed replace string up to tab


I'd like a sed command to replace all decimal values greater than 0.5 with nothing in a tab delimited text file.


Example input: 0.8765 0.301 0.5 0.11 (space-delimited here for simplicity; the real file uses tabs)
Desired output: 0.301 0.11
In the real output there would be one tab before 0.301 and two tabs after it, so that when you paste the text into an Excel spreadsheet there are empty cells where the values were deleted.

I thought the sed command would be something like:

sed s/0\.[5-9].*[[:space:]]//g

But this will delete everything.

What I need is something that will start deleting when it sees 0\.[5-9] and stop when it reaches a tab. I know you can use [^character]+ to do this, but it doesn't seem to work with a TAB as the character.

sed /0\.[5-9][^character]+//g


Another idea I had would be to have sed match from 0.[5-9] up to the next 0. and replace that with a tab + 0.
But I also can't get [^0\.]+ to work, as a bracket expression only matches single characters.



Any ideas?
I really need this command for my work.
Thanks-
 
Old 12-12-2009, 12:13 PM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
First, your criteria seem to have very little to do with tabs---you need to recognize decimal numbers.

Here is the regex for a decimal number with any number of digits, value >= 0.5 and < 1.0, followed by any non-numeric character:

Code:
0\.[5-9]\+[^[:digit:]]
To keep from deleting the non-digit, put it in a "backreference" \(....\), then recall it using \1

So:

Code:
sed 's/0\.[5-9]\+\([^[:digit:]]\)/\1/' filename > newfilename
Not tested---if you post a larger example of the file contents, we can help test.
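
To make the backreference idea concrete, here is a minimal sketch run against the one-line example from the first post. The helper names drop_with_delim and drop_keep_delim are made up for this illustration, and the digit part is written as [5-9][0-9]* on the assumption that a value of 0.5 or more always looks like "0." followed by a digit from 5 to 9 and then any further digits.

```shell
# Hypothetical helper: deletes the number AND the non-digit that follows it.
drop_with_delim() {
    sed 's/0\.[5-9][0-9]*[^[:digit:]]//g'
}

# Hypothetical helper: captures the trailing non-digit in \(...\)
# and puts it back with \1, so the tab survives.
drop_keep_delim() {
    sed 's/0\.[5-9][0-9]*\([^[:digit:]]\)/\1/g'
}

# Tab-delimited version of the example line; tr shows each tab as '|'.
printf '0.8765\t0.301\t0.5\t0.11\n' | drop_with_delim | tr '\t' '|'
printf '0.8765\t0.301\t0.5\t0.11\n' | drop_keep_delim | tr '\t' '|'
```

The first line printed is 0.301|0.11 (the delimiters were swallowed), the second is |0.301||0.11 (one tab before 0.301 and two after it).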

Last edited by pixellany; 12-12-2009 at 01:50 PM. Reason: Fixed a typo
 
Old 12-12-2009, 12:44 PM   #3
snakefact
LQ Newbie
 
Registered: Dec 2009
Posts: 11

Original Poster
Rep: Reputation: 0
Here is a text example. I included the same text in an attached text file.


0.1972 0.07161 0.06874 0.1313 0.1499 0.197710473 0.071610311 0.071297537 0.07466101
0.04915 0.02215 0.02114 0.08588 0.02757 0.049525059 0.022147094 0.021650466 0.023604862


Your sed command:
sed 's/0\.[5-9]\+\([^[:digit]]\)/\1/'

Gave me this error:
sed: -e expression #1, char 25: unterminated `s' command
Attached Files
File Type: txt 2x9_decimal-values_tab-delim.txt (178 Bytes, 4 views)
 
Old 12-12-2009, 01:44 PM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2373
Hi,

pixellany made a typo in his command: it should be [:digit:], not [:digit] (mind the missing : at the end).

I don't get which output you want. You already provided an infile example, could you post the desired output that goes with that example?
 
Old 12-12-2009, 01:52 PM   #5
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2373
Hi,

Your first request (replace all decimal values greater than 0.5 with nothing) and your example don't go together. There are no numbers larger than 0.5 (the largest is about 0.2).

Wrong example or should that be 0.05 or ...?

Anyway, this does what you originally requested:

sed 's/0\.[5-9][0-9]*//g' infile

I do hope this is what you are looking for.
 
Old 12-12-2009, 02:05 PM   #6
snakefact
LQ Newbie
 
Registered: Dec 2009
Posts: 11

Original Poster
Rep: Reputation: 0
Wow, my bad.

I actually just figured it out. My sed command is:


sed 's/0\.[5-9][0-9]*\([^0-9]\)/\1/g'

Only problem is that it doesn't replace decimal strings in the last field. It would be nice, but I think I can just paste a dummy column at the end, or add a tab or something.


Thanks for your help. Couldn't have figured this out otherwise.
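
The last-field problem can be reproduced in a couple of lines. The sketch below also shows one possible workaround that stays within basic regex syntax: a second expression anchored with $ for a value at the very end of the line. The helper name strip_high_bre is hypothetical, and the approach assumes the line does not end in trailing whitespace or a carriage return.

```shell
# Hypothetical helper: the command from this post, plus a second
# expression that removes a matching value at the end of the line.
strip_high_bre() {
    sed -e 's/0\.[5-9][0-9]*\([^0-9]\)/\1/g' \
        -e 's/0\.[5-9][0-9]*$//'
}

# Single expression: nothing follows 0.9, so [^0-9] cannot match and 0.9 stays.
printf '0.3\t0.9\n' | sed 's/0\.[5-9][0-9]*\([^0-9]\)/\1/g' | tr '\t' '|'

# Two expressions: the $-anchored one removes the trailing 0.9.
printf '0.3\t0.9\n' | strip_high_bre | tr '\t' '|'
```

The first command prints 0.3|0.9 and the second prints 0.3| (tabs shown as |), so the trailing tab is kept and the last cell is empty.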
 
Old 12-12-2009, 02:18 PM   #7
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
Eeeek!!! another mistake......

Your file does not have anything larger than 0.5, so the command is not going to work!!!!!

I found more problems---here is what seems to work (I created the test file "dec"):

Code:
[mherring@Ath Downloads]$ more dec
0.4     0.3     0.56    0.4999
0.23456 0.499999        0.50002
0.49abc 0.52xyz 0.6     0.9
1.0     1.5     1.6
[mherring@Ath Downloads]$ sed -r 's/0\.[5-9][0-9]*([^[:digit:]]|$)/\1/g' dec
0.4     0.3             0.4999
0.23456 0.499999
0.49abc xyz
1.0     1.5     1.6
[mherring@Ath Downloads]$
Note that I had to change to extended regexes and use the alternation operator so it would also match the end of the line.

The other glaring error in my previous code is that it would only match 1 or more occurrences of [5-9]---what's needed is [5-9] followed by any number of digits.
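
A small side-by-side sketch of that difference, using a made-up three-field line. The helper names old_pattern and new_pattern are hypothetical; both use extended syntax so that only the digit part differs.

```shell
# Earlier digit pattern: one or more digits, ALL of which must be 5-9.
old_pattern() {
    sed -E 's/0\.[5-9]+([^[:digit:]]|$)/\1/g'
}

# Revised digit pattern: first decimal digit 5-9, then any digits at all.
new_pattern() {
    sed -E 's/0\.[5-9][0-9]*([^[:digit:]]|$)/\1/g'
}

# 0.52 contains a '2', so the earlier pattern cannot match it.
printf '0.52\t0.3\t0.99\n' | old_pattern | tr '\t' '|'
printf '0.52\t0.3\t0.99\n' | new_pattern | tr '\t' '|'
```

With tabs shown as |, the earlier pattern prints 0.52|0.3| (0.52 is missed), while the revised one prints |0.3|.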

Last edited by pixellany; 12-12-2009 at 02:19 PM.
 
Old 12-12-2009, 02:20 PM   #8
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
Great minds work in the same channels!!! My code takes care of the end of the line, also.
 
Old 12-12-2009, 02:20 PM   #9
flewis777
LQ Newbie
 
Registered: Dec 2009
Location: Northern CA
Distribution: UBUNTU 9.10
Posts: 1

Rep: Reputation: 0
Here is one that works

If you have 0.55 0.25 0.501 0.95 0.55 delimited by tabs,
then this will keep the tabs and eliminate the values under 0.5:

sed 's/0*\.[0-4][0-9]*//g' oldfile > newfile


the output is 0.55 0.501 0.95 0.55

and for 0.25 0.25 0.301 0.95 0.55

the output is: 0.95 0.55

The TABS stay in so Excel will see them.

Fred
 
Old 12-12-2009, 03:49 PM   #10
snakefact
LQ Newbie
 
Registered: Dec 2009
Posts: 11

Original Poster
Rep: Reputation: 0
Thanks pixellany, works perfect!


sed -r 's/0\.[5-9][0-9]*([^0-9]|$)/\1/g'

I'm really new to unix/sed, so pardon my ignorance, but:
What does the |$ mean/ literally do?
And why did the backreference group \(...\) work before, but with the addition of |$ it doesn't and (...) must be used instead?
 
Old 12-12-2009, 05:17 PM   #11
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
Quote:
Originally Posted by snakefact
Thanks pixellany, works perfect!


sed -r 's/0\.[5-9][0-9]*([^0-9]|$)/\1/g'

I'm really new to unix/sed, so pardon my ignorance, but:
What does the |$ mean/ literally do?
And why did the backreference group \(...\) work before, but with the addition of |$ it doesn't and (...) must be used instead?
The overall logic is to find the desired number pattern followed by any character that is:
not a number ..... [^0-9]
OR
the end of the line ..... $

The | is the "alternation operator"---a fancy way of saying "or". This operator works only with extended regexes, so I switched to that mode using the -r flag. Once I did this, the group that feeds the backreference is written (...) instead of \(...\); the \1 in the replacement stays the same.
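
The two syntaxes can be compared on a toy pattern. This is only a sketch: the helper names bre_group and ere_group are made up, the square brackets in the replacement are there just to make the captured text visible, and -E is used as the portable spelling of the -r flag (GNU sed accepts both).

```shell
# Hypothetical helper, basic regex syntax: the group is written \( \).
bre_group() {
    sed 's/a\(b\)/[\1]/'
}

# Hypothetical helper, extended syntax: the group is written ( ),
# and | inside it means "b OR end of line".
ere_group() {
    sed -E 's/a(b|$)/[\1]/'
}

printf 'ab\n' | bre_group   # prints [b]
printf 'ab\n' | ere_group   # prints [b]
printf 'a\n'  | ere_group   # prints [] because the $ branch matched and \1 is empty
```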

If you are now confused, you are a member of a large community....

Good tutorials here:
http://www.grymoire.com/Unix/

and here:
http://tldp.org
Look for the Bash Guide for Beginners and the Advanced Bash Scripting Guide (ABS)
 
Old 12-12-2009, 05:23 PM   #12
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
I think that the version from flewis777 is actually good enough for the sample data we have seen so far (except that the logic is backwards---you want to delete the numbers that are 0.5 and larger)---i.e. as long as you don't have anything except numbers between the tabs, my version is overkill.
 
Old 12-13-2009, 12:38 AM   #13
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
deleted

Last edited by ghostdog74; 12-13-2009 at 02:41 AM. Reason: dup
 
Old 12-13-2009, 12:38 AM   #14
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
@OP, you are using the wrong tool. Use awk, where you can compare numbers. There is no need to waste time creating a complicated regex.
Code:
$ s="0.8765 0.301 0.5 0.11"
$ echo $s|awk '{for(i=1;i<=NF;i++) {if($i<0.5) print $i} }'
0.301
0.11
Further, there are things that the proposed regex solutions might not catch. What if you have, say, 09.155? That is, theoretically, 9.155. It might not happen in your case, but nothing is impossible.
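
The one-liner above prints each kept value on its own line. If the empty cells need to survive for pasting into a spreadsheet, a variant along these lines might do it. This is a sketch, not a tested production command: the helper name blank_high is hypothetical, and it assumes every non-empty field is a plain number.

```shell
# Hypothetical helper: blank every tab-separated field whose numeric
# value is 0.5 or more, keeping the tabs so the columns stay aligned.
blank_high() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             for (i = 1; i <= NF; i++)
                 if ($i != "" && $i + 0 >= 0.5) $i = ""   # numeric comparison
             print
         }'
}

# Tabs are shown as '|' so the empty cells are visible.
printf '0.8765\t0.301\t0.5\t0.11\n' | blank_high | tr '\t' '|'
printf '09.155\t0.2\n' | blank_high | tr '\t' '|'
```

The first line comes out as |0.301||0.11 and the second as |0.2, because 09.155 is compared as the number 9.155.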

Last edited by ghostdog74; 12-13-2009 at 12:44 AM.
 
Old 12-13-2009, 01:48 AM   #15
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
With all due respect, SED is an entirely appropriate solution for this. No doubt there are some arguments to favor AWK---especially if the problem is more generalized---but that does not make the SED solution incorrect.

In our motorcycle shop--almost 50 years ago--we had a mantra: "If it works, it's OK." Still true today---and will continue to be true for all time.

I would be hard-pressed to think of a computing problem that has only one solution.......
 
All times are GMT -5.
