LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 12-12-2009, 05:16 PM   #1
snakefact
LQ Newbie
 
Registered: Dec 2009
Posts: 11

Rep: Reputation: 0
perform -log() on all decimal strings within file


The title pretty much says it.

I have a file that contains tab-delimited decimals between 0 and 1, many rows by many columns.

Does anyone know if/how it would be possible to replace each value X with -log(X)? I know I can do this in Excel, but my dataset is huge, so it would be much easier to do it with one or two Unix commands.


Input Example:
0.25 0.75 0.0000001 0.001 0.99
0.0000001 0.001 0.25 0.75 0.99

Output Example:
0.602059991 0.124938737 7 3 0.004364805
7 3 0.602059991 0.124938737 0.004364805


Suggestions welcome.

Thanks-
 
Old 12-12-2009, 05:28 PM   #2
evo2
LQ Guru
 
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,720

Rep: Reputation: 1704
You asked for a unix command, so how about awk:
Code:
echo "0.25" | awk '{ print -log($1) }'
Cheers,


Evo2.
 
Old 12-12-2009, 05:48 PM   #3
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Good suggestion. awk would be my choice, too. Just pay attention to the fact that in awk the log function gives the natural logarithm, whereas the OP asks for the base-10 logarithm, as clearly shown by the posted example. Hence you have to divide the natural logarithm by log(10):
Code:
$ echo "0.25" | awk '{ printf "%11.9f\n",-log($1)/log(10) }'
0.602059991
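
The change-of-base step can also be checked outside awk. Here is a minimal Python 3 sketch (the function name neg_log10_via_ln is made up for illustration) that divides the natural logarithm by ln(10), as the awk one-liner does, and prints it next to the library's own base-10 logarithm:

```python
import math

def neg_log10_via_ln(x):
    """Return -log10(x) computed from the natural logarithm, as in the awk one-liner."""
    return -math.log(x) / math.log(10)

# Print both versions for the sample input so they can be compared by eye.
print("%11.9f" % neg_log10_via_ln(0.25))
print("%11.9f" % -math.log10(0.25))
```

Both lines should agree to the nine decimals printed; any difference would only show up in the last bits of the floating-point result.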
 
Old 12-12-2009, 05:57 PM   #4
GrapefruiTgirl
LQ Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 555
I like the above suggestions too, but if for whatever reason they don't appeal to the OP, have a look at the bc man page. bc is the "basic calculator" and does log functions too.
 
Old 12-12-2009, 05:57 PM   #5
evo2
LQ Guru
 
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,720

Rep: Reputation: 1704
Ok, I realize my awk suggestion is not of much use since it uses base e log. Also it does not address reading and writing files like you requested.

So, to make amends here is a little python script that should do what you want. It's not really unix, but, meh.

Code:
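#!/usr/bin/env python
# Python 2 syntax (print statement); reads stdin, writes stdout.
import sys
import math
from math import log
for line in sys.stdin:
    for word in line.split():
        try:
            # The trailing comma keeps the output on the same line.
            print -math.log10(float(word)),
        except ValueError:
            # Not a number (or not a positive one): pass it through unchanged.
            print word,
    print  # end the output row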
This should work even if there are things in the file that aren't numbers.

Assuming you call the script logall
Usage:
Code:
logall < infile > outfile
I'm sure perl people could do this in one line.

Cheers,

Evo2.

Last edited by evo2; 12-12-2009 at 06:05 PM. Reason: Correct code tags
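
For readers on Python 3, where print is a function, the same idea might look roughly like the sketch below. It is not the script from the post above: the function name neg_log10_stream and the choice to write tab-separated output are assumptions made for illustration.

```python
import io
import math

def neg_log10_stream(infile, outfile):
    """Write -log10 of every numeric token in infile to outfile, row by row."""
    for line in infile:
        converted = []
        for word in line.split():
            try:
                converted.append(str(-math.log10(float(word))))
            except ValueError:
                # Not a number, or not a positive one: keep the token as it is.
                converted.append(word)
        # Assumption: the output should stay tab-delimited like the input.
        outfile.write("\t".join(converted) + "\n")

# Small demonstration on an in-memory file; for real use, pass
# sys.stdin and sys.stdout (after "import sys") instead.
demo_out = io.StringIO()
neg_log10_stream(io.StringIO("0.001\tlabel\t0.25\n"), demo_out)
print(demo_out.getvalue(), end="")
```

Catching only ValueError keeps the pass-through behaviour for non-numeric tokens and for zero or negative values, while letting unrelated errors surface.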
 
Old 12-12-2009, 06:15 PM   #6
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by evo2 View Post
Ok, I realize my awk suggestion is not of much use since it uses base e log. Also it does not address reading and writing files like you requested.
Not really true...
Code:
$ awk '{for (i=1; i<=NF; i++) sub($i,-log($i)/log(10),$i); print}' file
0.60206 0.124939 7 3 0.00436481
7 3 0.60206 0.124939 0.00436481


Edit: if you want to match exactly the number of digits in the floating point numbers, as shown in the example output, you can change the value of the CONVFMT built-in variable
Code:
$ awk 'BEGIN{CONVFMT="%.9g"}{for (i= 1; i <= NF; i++) sub($i,-log($i)/log(10),$i); print}' file
0.602059991 0.124938737 7 3 0.0043648054
7 3 0.602059991 0.124938737 0.0043648054

Last edited by colucix; 12-12-2009 at 06:25 PM.
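
The difference between the two runs comes down to the number of significant digits: awk's default CONVFMT is "%.6g", and the second command raises it to "%.9g". Python's % operator accepts the same printf-style conversions, so the effect can be sketched there (an illustration only, not a replacement for the awk command):

```python
import math

value = -math.log10(0.99)

# "%.6g" is awk's default CONVFMT: six significant digits.
print("%.6g" % value)   # 0.00436481
# "%.9g" keeps nine significant digits, as in the second awk command.
print("%.9g" % value)   # 0.0043648054
```

Note that %g drops trailing zeros, which is why the nine-digit result shows only eight digits after the leading zeros.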
 
Old 12-12-2009, 06:35 PM   #7
snakefact
LQ Newbie
 
Registered: Dec 2009
Posts: 11

Original Poster
Rep: Reputation: 0
I tried the python script and it works great.

The only problem is some of the values returned have too many characters (eg 0.602059991328).

Is there a way to limit the number of characters per value to say 4 after the decimal?
 
Old 12-12-2009, 10:59 PM   #8
evo2
LQ Guru
 
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,720

Rep: Reputation: 1704
Quote:
Originally Posted by snakefact View Post
I tried the python script and it works great.
Cool.

Quote:
The only problem is some of the values returned have too many characters (eg 0.602059991328).

Is there a way to limit the number of characters per value to say 4 after the decimal?
There sure is. Taking the cue from colucix, just replace the print line with:
Code:
print '%.9g' % -math.log10(float(word)),
EDIT: I just realized there is an unneeded/unused line in my original script. You can delete the following line.
Code:
from math import log
Cheers,

Evo2.

Last edited by evo2; 12-12-2009 at 11:01 PM.
 
Old 12-14-2009, 02:42 PM   #9
snakefact
LQ Newbie
 
Registered: Dec 2009
Posts: 11

Original Poster
Rep: Reputation: 0
Quote:
$ awk 'BEGIN{CONVFMT="%.9g"}{for (i= 1; i <= NF; i++) sub($i,-log($i)/log(10),$i); print}' file
Quote:
Edit: if you want to match exactly the number of digits in the floating point numbers, as shown in the example output, you can change the value of the CONVFMT built-in variable


I'd like to be able to limit the number of digits after the decimal point. The %g format in the above command only limits the number of digits after the leading zeros.

Ex: 0.00000000000009876 can be returned if the CONVFMT="%.4g"

When I'd want 0.0000


Any suggestions?
 
Old 12-14-2009, 02:56 PM   #10
GrapefruiTgirl
LQ Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 555
I'm not familiar with the perl & python, so while there's likely a way to get the math functions to limit the output places, I don't know it. Meanwhile, what you *could* do if you simply want to chop the end off the decimals, is use sed:

Code:
echo 0.00000000000009876 | sed 's/\(\.[0-9]\{4\}\)[0-9]*/\1/g'
..just pipe the output of your formula stuff into the sed statement. NOTE: this doesn't round anything or do anything mathematical; it just chops the string down to 4 places-- that's all.

Sasha
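
To make the truncation-versus-rounding distinction concrete, here is a small Python 3 sketch. The helper truncate_decimals is hypothetical and only mirrors the string-chopping idea; it is applied to every number on the line, which is an assumption about what the multi-column file needs.

```python
import re

def truncate_decimals(line, places=4):
    """Cut every decimal number in line to at most `places` digits, without rounding."""
    pattern = r"(\.\d{%d})\d*" % places
    return re.sub(pattern, r"\1", line)

row = "0.602059991 0.12496 7"
# Pure string truncation: digits beyond the fourth decimal are dropped.
print(truncate_decimals(row))
# Rounding, by contrast, can change the last digit that is kept.
print(" ".join("%.4f" % float(field) for field in row.split()))
```

Truncation leaves integers such as 7 untouched, whereas the "%.4f" conversion pads them to four decimals.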
 
Old 12-14-2009, 04:44 PM   #11
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by snakefact View Post
When I'd want 0.0000


Any suggestions?
If you understand what %.9g means, you can easily find the answer. It is the printf format notation used in C and inherited by many scripting languages and commands. Section 4.5.2 in the current version of the GAWK manual explains it all. There you will also find detailed explanations and caveats about the usage of the CONVFMT internal variable. In the meantime, you can try CONVFMT="%.4f" and see if it suits your requirement.
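
As a quick illustration of the two conversions, the following Python 3 lines use the same printf-style notation. This is only a sketch of the format semantics; awk builds on the same C conventions, but it is worth checking the output of your own awk.

```python
tiny = 0.00000000000009876

# %g counts significant digits, so leading zeros do not use up the precision.
print("%.4g" % tiny)            # 9.876e-14
# %f counts digits after the decimal point, whatever the magnitude.
print("%.4f" % tiny)            # 0.0000
print("%.4f" % 0.602059991328)  # 0.6021
```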
 
Old 12-14-2009, 08:10 PM   #12
evo2
LQ Guru
 
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,720

Rep: Reputation: 1704
%6.4f
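
In a format such as %6.4f, the 6 is the minimum total field width and the 4 is the number of digits after the decimal point. A minimal Python 3 sketch of applying it to one row of the sample input follows; the function name format_row is made up, and the sketch assumes every field is a positive number.

```python
import math

def format_row(line, fmt="%6.4f"):
    """Apply -log10 to each tab-separated field and format it with fmt."""
    out = []
    for field in line.split("\t"):
        # Adding 0.0 turns the -0.0 produced by an input of 1 into 0.0.
        value = -math.log10(float(field)) + 0.0
        out.append(fmt % value)
    return "\t".join(out)

print(format_row("0.25\t0.75\t0.0000001\t0.001\t0.99"))
```

The width is only a minimum: a result of 10 or more simply takes more than six characters and is not cut off.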
 
  

