LinuxQuestions.org

03-06-2012, 01:30 AM   #1
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Rep: Reputation: Disabled
Print numbers and associated text belonging to an interval of numbers


%%%%%

Last edited by Trd300; 05-01-2012 at 05:33 AM.
 
03-06-2012, 01:35 AM   #2
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,425

Rep: Reputation: 2826
So I am not sure I see the difficulty? What have you tried? awk should handle this trivially.
 
03-06-2012, 01:59 AM   #3
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
Hi grail!

The problem is that I don't really know how to start.

Does awk have a particular option to handle intervals?
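For reference: awk has no special option for intervals, but a range written as "5-10" can be split on the dash with split() and its two ends compared numerically. A minimal sketch, assuming ranges are always written low-high without spaces; `in_range` is a hypothetical helper name, not something from the thread:

```shell
#!/bin/sh
# in_range RANGE NUMBER: prints "inside" or "outside" (endpoints count as inside)
# Assumes RANGE looks like "5-10" (low-high) and NUMBER is a plain integer.
in_range() {
    awk -v range="$1" -v num="$2" 'BEGIN {
        split(range, r, "-")             # r[1] = low end, r[2] = high end
        # +0 forces a numeric comparison instead of a string comparison
        if (num + 0 >= r[1] + 0 && num + 0 <= r[2] + 0)
            print "inside"
        else
            print "outside"
    }'
}

in_range 5-10 7     # inside
in_range 5-10 10    # inside (endpoint)
in_range 5-10 15    # outside
```

The same split-then-compare pattern works inside a normal awk program on fields such as $1.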
 
03-06-2012, 03:16 AM   #4
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
I tried to isolate only the numbers from fields 1, 3 and 4, but it gets messy very quickly because they are mixed in with strings of text.
 
03-06-2012, 03:45 AM   #5
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,425

Rep: Reputation: 2826
Ok ... this should get you going. Currently it will only provide column 3 data:
Code:
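#!/usr/bin/awk -f

# Fields are tab-separated: "\t+" means one or more tabs ("\t*" would also
# match zero tabs). OFS keeps the tabs when $3 is rebuilt.
BEGIN{ FS = "\t+"; OFS = "\t" }

# Skip the header line and only touch lines whose fields 1, 3 and 4 hold ranges
NR > 1 && $1 ~ /-/ && $3 ~ /-/ && $4 ~ /-/{
    f3 = ""                       # items from field 3 that pass the test
    split($1, c1, "-")            # c1[1] = low bound, c1[2] = high bound

    # Split field 3 on ";" plus any following spaces, so no item starts with a
    # blank (a leading blank would leave p[1] empty and shift the numbers to p[3]/p[4])
    n = split($3, c3, "; *")

    for( i = 1; i <= n; i++){
        split(c3[i], p, "[ -]")   # "AGE 6-8 text 1." -> p[2] = 6, p[3] = 8

        # keep the item only if both of its numbers lie inside the field 1 range
        if( p[2] >= c1[1] && p[2] <= c1[2] && p[3] >= c1[1] && p[3] <= c1[2])
            f3 = f3 (f3 ? "; " : "") c3[i]
    }

    $3 = f3
}
1    # print every line, modified or not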
You should be able to figure out the other column and tidy it up to suit.
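The FS line at the top of the script decides how a run of several tabs between two fields is treated. A small check, assuming the data really is tab-separated (the forum display shows spaces) and that two tabs can sit between fields; `count_fields` is a hypothetical helper. With a single-tab separator a doubled tab produces an empty field in between, while "\t+" treats the whole run as one separator. The "\t*" pattern is not tested here.

```shell
# count_fields FS: reads lines on stdin and prints how many fields awk sees per line
count_fields() {
    awk -F "$1" '{ print NF }'
}

# "5-10", two tabs, "A2"
printf '5-10\t\tA2\n' | count_fields '\t'     # 3 (middle field is empty)
printf '5-10\t\tA2\n' | count_fields '\t+'    # 2
```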
 
03-06-2012, 03:51 AM   #6
chrism01
LQ Guru

Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,260

Rep: Reputation: 2328
If you're going to use awk, you may find this useful http://www.grymoire.com/Unix/Awk.html
 
03-06-2012, 09:59 PM   #7
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
%%%%%

Last edited by Trd300; 05-01-2012 at 05:33 AM.
 
03-06-2012, 11:45 PM   #8
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
%%%%%

Last edited by Trd300; 05-01-2012 at 05:34 AM.
 
03-07-2012, 02:29 AM   #9
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,425

Rep: Reputation: 2826
Quote:
1st question:
In line 7 (and 20): What does "f3" stand for? Is it the overall variable result after the calculations? Why "" after?
"f3" is just a variable, but my thinking was 'field 3'. It is used to collect the items from field 3 that will be kept.
f3 is set to "" to clear it for each line being investigated.
Quote:
2nd question:
In line 19: Do p[2] and p[3] refer to "6"/"7" and "8"/"15" respectively in field 3, line 2 for example? Why not call them p[1] and p[2] instead of p[2] and p[3] respectively?
If they do, I am a bit confused by the "[ -]" in line 15.
Doesn't "[ -]" mean "the number before the -"?
"p" is the array that stores the pieces of each field 3 item. The items themselves are stored in c3 (column 3), which is split on ";", and each item is then split on EITHER a space or a dash, [ -].
I will demonstrate with an example:
Code:
AGE 6-8 text 1.; AGE 7-15 text 2.
The above is from one of your field 3 examples when splitting on tabs. (By the way, FS = "\t*" means zero or more tabs, so it should probably be FS = "\t+", as each field will have at least one tab as a separator.)
We then split this on the semicolon (plus any spaces that follow it) and store the values in "c3":
Code:
n = split($3, c3, "; *")
In this example the output is like so:
Code:
n = 2
c3[1] = AGE 6-8 text 1.
c3[2] = AGE 7-15 text 2.
Now inside the for loop we split again as follows:
Code:
split(c3[i], p, "[ -]")
And again, based on the example and i = 1 the values will be:
Code:
p[1] = AGE
p[2] = 6
p[3] = 8
p[4] = text
p[5] = 1.
From this we see that the values we are interested in are stored in the array indices 2 and 3.
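To check what split() stores, the arrays can simply be printed. A minimal sketch using the field 3 example above; `show_split` is a hypothetical helper. It also shows why splitting on "; *" (a semicolon plus any following spaces) is safer than ";" alone: with ";" the second item keeps its leading blank, splitting that item on "[ -]" then puts an empty string in p[1], and the numbers move to p[3] and p[4].

```shell
# show_split SEPARATOR: splits each input line into items with SEPARATOR,
# then splits every item on a space or a dash and prints all the p[] entries
show_split() {
    awk -v sep="$1" '{
        n = split($0, c3, sep)
        for (i = 1; i <= n; i++) {
            m = split(c3[i], p, "[ -]")
            printf "c3[%d]=[%s]", i, c3[i]
            for (j = 1; j <= m; j++)
                printf " p[%d]=[%s]", j, p[j]
            printf "\n"
        }
    }'
}

echo 'AGE 6-8 text 1.; AGE 7-15 text 2.' | show_split ';'
echo 'AGE 6-8 text 1.; AGE 7-15 text 2.' | show_split '; *'
```

With ";" the second line of output starts with `c3[2]=[ AGE 7-15 text 2.] p[1]=[] p[2]=[AGE]`, while with "; *" both items have their numbers in p[2] and p[3].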

So now when we look at the following code:
Code:
if( p[2] >= c1[1] && p[2] <= c1[2] && p[3] >= c1[1] && p[3] <= c1[2])
So again, looking at the full line example:
Code:
5-10      A2              AGE 6-8 text 1.; AGE 7-15 text 2.        SIZE 1-20 text 3.; SIZE 9-18 text 4.
The values in the "if" would be:
Code:
if( 6 >= 5 && 6 <= 10 && 8 >= 5 && 8 <= 10 )
As the above is true, we now append the piece of column 3 that we are looking at (c3[i], where i = 1) to our variable "f3":
Code:
f3 = f3 (f3?"; ":"")c3[i]
Here we use the ternary operator (?:) to check whether f3 is empty: if it is, c3[i] is simply appended; if not, "; " is placed before c3[i].

I will let you read all of this and see if it helps with field 4. Let me know how you go.
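The "add a separator only when something is already there" idiom can be wrapped in a small awk function. A sketch for illustration; `join_demo` and `append` are hypothetical names. It compares with "" explicitly, which reads a little more clearly; for a variable such as f3 that is only ever built by concatenation, this behaves the same as the bare `f3 ?` test.

```shell
# join_demo: appends each input line to a "; "-separated list (the way f3 is
# built) and prints the final list
join_demo() {
    awk '
    # append(list, item): add item to list, inserting "; " only if list is non-empty
    function append(list, item) {
        return list (list != "" ? "; " : "") item
    }
    { f3 = append(f3, $0) }
    END { print f3 }'
}

printf 'AGE 6-8 text 1.\nAGE 7-15 text 2.\n' | join_demo
# AGE 6-8 text 1.; AGE 7-15 text 2.
```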
 
03-07-2012, 04:29 AM   #10
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
Thanks grail,
Explained like that, it's simple!

I'll keep digging and take a look at the syntax for user-defined variables.
 
03-08-2012, 12:56 AM   #11
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
%%%%%

Last edited by Trd300; 05-01-2012 at 05:34 AM.
 
03-08-2012, 01:01 AM   #12
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
NB: I am in the bash shell.

...and I checked the conditions several times; they seem to be correct.

Last edited by Trd300; 03-08-2012 at 01:06 AM.
 
03-08-2012, 01:17 AM   #13
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,425

Rep: Reputation: 2826
ummm ... I think you got a bit carried away. If we look at your requirement:
Quote:
In Field4: only individual strings if at least ONE of the TWO numbers from Field1 is included in the interval in Field4 (endpoints included)
To me, the if statement you are looking for (using your current variables) is:
Code:
if ( ( c1[1] >= q[2] && c1[1] <= q[3] ) || ( c1[2] >= q[2] && c1[2] <= q[3] ) )
Does this look correct to you?
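Putting the field 3 test from the earlier script together with this field 4 condition gives something like the following. It is a sketch, not code posted in the thread: it assumes tab-separated input whose first line is a header, that every item looks like "WORD low-high text", and it splits items on "; *" so no item begins with a blank. `filter_intervals` is a hypothetical name.

```shell
# filter_intervals: reads tab-separated lines (first line = header) on stdin.
#   field 3: keep an item only if BOTH of its numbers lie inside the field 1 range
#   field 4: keep an item if at least ONE field 1 number lies inside its range
# Assumes every item looks like "WORD low-high text" and items are joined by "; ".
filter_intervals() {
    awk '
    BEGIN { FS = "\t+"; OFS = "\t" }

    NR > 1 && $1 ~ /-/ && $3 ~ /-/ && $4 ~ /-/ {
        split($1, c1, "-")
        lo = c1[1] + 0; hi = c1[2] + 0          # field 1 range as numbers

        f3 = ""
        n = split($3, c3, "; *")
        for (i = 1; i <= n; i++) {
            split(c3[i], p, "[ -]")
            a = p[2] + 0; b = p[3] + 0          # bounds of this field 3 item
            if (a >= lo && a <= hi && b >= lo && b <= hi)
                f3 = f3 (f3 != "" ? "; " : "") c3[i]
        }

        f4 = ""
        n = split($4, c4, "; *")
        for (i = 1; i <= n; i++) {
            split(c4[i], q, "[ -]")
            a = q[2] + 0; b = q[3] + 0          # bounds of this field 4 item
            if ((lo >= a && lo <= b) || (hi >= a && hi <= b))
                f4 = f4 (f4 != "" ? "; " : "") c4[i]
        }

        $3 = f3; $4 = f4
    }
    1'
}
```

With the two example lines used in this thread, the 5-10 line keeps only "AGE 6-8 text 1." in field 3 and both SIZE items in field 4, and the 12-22 line ends up with an empty field 3 and only "SIZE 10-19 text 6." in field 4.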
 
03-08-2012, 03:04 AM   #14
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
It is not easy to explain.

I consider the numbers not as individual numbers but as the edges of an interval.
There are 4 conditions, depending on the location of the field1 interval relative to the field2 interval:
1st condition: field1 interval inside field2 interval
2nd condition: field1 interval overlapping field2 interval on the right side
3rd condition: field1 interval overlapping field2 interval on the left side
4th condition: field1 interval surrounding field2 interval

The field1 interval has to fulfil one of these conditions for the field2 interval and its associated text to be printed in $4; that's why I use the logical OR ("||").

Do you see my point !

Last edited by Trd300; 03-09-2012 at 10:43 PM.
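For two closed intervals a1-a2 (field 1) and b1-b2 (the other interval), the four conditions above together cover exactly the cases where the intervals overlap, and overlap can be written as one test: a1 <= b2 && b1 <= a2 (neither interval ends before the other starts). A sketch that classifies a pair both ways, assuming both intervals are written low-high; `relation` and the case labels are illustrative names, and which label corresponds to the "right side" or "left side" wording above is an assumption.

```shell
# relation A1 A2 B1 B2: prints which of the four cases holds (or "none")
# and what the single overlap test says. Assumes A1 <= A2 and B1 <= B2.
relation() {
    awk -v a1="$1" -v a2="$2" -v b1="$3" -v b2="$4" 'BEGIN {
        a1 += 0; a2 += 0; b1 += 0; b2 += 0
        if (a1 >= b1 && a2 <= b2)
            c = "inside"
        else if (a1 >= b1 && a1 <= b2 && a2 > b2)
            c = "crosses-right-end"          # starts inside, ends past b2
        else if (a1 < b1 && a2 >= b1 && a2 <= b2)
            c = "crosses-left-end"           # starts before b1, ends inside
        else if (a1 <= b1 && a2 >= b2)
            c = "surrounding"
        else
            c = "none"
        # single overlap test: neither interval ends before the other starts
        ov = (a1 <= b2 && b1 <= a2) ? "overlap" : "no-overlap"
        print c, ov
    }'
}

relation 6 8 5 10      # inside overlap
relation 3 30 10 19    # surrounding overlap
relation 12 22 10 11   # none no-overlap
```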
 
03-08-2012, 10:37 AM   #15
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,425

Rep: Reputation: 2826
Quote:
Do you see my point !
ahhh ... No

Except for the last condition (which, by the way, would mean we don't care whether the numbers are in or out, and that everything in field 4 should always be included), your first 3 seem to be covered by the if I have shown. I'm not sure if you are thrown by the variables or what they represent, but I will use data and variables to show what I am thinking:

Data:
Code:
5-10      A2              AGE 6-8 text 1.; AGE 7-15 text 2.        SIZE 1-20 text 3.; SIZE 9-18 text 4.
# Field 1 Field 2         Field 3                                  Field 4
# c1                      c3 array with 2 items with p array       c4 array with 2 items with q array
#                         to store 6&8 and then 7&15               to store 1&20 and then 9&18
We will ignore field 3 as we seem to agree here. My if again:
Code:
if ( ( c1[1] >= q[2] && c1[1] <= q[3] ) || ( c1[2] >= q[2] && c1[2] <= q[3] ) )

# Written with variables replaced

if ( ( 5 >= 1 && 5 <= 20 ) || ( 10 >= 1 && 10 <= 20 ) )

#        T    &&    T
The second part does not get tested, as OR (||) is already satisfied once the first set of brackets is true.

Looking at the second part of c4, which will be split into q:
Code:
if ( ( 5 >= 9 && 5 <= 18 ) || ( 10 >= 9 && 10 <= 18 ) )


#    (   F    &&    T    ) || (     T   &&    T     )
Here the second test is true, and hence it will be shown.

If we follow the process for the second line:
Code:
12-22    B2              AGE 3-8 text 5.                                      SIZE 10-19 text 6.; SIZE 10-11 text 7.; SIZE 23-28 text 8.
Here only the first of the 3 parts of field 4 would be included, as 12 is between 10 and 19, but neither 12 nor 22 falls inside either of the other two ranges.
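The two readings of the requirement can be compared directly. The endpoint test above asks whether either number of field 1 falls inside the field 4 range; the interval-overlap test (a1 <= b2 && b1 <= a2) also accepts the case where field 1 strictly surrounds the field 4 range, and that is the only case where the two disagree. A sketch; `compare_tests` is a hypothetical helper, and both ranges are assumed to be written low-high.

```shell
# compare_tests A1 A2 B1 B2: prints the endpoint test and the overlap test
# for field 1 = A1-A2 and one field 4 item = B1-B2 (1 = true, 0 = false)
compare_tests() {
    awk -v a1="$1" -v a2="$2" -v b1="$3" -v b2="$4" 'BEGIN {
        a1 += 0; a2 += 0; b1 += 0; b2 += 0
        endpoint = (a1 >= b1 && a1 <= b2) || (a2 >= b1 && a2 <= b2)
        overlap  = (a1 <= b2 && b1 <= a2)
        print "endpoint=" endpoint, "overlap=" overlap
    }'
}

compare_tests 12 22 10 19   # endpoint=1 overlap=1
compare_tests 12 22 10 11   # endpoint=0 overlap=0
compare_tests 3 30 10 19    # endpoint=0 overlap=1  (3-30 surrounds 10-19)
```

For the second example line (12-22), both tests keep only "SIZE 10-19 text 6.", so the choice only matters when a field 1 range can enclose a field 4 range.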
 
  


All times are GMT -5.