LinuxQuestions.org


kadhan 09-02-2011 08:34 AM

decimal pattern matching
 
I have a big file whose lines follow the pattern below:

MDQ[11:15],IO,MDQ[10:14],,,,MDQ[12:16],TPR_AAWD[11:15]

I want to expand each such line as shown below:

MDQ[11],IO,MDQ[10],,,,MDQ[12],TPR_AAWD[11]
MDQ[12],IO,MDQ[11],,,,MDQ[13],TPR_AAWD[12]
MDQ[13],IO,MDQ[12],,,,MDQ[14],TPR_AAWD[13]
MDQ[14],IO,MDQ[13],,,,MDQ[15],TPR_AAWD[14]
MDQ[15],IO,MDQ[14],,,,MDQ[16],TPR_AAWD[15]


How can I implement this in sed/awk/perl/csh/vim?
Please help.

Proud 09-02-2011 09:22 AM

I see integers not decimals.
To clarify, you want to strip the bits between : and ], i.e. turn [nn:nn] into [nn]?

David the H. 09-02-2011 09:33 AM

It looks more to me like he wants to take the bracketed parts as a sequence of integers, and expand them into individual entries based on that sequence. [11:15] becomes five single lines with [11], [12], [13], [14], and [15]. But the whole problem is still not entirely clear.

You say it's a "big file", but you only showed us one line. Does every line have sequences like that, or are there other lines interspersed? Are the sequences all single digit increments? Are the patterns regular or irregular in any way? Do you want the output to be a single file, with only the expanded lines, or with the expanded lines following the unexpanded lines, or what?

Please clarify.

And please use [code][/code] tags around your code (including input and output text), to preserve formatting and to improve readability.

kadhan 09-02-2011 09:35 AM

Quote:

Originally Posted by Proud (Post 4459634)
I see integers not decimals.
To clarify, you want to strip the bits between : and ], i.e. turn [nn:nn] into [nn]?

Yes, they are integers.

First we need to calculate the size of the range in MDQ[11:15], which is 5 (15 - 11 + 1),
and then split the line into 5 lines as given below:
MDQ[11]
MDQ[12]
MDQ[13]
MDQ[14]
MDQ[15]
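
As a rough illustration of that step (not part of the original post), the range arithmetic could be sketched in Python as follows. The helper name `expand_range` is made up for this example, and it assumes an inclusive `NAME[start:end]` token with start no larger than end.

```python
import re

# Hypothetical helper: expands one "NAME[start:end]" token into single entries.
RANGE_TOKEN = re.compile(r"^(\w+)\[(\d+):(\d+)\]$")

def expand_range(token):
    """Return e.g. ["MDQ[11]", ..., "MDQ[15]"] for "MDQ[11:15]"."""
    match = RANGE_TOKEN.match(token)
    if match is None:
        return [token]  # not a range: leave the token alone
    name, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    # The range is inclusive, so its size is end - start + 1 (15 - 11 + 1 = 5).
    return [f"{name}[{i}]" for i in range(start, end + 1)]

print(expand_range("MDQ[11:15]"))
```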

kadhan 09-02-2011 09:45 AM

Quote:

Originally Posted by David the H. (Post 4459647)
It looks more to me like he wants to take the bracketed parts as a sequence of integers, and expand them into individual entries based on that sequence. [11:15] becomes five single lines with [11], [12], [13], [14], and [15]. But the whole problem is still not entirely clear.

You say it's a "big file", but you only showed us one line. Does every line have sequences like that, or are there other lines interspersed? Are the sequences all single digit increments? Are the patterns regular or irregular in any way? Do you want the output to be a single file, with only the expanded lines, or with the expanded lines following the unexpanded lines, or what?

Please clarify.

And please use [code][/code] tags around your code (including input and output text), to preserve formatting and to improve readability.

Yes, it's a big file. I showed only one line here; every other line has the same kind of sequence.

The input file is given below:

Code:


MDQ[10:15],IO,MDQ[10:15],,,,MDQ[10:15],TPR_AAWD[10:15],,,DATA[11:16],DATA[11:16],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[16],IO,MDQ[16],,,,MDQ[16],TPR_CLK_SYNC,,,DATA[16],DATA[16],IO,,1,,GVDD (1.5V/1.35V) SSTL15

The output file should become:

Code:

MDQ[10],IO,MDQ[10],,,,MDQ[10],TPR_AAWD[10],,,DATA[11],DATA[11],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[11],IO,MDQ[11],,,,MDQ[11],TPR_AAWD[11],,,DATA[12],DATA[12],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[12],IO,MDQ[12],,,,MDQ[12],TPR_AAWD[12],,,DATA[13],DATA[13],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[13],IO,MDQ[13],,,,MDQ[13],TPR_AAWD[13],,,DATA[14],DATA[14],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[14],IO,MDQ[14],,,,MDQ[14],TPR_AAWD[14],,,DATA[15],DATA[15],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[15],IO,MDQ[15],,,,MDQ[15],TPR_AAWD[15],,,DATA[16],DATA[16],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[16],IO,MDQ[16],,,,MDQ[16],TPR_CLK_SYNC,,,DATA[16],DATA[16],IO,,1,,GVDD (1.5V/1.35V) SSTL15


grail 09-02-2011 09:46 AM

So on each line, the range size will always be the same for every item?
i.e. each MDQ in this line has a range size of 5

Proud 09-02-2011 09:50 AM

Yes. Should the first range dictate the number of rows? Are the other ranges redundant (only their first number needs to be read), do they dictate just the stepping for the values in their own columns, or can they cause more rows to be added?
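
Whether the ranges on a line really do share one size can be checked against the real file before settling on an approach. A small Python sketch of such a check, assuming ranges always look like `[start:end]`; the function name `range_sizes` is invented here, and the sample line is a shortened copy of the input posted above.

```python
import re

def range_sizes(line):
    """Return the set of inclusive sizes of all [start:end] ranges on a line."""
    return {int(end) - int(start) + 1
            for start, end in re.findall(r"\[(\d+):(\d+)\]", line)}

line = "MDQ[10:15],IO,MDQ[10:15],,,,MDQ[10:15],TPR_AAWD[10:15],,,DATA[11:16]"
# A set with exactly one element means every range on the line has the same size.
print(range_sizes(line))
```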

ta0kira 09-02-2011 09:51 AM

Quote:

Originally Posted by Proud (Post 4459634)
I see integers not decimals.
To clarify, you want to strip the bits between : and ], i.e. turn [nn:nn] into [nn]?

Maybe "decimal" (vs. "hexadecimal") integers? Technically not incorrect, but definitely misleading.
Kevin Barry

grail 09-02-2011 11:11 AM

Well, going on the assumption that Proud and I have made (every range on a line has the same size):
Code:

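#!/usr/bin/awk -f

# Split each line on "[" or "]" so the bracket contents become their own fields.
BEGIN{ FS = "[][]" }

# Lines containing a range such as [10:15]
/:/{
    n = 1       # next index into pieces[] (text outside the brackets)
    j = 1       # next index into start[] (first number of each range)
    diff = 0    # size of the first range found; assumed the same for all
    for(i = 1; i <= NF; i++){
        if($i ~ /:/){
            split($i, a, ":")
            start[j++] = a[1]
            if( ! diff )
                diff = a[2] - a[1] + 1
        }
        else
            pieces[n++] = $i
    }
    # Print one output line per value in the range
    for(x = 1; x <= diff; x++){
        line = ""
        for(y = 1; y < (n-1);y++)
            line = sprintf("%s%s[%d]", line, pieces[y], start[y]++)

        # Append the trailing text after the last bracket
        line = sprintf("%s%s", line, pieces[n-1])

        print line
    }
}

# Lines without a range are printed unchanged
!/:/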

I am sure a perl guru out there will have something to offer :)
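
For readers who do not use awk, the same idea can be sketched in Python. This is only an illustration under the same assumption as the script above (every range on a line has the same size, taken from the first range found); `expand_line` is a hypothetical name, and the sample line is a shortened stand-in for the real input.

```python
import re

RANGE = re.compile(r"\[(\d+):(\d+)\]")

def expand_line(line):
    """Expand every [start:end] on a line in lockstep, one output line per step."""
    first = RANGE.search(line)
    if first is None:
        return [line]  # no range on this line: pass it through unchanged
    # Assumption: the first range's size applies to every range on the line.
    count = int(first.group(2)) - int(first.group(1)) + 1
    return [
        # Replace each range with its own start value plus the current offset k.
        RANGE.sub(lambda m, k=k: f"[{int(m.group(1)) + k}]", line)
        for k in range(count)
    ]

for out in expand_line("MDQ[10:12],IO,TPR_AAWD[10:12],,,DATA[11:13],IO"):
    print(out)
```

Because each range is advanced from its own start value, DATA[11:13] steps through 11, 12, 13 while the MDQ entries step through 10, 11, 12.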

kurumi 09-02-2011 11:21 PM

Ruby(1.9+)

Code:

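#!/usr/bin/env ruby
# Assumes every range line starts with MDQ[a:b] and also contains DATA[c:d].
range=[]
File.foreach("file") do |line|
    if line[/^MDQ\[(\d+):(\d+)\]/]
        # First number of the DATA range, e.g. "11" from DATA[11:16]
        num=line.scan(/DATA\[(\d+):/)[0][0]
        # Expand the leading MDQ range into ["10", "11", ..., "15"]
        line.scan(/^MDQ\[(\d+):(\d+)\]/){|x,y| range=(x..y).to_a }
        range.each do |i|
            # Rewrite every MDQ/TPR_AAWD index in place with the current value
            line.gsub!(/(MDQ|TPR_AAWD)\[.[^\]]*?\]/,"\\1[#{i}]")
            # DATA keeps its own counter because its range can start at a different number
            puts line.gsub!(/DATA\[.[^\]]*?\]/,"DATA[#{num}]")
            num.succ!
        end
    else
        puts line
    end
end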


grail 09-03-2011 07:46 AM

That is sweet kurumi ... haven't worked it all out but very nice :)

kadhan 09-05-2011 02:51 AM

Hi Grail and Kurumi,

Your solutions are working fine for me. Thanks a lot for your help.
Grail, can you please explain the FS = "[][]" in the first line of the awk script?

Regards,
Kadhan.

Proud 09-05-2011 03:46 AM

FS = "[][]"
I believe this sets the field separator to the regular expression [][], which is a character set containing ] and [. The outer [ ] delimit the character set, and I think the double quotes might be superfluous.
FS = [\]\[] might be clearer; you can test whether it works.

grail 09-05-2011 03:52 AM

The quotes are required, as it is a computed regex: FS is assigned a string, which awk then interprets as a regular expression.
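
To see what that field separator does to a line, here is a small Python sketch. Python's `re` module happens to read `[][]` the same way (a `]` placed first inside a bracket expression is taken literally), so `re.split` can stand in for awk's field splitting. It imitates the behaviour and is not awk itself; the sample line is a shortened stand-in for the real input.

```python
import re

line = "MDQ[10:15],IO,TPR_AAWD[10:15],GVDD"

# "[][]" is one character class containing "]" and "[".
fields = re.split(r"[][]", line)
print(fields)

# The escaped spelling describes the same character class.
print(re.split(r"[\]\[]", line) == fields)
```

The bracket contents (`10:15`) end up as fields of their own, which is what lets the awk script pick out the ranges by testing each field for a colon.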

kadhan 09-05-2011 04:12 AM

I got your point... Thanks

Regards,
Kadhan


All times are GMT -5.