[SOLVED] decimal pattern matching

kadhan · 09-02-2011, 07:34 AM

I have a big file whose lines follow the pattern given below:

MDQ[11:15],IO,MDQ[10:14],,,,MDQ[12:16],TPR_AAWD[11:15]

I want to modify this file so that it looks like this:

MDQ[11],IO,MDQ[10],,,,MDQ[12],TPR_AAWD[11]
MDQ[12],IO,MDQ[11],,,,MDQ[13],TPR_AAWD[12]
MDQ[13],IO,MDQ[12],,,,MDQ[14],TPR_AAWD[13]
MDQ[14],IO,MDQ[13],,,,MDQ[15],TPR_AAWD[14]
MDQ[15],IO,MDQ[14],,,,MDQ[16],TPR_AAWD[15]

How can I implement this in sed/awk/perl/csh/vim?
Please help.

Proud · 09-02-2011, 08:22 AM

I see integers not decimals.
To clarify, you want to strip the bits between : and ], i.e. turn [nn:nn] into [nn]?

David the H. · 09-02-2011, 08:33 AM

It looks more to me like he wants to take the bracketed parts as a sequence of integers, and expand them into individual entries based on that sequence. [11:15] becomes five single lines with [11], [12], [13], [14], and [15]. But the whole problem is still not entirely clear.

You say it's a "big file", but you only showed us one line. Does every line have sequences like that, or are there other lines interspersed? Are the sequences all single digit increments? Are the patterns regular or irregular in any way? Do you want the output to be a single file, with only the expanded lines, or with the expanded lines following the unexpanded lines, or what?

Please clarify.

And please use [code][/code] tags around your code (including input and output text), to preserve formatting and to improve readability.

kadhan · 09-02-2011, 08:35 AM

Quote:

Originally Posted by Proud

I see integers not decimals.
To clarify, you want to strip the bits between : and ], i.e. turn [nn:nn] into [nn]?

Yes, they are integers.

First we need to calculate the size of the range in MDQ[11:15], which is 5,
and then split the line into 5 lines as given below:
MDQ[11]
MDQ[12]
MDQ[13]
MDQ[14]
MDQ[15]
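
To make the range arithmetic concrete, here is a minimal Ruby sketch of expanding a single entry. The helper name `expand_entry` is made up for illustration, and it assumes the entry ends in a `[lo:hi]` range with `lo <= hi`; anything else is returned untouched.

```ruby
# Hypothetical helper (not from the thread): expand "NAME[lo:hi]" into
# one entry per index, so the number of entries is hi - lo + 1.
def expand_entry(entry)
  m = entry.match(/\A(.*)\[(\d+):(\d+)\]\z/)
  return [entry] unless m            # no range found: leave the entry alone
  name = m[1]
  lo = m[2].to_i
  hi = m[3].to_i
  (lo..hi).map { |i| "#{name}[#{i}]" }
end

puts expand_entry("MDQ[11:15]")      # prints MDQ[11] through MDQ[15], one per line
```

For `MDQ[11:15]` the size works out as 15 - 11 + 1 = 5, which matches the five lines listed above.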

kadhan · 09-02-2011, 08:45 AM

Quote:

Originally Posted by David the H.

It looks more to me like he wants to take the bracketed parts as a sequence of integers, and expand them into individual entries based on that sequence. [11:15] becomes five single lines with [11], [12], [13], [14], and [15]. But the whole problem is still not entirely clear.

You say it's a "big file", but you only showed us one line. Does every line have sequences like that, or are there other lines interspersed? Are the sequences all single digit increments? Are the patterns regular or irregular in any way? Do you want the output to be a single file, with only the expanded lines, or with the expanded lines following the unexpanded lines, or what?

Please clarify.

And please use [code][/code] tags around your code (including input and output text), to preserve formatting and to improve readability.

Yes, it's a big file. I showed only one line here; every other line has the same kind of sequence.

The input file is given below:

Code:

 
MDQ[10:15],IO,MDQ[10:15],,,,MDQ[10:15],TPR_AAWD[10:15],,,DATA[11:16],DATA[11:16],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[16],IO,MDQ[16],,,,MDQ[16],TPR_CLK_SYNC,,,DATA[16],DATA[16],IO,,1,,GVDD (1.5V/1.35V) SSTL15

The output file should become:

Code:

MDQ[10],IO,MDQ[10],,,,MDQ[10],TPR_AAWD[10],,,DATA[11],DATA[11],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[11],IO,MDQ[11],,,,MDQ[11],TPR_AAWD[11],,,DATA[12],DATA[12],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[12],IO,MDQ[12],,,,MDQ[12],TPR_AAWD[12],,,DATA[13],DATA[13],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[13],IO,MDQ[13],,,,MDQ[13],TPR_AAWD[13],,,DATA[14],DATA[14],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[14],IO,MDQ[14],,,,MDQ[14],TPR_AAWD[14],,,DATA[15],DATA[15],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[15],IO,MDQ[15],,,,MDQ[15],TPR_AAWD[15],,,DATA[16],DATA[16],IO,,16,,GVDD (1.5V/1.35V) SSTL15
MDQ[16],IO,MDQ[16],,,,MDQ[16],TPR_CLK_SYNC,,,DATA[16],DATA[16],IO,,1,,GVDD (1.5V/1.35V) SSTL15

grail · 09-02-2011, 08:46 AM

So on each line, the size of the range will always be the same for every item?
I.e. each MDQ on this line has a range size of 5.
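
That assumption is easy to check before expanding anything. A rough Ruby sketch follows; the helper name `range_sizes` is hypothetical, and the sample line is the one from the first post.

```ruby
# Hypothetical helper (not from the thread): the size of every [lo:hi]
# range found on a line, in order of appearance.
def range_sizes(line)
  line.scan(/\[(\d+):(\d+)\]/).map { |lo, hi| hi.to_i - lo.to_i + 1 }
end

line = "MDQ[11:15],IO,MDQ[10:14],,,,MDQ[12:16],TPR_AAWD[11:15]"
sizes = range_sizes(line)
if sizes.uniq.size <= 1
  puts "consistent: #{sizes.first} rows per line"
else
  puts "mismatch: #{sizes.inspect}"
end
```

If the sizes ever differ, the line needs a rule for which range wins, which is exactly the open question here.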

Proud · 09-02-2011, 08:50 AM

Yes: should the first range dictate the number of rows? Are the other ranges redundant (so only their first number need be read), do they dictate just the pattern/stepping for values in those rows and columns, or can they cause more rows to be added, etc.?

ta0kira · 09-02-2011, 08:51 AM

Quote:

Originally Posted by Proud

I see integers not decimals.
To clarify, you want to strip the bits between : and ], i.e. turn [nn:nn] into [nn]?

Maybe "decimal" (vs. "hexadecimal") integers? Technically not incorrect, but definitely misleading.
Kevin Barry

grail · 09-02-2011, 10:11 AM

Well, going on the assumption that Proud and I have made (every range on a line has the same size, so the first one sets the number of output lines):

Code:

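#!/usr/bin/awk -f

# Split each line on either bracket, so fields alternate between the text
# outside the brackets and the "lo:hi" range inside them.
BEGIN{ FS = "[][]" }

# Lines containing a range: expand into one output line per index.
/:/{
    n = 1
    j = 1
    diff = 0
    for(i = 1; i <= NF; i++){
        if($i ~ /:/){
            split($i, a, ":")
            start[j++] = a[1]
            # the first range on the line sets the number of output lines
            if( ! diff )
                diff = a[2] - a[1] + 1
        }
        else
            pieces[n++] = $i
    }
    for(x = 1; x <= diff; x++){
        line = ""
        # rebuild the line, bumping each range's index after it is used
        for(y = 1; y < (n-1);y++)
            line = sprintf("%s%s[%d]", line, pieces[y], start[y]++)

        # the trailing piece has no bracket after it
        line = sprintf("%s%s", line, pieces[n-1])

        print line
    }
}

# Lines without a range are printed unchanged.
!/:/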

I am sure a perl guru out there will have something to offer.

kurumi · 09-02-2011, 10:21 PM

Ruby (1.9+)

Code:

#!/usr/bin/env ruby
range=[]
# File.foreach closes the file when it is done
File.foreach("file") do |line|
    if line[/^MDQ\[(\d+):(\d+)\]/]
        # first DATA index; it is stepped separately from the MDQ index
        num=line.scan(/DATA\[(\d+):/)[0][0]
        # the leading MDQ range sets how many lines are printed
        line.scan(/^MDQ\[(\d+):(\d+)\]/){|x,y| range=(x..y).to_a }
        range.each do |i|
            line.gsub!(/(MDQ|TPR_AAWD)\[.[^\]]*?\]/,"\\1[#{i}]")
            puts line.gsub!(/DATA\[.[^\]]*?\]/,"DATA[#{num}]")
            num.succ!
        end
    else
        puts line
    end
end
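
The script above names MDQ, TPR_AAWD and DATA explicitly. If the column names vary, a more generic variant might look like the sketch below. It is only an illustration under the same assumption as before (the first range on a line sets the row count, and every range steps by one); the helper name `expand_line` is made up.

```ruby
# Hypothetical helper (not from the thread): expand every [lo:hi] on a
# line in lockstep, without naming the columns.
def expand_line(line)
  ranges = line.scan(/\[(\d+):(\d+)\]/).map { |lo, hi| [lo.to_i, hi.to_i] }
  return [line] if ranges.empty?                 # nothing to expand
  rows = ranges.first[1] - ranges.first[0] + 1   # first range sets the row count
  (0...rows).map do |offset|
    starts = ranges.map(&:first)                 # fresh copy of start indices per row
    # gsub visits the ranges left to right, the same order scan found them in
    line.gsub(/\[\d+:\d+\]/) { "[#{starts.shift + offset}]" }
  end
end

puts expand_line("MDQ[10:12],IO,DATA[11:13],IO")
# MDQ[10],IO,DATA[11],IO
# MDQ[11],IO,DATA[12],IO
# MDQ[12],IO,DATA[13],IO
```

Lines without any range, such as the MDQ[16] line in the sample input, come back unchanged as a one-element array.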

grail · 09-03-2011, 06:46 AM

That is sweet, kurumi... I haven't worked it all out, but very nice.

kadhan · 09-05-2011, 01:51 AM

Hi Grail and Kurumi,

Your solutions are working fine for me. Thanks a lot for your help.
Grail, can you please explain the FS = "[][]" in the first line of the awk script?

Regards,
Kadhan.

Proud · 09-05-2011, 02:46 AM

FS = "[][]"
I believe this is setting the field separator to the regular expression [][], which is the character set containing ] or [. The outer [] define the character set, and a ] placed first inside it is taken literally. I think the double quotes might be superfluous.
FS = [\]\[] might be clearer; you can test if it works.

grail · 09-05-2011, 02:52 AM

The quotes are required as it is a computed regex: FS is assigned a string, which awk then interprets as a regular expression.
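
To see roughly what fields awk ends up with, here is the same split done in Ruby. This is only an approximation for illustration: the helper name `bracket_fields` is made up, Ruby needs both brackets escaped inside the class (awk's POSIX form [][] is not accepted there), and Ruby's split drops trailing empty fields, which awk would keep.

```ruby
# Hypothetical helper (not from the thread): split a line on either
# bracket, roughly what awk does with FS = "[][]".
def bracket_fields(line)
  line.split(/[\[\]]/)
end

fields = bracket_fields("MDQ[10:15],IO,DATA[11:16],IO")
fields.each_with_index { |f, i| puts "$#{i + 1} = #{f.inspect}" }
# $1 = "MDQ"
# $2 = "10:15"
# $3 = ",IO,DATA"
# $4 = "11:16"
# $5 = ",IO"
```

The fields alternate between text outside the brackets and the range inside them, which is what the awk script relies on when it sorts fields into pieces and start values.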

kadhan · 09-05-2011, 03:12 AM

I got your point... Thanks

Regards,
Kadhan