[SOLVED] Help fixing bash/awk code to print letters based on counter

grail · 10-19-2011, 07:43 AM

hmmm .... I think I follow but now that I see we are disassembling at a much higher level a couple of xml documents I would probably think a language like Perl which has a parser may
be more of what you are looking for. I will have a tinker and see what I can come up with but I only see this growing out of control.

So just to confirm I am starting on the right track ... everything in your final output is going to be produced by what we are trying to do? (ie. all of the last code in previous post)

Perseus · 10-19-2011, 05:27 PM

Hi grail again, thabks for your help.

Quote:

Originally Posted by grail

.... I would probably think a language like Perl which has a parser may
be more of what you are looking for.

Use Spreadsheet::WriteExcel::FromXML was my first try, but after
crash with many issues like don't know the structure that perl module expects or receive some other errors I gave up, and even I'm a newbie in shell script, I decided to do the conversion using bash that I understand a little bit more that try to decipher how it works that CPAN module. One more thing is that this perl module only converts from xml to xls files, but
doesn't convert from xml to xlsx files.

Quote:

Originally Posted by grail

I will have a tinker and see what I can come up with but I only see this growing out of control.

I know now I'm asking for more help, because my original idea was do each block with bash and going step by step until get the complete output, only after to see your unique AWK code, I realized that is a much better and faster option to get the result. Really is not much than that, it would be my last help petition.

Quote:

Originally Posted by grail

So just to confirm I am starting on the right track ... everything in your final output is going to be produced by what we are trying to do? (ie. all of the last code in previous post)

Like I said I'm trying to generate an Excel xlsx file containing 2 sheets from 2 xml files.

If you look inside an xlsx file (that actually is an OpenXML file format), you'll see the following mandatory files that conform the xlsx file (mandatory are highlighted in red):
$ find . -type f

./docProps/core.xml
./xl/sharedStrings.xml
./xl/workbook.xml
./xl/worksheets/sheet1.xml
./xl/worksheets/sheet2.xml
./xl/_rels/workbook.xml.rels
./[Content_Types].xml
./_rels/.rels

Well, I was able to write a combination of bash/awk scripts to generate all of them, except "sheet2.xml" that is the very last part of my objetive to generate the xlsx file. The sheet2.xml is what we're (or your in a higher percentage) trying to do and the complete output of sheet2.xml is:

Code:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
	xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
	xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" 
	xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
	<dimension ref="A1:C6"/> 
	<cols><col min="3" max="3" width="12" bestFit="1" customWidth="1"/></cols> 
	<sheetData>
		<row r="1" >
			<c r="B1" t="s"><v>8</v></c>
			<c r="C1" t="s"><v>13</v></c>
		</row>
		<row r="2" >
			<c r="A2" t="s"><v>17</v></c>
			<c r="B2"><v>1</v></c>
			<c r="C2"><v>2</v></c>
		</row>
		<row r="3" >
			<c r="A3" t="s"><v>18</v></c>
			<c r="B3"><v>3</v></c>
			<c r="C3"><v>0</v></c>
		</row>
		<row r="4" >
			<c r="A4" t="s"><v>19</v></c>
			<c r="B4"><v>2</v></c>
			<c r="C4"><v>7</v></c>
		</row>
		<row r="5" >
			<c r="A5" t="s"><v>20</v></c>
			<c r="B5"><v>1</v></c>
			<c r="C5"><v>4</v></c>
		</row>
		<row r="6" >
			<c r="A6" t="s"><v>21</v></c>
			<c r="B6"><v>1</v></c>
			<c r="C6"><v>5</v></c>
		</row>
	</sheetData>
	<pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/>
</worksheet>

(Values in red change from file to file)
In this sheet2.xml,Last Cell in last block is C6
In this sheet2.xml, number of letters in 1rst block is 2, then min=2+1=3 and max=2+1=3

I hope I've done a good explanation.

Many thanks for any help, really.

Perseus · 10-20-2011, 05:24 AM

Hi again,

The code below has the recursive function and prints the first block, but if I try to add a new loop to get
blocks 2, 3, ..N after first block "for (s in save).." receive errors. The same if I try to add this 2nd
for (s1 in save1) inside END{} statement. I'm not sure what is wrong.

Code:

L=2
awk -v Z="$L" '#BEGIN{ Z = L; W=1 }
##############################################################################		
function recurse(num,   ret)
{	mod = num % 26
    div = int(num / 26)

    if(mod)
        ret = sprintf("%c", mod + 64)
    if(div){
        if(div <= 26)
            ret = sprintf("%c%s", div + 64, ret)
        else
            ret = recurse(div)ret
    }
    return ret
}
##############################################################################	
FNR == NR && /report id/{
    save[gensub(/^[^"]+"|".+$/,"","g")]
	save1[gensub(/^[^"].+">|<.+$/,"","g")]
    next}
{for(s in save){
	if($0 ~ s){
		if( Z==2 )
			printf("<row r=\"1\" >\n\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>\n" , recurse(Z) , FNR - 3);
		else if ( Z > 2 && Z < length(save))
			printf("\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>\n" , recurse(Z) , FNR - 3);
		else 	
			printf("\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>\n</row>" , recurse(Z) , FNR - 3);
			Z++
		}		
	}
}' file2 file3
#*The above code works and for loop in blue works too, but if I add the for loop shown 
below in red, the whole code doesn't work.

#END{asorti(save1);for(s1=1;s1<=length(save1);s1++){
	#if($0 ~ save1(s1)){
	#	if( W==1 )
	#		printf("<row r=\"%d\" >\n\t<c r=\"A%d\" t=\"s\"><v>%d</v></c>\n" ,W+1,W+1, FNR - 3);
	#	else if ( W > 1 && W < length(save1))
	#		printf("\t<c r=\"%s%d\"><v>%d</v></c>\n" , recurse(W), W+1, FNR - 3);
	#	else 	
	#		printf("\t<c r=\"%s%d\"><v>%d</v></c>\n</row>" , recurse(W), W+1, FNR - 3);
	#	}
	#W++		
	#}' file2 file3

Thanks for any help.

Perseus · 10-24-2011, 08:33 PM

Hi again,

After trying and searching and learning a little bit more of awk and bash I've been able to get
the desired output.

Many thanks to David and grail who helped me to learn new things related my question.

Grettings

grail · 10-25-2011, 09:52 AM

Ok ... based on information from your other post as well as this one, I have come up with the following. Of course I can not validate the output as I am not
100% sure what it should look like, so you will have to advise:

Code:

#!/usr/bin/awk -f

BEGIN{
    FS = "[<>]+"
    LetterNumber = 2
}

FNR == NR{
    if(/id/){
        split($0,id,/"/)
        reports_id[id[2]]
    }

    if(/Report ./){
        split($0,r)
        reportsList[r[3]]
        split(r[2],n,/"/)
        reportValues[id[2],r[3]] = n[2]
    }

    next
}

$4 in reports_id{
    reports_id[$4] = sprintf("\t\t\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>" , recurse(LetterNumber++) , FNR-3)
}

$4 in reportsList{  reportsList[r[3]] = FNR - 3 }

END{
    asorti(reportsList, sortedReportsList)

    print "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
    print "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
    print "\txmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
    print "\txmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\""
    print "\txmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\">"
    printf("\t<dimension ref=\"A1:%s%d\"/>\n",recurse(length(reports_id)+1),length(reportsList))
    print"\t<sheetViews><sheetView workbookViewId=\"0\"/></sheetViews>"
    print"\t<sheetFormatPr baseColWidth=\"10\" defaultRowHeight=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\"/>"
    print"\t<cols><col min=\""length(reports_id)+1"\" max=\""length(reports_id)+1"\" width=\"11.85546875\" bestFit=\"1\" customWidth=\"1\"/></cols>"
    print"\t<sheetData>"

    ### --------------------------- Printing Block 1 -------------------------------###
    print "\t\t<row r=\"1\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">"
    for (i in reports_id) print reports_id[i]
    print "\t\t</row>"

    ### --------------------------- Printing next blocks -------------------------------###
    for ( i=2;i<=length(sortedReportsList);i++ ) {
        printf("\t\t<row r=\"%d\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">\n", i)
        printf("\t\t\t<c r=\"A%d\" t=\"s\"><v>%d</v></c>\n",i,reportsList[sortedReportsList[i-1]]) #Printing 1rst line of each block

        LetterNumber = 2

        for(j in reportValues)
            if( j ~ sortedReportsList[i-1])
                printf("\t\t\t<c r=\"%s%d\"><v>"reportValues[j]"</v></c>\n",recurse(LetterNumber++),i)

        print "\t\t</row>"
    }
    ### ----------------------- Printing fix text at the end ---------------------------------------------###
    print "\t</sheetData>"
    print "\t<pageMargins left=\"0.75\" right=\"0.75\" top=\"1\" bottom=\"1\" header=\"0.5\" footer=\"0.5\"/>"
    print "</worksheet>"

}

Let me know if any of it is too confusing

Perseus · 10-25-2011, 03:29 PM

Hi grail,

Thanks for take your time, really thanks.

Well, the output I need is close to the output of your code, only need some adjustments in the order the values are being printed. You can test your code using the samples of file2 and file3 shown below:
file2:

Code:

<file title="Title 1 and 2">
    <report id="No. 1234">
      <f v="0">Report x</f>
      <f v="3">Report t</f>
      <f v="1">Report o</f>
    </report>
    <report id="No. 457">
      <f v="4">Report x</f>
      <f v="5">Report y</f>
      <f v="7">Report u</f>
      <f v="2">Report o</f>
    </report>
</file>

file3:

Code:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">        
        <si><t>Report</t></si>
        <si><t>Tittle</t></si>
        <si><t>Number</t></si>
        <si><t>Address</t></si>
        <si><t>Date</t></si>
        <si><t>Description</t></si>
        <si><t>Title 1 and 2</t></si>
        <si><t>Title 1</t></si>
        <si><t>No. 1234</t></si>
        <si><t>Address 1</t></si>
        <si><t>October 07, 2009</t></si>
        <si><t>Some text</t></si>
        <si><t>Title 2</t></si>
        <si><t>No. 457</t></si>
        <si><t>Address 2</t></si>
        <si><t>October 15, 2009</t></si>
        <si><t>Some text</t></si>
        <si><t>Report o</t></si>
        <si><t>Report t</t></si>
        <si><t>Report u</t></si>
        <si><t>Report x</t></si>
        <si><t>Report y</t></si>
</sst>

Referring to the output, even when the text in red is variable, I can handle that. If you want, only focus in the output of the complex part (the one in blue)

As you can see in the output,
1-) The order of blocks 2-6 is the same order as the "Reports" appear in file3 (in file3 they are in alphabetical order, Report o, Report t, Report u, etc...)
2-) Whithin each block in the output, the order of appearence of the values is based in the order of appeareance of each "report id" in file2 (first report id=No. 1234, 2nd report id=No. 457 etc.)
3-) Some other details are as comments within the output (in green).

The output desired is:

Code:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
	xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
	xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac"
	xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
	<dimension ref="A1:C6"/>
	<sheetViews><sheetView workbookViewId="0"/></sheetViews>
	<sheetFormatPr baseColWidth="10" defaultRowHeight="15" customHeight="1" x14ac:dyDescent="0.25"/>
	<cols><col min="3" max="3" width="11.85546875" bestFit="1" customWidth="1"/></cols>
	<sheetData>
	<row r="1" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
		<c r="B1" t="s"><v>8</v></c>  # 8 because report id="No. 1234" is in line 11 in file3, then 11-3=8
		<c r="C1" t="s"><v>13</v></c> # 13 because report id="No. 457" is in line 16 in file3, then 11-3=8
	</row>
	<row r="2" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
		<c r="A2" t="s"><v>17</v></c> # 17 because is "Report o" is in line 20 in file3, then 20-3=17
		<c r="B2"><v>1</v></c> # 1 because "Report o" has value 1 in 1st block of file2. 1st block is report id="No. 1234"
		<c r="C2"><v>2</v></c> # 2 because "Report o" has value 2 in 2nd block of file2. 2nd block is report id="No. 457"
	</row>
	<row r="3" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
		<c r="A3" t="s"><v>18</v></c>
		<c r="B3"><v>3</v></c> # 3 because "Report t" has value 3 in 1st block of file2. 1st block is report id="No. 1234"
		<c r="C3"><v>0</v></c> # 0 because "Report t" doesn't appear in 2nd block of file2. 2nd block is report id="No. 457"
	</row>
	<row r="4" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
		<c r="A4" t="s"><v>19</v></c>
		<c r="B4"><v>0</v></c> # 0 because "Report u" doesn't appear in 1st block of file2. 1st block is report id="No. 1234"
		<c r="C4"><v>7</v></c> # 7 because "Report u" has value 7 in 2nd block of file2. 2nd block is report id="No. 457"
	</row>
	<row r="5" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
		<c r="A5" t="s"><v>20</v></c>
		<c r="B5"><v>0</v></c> # 0 because "Report x" has value 0 in 1st block of file2. 1st block is report id="No. 1234"
		<c r="C5"><v>4</v></c> # 4 because "Report x" has value 4 in 2nd block of file2. 2nd block is report id="No. 457"
	</row>
	<row r="6" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
		<c r="A6" t="s"><v>21</v></c>
		<c r="B6"><v>0</v></c> # 0 because "Report y" doesn't appear in 1st block of file2. 1st block is report id="No. 1234"
		<c r="C6"><v>5</v></c> # 5 because "Report y" has value 5 in 2nd block of file2. 2nd block is report id="No. 457"
	</row>
	</sheetData>
	<pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/>
</worksheet>

I hope is not too confusing, many thanks for your help.

PS: The array in bash I have in the other code is related to an awk routine that set to zero, in the output, the value of each "report Z" that doesn't appear in some "report id" block in file2 e.g "Report u" doesn't appear in 1st block of file2, because of that in the output for that "report id" is "0" (see C3, B4 or B6 in the output). Do that I found is the more complex part.

Grettings

grail · 10-26-2011, 03:47 AM

So i found that you get an extra row as your original ReportsList has a pipe at the end which creates an additional Report X which does not exist. Easy enough to over come
with a simple +1 and having the counter start at 1 instead of 2 which threw me at first anyways.

Code:

#!/usr/bin/awk -f

BEGIN{
    FS = "[<>]+"
    LetterNumber = 2
}

FNR == NR{
    if(/id/){
        split($0,id,/"/)
        reports_id[id[2]]
    }

    if(/Report ./){
        split($0,r)
        reportsList[r[3]]
        split(r[2],n,/"/)
        reportValues[id[2],r[3]] = n[2]
    }

    next
}

$4 in reports_id{
    reports_id[$4] = sprintf("\t\t\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>" , recurse(LetterNumber++) , FNR-3)
}

$4 in reportsList{ reportsList[$4] = FNR - 3 }

END{
    asorti(reportsList, sortedReportsList)

    print "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
    print "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
    print "\txmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
    print "\txmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\""
    print "\txmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\">"
    printf("\t<dimension ref=\"A1:%s%d\"/>\n",recurse(length(reports_id)+1),length(reportsList)+1)
    print"\t<sheetViews><sheetView workbookViewId=\"0\"/></sheetViews>"
    print"\t<sheetFormatPr baseColWidth=\"10\" defaultRowHeight=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\"/>"
    print"\t<cols><col min=\""length(reports_id)+1"\" max=\""length(reports_id)+1"\" width=\"11.85546875\" bestFit=\"1\" customWidth=\"1\"/></cols>"
    print"\t<sheetData>"

    ### --------------------------- Printing Block 1 -------------------------------###
    print "\t\t<row r=\"1\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">"
    for (i in reports_id) print reports_id[i]
    print "\t\t</row>"

    ### --------------------------- Printing next blocks -------------------------------###
    #for ( i=2;i<=length(sortedReportsList);i++ ) {
    for ( i=1;i<=length(sortedReportsList);i++ ) {
        printf("\t\t<row r=\"%d\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">\n", i+1)
        printf("\t\t\t<c r=\"A%d\" t=\"s\"><v>%d</v></c>\n",i+1,reportsList[sortedReportsList[i]]) #Printing 1rst line of each block

        LetterNumber = 2

        for(j in reports_id){
            val = 0
            if( reportValues[j,sortedReportsList[i]] )
                val = reportValues[j,sortedReportsList[i]]

            printf("\t\t\t<c r=\"%s%d\"><v>"val"</v></c>\n",recurse(LetterNumber++),i+1)
        }

        print "\t\t</row>"
    }
    ### ----------------------- Printing fix text at the end ---------------------------------------------###
    print "\t</sheetData>"
    print "\t<pageMargins left=\"0.75\" right=\"0.75\" top=\"1\" bottom=\"1\" header=\"0.5\" footer=\"0.5\"/>"
    print "</worksheet>"

}

### --------------------------- Recurse function to obtain Column Letter from number ---------------------- ###

function recurse(num,   ret){
    mod = num % 26
    div = int(num / 26)

    if(mod)
        ret = sprintf("%c", mod + 64)
    if(div){
        if(div <= 26)
            ret = sprintf("%c%s", div + 64, ret)
        else
            ret = recurse(div)ret
    }
    return ret
}

Perseus · 10-26-2011, 04:24 AM

Hi grail,

I must think it work almost perfect. Your code is fast. The only issue that is printing the values within each block in different order, e.g for the first block should be 8,13 and the code is printing 13,8. The same for the other blocks if you compare the output I show in my previous post.

Thanks for all help so far. It's great!

grail · 10-26-2011, 06:02 AM

hmmm ... not sure I follow? I have run your code, my code and compare both to your desired output (minus the comments) and I seem to get the same??

Perseus · 10-26-2011, 06:06 PM

Hi again grail,

When I run the code I get the output on the right, the order of appearance of the values should be as in the output on the left:

Code:

        Output desired:                      Current output:
<row r="1" >                        <row r="1" >
    <c r="B1" t="s"><v>8</v></c>	<c r="C1" t="s"><v>13</v></c>
    <c r="C1" t="s"><v>13</v></c>	<c r="B1" t="s"><v>8</v></c>
</row>                              </row>
<row r="2" >                        <row r="2" >
    <c r="A2" t="s"><v>17</v></c>       <c r="A2" t="s"><v>17</v></c>
    <c r="B2"><v>1</v></c>              <c r="B2"><v>2</v></c>
    <c r="C2"><v>2</v></c>              <c r="C2"><v>1</v></c>
</row>                              </row>
<row r="3" >                        <row r="3" >
    <c r="A3" t="s"><v>18</v></c>       <c r="A3" t="s"><v>18</v></c>
    <c r="B3"><v>3</v></c>              <c r="B3"><v>0</v></c>
    <c r="C3"><v>0</v></c>              <c r="C3"><v>3</v></c>
</row>                              </row>
<row r="4" >                        <row r="4" >
    <c r="A4" t="s"><v>19</v></c>       <c r="A4" t="s"><v>19</v></c>
    <c r="B4"><v>0</v></c>              <c r="B4"><v>7</v></c>
    <c r="C4"><v>7</v></c>              <c r="C4"><v>0</v></c>
</row>                              </row>
<row r="5" >                        <row r="5" >
    <c r="A5" t="s"><v>20</v></c>       <c r="A5" t="s"><v>20</v></c>
    <c r="B5"><v>0</v></c>              <c r="B5"><v>4</v></c>
    <c r="C5"><v>4</v></c>              <c r="C5"><v>0</v></c>
</row>                              </row>
<row r="6" >                        <row r="6" >
    <c r="A6" t="s"><v>21</v></c>       <c r="A6" t="s"><v>21</v></c>
    <c r="B6"><v>0</v></c>              <c r="B6"><v>5</v></c>
    <c r="C6"><v>5</v></c>              <c r="C6"><v>0</v></c>
</row>                              </row>

Many thanks for your help so far.

Regards

grail · 10-27-2011, 12:22 AM

See I get all the output on the left

I think the issue is that when doing an 'in' call to an array the order is not set in stone

(so I was probably just lucky).

Haven't had a chance to test but highlighted in red are my latest changes:

Code:

#!/usr/bin/awk -f

BEGIN{
    FS = "[<>]+"
    LetterNumber = 2
    cnt = 0
}

FNR == NR{
    if(/id/){
        split($0,id,/"/)
        reports_id[id[2]]
    }

    if(/Report ./){
        split($0,r)
        reportsList[r[3]]
        split(r[2],n,/"/)
        reportValues[id[2],r[3]] = n[2]
    }

    next
}

$4 in reports_id{
    reports_id[++cnt] = sprintf("\t\t\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>" , recurse(LetterNumber++) , FNR-3)
}

$4 in reportsList{ reportsList[$4] = FNR - 3 }

END{
    asorti(reportsList, sortedReportsList)
    id_len = cnt + 1

    print "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
    print "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
    print "\txmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
    print "\txmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\""
    print "\txmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\">"
    printf("\t<dimension ref=\"A1:%s%d\"/>\n",recurse(id_len),length(reportsList)+1)
    print"\t<sheetViews><sheetView workbookViewId=\"0\"/></sheetViews>"
    print"\t<sheetFormatPr baseColWidth=\"10\" defaultRowHeight=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\"/>"
    print"\t<cols><col min=\""id_len"\" max=\""id_len"\" width=\"11.85546875\" bestFit=\"1\" customWidth=\"1\"/></cols>"
    print"\t<sheetData>"

    ### --------------------------- Printing Block 1 -------------------------------###
    print "\t\t<row r=\"1\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">"
    for (i = 1;i <= cnt;i++) print reports_id[i]
    print "\t\t</row>"

    ### --------------------------- Printing next blocks -------------------------------###
    for ( i=1;i<=length(sortedReportsList);i++ ) {
        printf("\t\t<row r=\"%d\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">\n", i+1)
        printf("\t\t\t<c r=\"A%d\" t=\"s\"><v>%d</v></c>\n",i+1,reportsList[sortedReportsList[i]]) #Printing 1rst line of each block

        LetterNumber = 2

        for(j = 1; j <= cnt; j++){
            val = 0
            if( reportValues[reports_id[j],sortedReportsList[i]] )
                val = reportValues[reports_id[j],sortedReportsList[i]]

            printf("\t\t\t<c r=\"%s%d\"><v>"val"</v></c>\n",recurse(LetterNumber++),i+1)
        }

        print "\t\t</row>"
    }
    ### ----------------------- Printing fix text at the end ---------------------------------------------###
    print "\t</sheetData>"
    print "\t<pageMargins left=\"0.75\" right=\"0.75\" top=\"1\" bottom=\"1\" header=\"0.5\" footer=\"0.5\"/>"
    print "</worksheet>"

}

### --------------------------- Recurse function to obtain Column Letter from number ---------------------- ###

function recurse(num,   ret){
    mod = num % 26
    div = int(num / 26)

    if(mod)
        ret = sprintf("%c", mod + 64)
    if(div){
        if(div <= 26)
            ret = sprintf("%c%s", div + 64, ret)
        else
            ret = recurse(div)ret
    }
    return ret
}

Perseus · 10-27-2011, 12:39 AM

Hi grail, I hope your fine.

Thanks for your help and time.

Quote:

Originally Posted by grail

See I get all the output on the left

I think the issue is that when doing an 'in' call to an array the order is not set in stone

(so I was probably just lucky).

Yes, because of that I tried to set all arrays indexed numerically in ascending order in order to print at the end using for(i=1;i<=N;i++), some troubles come if the arrays are printed using for i in Array.

Regarding the code, now some bad news :-(, only the first block contain values, in the other blocks all values are printed
as zero.

Code:

               </row>
               <row r="2" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
                       <c r="A2" t="s"><v>17</v></c>
                       <c r="B2"><v>0</v></c>
                       <c r="C2"><v>0</v></c>
               </row>
               <row r="3" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
                       <c r="A3" t="s"><v>18</v></c>
                       <c r="B3"><v>0</v></c>
                       <c r="C3"><v>0</v></c>
               </row>
               <row r="4" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
                       <c r="A4" t="s"><v>19</v></c>
                       <c r="B4"><v>0</v></c>
                       <c r="C4"><v>0</v></c>
               </row>
               <row r="5" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
                       <c r="A5" t="s"><v>20</v></c>
                       <c r="B5"><v>0</v></c>
                       <c r="C5"><v>0</v></c>
               </row>
               <row r="6" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
                       <c r="A6" t="s"><v>21</v></c>
                       <c r="B6"><v>0</v></c>
                       <c r="C6"><v>0</v></c>
               </row>

grail · 10-27-2011, 01:40 AM

Take 32:

Code:

#!/usr/bin/awk -f

BEGIN{
    FS = "[<>]+"
    LetterNumber = 2
    cnt = 0
}

FNR == NR{
    if(/id/){
        split($0,id,/"/)
        reports_id[id[2]]
	ordered_id[++cnt] = id[2]
    }

    if(/Report ./){
        split($0,r)
        reportsList[r[3]]
        split(r[2],n,/"/)
        reportValues[id[2],r[3]] = n[2]
    }

    next
}

$4 in reports_id{
    reports_id[$4] = sprintf("\t\t\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>" , recurse(LetterNumber++) , FNR-3)
}

$4 in reportsList{ reportsList[$4] = FNR - 3 }

END{
    asorti(reportsList, sortedReportsList)
    id_len = cnt + 1

    print "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
    print "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
    print "\txmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
    print "\txmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\""
    print "\txmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\">"
    printf("\t<dimension ref=\"A1:%s%d\"/>\n",recurse(id_len),length(reportsList)+1)
    print"\t<sheetViews><sheetView workbookViewId=\"0\"/></sheetViews>"
    print"\t<sheetFormatPr baseColWidth=\"10\" defaultRowHeight=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\"/>"
    print"\t<cols><col min=\""id_len"\" max=\""id_len"\" width=\"11.85546875\" bestFit=\"1\" customWidth=\"1\"/></cols>"
    print"\t<sheetData>"

    ### --------------------------- Printing Block 1 -------------------------------###
    print "\t\t<row r=\"1\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">"
    for (i = 1;i <= cnt;i++) print reports_id[ordered_id[i]]
    print "\t\t</row>"

    ### --------------------------- Printing next blocks -------------------------------###
    for ( i=1;i<=length(sortedReportsList);i++ ) {
        printf("\t\t<row r=\"%d\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">\n", i+1)
        printf("\t\t\t<c r=\"A%d\" t=\"s\"><v>%d</v></c>\n",i+1,reportsList[sortedReportsList[i]]) #Printing 1rst line of each block

        LetterNumber = 2

        for(j = 1; j <= cnt; j++){
            val = 0
            if( reportValues[ordered_id[j],sortedReportsList[i]] )
                val = reportValues[ordered_id[j],sortedReportsList[i]]

            printf("\t\t\t<c r=\"%s%d\"><v>"val"</v></c>\n",recurse(LetterNumber++),i+1)
        }

        print "\t\t</row>"
    }
    ### ----------------------- Printing fix text at the end ---------------------------------------------###
    print "\t</sheetData>"
    print "\t<pageMargins left=\"0.75\" right=\"0.75\" top=\"1\" bottom=\"1\" header=\"0.5\" footer=\"0.5\"/>"
    print "</worksheet>"

}

### --------------------------- Recurse function to obtain Column Letter from number ---------------------- ###

function recurse(num,   ret){
    mod = num % 26
    div = int(num / 26)

    if(mod)
        ret = sprintf("%c", mod + 64)
    if(div){
        if(div <= 26)
            ret = sprintf("%c%s", div + 64, ret)
        else
            ret = recurse(div)ret
    }
    return ret
}

Nominal Animal · 10-27-2011, 02:03 AM

grail, there is no need for a recursive recurse function:

Code:

function recurse(num) {
    num--
    if (num < 0)
        return "";
    ret = sprintf("%c", 65 + int(num) % 26)
    num = int(num / 26)
    while (num > 0) {
        ret = sprintf("%c", 64 + (num % 26)) ret
        num = int(num / 26)
    }
    return ret;
}

will produce A, B, C, ..., Z, AA, AB, AC, ..., AZ, BA, BB, ..., YZ (for num == 1 to 676 ). If you want zero-based indexing, just remove the num-- line.

This one will work for all positive integers (1, 2, ...):

Code:

function recurse(num) {
    ret = ""
    while (num > 0) {
        num--
        ret = sprintf("%c", 65 + (num % 26)) ret
        num = int(num / 26)
    }
    return ret;
}

Perseus · 10-27-2011, 02:24 AM

Jajaja "Take 32"? jaja.

Well, it seems it works 99.999999999999999999999999999% correctly, I've tested with a file that contain 86 Report id's
and the order of values looks correct but the problem is with the sequence of letters generated by recurse function, see this:

Code:

<c r="V1" t="s"><v>108</v></c>
<c r="W1" t="s"><v>113</v></c>
<c r="X1" t="s"><v>118</v></c>
<c r="Y1" t="s"><v>123</v></c>
<c r="A1" t="s"><v>128</v></c> # Instead of A1 should be Z1 
<c r="AA1" t="s"><v>133</v></c>
<c r="AB1" t="s"><v>138</v></c>

or this:

<c r="AX1" t="s"><v>248</v></c>
<c r="AY1" t="s"><v>253</v></c>
<c r="B1" t="s"><v>258</v></c> # Instead of B1 should be AZ1
<c r="BA1" t="s"><v>263</v></c>

or this:

<c r="BX1" t="s"><v>378</v></c>
<c r="BY1" t="s"><v>383</v></c>
<c r="C1" t="s"><v>388</v></c> # Instead of C1 should be BZ1
<c r="CA1" t="s"><v>393</v></c>

It generates until Y, is not printing Z and instead Z prints A1, B1, C1 etc.

I've tried changing in recurse function 27 intead of 26 and prints the Z's but after the Z's prints A1, B1, C1.. etc too.

How to solve this grail?

Thanks a lot again