LinuxQuestions.org
Review your favorite Linux distribution.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 10-19-2011, 07:43 AM   #16
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193

hmmm .... I think I follow but now that I see we are disassembling at a much higher level a couple of xml documents I would probably think a language like Perl which has a parser may
be more of what you are looking for. I will have a tinker and see what I can come up with but I only see this growing out of control.

So just to confirm I am starting on the right track ... everything in your final output is going to be produced by what we are trying to do? (ie. all of the last code in previous post)
 
Old 10-19-2011, 05:27 PM   #17
Perseus
Member
 
Registered: Oct 2011
Posts: 179

Original Poster
Rep: Reputation: Disabled
Hi grail again, thabks for your help.
Quote:
Originally Posted by grail View Post
.... I would probably think a language like Perl which has a parser may
be more of what you are looking for.
Use Spreadsheet::WriteExcel::FromXML was my first try, but after
crash with many issues like don't know the structure that perl module expects or receive some other errors I gave up, and even I'm a newbie in shell script, I decided to do the conversion using bash that I understand a little bit more that try to decipher how it works that CPAN module. One more thing is that this perl module only converts from xml to xls files, but
doesn't convert from xml to xlsx files.
Quote:
Originally Posted by grail View Post
I will have a tinker and see what I can come up with but I only see this growing out of control.
I know now I'm asking for more help, because my original idea was do each block with bash and going step by step until get the complete output, only after to see your unique AWK code, I realized that is a much better and faster option to get the result. Really is not much than that, it would be my last help petition.

Quote:
Originally Posted by grail View Post
So just to confirm I am starting on the right track ... everything in your final output is going to be produced by what we are trying to do? (ie. all of the last code in previous post)
Like I said I'm trying to generate an Excel xlsx file containing 2 sheets from 2 xml files.

If you look inside an xlsx file (that actually is an OpenXML file format), you'll see the following mandatory files that conform the xlsx file (mandatory are highlighted in red):
$ find . -type f

./docProps/core.xml
./xl/sharedStrings.xml
./xl/workbook.xml
./xl/worksheets/sheet1.xml
./xl/worksheets/sheet2.xml
./xl/_rels/workbook.xml.rels
./[Content_Types].xml
./_rels/.rels

Well, I was able to write a combination of bash/awk scripts to generate all of them, except "sheet2.xml" that is the very last part of my objetive to generate the xlsx file. The sheet2.xml is what we're (or your in a higher percentage) trying to do and the complete output of sheet2.xml is:
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
	xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
	xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" 
	xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
	<dimension ref="A1:C6"/> 
	<cols><col min="3" max="3" width="12" bestFit="1" customWidth="1"/></cols> 
	<sheetData>
		<row r="1" >
			<c r="B1" t="s"><v>8</v></c>
			<c r="C1" t="s"><v>13</v></c>
		</row>
		<row r="2" >
			<c r="A2" t="s"><v>17</v></c>
			<c r="B2"><v>1</v></c>
			<c r="C2"><v>2</v></c>
		</row>
		<row r="3" >
			<c r="A3" t="s"><v>18</v></c>
			<c r="B3"><v>3</v></c>
			<c r="C3"><v>0</v></c>
		</row>
		<row r="4" >
			<c r="A4" t="s"><v>19</v></c>
			<c r="B4"><v>2</v></c>
			<c r="C4"><v>7</v></c>
		</row>
		<row r="5" >
			<c r="A5" t="s"><v>20</v></c>
			<c r="B5"><v>1</v></c>
			<c r="C5"><v>4</v></c>
		</row>
		<row r="6" >
			<c r="A6" t="s"><v>21</v></c>
			<c r="B6"><v>1</v></c>
			<c r="C6"><v>5</v></c>
		</row>
	</sheetData>
	<pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/>
</worksheet>
(Values in red change from file to file)
In this sheet2.xml,Last Cell in last block is C6
In this sheet2.xml, number of letters in 1rst block is 2, then min=2+1=3 and max=2+1=3


I hope I've done a good explanation.

Many thanks for any help, really.
 
Old 10-20-2011, 05:24 AM   #18
Perseus
Member
 
Registered: Oct 2011
Posts: 179

Original Poster
Rep: Reputation: Disabled
Hi again,

The code below has the recursive function and prints the first block, but if I try to add a new loop to get
blocks 2, 3, ..N after first block "for (s in save).." receive errors. The same if I try to add this 2nd
for (s1 in save1) inside END{} statement. I'm not sure what is wrong.
Code:
L=2
awk -v Z="$L" '#BEGIN{ Z = L; W=1 }
##############################################################################		
function recurse(num,   ret)
{	mod = num % 26
    div = int(num / 26)

    if(mod)
        ret = sprintf("%c", mod + 64)
    if(div){
        if(div <= 26)
            ret = sprintf("%c%s", div + 64, ret)
        else
            ret = recurse(div)ret
    }
    return ret
}
##############################################################################	
FNR == NR && /report id/{
    save[gensub(/^[^"]+"|".+$/,"","g")]
	save1[gensub(/^[^"].+">|<.+$/,"","g")]
    next}
{for(s in save){
	if($0 ~ s){
		if( Z==2 )
			printf("<row r=\"1\" >\n\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>\n" , recurse(Z) , FNR - 3);
		else if ( Z > 2 && Z < length(save))
			printf("\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>\n" , recurse(Z) , FNR - 3);
		else 	
			printf("\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>\n</row>" , recurse(Z) , FNR - 3);
			Z++
		}		
	}
}' file2 file3
#*The above code works and for loop in blue works too, but if I add the for loop shown 
below in red, the whole code doesn't work.

#END{asorti(save1);for(s1=1;s1<=length(save1);s1++){
	#if($0 ~ save1(s1)){
	#	if( W==1 )
	#		printf("<row r=\"%d\" >\n\t<c r=\"A%d\" t=\"s\"><v>%d</v></c>\n" ,W+1,W+1, FNR - 3);
	#	else if ( W > 1 && W < length(save1))
	#		printf("\t<c r=\"%s%d\"><v>%d</v></c>\n" , recurse(W), W+1, FNR - 3);
	#	else 	
	#		printf("\t<c r=\"%s%d\"><v>%d</v></c>\n</row>" , recurse(W), W+1, FNR - 3);
	#	}
	#W++		
	#}' file2 file3
Thanks for any help.

Last edited by Perseus; 10-20-2011 at 05:27 AM.
 
Old 10-24-2011, 08:33 PM   #19
Perseus
Member
 
Registered: Oct 2011
Posts: 179

Original Poster
Rep: Reputation: Disabled
Hi again,

After trying and searching and learning a little bit more of awk and bash I've been able to get
the desired output.

Many thanks to David and grail who helped me to learn new things related my question.

Grettings
 
Old 10-25-2011, 09:52 AM   #20
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193
Ok ... based on information from your other post as well as this one, I have come up with the following. Of course I can not validate the output as I am not
100% sure what it should look like, so you will have to advise:
Code:
#!/usr/bin/awk -f

BEGIN{
    FS = "[<>]+"
    LetterNumber = 2
}

FNR == NR{
    if(/id/){
        split($0,id,/"/)
        reports_id[id[2]]
    }

    if(/Report ./){
        split($0,r)
        reportsList[r[3]]
        split(r[2],n,/"/)
        reportValues[id[2],r[3]] = n[2]
    }

    next
}

$4 in reports_id{
    reports_id[$4] = sprintf("\t\t\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>" , recurse(LetterNumber++) , FNR-3)
}

$4 in reportsList{  reportsList[r[3]] = FNR - 3 }

END{
    asorti(reportsList, sortedReportsList)

    print "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
    print "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
    print "\txmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
    print "\txmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\""
    print "\txmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\">"
    printf("\t<dimension ref=\"A1:%s%d\"/>\n",recurse(length(reports_id)+1),length(reportsList))
    print"\t<sheetViews><sheetView workbookViewId=\"0\"/></sheetViews>"
    print"\t<sheetFormatPr baseColWidth=\"10\" defaultRowHeight=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\"/>"
    print"\t<cols><col min=\""length(reports_id)+1"\" max=\""length(reports_id)+1"\" width=\"11.85546875\" bestFit=\"1\" customWidth=\"1\"/></cols>"
    print"\t<sheetData>"

    ### --------------------------- Printing Block 1 -------------------------------###
    print "\t\t<row r=\"1\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">"
    for (i in reports_id) print reports_id[i]
    print "\t\t</row>"

    ### --------------------------- Printing next blocks -------------------------------###
    for ( i=2;i<=length(sortedReportsList);i++ ) {
        printf("\t\t<row r=\"%d\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">\n", i)
        printf("\t\t\t<c r=\"A%d\" t=\"s\"><v>%d</v></c>\n",i,reportsList[sortedReportsList[i-1]]) #Printing 1rst line of each block

        LetterNumber = 2

        for(j in reportValues)
            if( j ~ sortedReportsList[i-1])
                printf("\t\t\t<c r=\"%s%d\"><v>"reportValues[j]"</v></c>\n",recurse(LetterNumber++),i)

        print "\t\t</row>"
    }
    ### ----------------------- Printing fix text at the end ---------------------------------------------###
    print "\t</sheetData>"
    print "\t<pageMargins left=\"0.75\" right=\"0.75\" top=\"1\" bottom=\"1\" header=\"0.5\" footer=\"0.5\"/>"
    print "</worksheet>"

}
Let me know if any of it is too confusing
 
Old 10-25-2011, 03:29 PM   #21
Perseus
Member
 
Registered: Oct 2011
Posts: 179

Original Poster
Rep: Reputation: Disabled
Hi grail,

Thanks for take your time, really thanks.

Well, the output I need is close to the output of your code, only need some adjustments in the order the values are being printed. You can test your code using the samples of file2 and file3 shown below:
file2:
Code:
<file title="Title 1 and 2">
    <report id="No. 1234">
      <f v="0">Report x</f>
      <f v="3">Report t</f>
      <f v="1">Report o</f>
    </report>
    <report id="No. 457">
      <f v="4">Report x</f>
      <f v="5">Report y</f>
      <f v="7">Report u</f>
      <f v="2">Report o</f>
    </report>
</file>
file3:
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">        
        <si><t>Report</t></si>
        <si><t>Tittle</t></si>
        <si><t>Number</t></si>
        <si><t>Address</t></si>
        <si><t>Date</t></si>
        <si><t>Description</t></si>
        <si><t>Title 1 and 2</t></si>
        <si><t>Title 1</t></si>
        <si><t>No. 1234</t></si>
        <si><t>Address 1</t></si>
        <si><t>October 07, 2009</t></si>
        <si><t>Some text</t></si>
        <si><t>Title 2</t></si>
        <si><t>No. 457</t></si>
        <si><t>Address 2</t></si>
        <si><t>October 15, 2009</t></si>
        <si><t>Some text</t></si>
        <si><t>Report o</t></si>
        <si><t>Report t</t></si>
        <si><t>Report u</t></si>
        <si><t>Report x</t></si>
        <si><t>Report y</t></si>
</sst>
Referring to the output, even when the text in red is variable, I can handle that. If you want, only focus in the output of the complex part (the one in blue)

As you can see in the output,
1-) The order of blocks 2-6 is the same order as the "Reports" appear in file3 (in file3 they are in alphabetical order, Report o, Report t, Report u, etc...)
2-) Whithin each block in the output, the order of appearence of the values is based in the order of appeareance of each "report id" in file2 (first report id=No. 1234, 2nd report id=No. 457 etc.)
3-) Some other details are as comments within the output (in green).

The output desired is:
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
	xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
	xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac"
	xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
	<dimension ref="A1:C6"/>
	<sheetViews><sheetView workbookViewId="0"/></sheetViews>
	<sheetFormatPr baseColWidth="10" defaultRowHeight="15" customHeight="1" x14ac:dyDescent="0.25"/>
	<cols><col min="3" max="3" width="11.85546875" bestFit="1" customWidth="1"/></cols>
	<sheetData>
	<row r="1" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
		<c r="B1" t="s"><v>8</v></c>  # 8 because report id="No. 1234" is in line 11 in file3, then 11-3=8
		<c r="C1" t="s"><v>13</v></c> # 13 because report id="No. 457" is in line 16 in file3, then 11-3=8
	</row>
	<row r="2" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
		<c r="A2" t="s"><v>17</v></c> # 17 because is "Report o" is in line 20 in file3, then 20-3=17
		<c r="B2"><v>1</v></c> # 1 because "Report o" has value 1 in 1st block of file2. 1st block is report id="No. 1234"
		<c r="C2"><v>2</v></c> # 2 because "Report o" has value 2 in 2nd block of file2. 2nd block is report id="No. 457"
	</row>
	<row r="3" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
		<c r="A3" t="s"><v>18</v></c>
		<c r="B3"><v>3</v></c> # 3 because "Report t" has value 3 in 1st block of file2. 1st block is report id="No. 1234"
		<c r="C3"><v>0</v></c> # 0 because "Report t" doesn't appear in 2nd block of file2. 2nd block is report id="No. 457"
	</row>
	<row r="4" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
		<c r="A4" t="s"><v>19</v></c>
		<c r="B4"><v>0</v></c> # 0 because "Report u" doesn't appear in 1st block of file2. 1st block is report id="No. 1234"
		<c r="C4"><v>7</v></c> # 7 because "Report u" has value 7 in 2nd block of file2. 2nd block is report id="No. 457"
	</row>
	<row r="5" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
		<c r="A5" t="s"><v>20</v></c>
		<c r="B5"><v>0</v></c> # 0 because "Report x" has value 0 in 1st block of file2. 1st block is report id="No. 1234"
		<c r="C5"><v>4</v></c> # 4 because "Report x" has value 4 in 2nd block of file2. 2nd block is report id="No. 457"
	</row>
	<row r="6" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
		<c r="A6" t="s"><v>21</v></c>
		<c r="B6"><v>0</v></c> # 0 because "Report y" doesn't appear in 1st block of file2. 1st block is report id="No. 1234"
		<c r="C6"><v>5</v></c> # 5 because "Report y" has value 5 in 2nd block of file2. 2nd block is report id="No. 457"
	</row>
	</sheetData>
	<pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/>
</worksheet>
I hope is not too confusing, many thanks for your help.

PS: The array in bash I have in the other code is related to an awk routine that set to zero, in the output, the value of each "report Z" that doesn't appear in some "report id" block in file2 e.g "Report u" doesn't appear in 1st block of file2, because of that in the output for that "report id" is "0" (see C3, B4 or B6 in the output). Do that I found is the more complex part.

Grettings
 
Old 10-26-2011, 03:47 AM   #22
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193
So i found that you get an extra row as your original ReportsList has a pipe at the end which creates an additional Report X which does not exist. Easy enough to over come
with a simple +1 and having the counter start at 1 instead of 2 which threw me at first anyways.

Code:
#!/usr/bin/awk -f

BEGIN{
    FS = "[<>]+"
    LetterNumber = 2
}

FNR == NR{
    if(/id/){
        split($0,id,/"/)
        reports_id[id[2]]
    }

    if(/Report ./){
        split($0,r)
        reportsList[r[3]]
        split(r[2],n,/"/)
        reportValues[id[2],r[3]] = n[2]
    }

    next
}

$4 in reports_id{
    reports_id[$4] = sprintf("\t\t\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>" , recurse(LetterNumber++) , FNR-3)
}

$4 in reportsList{ reportsList[$4] = FNR - 3 }

END{
    asorti(reportsList, sortedReportsList)

    print "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
    print "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
    print "\txmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
    print "\txmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\""
    print "\txmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\">"
    printf("\t<dimension ref=\"A1:%s%d\"/>\n",recurse(length(reports_id)+1),length(reportsList)+1)
    print"\t<sheetViews><sheetView workbookViewId=\"0\"/></sheetViews>"
    print"\t<sheetFormatPr baseColWidth=\"10\" defaultRowHeight=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\"/>"
    print"\t<cols><col min=\""length(reports_id)+1"\" max=\""length(reports_id)+1"\" width=\"11.85546875\" bestFit=\"1\" customWidth=\"1\"/></cols>"
    print"\t<sheetData>"

    ### --------------------------- Printing Block 1 -------------------------------###
    print "\t\t<row r=\"1\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">"
    for (i in reports_id) print reports_id[i]
    print "\t\t</row>"

    ### --------------------------- Printing next blocks -------------------------------###
    #for ( i=2;i<=length(sortedReportsList);i++ ) {
    for ( i=1;i<=length(sortedReportsList);i++ ) {
        printf("\t\t<row r=\"%d\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">\n", i+1)
        printf("\t\t\t<c r=\"A%d\" t=\"s\"><v>%d</v></c>\n",i+1,reportsList[sortedReportsList[i]]) #Printing 1rst line of each block

        LetterNumber = 2

        for(j in reports_id){
            val = 0
            if( reportValues[j,sortedReportsList[i]] )
                val = reportValues[j,sortedReportsList[i]]

            printf("\t\t\t<c r=\"%s%d\"><v>"val"</v></c>\n",recurse(LetterNumber++),i+1)
        }

        print "\t\t</row>"
    }
    ### ----------------------- Printing fix text at the end ---------------------------------------------###
    print "\t</sheetData>"
    print "\t<pageMargins left=\"0.75\" right=\"0.75\" top=\"1\" bottom=\"1\" header=\"0.5\" footer=\"0.5\"/>"
    print "</worksheet>"

}

### --------------------------- Recurse function to obtain Column Letter from number ---------------------- ###

function recurse(num,   ret){
    mod = num % 26
    div = int(num / 26)

    if(mod)
        ret = sprintf("%c", mod + 64)
    if(div){
        if(div <= 26)
            ret = sprintf("%c%s", div + 64, ret)
        else
            ret = recurse(div)ret
    }
    return ret
}
 
Old 10-26-2011, 04:24 AM   #23
Perseus
Member
 
Registered: Oct 2011
Posts: 179

Original Poster
Rep: Reputation: Disabled
Hi grail,

I must think it work almost perfect. Your code is fast. The only issue that is printing the values within each block in different order, e.g for the first block should be 8,13 and the code is printing 13,8. The same for the other blocks if you compare the output I show in my previous post.

Thanks for all help so far. It's great!
 
Old 10-26-2011, 06:02 AM   #24
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193
hmmm ... not sure I follow? I have run your code, my code and compare both to your desired output (minus the comments) and I seem to get the same??
 
Old 10-26-2011, 06:06 PM   #25
Perseus
Member
 
Registered: Oct 2011
Posts: 179

Original Poster
Rep: Reputation: Disabled
Hi again grail,

When I run the code I get the output on the right, the order of appearance of the values should be as in the output on the left:
Code:
        Output desired:                      Current output:
<row r="1" >                        <row r="1" >
    <c r="B1" t="s"><v>8</v></c>	<c r="C1" t="s"><v>13</v></c>
    <c r="C1" t="s"><v>13</v></c>	<c r="B1" t="s"><v>8</v></c>
</row>                              </row>
<row r="2" >                        <row r="2" >
    <c r="A2" t="s"><v>17</v></c>       <c r="A2" t="s"><v>17</v></c>
    <c r="B2"><v>1</v></c>              <c r="B2"><v>2</v></c>
    <c r="C2"><v>2</v></c>              <c r="C2"><v>1</v></c>
</row>                              </row>
<row r="3" >                        <row r="3" >
    <c r="A3" t="s"><v>18</v></c>       <c r="A3" t="s"><v>18</v></c>
    <c r="B3"><v>3</v></c>              <c r="B3"><v>0</v></c>
    <c r="C3"><v>0</v></c>              <c r="C3"><v>3</v></c>
</row>                              </row>
<row r="4" >                        <row r="4" >
    <c r="A4" t="s"><v>19</v></c>       <c r="A4" t="s"><v>19</v></c>
    <c r="B4"><v>0</v></c>              <c r="B4"><v>7</v></c>
    <c r="C4"><v>7</v></c>              <c r="C4"><v>0</v></c>
</row>                              </row>
<row r="5" >                        <row r="5" >
    <c r="A5" t="s"><v>20</v></c>       <c r="A5" t="s"><v>20</v></c>
    <c r="B5"><v>0</v></c>              <c r="B5"><v>4</v></c>
    <c r="C5"><v>4</v></c>              <c r="C5"><v>0</v></c>
</row>                              </row>
<row r="6" >                        <row r="6" >
    <c r="A6" t="s"><v>21</v></c>       <c r="A6" t="s"><v>21</v></c>
    <c r="B6"><v>0</v></c>              <c r="B6"><v>5</v></c>
    <c r="C6"><v>5</v></c>              <c r="C6"><v>0</v></c>
</row>                              </row>
Many thanks for your help so far.

Regards
 
Old 10-27-2011, 12:22 AM   #26
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193
See I get all the output on the left

I think the issue is that when doing an 'in' call to an array the order is not set in stone (so I was probably just lucky).

Haven't had a chance to test but highlighted in red are my latest changes:
Code:
#!/usr/bin/awk -f

BEGIN{
    FS = "[<>]+"
    LetterNumber = 2
    cnt = 0
}

FNR == NR{
    if(/id/){
        split($0,id,/"/)
        reports_id[id[2]]
    }

    if(/Report ./){
        split($0,r)
        reportsList[r[3]]
        split(r[2],n,/"/)
        reportValues[id[2],r[3]] = n[2]
    }

    next
}

$4 in reports_id{
    reports_id[++cnt] = sprintf("\t\t\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>" , recurse(LetterNumber++) , FNR-3)
}

$4 in reportsList{ reportsList[$4] = FNR - 3 }

END{
    asorti(reportsList, sortedReportsList)
    id_len = cnt + 1

    print "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
    print "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
    print "\txmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
    print "\txmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\""
    print "\txmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\">"
    printf("\t<dimension ref=\"A1:%s%d\"/>\n",recurse(id_len),length(reportsList)+1)
    print"\t<sheetViews><sheetView workbookViewId=\"0\"/></sheetViews>"
    print"\t<sheetFormatPr baseColWidth=\"10\" defaultRowHeight=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\"/>"
    print"\t<cols><col min=\""id_len"\" max=\""id_len"\" width=\"11.85546875\" bestFit=\"1\" customWidth=\"1\"/></cols>"
    print"\t<sheetData>"

    ### --------------------------- Printing Block 1 -------------------------------###
    print "\t\t<row r=\"1\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">"
    for (i = 1;i <= cnt;i++) print reports_id[i]
    print "\t\t</row>"

    ### --------------------------- Printing next blocks -------------------------------###
    for ( i=1;i<=length(sortedReportsList);i++ ) {
        printf("\t\t<row r=\"%d\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">\n", i+1)
        printf("\t\t\t<c r=\"A%d\" t=\"s\"><v>%d</v></c>\n",i+1,reportsList[sortedReportsList[i]]) #Printing 1rst line of each block

        LetterNumber = 2

        for(j = 1; j <= cnt; j++){
            val = 0
            if( reportValues[reports_id[j],sortedReportsList[i]] )
                val = reportValues[reports_id[j],sortedReportsList[i]]

            printf("\t\t\t<c r=\"%s%d\"><v>"val"</v></c>\n",recurse(LetterNumber++),i+1)
        }

        print "\t\t</row>"
    }
    ### ----------------------- Printing fix text at the end ---------------------------------------------###
    print "\t</sheetData>"
    print "\t<pageMargins left=\"0.75\" right=\"0.75\" top=\"1\" bottom=\"1\" header=\"0.5\" footer=\"0.5\"/>"
    print "</worksheet>"

}

### --------------------------- Recurse function to obtain Column Letter from number ---------------------- ###

function recurse(num,   ret){
    mod = num % 26
    div = int(num / 26)

    if(mod)
        ret = sprintf("%c", mod + 64)
    if(div){
        if(div <= 26)
            ret = sprintf("%c%s", div + 64, ret)
        else
            ret = recurse(div)ret
    }
    return ret
}
 
Old 10-27-2011, 12:39 AM   #27
Perseus
Member
 
Registered: Oct 2011
Posts: 179

Original Poster
Rep: Reputation: Disabled
Hi grail, I hope your fine.

Thanks for your help and time.

Quote:
Originally Posted by grail View Post
See I get all the output on the left

I think the issue is that when doing an 'in' call to an array the order is not set in stone (so I was probably just lucky).
Yes, because of that I tried to set all arrays indexed numerically in ascending order in order to print at the end using for(i=1;i<=N;i++), some troubles come if the arrays are printed using for i in Array.

Regarding the code, now some bad news :-(, only the first block contain values, in the other blocks all values are printed
as zero.
Code:
               </row>
               <row r="2" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
                       <c r="A2" t="s"><v>17</v></c>
                       <c r="B2"><v>0</v></c>
                       <c r="C2"><v>0</v></c>
               </row>
               <row r="3" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
                       <c r="A3" t="s"><v>18</v></c>
                       <c r="B3"><v>0</v></c>
                       <c r="C3"><v>0</v></c>
               </row>
               <row r="4" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
                       <c r="A4" t="s"><v>19</v></c>
                       <c r="B4"><v>0</v></c>
                       <c r="C4"><v>0</v></c>
               </row>
               <row r="5" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
                       <c r="A5" t="s"><v>20</v></c>
                       <c r="B5"><v>0</v></c>
                       <c r="C5"><v>0</v></c>
               </row>
               <row r="6" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
                       <c r="A6" t="s"><v>21</v></c>
                       <c r="B6"><v>0</v></c>
                       <c r="C6"><v>0</v></c>
               </row>

Last edited by Perseus; 10-27-2011 at 12:40 AM.
 
Old 10-27-2011, 01:40 AM   #28
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193Reputation: 3193
Take 32:
Code:
#!/usr/bin/awk -f

BEGIN{
    FS = "[<>]+"
    LetterNumber = 2
    cnt = 0
}

FNR == NR{
    if(/id/){
        split($0,id,/"/)
        reports_id[id[2]]
	ordered_id[++cnt] = id[2]
    }

    if(/Report ./){
        split($0,r)
        reportsList[r[3]]
        split(r[2],n,/"/)
        reportValues[id[2],r[3]] = n[2]
    }

    next
}

$4 in reports_id{
    reports_id[$4] = sprintf("\t\t\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>" , recurse(LetterNumber++) , FNR-3)
}

$4 in reportsList{ reportsList[$4] = FNR - 3 }

END{
    asorti(reportsList, sortedReportsList)
    id_len = cnt + 1

    print "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
    print "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
    print "\txmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
    print "\txmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\""
    print "\txmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\">"
    printf("\t<dimension ref=\"A1:%s%d\"/>\n",recurse(id_len),length(reportsList)+1)
    print"\t<sheetViews><sheetView workbookViewId=\"0\"/></sheetViews>"
    print"\t<sheetFormatPr baseColWidth=\"10\" defaultRowHeight=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\"/>"
    print"\t<cols><col min=\""id_len"\" max=\""id_len"\" width=\"11.85546875\" bestFit=\"1\" customWidth=\"1\"/></cols>"
    print"\t<sheetData>"

    ### --------------------------- Printing Block 1 -------------------------------###
    print "\t\t<row r=\"1\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">"
    for (i = 1;i <= cnt;i++) print reports_id[ordered_id[i]]
    print "\t\t</row>"

    ### --------------------------- Printing next blocks -------------------------------###
    for ( i=1;i<=length(sortedReportsList);i++ ) {
        printf("\t\t<row r=\"%d\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">\n", i+1)
        printf("\t\t\t<c r=\"A%d\" t=\"s\"><v>%d</v></c>\n",i+1,reportsList[sortedReportsList[i]]) #Printing 1rst line of each block

        LetterNumber = 2

        for(j = 1; j <= cnt; j++){
            val = 0
            if( reportValues[ordered_id[j],sortedReportsList[i]] )
                val = reportValues[ordered_id[j],sortedReportsList[i]]

            printf("\t\t\t<c r=\"%s%d\"><v>"val"</v></c>\n",recurse(LetterNumber++),i+1)
        }

        print "\t\t</row>"
    }
    ### ----------------------- Printing fix text at the end ---------------------------------------------###
    print "\t</sheetData>"
    print "\t<pageMargins left=\"0.75\" right=\"0.75\" top=\"1\" bottom=\"1\" header=\"0.5\" footer=\"0.5\"/>"
    print "</worksheet>"

}

### --------------------------- Recurse function to obtain Column Letter from number ---------------------- ###

function recurse(num,   ret){
    mod = num % 26
    div = int(num / 26)

    if(mod)
        ret = sprintf("%c", mod + 64)
    if(div){
        if(div <= 26)
            ret = sprintf("%c%s", div + 64, ret)
        else
            ret = recurse(div)ret
    }
    return ret
}
 
Old 10-27-2011, 02:03 AM   #29
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948
grail, there is no need for a recursive recurse function:
Code:
function recurse(num) {
    num--
    if (num < 0)
        return "";
    ret = sprintf("%c", 65 + int(num) % 26)
    num = int(num / 26)
    while (num > 0) {
        ret = sprintf("%c", 64 + (num % 26)) ret
        num = int(num / 26)
    }
    return ret;
}
will produce A, B, C, ..., Z, AA, AB, AC, ..., AZ, BA, BB, ..., YZ (for num == 1 to 676 ). If you want zero-based indexing, just remove the num-- line.

This one will work for all positive integers (1, 2, ...):
Code:
function recurse(num) {
    ret = ""
    while (num > 0) {
        num--
        ret = sprintf("%c", 65 + (num % 26)) ret
        num = int(num / 26)
    }
    return ret;
}

Last edited by Nominal Animal; 10-28-2011 at 01:39 AM. Reason: Fixed the error in the first version; thanks to grail for noticing it
 
1 members found this post helpful.
Old 10-27-2011, 02:24 AM   #30
Perseus
Member
 
Registered: Oct 2011
Posts: 179

Original Poster
Rep: Reputation: Disabled
Jajaja "Take 32"? jaja.

Well, it seems it works 99.999999999999999999999999999% correctly, I've tested with a file that contain 86 Report id's
and the order of values looks correct but the problem is with the sequence of letters generated by recurse function, see this:
Code:
<c r="V1" t="s"><v>108</v></c>
<c r="W1" t="s"><v>113</v></c>
<c r="X1" t="s"><v>118</v></c>
<c r="Y1" t="s"><v>123</v></c>
<c r="A1" t="s"><v>128</v></c> # Instead of A1 should be Z1 
<c r="AA1" t="s"><v>133</v></c>
<c r="AB1" t="s"><v>138</v></c>

or this:

<c r="AX1" t="s"><v>248</v></c>
<c r="AY1" t="s"><v>253</v></c>
<c r="B1" t="s"><v>258</v></c> # Instead of B1 should be AZ1
<c r="BA1" t="s"><v>263</v></c>

or this:

<c r="BX1" t="s"><v>378</v></c>
<c r="BY1" t="s"><v>383</v></c>
<c r="C1" t="s"><v>388</v></c> # Instead of C1 should be BZ1
<c r="CA1" t="s"><v>393</v></c>
It generates until Y, is not printing Z and instead Z prints A1, B1, C1 etc.

I've tried changing in recurse function 27 intead of 26 and prints the Z's but after the Z's prints A1, B1, C1.. etc too.

How to solve this grail?

Thanks a lot again

Last edited by Perseus; 10-27-2011 at 02:30 AM.
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
[SOLVED] need help adding letters to beginning of words from list using perl,bash or awk enteptain Programming 2 08-24-2011 11:15 PM
[SOLVED] Awk - finding and counting words specific letters within mora978 Programming 9 10-13-2010 10:45 AM
Is there any benefit to rewrite OSS-based code to ALSA-based code? RogueWarrior65 Linux - Software 1 08-13-2010 02:11 AM
bash-code to rename files based on config file Yalla-One Programming 20 04-14-2010 01:39 PM
sed or awk question - replace caps with small letters computera Linux - General 1 12-30-2003 04:39 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 06:33 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration