[SOLVED] Help fixing bash/awk code to print letters based on counter
ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
hmmm .... I think I follow but now that I see we are disassembling at a much higher level a couple of xml documents I would probably think a language like Perl which has a parser may
be more of what you are looking for. I will have a tinker and see what I can come up with but I only see this growing out of control.
So just to confirm I am starting on the right track ... everything in your final output is going to be produced by what we are trying to do? (ie. all of the last code in previous post)
.... I would probably think a language like Perl which has a parser may
be more of what you are looking for.
Use Spreadsheet::WriteExcel::FromXML was my first try, but after
crash with many issues like don't know the structure that perl module expects or receive some other errors I gave up, and even I'm a newbie in shell script, I decided to do the conversion using bash that I understand a little bit more that try to decipher how it works that CPAN module. One more thing is that this perl module only converts from xml to xls files, but
doesn't convert from xml to xlsx files.
Quote:
Originally Posted by grail
I will have a tinker and see what I can come up with but I only see this growing out of control.
I know now I'm asking for more help, because my original idea was do each block with bash and going step by step until get the complete output, only after to see your unique AWK code, I realized that is a much better and faster option to get the result. Really is not much than that, it would be my last help petition.
Quote:
Originally Posted by grail
So just to confirm I am starting on the right track ... everything in your final output is going to be produced by what we are trying to do? (ie. all of the last code in previous post)
Like I said I'm trying to generate an Excel xlsx file containing 2 sheets from 2 xml files.
If you look inside an xlsx file (that actually is an OpenXML file format), you'll see the following mandatory files that conform the xlsx file (mandatory are highlighted in red):
$ find . -type f
Well, I was able to write a combination of bash/awk scripts to generate all of them, except "sheet2.xml" that is the very last part of my objetive to generate the xlsx file. The sheet2.xml is what we're (or your in a higher percentage) trying to do and the complete output of sheet2.xml is:
(Values in red change from file to file)
In this sheet2.xml,Last Cell in last block is C6
In this sheet2.xml, number of letters in 1rst block is 2, then min=2+1=3 and max=2+1=3
The code below has the recursive function and prints the first block, but if I try to add a new loop to get
blocks 2, 3, ..N after first block "for (s in save).." receive errors. The same if I try to add this 2nd
for (s1 in save1) inside END{} statement. I'm not sure what is wrong.
Code:
L=2
awk -v Z="$L" '#BEGIN{ Z = L; W=1 }
##############################################################################
function recurse(num, ret)
{ mod = num % 26
div = int(num / 26)
if(mod)
ret = sprintf("%c", mod + 64)
if(div){
if(div <= 26)
ret = sprintf("%c%s", div + 64, ret)
else
ret = recurse(div)ret
}
return ret
}
##############################################################################
FNR == NR && /report id/{
save[gensub(/^[^"]+"|".+$/,"","g")]
save1[gensub(/^[^"].+">|<.+$/,"","g")]
next}
{for(s in save){
if($0 ~ s){
if( Z==2 )
printf("<row r=\"1\" >\n\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>\n" , recurse(Z) , FNR - 3);
else if ( Z > 2 && Z < length(save))
printf("\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>\n" , recurse(Z) , FNR - 3);
else
printf("\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>\n</row>" , recurse(Z) , FNR - 3);
Z++
}
}
}' file2 file3
#*The above code works and for loop in blue works too, but if I add the for loop shown
below in red, the whole code doesn't work.
#END{asorti(save1);for(s1=1;s1<=length(save1);s1++){
#if($0 ~ save1(s1)){
# if( W==1 )
# printf("<row r=\"%d\" >\n\t<c r=\"A%d\" t=\"s\"><v>%d</v></c>\n" ,W+1,W+1, FNR - 3);
# else if ( W > 1 && W < length(save1))
# printf("\t<c r=\"%s%d\"><v>%d</v></c>\n" , recurse(W), W+1, FNR - 3);
# else
# printf("\t<c r=\"%s%d\"><v>%d</v></c>\n</row>" , recurse(W), W+1, FNR - 3);
# }
#W++
#}' file2 file3
Ok ... based on information from your other post as well as this one, I have come up with the following. Of course I can not validate the output as I am not
100% sure what it should look like, so you will have to advise:
Well, the output I need is close to the output of your code, only need some adjustments in the order the values are being printed. You can test your code using the samples of file2 and file3 shown below: file2:
Referring to the output, even when the text in red is variable, I can handle that. If you want, only focus in the output of the complex part (the one in blue)
As you can see in the output,
1-) The order of blocks 2-6 is the same order as the "Reports" appear in file3 (in file3 they are in alphabetical order, Report o, Report t, Report u, etc...)
2-) Whithin each block in the output, the order of appearence of the values is based in the order of appeareance of each "report id" in file2 (first report id=No. 1234, 2nd report id=No. 457 etc.)
3-) Some other details are as comments within the output (in green).
The output desired is:
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac"
xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
<dimension ref="A1:C6"/>
<sheetViews><sheetView workbookViewId="0"/></sheetViews>
<sheetFormatPr baseColWidth="10" defaultRowHeight="15" customHeight="1" x14ac:dyDescent="0.25"/>
<cols><col min="3" max="3" width="11.85546875" bestFit="1" customWidth="1"/></cols>
<sheetData>
<row r="1" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
<c r="B1" t="s"><v>8</v></c> # 8 because report id="No. 1234" is in line 11 in file3, then 11-3=8
<c r="C1" t="s"><v>13</v></c> # 13 because report id="No. 457" is in line 16 in file3, then 11-3=8
</row>
<row r="2" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
<c r="A2" t="s"><v>17</v></c> # 17 because is "Report o" is in line 20 in file3, then 20-3=17
<c r="B2"><v>1</v></c> # 1 because "Report o" has value 1 in 1st block of file2. 1st block is report id="No. 1234"
<c r="C2"><v>2</v></c> # 2 because "Report o" has value 2 in 2nd block of file2. 2nd block is report id="No. 457"
</row>
<row r="3" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
<c r="A3" t="s"><v>18</v></c>
<c r="B3"><v>3</v></c> # 3 because "Report t" has value 3 in 1st block of file2. 1st block is report id="No. 1234"
<c r="C3"><v>0</v></c> # 0 because "Report t" doesn't appear in 2nd block of file2. 2nd block is report id="No. 457"
</row>
<row r="4" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
<c r="A4" t="s"><v>19</v></c>
<c r="B4"><v>0</v></c> # 0 because "Report u" doesn't appear in 1st block of file2. 1st block is report id="No. 1234"
<c r="C4"><v>7</v></c> # 7 because "Report u" has value 7 in 2nd block of file2. 2nd block is report id="No. 457"
</row>
<row r="5" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
<c r="A5" t="s"><v>20</v></c>
<c r="B5"><v>0</v></c> # 0 because "Report x" has value 0 in 1st block of file2. 1st block is report id="No. 1234"
<c r="C5"><v>4</v></c> # 4 because "Report x" has value 4 in 2nd block of file2. 2nd block is report id="No. 457"
</row>
<row r="6" spans="1:5" ht="15" customHeight="1" x14ac:dyDescent="0.25">
<c r="A6" t="s"><v>21</v></c>
<c r="B6"><v>0</v></c> # 0 because "Report y" doesn't appear in 1st block of file2. 1st block is report id="No. 1234"
<c r="C6"><v>5</v></c> # 5 because "Report y" has value 5 in 2nd block of file2. 2nd block is report id="No. 457"
</row>
</sheetData>
<pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/>
</worksheet>
I hope is not too confusing, many thanks for your help.
PS: The array in bash I have in the other code is related to an awk routine that set to zero, in the output, the value of each "report Z" that doesn't appear in some "report id" block in file2 e.g "Report u" doesn't appear in 1st block of file2, because of that in the output for that "report id" is "0" (see C3, B4 or B6 in the output). Do that I found is the more complex part.
So i found that you get an extra row as your original ReportsList has a pipe at the end which creates an additional Report X which does not exist. Easy enough to over come
with a simple +1 and having the counter start at 1 instead of 2 which threw me at first anyways.
Code:
#!/usr/bin/awk -f
BEGIN{
FS = "[<>]+"
LetterNumber = 2
}
FNR == NR{
if(/id/){
split($0,id,/"/)
reports_id[id[2]]
}
if(/Report ./){
split($0,r)
reportsList[r[3]]
split(r[2],n,/"/)
reportValues[id[2],r[3]] = n[2]
}
next
}
$4 in reports_id{
reports_id[$4] = sprintf("\t\t\t<c r=\"%s1\" t=\"s\"><v>%d</v></c>" , recurse(LetterNumber++) , FNR-3)
}
$4 in reportsList{ reportsList[$4] = FNR - 3 }
END{
asorti(reportsList, sortedReportsList)
print "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
print "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
print "\txmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
print "\txmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\""
print "\txmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\">"
printf("\t<dimension ref=\"A1:%s%d\"/>\n",recurse(length(reports_id)+1),length(reportsList)+1)
print"\t<sheetViews><sheetView workbookViewId=\"0\"/></sheetViews>"
print"\t<sheetFormatPr baseColWidth=\"10\" defaultRowHeight=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\"/>"
print"\t<cols><col min=\""length(reports_id)+1"\" max=\""length(reports_id)+1"\" width=\"11.85546875\" bestFit=\"1\" customWidth=\"1\"/></cols>"
print"\t<sheetData>"
### --------------------------- Printing Block 1 -------------------------------###
print "\t\t<row r=\"1\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">"
for (i in reports_id) print reports_id[i]
print "\t\t</row>"
### --------------------------- Printing next blocks -------------------------------###
#for ( i=2;i<=length(sortedReportsList);i++ ) {
for ( i=1;i<=length(sortedReportsList);i++ ) {
printf("\t\t<row r=\"%d\" spans=\"1:5\" ht=\"15\" customHeight=\"1\" x14ac:dyDescent=\"0.25\">\n", i+1)
printf("\t\t\t<c r=\"A%d\" t=\"s\"><v>%d</v></c>\n",i+1,reportsList[sortedReportsList[i]]) #Printing 1rst line of each block
LetterNumber = 2
for(j in reports_id){
val = 0
if( reportValues[j,sortedReportsList[i]] )
val = reportValues[j,sortedReportsList[i]]
printf("\t\t\t<c r=\"%s%d\"><v>"val"</v></c>\n",recurse(LetterNumber++),i+1)
}
print "\t\t</row>"
}
### ----------------------- Printing fix text at the end ---------------------------------------------###
print "\t</sheetData>"
print "\t<pageMargins left=\"0.75\" right=\"0.75\" top=\"1\" bottom=\"1\" header=\"0.5\" footer=\"0.5\"/>"
print "</worksheet>"
}
### --------------------------- Recurse function to obtain Column Letter from number ---------------------- ###
function recurse(num, ret){
mod = num % 26
div = int(num / 26)
if(mod)
ret = sprintf("%c", mod + 64)
if(div){
if(div <= 26)
ret = sprintf("%c%s", div + 64, ret)
else
ret = recurse(div)ret
}
return ret
}
I must think it work almost perfect. Your code is fast. The only issue that is printing the values within each block in different order, e.g for the first block should be 8,13 and the code is printing 13,8. The same for the other blocks if you compare the output I show in my previous post.
I think the issue is that when doing an 'in' call to an array the order is not set in stone (so I was probably just lucky).
Yes, because of that I tried to set all arrays indexed numerically in ascending order in order to print at the end using for(i=1;i<=N;i++), some troubles come if the arrays are printed using for i in Array.
Regarding the code, now some bad news :-(, only the first block contain values, in the other blocks all values are printed
as zero.
grail, there is no need for a recursive recurse function:
Code:
function recurse(num) {
num--
if (num < 0)
return "";
ret = sprintf("%c", 65 + int(num) % 26)
num = int(num / 26)
while (num > 0) {
ret = sprintf("%c", 64 + (num % 26)) ret
num = int(num / 26)
}
return ret;
}
will produce A, B, C, ..., Z, AA, AB, AC, ..., AZ, BA, BB, ..., YZ (for num == 1 to 676 ). If you want zero-based indexing, just remove the num-- line.
This one will work for all positive integers (1, 2, ...):
Code:
function recurse(num) {
ret = ""
while (num > 0) {
num--
ret = sprintf("%c", 65 + (num % 26)) ret
num = int(num / 26)
}
return ret;
}
Last edited by Nominal Animal; 10-28-2011 at 01:39 AM.
Reason: Fixed the error in the first version; thanks to grail for noticing it
Well, it seems it works 99.999999999999999999999999999% correctly, I've tested with a file that contain 86 Report id's
and the order of values looks correct but the problem is with the sequence of letters generated by recurse function, see this:
Code:
<c r="V1" t="s"><v>108</v></c>
<c r="W1" t="s"><v>113</v></c>
<c r="X1" t="s"><v>118</v></c>
<c r="Y1" t="s"><v>123</v></c>
<c r="A1" t="s"><v>128</v></c> # Instead of A1 should be Z1
<c r="AA1" t="s"><v>133</v></c>
<c r="AB1" t="s"><v>138</v></c>
or this:
<c r="AX1" t="s"><v>248</v></c>
<c r="AY1" t="s"><v>253</v></c>
<c r="B1" t="s"><v>258</v></c> # Instead of B1 should be AZ1
<c r="BA1" t="s"><v>263</v></c>
or this:
<c r="BX1" t="s"><v>378</v></c>
<c r="BY1" t="s"><v>383</v></c>
<c r="C1" t="s"><v>388</v></c> # Instead of C1 should be BZ1
<c r="CA1" t="s"><v>393</v></c>
It generates until Y, is not printing Z and instead Z prints A1, B1, C1 etc.
I've tried changing in recurse function 27 intead of 26 and prints the Z's but after the Z's prints A1, B1, C1.. etc too.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.