LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 03-10-2009, 03:45 AM   #1
manwithaplan
Member
 
Registered: Nov 2008
Location: ~/
Distribution: Arch || Sidux
Posts: 393

Rep: Reputation: 45
What the "awk" is wrong with this?


I've been diving into the realm of awk. The problem is that I find it quite difficult, since I am only using the ABS guide and a couple of others, so grasping the concepts is rather hard. I'm writing several scripts that use dialog to show progress, and, with help, I have already worked out several different uses of dialog.

My problem is printing the correct percentage for my tar extraction. I have it working, except that the progress bar goes off the chart, e.g. to 300%, more or less.

So I am assuming this has to do with the size of the tar file. Do I need to count the contents of the archive before I extract it? If so, how would I start? As you can see, my awk is limited to dividing by 240 (or any other fixed number). The trouble is that this number is fixed, while the size of a tar file can change.

Any suggestions?

Code:
 tarfile1='*.tar.bz2' 
   
    tar -xvjpf $tarfile1 2>&1 |
    awk '{ print (Total+=1)/240,"=>",$0}' |
    dialog --clear --gauge "Extracting Stage3...."  7 70

Last edited by manwithaplan; 03-10-2009 at 03:54 AM.
 
Old 03-10-2009, 05:42 AM   #2
Agrouf
Senior Member
 
Registered: Sep 2005
Location: France
Distribution: LFS
Posts: 1,591

Rep: Reputation: 79
Code:
tarfile1='*.tar.bz2' 
total=$(bzip2 -dc $tarfile1 | tar -t | wc -l)
tar -xvjpf $tarfile1 2>&1 |
awk '{ print (Total+=1)/'$total',"=>",$0}' |
dialog --clear --gauge "Extracting Stage3...."  7 70
 
Old 03-10-2009, 01:33 PM   #3
manwithaplan
Member
 
Registered: Nov 2008
Location: ~/
Distribution: Arch || Sidux
Posts: 393

Original Poster
Rep: Reputation: 45
Quote:
Originally Posted by Agrouf
Code:
total=$(bzip2 -dc $tarfile1 | tar -t | wc -l)
awk '{ print (Total+=1)/'$total',"=>",$0}' |
I see what has been done: counting the entries in the archive first. It doesn't seem to work, though. I also tried this with a counting loop. I'm just stumped on this.

Any idea what I could be doing wrong?
 
Old 03-10-2009, 02:14 PM   #4
Agrouf
Senior Member
 
Registered: Sep 2005
Location: France
Distribution: LFS
Posts: 1,591

Rep: Reputation: 79
What is the output?
 
Old 03-10-2009, 03:06 PM   #5
manwithaplan
Member
 
Registered: Nov 2008
Location: ~/
Distribution: Arch || Sidux
Posts: 393

Original Poster
Rep: Reputation: 45
I'm assuming that the count wasn't successful, as awk sees $total with a value of zero.

Code:
awk: (FILENAME=- FNR=1) fatal: division by zero attempted
The extraction itself works, but the gauge isn't updating.
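
The division by zero means that $total expanded to 0 (or to nothing) when it was spliced into the awk program, so the counting pipeline most likely produced no lines. A minimal sketch of a more defensive feed follows, assuming a hypothetical helper named percent_stream that is not part of the thread. It receives the count through awk -v instead of quote splicing and refuses to run with an empty, non-numeric, or zero total.

```shell
#!/bin/sh
# percent_stream TOTAL: read one line per extracted entry on stdin and
# print an integer percentage (0-100) per line. Hypothetical helper name.
percent_stream() {
    total=$1
    # Reject an empty or non-numeric count instead of dividing by it.
    case $total in
        ''|*[!0-9]*) echo "percent_stream: bad total '$total'" >&2; return 1 ;;
    esac
    # Reject a zero count as well.
    [ "$total" -gt 0 ] || { echo "percent_stream: total is zero" >&2; return 1; }
    awk -v total="$total" '{
        pct = int(NR * 100 / total)
        if (pct > 100) pct = 100   # clamp if more lines arrive than were counted
        print pct
        fflush()
    }'
}

# Possible usage (the archive name is an assumption):
#   total=$(tar -tjf archive.tar.bz2 | wc -l)
#   tar -xvjpf archive.tar.bz2 2>&1 | percent_stream "$total" |
#       dialog --gauge "Extracting...." 7 70
```

Naming the archive explicitly with -f in the counting step avoids depending on tar's default input device, which may be one reason the original count came back empty.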

Last edited by manwithaplan; 03-10-2009 at 03:12 PM.
 
Old 03-10-2009, 03:11 PM   #6
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 910
Try "export total" after populating it.
 
Old 03-10-2009, 03:48 PM   #7
manwithaplan
Member
 
Registered: Nov 2008
Location: ~/
Distribution: Arch || Sidux
Posts: 393

Original Poster
Rep: Reputation: 45
Quote:
Originally Posted by Tinkster
try "export total" after populating it
Well, it pauses and outputs:
Code:
"tar: Record size = 8 blocks"
then continues with extraction until exit 0.

Is there some way to tail and grep the output of the extraction? It seems this approach would also require a good count. This has given me an interesting problem.

Would extracting to temp.$$ first, then recording the size and the file count, be an option?
 
Old 03-10-2009, 05:30 PM   #8
manwithaplan
Member
 
Registered: Nov 2008
Location: ~/
Distribution: Arch || Sidux
Posts: 393

Original Poster
Rep: Reputation: 45
I am making some progress... I've used the examples above and changed them around a bit.
My output is getting better with dialog, but the progress bar still goes over: it reaches 111% when finished. So I need to look at the awk again.

Here is the revised code.
Code:
   tr=$(tar -tvf *tar.bz2 2>&1|wc -l) 
    tar -xvf *tar.bz2 2>&1 |  
    awk '{ print (Total+=1)/'$tr'*100,"=>",$0}' |
    dialog --gauge "Extracting...."  7 70
Changing the awk to output a percentage seems to have helped, though my count is ahead by 1.1%. So I need to rethink my awk.
Maybe Total+=1.01 ...? That doesn't make sense, though. I'll have to account for drift of some sort. The original archive contains over 38,000 files.

EDIT: When I try it with smaller archives, it's perfect. There seems to be a drift in the math once the count from "wc -l" exceeds a certain size.

Last edited by manwithaplan; 03-10-2009 at 06:09 PM.
 
Old 03-10-2009, 07:46 PM   #9
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
Just a side note. Looking at your script, one unwanted effect is that the command
Code:
tr=$(tar -tvf *tar.bz2 2>&1|wc -l)
actually uncompresses the bz2 file to retrieve the list of files. This increases the execution time of the script, and the additional time is not taken into account by dialog's gauge. A trick to avoid this behavior is to unzip the file to standard output and pipe it to the tar command.

Moreover, to determine a correct percentage, the only way I see is to use the size of the uncompressed file and force the tar command to report the progress of its execution.

Unfortunately, the first task cannot be achieved for bzip2 files, since the bz2 file format does not store the size of the uncompressed file explicitly. Instead, you can do that for gzip files:
Code:
$ gzip -l archive.tar.gz
         compressed        uncompressed  ratio uncompressed_name
           28022232            43264000  35.2% archive.tar
The second task can be performed using GNU tar's checkpoints. See here for details.

Finally, in awk you can use all the collected information to compute the correct percentage. The following example uses a 512-byte record size for the tar extraction. This allows the total number of records to be calculated in the BEGIN section of the awk script:
Code:
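#!/bin/bash
# Uncompressed size in bytes, taken from the second line of "gzip -l"
tsize=$(gzip -l archive.tar.gz | awk 'NR==2{print $2}')
(
gzip -dc archive.tar.gz |
# 512-byte records, one checkpoint message every 1000 records
tar --record-size=512 -x -f - --checkpoint=1000 2>&1 |
# den = number of records per percent; $NF is the checkpoint number
awk -v tsize="$tsize" 'BEGIN{den=int(tsize/512000)*10} /checkpoint/{print int($NF/den); fflush("")}'
) |
dialog --title "archive.tar.gz" --gauge "Extracting...."  7 70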
Note that the checkpoints are sent to standard error, hence the redirection 2>&1. Also note the fflush function of awk to flush every line of output into the following pipe. In this way each computed percentage is sent to the gauge and the effect is smoother.
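
The checkpoint arithmetic can also be written with the record size as a parameter rather than folded into a constant. The following is only a sketch under stated assumptions: the helper name checkpoint_percent is hypothetical, and the checkpoint messages are assumed to end with the record count (for example "tar: Read checkpoint 1000"). Also keep in mind that the gzip trailer stores the uncompressed size modulo 2^32, so the figure taken from gzip -l can be wrong for archives of 4 GiB or more.

```shell
#!/bin/sh
# checkpoint_percent TSIZE RSIZE: turn GNU tar checkpoint messages on stdin
# (assumed form: "tar: Read checkpoint 1000") into integer percentages.
# TSIZE is the uncompressed size in bytes, RSIZE the tar record size in bytes.
checkpoint_percent() {
    awk -v tsize="$1" -v rsize="$2" '
        /checkpoint/ {
            # $NF is the number of records processed so far
            pct = int($NF * rsize * 100 / tsize)
            if (pct > 100) pct = 100   # never report more than 100%
            print pct
            fflush()
        }'
}

# Possible usage (file name as in the example above):
#   tsize=$(gzip -l archive.tar.gz | awk 'NR==2{print $2}')
#   gzip -dc archive.tar.gz |
#       tar --record-size=512 -x -f - --checkpoint=1000 2>&1 |
#       checkpoint_percent "$tsize" 512 |
#       dialog --title "archive.tar.gz" --gauge "Extracting...." 7 70
```

Lines that are not checkpoint messages, such as "tar: Record size = 8 blocks", are skipped instead of being turned into a bogus percentage.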
 
Old 03-11-2009, 02:46 AM   #10
Agrouf
Senior Member
 
Registered: Sep 2005
Location: France
Distribution: LFS
Posts: 1,591

Rep: Reputation: 79
Quote:
Originally Posted by manwithaplan
I am making some progress... I've used the examples above and changed them around a bit.
My output is getting better with dialog, but the progress bar still goes over: it reaches 111% when finished. So I need to look at the awk again.

Here is the revised code.
Code:
   tr=$(tar -tvf *tar.bz2 2>&1|wc -l) 
    tar -xvf *tar.bz2 2>&1 |  
    awk '{ print (Total+=1)/'$tr'*100,"=>",$0}' |
    dialog --gauge "Extracting...."  7 70
Changing the awk to output a percentage seems to have helped, though my count is ahead by 1.1%. So I need to rethink my awk.
Maybe Total+=1.01 ...? That doesn't make sense, though. I'll have to account for drift of some sort. The original archive contains over 38,000 files.

EDIT: When I try it with smaller archives, it's perfect. There seems to be a drift in the math once the count from "wc -l" exceeds a certain size.
I believe it is a rounding error. That is because you divide by a large number before you multiply by 100. When you divide by a large number, you lose precision. This is why it works with small files but not with large ones.
Try this:
awk '{ print (Total+=1)*100/'$tr',"=>",$0}'
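
Whether the order of operations matters here can be checked directly. awk does its arithmetic in double-precision floating point, so the two orderings should agree to many decimal places. The sketch below compares both expressions for every entry count up to the total; the function name max_order_diff is hypothetical, and 38000 is simply the approximate file count mentioned earlier in the thread.

```shell
#!/bin/sh
# max_order_diff TOTAL: compare i/TOTAL*100 with i*100/TOTAL for i = 1..TOTAL
# and print "ok" if they never differ by more than 1e-9, else the difference.
max_order_diff() {
    awk -v total="$1" 'BEGIN {
        max = 0
        for (i = 1; i <= total; i++) {
            d = i / total * 100 - i * 100 / total
            if (d < 0) d = -d          # absolute value
            if (d > max) max = d
        }
        if (max < 1e-9) print "ok"; else print max
    }'
}

max_order_diff 38000
```

A difference this small cannot account for a gauge that overshoots by several percent, so the extra lines are more likely coming from the input than from the arithmetic. Multiplying first does have one small benefit when the result is truncated with int(): the numerator stays an exact integer.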
 
Old 03-11-2009, 12:46 PM   #11
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
Another clue: maybe the command
Code:
tar -tvf *tar.bz2
does not work the way you expect. If the current working directory contains more than one tar.bz2, the shell expands the command line as
Code:
tar -tvf file1.tar.bz2 file2.tar.bz2 file3.tar.bz2 ...
This means "test for the existence of file2.tar.bz2 file3.tar.bz2 ... inside the archive file1.tar.bz2". When multiple file names are passed as arguments to the tar command, the files from the 2nd to the Nth are extracted from (x) or added to (c) the tar archive.
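
One way to guard against this is to resolve the pattern once and insist on exactly one match before calling tar. This is a minimal sketch, assuming a hypothetical helper named single_archive; it is not taken from the scripts in this thread.

```shell
#!/bin/sh
# single_archive DIR: print the one *.tar.bz2 file in DIR, or fail if the
# pattern matches no file or more than one. Hypothetical helper name.
single_archive() {
    # Expand the glob into the positional parameters of this function.
    set -- "$1"/*.tar.bz2
    # An unmatched glob stays literal, so also check that the file exists.
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "single_archive: expected exactly one .tar.bz2 file" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Possible usage:
#   archive=$(single_archive .) || exit 1
#   tar -xvjpf "$archive"
```

Quoting "$archive" afterwards also keeps file names with spaces intact.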

A last clue: dialog's gauge expects an integer value on standard input. This value is interpreted as the percentage to show in the progress bar. Any other string is interpreted as the gauge's prompt (that is, a displayed message). However, the correct way to update the prompt is to use the XXX string to separate the different prompt messages, which are cycled every time the percentage changes. From dialog's man page:
Code:
--gauge text height width [percent]
      A gauge box displays a meter along the bottom of the box.  The meter indicates the percentage.
      New  percentages  are read from standard input, one integer per line.  The meter is updated to
      reflect each new percentage.  If the standard input reads the string  "XXX",  then  subsequent
      lines  up  to another "XXX" are used for a new prompt.  The gauge exits when EOF is reached on
      the standard input.

      The percent value denotes the initial percentage shown in the meter.  If not specified, it  is
      zero.
Therefore, a working version of the code could be
Code:
#!/bin/bash
# Count the archive entries first (this pass also decompresses the archive)
tr=$(tar -tvf archive.tar.bz2 2>&1 | wc -l) 
(
tar -xvf archive.tar.bz2 2>&1 |  
# One integer percentage per extracted entry; "sleep 1" is for testing only
awk '{ print int((Total+=1)/'$tr'*100); system("sleep 1")}' 
) |
dialog --title "archive.tar.bz2" --gauge "Extracting...."  7 70
The system call to sleep from the awk program is just for testing purposes: if you want to test with a small archive, it slows down the input flow. Anyway, I think my previous assertion still holds: the execution time of the command run in the tr variable assignment is not taken into account by the dialog box.
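
To show the name of the entry being extracted as well as the percentage, the feed can wrap each update in XXX markers. The sketch below is an assumption-laden illustration: the helper name gauge_feed is hypothetical, and the block layout (XXX, then the percentage, then the prompt text, then XXX) follows my understanding of the gauge sample script shipped with dialog. The man page excerpt above does not mention the percentage line, so it is worth checking against the installed version.

```shell
#!/bin/sh
# gauge_feed TOTAL: read one extracted entry name per line on stdin and emit
# gauge updates of the assumed form XXX / percent / prompt text / XXX.
gauge_feed() {
    awk -v total="$1" '{
        pct = int(NR * 100 / total)
        if (pct > 100) pct = 100   # clamp if more lines arrive than counted
        print "XXX"
        print pct
        print "Extracting: " $0
        print "XXX"
        fflush()
    }'
}

# Possible usage (archive name as in the example above):
#   tr=$(tar -tjf archive.tar.bz2 | wc -l)
#   tar -xvjf archive.tar.bz2 2>&1 | gauge_feed "$tr" |
#       dialog --title "archive.tar.bz2" --gauge "Extracting...." 7 70
```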
 
Old 03-11-2009, 07:18 PM   #12
manwithaplan
Member
 
Registered: Nov 2008
Location: ~/
Distribution: Arch || Sidux
Posts: 393

Original Poster
Rep: Reputation: 45
Quote:
Originally Posted by Agrouf
awk '{ print (Total+=1)*100/'$tr',"=>",$0}'
Good point... To me, though, the mathematics is linear either way and produces the same output.

Quote:
colucix: "does not work the way you expect. If the current working directory contains more than one tar.bz2, the shell expands the command line as"
I am very aware of this, and this specific problem only deals with one file at a time. Your previous example would be absolutely perfect if the files were in gzip format. Unfortunately, they're not. So I am duped... Your example seems to hang on the sleep. And the output is null in the progress bar.

I am also trying to address this same problem with a .git that contains all hidden data. There is no output at all. So it seems the initial problem is my count and the limitations of "wc -l".

Last edited by manwithaplan; 03-11-2009 at 07:20 PM.
 
Old 03-11-2009, 08:10 PM   #13
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
Quote:
Originally Posted by manwithaplan
So I am duped... Your example seems to hang on the sleep. And the output is null in the progress bar.
I'm sure you debugged one command at a time. Anyway, I suspect that awk gave a syntax error or something, and since in my example it runs in a subshell, no standard error is displayed. The entire I/O chain hangs because standard error is not passed through the pipe unless it is explicitly redirected to standard output.

Regarding the .git files, I'm not familiar with them. Which application lets you extract (?) them?
 
Old 03-11-2009, 10:10 PM   #14
manwithaplan
Member
 
Registered: Nov 2008
Location: ~/
Distribution: Arch || Sidux
Posts: 393

Original Poster
Rep: Reputation: 45
I am 100% convinced that the problem lies with dialog. I have tested the output in bash, and 100% of the lines are being counted.

I assumed that producing a percentage in awk (*100) would give the correct values for the gauge. Apparently not. The dialog --gauge seems very limited and really seems to be more for show. I wanted an easy GUI for my scripts.

Quote:
Originally Posted by colucix
Regarding the .git files, I'm not aware about them. Which application let you extract (?) them?
I was referring to the extraction. I have a tar.bz2 that contains .git files with hidden attributes. It seems that these same commands do not gauge correctly on it. I'm going to send the output to a temp file to check for errors.
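
When the feed has been captured to a temp file, it can be checked mechanically. The following is a rough sketch with a hypothetical helper name, check_gauge_feed. It assumes a plain feed of one integer percentage per line (no XXX prompt blocks) and lists every line that is not a bare integer between 0 and 100.

```shell
#!/bin/sh
# check_gauge_feed: read a would-be gauge feed on stdin, report every line
# that is not a bare integer from 0 to 100, and exit non-zero if any is found.
check_gauge_feed() {
    awk '
        !($0 ~ /^[0-9]+$/) || ($0 + 0) > 100 {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }'
}

# Possible usage (feed.log is an assumed file name):
#   tar -xvf archive.tar.bz2 2>&1 | awk '...' > feed.log
#   check_gauge_feed < feed.log
```

A line such as "20.5 => some/file" would be reported here, which is the kind of output the earlier awk commands in this thread produce.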
 
Old 03-12-2009, 03:00 AM   #15
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
Please, can you post some lines of the bash output? I mean the output that is piped directly to dialog. Thanks.
 
  


All times are GMT -5.
