LinuxQuestions.org
Old 07-01-2013, 05:02 PM   #16
atjurhs
Member
 
Registered: Aug 2012
Posts: 168

Original Poster
Rep: Reputation: Disabled

Well, I played with it over the weekend and came up with more pieces (sorry guys, I'm really stumbling over how to do this in one awk script), then I put the pieces together in a bash script to run it all. Here are the pieces in the bash script:

Code:
#!/usr/bin/awk -f

# this is from grail

BEGIN{ 
	end_zeroes = "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0" 
	extra_zeroes = "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"
}

{ print $0,end_zeroes }

!(NR % 6){

	for(i = 0; i < 15; i++)
		print extra_zeroes
}
then to break the one giant file into the right size files I use

Code:
split -d -a 3 -l 210 one_big_padded_file.dat
this gives file names like x000 x001 x002 etc. etc. which need to be renamed to what I need, so I use

Code:
for f in x* ; do mv "$f" "file_$f" ; done

for f in file_* ; do mv "$f" "$f.dat" ; done
Now all the output files sort of have the right names.

I'd rather do it with this awk command and when I run this awk script from the command line after I run grail's it all works
Code:
awk '!(NR%210) {i++;} {print > "file_"i".dat";}' i=1 giant_padded_file.txt
but I can't figure out how to run them together in one awk script.

I'm lost, Tabby
 
Old 07-01-2013, 06:54 PM   #17
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,255

Rep: Reputation: 2686
Well my first point would be that there is no need for 2 for loops as you can append to the start and end of the variable as you have done in awk.
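As a rough sketch of that single loop, assuming the default x000-style names that split produces (the scratch directory and the empty placeholder files are only there for demonstration):

```shell
# Work in a scratch directory so no real files are touched (demo assumption).
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch x000 x001 x002   # stand-ins for the pieces written by split

# Add the prefix and the suffix in one mv instead of two passes.
for f in x* ; do
    mv "$f" "file_$f.dat"
done

ls
```

The resulting names are the same ones the two separate loops would have produced.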

To help with putting the awks together, look at the original awk and you will see how the file name was being created.
The only difference now is that instead of changing the name every 6 rows (NR % 6) you are now going to change it at a different point.

The gotcha is, it will not be changing every 210 rows as the new file created with second awk script (giant_padded_file.txt) has had additions.
The math is fairly trivial though.
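Spelled out as a quick sketch (the variable names are only illustrative; the numbers are the ones already used in this thread), the arithmetic runs like this:

```shell
rows_per_block=6        # data rows in each block of the original file
pad_rows=15             # rows of zeros added after each block
out_rows_per_file=210   # length wanted for each output file

out_rows_per_block=$(( rows_per_block + pad_rows ))             # 6 + 15
blocks_per_file=$(( out_rows_per_file / out_rows_per_block ))   # 210 / 21
in_rows_per_file=$(( blocks_per_file * rows_per_block ))        # 10 * 6

echo "switch output file every $in_rows_per_file input rows"
```

So a test on NR in the original script has to fire on a multiple of the input row count, not on 210.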

Let me know how you get on?
 
Old 07-02-2013, 01:27 PM   #18
atjurhs
Member
 
Registered: Aug 2012
Posts: 168

Original Poster
Rep: Reputation: Disabled
...I have to take another step back

so say the input file has 240 lines
I want to break up the 240 lines every 6 lines so now I have 40 blocks of data
I want to add to each block 15 columns of zeros and 15 lines of zeros
now I have 21 lines per block, and 40 blocks of data so I get an output file with 840 lines which I've been calling the "one_giant_padded_file.txt"
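A small check of those counts, as a sketch only: it assumes placeholder data from seq in place of the real 240-line input, and runs the padding rules from grail's script inline in a scratch directory.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

# 240 placeholder rows; the real column contents do not matter for counting lines.
seq 1 240 > input.txt

awk '
BEGIN {
    end_zeroes   = "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"
    extra_zeroes = "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"
}
{ print $0, end_zeroes }                                   # pad every data row
!(NR % 6) { for (i = 0; i < 15; i++) print extra_zeroes }  # 15 zero rows per block
' input.txt > one_giant_padded_file.txt

# 40 blocks * (6 + 15) lines per block
wc -l < one_giant_padded_file.txt
```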

as grail wrote the awk script that does it, it works perfectly, many many thanks!

now for my awk line command (which kinda works right sorta). I take the one giant file with 840 lines and run this line command.

Code:
awk '!(NR%210) {i++;} {print > "file_"i".dat";}' i=1 one_giant_padded_file.txt
but I get wrong results, I get 5 files, and...

Code:
file_1.dat has 209 lines and the file is missing its last line 
file_2.dat has 210 lines
file_3.dat has 210 lines
file_4.dat has 210 lines
file_5.dat has 1 lines and the one line is all zeros
it looks like file_5.dat is what's supposed to go as the last line of file_1.dat. Anyway, the last file should not exist, and file_1.dat should have 210 lines with a last line of all zeros. Is there supposed to be a stop somewhere in the command so it doesn't loop back around? idk.

grail, can you please help me? I've been trying to fix this since yesterday afternoon

thank you, Tabby

Last edited by atjurhs; 07-02-2013 at 01:34 PM. Reason: clarity
 
Old 07-02-2013, 02:16 PM   #19
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,255

Rep: Reputation: 2686
You need to think about order of execution.
Code:
!(NR%210) {i++;}
This says, when you reach the 210th line of the current file, increase the counter by 1 ... so where do you think the 210th line (first round) will go??

Once you have this ... you can then simply add this into the original script
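To watch the order of execution at a scale that is easy to read, here is a small sketch that keeps the same rule order but uses 3 in place of 210 and six placeholder lines from seq (the demo_ file names and the scratch directory are assumptions for the demo):

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
seq 1 6 > six_lines.txt

# The counter rule runs first, so on line 3 the counter is bumped
# before that line is printed, and line 3 goes to the second file.
awk '!(NR%3) {i++} {print > ("demo_" i ".dat")}' i=1 six_lines.txt

head demo_*.dat
```

The first file ends up one line short and a stray last file holds a single line, which is the same pattern as the 209 / 210 / 210 / 210 / 1 result.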
 
Old 07-02-2013, 03:41 PM   #20
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,186

Rep: Reputation: 346
grail's approach to answering questions is to provide a fish hook, pole, and line; I prefer to offer a little advice about how to use the fishing equipment and something about where the fish live. . .

So, a suggestion: See if your system responds to pinfo gawk or the older info gawk.

Here's an UNTESTED modification of grail's program, with some added comments.

Code:
#!/usr/bin/gawk -f
# This section is run once, before any input is processed
BEGIN { 
	end_zeroes =   "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0" 
	extra_zeroes = "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"
# Initial output file name
        output_file=sprintf("file_%d.dat",++output_file_count)
# Number of lines written to this output file
        nout=0
}

#################################################
#
# These blocks (i.e., 'test {statements}')
# are executed, in the order in which they appear,
# for each line read from any input file.
#
# ("file names" in the form "x=y"
# set the value of x to y and are processed
# when read AFTER the BEGIN section (if any) is
# run.)
#
##################################################
#
# Do we now have 210 lines in the output file?
(nout == 210) {
# Get the next output file name
        output_file=sprintf("file_%d.dat",++output_file_count)
# And reset the output line count to zero
        nout=0
}

# Copy the next input file line to the current output file and increment the output line count
# (the comma puts a space between the input line and the zeros, as in grail's script)
{       print $0, end_zeroes > output_file
        ++nout
}

# If the number of records read is a multiple of 6, add 15 lines of zeros
# and increment the output line count by 15
!(NR % 6) {
	for(i = 0; i < 15; i++) {
            print extra_zeroes > output_file
        }
        nout += 15
}
#############################################
# This block is run after the last input file
# is read.
#############################################
#
# Write some summary info to the console
END {
  print "Done. Wrote " output_file_count " files."
}
Note: The first line (starting with the "shebang", #!) would be used by a Linux system if you saved the code to a file and made it executable (chmod u+x code_file_name) so you could run it as a command (e.g., $ ./code_file_name input_file(s)).
 
Old 07-02-2013, 04:28 PM   #21
atjurhs
Member
 
Registered: Aug 2012
Posts: 168

Original Poster
Rep: Reputation: Disabled
grail

I got it, I got it, wohoooo

Code:
 awk 'NR%210==1 {"file_"i".dat";i++;} {print > "file_"i".dat"}' i=0 giant_input.file
that was hard, at least for me it was.

if there is something that I should change to stop any errors/bugs that I don't know about please tell me
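One possible tidy-up, offered as a sketch rather than a required change: the bare string "file_"i".dat"; in the first action is evaluated and thrown away, so it can be dropped, and putting the output file name in parentheses keeps the redirection unambiguous across awk implementations. The run below assumes 840 placeholder lines from seq in a scratch directory.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
seq 1 840 > giant_input.file   # placeholder for the padded 840-line file

# Bump the counter on lines 1, 211, 421, 631, then print every line
# to the file named by the current counter value.
awk 'NR%210==1 {i++} {print > ("file_" i ".dat")}' i=0 giant_input.file

wc -l file_*.dat
```

With only four output files nothing more is needed. If an input ever produced a very large number of files, calling close() on the finished file before moving to the next one is a common precaution.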


PTrenholme, I'll give yours a look over too...

excited/happy Tabby
 
Old 07-02-2013, 08:50 PM   #22
atjurhs
Member
 
Registered: Aug 2012
Posts: 168

Original Poster
Rep: Reputation: Disabled
well guys I think that does it. thanks sooooo much for all your help!

Tabby
 
Old 07-03-2013, 04:59 AM   #23
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,255

Rep: Reputation: 2686
I agree with PTrenholme's analogy that I provide direction as opposed to answers, but generally only to those who seem to be following.

Glad you found a solution. Now that you have one, here is what I would look at:

1. Your final solution works, which is cool, but what I was pointing at in my last advice was that by simply changing the position of the increment you would achieve the same effect:
Code:
awk '{print > "file_"i".dat"}!(NR%210) {i++}' i=1 one_giant_padded_file.txt
2. As I pointed out, this can then be added to the original script to output the data from the original file:
Code:
#!/usr/bin/awk -f

BEGIN{ 
	end_zeroes = "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0" 
	extra_zeroes = "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"

	cnt = 0

	file_name = sprintf("file_%02d.dat",++cnt)
}

{ print $0,end_zeroes > file_name }

!(NR % 6){

	for(i = 0; i < 15; i++)
		print extra_zeroes > file_name
}

!(NR%60){ file_name = sprintf("file_%02d.dat",++cnt) }
 
Old 07-03-2013, 10:07 AM   #24
atjurhs
Member
 
Registered: Aug 2012
Posts: 168

Original Poster
Rep: Reputation: Disabled
good morning Grail, you get up too early, even I'm up too early today

I did try to follow your direction of moving the iterator, but...

Code:
warning these 2 awk commands do not work

awk '!(NR%210) {print > "file_"i".dat";} {i++;} ' i=1 one_giant_padded_file.txt

and I tried putting it inside the print statement

awk '!(NR%210) {print > "file_"i".dat" i++;} ' i=1 one_giant_padded_file.txt
you know (and I found out) that this keeps giving syntax errors that I couldn't work around.

what I couldn't think past was not having the separating statement
Code:
!(NR%210)
at the very beginning of the command, and even in my awk command that works I had the separating statement at the beginning.

the "combined script" is definitely beyond my coding ability. I've never had a class in any sort of programming, so I'm learning awk, sed, and bash writing on my own because a lot of what I do is re-formatting and re-configuring files and directories to run in already existing programs. I do write pseudocode to organize my thoughts, but putting it into real code is much tougher for me. I can understand what for and while loops do, but I'll pull my hair out trying to write one, and there's no way I would have come up with the 3 print statements in that combined script. So I'll keep working at it, and thank you so much for your help!

Thanks so much, Tabby

please read my PM to you, ahhhh, I haven't figured out how to do that, is there a link somewhere?

Last edited by atjurhs; 07-03-2013 at 10:19 AM. Reason: requesting a link to send PMs
 
Old 07-03-2013, 02:27 PM   #25
tabbyagirl
LQ Newbie
 
Registered: Jul 2013
Distribution: Red Hat Enterprise Linux Client release 5.5 (Tikanga)
Posts: 7

Rep: Reputation: Disabled
I found out I can't send a PM, so out in the open: the people I work with asked me to change my username here, so I did. My new username here is "tabbyagirl"
 
Old 07-04-2013, 04:54 AM   #26
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,255

Rep: Reputation: 2686
Well I would need more information on any error messages to help with them.

Looking at the 2 lines you have in post #24, neither would work well for what you want, but I shall try to explain:
Code:
awk '!(NR%210) {print > "file_"i".dat";} {i++;} ' i=1 one_giant_padded_file.txt
There are 2 issues here:

1. The 'i' variable is now going to increase for every line read in the file, ie by the end of the script it will be 841

2. As you now have the condition '!(NR%210)' prior to your print command, it will only print every 210th line, ie only 4 single lines, one per file will be printed
Code:
awk '!(NR%210) {print > "file_"i".dat" i++;} ' i=1 one_giant_padded_file.txt
Here you have the same issue as above for printing, but now the value for 'i' will only get to 4

If you look at my example:
Code:
awk '{print > "file_"i".dat"}!(NR%210) {i++}' i=1 one_giant_padded_file.txt
{print > "file_"i".dat"} - This will print every line of one_giant_padded_file.txt into a new file called 'file_N.dat' where N starts at 1

!(NR%210) {i++} - This tells awk that when NR is evenly divisible by 210 that the variable 'i' will be increased by 1, hence our file of 840 lines will force the variable to be increased 4 times

Note: Even though 'i' is increased 4 times, the last value of 'i' is 5 but it is never used
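A way to see both points at once, sketched with 840 placeholder lines from seq in a scratch directory (the END rule and summary.txt are additions for the demo only):

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
seq 1 840 > one_giant_padded_file.txt   # placeholder rows

# Print first, then bump the counter; the END rule reports the final counter value.
awk '{print > ("file_" i ".dat")} !(NR%210) {i++} END {print "final i = " i}' \
    i=1 one_giant_padded_file.txt > summary.txt

cat summary.txt
wc -l file_*.dat
```

The counter finishes at 5, yet only file_1.dat through file_4.dat exist, because no line is printed after the last increment.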


Lastly, instead of comparing the new script from post #23 to the previous version in post #15, compare it instead to the one in post #7 as apart from a slight change in the BEGIN section
the following is the only new line:
Code:
!(NR%60){ file_name = sprintf("file_%02d.dat",++cnt) }
Hope some of this helps
 
Old 07-05-2013, 12:33 PM   #27
tabbyagirl
LQ Newbie
 
Registered: Jul 2013
Distribution: Red Hat Enterprise Linux Client release 5.5 (Tikanga)
Posts: 7

Rep: Reputation: Disabled
that helps VERY much in learning what's going on as it steps through the lines of code.

a friend of mine has some C code development tool that lets him step through each line so he can see what's happening, kinda like you explained up above. Do they have such a thing for scripting languages? converting over to C seems like a BIG step, IDK

Tabby
 
Old 07-05-2013, 01:36 PM   #28
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,255

Rep: Reputation: 2686
When scripting in bash, you can use the following as the second line of the script to turn on logging of sorts:
Code:
set -xv
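For instance, a minimal sketch with a throwaway script (traced.sh and trace.log are made-up names for the demo). The trace is written to stderr, so it can be captured separately from the normal output:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

# A tiny script with tracing switched on in its second line.
cat > traced.sh <<'EOF'
#!/bin/bash
set -xv
for f in one two; do
    echo "file_$f.dat"
done
EOF

# -v echoes each script line as it is read; -x echoes each command
# after expansion, prefixed with "+". Both go to stderr.
bash traced.sh 2> trace.log
cat trace.log
```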
As for awk, or really any language whilst learning, I believe your best friend is the standard print / echo statement. Simply redirect all variables each time they change
to a separate file (or on screen if only a few lines) and then you can track down where things have gone wrong.
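A small sketch of that idea in awk, using 3 in place of 210 and six placeholder lines from seq (debug.log and the demo_ names are assumptions for the demo): each time the counter changes, its new value is written to a separate file.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
seq 1 6 > six_lines.txt

# Normal output goes to demo_N.dat; every change of i is logged to debug.log.
awk '{print > ("demo_" i ".dat")}
     !(NR%3) {i++; print "NR=" NR " i is now " i > "debug.log"}' i=1 six_lines.txt

cat debug.log
```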

Other options like the one above or something like gdb to step through C code can be adopted later when executing much larger programs / scripts
 
Old 07-09-2013, 10:14 AM   #29
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,186

Rep: Reputation: 346
There is also a "full-fledged" gawk debugger available. It's described in the gawk info file to which I referred you above.

Basically, instead of, for example, gawk '{print > "file_"i".dat"}!(NR%210) {i++}' i=1 one_giant_padded_file.txt you would use dgawk '{print > "file_"i".dat"}!(NR%210) {i++}' i=1 one_giant_padded_file.txt

If your 'C' friend is familiar with gdb usage, dgawk commands are similar to those. The info section on "debugging" describes the usage fairly well.

A comment that, hopefully, will help you understand where you're losing track:

In a "one-line" command like awk '{print > "file_"i".dat"}!(NR%210) {i++}' i=1 one_giant_padded_file.txt, the "stuff" between the single quotes is a gawk program and the rest of the line is the arguments for that program. Instead of that "one-line" program, you could have done this:
Code:
$ cat > tmp.gawk
# Do this for every input line (I.e., No condition precedes the expression.)
{
  print > "file_" i ".dat"
}
# Do this whenever the number of records read is a multiple of 210
# (I.e., when the remainder of (NR / 210) is zero)
!(NR%210) {
  i += 1
}
^D
$ gawk -f tmp.gawk i=1 one_giant_padded_file.txt
Note that comments and spaces are ignored in gawk code, and that newline characters and semicolons are equivalent: one or the other is required to separate program statements. (That is the general rule; there are a few exceptions, for example quoted strings, which may contain almost any character.)
 
  


All times are GMT -5.
