LinuxQuestions.org
Old 04-26-2012, 12:47 PM   #31
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193

Ok ... so here is what I came up with. Obviously you can change the output as needed. Also, I went with a standalone awk script instead of bash calling awk, but I am sure you can edit as required.
Code:
#!/usr/bin/awk -f

match($0,/^[0 ](.{8}) (.{8}) (.{4}) (.{8}) (.{8}) (.{3}) (.{2}) (.{8}) (.{8}) (.{8}) (.{3})(.{4}) (.{5}) (.{28}) (.{5}) (.*)/,f){
    for(i=1; i <= 16; i++){
        gsub(/^ *| *$/,"",f[i])
        printf "%s",f[i](i==16?"\n":"|")
    }
}
And you run it like so:
Code:
./awk_script --re-interval file
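The --re-interval switch is only needed with gawk before version 4. From version 4 onward, interval expressions such as {8} are enabled by default, so you can leave it off.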

PS. You pulled a dodgy with fields 11 and 12, as they do not have a space between them but a hyphen. This was corrected and allowed for.
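For anyone unsure whether the switch is needed on a given machine, the version note above can be turned into a small check. This is only a sketch: it assumes the first line of `gawk --version` starts with `GNU Awk` followed by the version number, and `needs_re_interval` and `run_awk_script` are hypothetical helper names, not part of the script above.

```bash
#!/bin/bash
# Succeed (return 0) when the given version line is from a gawk older than 4.
needs_re_interval() {
  local first_line=$1
  local re='^GNU Awk ([0-9]+)\.'
  [[ $first_line =~ $re ]] || return 1   # not recognised as gawk: do not add the switch
  (( BASH_REMATCH[1] < 4 ))
}

# Hypothetical wrapper: add --re-interval only when the installed gawk needs it.
run_awk_script() {
  local version_line
  version_line=$(gawk --version 2>/dev/null | head -n 1)
  if needs_re_interval "$version_line"; then
    gawk --re-interval -f ./awk_script "$@"
  else
    gawk -f ./awk_script "$@"
  fi
}
```

You would then call `run_awk_script file` instead of running the script directly.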

Last edited by grail; 04-26-2012 at 12:48 PM.
 
Old 04-26-2012, 03:20 PM   #32
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 737

Original Poster
Rep: Reputation: 78
that is elegant. so how to add/combine that to post #24 script, i need the 1st awk to ignore lines of data (aka noise) that are not needed, etc.


ah, as you see F76-F82 is consecutive in raw data w/o h20, my bad. i needed to separate them, etc.
11 = $76$77$78
12 = $79$80$81$82

Last edited by Linux_Kidd; 04-26-2012 at 03:29 PM.
 
Old 04-27-2012, 12:22 AM   #33
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
No combining required; that one script replaces both of yours.

Firstly I would try running the script with a test file that contains the "noise". The point here is that unless a "noise" line exactly matches the regular expression given to the 'match' function, it will be ignored.

If this does not work because you have lines with the exact same format that you wish to ignore based on a pattern, simply put the pattern in slashes (//), negate it with ! so that matching lines are skipped, and 'and' (&&) it with match.

Let me know if any of this is unclear?
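As a rough illustration of combining a pattern with match, here is a cut-down sketch called from bash. The marker `NOISE`, the short regex and the function name `filter_records` are made up for the example; the real script would keep its full 16-field regex in place of the short one.

```bash
#!/bin/bash
# Print only the lines that do NOT contain NOISE and that DO match the record pattern.
filter_records() {
  awk '!/NOISE/ && match($0, /^[0 ][A-Z]/) { print }' "$@"
}

# Four sample lines: two good records, one noise line in record format, one malformed line.
printf '%s\n' ' ALPHA 1' '0BETA 2' ' NOISE 3' 'xGAMMA 4' | filter_records
```

Only the ALPHA and BETA lines come out: the NOISE line is rejected by the negated pattern even though it has the right layout, and the GAMMA line never matches in the first place.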
 
Old 04-27-2012, 07:43 AM   #34
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 737

Original Poster
Rep: Reputation: 78
the elegance works fine, except i need to run it in bash script, or, call this awk from a bash script and be able to send output to $FILE along with $i, etc.
see, i use two scripts, one to verify the directory and the 2nd (awk processing) does the rest. if you notice i pass $FILE to the awk script (actually it's a bash script, i just name it .awk, etc) and i print out $2 from the awk script into last field of my file. i do this so that if any data shows up funny i know which file caused the problem, etc. the script(s) currently process 187 files, and a new file gets added daily.

Code:
#!/bin/bash -l
# written by me
umask 026
NOW=`date +%F%T`
FILE="$HOME/HEAP.$NOW.txt"
if [ -d "$HOME/HEAP" ]
then
  echo ""
  echo "HEAP folder was found in $HOME."
  echo "Please wait, processing files..."
  echo "1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|FILENAME" >> $FILE
   for i in `ls $HOME/HEAP/`; do /var/scripts/convert.awk $HOME/HEAP/$i $FILE; done
  echo ""
  echo "All done, your output file is $FILE"
  echo "have a nice day..."
  echo ""
else
  echo ""
  echo "HEAP folder in directory $HOME does not exist."
  echo "Please make sure this directory exists and has"
  echo "files in it."
  echo ""
fi
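The convert.awk script itself is not shown in the thread, so the following is only a guess at the general idea of tagging each output record with the file it came from. `tag_records` is a hypothetical name, and awk's built-in FILENAME variable stands in for passing the name by hand.

```bash
#!/bin/bash
# Append every line of the given input files to out_file, with the source file as the last field.
tag_records() {
  local out_file=$1
  shift
  awk -v out="$out_file" '{ print $0 "|" FILENAME >> out }' "$@"
}

work=$(mktemp -d)                      # throwaway directory for the demonstration
printf 'a|b\n' > "$work/day1.txt"
printf 'c|d\n' > "$work/day2.txt"
tag_records "$work/result.txt" "$work/day1.txt" "$work/day2.txt"
cat "$work/result.txt"                 # each line now ends with the path of its input file
rm -rf "$work"
```

Because awk accepts several input files in one call, this also avoids starting a new process per file.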

Last edited by Linux_Kidd; 04-27-2012 at 07:45 AM.
 
Old 04-27-2012, 10:20 AM   #35
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
Please do not take this the wrong way ...
Code:
for i in `ls $HOME/HEAP/`; do /var/scripts/convert.awk $HOME/HEAP/$i $FILE; done
I am hoping this means you can confirm that absolutely no files contain spaces, tabs or new lines in the name. Otherwise this is a big no no. Much safer to use:
Code:
for i in $HOME/HEAP/*; ...
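To see why the ls form is risky, here is a small self-contained demonstration. The temporary directory and the two file names are invented for the example, and the function names are hypothetical.

```bash
#!/bin/bash
# Count loop iterations when the file list comes from parsing ls output.
count_with_ls() {
  local dir=$1 n=0 f
  for f in $(ls "$dir"); do    # unquoted: split on spaces, tabs and newlines
    n=$((n + 1))
  done
  echo "$n"
}

# Count loop iterations when the shell glob supplies the file list.
count_with_glob() {
  local dir=$1 n=0 f
  for f in "$dir"/*; do        # each match stays one word, spaces included
    [ -e "$f" ] || continue    # skip the literal pattern if nothing matched
    n=$((n + 1))
  done
  echo "$n"
}

demo_dir=$(mktemp -d)          # throwaway directory for the demonstration
touch "$demo_dir/plain.txt" "$demo_dir/has space.txt"
echo "ls loop:   $(count_with_ls "$demo_dir") iterations"
echo "glob loop: $(count_with_glob "$demo_dir") iterations"
rm -rf "$demo_dir"
```

With two files in the directory, the ls loop runs three times because "has space.txt" is split into two words, while the glob loop runs twice.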
I have to back up here as another part looks ... unusual:
Code:
FILE="$HOME/HEAP.$NOW.txt"
Is the dot (.) between HEAP and $NOW correct? Or should it be a slash like:
Code:
$HOME/HEAP/$i
Here is a way you could make it an awk script:
Code:
#!/usr/bin/awk -f

BEGIN{
    if(ARGV[1] ~ "HEAP/\\*"){
        print "HEAP folder in directory",ENVIRON["HOME"],"does not exist or"
        print "no files were available"
        exit
    }

    file = strftime("%F%T")".txt"

    print "1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|FILENAME" > file
}

match($0,/^[0 ](.{8}) (.{8}) (.{4}) (.{8}) (.{8}) (.{3}) (.{2}) (.{8}) (.{8}) (.{8}) (.{3})(.{4}) (.{5}) (.{28}) (.{5}) (.*)/,f){
    for(i=1; i <= 16; i++){
        gsub(/^ *| *$/,"",f[i])      # trim leading and trailing spaces
        printf "%s|",f[i] > file
    }
    print FILENAME > file            # 17th column, to match the FILENAME header
}

END{
    print "All done, your output file is",file
    print "have a nice day..."
}
Then you would call it like so:
Code:
/var/scripts/convert.awk $HOME/HEAP/*
Have a play and let me know if you have any questions?
 
Old 04-27-2012, 10:59 AM   #36
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 737

Original Poster
Rep: Reputation: 78
ok, suggestion for using * for filename understood, but it is guaranteed the file names have no spaces. i did however make the change for the better, etc.

as for FILE="$HOME/HEAP.$NOW.txt"
this is correct, this is my output file. i name my output file at run time which is named with a timestamp to the second. the script will never be run twice within the same second by same uid, etc. so every time it runs the output is a unique file (for some uid's having date/time in the filename is easier than ls -al, etc).

not sure i have time to test this elegance, might need to leave what i have since i have already trained the uid's on how to run what i have, which is "log in via ssh, type /var/scripts/process.sh and hit enter".

Last edited by Linux_Kidd; 04-27-2012 at 11:05 AM.
 
Old 04-27-2012, 11:56 AM   #37
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
No probs with the file name ... I was a little confused as the start of the file name was the same as the directory ... so just checking
Quote:
log in via ssh, type /var/scripts/process.sh and hit enter
So process.sh then calls /var/scripts/convert.awk? You could have them call just the one script, as they do now, but there is no need for it to then hand off to a second script; just put the code that does the work into it.

Quote:
not sure i have time to test this elegance
I can fully understand, as putting things you aren't 100% sure of into a live environment is not flash.

As I have been playing, I thought I might show you another way (just to keep that mind of yours guessing (lol)):
Code:
#!/bin/bash

regex='^[0 ](.{8}) (.{8}) (.{4}) (.{8}) (.{8}) (.{3}) (.{2}) (.{8}) (.{8}) (.{8}) (.{3})(.{4}) (.{5}) (.{28}) (.{5}) (.*)'

umask 026
NOW=$(date +%F%T)
FILE="$HOME/HEAP.$NOW.txt"
if [ -d "$HOME/HEAP" ]
then
  echo ""
  echo "HEAP folder was found in $HOME."
  echo "Please wait, processing files..."
  echo "1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|FILENAME" >> $FILE

  for i in $HOME/HEAP/*
  do
    while IFS="" read -r line
    do
      if [[ $line =~ $regex ]]
      then
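	for (( n = 1; n <= 16; n++ ))   # n, not i, so the file name in $i survives
	do
	    read -r field <<< "${BASH_REMATCH[n]}"   # read trims the surrounding spaces
	    printf '%s|' "$field" >> "$FILE"
	done
	printf '%s\n' "$i" >> "$FILE"   # 17th column, to match the FILENAME header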
      fi
    done<"$i"
  done
  
  echo ""
  echo "All done, your output file is $FILE"
  echo "have a nice day..."
  echo ""
else
  echo ""
  echo "HEAP folder in directory $HOME does not exist."
  echo "Please make sure this directory exists and has"
  echo "files in it."
  echo ""
fi
 
Old 04-27-2012, 01:02 PM   #38
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 737

Original Poster
Rep: Reputation: 78
i tried your bash script, no dice.

i ran your bash vs my 2 scripts. each way processes 187 txt files in the dir.

your bash:
2min30sec producing 28,989 lines of output

my scripts:
28sec producing 29,190 lines of output (this output was verified to be correct)


not sure where it choked. i'll use this for reference. thnx.
 
Old 04-27-2012, 02:05 PM   #39
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
Yeah, awk will usually run quicker for this kind of text processing as that is its thing, but of course the bash version has the nicety of being all bash

Obviously it worked fine on the data you gave me for testing. The tests on my machine also show awk outperforms bash even for the small amount of data:
Code:
# bash
real	0m0.034s
user	0m0.012s
sys	0m0.016s

#awk
real	0m0.008s
user	0m0.000s
sys	0m0.004s
I do find the size of the difference a little odd, i.e. just over 200 out of 29000+. I would have thought it would be larger if a recurring item was being missed.

Oh well ... it was a bit of fun
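One possible contributor to a small shortfall like this, which the thread does not confirm, is that a plain `while read` loop silently drops the last line of any file that does not end in a newline, whereas awk still processes it. The sketch below shows the effect and the usual guard; the function names and the sample file are invented for the demonstration.

```bash
#!/bin/bash
# Count lines the way a bare "while read" loop sees them.
count_lines_plain() {
  local n=0 line
  while IFS= read -r line; do
    n=$((n + 1))
  done < "$1"
  echo "$n"
}

# Same loop, but also accept a final line that has no trailing newline.
count_lines_safe() {
  local n=0 line
  while IFS= read -r line || [[ -n $line ]]; do
    n=$((n + 1))
  done < "$1"
  echo "$n"
}

sample=$(mktemp)
printf 'one\ntwo\nthree' > "$sample"    # note: no newline after "three"
echo "plain: $(count_lines_plain "$sample")"
echo "safe:  $(count_lines_safe "$sample")"
rm -f "$sample"
```

The plain loop reports 2 and the guarded loop reports 3. Whether the 187 input files actually lack a final newline would need checking.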
 
Old 04-27-2012, 02:27 PM   #40
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
Ok ... one last edition which I finally worked out ... just seemed cool (just the part doing the work):
Code:
regex='^[0 ](.{8}) (.{8}) (.{4}) (.{8}) (.{8}) (.{3}) (.{2}) (.{8}) (.{8}) (.{8}) (.{3})(.{4}) (.{5}) (.{28}) (.{5}) (.*)'

for i in $HOME/HEAP/*
do
  IFS="|$IFS"

  while IFS="" read -r line
  do
    if [[ $line =~ $regex ]]
    then
	read -a temp <<<"${BASH_REMATCH[*]:1}"
	echo "${temp[*]}"
    fi
  done<"$i"
done

unset IFS
And also 3 or 4 times faster than previous bash (on the small data)
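One caveat with the IFS trick: `read -a` splits on spaces as well as on the pipe, so a field with a space inside it (say "B C") would come out as two fields. If the real data has such fields, a variant that trims each capture with parameter expansion and joins them by hand avoids that, still without starting any extra processes. This is only a sketch; `join_trimmed` is a hypothetical name and the three-field regex in the usage lines is a shortened stand-in for the real one.

```bash
#!/bin/bash
# Trim leading/trailing spaces from each argument and print them joined with "|".
join_trimmed() {
  local out="" field
  for field in "$@"; do
    field="${field#"${field%%[! ]*}"}"   # strip leading spaces
    field="${field%"${field##*[! ]}"}"   # strip trailing spaces
    out+="|$field"
  done
  printf '%s\n' "${out#|}"               # drop the extra leading "|"
}

# Usage with a shortened, made-up record layout: 8 chars, 4 chars, then the rest.
regex='^[0 ](.{8}) (.{4}) (.*)'
line=$(printf ' %-8s %-4s %s' 'AAA' 'B C' 'X')
if [[ $line =~ $regex ]]; then
  join_trimmed "${BASH_REMATCH[@]:1}"
fi
```

For the sample line this prints AAA|B C|X, keeping the embedded space in the second field.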
 
  



All times are GMT -5. The time now is 04:39 PM.
