Programming (LinuxQuestions.org): This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
04-26-2012, 12:47 PM
#31
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
OK ... so here is what I came up with. Obviously you can change the output as needed. Also, I went with a standalone awk script instead of bash calling awk, but I am sure you can edit it as required.
Code:
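#!/usr/bin/gawk -f
# match() with a third (array) argument is a gawk extension.
# Capture the 16 fixed-width fields; lines that do not match (noise) are skipped.
match($0,/^[0 ](.{8}) (.{8}) (.{4}) (.{8}) (.{8}) (.{3}) (.{2}) (.{8}) (.{8}) (.{8}) (.{3})(.{4}) (.{5}) (.{28}) (.{5}) (.*)/,f){
    for(i=1; i <= 16; i++){
        gsub(/^ *| *$/,"",f[i])             # trim the fixed-width padding
        printf "%s",f[i](i==16?"\n":"|")    # pipe-separated, newline after field 16
    }
}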
And you run it like so:
Code:
./awk_script --re-interval file
From gawk version 4 onward you no longer need that switch, because interval expressions such as {8} are enabled by default.
PS. You pulled a dodgy with fields 11 and 12, as they have a hyphen between them rather than a space. This was corrected and allowed for.
Last edited by grail; 04-26-2012 at 12:48 PM.
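The script above depends on gawk: the array argument to match() is a gawk extension, and gawk before version 4 also needs --re-interval. If a box only has mawk or busybox awk, a minimal sketch like the following slices the same 16 fields with POSIX substr() instead. The column positions are derived from the regex (column 1 is the [0 ] flag, single spaces separate the fields, fields 11 and 12 sit back to back), so please check them against real data; the function name split_fields is hypothetical.

```shell
#!/bin/bash
# Sketch: the same 16 fields as the gawk regex, cut with POSIX substr() so it
# also runs under mawk or busybox awk. Positions are read off the regex.
split_fields() {
  awk '
  function trim(s) { gsub(/^ +| +$/, "", s); return s }
  BEGIN {
    # start column and width of fields 1-15; field 16 is the rest of the line
    split("2 8 11 8 20 4 25 8 34 8 43 3 47 2 50 8 59 8 68 8 77 3 80 4 85 5 91 28 120 5", p, " ")
    # columns that must hold the single separating space
    nsep = split("10 19 24 33 42 46 49 58 67 76 84 90 119 125", sep, " ")
  }
  {
    # same acceptance test as the regex; any other line is treated as noise
    if (length($0) < 125 || substr($0, 1, 1) !~ /^[0 ]$/) next
    for (k = 1; k <= nsep; k++)
      if (substr($0, sep[k], 1) != " ") next
    out = ""
    for (k = 1; k <= 15; k++)
      out = out trim(substr($0, p[2*k-1], p[2*k])) "|"
    print out trim(substr($0, 126))
  }' "$@"
}
```

Usage would be along the lines of `split_fields "$HOME"/HEAP/*`. The explicit length and separator checks are there so that, like the regex, malformed lines are dropped rather than sliced into garbage.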
04-26-2012, 03:20 PM
#32
Member
Registered: Jan 2006
Location: USA
Posts: 742
Original Poster
That is elegant. So how do I add/combine that with the script from post #24? I need the first awk to ignore lines of data (aka noise) that are not needed.
Ah, as you can see, F76-F82 are consecutive in the raw data with no space between them, my bad. I needed to separate them:
11 = $76$77$78
12 = $79$80$81$82
Last edited by Linux_Kidd; 04-26-2012 at 03:29 PM.
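For illustration only: if those seven consecutive characters were already captured as one string, bash substring expansion ${var:offset:length} can split them into the two fields. The helper name and the sample value are made up.

```shell
#!/bin/bash
# Hypothetical helper: split a 7-character run into field 11 (first 3 chars)
# and field 12 (next 4 chars), printed pipe-separated.
split_11_12() {
  local run=$1
  printf '%s|%s\n' "${run:0:3}" "${run:3:4}"
}
```

For example, `split_11_12 ABC1234` prints `ABC|1234`. The regex approach does the same thing with two adjacent groups, (.{3})(.{4}), and no separator between them.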
04-27-2012, 12:22 AM
#33
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
No combining required; that replaces both scripts.
First, I would try running the script on a test file that contains the "noise". The point here is that any line that does not exactly match the regex in the match() call is ignored.
If that is not enough, because some lines have exactly the same format but you still want to ignore them based on a pattern, simply put that pattern between slashes (//) and combine it with match() using && (logical and).
Let me know if any of this is unclear?
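A minimal sketch of that idea, assuming (hypothetically) that the unwanted lines contain the word TOTAL. It uses a short stand-in regex and the POSIX two-argument match() so it runs under any awk; in the gawk script the condition would read !/TOTAL/ && match($0, regex, f) with the full regex and the array f.

```shell
#!/bin/bash
# Hypothetical: lines containing "TOTAL" have the right layout but are unwanted.
# The negated /pattern/ is and-ed (&&) with match() in one rule condition.
keep_records() {
  awk '!/TOTAL/ && match($0, /^[0 ][A-Z]+ [0-9]+/) { print }' "$@"
}
```

Fed the lines `0ABC 123`, `0TOTAL 999` and `noise`, only `0ABC 123` survives: the second fails the pattern test and the third fails match().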
04-27-2012, 07:43 AM
#34
Member
Registered: Jan 2006
Location: USA
Posts: 742
Original Poster
The elegance works fine, except I need to run it from a bash script, or call this awk from a bash script, and be able to send the output to $FILE along with $i.
You see, I use two scripts: the first verifies the directory and the second (the awk processing) does the rest. Notice that I pass $FILE to the awk script (it is actually a bash script, I just name it .awk) and I print out $2 from the awk script into the last field of my file. I do this so that if any data shows up funny, I know which file caused the problem. The scripts currently process 187 files, and a new file gets added daily.
Code:
#!/bin/bash -l
# written by me
umask 026
NOW=`date +%F%T`
FILE="$HOME/HEAP.$NOW.txt"
if [ -d "$HOME/HEAP" ]
then
    echo ""
    echo "HEAP folder was found in $HOME."
    echo "Please wait, processing files..."
    echo "1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|FILENAME" >> $FILE
    for i in `ls $HOME/HEAP/`; do /var/scripts/convert.awk $HOME/HEAP/$i $FILE; done
    echo ""
    echo "All done, your output file is $FILE"
    echo "have a nice day..."
    echo ""
else
    echo ""
    echo "HEAP folder in directory $HOME does not exist."
    echo "Please make sure this directory exists and has"
    echo "files in it."
    echo ""
fi
Last edited by Linux_Kidd; 04-27-2012 at 07:45 AM.
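The convert.awk wrapper above is started once per input file, with the output file passed along so each row can be tagged. As a sketch of another option, awk's standard FILENAME variable holds the name of the file the current line was read from, so a single awk run over all the files can add that column itself. The function name, the two-column header and the /^[0 ]/ filter are placeholders, not the real conversion.

```shell
#!/bin/bash
# Sketch: one awk run over every input file; FILENAME names the file the
# current record came from, so it can be appended as the last column.
tag_lines() {
  local out=$1; shift                 # first argument: output file
  echo "DATA|FILENAME" > "$out"       # placeholder header
  # placeholder filter: keep lines starting with "0" or a space
  awk '/^[0 ]/ { print $0 "|" FILENAME }' "$@" >> "$out"
}
```

Usage would be something like `tag_lines "$FILE" "$HOME"/HEAP/*`. One awk process for all files also avoids starting a new process per file.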
04-27-2012, 10:20 AM
#35
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
Please do not take this the wrong way ...
Code:
for i in `ls $HOME/HEAP/`; do /var/scripts/convert.awk $HOME/HEAP/$i $FILE; done
I am hoping this means you can confirm that absolutely no file names contain spaces, tabs or newlines. Otherwise this is a big no-no. Much safer to use:
Code:
for i in $HOME/HEAP/*; ...
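A minimal sketch of the glob-based loop, assuming only a directory path as input: quoting "$f" keeps a name with spaces or tabs in one piece, and the -e test skips the unexpanded pattern bash leaves behind when the directory is empty. The function name is hypothetical and the printf line stands in for the real per-file command.

```shell
#!/bin/bash
# Loop over directory entries with a glob instead of parsing `ls` output.
list_files() {
  local dir=$1 f
  for f in "$dir"/*; do
    [ -e "$f" ] || continue   # no match: $f holds the literal pattern; skip it
    printf '[%s]\n' "$f"      # placeholder for the real per-file command
  done
}
```

The brackets in the output make it easy to see that a name such as `a b.txt` arrives as a single argument.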
I have to back up here as another part looks ... unusual:
Code:
FILE="$HOME/HEAP.$NOW.txt"
Is the dot (.) between HEAP and $NOW correct? Or should it be a slash, like FILE="$HOME/HEAP/$NOW.txt"?
Here is a way you could make it an awk script:
Code:
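#!/usr/bin/gawk -f
# gawk only: strftime() and the array argument to match() are gawk extensions.
# gawk 4+ enables {n} intervals by default; older versions need --re-interval.
BEGIN{
    # If the glob matched nothing, the shell passes the literal ".../HEAP/*"
    if(ARGV[1] ~ "HEAP/\\*"){
        print "HEAP folder in directory",ENVIRON["HOME"],"does not exist or"
        print "no files were available"
        failed = 1
        exit 1    # END still runs after exit, so it checks "failed"
    }
    file = ENVIRON["HOME"] "/HEAP." strftime("%F%T") ".txt"
    print "1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|FILENAME" > file
}
match($0,/^[0 ](.{8}) (.{8}) (.{4}) (.{8}) (.{8}) (.{3}) (.{2}) (.{8}) (.{8}) (.{8}) (.{3})(.{4}) (.{5}) (.{28}) (.{5}) (.*)/,f){
    for(i=1; i <= 16; i++){
        gsub(/^ *| *$/,"",f[i])    # trim the fixed-width padding
        printf "%s|",f[i] > file
    }
    print FILENAME > file          # last column: the input file
}
END{
    if(failed) exit 1
    print "All done, your output file is",file
    print "have a nice day..."
}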
Then you would call it like so:
Code:
/var/scripts/convert.awk $HOME/HEAP/*
Have a play and let me know if you have any questions?
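The BEGIN block spots the no-files case by looking for the literal HEAP/* that the shell passes through when a glob matches nothing. As an alternative sketch, a small bash wrapper can collect the files with nullglob first, so a missing or empty directory simply yields an empty list, and users can keep typing a single command. The function name is hypothetical, and the script path is a parameter rather than a fixed /var/scripts/convert.awk.

```shell
#!/bin/bash
# Sketch of a thin wrapper: with nullglob, a missing or empty directory gives
# an empty array instead of the literal pattern.
run_convert() {
  local script=$1 dir=$2
  local -a files
  shopt -s nullglob
  files=("$dir"/*)
  shopt -u nullglob
  if [ ${#files[@]} -eq 0 ]; then
    echo "HEAP folder $dir does not exist or no files were available" >&2
    return 1
  fi
  "$script" "${files[@]}"     # one invocation for all files
}
```

For example: `run_convert /var/scripts/convert.awk "$HOME/HEAP"`.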
04-27-2012, 10:59 AM
#36
Member
Registered: Jan 2006
Location: USA
Posts: 742
Original Poster
OK, the suggestion of using * for the file names is understood, but the file names are guaranteed to have no spaces. I did, however, make the change for the better.
As for FILE="$HOME/HEAP.$NOW.txt":
this is correct; it is my output file. I name the output file at run time with a timestamp to the second. The script will never be run twice within the same second by the same uid, so every time it runs the output goes to a unique file (for some uids, having the date/time in the file name is easier than ls -al).
Not sure I have time to test this elegance. I might need to leave what I have, since I have already trained the uids on how to run it, which is "log in via ssh, type /var/scripts/process.sh and hit enter".
Last edited by Linux_Kidd; 04-27-2012 at 11:05 AM.
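One detail about the run-time name: with NOW=`date +%F%T`, the date and the time are written directly next to each other (for example 2012-04-2710:59:00). If a more readable name is wanted, a minimal sketch with an explicit separator and no colons could look like this; the function name and format are only a suggestion.

```shell
#!/bin/bash
# Build the per-run output name: %F gives YYYY-MM-DD, and the time is
# written as HH-MM-SS after an underscore so the name has no colons.
make_output_name() {
  printf '%s/HEAP.%s.txt\n' "$HOME" "$(date +%F_%H-%M-%S)"
}
```

The script would then use `FILE=$(make_output_name)`; the name still changes every second, so the uniqueness argument above is unaffected.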
04-27-2012, 11:56 AM
#37
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
No probs with the file name ... I was a little confused as the start of the file name was the same as the directory ... so just checking
Quote:
log in via ssh, type /var/scripts/process.sh and hit enter
So process.sh then calls /var/scripts/convert.awk? Users can just as easily call a single script, as they do now, with no need for it to hand off to another one; just put the code that does the work into that script.
Quote:
not sure i have time to test this elegance
I fully understand; putting something into a live environment that you aren't 100% sure of is not flash.
Since I have been playing, I thought I might show you another way (just to keep that mind of yours guessing, lol):
Code:
#!/bin/bash
regex='^[0 ](.{8}) (.{8}) (.{4}) (.{8}) (.{8}) (.{3}) (.{2}) (.{8}) (.{8}) (.{8}) (.{3})(.{4}) (.{5}) (.{28}) (.{5}) (.*)'
umask 026
NOW=$(date +%F%T)
FILE="$HOME/HEAP.$NOW.txt"
if [ -d "$HOME/HEAP" ]
then
    echo ""
    echo "HEAP folder was found in $HOME."
    echo "Please wait, processing files..."
    echo "1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|FILENAME" >> "$FILE"
    for i in "$HOME"/HEAP/*
    do
        # "|| [[ -n $line ]]" also keeps a last line with no trailing newline
        while IFS="" read -r line || [[ -n $line ]]
        do
            if [[ $line =~ $regex ]]
            then
                # j, not i: reusing i here would overwrite the file name
                for (( j = 1; j <= 16; j++ ))
                do
                    read -r field <<< "${BASH_REMATCH[j]}"   # trims the padding
                    printf '%s|' "$field"
                done
                printf '%s\n' "$i"                          # FILENAME column
            fi
        done < "$i"
    done >> "$FILE"    # open the output file once, not once per field
    echo ""
    echo "All done, your output file is $FILE"
    echo "have a nice day..."
    echo ""
else
    echo ""
    echo "HEAP folder in directory $HOME does not exist."
    echo "Please make sure this directory exists and has"
    echo "files in it."
    echo ""
fi
04-27-2012, 01:02 PM
#38
Member
Registered: Jan 2006
Location: USA
Posts: 742
Original Poster
I tried your bash script, no dice.
I ran your bash script against my two scripts; each processes the 187 txt files in the directory.
Your bash: 2 min 30 s, producing 28,989 lines of output
My scripts: 28 s, producing 29,190 lines of output (this output was verified to be correct)
Not sure where it choked. I'll use this for reference. Thanks.
04-27-2012, 02:05 PM
#39
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
Yeah, awk will generally run quicker, as text processing is its thing, but of course the bash version has the nicety of being all bash.
It worked fine on the data you gave me for testing. My tests also show awk outperforming bash even on that small amount of data:
Code:
# bash
real 0m0.034s
user 0m0.012s
sys 0m0.016s
# awk
real 0m0.008s
user 0m0.000s
sys 0m0.004s
I do find the size of the gap a little odd, i.e. just over 200 lines out of 29,000+. If a recurring item were being missed, I would have expected the gap to be larger.
Oh well ... it was a bit of fun 
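One thing that can cause a small, steady shortfall like that (not verified against the real files) is an input file whose last line has no trailing newline: a plain while read loop drops that line, while awk still treats it as a record. A quick way to check, with hypothetical helper names:

```shell
#!/bin/bash
# Count lines the way a plain "while read" loop sees them.
count_plain() {
  local n=0 line
  while IFS= read -r line; do
    n=$((n + 1))
  done < "$1"
  echo "$n"
}
# Same loop with the guard: read fails on an unterminated last line but
# still fills $line, so the [[ -n $line ]] test lets it through.
count_guarded() {
  local n=0 line
  while IFS= read -r line || [[ -n $line ]]; do
    n=$((n + 1))
  done < "$1"
  echo "$n"
}
```

If count_guarded reports more lines than count_plain for some of the input files, adding `|| [[ -n $line ]]` to the read loop would recover the missing lines.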
04-27-2012, 02:27 PM
#40
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
OK ... one last version, which I finally worked out and which just seemed cool (only the part doing the work):
Code:
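regex='^[0 ](.{8}) (.{8}) (.{4}) (.{8}) (.{8}) (.{3}) (.{2}) (.{8}) (.{8}) (.{8}) (.{3})(.{4}) (.{5}) (.{28}) (.{5}) (.*)'
# Set once, outside the loop: "|" first so ${arr[*]} joins with "|", and
# read -a splits on "|" plus the default whitespace, which trims the padding
IFS="|$IFS"
for i in "$HOME"/HEAP/*
do
    while IFS="" read -r line || [[ -n $line ]]
    do
        if [[ $line =~ $regex ]]
        then
            # Caveat: a space inside a field also splits it, and an empty
            # last field is dropped
            read -r -a temp <<<"${BASH_REMATCH[*]:1}"
            echo "${temp[*]}"
        fi
    done<"$i"
done
unset IFS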
It is also 3 or 4 times faster than the previous bash version (on the small data set).
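The join-then-split trick trims the padding for free, but because IFS still contains a space, any field that itself contains spaces gets split into several columns (field 14 is 28 characters wide, so I am assuming it can), and an empty last field disappears. A sketch that trims each capture with parameter expansion and only joins with "|" at the end keeps both intact; the function name is hypothetical.

```shell
#!/bin/bash
# Trim each capture group separately, then join with "|" without re-splitting.
join_captures() {
  local line=$1 regex=$2 k f
  local -a out=()
  [[ $line =~ $regex ]] || return 1
  for (( k = 1; k < ${#BASH_REMATCH[@]}; k++ )); do
    f=${BASH_REMATCH[k]}
    f=${f#"${f%%[! ]*}"}   # strip leading spaces
    f=${f%"${f##*[! ]}"}   # strip trailing spaces
    out+=("$f")
  done
  local IFS='|'
  printf '%s\n' "${out[*]}"   # "${out[*]}" joins with the first IFS char
}
```

With regex='^(.{5}) (.*)$', the line `a b   tail  ` gives `a b|tail`, and `abcde ` gives `abcde|` with the empty last field preserved.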
All times are GMT -5.