OK ... so here is what I came up with. Obviously you can change the output as needed. Also, I went with an awk script instead of bash calling awk, but I am sure you can edit it as required.
No combining required; this replaces both scripts.
First, I would try running the script against a test file that contains the "noise". The point here is that unless the "noise" exactly matches the pattern given to the 'match' function, it will be ignored.
If this does not work because you have lines with the exact same format that you wish to ignore based on a pattern, simply put the pattern in slashes (//), negate it with ! so that those lines are skipped, and 'and' it (&&) with match.
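As a rough illustration of the "pattern && match" idea, here is a minimal sketch. The two-column records and the TEST marker are made up for the example and do not come from the real HEAP files. The noise line has exactly the same shape as a real record, so only the extra pattern keeps it out.

```bash
#!/bin/bash
# Hypothetical sample data: two real records plus one "noise" line that
# has the same layout but should be skipped because it starts with TEST.
sample=$(printf '%s\n' 'AAAA 1111' 'TEST 0000' 'CCCC 3333')

# !/^TEST/ rejects the noise lines; match() then checks the record layout.
# Both conditions must hold before the action block runs.
filtered=$(printf '%s\n' "$sample" |
    awk '!/^TEST/ && match($0, /^[A-Z]+ [0-9]+$/) { print $1 "|" $2 }')

printf '%s\n' "$filtered"
```

Dropping the ! would do the opposite and keep only the lines that contain the pattern.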
The elegance works fine, except I need to run it in a bash script, or call this awk from a bash script, and be able to send output to $FILE along with $i, etc.
See, I use two scripts: one to verify the directory, and the second (awk processing) does the rest. If you notice, I pass $FILE to the awk script (actually it's a bash script, I just name it .awk) and I print out $2 from the awk script into the last field of my file. I do this so that if any data shows up funny I know which file caused the problem. The scripts currently process 187 files, and a new file gets added daily.
Code:
#!/bin/bash -l
# written by me
umask 026
NOW=`date +%F%T`
FILE="$HOME/HEAP.$NOW.txt"
if [ -d "$HOME/HEAP" ]
then
    echo ""
    echo "HEAP folder was found in $HOME."
    echo "Please wait, processing files..."
    echo "1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|FILENAME" >> $FILE
    for i in `ls $HOME/HEAP/`; do /var/scripts/convert.awk $HOME/HEAP/$i $FILE; done
    echo ""
    echo "All done, your output file is $FILE"
    echo "have a nice day..."
    echo ""
else
    echo ""
    echo "HEAP folder in directory $HOME does not exist."
    echo "Please make sure this directory exists and has"
    echo "files in it."
    echo ""
fi
Last edited by Linux_Kidd; 04-27-2012 at 07:45 AM.
Quote:
for i in `ls $HOME/HEAP/`; do /var/scripts/convert.awk $HOME/HEAP/$i $FILE; done
I am hoping this means you can confirm that absolutely no file names contain spaces, tabs or newlines. Otherwise this is a big no-no. It is much safer to use:
Code:
for i in $HOME/HEAP/*; ...
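To see the difference for yourself, here is a small sketch using a scratch directory in place of $HOME/HEAP. The file names are invented for the test, and nullglob is an optional extra of mine rather than something from the script above.

```bash
#!/bin/bash
# Scratch directory standing in for $HOME/HEAP (hypothetical file names).
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt"

# Word-splitting the output of ls breaks "with space.txt" into two words.
ls_count=0
for i in $(ls "$dir"); do ls_count=$((ls_count + 1)); done

# A glob hands each file name to the loop as a single word.
shopt -s nullglob          # an empty directory then gives zero iterations
glob_count=0
for i in "$dir"/*; do glob_count=$((glob_count + 1)); done
shopt -u nullglob

echo "ls loop saw $ls_count items, glob loop saw $glob_count"
rm -rf "$dir"
```

With two files in the directory the ls loop counts three items, because the name containing a space is split in two, while the glob loop counts two.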
I have to back up here as another part looks ... unusual:
Code:
FILE="$HOME/HEAP.$NOW.txt"
Is the dot (.) between HEAP and $NOW correct? Or should it be a slash like:
Code:
$HOME/HEAP/$i
Here is a way you could make it an awk script:
Code:
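#!/usr/bin/awk -f
# Needs gawk: strftime() and the three-argument match() are gawk extensions.
BEGIN{
    # If the shell could not expand the glob, ARGV[1] still ends in "HEAP/*".
    if(ARGV[1] ~ "HEAP/\\*"){
        print "HEAP folder in directory",ENVIRON["HOME"],"does not exist or"
        print "no files were available"
        failed = 1
        exit
    }
    file = strftime("%F%T")".txt"
    print "1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|FILENAME" > file
}
match($0,/^[0 ](.{8}) (.{8}) (.{4}) (.{8}) (.{8}) (.{3}) (.{2}) (.{8}) (.{8}) (.{8}) (.{3})(.{4}) (.{5}) (.{28}) (.{5}) (.*)/,f){
    for(i=1; i <= 16; i++){
        gsub(/^ *| *$/,"",f[i])
        printf "%s|",f[i] > file
    }
    # Last column, to match the header: the file this record came from.
    print FILENAME > file
}
END{
    # exit in BEGIN still runs END, so skip the success message on failure.
    if(failed) exit 1
    print "All done, your output file is",file
    print "have a nice day..."
}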
Then you would call it like so:
Code:
/var/scripts/convert.awk $HOME/HEAP/*
Have a play and let me know if you have any questions.
OK, suggestion for using * for the file names understood, but it is guaranteed that the file names have no spaces. I did, however, make the change for the better.
As for FILE="$HOME/HEAP.$NOW.txt":
This is correct; this is my output file. I name my output file at run time with a timestamp to the second. The script will never be run twice within the same second by the same uid, so every time it runs the output is a unique file (for some uids, having the date/time in the file name is easier than ls -al).
Not sure I have time to test this elegance. I might need to leave what I have, since I have already trained the uids on how to run what I have, which is "log in via ssh, type /var/scripts/process.sh and hit enter".
Last edited by Linux_Kidd; 04-27-2012 at 11:05 AM.
No probs with the file name ... I was a little confused as the start of the file name was the same as the directory ... so just checking
Quote:
log in via ssh, type /var/scripts/process.sh and hit enter
So process.sh then calls /var/scripts/convert.awk? You could just as easily have them call one script, as they do now, with no need to then break off elsewhere: just put the code that does the work into that script.
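For what it's worth, a single bash script can call awk once for all files and still record which file each row came from. This is only a sketch: the two-column records, the scratch directory and the names dir, out and result are stand-ins I made up, not the real 16-field layout or the real paths.

```bash
#!/bin/bash
# Sketch only: the two-column layout and the names below are hypothetical
# stand-ins for the real 16-field records.
dir=$(mktemp -d)                     # stands in for $HOME/HEAP
out="$dir.out.txt"                   # stands in for $FILE
printf 'aaa 111\nnoise\n' > "$dir/day1"
printf 'bbb 222\n'        > "$dir/day2"

echo "1|2|FILENAME" > "$out"

# One awk process reads every file. FILENAME is awk's built-in name of
# the file currently being read, so each row records where it came from.
awk 'match($0, /^[a-z]+ [0-9]+$/) { print $1 "|" $2 "|" FILENAME }' \
    "$dir"/* >> "$out"

result=$(cat "$out")
printf '%s\n' "$result"
rm -rf "$dir" "$out"
```

Because bash does the redirection, awk never needs to be told the output file name, and the users would still only type one command.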
Quote:
not sure i have time to test this elegance
I can fully understand, as putting things into a live environment that you aren't 100% sure of is not flash.
As I have been playing, I thought I might show you another way (just to keep that mind of yours guessing (lol)):
Code:
#!/bin/bash
regex='^[0 ](.{8}) (.{8}) (.{4}) (.{8}) (.{8}) (.{3}) (.{2}) (.{8}) (.{8}) (.{8}) (.{3})(.{4}) (.{5}) (.{28}) (.{5}) (.*)'
umask 026
NOW=$(date +%F%T)
FILE="$HOME/HEAP.$NOW.txt"
if [ -d "$HOME/HEAP" ]
then
    echo ""
    echo "HEAP folder was found in $HOME."
    echo "Please wait, processing files..."
    echo "1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|FILENAME" >> "$FILE"
    # "file" is the input file; "i" is kept for the field counter only
    for file in "$HOME"/HEAP/*
    do
        while IFS="" read -r line
        do
            if [[ $line =~ $regex ]]
            then
                for (( i = 1; i <= 16; i++ ))
                do
                    # read trims leading and trailing whitespace from the field
                    read -r field <<< "${BASH_REMATCH[i]}"
                    printf '%s|' "$field" >> "$FILE"
                done
                # last column, to match the header: the source file name
                printf '%s\n' "$file" >> "$FILE"
            fi
        done < "$file"
    done
    echo ""
    echo "All done, your output file is $FILE"
    echo "have a nice day..."
    echo ""
else
    echo ""
    echo "HEAP folder in directory $HOME does not exist."
    echo "Please make sure this directory exists and has"
    echo "files in it."
    echo ""
fi
Yeah, awk will always run quicker as that is its thing, but of course the bash version has the nicety of being all bash.
Obviously it worked fine on the data you gave me for testing. The tests on my machine also show awk outperforms bash even on this small amount of data:
Code:
# bash
real 0m0.034s
user 0m0.012s
sys 0m0.016s
# awk
real 0m0.008s
user 0m0.000s
sys 0m0.004s
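If you want to try a comparison like this yourself, something along these lines should do. It is only a sketch with made-up test data (2000 copies of a two-field record) and a simplified regex, so the numbers will not match the ones above. The function names bash_version and awk_version are mine.

```bash
#!/bin/bash
# Hypothetical test data: 2000 copies of a made-up two-field record.
data=$(mktemp)
for ((n = 0; n < 2000; n++)); do echo "abc 123"; done > "$data"

regex='^([a-z]+) ([0-9]+)$'

# Pure bash: read each line and pick the fields out of BASH_REMATCH.
bash_version() {
    while IFS= read -r line; do
        [[ $line =~ $regex ]] && echo "${BASH_REMATCH[1]}|${BASH_REMATCH[2]}"
    done < "$1"
}

# awk: same job, one process for the whole file.
awk_version() {
    awk 'match($0, /^[a-z]+ [0-9]+$/) { print $1 "|" $2 }' "$1"
}

# The time keyword reports to stderr using TIMEFORMAT.
TIMEFORMAT='%R real  %U user  %S sys'
echo "# bash"; time bash_out=$(bash_version "$data")
echo "# awk";  time awk_out=$(awk_version "$data")

# Both approaches should produce identical output.
[[ $bash_out == "$awk_out" ]] && echo "outputs match"
rm -f "$data"
```

Checking that the two outputs are identical before comparing times is worth the extra line, since a faster version that drops records is no use.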
I do find the size of the discrepancy a little odd, i.e. just over 200 out of 29000+. I would have expected more if a recurring item was being missed.
OK ... one last edition which I finally worked out ... it just seemed cool (just the part doing the work):
Code:
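regex='^[0 ](.{8}) (.{8}) (.{4}) (.{8}) (.{8}) (.{3}) (.{2}) (.{8}) (.{8}) (.{8}) (.{3})(.{4}) (.{5}) (.{28}) (.{5}) (.*)'
# Set once, outside the loop, so IFS does not gain another | per file.
# With | first in IFS, "${array[*]}" joins the elements with |.
# read -a splits on | and on whitespace, which trims each field but also
# breaks a field with a space inside it into two columns.
IFS="|$IFS"
for i in "$HOME"/HEAP/*
do
    while IFS="" read -r line
    do
        if [[ $line =~ $regex ]]
        then
            read -r -a temp <<<"${BASH_REMATCH[*]:1}"
            echo "${temp[*]}"
        fi
    done<"$i"
done
unset IFS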
It is also 3 or 4 times faster than the previous bash version (on the small data).
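One thing to watch with the IFS trick: read -a splits on every character in IFS, spaces included, so a field with a space inside it ends up as two columns. The sketch below shows this on a made-up three-field record, alongside a slower variant that trims each field without splitting it. The function name join_trimmed is hypothetical.

```bash
#!/bin/bash
# Hypothetical three-field record; the real pattern has 16 groups.
regex='^(.{4}) (.{6}) (.*)$'
line='ab   cd ef  last'          # fields: "ab  ", "cd ef ", "last"

# Trim leading/trailing spaces per field, then join with | via "${out[*]}".
join_trimmed() {
    local IFS='|' f out=()
    for f in "${BASH_REMATCH[@]:1}"; do
        f="${f#"${f%%[! ]*}"}"   # strip leading spaces
        f="${f%"${f##*[! ]}"}"   # strip trailing spaces
        out+=("$f")
    done
    echo "${out[*]}"
}

if [[ $line =~ $regex ]]; then
    # The IFS trick, confined to a subshell so IFS is not left changed.
    quick=$(IFS="|$IFS"; read -r -a temp <<< "${BASH_REMATCH[*]:1}"; echo "${temp[*]}")
    safe=$(join_trimmed)
fi
echo "quick: $quick"
echo "safe:  $safe"
```

Here quick comes out as ab|cd|ef|last (four columns) while safe gives ab|cd ef|last (three). If none of the real fields can contain a space, the quick version is fine as it stands.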