LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 10-20-2010, 07:34 AM   #16
flamingo_l
Member
 
Registered: Jul 2010
Posts: 41

Original Poster
Rep: Reputation: 0

Hi Grail,

Extract.sh - would extract the text between the two tags START and END.
awk '{sub(/^[ /*]+/, ""); print}' - would remove the leading spaces, slashes and asterisks.
Format.sh - would then format the output into the required format.

Quote:
Name|Date|Changes
 
Old 10-20-2010, 08:25 AM   #17
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Quote:
Extract.sh - would extract the text between the two tags START and END.
Already done in the other script, so kind of a waste.
Quote:
awk '{sub(/^[ /*]+/, ""); print}' - would remove the leading spaces, slashes and asterisks.
Easily implemented on the fly in the other script.

Maybe you can show me some input, seeing as the initial file is now different from the one in the other problem?
Quote:
Format.sh - would then format the output into the required format.
I have also come up with a simpler solution to this.
 
Old 10-20-2010, 11:40 PM   #18
flamingo_l
Member
 
Registered: Jul 2010
Posts: 41

Original Poster
Rep: Reputation: 0
I have got a solution.

I need to run this script on specific files in a given directory and its subdirectories, and I also need to print the file name.

Yes, the code you gave worked. I can remove that and use the one-liner below:

Quote:
/START|END/{f=!f;next}f
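
For readers unfamiliar with the idiom, the one-liner above can be unpacked roughly as follows. This is only a sketch: the function name extract_block and the sample input are made up for illustration and are not part of the original scripts.

```shell
#!/bin/sh
# extract_block: print only the lines between START and END markers.
# (hypothetical helper name, used here for illustration only)
extract_block() {
    awk '
        /START|END/ { f = !f; next }   # a marker line flips the flag and is skipped
        f                              # while the flag is set, print the line
    '
}

printf '%s\n' 'code' '//START' '//Name: x' '//END' 'more code' | extract_block
```

Note that any line containing START or END anywhere flips the flag, so the pattern may need anchoring if those words can also appear in ordinary code.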

The following is the script that I execute.

Quote:
abort(){ echo "ABORTED: $1 missing or doesn't exist!"; usage;}
usage(){
echo -e "$0: [-s source dir] [-e extention,extention,extention,...]\nex: $0 -s $HOME -e jpg,png,mp3";
exit;
}

echo -e "Starting script................\n"

OUTPUT_FILE=Document.csv


TODAY_DATE_TIME=`date +%Y%m%d%H%M%S`
SAMPLE_FILE="DUMMY_FILE.$TODAY_DATE_TIME.txt"
touch $SAMPLE_FILE

#####################
# CHECK INPUTS
#####################

if [ ! "$1" ];
then usage;
fi
until [ -z "$1" ];
do
case "$1" in
"-s") shift; SOURCEDIR="$1";;
"-e") shift; EXTENTIONS=( `echo "$1" |sed 's/,/ /g'` );;
*) usage;;
esac
shift
done

if [ ! "$SOURCEDIR" ] || [ ! -e "$SOURCEDIR" ]; then abort "source dir"; fi
if [ "${#EXTENTIONS[@]}" == "0" ]; then abort "extention(s)"; fi

echo -e "Input Directory: $SOURCEDIR\n"

for ITEM in "${EXTENTIONS[@]}"; do
while read file; do
echo "File name: " $file>>$SAMPLE_FILE
awk -f Extract.sh $SOURCEDIR/$file|awk '{sub(/^[ /*]+/, ""); print}'>>$SAMPLE_FILE
done < <(find "$SOURCEDIR" -type f -name "*$ITEM" -printf "%P\n")
done

cat $SAMPLE_FILE|awk -f Format1.sh>$OUTPUT_FILE

if [ $? -eq 0 ];then
echo "Ending script successfully............"
fi

exit $?
The script needs the input directory and the file extension.

If the input file contains the following:
Quote:

//START
//Application Name: XXXXX
//Release Number: 1.0
//Function Name: f_job_no_aktiv()
//Changes: Added logic in f_job_no_aktiv()
//END

The output is:

Quote:
File Name|Application Name|Release Number|Function Name|Changes
file1.txt| XXXXX| 1.0| f_job_no_aktiv()| Added logic in f_job_no_aktiv()

Can you tell me the simpler solution you have found?

Last edited by flamingo_l; 10-20-2010 at 11:43 PM.
 
Old 10-21-2010, 09:33 AM   #19
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
So not sure how picky you want me to be, but I did pick up a couple of things.

Firstly, if you use code instead of quote tags on this site then your formatting will remain.
Code:
usage(){
    echo -e "$0: [-s source dir] [-e extention,extention,extention,...]\nex: $0 -s $HOME -e jpg,png,mp3"; 
    exit; 
}
To me this implies that both source and extension are optional.
Code:
if [ ! "$1" ];
then usage; 
fi
Clearly not the case. Also on this test, I am guessing from the following line:
Code:
if [ ! "$SOURCEDIR" ] || [ ! -e "$SOURCEDIR" ]; then abort "source dir"; fi
if [ "${#EXTENTIONS[@]}" == "0" ]; then abort "extention(s)"; fi
That both items are compulsory, which would mean that every time you run the script it should receive a minimum of 4 arguments (a minimum because you can have multiple extensions).
So investigate '$#'
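
A minimal sketch of such a check, assuming the four-argument minimum described above. The helper name check_arg_count is hypothetical; in the real script the call would be something like check_arg_count "$@" || usage.

```shell
#!/bin/sh
# check_arg_count: succeed only when at least 4 arguments were passed
# (-s DIR -e EXT ...). Hypothetical helper, for illustration only.
check_arg_count() {
    if [ "$#" -lt 4 ]; then
        echo "expected at least 4 arguments, got $#" >&2
        return 1
    fi
}

check_arg_count -s . -e java && echo "argument count ok"
```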

Once you have used the above, you could go simpler than the until loop and in the else use something like:
Code:
SOURCEDIR="$2"
shift 3
EXTENTIONS=("$@")
I would like to assume you are using bash (there is no #! at the top of the script), so if you are, I would suggest the following:

1. Change [] for [[]]. The second is safer in that it doesn't have as many little gotchas as [].
2. When doing math calculations, change [] for (()). Apart from making it more obvious what you are testing, it also gives you normal-looking math comparisons.
e.g.
Code:
if [ "${#EXTENTIONS[@]}" == "0" ]; then abort "extention(s)"; fi

# becomes

if (( ${#EXTENTIONS[@]} == 0 )); then abort "extention(s)"; fi

# even cleaner could be

(( ${#EXTENTIONS[@]} == 0 )) && abort "extention(s)"
Although if you make the change for testing arguments, this test becomes moot.
Code:
if [ ! "$SOURCEDIR" ] || [ ! -e "$SOURCEDIR" ]; then abort "source dir"; fi
Again first test not required if you make changes above. As far as second test goes, check what '-e' means.
Code:
for ITEM in "${EXTENTIONS[@]}"; do
    while read file; do
        echo "File name: " $file>>$SAMPLE_FILE
        awk -f Extract.sh $SOURCEDIR/$file|awk '{sub(/^[ /*]+/, ""); print}'>>$SAMPLE_FILE
    done < <(find "$SOURCEDIR" -type f -name "*$ITEM" -printf "%P\n")
done
The loop within a loop can be done away with by making a small change to how the extensions are assigned:
Code:
EXTENTIONS=$(echo "$@" | tr ' ' '|')

while read -r found_file
do
    <your stuff here>
done< <(find "$SOURCEDIR" -type f -regextype posix-extended -regex ".*\.($EXTENTIONS)")
Lastly, the following if is always true:
Code:
cat $SAMPLE_FILE|awk -f Format1.sh>$OUTPUT_FILE

if [ $? -eq 0 ];then
    echo "Ending script successfully............"
fi
The first thing here is that cat is not required; just pass $SAMPLE_FILE to awk as an argument, prior to the redirection.
The reason the test will almost always succeed is that awk is the last command run in the pipeline, so $? holds its exit status: unless Format1.sh has errors in it, awk will end successfully even if nothing is output.
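
The point about the exit status can be sketched as below. The helper name format_file is hypothetical, and the awk program '{ print }' is only a stand-in for Format1.sh.

```shell
#!/bin/sh
# In a pipeline, $? reports the exit status of the last command only.
false | true
echo "status of 'false | true': $?"

# format_file: run awk on a file without cat, and report success only
# if awk itself succeeded. (hypothetical helper; the awk program here
# is a stand-in for Format1.sh)
format_file() {
    if awk '{ print }' "$1" > "$2"; then
        echo "Ending script successfully............"
    else
        echo "awk failed" >&2
        return 1
    fi
}
```

Testing the command directly in the if avoids relying on $? at all, and a missing input file now shows up as a failure instead of being hidden behind cat.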

Hope this wasn't too much overload

As for simpler solution:
Code:
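#!/usr/bin/awk -f

BEGIN{
    FS="[ \t]*:[ \t]*"    # split "Key: value" on the colon and surrounding blanks
    OFS="|"
    counter=0
}

# START and END toggle the "inside a block" flag; END also closes a record
/^(START|END)/{
    if(/^END/)counter++
    f=!f
    next
}

f{
    if(NF > 1){
        arr[counter,$1]=$2
        last=$1           # remember the key so continuation lines append to it
    }
    else
        arr[counter,last]=arr[counter,last]" "$0
}

END{
    print "Name|Date|Function Name|Changes"
    for(y=0;y<counter;y++)
        print arr[y,"Name"],arr[y,"Date"],arr[y,"Function Name"],arr[y,"Changes"]
}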
This also covers the Extract.sh script, so is the input above valid? If so I can look at putting code to do:
Quote:
awk '{sub(/^[ /*]+/, ""); print}' - would remove the leading spaces, slashes and asterisks.
 
2 members found this post helpful.
Old 10-21-2010, 09:39 AM   #20
genderbender
Member
 
Registered: Jan 2005
Location: US
Distribution: Centos, Ubuntu, Solaris, Redhat
Posts: 396

Rep: Reputation: 31
Grail - FYI you're a genius. This isn't my script, but that's bloody clever (the code changes, the original code and your improved code). Grail has obviously done a lot of work here to get his, I mean your, code working, flamingo.
 
Old 10-21-2010, 12:40 PM   #21
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
Quote:
Originally Posted by genderbender
Grail - FYI you're a genius. This isn't my script, but that's bloody clever (the code changes, the original code and your improved code). Grail has obviously done a lot of work here to get his, I mean your, code working, flamingo.
I agree, especially with awk.
 
Old 10-21-2010, 07:06 PM   #22
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Thanks
 
Old 10-21-2010, 11:53 PM   #23
flamingo_l
Member
 
Registered: Jul 2010
Posts: 41

Original Poster
Rep: Reputation: 0
Yes, I agree too. You are a real genius, and smart.

Thank you for your comments and help. I will test the script and get back to you on your questions.
 
Old 10-22-2010, 04:11 AM   #24
flamingo_l
Member
 
Registered: Jul 2010
Posts: 41

Original Poster
Rep: Reputation: 0
Quote:
To me this implies that both source and extension are optional.

Code:
if [ ! "$1" ];
then usage;
fi
No, it is checking whether the argument given to the script is not null.


Quote:
Clearly not the case. Also on this test, I am guessing from the following line:

Code:
if [ ! "$SOURCEDIR" ] || [ ! -e "$SOURCEDIR" ]; then abort "source dir"; fi
if [ "${#EXTENTIONS[@]}" == "0" ]; then abort "extention(s)"; fi

That both items are compulsory, which would mean that every time you run the script it should receive a minimum of 4 arguments (a minimum because you can have multiple extensions).
So investigate '$#'
Yes, both the source directory and extensions are compulsory and -e is checking for the existence of the directory.

Quote:
I would like to assume you are using bash, no #! at top of script, so if you are I would suggest the following;
I am not using the bash shell; since I am using Cygwin, the default shell would be the Bourne shell (sh).


Quote:
This also covers the Extract.sh script, so is the input above valid? If so I can look at putting code to do:
Yes, the given input is valid, and it would be part of a java or cs file as comments.
Also, there can be multiple blocks like that in a single file.
In some cases, instead of // the lines may start with /* or * or " *", which is why I am removing the leading spaces, asterisks and slashes.

The script you gave looks simpler, but it does not work for me if the input contains slashes, spaces or asterisks.

Also, if a line such as Function Name or Changes is missing, it should still work and print nothing in that column.
That is what I have been trying to achieve these days.

Last edited by flamingo_l; 10-22-2010 at 04:36 AM.
 
Old 10-22-2010, 08:27 AM   #25
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Quote:
To me this implies that both source and extension are optional.
This line was referring to the code box above it, meaning that your usage() function uses square brackets for the -s and -e switches, which in a lot of Linux apps would mean they are optional.
Quote:
No, it is checking whether the argument given to the script is not null.
Yes, your first if is checking that you have passed at least one parameter to the script, but as I said previously, judging from the later code it is an incorrect test, as the minimum number of arguments is 4.
Quote:
-e is checking for the existence of the directory.
Actually, no it is not. If you look under man test (test is equivalent to [, and [[ supports the same file tests) you will see:
Code:
-e FILE
        FILE exists

-d FILE
        FILE exists and is a directory
So -e is not the correct test here as the user could enter 'file.txt' for the source directory and your test will pass.
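
The difference is easy to see with a throwaway file. A small sketch, where is_source_dir is a hypothetical helper name and the temporary file stands in for 'file.txt':

```shell
#!/bin/sh
# is_source_dir: succeed only if the argument names an existing directory.
# (hypothetical helper name, for illustration only)
is_source_dir() {
    [ -d "$1" ]
}

tmpfile=$(mktemp)            # a regular file, standing in for 'file.txt'
[ -e "$tmpfile" ] && echo "-e accepts a regular file"
is_source_dir "$tmpfile" || echo "-d rejects a regular file"
is_source_dir "/" && echo "-d accepts a directory"
rm -f "$tmpfile"
```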
Quote:
I am not using the bash shell; since I am using Cygwin, the default shell would be the Bourne shell (sh).
My bad ... means some of my nice tricks might not work
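
If there is any doubt about which shell actually runs the script, a quick check like the one below can settle it. This is a sketch with a hypothetical helper name; it relies only on BASH_VERSION, which bash sets itself. Worth knowing either way: arrays and the < <(...) process substitution used in the posted script are bash features, not plain Bourne shell.

```shell
#!/bin/sh
# running_under_bash: succeed if the current shell is bash.
# (hypothetical helper; BASH_VERSION is set by bash itself)
running_under_bash() {
    [ -n "${BASH_VERSION:-}" ]
}

if running_under_bash; then
    echo "bash $BASH_VERSION: arrays, [[ ]] and (( )) are available"
else
    echo "not bash: stick to POSIX sh constructs"
fi
```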

Quote:
Yes, the given input is valid, and it would be part of a java or cs file as comments.
Also, there can be multiple blocks like that in a single file.
In some cases, instead of // the lines may start with /* or * or " *", which is why I am removing the leading spaces, asterisks and slashes.
So am I understanding correctly then that we are stripping information from source code and the START,END range is inside each of the comments along with the required fields?
Something like:
Code:
<java code here>
<java code here>
//START
//blah
//ablh
//END
<java code here>
<java code here>
<java code here>

/*START
blah2
foo
END*/
<java code here>
<java code here>
<etc>
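
One way to cope with the different comment styles is to strip the comment decoration before looking for the markers. The following is only a sketch under that assumption, not the script from the attachment: the helper name extract_fields is hypothetical, and the field names are taken from the sample input posted earlier in the thread.

```shell
#!/bin/sh
# extract_fields: hypothetical helper, not the script from the attachment.
# Reads source code on stdin and prints one "|"-separated row per block.
extract_fields() {
    awk '
        {
            sub("^[ \t/*]+", "")          # drop leading blanks, slashes, asterisks
            sub("[ \t]*[*]/[ \t]*$", "")  # drop a trailing "*/" comment terminator
        }
        $0 == "START" { inside = 1; split("", rec); next }   # start with an empty record
        $0 == "END" {
            if (inside) {
                row = rec["Application Name"] "|" rec["Release Number"]
                row = row "|" rec["Function Name"] "|" rec["Changes"]
                print row
            }
            inside = 0
            next
        }
        inside {
            pos = index($0, ":")          # split at the first colon only
            if (pos > 0) {
                key = substr($0, 1, pos - 1)
                val = substr($0, pos + 1)
                sub("^[ \t]+", "", val)   # trim blanks after the colon
                rec[key] = val
            }
        }
    '
}

printf '%s\n' '//START' '//Application Name: XXXXX' '//Release Number: 1.0' \
    '//Function Name: f_job_no_aktiv()' \
    '//Changes: Added logic in f_job_no_aktiv()' '//END' | extract_fields
```

For the sample block this prints XXXXX|1.0|f_job_no_aktiv()|Added logic in f_job_no_aktiv(). A field that is missing from a block simply leaves its column empty.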
 
Old 10-22-2010, 09:19 AM   #26
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
So I couldn't wait ...

Tell me what ya think? Just remove .txt from the file name and then extract into a test folder.
Attached Files
File Type: txt flamingo_l.tar.bz2.txt (1.7 KB, 10 views)
 
Old 10-25-2010, 01:52 AM   #27
flamingo_l
Member
 
Registered: Jul 2010
Posts: 41

Original Poster
Rep: Reputation: 0
Hi Grail,

Sorry for the late reply.

Yes, the sample file you gave is correct.

Code:
<java code here>
<java code here>
//START
//blah
//ablh
//END
<java code here>
<java code here>
<java code here>

/*START
blah2
foo
END*/
<java code here>
<java code here>

/*START
  * blah2
  * foo
  * END*/
<java code here>
<java code here>
<etc>

<etc>
I am unable to open your tar file.

Attaching a sample file...
Attached Files
File Type: txt sample.txt (1.0 KB, 8 views)

Last edited by flamingo_l; 10-25-2010 at 03:52 AM. Reason: Inserting sample file
 
Old 10-25-2010, 02:50 AM   #28
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Did you rename it first? i.e. you need to take .txt off the end.
 
Old 10-25-2010, 03:54 AM   #29
flamingo_l
Member
 
Registered: Jul 2010
Posts: 41

Original Poster
Rep: Reputation: 0
Now I have got it. I solved it by renaming flamingo_l.tar.bz2.txt to flamingo_l.tar.bz2.
I used WinZip to extract the contents of the folder.
I will check and let you know.

Meanwhile, I am attaching a sample file.
Attached Files
File Type: txt sample.txt (1.0 KB, 12 views)

Last edited by flamingo_l; 10-25-2010 at 04:20 AM.
 
Old 10-25-2010, 04:36 AM   #30
flamingo_l
Member
 
Registered: Jul 2010
Posts: 41

Original Poster
Rep: Reputation: 0

I got an error when I executed the script in Cygwin on my system:

Code:
$ ./flamingo_l1.sh -s . -e java
Starting script................
Input Directory: .
find: warning: you have specified the -regextype option after a non-option argument -type, but options are not positional (-regextype affects tests specified before it as well as those specified after it).  Please specify options before other arguments.

./flamingo_l1.sh: line 48: column: command not found
No data in output file
 
  


All times are GMT -5.
