LinuxQuestions.org
10-08-2011, 05:00 AM   #1
Perseus
Member
 
Registered: Oct 2011
Posts: 177

Rep: Reputation: Disabled
Include loop to get files from different folders that have same name


Hello everyone,

Let's say I have 100 files in folder1 and 100 files in folder2. The 100 files in folder2 have the same names as those in folder1.

I have a script that requires 2 files to produce the desired output: one file from folder1 and its counterpart from folder2.

If the file from folder1 is "data100.txt", the script needs "data100.txt" from folder2 to produce "data100_out.txt" (the output).

If my script looks like this:
Code:
file1=~/folder1/data100.txt
file2=~/folder2/data100.txt

# This part of the code uses ${file1}

# This part of the code uses ${file2}
Since there are 100 pairs of files with the same names in different paths, how can I add a loop so that the script takes every file in folder1 and its counterpart in folder2 and produces 100 outputs?

Thanks for any help guys.

Greetings

Last edited by Perseus; 10-08-2011 at 05:06 AM.
 
10-08-2011, 05:49 AM   #2
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 946
Do a loop over all files in folder1. To get the corresponding name in folder2, replace the path part. Here is an example using Bash:
Code:
#!/bin/bash

FOLDER1=~/folder1
FOLDER2=~/folder2

for FILE1 in "${FOLDER1}"/*.txt ; do
    FILE2="${FOLDER2}/${FILE1##*/}"
    [ -f "$FILE1" ] || continue
    [ -f "$FILE2" ] || continue

    # Do something with FILE1 and FILE2

done
The glob pattern in the for line, "${FOLDER1}"/*.txt, specifies which files to process. You can use * instead of *.txt to process all files.

In Bash, ${FILE1##*/} expands to the contents of variable FILE1, but with everything up to and including the final slash removed. In non-Bash shells you can use something like FILE2="$FOLDER2/`basename "$FILE1"`" instead. This latter form is more traditional.
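As a minimal sketch of the two forms just described: the paths below are made up for illustration, and neither form needs the file to exist, since both only manipulate the path string.

```shell
#!/bin/bash
# Hypothetical paths, used only for illustration.
FILE1=/tmp/folder1/data100.txt
FOLDER2=/tmp/folder2

# Parameter expansion: remove the longest prefix matching "*/".
NAME_A="${FILE1##*/}"

# Traditional form: call the external basename utility.
NAME_B="$(basename "$FILE1")"

# Build the counterpart path in the second folder.
FILE2="${FOLDER2}/${NAME_A}"

echo "$NAME_A"
echo "$NAME_B"
echo "$FILE2"
```

Both names come out as data100.txt, and FILE2 becomes /tmp/folder2/data100.txt. The parameter expansion avoids starting an extra process for every file, which may matter in long loops.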

The two continue tests verify that both FILE1 and FILE2 exist and are normal files. This may not be necessary for you, but if you use the * glob pattern, these tests will skip any subdirectories you might have in folder1. If you want to output a warning message when those are skipped, you could use
Code:
    if [ ! -f "$FILE1" ]; then
        echo "$FILE1 is not a normal file. Skipped." >&2
        continue
    fi
    if [ ! -f "$FILE2" ]; then
        echo "$FILE2 is not a normal file (but $FILE1 is). Skipped." >&2
        continue
    fi
Hope this helps.
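To see the skipping behaviour in action, here is a self-contained sketch. The temporary folders, the file names and the PAIRED variable are assumptions made up for the demonstration; only the loop itself follows the example above.

```shell
#!/bin/bash
# Throwaway folders so the sketch can run anywhere; all names are made up.
WORK="$(mktemp -d)"
FOLDER1="$WORK/folder1"
FOLDER2="$WORK/folder2"
mkdir -p "$FOLDER1" "$FOLDER2" "$FOLDER1/subdir.txt"

echo "a" > "$FOLDER1/data1.txt"
echo "b" > "$FOLDER2/data1.txt"
echo "c" > "$FOLDER1/data2.txt"    # no counterpart in folder2

PAIRED=""
for FILE1 in "$FOLDER1"/*.txt ; do
    FILE2="$FOLDER2/${FILE1##*/}"
    [ -f "$FILE1" ] || continue    # skips the directory subdir.txt
    [ -f "$FILE2" ] || continue    # skips data2.txt

    # Do something with FILE1 and FILE2; here we only record the name.
    PAIRED="$PAIRED ${FILE1##*/}"
done

echo "Paired:$PAIRED"
rm -rf "$WORK"
```

Only data1.txt is reported as paired. The first test also covers the case where the glob matches nothing: the pattern is then left as a literal string, which is not a file, so the loop body is skipped.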
 
10-08-2011, 04:15 PM   #3
Perseus
Member
 
Registered: Oct 2011
Posts: 177

Original Poster
Rep: Reputation: Disabled
Hi Nominal Animal,

Many thanks for your reply.

Quote:
Originally Posted by Nominal Animal
Hope this helps.
It works perfectly and helps a lot, really! Thanks.

Regarding your code, I have just one question: what do the lines below mean?
Code:
    [ -f "$FILE1" ] || continue
    [ -f "$FILE2" ] || continue
Regarding the options below, both worked for me:
Code:
FILE2="${FOLDER2}/${FILE1##*/}" # This worked
FILE2="$FOLDER2/`basename "$FILE1"`"  #This worked too
And my last question:
how can I write a separate output file for each pair of files as the script finishes processing it, all within the same script?

In summary the idea is something like:
Code:
#!/bin/bash

FOLDER1=~/folder1
FOLDER2=~/folder2

for FILE1 in "${FOLDER1}"/*.txt ; do
    FILE2="${FOLDER2}/${FILE1##*/}"

    # Some code with FILE1
    # Some code with FILE2

    > "$(basename "$FILE1" .txt)_Out.txt"   # Get output corresponding to current FILE1 and FILE2
done
The structure of my real script including the loop you provided me is like this:
Code:
#!/bin/bash

FOLDER1=~/Danny/Example_files/Case_Info
FOLDER2=~/Danny/Example_files/Votes

FOLDER1=~/folder1
FOLDER2=~/folder2

for FILE1 in "${FOLDER1}"/*.txt ; do
    FILE2="${FOLDER2}/${FILE1##*/}"
    [ -f "$FILE1" ] || continue
    [ -f "$FILE2" ] || continue

Var1=$(( $(awk '/string/{print|"wc -l"}' ${FILE1}) + 1))
Var2=$(( $(grep -c 'string' ${FILE2}) + 1 ))
Var3=$(( $(awk -F ' /k/ {print $3| "sort -u|wc -l"}' ${FILE2}) + 1))

cat <<-FILE1_PART1
Name="FILE1"
First line... ${Var1} Other text...
Other lines...
.
.
.Last line...
FILE1_PART1

cat <<-FILE1_PART2
$(awk '/stringX/ {some code...}' ${FILE1})
FILE1_PART2

echo '  Some text'

cat <<-FILE2_PART1
 Name="FILE2"
Second line ... ${Var2}.. other text ${Var3}
Other lines...
.
.
$(awk ' /string/ {print $2}' ${FILE2} | sed 's|^|Text to replace|g' | sed 's|$|Other text|g') 
Last line...
FILE2_PART1

awk -F '/k/ {print $3| "sort -u"}' ${FILE2} | while read name
do
cat <<-FILE2_PART2
   First line....
      Some text ${name} some text
	$(awk -v Z=$name '/Z/{print " Some text"}' ${FILE2})
   Last line...
FILE2_PART2
done

echo 'Some text'

done
Many thanks for the help so far.

Greetings

Last edited by Perseus; 10-08-2011 at 04:19 PM.
 
10-09-2011, 03:23 AM   #4
Perseus
Member
 
Registered: Oct 2011
Posts: 177

Original Poster
Rep: Reputation: Disabled
Hi again Nominal Animal,

Don't worry about what I asked in my last post. After a lot of trial and error, I found
that using "echo" instead of the "cat" here-document structure is the solution I was looking for.

In summary, I replaced each cat with echo as follows, and it worked.
Code:
echo "Name=\"FILE1\"
First line... ${Var1} Other text...
Other lines...
.
.
.Last line..." >> "$(basename "$FILE1" .txt)_Out.txt"
Many thanks for all the help and your time, really.

Greetings
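An alternative that keeps the original cat here-documents is to group the commands in braces and redirect the whole group once. This is only a sketch, not what was used above: the temporary file, its sample contents and the shortened here-document are assumptions for the demonstration.

```shell
#!/bin/bash
# Hypothetical input file created just for this sketch.
WORK="$(mktemp -d)"
FILE1="$WORK/data100.txt"
printf 'string one\nother\nstring two\n' > "$FILE1"

# data100.txt -> data100_Out.txt
OUT="$WORK/$(basename "$FILE1" .txt)_Out.txt"
Var1=$(( $(grep -c 'string' "$FILE1") + 1 ))

# Everything printed inside the braces goes to $OUT. '>' truncates the
# file first, so re-running does not append duplicate text as '>>' would.
{
    cat <<PART1
Name="FILE1"
First line... ${Var1} Other text...
PART1
    echo '  Some text'
} > "$OUT"

RESULT="$(cat "$OUT")"
echo "$RESULT"
rm -rf "$WORK"
```

Placed inside the for loop, the same brace group would produce one output file per pair, with OUT recomputed on every iteration.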
 
10-09-2011, 03:59 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,550

Rep: Reputation: 2898
I would add that combinations of awk and sed or even awk, sort and wc are not required.

Example:
Code:
awk -F ' /k/ {print $3| "sort -u|wc -l"}' ${FILE2}

# becomes

awk -F '/k/ && !_[$3]++{tot++}END{print tot}' ${FILE2}
Also, piping into a while loop runs the loop in a subshell (in Bash, by default), so any changes you make to variables inside the loop are lost once it finishes (should you need them).
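A small sketch of that subshell effect, assuming Bash with default settings. The counter names are made up, and printf stands in for the awk command that feeds the loop in the script above.

```shell
#!/bin/bash
# Piped form: in Bash (default settings) the loop runs in a subshell.
COUNT_PIPE=0
printf 'a\nb\nc\n' | while read -r name ; do
    COUNT_PIPE=$((COUNT_PIPE + 1))
done

# Here-document form: the loop runs in the current shell.
COUNT_HERE=0
while read -r name ; do
    COUNT_HERE=$((COUNT_HERE + 1))
done <<EOF
$(printf 'a\nb\nc\n')
EOF

echo "after pipe: $COUNT_PIPE"
echo "after here-document: $COUNT_HERE"
```

The piped counter is still 0 after the loop, while the second one reaches 3. In Bash, process substitution, written as done < <(command), is another common way to keep the loop in the current shell.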
 
10-09-2011, 04:55 AM   #6
Perseus
Member
 
Registered: Oct 2011
Posts: 177

Original Poster
Rep: Reputation: Disabled
Hello grail,

Thanks for your suggestions, I'll use them as you say.

But could you explain a little how your awk code works? I'm also confused about what "!_" means.
Code:
awk -F '/k/ && !_[$3]++{tot++}END{print tot}' ${FILE2}
Thanks a lot
 
10-09-2011, 06:25 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,550

Rep: Reputation: 2898
! = not

_ = this is simply the name of the array; it is the same as writing a[$3]++

Let me know if you need more details
 
10-09-2011, 06:56 AM   #8
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 946
Quote:
Originally Posted by Perseus
Regarding your code, I have just one question: what do the lines below mean?
Code:
    [ -f "$FILE1" ] || continue
Literally:
If "$FILE1" does not exist, or is not a regular file, skip the rest of this iteration of the do .. done loop and continue with the next iteration.

Quote:
Originally Posted by Perseus
Regarding the options below, both worked for me:
Code:
FILE2="${FOLDER2}/${FILE1##*/}" # This worked
FILE2="$FOLDER2/`basename "$FILE1"`"  #This worked too
Yes. The first one works in Bash and POSIX shells (so just about any bash or sh you're likely to use); the second one works even in ancient Bourne shells (very old sh).

As to the AWK scriptlet, you're missing the field separator after -F. Let me add that:
Code:
awk -F separator '/k/ && !_[$3]++{tot++}END{print tot}' ${FILE2}
In a Bash script, you can use e.g. $'\t' if you want to use tabs as a separator.

_[$3]++ adds one to the element of an associative array named _ (an underscore being a perfectly acceptable variable name in awk), keyed by the third field in the input. Because it is also used as the pattern of a rule, and the ++ operator is on the right side ("post-increment"), the value before the increment decides whether the rule fires. In this case, there is a not operator ! in front, which means the action is done only when that earlier value is zero.

In all, !_[$3]++{tot++} means: Find the value in array _ indexed by the third field in the current line (record), and increase it by one. If it was zero (or undefined), increase variable tot by one.
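The "value before the increment" behaviour can be checked with a tiny sketch. The sample input and the names n, before and seen are made up for illustration.

```shell
#!/bin/bash
# Show the value returned by the post-increment and the value stored afterwards.
TRACE="$(printf 'x\ny\nx\n' | awk '{ before = n[$1]++; print $1, before, n[$1] }')"
echo "$TRACE"

# Used as a pattern with ! in front, the same expression is true only the
# first time a key is seen, so each distinct line is printed once.
FIRSTS="$(printf 'b\na\nb\nc\na\n' | awk '!seen[$0]++')"
echo "$FIRSTS"
```

The trace shows "x 0 1", "y 0 1" and then "x 1 2": the second x already has a non-zero count, so a ! pattern would reject it. Accordingly, the second command prints b, a and c once each.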

Rules can be quite complex. && is the logical AND operator, and /re/ is a regular expression compared against the entire line (record), evaluating to true if the expression matches. For example, /k/ is true for any line that contains the letter 'k'.

Thus, the entire scriptlet, '/k/ && !_[$3]++{tot++}END{print tot}' can be read as:
If, and only if, the line (record) contains 'k', find the value in array _ indexed by the third field in the current line (record), increase it by one; and if it was zero (or undefined), increase variable tot by one.
After all input lines (records) have been checked, print the value of variable tot.

Therefore, the scriptlet prints a single number: the count of unique values found in the third field, considering only lines (records) that contain the letter k.
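Here is a worked sketch on made-up data, comparing the single awk command with the awk, sort and wc pipeline it replaces. The comma separator and the sample records are assumptions chosen for the demonstration.

```shell
#!/bin/bash
# Hypothetical comma-separated sample: id,flag,name
SAMPLE="$(printf '1,k,alice\n2,x,bob\n3,k,alice\n4,k,carol\n')"

# Single awk process, as explained above.
ONE_AWK="$(printf '%s\n' "$SAMPLE" | awk -F ',' '/k/ && !_[$3]++{tot++}END{print tot}')"

# Original style: awk feeding sort and wc.
PIPELINE="$(printf '%s\n' "$SAMPLE" | awk -F ',' '/k/{print $3}' | sort -u | wc -l)"

# With no matching lines, tot is never set and prints as an empty string;
# tot+0 forces a numeric 0, matching what wc -l would report.
NO_MATCH="$(printf '1,x,bob\n' | awk -F ',' '/k/ && !_[$3]++{tot++}END{print tot+0}')"

echo "awk only: $ONE_AWK"
echo "pipeline: $((PIPELINE))"
echo "no match: $NO_MATCH"
```

Both approaches report 2 here (alice and carol). Note that /k/ matches a k anywhere in the line, so a name containing k would also be counted even if the second field were not k.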
 
10-09-2011, 08:36 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,550

Rep: Reputation: 2898
As usual NA, very complete explanation
 
10-10-2011, 04:37 AM   #10
Perseus
Member
 
Registered: Oct 2011
Posts: 177

Original Poster
Rep: Reputation: Disabled
Hi grail,

Many thanks for your reply and for clarifying my doubts.

Hello again NA,

That was more explanation than I expected; grail's code is much clearer to me now. Many thanks for all the details, they will help me with other problems from now on. I'll keep dissecting grail's code and rereading your explanation until I understand 100% how it works.

Much appreciated, experts.

Greetings

Last edited by Perseus; 10-10-2011 at 04:38 AM.
 