LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 10-16-2012, 10:44 AM   #16
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,493

Rep: Reputation: 2867

Well, I am not sure I follow any of your suggested solutions. How about just creating a counter and echoing it when it is all done?
 
Old 10-16-2012, 02:16 PM   #17
atjurhs
Member
 
Registered: Aug 2012
Posts: 190

Original Poster
Rep: Reputation: Disabled
Hi grail,

I kinda thought that's what I was doing with the three things I tried. Looks like they are just the wrong way to go about it.

Since the information is already running through the script, I thought the echo $link > approach would just pull that info off and write it to a file as it was being written to standard output. That didn't work, so I thought an fprintf statement might work, but it didn't either. Then I thought of essentially what you are saying (I think), but tried to achieve it through an awk statement wrapped around the whole script.

So what I "think" I hear you saying is to do something like

Code:
for i=10000000; do   # a really large counter/iterator that the number of lines in either file won't exceed
for link in ${links[*]}; do
  echo -n "Getting count of '$link' in $fileB: "
  awk -F, "\$5 ~ /$link/{print \$5}" $fileB|grep -c .
  echo links[*] > outputfile.out
i=i+1
done
done
this came back and said 'i=10000000': not a valid identifier

so I thought of doing away with the counter/iterator:

Code:
do
for link in ${links[*]}; do
  echo -n "Getting count of '$link' in $fileB: "
  awk -F, "\$5 ~ /$link/{print \$5}" $fileB|grep -c .
  echo links[*] > outputfile.out
done
done
but it gave me: syntax error near unexpected token `do'. Because there is another do statement in the script with no special syntax, I'm confused and out of ideas.
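
For reference, both error messages come from bash's loop syntax: `for` expects a variable name followed by `in` (or the C-style `(( ))` form), and `do` is only valid right after a `for`, `while`, or `until` header. A small sketch of the valid forms, using a made-up `links` array rather than the real one from the script:

```shell
#!/bin/bash
links=(104 105 37)   # made-up sample values, not the real links array

# List form: for NAME in WORDS; do ...; done
for link in "${links[@]}"; do
  echo "link=$link"
done

# C-style form with a counter
total=0
for ((i = 0; i < ${#links[@]}; i++)); do
  total=$((total + 1))
done
echo "Looped $total times"
```

An outer counter loop is probably not needed here at all, since the `for link in ...` loop already visits every link once.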

in case we lost sight of the goal in this:

The goal is to write the output of a single "run" of the script in post #2 to outputfile.out, and then, each time I run the code again, to add that run's results to the same outputfile.out, and so on. That way, after running the code however many times, I can look at outputfile.out and see which pair of fileA and fileB gives me the largest result.
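
A minimal sketch of that goal, assuming the loop from the attempts above. The sample rows, the `links` values, and the `count_links` function name are made up for illustration. Redirecting the whole loop once with `>>` appends each run's results, where a single `>` would overwrite the file every time:

```shell
#!/bin/bash
# Sketch only: sample data stands in for the real fileB and links array.
workdir=$(mktemp -d)
fileB="$workdir/fileB.txt"
outfile="$workdir/outputfile.out"
printf '%s\n' '67890, 1, 1, 65894584945, 104' \
              '67890, 1, 1, 65894584945, 105' \
              '48760, 0, 1, 56804850333, 6' > "$fileB"
links=(104 6)

count_links() {   # hypothetical name for "one run of the script"
  local link count
  for link in "${links[@]}"; do
    # Count the rows of fileB whose 5th field matches the link value.
    count=$(awk -F, -v link="$link" '$5 ~ link {n++} END {print n+0}' "$fileB")
    echo "Count of '$link' in $fileB: $count"
  done >> "$outfile"   # >> appends; a single > would overwrite on every run
}

count_links   # first run
count_links   # second run adds to the same file
cat "$outfile"
```

Note that `echo links[*]` without the `${...}` prints the literal text `links[*]`, which is one reason the earlier attempts wrote nothing useful.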

you can probably see that I'm stumbling around, and might even chuckle. I even giggle at myself over the floundering, but I think the "school of hard knocks" is easier if you can see a little humor along the way.

and I do thank you guys sooooo much!

Tabitha

Last edited by atjurhs; 10-16-2012 at 03:02 PM.
 
Old 10-17-2012, 09:54 AM   #18
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,493

Rep: Reputation: 2867
So I am guessing I still have my cloth ears on, as I do not understand the multiple running of the script if FileA and FileB never change.

You probably do not want to hear this, but, using your input data in the original post, show what the output would look like and what, if any, data is being changed to provide different output each time the script is run?

I can tell you I see no purpose to your first example above and the second is incomplete so difficult to follow.
 
Old 10-17-2012, 11:43 AM   #19
atjurhs
Member
 
Registered: Aug 2012
Posts: 190

Original Poster
Rep: Reputation: Disabled
Hi Grail,

maybe it would be clearer if I said that I will be running the script multiple times and there will be multiple FileA.txt and FileB.txt, like this:

FileA1.txt FileB1.txt --> outputfile1.out
FileA2.txt FileB2.txt --> outputfile2.out
FileA3.txt FileB3.txt --> outputfile3.out
FileA4.txt FileB4.txt --> outputfile4.out
FileA5.txt FileB5.txt --> outputfile5.out
FileA6.txt FileB6.txt --> outputfile6.out
etc. etc.

maybe that can be batched, idk?

and then after all the runs, I'll combine all the outputfile#.out files into one outputfile_summary.out
{outputfile1.out, outputfile2.out, outputfile3.out, outputfile4.out, outputfile5.out, outputfile6.out, etc. etc.} --> outputfile_summary.out

I think I already know how to combine and summarize the outputfile#.out files

does that help answer the first question?

next,
the contents of FileA1.txt will be different than the contents of FileA2.txt and different than FileA3.txt and so on....
the contents of FileB1.txt will be different than the contents of FileB2.txt and different than FileB3.txt and so on....
so the results in outputfile1.out will be different than that in outputfile2.out and different than that in outputfile3.out and so on...
outputfile_summary.out will simply be a collection of the information contained in each of the individual outputfile#.out files
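
The batching described above could look roughly like this sketch. It assumes the pairs are numbered consecutively from 1; `process_pair` is a hypothetical stand-in for the real script, and the tiny input files are made up so the example runs on its own:

```shell
#!/bin/bash
# Sketch: run the same command over numbered file pairs, then combine results.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tiny made-up inputs so the sketch is self-contained.
for n in 1 2 3; do
  printf '67890, 12345\n' > "FileA$n.txt"
  printf '67890, 1, 1, 0, %s\n' "$n" > "FileB$n.txt"
done

process_pair() {   # hypothetical stand-in for the real script
  echo "$1 + $2: $(grep -c . "$2") line(s) in second file"
}

# Keep going for as long as both files of pair N exist.
n=1
while [ -f "FileA$n.txt" ] && [ -f "FileB$n.txt" ]; do
  process_pair "FileA$n.txt" "FileB$n.txt" > "outputfile$n.out"
  n=$((n + 1))
done

# Combine the numbered outputs; the [0-9] keeps the summary file itself out.
cat outputfile[0-9]*.out > outputfile_summary.out
cat outputfile_summary.out
```

The glob expands in lexical order, so with ten or more pairs outputfile10.out would come before outputfile2.out in the summary.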

as far as how to take the current script and write its result (that's currently going to standard output) and redirect it to an outputfile.out, I've tried everything I know and am at a loss.

Tabitha

Last edited by atjurhs; 10-17-2012 at 11:48 AM.
 
Old 10-17-2012, 12:35 PM   #20
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,493

Rep: Reputation: 2867
Well I think I get it. My first choice would be to return to awk and use something like:
Code:
awk -vval=12345 -F" *, *" 'FNR==NR{if($1 == val){val2 = $2;nextfile};next}$5 == val2{sum++;print}END{print "Total lines containing",val2,"is",sum}' FileA1 FileB1 > outputfile1.out
Obviously the process for subsequent FileAN and FileBN is the same.
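
Since the one-liner is dense, here is the same logic spread out with comments, as a sketch rather than a drop-in replacement. The sample rows are made up, and a `found` flag takes the place of `nextfile` only so that the sketch also runs on awk versions that lack it:

```shell
#!/bin/bash
workdir=$(mktemp -d)
# Made-up sample data: val in column 1 of FileA1, link value in column 2.
printf '%s\n' '12345, 104' '99999, 7' > "$workdir/FileA1"
printf '%s\n' '67890, 1, 1, 65894584945, 104' \
              '67890, 1, 1, 65894584945, 105' \
              '48760, 0, 1, 56804850333, 104' > "$workdir/FileB1"

awk -v val=12345 -F' *, *' '
  # First file (FNR==NR only while reading it): remember the link value
  # from the first row whose column 1 equals val.
  FNR == NR {
    if (!found && $1 == val) { val2 = $2; found = 1 }
    next
  }
  # Second file: print and count rows whose column 5 equals the link.
  $5 == val2 { sum++; print }
  END { print "Total lines containing", val2, "is", sum + 0 }
' "$workdir/FileA1" "$workdir/FileB1" > "$workdir/outputfile1.out"

cat "$workdir/outputfile1.out"
```

The `sum + 0` in the END block makes the total print as 0 instead of an empty string when nothing matches.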

However I am guessing we want to automate the multiple file part? (yes) This is not so much of an issue, but my question then would be: will the value of 'val' (used above) always be the same for all combinations of FileA and FileB?

If it differs each time, because the data stored in FileA changes with each N, then you are limited to one of two options:

1. Run above, or whichever script, one at a time for each value of N and val

2. If val is known ahead of time for each N, FileA / FileB, then place values in an array and loop over it whilst calling the script for new values of N
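
Option 2 might be sketched like this, assuming one known val per pair. The values and the `run_pair` function are made up; `run_pair` only stands in for the awk command or script:

```shell
#!/bin/bash
# Sketch: one val per file pair, stored in an array indexed by N.
vals=([1]=12345 [2]=54321 [3]=11111)   # made-up values

run_pair() {   # hypothetical stand-in for the awk command or script
  echo "would process FileA$1 and FileB$1 with val=$2"
}

# "${!vals[@]}" expands to the array indices, here 1 2 3.
report=$(
  for n in "${!vals[@]}"; do
    run_pair "$n" "${vals[$n]}"
  done
)
echo "$report"
```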

Let me know which way sounds like the direction you are looking to go in?
 
Old 10-17-2012, 02:33 PM   #21
atjurhs
Member
 
Registered: Aug 2012
Posts: 190

Original Poster
Rep: Reputation: Disabled
The value of val is the same for all FileAN FileBN runs (in this example 12345)
I also know beforehand what column to search thru to find 12345 and what column the linking value is in

I'll give your awk script a try and see what happens...

Thanks soooo much for your help!!!

Tabby

Last edited by atjurhs; 10-17-2012 at 02:35 PM.
 
Old 10-17-2012, 08:12 PM   #22
nugat
Member
 
Registered: Sep 2012
Posts: 122

Rep: Reputation: 31
Quote:
Originally Posted by atjurhs
next I'd like to print off the number of occurrences for any one pair.

I tried writing an echo and > statement both before and after the "done" at the end of the script, where I thought I should be echoing $link - that didn't work

so I tried an fprintf statement in the last awk line - that didn't work

then I tried to wrap the whole script in another awk statement and a > to a text file. - that also didn't work

what's the correct way?
If I understand your request (and I'm not sure that I do!), then you can change the format of the script output a bit, and make it take two more command line arguments: FileA and FileB.

The idea is, you'd pass it the input value (like before), but then also your FileA and your FileB. so you might call it like this:

Code:
./script.sh 67890 fileA.txt fileB.txt
It will look for column values the same as before, but now it will output what it finds differently - hopefully in a way that aligns more with what you want. So, for example, the output might look like this:

Code:
input=67890,link=12345,file1=fileA.txt,file2=fileB.txt,cnt=0
input=67890,link=22245,file1=fileA.txt,file2=fileB.txt,cnt=1
There are two lines of output, b/c the input variable, 67890, was found twice in FileA.txt. Consider each line to be comma-separated and contain 5 fields:

The first is the input variable you are passing to the script
The second is the link to the input variable, found in the first file
The third is "fileA"
The fourth is "fileB"
The fifth (the one you are most interested in, I think) is the count of pairs in fileA and fileB

This output is both sent to the terminal (STDOUT), and to an output file defined in the script. Currently it is set like this:

Code:
outfile='outfile.out'
but you can change it to whatever you want. Note that the script will always append to the file (that is the "tee -a" bit), not overwrite it.
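
A tiny sketch of that `tee -a` behaviour on its own, using a temporary file instead of outfile.out and made-up output lines:

```shell
#!/bin/bash
outfile=$(mktemp)   # temporary stand-in for outfile.out
echo "run 1: cnt=0" | tee -a "$outfile"   # shown on the terminal and appended
echo "run 2: cnt=1" | tee -a "$outfile"   # the first line is still in the file
lines=$(wc -l < "$outfile")
echo "$outfile now holds $lines lines"
```

Without `-a`, the second `tee` would truncate the file and only the last line would remain.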

Here is the modified code:
Code:
#!/bin/bash
[ $# -ne 3 ] && echo "Usage: $0 <value> <FileA> <FileB>" && exit 1
input=$1

# files
fileA=$2 #'fileA.txt'
fileB=$3 #'fileB.txt'

outfile='outfile.out'

# make sure the files exist
! [ -f "$fileA" ] && echo "$fileA: No such file" && exit 1
! [ -f "$fileB" ] && echo "$fileB: No such file" && exit 1

# collect the link values for the input from fileA
declare -a links
links=($(awk -F, "\$2 ~ /$input/{print \$1}" "$fileA"|sed -e 's|^[[:space:]]*||'))
if [ ${#links[*]} -lt 1 ]; then
  echo "Input \`$input' not found in first column of $fileA"
  exit 1
fi
echo "Found ${#links[*]} links for $input in $fileA: ${links[*]}"

for link in "${links[@]}"; do
#  echo -n "Getting count of '$link' in $fileB: "
  # count the fileB rows whose 5th column matches the link
  cnt=$(awk -F, "\$5 ~ /$link/{print \$5}" "$fileB"|grep -c .)
  echo "input=${input},link=${link},file1=${fileA},file2=${fileB},cnt=${cnt}"|\
    tee -a "$outfile"
done
 
Old 10-18-2012, 03:52 PM   #23
atjurhs
Member
 
Registered: Aug 2012
Posts: 190

Original Poster
Rep: Reputation: Disabled
Hi grail, hoooray, that ran perfectly!!!

oh btw, I had to switch the
Code:
 $1 == val){val2 = $2 to read $2 == val){val2 = $1
because I goofed up in saying what column in FileA1 the linking value is in

-----------------------------------------------------------------------

is it too much to ask now "what if there is a FileC1"

so a known value in FileA1 is used to link FileA1 to FileB1 and find wanted values in FileB1. Now, can the values that were found in FileB1 be used as key values to search on and find wanted data in FileC1? So essentially a search across three files, with one bread crumb leading to the next....

in this example: searching on 12345 in the 2nd column of FileA1 leads to 67890 which is in the 1st column of FileA1

FileA1
Code:
61443, 97336   
68473, 59775   
67890, 12345   
23159, 09895   
09785, 13844
searching on 67890 in the 1st column of FileB1 leads to 104, 105, and 37 in the 5th column of FileB1

FileB1
Code:
00001, 0, 1, 65894584945, 0
33333, 0, 1, 65894584945, 1
89705, 1, 0, 89657943793, 2
11114, 0, 0, 67539849393, 3                  
67890, 1, 1, 65894584945, 104
67890, 1, 1, 65894584945, 105
48760, 0, 1, 56804850333, 6
67890, 0, 0, 12893490543, 37
76190, 0, 0, 12893490543, 8
59044, 1, 0, 87959457595, 9
00001, 0, 1, 65894584945, 8
these values from FileB1 (37, 104, and 105) now need to be used to search the second column of FileC1 and print the very much needed matching rows

FileC1
Code:
11676, 105, 0, 65094, 11111, -0.6484
478394554, 68, 0, 65094, 11111, -0.6484
3.3, 104, 0, 65094, 11111, -0.1
79, 61, 0, 65094, 11111, -0.6484
68473, 68, 0, 65094, 11111, -0.6484
6.6, 68, 0, 65094, 11111, -0.6484
68473, 68, 0, 65094, 11111, -0.6484
8.8, 68, 0, 65094, 11111, -0.6484
6045749, 37, 0, 53989, 111111, -1.7557
6.6, 897, 0, 65094, 11111, -0.6484
68473, 105, 0, 65094, 11111, -3.6
8.8, 986754467, 0, 65094, 11111, -0.6484
6045749, 3, 0, 53989, 111111, -1.7557
3.3, 104, 0, 65094, 11111, -0.6484
7.9, 104, 0, 65094, 11111, -0.6484
68473, 0, 0, 65094, 11111, -0.6484
68473, 105, 0, 65094, 11111, -3.7
and the final answer is a data file like this:

outputfile.out
Code:
11676, 105, 0, 65094, 11111, -0.6484
3.3, 104, 0, 65094, 11111, -0.1
6045749, 37, 0, 53989, 111111, -1.7557
68473, 105, 0, 65094, 11111, -3.6
3.3, 104, 0, 65094, 11111, -0.6484
7.9, 104, 0, 65094, 11111, -0.6484
68473, 105, 0, 65094, 11111, -3.7

Last edited by atjurhs; 10-18-2012 at 04:11 PM.
 
Old 10-19-2012, 01:57 AM   #24
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,493

Rep: Reputation: 2867
Yes. Simply extend the logic and for the second file you will need to do something like:
Code:
FNR==1{x=!x}x{...;next}{<last file processed here>}
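
One way the three-file chain could be written out in full, as a sketch using the sample data from post #23 (with trailing spaces removed). It swaps the `x` toggle for a plain file counter, `fileno`, which is my own choice and not part of the hint above. The column positions follow the description in post #23: val in column 2 of FileA1, link in column 1 of FileA1 and FileB1, keys in column 5 of FileB1 and column 2 of FileC1.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cat > "$workdir/FileA1" <<'EOF'
61443, 97336
68473, 59775
67890, 12345
23159, 09895
09785, 13844
EOF
cat > "$workdir/FileB1" <<'EOF'
00001, 0, 1, 65894584945, 0
33333, 0, 1, 65894584945, 1
89705, 1, 0, 89657943793, 2
11114, 0, 0, 67539849393, 3
67890, 1, 1, 65894584945, 104
67890, 1, 1, 65894584945, 105
48760, 0, 1, 56804850333, 6
67890, 0, 0, 12893490543, 37
76190, 0, 0, 12893490543, 8
59044, 1, 0, 87959457595, 9
00001, 0, 1, 65894584945, 8
EOF
cat > "$workdir/FileC1" <<'EOF'
11676, 105, 0, 65094, 11111, -0.6484
478394554, 68, 0, 65094, 11111, -0.6484
3.3, 104, 0, 65094, 11111, -0.1
79, 61, 0, 65094, 11111, -0.6484
68473, 68, 0, 65094, 11111, -0.6484
6.6, 68, 0, 65094, 11111, -0.6484
68473, 68, 0, 65094, 11111, -0.6484
8.8, 68, 0, 65094, 11111, -0.6484
6045749, 37, 0, 53989, 111111, -1.7557
6.6, 897, 0, 65094, 11111, -0.6484
68473, 105, 0, 65094, 11111, -3.6
8.8, 986754467, 0, 65094, 11111, -0.6484
6045749, 3, 0, 53989, 111111, -1.7557
3.3, 104, 0, 65094, 11111, -0.6484
7.9, 104, 0, 65094, 11111, -0.6484
68473, 0, 0, 65094, 11111, -0.6484
68473, 105, 0, 65094, 11111, -3.7
EOF

awk -v val=12345 -F' *, *' '
  FNR == 1    { fileno++ }                            # which file are we in?
  fileno == 1 { if ($2 == val) link = $1; next }      # FileA1: val gives link
  fileno == 2 { if ($1 == link) keys[$5] = 1; next }  # FileB1: link gives keys
  $2 in keys                                          # FileC1: print matching rows
' "$workdir/FileA1" "$workdir/FileB1" "$workdir/FileC1" > "$workdir/outputfile.out"

cat "$workdir/outputfile.out"
```

Each stage only stores what the next file needs (one link, then a set of keys), so the same pattern should extend to a fourth file by adding another `fileno == 3` rule.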
 
  

