LinuxQuestions.org


Chrizzieej 09-25-2008 05:46 AM

Search values within multiple files line to line
 
Hello guys (and girls),

I've got the following script:

Code:

#!/bin/bash

FILE="/opt/test.txt"
FS="#"

while read line
do
        F1=$(echo $line| grep '#00010' | cut -d$FS -f7)
        if [ -z $F1 ] ; then
          exit
        else
          echo $F1
        fi
done < $FILE

This script reads a single file, but I want it to read many. I have one directory containing multiple files and subdirectories, so the script has to read all of those files and process each one line by line.

I've been puzzling over this and have come up with the following:

Code:

#!/bin/bash

FS="#"

for files in `find /opt/. -name "*.txt";
do
        while read line
                do
                        F1=$(echo $line| grep '#00010' | cut -d$FS -f7)
                        if [ -z $F1 ] ; then
                                exit
                        else
                                echo $F1
                        fi
        done < $files
done

This doesn't work, can anyone help me please?!

Thanx,
Chrizzieej

clvic 09-25-2008 06:48 AM

I don't know whether it's the only problem, or just a typing mistake, but the line
for files in `find /opt/. -name "*.txt";
is wrong: you forgot the closing `, and the dot in /opt/. is pointless. It should be
for files in `find /opt -name "*.txt"`;
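
For reference, here is a minimal sketch of the whole loop with the backtick closed. The function name `print_field7` and its directory argument are made up for illustration. Replacing `exit` with `break` is an assumption about the intent: stop reading the current file at the first non-matching line, rather than ending the whole script.

```shell
#!/bin/bash

FS="#"

# print_field7 DIR: hypothetical helper, not part of the original script.
# For every *.txt file under DIR, prints field 7 of each line containing
# '#00010', up to the first line that does not match.
print_field7() {
    # Note: find's output is word-split here, so paths with spaces will break.
    for file in `find "$1" -name "*.txt"`; do
        while read -r line; do
            F1=$(echo "$line" | grep '#00010' | cut -d"$FS" -f7)
            if [ -z "$F1" ]; then
                break    # assumption: leave this file only, not the whole script
            else
                echo "$F1"
            fi
        done < "$file"
    done
}
```

It would be called as `print_field7 /opt`. The variables are quoted so that `[ -z ... ]` still behaves when a line contains spaces.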

archtoad6 09-25-2008 07:06 AM

You're on the right track to wrap the while loop which processes lines in the for loop which processes the files; but, at the very least, you forgot the closing '`' in:
Code:

for files in `find /opt/. -name "*.txt";
Code:

### output from my command line:
$ for files in `find /opt/. -name "*.txt";
> do
>        echo $files
> done
>
### ^C to regain control of the xterm

$ for F in `find /opt/. -name "*.txt"`;
> do
>    echo $F
> done
/opt/./grisoft/avg7/doc/license_cz.txt
/opt/./grisoft/avg7/doc/license_us.txt
/opt/./grisoft/avg7/doc/lgpl.txt
/opt/./grisoft/avggui/doc/license_cz_utf8.txt


If I may be so bold, let me offer some suggestions about programming style:
  • Indenting with tabs spreads your code too much, making it hard to read. The point of indenting, even in Python, is to make the code easier for humans to read &, therefore, to understand. I've found that 3 spaces is ideal for (most) bash scripts.

  • The bash convention is to use all upper case for variable names.

  • Long, (lower case), self documenting variable names may work in C, but where all instances of a variable are on 1 page, how hard is it to figure out what it means? If I start a loop w/ "for F in <list_of_files>", does it take a degree in rocket science to figure out that "$F" is the current file?

  • In any case "files" is plural & should not be used to stand for the, singular, current instance. You should at least be able to use "for FILE in $FILES ...".
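
Applied to the file loop, the suggestions above might look roughly like this. It is only a sketch of the layout (3-space indents, upper-case names, `F` for the current file); the function name `list_txt` is made up for illustration.

```shell
#!/bin/bash

# list_txt DIR: hypothetical helper, only here to show the layout.
list_txt() {
   local FILES F
   # Plural name for the whole list, short singular name for the current item.
   FILES=`find "$1" -name "*.txt"`
   for F in $FILES
   do
      echo "$F"
   done
}
```

Calling `list_txt /opt` would print one matching path per line, provided the paths contain no whitespace.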

ghostdog74 09-25-2008 07:16 AM

If you have Python, here's an alternative
Code:

#!/usr/bin/env python
import os

directory = "/path/to/start"
# Walk the tree and scan every .txt file line by line.
for root, dirs, files in os.walk(directory):
    for fi in files:
        if fi.endswith(".txt"):
            path = os.path.join(root, fi)
            with open(path) as fh:
                for line in fh:
                    if "#00010" in line:
                        try:
                            f7 = line.split("#")[6]
                        except IndexError:
                            # Fewer than 7 fields on this line; skip it.
                            pass
                        else:
                            print("File %s has field 7: %s" % (path, f7))


chrism01 09-25-2008 07:34 PM

@archtoad:

Actually, style is purely subjective :)


But I would say (slightly different)

Bash CONSTANTS in uppercase, other vars in lower-with-underscores.
Re the latter, just about all real progs I've worked on get longer/more complex sooner or later, and consistency is better.

jan61 09-26-2008 04:11 PM

Moin,

The others have already covered your syntax error. But your read loop confuses me: as far as I understand it, you want to print field 7 as long as the lines contain the "#00010" character sequence, and stop reading the file at the first line that does not contain it?

With your solution you start 2 subshells and 2 external programs for each line - a lot of processes for large files!

You would be better off with a specialised program like sed or awk (untested):
Code:

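awk -F'#' '/#00010/ { print $7; next }   # match: print field 7, move on to the next line
           { nextfile }                  # first non-match: skip the rest of this file (needs an awk with nextfile, e.g. gawk)
          ' `find /opt -name '*.txt' -print`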

That makes 2 processes in total for all files.
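
If the logic has to stay in bash, the per-line processes can also be avoided with builtins. The following is a sketch under the same reading of the loop (stop each file at the first line without "#00010"); the function name `print_field7_builtin` is made up for illustration and nothing here comes from the original script.

```shell
#!/bin/bash

# print_field7_builtin DIR: hypothetical helper name, not from the thread.
# Same per-file logic as the original loop, but matching and splitting use
# bash builtins, so neither grep nor cut is started for each line.
print_field7_builtin() {
    local file line
    local -a fields
    # One find process in total; read one path per line from it.
    find "$1" -name '*.txt' | while IFS= read -r file; do
        while IFS= read -r line; do
            # Stop reading this file at the first line without '#00010'.
            [[ $line == *'#00010'* ]] || break
            # Split the line on '#' into an array; field 7 is index 6.
            IFS='#' read -r -a fields <<< "$line"
            # Like the original, also stop if field 7 is empty.
            [[ -n ${fields[6]} ]] || break
            printf '%s\n' "${fields[6]}"
        done < "$file"
    done
}
```

It would be called as `print_field7_builtin /opt`. Reading paths line by line also copes with spaces in file names, though not with newlines in them.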

Jan


All times are GMT -5.