Bash file filtering (B. Newbie)

fopetesl · 12-14-2006, 12:48 PM

I have this code:

Code:

#!/bin/bash
for file in `ls`
 do
  if [ $file == *.dta ]
  then
#  cp -f $file /var/www/html/scandata.dta
   echo $file " is dta"
  fi
done

I have different types of files in a directory but only want to copy specific ones (there are quite a few).

I ran the bash debugger "~# bash -x ./pData" which tells me why there are "too many arguments" in line 4: the code attempts to copy EVERY file EVERY loop. (Actually, for testing it just echoes the name to the screen.)

How do I get Bash to filter only those files I want to copy?

(Yes, I read through a LOT of posts but found nothing with quite the right answer.)

matthewg42 · 12-14-2006, 01:31 PM

Several points:

The portable string equality operator inside [ ] is = rather than == (bash accepts both, but a plain sh may not).

Code:

if [ "$file" = "myfile" ]; then
    echo "file is myfile"
else
    echo "file is not myfile"
fi

You can't use glob patterns in a test for equality using the = operator - it is just for literal strings. You can, however, use them in a case statement:
Code:
```
case "$file" in
*.dta)
    echo "yes, $file matches our pattern"
    ;;
*)
    echo "no, $file doesn't match our pattern"
    ;;
esac
```
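If the test is needed in more than one place, the case statement can be wrapped in a small function and used directly in an if. This is only a sketch: is_dta is a made-up helper name and the file names in the loop are invented sample values, not files from the original directory.

Code:
```bash
#!/bin/bash
# is_dta is a hypothetical helper name, not something from the thread.
# It succeeds (exit status 0) when its argument ends in .dta.
is_dta() {
    case "$1" in
        *.dta) return 0 ;;   # name matches the pattern
        *)     return 1 ;;   # anything else
    esac
}

# Invented sample names, including one with a space in it.
for name in scan1.dta notes.txt "my scan.dta"; do
    if is_dta "$name"; then
        echo "$name is dta"
    else
        echo "$name is not dta"
    fi
done
```

Because the pattern sits in a case branch rather than inside [ ], the shell never tries to expand it against the files in the current directory.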
Instead of `ls`, you can use just a file pattern:
Code:
```
for file in *.dta
do
  ...
done
```
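One thing to watch with the pattern form of the loop: if no file matches, bash (with default settings) leaves the pattern unexpanded and the loop body runs once with the literal string *.dta. A rough sketch of a guard against that follows. It works in a scratch directory created with mktemp, and the file names are invented for the demonstration.

Code:
```bash
#!/bin/bash
# Work in a scratch directory so the example does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.dta "b c.dta" notes.txt   # invented sample files

count=0
for file in *.dta
do
    # If nothing matched, $file holds the literal pattern; skip it.
    [ -e "$file" ] || continue
    echo "$file is dta"
    count=$((count + 1))
done
echo "matched $count files"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

Note that "$file" is quoted everywhere, so the name containing a space is handled as a single file.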
Lastly, you can specify multiple files to cp without having to use a loop at all, and use the -v switch to print verbose messages as you copy each one (assuming scandata.dta is a directory):
Code:
```
cp -v *.dta /var/www/html/scandata.dta
```

fopetesl · 12-15-2006, 02:55 AM

Matthew, thanks.
Firstly, I need to process each *.dta file individually so your last option is informative but not applicable here.

Some of your syntax is confusing so I need to RTFM - I have only used C and Assembler so far.
e.g. you use the " to enclose a variable without prefix $
so I guess Bash picks it up whatever.

Your example #3 works just fine for me.

jschiwal · 12-15-2006, 03:00 AM

Since you only want to process *.dta files, using:
for file in *.dta; do
would work out better.

However, check what you want to do with the file, because you are overwriting /var/www/html/scandata.dta for each file in the list.

matthewg42 · 12-15-2006, 03:11 AM

Quote:

Originally Posted by fopetesl

Matthew, thanks.
Firstly, I need to process each *.dta file individually so your last option is informative but not applicable here.

Some of your syntax is confusing so I need to RTFM - I have only used C and Assembler so far.

The bash manual page contains everything you need to know, but the manual format isn't really ideal for such a large document. Once you get to know the rough format it's really a valuable reference though.

Shell is quite different from C and assembler. It's a lot cruder than C in many ways. As with any programming, it's just a matter of doing lots of different things and getting a feel for it.

The good thing about shell is that you can use the same commands in the terminal as you do in scripts (mostly). This makes a very nice way to test out commands and little loops etc.

Quote:

Originally Posted by fopetesl

e.g. you use the " to enclose a variable without prefix $
so I guess Bash picks it up whatever.

Your example #3 works just fine for me.

The quoting rules in bash are a little bit weird, but not too bad once you're used to them. "double quotes" let the shell expand lots of things for you like $variable_values, $(sub-shell command executions) and so on. 'single quotes' don't let the shell do anything to the stuff inside the quotes - it's treated as a literal string.
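A small sketch of those quoting rules in action. The file name with a space in it is an invented value, chosen to show why "$file" is usually written with double quotes.

Code:
```bash
#!/bin/bash
file="my scan.dta"                 # invented name containing a space

double="file is $file"             # double quotes: the variable is expanded
single='file is $file'             # single quotes: kept exactly as typed
sub="base is $(basename /var/www/html/scandata.dta)"   # the command is run

echo "$double"
echo "$single"
echo "$sub"

# Unquoted, the value is split on whitespace into two words;
# quoted, it stays one word.
set -- $file;   unquoted_words=$#
set -- "$file"; quoted_words=$#
echo "unquoted: $unquoted_words, quoted: $quoted_words"
```

The last line prints "unquoted: 2, quoted: 1", which is the reason an unquoted $file can break cp or [ ] when a name contains spaces.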

When you see something like *.dta, you should know that it's the shell expanding the pattern first, then using it (e.g. using the list in a loop, or passing the list to a command).

So when you do

Code:

ls *.dta

...the shell expands the list first and passes the list to ls. ls never sees the meta-character, *. I only mention it because for a lot of DOS veterans like myself, this was the other way round. It's a moment of epiphany for a lot of people to realise how it's working. Was for me at any rate.
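To see the expansion happen, here is a minimal sketch that counts how many arguments a command actually receives. count_args is a hypothetical helper and the files are created in a scratch directory just for the demonstration.

Code:
```bash
#!/bin/bash
# Scratch directory with invented sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch one.dta two.dta other.txt

# count_args is a hypothetical helper: it reports how many arguments it got.
count_args() { echo "$#"; }

expanded=$(count_args *.dta)     # the shell expands the pattern before the call
literal=$(count_args '*.dta')    # quoted, so one literal string is passed
echo "expanded: $expanded, literal: $literal"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

This is presumably also what produced "too many arguments" in the first script: with several .dta files in the directory, the unquoted *.dta inside [ ] turns into a list of names before [ ever sees it.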

fopetesl · 12-15-2006, 03:57 AM

Now we're getting there:

Code:

#!/bin/bash
for file in *.dta
do
 cp -f "$file" /var/www/html/scandata.dta
 cd ..           # go up one dir to /var/www/html
 ./hbinterpret   # process and save computation
 cd -            # back to original directory
 echo "$file"    # the file we have just processed
done

Works great.
OK, I know that I'm overwriting 'scandata' every time but that's exactly what I need.
'hbinterpret' processes the dta info and records it. Then I can analyse later.
There are many dta files.

I find the fact that I can use command line code inside this shell script makes the job much easier. As you say Matthew, I am getting my head round it!

matthewg42 · 12-15-2006, 08:19 AM

Depending on whether or not this hbinterpret program cares about the current working directory setting when it is invoked, you might be able to replace these lines

Code:

 cd ..           # go up one dir to /var/www/html
 ./hbinterpret   # process and save computation
 cd -            # back to original directory

with

Code:

../hbinterpret

Shell scripting is ugly and not really as full featured a language as one might want, but it's dead useful.

For this sort of task it is really ideal. If you find yourself doing data processing in any significant way, you'd probably be better off using sed, perl, awk or something like that. Sed has quite a small instruction set, but is really powerful despite this. I discovered awk first, but then found perl did the same stuff faster, plus a whole lot more.

fopetesl · 12-15-2006, 08:43 AM

Matthew, thanks again.
hbinterpret accesses other files in its own directory so fails on not finding them. When I run ../hbinterpret it junks out.
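When a program has to be started from its own directory like this, one common alternative to the cd .. / cd - pair is to run it in a subshell. The sketch below assumes a scratch layout standing in for /var/www/html and its data subdirectory, and the hbinterpret it creates is only a stand-in that records where it was run from, not the real program.

Code:
```bash
#!/bin/bash
# Scratch layout standing in for /var/www/html and a data subdirectory.
top=$(mktemp -d)
mkdir "$top/data"

# Stand-in for hbinterpret: it only records its working directory.
printf '#!/bin/sh\npwd -P > ran_from.txt\n' > "$top/hbinterpret"
chmod +x "$top/hbinterpret"
touch "$top/data/a.dta" "$top/data/b.dta"   # invented sample files

cd "$top/data" || exit 1
for file in *.dta
do
    cp -f "$file" ../scandata.dta
    # The parentheses start a subshell, so the cd only applies inside it
    # and the loop stays in the data directory; no "cd -" is needed.
    ( cd .. && ./hbinterpret )
    echo "$file"
done
```

The && also means hbinterpret is not started at all if the cd fails. The scratch directory in $top can be removed once the run has been inspected.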

I'll certainly have a look at perl in the same way I'd like to look at Tcl. I don't know yet whether either or both can have their source protected which I feel is necessary for the proprietary source.

matthewg42 · 12-15-2006, 09:12 AM

Tcl is quite a nice language. Really minimal syntax - you can learn to use it quite OK within a day.

What's really nice about TCL is expect. expect is well worth a look. It uses TCL as the basic language and adds some really cool stuff. You can automate your whole job with it.

There are bindings for expect in other languages too, but I really like the simplicity of TCL/expect.

fopetesl · 12-15-2006, 09:56 AM

With a slight leap of faith I note you use Drupal on your site.
That looks good also, especially as Linux Format gives it the thumbs up.

See you there.