LinuxQuestions.org
Old 04-02-2007, 02:59 AM   #1
ParaDoX667
LQ Newbie
 
Registered: Apr 2007
Posts: 4

Rep: Reputation: 0
BASH array / sed questions


Hi friendly people of linuxquestions...

I'm currently trying to write a bash script that reads a list of files with a certain extension into an array, and then passes the contents of that array to a sed command.

The aim is to take the filename of each original file, run the file through a series of processes (SED / GAWK, for data extraction), and use the original filename (from the array) as the basis for naming throughout the entire process.

EG:
ATV_1234_a_wm.wag (original filename)
ATV_1234_a_wm.sed1 (after first sed process)
ATV_1234_a_wm.gwk1 (after first gawk process)
ATV_1234_a_wm-final.wag (after all files completed)
(after all the processing the sed1 / gwk1 files are removed leaving me with only the original + final.wag files)

The reason behind this is that I have thousands of files to extract specific data from, and they all follow the same naming convention (only the 1234 part changes). I would like to automate my entire process so that all I need to do is run my script, select the data to be extracted (this part is already working), and let it process each file and output the required additional file.

Is there a way to
a) easily read the filenames into an array
b) pass the array contents to sed / gawk to be used as filenames

Any help appreciated so I don't spend the next 6 months tearing my hair out.

Many, many, many thanks to anyone who can help.
Cheers!
 
Old 04-02-2007, 03:35 AM   #2
omnio
Member
 
Registered: Feb 2007
Location: $HOME
Distribution: Hardened Gentoo
Posts: 66
Blog Entries: 1

Rep: Reputation: 16
Maybe I'm missing the point here, but why is it necessary to use arrays, and why should sed & awk be run again on the array? Why not use something very simple, like:
Code:
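#!/bin/bash
# myscript: process every file in the directory given as $1

fnc() {
    cd "$1" || return 1    # quote the path; stop if the directory is missing
    for file in * ; do
        shortfile="${file%%.*}"    # name with everything from the first dot removed

        # Replace the <...> placeholders with your own sed and awk commands.
        sed '<sed-command>' "${file}" > "${shortfile}.sed1"
        awk '<awk-command>' "${shortfile}.sed1" > "${shortfile}.gwk1"

        rm "${shortfile}.sed1"
        mv "${shortfile}.gwk1" "${shortfile}-final.wag"
    done
}

fnc "$1"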
And launch it like:
Code:
./myscript some-directory
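
A note on the "${file%%.*}" expansion used in the script: `%%` removes the longest match of `.*`, that is, everything from the first dot onward, while a single `%` removes only the last extension. For names like the ones in this thread both give the same result, but they differ if a name contains more than one dot. A minimal sketch (the second filename is made up for illustration):

```bash
#!/bin/bash
# Compare the two suffix-stripping expansions on sample names.
file="ATV_1234_a_wm.wag"
short_longest="${file%%.*}"      # remove from the FIRST dot onward
short_last="${file%.*}"          # remove only the LAST extension

dotted="ATV_1234.v2.wag"         # hypothetical name with two dots
dotted_longest="${dotted%%.*}"   # -> ATV_1234
dotted_last="${dotted%.*}"       # -> ATV_1234.v2

printf '%s\n' "$short_longest" "$short_last" "$dotted_longest" "$dotted_last"
```

If your filenames might ever contain extra dots, `${file%.wag}` is probably the safer choice, since it only strips the known extension.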

Last edited by omnio; 04-02-2007 at 04:27 AM.
 
Old 04-02-2007, 04:15 AM   #3
ParaDoX667
LQ Newbie
 
Registered: Apr 2007
Posts: 4

Original Poster
Rep: Reputation: 0
Omnio - thanks for the reply.

I'll give that a try and see what happens.
If I can read the filename in and use that to output a new file without an array, I will be very happy.

(as long as the processing through SED / GAWK doesn't fail)

Cheers.
 
Old 04-02-2007, 04:19 AM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 729
Quote:
I'm trying at present to write a bash script that will read a list of files with a certain extension into an array, and then parse the contents of that array to a sed command.
SED does not read arrays; it reads lines. To have SED operate on a list of file names, the names would have to be in a file (or a stream).

For example, suppose you have a directory "stuff" with your files in it. This puts all the file names into a new file "namelist":
Code:
ls stuff >namelist
Then, for example, this would replace all the "1234" strings with "5678":
Code:
cat namelist | sed s/1234/5678/g >newnamelist

To use namelist to tell SED which files' contents to operate on, you could use AWK to get each specific filename from namelist and pass it to SED (directly or through cat) for processing. (AWK would be told to use the newline as the field separator.)
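
Here is a small runnable sketch of the namelist idea, set up in a scratch directory so it can be tried safely. The two sample filenames and the scratch layout are assumptions for illustration; sed reads the list file directly, so the cat is optional.

```bash
#!/bin/bash
# Build a scratch "stuff" directory with two sample files.
work=$(mktemp -d)
mkdir "$work/stuff"
touch "$work/stuff/ATV_1234_a_wm.wag" "$work/stuff/ATV_1235_a_wm.wag"

# Put the file names into "namelist", one per line.
ls "$work/stuff" > "$work/namelist"

# Rewrite every "1234" in the list of names (not in the files themselves).
sed 's/1234/5678/g' "$work/namelist" > "$work/newnamelist"

first=$(head -n 1 "$work/newnamelist")
count=$(wc -l < "$work/newnamelist")
cat "$work/newnamelist"
rm -rf "$work"
```

Note that this only edits the list of names; the files in "stuff" are untouched.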

Last edited by pixellany; 04-02-2007 at 04:21 AM.
 
Old 04-02-2007, 06:25 PM   #5
ParaDoX667
LQ Newbie
 
Registered: Apr 2007
Posts: 4

Original Poster
Rep: Reputation: 0
Thanks pixellany,

I think I'm going to give up on batch processing with this script; it's driving me nuts.

I thought it would be relatively easy to generate a list of the original filenames, pass each of those (one at a time) to the appropriate SED / Gawk commands to extract the data for me, use the original filename to name a new output file, and then start on the next file in the array.

I guess I was too ambitious ....

*Cheers for the help all*
 
Old 04-02-2007, 06:56 PM   #6
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 655
You could use a for loop to read in the filenames.
Code:
for file in $(cat filelist); do
  sed '<sed-command>' "${file}" >"${file}".sed1
  awk '<awk-command>' "${file}.sed1" >"${file}.gwk1"
  ...
done
If you are using awk, you may be able to have awk do the same thing that sed did. Also, since the input of one comes from the output of the other, you can use a pipe (as already suggested), which eliminates the need for an intermediate file.

Since the input files follow a strict pattern ("ATV_1234_a_wm.wag"), using wildcards is easy and you don't need a list. However, if the "-final.wag" file is kept in the same directory, you might want to test for its existence.

Code:
for file in ATV_[[:digit:]][[:digit:]][[:digit:]][[:digit:]]_a_wm.wag; do
  if [ -f "${file%.wag}-final.wag" ]; then continue; fi  # skip files already processed
  ...  # processing instructions
done
Use with care. Untested.


In effect, using the filename patterns, you are creating a list of the files you need to process without needing to create a filelist in the first place. This is one less manual process, which hopefully will make your life easier and eliminate one potential source of error due to missing items or typos.
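
If an array is still wanted, the same pattern can fill one directly, with no list file and no word-splitting problems. A minimal sketch, assuming bash and a scratch directory with made-up sample files; `nullglob` is turned on so that an unmatched pattern gives an empty array instead of the literal pattern:

```bash
#!/bin/bash
# Scratch directory with two inputs, one finished output, one unrelated file.
work=$(mktemp -d)
touch "$work/ATV_0001_a_wm.wag" "$work/ATV_1234_a_wm.wag" \
      "$work/ATV_1234_a_wm-final.wag" "$work/notes.txt"

shopt -s nullglob    # unmatched pattern expands to nothing, not to itself
files=( "$work"/ATV_[[:digit:]][[:digit:]][[:digit:]][[:digit:]]_a_wm.wag )
shopt -u nullglob

count=${#files[@]}
for file in "${files[@]}"; do
    echo "would process: ${file##*/}"   # ${file##*/} strips the directory part
done
rm -rf "$work"
```

The "-final.wag" file is not picked up because the pattern requires "_a_wm.wag" right after the four digits.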

Lastly, I wanted to add something to watch for. If you do have a variable or array containing the number part of the filenames, be sure to use double quotes around the variable when using it, and treat it as a string. If the value passes through shell arithmetic, leading zeros will be dropped (or the number will be read as octal).
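
A small sketch of how leading zeros behave in bash, as far as I know: plain string use keeps them, while arithmetic treats a leading 0 as an octal prefix unless base 10 is forced with `10#`. The value 0042 is just a sample.

```bash
#!/bin/bash
num="0042"

as_string="ATV_${num}_a_wm.wag"         # string use: zeros preserved
as_octal=$(( num ))                     # 0042 is read as octal, giving 34
as_decimal=$(( 10#$num ))               # forced base 10, giving 42
padded=$(printf '%04d' "$as_decimal")   # pad back to four digits: 0042

printf '%s\n' "$as_string" "$as_octal" "$as_decimal" "$padded"
```

Digits 8 and 9 are not valid in octal, so something like 0089 would make the unforced arithmetic fail outright.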

Last edited by jschiwal; 04-02-2007 at 06:59 PM.
 
Old 04-02-2007, 08:13 PM   #7
ParaDoX667
LQ Newbie
 
Registered: Apr 2007
Posts: 4

Original Poster
Rep: Reputation: 0
jschiwal warm pizza and beer for you:

IT ALL WORKS GREAT!

THANK YOU THANK YOU THANK YOU!

Last edited by ParaDoX667; 04-02-2007 at 08:46 PM.
 
Old 04-03-2007, 05:54 PM   #8
cfaj
Member
 
Registered: Dec 2003
Location: Toronto, Canada
Distribution: Mint, Mandriva
Posts: 221

Rep: Reputation: 31
Quote:
Originally Posted by jschiwal
You could use a for loop to read in the filenames.
Code:
for file in $(cat filelist); do

That is not a safe way to read a file. It will break if any filenames contain spaces or other pathological characters.
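
To make the failure concrete, here is a minimal sketch with a made-up list containing one name with a space in it. The unquoted `$(cat ...)` is split on every space, tab, and newline, while `read` takes one whole line at a time:

```bash
#!/bin/bash
work=$(mktemp -d)
printf '%s\n' "ATV_1234_a_wm.wag" "my file.wag" > "$work/filelist"

split_count=0
for file in $(cat "$work/filelist"); do   # unsafe: "my file.wag" becomes two items
    split_count=$((split_count + 1))
done

line_count=0
while IFS= read -r file; do               # one name per line, spaces intact
    line_count=$((line_count + 1))
done < "$work/filelist"

echo "for/cat saw $split_count items, while/read saw $line_count"
rm -rf "$work"
```

The unquoted expansion is also subject to globbing, so a name containing `*` or `?` could expand into other filenames.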

Quote:
Code:
  sed '<sed-command>' "${file}" >"${file}".sed1
  awk '<awk-command>' "${file}.sed1" >"${file}.gwk1"
  ...
done

Why bother with intermediate files? Why not pipe the output of sed directly into awk?
Code:
while IFS= read -r file
do
   sed '<sed-command>' "$file" |
    awk '<awk-command>' |
     # braces are required here: "$file_final" would be a different variable
     whatever > "${file%.wag}-final.wag"
done < filelist
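
For anyone who wants to try the piped version end to end, here is a sketch with stand-in commands. The sed and awk commands below are purely illustrative assumptions (drop lines starting with "#", keep the second field), as is the sample data; substitute your own.

```bash
#!/bin/bash
work=$(mktemp -d)
printf '# header\nalpha 1\nbeta 2\n' > "$work/ATV_1234_a_wm.wag"
printf '%s\n' "$work/ATV_1234_a_wm.wag" > "$work/filelist"

while IFS= read -r file
do
    # Stand-in commands: sed drops comment lines, awk keeps the 2nd field.
    sed '/^#/d' "$file" |
        awk '{ print $2 }' > "${file%.wag}-final.wag"
done < "$work/filelist"

result=$(cat "$work/ATV_1234_a_wm-final.wag")
echo "$result"
rm -rf "$work"
```

No .sed1 or .gwk1 files are created, so there is nothing to clean up afterwards.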
 
Old 04-04-2007, 06:12 AM   #9
omnio
Member
 
Registered: Feb 2007
Location: $HOME
Distribution: Hardened Gentoo
Posts: 66
Blog Entries: 1

Rep: Reputation: 16
Quote:
Originally Posted by cfaj
That is not a safe way to read a file. It will break if any filenames contain spaces or other pathological characters.
I have this problem whenever I try to assign filenames to arrays. Do you know of any character which is generally not accepted in filenames and which I can switch the $IFS to? (unfortunately ":" is accepted).
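
As far as I know, the only byte that can never appear in a file path is NUL, so one option is to use it as the delimiter instead of changing $IFS. A minimal sketch, assuming bash and a find that supports -print0 (GNU and BSD find both do); the sample filenames are made up:

```bash
#!/bin/bash
work=$(mktemp -d)
touch "$work/my file.wag" "$work/plain.wag"

files=()
# read -d '' stops at each NUL byte, so spaces and newlines in names survive.
while IFS= read -r -d '' file; do
    files+=( "$file" )
done < <(find "$work" -type f -name '*.wag' -print0)

count=${#files[@]}
printf '%s\n' "${files[@]##*/}"
rm -rf "$work"
```

When the files all sit in one directory, a plain glob such as `files=( *.wag )` fills the array just as safely and needs no delimiter at all.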
 
  



Tags
bash, file, rename




All times are GMT -5.
