using sed or grep to extract stuff from a text file

DEF. · 12-11-2009, 01:46 PM

I need to get some information out of a text file. The information is of the following format, for example:

# TOKEN: some.label, /folder/folder1/etc..., /folder/folder2/etc..., 2

There may be many of these lines in the text file. Finding the lines is easy, as they all begin with # TOKEN: followed by four comma-separated values: a label, a folder1, a folder2, and a numeric value.

How would I extract each line and then process each line in turn, using its comma-separated values, in a bash script? Perhaps using sed or grep? Which would be better?

pseudo example:

    read textfile
    for each line in textfile
        if line contains "# TOKEN:"
            process(line.get(label), line.get(folder1), line.get(folder2), line.get(numeric))
        end if
    end loop

Obviously the logic may differ when using grep/sed, but you get the idea of the function required.
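For illustration, here is one way the pseudocode above could map onto bash: sed only selects the "# TOKEN:" lines and strips the prefix, and bash's read splits the rest on commas. This is a sketch under assumptions: none of the four values contains a comma, the values are separated by commas with optional spaces as in the example line, and `process` and `process_tokens` are hypothetical names standing in for your own code.

```shell
#!/usr/bin/env bash
# Sketch of the pseudocode in bash.
# Assumes every "# TOKEN:" line holds exactly four comma-separated values
# and that no value contains a comma itself.

# Hypothetical placeholder for whatever per-line work you need.
process() {
    local label=$1 folder1=$2 folder2=$3 numeric=$4
    printf 'label=%s folder1=%s folder2=%s numeric=%s\n' \
        "$label" "$folder1" "$folder2" "$numeric"
}

# Read a file, keep only "# TOKEN:" lines, and call process on each one.
process_tokens() {
    local file=$1 label folder1 folder2 numeric
    # sed -n prints nothing by default; for matching lines it strips the
    # "# TOKEN:" prefix, removes the spaces around each comma, then prints.
    while IFS=, read -r label folder1 folder2 numeric; do
        process "$label" "$folder1" "$folder2" "$numeric"
    done < <(sed -n '/^# TOKEN:/{s/^# TOKEN: *//;s/ *, */,/g;p;}' "$file")
}

# Demo on a small sample file (the paths are made up).
sample=$(mktemp)
printf '%s\n' 'some other line' \
    '# TOKEN: some.label, /folder/folder1/a, /folder/folder2/b, 2' > "$sample"
process_tokens "$sample"
rm -f "$sample"
```

Calling `process_tokens yourfile` runs `process` once per TOKEN line; lines without the prefix are skipped by sed.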

Thanks

bartonski · 12-11-2009, 02:24 PM

Actually, this is a task that awk is ideally suited for: awk splits each input line into fields and lets you work on them by number ($1, $2, ...). By default, fields are separated by whitespace, but you can make them comma-separated with -F','. I would google around for some basic awk documentation and see what you can learn that way.
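A minimal sketch of that idea, assuming the four values never contain commas: `-F` accepts a regular expression, so `' *, *'` splits on the commas and drops the spaces around them in one step. The function name `print_token_fields` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: let awk split each "# TOKEN:" line into its four fields.
# A multi-character -F value is treated as a regular expression, so
# ' *, *' also swallows the spaces around each comma.
# Assumes no value contains a comma.

print_token_fields() {
    awk -F' *, *' '
        /^# TOKEN:/ {
            label = $1
            sub(/^# TOKEN: */, "", label)   # drop the prefix from field 1
            printf "label=%s folder1=%s folder2=%s numeric=%s\n", label, $2, $3, $4
        }
    ' "$1"
}

# Demo on a small sample file (the paths are made up).
sample=$(mktemp)
printf '%s\n' 'ignored line' \
    '# TOKEN: some.label, /folder/folder1/a, /folder/folder2/b, 2' > "$sample"
print_token_fields "$sample"
rm -f "$sample"
```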

H_TeXMeX_H · 12-11-2009, 02:45 PM

You can do all that in awk if you like.

Try:
http://www.grymoire.com/Unix/Awk.html

ghostdog74 · 12-11-2009, 07:18 PM

With bash, you can do something like this:

Code:

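declare -a save
prefix='# TOKEN:'
# IFS= keeps leading/trailing spaces; -r keeps backslashes literal
while IFS= read -r line
do
    case "$line" in
    "$prefix"*)
        # strip the prefix, then split the rest on commas into the array
        # "save" (IFS is changed only for this read, not for the whole loop)
        IFS=',' read -r -a save <<< "${line#"$prefix"}"
        for i in "${save[@]}"
        do
            echo "-->$i"
        done
        ;;
    esac
done <"file"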

Or you can just use awk:

Code:

awk -F',' '/^# TOKEN:/ {print "first field is: " $1}' file
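One option is to extend that one-liner so awk does the splitting and bash does the per-line work. A sketch, assuming no value contains a tab or comma, no field is empty, and the fourth field is an integer; `token_fields`, `count_tokens`, and the summing of the numeric field are hypothetical stand-ins for your real processing.

```shell
#!/usr/bin/env bash
# Sketch: awk re-emits each "# TOKEN:" line as four tab-separated fields,
# and a bash while-read loop handles them one line at a time.
# Assumes no value contains a tab or comma and no field is empty.

token_fields() {
    awk -F' *, *' -v OFS='\t' '
        /^# TOKEN:/ {
            sub(/^# TOKEN: */, "", $1)   # drop the prefix from field 1
            print $1, $2, $3, $4
        }
    ' "$1"
}

# Hypothetical processing: report each entry and add up the numeric field.
count_tokens() {
    local label folder1 folder2 numeric total=0
    while IFS=$'\t' read -r label folder1 folder2 numeric; do
        echo "processing $label ($folder1 -> $folder2)"
        total=$((total + numeric))   # assumes numeric is an integer
    done < <(token_fields "$1")
    echo "sum of numeric fields: $total"
}

# Demo on a small sample file (the paths are made up).
sample=$(mktemp)
printf '%s\n' 'ignored line' \
    '# TOKEN: some.label, /folder/folder1/a, /folder/folder2/b, 2' > "$sample"
count_tokens "$sample"
rm -f "$sample"
```

The loop reads from `< <(...)` (process substitution) rather than a pipe, so it runs in the current shell and `total` still holds its value after the loop ends.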

I don't recommend using sed/grep to parse CSV files.

DEF. · 12-12-2009, 09:57 AM

Thanks all. I have never used awk, but given your advice I will.

catkin · 12-12-2009, 10:13 AM

In terms of power gained for the effort spent learning it, awk is the best language I have ever learned: far more intuitive than shell script or Perl, for example. It may have helped that I had learned C before.

Here are some awk links