trying to write shell snip to import CSV data into BASH array
Thanks for the reply, but how do I get the first, second, third, etc. lines of the CSV file into $var1?
So if I issue:
host# cat ./mydatafile.csv
record1,item1,item2,item3,item4
record2,item1,item2,item3,item4
record3,item1,item2,item3,item4
record4,item1,item2,item3,item4
host#
Not sure how to stream this into the var1, var2, var3, var4, etc.
Following is what I have so far, and it seems to produce the desired output. But this is only a one-line CSV file (i.e. record1), and I am still having trouble getting the first element (record1) set as the array name. I tried to rework the snip with the "eval" suggestion and it didn't seem to work properly.
Sorry for all the newbie ?s people ... I really do appreciate all the help!!!!
#!/bin/bash
testfile="./test-dev/test-global-data-import.csv"
var1=$(cat "$testfile")
IFS=',' record1=( ${var1} )
echo ""
echo "Here is the data without the comma separators."
echo ""
echo ${record1[*]:0}
echo ""
echo "Here is the data native (with comma separators)."
echo ""
echo "${record1[*]:0}"
echo ""
First of all, what you are doing sounds a bit strange. You might get much better results if you told us what you ultimately want to accomplish; we might know much more efficient approaches. For example, I suspect your next question is 'How do I find out which fields I read in the while loop?'
Please, introduce yourself to the Bash Reference Manual first. At least the introductory chapters. It really is worth the effort.
Also note that GNU Awk is much more suitable for processing tabulated data. It has an atypical approach to processing its input, but after you understand it, it is rather easy and powerful.
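For instance, here is a minimal awk sketch (not from the thread; the sample file and field positions are invented) showing how awk splits each record on commas for you:

```shell
#!/bin/bash
# Create a throwaway sample CSV matching the layout shown earlier:
# name,item1,item2,item3,item4
printf '%s\n' \
    'record1,item1,item2,item3,item4' \
    'record2,item1,item2,item3,item4' > /tmp/demo.csv

# -F',' sets the field separator; $1 is the record name, $4 the third item.
awk -F',' '{ print $1 " -> " $4 }' /tmp/demo.csv
# record1 -> item3
# record2 -> item3
```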
However, to encourage you, here is a working example:
Code:
#!/bin/bash
inputfile="./test-dev/test-global-data-import.csv"
OLDIFS="$IFS"
IFS=","
while read -r NAME VALUES ; do
eval "$NAME=($VALUES)"
done < "$inputfile"
IFS="$OLDIFS"
And here are some notes to help you on your way:
Save and restore the field separator, unless you work with exclusively CSV data.
The while loop reads each line (record), until there is nothing else to read.
Read saves the first field to NAME and all other fields to VALUES. This is how 'read' always works; the last variable will get the rest of the record, even if it contains multiple fields.
eval evaluates the quoted string. What happens is that all variable references are expanded first, then the result is executed just as if it were normal code.
The redirection is done at the end of the while loop, so that the input is available for the entire loop construct, without a subshell. (If you use cat "$inputfile" | while ..., the variables will not be accessible after the loop, because they are assigned in the subshell, not the actual shell.)
If you don't know what a subshell is, please read this and this.
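A quick way to see the subshell effect for yourself (a minimal demo, assuming a throwaway file in /tmp):

```shell
#!/bin/bash
# Demonstrates why 'cat file | while read' loses variable assignments:
# the loop body on the right side of a pipe runs in a subshell.
printf '%s\n' one two three > /tmp/lines.txt

count=0
cat /tmp/lines.txt | while read -r line ; do
    count=$((count+1))          # incremented in a subshell ...
done
echo "after pipe: $count"       # ... so this still prints 0

count=0
while read -r line ; do
    count=$((count+1))          # incremented in the current shell
done < /tmp/lines.txt
echo "after redirect: $count"   # prints 3
```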
${#record[@]} tells you the number of fields in the array stored in variable record.
${#record} tells you the number of characters in the first field only; it is shorthand for ${#record[0]}, not the total size of the array.
${record[0]} is the first field in the array, ${record[1]} second and so on. You can use even ${record[N]} where N is a variable.
"${record[@]}" expands each field as a separate quoted string, while "${record[*]}" expands all fields into a single string.
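Put together, those expansions behave like this (using a made-up record for illustration):

```shell
#!/bin/bash
# Demonstrates the array expansions described above.
record=(alpha beta gamma delta)

echo "${#record[@]}"    # number of fields: 4
echo "${record[0]}"     # first field: alpha
N=2
echo "${record[N]}"     # field N+1, here the third field: gamma

# "@" keeps the fields separate; "*" joins them into one word.
printf '<%s>\n' "${record[@]}"   # four lines: <alpha> <beta> <gamma> <delta>
printf '<%s>\n' "${record[*]}"   # one line: <alpha beta gamma delta>
```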
Hope this helps,
Nominal Animal
Thanks for this ... I was in fact doing this with awk before, but found it too cumbersome to change the files for each individual device platform.
Okay, so here is the 10,000-foot flyby ....
I have one group of persons inputting data about sites into a spreadsheet and saving it as someinputdata.csv. Then, I have several template files with VAR_1, VAR_2, VAR_3, and so on declared. I am trying to assign the CSV data to array elements so I can pluck out the data I need with a sed statement like this ...
I really thought it was a simple problem to solve. One group of people is collecting data and one group will be mass producing network devices with the cfg files.
+SNIP+
# set the template variable
TEMPLATE="some text file with VAR_1 in it"
# code here to insert the CSV data into the array ....
..... MISSING ......
# set the variable to the array element
VAR_1=${record1[12]}
#
sed -e "s/VAR_1/$VAR_1/g" "$TEMPLATE" > "./output/$VAR_1.cfg"
Oh, it's basically just a record-driven template engine.
Think from the other end instead: when you generate a single new file, which pieces of information do you need?
Obviously you need the template for that file. What else, what about the data?
Is it just one record (one line) from the CSV file, or do you need to collate the data from multiple records?
After that is sorted out, it really is simple to extend to generate a batch of files.
Assuming it's just one record, you could do something like this:
Code:
#!/bin/bash
DATAFILE="path-to-the-CSV-file"
TEMPLATE="path-to-the-template"
NAME="The name of the record in the CSV file, i.e. the text in the first field"
# Get only the relevant record from the CSV data
OLDIFS="$IFS"
IFS=","
DATA=( $(sed -ne "s|^$NAME,||p" "$DATAFILE") )
IFS="$OLDIFS"
# Construct a list of corresponding field names.
# These are replaced in the template with corresponding CSV data fields.
# You could even read this from a config file -- or even from the CSV file, if it has a header row.
FIELDS=('FIRST' 'SECOND' 'THIRD' 'FOURTH')
# Construct a sed pattern to replace field names with data fields
PATTERN=""
I=0
for FIELD in "${FIELDS[@]}" ; do
# Data string from the CSV record:
STRING="${DATA[I]}"
I=$((I+1))
# TODO: Escape any characters that might bork up sed.
STRING="${STRING//\\/\\\\}"
STRING="${STRING//&/\\&}"
FIELD="${FIELD//\\/\\\\}"
FIELD="${FIELD//$/\\$}"
# Using | as the separator. Escape those too.
FIELD="${FIELD//|/\\|}"
STRING="${STRING//|/\\|}"
PATTERN="$PATTERN;s|$FIELD|$STRING|g"
done
PATTERN="${PATTERN#;}"
# Output the processed template in one fell swoop.
sed -e "$PATTERN" "$TEMPLATE"
I've only quick-tested the above, but it seems to work okay. You'll probably have to escape more characters than above, otherwise sed will bork. (At least {, }, [, ], and ^.)
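One way to handle that is to factor the escaping into small helper functions. The sketch below is not from the original script and the function names are invented; it escapes the pattern side and the replacement side of a sed s|..|..| command separately, assuming | as the delimiter and basic regular expressions:

```shell
#!/bin/bash
# Hypothetical helpers: escape a string for use in sed "s|PATTERN|REPLACEMENT|g".
escape_pattern() {
    # Backslash must be escaped first, then the BRE metacharacters
    # and the | delimiter.
    local s="$1"
    s="${s//\\/\\\\}"
    s="${s//./\\.}"
    s="${s//\*/\\*}"
    s="${s//[/\\[}"
    s="${s//^/\\^}"
    s="${s//\$/\\$}"
    s="${s//|/\\|}"
    printf '%s' "$s"
}
escape_replacement() {
    # Only backslash, & and the delimiter are special on the replacement side.
    local s="$1"
    s="${s//\\/\\\\}"
    s="${s//&/\\&}"
    s="${s//|/\\|}"
    printf '%s' "$s"
}

# Example: substitute a value containing sed-special characters safely.
value='price & tax: $5.00'
echo 'VAR_1' | sed -e "s|$(escape_pattern VAR_1)|$(escape_replacement "$value")|g"
```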
Nominal Animal
Theoretically, we will never need data from multiple records. Each record holds all the data for each device being configured. Each site will have multiple devices at it. So ideally we would like to pick different data out of the record based on what model of device we are configuring - having only one CSV source.
Example:
DEVICES:
device model A
device model B
device model C
So let's say, for instance, that all three devices (A, B, C) need the "sitename" in their configuration.
device-model-A only will use var1,2,3,7,9 in its configuration
device-model-B will only use var1,3,5,7,9 in its configuration
device-model-C will only use var10 in its configuration
and to top it all off the output configuration files should carry the sitename.cfg file name.
We had been doing it with a separate CSV file and a separate template for each device model (I realize that the template file will always be different based on the device model). I could only pull off the most basic sed commands within a script, and I could never seem to put ALL the information in one CSV source and run a script against it to extract the required data.
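The batch direction described above can be sketched roughly like this. Everything here is invented for illustration (the sample data, field positions, template names and the temporary directory); it loops over every record in a single CSV source and, for each device model that has a template, runs the template through sed, naming the output after the site:

```shell
#!/bin/bash
# Hypothetical batch sketch: one CSV source, one template per device model,
# output files named after the site. All names and fields are made up.
workdir=$(mktemp -d)
DATAFILE="$workdir/sites.csv"
mkdir -p "$workdir/templates" "$workdir/output"

# Sample data: field 1 is the site name; the rest fill VAR_1, VAR_2, ...
printf '%s\n' 'site1,10.0.0.1,eth0' 'site2,10.0.0.2,eth1' > "$DATAFILE"
printf '%s\n' 'hostname VAR_1' 'interface VAR_2' > "$workdir/templates/model-A.tpl"
printf '%s\n' 'mgmt-ip VAR_1' > "$workdir/templates/model-B.tpl"

OLDIFS="$IFS"; IFS=","
while read -r -a REC ; do
    SITENAME="${REC[0]}"
    for TEMPLATE in "$workdir"/templates/*.tpl ; do
        MODEL="$(basename "$TEMPLATE" .tpl)"
        # Build one sed program per record; replace higher-numbered
        # placeholders first so VAR_1 does not clobber VAR_10.
        PATTERN=""
        for (( N = ${#REC[@]} - 1; N >= 1; N-- )) ; do
            PATTERN="$PATTERN;s|VAR_$N|${REC[N]}|g"
        done
        sed -e "${PATTERN#;}" "$TEMPLATE" > "$workdir/output/$SITENAME-$MODEL.cfg"
    done
done < "$DATAFILE"
IFS="$OLDIFS"

cat "$workdir/output/site1-model-A.cfg"
```

Note that this still relies on the data fields not containing sed-special characters; for real data you would combine it with the escaping discussed earlier in the thread.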
Thank you for all of your help on this ... I will keep plugging and learning as I go. You've given me much to look at - THANKS AGAIN!!!!