[SOLVED] Build an array/matrix from a concatenated list in bash
I would like to read line by line and build a new array/matrix for every change in column 1 (901, 902 and 903 in this case, but not necessarily these values) so that I can read each sub-list independently.
How can I go about doing this, and how do I call on each sub-list?
When you say "#sub-list," to what are they subordinate?
If they are just independent lists, then the problem is quite simple. For example, in gawk all you'd need is something like this:
Code:
{
    # Start a new list for an unseen key, otherwise append to the existing one
    list[$1] = ($1 in list) ? list[$1] SUBSEP $0 : $0
}
In bash, this might work:
Code:
#!/bin/bash
declare -A Value    # label -> newline-separated rows for that label
declare -a Label    # labels in order of first appearance
declare Count=0

# add_element LABEL FIELD...: append the remaining fields as one row under LABEL
function add_element()
{
    local element
    element="${1}"
    shift
    if [ -z "${Value[${element}]}" ]
    then
        ((++Count))
        Label[${Count}]="${element}"
        Value[${element}]="${*}"
    else
        Value[${element}]="${Value[${element}]}"$'\n'"${*}"
    fi
}

# Test code
while read -r -a line
do
    add_element "${line[@]}"
done < eamesi.data

for ((i=1;i<=Count;++i))
do
    echo
    echo "${Label[${i}]}:"
    echo "${Value[${Label[${i}]}]}"
done
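As for calling on a single sub-list afterwards: one option is to look the key up in the associative array and split the stored text back into an indexed array. This is only a sketch under assumptions: the sample rows, the array name group and the key 902 are all invented for illustration, and it needs bash 4 or later for declare -A and mapfile.

```bash
#!/bin/bash
# Sketch: group lines by their first column, then fetch one group by key.
# The sample rows, the name "group" and the key 902 are made up for illustration.
declare -A group   # key -> newline-separated rows (first column stripped)

while read -r key rest
do
    if [[ -n ${group[$key]+set} ]]
    then
        group[$key]+=$'\n'"$rest"    # key seen before: append a row
    else
        group[$key]=$rest            # first row for this key
    fi
done <<'EOF'
901 a 1
901 b 2
902 c 3
903 d 4
902 e 5
EOF

# Fetch a single sub-list and split it back into an indexed array.
mapfile -t rows <<< "${group[902]}"
echo "902 has ${#rows[@]} rows; first is: ${rows[0]}"
```

After the mapfile call, rows can be looped over like any other bash array, independently of the other keys.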
For this I'm using a series of sed and awk commands to format the rows into columns:
Code:
listcount=`wc tempfile.txt | awk '{print $1}'`
while [ $listcount -ge 200 ] ; do
    sed -n '1,200 p' tempfile.txt > 1.txt
    sed -i '1,200 d' tempfile.txt
    listcount=`wc tempfile.txt | awk '{print $1}'`
    echo $listcount
    paste -d" " 1.txt tempfile.txt > tempfile2.txt
    sed 's/ / /g' tempfile2.txt > tempfile.txt
done
I then build an array of the column numbers with the following:
Code:
for i in $(eval echo {3..$maxfile}); do
    if [[ $((i % 2)) != 0 ]]; then
        array+=($i)
    fi
done
That way I can perform functions on the columns. Unfortunately the sed/awk/paste formatting takes a very long time on larger files. Are there any speedier options?
If you're going to do a lot of data processing, I'd suggest moving up to a language that can do what you want without recourse to the shell, e.g. Perl.
(You could use C, but it's not that much quicker than Perl and it's a lot more fiddly to program.)
Your matrix is confusing, or perhaps I'm not able to think well because of a cold. Could you kindly add colors to those numbers so that we can see how they were relocated? Thanks.
There are many ways, but I think the simplest is to keep concatenating strings:
Code:
#!/bin/bash
INPUT=/path/to/input_file.ext
OUTPUT=/path/to/output_file.ext
SEP=' '
LINES=()
{
    for (( I = 0; I < 200; ++I )); do
        read LINE || break
        LINES[I]=$LINE
    done
    I=0
    while read LINE; do
        LINES[I]=${LINES[I]}${SEP}${LINE}
        (( I = (I + 1) % 200 ))
    done
} < "$INPUT"
{
    for I in "${!LINES[@]}"; do  ## or IFS=$'\n' eval "echo \"\${LINES[*]}\""
        echo "${LINES[I]}"
    done
} > "$OUTPUT"
Last edited by konsolebox; 05-22-2013 at 10:24 PM.
Reason: (a) Missing !. (b) echo "${LINES[I]}" - S was missing. (c) Improperly placed parenthesis in (( I = (I + 1) % 200 )).
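To see what this folding does, it may help to run the same idea with a block size of 3 instead of 200. The following is just a sketch: fold_rows is a hypothetical helper name, and the eight input lines are invented sample data.

```bash
#!/bin/bash
# fold_rows N: read lines on stdin and append line k to row (k-1) mod N.
# fold_rows is a hypothetical helper name used only for this illustration.
fold_rows() {
    local n=$1 i=0 line
    local -a rows=()
    while IFS= read -r line
    do
        if (( ${#rows[@]} < n ))
        then
            rows[i]=$line              # first pass: start a new row
        else
            rows[i]+=" $line"          # later passes: extend an existing row
        fi
        (( i = (i + 1) % n ))
    done
    printf '%s\n' "${rows[@]}"
}

folded=$(printf '%s\n' a1 a2 a3 b1 b2 b3 c1 c2 | fold_rows 3)
echo "$folded"
# a1 b1 c1
# a2 b2 c2
# a3 b3
```

Each input line is read exactly once, which avoids rewriting the temporary file on every pass the way the sed/paste loop does.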
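Code:
# File identification
Path=$(cut -d'.' -f1 <<< "${0}")
InFile="${Path}inp.txt"
OutFile="${Path}out.txt"
Work="${Path}w.txt"
n=4 # n = number of output files
l=12 # l = total number of lines to write
(( r = l / n )) # r = ratio (lines per split file)
# Split the input into pieces of r lines each, then paste the pieces side by side
split -d -l "$r" "$InFile" "$Work"
paste -d" " "$Work"* > "$OutFile"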
Code:
#!/bin/gawk -f
function Max(a,b)
{
    if (a > b) return a
    return b
}
BEGIN {
    if (out=="") out="/dev/stdout"
    label=0
    len=0
}
{
    # Arrays of arrays: data[key][n] is the n-th value seen for that key
    data[$1][++count[$1]]=$2
    label=Max(label,length($1))
    len=Max(len,length($2))
}
END {
    max=0
    columns=asorti(count,ordered)
    for (i=1;i<=columns;++i) {
        max=Max(max,count[ordered[i]])
    }
    error=0
    for (i=1;i<=columns;++i) {
        if (count[ordered[i]] < max) {
            print "Warning: " ordered[i] " contained only " count[ordered[i]] " entries." > "/dev/stderr"
            error=1
        }
    }
    if (error) print "Short data set values will be reported as \"NaN\"" > "/dev/stderr"
    fmt=" %-" (label+1) " s%" (len+1) "s"
    for (j=1; j<= max; ++j) {
        for (i=1; i<=columns; ++i) {
            v = (j <= count[ordered[i]]) ? data[ordered[i]][j] : "NaN"
            printf(fmt, ordered[i], v) > out
        }
        # End the row in the same output file as the fields above
        print "" > out
    }
}
Went with grail's awk, which is much quicker than the sed/paste/awk looping.
For the array, I'm doing a count on the file to get the maximum number of columns ($maxfile) and building an array of the odd-numbered columns from 3 to $maxfile (these are the y values in a graph; the x value is column 1). The only way I've seen to do this in bash is to use eval with {A..Z}:
Code:
# build array for alternate columns
for i in $(eval echo {3..$maxfile}); do
    if [[ $((i % 2)) != 0 ]]; then
        oddcol=($i)
        printf "%4d: %s\n" $i ${oddcol[$i]}
    fi
done
The problem with this is that the output array is unpopulated.
If I am following (a big if, as I'm really not sure), then I have two suggestions:
1. Don't use eval, as it really is not needed; simply write a standard for loop (see below).
2. Each time you add to the array, the index starts at zero and is incremented by one. However, you are reading the array at the same index as the column position. That is, at 3 we add the first value to the array, but it is stored at index 0, so we would need to call ${oddcol[0]} and NOT ${oddcol[3]} (3 being the value of 'i' at this point).
So if the above is correct:
Code:
j=0
for (( i = 3; i <= maxfile; i+=2 ))
do
    oddcol+=($i)
    printf "%4d: %s\n" "$i" "${oddcol[j++]}"
done
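The difference between the array index and the stored column number is easy to check by printing both. A small sketch, assuming maxfile=9 purely for the demonstration:

```bash
#!/bin/bash
# maxfile=9 is an assumed value chosen for the demonstration.
maxfile=9
oddcol=()
for (( i = 3; i <= maxfile; i += 2 ))
do
    oddcol+=("$i")              # += appends at the next free index, starting at 0
done

echo "indices: ${!oddcol[*]}"   # prints: indices: 0 1 2 3
echo "values:  ${oddcol[*]}"    # prints: values:  3 5 7 9
echo "first:   ${oddcol[0]}"    # prints: first:   3
```

So the column number 3 lives at index 0, and with this sample ${oddcol[3]} would give 9 rather than 3.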