LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

05-21-2013, 09:26 AM   #1
eamesj
Member
 
Registered: May 2006
Posts: 54

Rep: Reputation: 1
Build an array/matrix from a concatenated list in bash


Hi all

I have a concatenated list in a file that I would like to split up and build an array/matrix from.

For example, for the list:

Code:
901  0.0001618 #sub-list 1
901 -0.0083606
901 -0.0060424
902 -0.0006518 #sub-list 2
902 -0.0006474
902 -0.0006474
907  0.0001615 #sub-list 3
907 -0.0093895
907 -0.0090656
I would like to read it line by line and build a new array/matrix for every change in column 1 (901, 902 and 907 in this case, but not necessarily these values) so that I can read each sub-list independently.

How can I go about doing this, and how do I access each sub-list?

Thanks
 
05-21-2013, 10:28 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
What is to be stored in each element of the array?

What language are you reading the file with?

What have you done so far to help solve this issue?
 
05-21-2013, 01:24 PM   #3
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
When you say "#sub-list," to what are they subordinate?

If they are just independent lists, then the problem is quite simple. For example, in gawk all you'd need is something like this:
Code:
{
  # first row for a key starts its entry; later rows are appended after SUBSEP
  list[$1] = (list[$1] == "") ? $0 : list[$1] SUBSEP $0
}
In bash, this might work:
Code:
#!/bin/bash
declare -A Value
declare -a Label
declare Count=0
function add_element()
{
  local element
  element="${1}"
  shift
  if [ -z "${Value[${element}]}" ]
  then
    ((++Count))
    Label[${Count}]="${element}"
    Value[${element}]="${@}"
  else
    Value[${element}]="${Value[${element}]}"$'\n'"${@}"
  fi
}
# Test code
while read -r -a line
do
  add_element "${line[@]}"
done < eamesi.data
for ((i=1;i<=Count;++i))
do
  echo
  echo "${Label[${i}]}:"
  echo "${Value[${Label[${i}]}]}"
done
Running that last code for your sample yields:
Code:
$ bash eamesi

901:
0.0001618 #sub-list 1
-0.0083606
-0.0060424

902:
-0.0006518 #sub-list 2
-0.0006474
-0.0006474

907:
0.0001615 #sub-list 3
-0.0093895
-0.0090656
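For the second part of the question (using each sub-list on its own), one option in bash 4 or later is to keep the grouped values in an associative array and copy a single group into an ordinary indexed array with mapfile when it is needed. A sketch; the names load_groups, print_groups, groups and keys are hypothetical:

```bash
#!/bin/bash
# Sketch (bash 4+): collect the values for each key in column 1, then load
# any one group into an ordinary indexed array with mapfile.
declare -A groups=()   # key -> newline-separated values for that key
declare -a keys=()     # keys in first-seen order

load_groups() {
  local key value
  while read -r key value; do
    [[ -z $key ]] && continue              # skip blank lines
    if [[ -z ${groups[$key]+set} ]]; then
      keys+=("$key")                       # first time this key is seen
      groups[$key]=$value
    else
      groups[$key]+=$'\n'$value            # append on a new line
    fi
  done
}

print_groups() {
  local key
  local -a sub
  for key in "${keys[@]}"; do
    mapfile -t sub <<< "${groups[$key]}"   # this key's sub-list as an array
    printf '%s: %d value(s), first %s\n' "$key" "${#sub[@]}" "${sub[0]}"
  done
}
```

Usage would be load_groups < eamesi.data followed by print_groups; inside the loop, ${sub[i]} is element i of the current sub-list.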
 
05-22-2013, 05:25 AM   #4
eamesj
Member
 
Registered: May 2006
Posts: 54

Original Poster
Rep: Reputation: 1
OK, so this is an evolving project...

What I have is a list in which each line contains a file number and a value, with 200 lines per file:

Code:
901  0.0001618
901 -0.0083606
901 -0.0065674
...
901 -0.0006485 # 200th row
902 -0.0060424
902 -0.0006518
..
902 -0.0006474 # 400th row
903 -0.0006518
903  0.0006518
903 -0.0006518
etc..
What I am generating:
Code:
901  0.0001618 902 -0.0006485 903 -0.0006518
901 -0.0083606 901 -0.0060424 903 -0.0006518
901 -0.0065674 902 -0.0006518 903 -0.0006518
...
901 -0.0006485 902 -0.0006474 903 -0.0065674 # 200th row
For this I'm using a series of sed and awk commands to convert the rows to columns:
Code:
# Repeatedly take the first 200 lines and paste them beside the remainder
listcount=$(wc -l < tempfile.txt)
while [ "$listcount" -ge 200 ] ; do
	sed -n '1,200 p' tempfile.txt > 1.txt
	sed -i '1,200 d' tempfile.txt
	listcount=$(wc -l < tempfile.txt)
	echo "$listcount"
	paste -d" " 1.txt tempfile.txt > tempfile2.txt
	sed 's/  / /g' tempfile2.txt > tempfile.txt
done
and then assigning columns with an array with the following:
Code:
for i in $(eval echo {3..$maxfile}); do
	if [[ $((i % 2)) != 0 ]]; then
	    array+=($i)
	fi
done
This lets me perform functions on the columns. Unfortunately, the sed/awk/paste formatting takes a very long time on larger files. Are there any faster options?

Last edited by eamesj; 05-22-2013 at 05:27 AM.
 
05-22-2013, 05:40 AM   #5
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,358

Rep: Reputation: 2751
If you're going to do a lot of data processing, I'd suggest moving up to a language that can do what you want without recourse to the shell, e.g. Perl.
(You could use C, but it's not that much quicker than Perl and it's a lot more fiddly to program.)
 
05-22-2013, 06:13 AM   #6
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Your matrix is confusing, or perhaps I'm not able to think clearly because of a cold. Could you kindly add colors to those numbers so that we can see how they were relocated? Thanks.
 
05-22-2013, 07:22 AM   #7
eamesj
Member
 
Registered: May 2006
Posts: 54

Original Poster
Rep: Reputation: 1
Sorry, think a typo has made it confusing...

Code:

901  0.1111111
901 -0.1111111
901 -0.1111111
...
901 -0.1111111 # 200th row
902 -0.2222222 # 201st row
902 -0.2222222
..
902 -0.2222222  # 400th row
903 -0.3333333 
903  0.3333333
..
903 -0.3333333 # 600th row
etc..
becomes
901 0.1111111 902 -0.2222222 903 -0.3333333
901 -0.1111111 902 -0.2222222 903 0.3333333
901 -0.1111111
etc ...
901 -0.1111111 902 -0.2222222 903 -0.3333333# 200th row

Last edited by eamesj; 05-22-2013 at 07:25 AM.
 
05-22-2013, 08:13 AM   #8
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
There are many ways, but I think the simplest is to keep concatenating strings:
Code:
#!/bin/bash

INPUT=/path/to/input_file.ext
OUTPUT=/path/to/output_file.ext
SEP=' '
LINES=()

{
    # The first 200 input lines start the 200 output rows
    for (( I = 0; I < 200; ++I )); do
        IFS= read -r LINE || break
        LINES[I]=$LINE
    done

    I=0

    # Every later line is appended to row I, cycling back after 200 lines
    while IFS= read -r LINE; do
        LINES[I]=${LINES[I]}${SEP}${LINE}
        (( I = (I + 1) % 200 ))
    done
} < "$INPUT"

{
    for I in "${!LINES[@]}"; do  ## or: printf '%s\n' "${LINES[@]}"
        echo "${LINES[I]}"
    done
} > "$OUTPUT"
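The loop above hard-codes the group size of 200. A minimal sketch, assuming every group is contiguous and the same length, that instead takes the group size from the number of leading lines sharing the first key (the function name interleave_groups is hypothetical):

```bash
#!/bin/bash
# Sketch: interleave groups of lines by position, with the group size taken
# from how many leading lines share the first key instead of a fixed 200.
# Assumes every group is contiguous and has the same number of lines.
interleave_groups() {
  local -a rows=()
  local line key first="" size=0 i=0
  while IFS= read -r line; do
    key=${line%% *}                    # first space-separated field
    if [[ -z $first ]]; then
      first=$key
    fi
    if (( size == 0 )) && [[ $key != "$first" ]]; then
      size=${#rows[@]}                 # first key has ended: group size known
    fi
    if (( size == 0 )); then
      rows+=("$line")                  # still inside the first group
    else
      rows[i]+=" $line"                # append to the matching row
      (( i = (i + 1) % size ))
    fi
  done
  if (( ${#rows[@]} )); then
    printf '%s\n' "${rows[@]}"
  fi
}
```

For example, interleave_groups < input_file.ext > output_file.ext would replace both blocks above when the groups really are equal in size.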

Last edited by konsolebox; 05-22-2013 at 10:24 PM. Reason: (a) Missing !. (b) echo "${LINES[I]}" - S was missing. (c) Improperly placed parenthesis in (( I = (I + 1) % 200 )).
 
05-22-2013, 09:30 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
Ok ... so far it all looks like a nightmare to me (and I do not have a cold (yet)).

If I understand correctly:

1. You have a single file thousands of lines long, and every 200 lines (or maybe some other fixed number) the value in the first column changes.

2. You want to reformat that file so that every line is a concatenation of the lines at the same position within each group (here, every 200 lines).

If that is correct, the above is what produces the output shown in post #4 (correct?)

The part I do not understand, assuming above is correct, is:
Quote:
and then assigning columns with an array
Would you please elaborate on what is meant by this line? What exactly is being placed in an array? (ie what data)
 
05-22-2013, 09:50 AM   #10
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by eamesj
Sorry, think a typo has made it confusing...
It's still confusing. I simplified the input file to this shorter version ...
Code:
901  0.1111111
901 -0.1111112
901 -0.1111113
902 -0.2222221
902 -0.2222222
902 -0.2222223
903 -0.3333331
903  0.3333332
903 -0.3333333
904 -0.4444441
904  0.4444442
904 -0.4444443
... this code ...
Code:
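# File identification (names derived from the script name minus its extension)
 Path=${0%.*}
 InFile=$Path"inp.txt"
OutFile=$Path"out.txt"
   Work=$Path"w.txt"

n=4               # n = number of groups (one pair of output columns each)
l=12              # l = total number of lines in InFile
r=$(( l / n ))    # r = number of lines per group
split -d -l "$r" "$InFile" "$Work"   # GNU split: Work00, Work01, ... one file per group
paste -d" " "$Work"* > "$OutFile"    # put the groups side by side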
... produced this OutFile ...
Code:
901  0.1111111 902 -0.2222221 903 -0.3333331 904 -0.4444441
901 -0.1111112 902 -0.2222222 903  0.3333332 904  0.4444442
901 -0.1111113 902 -0.2222223 903 -0.3333333 904 -0.4444443
Daniel B. Martin

Last edited by danielbmartin; 05-22-2013 at 04:00 PM. Reason: Improved code
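The script above hard-codes n and l. A sketch that derives them from the input instead, counting runs of equal values in column 1 with cut and uniq; it assumes the groups are contiguous and equal in size, and it relies on GNU split's -d (numeric suffixes) and -a (suffix length) options. The function name and the part prefix are hypothetical:

```bash
#!/bin/bash
# Sketch: compute the number of groups (n) and lines per group (r) from the
# input file, then split and paste as above. Cleans up its temporary parts.
# Assumes contiguous, equal-sized groups and GNU split.
columns_from_groups() {
  local infile=$1 prefix=${2:-part} l n r
  l=$(wc -l < "$infile")                           # total number of lines
  n=$(cut -d' ' -f1 "$infile" | uniq | wc -l)      # runs of equal column-1 values
  (( n > 0 )) || return 1
  r=$(( l / n ))                                   # lines per group
  split -d -a 4 -l "$r" "$infile" "$prefix"        # prefix0000, prefix0001, ...
  paste -d' ' "$prefix"[0-9][0-9][0-9][0-9]        # groups side by side on stdout
  rm -f "$prefix"[0-9][0-9][0-9][0-9]
}
```

For example, columns_from_groups inp.txt > out.txt would produce the same OutFile as the script above for Mr. Martin's sample data.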
 
05-22-2013, 11:58 AM   #11
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
Here's another gawk program:
Code:
#!/bin/gawk -f
function Max(a,b)
{
  if (a > b) return a
  return b
}
BEGIN {
  if (out=="") out="/dev/stdout"
  label=0
  len=0
}
{
  data[$1][++count[$1]]=$2
  label=Max(label,length($1))
  len=Max(len,length($2))
}
END {
  
  max=0
  columns=asorti(count,ordered)
  for (i=1;i<=columns;++i) {
    max=Max(max,count[ordered[i]])
  }
  error=0
  for (i=1;i<=columns;++i) {
    if (count[ordered[i]] < max) {
      print "Warning: " ordered[i] " contained only " count[ordered[i]] " entries." > "/dev/stderr"
      error=1
    }
  }
  if (error) print "Short data set values will be reported as \"NaN\"" > "/dev/stderr"
  fmt=" %-" (label+1) "s%" (len+1) "s"
  for (j=1; j<= max; ++j) {
    for (i=1; i<=columns; ++i) {
      v = (j <= count[ordered[i]]) ? data[ordered[i]][j] : "NaN" 
      printf(fmt, ordered[i], v) > out
    }
    print "" > out
  }
}
Using Mr. Martin's data, that produces:
Code:
$ ./eameri eameri.data 
 901   0.1111111 902  -0.2222221 903  -0.3333331 904  -0.4444441
 901  -0.1111112 902  -0.2222222 903   0.3333332 904   0.4444442
 901  -0.1111113 902  -0.2222223 903  -0.3333333 904  -0.4444443
Note that the code does not require the data to be in any fixed order, and that it prints a non-fatal warning if the data sets are not all the same length. (The data[$1][...] arrays of arrays need gawk 4.0 or later.)
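The program above relies on gawk's arrays of arrays. A sketch of the same side-by-side layout for awks without that feature (for example mawk) uses the classic data[key, n] subscripts instead; unlike the gawk version it keeps keys in first-seen order rather than sorting them, and it does not print warnings. The function name side_by_side is hypothetical:

```bash
#!/bin/bash
# Sketch: side-by-side layout using plain POSIX awk (no gawk arrays of arrays).
# Keys keep their first-seen order; short groups are padded with "NaN".
# Reads the files given as arguments, or stdin if there are none.
side_by_side() {
  awk '
    !($1 in count) { keys[++nk] = $1 }          # new key: remember its position
    {
      n = ++count[$1]                            # row number within this key
      data[$1, n] = $2                           # SUBSEP-joined subscript
      if (n > max) max = n
    }
    END {
      for (j = 1; j <= max; j++) {
        line = ""
        for (i = 1; i <= nk; i++) {
          k = keys[i]
          v = (j <= count[k]) ? data[k, j] : "NaN"
          line = line (i > 1 ? " " : "") k " " v
        }
        print line
      }
    }' "$@"
}
```

For example, side_by_side eameri.data prints one line per row position, with each key followed by its value.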
 
05-22-2013, 01:02 PM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
Well, assuming the groups are already in order, the initial part is relatively easy:
Code:
awk '$1 != x {i=1; x=$1} {line[i] = line[i] (line[i] ? FS : "") $0; if (i > n) n = i; i++} END {for (j = 1; j <= n; j++) print line[j]}' file
But I would still need more details on the array part??
 
05-23-2013, 03:23 AM   #13
eamesj
Member
 
Registered: May 2006
Posts: 54

Original Poster
Rep: Reputation: 1
Thanks guys,

Went with grail's awk; it's much quicker than the sed/paste/awk looping.

For the array, I'm counting the columns in the file to get the maximum column number ($maxfile) and building an array of the odd-numbered columns from 3 to $maxfile (these will be the y values in a graph; the x value is column 1). The only way I've seen to do this in bash is to use eval with {A..Z}.

Code:
# build array for alternate columns
for i in $(eval echo {3..$maxfile}); do
	if [[ $((i % 2)) != 0 ]]; then
	    oddcol=($i)
		printf "%4d: %s\n" $i ${oddcol[$i]}
	fi
done
The problem with this is that the array comes out unpopulated:
Code:
   3:
   5:
   7:
   9:
  11:
  13:
  15:
  17:
  19:
  21:
  23:
  25:
  27:
  29:
  31:
  33:
  35:
  37:
  39:
  41:
  43:
  45:
  47:
  49:
How can I populate this array so that each element holds its column number, like this?
Code:
   3:3
   5:5
   7:7
   9:9
  11:11
etc..
 
05-23-2013, 04:15 AM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
IF I am following (a big IF, as I'm really not sure), then I have 2 suggestions:

1. Don't use eval, as it really is not needed; simply write a standard C-style for loop (see below).

2. oddcol=($i) replaces the whole array on every pass, so use oddcol+=($i) to append instead. Appended elements are indexed from zero and incremented by one, but you are reading the array at the same index as the column position: when i is 3, the first value is added at index 0, so you need ${oddcol[0]} and NOT ${oddcol[3]} (3 being the value of i at this point).

So, if the above is correct:
Code:
j=0
for (( i = 3; i <= maxfile; i+=2 ))
do
    oddcol+=($i)
    printf "%4d: %s\n" $i ${oddcol[j++]}
done
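Taking eamesj's description at face value (x from column 1, y from each odd column from 3 up to $maxfile), a sketch that computes $maxfile with awk instead of a separate count and then walks the array. plot_columns is a hypothetical name, and printing the x/y pairs is only a stand-in for whatever function is applied to each column:

```bash
#!/bin/bash
# Sketch: find the widest row ($maxfile), collect the odd column numbers
# from 3 upward without eval, then print column 1 next to each of them.
plot_columns() {
  local file=$1 maxfile i c
  local -a oddcol=()
  maxfile=$(awk '{ if (NF > m) m = NF } END { print m + 0 }' "$file")
  for (( i = 3; i <= maxfile; i += 2 )); do
    oddcol+=("$i")                       # appended at index 0, 1, 2, ...
  done
  for c in "${oddcol[@]}"; do
    echo "# column $c"
    awk -v c="$c" '{ print $1, $c }' "$file"
  done
}
```

Each "# column N" section is one x/y series that could be fed to a plotting tool.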
 
05-23-2013, 04:35 AM   #15
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Quote:
Originally Posted by grail
Well assuming the order is present, the initial part is relatively easy:
Code:
awk '$1 != x{i=1;x=$1}{line[i] = line[i] (line[i]?FS:"") $0;i++}END{for(j in line)print line[j]}' file
But I would still need more details on the array part??
grail, excuse me, but I don't see a part where it cycles back through the array after 200 lines?
Quote:
Originally Posted by eamesj
the only way ive seen in bash is to use eval with {A..Z}
I would guess that you haven't seen, or didn't bother to check, my post (#8) at all?

Last edited by konsolebox; 05-23-2013 at 04:38 AM.
 
  



Tags
array, bash, matrix, scripting








All times are GMT -5.
