and so on up to array[1999]. I am trying to write these columns to a file, basically building a matrix with zeros added where needed to pad each column to the same length. The file would look like this (or its transpose; it doesn't really matter):
268 950 438 ...
11132 1140 492 ...
11953 1324 987 ...
12097 4231 0 ...
12932 4382 0 ...
13499 5197 0 ...
...
... .... ... .. .
Is there a way I can do this? I have been looking all around the web and have practically turned into a web wiz, but I haven't solved my problem yet... Should I use echo with IFS set to newline? I don't know.
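One possible way to get the zero-padded layout described above is to hand the columns to awk and let it do the padding. This is only a sketch under assumptions: `write_matrix` is a made-up helper name, each array element is assumed to hold one column of plain numbers separated by newlines or spaces, and the marker line `--` is assumed never to occur in the data.

```shell
#!/usr/bin/env bash
# Sketch only: write_matrix is a hypothetical helper, not from the thread.
# Each argument is one column: numbers separated by newlines or spaces.
write_matrix() {
    local col
    for col in "$@"; do
        printf '%s\n' "$col"
        printf '%s\n' '--'          # marker line: end of one column
    done | awk '
        $0 == "--" { ncols++; row = 0; next }   # move on to the next column
        {
            # every whitespace-separated field is one cell of the current column
            for (f = 1; f <= NF; f++) {
                row++
                cell[row, ncols + 1] = $f
                if (row > maxrows) maxrows = row
            }
        }
        END {
            for (r = 1; r <= maxrows; r++) {
                line = ""
                for (c = 1; c <= ncols; c++) {
                    v = ((r, c) in cell) ? cell[r, c] : 0   # pad short columns with 0
                    line = line (c > 1 ? " " : "") v
                }
                print line
            }
        }'
}
```

With the array from the question it could be called as `write_matrix "${array[@]}" > matrix.txt`, where `matrix.txt` is just an example file name. Doing the padding inside awk avoids needing the column lengths in bash first.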
You are right... it is that time of the year, the time of projects... but this is only the beginning part of my homework!! ;-((
Hmm... as for my own attempt, I thought of something like this to begin with:
for ((i=0; i<${#arr[@]}; i++)); do
    # join the lines of element i with spaces and append them to the file as one row
    echo "${arr[i]}" | tr '\n' ' ' >> file
    echo >> file
done
Oh, also, if I put double quotes in echo, I get something different, but still not what I want:
bash-3.2$ echo "${antar[@]}" > file
bash-3.2$ cat file
12
13
15
17 22
23
25
27 32
33
35
37 42
43
45
47
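A likely explanation for the output above: echo separates its arguments with a single space, and each array element here appears to hold several numbers separated by newlines, so the last number of one element and the first number of the next end up on the same line (17 22, 27 32, ...). A minimal sketch that prints one element per line instead, which gives the transpose of the wanted matrix without zero padding; `print_rows` is a made-up name:

```shell
#!/usr/bin/env bash
# Sketch only: print_rows is a hypothetical helper, not from the thread.
# Prints each argument on its own line, with inner newlines turned into spaces.
print_rows() {
    local col
    for col in "$@"; do
        printf '%s\n' "$col" | paste -s -d ' ' -
    done
}
```

It could be used as `print_rows "${antar[@]}" > file`.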
IFS=$'\n'; arr=( $( < arts1.tsv ) );
for ((i=0; i<$((${#arr[@]})); i++)); do echo "${arr[$i]}" | tr "[[:space:]]" '\n' | grep -v '^$' | sort | uniq >> ar; sed -i -e 's/^/ /;s/$/ /' ar; while read line <&3; do sed -i -e "s/[[:space:]]$line[[:space:]]/ /g" ar; done 3<stopwords; cat ar | sed "s/[[:space:]]//g" | grep -v '^$' > aar; mv aar ar; num="1"; while read line <&5; do while read LINE <&4; do if [ "$line" == "$LINE" ]; then echo "$num" >> artres; fi; done 4<ar; ((num++)); done 5<word.tsv; artarr[$i]=`cat artres`; rm ar; rm artres; done
It basically reads the different articles listed in "arts1.tsv", then eliminates unimportant words such as a, an, the, ... using the "stopwords" file. Then it compares the remaining words of each article (now listed in the "ar" file) to the file "word.tsv", which contains a lot of words, something like a library of words. I need to find out which article has which words from "word.tsv". So the file I described at the beginning of the thread holds the indices of the words in "word.tsv" contained in the 1st article of "arts1.tsv", the 2nd, ... and the 2000th article of "arts1.tsv".
Now I need to do some statistical analysis on this matrix, so I need to pass it to MATLAB.
I hope my explanation is understandable!
Thanks.
Put your code in code tags. You don't expect me (us) to read your code like that, right? Also, show samples of the relevant files and describe your output. Finally, a suggestion: since you are using MATLAB, why don't you do everything in MATLAB, especially if you are dealing with matrices? For reading files you can use LOAD (SAVE) or dlmread(). Check the MATLAB docs for more.
Last edited by ghostdog74; 12-16-2009 at 12:14 AM.
IFS=$'\n'; arr=( $( < arts1.tsv ) ); # arr is an array; where arr[0] contains the first article ; arr[1] the second, etc
for ((i=0; i<$((${#arr[@]})); i++)); do # i runs from 0 to arr length - 1 (arr has 2000 elements)
echo "${arr[$i]}" | tr "[[:space:]]" '\n' | grep -v '^$' | sort | uniq >> ar;
# here I put each word on its own line, so I have a column of the words in the article; I do this for each article and put the result in the file "ar"; I remove "ar" at the end of the loop
sed -i -e 's/^/ /;s/$/ /' ar; # then I add a space to the beginning and end of each line (= each word) of "ar"
while read line <&3; do # this is the part where i take out those unimportant words
sed -i -e "s/[[:space:]]$line[[:space:]]/ /g" ar;
done 3<stopwords;
cat ar | sed "s/[[:space:]]//g" | grep -v '^$' > aar;
# then I strip that space from the beginning and end of the remaining words of "ar", drop all empty lines and put the remaining words into a new file "aar"
mv aar ar; # rename "aar" to "ar"
num="1";
while read line <&5;
do while read LINE <&4;
do if [ "$line" == "$LINE" ];
then echo "$num" >> artres; # I read words from "word.tsv", compare them to the words in "ar", and on a match I append the index of that word in "word.tsv" to the "artres" file, line by line. I get a column of indices.
fi;
done 4<ar;
((num++));
done 5<word.tsv;
artarr[$i]=`cat artres`;
# I put the contents of "artres", i.e. the indices of the words in "word.tsv" that match words from the i-th article (for loop) of "arts1.tsv", into the i-th element of the "artarr" array
rm ar; rm artres;
done
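The two nested read loops above compare every line of "word.tsv" with every line of "ar". A shorter sketch of the same lookup, not the poster's method, lets grep do the matching and report line numbers. It assumes one word per line in both files with no stray whitespace or carriage returns; `word_indices` is a made-up name:

```shell
#!/usr/bin/env bash
# Sketch only: word_indices is a hypothetical helper, not from the thread.
# $1: file with one article word per line (like "ar")
# $2: word list with one word per line (like "word.tsv")
# Prints the line numbers in $2 of the words that also appear in $1.
word_indices() {
    # -F fixed strings, -x whole-line match, -n prefix the line number, -f patterns from file
    grep -n -x -F -f "$1" "$2" | cut -d: -f1
}
```

Inside the loop this could replace the nested reads with something like `artarr[$i]=$(word_indices ar word.tsv)`, which would also make the temporary "artres" file unnecessary.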
Well, you've got the code tags sorted, but no one is going to read that 'one-liner'; it's too hard.
How about a sane layout, e.g. each statement of code gets its own line?