I tried Romagnolo's script. I'm not advanced enough to understand it completely, maybe I'll study it some tomorrow.
I've finally written a script that works, more or less (this part was only a small part of the script). However, the script takes forever to execute: about 2 hours on my netbook. I think the problem is the loop (while):
Writing into a text file seems to take a long, long time. Is there a better way of doing this? Via a variable? Some sort of pipe or something? Perhaps Romagnolo's script would be faster. I don't know. But I'd have to adapt it to my script.
I'm really enjoying learning how to use all of these command line tools. Many thanks for all of your responses.
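One possible contributor to the slowness described above is reopening the output file on every pass through the loop, for example with `>>` inside the loop body. The original loop isn't shown, so this is only a minimal sketch under that assumption: write to stdout inside the loop and redirect the whole loop once on `done`. The function name `copy_entries` and the variables `word` and `pos_info` are illustrative.

```shell
#!/bin/bash
# copy_entries INFILE OUTFILE  (hypothetical name, for illustration only)
copy_entries() {
    local word pos_info
    while read -r word pos_info; do
        # Per-line work goes here; just write to stdout inside the loop.
        printf '%s %s\n' "$word" "$pos_info"
    done < "$1" > "$2"    # both files are opened once for the whole loop
}

# Demo with made-up data in temporary files.
in=$(mktemp) && out=$(mktemp) || exit 1
printf '%s\n' 'word n v n' 'other a d' > "$in"
copy_entries "$in" "$out"
cat "$out"
rm -f "$in" "$out"
```

If the loop also starts several external commands per line, that cost is likely to matter more than the redirection, so it is worth measuring with `time` before and after.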
sundialsvcs, PTrenholme, sorry, I haven't tried yours out yet. The problem is that I'm not all that advanced yet, but I am going to study your scripts.
Whizje, I have another question. The words are going to be slightly different, and my script doesn't work completely when I try to populate my database.
The format of words will be:
Code:
"word" n v n n v
"multiword expression" n v v n
"another different multiword expression" a d d a
etc.
In other words, the word (or "entry", because there will be multiword expressions) part will be between double quotes and could contain spaces, the part of speech part (v v v n d d a) remains the same.
I can't figure out how to get the entire line except what's in quotes. I've learned "grep -Ev ...", but that would exclude an entire line containing what I don't want.
I'll get through this script eventually. Meanwhile, I'm learning a lot and enjoying it.
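For the question above about keeping everything on a line except the quoted part, `sed` substitution edits within a line rather than dropping the whole line the way `grep -v` does. A minimal sketch, assuming every line begins with a double-quoted entry that contains no embedded double quotes; the function names `pos_only` and `entry_only` are made up for this example.

```shell
#!/bin/bash
# pos_only: print each input line without its leading "quoted entry".
# Assumes each line starts with a double-quoted entry with no embedded quotes.
pos_only() {
    sed 's/^"[^"]*"[[:space:]]*//'
}

# entry_only: print just the quoted entry, quotes included.
entry_only() {
    sed 's/^\("[^"]*"\).*/\1/'
}

printf '%s\n' '"multiword expression" n v v n' | pos_only    # prints: n v v n
printf '%s\n' '"multiword expression" n v v n' | entry_only  # prints: "multiword expression"
```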
A few questions: are the input lines coming from a file or from a variable, and how many input lines are we talking about?
Also, you don't need a temp file; you can use a variable for it:
Code:
str3="$word $pos_info"
And are the input lines as long as in the examples you gave?
Consider this approach.
Input file:
Code:
"Frank" a a d v v d v n
"Frank Lloyd" v v v v v
"Frank Lloyd Wright" v v d v n n v d n
Code:
# sed to put quoted string in a temporary file
sed 's/\(.*" \).*/\1/' "$InFile" > "$Work01"
# cut to remove quoted string
# awk to sort horizontally (asort requires GNU awk);
#   the loop runs through NF inclusive so the last sorted letter is not dropped
# tr to squeeze out duplicates
# paste to restore quoted string to each line
cut -d'"' -f3- "$InFile" \
|awk '{split($0,a); asort(a); for(i=1;i<=NF;i++){printf("%s",a[i])} print ""}' \
|tr -s '[:alpha:]' \
|paste -d '' "$Work01" - > "$OutFile"
Output file:
Code:
"Frank" adnv
"Frank Lloyd" v
"Frank Lloyd Wright" dnv
Daniel B. Martin
Last edited by danielbmartin; 03-10-2012 at 02:04 PM.
Reason: Improve code example
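The same result can be sketched in a single awk pass with no temporary file. This is an alternative to the pipeline above, not Daniel's method: it splits each line on the double quote, sorts the tags with a small insertion sort (so it does not depend on GNU awk's `asort`), and skips adjacent duplicates. The function name `dedupe_pos` is hypothetical, and the sketch assumes the quoted entry contains no embedded double quotes.

```shell
#!/bin/bash
# dedupe_pos [FILE...]: keep the quoted entry, print its tags sorted and de-duplicated.
dedupe_pos() {
    awk -F'"' '{
        n = split($3, tag, " ")      # $3 is everything after the closing quote
        # insertion sort of the tags
        for (i = 2; i <= n; i++) {
            t = tag[i]
            for (j = i - 1; j >= 1 && tag[j] > t; j--) tag[j + 1] = tag[j]
            tag[j + 1] = t
        }
        # concatenate, skipping a tag equal to its predecessor
        out = ""
        for (i = 1; i <= n; i++)
            if (i == 1 || tag[i] != tag[i - 1]) out = out tag[i]
        printf "\"%s\" %s\n", $2, out
    }' "$@"
}

# Demo on two of the sample lines from the input file above.
printf '%s\n' '"Frank" a a d v v d v n' '"Frank Lloyd" v v v v v' | dedupe_pos
```

With a file, it would be called as `dedupe_pos inputFile.txt > output.txt`.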
Thank you, Daniel, for your solution. It doesn't work on my end. I've altered the code to take an input file as an argument and to run from a script:
Code:
# sed to put quoted string in a temporary file
#Work01=""
sed 's/\(.*" \).*/\1/' $1 > $Work01
# cut to remove quoted string
# awk to sort horizontally
# tr to squeeze out duplicates
# paste to restore quoted string to each line
cut -d'"' -f3- $1 \
|awk '{split($0,a); asort(a); for(i=1;i<NF;i++){printf("%s",a[i])} print ""}' \
|tr -s [:alpha:] \
|paste -d '' $Work01 - > output.txt
This is the error I get:
Quote:
$ ./dannyscode.sh inputFile.txt
./dannyscode.sh: line 3: $Work01: ambiguous redirect
$
I noticed you altered the code once or twice. I tried the other versions you had posted too, but got a similar message.
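The "ambiguous redirect" message above is what bash prints when an unquoted variable used as a redirect target expands to nothing. In the script as posted, the `Work01` assignment is commented out, so `> $Work01` has no file name to write to. A minimal sketch of one way around it, assuming a throwaway file from `mktemp` is acceptable; the helper name `entries_to_tempfile` is made up for this example.

```shell
#!/bin/bash
# entries_to_tempfile INFILE  (hypothetical helper name)
# Writes the quoted entries of INFILE to a fresh temp file and prints its path.
entries_to_tempfile() {
    local work
    work=$(mktemp) || return 1             # always a non-empty file name
    sed 's/\(.*" \).*/\1/' "$1" > "$work"  # quoted, so the redirect target is one word
    printf '%s\n' "$work"
}

# Demo with made-up input.
in=$(mktemp) || exit 1
printf '%s\n' '"Frank Lloyd" v v v v v' > "$in"
Work01=$(entries_to_tempfile "$in")
cat "$Work01"
rm -f "$in" "$Work01"
```

In the original script the equivalent change would be to set `Work01` to a real file name before the `sed` line and to quote it wherever it is used.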
Code:
#!/bin/bash
filename="$1"                 # input file is the first argument
declare -a arr
total=""
str3=""
while IFS= read -r line
do
    # split the line on " into array arr (IFS is changed for this read only)
    # arr[1] now contains the quoted words
    # and arr[2] contains the letters
    IFS='"' read -r -a arr <<< "$line"
    # \"            put quotes in str3; they were lost when $line was split into the array
    # ${arr[1]}     copy the quoted words to str3
    # \"            put quotes in str3 so the words are quoted again
    # $(.......)    command substitution: execute command and use the resulting string
    # tr ' ' '\n'   replace spaces with newlines so sort can use the input
    # | sort -u     sort the letters and delete duplicates
    # | tr -d '\n'  delete the newlines
    # $'\n'         add 1 newline
    str3="\"${arr[1]}\" $(tr ' ' '\n' <<< "${arr[2]}" | sort -u | tr -d '\n')"$'\n'
    total="$total$str3"       # add result to total
done < "$filename"
printf '%s' "$total"          # print total
Example:
Code:
bash-4.1$ cat names.txt
"word" n v n n v
"multiword expression" n v v n
"another different multiword expression" a d d a
bash-4.1$ wlsort names.txt
"word" nv
"multiword expression" nv
"another different multiword expression" ad
12000 lines on a Phenom X4 3.4 GHz took:
Code:
real 1m13.496s
user 0m51.104s
sys 0m7.264s
Last edited by whizje; 03-11-2012 at 02:13 PM.
Reason: extra info
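Much of the time in a loop like the one above probably goes into starting external commands (`sort`, `tr`) for every input line. A minimal sketch that stays inside bash, assuming each tag is a single lowercase letter separated by spaces, as in the examples in this thread: instead of sorting, it walks the alphabet and keeps each letter that occurs. The function name `sort_tags` is hypothetical, and no timing has been measured for it.

```shell
#!/bin/bash
# sort_tags: read lines of the form "entry" t t t ... on stdin and print
# each entry followed by its distinct tags in alphabetical order.
# Assumption: tags are single lowercase letters separated by spaces.
sort_tags() {
    local entry tags c out
    # Splitting on " gives: empty field, the entry, then the rest of the line.
    while IFS='"' read -r _ entry tags; do
        out=""
        for c in {a..z}; do
            # Pad with spaces so each letter is matched as a whole word.
            [[ " $tags " == *" $c "* ]] && out+=$c
        done
        printf '"%s" %s\n' "$entry" "$out"
    done
}

# Demo on two of the sample lines.
printf '%s\n' '"word" n v n n v' '"another different multiword expression" a d d a' | sort_tags
```

For a file, it would be called as `sort_tags < names.txt`; comparing it against the script above with `time` would show whether it helps on a given machine.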