LinuxQuestions.org
Old 04-27-2010, 03:41 PM   #1
dayamoon
LQ Newbie
 
Registered: Apr 2010
Posts: 12

Rep: Reputation: 0
Delete lines from a file by their length


Hello, I've got a file with sorted words, one on each line.
How can I delete the lines that contain words of length 1 or 2 (1-2 letters)? I guess a good way would be AWK and its length() function, but once I have the length, I don't know how to delete those lines.
Thanks in advance!

Last edited by dayamoon; 04-27-2010 at 03:44 PM.
 
Old 04-27-2010, 03:46 PM   #2
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,564
Blog Entries: 1

Rep: Reputation: 1026
Welcome to LQ.

Is this homework?
Can you post a sample of your file? What do the other lines look like? Are they longer, or empty?
 
Old 04-27-2010, 03:55 PM   #3
dayamoon
LQ Newbie
 
Registered: Apr 2010
Posts: 12

Original Poster
Rep: Reputation: 0
OK... it's part of a bigger project. I take some files from a folder, delete every character except letters, stem the words, and delete the stopwords. Then I have to count the distinct words, but some garbage still remains, like 1-letter words, that I want to remove.
 
Old 04-27-2010, 03:58 PM   #4
dayamoon
LQ Newbie
 
Registered: Apr 2010
Posts: 12

Original Poster
Rep: Reputation: 0
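# If the argument is a directory, concatenate its .txt files into f
if [ -d "$1" ]
then
cat "$1"/*.txt > f
file=f
fi
# Strip punctuation and digits, then lowercase; '-' goes last in the tr set so it is not read as a range
sed "s/--/ /g" < "$file" | sed "s/[-'_]//g" | sed "s/[0-9]//g" | tr -d '=:;|"<>.,?!@#*&^[](){}_-' | tr 'A-Z' 'a-z' > ff

./stop.txt "ff" stopwords.txt ffs.txt
gcc stemming.c -o stem; ./stem "ffs.txt"
# One word per line, drop empty lines, then count how often each word occurs
tr -s "\ " "\n" <"ffs.txts" |grep -v '^$' | sort | uniq -c > Index/Vocabulary.txt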

It goes something like this, but in Vocabulary.txt I still want to remove the lines with 1-letter words.

Last edited by dayamoon; 04-27-2010 at 03:59 PM.
 
Old 04-27-2010, 03:59 PM   #5
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,564
Blog Entries: 1

Rep: Reputation: 1026
Can you please put some effort into writing correctly? If I understand your second post correctly, you can do it with 'sed' and the answer is on this page:
http://sed.sourceforge.net/sed1line.txt
 
Old 04-27-2010, 04:01 PM   #6
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,564
Blog Entries: 1

Rep: Reputation: 1026
Wouldn't:
Code:
sed -n '/^.\{3\}/p' vocabulary.txt
do the trick?

If you want to make the changes permanent, just add the '-i' option.
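
A minimal sketch of what this command does, assuming a file with one bare word per line (the sample words and the variable names `sample` and `kept` are made up for illustration). `-n` suppresses sed's automatic printing, and `/^.\{3\}/p` prints only the lines that contain at least three characters:

```shell
# Hypothetical sample: one bare word per line, no counts.
sample=$(printf '%s\n' a ab abl about)

# -n turns off automatic printing; /^.\{3\}/p prints only lines
# that are at least three characters long.
kept=$(printf '%s\n' "$sample" | sed -n '/^.\{3\}/p')
printf '%s\n' "$kept"
```

With GNU sed, adding `-i` (for example `sed -i -n '/^.\{3\}/p' file`) rewrites the file in place instead of printing, so trying it on a copy first is probably the safer route.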
 
Old 04-27-2010, 04:11 PM   #7
dayamoon
LQ Newbie
 
Registered: Apr 2010
Posts: 12

Original Poster
Rep: Reputation: 0
Ahh, OK, I guess I wasn't clear. Vocabulary.txt also contains the word frequency, so a line looks like "4266 a" (a occurs 4266 times in the file). sed didn't delete it; maybe because I'm using the bash shell?
 
Old 04-27-2010, 04:14 PM   #8
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,564
Blog Entries: 1

Rep: Reputation: 1026
Quote:
Originally Posted by dayamoon View Post
Ahh, OK, I guess I wasn't clear. Vocabulary.txt also contains the word frequency, so a line looks like "4266 a" (a occurs 4266 times in the file). sed didn't delete it; maybe because I'm using the bash shell?
Bash is the standard shell on most Linux distributions, and 'sed' is a separate program that behaves the same whichever shell starts it, so the shell is not the problem.

The sed command prints all the lines that are longer than 2 characters. Isn't that what you wanted to achieve?

It'd be easier if you posted a representative extract of the file.
 
Old 04-27-2010, 04:15 PM   #9
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 713
Code:
grep ...
 
Old 04-27-2010, 04:19 PM   #10
dayamoon
LQ Newbie
 
Registered: Apr 2010
Posts: 12

Original Poster
Rep: Reputation: 0
4266 a
1 aaah
3 ab
14 abandon
1 abash
4 abat
1 abdic
1 abduct
1 abhorr
4 abid
8 abil
25 abinet
1 abject
29 abl
1 ablest
1 abnorm
1 abod
1 abolit
6 abomin
1 aborigin
348 about
49 abov
4 abreast
7 abroad
6 abrupt
20 abruptli
10 absenc


After running
$ sed -n '/^.\{3\}/p' Index/Vocabulary.txt > voc.txt
I got the same:

4266 a
1 aaah
3 ab
14 abandon
1 abash
4 abat
1 abdic
1 abduct
1 abhorr
4 abid
8 abil
25 abinet
1 abject
29 abl
1 ablest
1 abnorm
1 abod
1 abolit
6 abomin
1 aborigin
348 about
49 abov
4 abreast
7 abroad
6 abrupt
20 abruptli
10 absenc
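
One way to see why the output is unchanged, sketched under the assumption that the lines look like the extract above: the pattern `/^.\{3\}/` tests the length of the whole line, count included, so even "4266 a" passes at six characters. The awk one-liner below prints the full line length next to the word length for three of those lines (the variable names `sample` and `report` are illustrative only):

```shell
# Three lines taken from the extract above.
sample=$(printf '%s\n' '4266 a' '3 ab' '29 abl')

# length($0) is the length of the whole line, which is what the sed
# pattern effectively checks; length($2) is the length of the word.
report=$(printf '%s\n' "$sample" | awk '{ print length($0), length($2), $2 }')
printf '%s\n' "$report"
```

Every line is at least three characters long once the count and the space are included, so a filter on whole-line length keeps all of them.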
 
Old 04-27-2010, 04:20 PM   #11
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,564
Blog Entries: 1

Rep: Reputation: 1026
ok, I get what you mean.
 
Old 04-27-2010, 04:25 PM   #12
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 713
Why did you never say that there was a number before each word?!?

Code:
sed -rn 's:[0-9]* .{3,}:&:p'
 
Old 04-27-2010, 04:33 PM   #13
dayamoon
LQ Newbie
 
Registered: Apr 2010
Posts: 12

Original Poster
Rep: Reputation: 0
Maybe because at the beginning I said that I wanted to delete a line by the length of a word.
Still no success, but can I get an explanation of the s: &: p parts? What exactly do they do?

Thank you everyone though...

Last edited by dayamoon; 04-27-2010 at 04:36 PM.
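
A hedged sketch of how the `s:...:&:p` command can be read. `s` is sed's substitute command, `:` is used as the delimiter instead of the usual `/`, `&` in the replacement stands for the matched text (so the line is replaced by itself), and the trailing `p` prints the line only when the substitution matched; `-n` suppresses every other line. One possible reason it does not filter here, assuming Vocabulary.txt comes straight from `uniq -c` (which, at least in GNU coreutils, pads the counts with leading spaces): the pattern is not anchored, so the padding alone satisfies ` .{3,}`. The anchored variant is an assumption for illustration and not from the thread, and `-E` is used in place of `-r` as the more portable spelling:

```shell
# Hypothetical sample in the padded layout that GNU uniq -c produces.
sample=$(printf '%s\n' '   4266 a' '      1 aaah' '      3 ab' '     29 abl')

# Unanchored pattern: a padding space followed by any three characters
# is enough to match, so every line is still printed.
unanchored=$(printf '%s\n' "$sample" | sed -En 's:[0-9]* .{3,}:&:p')

# Anchored variant (illustrative): optional padding, the count, one
# space, then at least three characters up to the end of the line.
anchored=$(printf '%s\n' "$sample" | sed -En 's:^ *[0-9]+ .{3,}$:&:p')
printf '%s\n' "$anchored"
```

Here `unanchored` ends up identical to the input, while `anchored` keeps only the "aaah" and "abl" lines.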
 
Old 04-27-2010, 04:36 PM   #14
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,564
Blog Entries: 1

Rep: Reputation: 1026
In that case, try awk:
Code:
awk 'length($2) > 2' vocabulary.txt
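
A short sketch of why this works, using made-up sample lines in the layout `uniq -c` would produce (the variable names `sample` and `filtered` are illustrative). awk splits each line on runs of blanks and ignores leading padding, so `$1` is the count and `$2` is the word; a pattern with no action prints the matching lines unchanged:

```shell
# Hypothetical sample: padded count, then the word.
sample=$(printf '%s\n' '   4266 a' '      1 aaah' '      3 ab' '     29 abl')

# Keep only the lines whose second field (the word) is longer than
# two characters; the padding and the count do not affect the test.
filtered=$(printf '%s\n' "$sample" | awk 'length($2) > 2')
printf '%s\n' "$filtered"
```

awk has no portable in-place option, so one way to update the file itself is to write to a temporary file (the name voc.tmp is hypothetical) and move it back: `awk 'length($2) > 2' Index/Vocabulary.txt > voc.tmp && mv voc.tmp Index/Vocabulary.txt`.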
 
1 members found this post helpful.
Old 04-27-2010, 04:40 PM   #15
dayamoon
LQ Newbie
 
Registered: Apr 2010
Posts: 12

Original Poster
Rep: Reputation: 0
Aww, it worked! Thank you so much! The second column's length... Thank you!
 
  

