11-23-2010, 11:20 PM   #1
unihiekka
Member
 
Registered: Aug 2005
Distribution: SuSE Linux / Scientific Linux / [K|X]ubuntu
Posts: 273

Rep: Reputation: 32
Search for exact repetitions in text file


Hi!

I've got a big text file (LaTeX) in which I know I have probably made some typos. Sometimes I rewrite sentences several times and end up with doubled words like "the the" or "is is" without noticing. Most spell checkers that I can use with LaTeX are very basic, so they do not catch these errors. Is there a way to search for these repetitions by hand using sed or awk or something along those lines? Is there an app for that?

Thanks.
 
11-24-2010, 12:17 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
Well, it's crude (it flags any word that appears twice on a line, not just adjacent repeats), but you could try something like:
Code:
awk '{for(x=1;x<=NF;x++)for(y=x+1;y<=NF;y++)if($x == $y){print NR,$x;break}}' file
Obviously, if you have something like 'is is is', it will report 2 duplications on that line.
If you change break to next, you will only be told about each line once.
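
If only back-to-back repeats matter (the "the the" case from the question), a narrower variant is to compare each word with its immediate neighbour only. This is just a sketch: the function name adjacent_dups is made up for illustration, and it splits on whitespace, so a repeat with trailing punctuation such as "is is." would slip through.

```shell
# adjacent_dups [FILE...]: hypothetical helper name, not from the thread.
# Prints "LINE WORD" for each word immediately followed by itself on the same line.
adjacent_dups() {
    awk '{
        for (x = 1; x < NF; x++)                # look at each word and its right-hand neighbour
            if (($x "") == ($(x + 1) ""))       # appending "" forces a string comparison
                print NR, $x
    }' "$@"
}

# The first line has "the" twice but not adjacent, so only line 2 is reported.
printf 'the cat and the dog\nthis is is a test\n' | adjacent_dups
```

With no file arguments the function reads standard input, so it can sit at the end of a pipeline or be called as adjacent_dups file.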
 
11-24-2010, 03:48 PM   #3
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 360

Rep: Reputation: 170
Code:
echo 'This looks for for repeated words
within a line and prints the the
line number if if any are found' |
grep -nE  '\b(\w+)[[:space:]]+\1\b' 

1:This looks for for repeated words
2:within a line and prints the the
3:line number if if any are found
Code:
echo 'This looks for repeated
words separated by a newline. It uses
uses a sliding window of two lines 
through the text and prints the
line number of the second
second line when a match is found.'| 
sed -rn 'N; /(\b\w+)[[:space:]]*\n[[:space:]]*\1\b/{=;p}; D'

3
words separated by a newline. It uses
uses a sliding window of two lines
6
line number of the second
second line when a match is found.
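
For a single pass that catches repeats both within a line and across a line break, one option is to let awk remember the previous word. A rough sketch, not the grep/sed approach above: the helper name dup_words is hypothetical, the comparison is case-insensitive via tolower, and a blank line resets the memory on the assumption that a repeat across a paragraph break is not a typo.

```shell
# dup_words [FILE...]: hypothetical helper name, not from the thread.
# Prints "LINE: WORD" whenever a word equals the word just before it,
# even if the earlier word was the last one on the previous line.
dup_words() {
    awk '
        NF == 0 { prev = ""; next }             # blank line: forget the previous word
        {
            for (i = 1; i <= NF; i++) {
                w = tolower($i)                 # case-insensitive, always a string
                if (w == prev)
                    print NR ": " $i
                prev = w
            }
        }
    ' "$@"
}

printf 'It uses\nuses The the window\n\nwindow again\n' | dup_words
```

Like any whitespace-based split, this treats "word," and "word" as different, so punctuation can hide a repeat.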
 
11-24-2010, 10:23 PM   #4
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,358

Rep: Reputation: 2751
The Perl Cookbook (2nd Ed.) suggests:
Code:
$/ = '';                      # paragraph mode: each read returns one blank-line-separated paragraph
while (<>) {
    while ( m{
                \b            # start at a word boundary (begin letters)
                (\S+)         # find chunk of non-whitespace
                \b            # until another word boundary (end letters)
                (
                    \s+       # separated by some whitespace
                    \1        # and that very same chunk again
                    \b        # until another word boundary
                ) +           # one or more sets of those
             }xig
         )
    {
        print "dup word '$1' at paragraph $.\n";
    }
}
 
  

