LinuxQuestions.org
Old 11-24-2010, 12:20 AM   #1
unihiekka
Member
 
Registered: Aug 2005
Distribution: SuSE Linux / Scientific Linux / [K|X]ubuntu
Posts: 273

Rep: Reputation: 32
Search for exact repetitions in text file


Hi!

I've got a big text file (LaTeX) in which I know I have probably made some typos. Sometimes I rewrite sentences several times and end up with doubled words like "the the" or "is is" without noticing. Most spell checkers that I can use with LaTeX are very basic, so they do not catch these errors. Is there a way to search for these repetitions by hand using sed or awk or something along these lines? Is there an app for that?

Thanks.
 
Old 11-24-2010, 01:17 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,552

Rep: Reputation: 2899
Well, it's crude, but you could try something like:
Code:
awk '{for(x=1;x<=NF;x++)for(y=x+1;y<=NF;y++)if($x == $y){print NR,$x;break}}' file
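Note that this compares each word with all the words after it on the line, so it also flags repeats that are not adjacent (for example 'the cat and the dog').
Obviously, if you have something like 'is is is', it will report two duplications on that line.
If you change break to next, each line will be reported at most once.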
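
For comparison, here is a minimal Python sketch of an adjacent-only check. It splits each line on whitespace, much as awk does, and compares each word only with the word right before it. The function name `adjacent_repeats` and the sample lines are made up for illustration; this is not a translation of the awk one-liner, just one way the idea could be narrowed.

```python
def adjacent_repeats(lines):
    """Yield (line_number, word) for each word that repeats the word directly before it."""
    for line_number, line in enumerate(lines, start=1):
        words = line.split()  # whitespace-separated fields, similar to awk's $1..$NF
        # Pair every word with its immediate successor on the same line.
        for previous, current in zip(words, words[1:]):
            if previous == current:
                yield line_number, current


# Hypothetical sample input, for illustration only.
sample = [
    "the cat and the dog",  # "the" occurs twice but not side by side
    "this is is a typo",
    "it is is is broken",
]

for line_number, word in adjacent_repeats(sample):
    print(line_number, word)
```

Like the awk version with break, 'is is is' is reported twice. Words are compared exactly, so "The the" or "is is." (with trailing punctuation) would not be caught by this sketch.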
 
Old 11-24-2010, 04:48 PM   #3
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 336

Rep: Reputation: 141
Code:
echo 'This looks for for repeated words
within a line and prints the the
line number if if any are found' |
grep -nE  '\b(\w+)[[:space:]]+\1\b' 

1:This looks for for repeated words
2:within a line and prints the the
3:line number if if any are found
Code:
echo 'This looks for repeated
words separated by a newline. It uses
uses a sliding window of two lines 
through the text and prints the
line number of the second
second line when a match is found.'| 
sed -rn 'N; /(\b\w+)[[:space:]]*\n[[:space:]]*\1\b/{=;p}; D'

3
words separated by a newline. It uses
uses a sliding window of two lines
6
line number of the second
second line when a match is found.
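
The two commands above can be combined into one pass if the whole file is read at once. The Python sketch below uses essentially the same regular expression as the grep example, but lets the whitespace between the two words include a newline, and reports the line on which the second copy sits. The function name `find_repeats` and the sample text are assumptions for illustration only.

```python
import re

# A word, some whitespace (possibly including a newline), then the same word again.
DUPLICATE = re.compile(r"\b(\w+)\s+\1\b")


def find_repeats(text):
    """Return (line_number, word) pairs; line_number is the line holding the second copy."""
    results = []
    for match in DUPLICATE.finditer(text):
        # Count the newlines before the end of the match to get a 1-based line number.
        line_number = text.count("\n", 0, match.end()) + 1
        results.append((line_number, match.group(1)))
    return results


# Hypothetical sample text, for illustration only.
sample_text = (
    "This looks for for repeated\n"
    "words separated by a newline. It uses\n"
    "uses a sliding window\n"
)

for line_number, word in find_repeats(sample_text):
    print(line_number, word)
```

Reading the whole file into memory is a simplification; for a LaTeX source of ordinary size that should not matter.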
 
Old 11-24-2010, 11:23 PM   #4
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.9, Centos 7.3
Posts: 17,411

Rep: Reputation: 2397
The Perl Cookbook (2nd edition) suggests:
Code:
$/ = '';                      # paragrep mode
while (<>) {
    while ( m{
                \b            # start at a word boundary (begin letters)
                (\S+)         # find chunk of non-whitespace
                \b            # until another word boundary (end letters)
                (
                    \s+       # separated by some whitespace
                    \1        # and that very same chunk again
                    \b        # until another word boundary
                ) +           # one or more sets of those
             }xig
         )
    {
        print "dup word '$1' at paragraph $.\n";
    }
}
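
For readers without Perl at hand, roughly the same idea can be sketched in Python: split the text into paragraphs on blank lines, then look for a chunk of non-whitespace that is repeated one or more times, ignoring case. This is an approximation and not the Cookbook's code; the function name `dup_words_by_paragraph` and the sample text are made up, and the paragraph splitting is simpler than Perl's paragraph mode.

```python
import re

# A chunk of non-whitespace between word boundaries, followed by one or more
# whitespace-separated repeats of the same chunk (case-insensitive).
DUP = re.compile(r"\b(\S+)\b(?:\s+\1\b)+", re.IGNORECASE)


def dup_words_by_paragraph(text):
    """Return (paragraph_number, word) pairs for repeated words in each paragraph."""
    results = []
    # Treat runs of two or more newlines as paragraph separators (an approximation).
    paragraphs = re.split(r"\n{2,}", text.strip())
    for number, paragraph in enumerate(paragraphs, start=1):
        for match in DUP.finditer(paragraph):
            results.append((number, match.group(1)))
    return results


# Hypothetical sample text, for illustration only.
sample_text = (
    "This is a test.\n"
    "The the cat sat.\n"
    "\n"
    "Nothing wrong here.\n"
    "\n"
    "Perl is is\n"
    "is fun.\n"
)

for number, word in dup_words_by_paragraph(sample_text):
    print(f"dup word '{word}' at paragraph {number}")
```

Because the match is case-insensitive and spans line breaks within a paragraph, it catches "The the" as well as a word repeated across a line ending.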
 
  

