LinuxQuestions.org

horacioemilio 01-08-2008 02:18 AM

Look for a string on a file and get its line number
 
Hi,

I have to search for a string in a big file. Once this string is found, I need to get the number of the line in which it is located. Do you know if this is possible to do in Python?

Thanks

ghostdog74 01-08-2008 02:35 AM

What have you tried so far ?

syg00 01-08-2008 02:38 AM

What are you getting at ghostdog74 ??? .... :p

Gotta be do-able in python. Me, I'd use perl.

ghostdog74 01-08-2008 02:43 AM

Quote:

Originally Posted by syg00 (Post 3015263)
What are you getting at ghostdog74 ??? .... :p

not sure. However, there ought to be an option to delete my own posts.
Quote:

Gotta be do-able in python. Me, I'd use perl.
Yes, of course there is. :p

ghostdog74 01-08-2008 02:46 AM

This is crazy. Can some mod/admin help delete them for me? Thanks.

pixellany 01-08-2008 05:52 AM

Well, there is "grep -n", and grep can be called from Python.....

The brute force way (in any language) would be to set up a loop with a line counter, and inside it read a line and test it for the string.
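For the grep route, here is a minimal Python 3.7+ sketch using the standard subprocess module. It assumes a grep binary on the PATH, and grep_line_numbers is a hypothetical helper name. It passes -F so the search string is taken literally rather than as a regex, and -- so a string starting with "-" is not read as an option:

```python
import subprocess


def grep_line_numbers(pattern, path):
    """Return the 1-based line numbers that `grep -n` reports for pattern in path."""
    # -n prefixes each match with its line number; -F treats pattern as a fixed string
    result = subprocess.run(
        ["grep", "-n", "-F", "--", pattern, path],
        capture_output=True,
        text=True,
    )
    # grep exits 0 if something matched, 1 if nothing matched, 2 on an error
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    # with a single file, each output line looks like "NUMBER:matching line"
    return [int(out.split(":", 1)[0]) for out in result.stdout.splitlines()]
```

Because grep exits with status 1 when nothing matches, only statuses above 1 are treated as errors; a search with no hits just returns an empty list.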

crabboy 01-08-2008 07:48 AM

Duplicate posts removed.

ghostdog74 01-08-2008 07:55 AM

Quote:

Originally Posted by pixellany (Post 3015372)
and grep can be called from Python.....

Usually one shouldn't do that, since feature-rich languages like Python and Perl (and others) have built-in string and file manipulation facilities that can do what grep does. Anyway, the OP has already got some answers in another forum, but just for the record:
Code:

#!/usr/bin/env python3
with open("file") as f:
    # enumerate(f, 1) numbers lines from 1, as grep -n does
    for num, line in enumerate(f, 1):
        if "search_string" in line:
            print(num)
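Since the question is about a big file and the line where the string is first found, a small Python 3 variant can stop reading as soon as it hits the first match instead of scanning to the end. This is a sketch; first_line_number is a hypothetical name:

```python
def first_line_number(path, needle):
    """Return the 1-based number of the first line containing needle, or None."""
    # Iterating over the file object reads one line at a time, so even a big
    # file is never loaded into memory all at once.
    with open(path) as f:
        for num, line in enumerate(f, 1):
            if needle in line:
                return num  # stop reading as soon as the string is found
    return None
```

For example, first_line_number("file", "search_string") returns the line number of the first hit, or None if the string never appears.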


horacioemilio 01-08-2008 08:28 AM

Hi, thanks for the help. I now have the following code running:

Code:

#!/usr/bin/env python

import os, sys, re, string, array, linecache, math

nlach = 12532

lach_list = sys.argv[1]
lach_list_file = open(lach_list,"r")
lach_mol2 = sys.argv[2] # name of the lachand mol2 file
lach_mol2_file = open(lach_mol2,"r")
n_lach_read=int(sys.argv[3])

# Do the following for the total number of lachands

# 1. read the list with the ranked lachands
for i in range(1,n_lach_read+1):
        line = lach_list_file.readline()
        ll = string.split (line)
        #print i, ll[0]
        lach = int(ll[0])
        # 2. for each lachand, print mol2 file
        # 2a. find lachand header in lachand mol2 file (example; kanaka)
        #    and return line number
        line_nr = 0
        for line in lach_mol2_file:
                    line_nr += 1
                    has_match = line.find('kanaka')
                    if has_match >= 0:
                        print 'Found in line %d' % (line_nr)
                        # 2b. print on screen all the info for this lachand
                        #  (but first need to read natoms and nbonds info)
                        #    go to line line_nr + 1
                        ltr=linecache.getline(lach_mol2, line_nr + 1)
                        ll=ltr.split()
                        #print ll[0],ll[1]
                        nat=int(ll[0])
                        nb=int(ll[1])
                        # total lines to print:
                        #  header, 8
                        #  at, na
                        #  b header, 1
                        #  n
                        #  lastheaders, 2
                        #  so; nat + nb + 11
                        ntotal_lines = nat + nb + 11
                        # now we go to the beginning of the lachand               
                        # and print ntotal_lines                       
                        for j in range(0,ntotal_lines):
                                print linecache.getline(lach_mol2, line_nr - 1 + j )

It almost works. In the last "for j" loop, I expected output like:

sdsdsdsdsdsd
sdsdsfdgdgdgdg
hdfgdgdgdg

but instead I get:

sdsdsdsdsdsd

sdsdsfdgdgdgdg

hdfgdgdgdg

Also, the program is very slow. Do you know how I could solve this?

thanks
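On the blank lines: linecache.getline() returns each line with its trailing newline, and the Python 2 print statement adds another one, so every line comes out double-spaced. Stripping the newline first (line.rstrip("\n")) or writing with sys.stdout.write() avoids that. Note also that the inner "for line in lach_mol2_file" loop leaves the file at its end after the first pass of the outer loop, so later passes find nothing. Below is a rough Python 3 sketch that reads the mol2 file only once, instead of iterating over it and then loading it again through linecache; whether that cures the slowness is not certain, so time it on the real file. find_blocks and print_blocks are hypothetical names, "kanaka" is the header marker from the code above, and the nat + nb + 11 record length comes from that code's comments, not from the mol2 format itself.

```python
import sys


def find_blocks(path, marker):
    """Return (line_number, lines) for every record whose header contains marker.

    Assumptions taken from the code above (this is not a general mol2 parser):
    the line after the header starts with the atom and bond counts, and a
    record spans nat + nb + 11 lines starting one line before the header.
    """
    with open(path) as f:
        lines = f.readlines()  # read the file once; each line keeps its "\n"
    blocks = []
    for idx, line in enumerate(lines):  # idx is 0-based, so line number is idx + 1
        if marker in line and idx + 1 < len(lines):
            fields = lines[idx + 1].split()
            nat, nb = int(fields[0]), int(fields[1])
            start = max(idx - 1, 0)  # one line before the header, clamped at the top
            blocks.append((idx + 1, lines[start:start + nat + nb + 11]))
    return blocks


def print_blocks(blocks, out=None):
    """Write each block without adding extra newlines."""
    out = sys.stdout if out is None else out
    for line_nr, block in blocks:
        out.write("Found in line %d\n" % line_nr)
        # the stored lines already end in "\n"; print() would add a second one
        out.writelines(block)
```

Usage would be along the lines of print_blocks(find_blocks("lachands.mol2", "kanaka")), where the file name is a placeholder.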

pixellany 01-08-2008 08:40 AM

Quote:

Originally Posted by ghostdog74 (Post 3015461)
usually, one shouldn't do that since feature rich languages like Python/Perl(and others) have inbuilt string and file manipulation utilities to do what grep can.

Of course.....it was a feeble attempt at humor. (Why use Python when the simple grep will work)

theNbomr 01-08-2008 09:06 AM

This is a one-liner in awk:
Code:

awk  '/stringOrRegexToFind/ { print NR;}' fileToSearch.txt
--- rod.

EDIT: Oops. Didn't notice that this is a python question.

ghostdog74 01-08-2008 09:30 AM

Quote:

Originally Posted by pixellany (Post 3015495)
Of course.....it was a feeble attempt at humor.

oh. ok. lol
Quote:

(Why use Python when the simple grep will work)
Is this humor as well? Because if it's not, there are many reasons why. :)

ghostdog74 01-08-2008 09:33 AM

Quote:

Originally Posted by horacioemilio (Post 3015485)
and also the program is very slow. Do you know how could i solve this ?
thanks

Provide a sample of your input file, describe what you want to do, and show what your expected output should look like. There are definitely better ways to do what you want.

Hko 01-08-2008 09:55 AM

Code:

import sys

# usage: python scriptname.py SEARCH_STRING FILE
i = 1
with open(sys.argv[2]) as f:
    for line in f:
        if sys.argv[1] in line:
            print(i)
        i += 1

Run this script like grep: pass the search string first and the file name second. In the example below it prints the number of the line where "hko" appears in /etc/passwd.

Example:
Code:

bash$ python scriptname.py hko /etc/passwd
28
bash$


angrybanana 01-08-2008 04:22 PM

Code:

perl -lane 'print "$.:$_" if /text/' file
Perl mimicking grep -n.

Edit: DOH! Just realized it was a Python question... hmm, well, since I've already posted I guess I gotta contribute somehow so as not to look foolish.
umm...ooh, regex support!

Code:

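import re
import sys

# usage: python scriptname.py REGEX FILE
with open(sys.argv[2]) as f:
    for i, line in enumerate(f, 1):  # number from 1, like grep -n
        if re.search(sys.argv[1], line):
            # line already ends in "\n", so stop print() adding a second one
            print("%d:%s" % (i, line), end="")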

That mimics grep -n and supports regular expressions.

Edit: changed from re.match() to re.search()
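That edit matters: re.match() only tries to match at the beginning of the line, while re.search() looks anywhere in it, which is what grep does. A small Python 3 sketch (grep_n and the sample lines are made up for illustration) that also compiles the pattern once; that is mostly a readability choice, since the re module caches recently used patterns anyway:

```python
import re


def grep_n(pattern, lines):
    """Yield "N:line" strings for lines matching pattern anywhere, like grep -n."""
    regex = re.compile(pattern)  # compile once, reuse for every line
    for num, line in enumerate(lines, 1):
        # search() looks anywhere in the line; match() would only try position 0
        if regex.search(line):
            yield "%d:%s" % (num, line)


sample = ["root:x:0:0\n", "hko:x:1000:1000\n", "daemon:x:1:1\n"]
print(re.match("x:1", sample[2]))  # None: "x:1" is not at the start of the line
for hit in grep_n("x:1", sample):
    print(hit, end="")  # each hit already ends in "\n"
```

Passing an open file object instead of the sample list works the same way, since iterating over a file yields its lines.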


All times are GMT -5.