LinuxQuestions.org


secretlydead 09-20-2007 04:28 AM

start grep at line number
 
How do I start grep at line, say, 200?

(My output is too large and produces a segmentation fault, so I'm going to have to grep about 100 lines at a time.)


Specifically, I need to convert strings of Chinese character codes to UTF-8, so I'm making a database. I need to find all of

grep -o "&#[[:digit:]]\{5\};"

those in a file...

matthewg42 09-20-2007 04:57 AM

I don't know of an option in grep itself for this, but you can use sed to extract the lines and then pass them through your grep command (since you already have the command you want), or try to do the whole thing in sed. If you're running the command a large number of times in a tight loop, the all-sed version will probably run faster, as you will have half the number of process invocations. If you're not doing it more than a few dozen times, or efficiency isn't an issue, don't worry about that.

For example, sed will output lines 200 to the end of the file if you do this command:
Code:

$ sed -n '200,$ p' input_file
The -n option says, "don't print each line unless it is explicitly printed in the program".

The sed program, given in single quotes, says: for lines 200 to the end of the file, run the command p (print line). The first part, 200,$ is the address specification. You can use line numbers, special addresses like $ (meaning the last line of the file), or patterns, which match against the contents of a line so that the range starts or ends when a certain pattern is encountered.

Just pipe the output into your grep statement and you're laughing, i.e.
Code:

$ sed -n '200,$ p' input_file | grep -o "&#[[:digit:]]\{5\};"
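If you want to check the pipeline above before running it on the real file, here is a minimal sketch on a throwaway file. The sample contents and the temp file are made up for illustration, and the start line is 3 rather than 200 so the sample can stay short.

```shell
# Build a small throwaway sample file (hypothetical contents).
# Lines 1, 3 and 4 contain 5-digit numeric entities; line 2 has none.
tmp=$(mktemp)
printf '%s\n' \
    'skip &#20013;' \
    'nothing here' \
    'keep &#25991; and &#23383;' \
    'last &#27721;' > "$tmp"

# Print from line 3 to the end of the file, then let grep -o
# extract only the matching entities, one per output line.
result=$(sed -n '3,$ p' "$tmp" | grep -o '&#[[:digit:]]\{5\};')
printf '%s\n' "$result"

rm -f "$tmp"
```

The entity on line 1 is skipped because sed never passes that line to grep, and grep -o prints the two entities on line 3 as separate output lines.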

syg00 09-20-2007 06:03 AM

You can use sed (in a loop) to do it in blocks of 100 if you like.
Try it like this: "sed -n '200,300p;300q' file". That prints lines 200-300 and quits at line 300 (which saves reading the rest of the file). Put it in a loop and adjust the bounds.
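A rough sketch of what that loop could look like, assuming non-overlapping blocks of 100 lines. The generated sample file and the variable names (input, block, start, end, count) are only for illustration; substitute your own file and do something more useful with each block's matches than counting them.

```shell
# Hypothetical input: a 250-line file with one 5-digit entity per line.
input=$(mktemp)
awk 'BEGIN { for (i = 10001; i <= 10250; i++) printf "&#%d;\n", i }' > "$input"

total=$(awk 'END { print NR }' "$input")   # number of lines in the file
block=100                                   # lines handled per pass
start=1
count=0

while [ "$start" -le "$total" ]; do
    end=$((start + block - 1))
    # Print lines start..end and quit at line end so sed stops reading there,
    # then count the entities grep -o extracts from this block.
    n=$(( $(sed -n "${start},${end}p;${end}q" "$input" \
            | grep -o '&#[[:digit:]]\{5\};' | wc -l) ))
    echo "lines ${start}-${end}: ${n} matches"
    count=$((count + n))
    start=$((end + 1))
done

echo "total matches: ${count}"
rm -f "$input"
```

Note that each pass still makes sed read from the top of the file down to the end of the current block, so later blocks cost more than earlier ones. For a few hundred blocks that is unlikely to matter.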

Personally I'd do it in perl for a bit more control, but each to their own.

