Filter a large document by line number
I have a file of roughly 50,000 records. In a second file I have collected the line numbers of every record with an error of some kind, e.g. column count, field type, etc. I want to copy all those lines into a separate file so I can sanitise them. There are about 3,000-4,000 of them.
How can I extract just those lines into a single file?
I have all the usual Linux tools available and a bit of understanding of regexps.
Grab specific line number from file
After making progress on a problem earlier, I now have a file of row numbers representing the rows that I want to rescue from a file of about 50,000 rows.
I want to use the row number file as a filter to determine which of the rows in the 50000 row original file get through to my results file.
This does what I need:
sed -n <line>p <filename>
So a bit of work with emacs on my line numbers file gives stuff like
sed -n 10p allbut1st28.sql
sed -n 24p allbut1st28.sql
sed -n 68p allbut1st28.sql
sed -n 128p allbut1st28.sql
sed -n 134p allbut1st28.sql
sed -n 136p allbut1st28.sql
sed -n 161p allbut1st28.sql
sed -n 162p allbut1st28.sql
sed -n 228p allbut1st28.sql
sed -n 342p allbut1st28.sql
sed -n 412p allbut1st28.sql
sed -n 414p allbut1st28.sql
sed -n 421p allbut1st28.sql
sed -n 510p allbut1st28.sql
I then run these as a script, like this: allmydudlines.sh > "the sql for the dudlines.sql"
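The script above starts sed once per wanted line, so the 50,000-row file gets read thousands of times. As a rough sketch of the same idea done in a single pass: turn the line-number file into one sed script and run it once. This assumes one line number per line in the number file; print_listed_lines is a hypothetical helper name, not something from the thread.

```shell
# Hypothetical helper: print_listed_lines NUMFILE DATAFILE
# Assumes NUMFILE holds one line number per line.
print_listed_lines() {
    # Build one sed script: each all-digit line N in NUMFILE becomes "Np".
    # Blank or non-numeric lines are skipped so they cannot turn into a bare "p".
    script=$(sed -n 's/^[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1p/p' "$1")
    # Run the whole script in one pass; -n prints only the lines named in it.
    sed -n "$script" "$2"
}

# Possible usage with the file from the post (the number file name is an assumption):
#   print_listed_lines linenumbers.txt allbut1st28.sql > dudlines.sql
```

The selected lines come out in the order they appear in the data file, not the order of the number list, and a number listed twice is printed twice.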
If I was a guru then I could do the whole thing in a 100 character line with mainly punctuation characters... but then I am not.
At least I don't have to wear sandals in this weather but I could do with the beard.
Is it really beyond http://web.mit.edu/gnu/doc/html/textutils_toc.html ?
If line number does not match value in ___, then append line to file...
Suppose rownumfile contains the row numbers to capture, and rowsfile is to receive the selected lines:
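One way this could be done, as a minimal sketch and not necessarily the command originally posted: read rownumfile first, remember its numbers, then print only the matching lines of the data file. It assumes one line number per line in rownumfile and uses allbut1st28.sql from earlier in the thread as the source file; select_rows is a hypothetical helper name.

```shell
# Hypothetical helper: select_rows NUMFILE DATAFILE
# Assumes NUMFILE holds one line number per line.
select_rows() {
    awk '
        # First file (NR == FNR): remember each wanted line number.
        NR == FNR { want[$1 + 0]; next }
        # Second file: print a line when its number was remembered.
        FNR in want
    ' "$1" "$2"
}

# Possible usage with the names from the post:
#   select_rows rownumfile allbut1st28.sql > rowsfile
```

The data file is read once, and the lines are written in their original file order regardless of how rownumfile is sorted.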
I have merged your two closely-related threads---please keep this all in one place.