Select lines based on key match
I want to select lines from a large file based on matching a key value in a smaller file. The key values in the small file are unique; the matching values in the large file are not. Both files are sorted.
Sample small file ...
Code:
Cole
Code:
Bergeron Denise
Code:
Cole Carlton
Daniel B. Martin |
grep(1) can read patterns from a file, a la:
Code:
$ grep -f keys.txt people.txt |
Quote:
How may we limit the scope of the grep? Daniel B. Martin |
There may be a more efficient means for solving this, but I'd simply put the patterns in the keys.txt file.
Input files:
Code:
$ cat keys.txt
Code:
$ cat people.txt
Code:
$ grep -f keys.txt people.txt
|
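A self-contained run of this approach, with made-up file contents (the thread's original sample data did not survive, so the names below are hypothetical):

```shell
# Hypothetical sample data standing in for the thread's lost file contents.
printf '%s\n' 'Cole' > keys.txt
printf '%s\n' 'Bergeron Denise' 'Cole Carlton' 'Coleman Butler' > people.txt

# -f reads one pattern per line from keys.txt and tries each against people.txt.
grep -f keys.txt people.txt
```

Note that a bare pattern matches anywhere in the line, so this prints "Coleman Butler" as well as "Cole Carlton" — which is exactly the over-matching problem discussed later in the thread.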
How about
Code:
awk -v keyfile=path/to/small/file '
BEGIN {
    while ((getline line < keyfile) > 0) {
        split(line, f)
        key[f[1]] = 1
    }
    close(keyfile)
}
$1 in key
' path/to/large/file
The actual rule on the last line reads: if the first field matches a key in the key array, then print the record. (You can omit the implicit { print } for the last rule.) Essentially, the above reads the first fields in the small file, then outputs the records (lines) of the large file only if the first field matches one of those read from the small file. |
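The awk approach can be tried end-to-end with made-up data (the file contents here are hypothetical, not the thread's originals):

```shell
# Hypothetical sample data.
printf '%s\n' 'Cole' > keys.txt
printf '%s\n' 'Bergeron Denise' 'Cole Carlton' 'Coleman Butler' > people.txt

awk -v keyfile=keys.txt '
BEGIN {
    # Read each line of the small file and remember its first field.
    while ((getline line < keyfile) > 0) {
        split(line, f)
        key[f[1]] = 1
    }
    close(keyfile)
}
$1 in key   # implicit { print } runs for matching records
' people.txt
```

Because the comparison is against the whole first field, "Coleman Butler" is not printed, unlike the unanchored grep.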
Code:
$ join keys.txt people.txt |
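Since both files are already sorted (as the original post states), join(1) can be applied directly; a hypothetical run:

```shell
# Hypothetical sample data; join requires both inputs sorted on the join field.
printf '%s\n' 'Cole' > keys.txt
printf '%s\n' 'Bergeron Denise' 'Cole Carlton' 'Coleman Butler' > people.txt

# By default join pairs lines whose first fields are equal and prints the
# join field followed by the remaining fields from each file's line.
join keys.txt people.txt
```

One caveat: join compares whole fields, so "Coleman Butler" is excluded, and a key that occurs on several lines of the large file produces one output line per occurrence.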
Quote:
Code:
$ cat keys.txt
Daniel B. Martin |
Those are regular expressions (anchors). The meanings are: ^ matches at the beginning of the line, and \> matches at the end of a word.
If you do not use the latter, you'll also match names like "Coleman Butler". |
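The effect of the anchors can be seen with a hypothetical example (\> is a GNU grep extension, so this assumes GNU grep):

```shell
# Hypothetical sample data.
printf '%s\n' 'Bergeron Denise' 'Cole Carlton' 'Coleman Butler' > people.txt

# Unanchored: the substring "Cole" also matches inside "Coleman".
grep 'Cole' people.txt

# ^ pins the match to the start of the line; \> (a GNU extension) requires
# the end of a word right after "Cole", so "Coleman Butler" is excluded.
grep '^Cole\>' people.txt
```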
Quote:
I try to learn from tutorials and Google searches. Even knowing \> I found no mention of it anywhere. Help me to help myself -- where could I have found this on my own? Daniel B. Martin |
In this case, you can view the man page for grep(1), which has a paragraph on this:
Code:
Anchoring
       The caret ^ and the dollar sign $ are metacharacters that respectively
       match the empty string at the beginning and end of a line.
|