extract substring using sed and regular expressions (regexp)
I am having the same problem... sort of. I want to extract a combination of characters if it exists. If it doesn't exist, I want nothing. My problem is that if my pattern doesn't exist, I get the whole line returned.
If I have .....AA9999999999999999....., I want AA9999999999999999
If I have ............................, I want nothing.
Here AA9999999999999999 stands for 2 capital alphas followed by 16 numerics.
I use 's/.*\(AA[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\).*/\1/'
because \{16\} as a repeater doesn't work.
Show examples of your data and exactly what you want to get.
For the purpose of the question, these are examples, of both the data and the regex. I will go off and try 'print' and maybe grep, but I haven't yet found how to extract data using grep.
Where '.' represents any character, I want to extract only the characters that fit the pattern: AA followed by 16 numerics. Any digit in these positions is a match, and the pattern could exist anywhere in the line. If this exact pattern is not found, then output nothing.
The above suggested solution actually worked for me.
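In sed's default basic regular expressions the repeater is written \{16\}, with the backslashes. In extended regular expressions ("sed -r", or "grep -E"/egrep) it is written {16}, and on the plus side you don't have to escape the parentheses or braces either. The whole line comes back for a different reason: the s command leaves a non-matching line unchanged and sed prints every line by default, so you need -n together with the p flag to print only the lines where the substitution happened. Also, it's recommended to use the POSIX character classes for the standard ranges of characters. Either of the following should work: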
Code:
sed -rn 's/.*(AA[[:digit:]]{16}).*/\1/p'
egrep -o 'AA[[:digit:]]{16}'
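To see how these commands treat a matching and a non-matching line, here is a minimal sketch. The sample lines and the function names (sample, extract_default, extract_quiet, extract_grep) are made up for illustration. It uses the basic-regex interval \{16\}, which POSIX sed accepts; if your sed rejects it, the spelled-out sixteen [0-9] form from the question behaves the same way.

```shell
#!/bin/sh
# Made-up sample input: the first line contains the pattern, the second does not.
sample() {
    printf '%s\n' 'xxxxxAA1234567890123456xxxxx' 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'
}

# Plain s command: sed prints every line, changed or not,
# so the non-matching line comes back whole.
extract_default() {
    sed 's/.*\(AA[0-9]\{16\}\).*/\1/'
}

# -n turns off automatic printing; the p flag prints a line
# only when the substitution actually happened on it.
extract_quiet() {
    sed -n 's/.*\(AA[0-9]\{16\}\).*/\1/p'
}

# grep -o prints only the matched text and skips non-matching lines.
extract_grep() {
    grep -Eo 'AA[[:digit:]]{16}'
}

sample | extract_default   # two lines: the code, then the untouched second line
sample | extract_quiet     # one line: the code
sample | extract_grep      # one line: the code
```

One difference worth knowing: because of the leading greedy .*, the sed form prints at most one code per line (the last one on the line), while grep -o prints every match on its own line.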
Note that there are a couple of weaknesses in the above, though they may or may not be a concern for you. First, it will also match AA followed by a run of more than 16 digits, but only print the first 16. Second, it will match any combination of numerals, meaning something like AA1234567890123456 will also match. I'm not sure what you'd need to do if you need to isolate only a single repeating number.
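If the first weakness matters, one possible way around it is to require that the 16th digit be followed by a non-digit or by the end of the line. This is only a sketch: the function name extract_exact and the sample inputs are invented here, and -E is the spelling of -r that both GNU and BSD sed accept.

```shell
#!/bin/sh
# Print the code only when AA is followed by exactly 16 digits, i.e. the
# 16th digit is followed by a non-digit or by the end of the line.
extract_exact() {
    sed -nE 's/.*(AA[0-9]{16})([^0-9].*)?$/\1/p'
}

# Made-up inputs: 16 digits, 20 digits, and 16 digits at the end of the line.
printf '%s\n' 'xxAA1234567890123456xx'     | extract_exact   # prints the code
printf '%s\n' 'xxAA12345678901234567890xx' | extract_exact   # prints nothing
printf '%s\n' 'xxAA1234567890123456'       | extract_exact   # prints the code
```

The optional second group swallows the rest of the line when a non-digit follows the digits; when nothing follows, the $ anchor alone has to match. As with the earlier sed command, at most one code per line is printed.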