Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I was trying to think of a way to use scriptutil.freplace to delete lines that don't match a pattern. For the moment I've had to settle for a short if/then statement. However, I feel I would be a lot better off if I knew how to do a "does not match" search.
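The thread doesn't show scriptutil.freplace's exact interface, so this sketch uses only Python's standard re module. Deleting the lines that don't match a pattern is the same as keeping the ones that do, so the "does not match" case is just a failed regex.search(line). The helper name drop_non_matching and the sample lines are hypothetical, for illustration only.

```python
import re

def drop_non_matching(lines, pattern):
    """Hypothetical helper: return only the lines that match pattern.

    Lines where regex.search() finds nothing are dropped, which is the
    "delete lines that don't match" behaviour asked about.
    """
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Made-up sample input: keep only lines that start with a six-digit field.
sample = [
    "123456,1234AB,12,x,y,7,z,w",
    "garbage line",
    "654321,4321,3,a,b,9,c,d",
]
print(drop_non_matching(sample, r"^[0-9]{6},"))
# ['123456,1234AB,12,x,y,7,z,w', '654321,4321,3,a,b,9,c,d']
```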
I'm having a small problem with parentheses and brackets.
I have: ([0-9]{6}),([0-9]{4}[A-Z]{0,2}),([0-9]{1,3}),(.*?),(.*?),([0-9]{1,3}),(.*?),(.*)
I would like to eliminate lines that don't fit this pattern, for example those that don't have all eight comma-delimited fields.
What about something like
[^(.*?,)(.*?,)(.*?,)(.*?,)(.*?,)(.*?,)(.*?,)(.*?,)(.*?)] ?
Or how would I include the fields I've already specified:
([0-9]{6}),([0-9]{4}[A-Z]{0,2}),([0-9]{1,3}), etc.?
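One way to require exactly eight fields while keeping the field types already specified: change each free-text (.*?) to ([^,]*), which cannot swallow a comma, and test the whole line with re.fullmatch so nothing may come before or after the eight fields. This is a sketch; RECORD, is_valid_record and the sample lines are made up for illustration.

```python
import re

# The poster's field layout, with the free-text fields changed from (.*?)
# to ([^,]*) so that no field can contain a comma.
RECORD = re.compile(
    r"([0-9]{6}),([0-9]{4}[A-Z]{0,2}),([0-9]{1,3}),"
    r"([^,]*),([^,]*),([0-9]{1,3}),([^,]*),([^,]*)"
)

def is_valid_record(line):
    # fullmatch anchors the pattern at both ends, so a missing or extra
    # field makes the whole line fail.
    return RECORD.fullmatch(line) is not None

lines = [
    "123456,1234AB,12,foo,bar,7,baz,qux",    # eight fields: kept
    "123456,1234AB,12,foo,bar,7,baz",        # seven fields: rejected
    "123456,1234AB,12,foo,bar,7,baz,qux,x",  # nine fields: rejected
]
print([line for line in lines if is_valid_record(line)])
# ['123456,1234AB,12,foo,bar,7,baz,qux']
```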
You don't have to make it that complicated if you are using Python. Show samples of the text you want to match, and describe more clearly what you want to get.
Specifically, I don't want lines that don't have eight different fields separated by commas, or possibly lines where the sixth field is not a number.
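In line with the suggestion that Python doesn't need a complicated regex here, a plain str.split check covers both conditions. A minimal sketch: keep_line is a hypothetical name, and it assumes fields never contain quoted commas (if they can, the standard csv module is the safer way to split them).

```python
def keep_line(line):
    """Hypothetical filter: exactly eight comma-separated fields and a numeric sixth field."""
    fields = line.rstrip("\n").split(",")
    # fields[5] is the sixth field. str.isdigit() is False for an empty string
    # and True only if every character is a digit (it also accepts non-ASCII
    # digits such as '²').
    return len(fields) == 8 and fields[5].isdigit()

# Made-up sample lines, as they might come from reading a file.
sample = [
    "123456,1234AB,12,foo,bar,7,baz,qux\n",      # kept
    "123456,1234AB,12,foo,bar,seven,baz,qux\n",  # sixth field not a number
    "123456,1234AB,12,foo,bar,7,baz\n",          # only seven fields
]
print([line for line in sample if keep_line(line)])
```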
I think knowing how to say "does not match foo", something like [^foo], could be helpful.
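In regex syntax, [^foo] is a character class: it matches one single character that is not f or o, so it can't express "not the word foo". Python's re module supports a negative lookahead, (?!...), for that. A small sketch; the sample strings and the name no_foo are illustrative.

```python
import re

# [^foo] matches ONE character that is not 'f' or 'o';
# it does not mean "not the string foo".
print(re.findall(r"[^foo]", "food"))  # ['d']

# "A line that does not contain foo": a negative lookahead at the start.
no_foo = re.compile(r"^(?!.*foo)")
lines = ["has foo in it", "clean line", "foobar"]
print([line for line in lines if no_foo.match(line)])  # ['clean line']
```

The same idea wraps any pattern: ^(?!(?:PATTERN)$) matches lines that do not match PATTERN as a whole.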
I'm also curious: if I specify seven fields,
(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?) and replace with \1\2\3\4\5\6, why is the 7th field tacked on? To delete the 7th field I used the back references to insert foo between \6 and \7 and then deleted whatever came after foo. This seems a bit unnecessary, and I don't remember scriptutil always behaving this way.
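The extra text most likely comes from the last lazy group. Here is a sketch with re.sub, assuming the substitution is applied to one line at a time (how scriptutil feeds text to the regex isn't shown in the thread). With nothing after it, the final (.*?) is satisfied by an empty match, so the match stops right after the sixth comma; the 7th field is never consumed and is left in the output untouched. Anchoring the pattern with $ forces that group to take the rest of the line. The sample line is made up.

```python
import re

line = "a,b,c,d,e,f,g"

# Seven lazy groups: the last (.*?) can match the empty string, so the
# match ends right after the sixth comma and "g" is never consumed.
lazy = r"(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?)"
print(re.sub(lazy, r"\1\2\3\4\5\6", line))      # abcdefg

# Anchoring the end with $ makes the last group consume the rest of the line,
# so the replacement really drops the 7th field.
anchored = r"(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?)$"
print(re.sub(anchored, r"\1\2\3\4\5\6", line))  # abcdef
```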
* and ? are both modifiers. (.*),(.*) is more accurate, as * is the Kleene star: it means 0 or more, which kind of implies it's optional (?). Correct me if I am wrong...
If you don't have or use Eclipse, download the stand-alone application. You need Java installed to run it, and possibly the Java bin/ directory added to your PATH.
If you get it running, then comes the fun part: paste some test lines of your input data and type a regular expression into the corresponding text field; it immediately shows you, in real time, whether it matches, what the groups are, etc. It works great for me, and the "JDK regexp" seems to match Python's regexp behavior completely.
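If setting up Java is a hassle, a few lines of Python give a rough stand-in for that kind of tester, with the advantage of exercising Python's own re engine rather than Java's. This is only a sketch; show_matches and the test data are made up for illustration.

```python
import re

def show_matches(pattern, test_lines):
    """Print and return, for each test line, its captured groups or None.

    Uses re.fullmatch, so a line counts as a match only if the whole line fits.
    """
    regex = re.compile(pattern)
    results = []
    for line in test_lines:
        m = regex.fullmatch(line)
        groups = m.groups() if m is not None else None
        label = "MATCH" if m is not None else "NO MATCH"
        print(f"{label:8} {line!r} -> {groups}")
        results.append(groups)
    return results

show_matches(r"([0-9]{6}),([^,]*)", ["123456,abc", "12345,abc", "123456,a,b"])
```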
It's my understanding that the '?' makes the match non-greedy: it limits the match to the first occurrence of whatever follows. If there is a comma, (.*?) stops at the first one and goes no further, whereas .* can take in everything up to the last comma it can find, other commas included.
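That is how Python's re module behaves. A small sketch on a made-up three-field string shows the difference between (.*), (.*?) and a negated class such as [^,]*, which sidesteps greediness altogether.

```python
import re

line = "a,b,c"

# Greedy: (.*) grabs as much as it can, then backs off to the LAST comma.
greedy = re.match(r"(.*),(.*)", line).groups()
# Non-greedy: (.*?) takes as little as it can, stopping at the FIRST comma.
lazy = re.match(r"(.*?),(.*)", line).groups()
# A negated class never crosses a comma, whatever the greediness.
no_comma = re.match(r"([^,]*),([^,]*)", line).groups()

print(greedy)    # ('a,b', 'c')
print(lazy)      # ('a', 'b,c')
print(no_comma)  # ('a', 'b')
```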