LinuxQuestions.org

GrumpyGnome 02-26-2017 09:42 AM

How can I do regex recursive file searches that include LF chars?
 
Hello,
I need to use a regex that includes LF to locate some files in a large dir/file set; recursive searches are required. My understanding of grep is that it will not search across line feeds; grep only matches a pattern within a single line, although it will report a match found on any line in a file. Languages like Perl and awk look workable but I don't know them yet.

Is a short program the best I can do or is there a command I have not learned/found that will recursively search files for a regex that includes LF?

like:
\x0A\x09Material\x20\x20\x0A\x7B

in text it would look like
LF
TABMaterialsSPCSPCLF
{

Happy Trails

HMW 02-26-2017 11:34 AM

It would be helpful if you provided an exact sample of the file (please use proper formatting), and exactly what it is you want to match with your regular expression.

GrumpyGnome 02-27-2017 11:52 AM

Need a hex editor to search 50GiB dd images
 
Thanks for the reply. The search string in my post is as exact as I can get. I have suspect files but am not sure if the string is in any of them. Otherwise I would provide one.

It's searching for a generalized hex sequence that I need. It seems that there isn't a Linux command or typical tool that will do this, so I should look for a hex editor that supports regex in some form.

This last thought gave me the idea to dd the directory tree to an image and then use a hex editor to search for a hex string and not use a regex or file tool.

Does anyone know of a stable hex editor that can handle 50GiB files?

My first choice is for Slackware 14, but I can do the basic configure and make process.

Happy Trails

rtmistler 02-27-2017 12:07 PM

Sounds like you need help learning awk or Perl. I recommend awk, and I usually learn by doing. You already have an existing regex, and awk uses regexes, so try to write an awk script that suits your needs. Have it first find the sequences you need, and then change or process them, based on what the next step is.

Suggest you choose which language or tool you wish to use and then start researching them.

Once you've started, you can update this thread with information indicating where you're stuck with the particular option you've chosen. Also from that point people can offer you better recommendations.
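
As a starting point for the awk route, here is a minimal sketch, not a definitive solution. It assumes the target is a line consisting of TAB, "Material" or "Materials", and two spaces, followed by a line that starts with "{". The demo directory and file names are made up for illustration. Because awk reads one line at a time, it does not need to hold a whole file in memory.

```shell
#!/bin/sh
# Hypothetical demo tree; file names and contents are invented for illustration.
demo=$(mktemp -d)
printf 'header\n\tMaterial  \n{\n' > "$demo/hit.txt"
printf 'header\n\tMaterial  \nnot a brace\n' > "$demo/miss.txt"

# For each file, remember the previous line. When the current line starts
# with "{" and the previous line was TAB + Material(s) + two spaces,
# print the file name once and stop reading that file.
awk_matches=$(find "$demo" -type f -exec awk '
    prev ~ /^\tMaterials?  $/ && /^[{]/ { print FILENAME; exit }
    { prev = $0 }
' {} \;)

echo "$awk_matches"
rm -rf "$demo"
```

To search a real tree, replace "$demo" with the directory to scan and drop the lines that create and remove the sample files.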

rtmistler 02-27-2017 12:11 PM

Also, GNU Emacs will handle the files and can show you hex and ASCII output; however, what you should decide first is whether you want an editor or a search tool. I'm not sure whether gedit will work similarly in this mode. vi is also a very capable editor. You should cite which editors you have tried.

syg00 02-27-2017 05:41 PM

Searching hex is easy - simple searching across lines is harder.
Perl (and perl mode in grep) can do it simply, but will slurp the entire file - not a good idea for 50Gig (plus ?) files.

Strictly speaking you don't care about the first \n - start by searching for lines beginning with "\tMat" and check the next line. I see mgrep on SourceForge that should do for a simple sequence like that - the homepage even has an applicable example.
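
For reference, the "perl mode in grep" approach can be sketched as below, assuming GNU grep built with PCRE support. The -z option makes grep use NUL rather than LF as the line terminator, so a text file without NUL bytes is handled as one long record and \n can appear in the pattern. That is also why it reads such a file into memory whole. The demo tree is made up, and "Materials?" is an assumption that covers both spellings used in this thread.

```shell
#!/bin/sh
# Hypothetical demo tree; file names and contents are invented for illustration.
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'header\n\tMaterial  \n{\n' > "$demo/sub/hit.txt"
printf 'header\n\tMaterial  \nnot a brace\n' > "$demo/miss.txt"

# -r  recurse into directories
# -l  print only the names of files that match
# -P  use Perl-compatible regular expressions (so \n and \t work)
# -z  treat NUL, not LF, as the line terminator
grep_matches=$(grep -rlPz '\n\tMaterials?  \n\{' "$demo")

echo "$grep_matches"
rm -rf "$demo"
```

This is convenient for ordinary text files, but for very large inputs a line-at-a-time tool is the safer choice.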

syg00 02-27-2017 07:59 PM

Thinking about this whilst walking the mutt, sed should work ...
Code:

find /some/directory -type f -exec sed -n '/\tMaterials  / {n; /^{/ F}' {} \;
This prints only the file name. Note that the F command is a GNU sed extension.
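
To try the idea on sample data before pointing it at a large tree, a small test harness might look like the sketch below. It assumes GNU sed. It is a variant of the command above, not the same command: the first pattern is anchored with ^ and $ so that only a line of exactly TAB, "Materials", and two spaces qualifies, and q stops reading a file after the first hit so each name is printed once. The demo files are made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo tree; file names and contents are invented for illustration.
demo=$(mktemp -d)
printf 'header\n\tMaterials  \n{\n' > "$demo/hit.txt"
printf 'header\n\tMaterials  \nnot a brace\n' > "$demo/miss.txt"

# On a line that is exactly TAB + "Materials" + two spaces, load the next
# line (n); if that line starts with "{", print the file name (F, a GNU
# extension) and quit (q) so the rest of the file is not read.
sed_matches=$(find "$demo" -type f -exec sed -n '/^\tMaterials  $/ {n; /^{/ {F; q}}' {} \;)

echo "$sed_matches"
rm -rf "$demo"
```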

GrumpyGnome 03-02-2017 07:39 PM

Thanks all, I got busy with work but will work this over the weekend. I'll post back with what I learn and am able to make work.
An awk intro sounds like fun just because I haven't tried it before and I last used Perl, Ruby, and Java 10-20 years ago.
I found an editor named wxHexEditor. It supports files up to 2^64 bytes. It is a beta release.
Happy Trails


All times are GMT -5.