AWK: How to display how array elements are found in data file?

starrysky1 · 05-02-2018, 01:55 PM

Hello and thanks for reading my post, I really appreciate it.

I need a way to read the arrayfile and search the datafile, printing out how each arrayfile element is found. An awk one-liner is preferred.

arrayfile:

Code:

haha
hahh

datafile:

Code:

aaahahahaaaahahahhaaahahahaaahahaaaa

desiredoutputfile:

Code:

haha 1110111
hahh 0001000

So basically: remove the last character from each arrayfile element and search the datafile for what remains. At each match, if the next character is the same as the one that was removed, write a 1; if not, write a 0. The search also needs to be done in the most squeezed way possible (starting from every character in the datafile, not by blocks like a normal search in a text editor), as portrayed in my previous post: https://www.linuxquestions.org/quest...98#post5847798.
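To make the rule above concrete, here is a minimal sketch that traces a single pattern against the sample data. It is only an illustration of the description, not a posted solution; the variable names (`data`, `pat`, `prefix`, `last`, `trace`) are made up for the example.

```shell
# Trace every position where the shortened pattern ("hah") occurs in the data,
# and report whether the following character equals the removed one ("a").
data='aaahahahaaaahahahhaaahahahaaahahaaaa'
pattern='haha'

trace=$(awk -v data="$data" -v pat="$pattern" 'BEGIN {
  n      = length(pat)
  prefix = substr(pat, 1, n - 1)   # pattern minus its last character
  last   = substr(pat, n, 1)       # the removed character
  # Step one character at a time so overlapping matches are all seen.
  for (k = 1; k + n - 1 <= length(data); k++) {
    if (substr(data, k, n - 1) == prefix) {
      nextch = substr(data, k + n - 1, 1)
      printf "pos=%d next=%s bit=%d\n", k, nextch, (nextch == last)
    }
  }
}')
printf '%s\n' "$trace"
```

Reading the `bit=` column from top to bottom gives the string requested for `haha` in the desired output.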

Thank you infinitely.

TB0ne · 05-02-2018, 02:39 PM

Read the "Question Guidelines" link in my posting signature. We're happy to help, but just telling us what you want without showing your own efforts to solve your own problem isn't a good thing. This is something you've done in many previous threads as well.

Post what you have written/done/tried so far.

grail · 05-02-2018, 11:48 PM

Not only do I agree 100% with TB0ne, but how is this any different from the linked previous question? What attempts have you made to alter the previously suggested solution to allow for whatever seems to be your new criteria?

danielbmartin · 05-03-2018, 12:55 PM

To echo grail and TB0ne, if you show an earnest effort we will help. When you comply I will post a tested-and-working solution here.

Daniel B. Martin

starrysky1 · 05-04-2018, 07:14 PM

Finally!!! I got it down! Solution is:

Code:

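# Run with the data file first, then the pattern file.
{
  # remove spaces
  gsub(/[ ]+/, "", $0)
}

# First file: store the data line
NR==FNR {
  data = $0
  data_len = length(data)
  next
}

# Second file: one pattern per line
{
  pattern = $0
  pattern_len = length(pattern)
  printf("%s\t", pattern)
  for (j=1; j+pattern_len-1 <= data_len; j++) {
    # literal comparison against the pattern minus its last character
    if (substr(data, j, pattern_len-1) == substr(pattern, 1, pattern_len-1)) {
      # does the next character equal the removed one?
      if (substr(data, j + pattern_len - 1, 1) == substr(pattern, pattern_len, 1)) {
        result = 1
      }
      else {
        result = 0
      }
      printf("%s", result)
    }
  }
  print ""
}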
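A note on the comparison operator in a scan like this: in awk, `~` treats its right-hand operand as a regular expression, while `==` on two strings compares them literally. For letter-only patterns such as `haha`, compared against slices of the same length, the two agree, but a pattern containing a metacharacter such as `.` would behave differently. A minimal sketch, with made-up values (`hxh`, `h.h`) chosen only to show the difference:

```shell
# Compare a data slice against a pattern slice two ways.
# "hxh" is not literally equal to "h.h", but it does match it as a regex.
result=$(awk 'BEGIN {
  slice   = "hxh"
  pattern = "h.h"
  print "regex (~): "    ((slice ~ pattern)  ? "yes" : "no")
  print "literal (==): " ((slice == pattern) ? "yes" : "no")
}')
printf '%s\n' "$result"
```

If the patterns are meant to be plain text rather than regular expressions, the literal comparison is the safer choice.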

danielbmartin · 05-05-2018, 11:25 AM

With this InFile1 ...

Code:

aaahahahaaaahahahhaaahahahaaahahaaaa

... and this InFile2 ...

Code:

haha
hahh

... this awk ...

Code:

# df = datafile
# sc = short candidate
# OL = Output Line
echo; echo "Method #1 of LQ Member danielbmartin."
awk '{if (NR==FNR) df=$0
      else {OL=""; sc=substr($0,1,length($0)-1)
            for (k=1;k<=length(df);k++)
              {if ($0==substr(df,k,length($0))) OL=OL"1"
               else if (sc==substr(df,k,length(sc))) OL=OL"0"}
            print $0,OL}}' "$InFile1" "$InFile2" >"$OutFile"

... produced this OutFile ...

Code:

haha 1110111
hahh 0001000
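For comparison, the same scan can also be sketched with awk's `index()` function, which jumps straight to the next occurrence of the shortened pattern instead of testing every position. This is a variant for illustration, not either poster's method; the temporary directory, the file names inside it, and the variable names are assumptions of the example, and patterns shorter than two characters are simply skipped.

```shell
# Build the two sample files from the thread in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'aaahahahaaaahahahhaaahahahaaahahaaaa' > "$workdir/datafile"
printf '%s\n' 'haha' 'hahh' > "$workdir/arrayfile"

output=$(awk '
NR == FNR { data = $0; next }      # first file: remember the data line
{
  n = length($0)
  if (n < 2) next                  # assumption: ignore one-character patterns
  prefix = substr($0, 1, n - 1)    # pattern minus its last character
  last   = substr($0, n, 1)        # the removed character
  bits = ""
  pos  = 0                         # absolute offset already consumed
  rest = data
  while ((i = index(rest, prefix)) > 0) {
    at   = pos + i                 # absolute position of this hit
    bits = bits (substr(data, at + n - 1, 1) == last ? "1" : "0")
    rest = substr(rest, i + 1)     # advance one character so overlaps are kept
    pos  = at
  }
  print $0, bits
}' "$workdir/datafile" "$workdir/arrayfile")
printf '%s\n' "$output"
rm -rf "$workdir"
```

Advancing `rest` by a single character after each hit is what keeps overlapping matches, which is the "squeezed" search the original question asks for.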

Daniel B. Martin

starrysky1 · 05-05-2018, 05:57 PM

Awesome!! Extremely grateful!!