LinuxQuestions.org
Old 02-26-2018, 12:42 PM   #16
rtmistler
Moderator
 
Registered: Mar 2011
Location: MA, USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 6,737
Blog Entries: 12

Rep: Reputation: 2390

Quote:
Originally Posted by BW-userx View Post
perhaps even post a chunk of real data so one knows what you're really working with here.
I believe they did in their original question. Please check that and see if it shows what you are looking for. They also posted their attempt there.
 
Old 02-26-2018, 12:45 PM   #17
bishnumnnit2006
LQ Newbie
 
Registered: Feb 2018
Posts: 16

Original Poster
Rep: Reputation: Disabled
The data looks like the sample below. The first substring in column 4 is the client code, which may match a code in the lookup file.

6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Generic Accounts|DE-IAS-AR8F92001T2-C
 
Old 02-26-2018, 12:59 PM   #18
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 10,899

Rep: Reputation: 3242
Quote:
Originally Posted by bishnumnnit2006 View Post
@pan64 code is not static, we can get any type of code. There are around 20 client codes, so we have a single file with multiple client codes. Lookup files can contain any code randomly, so our code should be smart enough to handle those.
This information is meaningless to me. At least I don't understand it at all.
What do you mean by "not static"?
How is this information related to the grep I suggested?
What does "smart enough" mean? Which "those" should be handled?

What I see is that you want to filter lines from a file by keys (with the keys listed in another file). This is exactly a task for grep. If that really does not fit, you can try awk/perl/python/whatever.

Furthermore, I have already suggested some improvements to your script. Did you try them?
 
Old 02-26-2018, 01:25 PM   #19
rtmistler
Moderator
 
Registered: Mar 2011
Location: MA, USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 6,737
Blog Entries: 12

Rep: Reputation: 2390
Per BW-userx's point as well as pan64's points.

Perhaps what you should do is attack this in smaller scale and then determine where, if any, performance bottlenecks reside.

My understanding is that you have some number of data files containing these records, as shown in your last post:
Quote:
Originally Posted by bishnumnnit2006 View Post
6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Generic Accounts|DE-IAS-AR8F92001T2-C
And that you wish to use one or more input files to tell you which client to screen for.

Perhaps it is better to use one file with multiple lines. Example:
Code:
AAR-
ABC-
TRP-
TIA-
DE-
And then write a script where the first argument is the name of a data file and the second argument is the "client prefix" search list.

In the script, search the main data file for all instances of each individual line from the client prefix list and channel the output of each search to a different destination file. You can use the client prefix term as a way to make the file name, "<prefix>sorted.txt" ==> ABC-sorted.txt. (Merely a suggestion)

Do this with the limited data set you have provided.

I feel it will be a pretty quick and efficient script, using grep and I/O redirection, none of awk, sed, or other.
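
A minimal sketch of that idea, assuming a prefix list with one entry per line as in the example above. The names data.txt, prefixes.txt and split_by_prefix are made up for illustration, and matching on "|<prefix>" is an assumption meant to tie the prefix to the start of a field rather than anywhere in the line.

```shell
#!/bin/bash
# Sketch only: data.txt, prefixes.txt and split_by_prefix are hypothetical names.

# Write every record containing "|<prefix>" to "<prefix>sorted.txt".
split_by_prefix() {
    data=$1
    prefixes=$2
    while IFS= read -r prefix; do
        [ -n "$prefix" ] || continue      # skip blank lines in the list
        # -F treats the pattern as a fixed string; the leading "|" pins the
        # prefix to the start of a field instead of anywhere in the line.
        grep -F -- "|${prefix}" "$data" > "${prefix}sorted.txt"
    done < "$prefixes"
    return 0                              # an empty result file is not an error
}

# Demo on the sample records from this thread, in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > data.txt <<'EOF'
6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic Accounts|DE-IAS-AR8F92001T2-C
EOF
printf '%s\n' AAR- ABC- TRP- TIA- DE- > prefixes.txt

split_by_prefix data.txt prefixes.txt
cat ABC-sorted.txt
```

This reads the data file once per prefix, which is fine for a handful of prefixes; it would still also match a second or third field that happens to start with the prefix.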

Then consider using that on a larger data set, and updating members with any performance issues you've noticed.

A very large file ... is a very large file. For a job like this, where a tool such as grep does the scanning, a Bash script is going to be about as fast as C code, or Python code, or whatever; however, if you try to do too much for every entry, this will increase the processing time. Ergo, a recommendation is to do it fast, do it quick, do it plain and dirty - to say this inelegantly. I've many times tied myself up with trying to be "too elegant" when I didn't need to. Many times these days I start with what I call "brute force" and then improve from there.

Sorry, but from what you've shown, it doesn't appear to be any more complex than what I've talked about here. Are we missing something?
 
Old 02-26-2018, 01:34 PM   #20
bishnumnnit2006
LQ Newbie
 
Registered: Feb 2018
Posts: 16

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 View Post
this information is meaningless. At least I don't understand it at all.
What do you mean by "not static"?
How are these information related to the grep I suggested?
what does "smart enough" mean? What "those" should be handled?

What I see you want to filter out lines by keys from a file (keys listed in another file). This is exactly a task for grep. If that really will not fit you can try to use awk/perl/python/whatever.

Furthermore I have already suggested some improvement to your script. Did you try that?

With grep you would be looking only for the codes you already know. I don't want to hardcode anything, because I am not sure what I am going to get: users will send those codes, so we don't know in advance what a code can be. Our script therefore reads the lookup files, processes the codes line by line, and compares them with the codes we get from the main source file. Hope it makes sense now.
 
Old 02-26-2018, 01:41 PM   #21
bishnumnnit2006
LQ Newbie
 
Registered: Feb 2018
Posts: 16

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by rtmistler View Post
Per BW-userx's point as well as pan64's points.



No no. In short, I have 2 lookup files which contain only client codes. I will be reading those client codes from the lookup files and also reading the client code from the main files.

Then I compare the client codes: if a client code from the first lookup file matches the main file, those records are sent to one target file, and if a client code from the 2nd lookup file matches the main file, they are sent to the other target file.
 
Old 02-26-2018, 01:48 PM   #22
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: MID-SOUTH USA
Distribution: Slackware 14.2 / Slackware 14.2 current / Manjaro
Posts: 6,362

Rep: Reputation: 1231
I'm not completely understanding the logic behind this, but I take the client code at the start of the last field to be the pattern you're looking for. If one pattern matches, send the line to one file and then use that to search the second file; else if it matches a different pattern, send it to a different file and search that file.
Code:
ERUID|GroupID|GroupName|AcctID

6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Generic Accounts|DEF-IAS-AR8F92001T2-C
them all being unique
Code:
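#!/bin/bash

# Substrings (client codes) to look for.
sub1=AAR
sub2=ABC
sub3=TIA
sub4=TRP

# Read the main file line by line; -r keeps backslashes intact.
while read -r f ;
do
    echo "$f"

    # A quoted right-hand side makes =~ a literal substring test.
    if [[ "$f" =~ "$sub1" ]] ; then
        while read -r d ;
        do
            echo "$d"

            if [[ "$d" =~ "$sub2" ]] ; then
                echo "yep"
            fi
        done < secondSource
    elif [[ "$f" =~ "$sub3" ]] ; then
        while read -r g ;
        do
            echo "$g"

            if [[ "$g" =~ "$sub4" ]] ; then
                echo "ok"
            fi
        done < thirdSource
    fi
done < source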
just need to make sure the conditionals are set properly using the unique pattern as your substring to match off of.

the writing to a file is not there, just the ability to use sub-string matching.
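
One caveat with plain substring matching is that it fires wherever the code appears in the record. A hedged sketch of a stricter test, assuming the client code is everything before the first "-" in the last "|"-separated field. The record and variable names below are invented for illustration.

```shell
#!/bin/bash
# Sketch only: this record is made up to show a false positive.
line='6|ERU6_ARCGEN|TRP Archive|ABC-IAS-AR8F92001T2-C'

acct=${line##*|}      # drop everything up to the last "|"
code=${acct%%-*}      # drop everything from the first "-" onward

# Substring test (same result as [[ $line =~ "TRP" ]]): it also matches
# the "TRP" sitting in the third field.
case $line in
    *TRP*) echo "substring: TRP matches" ;;
esac

# Field test: compares only the extracted client code.
if [ "$code" = "TRP" ]; then
    echo "field: TRP matches"
else
    echo "field: TRP does not match"
fi
echo "client code is $code"
```

The two parameter expansions avoid starting cut or awk for every line, which matters inside a while read loop.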
 
Old 02-26-2018, 01:57 PM   #23
rtmistler
Moderator
 
Registered: Mar 2011
Location: MA, USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 6,737
Blog Entries: 12

Rep: Reputation: 2390
Quote:
Originally Posted by bishnumnnit2006 View Post
No no, in short, I have 2 look up files which will only contain the client codes, I will be reading those client codes from the lookup files and also reading the client code from the main files.

Now comparing those client code, if the client code from first lookup file matches with main file then send those record to one target and if the client code from 2nd lookup matches with main file then send it to other target file
I suggest you then learn by doing. I know you have tried; however, what you wrote seems to be overkill.

BW-userx gave you a great start. Recommend you start with that and modify it to suit your needs, such as opening the varied input files with the lists of client codes to search for.
 
Old 02-26-2018, 02:05 PM   #24
bishnumnnit2006
LQ Newbie
 
Registered: Feb 2018
Posts: 16

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by BW-userx View Post
I 'm not completely understating the complete logic behind this, but the bold being the pattern you're looking for. if one matches then send it to a different file then use that to search the second file, else if it matches a different pattern then send that to a different file and search that file

Thank you for the reply.

Let me explain the requirements in detail.

I have a source file, which is the main file, with "|"-delimited data like below. The first substring in the 4th column is the client code, like AAR, ABC, etc.

6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Generic Accounts|DE-IAS-AR8F92001T2-C


Now I have 2 lookup files, say lkp1.txt and lkp2.txt, which also contain multiple client codes and are space delimited.
The data can be as below.

cat lkp1.txt
AAN BWC TRP TIA

cat lkp2.txt
GHU TIO RAT

Now I have to read the codes from the lookup files above and match them with the codes from the main file at the top. Hope that clears it up.
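
Read that way, the requirement could be sketched in plain shell as follows. This is only an illustration under some assumptions: the client code is the text before the first "-" in the last field, and main.txt, target1.txt and target2.txt are invented names. For very large files a grep or awk approach is likely to be faster than a read loop.

```shell
#!/bin/bash
# Sketch only: main.txt, target1.txt and target2.txt are hypothetical names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > main.txt <<'EOF'
6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic Accounts|DE-IAS-AR8F92001T2-C
EOF
echo 'AAN BWC TRP TIA' > lkp1.txt
echo 'GHU TIO RAT' > lkp2.txt

# Load each lookup file as one space-padded string, e.g. " AAN BWC TRP TIA ",
# so that the pattern *" CODE "* only matches a whole code.
codes1=" $(tr -s '\n\t' '  ' < lkp1.txt) "
codes2=" $(tr -s '\n\t' '  ' < lkp2.txt) "

# Start with empty target files.
: > target1.txt
: > target2.txt

while IFS= read -r line; do
    acct=${line##*|}     # last field (field 4 in this layout)
    code=${acct%%-*}     # client code = text before the first "-"
    [ -n "$code" ] || continue
    case $codes1 in *" $code "*) printf '%s\n' "$line" >> target1.txt ;; esac
    case $codes2 in *" $code "*) printf '%s\n' "$line" >> target2.txt ;; esac
done < main.txt

cat target1.txt
```

With the sample data only TRP and TIA appear in lkp1.txt, so target1.txt receives those two records and target2.txt stays empty.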
 
Old 02-26-2018, 05:03 PM   #25
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: MID-SOUTH USA
Distribution: Slackware 14.2 / Slackware 14.2 current / Manjaro
Posts: 6,362

Rep: Reputation: 1231
this might get you started
Code:
#!/bin/bash

# Read the space-delimited patterns from the lookup file into an array.
holdvars=( $(< secondSource) )

# For every record in the data file, try each pattern one at a time.
while read -r d
do
    for pattern in "${holdvars[@]}"
    do
        # A quoted right-hand side makes =~ a literal substring test.
        if [[ $d =~ "$pattern" ]] ; then
            echo " $pattern :: $d "
            # Strip everything up to the last "|" to print the last field.
            echo "${d##*|}"
        fi
    done
done < source
source
Code:
ERUID|GroupID|GroupName|AcctID

6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C

6|ERU6_ARCGEN|Archive Generic Accounts|DEF-IAS-AR8F92001T2-C
secondSource
Code:
TRP ABC AAR TIA
results
Code:
$ ./3loopsearch
 AAR :: 6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C 
AAR-IAS-AR8F92001T2-C
 ABC :: 6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C 
ABC-IAS-AR8F92001T2-C
 TRP :: 6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C 
TRP-IAS-AR8F92001T2-C
 TIA :: 6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C 
TIA-IAS-AR8F92001T2-C

Last edited by BW-userx; 02-26-2018 at 05:10 PM.
 
Old 02-27-2018, 01:43 AM   #26
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 10,899

Rep: Reputation: 3242
Quote:
Originally Posted by bishnumnnit2006 View Post
By Grep you will be looking for only those code that you want
No, it is not true.
Grep can work from file, so it can use that lookup file to filter. see man grep, look for --file.
Quote:
Originally Posted by bishnumnnit2006 View Post
I don't want to hardcode anything as I am not sure what I am going to get, users will send those codes, so we don't actually know what that code can be, so our script is reading the look up files and process the code line by line and compare it with the codes that we are getting from main source file..hope it makes sense now
So there will be nothing hardcoded. Actually you can modify that lookup file to fit your needs, adding some extra filtering regexp, so your script would go like:
Code:
# to adjust lookup file
sed '<something important here>' lookup_file > filter_file
grep --file=filter_file input_file > output_file
# and repeat the same for the second lookup file
sed can implement requirements, like restrict filtering on the fourth field (if it was really needed) or add field separator or anything else you need.
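
As a rough illustration of that recipe (not necessarily the sed expression intended above), the sketch below assumes space-delimited lookup files named lkp1.txt and lkp2.txt as described earlier in the thread, a client code that ends at the first "-" of the fourth field, and invented names main.txt, filter*.txt, target*.txt and make_filter. It turns each code into a pattern anchored to field 4.

```shell
#!/bin/bash
# Sketch only: main.txt, filter*.txt, target*.txt and make_filter are
# hypothetical names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > main.txt <<'EOF'
6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic Accounts|DE-IAS-AR8F92001T2-C
EOF
echo 'AAN BWC TRP TIA' > lkp1.txt
echo 'GHU TIO RAT' > lkp2.txt

# Turn "AAN BWC ..." into one anchored pattern per line, for example
#   ^[^|]*|[^|]*|[^|]*|AAN-
# meaning: three "|"-terminated fields, then the code followed by "-".
make_filter() {
    tr -s ' ' '\n' < "$1" | sed -e '/^$/d' -e 's/.*/^[^|]*|[^|]*|[^|]*|&-/'
}

make_filter lkp1.txt > filter1.txt
make_filter lkp2.txt > filter2.txt

# In grep's default (basic) regular expressions "|" is an ordinary character.
grep -f filter1.txt main.txt > target1.txt
grep -f filter2.txt main.txt > target2.txt

cat target1.txt
```

grep exits with status 1 when nothing matches (as for lkp2.txt here), which is not an error in this context.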
 
Old 02-27-2018, 05:06 AM   #27
bishnumnnit2006
LQ Newbie
 
Registered: Feb 2018
Posts: 16

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by BW-userx View Post
this might get you started

Hey thanks a ton, it was really helpful, but can you help me in doing the same via AWK, like how to read the lookup files and main files via AWK and comparing both
 
Old 02-27-2018, 05:08 AM   #28
bishnumnnit2006
LQ Newbie
 
Registered: Feb 2018
Posts: 16

Original Poster
Rep: Reputation: Disabled
AWK is much faster than a while or for loop. I am very new to AWK and want to know how to read multiple lookup files from AWK, how to read the main source file via AWK, and how to compare them based on client code.
 
Old 02-27-2018, 05:24 AM   #29
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 10,899

Rep: Reputation: 3242
How do you know awk is faster?
What was posted by BW-userx is not perfect (in particular, it does not check that the match is in the fourth field). Why do you want to solve it using awk? Is this homework?
Using the following lookup file:
Code:
|TRP-
|ABC-
|AAR-
|TIA-
you will not need to implement anything, grep will do the job. Did you check it already? This will be much faster than awk.
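
For what it's worth, a minimal sketch of that, with the lookup file above saved as lookup.txt and the sample records as source.txt (both names, and matched.txt, are assumptions). The -F option makes grep treat each line of the lookup file as a fixed string rather than a regular expression.

```shell
#!/bin/bash
# Sketch only: lookup.txt, source.txt and matched.txt are hypothetical names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > source.txt <<'EOF'
ERUID|GroupID|GroupName|AcctID
6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic Accounts|DEF-IAS-AR8F92001T2-C
EOF
printf '%s\n' '|TRP-' '|ABC-' '|AAR-' '|TIA-' > lookup.txt

# -f reads one pattern per line from lookup.txt; a record is kept if it
# contains any of them.
grep -F -f lookup.txt source.txt > matched.txt
cat matched.txt
```

Records come out in the order they appear in source.txt, not in lookup order. A pattern such as "|TRP-" would also match a second or third field that starts with "TRP-".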

If you really want to use awk, check getline to read a file and split to break a line into words, and follow the logic of the script in post #25. You just need to use awk syntax instead of bash.
To process the lookup file you will need to use a BEGIN block.
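
A hedged sketch of that outline, wrapped in a shell script. It assumes the space-delimited lookup files lkp1.txt and lkp2.txt from earlier in the thread, a client code that ends at the first "-" of field 4, and invented names main.txt, target1.txt and target2.txt.

```shell
#!/bin/bash
# Sketch only: main.txt, target1.txt and target2.txt are hypothetical names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > main.txt <<'EOF'
6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic Accounts|DE-IAS-AR8F92001T2-C
EOF
echo 'AAN BWC TRP TIA' > lkp1.txt
echo 'GHU TIO RAT' > lkp2.txt

# Start with empty target files so both exist even without matches.
: > target1.txt
: > target2.txt

awk -F'|' -v lkp1=lkp1.txt -v lkp2=lkp2.txt '
BEGIN {
    # Load both lookup files; every whitespace-separated word is a client code
    # mapped to the target file its records should go to.
    while ((getline line < lkp1) > 0) {
        n = split(line, words, " ")
        for (i = 1; i <= n; i++) target[words[i]] = "target1.txt"
    }
    close(lkp1)
    while ((getline line < lkp2) > 0) {
        n = split(line, words, " ")
        for (i = 1; i <= n; i++) target[words[i]] = "target2.txt"
    }
    close(lkp2)
}
{
    # Client code = text before the first "-" in the fourth field.
    split($4, parts, "-")
    code = parts[1]
    if (code in target) {
        out = target[code]
        print > out
    }
}' main.txt

cat target1.txt
```

The main file is read once, and each record costs one array lookup regardless of how many codes the lookup files contain.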
 
Old 02-27-2018, 05:31 AM   #30
bishnumnnit2006
LQ Newbie
 
Registered: Feb 2018
Posts: 16

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 View Post
how do you know awk is faster?
What was posted by BW-userx is not perfect (especially did not check if it was the fourth field). Why do you want to solve it using awk? Is this a homework?
Using the following lookup file:
Code:
|TRP-
|ABC-
|AAR-
|TIA-
you will not need to implement anything, grep will do the job. Did you check it already? This will be much faster than awk.

If you really want to use awk, check readline to read a file, split and follow the logic of the script in post #25. Just you need to use awk syntax instead of bash.
To process lookup file you will need to use BEGIN block.

Hey, thanks for responding. Yes, BW-userx's code does not fulfill my requirement, but it gives me some logic to work from.

As you said grep will do the job, I would appreciate it if you could share a script with grep.
Thanks again for your help!!
 
  

