LinuxQuestions.org
Old 02-26-2018, 09:42 AM   #1
bishnumnnit2006
LQ Newbie
 
Registered: Feb 2018
Posts: 16

Rep: Reputation: Disabled
Splitting the source file by the client code (1st substring) in the 4th (AcctID) column


Hi everyone, I need your help with an issue I am facing.

i) I have a source file (Src.txt) with 4 columns. Column 4 holds substrings separated by '-', as shown below.
ii) The first substring (AAR, ABC, TRP, ...) in AcctID (column 4) is the client code.

ERUID|GroupID|GroupName|AcctID
6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic Accounts|DEF-IAS-AR8F92001T2-C

I also have lookup files (lkp1.txt and lkp2.txt) containing client codes separated by spaces, as shown below.

cat lkp1.txt
AAR ABC TIA

cat lkp2.txt
TRP

Now I want to read the lookup files and the main file and compare the client codes. The logic is:

1) From the main file, split the AcctID column into 5 substrings and take the first substring. This is the client code.
2) If the client code matches a client code from the lkp1.txt file, send the data to one target file.
3) If the client code matches a client code from the lkp2.txt file, send the data to the other target file.

I have developed the code, but it takes a long time to compare and split.
I know this can be done with AWK and that it is pretty fast, but I am new to awk. Please help with a solution.

Sharing my code below:


while read -r LINE
do
    CODE=`echo $LINE |awk -F '|' '{ split($4,a,"-"); print a[1]}'`
    while read -r LINE1
    do
        LKP_AMR1=`echo $LINE1`
        if [[ $CODE = $LKP_AMR1 ]]
        then
            `touch <Path>/<File_Name>
            `echo $LINE >> <Path>/<File_Name>
        fi
    done < ${XmrEtlSrcFiles}/lkp1.txt

    while read -r LINE2
    do
        LKP_AMR2=`echo $LINE2`
        if [[ $CODE = $LKP_AMR2 ]]
        then
            `touch <Path>/<File_Name>
            `echo $LINE >> <Path>/<File_Name>
        fi
    done < ${XmrEtlSrcFiles}/lkp2.txt
done < ${XmrEtlSrcFiles}/Src.txt
 
Old 02-26-2018, 09:47 AM   #2
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,882
Blog Entries: 13

Rep: Reputation: 4930
This is more of a programming question, so I am moving it to the Programming forum. A larger number of members who can help a great deal browse that forum.

Please edit your original post and put the code within [code][/code] tags to separate it from your text and preserve the formatting to make it easier to read.

Do you know what section of this is taking the most time where you wish to improve the performance?
 
Old 02-26-2018, 10:35 AM   #3
bishnumnnit2006
LQ Newbie
 
Registered: Feb 2018
Posts: 16

Original Poster
Rep: Reputation: Disabled
Actually, my source file contains millions of records, so reading the data in a while loop takes a long time. AWK is very fast at reading data, but I don't know much about AWK. Please help.
 
Old 02-26-2018, 11:02 AM   #4
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,727

Rep: Reputation: 2211
Please do edit your OP to use code tags.
Am I reading it right here:
Loop the big file, parse the compare string into $CODE
Loop the lkp1.txt file (1), see if there's a match,
If a match, write to the output file (2)
Loop the lkp2.txt file (1), see if there's a match
If a match, write to the output file (2)
???
My comments:
(1) Why loop the lookup files? If they are only one line each, just read them. I'd guess the loop requires two passes to find out it's at the end. Better still, read the values into an array and test against the array; that removes millions of disk I/Os.
(2) Why touch the output file before appending to it? The echo is going to update the date and create the file if it doesn't exist.
Those extra steps on "millions of records" will add considerably to processing time, I'd think.

Just some thoughts.
Re awk: What have you tried?
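
A rough sketch of the array idea from comments (1) and (2) above, assuming bash 4 or later (for associative arrays) and the sample data from the first post. The names target1.txt, target2.txt, load_lookup and the demo directory are made up for illustration; they are not from the original script. Each lookup file is read once, and the two output files are opened once for the whole loop instead of once per record.

```bash
#!/usr/bin/env bash
# Sketch only: target1.txt, target2.txt and the demo directory are made-up names.
demo=${TMPDIR:-/tmp}/lq_array_demo
mkdir -p "$demo" && cd "$demo" || exit 1

# Sample data copied from the first post.
cat > Src.txt <<'EOF'
ERUID|GroupID|GroupName|AcctID
6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic Accounts|DEF-IAS-AR8F92001T2-C
EOF
echo 'AAR ABC TIA' > lkp1.txt
echo 'TRP' > lkp2.txt

# Map each client code to the number of the target file it belongs to.
declare -A target
load_lookup() {                 # usage: load_lookup FILE TARGET_NUMBER
    local code codes
    while read -r -a codes; do  # split each line on whitespace
        for code in "${codes[@]}"; do
            target[$code]=$2
        done
    done < "$1"
}
load_lookup lkp1.txt 1
load_lookup lkp2.txt 2

# Single pass over the source file, no subprocess per record.
while IFS= read -r line; do
    acct=${line##*|}            # last (4th) column
    code=${acct%%-*}            # text before the first '-'
    [[ -n $code ]] || continue
    case ${target[$code]:-} in
        1) printf '%s\n' "$line" >&3 ;;
        2) printf '%s\n' "$line" >&4 ;;
    esac
done < Src.txt 3> target1.txt 4> target2.txt

echo "target1.txt:"; cat target1.txt
echo "target2.txt:"; cat target2.txt
```

With the sample data, the AAR, ABC and TIA records land in target1.txt and the TRP record in target2.txt; the header and the DEF record match neither lookup and are dropped. A shell read loop is still likely to be slow on millions of records, so this mainly shows where the per-record overhead goes.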
 
Old 02-26-2018, 11:10 AM   #5
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Rep: Reputation: 2242
this is assignment not comparison
Code:
if [[ $CODE = $LKP_AMR1 ]]
so it fails

finding substrings
Code:
# string1 is in the hat without boots
# string2 is a cow without milk
substring=is

if [[ $string1 =~ 'is' ]] || [[ $string2 =~ 'is' ]] ;
Both strings will return a match, whereas a direct comparison of whole strings will never find a substring inside the string.

example
Code:
 
userx@slackwhere101:~
$ string="my hat is lost in the field of dreams"
 
 
$ if [[ "$string" == 'in' ]] ; then echo "hat" ; fi

# returned nothing because it is a direct comparison,
# whereas a substring match

 
$ if [[ "$string" =~ 'in' ]] ; then echo "hat" ; fi
hat
returns a hit.

Last edited by BW-userx; 02-26-2018 at 11:20 AM.
 
Old 02-26-2018, 11:17 AM   #6
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,552

Rep: Reputation: 872
Quote:
Originally Posted by BW-userx View Post
this is assignment not comparison
Code:
if [[ $CODE = $LKP_AMR1 ]]
so it fails
No, it's a comparison in shell language
Code:
a=2
[[ $a = 1 ]] || echo "nope..."
 
Old 02-26-2018, 11:37 AM   #7
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Rep: Reputation: 2242
Quote:
Originally Posted by keefaz View Post
No, it's a comparison in shell language
Code:
a=2
[[ $a = 1 ]] || echo "nope..."
are you serious?
Code:
userx@slackwhere101:~
$ a=1
 
$ [[ $a = 4 ]] && echo "poop" || echo "nope"
nope
 
$ [[ $a = 1 ]] && echo "poop" || echo "nope"
poop
 
$ [[ $string = 'os' ]] && echo "$string"

 
$ [[ $string =~ 'os' ]] && echo "$string"
my hat is lost in the field of dreams
I guess I have to stand corrected.

substrings stays the same though.

thanks .. saves my finger tips a little.

Last edited by BW-userx; 02-26-2018 at 11:41 AM.
 
Old 02-26-2018, 12:11 PM   #8
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,882
Blog Entries: 13

Rep: Reputation: 4930
There's a difference between integer comparisons and string comparisons. And some other complications.
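
A quick way to see that difference for yourself, as a sketch with made-up variable names: = and < inside [[ ]] compare strings, while -eq and -lt compare numeric values. Client codes such as AAR are plain strings, so the string operators are the ones that apply to this thread.

```bash
#!/usr/bin/env bash
# String operators compare character by character.
[[ 07 = 7 ]]   && eq_str=true || eq_str=false    # "07" is not the string "7"
[[ 10 < 9 ]]   && lt_str=true || lt_str=false    # "1" sorts before "9"

# Arithmetic operators compare numeric values.
[[ 07 -eq 7 ]] && eq_num=true || eq_num=false    # 07 (octal) has the value 7
[[ 10 -lt 9 ]] && lt_num=true || lt_num=false    # 10 is not less than 9

echo "string:  07 = 7 -> $eq_str, 10 < 9 -> $lt_str"
echo "integer: 07 -eq 7 -> $eq_num, 10 -lt 9 -> $lt_num"
```

The string line reports false and true; the integer line reports true and false.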

The recommendation I always go with is to do exactly what you've done BW-userx, which is to test/prove what I'm scripting, so as to be sure I'm making the correct comparison.

Even reading the bash documentation is confusing (for me) when it comes to this topic.
 
Old 02-26-2018, 12:20 PM   #9
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
It would be nice to post the real code you tried to execute. Also, please use [code]here comes your script[/code] tags to keep the formatting.
Code:
`touch <Path>/<File_Name>
is syntactically incorrect, <Path> is invalid, and also backtick is missing.
Use $( command ) instead of ` command ` (backtick).
Code:
LKP_AMR2=`echo $LINE2`
# is extremely inefficient, use
LKP_AMR2="$LINE2"
# instead
also please try to use shellcheck

by the way both requirements (2 and 3 in your first post) can be implemented with a single grep.

Last edited by pan64; 02-26-2018 at 12:23 PM.
 
Old 02-26-2018, 12:29 PM   #10
bishnumnnit2006
LQ Newbie
 
Registered: Feb 2018
Posts: 16

Original Poster
Rep: Reputation: Disabled
Thanks, guys, for responding. Just to let you know, the code I shared is tested and works fine, but only for a few records. For a large volume of data it takes a long time to split. Please help me improve the performance, or suggest another way to solve this.
 
Old 02-26-2018, 12:30 PM   #11
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,882
Blog Entries: 13

Rep: Reputation: 4930
I agree with pan64 that I'd like to see more code.

The problem statement may be incorrectly typed in. Says to split the AcctID into 5 fields. But there are not 5 fields in that term.

Why not search for the client code in every line, and filter to destination files first, and then reprocess the resultant files to eliminate any outlying terms that shouldn't have been copied?
 
Old 02-26-2018, 12:36 PM   #12
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
again, that is a single grep, nothing more:
fgrep '|AAR' inputfile > outfile
you need to modify a bit if you want to read expressions from file, see man bash
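
One possible way to read the expressions from the lookup files is grep's -f option, which takes a file of patterns. The sketch below is an assumption-laden variation on the one-liner above: a fixed string such as '|AAR' would also match a GroupID or GroupName that starts with AAR, so it builds one pattern per code that is anchored to the 4th column instead. It assumes the client codes contain no regex metacharacters; make_patterns, pat1.txt, pat2.txt, the target file names and the demo directory are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: file names other than Src.txt, lkp1.txt, lkp2.txt are made up.
demo=${TMPDIR:-/tmp}/lq_grep_demo
mkdir -p "$demo" && cd "$demo" || exit 1

# Sample data copied from the first post.
cat > Src.txt <<'EOF'
ERUID|GroupID|GroupName|AcctID
6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic Accounts|DEF-IAS-AR8F92001T2-C
EOF
echo 'AAR ABC TIA' > lkp1.txt
echo 'TRP' > lkp2.txt

# Turn "AAR ABC TIA" into one basic-regex pattern per line, each anchored to
# the start of the 4th '|'-separated column, e.g.
#   ^[^|]*|[^|]*|[^|]*|AAR-
make_patterns() {
    tr -s ' \t' '\n\n' < "$1" |
        sed -e '/^$/d' -e 's/.*/^[^|]*|[^|]*|[^|]*|&-/'
}

make_patterns lkp1.txt > pat1.txt
make_patterns lkp2.txt > pat2.txt

# One grep per target file; each reads the source file once.
grep -f pat1.txt Src.txt > target1.txt
grep -f pat2.txt Src.txt > target2.txt

echo "target1.txt:"; cat target1.txt
echo "target2.txt:"; cat target2.txt
```

This reads the source file once per lookup file rather than once in total, which is usually still far cheaper than spawning a process per record.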
 
Old 02-26-2018, 12:37 PM   #13
bishnumnnit2006
LQ Newbie
 
Registered: Feb 2018
Posts: 16

Original Poster
Rep: Reputation: Disabled
Please ignore the first point, where it says to split into 5 substrings. Think of it this way: I extract the first substring from the 4th column, which is the client code, and compare it with the 1st lookup file. If it matches, the record goes to one target file. I then compare it with the 2nd lookup file, and if it matches, the record goes to the other target file.
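
For reference, the logic described here can be sketched as a single awk pass. This is only an illustration under stated assumptions: the lookup files hold whitespace-separated codes, both lookup files are non-empty, and target1.txt, target2.txt and the demo directory are made-up names. The lookup files are listed before the source file so that awk has all the codes in memory by the time it reaches the first record.

```bash
#!/usr/bin/env bash
# Sketch only: target1.txt, target2.txt and the demo directory are made-up names.
demo=${TMPDIR:-/tmp}/lq_awk_demo
mkdir -p "$demo" && cd "$demo" || exit 1

# Sample data copied from the first post.
cat > Src.txt <<'EOF'
ERUID|GroupID|GroupName|AcctID
6|ERU_MCSD_POS|ERU_MCSD_POS|AAR-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|generic Accounts|ABC-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Accounts|TRP-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic|TIA-IAS-AR8F92001T2-C
6|ERU6_ARCGEN|Archive Generic Accounts|DEF-IAS-AR8F92001T2-C
EOF
echo 'AAR ABC TIA' > lkp1.txt
echo 'TRP' > lkp2.txt

# Start from empty target files so an earlier run leaves nothing behind.
: > target1.txt
: > target2.txt

awk -F'|' -v out1=target1.txt -v out2=target2.txt '
    # Lookup files: remember which codes belong to which target.
    FILENAME == ARGV[1] || FILENAME == ARGV[2] {
        which = (FILENAME == ARGV[1]) ? 1 : 2
        n = split($0, codes, " ")          # " " splits on runs of whitespace
        for (i = 1; i <= n; i++) wanted[which, codes[i]] = 1
        next
    }
    # Source file: the client code is the text before the first "-" in column 4.
    {
        split($4, part, "-")
        code = part[1]
        if ((1, code) in wanted) print > out1
        if ((2, code) in wanted) print > out2
    }
' lkp1.txt lkp2.txt Src.txt

echo "target1.txt:"; cat target1.txt
echo "target2.txt:"; cat target2.txt
```

A code listed in both lookup files is written to both targets, which mirrors the two independent comparisons in the original loop. Records whose code appears in neither file (DEF and the header line in the sample) are skipped.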
 
Old 02-26-2018, 12:40 PM   #14
bishnumnnit2006
LQ Newbie
 
Registered: Feb 2018
Posts: 16

Original Poster
Rep: Reputation: Disabled
@pan64 The codes are not static; we can get any code. There are around 20 client codes, and a single source file contains multiple client codes. The lookup files can contain any of these codes, so our code should be smart enough to handle that.
 
Old 02-26-2018, 12:40 PM   #15
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Rep: Reputation: 2242
perhaps even post a chunk of real data so one knows what you're really working with here.
 
  

