LinuxQuestions.org
01-18-2007, 11:18 AM   #1
ljungers
LQ Newbie
Shell script to find/replace build new TAB record


Hi all, I'm new to the world of shell scripting, but I hope someone has an idea I can use. I have a large text file whose records contain one big memo field holding several constants, most of them followed by a value. I plan to convert this file into a MySQL table that will have several fields plus the remaining large text memo.

For the most part, the front part of this memo field holds constants like “P:” for the phone number and “Name:” for the person's name, each followed by its value.
I have been using sed with the constants that I know have values and are physically in the correct positions. For example, FixedRecord=`printf '%s\n' "$OrigRecord" | sed -e 's/[ ]* P: /<TAB>/'` (where <TAB> is a literal TAB character) changes the phone-number constant into a TAB in front of the phone-number value. After several more -e expressions, FixedRecord is echoed to a file for later processing.
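A minimal sketch of the chained `sed -e` approach described above, assuming the record is held in a shell variable and that each constant is preceded by optional spaces and followed by a single space. The helper name `fix_record` and the `Name:` expression are illustrative only; the TAB is built with `printf` because not every sed expands `\t` in a replacement.

```shell
#!/bin/sh
# Hypothetical helper: turn known constants into TAB separators, one sed -e per constant.
tab=$(printf '\t')   # a literal TAB character

fix_record() {
    # Assumption: constants appear as " *Name: " and " *P: " in the record text.
    printf '%s\n' "$1" | sed \
        -e "s/ *Name: /${tab}/" \
        -e "s/ *P: /${tab}/"
}

fix_record "Name: John Doe P: 555-1234"   # prints TAB John Doe TAB 555-1234
```

Each `-e` expression replaces only the first match on the line. Because ` *P: ` would also match the end of a longer constant that happens to end in `P:`, listing the longer constants first avoids that clash.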

Now I need a way to do the same thing for constants that may or may not have values following them in the rest of the record. Does anyone know an easy way to check whether a constant exists and, if it does, whether it has a value?

For example, I want to check for the existence of the constants “Item Number:”, “Desc:”, “Start Date:”, “End Date:”, “Duration:”, “Project:” and “Supervisor:”, then check whether each one is followed by a value. So far my script does the following:
RecordPart_1to19=`printf '%s\n' "$OrigRecord" | cut -f1-19`   # first 19 tab-delimited fields of the record (cut splits on TAB by default)
RecordPart_20=`printf '%s\n' "$OrigRecord" | cut -f20`        # rest of the record, to search for the new fields
RecordPart_20 contains the following, which needs converting to TAB-delimited fields:
Item Number: H14J649
Missing Desc:   # this constant is missing and so is its value; a TAB is still required
Start Date: 9/16/2004
End Date:
Duration: 0017
Project:
Supervisor: John Doe

The result I want is a new record containing “$RecordPart_1to19 TAB H14J649 TAB TAB 9/16/2004 TAB TAB 0017 TAB TAB John Doe TAB $RecordPart_20”,
where RecordPart_20, when output, contains only what remains after the found constants and their values are removed.
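A minimal bash sketch of that step (bash is needed for the arrays and `[[ ]]`), assuming RecordPart_20 really holds one `Constant: value` pair per line as displayed above; the `Missing Desc:` line is an annotation, so it is left out of the example input. The function name `build_record` is hypothetical. Known constants fill the new fields in a fixed order, and any other line is kept as the remaining memo text, joined with spaces.

```shell
#!/usr/bin/env bash
# Known constants, in the order the new TAB-delimited fields should appear.
constants=("Item Number" "Desc" "Start Date" "End Date" "Duration" "Project" "Supervisor")

# build_record PART_1TO19 PART_20  (hypothetical helper)
# Assumes PART_20 has one "Constant: value" pair per line.
build_record() {
    local head=$1 line name value i rest=""
    local -a values=()
    while IFS= read -r line; do
        [[ -z $line ]] && continue                     # skip blank lines
        if [[ $line == *:* ]]; then
            name=${line%%:*}                           # text before the first colon
            value=${line#*:}                           # text after the first colon
            value=${value#"${value%%[![:space:]]*}"}   # strip leading whitespace
            for i in "${!constants[@]}"; do
                if [[ $name == "${constants[i]}" ]]; then
                    values[i]=$value                   # found (the value may be empty)
                    continue 2                         # go to the next input line
                fi
            done
        fi
        rest+="${rest:+ }$line"                        # unknown line: keep it in the memo
    done <<< "$2"
    local out=$head
    for i in "${!constants[@]}"; do
        out+=$'\t'"${values[i]}"                       # missing constant -> empty field
    done
    printf '%s\t%s\n' "$out" "$rest"
}

memo=$'Item Number: H14J649\nStart Date: 9/16/2004\nEnd Date:\nDuration: 0017\nProject:\nSupervisor: John Doe\nfree-form notes'
build_record 'RecordPart_1to19' "$memo"
```

In `values`, an unset slot means the constant was not found, while an empty string means it was found without a value; `${values[i]+set}` tells the two apart if that distinction is ever needed.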

I just need some ideas, or a pointer to an example script that shows how to accomplish this task.

Thanks in advance for any and all help.
 
01-18-2007, 12:36 PM   #2
matthewg42
Senior Member
Hi ljungers,

while I think it's possible to do what you want in a shell script, it isn't going to be very efficient. It might be better to move up a level of sophistication from sed to something like awk or Perl. That way you only have to read the input record once, and you can have variables and more logic inside your program.

Does this "memo" get printed on standard output when you execute the program OrigRecord? This is how it appears from the sample you provided above.

If so, I'd probably go about it like this (I'm a Perl fan, so that's what I'd use):
Code:
#!/usr/bin/perl -w

use strict;

open(INPUT, "./OrigRecord|") || die "couldn't execute OrigRecord: $!\n";
my %data = ();
while(<INPUT>) {
        chomp;     # this takes the \n off the end of the line
        my ($field, $value) = split(/:/, $_, 2);
        next unless defined $value;   # skip lines that contain no colon
        $value =~ s/^\s+//;   # strip leading whitespace
        $field = lc($field);  # make field name lower case
        $data{$field} = $value;
}
close(INPUT);

print "\$RecordPart_1to19";
foreach my $field ( "item number", "missing desc", "start date", "end date", "duration", "project", "supervisor" ) {
        print "\t" . ($data{$field} || "");
}
print "\n";
You could replace the \t with the literal string TAB for testing to show that you are getting enough tabs, or do what I did: keep the \t, but send the output through "od -tc". The output with the data you provided above is as follows:
Code:
0000000   $   R   e   c   o   r   d   P   a   r   t   _   1   t   o   1
0000020   9  \t   H   1   4   J   6   4   9  \t  \t   9   /   1   6   /
0000040   2   0   0   4  \t  \t   0   0   1   7  \t  \t   J   o   h   n
0000060       D   o   e  \n
0000065
Is that what you wanted?
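The same check can be done from the shell. A small sketch with hypothetical helper names: `show_tabs` swaps every TAB for the literal string `<TAB>` (the first suggestion above), and `count_fields` uses awk to report how many TAB-separated fields each line has, which makes a missing or extra TAB easy to spot.

```shell
#!/bin/sh
tab=$(printf '\t')   # a literal TAB character

# show_tabs: replace every TAB with the string <TAB> so empty fields are visible
show_tabs() {
    sed "s/${tab}/<TAB>/g"
}

# count_fields: print how many TAB-separated fields each input line has
count_fields() {
    awk -F'\t' '{ print NF }'
}

printf 'a\tb\t\tc\n' | show_tabs      # a<TAB>b<TAB><TAB>c
printf 'a\tb\t\tc\n' | count_fields   # 4
```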
 
01-18-2007, 01:08 PM   #3
ljungers
LQ Newbie
Original Poster
Thanks, matthewg42, for the reply. I forgot to show how I get the OrigRecord variable that the cut is used on. This is how I'm doing it:

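ORGIFS="$IFS"
# split the output of cat on newlines only, so each record is one loop item
IFS="
"
for OrigRecord in `cat the_input_file_with_memo.txt`
do
    RecordPart_1to19=`printf '%s\n' "$OrigRecord" | cut -f1-19`   # 1st 19 fields have been built
    # etc., etc., etc.
done
IFS="$ORGIFS"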

I have looked at Perl, but for now I'm trying to keep my head above water with shell scripts. I'd like to use awk or whatever it takes to accomplish this last task. Would it be better to build an array of the constants, with an indicator for each one showing whether the constant was found and whether its value was found?

I hope this helps some. Thanks
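One thing worth knowing about the loop above: because the `cat` output is unquoted, each record is also subject to pathname expansion, and with IFS set to a newline, blank lines are silently dropped. A common alternative is a `while IFS= read -r` loop, sketched below in bash with a hypothetical function name `process_file`; it assumes the memo is field 20 and that `cut`'s default TAB delimiter matches the file.

```shell
#!/usr/bin/env bash
# process_file FILE  (hypothetical helper)
# Reads FILE one record (line) at a time: IFS= keeps leading/trailing
# whitespace, -r keeps backslashes, and nothing is glob-expanded.
process_file() {
    local OrigRecord RecordPart_1to19 RecordPart_20
    # "|| [ -n ... ]" also processes a last line that has no trailing newline
    while IFS= read -r OrigRecord || [ -n "$OrigRecord" ]; do
        RecordPart_1to19=$(printf '%s\n' "$OrigRecord" | cut -f1-19)   # cut splits on TAB by default
        RecordPart_20=$(printf '%s\n' "$OrigRecord" | cut -f20)
        # ... extract the constants from "$RecordPart_20" here ...
        printf '%s\t%s\n' "$RecordPart_1to19" "$RecordPart_20"
    done < "$1"
}

# Example: process_file the_input_file_with_memo.txt > new_records.txt
```

With tens of thousands of records, the two `cut` calls per record add up to a lot of extra processes; if that turns out to be slow, moving the per-record work into a single awk or Perl program, as suggested earlier in the thread, starts to pay off.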
 
01-18-2007, 04:05 PM   #4
ljungers
LQ Newbie
Original Poster
Quote:
Originally Posted by matthewg42
Is that what you wanted?
Yes, that is what I want to accomplish, plus the remainder of $RecordPart_20 minus the found constants and their values.

Is that possible to do?
 
01-18-2007, 08:29 PM   #5
matthewg42
Senior Member
Quote:
Originally Posted by ljungers
Yes, that is what I want to accomplish, plus the remainder of $RecordPart_20 minus the found constants and their values.

Is that possible to do?
Anything's possible with Perl. Alas, I don't understand what you mean.
 
01-19-2007, 10:46 AM   #6
ljungers
LQ Newbie
Original Poster
I have a large ASCII file of 29,233 records, all text characters. I've been able to build the first 19 fields because the record layout allowed it (the original tabs were probably removed), creating a record with 19 tab-delimited fields plus a text memo field, giving:

RecordPart_1to19 # first 19 tab-delimited fields
RecordPart_20 # field 20 that has constants and values plus transcription notes

Now I want to build new fields from the data in field 20 and place them after the existing 19 fields. There are anywhere from 12 to 18 known constants (“ITEM NO:”, “NAME:”, “START DATE:”, etc.) that may or may not appear in a given record's field 20. When a constant is present, it may or may not have a value.

Example: let's say we have 6 possible constants and values:
Constant1='ITEM NO:'
Constant2='DESC:'
Constant3='NAME:'
Constant4='START DATE:'
Constant5='END DATE:'
Constant6='PROJECT CODE:'

Field20 contains “ITEM NO: 27J426 NAME: START DATE: 12/05/2006 END DATE: PROJECT CODE: 1B2C This is text at the end of field20”, and that trailing text can be as big as 8-12 KB.

The results would be:

Value1=27J426
Value2=
Value3=
Value4=12/05/2006
Value5=
Value6=1B2C

In addition, the constants and their values should be removed from field 20: we already have the values in the new tab-delimited fields and do not want the data repeated. The output record should look something like this (spaces added for readability; TAB = hex 09, VT = hex 0B):

“$RecordPart_1to19 TAB Value1 TAB Value2 TAB Value3 TAB Value4 TAB Value5 TAB Value6 TAB $RecordPart_20”

I hope this explains the task I'm trying to accomplish. I'd like to keep this process in a shell script, using whatever commands I need. With shell scripts I can more or less keep my head above water, and everything I have stays in the same language.

Thanks for any help and ideas on this.
 
01-19-2007, 05:47 PM   #7
matthewg42
Senior Member
This might be a good way to split up the last portion, but it is based on the assumption that there are no spaces in the values:
Code:
#!/usr/bin/perl -w

use strict;

my $input = "ITEM NO: 27J426 NAME: START DATE: 12/05/2006 END DATE: PROJECT CODE: 1B2C";

print "INPUT DATA IS: $input\n\n";
while ( length($input) > 0 ) {
        if ( $input =~ s/^([^:]+):\s*([^\s]+)(\s+)?// ) {
                print "field name is $1\n";
                print "field value is $2\n";
                print "remaining input is $input\n\n";
        }
        else {
                print "oh dear, we can't match the pattern \"field: value\"\n";
                last;
        }
}
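Traced by hand on the sample input, the pattern above mis-reads constants that have no value: for `NAME: START DATE: 12/05/2006` it reports `NAME` with the value `START` and then `DATE` with the value `12/05/2006`, and `END DATE:` likewise picks up `PROJECT`. Since the constants are known in advance, checking against that list can tell an empty value from a real one. The bash sketch below (hypothetical helpers `starts_with_constant` and `split_memo`) keeps the no-spaces-in-values assumption, assumes the constants come before the free text, and prints the values TAB-separated in the list's order, followed by whatever text is left over.

```shell
#!/usr/bin/env bash
# Known constants, in the order the new fields should appear (the six from the example).
constants=("ITEM NO:" "DESC:" "NAME:" "START DATE:" "END DATE:" "PROJECT CODE:")

# starts_with_constant TEXT: print the index of the constant TEXT begins with,
# or return 1 if it begins with none of them.
starts_with_constant() {
    local i
    for i in "${!constants[@]}"; do
        if [[ $1 == "${constants[i]}"* ]]; then
            printf '%s\n' "$i"
            return 0
        fi
    done
    return 1
}

# split_memo FIELD20: print the values TAB-separated in constant order,
# then a TAB and the remaining text.
split_memo() {
    local rest=$1 i value out=""
    local -a values=()   # unset slot = constant not found, "" = found without a value
    while :; do
        rest=${rest#"${rest%%[![:space:]]*}"}        # strip leading whitespace
        i=$(starts_with_constant "$rest") || break   # free text reached
        rest=${rest#"${constants[i]}"}               # drop the constant itself
        rest=${rest#"${rest%%[![:space:]]*}"}
        value=""
        if ! starts_with_constant "$rest" > /dev/null; then
            value=${rest%%[[:space:]]*}              # next word (assumes no spaces in values)
            rest=${rest#"$value"}
        fi
        values[i]=$value
    done
    for i in "${!constants[@]}"; do
        out+="${values[i]}"$'\t'
    done
    printf '%s%s\n' "$out" "$rest"
}

RecordPart_1to19='RecordPart_1to19'   # placeholder for the first 19 fields
field20='ITEM NO: 27J426 NAME: START DATE: 12/05/2006 END DATE: PROJECT CODE: 1B2C This is text at the end of field20'
printf '%s\t%s\n' "$RecordPart_1to19" "$(split_memo "$field20")"
```

The leftover text comes back with its leading whitespace trimmed. A value-less constant that is immediately followed by the free text would still take the first free-text word as its value; only more knowledge about what valid values look like can fix that.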
 
  


All times are GMT -5.