Hi to all, newbie to the world of shell scripting, but I hope someone will have an idea I can use. I have a large text file that contains one big memo field with several constants, each followed by a value most of the time. I plan to convert this file to a MySQL table that will have several fields plus the remaining large text memo.
For the most part this memo field has constants like P: for phone number and Name: for the person's name, with their values following, in the front part of the memo.
I have been using the sed command on the constants that I know have values and are physically in the correct positions. Example: FixedRecord=`sed -e 's/[ ]* P: /<TAB>/'` (where <TAB> is a literal tab character) to change the phone number constant to a tab in front of the phone number value. After several more -e expressions, the FixedRecord is echoed to a file for later processing.
Now I need to figure out a way to do the same thing for constants that may or may not have values following them in the rest of the record. Does anyone know of an easy way to check whether a constant exists and, if it does exist, whether it has a value?
For example, I want to check for the existence of the Item Number: Desc: Start Date: End Date: Duration: Project: Supervisor: constants, then check if they have values following them. So far I have written the following script:
RecordPart_1to19=`OrigRecord | cut -f1-19` # first 19 tab fields of record (TAB is cut's default delimiter)
RecordPart_20=`OrigRecord | cut -f20` # rest of record to search for new fields
RecordPart_20 contains the following for tab conversion.
Item Number: H14J649
Missing Desc: # this constant is missing and so is its value, TAB still required
Start Date: 9/16/2004
End Date:
Duration: 0017
Project:
Supervisor: John Doe
Results I want would be that the new record I build would contain $RecordPart_1to19 TAB H14J649 TAB TAB 9/16/2004 TAB TAB 0017 TAB TAB John Doe TAB $RecordPart_20
# RecordPart_20 when outputted should only contain what remains after the found constants and constant values are removed.
Just need some ideas or be pointed to an example script that shows how to accomplish this task.
While I think it's possible to do what you want to do in a shell script, it isn't going to be very efficient. It might be better to move up a level of sophistication from sed to the likes of awk or Perl. This way, you will only have to read the input record once, and you can have variables and more logic inside your program.
Does this "memo" get printed on standard output when you execute the program OrigRecord? This is how it appears from the sample you provided above.
If so, I'd probably go about it like this (I'm a Perl fan, so that's what I'd use):
Code:
#!/usr/bin/perl -w
use strict;

# Run OrigRecord and read its standard output through a pipe.
open(INPUT, "./OrigRecord|") || die "couldn't execute OrigRecord: $!\n";
my %data = ();
while(<INPUT>) {
    chomp;                          # this takes the \n off the end of the line
    my ($field, $value) = split(/:/, $_, 2);
    next unless defined $value;     # skip lines that contain no colon
    $value =~ s/^\s+//;             # strip leading whitespace
    $field = lc($field);            # make field name lower case
    $data{$field} = $value;
}
close(INPUT);

# Print one tab per expected field, whether or not a value was found.
print "\$RecordPart_1to19";
foreach my $field ( "item number", "missing desc", "start date", "end date", "duration", "project", "supervisor" ) {
    print "\t" . ($data{$field} || "");
}
print "\n";
You could replace the \t with the literal string TAB for testing to show that you are getting enough tabs, or do what I did - use the \t, but send the output through "od -tc". The output with the data you provided above is as follows:
Code:
0000000 $ R e c o r d P a r t _ 1 t o 1
0000020 9 \t H 1 4 J 6 4 9 \t \t 9 / 1 6 /
0000040 2 0 0 4 \t \t 0 0 1 7 \t \t J o h n
0000060 D o e \n
0000065
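If reading od output by eye gets tedious, another quick check is to count the tab characters in a line directly. This is only a minimal sketch: count_tabs is a made-up helper name, and the sample input is invented for illustration.

```shell
#!/bin/sh
# count_tabs (hypothetical helper): read text on standard input and
# print how many TAB characters it contains.
count_tabs() {
    # tr -cd '\t' deletes every character except TAB;
    # wc -c counts what is left; the last tr strips any padding wc adds.
    tr -cd '\t' | wc -c | tr -d ' '
}

# Invented sample line containing three tabs.
printf 'a\tb\t\tc\n' | count_tabs
```

For the record built above, the count should be 7 if every expected field got its tab.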
Thanks matthewg42 for the reply. I forgot to show how I was getting the OrigRecord variable that the cut is used on. This is how I'm doing it.
ORGIFS="$IFS"
IFS="
"
for OrigRecord in `cat the_input_file_with_memo.txt`
do
RecordPart_1to19=`echo "$OrigRecord" | cut -f1-19` # 1st 19 fields have been built (TAB is cut's default delimiter)
etc., etc., etc.
I have looked at Perl, but for now I'm trying to keep my head above water with the shell scripts. I would like to use awk or whatever it takes to accomplish this last task. Would it be better to build an array of the constants, with a condition indicator for each one showing whether the constant was found and whether its value was found?
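A side note on the loop itself: a `for` loop over `cat` output depends on changing IFS, silently skips blank lines, and expands any wildcard characters in the data. A `while read` loop is a common alternative. The sketch below is only an illustration; first_fields is a hypothetical helper name, and it assumes the fields are separated by tabs.

```shell
#!/bin/sh
# first_fields FILE N (hypothetical helper): print the first N
# tab-separated fields of every line in FILE.
first_fields() {
    # IFS= keeps leading/trailing whitespace, -r keeps backslashes as-is.
    while IFS= read -r OrigRecord; do
        # cut splits on TAB by default, so no -d option is needed.
        printf '%s\n' "$OrigRecord" | cut -f1-"$2"
    done < "$1"
}
```

Called as `first_fields the_input_file_with_memo.txt 19`, it would print what RecordPart_1to19 holds for each record, without touching IFS for the rest of the script.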
The output with the data you provided above is as follows:
Code:
0000000 $ R e c o r d P a r t _ 1 t o 1
0000020 9 \t H 1 4 J 6 4 9 \t \t 9 / 1 6 /
0000040 2 0 0 4 \t \t 0 0 1 7 \t \t J o h n
0000060 D o e \n
0000065
Is that what you wanted?
Yes, that is what I want to accomplish, plus the remainder of $RecordPart_20 less the found constants and their values.
I have a large ASCII file of 29233 records, all text characters. I have been able to build the first 19 fields because the record layout allowed for this (the original tabs were probably removed), creating a record with 19 tab-delimited fields and a text memo field, giving:
RecordPart_1to19 # first 19 tab delimited fields
RecordPart_20 # field 20 that has constants and values plus transcription notes
Now I wish to build new fields from data in this field20 and place them after the existing 19 fields. There are anywhere from 12 – 18 known constants in this field20 “ITEM NO: NAME: START DATE: etc. etc.” that may or may not be in this record. When a constant is present it may or may not have a value.
Example: let's say we have 6 possible constants and values
Constant1='ITEM NO:'
Constant2='DESC:'
Constant3='NAME:'
Constant4='START DATE:'
Constant5='END DATE:'
Constant6='PROJECT CODE:'
Field20 contains “ITEM NO: 27J426 NAME: START DATE: 12/05/2006 END DATE: PROJECT CODE: 1B2C This is text at the end of field20 and can be as big as 8-12kb“
Plus the constants and values should be removed/deleted from field 20 because we have the values for the new tab delimited fields and do not want this data repeated. The output record should be something like this (spaces used to make it more readable and TAB=hex 09, VT 0B)
I hope this explains the task I'm trying to accomplish. I wish to keep this process in a shell script, using whatever commands I need. With shell scripts I am able to sort of keep my head above water, and everything I have so far stays in the same language (scripts).
This might be a good way to split up the last portion, but it is based on the assumption that there is no space in the value, and that every field actually has a value. With an empty value such as NAME: in the sample, the next word (START) gets taken as the value:
Code:
#!/usr/bin/perl -w
use strict;

my $input = "ITEM NO: 27J426 NAME: START DATE: 12/05/2006 END DATE: PROJECT CODE: 1B2C";
print "INPUT DATA IS: $input\n\n";

# Repeatedly strip one "field: value" pair from the front of the input.
while ( length($input) > 0 ) {
    if ( $input =~ s/^([^:]+):\s*([^\s]+)(\s+)?// ) {
        print "field name is $1\n";
        print "field value is $2\n";
        print "remaining input is $input\n\n";
    }
    else {
        print "oh dear, we can't match the pattern \"field: value\"\n";
        last;
    }
}
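For anyone who would rather stay in a shell script, here is one possible sketch of the same idea using awk, driven by the list of known constants so that an empty value (such as NAME: in the sample) is recognised. Everything here is an assumption made for illustration: split_memo is a made-up function name, the constants are the six from the example, a value is taken to be a single word with no spaces, and the constants are assumed not to occur inside the memo text itself.

```shell
#!/bin/sh
# split_memo FIELD20 (hypothetical helper): print one tab-terminated value
# per known constant, in order, followed by whatever is left of the memo.
split_memo() {
    printf '%s\n' "$1" | awk -v consts='ITEM NO:;DESC:;NAME:;START DATE:;END DATE:;PROJECT CODE:' '
    {
        n = split(consts, c, ";")
        rest = $0
        out = ""
        for (i = 1; i <= n; i++) {
            val = ""
            p = index(rest, c[i])
            if (p > 0) {
                before = substr(rest, 1, p - 1)
                after = substr(rest, p + length(c[i]))
                sub(/^[ \t]+/, "", after)
                # The value is empty if another constant follows immediately.
                empty = 0
                for (j = 1; j <= n; j++)
                    if (index(after, c[j]) == 1) empty = 1
                # Otherwise the value is the next whitespace-delimited word.
                if (!empty && match(after, /^[^ \t]+/)) {
                    val = substr(after, 1, RLENGTH)
                    after = substr(after, RLENGTH + 1)
                    sub(/^[ \t]+/, "", after)
                }
                # Drop the constant and its value from the memo.
                rest = before after
            }
            # Always emit the tab, even for a missing constant.
            out = out val "\t"
        }
        print out rest
    }'
}

# Sample field 20 from the post; tabs shown as | for readability.
split_memo 'ITEM NO: 27J426 NAME: START DATE: 12/05/2006 END DATE: PROJECT CODE: 1B2C This is text at the end' | tr '\t' '|'
```

With the sample above this prints 27J426|||12/05/2006||1B2C|This is text at the end. One known limitation of the single-word assumption: if the last constant present has an empty value, the first word of the memo text is taken as its value, and a value containing a space (like "John Doe") is cut at the first word.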