Advice on how to structure a cut command when dealing with very old files?
It took me a while to get a Linux system up and running (I've been very busy), but one of the things I want to get the hang of is file processing. To practice, I recovered a bunch of message threads from my old CompuServe days off some old floppy disks (this took a while too). They are flat files, and I can easily read them using cat or more. My goal is to go through each file and generate a list of users, subjects, times and such. I thought this would be a good way to practice.
So far I've run into a problem with the cut command: it can't seem to handle the variable field lengths and, more importantly, the delimiters. Or I don't know how to structure the command so that it will. My main hang-up is that I can't define a delimiter longer than one character. The file reads like this:
#: 0 S0/Forum Announcement
09-Aug-94 09:35:29
Sb: Announcement
Fm: System
To:
Comics/Animation Forum, V. 3B(73)
Hello, Mark S. Ogilvie
Last visit: 08-Aug-94 10:57:23
Forum messages: 592768 to 656114
Last message you've read: 592768
Section(s) Selected: All
Number of Members in Conference: None
Forum !
ˇ
#: 0 S0/Forum Announcement
14-Aug-94 08:03:03
Sb: Announcement
Fm: System
To:
ˇ
#: 0 S0/Forum Announcement
14-Aug-94 08:44:10
Sb: Announcement
Fm: System
To:
Comics/Animation Forum+, V. 3B(73)
Hello, Mark S. Ogilvie
Last visit: 14-Aug-94 08:03:11
Forum messages: 638497 to 661075
Last message you've read: 638497
Section(s) Selected: All
Number of Members in Conference: None
News Flash:
Updated August 12.
We are soon going to be getting new Forum software that will allow us to open
more sections. Users of certain communications software need to make sure they
have a version that will handle this.
All versions of WinCIM, MacCIM, NavCIS and ASCII programs (ProComm, OzCIS,
etc.) will access the new areas, automatically. Programs which need updating
to access the new areas are:
DOSCIM - You need version 2.2, or later (GO CIMSOFT). If you don't wish to
upgrade, you must enter the forum in Terminal Emulation mode (GO ASCII) to see
the sections above 17.
Mac Navigator - you need version 3.2.1, or later (GO NAVIGATOR).
TAPCIS - you need version 5.42, or later (GO TAPCIS).
AutoSIG - you need version 7, or later (GO IBMCOM).
Post a message to SYSOP if you need help.
--------------------------------------------------------------------------
Japanimation CONference every Sunday at 9 pm Eastern time.
General CONference every Wednesday at 9 pm EST.
BREAKING IN CONference with Rob Davis, second Thursday of each month.
WITSIG party every Saturday at 10pm Eastern in CON room 17, open to all.
--------------------------------------------------------------------------
For biographies of most of the industry professionals that hang out here, read
the files PROBIO.TXT (detailed) or PROSYS.TXT (brief) in LIB 1.
We love to get graphics files, but PLEASE remember that you must have the
right to upload the graphic! Pictures scanned from books, videos or magazines
can NOT be uploaded; that's a violation of copyright.
Please do not repeatedly attempt to page or chat with members that you see in
the Forum. Many of them use auto-navigators or are unable to respond to
real-time chat. Attend our weekly informal conference on Wednesday, or post a
message - it is much easier, and gets you a better reply.
We also ask our members to use their real names, first and last.
Forum !
ˇ
#: 0 S0/Forum Announcement
14-Aug-94 08:52:36
Sb: Announcement
Fm: System
To:
Comics/Animation Forum+, V. 3B(73)
Hello, Mark S. Ogilvie
Last visit: 14-Aug-94 08:46:20
Forum messages: 638497 to 661075
Last message you've read: 661075
Section(s) Selected: All
Number of Members in Conference: None
Forum !
#: 658561 S1/General
11-Aug-94 21:25:50
Sb: #Lois n Clark show's dumb
Fm: Phil Adams 72470,1156
To: David Munier 73160,1670 (X)
Actually, I think realistic dialogue is a helluvalot more entertaining
than most of what passes for dialogue in entertainment today.
Phil Adams
Promethean Studios
There is 1 Reply.
#: 658836 S1/General
12-Aug-94 00:51:20
Sb: #658561-#Lois n Clark show's dumb
Fm: David Munier 73160,1670
To: Phil Adams 72470,1156 (X)
True. I was actually thinking of some real dialogue that doesn't go much
beyond:
"Hi"
"Hey"
<Grunt>
I was a bit tired when I wrote that remark.
Well-written dialogue is always entertaining. But that seems to be a
redundant statement.
-David Munier
There is 1 Reply.
Am I expecting too much from the cut command? If I could define a multi-character delimiter, I could split lines like the Sb: and Fm: ones, but I can't figure out a way to do that. Am I using the wrong command?
You provided a sample of the input file. It would be helpful if you also provided a corresponding output file. That would help the readers to better understand the problem, and also to test any code we might write.
Daniel B. Martin
I'm a little embarrassed to say that I didn't think of putting up the output file. I'll post it tomorrow after work.
I'm sure the awk ninjas will come along shortly and do their thing. But you could also use a loop to read the file.
Something like this:
Code:
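#!/bin/bash
# Print the first two words after "Fm:" on every sender line.
while read -r line; do
    # Match on the line itself instead of spawning grep for each line.
    if [[ $line == Fm* ]]; then
        printf '%s\n' "$line" | awk '{ print $2 " " $3 }'
    fi
done < Mark_S.txt
exit 0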
From your infile (here known as 'Mark_S.txt'), I get this result (extracting only the lines beginning with 'Fm' and then printing first and, if there is one, last names):
Code:
$ ./read_Mark_S.sh
System
System
System
System
Phil Adams
David Munier
You can of course expand this in any number of ways.
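One possible expansion, as a rough sketch only: awk can do the matching on its own, and slicing off the four-character "Sb: " or "Fm: " prefix with substr sidesteps the one-character delimiter limit in cut. The function name list_messages and the " | " separator are made up for illustration. The sketch assumes every message follows the layout in the sample, with the "#: " line immediately followed by the date line.

```shell
#!/bin/bash
# list_messages FILE: print "date time | sender | subject" for each message.
# (list_messages is a hypothetical helper name, not from the thread.)
list_messages() {
    awk '
        /^#: /  { getline stamp; next }              # header line; the following line is the date and time
        /^Sb: / { subject = substr($0, 5); next }    # text after "Sb: "
        /^Fm: / { print stamp " | " substr($0, 5) " | " subject }
    ' "$1"
}

# Two messages copied from the sample, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#: 0 S0/Forum Announcement
09-Aug-94 09:35:29
Sb: Announcement
Fm: System
To:
#: 658561 S1/General
11-Aug-94 21:25:50
Sb: #Lois n Clark show's dumb
Fm: Phil Adams 72470,1156
To: David Munier 73160,1670 (X)
EOF

list_messages "$sample"
rm -f "$sample"
```

On those two messages this prints "09-Aug-94 09:35:29 | System | Announcement" and "11-Aug-94 21:25:50 | Phil Adams 72470,1156 | #Lois n Clark show's dumb". A message body line that happens to start with "#: ", "Sb: " or "Fm: " would confuse it, so treat it as a starting point rather than a finished parser.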
Code:
Sb: Announcement Fm: System
Sb: Announcement Fm: System
Sb: Announcement Fm: System
Sb: Announcement Fm: System
Sb: #Lois n Clark show's dumb Fm: Phil Adams 72470,1156
Sb: #658561-#Lois n Clark show's dumb Fm: David Munier 73160,1670
Explanation:
grep "^Sb\|^Fm" $InFile reads InFile and keeps the lines starting with Sb or Fm.
tr -cd '\11\12\15\40-\176' gets rid of "garbage" characters.
paste -sd" \n" combines matching Sb and Fm lines.
>$OutFile writes OutFile.
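The command line itself did not survive in this copy of the thread. Going only by the explanation, the pieces can be strung together roughly as below. This is a reconstruction, not necessarily the poster's exact code. The wrapper name join_sb_fm and the temporary InFile and OutFile are made up for the demo, the \| alternation assumes GNU grep, and the trailing - operand for paste was added so that it reads standard input on non-GNU systems as well.

```shell
#!/bin/bash
# join_sb_fm FILE: put each message's Sb and Fm lines on one output line.
# (join_sb_fm is a hypothetical wrapper name; the three commands come from the explanation.)
join_sb_fm() {
    grep "^Sb\|^Fm" "$1" |                 # keep lines starting with Sb or Fm (\| assumes GNU grep)
        tr -cd '\11\12\15\40-\176' |       # keep only tab, LF, CR and printable ASCII
        paste -sd' \n' -                   # alternate delimiters: space after Sb, newline after Fm
}

# Stand-ins for the real files, which are not shown in the thread.
InFile=$(mktemp)
OutFile=$(mktemp)
cat > "$InFile" <<'EOF'
#: 658561 S1/General
11-Aug-94 21:25:50
Sb: #Lois n Clark show's dumb
Fm: Phil Adams 72470,1156
To: David Munier 73160,1670 (X)
#: 658836 S1/General
12-Aug-94 00:51:20
Sb: #658561-#Lois n Clark show's dumb
Fm: David Munier 73160,1670
To: Phil Adams 72470,1156 (X)
EOF

join_sb_fm "$InFile" > "$OutFile"
cat "$OutFile"
```

The pairing works only because every message has exactly one Sb line directly followed by one Fm line. A message body line that begins with Sb or Fm would throw the alternation off.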