LinuxQuestions.org
Old 09-16-2015, 04:16 PM   #1
Mark_S
LQ Newbie
 
Registered: Jul 2015
Posts: 28

Rep: Reputation: Disabled
Advice on how to structure a cut command when dealing with very old files?


It took me a while to get a Linux system up and running (I've been very busy), but one of the things I want to get the hang of is file processing. To practice, I recovered a bunch of old floppy disks holding message threads from my old CompuServe days (this took a while too). They are flat files and I can easily read them using cat or more. My goal is to go through each file and generate a list of users, subjects, times and such. I thought this would be a good way to practice.

So far I've run into a problem with the cut command: it can't seem to handle the variable lengths and, more importantly, the delimiters. Or I don't know how to structure it so that it will. My main hang-up is that I can't define a delimiter longer than one character. The file reads like this:


#: 0 S0/Forum Announcement
09-Aug-94 09:35:29
Sb: Announcement
Fm: System
To:

Comics/Animation Forum, V. 3B(73)

Hello, Mark S. Ogilvie
Last visit: 08-Aug-94 10:57:23

Forum messages: 592768 to 656114
Last message you've read: 592768

Section(s) Selected: All

Number of Members in Conference: None

Forum !


ˇ

#: 0 S0/Forum Announcement
14-Aug-94 08:03:03
Sb: Announcement
Fm: System
To:






ˇ

#: 0 S0/Forum Announcement
14-Aug-94 08:44:10
Sb: Announcement
Fm: System
To:

Comics/Animation Forum+, V. 3B(73)

Hello, Mark S. Ogilvie
Last visit: 14-Aug-94 08:03:11

Forum messages: 638497 to 661075
Last message you've read: 638497

Section(s) Selected: All

Number of Members in Conference: None

News Flash:

Updated August 12.

We are soon going to be getting new Forum software that will allow us to open
more sections. Users of certain communications software need to make sure they
have a version that will handle this.

All versions of WinCIM, MacCIM, NavCIS and ASCII programs (ProComm, OzCIS,
etc.) will access the new areas, automatically. Programs which need updating
to access the new areas are:
DOSCIM - You need version 2.2, or later (GO CIMSOFT). If you don't wish to
upgrade, you must enter the forum in Terminal Emulation mode (GO ASCII) to see
the sections above 17.
Mac Navigator - you need version 3.2.1, or later (GO NAVIGATOR).
TAPCIS - you need version 5.42, or later (GO TAPCIS).
AutoSIG - you need version 7, or later (GO IBMCOM).

Post a message to SYSOP if you need help.

--------------------------------------------------------------------------
Japanimation CONference every Sunday at 9 pm Eastern time.
General CONference every Wednesday at 9 pm EST.
BREAKING IN CONference with Rob Davis, second Thursday of each month.
WITSIG party every Saturday at 10pm Eastern in CON room 17, open to all.
--------------------------------------------------------------------------
For biographies of most of the industry professionals that hang out here, read
the files PROBIO.TXT (detailed) or PROSYS.TXT (brief) in LIB 1.

We love to get graphics files, but PLEASE remember that you must have the
right to upload the graphic! Pictures scanned from books, videos or magazines
can NOT be uploaded; that's a violation of copyright.

Please do not repeatedly attempt to page or chat with members that you see in
the Forum. Many of them use auto-navigators or are unable to respond to
real-time chat. Attend our weekly informal conference on Wednesday, or post a
message - it is much easier, and gets you a better reply.

We also ask our members to use their real names, first and last.


Forum !


ˇ

#: 0 S0/Forum Announcement
14-Aug-94 08:52:36
Sb: Announcement
Fm: System
To:

Comics/Animation Forum+, V. 3B(73)

Hello, Mark S. Ogilvie
Last visit: 14-Aug-94 08:46:20

Forum messages: 638497 to 661075
Last message you've read: 661075

Section(s) Selected: All

Number of Members in Conference: None

Forum !

#: 658561 S1/General
11-Aug-94 21:25:50
Sb: #Lois n Clark show's dumb
Fm: Phil Adams 72470,1156
To: David Munier 73160,1670 (X)

Actually, I think realistic dialogue is a helluvalot more entertaining
than most of what passes for dialogue in entertainment today.


Phil Adams
Promethean Studios

There is 1 Reply.

#: 658836 S1/General
12-Aug-94 00:51:20
Sb: #658561-#Lois n Clark show's dumb
Fm: David Munier 73160,1670
To: Phil Adams 72470,1156 (X)

True. I was actually thinking of some real dialogue that doesn't go much
beyond:

"Hi"
"Hey"
<Grunt>

I was a bit tired when I wrote that remark.

Well-written dialogue is always entertaining. But that seems to be a
redundant statement.

-David Munier

There is 1 Reply.



Am I expecting too much from the cut command? If I could define the delimiter, I could separate out lines like Sb:, Fm: and such, but I can't figure out a way to do that. Am I using the wrong command?
 
Old 09-16-2015, 07:18 PM   #2
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by Mark_S
Am I expecting too much from the cut command?
awk is a better choice.
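
For illustration, here is a minimal sketch of what awk makes possible, assuming the header lines always look like "Sb: text" or "Fm: text" as in the sample. The helper name show_headers and the temporary sample file are made up for this example. Unlike cut -d, awk's -F accepts a separator longer than one character:

```shell
# show_headers is a hypothetical helper name, not something from the thread.
show_headers() {
    # -F': ' sets a two-character field separator (colon, then space).
    # Only lines starting with "Sb: " or "Fm: " are printed.
    awk -F': ' '/^(Sb|Fm): / { print $1 "|" $2 }' "$1"
}

# Small sample copied from the input in post #1.
sample=$(mktemp)
printf '%s\n' \
    '#: 658561 S1/General' \
    '11-Aug-94 21:25:50' \
    "Sb: #Lois n Clark show's dumb" \
    'Fm: Phil Adams 72470,1156' > "$sample"

show_headers "$sample"
rm -f "$sample"
```

If a subject itself contains ": ", $2 holds only the part before it, so this is a starting point rather than a complete parser.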

You provided a sample of the input file. It would be helpful if you also provided a corresponding output file. That would help the readers to better understand the problem, and also to test any code we might write.

Daniel B. Martin
 
Old 09-16-2015, 07:30 PM   #3
Mark_S
LQ Newbie
 
Registered: Jul 2015
Posts: 28

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by danielbmartin
awk is a better choice.

You provided a sample of the input file. It would be helpful if you also provided a corresponding output file. That would help the readers to better understand the problem, and also to test any code we might write.

Daniel B. Martin
I'm a little embarrassed to say that I didn't think of putting up the output file. I'll put it out tomorrow after work.
 
Old 09-17-2015, 03:53 AM   #4
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
I'm sure the awk ninjas will come along shortly and do their thing. But you could also use a loop to read the file.
Something like this:

Code:
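#!/bin/bash

# Read the file line by line; -r keeps backslashes in the data intact.
while read -r line; do
    # Act only on lines that begin with 'Fm'.
    if [[ $line == Fm* ]]; then
        # Print the second and third words: first name and, if present, last name.
        echo "$line" | awk '{ print $2 " " $3 }'
    fi
done < Mark_S.txt

exit 0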
From your infile (here known as 'Mark_S.txt'), I get this result (extracting only the lines beginning with 'Fm' and then printing first and, if there is one, last names):
Code:
$ ./read_Mark_S.sh 
System 
System 
System 
System 
Phil Adams
David Munier
You can of course expand this in any number of ways.
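
One possible expansion, as a sketch: the loop can remember the subject and print it next to the sender, using only shell built-ins instead of starting grep and awk for every line. The function name list_headers is hypothetical, and the code assumes every Fm line is preceded by its own Sb line, as in the sample.

```shell
# list_headers is a hypothetical helper name.
list_headers() {
    subject=
    while IFS= read -r line; do
        case $line in
            # Remember the subject, with the "Sb: " prefix stripped.
            'Sb: '*) subject=${line#Sb: } ;;
            # On the sender line, print sender and the remembered subject.
            'Fm: '*) printf '%s | %s\n' "${line#Fm: }" "$subject" ;;
        esac
    done < "$1"
}

# Small sample in the format shown in post #1.
sample=$(mktemp)
printf '%s\n' \
    'Sb: Announcement' \
    'Fm: System' \
    'To:' > "$sample"

list_headers "$sample"
rm -f "$sample"
```

The ${line#Fm: } expansion removes the literal prefix, so the two-character delimiter problem from cut does not come up.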

Best regards,
HMW
 
Old 09-17-2015, 07:22 AM   #5
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
With the InFile as given in Post #1, this code ...
Code:
grep "^Sb\|^Fm" "$InFile"    \
|tr -cd '\11\12\15\40-\176'  \
|paste -sd" \n"              \
>"$OutFile"
... produced this OutFile ...
Code:
Sb: Announcement Fm: System
Sb: Announcement Fm: System
Sb: Announcement Fm: System
Sb: Announcement Fm: System
Sb: #Lois n Clark show's dumb Fm: Phil Adams 72470,1156
Sb: #658561-#Lois n Clark show's dumb Fm: David Munier 73160,1670
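Explanation:
grep "^Sb\|^Fm" reads InFile and keeps the lines starting with Sb or Fm (the \| alternation is a GNU grep extension).
tr -cd '\11\12\15\40-\176' deletes every character except tab, newline, carriage return, and printable ASCII, which gets rid of the "garbage" characters.
paste -sd" \n" joins the lines serially, alternating a space and a newline as the delimiter, so each Sb line is combined with the Fm line that follows it.
The final redirection writes OutFile.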
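
The paste step is the least obvious part, so here is a small stand-alone sketch of it. The helper name pair_lines and the four sample lines are made up for the demonstration.

```shell
# pair_lines is a hypothetical helper name.
pair_lines() {
    # -s joins all input lines serially. The delimiter list " \n" is cycled:
    # lines 1 and 2 are joined by a space, lines 2 and 3 by a newline, and so on.
    # The trailing "-" tells paste to read standard input.
    paste -sd' \n' -
}

printf '%s\n' 'Sb: A' 'Fm: X' 'Sb: B' 'Fm: Y' | pair_lines
```

This pairing relies on the input strictly alternating between Sb and Fm lines. If a message lacks one of the two, every later pair shifts by one line.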

Daniel B. Martin
 
Old 09-17-2015, 07:32 AM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Please place code or data in [code][/code] tags to maintain formatting.

As you didn't really explain how you wanted to use cut, my initial feedback would be to simply use grep based on your last input:
Code:
grep -E '^\s*(Fm|Sb):' file
This will return the required lines, but not necessarily the data you wanted specifically.
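
To get at the data rather than the whole lines, something along these lines might work. It is only a sketch and assumes, based on the sample in post #1, that every message starts with a "#:" line followed directly by the timestamp line, then the Sb and Fm lines. The function name summarize is made up.

```shell
# summarize is a hypothetical helper name.
summarize() {
    awk '
        # Message header: keep the number, then read the timestamp line after it.
        /^#: /  { num = $2; if ((getline ts) > 0) when = ts; next }
        # Subject: everything after the 4-character "Sb: " prefix.
        /^Sb: / { subj = substr($0, 5); next }
        # Sender: print one tab-separated record per message.
        /^Fm: / { print num "\t" when "\t" subj "\t" substr($0, 5) }
    ' "$1"
}

# Small sample in the format shown in post #1.
sample=$(mktemp)
printf '%s\n' \
    '#: 658836 S1/General' \
    '12-Aug-94 00:51:20' \
    'Sb: Announcement' \
    'Fm: David Munier 73160,1670' \
    'To: Phil Adams 72470,1156 (X)' > "$sample"

summarize "$sample"
rm -f "$sample"
```

Each output record has four tab-separated fields: message number, timestamp, subject, and sender. That makes the result easy to feed to cut -f or sort afterwards.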
 
Old 09-17-2015, 09:25 AM   #7
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,862
Blog Entries: 1

Rep: Reputation: 1869
(If your files are older than five years, you can use the -d option of cut(1))
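
cut can do part of the job if grep first narrows the input to one kind of header line, since the single character ':' is then enough as a delimiter. A small sketch, with the hypothetical helper name from_field:

```shell
# from_field is a hypothetical helper name.
from_field() {
    # grep keeps only the sender lines. cut splits on the single character ':'
    # and -f2- keeps everything after the first colon. sed drops the one
    # leading space that is left over from ": ".
    grep '^Fm:' "$1" | cut -d: -f2- | sed 's/^ //'
}

# Small sample in the format shown in post #1.
sample=$(mktemp)
printf '%s\n' \
    'Sb: Announcement' \
    'Fm: Phil Adams 72470,1156' \
    'To: David Munier 73160,1670 (X)' > "$sample"

from_field "$sample"
rm -f "$sample"
```

Note the space between -d: and the file name when cut is given a file directly. Without it, cut treats the whole string as the delimiter and rejects it.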
 
Old 09-17-2015, 10:51 AM   #8
Mark_S
LQ Newbie
 
Registered: Jul 2015
Posts: 28

Original Poster
Rep: Reputation: Disabled
This explains a lot. I was trying to do this with only one line:
cut -f1-4 -d:COMICS1.MSG > comic_test1

I'll try your suggestions tonight and let you know how it comes out. Thanks all.
 
  

