LinuxQuestions.org
Old 02-15-2010, 07:07 PM   #1
lt1776
LQ Newbie
 
Registered: Feb 2010
Posts: 13

Rep: Reputation: 0
Need help manipulating a text file


So here is the deal. I have a massive text file full of random information. I want to extract certain paragraphs and output them into a separate text file.

Here is a sample of the initial text file:
Quote:
Reading 67: The Theory of Active Portfolio Management
The candidate should be able to:
a. justify active portfolio management when security markets are nearly efficient;
b. discuss the steps and the approach of the Treynor–Black model for security
selection;
c. describe how an analyst’s accuracy in forecasting alphas can be measured and
how estimates of forecasting can be incorporated into the Treynor–Black
approach.




www.example.dot/toolkit—Your online preparation resource

Study Session 18




Reading 68: The Portfolio Management Process and the Investment
Policy Statement
The candidate should be able to:
a. explain the importance of the portfolio perspective;
b. describe the steps of the portfolio management process and the components of
those steps;
c. define investment objectives and constraints and explain and distinguish among
the types of investment objectives and constraints;
d. discuss the role of the investment policy statement in the portfolio management
process and explain the elements of an investment policy statement;
e. explain how capital market expectations and the investment policy statement
help influence the strategic asset allocation decision, and discuss how investors’
investment time horizon may influence their strategic asset allocation;
f. contrast the types of investment time horizons, determine the time horizon for a
particular investor, and evaluate the effects of this time horizon on portfolio
choice;
g. justify ethical conduct as a requirement for managing investment portfolios.




www.example.dot/toolkit—Your online preparation resource
Here is what I want to see in the output file:
Quote:
Reading 67: The Theory of Active Portfolio Management
The candidate should be able to:
a. justify active portfolio management when security markets are nearly efficient;
b. discuss the steps and the approach of the Treynor–Black model for security
selection;
c. describe how an analyst’s accuracy in forecasting alphas can be measured and
how estimates of forecasting can be incorporated into the Treynor–Black
approach.

Reading 68: The Portfolio Management Process and the Investment
Policy Statement
The candidate should be able to:
a. explain the importance of the portfolio perspective;
b. describe the steps of the portfolio management process and the components of
those steps;
c. define investment objectives and constraints and explain and distinguish among
the types of investment objectives and constraints;
d. discuss the role of the investment policy statement in the portfolio management
process and explain the elements of an investment policy statement;
e. explain how capital market expectations and the investment policy statement
help influence the strategic asset allocation decision, and discuss how investors’
investment time horizon may influence their strategic asset allocation;
f. contrast the types of investment time horizons, determine the time horizon for a
particular investor, and evaluate the effects of this time horizon on portfolio
choice;
g. justify ethical conduct as a requirement for managing investment portfolios.
I can match the beginning of each paragraph with the regex 'Reading [0-9]*:'. That regex returns only the first line of the paragraph, though; I also need the bullets that follow, up to the end of the list. I can match the end of the bullets with the regex '^$'.

If I could find a way to grep everything between 'Reading [0-9]*:' and '^$' and then output that to a file, this would fix my problem. Please help. Thanks

Last edited by Tinkster; 02-16-2010 at 06:46 PM. Reason: anonymise URL
 
Old 02-15-2010, 08:19 PM   #2
allanf
Member
 
Registered: Sep 2008
Location: MN
Distribution: Gentoo, Fedora, Suse, Slackware, Debian, CentOS
Posts: 97
Blog Entries: 1

Rep: Reputation: 19
Are you attempting to remove the "www.example.dot/toolkit—Your online preparation resource" lines or are you doing more stuff?

Last edited by Tinkster; 02-16-2010 at 06:46 PM. Reason: anonymise URL
 
Old 02-16-2010, 08:02 AM   #3
lt1776
LQ Newbie
 
Registered: Feb 2010
Posts: 13

Original Poster
Rep: Reputation: 0
Yes. I want to remove the URL, the empty lines, and any other text that is not part of the paragraphs.

I want the first block of quoted text to end up looking like the second block of quoted text.
 
Old 02-16-2010, 08:26 AM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
How about the "address range" in sed?

Code:
sed -n '/Reading *[0-9]/,/^$/p' filename > newfilename
This prints every block of lines that starts at a line containing "Reading", any number of spaces, and a digit, and that ends at the next blank line (the blank line itself is printed too).
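To see how the range behaves, here is a small sketch you can run in a scratch directory. The sample text and the file names (sample.txt, extracted.txt) are made up for illustration and are not from the original file.

```shell
# Work in a throwaway directory so nothing real gets overwritten.
workdir=$(mktemp -d)

# Hypothetical sample: two "Reading" paragraphs with noise in between.
cat > "$workdir/sample.txt" <<'EOF'
Reading 1: First topic
a. first bullet;
b. second bullet.

www.example.dot/toolkit
Study Session 2

Reading 2: Second topic
a. only bullet.

trailing noise
EOF

# -n turns off sed's default printing; p prints only the lines inside
# each range, from a "Reading <digit>" line through the next blank line.
sed -n '/Reading *[0-9]/,/^$/p' "$workdir/sample.txt" > "$workdir/extracted.txt"
cat "$workdir/extracted.txt"
```

The URL line, the "Study Session" line, and the trailing noise fall outside both ranges, so they should not show up in extracted.txt.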

Slightly more robust (the heading must also have a colon after the number):
Code:
sed -n '/Reading *[0-9]*:/,/^$/p' filename > newfilename
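One caveat, in case your file is messier than the sample: `[0-9]*` also matches zero digits, so a line such as "Reading :" would start a range, and `^$` will not match a "blank" line that contains only spaces. A stricter variant is sketched below, assuming every heading starts at the beginning of a line. The sample text and file names are made up for illustration.

```shell
# Work in a throwaway directory so nothing real gets overwritten.
workdir=$(mktemp -d)

# Hypothetical sample: a false-positive line, a real heading, and a
# terminator line that holds only spaces.
printf '%s\n' \
  'Notes on Reading : see below' \
  'stray line' \
  '' \
  'Reading 12: Real heading' \
  'a. bullet;' \
  '   ' \
  'footer text' > "$workdir/sample.txt"

# ^Reading       heading must start the line
# [0-9][0-9]*:   at least one digit, then the colon
# ^[[:space:]]*$ empty or whitespace-only line ends the range
sed -n '/^Reading  *[0-9][0-9]*:/,/^[[:space:]]*$/p' \
  "$workdir/sample.txt" > "$workdir/strict.txt"
cat "$workdir/strict.txt"
```

If your file looks like the sample in post #1, the simpler command above is enough; this version only matters when stray "Reading" mentions or whitespace-only lines are present.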
 
Old 02-16-2010, 11:11 AM   #5
lt1776
LQ Newbie
 
Registered: Feb 2010
Posts: 13

Original Poster
Rep: Reputation: 0
Right on the money. The 'more robust' option worked perfectly. Thanks
 
  


All times are GMT -5.
