LinuxQuestions.org
Old 04-19-2012, 10:55 AM   #1
patatahead
LQ Newbie
 
Registered: Apr 2012
Posts: 6

Rep: Reputation: Disabled
How to split a text based on keywords and put each block in a separate file?


hi,

I'm trying to separate certain blocks of text from a file and put each one into its own file. I've searched the net for resources and luckily landed on this forum; I hope you can help me out.

To be exact, I'm working on a file containing SQL commands. I want to extract each command and put it into its own file, with the file name taken from the object name (i.e. CREATE TABLE tableA gives tableA.tab as the file name).

Here's what the SQL file looks like:

CREATE TABLE PHONEBOOK_TABLE ...
;
------------------------------------------------

CREATE VIEW PHONEBOOK_VIEW ...
;

------------------------------------------------
CREATE VIEW
PHONEBOOK_HIDDEN_VIEW ...
;


Now, following the code posted above, I was only able to remove the --- lines and am clueless about what to do next. Is there a way to read the third word in a paragraph (regardless of whether a space or a line break comes before it) and use that as a file name? Example below:

file PHONEBOOK_TABLE.tab contains
CREATE TABLE PHONEBOOK_TABLE ...
;

file PHONEBOOK_HIDDEN_VIEW.vw contains
CREATE VIEW
PHONEBOOK_HIDDEN_VIEW ...
;

hope you guys can help me out. Thanks
 
Old 04-19-2012, 02:53 PM   #2
quarlington
LQ Newbie
 
Registered: Apr 2012
Distribution: Fedora, RHEL, Unbreakable
Posts: 4

Rep: Reputation: Disabled
If I'm reading what you're trying to do correctly, this code may be a quick and dirty start of a solution - you'll still have to deal with the file extensions though:


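#!/bin/bash
# Split a file of SQL statements into one file per statement,
# named after the third word of the statement.
# Usage: ./script.sh input_file

INFILE=$1
DIR=$(pwd)
FILE="$DIR/$INFILE"

# IFS= and -r keep leading white space and backslashes intact
while IFS= read -r LINE
do
    # skip dashed separator lines in case they are still in the file
    case "$LINE" in ---*) continue ;; esac
    # collect the words of the current statement on a single line
    STATEMENT="$STATEMENT $LINE"
    echo "$LINE" >> "$DIR/tmpfile.tmp"
    # a line containing ';' ends the statement
    if echo "$LINE" | grep -q ';'; then
        NEWFILENAME=$(printf '%s\n' "$STATEMENT" | awk '{print $3}')
        mv "$DIR/tmpfile.tmp" "$DIR/$NEWFILENAME"
        STATEMENT=""
    fi
done < "$FILE"

exit 0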
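
The file extension part that the script leaves open could be handled with a `case` on the second word of the statement. This is only a sketch: the `.tab` and `.vw` extensions come from the example in post #1, while the function name `ext_for` and the `.sql` fallback are made up for illustration.

```shell
#!/bin/bash
# ext_for: hypothetical helper mapping an SQL object type to a file extension.
# TABLE -> tab and VIEW -> vw follow post #1; sql is an assumed fallback.
ext_for() {
    case "$1" in
        TABLE) echo "tab" ;;
        VIEW)  echo "vw" ;;
        *)     echo "sql" ;;
    esac
}

# Word 2 of the statement is the object type, word 3 the object name.
STATEMENT="CREATE VIEW PHONEBOOK_HIDDEN_VIEW ..."
OBJTYPE=$(printf '%s\n' "$STATEMENT" | awk '{print $2}')
OBJNAME=$(printf '%s\n' "$STATEMENT" | awk '{print $3}')
NEWFILENAME="$OBJNAME.$(ext_for "$OBJTYPE")"
echo "$NEWFILENAME"
```

In the script above, the same two lookups would replace the single `awk '{print $3}'` call just before the `mv`.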
 
Old 04-19-2012, 03:31 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Well, searching the forums can help too:

http://www.linuxquestions.org/questi...-files-940465/
 
Old 04-21-2012, 04:40 AM   #4
patatahead
LQ Newbie
 
Registered: Apr 2012
Posts: 6

Original Poster
Rep: Reputation: Disabled
Hi Quarlington,

Thanks for this one. I'll try to decipher it; please correct me if I'm wrong. Let's see if I understood it right.

Quote:
Originally Posted by quarlington View Post
If I'm reading what you're trying to do correctly, this code may be a quick and dirty start of a solution - you'll still have to deal with the file extensions though:


#!/bin/bash

INFILE=$1 # Pass the first parameter to the variable INFILE
DIR=$(pwd) # Set the value for variable DIR to the present working directory
FILE=$DIR/$INFILE # set the value of variable FILE to the fully qualified file name

while read LINE # while loop; where does the variable LINE come from?
do
STATEMENT="$STATEMENT $LINE" # I got lost here: are you appending each line read by the while loop to the variable STATEMENT? What does the variable LINE contain?

echo "$LINE" >> $DIR/tmpfile.tmp # append the line to a temporary file
echo "$LINE" | grep \; > /dev/null # look for a ";" character in the line and send the output to /dev/null (discard it)
if [ $? -eq 0 ]; then
NEWFILENAME=$(echo $STATEMENT | awk '{print $3}')
mv $DIR/tmpfile.tmp $DIR/$NEWFILENAME
STATEMENT=""
fi # If my idea of looping through the file line by line is correct, won't this create multiple files with the third word as the file name, breaking the block of SQL into one file per line?

done < $FILE

exit 0
Thanks.
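
On the questions in those comments: `read` fills `LINE` with the next line of the input on every pass of the loop, and `STATEMENT` keeps growing until a line containing `;` turns up, so the `mv` runs once per statement rather than once per line. A small trace illustrates this; it is a sketch that is not from the thread, and the function name `trace` and the sample input are made up.

```shell
#!/bin/bash
# trace: hypothetical helper that reads SQL from standard input and prints
# one output line per completed statement.
trace() {
    STATEMENT=""
    while IFS= read -r LINE
    do
        STATEMENT="$STATEMENT $LINE"      # keep appending lines
        case "$LINE" in
            *\;*)                         # this line closes the statement
                echo "statement:$STATEMENT"
                STATEMENT=""
                ;;
        esac
    done
}

# The here-document stands in for the SQL file (sample data, assumed).
trace <<'EOF'
CREATE TABLE PHONEBOOK_TABLE ...
;
CREATE VIEW
PHONEBOOK_HIDDEN_VIEW ...
;
EOF
```

Five input lines produce two `statement:` lines, one for each `;`.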
 
Old 04-21-2012, 04:48 AM   #5
patatahead
LQ Newbie
 
Registered: Apr 2012
Posts: 6

Original Poster
Rep: Reputation: Disabled
Guru Grail,

Thank you for responding.

Here's what I got from the other post

Quote:
Originally Posted by grail View Post
awk 'BEGIN{i=1}/^[A-Z].*proc/,/\//{print > "File"i}/\//{i++}' orig_file
I've seen people use this tool and have studied it a little. Based on the cryptic statement above, this is what I understood:

awk 'BEGIN{i=1} # BEGIN statement, setting the variable i to the value 1
/^[A-Z].*proc/ # find all lines that start with a character from A to Z; the .* I don't understand, but used in conjunction with proc it might mean any line with characters preceding "proc". Did I understand it right?
/\//{print > "File"i}/\//{i++}' # All I understood here is that you search for the "/" symbol and print the line into File[i], where i would be the number; then you search for the "/" symbol again, and {i++} would be a loop increment?

I don't have access to a Unix box at the moment, but will this code of yours print the lines one by one to a file?

Thank you.
 
Old 04-21-2012, 05:28 AM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Not too bad a shot at the understanding, let me flesh it out a little more to hopefully make it clear:

BEGIN{i=1} - the upshot here is that we initialise the variable 'i'. The other piece of information is that BEGIN is only ever performed once prior to all files being read.

/^[A-Z].*proc/,/\// - As you can see, I have included the test for the slash (/), because the two tests together form what is called a range. It means: starting from a line that begins (^) with a capital letter ([A-Z]), followed by zero or more of any character (.*) and finally the string "proc", perform the tasks inside the curly braces on every line until you reach a line containing a slash (/).

{print > "File"i} - Only while the previous expression evaluates to true, print the current line into a file called "FileN", where N is the current value of the variable "i".

/\//{i++} - This is completely separate to the previous tests and actions. On any line that contains a slash (/), increment the variable "i" by 1
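
The three rules can be tried on a small file. This is a sketch with made-up input; the scratch directory and the contents of `orig_file` are assumptions for the demo. The output name is written as `("File" i)` in parentheses, which is the portable spelling of the same thing.

```shell
#!/bin/bash
# Work in a scratch directory so the demo files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up input: two blocks that start with an upper-case line containing
# "proc" and end with a line containing "/".
cat > orig_file <<'EOF'
CREATE proc first
body one
/
stray line outside any range
CREATE proc second
body two
/
EOF

# Rule 1 sets the counter, rule 2 prints every line inside a range,
# rule 3 bumps the counter on each line that contains a slash.
awk 'BEGIN { i = 1 }
     /^[A-Z].*proc/, /\// { print > ("File" i) }
     /\// { i++ }' orig_file

head File1 File2
```

The stray line falls outside both ranges, so it lands in neither file. The closing "/" line is still printed into the current file because the print rule runs before the increment rule.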

So with a few changes this could be made to process your data, but the upside is that your data is actually a little easier. Awk has a variable called RS (record separator) which lets you define what makes up a single record. As your data, assuming the example is correct, always has a line of dashes between records, you can use that as the RS.

As the solution is trivial I will let you investigate further. Let me know if you get stuck.

Also, here is a valuable resource for awk: - http://www.gnu.org/software/gawk/man...ode/index.html
 
Old 04-23-2012, 10:03 AM   #7
patatahead
LQ Newbie
 
Registered: Apr 2012
Posts: 6

Original Poster
Rep: Reputation: Disabled
Hi Grail,

I've read your awk pages and managed to come up with this line:

awk 'BEGIN{RS="---------------------------------------------------------------------------";FS="\n"} /^$/,/\;/ { print $0 }' file_SQL.txt

RS is the record separator, and since the blocks of code are separated by dashed lines, I used those as the RS. Next comes the field separator; if I'm right, \n makes each line a field. Then I placed my search condition: starting with a blank line and running until it reaches something with ";".

Interestingly, when I executed this with print $0, the first block of SQL text came out:

CREATE TABLE PHONEBOOK_TABLE ...
some statements ...
;

When I changed $0 to $1, it showed nothing.
When I tried $2, it showed only the first statement line:
CREATE TABLE PHONEBOOK_TABLE ...

I'm trying to picture it: say the whole file contains 10 blocks of code, separated by dashes. How can I tell awk that I want those 10 blocks in a collection, so that I can loop through them one by one (per block, not per line) and save each one to a file?

am I still on the right track?
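
For readers with the same question: awk already loops per block once RS is set, because every rule runs once per record, so no separate collection is needed. It also means the range `/^$/,/\;/` is tested against whole records rather than lines, which is probably why only one block came out. The empty `$1` is most likely the newline left over after each dashed line, which FS="\n" turns into an empty first field. The sketch below assumes the layout from post #1 and an awk that accepts a regular expression as RS (gawk and mawk do); the function name `list_blocks` is made up.

```shell
#!/bin/bash
# Sample data in the layout of post #1 (assumed to match the real file).
sample='CREATE TABLE PHONEBOOK_TABLE ...
;
------------------------------------------------

CREATE VIEW PHONEBOOK_VIEW ...
;
'

# list_blocks: hypothetical helper that reports one line per block.
# RS as a regular expression needs gawk or mawk; a strict POSIX awk only
# honours the first character of RS.
# The bare NF pattern skips records that hold nothing but white space.
list_blocks() {
    awk 'BEGIN { RS = "-+\n" }
         NF { n++; print "block " n ": type=" $2 ", name=" $3 }'
}

printf '%s' "$sample" | list_blocks
```

With FS left at its default, `$2` and `$3` are the second and third words of the block, so the leading blank line does no harm.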
 
Old 04-23-2012, 11:20 AM   #8
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
here's my stab at it (I cheated a little by editing the input to make all the stanzas a standard format):
Code:
[schneidz@hyper patatahead]$ cat patatahead.txt | while read line; do if [ "`echo $line | grep "\---"`" ]; then  read fout; read line; echo $fout > `echo $fout | awk '{print $3}'`.txt; fi; echo $line >> `echo $fout | awk '{print $3}'`.txt ; done
[schneidz@hyper patatahead]$ head *.txt
==> patatahead.txt <==
----------------------------------------
CREATE TABLE PHONEBOOK_TABLE ...
;
------------------------------------------------
CREATE VIEW PHONEBOOK_VIEW ...
;

------------------------------------------------
CREATE VIEW PHONEBOOK_HIDDEN_VIEW ...
;

==> PHONEBOOK_HIDDEN_VIEW.txt <==
CREATE VIEW PHONEBOOK_HIDDEN_VIEW ...
;

==> PHONEBOOK_TABLE.txt <==
CREATE TABLE PHONEBOOK_TABLE ...
;

==> PHONEBOOK_VIEW.txt <==
CREATE VIEW PHONEBOOK_VIEW ...
;

Last edited by schneidz; 04-23-2012 at 11:23 AM.
 
Old 04-23-2012, 11:51 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
I am with schneidz in that the format not being uniform could cause issues, so you would need to advise whether this is the actual data or whether it looks more like the data in post #8.

Making the same assumption, in awk it would look like:
Code:
awk '{print > $3".txt"}' RS="-+\n" file
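
A slightly more defensive variant of the same idea is sketched below. It assumes the input from post #8, which starts with a dashed line and therefore yields an empty first record that would otherwise be written to a file called ".txt". The extension table (tab, vw, fallback sql) is an assumption based on the names requested in post #1, and the scratch directory is only there for the demo. A regular expression in RS needs gawk or mawk.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Input in the uniform format of post #8.
cat > file <<'EOF'
----------------------------------------
CREATE TABLE PHONEBOOK_TABLE ...
;
------------------------------------------------
CREATE VIEW PHONEBOOK_VIEW ...
;
EOF

# NF >= 3 skips the empty record in front of the first dashed line.
# ext[] maps the object type in $2 to an extension (assumed mapping).
awk 'BEGIN { ext["TABLE"] = "tab"; ext["VIEW"] = "vw" }
     NF >= 3 {
         out = $3 "." (($2 in ext) ? ext[$2] : "sql")
         printf "%s", $0 > out      # printf avoids an extra trailing newline
         close(out)                 # keeps the number of open files small
     }' RS='-+\n' file

ls
```

Parenthesising or precomputing the output name, as done with `out` here, sidesteps the ambiguity some awk versions have with `print > $3".txt"`.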
 
1 members found this post helpful.
Old 04-26-2012, 09:20 AM   #10
patatahead
LQ Newbie
 
Registered: Apr 2012
Posts: 6

Original Poster
Rep: Reputation: Disabled
apologies for the delay in response.

The dashed lines in the SQL file I have are uniform. I've read @schneidz's post and boy, what a way to attack the problem! I liked it! So basically, we loop through the doc, catch the dashed lines "---", and echo the rest into the file. Sweet!

How did you guys get to know those tricks? I'm quite interested to know.

@grail: guru, I know that \n stands for new line, what does -+ stand for?
 
Old 04-26-2012, 10:30 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
+ - is regex for one or more of the previous, so in this case one or more -'s

btw. The dashes are not what we were concerned about; the issue was the following difference between your data and schneidz's:
Code:
CREATE VIEW 
PHONEBOOK_HIDDEN_VIEW ...
;

CREATE VIEW PHONEBOOK_HIDDEN_VIEW ...
;
So our scripts require the CREATE statement to have all its parts on one line; in your example there is no $3 on either line.
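
One detail may soften this for the awk version. With the default field separator, awk splits fields on newlines as well as on blanks, so once RS turns a whole block into one record, `$3` can still find the object name when CREATE VIEW and the name sit on different lines. The line-by-line shell loop in post #8 has no such help. A quick check, as a sketch: the function name `third_word` is made up, and a regular expression in RS again assumes gawk or mawk.

```shell
#!/bin/bash
# third_word: hypothetical helper that prints field 3 of each
# dash-separated block read from standard input.
third_word() {
    awk 'BEGIN { RS = "-+\n" } NF >= 3 { print $3 }'
}

# The object name sits on its own line, as in the block from post #1.
printf 'CREATE VIEW\nPHONEBOOK_HIDDEN_VIEW ...\n;\n' | third_word
```

This only covers names that are whole words; a keyword or name broken in the middle by a line wrap would still need repairing first.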
 
Old 04-29-2012, 09:08 AM   #12
patatahead
LQ Newbie
 
Registered: Apr 2012
Posts: 6

Original Poster
Rep: Reputation: Disabled
hi Grail,

Yeah, too bad, poorly written code that is! I had a situation where it got much worse than this: imagine a keyword split in two due to wrong line sizing. Splitting these SQL statements into different files is one thing; getting the name from numerous CREATE lines is another story. Sigh... but many thanks to you and schneidz for the effort. I don't have access to a Unix box on weeknights and weekends, which is why I converted my Windows box into a Linux box today.
 
  

