Linux - Newbie This Linux forum is for members that are new to Linux.
|
 |
04-19-2012, 10:55 AM
|
#1
|
LQ Newbie
Registered: Apr 2012
Posts: 6
Rep: 
|
How to split a text based on keywords and put each block in a separate file?
hi,
I'm trying to separate certain blocks of text from a file and put each one into its own file. I've searched the net for resources and luckily landed on this forum; I hope you can help me out.
To be exact, I'm working on a file containing SQL commands. I want to extract each command and put it into a file whose name is the object name (i.e. CREATE TABLE tableA gives tableA.tab as the filename).
Here's what the SQL file looks like:
CREATE TABLE PHONEBOOK_TABLE ...
;
------------------------------------------------
CREATE VIEW PHONEBOOK_VIEW ...
;
------------------------------------------------
CREATE VIEW
PHONEBOOK_HIDDEN_VIEW ...
;
Now, following the code posted above, I was only able to remove the --- lines and am clueless about what to do next. Is there a way to read the third word in a paragraph (regardless of whether it follows a space or a line break) and use it as a filename? Example below:
file PHONEBOOK_TABLE.tab contains
CREATE TABLE PHONEBOOK_TABLE ...
;
file PHONEBOOK_HIDDEN_VIEW.vw contains
CREATE VIEW
PHONEBOOK_HIDDEN_VIEW ...
;
Hope you guys can help me out. Thanks
|
|
|
04-19-2012, 02:53 PM
|
#2
|
LQ Newbie
Registered: Apr 2012
Distribution: Fedora, RHEL, Unbreakable
Posts: 4
Rep: 
|
If I'm reading what you're trying to do correctly, this code may be a quick and dirty start of a solution - you'll still have to deal with the file extensions though:
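#!/bin/bash
INFILE=$1
DIR=$(pwd)
FILE=$DIR/$INFILE
STATEMENT=""
while IFS= read -r LINE
do
    # Skip the dashed separator lines so they do not count as words
    case $LINE in ---*) continue ;; esac
    STATEMENT="$STATEMENT $LINE"
    echo "$LINE" >> "$DIR/tmpfile.tmp"
    # A line containing ";" ends the current statement
    if echo "$LINE" | grep -q ';'; then
        # The third word of the accumulated statement is the object name
        NEWFILENAME=$(echo "$STATEMENT" | awk '{print $3}')
        mv "$DIR/tmpfile.tmp" "$DIR/$NEWFILENAME"
        STATEMENT=""
    fi
done < "$FILE"
exit 0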
|
|
|
04-19-2012, 03:31 PM
|
#3
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
|
|
|
|
04-21-2012, 04:40 AM
|
#4
|
LQ Newbie
Registered: Apr 2012
Posts: 6
Original Poster
Rep: 
|
Hi Quarlington,
Thanks for this one. I'll try to decipher it; please correct me if I'm wrong. Let's see if I understood it right.
Quote:
Originally Posted by quarlington
If I'm reading what you're trying to do correctly, this code may be a quick and dirty start of a solution - you'll still have to deal with the file extensions though:
#!/bin/bash
INFILE=$1 # Pass the first parameter to the variable INFILE
DIR=$(pwd) # Set the value for variable DIR to the present working directory
FILE=$DIR/$INFILE # set the value of variable FILE to the fully qualified file name
while read LINE # while loop; where does the variable LINE come from?
do
STATEMENT="$STATEMENT $LINE" # hmm, I got lost here: are you appending each line read by the while loop to the variable STATEMENT? What does the variable LINE contain?
echo "$LINE" >> $DIR/tmpfile.tmp # append the line to a temporary file
echo "$LINE" | grep \; > /dev/null # look for the ";" character in the line and send the output to /dev/null (discard it)
if [ $? -eq 0 ]; then
NEWFILENAME=$(echo $STATEMENT | awk '{print $3}')
mv $DIR/tmpfile.tmp $DIR/$NEWFILENAME
STATEMENT=""
fi # If my idea of looping through the file line by line is correct, won't this create multiple files with the third word as the file name, breaking the block of SQL into multiple files, one line per file?
done < $FILE
exit 0
|
Thanks.
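On the questions in the comments above: `read` fills LINE with the next line of whatever is redirected into the loop (`done < $FILE`), and `$?` holds the exit status of the previous command, so a `grep` status of 0 means the line contained a ";". The file is only renamed on those lines, so each output file should hold a whole block rather than a single line. A minimal sketch of the loop mechanics, assuming made-up input (demo.sql and its contents are invented for the illustration):

```shell
# Hypothetical sample input, written to a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'CREATE TABLE T1 ...' ';' 'CREATE VIEW' 'V1 ...' ';' > demo.sql

count=0
while IFS= read -r LINE              # LINE receives one line of demo.sql per iteration
do
    if echo "$LINE" | grep -q ';'    # grep -q prints nothing; only its exit status is tested
    then
        count=$((count + 1))         # status 0: this line ends a statement
    fi
done < demo.sql                      # the redirection is where the lines come from

echo "statements found: $count"
```

Because the loop reads from a redirection and not from a pipe, it runs in the current shell and `count` is still set after `done`.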
|
|
|
04-21-2012, 04:48 AM
|
#5
|
LQ Newbie
Registered: Apr 2012
Posts: 6
Original Poster
Rep: 
|
Guru Grail,
Thank you for responding.
Here's what I got from the other post
Quote:
Originally Posted by grail
awk 'BEGIN{i=1}/^[A-Z].*proc/,/\//{print > "File"i}/\//{i++}' orig_file
|
I've seen people use this tool and have studied it a little. Based on the cryptic statement above, what I understood is:
awk 'BEGIN{i=1} # BEGIN block, setting the variable i to the value 1
/^[A-Z].*proc/ # find all lines that start with a character from A to Z; I don't understand the .* but, used in conjunction with proc, it might mean any line with characters preceding "proc". Did I understand it right?
/\//{print > "File"i}/\//{i++}' # What I understood here is that you search for the "/" symbol and print the line into File[i], where i is the number; then you search for the "/" symbol again, and {i++} would be a loop increment?
Hmm, I don't have access to a Unix box at the moment, but will this code of yours print the lines one by one to a file?
Thank you.
|
|
|
04-21-2012, 05:28 AM
|
#6
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
|
Not too bad a shot at understanding it; let me flesh it out a little more to hopefully make it clear:
BEGIN{i=1} - the upshot here is that we initialise the variable 'i'. The other piece of information is that BEGIN is only ever performed once, prior to any files being read.
/^[A-Z].*proc/,/\// - As you can see, I have included the test for the slash (/), as this is called a range. This means that, starting from a line that starts (^) with a capital letter ([A-Z]) followed by zero or more of any character (.*) and finally the string "proc", awk performs the tasks inside the curly braces on every line until it reaches a line containing a slash (/).
{print > "File"i} - Only while the previous expression evaluates to true, print the current line into a file called "FileN", where N is the current value of the variable "i".
/\//{i++} - This is completely separate from the previous tests and actions. On any line that contains a slash (/), increment the variable "i" by 1.
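To see the range pattern in action, here is a small sketch. The input file procs.txt and its contents are invented for the demonstration, and the output file name is parenthesised as ("File" i) because some awk implementations parse an unparenthesised concatenation after > differently.

```shell
cd "$(mktemp -d)" || exit 1
# Made-up input: two blocks that start with a capitalised line containing "proc" and end with "/"
cat > procs.txt <<'EOF'
Create proc one
body one
/
some line outside any range
Create proc two
body two
/
EOF

# Lines inside a range go to File<i>; every line containing "/" bumps i afterwards
awk 'BEGIN{i=1} /^[A-Z].*proc/,/\//{print > ("File" i)} /\//{i++}' procs.txt

head File1 File2
```

The line outside any range is not written anywhere, and the closing "/" of each block ends up in the same file as the block because the increment runs after the print.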
So with a few changes this could be made to process your data, but the upside is that your data is actually a little easier. Awk has a variable called RS (record separator) which allows one to define what makes a single record. As your data, assuming the example is correct, always has a line of dashes between each record, you can use this as the RS.
As the solution is trivial I will let you investigate further. Let me know if you get stuck.
Also, here is a valuable resource for awk: - http://www.gnu.org/software/gawk/man...ode/index.html
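As a rough illustration of the RS idea (not the full solution): when RS is a regular expression, which gawk and mawk accept but POSIX leaves unspecified, each block between dashed lines becomes one record. The file name blocks.sql and its contents are assumed sample data.

```shell
cd "$(mktemp -d)" || exit 1
cat > blocks.sql <<'EOF'
CREATE TABLE PHONEBOOK_TABLE ...
;
------------------------------------------------
CREATE VIEW PHONEBOOK_VIEW ...
;
EOF

# RS is one or more dashes followed by a newline, so a record is a whole block.
# NR is the record number; $1 and $2 are the first two whitespace-separated words.
summary=$(awk 'BEGIN { RS = "-+\n" } { print NR ": " $1 " " $2 }' blocks.sql)
echo "$summary"
```

Each record is numbered once per block, not once per line, which is the property the splitting task needs.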
|
|
|
04-23-2012, 10:03 AM
|
#7
|
LQ Newbie
Registered: Apr 2012
Posts: 6
Original Poster
Rep: 
|
Hi Grail,
I've read the awk pages you linked and managed to come up with this line:
awk 'BEGIN{RS="---------------------------------------------------------------------------";FS="\n"} /^$/,/\;/ { print $0 }' file_SQL.txt
RS is the record separator, and since the blocks of code are separated by dashed lines, I used those as the RS; following it is the field separator. If I'm right, this \n will treat each line as one field. Next, I placed my search condition: starting with a blank line until it reaches something with ";".
Interestingly, when I executed this with print $0, the first block of SQL text came out:
CREATE TABLE PHONEBOOK_TABLE ...
some statements ...
;
When I tried changing the $0 to $1, it showed nothing.
When I tried $2, it showed only the first statement:
CREATE TABLE PHONEBOOK_TABLE ...
I'm trying to imagine it like this: say the whole file contains 10 blocks of code, separated by dashes. How could I tell awk that I want those 10 blocks in a collection, loop through them one by one (per block, not per line), and save each one to a file?
Am I still on the right track?
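A possible explanation for the $1 and $2 results, sketched on assumed sample data (file_demo.txt and its short dashed lines are invented): if RS is only the dashes, the newline that ended the dashed line stays at the front of the next record, so with FS="\n" the first field is empty and the CREATE line lands in $2. Including the newline in RS moves the CREATE line to $1. This assumes an awk that treats a multi-character RS as a regular expression, such as gawk or mawk.

```shell
cd "$(mktemp -d)" || exit 1
cat > file_demo.txt <<'EOF'
-----
CREATE TABLE PHONEBOOK_TABLE ...
;
-----
CREATE VIEW PHONEBOOK_VIEW ...
;
EOF

# RS without the trailing newline: every block starts with a leftover "\n",
# so $1 is empty and the CREATE line is $2. The NF test skips empty records.
dashes_only=$(awk 'BEGIN { RS = "-----"; FS = "\n" } NF { print "[" $1 "] [" $2 "]" }' file_demo.txt)
echo "$dashes_only"

# RS including the newline: the CREATE line is now $1 and ";" is $2.
with_newline=$(awk 'BEGIN { RS = "-----\n"; FS = "\n" } NF { print "[" $1 "] [" $2 "]" }' file_demo.txt)
echo "$with_newline"
```

No explicit collection or loop is needed for the 10-block case: awk already runs the action once per record, so each block can be handled as it is read.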
|
|
|
04-23-2012, 11:20 AM
|
#8
|
LQ Guru
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,326
|
Here's my stab at it (I cheated a little by editing the input to make all the stanzas a standard format):
Code:
[schneidz@hyper patatahead]$ cat patatahead.txt | while read line; do if [ "`echo $line | grep "\---"`" ]; then read fout; read line; echo $fout > `echo $fout | awk '{print $3}'`.txt; fi; echo $line >> `echo $fout | awk '{print $3}'`.txt ; done
[schneidz@hyper patatahead]$ head *.txt
==> patatahead.txt <==
----------------------------------------
CREATE TABLE PHONEBOOK_TABLE ...
;
------------------------------------------------
CREATE VIEW PHONEBOOK_VIEW ...
;
------------------------------------------------
CREATE VIEW PHONEBOOK_HIDDEN_VIEW ...
;
==> PHONEBOOK_HIDDEN_VIEW.txt <==
CREATE VIEW PHONEBOOK_HIDDEN_VIEW ...
;
==> PHONEBOOK_TABLE.txt <==
CREATE TABLE PHONEBOOK_TABLE ...
;
==> PHONEBOOK_VIEW.txt <==
CREATE VIEW PHONEBOOK_VIEW ...
;
Last edited by schneidz; 04-23-2012 at 11:23 AM.
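For readability, the same idea can be spelled out over several lines. This is a variant under the same assumptions as the one-liner (every block is preceded by a dashed line and the CREATE statement sits on one line), not the exact command from the post; the sample file is recreated here so the sketch runs on its own.

```shell
cd "$(mktemp -d)" || exit 1
cat > patatahead.txt <<'EOF'
----------------------------------------
CREATE TABLE PHONEBOOK_TABLE ...
;
------------------------------------------------
CREATE VIEW PHONEBOOK_VIEW ...
;
EOF

out=""
while IFS= read -r line
do
    case $line in
        ---*)
            # Dashed line: the next line is the CREATE statement that names the file
            IFS= read -r line || break
            out="$(printf '%s\n' "$line" | awk '{print $3}').txt"
            : > "$out"                      # start the output file empty
            ;;
    esac
    if [ -n "$out" ]; then
        printf '%s\n' "$line" >> "$out"     # copy the current line into the current file
    fi
done < patatahead.txt

head PHONEBOOK_*.txt
```

Computing the file name once per block, instead of calling awk for every line, keeps the loop shorter and avoids re-deriving the same name.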
|
|
|
04-23-2012, 11:51 AM
|
#9
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
|
I am with schneidz in that the format not being uniform could cause issues, so you would need to advise whether this is the actual data or whether it looks more like the data in post #8.
Making the same assumption, in awk it would look like:
Code:
awk 'NF >= 3 {print > ($3 ".txt")}' RS="-+\n" file
|
|
|
04-26-2012, 09:20 AM
|
#10
|
LQ Newbie
Registered: Apr 2012
Posts: 6
Original Poster
Rep: 
|
Apologies for the delay in responding.
The dashed lines in the SQL file I have are uniform. I've read @schneidz's post and boy, what a way to attack the problem! I liked it! So basically, we loop through the doc, catch the dashed lines ("---"), and echo the rest into the file. Sweet!
How did you guys get to know those tricks? I'm quite interested to know.
@grail: guru, I know that \n stands for new line; what does -+ stand for?
|
|
|
04-26-2012, 10:30 AM
|
#11
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
|
+ is regex for "one or more of the previous", so in this case one or more -'s.
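A quick way to try this out, using grep -E with four invented sample lines (the anchors ^ and $ are added here so that only lines made up entirely of dashes count):

```shell
# Four sample lines; only the first two consist entirely of dashes
matches=$(printf '%s\n' '-' '-----' 'CREATE VIEW V ...' 'a-b' | grep -E -c '^-+$')
echo "$matches"
```

The single dash matches because + allows exactly one occurrence, while 'a-b' does not because of the surrounding letters.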
btw. The dashes are not what we were concerned about; the issue was the following difference between your data and schneidz's:
Code:
CREATE VIEW
PHONEBOOK_HIDDEN_VIEW ...
;
CREATE VIEW PHONEBOOK_HIDDEN_VIEW ...
;
So our scripts require the CREATE to have all parts on the one line, as in your example there is no $3 on either line.
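That limitation applies when the name is picked from a single line. One hedged observation: with awk's default field splitting, a newline inside a record counts as whitespace just like a space, so when the record is the whole block, $3 should be the object name even if the CREATE statement wraps onto the next line. A sketch under assumptions: gawk or mawk (regex RS), dashed separator lines, a sample file split_demo.sql invented here, the .tab and .vw extensions taken from the first post, and a .txt fallback that is purely an assumption.

```shell
cd "$(mktemp -d)" || exit 1
cat > split_demo.sql <<'EOF'
CREATE TABLE PHONEBOOK_TABLE ...
;
------------------------------------------------
CREATE VIEW
PHONEBOOK_HIDDEN_VIEW ...
;
EOF

awk '
BEGIN { RS = "-+\n"; ORS = "" }    # one record per block; records keep their own newlines
NF >= 3 {                          # skip empty records, e.g. before a leading dashed line
    # Map the object type to an extension; "txt" is an assumed fallback
    ext = ($2 == "TABLE") ? "tab" : (($2 == "VIEW") ? "vw" : "txt")
    out = $3 "." ext
    print > out
    close(out)                     # avoid running out of open files on large inputs
}' split_demo.sql

cat PHONEBOOK_HIDDEN_VIEW.vw
```

This does not help with the harder case of a keyword or name that is itself broken in two by a line wrap; that would need the pieces rejoined before splitting into fields.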
|
|
|
04-29-2012, 09:08 AM
|
#12
|
LQ Newbie
Registered: Apr 2012
Posts: 6
Original Poster
Rep: 
|
Hi Grail,
Yeah, too bad, poorly written code that is! I've had situations where it gets much worse than this: imagine having a keyword split in two due to wrong line sizing. Splitting these SQLs into different files is one thing; getting the name from numerous create-object lines is another story. Sigh... but many thanks for your and schneidz's effort. I don't have access to a Unix box on weeknights and weekends, which is why today I converted my Windows box into a Linux box.
|
|
|
All times are GMT -5.