LinuxQuestions.org
Old 09-22-2018, 03:46 AM   #1
massy
Member
 
Registered: Nov 2013
Distribution: CentOS 6.4
Posts: 209
Blog Entries: 1

Rep: Reputation: Disabled
How to split a file according to new lines


I have a file with a pattern like the one below. As shown, the file contains two parts, header 1 and header 2, separated by a blank line. I need to save each part in a separate file, such as header1 and header2. How can I split the file at the blank line in Linux?

---Begin header 1 ------
sdkjfdlskf
ldsjfsldk
lkdsjf
---- end header 1 -----

---Begin header 2 ------
sadasd
asdas
asd
---- end header 2 ------
 
Old 09-22-2018, 03:56 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,306
Blog Entries: 3

Rep: Reputation: 3720
It's easy enough to do in awk or perl. The former if the spacing in the headers is consistent, the latter if the spacing in the headers is more random (or if perl is more comfortable). Fewer than ten instructions or so ought to do it in either.

Which do you prefer, awk or perl? Also, can you please show your script so far so we can see where you are at and which approach you are trying?

Last edited by Turbocapitalist; 09-22-2018 at 03:58 AM.
 
Old 09-22-2018, 05:53 AM   #3
massy
Member
 
Registered: Nov 2013
Distribution: CentOS 6.4
Posts: 209

Original Poster
Blog Entries: 1

Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist (post #2)
I haven't written a script yet, but I prefer awk. Do you know how to use it?
 
Old 09-22-2018, 05:58 AM   #4
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,306
Blog Entries: 3

Rep: Reputation: 3720
Yes, but we're here to help you learn to write it rather than write it for you.

There are several approaches. One would be:
  • check for the begin header and capture the part to be included in the file name, plus toggle a flag
  • check for the end header and toggle the flag
  • if the flag is set, redirect output to a calculated file name and then print

How familiar are you with awk? Which version of awk do you have?
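The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: the sample lines are modelled on the first post, the output names header1 and header2 come from the original question, and the file name sample.txt and the variables name and flag are made up for the example.

```shell
cd "$(mktemp -d)" || exit 1   # work in a scratch directory

# Hypothetical sample input, modelled on the first post.
printf '%s\n' \
  '---Begin header 1 ------' 'sdkjfdlskf' '---- end header 1 -----' \
  '' \
  '---Begin header 2 ------' 'sadasd' '---- end header 2 ------' > sample.txt

awk '
  /Begin header/ { name = "header" $3; flag = 1 }   # $3 is the header number
  flag           { print > name }                   # write only while inside a block
  /end header/   { flag = 0; close(name) }          # leave the block, release the file
' sample.txt
```

This relies on the spacing in the begin line being consistent, so that the number is always the third whitespace-separated field. The blank line between the blocks is never written because the flag is off at that point.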
 
1 members found this post helpful.
Old 09-22-2018, 06:30 AM   #5
lougavulin
Member
 
Registered: Jul 2018
Distribution: Slackware,x86_64,current
Posts: 279

Rep: Reputation: 100
If the headers are consistent and you wish to keep them in the new files, another approach with awk could be:

- Detect the begin header, use it to set the filename.
- Print all lines between the begin and end headers into a file with the filename set in previous instruction.

Awk can operate on a range of lines delimited by a start pattern and an end pattern.
Code:
/beginning/,/ending/ {...}
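A small demonstration of such a range pattern, with made-up input lines and a shell variable out that exists only for this example:

```shell
# Feed five lines to awk; the range pattern selects the lines from the one
# matching /Begin/ through the one matching /END/, both included.
out=$(printf '%s\n' 'before' '---Begin---' 'body' '---END---' 'after' |
  awk '/Begin/,/END/ { print }')

printf '%s\n' "$out"   # prints the three lines from ---Begin--- through ---END---
```

The lines "before" and "after" fall outside the range and are dropped.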
 
2 members found this post helpful.
Old 09-23-2018, 03:08 AM   #6
massy
Member
 
Registered: Nov 2013
Distribution: CentOS 6.4
Posts: 209

Original Poster
Blog Entries: 1

Rep: Reputation: Disabled
Quote:
Originally Posted by lougavulin (post #5)
I used this:
awk '/-----BEGIN/{flag=1}flag > file1;/-----END/{flag=0}' domain-crt.txt

but it doesn't split the output into two separate files!

Last edited by massy; 09-23-2018 at 03:40 AM.
 
Old 09-23-2018, 03:41 AM   #7
massy
Member
 
Registered: Nov 2013
Distribution: CentOS 6.4
Posts: 209

Original Poster
Blog Entries: 1

Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist (post #4)
I used this:
awk '/-----BEGIN/{flag=1}flag > file1;/-----END/{flag=0}' domain-crt.txt

but it doesn't split the output into two separate files!
 
Old 09-23-2018, 03:48 AM   #8
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,306
Blog Entries: 3

Rep: Reputation: 3720
Quote:
Originally Posted by massy (post #6)
Excellent.

I would make the test for 'flag' simpler. The explicit print statement will be needed later:

Code:
awk '/-----BEGIN/{flag=1} flag{print} /-----END/{flag=0}' domain-crt.txt
But you still need a file name. One way is to add another variable that gets incremented whenever the /begin/ pattern is found. Another way is to extract a field from the line matching the /begin/ pattern and use that. Either way, store the name in a variable.

You'll also need to redirect to a file. Look in the awk manual, scroll down to the explanation of the print statement, and look at the line with the redirect >
You can try redirecting everything to the same file first.
Then after that you can redirect to a file name stored in a variable.
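For illustration, the two redirect steps might look like the sketch below. The input file sample.txt, its contents, and the output names out.txt and out2.txt are assumptions made for the example. As far as I can tell, the earlier attempt with `flag > file1` outside the braces is read by awk as a comparison between two variables, not as a redirect, which would explain why no file was written.

```shell
cd "$(mktemp -d)" || exit 1   # work in a scratch directory

# Hypothetical input: one block followed by a line that should be skipped.
printf '%s\n' '-----BEGIN CERTIFICATE-----' 'one' '-----END CERTIFICATE-----' 'skipped' > sample.txt

# Step 1: redirect everything to one fixed file. The name is a quoted string
# inside the braces, attached to print.
awk '/-----BEGIN/{flag=1} flag{print > "out.txt"} /-----END/{flag=0}' sample.txt

# Step 2: the same redirect, with the file name held in a variable.
awk -v name="out2.txt" '/-----BEGIN/{flag=1} flag{print > name} /-----END/{flag=0}' sample.txt
```

Both commands write the same three lines. The only difference is where the file name comes from.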

Remember [code] [/code] tags when posting.

Last edited by Turbocapitalist; 09-23-2018 at 03:49 AM.
 
Old 09-23-2018, 07:51 AM   #9
lougavulin
Member
 
Registered: Jul 2018
Distribution: Slackware,x86_64,current
Posts: 279

Rep: Reputation: 100
Quote:
Originally Posted by massy (post #6)
As Turbocapitalist said, you have to redirect to a file. And one way or another, you have to define the file name.

I cannot say more without giving you the answer:
Code:
awk '/beginning/ { defining filename } /beginning/,/ending/ { writing line into your file }' domain-crt.txt
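One possible way to fill in that template is sketched here. The marker lines, the input name sample.txt, the counter n, and the output names file1 and file2 are all assumptions for the sake of the example, not necessarily what the real domain-crt.txt needs.

```shell
cd "$(mktemp -d)" || exit 1   # work in a scratch directory

# Hypothetical input with two blocks back to back.
printf '%s\n' \
  '-----BEGIN-----' 'aaa' '-----END-----' \
  '-----BEGIN-----' 'bbb' '-----END-----' > sample.txt

awk '
  /-----BEGIN/             { name = "file" (++n) }   # define the file name
  /-----BEGIN/,/-----END/  { print > name }          # write every line in the range
' sample.txt
```

The first rule runs before the range rule on each begin line, so the name is already set when that line is written.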
 
Old 09-23-2018, 08:29 AM   #10
massy
Member
 
Registered: Nov 2013
Distribution: CentOS 6.4
Posts: 209

Original Poster
Blog Entries: 1

Rep: Reputation: Disabled
Quote:
Originally Posted by lougavulin (post #9)
I don't understand! I'm under pressure to finish a task and I don't have much time to learn everything about awk!!! I just need the exact answer!!
I need to get 2 files out of one!!! Like this:
Input file is:
---Begin---
ldsjflds
ldsf
---END---
---Begin---
ldsjfsdl
ldsjfs
---END---

Output files should be:

file1:
---Begin---
ldsjflds
ldsf
---END---

file2:
---Begin---
ldsjfsdl
ldsjfs
---END---
 
Old 09-23-2018, 08:46 AM   #11
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,830

Rep: Reputation: 7308
So you need a flag and a counter.
Code:
awk '/-----BEGIN/{flag=1} flag{print} /-----END/{flag=0}' domain-crt.txt
is now clear (at least I think), but you want something like this:
Code:
print > filename
where filename will contain file1, file2, file3 ....
You need to set it in the action for the /-----BEGIN/ pattern:
Code:
/-----BEGIN/{flag=1;counter++;filename="file"counter}
(something like this).
I'm not really sure this is what you need, and it is not tested, but I hope it helps.

Last edited by pan64; 09-23-2018 at 09:11 AM.
 
2 members found this post helpful.
Old 09-24-2018, 01:00 AM   #12
massy
Member
 
Registered: Nov 2013
Distribution: CentOS 6.4
Posts: 209

Original Poster
Blog Entries: 1

Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 (post #11)
Thank you. The code below worked:
Code:
awk '/-----BEGIN/{flag=1;counter++;filename="file"counter}{print $0 > filename} /-----END/{flag=0}' domain-crt.txt

Last edited by massy; 09-24-2018 at 01:03 AM.
 
1 members found this post helpful.
Old 09-24-2018, 02:11 AM   #13
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,789

Rep: Reputation: 1201
You made it!
Once you know awk better, you can write
filename="file"++counter
and
flag{print > filename}
which skips any lines that fall outside the begin-end blocks.
If you do not make use of the flag, then you don't need to set it!
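Put together, the two suggestions might look like the sketch below. The contents written to domain-crt.txt here are an invented stand-in for the real file, and the parentheses around ++counter are just a cautious way of writing the same concatenation.

```shell
cd "$(mktemp -d)" || exit 1   # work in a scratch directory

# Hypothetical stand-in for domain-crt.txt, with extra text outside the blocks.
printf '%s\n' \
  'text before the first block' \
  '-----BEGIN-----' 'aaa' '-----END-----' \
  'text between the blocks' \
  '-----BEGIN-----' 'bbb' '-----END-----' > domain-crt.txt

# The flag now guards the print, so lines outside the blocks are not written.
awk '/-----BEGIN/{flag=1; filename="file" (++counter)} flag{print > filename} /-----END/{flag=0}' domain-crt.txt
```

With this input, file1 and file2 each hold one complete block and neither of the two "text" lines ends up in any output file.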
 
Old 09-25-2018, 04:30 PM   #14
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,789

Rep: Reputation: 1201
With bash builtins:
Code:
#!/bin/bash
# Split domain-crt.txt into file1, file2, ... at each begin marker.
x=0
filename=
while IFS= read -r line   # -r keeps backslashes literal
do
  case $line in
  *-----BEGIN*)           # same marker as in the awk commands above
    filename=file$((++x))
    echo "writing $filename ..."
    exec 3> "$filename"   # reopen fd 3 on the new output file
  ;;
  esac
  # Skip anything before the first begin marker (fd 3 is not open yet).
  [ -n "$filename" ] && printf '%s\n' "$line" >&3
done < domain-crt.txt

Last edited by MadeInGermany; 09-25-2018 at 04:31 PM.
 
  

