[SOLVED] How to split a file according to new lines

massy · 09-22-2018, 03:46 AM

I have a file that follows the pattern below. As shown, the file contains two parts, header 1 and header 2, separated by a blank line. I need to save each part in a separate file, such as header1 and header2. How can I split the file at the blank line in Linux?

---Begin header 1 ------
sdkjfdlskf
ldsjfsldk
lkdsjf
---- end header 1 -----

---Begin header 2 ------
sadasd
asdas
asd
---- end header 2 ------

Turbocapitalist · 09-22-2018, 03:56 AM

It's easy enough to do in awk or perl: awk if the spacing in the headers is consistent, perl if the spacing is more random (or if you are more comfortable with perl). About ten instructions or fewer ought to do it in either.

Which do you prefer, awk or perl? Also, can you please show your script so far so we can see where you are at and which approach you are trying?
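Because the blocks in the question are separated by a blank line, awk's paragraph mode is also worth knowing: when RS is set to the empty string, POSIX awk treats each blank-line-separated block as one record. The sketch below is not from the thread; it assumes the sample is saved as `sample.txt` (a hypothetical name) and writes the blocks to `header1`, `header2`, ... as the question asks:

```shell
#!/bin/sh
# Recreate the sample from the question (sample.txt is a hypothetical name)
cat > sample.txt <<'EOF'
---Begin header 1 ------
sdkjfdlskf
ldsjfsldk
lkdsjf
---- end header 1 -----

---Begin header 2 ------
sadasd
asdas
asd
---- end header 2 ------
EOF

# RS= (empty) switches awk to paragraph mode: each block between blank
# lines is one record, and NR numbers the blocks 1, 2, ...
# The parentheses keep the concatenated file name unambiguous.
awk -v RS= '{ print > ("header" NR) }' sample.txt
```

This only works if the separator really is an empty line; the flag and range approaches discussed in the thread key on the header text instead.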

massy · 09-22-2018, 05:53 AM

I haven't written a script yet, but I prefer awk. Do you know how to use it?

Turbocapitalist · 09-22-2018, 05:58 AM

Yes, but we're here to help you learn to write it rather than write it for you.

There are several approaches. One would be:

- Check for the begin header, capture the part to be included in the file name, and set a flag.
- Check for the end header and clear the flag.
- If the flag is set, redirect output to the calculated file name and print.
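The three steps above could look like this in awk. This is a sketch that assumes every begin line looks exactly like `---Begin header 1 ------`, so the third whitespace-separated field holds the block number; `sample.txt` is a hypothetical input name:

```shell
#!/bin/sh
# Sample input from the question (sample.txt is a hypothetical name)
printf '%s\n' '---Begin header 1 ------' 'sdkjfdlskf' 'ldsjfsldk' 'lkdsjf' \
  '---- end header 1 -----' '' \
  '---Begin header 2 ------' 'sadasd' 'asdas' 'asd' '---- end header 2 ------' \
  > sample.txt

awk '
  # Begin line: build the file name from field 3 ("1", "2", ...) and set the flag
  /^---Begin header/ { name = "header" $3; flag = 1 }
  # While the flag is set, write the current line to that file
  flag { print > name }
  # End line (already printed by the rule above): clear the flag, close the file
  /^---- end header/ { flag = 0; close(name) }
' sample.txt
```

The end rule comes after the print rule so that the end line itself is still written; the blank line between the blocks arrives while the flag is clear, so it is skipped.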

How familiar are you with awk? Which version of awk do you have?

lougavulin · 09-22-2018, 06:30 AM

If the headers are consistent and you wish to keep them in the new files, another approach with awk could be:

- Detect the begin header and use it to set the filename.
- Print all lines between the begin and end headers into a file with the filename set in the previous step.

Awk can work on a range of lines bounded by a start pattern and an end pattern.

Code:

/beginning/,/ending/ {...}
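As a minimal illustration of the range pattern (assuming the sample from the question is saved as `sample.txt`, a hypothetical name), the following prints every line from a begin line through the next end line. The blank separator line falls outside both ranges, so it is not printed:

```shell
#!/bin/sh
# Sample input from the question (sample.txt is a hypothetical name)
printf '%s\n' '---Begin header 1 ------' 'sdkjfdlskf' 'ldsjfsldk' 'lkdsjf' \
  '---- end header 1 -----' '' \
  '---Begin header 2 ------' 'sadasd' 'asdas' 'asd' '---- end header 2 ------' \
  > sample.txt

# pattern1,pattern2 matches from a line matching pattern1 through the next
# line matching pattern2 (inclusive), then starts looking for pattern1 again
awk '/^---Begin header/, /^---- end header/ { print }' sample.txt > ranges.txt
```

Here everything still goes to one place; choosing a different file per range is the remaining step.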

massy · 09-23-2018, 03:08 AM

I used this:

Code:

awk '/-----BEGIN/{flag=1}flag > file1;/-----END/{flag=0}' domain-crt.txt

but it doesn't separate the output into two separate files!
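A likely reason no file appears: in `flag > file1;` the expression sits in pattern position, outside any `{ }` action, so awk reads `>` as a greater-than comparison against the uninitialized variable `file1` (which compares as 0), not as output redirection. Lines for which the comparison is true are printed to standard output. A quick way to see the difference (a sketch, not from the thread):

```shell
#!/bin/sh
# Outside a redirect, ">" is a comparison: 1 > (uninitialized, i.e. 0) is true,
# so this prints 1
awk 'BEGIN { flag = 1; print (flag > file1) }'

# Inside an action, "print > target" redirects; a literal file name must be
# quoted, otherwise awk treats it as a variable name
printf 'a\nb\n' | awk '{ print > "file1" }'
```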

massy · 09-23-2018, 03:41 AM

I used this:

Code:

awk '/-----BEGIN/{flag=1}flag > file1;/-----END/{flag=0}' domain-crt.txt

but it doesn't separate the output into two separate files!

Turbocapitalist · 09-23-2018, 03:48 AM

Excellent.

I would make the test for 'flag' simpler. The explicit print statement will be needed later:

Code:

awk '/-----BEGIN/{flag=1} flag{print} /-----END/{flag=0}' domain-crt.txt

But you still need a file name. One way is to add another variable that gets incremented when the /begin/ pattern is found. Another way would be to extract a field from the line with the /begin/ pattern and use that. Store the name in a variable.

Either way you'll need to redirect to a file. Look at the awk manual page, scroll down to the explanation of the print statement, and look at the line with the > redirect.
You can try redirecting everything to the same file first.
Then after that you can redirect to a file name stored in a variable.
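Followed step by step, that advice might look like the sketch below. It assumes the real input uses `-----BEGIN`/`-----END` marker lines, as in the command above; `blocks.txt` is made-up test data and `allblocks.txt`/`allblocks2.txt` are hypothetical output names:

```shell
#!/bin/sh
# Made-up test input with two blocks (blocks.txt is a hypothetical name)
cat > blocks.txt <<'EOF'
-----BEGIN CERTIFICATE-----
AAAA
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
BBBB
-----END CERTIFICATE-----
EOF

# Step 1: redirect everything inside the markers to one fixed file.
# awk truncates the file when it first opens it; later prints in the same
# run append to it, so both blocks end up in allblocks.txt.
awk '/-----BEGIN/{flag=1} flag{print > "allblocks.txt"} /-----END/{flag=0}' blocks.txt

# Step 2: the same, but the file name now comes from a variable
awk -v out=allblocks2.txt '/-----BEGIN/{flag=1} flag{print > out} /-----END/{flag=0}' blocks.txt
```

Once the name lives in a variable, changing that variable at each begin line is what splits the output into separate files.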

Remember [code] [/code] tags when posting.

lougavulin · 09-23-2018, 07:51 AM

As Turbocapitalist said, you have to redirect to a file. And one way or another, you have to define the filename.

I can't do more without giving you the answer:

Code:

awk '/beginning/ { defining filename } /beginning/,/ending/ { writing line into your file }' domain-crt.txt
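Filled in, that skeleton might look like the sketch below. The counter `n`, the output names `file1`, `file2`, ... and the test input `blocks.txt` are assumptions for illustration; the `-----BEGIN`/`-----END` markers are the ones from the commands posted earlier in the thread:

```shell
#!/bin/sh
# Made-up test input with two blocks (blocks.txt is a hypothetical name)
printf '%s\n' '-----BEGIN CERTIFICATE-----' 'AAAA' '-----END CERTIFICATE-----' \
  '-----BEGIN CERTIFICATE-----' 'BBBB' '-----END CERTIFICATE-----' > blocks.txt

# Rule 1: on each begin line, define the next file name (file1, file2, ...)
# Rule 2: the range pattern writes every line from begin through end to it
awk '/-----BEGIN/ { filename = "file" (++n) }
     /-----BEGIN/,/-----END/ { print > filename }' blocks.txt
```

Rule order matters: awk runs the rules top to bottom for each line, so the file name is already updated when the begin line reaches the print.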

massy · 09-23-2018, 08:29 AM

I don't understand! I'm stressed about a task and I don't have much time to learn everything about awk! I just need the exact answer!
I should get two files from one, like this:
Input file:
---Begin---
ldsjflds
ldsf
---END---
---Begin---
ldsjfsdl
ldsjfs
---END---

Output files should be:

file1:
---Begin---
ldsjflds
ldsf
---END---

file2:
---Begin---
ldsjfsdl
ldsjfs
---END---

pan64 · 09-23-2018, 08:46 AM

So you need a flag and a counter.

Code:

awk '/-----BEGIN/{flag=1} flag{print} /-----END/{flag=0}' domain-crt.txt

should be clear by now (I think), but what you want is something like this:

Code:

print > filename

where filename holds file1, file2, file3, ...
You need to set it in the /-----BEGIN/ rule (not in awk's special BEGIN block):

Code:

/-----BEGIN/{flag=1;counter++;filename="file"counter}

(something like this).
I'm not really sure if this is what you need and also it is not tested, but I hope it helps

massy · 09-24-2018, 01:00 AM

Thank you. The code below worked:

Code:

awk '/-----BEGIN/{flag=1;counter++;filename="file"counter}{print $0 > filename} /-----END/{flag=0}' domain-crt.txt

MadeInGermany · 09-24-2018, 02:11 AM

You made it!
Once you know awk better, you can write
filename="file"++counter
and
flag{print > filename}
The latter does not print the lines between one END and the next BEGIN.
If you don't make use of the flag, you don't need to set it!
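Putting those two suggestions together gives the sketch below, with parentheses added around `++counter` for readability. Using `blocks.txt` as hypothetical test input that has a blank line between the blocks, it shows the difference described above: the blank line does not end up in `file1`. The `close()` call is an addition not mentioned in the thread; it limits the number of files awk keeps open when there are many blocks.

```shell
#!/bin/sh
# Made-up test input with a blank line between the blocks (hypothetical name)
printf '%s\n' '-----BEGIN CERTIFICATE-----' 'AAAA' '-----END CERTIFICATE-----' '' \
  '-----BEGIN CERTIFICATE-----' 'BBBB' '-----END CERTIFICATE-----' > blocks.txt

# Begin line: set the flag and pick the next name (file1, file2, ...)
# flag{...}: only lines from BEGIN through END are written
# End line: clear the flag and close the finished file
awk '/-----BEGIN/{flag=1; filename="file" (++counter)}
     flag{print > filename}
     /-----END/{flag=0; close(filename)}' blocks.txt
```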

MadeInGermany · 09-25-2018, 04:30 PM

With bash builtins:

Code:

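#!/bin/bash
# Split domain-crt.txt into file1, file2, ... starting a new file at each BEGIN line
x=0
while IFS= read -r line || [ -n "$line" ]   # -r keeps backslashes; || keeps a last line without newline
do
  case $line in
  *---BEGIN*)                  # matches both ---BEGIN--- and -----BEGIN ... lines
    filename=file$((++x))
    echo "writing $filename ..."
    exec 3> "$filename"        # reopening fd 3 closes the previous file
  ;;
  esac
  # lines before the first BEGIN are skipped (fd 3 is not open yet)
  if [ "$x" -gt 0 ]; then
    printf '%s\n' "$line" >&3
  fi
done < domain-crt.txt
exec 3>&-                      # close the last output file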