LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Old 08-16-2012, 07:01 AM   #1
ashishkumar1000
LQ Newbie
 
Registered: Jul 2012
Posts: 3

Need to remove empty lines from a file


Hi,
I want to remove all the empty lines from a text file.
The file is 22 GB and I have only 30 GB of disk space, of which only 5 GB is free.
 
Old 08-16-2012, 07:04 AM   #2
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 18,233

Are the empty lines all in a few (or one) groups, or randomly scattered through the file?
How much of the file is empty space and how much is data to be kept?
 
Old 08-16-2012, 07:08 AM   #3
ashishkumar1000
LQ Newbie
 
Registered: Jul 2012
Posts: 3

Original Poster
Quote:
Originally Posted by chrism01 View Post
Are the empty lines all in a few (or one) groups, or randomly scattered through the file?
How much of the file is empty space and how much is data to be kept?
Hi.
The data is huge; I cannot use awk, since awk needs to redirect the output to a different file.
The disk is 30 GB and my data file is around 22 GB.
I need to delete all the lines whose length is zero.

####SAMPLE INPUT FILE######
abc
xyz

abc
fgh

bng

bjh
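For reference, with room for a second copy of the file this would be trivial with a plain grep; the snippet below (sample file name made up) is shown only to pin down what "empty line" means here — the in-place constraint is the hard part:

```shell
# Recreate the sample input above and strip zero-length lines.
# This streams to stdout, so it needs space for a second copy
# if redirected to a file -- exactly what the OP does not have.
printf 'abc\nxyz\n\nabc\nfgh\n\nbng\n\nbjh\n' > sample.txt
grep -v '^$' sample.txt
```

Expected output: the six non-blank lines (abc, xyz, abc, fgh, bng, bjh) with the blanks gone.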
 
Old 08-16-2012, 11:13 AM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 25,798

Quote:
Originally Posted by ashishkumar1000 View Post
Hi.
The data is huge; I cannot use awk, since awk needs to redirect the output to a different file.
The disk is 30 GB and my data file is around 22 GB.
I need to delete all the lines whose length is zero.

####SAMPLE INPUT FILE######
abc
xyz

abc
fgh

bng

bjh
Ok, what have you done/tried so far? And 30GB of disk for a system that generates 22GB files seems very slim, especially in this day, when 1TB disks are less than $75.

I suggest you look at sed, and try looking up some examples. This may work:
Code:
sed -i -e '/^$/d' filename.txt
...that removes blank lines from a file, in place.
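A quick way to sanity-check that sed command on a small sample before pointing it at real data (file name here is made up):

```shell
# Build a tiny file containing blank lines.
printf 'abc\nxyz\n\nabc\n\nbng\n' > sample.txt

# Delete every zero-length line, editing the file "in place".
sed -i -e '/^$/d' sample.txt

cat sample.txt   # abc, xyz, abc, bng -- no blank lines remain
```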
 
2 members found this post helpful.
Old 08-16-2012, 12:26 PM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,976

Is an empty line defined by having nothing on it, as the example from TB0ne has shown, or are we to also expect lines that have no visible characters as well?

Also, the following statement is in error:
Quote:
I cannot use awk, since awk needs to redirect the output to a different file.
As you could do the following:
Code:
awk '/./{print > FILENAME}' yourfile
It may be interesting to note that sed (with -i) actually creates a temp file, so this may cause an issue as well.
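That temp-file behaviour is easy to demonstrate: with GNU sed, -i writes a temporary file and renames it over the original, so the file's inode changes. A small check (GNU stat assumed; file name made up):

```shell
# Make a small file with a blank line in it.
printf 'a\n\nb\n' > f.txt

# Record the inode, edit in place, then compare.
before=$(stat -c %i f.txt)
sed -i '/^$/d' f.txt
after=$(stat -c %i f.txt)

# On typical Linux filesystems the inode differs, showing that
# sed -i wrote a new file rather than editing the old one in place.
if [ "$before" != "$after" ]; then
    echo "sed -i replaced the file (new inode)"
fi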
 
1 member found this post helpful.
Old 08-16-2012, 12:35 PM   #6
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

I don't think sed will work. The -i option also creates a temporary file behind the scenes.

If space weren't an issue it would be trivial, with a whole host of options you could choose from. But as it is, I personally can't think of any solution that doesn't require either a temporary file or loading a copy into RAM (which is how most text editors work, I believe). With so little free space you'd need something that could edit the contents directly. You may need to custom-create something in a serious programming language for that.

My best suggestion, if possible, would be to get some kind of external storage to work with. An extra hard disk or a USB drive large enough to handle temporary copies that large.
 
1 member found this post helpful.
Old 08-16-2012, 01:57 PM   #7
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 25,798

Quote:
Originally Posted by David the H. View Post
I don't think sed will work. The -i option also creates a temporary file behind the scenes.

If space weren't an issue it would be trivial, with a whole host of options you could choose from. But as it is, I personally can't think of any solution that doesn't require either a temporary file or loading a copy into RAM (which is how most text editors work, I believe). With so little free space you'd need something that could edit the contents directly. You may need to custom-create something in a serious programming language for that.

My best suggestion, if possible, would be to get some kind of external storage to work with. An extra hard disk or a USB drive large enough to handle temporary copies that large.
Good catch, David. Not sure if it will work, but surely ANY sort of file processing will need some kind of working space for the file, wouldn't it? Not even sure that awk would work, but if it creates a temp file, perhaps it would be in the /tmp directory, which may be on a different partition.

Regardless, the OP is in a bad spot...a 22GB file will have to be processed somehow. OP, you could also try using vi and (in command mode)
Code:
:g/^$/ d
But, since vi creates a swap file by default, you'd have to start it with the "-n" switch, to disable that.
 
1 member found this post helpful.
Old 08-16-2012, 06:31 PM   #8
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 18,233

Quote:
Not sure if it will work, but surely ANY sort of file processing will create some means of working with the file, wouldn't it?
This is what worried me when I saw the question; there's a lot of hidden background space needed for just about any tool.
As above, you could write a program in e.g. Perl or C that uses a combination of tell + seek to move lines back up through the file, then truncate it in situ.

http://perldoc.perl.org/functions/seek.html
http://perldoc.perl.org/functions/tell.html

Last edited by chrism01; 08-16-2012 at 06:32 PM.
 
Old 08-17-2012, 12:58 PM   #9
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Here's a bit of a hack solution I scripted that may fit the requirements.

It uses tail to grab a fixed chunk off the end of the file, and then truncate to shorten the file by that same amount. Then it processes each chunk separately and reassembles the final file.

I tested it on files up to a few MiB in size, and it appears to work ok. But I can't promise it's completely safe, since it has to operate destructively on the original, without backup. It's also not fast; it seems to be taking about 1 second per MiB on my system.

Code:
#!/bin/bash

infile="infile.txt"
outfile="outfile.txt"
blocksize=1M
count=0

# process a "blocksize" block of text from the end of the file during each
# iteration of the loop, remove all the blank lines from it, and store in a temp
# file.  Remove an equivalent amount from the original file at the same time.

while [[ -s $infile ]]; do

		# read in a block of text, and truncate the original file by the same amount.
		# pads each end with an "x" also, which will be removed later.
		# this ensures that any newlines at the ends of the block are preserved.

		block=$( printf 'x' ; tail -c "$blocksize" "$infile" ; printf 'x' )
		truncate --size=-"$blocksize" "$infile"

		# increment the counting variable for each block.
		(( count++ ))

		# "squeeze" all newlines in the block, and print it to a tempfile.
		# then remove the characters at the ends and print the result to
		# a temporary file, with $count in its name.
		# unwanted newlines from creeping back in.

		block=$( tr -s '\n' <<<"$block" )
		block=${block#x}
		block=${block%x}
		printf '%s' "$block" > "$outfile.$count"

done

# now cat all the tempfiles back together.
# since we started from the bottom we work
# from highest number to lowest.

while (( count )); do

	cat "$outfile.$count" >> "$outfile"
	rm "$outfile.$count"
	(( count-- ))

done

# Finish by adding a final newline back to the file (comment out if not desired).
echo >> "$outfile"

exit 0
I still say the best solution would be to simply get more disk space of some kind. It looks like you can get cheap 32GB thumbdrives for under US$20 now, for example.
 
1 member found this post helpful.
Old 08-17-2012, 02:27 PM   #10
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 25,798

Very nice, David the H....and I agree about the additional space, too. A system that generates 22GB files certainly needs more than 30GB of drive space.
 
Old 08-19-2012, 02:13 PM   #11
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 359

dd can edit a file in-place, but I don't recommend using it on valuable data; if it were interrupted it would leave a corrupted file.

The test file:
1095073792 bytes with 66722 blank lines.

This overwrites the old file with the smaller new file; at first, however, the file keeps its old size. To cut the file down to its new smaller size we need to know the precise number of bytes copied by dd.
Code:
# Some dd versions tell you the number of bytes
tr -s '\n' < file | dd of=file bs=1M conv=notrunc
0+140231 records in
0+140231 records out
1095007070 bytes .... # Number of bytes through the pipe (from dd)

# Otherwise, in Bash try this
tr -s '\n' < file | tee >(wc -c >&2) | dd of=file bs=1M conv=notrunc
1095007070  # Number of bytes through the pipe (from wc)
0+133329 records in
0+133329 records out

# Truncating the file to the new size:
dd if=/dev/null seek=1095007070 of=file bs=1
0+0 records in
0+0 records out
When I tried grail's awk command on a 1GB test file the result was a 4KB file instead of the approximately 1GB file I was expecting. Also, the command terminated so quickly it didn't have time to go through the full 1GB of data. A possible explanation is that awk reads in the first filesystem IO Block of 4KB from the file, modifies it, writes it back to the file, and then truncates the file at that point. So when awk tries to read in the second IO Block it thinks it has reached the end of the file and quits.
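On systems with GNU coreutils, truncate(1) can also do the final cut-down in place of the dd if=/dev/null trick. Here is a minimal sketch of the whole squeeze-then-truncate idea on a tiny file (file name made up; the content is captured into a shell variable only because the file is small — on a real 22 GB file that copy is exactly what must be avoided):

```shell
# A small file ending in two newlines, with one interior blank line.
printf 'abc\n\nxyz\n\n' > data.txt

# Squeeze runs of newlines; the trailing 'x' protects final newlines
# from being stripped by the command substitution.
squeezed=$(tr -s '\n' < data.txt; printf x)
squeezed=${squeezed%x}

# Overwrite the front of the file without truncating it...
printf '%s' "$squeezed" | dd of=data.txt bs=1M conv=notrunc 2>/dev/null

# ...then cut it down to the new byte count (ASCII here, so the
# shell's character count equals the byte count).  This is the GNU
# truncate equivalent of: dd if=/dev/null seek=SIZE of=data.txt bs=1
truncate -s "${#squeezed}" data.txt
```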
 
1 member found this post helpful.
  

