Programming (LinuxQuestions.org): This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Because printf doesn't insert a newline unless you tell it to, the output is the input lines concatenated together until the keyword "the" is found, at which point a newline is printed. This is much simpler to understand than that bunch of sed secret code.
Before I can run side-by-side tests, the above needs to be modified to remove the extra spaces and to end with a linefeed (or is it EOF?).
Note the following:
Code:
[mherring@Ath play]$ awk 'NR>1&&$1=="the"{print ""}{ printf "%s ",$0}' words.txt
the house is blue
the cat is hungry
the sun is bright
the
the cat gone
the [mherring@Ath play]$ sed -n '${H;x;s/\n/ /g;s/^ *//;s/ \+/ /g;s/ the/\nthe/g;p};H' words.txt
the house is blue
the cat is hungry
the sun is bright
the
the cat gone
the
[mherring@Ath play]$
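Note that the awk output does not end with a line feed (or EOF, or whatever the right term is), so the shell prompt lands on the same line as the final "the". Also, it's hard to see, but it leaves in some extra spaces where the original file has blank lines, because printf "%s " prints a lone space for each empty line.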
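One way to address both points while keeping awk's line-at-a-time processing is sketched below. It is not from the thread: `join_on_the` is a hypothetical helper name, and it assumes words are separated by spaces or tabs. Blank lines are skipped, `$1 = $1` rebuilds each line with single spaces, and an `END` block supplies the final newline.

```shell
#!/bin/bash
# Hypothetical helper: join_on_the [FILE...] (reads stdin when no FILE is given).
join_on_the() {
    awk '
        NF == 0 { next }              # blank line: skip it, so no stray spaces
        {
            $1 = $1                   # rebuild $0 with single spaces, ends trimmed
            if (started)              # separator before every word but the first
                printf "%s", ($1 == "the" ? "\n" : " ")
            printf "%s", $0
            started = 1
        }
        END { if (started) printf "\n" }   # terminate the last output line
    ' "$@"
}

# Usage: join_on_the words.txt
```

Because awk only keeps a flag between lines, memory use should stay small regardless of file size.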
I would guess that my solution could have the same problem with big files (the whole file winds up in the SED buffers).
I have tried to find a SED solution that goes line by line, but no luck so far.
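A line-by-line sed approach seems possible if the hold space only ever holds the current output line: append each word to it, and when a new "the" arrives, swap it out, print it joined, and start again. This is a sketch, not from the thread, assuming GNU sed (for the `\>` end-of-word anchor) and one word per input line; `join_on_the_sed` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: join_on_the_sed [FILE...]; needs GNU sed for \>.
join_on_the_sed() {
    sed -n '
        # normalise whitespace: squeeze runs to one space, trim both ends
        s/[[:space:]]\{1,\}/ /g
        s/^ //
        s/ $//
        # blank line: nothing to add, only check for end of input
        /^$/b last
        # a standalone "the" starts a new group
        /^the\>/{
            # swap: pattern space = finished group, hold space = this line
            x
            /./{
                s/\n/ /g
                s/^ //
                p
            }
            b last
        }
        # any other word: append it to the current group
        H
        :last
        # end of input: print whatever group is still held
        ${
            x
            s/\n/ /g
            s/^ //
            /./p
        }
    ' "$@"
}

# Usage: join_on_the_sed words.txt
```

With this layout, sed's memory use is bounded by the longest output line rather than by the size of the file.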
On the other hand, you could preprocess using tr to get rid of the newlines, or even sed to put everything on one line and trim whitespace before splitting it up again. Nothing says sed has to load a seemingly endless line all at once if the pattern doesn't require it.
Kevin Barry
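The tr preprocessing idea might look like the sketch below, assuming GNU sed (for `\n` in the replacement and the `\>` end-of-word anchor); `join_with_tr` is a hypothetical helper name. Once tr has removed the newlines, sed receives the whole file as a single input line.

```shell
#!/bin/bash
# Hypothetical helper: join_with_tr FILE (needs GNU sed for \n and \>).
join_with_tr() {
    # tr: turn spaces, tabs and newlines into spaces and squeeze each run to one space.
    # echo: end the single joined line with a newline before sed sees it.
    # sed: trim one space from each end, then start a new line before every
    #      standalone "the" (\> keeps words such as "thesis" together).
    { tr -s ' \t\n' '   ' < "$1"; echo; } |
        sed -e 's/^ //' -e 's/ $//' -e 's/ the\>/\nthe/g'
}

# Usage: join_with_tr words.txt
```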
Originally Posted by ghostdog74
ok on small files, but will choke on big files.
Quote:
Originally Posted by pixellany
I would guess that my solution could have the same issue (the whole file winds up in the SED buffers.)
I have tried to find a SED solution that goes line by line, but no luck so far.
My SED solution fails with a 15MB file but works with a 4MB one. The alpha-geek will want to figure out exactly what file size breaks it, and why...
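To hunt for that threshold, one option is to generate test files of chosen sizes and feed them to the one-liner. A sketch: `make_words_file` is a hypothetical helper that repeats the words "the cat is hungry", one per line, until the file holds at least the requested number of megabytes (taken here as 1024 x 1024 bytes).

```shell
#!/bin/bash
# Hypothetical helper: make_words_file SIZE_MB FILE
# Writes "the", "cat", "is", "hungry" one per line, repeating, until FILE holds
# at least SIZE_MB megabytes (it may overshoot by a few bytes).
make_words_file() {
    local size_mb=$1 file=$2
    awk -v target="$((size_mb * 1024 * 1024))" 'BEGIN {
        n = split("the cat is hungry", w, " ")
        while (bytes < target) {
            word = w[i % n + 1]
            i++
            print word
            bytes += length(word) + 1    # +1 for the newline that print adds
        }
    }' > "$file"
}

# Example run (big.txt is only an example name):
#   make_words_file 10 big.txt
#   sed -n '${H;x;s/\n/ /g;s/^ *//;s/ \+/ /g;s/ the/\nthe/g;p};H' big.txt > /dev/null && echo ok
```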
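Code:
#!/bin/bash
# Joins words from stdin onto one line, starting a new line at each standalone "the".
# Usage: ./script.sh < words.txt
shopt -s extglob                              # enables the +( ) pattern used below
BUFFER=""
while read -r LINE || [[ -n "$LINE" ]] ; do   # the test also catches a final line with no newline
    BUFFER=" $BUFFER $LINE "
    BUFFER="${BUFFER//+( )/ }"                # squeeze runs of spaces (blank lines add extra ones)
    BUFFER="${BUFFER:1}"                      # drop the leading space
    BUFFER="${BUFFER// the /$'\n'the }"       # " the " with both spaces, so "thesis" is left alone
    if [[ "$BUFFER" =~ $'\n' ]] ; then
        echo "${BUFFER%%$'\n'*}"              # print the finished line
        BUFFER="${BUFFER#*$'\n'}"             # keep the rest in the buffer
    fi
done
BUFFER="${BUFFER% }"                          # strip the trailing space (safe on an empty buffer)
BUFFER="${BUFFER// the /$'\n'the }"
echo "$BUFFER"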
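The sed version actually has the "the" vs. "thesis" problem too: its s/ the/\nthe/g substitution also breaks the line before any word that merely starts with "the", such as "thesis".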
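The difference is easy to reproduce. The sketch below assumes GNU sed; `split_prefix` copies the " the" substitution from the sed one-liner, while `split_word` adds GNU sed's `\>` end-of-word anchor so that only a standalone "the" starts a new line. Both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers; both need GNU sed (\n in the replacement, \> anchor).
split_prefix() { sed 's/ the/\nthe/g'; }      # matches " the" at the start of any word
split_word()   { sed 's/ the\>/\nthe/g'; }    # matches only the whole word "the"

printf 'the cat read the thesis\n' | split_prefix   # "thesis" is split into its own line
printf 'the cat read the thesis\n' | split_word     # "thesis" stays attached to "the"
```

The awk one-liner does not have this problem because it compares the whole first field with $1=="the".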