[SOLVED] SED Help (Pattern Buffer Overflow I think?)

NvrBst · 01-13-2010, 02:11 PM

$ sed --version
GNU sed version 4.1.5
Copyright (C) 2003 Free Software [...]

I'm using Ubuntu 9.04. I want to search for "XXX[^~]*XXY" and replace XXY with a new value. The reason I think it has something to do with a buffer overflow is that the command I wrote works correctly on a 1kb file but not on the original 8kb file (the 1kb file is a subset of the 8kb one).

--Doesn't Work--
$ sed -n '1h;1!H;${;g;s/\(0x23F00000[^~]*\) 0x2[137]F00000/\1 NEWADDRESS/;p;}' MYPATCH-8kb.patch

--Works--
$ sed -n '1h;1!H;${;g;s/\(0x23F00000[^~]*\) 0x2[137]F00000/\1 NEWADDRESS/;p;}' MYPATCH-1kb.patch
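For anyone unfamiliar with the idiom, here is a commented sketch of the same one-liner, written with the "\(...\)" group that "\1" refers to (as in the later posts). The sample input is made up and NEWADDRESS is the placeholder from the thread. "1h;1!H" collects every line in the hold space, and only on the last line ("$") does "g" copy it back, so the single "s" command can match across newlines.

```shell
# Hypothetical sample input: tildes outside the region, a plain space before the target.
input=$(printf 'junk ~\nSTART 0x23F00000\nmid\nJUMP_ADDR 0x21F00000\ntail ~')

result=$(printf '%s\n' "$input" | sed -n '
# Line 1: copy it into the hold space.
1h
# Every later line: append a newline plus the line to the hold space.
1!H
# Last line: copy the whole file back into the pattern space, substitute, print.
${
g
s/\(0x23F00000[^~]*\) 0x2[137]F00000/\1 NEWADDRESS/
p
}')
printf '%s\n' "$result"
```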

I can post my patch file if needed, but basically it is just the following:

**Lots of stuff including tildes** 0x23F00000 **Some stuff that doesn't include tildes** 0x2[137]F00000 **Lots of stuff including tildes**

I think I have to modify the sed command so that it doesn't put everything into the hold space and only starts holding once it finds "0x23F00000", but I'm unsure if this is possible. Or is there a command-line option to increase sed's buffer size that I missed?
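One possible way to do what is described here, sketched under the assumption that nothing before the first "0x23F00000" needs changing: lines before the marker are printed right away and only the rest of the file is accumulated. This is not from the thread; it also swaps the literal space for "[[:space:]]" and keeps the original whitespace inside "\1". The input is hypothetical.

```shell
# Hypothetical sample: tilde-containing text before the marker, a tab before the target.
input=$(printf 'junk ~ junk\nSTART 0x23F00000\n#define JUMP_ADDR\t0x21F00000\ntail ~')

result=$(printf '%s\n' "$input" | sed '
# Before the first 0x23F00000: branch to the end, so the line prints unchanged.
/0x23F00000/,$!b
# From the marker on: append the line to the hold space...
H
# ...and print nothing until the last line.
$!d
# Last line: fetch the held text, drop the newline the first H added, substitute.
g
s/^\n//
s/\(0x23F00000[^~]*[[:space:]]\)0x2[137]F00000/\1NEWADDRESS/
')
printf '%s\n' "$result"
```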

Alternatively, I'm open to other methods that would be simpler than using sed. I've tried ssed but it had the same problem. I started looking at awk but haven't finished testing it yet. Thanks for any help.

Tinkster · 01-13-2010, 02:22 PM

Hi, welcome to LQ!

It would help if you substituted "doesn't work" with an actual error
message. If there's none then there's a good chance the problem is
with the actual data (hard to say w/o having seen it).

Cheers,
Tink

NvrBst · 01-13-2010, 04:55 PM

There are no error messages, but I can do diffs to illustrate:

--1kB Works--

Code:

nvrbst@kubuntu-pc:~/test$ sed -n '1h;1!H;${;g;s/\(0x23F00000[^~]*\) 0x2[137]F00000/\1 NEWADDRESS/;p;}' MYPATCH-1kb.patch > new1k.patch
nvrbst@kubuntu-pc:~/test$ diff MYPATCH-1kb.patch new1k.patch
17c17
< ! #define JUMP_ADDR                     0x21F00000                      /* Final Jump Address         */
---
> ! #define JUMP_ADDR                     NEWADDRESS                      /* Final Jump Address         */
nvrbst@kubuntu-pc:~/test$

--8kB Doesn't Work (new8k.patch is identical to MYPATCH-8kb.patch)--

Code:

nvrbst@kubuntu-pc:~/test$ sed -n '1h;1!H;${;g;s/\(0x23F00000[^~]*\) 0x2[137]F00000/\1 NEWADDRESS/;p;}' MYPATCH-8kb.patch > new8k.patch
nvrbst@kubuntu-pc:~/test$ diff MYPATCH-8kb.patch new8k.patch
nvrbst@kubuntu-pc:~/test$

To make the 1kb patch I simply copied the 8kb patch and deleted the irrelevant lines (so I could watch sed's output on stdout while testing). I don't see how it can be the data, since at the very least it should match the same part of the file (unless there is a sed buffer size issue, which is what I am guessing).

If you think it'll help, I don't mind making an 8kb+ test file to simulate the problem. You should be able to make one yourself, though, by doing the following:

1. Paste 7kb of junk characters (anything, including newlines).
2. Add "0x23F00000".
3. Paste more junk characters (no tildes; newlines are okay).
4. Add "0x21F00000".
5. Paste 1kb of junk characters (anything, including newlines).
6. Save as MYPATCH-8kb.patch

7. Run the commands from the first post.
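A sketch of steps 1-7 as a shell script, assuming a scratch directory and a made-up junk line containing tildes (about 8.2kb in total). With a plain space before "0x21F00000", the first post's command (with the "\(...\)" group) should substitute on this file, which fits the thread's eventual finding that file size was not the problem.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)" || exit 1

# Hypothetical junk line: 41 characters plus a newline, with tildes.
junk='some ~ junk text ~ with tildes and spaces'
{
  # Step 1: about 7kb of junk with tildes and newlines.
  for i in $(seq 1 170); do printf '%s\n' "$junk"; done
  # Steps 2 and 3: the start marker, then junk without tildes.
  printf 'START 0x23F00000\nno tildes in here\n'
  # Step 4: the target address, preceded by a plain space.
  printf 'JUMP_ADDR 0x21F00000\n'
  # Step 5: about 1kb more junk.
  for i in $(seq 1 25); do printf '%s\n' "$junk"; done
} > MYPATCH-8kb.patch   # Step 6

# Step 7: the command from the first post.
sed -n '1h;1!H;${;g;s/\(0x23F00000[^~]*\) 0x2[137]F00000/\1 NEWADDRESS/;p;}' \
  MYPATCH-8kb.patch > new8k.patch

size=$(wc -c < MYPATCH-8kb.patch)
echo "input size: $size bytes"
grep -n NEWADDRESS new8k.patch
```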

Tinkster · 01-13-2010, 05:06 PM

I'm by no means a sed expert; it's just that I've never seen any
documentation referring to a limit on buffer size or line length
in GNU sed. The only thing it doesn't seem to like is NULL characters,
which is fair enough, and that is where my thought that it might be
the data came from.

GrapefruiTgirl · 01-13-2010, 05:19 PM

Code:

 5. GNU sed's Limitations and Non-limitations

For those who want to write portable sed scripts, be aware that some 
implementations have been known to limit line lengths 
(for the pattern and hold spaces) to be no more than 4000 bytes.
The POSIX standard specifies that conforming sed implementations shall 
support at least 8192 byte line lengths. GNU sed has no built-in limit
on line length; as long as it can malloc() more (virtual) memory, 
you can feed or construct lines as long as you like.

However, recursion is used to handle subpatterns and indefinite 
repetition. This means that the available stack space may limit 
the size of the buffer that can be processed by certain patterns.

I came across some stuff about limits the other day and don't remember where exactly, but I just searched the net for a second and found the text above, from here: http://www.delorie.com/gnu/docs/sed/sed_31.html

I'm no sed expert either, by far, and Tinkster knows much more about it than I do. I am also aware that the "Last Updated" date on the page I linked above is 2003, but 2003 is also the only date I see on the sed man page on my machine. Finally, there are other pages (maybe more recent, maybe less) which do NOT say the same as the quote above. My guess, based on all this, is that it is *possible* for *something* to be limiting at least the buffer space, if not the line length, which seems to be unlimited according to every source.
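A quick, hypothetical way to check the "no built-in limit" claim on a given machine is to slurp far more than 8192 bytes into the pattern space and then edit across a line boundary. The sketch below uses "seq" output (about 106KiB); it does not probe the stack/recursion caveat quoted above, which depends on the pattern being used.

```shell
# Slurp 20000 short lines into the pattern space, then rewrite the last line
# with a regex that spans the final newline.
out=$(seq 1 20000 | sed -n '1h;1!H;${g;s/19999\n20000$/19999\nLAST/;p;}')

bytes=$(printf '%s\n' "$out" | wc -c)
last=$(printf '%s\n' "$out" | tail -n 1)
echo "$bytes bytes, last line: $last"
```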

Sasha

jschiwal · 01-13-2010, 05:31 PM

An 8k file should not cause a problem.
I don't see why you need to store the entire file in the hold space. Why not let sed apply the rule to each line? The substitution command should be enough. Is it because you only want the first match to be substituted?
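To illustrate the line-by-line route, here is a sketch using a range address, assuming the target always sits on a later line than "0x23F00000" and that the no-tilde condition can be dropped. It changes only the first whitespace-preceded "0x2[137]F00000" after the marker, without touching the hold space. Any later marker would start a new range, so this is not a drop-in replacement for the original command; the input is hypothetical.

```shell
# Hypothetical sample: a second candidate address after the target.
input=$(printf 'junk ~\nSTART 0x23F00000\nmid\n#define JUMP_ADDR\t0x21F00000\nother 0x27F00000')

result=$(printf '%s\n' "$input" | sed '
# From a line containing 0x23F00000 to the next line with whitespace + target...
/0x23F00000/,/[[:space:]]0x2[137]F00000/{
# ...substitute on every line in the range except the marker line itself.
/0x23F00000/!s/\([[:space:]]\)0x2[137]F00000/\1NEWADDRESS/
}')
printf '%s\n' "$result"
```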

syg00 · 01-13-2010, 05:40 PM

Sed works on one line at a time by default - it's a stream editor. The OP indicated there could be (possibly multiple) newlines in the text between the two addresses.
If there is the possibility of multiple consecutive newlines, then that even rules out the next option - awk with RS set to null (paragraph mode).
Maybe just translate the newlines to some other known unused character, do a normal sed substitution on the data, then set the newlines back.
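A sketch of that suggestion, assuming the byte "\001" never appears in the patch: "tr" turns the file into one long line, an ordinary "s" command runs on it, and a second "tr" restores the newlines. "[[:space:]]" is used instead of a literal space, and the input is hypothetical.

```shell
# Hypothetical sample with a tab before the target address.
input=$(printf 'junk ~\nSTART 0x23F00000\nmid\n#define JUMP_ADDR\t0x21F00000\ntail ~')

# Newlines -> \001, one plain substitution, then \001 -> newlines again.
result=$(printf '%s\n' "$input" |
  tr '\n' '\001' |
  sed 's/\(0x23F00000[^~]*[[:space:]]\)0x2[137]F00000/\1NEWADDRESS/' |
  tr '\001' '\n')
printf '%s\n' "$result"
```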

NvrBst · 01-13-2010, 05:42 PM

Quote:

Originally Posted by jschiwal

An 8k file should not cause a problem.
I don't see why you need to store the entire file in the hold space. Why not let sed apply the rule to each line? The substitution command should be enough. Is it because you only want the first match to be substituted?

Aye, I only want the 2nd match to be substituted (and there are newlines between them), so I couldn't figure out how to do it line by line (if sed processed the file in reverse, I think I'd be able to get it working).

However, I'm very sorry: I have found my problem. Tinkster was correct; it had to do with the data, not the buffer size. Somehow all the tabs in the 1kb file got converted to spaces somewhere along the way, and the regular expression I was using, "\(0x23F00000[^~]*\) 0x2[137]F00000", matched the spaces but not the tabs (I thought " " matched any blank space? hehe).

When I changed it to "\(0x23F00000[^~]*\)\t0x2[137]F00000", it worked for the original 8kb file. Sorry for the inconvenience.

syg00 · 01-13-2010, 05:56 PM

Use [[:space:]] in future - I didn't even see that space character.
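A small check of the point above, using a made-up tab-separated line: a literal space in the regex does not match a tab, GNU sed's "\t" does (a GNU extension, not POSIX), and "[[:space:]]" matches either portably.

```shell
# Hypothetical line in the style of the original (tab-separated) patch.
line=$(printf '#define JUMP_ADDR\t0x21F00000')

# A literal space only matches a space, so this leaves the line unchanged.
with_space=$(printf '%s\n' "$line" | sed 's/ 0x2[137]F00000/ NEWADDRESS/')
# GNU sed accepts \t for a tab in both the regex and the replacement.
with_tab=$(printf '%s\n' "$line" | sed 's/\t0x2[137]F00000/\tNEWADDRESS/')
# [[:space:]] matches spaces, tabs and other whitespace; \1 keeps whichever it was.
with_class=$(printf '%s\n' "$line" | sed 's/\([[:space:]]\)0x2[137]F00000/\1NEWADDRESS/')

printf '%s\n' "$with_space" "$with_tab" "$with_class"
```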