[SOLVED] modify lines that are too long

sharky · 05-25-2010, 01:10 PM

I have a CDL netlist with 5630 lines. 512 of the lines are over 128 characters. The tool I am using to read in the CDL returns an error for each line over 128 characters.

If the line is too long I can fix it by adding a line continuation symbol, in this case a "/", somewhere prior to the 128th character then a line feed, obviously, and a "+" to the continuation.

example (pretend it's a long line);
before;
this line is too long
after;
this line /
+ is too long

Part of the problem is that I can't use a constant point prior to the 128th character because I can't break up a term.

bad;
this line i /
+ s too long

If I can replace the last space before the 128th character with " / \n+ " on all lines that are over 128 characters then I'm golden. I'm not sure if I need to escape the + or not. If so then the substitution is " / \n\+ ". And if I use sed then I'll escape the \.

I'll be digging through awk and sed references but if someone has an answer then please save me the work.

sharky · 05-25-2010, 03:06 PM

I've started a perl script but not making much progress. I open the file, split the lines by whitespace into an @list and somewhere in that array I'll append " /" to $list[#] and prepend "+ " to $list[#+1]. Then I need to write that array back to $line.

colucix · 05-25-2010, 03:56 PM

Yes, the idea is good. Moreover you don't need to split the line if there is a function which gives the position of blank spaces. Since I'm not clever with perl, here is a possible solution in awk:

Code:

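# Only lines longer than 128 characters are modified.
length > 128 {
  string = $0
  p = match(string, / /)      # position of the first blank

  # Overwrite the blanks one by one (in the copy only) until the next
  # blank lies beyond column 127; pre keeps the last one that fits.
  while ( p ) {
    pre = p
    sub(/ /, "_", string)
    p = match(string, / /)
    if ( p > 127 ) break
  }

  # Split at pre: "/" ends the first part, "+" starts the continuation
  # in column 1, as in the example from the first post.
  $0 = ( substr($0, 1, pre) "/\n+" substr($0, pre) )
}
1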

Here the match function gives the index of the first character of the matching substring (in this case the position of the leftmost single blank space). Inside the loop the blank spaces are progressively substituted by another character and the position of the first blank is computed again. When the index is beyond the 127th character, the loop breaks and we retain the position of the previous blank space (stored in the variable 'pre').

After that we can simply rebuild the original line using substrings. This assumes that no line is longer than 256 characters, since each line is split only once. Another assumption is that every line has at least one space within the first 127 characters. If my assumptions are correct, maybe this is what you're looking for.
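For lines that need more than one break, the same idea can be applied repeatedly. What follows is only a minimal sketch, not something posted in this thread: the function name `wrap_long_lines` and the `max` parameter are made up for illustration, and the break point is found by scanning backwards with `substr` instead of substituting blanks. It assumes a POSIX awk; a chunk with no usable space is left unsplit.

```shell
# Hypothetical helper: break every input line longer than max characters
# at the last space that still leaves room for the trailing "/".
wrap_long_lines() {
  awk -v max="${1:-128}" '
  {
    line = $0
    out = ""
    while (length(line) > max) {
      # Scan backwards from column max - 1. Positions 1 and 2 are skipped
      # so the "+ " that starts a continuation is never picked as a break.
      cut = 0
      for (i = max - 1; i > 2; i--)
        if (substr(line, i, 1) == " ") { cut = i; break }
      if (cut == 0) break                 # no usable space: give up on this chunk
      out = out substr(line, 1, cut) "/\n"
      line = "+" substr(line, cut)        # the remainder becomes "+ ..."
    }
    print out line
  }'
}

# Small demonstration with a limit of 12 instead of 128.
printf '%s\n' 'this line is too long' 'short' | wrap_long_lines 12
```

With a limit of 12 the sample prints `this line /`, `+ is too /`, `+ long` and `short`. For the netlist the call would be something like `wrap_long_lines 128 < netlist.cdl`.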

sharky · 05-25-2010, 04:07 PM

Got this to work in perl. The elements I chose were arbitrary and worked for my particular case.

Quote:

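open(my $cdl, '<', 'netlist.cdl') or die "Cannot open netlist.cdl: $!";
while (my $line = <$cdl>)
{
    if (length($line) > 120)
    {
        # Break after the 7th space-separated field (an arbitrary choice
        # that happened to suit this netlist).
        my @list = split(/ /, $line);
        $list[6] = "$list[6]\n";
        $list[7] = "+ $list[7]";
        $line = "@list";    # rejoins the fields with single spaces
    }
    print $line;
}
close($cdl);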

colucix's example looks like a better general solution.

Thanks,

grail · 05-25-2010, 11:47 PM

I know this is already solved, but colucix inspired me to try and come up with a solution irrespective of line length (i.e. perhaps greater than 256):

Code:

awk 'length > 128{for(i=int(length/125);i>0;i--){match(substr($0,0,i*125),/.* /);sub(substr($0,0,RLENGTH),"&/\n+")}}1' input

colucix · 05-26-2010, 03:39 PM

Code:

sub(substr($0,0,RLENGTH),"&/\n+")

Sorry grail, nice idea but it doesn't work in some cases. The reason is clearly explained in the GNU awk user's guide (in the description of the sub function):

Quote:

if the regexp is not a regexp constant, it is converted into a string, and then the value of that string is treated as the regexp to match.

This means that if you have some special character in the substring, it will be treated with the regexp rules and the substring will most likely fail to match. For example just a little plus sign can change the meaning of the expression and the string will not match anymore.
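A small illustration of that pitfall, as a sketch with a made-up netlist-like line (`M1 a+b c` is not taken from the real CDL file). The first awk call passes the substring `a+b` to `sub`, where the plus sign acts as a quantifier; the second avoids regexps altogether by locating the text with `index` and rebuilding the line with `substr`.

```shell
line='M1 a+b c'

# Dynamic regexp: "a+b" means "one or more a, then b", so the literal
# text a+b is not matched and sub() reports 0 replacements.
as_regexp=$(printf '%s\n' "$line" |
  awk '{ n = sub(substr($0, 4, 3), "X"); print n ": " $0 }')

# Literal alternative: index() compares plain strings, no regexp rules.
as_literal=$(printf '%s\n' "$line" |
  awk '{ s = substr($0, 4, 3); p = index($0, s)
         print substr($0, 1, p - 1) "X" substr($0, p + length(s)) }')

printf '%s\n' "$as_regexp" "$as_literal"
```

The first result is `0: M1 a+b c` (nothing replaced), the second is `M1 X c`.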

Moreover, I have some doubts about the decrement of the length of the substring in the match statement. I cannot explain why right now (it is late at night here...), but a test demonstrates that there is a shift somewhere:

Code:

$ cat infile
--- --- 10--- --- 20--- --- 30--- --- 40--- --- 50--- --- 60--- --- 70--- --- 80--- --- 90--- ---100--- ---110--- ---120--- ---8-0--- ---140--- ---150--- ---160--- ---170--- ---180--- ---190--- ---200--- ---210--- ---220--- ---230--- ---240--- ---250--- -6---0--- ---270--- ---280--- ---290--- ---300--- ---310--- ---320--- ---330--- ---340--- ---350--- ---360--- ---370--- ---380--- ---390--- ---400--- ---410--- ---420--- ---430--- ---440--- ---450--- ---460--- ---470--- ---480--- ---490--- ---500--- ---510--- ---520
short line
$ awk '
> length > 128 {
>   for (i=int(length/125);i>0;i--){
>     match(substr($0,0,i*125),/.* /)
>     sub(substr($0,0,RLENGTH),"&/\n+")
>   }
> }1' infile | awk '{print length}'
125
122
132  <-- this is the too long piece
122
27
10   <-- this is the length of the short line

It should be related to the fact that the length of $0 changes upon each substitution. This does require further investigation.
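One possible reading of the 132-character piece, offered as a hedged sketch rather than a confirmed diagnosis: every break is computed against a fixed multiple of 125 and then snaps back to the preceding space, so two consecutive breaks can end up more than 125 characters apart. The snippet below rebuilds a stand-in for the test line with the same space positions (the marker digits are not reproduced), uses a 1-based `substr`, and prints the window end, the break position and the distance from the previous break. The function name `break_report` is made up for illustration.

```shell
break_report() {
  awk 'BEGIN {
    # Stand-in for the test line: 52 ten-character blocks whose spaces sit
    # at the same positions as in infile (the digits do not matter here).
    for (n = 10; n <= 520; n += 10) {
      fmt = (n < 100) ? "--- --- %d" : "--- ---%d"
      line = line sprintf(fmt, n)
    }

    prev = 0
    for (w = 125; w <= 500; w += 125) {
      # Greedy match: RLENGTH is the position of the last space in 1..w.
      match(substr(line, 1, w), /.* /)
      printf "%d %d %d\n", w, RLENGTH, RLENGTH - prev
      prev = RLENGTH
    }
  }'
}

break_report
```

Under these assumptions the breaks land at 124, 244, 374 and 494. The gap between 244 and 374 is 130 characters, and adding the leading "+" and the trailing "/" gives the 132 seen in the test; the other gaps give 125, 122, 122 and 27 in the same way.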

Another little note (you will hate me... I know) is that the first character in a string is the character number 1, not 0 as it appears in the substring statements.

Hope you will not be disappointed by this. Actually I was intrigued by your solution and I just ran some tests to see it work.

grail · 05-26-2010, 07:15 PM

Quote:

Hope you will not be disappointed by this.

Can't get better if I don't know my mistakes

Quote:

Another little note (you will hate me... I know) is that the first character in a string is the character number 1, not 0 as it appears in the substring statements.

I always forget this one; I keep thinking of it like arrays in C.

Quote:

it should be related to the fact that the length of $0 changes upon each substitution

I used the reverse loop to allow for the fact that the length is changing, and because adding the newline caused issues when going forwards (will look into it further).

Quote:

This means that if you have some special character in the substring, it will be treated with the regexp rules and the substring will most likely fail to match. For example just a little plus sign can change the meaning of the expression and the string will not match anymore.

And that bit just sux. Shows I had not read the manual closely enough. Valuable info in this one for me that I will have to try to remember.

Back to the drawing board