Using Vim's substitute command: newly created lines are not scanned again in later steps. Can they be?

dedec0 · 10-20-2021, 08:50 PM

Task: wrap a file with long lines to a desired width.

Bad solution:

1. Set the textwidth option to the desired width
2. Visually select the whole file: ggVG
3. Run the command gq

This is a horrible solution because it will join consecutive lines that are already shorter than the maximum width we set in step 1.

So-so solution:

1. Do the command :%s_^\(.\{44\}\)\([^.]\)\(.\+\)_\1\r \2\3_ge

This is not a complete solution because each run of the command splits each line only once. So, if a line is longer than 2 times the wanted width, you end up with one line of the wanted width and the rest of it on the line below. To have that line split as well, we have to repeat the command. But if the original lines are more than 3 times the wanted width, we will still have overlong lines at the end. So we run the command again, and again, and again... until all lines are good.
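The repetition described above amounts to running the substitution until nothing changes any more. A minimal Python sketch of that idea follows. It is not a Vim command; `split_once` and `split_until_stable` are names made up for illustration, the width 44 is taken from the command above, and the single space of indentation mirrors the command as shown.

```python
import re

WIDTH = 44  # same number as in the :s command above

# Rough Python equivalent of the Vim pattern: 44 characters, one character
# that is not a literal dot, then the rest of the line (at least one char).
PATTERN = re.compile(r"^(.{%d})([^.])(.+)$" % WIDTH)


def split_once(lines):
    """One pass: every matching line is split a single time."""
    out = []
    for line in lines:
        m = PATTERN.match(line)
        if m:
            out.append(m.group(1))
            out.append(" " + m.group(2) + m.group(3))
        else:
            out.append(line)
    return out


def split_until_stable(lines):
    """Repeat the pass until it no longer changes anything."""
    while True:
        new = split_once(lines)
        if new == lines:
            return new
        lines = new


for piece in split_until_stable(["x" * 100]):
    print(len(piece), piece)
```

One pass over a 100-character line gives two pieces. Only the repeated version keeps going until every piece is too short for the pattern to match.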

Do you know a solution in Vim for this?

An example file is given below, with very long lines, good to use with the example command. Each line contains the numbers from 00 to 99, to make the test results easier to see:

Code:

00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

chrism01 · 10-20-2021, 11:32 PM

If I understand your requirement (?), try this page, particularly the first soln: https://stackoverflow.com/questions/...racters-in-vim

shruggy · 10-21-2021, 02:28 AM

Not a Vim solution (as you were already given one), but have you heard of fold? It can break at spaces and count each tab as eight characters. Well, you can invoke it from Vim with

Code:

:%!fold

pan64 · 10-21-2021, 02:34 AM

Quote:

Originally Posted by shruggy

Not a Vim solution (as you were already given one), but have you heard of fold? It can break at spaces and count each tab as eight characters.

Code:

:%!fold

Additionally, you can try sed, which understands the same command as vi (without the :%).

dedec0 · 10-21-2021, 08:15 AM

Quote:

Originally Posted by chrism01

If I understand your requirement (?), try this page, particularly the first soln: https://stackoverflow.com/questions/...racters-in-vim

The best solution is bad because:

1. We choose textwidth=14

2. We have the file:

Code:

33 66 999
33 66 999

3. The command ggVGgq will reformat the file into

Code:

33 66 999 33
66 999

But this is wrong because line 1 should not be touched. It was already smaller than textwidth. Both lines were 9 characters long.
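The same effect can be reproduced outside Vim. As a rough analogy only (this is Python's `textwrap`, not Vim's `gq`), filling the two lines as one paragraph joins them, while wrapping each line separately leaves lines that already fit alone:

```python
import textwrap

text = "33 66 999\n33 66 999"

# Treating the whole text as one paragraph: the newline becomes a space,
# so the second line is pulled up into the first (like gq over both lines).
as_paragraph = textwrap.fill(text, width=14)

# Wrapping line by line: lines that already fit are returned unchanged.
per_line = "\n".join(
    textwrap.fill(line, width=14) for line in text.splitlines()
)

print(as_paragraph)
print(per_line)
```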

dedec0 · 10-21-2021, 08:45 AM

Quote:

Originally Posted by chrism01

If I understand your requirement (?), try this page, particularly the first soln: https://stackoverflow.com/questions/...racters-in-vim

The best solution given on the linked page is:

:%s/.\{80}\($\)\@!/&\r/gec

It is great because it splits a line recursively. It is bad because it splits words.

Does any of you know how to write a regex that improves the command above, so that it splits the line at the last space character before column 80 (for the given example), or at the 80th position if there is no space in the line?

Or maybe a Vim macro (which i use very little) for the following algorithm?

1. Define desired width N.

2. If the line is visually longer than N (counting tabs by their displayed width), it is a candidate. Otherwise, go to the next line and repeat this step. If there are no more lines, we are finished.

3. Find the last space (space char or tab) before the Nth character.

4.1. If no space was found, insert "\r    " (newline + 4 spaces), and write the rest of the current line in it.

4.2. Else, a space was found. Remove this space and any spaces or tabs immediately before it. Insert "\r    " (newline + 4 spaces), and write the rest of the current line in it.

5. Change current line to the line created in 4. Go to step 2.
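The steps above can be sketched in Python as follows. This is only one reading of the algorithm, not a Vim macro, and it makes several assumptions: tabs are expanded to columns of 8 before measuring, "before the Nth character" is taken to mean "within the first N characters", and the leading indent of a continuation line is never used as a break point (otherwise the loop would never end). `wrap_line` and `wrap_text` are illustrative names.

```python
INDENT = "    "  # the "newline + 4 spaces" of steps 4.1 and 4.2


def wrap_line(line, width, tabsize=8):
    """Split one line into pieces of at most `width` columns."""
    if width <= len(INDENT):
        raise ValueError("width must be larger than the indentation")
    # Assumption: "visually longer" is handled by expanding tabs first.
    line = line.expandtabs(tabsize)
    pieces = []
    while len(line) > width:                      # step 2
        # Skip leading blanks so the indent itself is never a break point.
        first_text = len(line) - len(line.lstrip(" "))
        cut = line.rfind(" ", first_text, width)  # step 3
        if cut == -1:                             # step 4.1: hard break
            head, rest = line[:width], line[width:]
        else:                                     # step 4.2: break at blank
            head, rest = line[:cut].rstrip(" "), line[cut + 1:]
        pieces.append(head)
        line = INDENT + rest                      # step 5
    pieces.append(line)
    return pieces


def wrap_text(text, width):
    """Apply wrap_line to every line; short lines are never joined."""
    out = []
    for line in text.split("\n"):
        out.extend(wrap_line(line, width))
    return "\n".join(out)


print(wrap_text("1234567890 234567890 234567890\n33 66 999", 14))
```

Because every input line is handled on its own, lines that already fit are passed through untouched, and the loop in `wrap_line` gives the "recursive" splitting that a single substitute pass lacks.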

shruggy · 10-21-2021, 09:07 AM

So, have you tried fold? Specifically, fold -s?

Quote:

Originally Posted by dedec0

Any of you know how to make a regex to improve the command above, so it splits the line in the last space character before 80 (for the given example), or in the 80th position, if there is no space in the line?

Code:

:%s/\v.{72}\S{,8}($)@!/&\r/g

dedec0 · 10-21-2021, 09:07 AM

Quote:

Originally Posted by shruggy

Not a Vim solution (as you were already given one), but have you heard of fold? It can break at spaces and count each tab as eight characters. Well, you can invoke it from Vim with

Code:

:%!fold

More than that, i can invoke fold with

Code:

:%!fold -s -w [N]

and it seems to work almost like the algorithm i just listed:

1. File with 9 lines: first 3 with a word of 10 chars, a space, a word of 9 chars, a space, another word of 9 chars; next 3 lines with a word with 20 chars, a space, a word of 9 chars; 3 short lines with 3 words each. All lines in the file are consecutive.

Code:

1234567890 234567890 234567890
1234567890 234567890 234567890
1234567890 234567890 234567890
12345678901234567890 234567890
12345678901234567890 234567890
12345678901234567890 234567890
33 66 999
33 66 999
33 66 999

2. For

Code:

:%!fold -w 14 -s

the result is:

Code:

1234567890 
234567890 
234567890
1234567890 
234567890 
234567890
1234567890 
234567890 
234567890
12345678901234
567890 
234567890
12345678901234
567890 
234567890
12345678901234
567890 
234567890
33 66 999
33 66 999
33 66 999

3. First, the lines starting with a word longer than the desired width had the word split, which is good. It is recursive (the lines are completely split, not just one time). The shorter lines were left untouched, and no consecutive lines were joined.

So the only difference i want is some indentation on the split-off parts of lines, to help us see when a line is possibly the continuation of the line we just read.

boughtonp · 10-21-2021, 09:12 AM

"fmt -t" does indentation.

dedec0 · 10-21-2021, 09:20 AM

Quote:

Originally Posted by shruggy

So, have you tried fold? Specifically, fold -s?

Yes. I was playing with fold while you wrote, i think. (: Have you seen what i wrote?

Quote:

Originally Posted by shruggy

Code:

:%s/.\{72}\S\{,8}/&\r/g

Can you explain your regex to me? (72 "anything, even spaces", followed by at most 8 "not spaces")... so what?

pan64 · 10-21-2021, 09:25 AM

You can always implement your own fold-like tool to do exactly what you need.

dedec0 · 10-21-2021, 09:27 AM

Quote:

Originally Posted by boughtonp

"fmt -t" does indentation.

But it does not split something longer than the asked width:

File:

Code:

1234567890 234567890 234567890
1234567890 234567890 234567890
1234567890 234567890 234567890
12345678901234567890 234567890
12345678901234567890 234567890
12345678901234567890 234567890
33 66 999
33 66 999
33 66 999

Vim command:

Code:

:%!fmt -t -w 14

Result:

Code:

1234567890
   234567890
   234567890
1234567890
   234567890
   234567890
1234567890
   234567890
   234567890
12345678901234567890
   234567890
12345678901234567890
   234567890
12345678901234567890
   234567890
33 66 999
33 66 999
33 66 999

shruggy · 10-21-2021, 09:30 AM

Quote:

Originally Posted by dedec0

Can you explain your regex to me? (72 "anything, even spaces", followed by at most 8 "not spaces")... so what?

Yep. 72 characters is the soft limit. Or what the fmt man page calls goal. The splitting occurs after each 72 characters plus 0 to 8 non-spacing characters. So 80 characters is the hard limit. The {,8} quantifier is greedy, so it will try to consume as many characters as it can (up to 8). As soon as there's a white space among those 8 characters, the RE fires and splitting occurs. If there's no white space in that range, the splitting occurs after the 80th character anyway.
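To see the soft and hard limits at work, here is a small Python analogue of that pattern, scaled down to a soft limit of 10 plus up to 4 extra characters (hard limit 14). Python's `re` is used for illustration only; its syntax differs from Vim's, and `(?!$)` is assumed here as the stand-in for `($)@!`.

```python
import re

SOFT, EXTRA = 10, 4  # scaled-down stand-ins for 72 and 8

# SOFT characters of anything, then up to EXTRA non-blank characters,
# and never a match that would end exactly at the end of the line.
pattern = re.compile(r".{%d}\S{0,%d}(?!$)" % (SOFT, EXTRA))


def split_line(line):
    """Insert a newline after every match, like &\\r in the Vim command."""
    return pattern.sub(lambda m: m.group(0) + "\n", line)


# A word that crosses the soft limit is finished first (break after "cccc").
# The blank that follows it stays at the start of the next piece.
print(split_line("aaaa bbbb cccc dddd eeee"))

# With no blank available, the break falls at the hard limit of 14.
print(split_line("x" * 30))
```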

dedec0 · 10-21-2021, 09:34 AM

Quote:

Originally Posted by pan64

Code:

:%!fold

Additionally you can try sed, that will understand the same command - as vi (without :%)

I know regex well, for vim and a few other programs. But sed is sometimes strange to use. I never try it, unless it is something ready to run, or something with changes i understand how to make. Can you show me an example?

shruggy · 10-21-2021, 09:34 AM

Quote:

Originally Posted by dedec0

But it does not split something longer than the asked width

Because fmt splits only between words.
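The contrast between the two tools can be imitated with Python's `textwrap` (an analogy only; it is neither `fmt` nor `fold`). With `break_long_words=False` an overlong word is kept whole, as in the fmt result above, while `break_long_words=True` cuts it at the width. `subsequent_indent` supplies the indentation of continuation lines; the three spaces are copied from the fmt output shown earlier.

```python
import textwrap

line = "12345678901234567890 234567890"

# Like fmt -t -w 14: break only between words, indent continuation lines.
between_words = textwrap.wrap(
    line, width=14, subsequent_indent="   ", break_long_words=False
)

# Long words are cut at the width as well; continuation lines stay indented.
cut_words = textwrap.wrap(
    line, width=14, subsequent_indent="   ", break_long_words=True
)

print("\n".join(between_words))
print("\n".join(cut_words))
```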