Best data structure for implementing a text editor
ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
The first linked list is the read only file containing the original text of the file in question. Each node contains 80 characters (a line)
What do you mean by a linked list is a file? Why are you implying 80 characters is a line?
I think the designs you might have read for keeping the original separate from the edits are not intended for undo (as you seem to have assumed) but for efficiency in reducing the original processing and storage costs of reading in the initial file. It appears you are defeating that purpose by breaking the original into nodes.
Quote:
The last linked list is the one in which user will append the changes or the new text.
I am unable to even guess what you meant by the rest of the design. Efficiently representing just the changes is not likely to be practical. Storing the text of only the changes is practical, but it should require a unified structure of pointers into changed and unchanged text that represents the current state of the partially edited file.
I am sorry that I didn't state clearly what I meant.
Quote:
Originally Posted by johnsfine
What do you mean by a linked list is a file?
This meant that the original contents of the file would be read into the first link list.
Quote:
Originally Posted by johnsfine
I think the designs you might have read for keeping the original separate from the edits are not intended for undo (as you seem to have assumed) but for efficiency in reducing the original processing and storage costs of reading in the initial file. It appears you are defeating that purpose by breaking the original into nodes.
Yes, absolutely, I had a wrong understanding of that data structure. Now I do realize there is no need to break the first linklist into nodes!
Quote:
Originally Posted by johnsfine
I am unable to even guess what you meant by the rest of the design. Efficiently representing just the changes is not likely to be practical. Storing the text of only the changes is practical, but it should require a unified structure of pointers into changed and unchanged text that represents the current state of the partially edited file.
The middle link list showing the pointers is not the actual design. In the actual design, of course, sets/groups of pointers would be made to represent different tasks. I am sorry for that confusion now.
and now I am thinking that if I denote new lines by keeping a pointer on the start of each line, then:
When the user wants to move one line up, we just have to jump to the next pointer, skipping the characters in between.
If according to that thread I take 80 chars maximum for a line, and the user shrinks the window, I'll just have to adjust the pointers to denote new lines rather than moving data.
Thanks again for your reply, I shall post here the final data structure design in a short duration.
struct deletionHandler
{
/* 'ptrFirst' points to the character on the left of the character(s) to be deleted */
char *ptrFirst;
/* 'ptrSecond' points to the character on the right of the character(s) to be deleted */
char *ptrSecond;
};
struct insertionHandler
{
/* 'ptrFirst' points to the character on the left of the left of the cursor */
char *ptrFirst;
/* 'ptrSecond' points to the character to be inserted */
char *ptrSecond;
/* 'ptrThird' points to the character on the right of the left of the cursor */
char *ptrThird;
};
/* The vectors acts as a stack (stack is used for maintaining undo's and redo's) to store the deletionHandler's/insertionHandler's objects */
vector <deletionHandler> deletionPointerStack;
vector <insertionHandler> insertionPointerStack;
The line pointers kept in another structure will point to the new line characters entered by user. Line numbers will be displayed according to those new line characters.
A linked list is better for insertions and deletions only if you don't need to traverse the list to find the insertion or deletion point. If you do, then insertions and deletions become O(n) and their only advantage over dynamic arrays disappears.
That's why linked lists are typically used to implement data structures where you insert and delete from the beginning or end (stacks, queues, and deques), not where you insert and delete from the middle.
The link I posted above, which shows how actual text editors have actually done it, is still active. The TLDR is that "buffers", which is to say arrays, are more popular than linked lists.
And here's a blog entry on developing VSCode's text buffer:
(People who saw previous versions of this post might have seen swipes at a guy upthread who presented himself with authority and strongly advocated using a linked list. You know who it was, and I'm sure you know what I said).
Today, we have languages like C++ which implement various container classes. Sometimes-magical and hard-to-implement exotic data structures have already been correctly implemented in them. Just take them off the shelf and use them, knowing that they will work: you don't have to look inside. In similar fashion we have a robust string type. We have "dictionaries." We have "garbage collector" dynamic memory management just like interpreters use. You simply use the class for what it's built for and look at the source code of the thing only if you are morbidly curious. You know that it has been thoroughly debugged already. And this is how many interpreters and other tools which you reliably use every day were built.
(You can even find: "a complete, working, text editor.")
Actum Ne Agas: "Do Not Do A Thing Already Done."
Last edited by sundialsvcs; 01-03-2022 at 09:53 AM.
In the 1980s, I wrote a text editor that uses a form of doubly-linked list. The doubly-linked list works well because most editing operations are performed on local areas of text. The user inserts and deletes from the same local areas much more often than he accesses randomly from the entire text.
My experience has been that the doubly-linked list's ability to insert and delete without moving anything in memory is a net win.
Ed
To me, perhaps the greatest benefit of "object-oriented programming" was that we could finally stop re-implementing the same wheels. Instead, we could abstract them.
We could finally build a piece of logic which could once-and-for-all correctly implement a particular algorithm, and then "safely apply them to [anything]." Why? Because every [anything] was: 'an object.' (Most "language-internally-defined types" in this context are effectively equivalent to objects.)
So far, there is (AFAIK ...) no Nobel Prize for software. However, if there were, I think that this concept would surely qualify.
To me, perhaps the greatest benefit of "object-oriented programming" was that we could finally stop re-implementing the same wheels. Instead, we could abstract them.
If “abstract” means restricting our exchanges with an object to the necessary and fill in the gaps selectively, I may consent. But the mere “re-use” with the ever-returning argument to not “invent the wheel twice” made us drive square-wheeled tricycles for the simple reason that everybody did. Anybody who utters “invent the wheel again” nowadays should be obliged to prove his comprehension of the current concept (the original “wheel”) instead of pointing at a long line of programs and people who integrate tons of libraries which, each one, only provide a fraction of their functionality to the program and some double the functionality of other libraries, present in the same project.
– I admit that I had been taught to call this Java, but the principle appears to be general.
The main and original achievement of Object Orientation is the possibility to think about program code in the way we think about other objects that we encounter as human beings in our daily life. Programming object-oriented should be more natural than the mere production of algorithms. It still needs insight and discipline. Everything can be perverted and blown up, I do not doubt that.
And I appreciate any project which picks up old ideas to produce better results. From scratch or otherwise. Who cares.
Last edited by Michael Uplawski; 01-04-2022 at 03:10 AM.
Much of the design of a text editor is in its data structure. One shouldn't expect to write a world-class editor without also designing a custom data structure.
The reason is that the data structure matters. It is tightly-coupled to the algorithms, and these affect the editor's ability to handle enormous files.
I have tested various editors on gigabyte files. You might be surprised how many fall apart. Mine works.
Ed
My point is simply this – "while you might know how to correctly implement a red/black tree, you don't need to do it anymore." Someone else already did it for you, and they thoroughly debugged it. Then, they created it as a public "container class" that is capable of storing and retrieving anything "as long as it is an Object." This is of course a gigantic step forward. Creative and inventive though we might all be, we should not have to "first, re-invent bricks and mortar" in order to build a wall with them.
Last edited by sundialsvcs; 01-04-2022 at 04:56 PM.
My point is that text editors are like race cars, custom-built for the job. You have to make the important parts yourself because no-one sells such things.
Ed
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.