LinuxQuestions.org
View the Most Wanted LQ Wiki articles.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 11-15-2010, 11:47 PM   #1
TheIndependentAquarius
Senior Member
 
Registered: Dec 2008
Posts: 4,622
Blog Entries: 29

Rep: Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896
Best data structure for implementing a text editor


Options:
  • A double linked list with fixed size character arrays. New nodes will be added when previous ones get filled.
  • Dynamic arrays using realloc().

Linked list has the disadvantage of not having an O(1) search like an array but I think using realloc() will be more dangerous since it moves the whole block to a new location and if the new location is not big enough to hold the whole block, realloc fails !! this case will not occur with the linked list since its nodes are scattered.

Am I overlooking some point here ?
OR
Is there any better way of representing the problem in question ?
 
Old 11-15-2010, 11:53 PM   #2
paulsm4
Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
The answer, of course, is "it depends".

But your reasoning is sound; "a list of fixed size character arrays" might well be the ideal solution for your scenario.
 
1 members found this post helpful.
Old 11-15-2010, 11:59 PM   #3
TheIndependentAquarius
Senior Member
 
Registered: Dec 2008
Posts: 4,622
Blog Entries: 29

Original Poster
Rep: Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896
Quote:
Originally Posted by paulsm4 View Post
The answer, of course, is "it depends".
Now I understand, when you say "it depends", it means whether "search" facility is important to me ? And even if it is, do I have any other choice ?

and thanks for posting.
 
Old 11-16-2010, 12:54 AM   #4
paulsm4
Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
Quote:
Q: Now I understand, when you say "it depends", it means whether "search" facility is important to me?
No, I meant nothing of the kind. I have no idea whether "search" is an important criterion for you. You certainly didn't say that it was in your first post.

I merely meant that it sounded like you thought things through. I agreed with your conclusions, and I thought it would be helpful to say so.

Quote:
Q: Do I have any other choice ?
Sure - you have LOTS of choices. Take "search", for example. If extremely fast random access times were important, you'd probably want a hash list (instead of a linked list).

PS:
I stand by everything I said yesterday ... right up to where I started babbling about hash lists. Sorry about that

ANYWAY: Johnsfine's post below is excellent.

Good luck - and have fun!

Last edited by paulsm4; 11-16-2010 at 03:51 PM.
 
1 members found this post helpful.
Old 11-16-2010, 01:07 AM   #5
TheIndependentAquarius
Senior Member
 
Registered: Dec 2008
Posts: 4,622
Blog Entries: 29

Original Poster
Rep: Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896
Thanks again,

I was reading up till now more on this and let me tell you now, I am doing all this for making a "TEXT EDITOR".

When you said "it depends" I remembered the following:
  • Find and Replace feature
  • Auto completion
  • Automatic highlighting

All these features need "searches".

I looked up on Trie, of course that can be considered for fastest searches but now I am thinking whether is it worth to implement that much ?

Last edited by TheIndependentAquarius; 11-16-2010 at 01:10 AM.
 
Old 11-16-2010, 08:32 AM   #6
dugan
Senior Member
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 4,765

Rep: Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467
The answer is the dynamic array. Arrays are optimized for the CPU cache, because they store their data contiguously. With a linked list, the data is scattered everywhere, so you'll get cache misses all the time.

And you don't need to implement any data structure if you don't want to.

http://library.gnome.org/devel/glib/stable/

Last edited by dugan; 11-16-2010 at 08:40 AM.
 
Old 11-16-2010, 09:01 AM   #7
johnsfine
Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,107

Rep: Reputation: 1114Reputation: 1114Reputation: 1114Reputation: 1114Reputation: 1114Reputation: 1114Reputation: 1114Reputation: 1114Reputation: 1114
Quote:
Originally Posted by anishakaul View Post
I was reading up till now more on this and let me tell you now, I am doing all this for making a "TEXT EDITOR".
So you need to consider insert and delete of text. Probably you need to prioritize insert and delete of text.

I think that rules out a simple contiguous data structure (ordinary array realloc'ed when you want it bigger) because inserting a character would require moving every following character.

For text searching, I don't think a linked list is terribly worse than a flat array. It is certainly worse than a flat array, but not my nearly as much as a flat array is worse than a linked list for insertion and deletion.

For insertion and deletion to work well with a linked list of chunks, you would need the use level of each chunk to be variable. You probably want fixed allocation size for the chunks, but variable use level.

I'm sure there are better data structures for a text editor than a linked list of such chunks. But I can't think of a simple data structure that is better.

Quote:
Originally Posted by dugan View Post
The answer is the dynamic array.
That is such a strange answer that I did a google search (and another search inside the area you linked to) to see if there was another common meaning for "dynamic array" that I didn't know about. I didn't find that. Do you have some meaning other than the usual one for "dynamic array"? (If so, a direct URL would be nice).

Otherwise you are ignoring the original
Quote:
if the new location is not big enough to hold the whole block, realloc fails
If relevant to the question, I would ignore that also, because it isn't a serious consideration compared to others. But if I had to ignore that, I'd explain why.
More seriously, you are ignoring
Quote:
for making a "TEXT EDITOR".
That tells you insert/delete are more important than most other operations. You should not select a data structure in which insert/delete performance is terrible.

Last edited by johnsfine; 11-16-2010 at 09:09 AM.
 
1 members found this post helpful.
Old 11-16-2010, 11:01 AM   #8
dugan
Senior Member
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 4,765

Rep: Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467Reputation: 1467
Johnesfines is absolutely right. What was I thinking? :smacks forehead:

Last edited by dugan; 11-16-2010 at 11:16 AM.
 
Old 11-16-2010, 07:48 PM   #9
graemef
Senior Member
 
Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Rep: Reputation: 148Reputation: 148
If you are going to use a linked list, what will the node of the list represent?

If the node is just unstructured text held in a fixed length array, with a new node being added each time the array fills up then their are several issues that I can immediately think of. Implementing an insert of text would either need some array manipulation trickery or you split the node into two and append the new text into the first of the two nodes. Simple to implement but lots of editing can lead to a fragmented data structure that may waste a lot of memory. So it may possibly need to have a routine that compresses the data structure. Locating line numbers (often a useful task with a text editor) is not straight forward because the data is held in an unstructured way.

If the nodes are holding the text in some structured way, such as each node represents a line (or paragraph if you'd prefer). Then you overcome the above issue of locating line numbers but at a cost. The node is going to become more complicated it can't just be a simple array anymore because it would need to manage the case where it can support a very short line (say a blank one) to a very long line (say thousands of characters). Which sounds like the original problem just broken down to a lower level. So maybe the node of a line is itself a linked list. Which comes back to my original question what will this node now represent?

Just some thoughts
 
1 members found this post helpful.
Old 11-17-2010, 05:19 AM   #10
TheIndependentAquarius
Senior Member
 
Registered: Dec 2008
Posts: 4,622
Blog Entries: 29

Original Poster
Rep: Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896
Thanks to all of you for taking the time to post here

I was reading up and searching more on the data structures that could be used for the "Text Editors" and I found some useful links:

I have not yet gone through deeply in most of the comments presented here by all of you. I thought it will be better if I read up on this more before opening my mouth again (that way I'll have something more constructive to say).

But I will in a short period of time respond back about the progress as well as all of your comments !

Thanks again.

Quote:
Originally Posted by paulsm4 View Post
PS:
I stand by everything I said yesterday ... right up to where I started babbling about hash lists. Sorry about that

ANYWAY: Johnsfine's post below is excellent.

Good luck - and have fun!
Paul,

Please do not modify your old post after a long period of time, it is not easily noticeable as no email notification gets sent for edits. Please make a new post. I could have missed that easily. Thanks.

Last edited by TheIndependentAquarius; 11-23-2010 at 01:58 AM. Reason: added one more link
 
Old 11-17-2010, 08:17 AM   #11
johnsfine
Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,107

Rep: Reputation: 1114Reputation: 1114Reputation: 1114Reputation: 1114Reputation: 1114Reputation: 1114Reputation: 1114Reputation: 1114Reputation: 1114
I should mention that the last time I actually wrote a text editor was long enough ago that typical text files being edited didn't fit entirely in ram. So almost all the data structure choices focused on dealing with the fact that most of the text was on disk at any specific moment. (When I first wrote a text editor, most of the text was on Dec Tape at any given moment).

After a glance at one of the links anishakaul posted, I am reminded that a modern text editor probably wants to deal well with small edits to big files (here big does not mean too big to map into the virtual address space, but maybe big enough that an initial transform to a chunked data structure would be considered an excess expense). That may motivate a more complicated and more flexible data structure.

I still don't know a decent simple data structure better than what I described above. But that structure may be too simple for a professional quality text editor.

Quote:
Originally Posted by graemef View Post
If you are going to use a linked list, what will the node of the list represent?
I thought I made that clear.

Quote:
If the node is just unstructured text held in a fixed length array, with a new node being added each time the array fills up
A new node would be added any time any node fills up. Not when the whole array fills up.

Quote:
Implementing an insert of text would either need some array manipulation trickery or you split the node into two and append the new text into the first of the two nodes.
You split a node when inserting into a full node. If you insert into a non full node, you just insert.

Quote:
Simple to implement
That is the point. Simple to implement and decent time performance and decent space performance. That doesn't mean the time or space performance is great, just decent.

Quote:
but lots of editing can lead to a fragmented data structure that may waste a lot of memory.
It is pretty easy to keep the maximum wasted space down to slightly more than 50% and the typical waste even lower. 50% storage waste is definitely not great. Maybe I'm missing a simple data structure with decent time performance that wastes a lot less space. I haven't given it a lot of thought. But 50% storage waste is generally considered acceptable as compared to a much more complex and slower algorithm.

Quote:
Locating line numbers (often a useful task with a text editor) is not straight forward because the data is held in an unstructured way.
One does need to judge how central lines are to the operation of the text editor (are you fundamentally editing a collection of text or a collection of lines).

A first order level of support for lines in a fundamentally text oriented editor would be keeping for each node a count of the number of line breaks in that node. The node may still start and end arbitrarily in the middles of lines, but counting the number of lines in a range of many nodes is made much easier.

Quote:
If the nodes are holding the text in some structured way, such as each node represents a line (or paragraph if you'd prefer).
I don't like the implication that a line oriented approach is "structured" and not line oriented not so. A text editor might be so line oriented that its data structures should also be line oriented (more so than the line break counts I just described) but that is just different and more complicated, not inherently more structured.

Quote:
it would need to manage the case where it can support a very short line (say a blank one) to a very long line (say thousands of characters).
Why not millions of characters? What is the limit?

At my current job, my ex boss wrote a simple parser for a type of data file that is a minor side input to our product. The data is generated by other programs and typically has thousands of lines of under a hundred characters per. After he quit, I diagnosed why his code was crashing on a customer example where the text file was hundreds of megabytes. Nothing in his code would be bothered by any number of lines nor any total file size. But he had done one sloppy step that effectively set the max supported line length to 10K characters. Buried among over a million ordinary length lines in the customer's file was one line of over a million characters.

Quote:
Which sounds like the original problem just broken down to a lower level. So maybe the node of a line is itself a linked list.
That is a plausible approach to line oriented without the arbitrary limits. I usually prefer not being that line oriented.

Quote:
Which comes back to my original question what will this node now represent?
I suggested a simple, acceptable performance data structure. I didn't say there isn't a better performance data structure out there.
 
Old 11-22-2010, 04:12 AM   #12
TheIndependentAquarius
Senior Member
 
Registered: Dec 2008
Posts: 4,622
Blog Entries: 29

Original Poster
Rep: Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896
Thank you John, for the enlightening reply.
Quote:
Originally Posted by johnsfine View Post
For insertion and deletion to work well with a linked list of chunks, you would need the use level of each chunk to be variable. You probably want fixed allocation size for the chunks, but variable use level.
I really didn't understand this. What does "use level" mean ?

Quote:
Originally Posted by johnsfine View Post
That tells you insert/delete are more important than most other operations. You should not select a data structure in which insert/delete performance is terrible.
Thanks for the important reminder, I did miss that.
 
Old 11-22-2010, 04:18 AM   #13
TheIndependentAquarius
Senior Member
 
Registered: Dec 2008
Posts: 4,622
Blog Entries: 29

Original Poster
Rep: Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896
Thank you Graem for the concern,
Quote:
Originally Posted by graemef View Post
If you are going to use a linked list, what will the node of the list represent?
I was thinking that a node should represent a line, probably the 80 characters ?
http://www.linuxquestions.org/questi...screen-844325/

Quote:
Originally Posted by graemef View Post
The node is going to become more complicated it can't just be a simple array anymore because it would need to manage the case where it can support a very short line (say a blank one) to a very long line (say thousands of characters).
I didn't think of this case till now, thanks for the reminder!
 
Old 11-22-2010, 04:32 AM   #14
TheIndependentAquarius
Senior Member
 
Registered: Dec 2008
Posts: 4,622
Blog Entries: 29

Original Poster
Rep: Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896
Here is what I have learnt in past 4 days:

The piece table method shown in the below link, does not require actual insertions and deletions of data, it is keeping and moving the "pointers".

One buffer will have the original contents of the file and second buffer will be in the appending mode.
That means if a character is to be inserted between pre-existing two characters then the first and the third pointer will be pointing to the two old characters and the second one to the new character residing in the second buffer.

Because we are not deleting the characters we can have unlimited number of undoes!

<removed the pathetic question>

http://www.cs.unm.edu/~crowley/papers/sds/node15.html

See if you can give me some directions on it.
Thanks.

Last edited by TheIndependentAquarius; 11-22-2010 at 10:30 PM. Reason: <removed the pathetic question>
 
Old 11-23-2010, 01:32 AM   #15
TheIndependentAquarius
Senior Member
 
Registered: Dec 2008
Posts: 4,622
Blog Entries: 29

Original Poster
Rep: Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896Reputation: 896
Does the below design make sense ?
The case I have shown here is of EDITING
Code:
/* Below is buffer 1: (the read-only file) */
[abcd efghi ...(upto 80th char)]->[ijklm nop ...(upto 80th char)]->...
 ^        ^
 |        |
 |        |        /*The ever-growing list of pointers*/
[P1, P2, P3, P4, ..., (up to 80th pointer)]->[P1, ..., (up to 80th pointer)]->...
     | 
     |
     v
[new data]->
/* Above is buffer 2: (the append-only file) */
  • The first linked list is the read only file containing the original text of the file in question. Each node contains 80 characters (a line)
  • The last linked list is the one in which user will append the changes or the new text.
    I am unable to decide the default size of the nodes here.
  • The linked list in the middle contains pointers which will be pointing to both the other linked lists in cases of insertion and deletions.
    P1 points to the first part of old text and P3 points to last part of old text. And P2 points to the new text which is supposed to replace the text between P1 and P3.

    Is there any other way of representing it better ?
 
  


Reply

Tags
text editor


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
C++ - Store a text file in a Data Structure nilly16 Programming 3 05-26-2009 06:42 AM
text selection question in text editor olin Linux - Software 1 09-25-2007 01:25 PM
Text Editor on Fedora Core 4 in Text Mode blong4life Linux - Software 5 07-31-2006 09:07 PM
Which light text editor can copy text from file and paste in browser? davidas Linux - Software 9 03-06-2006 11:28 AM


All times are GMT -5. The time now is 10:22 PM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration