PHP: organize two lists

secretlydead · 04-25-2008, 11:03 PM

this one is pretty tricky.

i've got list 1, which is like this:
a
b
c
d
e
f

and list 2, like this:
ab
bfe
dd

i want a new list, ordered so that any entry from list 2 only comes after each of its components has already been listed. so, it should look like this:
a
b
ab
c
d
dd
e
f
bfe

the two lists are stored in mysql.

anyone have a suggestion on how to do that?

(the actual application is building flashcards from a dictionary for learning a language, so that only AFTER you learn the words "I", "love", and "you" in another language do you learn the sentence "I love you." building this would involve databases with tens of thousands of words (list 1) and sentences (list 2).)

graemef · 04-26-2008, 12:03 AM

Given your requirement, I'm not convinced that a specialist merge is your answer. Rather, in the database keep a table recording, for each person, the words that they have learnt. For each sentence, record the words that the sentence uses. Then all you need to do is check that the sentence's words are a subset of the words that the person knows.
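A minimal sketch of that subset test in SQL, run through Python's built-in `sqlite3` module so it is self-contained. The tables `learnt(student_id, word_id)` and `sentence_word(sentence_id, word_id)` are hypothetical names for the two tables described above. The NOT EXISTS / NOT IN pattern is standard SQL and should carry over to MySQL with little or no change.

```python
import sqlite3

# Hypothetical schema: learnt = words each student knows,
# sentence_word = words each sentence uses (one row per pair).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE learnt (student_id INTEGER, word_id INTEGER,
                     PRIMARY KEY (student_id, word_id));
CREATE TABLE sentence_word (sentence_id INTEGER, word_id INTEGER,
                            PRIMARY KEY (sentence_id, word_id));
""")

# Illustrative word ids: 1 = "I", 2 = "love", 3 = "you", 4 = "cats".
conn.executemany("INSERT INTO sentence_word VALUES (?, ?)",
                 [(10, 1), (10, 2), (10, 3),   # "I love you"
                  (11, 1), (11, 2), (11, 4)])  # "I love cats"
conn.executemany("INSERT INTO learnt VALUES (?, ?)",
                 [(7, 1), (7, 2), (7, 3)])     # student 7 knows I, love, you

# A sentence is available when none of its words is missing from the
# student's learnt set: the "subset" test written as NOT EXISTS.
AVAILABLE_SQL = """
SELECT DISTINCT sw.sentence_id
FROM sentence_word AS sw
WHERE NOT EXISTS (
    SELECT 1
    FROM sentence_word AS needed
    WHERE needed.sentence_id = sw.sentence_id
      AND needed.word_id NOT IN (
          SELECT word_id FROM learnt WHERE student_id = ?)
)
ORDER BY sw.sentence_id
"""

def available_sentences(student_id):
    """Return ids of sentences whose words the student has all learnt."""
    return [row[0] for row in conn.execute(AVAILABLE_SQL, (student_id,))]

print(available_sentences(7))  # [10]
```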
This can be simplified if the words are presented in order. Then the position of the sentence will be based solely on the largest word id. That is, if "I" has id 5 (i.e. it is the fifth word learnt), "you" has id 12 and "love" has id 33, then the sentence "I love you" will get a position of 33, so it comes right after "love".
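The "largest word id" rule can be sketched as follows, using the two lists from the first post. Each list-2 entry is given here as a hypothetical `(label, components)` pair, and the function places each entry immediately after the latest of its components in list 1. This is a sketch under those assumptions, not code from the thread.

```python
def merge_lists(words, phrases):
    """Interleave list 2 into list 1 so each phrase follows all its components.

    words   -- list 1 in presentation order (index = word id)
    phrases -- list 2 as (label, components) pairs
    Raises KeyError if a component does not appear in list 1.
    """
    position = {w: i for i, w in enumerate(words)}
    # Bucket each phrase under the position of its latest component
    # (the "largest word id" rule). Ties keep list-2 order.
    after = {}
    for label, components in phrases:
        key = max(position[c] for c in components)
        after.setdefault(key, []).append(label)
    merged = []
    for i, w in enumerate(words):
        merged.append(w)
        merged.extend(after.get(i, []))  # phrases unlocked by this word
    return merged

words = ["a", "b", "c", "d", "e", "f"]
phrases = [("ab", ["a", "b"]), ("bfe", ["b", "f", "e"]), ("dd", ["d", "d"])]
print(merge_lists(words, phrases))
# ['a', 'b', 'ab', 'c', 'd', 'dd', 'e', 'f', 'bfe']
```

Building the position dictionary and the buckets is linear in the total size of both lists, so this should stay cheap even with tens of thousands of entries.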
The idea can be extended to words learnt in random order, at the expense of slightly more storage.

secretlydead · 04-26-2008, 02:23 AM

Quote:

Originally Posted by graemef

check that the sentence is a subset of the words that the person knows.

how? any clues on the functions in php or the command in mysql?

Quote:

Originally Posted by graemef

"I" has id 5 (that is it is the fifth word learnt) "you" has id 12 and "love" has id 33 then the sentence "I love you" will have an id of 33.

The lists are generated either from books, in the order the words appear (for example, if the book "What the Buddha Taught" starts out with "Buddha was born...", then "Buddha" is id 1, "was" is id 2, etc.), or from a frequency dictionary.

Quote:

Originally Posted by graemef

The idea can be expanded for random order of learnt words at the expense of slightly more storage.

This is what will have to be done, as the program is already built that way... cards are presented in random order for some word lists and in a hierarchical fashion for others (and i plan that the lists will have priorities and goals; for example, you can learn 10 words a day from this list and a minimum of 20 from that list before it starts feeding you randomly from your lists).

Can you please explain more how you would do that?

It seems that you would assign each single word an id number, and then each sentence would have each of those numbers. I can't quite grasp how this would be done technically...
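One way to do it technically (an assumption, not the only design): assign word ids in order of first appearance, as in the book example above, and store one row per (sentence, word) pair in a link table. The naive regex tokenizer and the function names below are hypothetical, and text in a real target language will likely need a proper tokenizer.

```python
import re

def tokenize(text):
    # Naive tokenizer (assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def assign_word_ids(book_text):
    """Give each distinct word an id in order of first appearance (1, 2, 3, ...)."""
    word_id = {}
    for w in tokenize(book_text):
        if w not in word_id:
            word_id[w] = len(word_id) + 1
    return word_id

def sentence_word_rows(sentences, word_id):
    """Rows for a hypothetical sentence_word(sentence_id, word_id) link table.

    Assumes every word of every sentence already has an id (else KeyError).
    """
    rows = []
    for sentence_id, text in enumerate(sentences, start=1):
        for wid in sorted({word_id[w] for w in tokenize(text)}):
            rows.append((sentence_id, wid))
    return rows

word_id = assign_word_ids("Buddha was born. Buddha was a prince.")
rows = sentence_word_rows(["Buddha was born.", "A prince was born."], word_id)
print(word_id)  # {'buddha': 1, 'was': 2, 'born': 3, 'a': 4, 'prince': 5}
print(rows)     # [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 4), (2, 5)]
```

Each sentence then "has" the ids of its words as rows in the link table rather than as a list inside one column, which is what makes subset queries possible in SQL.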

secretlydead · 05-03-2008, 08:35 AM

Quote:

Originally Posted by graemef

The idea can be expanded for random order of learnt words at the expense of slightly more storage.

The main problem with this is that the sentence list is about 50,000 sentences, so querying that for subsets while a person is studying would slow the program down.
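One common way to avoid re-querying all 50,000 sentences (a swapped-in technique, not something proposed in the thread) is to update per-student state incrementally. Keep an inverted index from each word to the sentences that use it, plus, for each sentence, a count of words the student has not learnt yet. Learning a word then only touches the sentences containing that word. The class below is a hypothetical in-memory sketch. In MySQL the counters could live in a per-student table, which is one possible reading of the "slightly more storage" trade-off mentioned earlier.

```python
from collections import defaultdict

class SentenceUnlocker:
    """Per-student bookkeeping that unlocks sentences as words are learnt.

    Assumes every sentence has at least one word.
    """

    def __init__(self, sentence_words):
        # sentence_words: sentence_id -> set of word ids it uses
        self.missing = {s: len(ws) for s, ws in sentence_words.items()}
        # Inverted index: word id -> sentences that contain it.
        self.by_word = defaultdict(list)
        for s, ws in sentence_words.items():
            for w in ws:
                self.by_word[w].append(s)
        self.known = set()

    def learn(self, word):
        """Record a newly learnt word; return sentences that just became available."""
        if word in self.known:
            return []  # already counted, do not decrement twice
        self.known.add(word)
        unlocked = []
        for s in self.by_word.get(word, []):
            self.missing[s] -= 1
            if self.missing[s] == 0:
                unlocked.append(s)
        return unlocked

u = SentenceUnlocker({10: {1, 2, 3}, 11: {1, 2, 4}})
for w in (1, 2, 3, 4):
    print(w, u.learn(w))
# 1 []
# 2 []
# 3 [10]
# 4 [11]
```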

rubadub · 05-03-2008, 09:27 AM

You could split it all into chapters and assume that you've learnt a chapter before moving onto the next chapter (a bit like a binary tree way of splitting up big sets).

Other thoughts:
A table where you add the learnt words for each person (by id or by word); then either split the sentence into words and search through the learnt words, or build a massive REGEXP statement from the learnt words list and check the sentence against it. Oh yeah, also tag each sentence with its chapter, as stated before...
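The "split the sentence into words and search through the learnt words" option amounts to a set-subset test once the sentence is tokenized. A minimal sketch, assuming a naive lowercase regex tokenizer (hypothetical; real text will need better tokenization) and the student's learnt words already loaded into a set:

```python
import re

def is_learnable(sentence, learnt_words):
    """True if every word of the sentence is in the student's learnt set."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return words <= learnt_words  # set subset test

learnt = {"i", "love", "you"}
print(is_learnable("I love you.", learnt))   # True
print(is_learnable("I love cats.", learnt))  # False
```

Doing the check in application code avoids building one huge REGEXP, but it still has to look at every candidate sentence, which is why tagging sentences by chapter helps narrow the candidates first.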

graemef · 05-03-2008, 05:50 PM

Sorry for not replying; I was really busy last week, and I felt that this required a more detailed answer than I was able to give then.

As rubadub suggested think of the problem in the terms of chapters.

For your initial solution, assume that each student will tackle the chapters in order. Given this, we have the following:

Each chapter consists of a set of words.
Each student will have tackled a number of chapters in order
Each sentence will consist of a number of words from different chapters

Now assume that the words are stored in a relational database (it doesn't have to be an RDB); this gives you three tables:

words (word_id, word, chapter_id)
sentence (sentence_id, sentence, last_chapter_id)
student (student_id, name, last_chapter_id)

You may also want to add a chapter table, which can be used to give each chapter a name; it is not needed by the algorithm, it would just make things prettier for the user.

The fields are all fairly self-explanatory except for the two fields called last_chapter_id.
student.last_chapter_id is the id of the last chapter that the student has studied. Remember that chapters are presented to the student in order, so student.last_chapter_id progresses 1, 2, 3, 4, etc., and the next chapter to be studied is student.last_chapter_id + 1.

sentence.last_chapter_id is computed once, up front, so it has no performance cost while the student is studying. Computing it requires parsing each sentence into words and finding the chapter each word belongs to; sentence.last_chapter_id is simply the maximum of those chapter ids. For example, if a sentence has five words, find the chapter in which each of the five words was covered, then store the largest of those five numbers for the sentence.
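The one-off computation of sentence.last_chapter_id could look roughly like this. It assumes the words table has been loaded into a dictionary mapping each word to its chapter_id, and uses a naive lowercase regex tokenizer (an assumption). The resulting value would be written to the sentence row once, when the data is imported.

```python
import re

def last_chapter_id(sentence, chapter_of):
    """Largest chapter id among the sentence's words.

    chapter_of: word -> chapter_id (the words table, loaded into memory).
    Raises KeyError for an unknown word and ValueError for an empty sentence.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    return max(chapter_of[w] for w in words)

# Illustrative chapter assignments, not real data.
chapter_of = {"i": 1, "you": 2, "love": 5, "cats": 7}
print(last_chapter_id("I love you", chapter_of))  # 5
```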

Now, to get the set of sentences for which a student has seen all the words, you would need a SQL statement similar to the following:

SELECT sentence.sentence FROM sentence, student WHERE student.student_id = ? AND sentence.last_chapter_id <= student.last_chapter_id;

Using equality instead gives you just the new sentences, i.e. those unlocked by the chapter the student most recently studied:

SELECT sentence.sentence FROM sentence, student WHERE student.student_id = ? AND sentence.last_chapter_id = student.last_chapter_id;
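A self-contained sketch of the two queries against an in-memory SQLite database, using the three-table layout described above (the sample rows and the student name are made up for illustration). It compares sentence.last_chapter_id <= student.last_chapter_id, so a sentence qualifies once the latest chapter among its words has been studied, and it restricts each query to one student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sentence (sentence_id INTEGER PRIMARY KEY, sentence TEXT,
                       last_chapter_id INTEGER);
CREATE TABLE student  (student_id INTEGER PRIMARY KEY, name TEXT,
                       last_chapter_id INTEGER);
CREATE INDEX idx_sentence_chapter ON sentence (last_chapter_id);
INSERT INTO sentence VALUES (1, 'I love you', 5), (2, 'I love cats', 7),
                            (3, 'you and I', 3);
INSERT INTO student VALUES (1, 'Ann', 5);
""")

# Every sentence whose words this student has already covered.
SEEN_SQL = """
SELECT s.sentence FROM sentence AS s
JOIN student AS st ON s.last_chapter_id <= st.last_chapter_id
WHERE st.student_id = ? ORDER BY s.sentence_id
"""
# Only the sentences unlocked by the chapter the student just finished.
NEW_SQL = """
SELECT s.sentence FROM sentence AS s
JOIN student AS st ON s.last_chapter_id = st.last_chapter_id
WHERE st.student_id = ? ORDER BY s.sentence_id
"""

seen = [r[0] for r in conn.execute(SEEN_SQL, (1,))]
new = [r[0] for r in conn.execute(NEW_SQL, (1,))]
print(seen)  # ['I love you', 'you and I']
print(new)   # ['I love you']
```

Because both queries compare a single precomputed integer column, an index on sentence.last_chapter_id should keep the "new sentences" lookup cheap even with tens of thousands of sentences.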