Making predictions and thinking like a computer

Posted 02-23-2012 at 07:33 PM by sag47
Updated 02-23-2012 at 07:34 PM by sag47

Another user from a different forum asked me a question. I gave what I thought was a good answer, so I'm posting it here for all of LQ to enjoy.

Originally Posted by Capt Kenpachi
Sar & Jim;

If I were to build a database that were to compare historical purchases for a single individual to forecast what they are likely to purchase again in the future...

I know I would use a SQL database to store the Historical Data on the individual.

But what would I use to comb through the Data to determine the forecast?

(for example jean purchases)
id | Purch Date | Dark Jeans | Light Jeans | Zipper crotch | Button Crotch | etc

I would fill in yes or no for each category (other than name and purchase date) to simplify the input process and reduce the chance of error (spelling differences/typos/etc.). I would want the code to scan the known historical data on each person and come up with a profile of the type of jeans they want to buy (light jeans, zipper crotch, boot cut, etc.).

What scripting language would let me combine the historical data into a forecast prediction?

Well, you could make the yes or no an integer or boolean: 0 = no and 1 = yes.

Then you can select all of that person's entries from the database and count the rows returned. Let's call this count num_purchases.
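A minimal sketch of that setup using Python's sqlite3 module. The table and column names here are my own invention, based on the example row in the question:

```python
import sqlite3

# Hypothetical schema: one row per purchase, each feature stored as a
# 0/1 integer (0 = no, 1 = yes), as suggested above.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE purchases (
    id INTEGER PRIMARY KEY,
    purch_date TEXT,
    dark_jeans INTEGER,
    light_jeans INTEGER,
    zipper_crotch INTEGER,
    button_crotch INTEGER
)""")
conn.executemany(
    "INSERT INTO purchases "
    "(purch_date, dark_jeans, light_jeans, zipper_crotch, button_crotch) "
    "VALUES (?, ?, ?, ?, ?)",
    [("2012-01-05", 0, 1, 1, 0),
     ("2012-01-19", 1, 0, 0, 1),
     ("2012-02-02", 0, 1, 1, 0)],
)
# num_purchases is the row count for this buyer.
num_purchases = conn.execute("SELECT COUNT(*) FROM purchases").fetchone()[0]
print(num_purchases)  # 3
```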

You can handle the prediction in one of four ways. Other methods exist, but I'll cover these four.

Method #1: static prediction. Make the same prediction every time, and keep track of the number of times the prediction was right in a separate database table. We'll call this variable correct_predictions.

correct_predictions/num_purchases*100 = percentage of correct predictions.
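In code, with made-up counts for illustration:

```python
# Hypothetical tallies: 7 correct predictions out of 10 purchases.
correct_predictions = 7
num_purchases = 10

# Equation #1: percentage of correct predictions.
accuracy = correct_predictions / num_purchases * 100
print(accuracy)  # 70.0
```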

Method #2: statistical prediction. Start out with a static prediction. Iterate through all of the returned rows and compute a sum for each category. Let's call these values sum_darkj, sum_lightj, sum_zipperc, and sum_buttonc respectively.

Category #1
sum_darkj/num_purchases*100 = percentage_of_time_item_is_purchased
sum_lightj/num_purchases*100 = percentage_of_time_item_is_purchased

Category #2
sum_zipperc/num_purchases*100 = percentage_of_time_item_is_purchased
sum_buttonc/num_purchases*100 = percentage_of_time_item_is_purchased

Choose the single highest percentage from each category and make that combination your next prediction. Keep track of correct predictions and calculate accuracy as in equation #1.
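A sketch of method #2 in Python. The rows here are an invented toy history, in the column order (dark jeans, light jeans, zipper crotch, button crotch):

```python
# Toy purchase history: each row is (dark_jeans, light_jeans,
# zipper_crotch, button_crotch) as 0/1 values.
rows = [
    (0, 1, 1, 0),
    (1, 0, 0, 1),
    (0, 1, 1, 0),
    (0, 1, 1, 0),
]
num_purchases = len(rows)
sum_darkj   = sum(r[0] for r in rows)
sum_lightj  = sum(r[1] for r in rows)
sum_zipperc = sum(r[2] for r in rows)
sum_buttonc = sum(r[3] for r in rows)

def pct(s):
    # Percentage of the time this feature appears in a purchase.
    return s / num_purchases * 100

# Category #1: predict whichever color has the higher percentage.
color = "dark jeans" if pct(sum_darkj) > pct(sum_lightj) else "light jeans"
# Category #2: same for the crotch closure.
closure = "zipper crotch" if pct(sum_zipperc) > pct(sum_buttonc) else "button crotch"
print(color, closure)  # light jeans zipper crotch
```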

As you accumulate more data, the statistical forecast tends to become more accurate, but it also becomes more static: it settles into always predicting the highest-percentage feature in each category, much like static prediction. This method is usually more accurate than method #1, though only slightly, depending on how the user spends.

Method #3: the simple two-bit branch predictor.

You have two bits to work with. This means that your branch has four states of yes/no prediction.
00=strongly light jeans (T)
01=weakly light jeans (t)
10=weakly dark jeans (n)
11=strongly dark jeans (N)

You start out by weakly predicting a default feature (the weakly taken state), in our case weakly light jeans. If you get it right, subtract one from the state and move to strongly light jeans. If you get it wrong, add one and move to weakly dark jeans for the next prediction. Every correct prediction keeps the same choice and moves to a stronger state for that choice; every wrong prediction moves to a weaker state (and eventually over to the opposite choice).

A simple example of a buyer and predictor, with the initial state at 01 (t).

actually purchased | prediction | correct? | next state
light jeans        | t          | yes      | T (00)
dark jeans         | T          | no       | t (01)
dark jeans         | t          | no       | n (10)
dark jeans         | n          | yes      | N (11)

The simple two-bit branch predictor can be wrong three times in a row before it gets a prediction right (and an alternating buyer can fool it even longer). This is not always optimal, which is why a more advanced two-bit branch prediction scheme was invented.
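The scheme above can be sketched as a saturating counter in Python. The state encoding 0 through 3 matches the table (0 = strongly light, 3 = strongly dark); the function names are mine:

```python
def predict(state):
    # States 0 and 1 predict light jeans, 2 and 3 predict dark jeans.
    return "light" if state <= 1 else "dark"

def update(state, actual):
    # Move one step toward what was actually bought, saturating at 0 and 3.
    if actual == "light":
        return max(state - 1, 0)
    return min(state + 1, 3)

state = 1  # start at weakly light (01), as in the example above
history = []
for actual in ["light", "dark", "dark", "dark"]:
    guess = predict(state)
    history.append((actual, guess, guess == actual))
    state = update(state, actual)

# Reproduces the example table: hit, miss, miss, hit, ending at 11 (N).
print(history)
```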

Method #4: a more advanced two-bit prediction scheme, which is simpler than it sounds. It is exactly the same as the simple two-bit predictor, except that if it gets two wrong predictions in a row, it automatically switches to predicting the opposite jean.

Work through a few prediction tables yourself, and you'll find this scheme is at times equal to or better than the simple one.
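One way to sketch method #4, under my own interpretation of the rule: after two consecutive misses, jump straight to weakly predicting the opposite jean.

```python
def predict(state):
    # Same encoding as the simple predictor: 0/1 predict light, 2/3 dark.
    return "light" if state <= 1 else "dark"

def step(state, misses, actual):
    guess = predict(state)
    if guess == actual:
        # Correct: strengthen the current choice and reset the miss counter.
        new_state = max(state - 1, 0) if actual == "light" else min(state + 1, 3)
        return new_state, 0
    misses += 1
    if misses >= 2:
        # Two misses in a row: switch to weakly predicting the opposite jean.
        return (2 if guess == "light" else 1), 0
    # First miss: behave like the simple predictor and weaken the state.
    new_state = min(state + 1, 3) if actual == "dark" else max(state - 1, 0)
    return new_state, misses

state, misses = 0, 0  # start at strongly light (00)
guesses = []
for actual in ["dark", "dark", "dark"]:
    guesses.append(predict(state))
    state, misses = step(state, misses, actual)
print(guesses, state)  # ['light', 'light', 'dark'] 3
```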

Hopefully that sheds some light on what you want to accomplish.

To sum it all up: calculate accuracy using method #1, split each item's features into categories as in method #2, and then apply two-bit branch prediction to each category. You'll find that with more data, this combination is more accurate than the other methods applied on their own.