Perhaps this thread's title is not the best, but I couldn't come up with a snazzy title... such is life. Anyway, to the question:
I'm looking for some suggestions on how to propagate a neural network from one portion of a simulation to the next. I've written a neural network class, and have included it in a tic-tac-toe simulator. I have each player in the pool (each with a unique neural network) play two games against every other player (once as player X and once as player O). When all the games are finished, I sort the scores given to each player, and do some player-pool-tweaking.
* Specifically, the top 2/5 of the players are left unchanged -> advance to next round.
* Those same top 2/5 of the players are copied once, with each copy receiving a random "tweak" to their network (only one value within the network is modified) -> they are mutated
* The bottom 1/5 of the players are completely tossed, and new, purely random networks replace them.
After all of that business, the players slug it out again, and repeat the whole process until a sufficient network is found.
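For anyone who wants to follow along, the generation step described above can be sketched roughly like this. The `random_network` and `mutate` helpers here are stand-ins for whatever the real neural network class does; a network is modeled as a flat list of weights purely for illustration:

```python
import random

def random_network():
    # Placeholder: a "network" here is just a flat list of weights
    return [random.uniform(-1, 1) for _ in range(9)]

def mutate(net):
    # Copy the parent and tweak exactly one value, as described above
    child = net[:]
    child[random.randrange(len(child))] += random.uniform(-0.5, 0.5)
    return child

def evolve(players, scores):
    """One generation: keep the top 2/5 unchanged, add a mutated copy
    of each, and fill the remainder with brand-new random networks."""
    ranked = [p for _, p in sorted(zip(scores, players),
                                   key=lambda t: t[0], reverse=True)]
    n = len(ranked)
    keep = ranked[:2 * n // 5]           # top 2/5 advance unchanged
    mutants = [mutate(p) for p in keep]  # the same top 2/5, tweaked copies
    fresh = [random_network() for _ in range(n - len(keep) - len(mutants))]
    return keep + mutants + fresh        # bottom 1/5 is dropped entirely
```

With a pool of 100 this yields 40 survivors, 40 mutants, and 20 fresh random nets per round.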
Out of a pool of 100 players, I'm not seeing the scores improve at all. It seems to always hover around the starting score. I'm beginning to wonder whether the "population" is becoming stagnant. One obvious thing to try is "genetic crossover," but I was hoping someone might have another interesting idea to try, or would be able to point out a flaw in my logic.
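If the networks can be treated as flat weight lists, the crossover idea is cheap to try. A single-point crossover sketch (the flat-list representation is an assumption; the real class may store weights differently):

```python
import random

def crossover(parent_a, parent_b):
    """Single-point crossover on flat weight lists: the child takes
    the first `point` weights from one parent, the rest from the other."""
    assert len(parent_a) == len(parent_b)
    point = random.randrange(1, len(parent_a))  # split somewhere inside
    return parent_a[:point] + parent_b[point:]
```

A common move is to breed the children from pairs of high scorers instead of (or in addition to) the single-value mutations.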
If anyone thinks it's extremely important, I can post the basics of the scoring and "tweak" methods employed.
The players that don't score in the top 2/5's are lost from the population. They are dropped completely in favor of the "proven" players and their mutations. The new, random nets are intended to preserve diversity (possible introduction of a new strategy).
I think that only keeping the top 2/5's is the crux of the problem. All of them could be clustered around a local maximum, and the simulation becomes trapped. The chance that one of the new, random nets could jolt the simulation out of that situation is probably very remote.
I'll try to implement a probability-based selection routine (greater score implies greater chance to propagate to the next round) and see what that does for me, rather than just taking the top scorers.
What you might want to do, rather than just randomly changing a single variable, is take more copies of each survivor, each with one of its variables altered by a set amount, alongside the original.
This means adjusting the number you take through to the next round, but it gives you more of a chance of propagating useful traits and strengthening them, where your current adjustments, with only a single copy, could simply remove them.
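If I'm reading the suggestion right, it amounts to something like the following: for each surviving parent, keep the original and generate one child per weight, each differing in exactly one position by a fixed step (names and the flat-list representation are illustrative):

```python
def perturbed_copies(net, step=0.1):
    """Return the parent plus one child per weight; child i has
    weight i nudged by a fixed step (a hill-climbing-style sweep)."""
    children = [net]            # the unaltered original
    for i in range(len(net)):
        child = net[:]
        child[i] += step
        children.append(child)
    return children
```

The trade-off is population size: a net with w weights produces w+1 entries per parent, so fewer parents can be carried forward each round.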
Once you have a net that consistently beats its progeny multiple times, you might have an ultimate winner.
Another point: tic-tac-toe is a game that can be drawn, and may not allow for improvement.
If I follow you, I tried something similar earlier (I may not have had the patience to see it play out though). I took the top 1/5, passed them through, copied them three times with a different mutation to each copy, and then introduced random nets for the last portion (to fill out the population). I got similar results (a little improvement, but the scores seemed to stabilize quickly - a false convergence). Like I said, maybe I didn't give the simulation enough time to improve. It's something I may look into again.
And yeah, tic-tac-toe might not have been the best possible choice, but I thought it was simple enough to cut my teeth on. I have the simulation set up to trigger a solution when a candidate wins half of the matches played, but I also keep an eye out for when the players tie every game. In theory, at some point, the "unbeatable" defense will evolve, which suggests the nets would progress from profoundly stupid, to winner, to stalemate strategy; meaning the peak point value does not actually represent the best player.
I do appreciate the help. It's good to have an outside perspective on this stuff.