
Any info on how the AI works?


Chris M.:
I've been really enjoying Lost Cities, and the AI manages to pull off some victories when I least expect it, which is great.

I would love to hear from the AI creator (Nicolas Guibert?) about how the AI is designed. Do the different bots have 'personalities', and how are the different-strength bots distinguished? More computing time? Are there simple heuristics programmed in, or does it use anything fancy like a neural network? Does it know the concept of bluffing or concealing information about its hand?

Thanks :)

Jimmy V.:
Hi there,

Thanks for asking!

It is difficult to explain how the AI works without explaining too much and giving away how each bot plays. But I will try.

First, you have to understand that we can't afford any CPU-intensive algorithm. Our AIs run on our server or in your browser (depending on whether it is an online match or a training one), and in both cases it is not possible to ask the machine to think for too long. Suppose every move takes just one second to compute. You soon realize that the server will not be able to play more than a few games at a time before coughing hard. We want our server to serve thousands of players at a time, not a handful. The same applies to the client side: we can't ask an old mobile phone to do much. I have even seen mobile browsers stop completely when the page runs Javascript (the language we use for the AI) for more than a few seconds without interruption.

That is a big issue, as most of the powerful techniques used in AI are CPU-intensive. The latest (huge) AI achievement by Google on the game of Go (look up AlphaGo) used very high-caliber machines to let the program learn the game by playing against itself many, many times. I am digressing a bit here, as this was for training the algorithm, not for the actual playing of the game (although I don't know what machines they used for the actual games once the program was trained). So let's go back to the original point: CPU-intensive algorithms are very often the only way to get good results. Give a chess program only 1/1000 of a second to play a move and there is every chance that it will not be able to compete with the best players the way it would with 1/10 of a second.

So, as if AI was not difficult enough, we have this extra constraint that we must make the programs play as quickly as possible, preferably within 1/100 of a second. This is usually achieved: when I let the bots play against each other to tune them and seed them (find their relative ratings), I can see that they play several games per second. I don't remember the exact figure and it depends a lot on the game, but I'd say it is probably about 5 to 10 games per second on average. A game of Lost Cities is about 90 moves.
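To put that in perspective: at, say, 7 games per second and about 90 moves per game, the bots are playing somewhere around 600 moves per second, i.e. well under 2 milliseconds per move on average.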

This does not leave much room for fancy techniques. So for most games, we use a very streamlined version of the Alpha-Beta algorithm. Alpha-Beta is the basic algorithm for Chess, Othello, and most abstract games without hidden information. I have used it successfully in the past when developing my international draughts program (see Buggy for more information).

Alpha-Beta has two sides. The first one is the search, in other words the analysis of the variants in the tree of possibilities. Once you have looked x moves ahead, you stop searching and reach what is called a leaf of the tree. That's where the second part of the algorithm plays its part: the evaluation function. Its goal is to assess the leaf and return a score, good or bad, more or less positive or negative. The search part then uses the leaf scores to navigate through the variants, eliminate the weak ones and finally decide which move is best at the root of the tree.
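To give a rough idea, here is a minimal sketch of a depth-limited Alpha-Beta (negamax form) in Javascript. This is purely illustrative: the game interface (position.isOver, position.legalMoves, position.play) and the evaluate function are invented for the example, not our actual code.

  // Depth-limited Alpha-Beta (negamax form): search `depth` moves
  // ahead, then call the evaluation function on the leaves.
  // `evaluate` is assumed to score the position from the point of
  // view of the side to move.
  function alphaBeta(position, depth, alpha, beta) {
    if (depth === 0 || position.isOver()) {
      return evaluate(position);                 // leaf: assess the position
    }
    for (const move of position.legalMoves()) {
      const score = -alphaBeta(position.play(move), depth - 1, -beta, -alpha);
      if (score >= beta) return beta;            // opponent will avoid this line
      if (score > alpha) alpha = score;          // best score found so far
    }
    return alpha;                                // best guaranteed score
  }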

Both sides of the Alpha-Beta can be CPU-intensive. Each call to the evaluation function can take a lot of time if this function is refined: the more the evaluation knows (and the more accurate it is), the more time it needs. Also, the deeper you search, the more leaves you will have and the more calls to the evaluation function you will make. So basically we have this formula:

Time spent on a move = number of leaves × time needed to evaluate one leaf

This tells the whole story. We can't afford to explore many leaves and we can't afford to assess them accurately.
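To put (purely illustrative) numbers on it: with 30 legal moves per position, a 2-ply search already produces about 30 × 30 = 900 leaves. Even at 10 microseconds per evaluation, that is 9 milliseconds, most of our 1/100 s budget, and one ply deeper would blow far past it.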

So what we usually do is only search one move ahead and evaluate the leaves from there. Very basic stuff.
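In pseudo-Javascript, that degenerate one-ply search is just this (again a sketch with the same invented interface):

  // Depth 1: try each legal move, evaluate the resulting position,
  // and keep the move with the best score.
  function bestMove(position) {
    let best = null;
    let bestScore = -Infinity;
    for (const move of position.legalMoves()) {
      const score = evaluate(position.play(move));
      if (score > bestScore) {
        bestScore = score;
        best = move;
      }
    }
    return best;
  }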

Getting a decent end product despite the low complexity of the algorithm requires building an evaluation function that estimates positions as accurately as possible with as few lines of code as possible. And that is probably where the secret of our AIs lies (if there is a secret). A good understanding of the game is key (we can't hope that brute force will hide the developer's deficiencies, since we can't use brute force), and only then can that knowledge be modelled well.

Your original question was only about Lost Cities, but as the above applies to all games, Lost Cities included, I thought it was worth mentioning.

Now how did we build this evaluation function for Lost Cities? The most important feature of Lost Cities is time. You want to buy time and avoid making decisions for as long as possible. But how do you tell this to the computer? I did not specifically teach it this notion, but the way the scoring is done naturally takes care of that idea.

The whole evaluation is based on estimating the likely score of each series. Example: you have already started a series with a $ and a 3. Is it good to play the 6 now, when the 4 and the 5 have not shown up yet? Obviously, playing the 6 right now costs potential points in the future. How many? Mostly it's maths, with a few refinements. The program will recognize that if we have the 7 and 8 in hand, it is better to play the 6 right now than if we did not have them: we will have good moves next time, and that's worth points. It will obviously also know the value of the $ cards, the expectation of possible future $ cards, and the likelihood of the missing cards coming into our hand (less and less likely as the game progresses, and less likely if the opponent has started the colour). Then we compare this move with the alternative moves and take the best.
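To make the idea concrete, a toy version of such a per-series estimate could look like the sketch below. Everything here is invented for illustration (the names, the crude pDraw probability model); the real tuned function has many more refinements.

  // Toy estimate of one expedition's worth: points already on the
  // table plus probability-weighted points from cards we could
  // still play. `played` and `inHand` are arrays of card values,
  // `wagers` is the number of $ cards played on this series, and
  // `pDraw(v)` is a (crude) estimate of the chance of ever getting
  // the unseen card of value v.
  function expeditionEstimate(played, inHand, wagers, pDraw) {
    const highest = Math.max(0, ...played);
    let expected = played.reduce((sum, v) => sum + v, 0);
    for (let v = Math.max(2, highest + 1); v <= 10; v++) {
      if (inHand.includes(v)) {
        expected += v;               // certain: the card is in our hand
      } else {
        expected += v * pDraw(v);    // uncertain: we might still draw it
      }
    }
    return (expected - 20) * (wagers + 1);   // 20 is the cost of starting
  }

Comparing such an estimate before and after a candidate move tells you how many potential points the move burns, which is exactly the "buying time" notion above.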

Obviously, the program will also recognize the endgame and try not to get stuck with all its good cards in hand at the end (as we humans always do).

This is enough to get to a very decent level. The program does not even consider "play a card" / "draw a card" as one single move. It looks at the "play a card" half first and plays it, then looks at what it can draw. A clear possible improvement.
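That improvement would look something like the following sketch (invented interface again): instead of choosing the play and the draw separately, score every (play, draw) pair.

  // Possible improvement: treat "play a card" and "draw a card"
  // as one combined move and evaluate the position after both.
  function bestCombinedMove(position) {
    let best = null;
    let bestScore = -Infinity;
    for (const play of position.playMoves()) {
      const afterPlay = position.play(play);
      for (const draw of afterPlay.drawMoves()) {
        const score = evaluate(afterPlay.draw(draw));
        if (score > bestScore) {
          bestScore = score;
          best = { play, draw };
        }
      }
    }
    return best;
  }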

We usually let the program run against itself to optimize parameters (and to find bugs, when a supposedly good new piece of code leads to a sharp decrease in rating). The best way to proceed is to alternate between playing the program oneself and letting versions run against each other. There is no point optimizing a program that has evident flaws that can easily be seen with your own eyes. What's the point of knowing that version A is better than version B by 100 points if both play really badly?
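The harness for this is nothing fancy; something in the spirit of the following sketch (playGame is a hypothetical driver that runs one full game and returns the winning bot):

  // Play many quick games between two candidate versions and
  // report A's win rate. Alternate who starts to remove bias.
  function compare(botA, botB, games) {
    let winsA = 0;
    for (let i = 0; i < games; i++) {
      const aStarts = i % 2 === 0;
      const winner = aStarts ? playGame(botA, botB) : playGame(botB, botA);
      if (winner === botA) winsA++;
    }
    return winsA / games;
  }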

Once we have the best bot we can build in a reasonable amount of time (a couple of weeks in general), we can start programming the 11 weaker bots. That's not completely straightforward, but it is in general much easier than building the best bot. This usually gives us new ideas, so we are often caught going back to the previous phase, trying to improve the best bot further.

We try to give each bot its own personality by tweaking the main parameters of the game. For Lost Cities, that would be how much the program likes to start a new series, how much it likes to discard cards (the two are strongly related), how much it likes to draw from the deck rather than from the discard piles, etc.
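Concretely, you can picture each bot as the same engine with a different set of knobs, in the spirit of this sketch (names and numbers made up):

  // Hypothetical personality profiles: same engine, different knobs.
  // Each bias is simply added to the corresponding term of the
  // evaluation, nudging the bot's style without rewriting it.
  const personalities = {
    prudent:     { startSeriesBias: -5, discardBias: +3, drawDeckBias: +2 },
    adventurous: { startSeriesBias: +6, discardBias: -2, drawDeckBias: -1 },
  };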

For Lost Cities, there is not much you can do really, so 3 to 4 personalities maybe. But that's a good start.

To make bots even weaker (and weaker in a different way), we almost always add randomness to the evaluation, so that in essence a bot does not always play the same way in the same situation. Not only does this weaken it, it also adds some unpredictability. We usually don't add randomness to the best bot, as it inevitably leads to a weaker rating when evaluated against the other bots. That rating is not an accurate representation of the bots' strength against humans, but we have no better way to measure it (we can't ask 10 players to play 100 games against each bot before releasing the game). However, we did add randomness to the best bot for Hanamikoji, as this is really a game where you don't want to be too predictable (I am quite sure the bot is still too predictable). So we voluntarily weakened Verboten, thinking that it would then fare better against real players, who are able to exploit predictability.

Finally, there are other ways to make the bots weaker and different. We can simply make them not understand/recognize certain features of the game, or at least not understand them all the time. For example, some might struggle to recognize the endgame, or not know that playing the 6 is not a bad move when we have the 7 and 8 in hand.
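In code, both weakening tricks are almost one-liners; roughly like this sketch (all names invented):

  // Weakening tricks: selectively "forget" a feature of the game,
  // and add random noise to the evaluation for unpredictability.
  function evaluateForBot(position, bot) {
    let score = evaluate(position);
    if (!bot.understandsEndgame) {
      score -= endgameTerm(position);    // blind this bot to the endgame
    }
    score += (Math.random() * 2 - 1) * bot.noiseLevel;
    return score;
  }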

And of course, we must make sure that Lobotomo is not too strong for beginners or, even worse, for some experienced players.

I guess this answers your questions. Feel free to ask more as required! In any case, I hope it is good reading and that I have not given away too much.

Have fun all!

Nicolas Guibert.


PS: I have moved this post to another sub-forum (originally posted in Lost Cities forum) as my reply has now gone much further than the original question. Thanks for asking!

Jamie A.:
Fascinating read, thank you so much!!

Ti D.:
Really interesting. 
One thing that has always interested me is whether the bots play with perfect memory. This is especially relevant for Glastonbury, where knowing exactly what you and your opponent have is a huge advantage in the endgame. Most humans will remember most but not all of the cards, and weigh up the benefits of looking at the cards collected against picking up a different card.

Pouche H.:
Though the game (Lost Cities) is simple enough, I'd say that there are certain understandings, characteristics and acknowledgements that single out 'better' players (whatever that means).

In the above AI explanation, I found this to be most interesting: "Obviously, the program will also recognize the endgame and try not to get stuck with all its good cards in hand at the end (as we humans always do)".

IMO, one of the more 'advanced' techniques to watch for (and one the bots use!) is their ability to grasp the probability that their opponent has X playable cards in hand. Given the number of remaining draw cards, the bot may choose to exhaust the discard piles (something most players don't keep track of), eliminating the player's ability to extend the match.

I hate when that happens.  :)

Congrats on keeping it fun...
And have you ever considered an even higher level bot (for Lost Cities) that wouldn't be in the regular rotation, but could be accessed if desiring a higher challenge?
