My side-project this week has been searching for an optimal Wordle strategy. I figured it would make a great blog post but I haven't got that far yet.
Step 1 is to find an optimal starting word. My most insightful finding so far has come from trying to define a cost function for comparing potential starting words. It turns out that "% of potential guesses [not] eliminated" is an excellent loss function.
Importantly, the full set of all Acceptable Guesses is knowable. For any given (guess, answer) pair, the game provides feedback about each character (or digit). There are basically three pieces of potential feedback: (N)ot Used, (U)sed Elsewhere, (E)xact Match. For example, a single guess might produce the feedback "NUNNE". Each of these will eliminate some subset of Acceptable Guesses which means you can attribute a fixed value between 0.0 and 1.0 to any piece of feedback, and multiply those together to get the loss score of a given guess. Average that across all Potential Answers to get a cost score.
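A rough sketch of that cost function, for anyone who wants to play with it. This is not the per-feedback multiplication described above; it is the direct version of the same idea: bucket every possible answer by the feedback string the guess would produce, then average the fraction of candidates left standing. The names `feedback` and `cost` are made up for illustration, and it assumes a single word list serves as both guesses and answers.

```python
from collections import Counter

def feedback(guess: str, answer: str) -> str:
    """Return one mark per guessed letter: E(xact), U(sed elsewhere), N(ot used)."""
    marks = ["N"] * len(guess)
    unmatched = Counter()  # answer letters not consumed by an exact match
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "E"
        else:
            unmatched[a] += 1
    # Second pass: hand out U marks left to right while spare letters remain.
    for i, g in enumerate(guess):
        if marks[i] == "N" and unmatched[g] > 0:
            marks[i] = "U"
            unmatched[g] -= 1
    return "".join(marks)

def cost(guess: str, answers: list[str]) -> float:
    """Average fraction of answers NOT eliminated by playing `guess` (lower is better)."""
    buckets = Counter(feedback(guess, a) for a in answers)
    n = len(answers)
    # An answer that lands in a bucket of size k leaves k of n candidates alive.
    return sum(k * k for k in buckets.values()) / (n * n)

if __name__ == "__main__":
    words = ["crane", "crate", "trace", "slate", "audio"]  # toy list, not the real one
    for w in sorted(words, key=lambda w: cost(w, words)):
        print(w, round(cost(w, words), 2))
```

On the toy list, "crane" splits all five words into separate buckets, while "audio" lumps four of them together, so "crane" scores lower.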
Not sure what my goal of this post is. Guess I just wanted to share that these games are as much fun to analyze as they are to play (if not more).
I've given some light thought to this: there are several words that are considered "good starting words", mainly because they contain lots of vowels or the most commonly used letters (such as "audio", "raise", etc.). Then, as a second word, you could use another word that complements those letters (audio/rents, raise/mould).
But there is a better strategy (similar to what is used in those interview puzzles of "From N weights, find the 1 that is different in X measurements on a scale"): the fact that you enter some vowels and see that they are not in the word tells you that at least one of the remaining ones MUST be in the word (treating Y as a vowel as well). So it might be possible to come up with a strategy that uses that negative information as well to minimize the search space.
I made a lil clone of wordle and found that the full list of solutions, past and future, is right there in the minified js. Could be useful for your analysis.
Maybe I shouldn't be sharing this, it's a bit sad to break the illusion that the creator is picking a new word for us every day.
I went to the source code of Wordle and copied all 12,000 words into an Excel sheet, in column A starting from row 4.
The 3rd row has a drop-down filter across the next 18 columns.
You guess any word with at least 2 vowels (3 is better) and no repeated letters. You get some grey, some yellow, maybe some green.
Columns 2 to 6 of the sheet are for the grey letters, the ones not found in the answer. Any grey letter you find gets typed into row 2 of this group. The formula from row 4 downwards checks whether the input cell at the top of its column is empty: if so, TRUE; if not, it checks whether that letter exists in the column A cell of its row. If it exists, FALSE (grey means the letter is not in the answer), otherwise TRUE.
Columns 7 to 11 are for yellow letters. Same thing: row 2 takes the input, and from row 4 onwards the formula checks whether the letter exists in the column A cell. If it does, TRUE, otherwise FALSE. If the input cell is empty, TRUE.
Columns 12 to 16 take the green letters. Here the input goes by position: if the 2nd letter is green, you type it in the 2nd column of this group, which is the 13th column.
The formula from row 4 onwards checks whether the input is empty: if so, TRUE; otherwise TRUE if the letter is in that exact position, and FALSE if not.
The 17th column is empty and narrow, to create a gap.
The 18th column returns 0 if any FALSE is found in the whole row. Any FALSE means this word is not the answer. Otherwise it returns 1.
A separate cell counts all those 1s and tells me how many potential answers are left. After every try, you can filter this last column on 1.
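For anyone who would rather not build the spreadsheet, the same three checks fit in a few lines of Python. This is only a sketch that mirrors the columns described above; the function name `is_candidate` and its parameters are invented here. Like the sheet, it treats any grey letter as fully absent and ignores where a yellow letter was tried, so it can misjudge words with repeated letters.

```python
def is_candidate(word, grey="", yellow="", green=None):
    """Mirror the sheet's rules: grey absent, yellow present, green in place.

    grey   -- letters marked grey so far (columns 2 to 6 in the sheet)
    yellow -- letters marked yellow so far (columns 7 to 11)
    green  -- dict of 0-based position -> letter (columns 12 to 16)
    """
    green = green or {}
    if any(ch in word for ch in grey):        # a grey letter appears: reject
        return False
    if any(ch not in word for ch in yellow):  # a yellow letter is missing: reject
        return False
    # Every green letter must sit in its exact position.
    return all(word[pos] == ch for pos, ch in green.items())

if __name__ == "__main__":
    words = ["crane", "slate", "audio", "trace"]  # toy list, not the real 12,000
    left = [w for w in words if is_candidate(w, grey="sl", yellow="c", green={4: "e"})]
    print(len(left), left)
```

The final `len(left)` plays the role of the cell that counts the 1s.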
This website is currently hosting a competition until the end of the month for Wordle strategies if you want to see how it compares: https://botfights.io/game/wordle
Interestingly, I made a coding challenge for this exact problem a few years ago. You can find it on the Code Golf StackExchange website: https://codegolf.stackexchange.com/questions/26858/guess-the....
The challenge is named after the Lingo TV show which used the concept repopularized by Wordle.
Lingo itself took inspiration from a 1955 paper game named Jotto.
Regarding strategies used in the challenge, the best performing solution was an adaptation of a Mastermind solver.
I use WEIRD, my completely untested and unresearched intuition being that knowing common consonants are in a word isn't helpful unless you get them in the right place, but knowing the vowels is. At least that was my logic when I first played and my brain produced WEIRD.
I’m sure it’s far from optimal, but I continue to use it because I obviously understand how I came to choose it, and it “works” and survivor bias is powerful :)
I recall seeing HN posts from people working out “optimal” starting words and guess patterns, but I just like mine :)
Since you've been looking into it, how are words with multiple copies of the same letter treated in Wordle? What happens if there are two (or three) 'E's in the solution or the guess?
I've spent some time analyzing Wordle (my starting words are SOARE and then BUILT, unless I've already got strong clues from SOARE).
If the answer has one E and you guess a word with two Es, only one of the Es will be marked correct (green/yellow). If one of the Es is in the correct place, it will be green and the other one grey. Otherwise the first E will be yellow and the second grey. So if the answer has fewer Es than your guess the game tells you.
If the answer has more Es than your guess there is no indication of this.
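Put another way, each guess gives you a lower bound on how many times a letter occurs, and an exact count only when one copy of that letter comes back grey. A small sketch of that bookkeeping, assuming the marks are written as E (green), U (yellow) and N (grey); the function name `letter_count_bounds` is made up for illustration.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

def letter_count_bounds(guess: str, marks: str) -> Dict[str, Tuple[int, Optional[int]]]:
    """Map each guessed letter to (min_count, max_count); max is None if unknown."""
    coloured = Counter(g for g, m in zip(guess, marks) if m in "EU")
    greyed = {g for g, m in zip(guess, marks) if m == "N"}
    bounds = {}
    for letter in set(guess):
        low = coloured[letter]
        # A grey copy means the answer has no more of this letter than were coloured.
        high = low if letter in greyed else None
        bounds[letter] = (low, high)
    return bounds

if __name__ == "__main__":
    # Guess with three Es, answer with one: first E yellow, the rest grey.
    print(letter_count_bounds("geese", "NUNNN")["e"])
    # Guess with one E, coloured yellow: at least one E, possibly more.
    print(letter_count_bounds("crane", "NNNNU")["e"])
```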
Today 538 gave the optimal starting word and all the best second words, with a 60% chance of solving it in three words. Completely ruined the game, in my opinion.