• 0 Posts
  • 6 Comments
Joined 1 year ago
Cake day: August 6th, 2023


  • The game gets kinda meta on itself. Mild spoilers: there’s a greater overall plot that gets progressed by a simpler, looping plot. The idea is that a narrator instructs you to go to a cabin in the woods and slay the princess inside. The choices you make cause this plot to repeat with a twist. When the simple plot loops, it’s influenced by what you did on the prior iteration. Each of these loops is actually you advancing down a branching story path, and you need to complete enough of these branches to finish the greater overall plot.

    It’s sort of like the Stanley Parable, where you can defy the narrator or go along with his demands. The fun is getting a reaction out of the narrator or any of the other characters through your actions or dialogue choices, and seeing the story change based on what you choose. However, it’s still a visual novel, so it’s a lot of listening to dialogue.


  • Well, my example of the word ‘elephant’ has the same property as ‘herb’ where the use of ‘a’ or ‘an’ can depend on who you ask. I chose my example trying to anticipate this exact question, and I believe I gave you an answer.

    Let me put it this way: it depends… It depends on the data the LLM (ChatGPT, for example) was trained on. If we have an LLM dataset that uses only text written by people in the United Kingdom, then the data will favor “a herb”, as the ‘h’ is pronounced, whereas data from the United States will favor “an herb”, as the ‘h’ is usually silent when spoken out loud.

    As a fairly general rule, people use the article “an” before a vowel sound (like a silent “h”) and “a” before a consonant sound (like a pronounced, or aspirated, “h”). Usually the data is gathered from multiple English-speaking countries, so both “an herb” and “a herb” will exist in the training data, and from there the LLM will favor picking the one that shows up more often (as the data will be biased toward it).

    Just for fun, I asked the LLM running on my local machine. Prompt: “Fill in the blank: ‘It is _ herb’” Response: “It is an herb.”
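
To make the “it depends on the data” point concrete, here is a minimal sketch that just counts which article shows up more often before “herb”. The two tiny corpora and the function names are made up for illustration, and a real LLM does not literally count phrases like this, but the bias toward whichever form is more common works in a similar spirit.

```python
import re
from collections import Counter

# Hypothetical toy corpora, invented for this example (not real training data).
uk_text = [
    "She added a herb to the stew.",
    "A herb garden needs sun.",
]
us_text = [
    "She added an herb to the stew.",
    "An herb garden needs sun.",
    "Pick an herb for the sauce.",
]

def article_counts(sentences, noun="herb"):
    """Count how often 'a' and 'an' appear directly before the noun."""
    pattern = re.compile(r"\b(a|an)\s+" + re.escape(noun) + r"\b", re.IGNORECASE)
    counts = Counter()
    for sentence in sentences:
        for article in pattern.findall(sentence):
            counts[article.lower()] += 1
    return counts

def favored_article(sentences, noun="herb"):
    """Return the article seen most often before the noun, or None if unseen."""
    counts = article_counts(sentences, noun)
    return counts.most_common(1)[0][0] if counts else None

print(favored_article(uk_text))            # UK-only data
print(favored_article(us_text))            # US-only data
print(favored_article(uk_text + us_text))  # mixed data: the majority form wins
```

With the mixed corpus, the result depends only on which form happens to be more common in the data, which is the whole point.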


  • To be overly simple about it, the LLM uses statistics and a bit of controlled RNG to pick its words. Words in the LLM have links to each other with statistical probabilities attached. If you took the sentences “I fed a peanut to an elephant” and “I fed a peanut to a elephant” and asked 100 people which is more correct, a percentage would favor one over the other. Now, an LLM isn’t choosing with weighted coin flips, but rather picking the most likely next word (most of the time). So if “an elephant” is chosen over “a elephant” 65% of the time in its training data, then the LLM will be inclined to use “an elephant.” However, it’s important to know that the words around “an elephant” will also bias its choice to use the word ‘an’ for the word ‘elephant’.

    Really, it’s largely based on the training data and the contexts in which ‘a’ and ‘an’ are used. Or in other words, the LLM knows because people figured it out for the LLM. People did all the thinking; LLMs just use statistics on our bottled phrases to know when to use which. Of course, because it got its data from people, it will sometimes get it wrong, depending on how often people got it wrong.
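
Here is a rough sketch of the “most likely next word, plus a bit of controlled RNG” idea. The 65/35 split is the made-up number from the example above, and `pick_greedy` and `pick_sampled` are hypothetical names; real models score thousands of tokens based on the surrounding context, so treat this as an illustration only.

```python
import random

# Hypothetical probabilities for the word before "elephant" (the 65% example above).
next_word_probs = {"an": 0.65, "a": 0.35}

def pick_greedy(probs):
    """Always pick the single most likely word."""
    return max(probs, key=probs.get)

def pick_sampled(probs, temperature=1.0, rng=random):
    """Pick a word at random, weighted by probability.

    A low temperature sharpens the weights toward the most likely word;
    a temperature of 1.0 uses the probabilities as they are.
    """
    words = list(probs)
    weights = [probs[w] ** (1.0 / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded so repeated runs give the same draws
print(pick_greedy(next_word_probs))
print([pick_sampled(next_word_probs, temperature=1.0, rng=rng) for _ in range(10)])
print([pick_sampled(next_word_probs, temperature=0.01, rng=rng) for _ in range(10)])
```

At temperature 1.0 the less common “a” still shows up now and then, which is the “sometimes gets it wrong” part; at a very low temperature the pick is effectively always “an”.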