well, what you do is you have a corpus, which is a body of text that you want to use to train the model

the model of a simple markov chain bot is a table of statistical probabilities. every entry in the table is indexed by a character (or word), and the data stored in that entry is a list of what the next character (or word) could be (based on the training data), each with a % chance

the corpus should be broken up into one sentence per line if you want the bot to create sentences. that way the last word of one sentence, followed by the first word of the next sentence, doesn't pollute the table with transitions that never actually occurred in the text
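a rough sketch of what that looks like (the corpus string and words here are just made up for illustration): pairs are only collected within a line, so the cross-boundary pair never enters the table

```python
corpus = "the cat sat\nthe dog ran"

pairs = []
for line in corpus.splitlines():
    words = line.split()
    # only pairs within one sentence: ("sat", "the") never gets counted
    pairs += list(zip(words, words[1:]))

print(pairs)  # [('the', 'cat'), ('cat', 'sat'), ('the', 'dog'), ('dog', 'ran')]
```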

so if you had the corpus "apple buzz buzz" then you might build a word-indexed table {[apple] => [buzz, 100%], [buzz] => [buzz, 50%]}, because apple was always followed by buzz, while buzz was followed by buzz half the time and by nothing (end of sentence) the other half
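one way to sketch building that table in python (the `END = None` marker for "followed by nothing" is my own convention, not part of any standard):

```python
from collections import Counter, defaultdict

END = None  # stands in for "end of sentence / followed by nothing"

def build_table(corpus):
    # count how often each word is followed by each next word (or END)
    counts = defaultdict(Counter)
    for line in corpus.splitlines():
        words = line.split()
        for cur, nxt in zip(words, words[1:] + [END]):
            counts[cur][nxt] += 1
    # convert raw counts into % chances
    table = {}
    for word, nexts in counts.items():
        total = sum(nexts.values())
        table[word] = {nxt: 100 * n / total for nxt, n in nexts.items()}
    return table

table = build_table("apple buzz buzz")
print(table)  # {'apple': {'buzz': 100.0}, 'buzz': {'buzz': 50.0, None: 50.0}}
```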

character-indexed models work better when the corpus is very small and there aren't a lot of sentences, with a variety of consecutive word uses, to train on, but no matter what they spit out broken nonsense, the granularity is just too fine
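you can see the difference by sampling from both kinds of table with the same code. this is just a sketch with a toy one-line corpus; it uses raw counts directly as sampling weights instead of converting to %, which comes out to the same distribution:

```python
import random
from collections import Counter, defaultdict

def build_counts(tokens):
    # tokens is either a list of words or a list of characters
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts

def generate(counts, start, length):
    out = [start]
    for _ in range(length):
        nexts = counts.get(out[-1])
        if not nexts:
            break  # dead end: this token never had a successor
        choices, weights = zip(*nexts.items())
        out.append(random.choices(choices, weights=weights)[0])
    return out

line = "the quick brown fox jumps over the lazy dog"
word_chain = generate(build_counts(line.split()), "the", 8)
char_chain = generate(build_counts(list(line)), "t", 20)
print(" ".join(word_chain))  # every adjacent pair is a real bigram from the corpus
print("".join(char_chain))   # characters follow observed pairs but words fall apart
```

the word-indexed chain can only ever emit word pairs it actually saw, so each local step reads fine; the character-indexed chain only knows which letter follows which, so it invents non-words almost immediately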