For each file in the Gutenberg corpus, the program computes the vocabulary size and prints three ratios:

    num_vocab = len(set(w.lower() for w in gutenberg.words(fileid)))
    print(round(num_chars/num_words), round(num_words/num_sents), round(num_words/num_vocab), fileid)

    5 25 26 austen-emma.txt
    5 26 17 austen-persuasion.txt
    5 28 22 austen-sense.txt
    4 34 79 bible-kjv.txt
    5 19 5 blake-poems.txt
    4 19 14 bryant-stories.txt
    4 18 12 burgess-busterbrown.txt
    4 20 13 carroll-alice.txt
    5 20 12 chesterton-ball.txt
    5 23 11 chesterton-brown.txt
    5 18 11 chesterton-thursday.txt
    4 21 25 edgeworth-parents.txt
    5 26 15 melville-moby_dick.txt
    5 52 11 milton-paradise.txt
    4 12 9 shakespeare-caesar.txt
    4 12 8 shakespeare-hamlet.txt
    4 12 7 shakespeare-macbeth.txt
    5 36 12 whitman-leaves.txt

This program displays three statistics for each text: average word length, average sentence length, and the number of times each vocabulary item appears in the text on average (our lexical diversity score). Observe that average word length appears to be a general property of English, since it comes out at 4 or 5 for every text. (In fact, the average word length is really 3, not 4, since the num_chars variable counts space characters.) By contrast, average sentence length and lexical diversity appear to be characteristics of particular authors.

The previous example also showed how we can access the "raw" text of the book. The raw() function gives us the contents of the file. So, for example, len(gutenberg.raw('blake-poems.txt')) tells us how many letters occur in the text, including the spaces between words.
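To make the three ratios concrete, here is a minimal sketch that computes them on a tiny hand-tokenized text instead of a Gutenberg file, so it runs without NLTK or any corpus download. The helper name `text_stats` and the toy sentences are assumptions made for illustration; they are not part of the original program.

```python
def text_stats(raw, words, sents):
    """Return (average word length, average sentence length,
    average uses per vocabulary item), each rounded to an integer."""
    num_chars = len(raw)      # every character counts, spaces included
    num_words = len(words)    # tokens, punctuation included
    num_sents = len(sents)
    num_vocab = len(set(w.lower() for w in words))  # case-folded vocabulary
    return (round(num_chars / num_words),
            round(num_words / num_sents),
            round(num_words / num_vocab))


# Hypothetical toy input, tokenized by hand.
raw = "The cat sat. The dog sat."
sents = [["The", "cat", "sat", "."], ["The", "dog", "sat", "."]]
words = [w for sent in sents for w in sent]

stats = text_stats(raw, words, sents)
print(*stats, "toy-text")

# The raw length exceeds the summed token lengths by the number of spaces,
# which is why dividing by num_chars inflates the average word length.
extra_chars = len(raw) - sum(len(w) for w in words)
print(extra_chars, "characters in the raw text are spaces")
```

With NLTK installed and the Gutenberg corpus downloaded, the same helper could presumably be called as `text_stats(gutenberg.raw(fileid), gutenberg.words(fileid), gutenberg.sents(fileid))` for each `fileid`.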