Python Deep Dive: Q10 : Most frequently used words in a text

Monday, November 12, 2018

Q10 : Most frequently used words in a text

Question 10: Write a function that, given a string of text (possibly with punctuation and line-breaks), returns an array of the top-3 most occurring words, in descending order of the number of occurrences.

Assumptions:
A word is a string of letters (A to Z) optionally containing one or more apostrophes (') in ASCII. (No need to handle fancy punctuation.)
Matches should be case-insensitive, and the words in the result should be lowercased.
Ties may be broken arbitrarily.
If a text contains fewer than three unique words, then either the top-2 or top-1 words should be returned, or an empty array if a text contains no words.
Examples:
top_3_words("In a village of La Mancha, the name of which I have no desire to call to
mind, there lived not long since one of those gentlemen that keep a lance
in the lance-rack, an old buckler, a lean hack, and a greyhound for
coursing. An olla of rather more beef than mutton, a salad on most
nights, scraps on Saturdays, lentils on Fridays, and a pigeon or so extra
on Sundays, made away with three-quarters of his income.")

# => ["a", "of", "on"]

top_3_words("e e e e DDD ddd DdD: ddd ddd aa aA Aa, bb cc cC e e e")

# => ["e", "ddd", "aa"]

top_3_words(" //wont won't won't")

# => ["won't", "wont"]

Bonus points (not really, but just for fun):
Avoid creating an array whose memory footprint is roughly as big as the input text.
Avoid sorting the entire array of unique words.

5 comments:

AnonymousNovember 13, 2018 at 1:21 AM
from heapq import nlargest
import operator
def top_3_words(text):

words = {}
word_list = text.lower().split()
if len(word_list) > 0:
for i in word_list:
if i in words:
words[i]+=1
else:
words[i]=1
return [i for (i,j) in nlargest(3,words.items(),key=operator.itemgetter(1))]
else:
return []

print(top_3_words(input()))
ReplyDelete
Replies
AnonymousNovember 13, 2018 at 1:22 AM
import re,operator
from collections import Counter
from heapq import nlargest
def top_3_words(text):
if len(text)>0:
words = dict(Counter(re.findall(r'\w+', text.lower())))
return [i for (i,j) in nlargest(3,words.items(),key=operator.itemgetter(1))]
else:
return []
print(top_3_words(input()))
ReplyDelete
Replies
AnonymousNovember 13, 2018 at 1:50 AM
import re
from collections import Counter
def top_3_words(words):
strip_words = re.sub(r'[^ a-zA-Z0-9\']', '', words.lower()).split()
counted_words = Counter(strip_words)
return sorted(counted_words, key=counted_words.__getitem__, reverse=True)[:3]
ReplyDelete
Replies

Add comment

Monday, November 12, 2018

Q10 : Most frequently used words in a text

5 comments:

Followers