You are going to write a tool for text analysis. We will use it
to generate word counts. Create a module named concordancer.py.
Earlier, the book described a concordancer and how it worked, but it focused on searching for information about a single word in a text file. We’re going to expand on this idea by keeping information about many words and by working with the Web. The program will keep a tally of how many times each word appears in a document on the Web.
Once again, we’ll pay attention to program design. Use top-down design, design by contract and unit testing to build your program.
Create a function count_web_words that takes a string containing a URL as its parameter. This function should open that URL and read its contents. For each word encountered, check a dictionary. If the word is there, add 1 to its existing count. If the word is not there, set its associated value to 1. Have this function return the dictionary that was created.
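The tallying rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function name tally and the sample word list are hypothetical, and the sketch works on a list of words that has already been read, not on a URL.

```python
def tally(words):
    """Return a dictionary mapping each word to how often it appears."""
    counts = {}
    for word in words:
        if word in counts:
            # The word has been seen before: add 1 to its existing count.
            counts[word] += 1
        else:
            # First sighting of this word: start its count at 1.
            counts[word] = 1
    return counts


# Hypothetical sample input, used only to show the rule in action.
print(tally(["the", "cat", "saw", "the", "dog"]))
```

With this sample list, "the" ends up with a count of 2 and every other word with a count of 1.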
Build count_web_words from three helper functions: read_url, which takes a URL and returns one big string containing the contents of the file; string_to_words, which takes one big string and returns a list of lowercase words (hint: use the split method you just learned in section 8.4); and count_words, which takes a list of words and returns a dictionary containing word counts.

Create another function, print_descending, that takes one parameter: a dictionary with words and counts. The dictionary should be printed out to the terminal in the form
word: count
in descending count order.
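One possible shape for read_url, string_to_words, and print_descending is sketched below. It is a sketch under assumptions, not the expected solution: it assumes Python 3, uses the standard library's urllib.request to fetch the document, and assumes the document is UTF-8 text. The sample strings at the bottom are made up for illustration.

```python
import urllib.request


def read_url(url):
    """Return the contents of the document at url as one big string."""
    # Assumption: the document is UTF-8 text; undecodable bytes are replaced.
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def string_to_words(text):
    """Return the whitespace-separated words of text, all in lowercase."""
    return text.lower().split()


def print_descending(counts):
    """Print each word and its count, highest count first."""
    # sorted() is stable, so words with equal counts keep dictionary order.
    by_count = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    for word, count in by_count:
        print(f"{word}: {count}")


# Hypothetical sample data; no network access is needed for these two calls.
print(string_to_words("The cat saw THE dog"))
print_descending({"cat": 1, "the": 2, "dog": 1})
```

Note that split with no arguments separates on whitespace only, so punctuation stays attached to words ("dog." and "dog" would be counted separately). Whether to strip punctuation is a design decision the sketch leaves open.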
Write a separate module, test_concordancer.py, that contains unit tests for the three helper functions described in part 1.
It turns out that we can’t really use a unit test very effectively for the print_descending function. (Why?)
Use the following URLs for your common cases for the read_url function.
Think carefully about testing the results of count_words. You’ll be getting a dictionary as a return value. Your test may be more complicated than a single equality test.
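As one way to check a dictionary result from several angles, the sketch below uses Python's standard unittest module. It is an assumption that unittest is the testing tool in use, and the count_words defined here is only a stand-in so the example runs on its own; in a real test module it would be imported from concordancer.py.

```python
import unittest


def count_words(words):
    """Stand-in for the real count_words, so this sketch is self-contained."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


class CountWordsTest(unittest.TestCase):
    def test_repeated_word(self):
        counts = count_words(["a", "b", "a"])
        # Check individual entries, not just the dictionary as a whole.
        self.assertEqual(counts["a"], 2)
        self.assertEqual(counts["b"], 1)
        # No extra keys should appear.
        self.assertEqual(len(counts), 2)
        # The counts should add up to the number of words given.
        self.assertEqual(sum(counts.values()), 3)

    def test_empty_list(self):
        # An empty word list should produce an empty dictionary.
        self.assertEqual(count_words([]), {})
```

If this were saved in a test module, the tests could be run with python -m unittest from the directory containing the file.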
Please style your code per section 3.4 of the textbook.
Submit your concordancer.py and test_concordancer.py modules via the turn-in tool.
Part 1 - 5 points
Part 2 - 2 points
Comments and style - 1 point
Created October 27, 2016
Last revised November 04, 2016, 02:15:24 PM PDT
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.