Homework 13: Concordancer

Assigned: Friday November 4, 2016
Due: 11 p.m., Monday November 7, 2016
Goals: You will practice with files, lists, and dictionaries.
Collaboration: Do this assignment with your assigned partner.
Submitting: Submit a Python program using the online turnin form. See below.
Scoring: 10 points

Your task

You are going to write a tool for text analysis. We will use it to generate word counts. Create a module named concordancer.py.

Earlier, the book talked about a concordancer and how it worked, but it focused on searching for information about a single word in a text file. We’re going to expand upon this idea by keeping information about many words and by working with the Web. The program will keep a tally of how many times each word appears in a document on the web.

Once again, we’ll pay attention to program design. Use top-down design, design by contract and unit testing to build your program.

Part 1: Word Count

Create a function count_web_words that will take a string that contains a URL as a parameter. This function should open that URL and read its contents. For each word encountered, check a dictionary. If the word is there, add 1 to the count that exists. If the word is not there, set its associated value to 1. Have this function return the dictionary that was created.
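The tallying rule described above can be sketched on a small list of words. This is only an illustration under simple assumptions, not the required solution: the function name tally and the sample words are made up for the example, and no URL is read here.

```python
def tally(words):
    """Return a dictionary mapping each word in words to how often it occurs."""
    counts = {}
    for word in words:
        if word in counts:
            # The word has been seen before: add 1 to the existing count.
            counts[word] = counts[word] + 1
        else:
            # First time this word is seen: start its count at 1.
            counts[word] = 1
    return counts


print(tally(["the", "cat", "saw", "the", "dog"]))
```

In this sample list, "the" appears twice and every other word appears once, so that is what the returned dictionary should record.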

Decompose this problem into three functions: read_url, which will take a url and return one big string containing the contents of the file; string_to_words, which takes one big string and returns a list of lowercase words (hint: use the split method you just learned in section 8.4); and count_words which will take as input a list of words and return a dictionary containing word counts.
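As a rough sketch of the first two helpers, something like the following might work. It assumes Python 3, uses the standard urllib.request module to fetch the document, and assumes the document is UTF-8 text; none of these choices is dictated by the assignment, so treat them as one possibility rather than the expected answer.

```python
import urllib.request


def read_url(url):
    """Return the contents of the document at url as one big string."""
    with urllib.request.urlopen(url) as response:
        # Assumption: the document is UTF-8 text. Bytes that cannot be
        # decoded are replaced instead of raising an error.
        return response.read().decode("utf-8", errors="replace")


def string_to_words(text):
    """Return a list of the lowercase words in text, split on whitespace."""
    return text.lower().split()


# Small check that needs no network access.
print(string_to_words("The cat saw\nthe DOG"))
```

Note that split with no arguments separates on any whitespace and leaves punctuation attached, so "dog." and "dog" would be treated as different words in this sketch.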

Part 2: Output

Create another function print_descending that will take one parameter: a dictionary with words and counts. Each entry in the dictionary should be printed to the terminal in the form

word: count

in descending count order.
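One way to get descending count order is to sort the dictionary's (word, count) pairs by count before printing. The sketch below is a possibility, not the required design: the helper name sorted_by_count is hypothetical, and the order of words that share the same count is not specified by the assignment.

```python
def sorted_by_count(counts):
    """Return (word, count) pairs from counts, largest count first."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


def print_descending(counts):
    """Print each word and its count as 'word: count', largest count first."""
    for word, count in sorted_by_count(counts):
        print(word + ": " + str(count))


print_descending({"cat": 1, "the": 3, "dog": 2})
```

Separating the sorting from the printing keeps the part that returns a value apart from the part that only writes to the terminal.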

Unit tests

Write a separate module, test_concordancer.py, that contains unit tests for the three helper functions described in Part 1. It turns out that we can’t really use a unit test very effectively for the print_descending function. (Why?)

Use the following URLs for your common cases for the read_url function.

Think carefully about testing the results of count_words. You’ll be getting a dictionary as a return value. Your test may be more complicated than a single equality test.
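To illustrate what a test with more than one check could look like, here is a small sketch using plain assert statements. The count_words defined here is only a stand-in so the example runs by itself, and the test names and sample words are made up; your course's testing conventions may differ.

```python
def count_words(words):
    # Stand-in so the sketch runs by itself; your own count_words replaces it.
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def test_repeated_word():
    counts = count_words(["the", "cat", "the"])
    # Check individual entries ...
    assert counts["the"] == 2
    assert counts["cat"] == 1
    # ... that there are no unexpected extra words ...
    assert len(counts) == 2
    # ... and that the counts add up to the number of words given.
    assert sum(counts.values()) == 3


def test_empty_list():
    assert count_words([]) == {}


test_repeated_word()
test_empty_list()
print("all count_words checks passed")
```

Checking the number of keys and the sum of the counts catches mistakes that a check on a single word would miss.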

Grading and Submission

Please style your code per section 3.4 of the textbook.

Submit your concordancer.py and test_concordancer.py modules via the turn-in tool.

  • Part 1 - 5 points
  • Part 2 - 2 points
  • Tests - 2 points
  • Comments and style - 1 point



Janet Davis (davisj@whitman.edu).
This assignment is adapted from one assigned this spring. Thanks to PayPal for providing sample (fake) credit card numbers for testing purposes and to FreeFormatter for providing an online validation tool.

Created October 27, 2016
Last revised November 04, 2016, 02:15:24 PM PDT
CC-BY-NC-SA This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.