This assignment is made of only programming problems. Its goal is to test your understanding of python, but also help you learn some advanced language features that might come in handy later on. Use the documentation for the last stable version to read up on things that are new to you: https://docs.python.org/3.6/. If you want a comprehensive tutorial of python programming language, study all Python 3 Tutorial sessions at https://www.python-course.eu/python3_course.php. This tutorial is good for both novice and experienced programmers.
You should submit your assignments in one hw0.py python file and one results.txt text file through blackboard. Put all the code for programming problems in the python file. For programming problems, make sure you have Python 3 on your computer. Make sure you can import it into Python without error and test your solutions before submitting.
- Install and Test Python Environment Install python3 environment, save the following code with name "hw0.py" (note that python is indentation sensitive), and run it in the terminal with "python hw0.py" or in your preferred IDE.
def test_print():
print("This is a test statement.")
if __name__ == '__main__':
test_print()
- Define Objects Define a function called 'list_set_length' to assign items_list= [1, 2, 3, 4, 3, 2, 1] as a list object, and items_set = {1, 2, 3, 4, 3, 2, 1} as a set object. Use the len function, print the size of the list and size of the set.
- Comprehension Python provides for expressions called comprehensions that let you build collections out of other collections. Here’s an example:
>>>{x*y for x in {1,2,3} for y in {2,3,4}}
{2, 3, 4, 6, 8, 9, 12}
This is said to be a set comprehension because its values are sets. The notation is similar to the traditional mathematical notation for expressing sets in terms of other sets. To compute the value, Python iterates over the elements of the sets, temporarily binding the control variables x and y to each element in turn and evaluating the expression x*y in the context of that binding. Each of the values obtained is an element of the final set.
Assume that S and T are assigned sets. Without using the intersection operator (&), write a function called 'set_intersect' that uses comprehension to returns the intersection of S and T. Hint: Use a membership test in a conditional statement at the end of the comprehension. Try out your comprehension with S = {1,2,3,4} and T = {3,4,5,6}.
- Tuples: Suppose S is a set of integers, e.g. {−4,−2, 1, 2, 5, 0}. Write a function 'three_tuples' that includes a triple comprehension and returns a list of all three-element tuples (i, j, k) such that i, j, k are elements of S whose sum is zero. Note it is possible that i=j=k (e.g. (0,0,0) is in the solution set).
- Dictionaries: Conceptually, a dictionary is a set of key-value pairs. The syntax for specifying a dictionary in terms of its key-value pairs resembles the syntax for sets—it uses curly braces—except that instead of listing the elements of the set, one lists the key-value pairs. In this syntax, each key-value pair is written using colon notation:
key : value
- Write a function 'dict_init' to initialize the following dictionary:
mydict = {'Neo':'Keanu', 'Morpheus':'Laurence', 'Trinity':'Carrie-Anne'}
- Suppose dlist is a list of dictionaries and k is a key. Write a function 'dict_find' that receives dlist and k, and uses a comprehension to return a list whose ith element is the value corresponding to key k in the ith dictionary in dlist. If a dictionary does not contain k as one of its keys, use 'NOT PRESENT' for that dictionary.
- File reading: Write a function 'file_line_count' to open a file and return the number of lines that it has. Try out your function on stories_small.txt.
- Mini Search Engine: Given a file of “documents” where each document occupies a line of the file, you are to build a data structure (called an inverse index) that allows you to identify those documents containing a given word. We will identify the documents by document number: the document represented by the first line of the file is document number 0, that represented by the second line is document number 1, and so on. Make matching of case-insensitive (e.g. Wall and wall are the same words)
Note that the period is considered part of a substring. To make this easier, we have a file of documents in which punctuation are separated from words by spaces. Often one wants to iterate through the elements of a list while keeping track of the indices of the elements. Python provides enumerate(L) for this purpose.
For example, start with a tiny test file --- doc0.txt which contains only three lines of data:
A bb Ccc , .
D ee A bb
FFF
The inverse_index should be a dictionary with the key storing each word, and the value storing the line containing all occurrence of that word:
{{'a': [0, 1], 'bb': [0, 1], 'ccc': [0], ',': [0], '.': [0], 'd': [1], 'ee': [1],
'fff': [2]}
So, ‘a’ appears in both line 0 and 1. `fff’ appears in only line 2. Your assignment is to first write python code to generate such inverse_index dict, and then use it to perform queries in the following.
- Write a procedure make_inverse_index(strlist) that, given a list of strings (documents), returns a dictionary that maps each word to the set consisting of the document numbers of documents in which that word appears. This dictionary is called an inverse index. (Hint: use enumerate.)
- Write a procedure or_search(inverseIndex, query) which takes an inverse index and a list of words query, and returns the set of document numbers specifying all documents that contain any of the words in query.
- Write a procedure and_search(inverseIndex, query) which takes an inverse index and a list of words query, and returns the set of document numbers specifying all documents that contain all of the words in query.
- Try out your procedures on stories.txt. Try or_search, and_search with the queries “united states” and “wall street” on the documents in stories.txt and save the resulting lists (tw lists of document ids per query) in a file named results.txt.