Worksheet 3: Predictive Text Entry


MSc & ICY Software Workshop, Spring term 2015-16
Seyyed Shah and Uday Reddy

Assigned: Tuesday 2 February
Intermediate deadline: parts 1 and 2, Tuesday 9th February, 11:59pm
Final deadline: all parts, Tuesday 16th February, 11:59pm

As usual, include in your submission:
1. appropriate comments and JavaDoc.
2. thorough testing. (You may use JUnit wherever applicable.)

As well as data structures and algorithm complexity, this exercise assesses several concepts taught on the course. If you don't understand any part of the exercise, ask tutors and lab demonstrators. Start early. The questions get progressively harder.

All work and progress on the exercise must be submitted using Canvas. You must submit parts 1 and 2 of the Worksheet by 9th February. 5% of the marks are allocated for this timely submission.

Contents
1 Prototypes and design (25%)
2 Storing and searching a dictionary (20%)
3 More efficiency (25%)
4 Prefix-matching (25%)

Introduction

In this exercise, you will write the algorithms for a sample application using the Java Collection classes. In the next exercise, you will attach a Graphical User Interface (GUI) to make it a full application.

The sample application is that of predictive text. Before the advent of touch screens, mobile telephones in English-speaking countries used a keypad like this one:

    1          2 (abc)    3 (def)
    4 (ghi)    5 (jkl)    6 (mno)
    7 (pqrs)   8 (tuv)    9 (wxyz)
    *          space      #

As you notice, there are keys for digits 1-9, used for dialing phone numbers. But these keys were also used to enter letters a-z. When a text message needed to be entered, the keys corresponding to the letters would be used. However, since there are multiple letters on each key, the required letter needed to be disambiguated somehow.

In the basic system without predictive text, the user must press the appropriate key a number of times for a particular letter to be shown. Consider the word "hello". With this method, the user must press 4, 4, 3, 3, 5, 5, 5, then pause, then 5, 5, 5, 6, 6, 6.

To enter text more easily, the system of predictive text (also called "T9") was devised. The user presses each key only once and the mobile phone uses a dictionary to guess what word is being typed, and displays the possible matches. So the word "hello" can be typed in 5 button presses, 43556, without pauses, instead of 13 in the standard system. The numeric string 43556 is referred to as a signature of the word "hello". If this is the only match, the user can press space and carry on. If there are multiple matches, the user might need to select one of them before proceeding.

A given numeric signature may correspond to more than one word. Predictive text technology is possible by restricting available words to those in a dictionary. Entering the numeric signature 4663 produces the words "gone" and "home" in many dictionaries.

In this exercise, you will design and develop a predictive text system.
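To make the difference between the two entry methods concrete, the following is a minimal sketch that counts key presses under both schemes. The class name MultiTapSketch and its methods are hypothetical and are not part of the worksheet's required classes; pauses between letters on the same key are not counted.

```java
class MultiTapSketch {
    // Letters printed on keys 2..9, in key order (hypothetical helper table).
    private static final String[] KEYS =
            {"abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};

    // Basic (multi-tap) entry: a letter in position p on its key (1-based)
    // needs p presses of that key. Characters not on any key are skipped.
    static int multiTapPresses(String word) {
        int presses = 0;
        for (char c : word.toLowerCase().toCharArray()) {
            for (String letters : KEYS) {
                int pos = letters.indexOf(c);
                if (pos >= 0) {
                    presses += pos + 1;
                    break;
                }
            }
        }
        return presses;
    }

    // Predictive entry: exactly one press per letter.
    static int predictivePresses(String word) {
        return word.length();
    }

    public static void main(String[] args) {
        System.out.println(multiTapPresses("hello") + " vs " + predictivePresses("hello"));
    }
}
```

The saving grows with the number of letters that sit in the second, third or fourth position on their key.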
For simplicity, assume that the user does not need punctuation or numerals. You must also limit your solutions to producing only lower-case words. You must use the dictionary found in /usr/share/dict/words on the School's file systems. All the classes in this worksheet should be placed in a package called predictive. Use the class/method names given in the question.

1 Prototypes and design (25%)

This part deals with building a prototype for the predictive text problem. The prototype is not expected to be efficient, but it will be simple and will allow you to compare it with the efficient implementations developed in later parts.

Write the first two methods in a class named PredictivePrototype inside the package predictive.

1. (5%): Write a method wordToSignature with the type:

    public static String wordToSignature(String word)

The method takes a word and returns a numeric signature. For example, "home" should return "4663". If the word has any non-alphabetic characters, replace them with a " " (space) in the resulting signature. Accumulate the result character-by-character. You should do this using the StringBuffer class rather than String. Explain, in your comments, why this will be more efficient.

2. (10%): Write another method signatureToWords with the type:

    public static Set<String> signatureToWords(String signature)

It takes the given numeric signature and returns a set of possible matching words from the dictionary. The returned set must not have duplicates and each word should be in lower-case. The method signatureToWords will need to use the dictionary to find words that match the string signature and return all the matching words. In this part of the exercise, you should not store the dictionary in your Java program. Explain in the comments why this implementation will be inefficient.

3. (10%): Create command-line programs (classes with main methods) as follows: Words2SigProto for calling the wordToSignature method, and Sigs2WordsProto for calling the signatureToWords method. Each program must accept a list of strings and call the appropriate method to do the conversion.

Hints:

Use the Scanner class to read the dictionary line by line, assuming there is only one word per line.

When reading the dictionary, ignore lines with non-alphabetic characters. A useful helper method to accomplish this would be:

    private static boolean isValidWord(String word)

in PredictivePrototype, which checks if a given word is valid.

Words in the dictionary with upper case letters should be converted to lower-case because only lower-case letters should be returned by the signatureToWords method.
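The character-by-character conversion and the validity check described above might be sketched as follows. This is only one possible shape, not the required solution: the class name SignatureSketch, the KEYS table and the digitFor helper are hypothetical, and treating the empty string as invalid is an assumption.

```java
class SignatureSketch {
    // Letters printed on keys 2..9; index k holds the letters of key (k + 2).
    private static final String[] KEYS =
            {"abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};

    // Maps one lower-case letter to its key digit, or to ' ' if it is not a-z.
    private static char digitFor(char c) {
        for (int k = 0; k < KEYS.length; k++) {
            if (KEYS[k].indexOf(c) >= 0) {
                return (char) ('2' + k);
            }
        }
        return ' ';
    }

    // Builds the signature in a StringBuffer: each append reuses the same
    // buffer, whereas String concatenation would copy the text every time.
    static String wordToSignature(String word) {
        StringBuffer signature = new StringBuffer(word.length());
        for (int i = 0; i < word.length(); i++) {
            signature.append(digitFor(Character.toLowerCase(word.charAt(i))));
        }
        return signature.toString();
    }

    // A word is treated as valid if it is non-empty and contains only a-z
    // or A-Z (assumption: the empty string is rejected).
    static boolean isValidWord(String word) {
        if (word.isEmpty()) {
            return false;
        }
        for (int i = 0; i < word.length(); i++) {
            char c = Character.toLowerCase(word.charAt(i));
            if (c < 'a' || c > 'z') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(arg + " : " + wordToSignature(arg));
        }
    }
}
```

A table lookup is used here instead of a long switch statement; either approach satisfies the description, and the choice is a matter of taste.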

You should be able to complete this part of the Worksheet and test it in about one lab session.

To create the command-line programs, you will need to use the args array of the method:

    public static void main(String[] args)

which contains the command line input. For example, when executing

    sxs@cca112:~$ java predictive.Words2SigProto Hello World! this is the input

the args array will contain

    ["Hello", "World!", "this", "is", "the", "input"]

You should ignore any words with non-alphabetic characters given in the input of Sigs2WordsProto. Format the output of Sigs2WordsProto as one line per signature, as there may be more than one word for a given numeric signature. E.g.

    sxs@cca112:~$ java predictive.Sigs2WordsProto 4663 329
    4663 : good gone home hone hood hoof
    329 : dax fax faz day fay daz

The actual output you get will depend on the dictionary used. Notice that the package name predictive qualifies the class name, and this command works in the main directory. You can also use the -cp .. option to run the command from a different directory, e.g.,

    sxs@cca112:~/predictive$ java -cp .. predictive.Sigs2WordsProto 4663 329

The program Words2SigProto can be tested by converting large amounts of text to signatures; the output can be used to test Sigs2WordsProto (and later, in timing comparisons). Try using news articles to start with.

2 Storing and searching a dictionary (20%)

In the remaining parts of the worksheet, you are asked to implement a number of dictionary classes that will be more efficient than the prototype. All of these classes should implement this interface:

    public interface Dictionary {
        public Set<String> signatureToWords(String signature);
    }
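The command-line hints and the Dictionary interface above can be combined in a small driver. The sketch below is an illustration under stated assumptions: Sigs2WordsSketch, stubLookup, isSignature and formatLine are hypothetical names, the stub stands in for a real dictionary implementation, and skipping arguments that are not made of the digits 2-9 is one possible reading of "ignore" rather than a rule stated in the worksheet. The interface is repeated, without a package declaration, only so that the block compiles on its own.

```java
import java.util.Set;
import java.util.TreeSet;

interface Dictionary {
    Set<String> signatureToWords(String signature);
}

class Sigs2WordsSketch {
    // Hypothetical stand-in for a real dictionary: knows one signature only.
    static Set<String> stubLookup(String signature) {
        Set<String> words = new TreeSet<>();
        if (signature.equals("4663")) {
            words.add("gone");
            words.add("home");
        }
        return words;
    }

    // Assumption: a usable signature consists only of the digits 2-9.
    static boolean isSignature(String s) {
        return s.matches("[2-9]+");
    }

    // One output line per signature: "<signature> : <word> <word> ...".
    static String formatLine(String signature, Set<String> words) {
        return signature + " : " + String.join(" ", words);
    }

    public static void main(String[] args) {
        // Only this line needs to change to use another implementation.
        Dictionary dictionary = Sigs2WordsSketch::stubLookup;
        for (String arg : args) {
            if (isSignature(arg)) {
                System.out.println(formatLine(arg, dictionary.signatureToWords(arg)));
            }
        }
    }
}
```

Because main depends only on the Dictionary type, replacing the stub with any class that implements the interface requires changing a single line.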

The required method signatureToWords finds the possible words that could correspond to a given signature and returns them as a set.

In this part, you will read and store the dictionary in memory as a list of pairs. As the list will be sorted and in memory, a faster look-up technique can be used.

1. (15%): Create a class named ListDictionary. In its constructor, you should read the dictionary from a file and store it in an ArrayList. Each entry of the ArrayList must be a pair, consisting of the word that has been read in and its signature. So you will need to create a class named WordSig that pairs words and signatures (see the hints). The wordToSignature method will be the same, so you can re-use the code from the first part. The signatureToWords method must be re-written as an instance method in the ListDictionary class to use the stored dictionary. The ArrayList<WordSig> must be stored in sorted order and the signatureToWords method must use binary search to perform the look-ups.

2. (5%): Design and create a command-line program Sigs2WordsList for testing the ListDictionary class.

Hints:

Compare the time taken to complete the execution of Sigs2WordsList and Sigs2WordsProto with the same large input(s). Is it possible to make the time difference between Sigs2WordsList and Sigs2WordsProto noticeable? Make a note of the data you use and your timing results.

Create a class which pairs the numeric signatures with words, like this:

    public class WordSig implements Comparable<WordSig> {
        private String words;
        private String signature;
        public WordSig (...) {... }
        public int compareTo(WordSig ws) {... }
    }
    ...

When you read the dictionary you will need to create new WordSig objects.

A list of Comparable objects can be sorted using the method Collections.sort [1].

[1] Find out more about collections and the Comparable interface in the Java tutorial on Collections: http://java.sun.com/docs/books/tutorial/collections/index.html

To automatically sort a list using the collections API, the WordSig objects stored in the list must implement the Comparable interface. That means they must have a compareTo(...) method. compareTo returns a negative integer, zero, or a positive integer (for example -1, 0 or 1) according to whether the current object is less than, equal to, or greater than the argument object, in the intended ordering. Sort the dictionary only once.

You can search a sorted list using Collections.binarySearch. Its simplified type can be written as follows:

    static <T> int binarySearch(List<T>, T)

Note that the type variable T in both the arguments must be the same. Binary search will return the index of the first match it finds, which is not necessarily the first matching entry in the list; if there is no match, it returns a negative number. You must return all matching words. Scan above and below the found index to collect all matching words.

The time command-line program on Linux machines will tell you how long a given command takes to complete. E.g.

    sxs@cca112:~/predictive/$ time java -cp .. predictive.Sigs2WordsList <input> <output>
    real    0m0.286s
    user    0m0.260s
    sys     0m0.010s

Use the real elapsed time in all comparisons.
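The sort-once, binary-search, then scan-both-ways idea from these hints can be sketched as below. The names WordSigSketch and ListLookupSketch are hypothetical, the sample entries are made up for illustration, and ordering by signature alone is an assumption chosen so that all entries sharing a signature end up adjacent and compare as equal to the search key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class WordSigSketch implements Comparable<WordSigSketch> {
    final String word;
    final String signature;

    WordSigSketch(String word, String signature) {
        this.word = word;
        this.signature = signature;
    }

    // Assumption: order by signature only, so equal signatures are adjacent.
    @Override
    public int compareTo(WordSigSketch other) {
        return signature.compareTo(other.signature);
    }
}

class ListLookupSketch {
    // Finds any one match by binary search, then walks down and up from it
    // to collect every neighbouring entry with the same signature.
    static Set<String> lookup(List<WordSigSketch> sorted, String signature) {
        Set<String> result = new TreeSet<>();
        int hit = Collections.binarySearch(sorted, new WordSigSketch("", signature));
        if (hit < 0) {
            return result; // negative index: no entry has this signature
        }
        for (int i = hit; i >= 0 && sorted.get(i).signature.equals(signature); i--) {
            result.add(sorted.get(i).word);
        }
        for (int i = hit + 1; i < sorted.size() && sorted.get(i).signature.equals(signature); i++) {
            result.add(sorted.get(i).word);
        }
        return result;
    }

    // Small made-up dictionary, sorted once after loading.
    static List<WordSigSketch> sampleList() {
        List<WordSigSketch> list = new ArrayList<>();
        list.add(new WordSigSketch("home", "4663"));
        list.add(new WordSigSketch("dax", "329"));
        list.add(new WordSigSketch("gone", "4663"));
        list.add(new WordSigSketch("day", "329"));
        list.add(new WordSigSketch("good", "4663"));
        list.add(new WordSigSketch("a", "2"));
        Collections.sort(list);
        return list;
    }
}
```

Each look-up costs a logarithmic search plus a scan over the matches only, compared with reading the whole dictionary file in the prototype.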

3 More efficiency (25%)

This part involves creating an improved implementation of the Dictionary interface using a Map data structure.

1. (15%): Implement a new class MapDictionary that stores the dictionary using a generic multi-valued Map. In this context, a multi-valued map is a data structure that maps signatures to a set of words. Using a Map, data can be retrieved quickly by signature, as in ListDictionary, but without scanning either side of an index as before. MapDictionary will also allow efficient insertion of new words in the dictionary while still allowing fast look-up. You must choose a Map implementation from the Java Collections API. Explain how the map works and justify your choice. The constructor MapDictionary must populate the Map using the given dictionary file. Write a method signatureToWords that returns, in a Set<String>, only the matching whole words for the given signature. The character length of each returned word must be the same as the input signature.

2. (5%): Create a program Sigs2WordsMap that uses the MapDictionary class. It should be possible to modify just one line in your Sigs2WordsList program so that it can work with any given implementation of the Dictionary interface.

Hints:

The MapDictionary class must implement Dictionary. Do not use the WordSig class.

When deciding what your Map will store in MapDictionary, keep in mind that one signature often corresponds to several words.

When developing ListDictionary, you may have noticed that it was useful to create helper methods to add words to the data structure. Creating add helpers will simplify the constructors of both MapDictionary and TreeDictionary.

4 Prefix-matching (25%)

This part involves creating another improved implementation of the Dictionary interface using your own tree data structure. This should allow the words or parts of words that match partial signatures to be found, so that users will be able to see the parts of the words they are typing as they type.

1. (20%): Implement a new class TreeDictionary that now stores the dictionary in your own tree implementation. It should be possible to search the tree-based implementation quickly (similar to ListDictionary) and for words to be inserted quickly (as in MapDictionary). In addition, TreeDictionary should support finding words when only the first part of the signature (a prefix) is known. This is so that the user can see the part of the word they intend to type as they are typing. This tree implementation is quite similar to the actual implementations found in mobile phones.

The TreeDictionary class forms a recursive data structure, similar to, but more general than, the Tree class in the first exercise of the semester. This tree differs in that each node now has up to eight branches, one for each number (2-9) that is allowed in a signature. Each path of the tree (from the root to a node) represents a signature or part of a signature. At each node of the tree, you must store a collection of all the words that can possibly match the partial signature along the path. That means that every word that has a prefix corresponding to the partial signature appears in the collection. For example, if the dictionary has the words "a", "ant" and "any", then the words at nodes corresponding to paths would be as follows:

at node 2, we have a, ant and any;
at node 2, 6, we have ant and any;
at node 2, 6, 8, we have only ant.

Write a constructor for the class TreeDictionary that takes a String path to the dictionary and populates the tree with words. Write a method signatureToWords that returns, in a Set<String>, the matching words (and prefixes of words) for the given signature. The character length of each of the returned words or prefixes must be the same as the input signature.

2. (5%): Create a program Sigs2WordsTree, similar to Sigs2WordsMap, that uses the TreeDictionary class.

Hints:

Compare the time taken to complete the execution of Sigs2WordsMap and Sigs2WordsTree with large inputs. Is it possible to make the time difference between Sigs2WordsList and Sigs2WordsMap or Sigs2WordsTree and Sigs2WordsMap noticeable? Again, make a note of the data you use and your timing results.

The TreeDictionary class must implement Dictionary. Do not use the WordSig class.

Before starting TreeDictionary, sketch a tree-dictionary containing 2-3 words.

Every node of TreeDictionary will have a collection of words and eight TreeDictionarys. You may use an array of TreeDictionary or just store several objects, as you prefer.

The root node of TreeDictionary should not store any words.

In TreeDictionary it is more memory efficient to store only whole words as read in from the dictionary. You should do this and write a helper method to trim all the words in a given list to produce the output of signatureToWords.
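The node layout and the trimming step described in these hints might look like the following minimal sketch. It is not the required TreeDictionary: the class name TreeSketch is hypothetical, words are added by hand with precomputed signatures instead of being read from a dictionary file, and add assumes every signature character is a digit from 2 to 9.

```java
import java.util.Set;
import java.util.TreeSet;

class TreeSketch {
    // One branch per digit 2..9; branch index = digit - '2'.
    private final TreeSketch[] children = new TreeSketch[8];
    // Whole words whose signature starts with the path leading to this node.
    private final Set<String> words = new TreeSet<>();

    // Walks down the path given by the signature, creating nodes as needed,
    // and stores the whole word at every node below the root.
    // Assumption: the signature contains only the digits 2-9.
    void add(String word, String signature) {
        TreeSketch node = this;
        for (int i = 0; i < signature.length(); i++) {
            int branch = signature.charAt(i) - '2';
            if (node.children[branch] == null) {
                node.children[branch] = new TreeSketch();
            }
            node = node.children[branch];
            node.words.add(word);
        }
    }

    // Follows the path for the (possibly partial) signature and trims each
    // stored word to the signature's length.
    Set<String> signatureToWords(String signature) {
        TreeSketch node = this;
        for (int i = 0; i < signature.length(); i++) {
            int branch = signature.charAt(i) - '2';
            if (branch < 0 || branch > 7 || node.children[branch] == null) {
                return new TreeSet<>(); // no word has this prefix
            }
            node = node.children[branch];
        }
        Set<String> result = new TreeSet<>();
        for (String word : node.words) {
            result.add(word.substring(0, signature.length()));
        }
        return result;
    }
}
```

With the words a (2), ant (268) and any (269) added, the node reached by 2, 6 holds ant and any, and a look-up of 26 returns the single trimmed prefix an. Storing whole words and trimming on output keeps one string per word per node rather than a separate prefix string for every depth.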