matchbox Documentation


matchbox Documentation Release 0.3.0 Clearcode.cc Oct 27, 2018

Contents

1 Package status
2 Package resources
3 Contents
    3.1 Glossary
    3.2 Rationale
    3.3 Usage
    3.4 Api
    3.5 Contribute to matchbox
    3.6 CHANGELOG
4 License
Python Module Index


CHAPTER 1 Package status

Matchbox is a simple Python library designed to make selecting an object, or a set of objects, based on required characteristics a quick operation. There is no iterating and no value checking on the actual objects, only selection and operations on dictionaries.


CHAPTER 2 Package resources

Bug tracker: https://github.com/clearcodehq/matchbox/issues
Documentation: http://matchbox.readthedocs.org/


CHAPTER 3 Contents

3.1 Glossary

Entity - an object indexed by MatchBox.
Characteristic - an attribute that can describe an entity.
Trait - a characteristic's trait: a value by which the given characteristic can describe an entity.

3.2 Rationale

The main reason MatchBox was created was to limit data processing while looking up the desired set of entities. MatchBox achieves that by pre-processing entities and setting them up in a hashmap.

3.2.1 Assumptions

Please note that matchbox makes some assumptions about the entities it processes.

Characteristics

Every one of us has up to two eyes, up to two hands and legs, and a determined hair colour and height. These things do not identify us, but rather describe us. That is what we call a characteristic. In a database we would set up an index on a column with traits (the values for a given characteristic) to make the search faster; here, each characteristic is categorised by a separate MatchBox.

A characteristic always has to have some trait

This means that even the lack of a trait is a trait of its own. An entity with no trait for a given characteristic simply fits any trait the characteristic might have. Otherwise, if we wanted to say that the characteristic has no trait, the entity would always be left out of the search and removed from the resulting set, at least for the few selected characteristics. Such an object should not be used in the process at all, which makes the pool of data smaller and the whole process a bit faster. And sometimes, in operations that get constantly repeated, even a nanosecond is a huge gain.

A trait can never be both matched and missed

There is no point in setting one trait to match and other traits to mismatch on the same entity. Setting a few selected traits that match also means that all other traits do not fit this entity at all, and the matching process would probably never get there anyway. It is like saying "this couch is red": there is no point in adding that it is not pink.

3.2.2 Computational complexity

For each box, the computational complexity can be quickly reviewed on the Python wiki page Time Complexity. Accessing a key in a dictionary is O(n) at worst and O(1) on average, and the difference between two sets s and t is O(len(s)), so on average the combination costs O(1) + O(len(s)).

3.3 Usage

To create a MatchBox, you have to decide which characteristic it must work on.

    from matchbox.box import MatchBox

    box = MatchBox('colour')

A MatchBox instance can work on only one characteristic at a time. All entities that will be added to and processed by this matchbox should have two attributes defined on them, which in combination create our characteristic. These are colour, which should describe a colour, and colour_match, which should describe whether this entity is of that colour (or colours) or is not of that colour (or colours). In the example, let's use entities defined by this collections.namedtuple():

    from collections import namedtuple

    HomeObject = namedtuple('HomeObject', 'colour colour_match')
    chair = HomeObject('red', True)
    table = HomeObject('blue', True)
    wall = HomeObject('pink', False)
    paint_bucket = HomeObject('orange', True)

And add these entities to our box.

    box.add(chair)
    box.add(table)
    box.add(wall)
    box.add(paint_bucket)

The match method performs a simple set operation on the passed collection, from which it rejects the set of objects that the MatchBox determines won't fit the red colour:

    home_objects = {chair, table, wall, paint_bucket}
    box.match(home_objects, 'red')

The match method will return a set that consists of only chair and wall: chair because it explicitly says it is red, and wall because it says it is not pink, so it might as well be red or blue. Replacing red with pink in the above example will return an empty set from the box.
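The rule behind the usage example can be written down as a naive per-entity check. The sketch below iterates over every entity, which is exactly what MatchBox is designed to avoid, so treat it only as a reference for the expected results and not as the way the library works. The names fits and naive_match are hypothetical and do not exist in matchbox.

```python
from collections import namedtuple

# Same entity shape as in the usage example.
HomeObject = namedtuple('HomeObject', 'colour colour_match')


def fits(entity, colour):
    """Hypothetical helper: does this entity fit the requested colour?"""
    if entity.colour_match:
        # Described as a match: fits only the colour it names.
        return entity.colour == colour
    # Described as a mismatch: fits every colour except the one it names.
    return entity.colour != colour


def naive_match(collection, colour):
    """Brute-force filter, shown only to spell out the matching rule."""
    return {entity for entity in collection if fits(entity, colour)}


chair = HomeObject('red', True)
table = HomeObject('blue', True)
wall = HomeObject('pink', False)
paint_bucket = HomeObject('orange', True)
home_objects = {chair, table, wall, paint_bucket}

red_objects = naive_match(home_objects, 'red')    # chair and wall
pink_objects = naive_match(home_objects, 'pink')  # nothing fits pink
```

A colour that no entity mentions, such as green, would leave only wall, because wall excludes nothing but pink.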

3.4 Api

Data structure that allows indexing includes and excludes of values.

class matchbox.index.MatchIndex
Bases: object

An index for matching or mismatching of entities by hashable traits.

It can answer one question: "given a trait, what entities are excluded by it?". It can be used to filter a set of potential matches, not for general querying ("given a trait, return all matching entities"). When used as a filter, all entities in the input set must also be indexed by the MatchIndex to be fully filtered.

Note: We can index entities by including or excluding them for given traits. Matching an object for some characteristic traits means that this object will match those values and it will NOT match any other traits. Mismatching an object for some traits means that this object will NOT match those traits and will match any other trait. It makes no sense for an entity to both match and mismatch the same characteristic.

Data layout: We store:
- a dict that maps traits to a set of mismatches
- a set of entities indexed as matches.
We don't store matching entities in any dict. Instead, if an entity is matched by a trait, it simply doesn't belong to the set of mismatches for this trait.

Querying: When asked for entities to be filtered out by this index, we are given a value of the same kind as our index traits. This value can be:
- already indexed - we return the set of elements not matching it. The set contains entities to be rejected.
- previously unknown - we return the set of all objects that were indexed as matching. This is OK because:
    - Mismatched entities can be filtered out only by the traits they are indexed with, so mismatches will not be rejected by an unknown value.
    - Matched entities should be rejected for everything other than the traits they are indexed with.

Note: Due to relying heavily on dictionaries and sets, MatchIndex indexes only hashable entities by one or more hashable traits.
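The data layout and querying rules described above can be sketched with a plain dict and sets. This is an illustrative stand-in written for this document, not the library's source: the class name SketchIndex and the helper _ensure are assumptions, and only the method names add_match, add_mismatch, mismatch, match and the attribute mismatch_unknown are taken from the API reference.

```python
class SketchIndex:
    """Illustrative stand-in for the described data layout."""

    def __init__(self):
        # Entities indexed as matches; they mismatch any unknown trait.
        self.mismatch_unknown = set()
        # Maps each known trait to the set of entities it rejects.
        self.index = {}

    def _ensure(self, trait):
        # A newly seen trait starts out rejecting every matched entity.
        if trait not in self.index:
            self.index[trait] = set(self.mismatch_unknown)

    def add_match(self, entity, *traits):
        for trait in traits:
            self._ensure(trait)
        # Every known trait the entity does not name must reject it.
        for known, rejected in self.index.items():
            if known not in traits:
                rejected.add(entity)
        self.mismatch_unknown.add(entity)

    def add_mismatch(self, entity, *traits):
        for trait in traits:
            self._ensure(trait)
            self.index[trait].add(entity)

    def mismatch(self, trait):
        # Unknown traits fall back to the set of all matched entities.
        return self.index.get(trait, self.mismatch_unknown)

    def match(self, collection, trait):
        return set(collection) - self.mismatch(trait)


idx = SketchIndex()
idx.add_match('chair', 'red')
idx.add_match('table', 'blue')
idx.add_mismatch('wall', 'pink')

everything = {'chair', 'table', 'wall'}
red = idx.match(everything, 'red')      # chair and wall remain
pink = idx.match(everything, 'pink')    # nothing remains
green = idx.match(everything, 'green')  # unknown trait: only wall remains
```

Note that a query is one dictionary lookup followed by one set difference, which is where the complexity figures quoted in the rationale come from.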
Example of indexing entities each by trait:

    Entity  Match      Traits
    Ob1     Excluding  1
    Ob2     Including  3
    Ob3     Excluding  5
    Ob4     Excluding
    Ob5     Including  1

Will result in an index:

    Trait    Mismatched entities
    1        Ob1, Ob2
    3        Ob5
    5        Ob2, Ob3, Ob5
    Any new  Ob2, Ob5

Example of indexing entities each by multiple traits:

    Entity  Match      Traits
    Ob1     Excluding  1, 2
    Ob2     Including  3, 4, 7
    Ob3     Excluding  5, 6
    Ob4     Excluding
    Ob5     Including  1, 7

Should result in this index:

    Trait      Mismatched entities
    1          Ob1, Ob2
    2          Ob1, Ob2, Ob5
    3          Ob5
    4          Ob5
    5          Ob2, Ob3, Ob5
    6          Ob2, Ob3, Ob5
    7
    Any other  Ob2, Ob5

Initialize the index.

add_match(entity, *traits)
    Add a matching entity to the index.

    We have to maintain the constraints of the data layout:
    - self.mismatch_unknown must still contain all matched entities
    - each key of the index must mismatch all known matching entities except those this particular key explicitly includes

    For data layout description, see the class-level docstring.

    Parameters
    - entity (collections.Hashable) - an object to be matching the values of traits_indexed_by
    - traits (list) - a list of hashable values to index the object with

add_mismatch(entity, *traits)
    Add a mismatching entity to the index.

    We do this by simply adding the mismatch to the index.

    Parameters
    - entity (collections.Hashable) - an object to be mismatching the values of traits_indexed_by

    - traits (list) - a list of hashable traits to index the entity with

match(collection, trait)
    Filter out those entities from collection that do not match the trait.

    Note: collection should be a set of objects that have already been added to the index. If the objects in the collection did not have the given characteristic defined, they would not have been indexed and will not get rejected either.

    Parameters
    - collection (set) - a set of entities that should be filtered by this index.
    - trait (collections.Hashable) - a value of the same kind as the traits that entities in this index are indexed with

    Returns set of matching entities.
    Return type set

mismatch(trait)
    Return a set of indexed entities that are mismatched by the trait.

    The returned set can be used for filtering by subtracting it from another set created by a previous MatchIndex or other set operations.

    Parameters
    - trait - any hashable object that can be considered as a characteristic trait for this MatchBox.

    Returns set of entities that do not match the characteristic trait
    Return type set

mismatch_unknown = None
    This set will keep matching entities. They do not match unknown traits. Used as the default value of self.index, which stands for any previously unknown trait.

Match box - for indexing objects by their fields.

class matchbox.box.MatchBox(characteristic, *args, **kwargs)
Bases: matchbox.index.MatchIndex

MatchBox is a MatchIndex that can index objects by their fields.

Initialise the box and set the attribute this box will be indexing objects on.

Indexed entities are expected to have an attribute of the same name as characteristic that contains a trait by which the entity is classified. Optionally, entities may have a second attribute stating whether the object should be classified to match this trait or to mismatch it. This attribute, for each characteristic, is called {characteristic}_match.

Note: Each indexed entity has to have at least one characteristic.
Each indexed entity can be described by:
- given characteristic traits,
- all but given characteristic traits,
- all possible traits.
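The three descriptions listed above can be told apart from just two attributes on an entity. The following is a rough sketch under stated assumptions: describe and kind are hypothetical helpers, Shirt is a made-up entity, the default of True for a missing {characteristic}_match flag is an assumption, and treating None or an empty list as "no trait given" follows the 0.3.0 changelog entry rather than the library's code.

```python
from collections import namedtuple

# Mirrors the documented Trait(traits, is_matching) tuple.
Trait = namedtuple('Trait', 'traits is_matching')
Shirt = namedtuple('Shirt', 'colour colour_match')  # hypothetical entity


def describe(entity, characteristic):
    """Hypothetical extractor: read the trait(s) and the match flag."""
    traits = getattr(entity, characteristic)
    # Assumption: a missing {characteristic}_match flag means "match".
    is_matching = getattr(entity, characteristic + '_match', True)
    if traits is None:
        traits = ()
    elif isinstance(traits, str):
        traits = (traits,)  # wrap a single trait
    return Trait(tuple(traits), is_matching)


def kind(trait):
    """Name which of the three descriptions applies."""
    if not trait.traits:
        return 'all possible traits'
    if trait.is_matching:
        return 'given traits'
    return 'all but given traits'


plain = kind(describe(Shirt('red', True), 'colour'))
negated = kind(describe(Shirt(['pink', 'white'], False), 'colour'))
anything = kind(describe(Shirt(None, True), 'colour'))
```

Here plain is described by its given trait, negated by all traits except pink and white, and anything by every possible trait.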

An indexed entity cannot be described by no possible characteristic trait. In that case, logic dictates that the entity will never match, hence it should not even be considered in queries or make it into the collection.

Parameters
- characteristic (str) - value identifying the attribute MatchBox will index entities with. Optionally the objects may have a {characteristic}_match boolean attribute to determine whether the object should be indexed as a match or mismatch

add(entity)
    Add entity to index.

    Parameters
    - entity (object) - single object to add to box's index

extract_traits(entity)
    Extract data required to classify entity.

    Parameters
    - entity (object)

    Returns namedtuple consisting of characteristic traits and match flag
    Return type matchbox.box.Trait

remove(entity)
    Remove entity from the MatchBox.

    Parameters
    - entity (object)

class matchbox.box.Trait(traits, is_matching)
Bases: tuple

Create new instance of Trait(traits, is_matching)

_asdict()
    Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable, new=<built-in method __new__ of type object at 0xa385c0>, len=<built-in function len>)
    Make a new Trait object from a sequence or iterable

_replace(**kwds)
    Return a new Trait object replacing specified fields with new values

is_matching
    Alias for field number 1

traits
    Alias for field number 0

3.5 Contribute to matchbox

Thank you for taking the time to contribute to matchbox! The following is a set of guidelines for contributing to matchbox. These are just guidelines, not rules; use your best judgment and feel free to propose changes to this document in a pull request.

3.5.1 Bug Reports

1. Use a clear and descriptive title for the issue - it'll be much easier to identify the problem.
2. Describe the steps to reproduce the problem in as much detail as possible.

3. If possible, provide a code snippet to reproduce the issue.

3.5.2 Feature requests/proposals

1. Use a clear and descriptive title for the proposal
2. Provide as detailed a description as possible. A use case is great to have.
3. There'll be a bit of discussion for the feature. Don't worry: if it is to be accepted, we'd like to support it, so we need to understand it thoroughly.

3.5.3 Pull requests

1. Start with a bug report or feature request
2. Use a clear and descriptive title
3. Provide a description - which issue it refers to, and what part of the issue is being solved
4. Be ready for code review :)

3.5.4 Commits

1. Make sure commits are atomic, and each atomic change is followed by a test.
2. If the commit solves part of the issue reported, include refs #[Issue number] in the commit message.
3. If the commit solves the whole issue reported, please refer to Closing issues via commit messages for ways to close issues when commits are merged.

3.5.5 Coding style

1. All Python coding style rules are enforced by Pylama and configured in the pylama.ini file.
2. Additional, not always mandatory, checks are performed by QuantifiedCode.

3.6 CHANGELOG

3.6.1 unreleased

- small code enhancement during adding matching entities to boxes
- remove method - ability to remove entity from already built box
- fix license information

3.6.2 0.3.0

- added short glossary
- updated docs to reflect naming changes
- rewritten usage
- renamed various object's usages and index_object to entity [thanks Michael Sweeney]
- renamed characteristics_value and value references to traits, as in Characteristic's trait. [thanks Michael Sweeney]
- renamed MatchBox.not_matching method into MatchBox.mismatch - signature remained the same.
- only None and empty list will be treated as a value not used for matching
- added __repr__ method to box
- renamed exclude_unknown to mismatch_unknown to clarify set's meaning
- Extracted indexing logic from MatchBox to a base class.

3.6.3 0.2.0

- extended tests to cover python 3.5
- merge MultiMatchBox into MatchBox - now anyone extending MatchBoxes will be able to work with value extractors rather than re-implementing MatchBoxes.

3.6.4 0.1.0

- MatchBox - single value based Matching Box
- MultiMatchBox - multivalue based Matching Box
- package structure
- documentation

CHAPTER 4 License

Copyright (c) 2015 by matchbox authors and contributors. See authors.

This module is part of matchbox and is released under the MIT License (MIT): http://opensource.org/licenses/mit


Python Module Index

matchbox.box
matchbox.index
