Extending Jython with SIM, SPARQL and SQL 1
Outline of topics Interesting features of Python and Jython Relational and semantic data models and query languages, triple stores, RDF Extending the Jython grammar, abstract syntax, code compiler, runtime behavior Project suggestions and opportunities 2
Why is Python lovely? High level, multi-paradigmed: object oriented, imperative, functional, the best of many worlds Dynamic-duck-strong typing, automatic memory management, garbage collection Emphasizes code readability, clear syntax Off-side rule block delimiting instead of braces Scripting, prototyping, scientific computing, data visualization, web frameworks, Google, bittorrent CPython is the mainstream implementation 3
Isn t Jython reinventing the wheel? Python does not compile into universal binaries which run on every platform Jython dynamically compiles Python into Java bytecode which runs directly on a JVM Jython programs can interact with other Java packages and libraries Sacrifice a little speed for portability Extend language with Java tools instead C 4
J/Python features and caveats Compile or Interpret Off-side rule Duck Typing Multiple return values Argument passing List comprehension Functional programming Decorators 5
Interactive interpreter 1 $./dist/bin/jython 2 Jython 2.5.1+ (unknown:exported, Feb 24 2010, 14:11:17) 3 [Java HotSpot(TM) Client VM (Apple Inc.)] on java1.5.0_22 4 Type "help", "copyright", "credits" or "license" for more information. 5 6 >>> from math import * 7 8 >>> 2**32/1024**3 # how many gigabytes in 32 bit address space? 9 4L 10 11 >>> 2**64/1024**3 # how many gigabytes in 64 bit address space? 12 17179869184L 13 14 >>> 2**1024 # try doing this on a TI-89 without overflow... 15 179769313486231590772930519078902473361797697894230657273430081157732675805500963132 708477322407536021120113879871393357658789768814416622492847430639474124377767893424 865485276302219601246094119453082952085005768838150682342462881473913110540827237163 350510684586298239947245938479716304835356329624224137216L 16 17 >>> log(exp(2)) == exp(log(2)) # are nat log and e^x inverse functions? 18 True 19 20 >>> [(x, log(x)) for x in xrange(1, 11)] # log of first ten positive integers 21 [(1, 0.0), (2, 0.6931471805599453), (3, 1.0986122886681096), (4, 1.3862943611198906), (5, 1.6094379124341003), (6, 1.791759469228055), (7, 1.9459101490553132), (8, 2.0794415416798357), (9, 2.1972245773362196), (10, 2.302585092994046)] 6
Offside-rule block delimiting Indentation using spaces or tabs at the very left of statements is used to delimit blocks instead of curly braces. The exact amount of indentation does not matter, instead the relative indentation of nested blocks matters. Bruce Eckel: Because blocks are denoted by indentation in Python, indentation is uniform in Python programs. And indentation is meaningful to us as readers. So because we have consistent code formatting, I can read somebody else's code and I'm not constantly tripping over, "Oh, I see. They're putting their curly braces here or there." I don't have to think about that. 7
Offside-rule implementation The tokenizer uses a stack to store indentation levels. Whenever a nested block begins, the new indentation level is pushed on the stack, and an <INDENT> token is passed to the parser, though there can never be more than one <INDENT> token in a row. When a line is encountered with a smaller indentation level, values are popped from the stack until a value is on top which is equal to the new indentation level (if none is found, a syntax error occurs). For each value popped, a <DEDENT> token is generated, and there can be multiple <DEDENT> tokens in a row. At the end of the source code, <DEDENT> tokens are generated for each indentation level left on the stack. 8
Offside-rule example Sample code:! if foo:!!if bar:!!!x = 1! else:! x = 0! Token stream and indent stack:! <if> <foo> <:>!!! [0]! <INDENT> <if> <bar> <:>!! [0, 4]! <INDENT> <x> <=> <1>!! [0, 4, 8]! <DEDENT> <DEDENT> <ELSE> <:> [0]! <INDENT> <x> <=> <0>!! [0, 2]! <DEDENT>!!!! [0]! 9
Duck typing 1 def function(x, y, z): 2 return (x+y)*z 3 4 >>> print function(1, 2, 3) 5 9 6 7 >>> type(function(1, 2, 3)) 8 <type 'int'> 9 10 >>> print function("one", "two", 3) 11 onetwoonetwoonetwo 12 13 >>> type(function("one", "two", 3)) 14 <type 'str'> An object's current set of methods and properties determines the valid semantics Allows polymorphism without inheritance The name of the concept refers to the duck test, attributed to James Whitcomb Riley: "when I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck." 10
Multiple return values 1 def function(): 2 x = 1 3 y = "two" 4 z = [1, 2, 3] 5 return x, y, z 6 7 >>> print function(); 8 9 (1, 'two', [1, 2, 3]) 10 11 >>> a, b, c = function() 12 >>> print a, b, c 13 14 1 two [1, 2, 3] 15 16 >>> d = function() 17 >>> print d 18 19 (1, 'two', [1, 2, 3]) Python will automatically pack multiple values into a tuple You can otherwise pack your own lists, tuples, dicts 11
Function arguments 1 def function(arg0, arg1, *args, **kwargs): 2 print "arg0: %s" % arg0 3 print "arg1: %s" % arg1 4 print "args: ", args 5 print "kwargs: ", kwargs 6 7 >>> function(1, 2, "three", "four", key1="five", key2="six") 8 9 arg0: 1 10 arg1: 2 11 args: ('three', 'four') 12 kwargs: {'key2': 'six', 'key1': 'five'} Arguments are passed by reference, but immutable types (tuples, ints, strings) behave like pass by value Positional arguments precede keyword arguments *args receives a tuple (automatically packed) **kwargs receives a dictionary (automatically packed) 12
List comprehensions and generator expressions 1 list = [<expression> for <value> in <collection> if <condition>] 2 3 itr = (<expression> for <value> in <collection> if <condition>) List comprehension is a construct for creating a new list from an existing list This is analogous to mathematical set builder notation Generator expressions are the lazy evaluation equivalent of list comprehensions These are extremely useful, and you should master them! 13
Anonymous functions 1 map(lambda x: x**2, range(1, x+1)) 2 3 == [x**2 for x in range(1, x+1)] 4 5 filter(lambda x: x%2 == 0, range(1, x+1)) 6 7 == [x for x in range(1, x+1) if x%2 == 0] 8 9 reduce(lambda x,y: x*y, xxrange(2, x+1)) 10 11 ==? In Python, lambda functions are a matter of style. You can always define a separate normal function instead. Lambdas are appropriate when encapsulating specific, non-reusable code without littering the code base with simple one-line functions. List comprehensions and generator expressions are generally used instead of filter() and reduce() reduce() has been removed from Python 3.0 standard library 14
Decorator pattern A design pattern for use with any objected oriented programming language Uses composition rather than inheritance to extend classes Allows new behavior to be added to a particular object at runtime, independently of other instances of the same class Java I/O Streams implementation employs the decorator pattern 15
Function Decorators 1 def entryexit(f): 2 def new_f(): 3 print "Entering", f. name 4 f() 5 print "Exited", f. name 6 return new_f 7 8 @entryexit 9 def func1(): 10 print "-inside func1-" 11 12 @entryexit 13 def func2(): 14 print "-inside func2-" 15 16 >>> func1() 17 18 Entering func1 19 -inside func1-20 Exited func1 21 22 >>> func2() 23 24 Entering func2 25 -inside func2-26 Exited func2 16
Data Models How well does a model capture the meaning of data and inter-data relationships? 17
University database )1,.)*!?*5),%! <2.0%(! @/+.2&A! )&.3*%&)H /3;()(%G 7&.3*%&! 0/I,5(%GH(%! /3;(),5 C5/3D7&.3*%&! $%)&5.+&,5 E*)*/5+'*5 +2/))*)H &*/+'(%G! (%)&5.+&,5! 3*1&H)&.3*%&) 3*1&H! 6/+.2&(*) 6/+.2&AH3*1&! B*1/5&0*%&! +2/))*)H 5*G()&*5*3 )&.3*%&)H *%5,22*3! +,22*G*H3*1/5&0*%&)! F,22*G*! +,22*G*H(% 7*+&(,%! KLCLMB! 3*1&H/& +,.5)*)H,66*5*3 <! <!()!/!+2/))!%/0*! <! J! <!()!/!).4+2/))!,6!J! N N!()!/!)(%G2*!;/2.*3!LO<! N N!()!/!0.2&(!;/2.*3!LO<! F,.5)*! Diagram created by: Saurabh Boyed )*+&(,%)H,6! +,.5)*H,6! 18
Relational data models Based on mathematical relations and firstorder predicate logic By far the most popular model for data management in use today Complex relationships are difficult to describe and slow to query over Referential integrity issues Biased toward the implementation 19
Semantic data models Representation of how data relates to the real world is part of the model (rather than the implementation as per SQL databases) Class hierarchies and inheritance similar to objected oriented programming paradigm Better for models with hierarchic structure or entity relationships (many-to-many relationships are easy to define and query) 20
Hacking Antlr and Jython /grammar/python.g sqlquery_stmt, sqlinsert_statement - define new types of small_stmt; decompose SQL statements into respective lists; expressions in where clause stored in separate list /src/org/python/antlr/ast/sqlquery.java SQLGetQueryProcessor - call SQLProcessQuery of current SQLQuery object SQLProcessQuery - retrieve query results from database /src/org/python/compiler/codecompiler.java visitsqlquery - push where clause expressions onto runtime stack for evaluation /src/org/python/core/py.java sqlwhereclause - store evaluated where clause expression of current SQLQuery sqlwhereclausefinal - callback to SQLQuery.SQLGetQueryProcessor sending back evaluated where clause expression /src/org/python/antlr/ast/visitorbase.java, /src/org/python/antlr/ast/visitorif.java visitsqlquery, visitsqlinsert, visitsimquery - updates to handle new AST types 21
Grammar for sqlquery_stmt SELECT NAME STAR COMMA NAME CAPSFROM NAME COMMA NAME WHERE NAME ASSIGN LESS... ALT_NOTEQUAL expr (boxes with double lines are accepting) CAPSAND CAPSOR NAME ASSIGN LESS... ALT_NOTEQUAL expr 22
References The Python Language Reference Quick reference for Python 2.6 The Jython Wiki The Definitive Guide to Jython SPARQL Query Language for RDF A Semantic Database Management System: SIM Web Database (WDB): A Java Semantic Database 23