Package rpref. August 4, 2014

Similar documents
Package ECctmc. May 1, 2018

Semantic Optimization of Preference Queries

Package assertr. R topics documented: February 23, Type Package

Package queuecomputer

Package RCA. R topics documented: February 29, 2016

Foundations of Preferences in Database Systems

Package strat. November 23, 2016

Package purrrlyr. R topics documented: May 13, Title Tools at the Intersection of 'purrr' and 'dplyr' Version 0.0.2

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package narray. January 28, 2018

Package datasets.load

Package pomdp. January 3, 2019

Package cgh. R topics documented: February 19, 2015

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package spark. July 21, 2017

Package keyholder. May 19, 2018

Package tidyimpute. March 5, 2018

Package ezsummary. August 29, 2016

Package rgho. R topics documented: January 18, 2017

Package kdtools. April 26, 2018

Package kirby21.base

Package tidyselect. October 11, 2018

Package ompr. November 18, 2017

Package comparedf. February 11, 2019

Package diffusr. May 17, 2018

Package validara. October 19, 2017

Package Numero. November 24, 2018

Package reval. May 26, 2015

Package woebinning. December 15, 2017

Package readxl. April 18, 2017

Package crossword.r. January 19, 2018

Package SimilaR. June 21, 2018

Package timeordered. July 5, 2018

Package condusco. November 8, 2017

Package rtext. January 23, 2019

Package NFP. November 21, 2016

Package resumer. R topics documented: August 29, 2016

Package colf. October 9, 2017

Package tidytransit. March 4, 2019

Package sfc. August 29, 2016

Personalization and recommendation of OLAP queries

Package fst. December 18, 2017

Package OLScurve. August 29, 2016

Package nonlinearicp

Package calpassapi. August 25, 2018

Package fastdummies. January 8, 2018

Package canvasxpress

Package clustree. July 10, 2018

Package snplist. December 11, 2017

Package intcensroc. May 2, 2018

Package svalues. July 15, 2018

Package nmslibr. April 14, 2018

Package farver. November 20, 2018

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package MTLR. March 9, 2019

Package lucid. August 24, 2018

The Preference SQL System An Overview

Package lexrankr. December 13, 2017

Package TVsMiss. April 5, 2018

Package reconstructr

Package hypercube. December 15, 2017

Package sankey. R topics documented: October 22, 2017

Package signalhsmm. August 29, 2016

Package readobj. November 2, 2017

Package io. January 15, 2018

Package ggloop. October 20, 2016

Package bisect. April 16, 2018

Package corclass. R topics documented: January 20, 2016

Package fst. June 7, 2018

Package wrswor. R topics documented: February 2, Type Package

Package nullabor. February 20, 2015

Package hyphenatr. August 29, 2016

Package WordR. September 7, 2017

Package rollply. R topics documented: August 29, Title Moving-Window Add-on for 'plyr' Version 0.5.0

Package labelvector. July 28, 2018

Package QCAtools. January 3, 2017

Package messaging. May 27, 2018

Package dkanr. July 12, 2018

Package DCD. September 12, 2017

Package tweenr. September 27, 2018

Package jstree. October 24, 2017

Package postgistools

Package padr. November 16, 2017

Package tibble. August 22, 2017

Package mdftracks. February 6, 2017

Package pairsd3. R topics documented: August 29, Title D3 Scatterplot Matrices Version 0.1.0

Package coda.base. August 6, 2018

Package modmarg. R topics documented:

Package rsppfp. November 20, 2018

Package Rvoterdistance

Package splithalf. March 17, 2018

Package rlas. June 11, 2018

Package attrcusum. December 28, 2016

Package opencage. January 16, 2018

Package combiter. December 4, 2017

Package catenary. May 4, 2018

Package zebu. R topics documented: October 24, 2017

Package gppm. July 5, 2018

Package CuCubes. December 9, 2016

Package PedCNV. February 19, 2015

Transcription:

Version 0.1 Date 2014-08-01 Title Preferences and Skyline Computation in R Author Patrick Roocks <mail@p-roocks.de> Maintainer Patrick Roocks <mail@p-roocks.de> Package rpref August 4, 2014 Implementing Preferences in R to retrieve Maxima for a given strict partial order. This also includes the Skyline Operator, which allows to retrieve the pareto-optimal values. URL http://www.p-roocks.de/rpref Depends R (>= 3.0.0), methods Imports Rcpp, dplyr, igraph License GPL (>= 2) LazyData true LinkingTo Rcpp Suggests testthat NeedsCompilation yes Repository CRAN Date/Publication 2014-08-04 16:48:39 R topics documented: base_pref.......................................... 2 base_pref_macros...................................... 3 complex_pref........................................ 4 get_btg........................................... 6 psel............................................. 7 rpref............................................. 9 1

2 base_pref Index 10 base_pref Base preferences Base preferences are used to describe the different goals of a preference query. Usage low(expr) high(expr) true(expr) Arguments expr Value which is the term for the current preference and should be minimized/maximized or, for boolean preferences, TRUE. The term expr can be just a single attribute an arbitrary expression, e.g., low(a+2*b+f(c)), where a, b and c are columns of the addressed dataset and f is a previously defined function. Details All base preferences are mathematically strict weak orders (irreflexive, transitive and negative transitive). The three fundamental base preferences are: low(a), high(a) Search for minimal/maximal values of a, i.e., the induced order is the "smaller than" or "greater than" order on the values of a. The values of a must be numeric values. true(a) Search for true values in logical expressions, i.e. TRUE is better than FALSE. The values of a must be logical values. Functions contained in expr are evaluated over the entire dataset, i.e., it is possible to use aggregate functions (min, mean, etc.). Note that all functions (and also variables not occuring in the dataset, where pref will be evaluated on) must be defined in the same environment (e.g. function scope) as the base preference. See Also See complex_pref how to compose complex preferences to retrieve e.g. the Skyline. See base_pref_macros for more base preferences.

base_pref_macros 3 Examples # Define a preference with a score value combining mpg and hp p1 <- high(4*mpg + hp) # Perform the preference selection psel(mtcars, p1) # Define a preference with a given function f <- function(x, y) (abs(x-mean(x))/max(x) + abs(y-mean(y))/max(y)) p2 <- low(f(mpg, hp)) psel(mtcars, p2) base_pref_macros Useful base preference macros In addition to the fundamental base preferences, rpref offers some macros to define preferences where a given interval or point is preferred. Usage around(expr, center) between(expr, left, right) pos(expr, pos_value) layered(expr,...) Arguments expr center left right pos_value The value which should be in the preferred interval, layer, etc. The same requirements as for base_pref apply. Preferred value, where a values from the given expr should be near by. Lower limit for a preferred interval. Upper limit for a preferred interval. The preferred value or set for a pos-preference.... Layers (sets) for a layered-preference, where the first set are the most preferred values. Definition of the preference macros between(expr, left, right) Those tuples are preferred, where expr evaluates to a value between l and r. For values not in this interval, the values nearest to [l, r] are preferred. around(expr, center) Same as between(expr, center, center).

4 complex_pref pos(expr, pos_value) If expr evaluates to a value which is contained in pos_value, these tuples are preferred. layered(expr, layer1, layer2,..., layern) For the most preferred tuples, expr must evaluate to a value in layer1. The second-best tuples are those, where expr evaluates to a value in layer2, and so forth. Values occuring in non of the layers are considered worse than those in layern. Technically, this is realized by a Prioritization (lexicographical order) chain of boolean preferences. Examples # Search for cars where mpg is near to 25 psel(mtcars, around(mpg, 25)) # cyl = 2 and cyl = 4 are equally good, cyl = 6 is worse psel(mtcars, layered(cyl, c(2, 4), 6)) complex_pref Complex preferences Usage Complex preferences are used to compose different preferences orders. For example the Pareto composition (via operator *) is the usual operator to compose the preference for a Skyline query. All complex preferences are mathematically strict partial orders (irreflexive and transitive). ## S3 method for class preference p1 * p2 ## S3 method for class preference p1 & p2 ## S3 method for class preference p1 p2 ## S3 method for class preference p1 + p2 reverse(p1) Arguments p1,p2 Preferences to be composed (either base preferences via base_pref or also complex preferences)

complex_pref 5 Skylines The most important preference composition operator is the Pareto-Operator (p1 * p2) to formulate a Skyline query. A tuple t1 is better than t2 w.r.t. p1 * p2 if it is strictly better w.r.t. one of the preferences p1, p2 and is better or equal w.r.t. the other preference. The syntactical translation from other query languages to rpref is as follows: A query in the syntax from Borzsonyi et. al (2001) like "... SKYLINE OF a MAX, b MIN, c MAX" corresponds in rpref to the preference low(a) * high(b) * low(c). A query in the syntax from Kiessling (2002) like "... PREFERRING a LOWEST and (b HIGHEST PRIOR TO c LOWEST)" corresponds in rpref to low(a) * (high(b) & low(c)). Definition of additional preference operators Additionally, rpref supports the following preference composition operators: p1 & p2 Prioritization (Lexicographical order): A tuple t1 is better than t2 w.r.t. p1 & p2 if it is strictly better w.r.t. p1 or is euqal w.r.t. p1 and is better w.r.t. p2. p1 p2 Intersection preference: A tuple t1 is better than t2 w.r.t. p1 p2 if it is strictly better w.r.t. both preferences. p1 + p2 Union preference: A tuple t1 is better than t2 w.r.t. p1 + p2 if it is strictly better w.r.t. to one of the preferences. Note that this can violate the strict partial order property, if the domains (the tuples on which p1 and p2 define better-than-relationships) of the preferences are not disjoint. reverse(p1) or -p1 Reverse preference (converse relation): A tuple t1 is better than t2 w.r.t. -p1 if t2 is better than t1 w.r.t. p1. The unary minus operator, i.e. -p1, is a short for reverse(p1). References W. Kiessling (2002): Foundations of Preferences in Database Systems. In Very Large Data Bases (VLDB 02), pages 311-322. S. Borzsonyi, D. Kossmann, K. Stocker (2001): The Skyline Operator. In Data Engineering (ICDE 01), pages 421-430. See Also See base_pref for the construction of base preferences. See psel for the evaluation of preferences.

6 get_btg Examples # Define preference for cars with low consumption (high mpg-value) # and simultanously high horsepower p1 <- high(mpg) * high(hp) # Perform the preference search psel(mtcars, p1) get_btg Better-Than-Graph Returns a Hasse-Diagramm of a preference order (also called the Better-Than-Graph) on a given dataset to be plotted with the igraph package. Usage get_btg(df, pref) Arguments df pref A dataframe. A preference on the columns of df, see psel for details. Details This function returns a list l with the following list entries: l$graph An igraph object, created with the igraph package. l$layout A typical Hasse-diagram layout for plotting the graph, also created with igraph. To plot the resulting graph, use the plot function as follows: plot(l$graph, layout = l$layout). For more details, see igraph.plotting and the examples below. The Hasse diagram of a preference visualizes all the better-than-relationsships on a given dataset. All the edges which can be retrieved by transitivity of the the order are omitted. The names of the vertices are characters ranging from "1" to as.character(nrow(df)) and they correspond to the row numbers of df. See Also igraph.plotting

psel 7 Examples # Pick a small data set and create preference / BTG df <- mtcars[1:10,] pref <- high(mpg) * high(hp) btg <- get_btg(df, pref) # Create labels for the nodes with relevant values labels <- paste0(df$mpg, "\n", df$hp) # Plot the graph using igraph library(igraph) plot(btg$graph, layout = btg$layout, vertex.label = labels, vertex.size = 25) # Add colors for the maxima nodes and plot again colors <- rep(rgb(1,1,1), nrow(df)) colors[psel.indices(df, pref)] <- rgb(0,1,0) plot(btg$graph, layout = btg$layout, vertex.label = labels, vertex.size = 25, vertex.color = colors) # Show lattice structure of 3-dimensional Pareto preference df <- merge(merge(list(x = 1:3), list(y = 1:3)), list(z = 1:2)) labels <- paste0(df$x, ",", df$y, ",", df$z) btg <- get_btg(df, low(x) * low(y) * low(z)) plot(btg$graph, layout = btg$layout, vertex.label = labels, vertex.size = 20) psel Preference selection Usage Evaluate a preference on a given dataset, i.e. return the maximal elements of a dataset for a given preference order. psel(df, pref, top = NULL) psel.indices(df, pref, top = NULL) Arguments df pref A dataframe or, for grouped preference selection, a grouped dataframe. See below for details. The preference order, constructed via complex_pref and base_pref. All variables occuring in pref must be columns of the dataframe df (or can be variables of the environment, where pref was defined).

8 psel top By default NULL, which means that the maxima are returned. For top = k the k-best elements according to the preference are returned. Details The difference between the two variants of the preference selection is: The psel function returns a subset of the dataset which are the maxima according to the given preference. The function psel.indices returns just the row indices of the maxima. Hence psel(df,pref,top) is equivalent to df[psel(df,pref,top),] for non-grouped dataframes. For grouped dataframes, the groups are restored after the preference selection. For a given top value "k", the k best elements are returned. By definition, this is non-deterministic. A top-1 query of two equivalent tuples (according to pref) can return on both of these tuples. In the rpref implementation, in this case the first occuring tuple in the dataset is picked. If the top value is greater than the number of elements in df, i.e., top > nrow(df) then all elements of df will be returned without further warning. Grouped preference selection With psel it is also possible to perform a preference selection, where the maxima are calculated for every group seperatly. The groups have to be created with group_by from the dplyr package. The preference selection preserves the grouping, i.e., the summarize function from dplyr refers to the set of maxima of each group. This can be used to e.g. calculate the number of maxima in each group, see examples below. A given top value k in connection with a grouped preference selection returns the k best values for each group. Hence if there are three groups in df, each containing at least 2 elements, and we have top = 2 then 6 tuples will be returned. See Also See complex_pref how to construct a Skyline preference. Examples # Skyline and Top-K skyline psel(mtcars, low(mpg) * low(hp)) psel(mtcars, low(mpg) * low(hp), top = 5) # Visualize the skyline in a plot sky1 <- psel(mtcars, high(mpg) * high(hp)) plot(mtcars$mpg, mtcars$hp) points(sky1$mpg, sky1$hp, lwd=3) # Grouped preference with dplyr library(dplyr) psel(group_by(mtcars, cyl), low(mpg)) # Return size of each maxima group summarise(psel(group_by(mtcars, cyl), low(mpg)), n())

rpref 9 rpref The rpref package. The rpref package.

Index Topic skyline complex_pref, 4 *.preference (complex_pref), 4 +.preference (complex_pref), 4 -.preference (complex_pref), 4 &.preference (complex_pref), 4 around (base_pref_macros), 3 base_pref, 2, 3 5, 7 base_pref_macros, 2, 3 between (base_pref_macros), 3 complex_pref, 2, 4, 7, 8 get_btg, 6 group_by, 8 high (base_pref), 2 igraph, 6 igraph.plotting, 6 layered (base_pref_macros), 3 low (base_pref), 2 pos (base_pref_macros), 3 psel, 5, 6, 7 reverse (complex_pref), 4 rpref, 9 rpref-package (rpref), 9 true (base_pref), 2 10