Visualization of Pareto Data through Rank-By-Feature Framework

Dan Carlsen, Hao Jiang, Bo Yu (IST 597B Term Paper, Fall 2006)
Correspondence: Dan Carlsen, Hao Jiang, Bo Yu

Abstract

Identifying performance trade-offs between designs, given a set of independent variables that define each design, is of the utmost importance in understanding complex systems. The solutions to such a multi-objective problem are known as Pareto optimal (Coello, 1999) or non-dominated, and have the property that no other solution can be found that performs better for all objectives. Visualizing the Pareto-frontier in the performance space (objective function space) with more than three objectives has been a great challenge to the optimization community. Our work decomposes these higher-order Pareto sets into viewable (comprehensible) graphs. The overall goal of the project is to support decision makers who have different interests while they explore Pareto data in detail. To achieve this goal, we place our work in the Rank-By-Feature framework of Seo & Shneiderman (2005), with new ranking criteria suited to Pareto data. These criteria include discontinuities, linearity, and shapes. To better support users of the system, we add a focus + context technique with multiple focuses to the Rank-By-Feature framework.

Keywords

Pareto Optimum, Rank-by-feature, Discontinuities, Linearity, Shapes, Focus + Context

I. Introduction

Background

Identifying performance trade-offs between designs, given a set of independent variables that define each design, is of the utmost importance in understanding complex systems. Consider, for example, the illustration in Figure 1. Here the goal is to design a system that maximizes the life of a satellite while minimizing the cost to build it. Figure 1 represents a two-objective trade-off in which the optimal solutions are A, B, and C. These solutions to the multi-objective problem are known as Pareto optimal (Coello, 1999) or non-dominated, and have the property that no other solution can be found that performs better for all objectives (A, B, and C are not dominated by each other or by solutions D or E).

Figure 1. Pareto Solutions

In two dimensions, the set of optimal designs that define this trade-off is known geometrically as a Pareto-frontier. In three dimensions, the set is a Pareto-surface. In higher dimensions, visualizing the interaction between these non-dominated designs becomes a challenge.

The Problem

Visualizing the Pareto-frontier in the performance space (objective function space) with more than three objectives has been a great challenge to the optimization community. Although several suggestions have been made, they remain unwieldy or require extensive training. What is lacking from current approaches is a methodology for extracting geometric features of higher-order (more than three-dimensional) Pareto optimal data sets while maintaining a global context of all trade-offs considered in the complex system.

In real-world settings such as the automobile industry (Fujita et al., 1998), optimization problems usually involve many objectives and may have many different solutions. After the Pareto optimum has been calculated by one of various algorithms, the result may be a collection of solutions, all of which are non-dominated points. We cannot say that one solution in this pool is better than any other, because all of them are optimal. However, different stakeholders, or the same stakeholders in different situations, may have different interests in certain objectives and may care about the relations among some of them.

Consider the following scenario. We have a Pareto data set with 30 objectives (dimensions) for each solution. Because this is a Pareto data set, each point in it is optimal; nevertheless, we would like to pick certain points as our final choice based on certain criteria. For example, in the automobile design case, the designer may want to see the relation between horsepower and the other objectives, and to find out which objectives have a negative relation with horsepower. The designer cannot obtain this information from the Pareto frontier itself, because the Pareto data set only shows the final optimum. This problem exists in current Pareto data visualization solutions such as HSDC (Agrawal et al., 2006). In the present work, we explore a mechanism that maintains the global context while providing flexible means for exploring detailed information and relationships among dimensions (objectives).

Related work

Currently, visualization in the performance space is limited to three objectives.
Using colors, shapes, glyphs, and other visual channels, it is possible to integrate more than three dimensions. These methods result in a busy display with complicated legends, in which decision makers must expend considerable energy and time to make sense of the chaos. Parallel coordinates (Inselberg, 1990) is one of the leading approaches used in industry to assist in the selection of optimal designs, but the approach is not intuitive for a large number of dimensions. Cloud visualization (Eddy & Lewis, 2002) is also used to visualize Pareto data. Agrawal and his colleagues (2005, 2006) introduced a Pareto data visualization technique called Hyper-Space Diagonal Counting (HSDC), in which multiple objectives or dimensions are agglomerated onto a plot with two or three coordinates. In the three-coordinate case, one coordinate is used to show the density of points in a bin. All of these solutions focus on visualizing overall information and are not convenient for exploring details, especially the relationships and trends among objectives or dimensions.

Our approach

For this research, we extend the work of Jinwook Seo and Ben Shneiderman and their Rank-by-Feature framework to aid in understanding non-dominated sets in multiple dimensions. Our strategy is first to decompose the high-dimensional data set into two-dimensional data pairs, and then to provide means to explore any two-dimensional relation.

Figure 2. Case 1 and Case 2

We will obtain examples of higher-order Pareto data sets to use for feature identification and extraction. We will explore potential ranking criteria and develop (where needed) algorithms to identify and rank a given feature for any two-objective pair. Possible ranking criteria, related to the geometric properties of each two-objective Pareto-frontier, include: the degree to which the solutions conflict with each other; the shape of the (normalized) Pareto-frontier; and the discontinuities exhibited by the frontier.

The problem of identifying geometric primitives is complicated further by the fact that the optimization direction (maximum or minimum) for each objective affects the ranking of the features and must be accounted for in a global context. For example, consider Case 1 in Figure 2, where both objectives are minimized. The Pareto data set is a convex curve in the performance space. Now suppose that both objectives are to be maximized (Case 2). The same data set may now consist of non-dominated solutions that form a concave curve, or that exhibit discontinuous behavior. We must account for the optimization direction while identifying geometric properties, so that they are understood in the global context of all 2D pairs of the n-dimensional Pareto set.

Specifically, in the present work we build on the rank-by-feature framework and define ranking criteria that help stakeholders explore a Pareto data set according to their particular interests. We have three categories of ranking criteria in addition to the original ones presented in Seo & Shneiderman's (2005) work: discontinuity, linearity, and curve shape. We introduce each category in detail in the following sections. Besides the new criteria, we modify the original rank-by-feature framework to improve usability and visualization. For example, we apply a focus + context technique that gives the user a thumbnail view of each two-dimensional pair in the global view window, and a multiple-focus technique that lets the user place several focuses in the global view.

Because our work builds on the rank-by-feature framework, we first introduce the framework of Seo & Shneiderman (2005) succinctly and then discuss its applicability and suitability as a basis for our work. After that we focus on the details of the criteria and the concrete procedures for generating ranking information under each criterion. Before concluding, we introduce several enhancements for visualization and usability. Finally, we close with a conclusion and some potential future work.

Generation of Pareto Data

To simplify our project, one assumption we had to make was that we already had the Pareto data.
In addition, we assumed that the data was already decomposed into 2-dimensional plots, so that we could begin applying our ranking criteria. To generate the Pareto data examined in our project we used a freely downloadable version of NSGA-II, the Non-dominated Sorting Genetic Algorithm II. Non-dominated means that the solutions it generates for a given problem are Pareto solutions (in two dimensions, every non-Pareto solution is dominated by at least one Pareto solution with respect to both objective functions). The problems which generated our plots are discussed in "A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II". In total, 11 plots were generated by running this algorithm. The solutions generated allowed us to examine concave, convex, and discontinuous plots, as well as plots that exhibited both concave and convex sections and plots that contained nearly horizontally or vertically linear regions.

II. Application of Rank-By-Feature Framework

Introduction to Rank-By-Feature Framework

Dealing with multidimensionality has challenged researchers in many disciplines for many years. In 1985, the statistician John Tukey proposed an approach called scagnostics (Tukey, 1985), believing that displaying scatterplots of pairs of dimensions in a matrix was a comprehensible way to look at data. He noted, however, that large data sets often have too many such 2D projections to examine, so he proposed a few criteria for ranking scatterplots. These ideas remained unimplemented for many years. The rank-by-feature framework (Seo & Shneiderman, 2005), developed by Jinwook Seo and Ben Shneiderman at the University of Maryland in the Hierarchical Clustering Explorer (HCE) software tool, implements Tukey's vision in an open-ended manner that allows easy addition of new criteria.

Figure 3. Rank-by-feature framework interface for scatterplots (2D) (after Seo & Shneiderman, 2005)

The rank-by-feature framework is designed for interactive feature detection in multidimensional data sets using axis-parallel projections in low dimensions (1D or 2D). The idea is that, by combining information visualization techniques (overview, coordination, and dynamic query) with ranking, summaries, and statistical methods, users can systematically examine the most important 1D and 2D axis-parallel projections and develop a deeper understanding of the whole data set. The Graphics, Ranking, and Interaction for Discovery (GRID) principles can be summarized as: (1) study 1D, study 2D, then find features; (2) ranking guides insight, statistics confirm (Seo & Shneiderman, 2005).

Based on the GRID principles, the rank-by-feature framework has a multiple-component interface, as shown in Figure 3. In the control panel (A), users select a ranking criterion and rank low-dimensional projections (1D or 2D) of the multidimensional data set according to the strength of the selected feature in each projection. The score overview (B) is an m-by-m grid in which all dimensions are aligned along the rows and columns. Each cell of the score overview represents a scatterplot whose horizontal and vertical axes are the dimensions at the corresponding column and row, respectively. Each cell is color-coded by its score under the selected ranking criterion. The ordering list (C) shows the projections sorted by rank, with scores color-coded in the background. A click on a cell in the score overview or on an item in the ordering list shows the corresponding scatterplot in the scatterplot browser (D).

The most valuable virtue of the rank-by-feature framework is the open-ended manner in which new ranking criteria can be incorporated. Since the rank-by-feature framework can be considered a telescope for high-dimensional data (Shneiderman, 2006), the ranking criteria can serve as different light filters that give users different perspectives on the same data set. Although some common criteria, such as correlation coefficient and uniformity, are helpful in most conditions, specific problem domains require new criteria. The rank-by-feature framework allows novel statistical tests or new data mining algorithms to be integrated easily as plug-ins.
Several criteria are implemented in the original framework, including: correlation coefficient, least square error for curvilinear regression, quadracity, the number of potential outliers, the number of items in the region of interest, and uniformity of scatterplots.

Suitability to Pareto Data

There are a few important reasons why Pareto data is suitable for the Rank-By-Feature Framework. One very significant reason is that multi-objective design/decision problems result in a vast number of data points (even if only Pareto points are considered). This immense amount of data lends itself to visualization techniques that help one discover interesting relationships and facts. A Rank-By-Feature Framework makes discovering relationships in large amounts of data relatively easy.

Another very important reason is that higher-order Pareto sets are not intuitive. In 2 dimensions the Pareto set is a frontier (a curve), and in 3 dimensions it is a surface, but what does it mean to have a Pareto set in 4 or more dimensions? This question cannot be answered easily. Other techniques could be used to visualize this higher-order data without simplifying it, but those techniques are themselves not intuitive and require extensive training before they yield any benefit. Even once users understand them, they would find it quite difficult to explain what they see to another person. In multi-objective design problems, any decision that is made needs to be explained to others in the group.

The main reasons for using Rank-By-Feature for Pareto data are as follows. It decomposes the higher-order Pareto sets into viewable (comprehensible) graphs: the 2-dimensional plots are easy for anyone to understand. It provides an overview of all objective function relationships. By ranking all of the plots, it gives the designer or decision maker a quantifiable criterion for making a selection. Finally, pictures (2D plots in this case) make storytelling easier, so that users who need to explain their choice to management or another stakeholder have a plot that shows how they made their decision.

Application of Rank-By-Feature

Two things need to be considered in order to apply the Rank-By-Feature Framework to Pareto data. First, we need to create useful criteria to evaluate (criteria that aid the decision process). Second, we need to decide which attribute of each criterion to rank numerically. With this in mind, we created three new criteria to evaluate the 2D Pareto plots: discontinuities, shape (i.e., concavity or convexity), and linearity (nearly horizontally or vertically linear regions in the plot).
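The decomposition into two-objective pairs and the ranking of those pairs by a pluggable criterion can be sketched as follows. This is a minimal illustration under our own assumptions, not the HCE implementation: the function names `project` and `rank_pairs` and the toy criterion `objective_spread` are hypothetical, and any criterion that maps a list of 2D points to a number could be passed in the same way.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Point2D = Tuple[float, float]
Criterion = Callable[[List[Point2D]], float]


def project(data: Sequence[Sequence[float]], i: int, j: int) -> List[Point2D]:
    """Axis-parallel 2D projection: keep objectives i and j of every solution."""
    return [(row[i], row[j]) for row in data]


def rank_pairs(data: Sequence[Sequence[float]],
               criterion: Criterion) -> List[Tuple[Tuple[int, int], float]]:
    """Score every pair of objectives and return (pair, score), best first.

    The scores correspond to the cells of the score overview; the sorted
    result corresponds to the ordering list.
    """
    m = len(data[0])  # number of objectives (dimensions)
    scores = {(i, j): criterion(project(data, i, j))
              for i, j in combinations(range(m), 2)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def objective_spread(points: List[Point2D]) -> float:
    """Toy criterion (hypothetical): value range of the second objective."""
    ys = [y for _, y in points]
    return max(ys) - min(ys)


# Three solutions with three objectives each (made-up numbers).
solutions = [(0, 0, 5), (1, 2, 1), (2, 4, 0)]
ranking = rank_pairs(solutions, objective_spread)
```

Because the criterion is an ordinary function argument, adding a new ranking criterion does not require changing the decomposition or sorting code, which mirrors the plug-in idea of the framework.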

III. Our Ranking Criteria

A. Discontinuities

To find discontinuities in the data we formed a simple algorithm. The discontinuities are found in the following manner:

(1) Find the linear distance (ld) between data points using a sum of squares (Pythagorean theorem): ld² = Δx² + Δy².
(2) Average ld over all data points.
(3) Compare each individual ld to a multiple of that average.
(4) If the ld between two data points is greater than the multiple of the average, a discontinuity is said to exist at that point.

The user determines the multiple to be used, likely based on the expected spread of the scatter plots. The 2-dimensional Pareto plots are ranked by the number of discontinuities found in each plot. For example, the plot for fon in Figure 4.1 would exhibit no discontinuities unless a very small multiple (such as 1) was used, because the average ld between points is only […]. The plot for zdt3 in Figure 4.2 would show discontinuities even for larger values of the multiple, since the ld between the points is […]. Discontinuities would be returned for the large gaps in the plot, so the plot of zdt3 would score higher when counting discontinuities.

Such discontinuity information would be quite useful for decision makers. Which designs are not feasible? Does other information indicate that they would be a good choice? The decision maker could then investigate an earlier design step to find out why they are not feasible. Such information does not simply give an overall optimal solution; it helps the designer figure out why certain designs are currently not feasible.
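The discontinuity count described above can be sketched in a few lines. This is a minimal illustration, not the code used for the plots in this paper. It assumes that "between data points" means between neighboring points after sorting the frontier by the first objective; the function name `count_discontinuities` and the sample data are hypothetical.

```python
import math
from typing import List, Tuple


def count_discontinuities(points: List[Tuple[float, float]],
                          multiple: float) -> int:
    """Count gaps whose length exceeds `multiple` times the average gap."""
    ordered = sorted(points)  # assumption: walk the frontier left to right
    # ld^2 = dx^2 + dy^2 for each pair of neighboring points
    gaps = [math.hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(ordered, ordered[1:])]
    if not gaps:
        return 0  # fewer than two points: nothing to compare
    threshold = multiple * (sum(gaps) / len(gaps))
    return sum(1 for gap in gaps if gap > threshold)


# Evenly spaced frontier versus a frontier with one large gap (made-up data).
smooth = [(0, 4), (1, 3), (2, 2), (3, 1)]
gapped = [(0, 5), (1, 4), (2, 3), (10, 2), (11, 1)]
smooth_count = count_discontinuities(smooth, multiple=2)
gapped_count = count_discontinuities(gapped, multiple=2)
```

With a multiple of 2, the evenly spaced frontier yields no discontinuities, while the jump from x = 2 to x = 10 in the second frontier is counted once. Plots would then be ranked by this count.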

Figure 4.1: Plot of Pareto Data for fon
Figure 4.2: Plot of Pareto Data for zdt3

B. Linearity

To find the nearly vertically or horizontally linear regions of a plot we applied another simple algorithm. We say nearly vertically or horizontally linear because a truly horizontal or vertical line cannot exist in a Pareto plot. For example, if a line were completely horizontal and the goal were to get both objectives as close to zero as possible, then the leftmost point of the horizontal line would dominate all other points on that line with respect to the objective on the x-axis. These regions are found as follows:

(1) Find the slope (s) between data points: s = Δy/Δx.
(2) Average s over all data points.
(3) Compare each s to a multiple of that average, and to the average divided by another user-chosen number.
(4) If the s between two data points is greater than the multiple of the average, the segment is said to be nearly vertically linear.
(5) If the s between two data points is less than the average divided by the chosen number, the segment is said to be nearly horizontally linear.

Just as in the discontinuity algorithm, the user determines the multiple to be used as well as the number used to divide. The plots are then ranked according to the percentage of points found to be nearly horizontally or vertically linear. For example, zdt5 in Figure 5.3 would show both vertically linear regions (depending on the multiple) and horizontally linear regions. The plot of pol in Figure 5.1 would show only a small vertical portion, but most of it would be considered nearly horizontal. Both pol and zdt5 would have a high score for linearity, while zdt2 in Figure 5.2 would not.

Knowledge of linear regions is very important. A highly linear region indicates that design choices vary greatly with respect to one objective function while varying very little with respect to the other. Much improvement can be made with respect to one objective function while losing very little with respect to the other.

Figure 5.1: Plot of Pareto Data for pol
Figure 5.2: Plot of Pareto Data for zdt2
Figure 5.3: Plot of Pareto Data for zdt5
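The linearity score can be sketched in the same style. This is a hedged illustration only, and it makes several assumptions that the text does not state: slopes are taken between neighboring points after sorting by the first objective, absolute values are used (slopes on a frontier are typically negative), a zero Δx is treated as an infinite slope, and the score is the fraction of segments rather than points. The function name `linearity_score` and the data are hypothetical.

```python
import math
from typing import List, Tuple


def linearity_score(points: List[Tuple[float, float]],
                    multiple: float, divisor: float) -> float:
    """Fraction of segments that are nearly vertical or nearly horizontal."""
    ordered = sorted(points)
    slopes = []
    for (x1, y1), (x2, y2) in zip(ordered, ordered[1:]):
        dx, dy = x2 - x1, y2 - y1
        # assumption: |s| is compared, and dx == 0 counts as infinitely steep
        slopes.append(abs(dy / dx) if dx != 0 else math.inf)
    if not slopes:
        return 0.0
    finite = [s for s in slopes if math.isfinite(s)]
    if not finite:
        return 1.0  # every segment is exactly vertical
    mean = sum(finite) / len(finite)
    flagged = sum(1 for s in slopes
                  if s > multiple * mean   # nearly vertically linear
                  or s < mean / divisor)   # nearly horizontally linear
    return flagged / len(slopes)


# An L-shaped frontier versus a straight diagonal one (made-up data).
l_shaped = [(0, 100), (1, 50), (2, 10), (12, 9), (22, 8), (32, 7)]
diagonal = [(0, 3), (1, 2), (2, 1), (3, 0)]
l_score = linearity_score(l_shaped, multiple=2, divisor=10)
d_score = linearity_score(diagonal, multiple=2, divisor=10)
```

The L-shaped frontier consists entirely of very steep and very flat segments, so every segment is flagged, whereas the diagonal frontier has a constant slope equal to its average and nothing is flagged.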

C. Shapes

Along with the discontinuity and linearity criteria, we now introduce the third category of new criteria: the shapes of the two-dimensional data in a Pareto data set and their trends. To let users explore the trends of any two-dimensional data, we first need the approximate shape of the data. We use a curve-fitting technique (Denison, Mallick & Smith, 1998) to obtain the approximate shape and the corresponding polynomial that represents the two-dimensional data. We then rank each curve by calculations on the coefficients of its equation.

In this category we have eight criteria: linear increase and decrease, concave increase and decrease, convex increase and decrease, and U-shape and reversed U-shape. Because U-shape and reversed U-shape curves have both increasing and decreasing branches, we simply take the curvature of each curve as the ranking criterion: the bigger the curvature, the higher the rank.

We think these ranking criteria can aid decision making by enabling users to dig into the details of each two-dimensional pair, and in particular by providing the shape of the curve that represents the tendency of change. With such information, different users can easily find the objectives they are interested in and capture the relationships among the objectives they care about. For example, in the automobile design case there are about 30 objectives, one of which represents the longevity of the cylinders. A user (a designer, buyer, or other potential decision maker) with a special need on this dimension can explore all of the data pairs that have cylinder longevity as one dimension. This does not tell the user directly which solution is best, but it aids the decision by providing means to explore detailed information among objectives.
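As a rough illustration of the curve-fitting step, the sketch below fits a polynomial by ordinary least squares and reports the coefficient of determination R² as a stand-in for a "fitting coefficient". This is our assumption for illustration only: the paper does not specify the fitting routine or the definition of its fitting coefficient, and the function names `polyfit` and `r_squared` are hypothetical.

```python
from typing import List, Sequence


def polyfit(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial fit; coefficients returned highest power first.

    Solves the normal equations by Gaussian elimination. It needs at least
    degree + 1 distinct x values, otherwise the system is singular.
    """
    n = degree + 1
    a = [[float(sum(x ** (r + k) for x in xs)) for k in range(n)] for r in range(n)]
    b = [float(sum(y * x ** r for x, y in zip(xs, ys))) for r in range(n)]
    for col in range(n):
        # partial pivoting keeps the elimination numerically stable
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
            b[r] -= factor * b[col]
    coeffs = [0.0] * n  # ascending powers during back substitution
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = (b[r] - tail) / a[r][r]
    return coeffs[::-1]


def r_squared(xs: Sequence[float], ys: Sequence[float], coeffs: Sequence[float]) -> float:
    """Coefficient of determination of the fitted polynomial."""
    def predict(x: float) -> float:
        value = 0.0
        for c in coeffs:  # Horner's rule
            value = value * x + c
        return value
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot if ss_tot else 1.0


xs = [0, 1, 2, 3, 4]
ys = [0, 1, 4, 9, 16]  # made-up data lying exactly on y = x^2
linear = polyfit(xs, ys, 1)
quadratic = polyfit(xs, ys, 2)
```

On this data the linear fit gives y = 4x - 2 with a lower R² than the quadratic fit, which illustrates how the choice of degree changes both the coefficients and the quality of the approximation.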
Two issues need to be clarified: (1) curve fitting is not meant to recover the original equations that generated the curves, and in fact it usually loses fidelity relative to the original data distribution, even for the best fit; (2) relatedly, the loss of fidelity is amplified by the fact that people can choose different fitting degrees. For example, the same data set can be fitted with a quadratic or a third-degree polynomial. These problems also exist in our present work. However, because our goal is to give users estimated shapes and tendencies, the loss of fidelity, although it matters to us, will not greatly impair users' exploration of approximate trends and shapes. Furthermore, taking into account which fit achieves the higher fitting coefficient could be a future enhancement.

Linear Fitting

Linear fitting aims to capture the linear tendency of two-dimensional data from a Pareto data set. A linear relation is the simplest relationship in two-dimensional data: as the value on one dimension (usually the X-axis) varies, the value on the other dimension (usually the Y-axis) changes at a rate that always remains the same. In our categories, linear change has two criteria, linear increase and linear decrease. For these two criteria, we use the rate of change as the main ranking criterion, together with the direction of change. Mathematically, each of the two rankings can be derived from the other: if the rank of linear increase is given, users can obtain the rank of linear decrease. However, we still provide both criteria explicitly in our final design, because (1) if we provided only one of them, the user would have to work out the other ranking, which costs extra and unnecessary cognitive effort; and (2) semantically, the two criteria are different.

Figure 6: Example of linear criteria (y = 3x, y = 2x, y = -2x)

Rank information for linear increase and decrease is relatively easy to obtain. We now present the main procedure by which we generate ranking information for the linear criteria. As mentioned before, we use a curve-fitting technique to obtain the linear polynomial and the corresponding fitting coefficient (L_fitting_coefficient) for each data pair; the resulting equations take the form y = ax + b. We use L_fitting_coefficient to communicate how good the fit is. After we get the equation of each curve (a line in this case), we calculate its slope (L_slop = a), which is used in the final ranking computation. If linear increase is specified as the ranking criterion, the bigger the L_slop, the higher the rank; if linear decrease is selected, the smaller (more negative) the L_slop, the higher the rank. Figure 6 shows an example of the linear criteria. In the linear case, both the increasing and the decreasing relations are presented in the score overview panel, with different color and gray-scale coding. This means that users can find linear decrease relations in the score overview panel even when they select linear increase as the ranking criterion.

Quadratic fitting

Figure 7: Example of quadratic fitting

Quadratic fitting aims to reveal relations in data that are not linearly distributed, in other words, data whose distribution shows an obvious curve. Curved or quadratic relations are more complex than linear relations because they contain much more information. For example, concave decrease curves have a high rate of decrease at the beginning of the increase of the value on the X-axis (Figure 2, Case 1), while convex decrease curves have a high rate of decrease in the latter part of the increase of the value on the X-axis (Figure 2, Case 2). Such information may be important to decision makers, because there is a threshold that distinguishes the two situations. Besides the linear criteria, the other six criteria are related to quadratic curves. However, we put U-shape and reversed U-shape into a separate category, because they have both increasing and decreasing branches. Here we focus mainly on concave increase and decrease and convex increase and decrease. Figure 7 shows an example.

To get the information for ranking, we first use a curve-fitting technique to generate the quadratic polynomial of each data set and the corresponding fitting coefficient (Q_fitting_coefficient) of each fit. We use Q_fitting_coefficient to communicate how good the fit is. Because we use quadratic fitting, all of the resulting polynomials have the form y = a*x*x + b*x + c. From each polynomial we obtain its largest curvature (Q_curvature), which is used to calculate the final ranking. To determine the shape of the real data in the Pareto data set, we need to know where the real data lie on the fitted curve. We therefore compute the x-coordinate of the vertex of each curve, which we call its normal (Q_Normal = -b/(2a)). With this information we can determine the position and shape of the real data. We call this position information Q_position.
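Given fitted coefficients a and b and the range of the data on the X-axis, the quantities Q_Normal, Q_position, and Q_curvature can be sketched as below. This is an illustrative reading rather than the authors' code: it uses the -1 / 0 / 1 encoding and the shape names that the paper assigns to Q_position, and it assumes that the "largest curvature" of the polynomial is the curvature at the vertex, which for y = a*x*x + b*x + c equals 2|a|. The names `QuadraticShape` and `describe_quadratic` are hypothetical.

```python
from typing import NamedTuple


class QuadraticShape(NamedTuple):
    q_normal: float     # x-coordinate of the vertex, -b / (2a)
    q_position: int     # -1, 0 or 1: where the vertex lies relative to the data
    q_curvature: float  # assumption: curvature at the vertex, 2|a|
    label: str


def describe_quadratic(a: float, b: float,
                       x_min: float, x_max: float) -> QuadraticShape:
    """Classify the part of y = a*x*x + b*x + c covered by [x_min, x_max]."""
    if a == 0:
        raise ValueError("a == 0 describes a line; use the linear criteria")
    q_normal = -b / (2 * a)
    if q_normal < x_min:
        # vertex lies left of the data: only the right branch is observed
        q_position = -1
        label = "concave increase" if a > 0 else "convex decrease"
    elif q_normal > x_max:
        # vertex lies right of the data: only the left branch is observed
        q_position = 1
        label = "concave decrease" if a > 0 else "convex increase"
    else:
        # vertex lies inside the data range: both branches are observed
        q_position = 0
        label = "U-shape" if a > 0 else "reversed U-shape"
    return QuadraticShape(q_normal, q_position, 2 * abs(a), label)


right_branch = describe_quadratic(a=1, b=0, x_min=1, x_max=3)
left_branch = describe_quadratic(a=1, b=-10, x_min=0, x_max=4)
both_branches = describe_quadratic(a=-1, b=4, x_min=0, x_max=4)
```

Note that "concave" here follows the paper's usage, in which a > 0 is called concave; the labels are only as reliable as the underlying quadratic fit.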
For each curve:

(1) If Q_Normal lies between the minimal and maximal values of the data on the X-axis, we give Q_position the value 0; the curve is a U-shape (a > 0) or a reversed U-shape (a < 0).
(2) If Q_Normal is less than the minimal value of the data, we give Q_position the value -1; the curve is concave increase when a > 0 and convex decrease when a < 0.
(3) If Q_Normal is greater than the maximal value of the data, we give Q_position the value 1; the curve is concave decrease when a > 0 and convex increase when a < 0.

With this information, we can compute the rank of each curve under the criterion the user specifies. For the concave and convex criteria we use the score R = Q_curvature * (a / |a|) * Q_position.

If concave increase is selected as the ranking criterion, we first drop the curves with Q_position = 0 and the curves with a < 0, because these curves are convex or U-shaped; then the bigger the R, the higher the rank.

If concave decrease is selected, we first drop the curves with Q_position = 0 and the curves with a < 0, because these curves are convex or U-shaped; then the bigger the -R, the higher the rank.

If convex increase is selected, we first drop the curves with Q_position = 0 and the curves with a > 0, because these curves are concave or U-shaped; then the bigger the R, the higher the rank.

If convex decrease is selected, we first drop the curves with Q_position = 0 and the curves with a > 0, because these curves are concave or U-shaped; then the bigger the -R, the higher the rank.

Through these steps we can generate the final ranking of each curve under the concave increase or decrease and convex increase or decrease criteria. Beyond that, the same information gives the U-shape and reversed U-shape rankings, for which we use the score R = Q_curvature * (a / |a|). When U-shape is the criterion, we pick only the curves with Q_position = 0, because the others are only concave or convex; the bigger the R, the higher the rank. When reversed U-shape is the criterion, we again pick only the curves with Q_position = 0; the bigger the -R, the higher the rank.

IV. Interface Improvement

While meaningful ranking criteria give users the chance to examine systematically the most important low-dimensional projections of Pareto data sets, appropriate information visualization techniques can help users explore the ranking results more effectively and thus maximize the benefit of ranking. Based on our analysis of the current visual interface of the rank-by-feature framework, we believe further improvements can be made to better support users' understanding of the data.
Analysis of current Rank-By-Feature interface

The original rank-by-feature interface used in the HCE toolkit has four parts (see Figure 3): control panel (A), score overview (B), ordering list (C), and scatterplot browser (D). The control panel allows users to choose different ranking criteria dynamically and to change the views in both the score overview and the ordering list. The score overview provides a color-coded overview of the ranking scores in a two-dimensional matrix. The ordering list is a linear list with more detailed ranking information. The scatterplot browser shows the actual scatterplot for a specific low-dimensional (1D or 2D) projection. The views in the score overview, ordering list, and scatterplot browser are linked together: when the user changes the focus in any of them, the other two change correspondingly. Three visualization techniques can be identified in this interface:

A. Overview + Details

The rank-by-feature interface follows the visual information seeking mantra: overview first, zoom and filter, then details-on-demand. The score overview (B) provides an overview of the entire collection of data; the ranking criterion and color-coding help people filter out uninteresting items; the scatterplot browser (D) shows the detailed scatterplot when a cell in the overview matrix is selected. The overview + details technique helps users explore large data sets by keeping a view of the whole data available while they pursue detailed analysis of a part of it.

B. Coordination

The rank-by-feature interface also provides an enhanced level of interactivity by combining displays and allowing highlights to be broadcast from one to the other. The overview matrix and the score list show different views of the same ranking result. When users choose a focus in either of them, the focus in the other changes accordingly. In this way, the system reduces the time users spend coordinating between components.

C. Dynamic query

In addition, the framework provides dynamic feedback by allowing users to control the contents of the display. Users can quickly change the ranking criterion, and the system updates the views in the overview matrix and the score list. In this way, users can compare the results of different criteria and find the most helpful ones.
Although these visualization techniques give users an interactive interface, we believe that some problems in the current framework have not been addressed and could be improved by integrating other techniques:

(1) The details-on-demand component can show only one scatter plot at a time. However, when users study multi-dimensional Pareto data sets, what they really want is to compare characteristics in multiple dimensions at the same time. For the rank-by-feature framework, that means comparing several scatter plots in the same display, which is not possible in the current interface.

(2) When the number of dimensions grows very large, the score overview becomes very crowded and its cells become very small. As a result, it is difficult for users to select the item they are interested in.

(3) The overview and details-on-demand components in this interface are separated into different windows. However, when information is split across two displays, the added visual search and working-memory demands degrade performance (Larkin and Simon, 1987). Users need to shift their focus from the overview window to the detail window, which increases the cognitive load of using the system.

Integration with Focus + Context

To address these problems in the current framework, we propose to integrate a Focus + Context strategy into the visual interface. The working hypothesis of this strategy is that it may be possible to create better cost structures of information by displaying more peripheral information at reduced detail in combination with the information in focus, dynamically varying the detail in parts of the display as the user's attention changes. Our method can be described as follows:

(1) The background of each cell in the overview matrix is colored according to our ranking criteria. The colors serve as cues that help users navigate the matrix and find possible points of interest. This is the same as in the original interface.

(2) When the mouse cursor moves onto a cell, that cell becomes the focus: we enlarge it and show the scatter plot of its two dimensions directly inside it. The cells around it are also enlarged, but at a smaller scale. When the number of cells becomes very large, these surrounding cells help users locate the exact cell they are interested in (see Figure 8(A)).

(3) When the user clicks on one of these cells, it stays enlarged even after the mouse cursor moves away. The cell is restored to its normal size when the user clicks it again. In this way, the user can specify multiple focuses and compare the details of several different combinations of dimensions at the same time (see Figure 8(B)).

One problem with multiple focuses is that they may lie too far apart in the matrix to be displayed in the same view. In that case, the cells between the focus cells are compressed to a smaller scale than the normal size.

Figure 8: Context + Focus in Score Overview, panels (A) and (B)

The interface can benefit from this technique in three ways:

(1) Because the details-on-demand are shown directly in the overview window, users no longer have to switch focus from one window to another, which reduces the cognitive load of coordinating different views.

(2) The focus + context strategy allows the system to display a large volume of data in a single view without undermining the user's understanding.

(3) Users can compare multiple details at the same time by using multiple focuses. In our application to Pareto data sets, this is very important because users usually need to investigate the data in more than two dimensions.
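The focus + context behavior described in this section can be sketched in a few lines. This is a minimal sketch under stated assumptions, not our implementation: the names `cell_sizes` and `FocusState`, the scale factors 4 and 2, and the 8 x 8 matrix of 400 pixels are hypothetical. It also assumes that rows and columns are sized independently, so a focus enlarges its whole row and column rather than a single cell.

```python
def cell_sizes(n_cells, focuses, total, focus_scale=4.0, neighbor_scale=2.0):
    """Split `total` pixels among `n_cells` cells along one matrix axis.

    Focus cells get weight `focus_scale`, their direct neighbors get
    `neighbor_scale`, and all other cells get weight 1. Normalizing the
    weights to `total` compresses the context cells below uniform size.
    """
    weights = [1.0] * n_cells
    for f in focuses:
        for i in (f - 1, f + 1):
            if 0 <= i < n_cells:
                weights[i] = max(weights[i], neighbor_scale)
    for f in focuses:
        weights[f] = focus_scale  # a focus always wins over a neighbor
    unit = total / sum(weights)
    return [w * unit for w in weights]


class FocusState:
    """Tracks the hover focus and the pinned (clicked) focus cells."""

    def __init__(self):
        self.hover = None
        self.pinned = set()

    def move(self, cell):
        self.hover = cell  # a (row, col) pair, or None when off the matrix

    def click(self, cell):
        # A first click keeps the cell enlarged; a second click restores it.
        self.pinned ^= {cell}

    def focuses(self):
        active = set(self.pinned)
        if self.hover is not None:
            active.add(self.hover)
        return active


# Usage: one pinned focus and one hover focus that are far apart.
state = FocusState()
state.click((1, 6))
state.move((4, 2))
rows = {r for r, _ in state.focuses()}
cols = {c for _, c in state.focuses()}
row_heights = cell_sizes(8, rows, total=400)
col_widths = cell_sizes(8, cols, total=400)
```

Since the sizes always sum to the fixed total, adding focuses shrinks the remaining context cells, which is how distant focuses stay visible in one view. A per-cell (non-separable) distortion would avoid enlarging whole rows and columns, at the cost of a more involved layout.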

V. Conclusion

The overall goal of this project was to find a way to support decision makers with different interests as they explore Pareto data in great detail with respect to different objectives. To achieve this goal, we placed our work in the Rank-By-Feature framework of Seo & Shneiderman (2005), with new ranking criteria suited to Pareto data. These criteria include discontinuities, linearity, and shapes. To better support users of the system, we added a focus + context technique with multiple focuses to the Rank-By-Feature Framework. We found that our method aids decision makers but does not necessarily identify the best design; rather, it provides a meaningful way to explore Pareto data and adds validation for design decisions.

Much future work remains. For example, our ranking criteria are not the only useful ones; others could likely be developed by experts in various knowledge domains. We also plan to integrate our work into the Rank-By-Feature software, and we would like to have our methods evaluated by actual decision makers to assess their usefulness.

References

1. Coello, C. A. C. (1999). A comprehensive survey of evolutionary-based multiobjective optimization. Knowledge and Information Systems, 1(3).
2. Seo, J. and Shneiderman, B. (2005). A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data. Information Visualization, Vol. 4, No. 2, Summer 2005.
3. Deb, K. (2001). Multi-Objective Optimization using Evolutionary Algorithms. John Wiley & Sons, New York.
4. Horn, J., Nafpliotis, N. and Goldberg, D. E. (1994). A niched Pareto genetic algorithm for multiobjective optimization. In Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, Volume 1.
5. Abbass, H. A., Sarker, R. and Newton, C. (2001). PDE: A Pareto-frontier Differential Evolution Approach for Multi-objective Optimization Problems. Proceedings of the 2001 Congress on Evolutionary Computation (CEC).
6. Agrawal, G., Bloebaum, C. L. and Lewis, K. (2005). Intuitive Design Selection Using Visualized n-dimensional Pareto Frontier. 46th AIAA/ASME/ASCE/AHS/ASC Structures,
7. Agrawal, G., Lewis, K. E. and Bloebaum, C. L. (2006). Intuitive Visualization of Hyperspace Pareto Frontier. 44th AIAA Aerospace Sciences Meeting and Exhibit, Reno, NV, USA, 9-12 Jan.
8. Deb, K., Pratap, A., Agarwal, S. and Meyarivan, T. (2002). A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Trans. on Evolutionary Computation, 6 (2002).
9. Deb, K. and Saxena, D. K. (2005). On Finding Pareto-Optimal Solutions Through Dimensionality Reduction for Certain Large-Dimensional Multi-Objective Optimization Problems. KanGAL Report.
10. Seo, J. and Shneiderman, B. (2006). Knowledge discovery in high-dimensional data: case studies and a user survey for the rank-by-feature framework. IEEE Transactions on Visualization and Computer Graphics, Volume 12, Issue 3, May-June 2006.
11. Shneiderman, B. (2006). A Telescope for High-Dimensional Data. Computing in Science & Engineering, vol. 8, no. 2, 2006.
12. Pareto efficiency.

Contributors

1 Introduction (Hao)
  1.1 Background
  1.2 The Visualization Problem
  1.3 Related Work
  1.4 Our Approach
  1.5 Structure of This Paper
2 Application of Rank-By-Feature Framework (Bo & Dan)
  2.1 Introduction of Rank-By-Feature Framework (Bo)
  2.2 Why is it suitable for our problem? (Dan)
  2.3 What should we do to apply it in our context? (Dan)
3 Ranking Criteria (Hao & Dan)
  3.1 Current criteria provided by the original work (Hao)
  3.2 Our new criteria
    Discontinuities (Dan)
    Shapes (Hao)
    Linearity (Dan)
4 Interface Improvement (Bo)
  4.1 Analysis of current Rank-By-Feature interface
  4.2 Integration with Focus + Context
5 Conclusion (All)
6 References (All)
