Visualization Architecture for User Interaction with Dynamic Data Spaces in Multiple Pipelines

S. Prabhakar
6444 Silver Avenue, Apt 203, Burnaby, BC V5H 2Y4, Canada

Abstract

Data-intensive computational systems use multiple dynamically changing data sources. For the user to understand their behavior, visualization systems need to interactively set up new visualization scenarios from the existing visualizations. Designing such visualization systems requires supporting complex interactions between diverse data sources and tools. The global nature of these interactions is a major impediment to the design and visualization-time extension of such systems. We propose a principled approach that transforms the global interactions into fewer local interactions. The design is facilitated by a novel abstract software layer, which is based on a multi-layered dataflow architecture with multi-layered control. We also present a set of event-based design patterns: sequences of local interactions among modules of the software layer, which achieve the user interaction tasks. This software architecture is implemented in a Java-based system called VisCoAdapt.

Keywords: Architecture; Dataflow; Multi-layered control; Event-based design patterns; Visualization systems; Human Computer Interaction

1 Introduction

An important class of Data Oriented Visualization Systems visualizes data from diverse and dynamically changing data spaces. Such data spaces are typically transformed by algorithms, which run independently of the visualization processes but incorporate guidance received from them. User interaction tasks require that the visualization system support complex interactions between various tools and data sources. One such important task is composing an existing visualization scenario with new visualization scenarios, where each additional scenario supports interactions with data sources similar to those of the original scenario. Further, new interactions arise between the various existing scenarios.
The design of such visualization systems is very complex because global interactions arise between the various components. These interactions stem from the diversity of data sources and the multiplicity of tools. Further, the user-driven visualization tasks require a new type of control that manages interactions between components of the visualization system from the display.

Dataflow architectures provide an important layer of abstraction for the integration of various software modules [9]. This is enabled by mapping the various stages of visualization onto a single dataflow. While this is a powerful architecture, there are additional requirements on these visualization systems: incorporation of multiple diverse dynamic data sources, the ability to support several user interactions, and extensibility to multiple scenarios.

During an interaction with visualizations, the user does not have explicit access to the specific processes that transform data spaces or visualizations. The new architecture needs to deliver either the user input or the data source output to the appropriate destination. This requires deciding on the path and direction of dataflow within the architecture. The architecture should also allow the user to create new scenarios from the existing ones. The new scenarios should maintain the same properties as other scenarios with respect to interaction with the user, the data sources and the existing scenarios.

These requirements point to a critical issue in the design of visualization systems. The interactions between modules in the architecture are global in nature: each interaction can depend on the states of other modules in the architecture, some of them farther away along the path of dataflow. As a result, the design of visualization systems becomes tedious, and extensibility to multiple scenarios during user interactions is unachievable. In order to address these requirements, we propose a novel dataflow architecture.
The central idea is to transform the global interactions between different modules into local interactions. This is achieved by the following novel aspects of the architecture. The dataflow pipeline abstraction is at the level of abstract transformations over data. This abstraction unifies various kinds of transformations into a single layer. For example, visualization and data space transformations are specializations of this abstraction. Generally, in a dataflow architecture, each pipeline module has multiple functionalities. This is a major source of global interactions. In the proposed architecture, these functionalities are separated out into different control layers. The interactions between different visualization scenarios are enabled by interactions between pipelines, and are supported by new control layers. These include the creation of new pipelines and the interaction between pipelines.

Modules with localized and isolated functionalities support local decisions for computing the data flow path. We propose event-based design patterns that carry out the complex user interaction tasks as sequences of decision-supported local event firings.

This architecture is completely implemented in Java in a system called VisCoAdapt. The visualized data is modeled using a scene graph, which is grounded in jreality [5].

2 Architecture

2.1 Pipeline Stages

In order to incorporate diverse data space and visualization transformations into a single dataflow, the pipeline stages are defined as abstract transformations. Figure 1 illustrates the architecture with two pipelines instantiated. Each pipeline starts with a data source and ends with a display, and integrates multiple data space transformations into its stages. The only function of each stage is to transform its input data into output data, and each stage is specified in terms of a set of constructors.

Figure 1. Dataflow Architecture with abstract data transformations as stages.

Figure 1 illustrates three types of instantiations of data space transformations in pipeline stages. These transformations are typically used in many Machine Learning algorithms. Each pipeline includes three stages for Machine Learning: Data Selection, Data Space Transformation (footnote 1) and Three D Projection. The visualization stages transform the visualization properties of inputs to outputs. For example, the data points are organized into geometry groups. The data space transformation stages work with the visualization stages to set up visualizations of various aspects of Machine Learning algorithms with which the user can interact. Each stage is associated with a set of parameters, which the user can modify by interacting with the visualization.

Footnote 1. We use data space transformation to indicate a general transformation of data, and also as the name of a pipeline stage. Wherever there is scope for confusion, we explicitly add "stage" in the latter case.

Figure 1 also illustrates the hierarchical structure of a pipeline: the Scene Graph Generation stage is further decomposed into various sub-stages. These sub-stages model the visualization data as a scene graph, which is displayed. Hierarchical decomposition masks some stages from the others in interaction. This makes the control of data flow and of user interaction data manageable. This issue is discussed further in the section on Event-based Design Patterns.

The functionalities of the various stages in a pipeline are briefly presented here. The Data Selection stage captures an important first step in Machine Learning algorithms: data selection from a given data set. The Data Space Transformation stage generates the target function learnt by using a Machine Learning algorithm [2]. The Three D Projection stage generates 3D projections of target functions by using algorithms such as Principal Component Analysis [4]. The output of this stage has geometry, such as a manifold [14], even though the input data to the pipeline may not be geometric. The Scene Graph Generation stage is central to user interactions as it provides the language, visualization and tools for representing and interacting with the data produced in earlier stages. As shown in figure 1, the Scene Graph Generation stage contains six sub-stages. The Visualization stage has a set of displayers that display the scene graph. A variety of displayers, such as the one-panel or two-panel displayers, allow for various visualization configurations. Our displayers are grounded in the viewers of jreality [5].

The Shape Formation sub-stage uses Non-Uniform Rational B-Splines (NURBS) [9] to represent the data. This parameterizes visualizations for user interactions. The Geometry Group sub-stage organizes data into groups of geometric objects for visualization and interaction purposes. The Scene Graph Segment sub-stage provides a language to represent the geometric objects and geometric groups along with the coordinate system and tools for visualization. These scene graphs can be reconstructed based on the parameters received from the visualization. We implemented our scene graph over jreality [5], a general scene graph for visualizations. The Coordinate System Embedding sub-stage creates a coordinate system in which the geometry groups are embedded. The Visualization Characteristics sub-stage manages appearance parameters for the geometric objects and the coordinate system. The User Interaction Tools sub-stage creates and manages tools that enable the user to interact with visualizations. The tools allow the modifications and selections made to the visualizations to be parametrically communicated to the rest of the pipeline. Thus the tools act as another significant source of data to the pipeline.

2.2 Controlling Intra-Pipeline Communications

Each pipeline needs to support several functionalities: carrying out user interaction tasks, and responding to data input from preceding stages. The invocation and direction of data and control flows at each pipeline stage are conditional on the states present in the rest of the architecture. It is not possible for a pipeline module to decide on the flow without knowing the states of other modules in the entire architecture. For example, a pipeline stage performs an input data transformation if data becomes present either at the stage input or at the corresponding stage in another pipeline, or if the user performs a visualization interaction task relevant to the current stage. Another example of the dependence of a transformation on another event is the replication of a pipeline stage when the user chooses to add a new scenario or when a preceding pipeline module is replicated.

In order to convert the non-local dependencies of data flow to local decisions, several control layers are added, three of which are shown in figure 2. The Controller layer decides the direction in which the data and control flow needs to be propagated. Each Adapter decides whether the received request for data transformation is relevant to the current Stage Layer or not. The Factories layer creates a new pipeline from the current pipeline using modification data provided by the user. This layer is discussed in more detail in the next section. The control layers are designed such that they support a new kind of Event-based Design Patterns to implement data and control flows. These are discussed in a later section.

Figure 2. Multiple control layers of a pipeline.

2.3 Controlling Inter-Pipeline Communications

Each pipeline may be operating on a different set of visualization parameter values. For example, Data Selection algorithms in two different pipelines may be sampling data at different rates. But the pipelines need to coordinate for each user interaction task. This gives the user a comparative view of the similar processes the user is trying to understand. For example, the user may change the shape visualized in a scenario set up by a pipeline. These changes need to be translated to the other pipelines. This is done by the Negotiator layer.

Figure 3. Communication between pipelines
Each Negotiator module stores the parametric relationships between two stages in two different pipelines as shown in figure 3. Each Negotiator implements this relationship by forcing one of its stages to transform if its other stage is transformed due to a request coming from the rest of the architecture. Thus the transformation requests are localized in between two similar stages belonging to different pipelines within the context of visualization changes. In figure 3, the Adapter module fits the data received from the Negotiator to suit its pipeline stage.
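As an illustration of the Negotiator behavior described above, the following is a minimal Java sketch and not VisCoAdapt's actual code. The class names `ParamStage`, `Negotiator` and `NegotiatorDemo`, the single numeric parameter per stage, and the two mapping functions are all assumptions introduced here; they stand in for the stored parametric relationship between two similar stages in different pipelines.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical stand-in for a pipeline stage driven by a single parameter. */
class ParamStage {
    private final String name;
    private double parameter;
    private int transformCount;

    ParamStage(String name, double parameter) {
        this.name = name;
        this.parameter = parameter;
    }

    /** Re-runs the stage transformation with a new parameter value. */
    void transform(double newParameter) {
        this.parameter = newParameter;
        this.transformCount++;
    }

    String name() { return name; }
    double parameter() { return parameter; }
    int transformCount() { return transformCount; }
}

/** Hypothetical Negotiator linking two similar stages in different pipelines. */
class Negotiator {
    private final ParamStage first;
    private final ParamStage second;
    private final DoubleUnaryOperator firstToSecond;
    private final DoubleUnaryOperator secondToFirst;

    Negotiator(ParamStage first, ParamStage second,
               DoubleUnaryOperator firstToSecond,
               DoubleUnaryOperator secondToFirst) {
        this.first = first;
        this.second = second;
        this.firstToSecond = firstToSecond;
        this.secondToFirst = secondToFirst;
    }

    /**
     * Called after one stage was transformed by a request from the rest of
     * the architecture; forces the partner stage to transform as well.
     */
    void stageTransformed(ParamStage changed) {
        if (changed == first) {
            second.transform(firstToSecond.applyAsDouble(first.parameter()));
        } else if (changed == second) {
            first.transform(secondToFirst.applyAsDouble(second.parameter()));
        } else {
            throw new IllegalArgumentException(
                "Stage not managed by this Negotiator: " + changed.name());
        }
    }
}

class NegotiatorDemo {
    public static void main(String[] args) {
        // Assumed relationship: pipeline 2 samples at twice the rate of pipeline 1.
        ParamStage a = new ParamStage("pipeline1.DataSelection", 10.0);
        ParamStage b = new ParamStage("pipeline2.DataSelection", 20.0);
        Negotiator negotiator = new Negotiator(a, b, x -> 2.0 * x, y -> y / 2.0);

        a.transform(15.0);              // request arriving in pipeline 1
        negotiator.stageTransformed(a); // partner stage is forced to follow
        System.out.println(b.name() + " -> " + b.parameter());
    }
}
```

In this sketch the forced transformation does not notify the Negotiator again, so a change is propagated exactly once between the two stages; how the real system prevents such feedback is not stated in the text.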

3 Event-based Design Patterns

The architecture for a visualization system needs to support complex interactions that arise due to (i) the user's inputs at the visualization, (ii) input data given to each pipeline, (iii) data inputs present at a stage, and (iv) negotiations that arise between multiple pipelines. To change the global nature of these interactions to interactions local to modules (footnote 2) in the architecture, our solution has three aspects. Each pipeline is hierarchically organized, masking lower-level interactions from other levels. The complex functionalities of stages are isolated and organized among multiple control layers. These two aspects have been discussed in section 2. We discuss the third aspect in this section: a set of Event-based Design Patterns, which are defined to enable each module to have decision-based local control of the paths for data and control flows within the hierarchy of stages and the control layers.

Footnote 2. We reserve the term module to refer to any component of the architecture. It can be a pipeline stage or it can be a component of a control layer.

3.1 Design Patterns for User Interaction Tasks

An event-based design pattern is defined to carry out a user interaction task or a data input. Each design pattern is a sequence of event invocations over modules within the architecture. Each module generates events locally based on the conditions present in the architecture. These conditions are made available to modules through the propagation of a set of properties. Each event type is associated with a set of properties. Based on the events received and the properties of those events, each module performs data flow tasks and fires a set of events to other modules in the architecture. Currently, VisCoAdapt contains several event-based design patterns that cover various interaction tasks.

Figure 4. Event-based design pattern for the Controller module to control stage.

An example of an event-based design pattern captures the control of data transformations by the Controller within a stage. This pattern has a sequence of event invocations: (i) the Controller module receives an event, (ii) the Controller module decides on the action to take based on the history of event invocations and property values, (iii) the Controller module sends an event to the stage, (iv) the stage applies the transformation on its input data, making the data available at the input of the succeeding stage, and finally (v) the Controller module sends an event to the succeeding Controller module. Figure 4 shows this event-based design pattern.

3.2 Design Patterns for Pipeline Creation

Figure 5. New pipeline creation by a sequence of events.

One of the user interaction tasks is to compose a complex visualization scenario by adding new scenarios to the existing scenarios. The user can invoke such a request using a user interaction tool at the visualization. This request is handled by a control layer called Pipeline Factories, as shown in figures 2 and 5. This layer stores all the factories required to create a pipeline. It creates a new pipeline by sending a sequence of replicating events to all the modules of the existing pipeline.

4 Experiments

This section presents a set of examples that illustrate the basic capabilities of the architecture of VisCoAdapt presented in section 2 and of the design patterns presented in section 3. The primary objective in all these experiments is to show that each user-interaction-based visualization task is realized through a sequence of event firings with localized decision making confined to modules of the architecture. The experiments presented make some simplifying assumptions without loss of generality. The experiments start with a visualization system having one pipeline. All the data is presented to the pipeline from the input data source (figure 1).
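For the single-pipeline setup just described, the Controller pattern of section 3.1 (steps (i) to (v)) might be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the VisCoAdapt implementation: the classes `StageEvent`, `Stage`, `Controller` and `ControllerPatternDemo`, the event type strings, and the `dataAvailable` property are all hypothetical, events are modeled as direct method calls, and the decision in step (ii) inspects only the event's properties while merely recording the event history.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical event: a type plus the properties propagated with it. */
final class StageEvent {
    final String type;
    final Map<String, Object> properties;

    StageEvent(String type, Map<String, Object> properties) {
        this.type = type;
        this.properties = properties;
    }
}

/** Hypothetical stage whose only function is to transform input into output. */
final class Stage {
    final String name;
    private final UnaryOperator<List<Double>> transformation;
    private Stage successor;
    private List<Double> input;
    private List<Double> output;

    Stage(String name, UnaryOperator<List<Double>> transformation) {
        this.name = name;
        this.transformation = transformation;
    }

    void setSuccessor(Stage successor) { this.successor = successor; }
    void setInput(List<Double> input) { this.input = input; }
    List<Double> output() { return output; }

    /** Step (iv): transform the input; the result appears at the successor's input. */
    void applyTransformation() {
        output = transformation.apply(input);
        if (successor != null) {
            successor.setInput(output);
        }
    }
}

/** Hypothetical Controller making a purely local decision for one stage. */
final class Controller {
    private final Stage stage;
    private final List<String> history = new ArrayList<>();
    private Controller next;

    Controller(Stage stage) { this.stage = stage; }

    void setNext(Controller next) { this.next = next; }
    List<String> history() { return history; }

    /** Step (i): the Controller receives an event. */
    void receive(StageEvent event) {
        history.add(event.type);
        // Step (ii): decide locally, here only from the event's properties.
        if (!Boolean.TRUE.equals(event.properties.get("dataAvailable"))) {
            return; // nothing to transform yet
        }
        // Steps (iii) and (iv): ask the stage to transform (a direct call here).
        stage.applyTransformation();
        // Step (v): notify the succeeding Controller, if there is one.
        if (next != null) {
            next.receive(new StageEvent("controlStage",
                    Map.<String, Object>of("dataAvailable", true)));
        }
    }
}

class ControllerPatternDemo {
    public static void main(String[] args) {
        // Two identity stages, mirroring the identity transformations used in the experiments.
        Stage selection = new Stage("DataSelection", UnaryOperator.identity());
        Stage projection = new Stage("ThreeDProjection", UnaryOperator.identity());
        selection.setSuccessor(projection);

        Controller first = new Controller(selection);
        Controller second = new Controller(projection);
        first.setNext(second);

        selection.setInput(List.of(0.0, 0.5, 1.0));
        first.receive(new StageEvent("inputData",
                Map.<String, Object>of("dataAvailable", true)));
        System.out.println(projection.output());
    }
}
```

Each Controller in the sketch knows only its own stage and its immediate successor, which is the sense in which the decision is local.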
In these experiments, the input data source of the pipeline presents geometry-rich data such as the knot vectors, control points and degrees of NURBS objects. The visualizations shown are at the display end of the pipeline (figure 1). The Data Selection, Data Space Transformation, and Three D Projection stages perform identity transformations. This does not diminish the illustrative potential of the experiments, as the specific functions used in data space transformations do not play any role in the abstraction required for the architecture.

4.1 Visualization without User Interactions

In this first experiment, the input data to the pipeline is the geometric information of a NURBS curve: a knot vector, control points, and the curve degree. Figure 6 shows the resulting visualization of the NURBS curve.

Figure 6. A NURBS Curve

The following is a sequence of steps of a design pattern that performs this visualization task.

1. Data becomes available at the input of the pipeline. This triggers an event, and the Controller of the Data Selection stage receives the event.
2. The Data Selection stage Controller fires an event requesting the Data Selection stage to apply the stage transformation on its input data.
3. The Data Selection stage applies the transformation on the input data. Since this stage has an identity transformation function, the input data is carried over to the output of the stage without any changes. Since the output of the Data Selection stage is connected to the input of the Data Space Transformation stage, the data becomes present at its input.
4. The Data Selection stage Controller sends an event to the Data Space Transformation stage Controller asking it to control its stage transformation.
5. The Data Space Transformation stage Controller receives the message that data is available at the input of its stage. It also receives an event from the Data Selection stage Controller.
6. Steps 2-5 are repeated for each of the stages and sub-stages succeeding the Data Selection stage, producing the display shown in figure 6.

In our second example, the input data source to the pipeline is the geometric data of a NURBS surface: U knot vector, V knot vector, Control Points, U degree and V degree. The design pattern consists of steps similar to those for the NURBS curve presented above, and upon execution produces the visualization of the NURBS surface shown in figure 7.

Figure 7. NURBS Surface and Control Patch.

4.2 User Interaction for Shape Modification

Shape modification is a user interaction task, and figure 8 illustrates its effect on a NURBS curve (figure 6) when a control point is moved. A new design pattern, called the Shape Modification Design Pattern, produces the necessary event sequence. When the user drags a control point, this design pattern is triggered and it starts a new sequence of events inside the pipeline.

Figure 8. Shape modification of NURBS curve.

1. The Control Point Drag tool generates an event containing the context data. It is propagated along the Adapters layer (figure 2).
2. The Adapter of the Shape Formation sub-stage recognizes it as relevant for its sub-stage. The Adapter triggers an event asking the Controller in the same Stage Layer (figure 2) to make the necessary changes.

3. The Controller fires an event to the Shape Formation sub-stage asking it to apply the transformation on the data received from the tool.
4. The resulting data is made available at the input of the Geometry Group sub-stage.
5. Steps 2-5 of the previous subsection are repeated for the sub-stages succeeding the Shape Formation sub-stage.
6. The resulting visualization shows the changes in the curve (shape) along with changes in the coordinate system.

Figure 9 shows the changes in the shape of a surface and its coordinate system, which are set up by the same sequence of steps as above.

Figure 9. Shape modification of NURBS surface

4.3 User Interaction based Visualization Composition

The final example illustrates how the design pattern sets up a composed visualization, v_c, from an existing visualization, v_1, and a new visualization, v_2, that meets some constraints. In this example, the user wants to add a visualization of a curve (v_2) that is 5 times larger than the curve in v_1 (figure 6). Each of {v_1, v_2} in v_c has a unique pipeline and is able to provide the same kinds of user interactions with the pipeline as v_1. The added visualizations interact with the starting visualization to maintain the relationship the user intended. That is, the scaling between the curves in both visualizations is maintained. Figure 10 shows the composed visualization.

We briefly summarize the steps that create the composition. These steps form the Visualization Composition Design Pattern. In this example, we use a simple implementation of a tool which allows a predefined set of constraints to be passed. Here, the constraints state that the new shape should obey a scaling factor of 5 with respect to the current shape.

1. The user selects the visualization creation tool and then sets up constraints. These constraints are passed to the Pipeline Factories layer (figure 5) of v_1.
2. This design pattern generates several events, shown in figure 5, which replicate multiple layers of the pipeline for v_1. This results in a new pipeline that can generate v_2. During this replication, the constraints are incorporated into the Negotiator between the pipelines.

Figure 10. Composite visualization.

5 Related Research

Building software engineering architectures to enable the visualization of vast amounts of data has been an active theme for research [8, 14, 16, 17, 19]. In order to make the visualizations effective for each user, they need to be personalized so that the user has access to the patterns hidden in the vast amounts of data. This personalization of visualizations is achieved in a number of ways. One of the approaches is to allow the user to create pipelines by combining computational components in a dataflow [3, 7, 8, 12, 13]. This approach requires that the user understand the computational components and the mechanism to interconnect them, and have a detailed knowledge of visualization.

Another novel approach enables the user to create visualizations by providing a comprehensive infrastructure that allows the application developer to explore collections of pipelines and combine them to create the applications [10]. It also captures detailed provenance of both application development and use [11]. VisCoAdapt shares these goals of providing a framework that the user can easily extend to build an application. It provides an infrastructure, and the user builds an initial setup of the application by extending a set of Java classes. The user is not required to have knowledge of the architecture. The user can build an application with knowledge of the algorithms and visualization attributes. For example, the user should be able to select a Machine Learning algorithm. The application can further be extended by the user at visualization time based on visualization requirements only.
Several efforts have tried to integrate visualizations with other tasks such as problem solving [1, 8]. In VisCoAdapt, the integration is between Machine Learning algorithms and visualization. The algorithms are a part of the pipeline; thus the creation of new pipelines enables new interactions between the modules of pipelines and the algorithms supported in those modules. The negotiators can be extended by the user, thus capturing a wide range of interactions between modules.

One of the requirements of a visualization system that enables the user to gain insights into data is support for interaction between various visualizations [6]. In VisCoAdapt, we address this problem through pipeline interactions by using a mechanism for negotiations. The extensibility of the negotiators by the users supports a wide range of interactions between pipelines.

6 Discussion and Conclusions

Data oriented visualization systems that visualize data from diverse and dynamically changing data spaces require a uniform mechanism to integrate diverse data sources. Further, the visualization systems are required to support a range of user interaction tasks. The design of such visualization systems faces a problem of global interactions between various components of the visualization system. To address this problem, we presented a new software architectural layer that has three features: a hierarchical dataflow architecture that integrates diverse data transformations and generations into stages of the pipelines; multiple layers of control to isolate and localize functionalities to modules; and several event-based design patterns that implement complex user interaction tasks as sequences of localized decision-based event invocations.

This three-part solution supports personalizing visualizations in dynamically emerging situations. The architecture breaks down the complex user interaction tasks into local tasks where specialized information provided by the user is applied. In personalization, the extensions provided by the user for Adapters and Negotiators play a significant role. For example, the Adapters decide the relevance of the given data to a Stage Layer. The Adapter can use simple rules or complex algorithms to make this decision. The Negotiators mediate between two pipelines in order to maintain the user-specified relationship between the pipelines. An important consequence of the extensibility of Adapters and Negotiators is that VisCoAdapt is highly scalable to various visualization tasks.
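The Adapter's relevance decision mentioned above can be pictured as a pluggable rule. The following Java sketch is an assumption-laden illustration rather than VisCoAdapt code: `ToolEvent`, `Adapter`, `AdaptersLayer`, `AdapterDemo` and the tool name `AppearanceEdit` are hypothetical, and only the simple-rule case (matching on the name of the originating tool) is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical event produced by a user interaction tool. */
final class ToolEvent {
    final String tool;
    final int controlPointIndex;

    ToolEvent(String tool, int controlPointIndex) {
        this.tool = tool;
        this.controlPointIndex = controlPointIndex;
    }
}

/** Hypothetical Adapter: decides whether an event is relevant to its Stage Layer. */
final class Adapter {
    final String stageName;
    private final Predicate<ToolEvent> relevanceRule;

    Adapter(String stageName, Predicate<ToolEvent> relevanceRule) {
        this.stageName = stageName;
        this.relevanceRule = relevanceRule;
    }

    boolean isRelevant(ToolEvent event) {
        return relevanceRule.test(event);
    }
}

/** Hypothetical Adapters layer along which tool events are propagated. */
final class AdaptersLayer {
    private final List<Adapter> adapters = new ArrayList<>();

    void add(Adapter adapter) {
        adapters.add(adapter);
    }

    /** Returns the names of the stages whose Adapters accepted the event. */
    List<String> propagate(ToolEvent event) {
        List<String> accepted = new ArrayList<>();
        for (Adapter adapter : adapters) {
            if (adapter.isRelevant(event)) {
                accepted.add(adapter.stageName);
            }
        }
        return accepted;
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        AdaptersLayer layer = new AdaptersLayer();
        // Simple rules: each Adapter matches on the name of the originating tool.
        layer.add(new Adapter("ShapeFormation",
                e -> e.tool.equals("ControlPointDrag")));
        layer.add(new Adapter("VisualizationCharacteristics",
                e -> e.tool.equals("AppearanceEdit")));

        System.out.println(layer.propagate(new ToolEvent("ControlPointDrag", 2)));
    }
}
```

Because the rule is passed in as a `Predicate`, a user extension could replace the string match with a more elaborate algorithm without changing the layer that propagates events.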
7 References

[1] Brodlie, K., Brankin, L., Poon, A., Banecki, G., Wright, H., and Gay, A. GRASPARC - A problem solving environment integrating computation and visualization. In Proceedings of IEEE Conference on Visualization (Oct 1993), 102-109.
[2] Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York, 2001.
[3] IBM. OpenDX. http://www.research.ibm.com/dx.
[4] Jolliffe, I.T. Principal Component Analysis. Springer, New York, 2002.
[5] jreality: http://www3.math.tu-berlin.de/jreality/
[6] Koop, D., Scheidegger, C.E., Callahan, S.P., Vo, H.T., Freire, J., and Silva, C.T. VisComplete: Automating Suggestions for Visualization Pipelines. IEEE Transactions on Visualization and Computer Graphics 14, 6 (2008), 1691-1698.
[7] Lee, E.A., and Parks, T.M. Dataflow Process Networks. Proceedings of the IEEE 83, 5 (1995), 773-801.
[8] Macleod, R., Weinstein, D., de St. Germain, J., Brooks, D., Johnson, C., and Parker, S. SCIRun/BioPSE: Integrated problem solving environment for bioelectric field problems and visualization. In Proceedings of the Int. Symp. on Biomed. Imag. (Arlington VA, April 2004), 640-643.
[9] Piegl, L., and Tiller, W. The NURBS Book, 2nd Edition. Springer, New York, 1997.
[10] Santos, E., Lins, L., Ahrens, J., Freire, J., and Silva, C. VisMashup: Streamlining the Creation of Custom Visualization Applications. IEEE Trans. Vis. Comp. Graph. 15, 6 (2009), 1539-1546.
[11] Silva, C.T., Freire, J., and Callahan, S.P. Provenance for Visualizations: Reproducibility and Beyond. Computing in Science and Engineering Journal 9, 5 (September 2007), 82-89.
[12] Silva, C.T., and Freire, J. Software Infrastructure for exploratory visualization and data analysis: Past, present, and future. Journal of Physics: Conference Series 125, 1, SciDac 2008 Conference (2008).
[13] The VisTrails Project. http://www.vistrails.org.
[14] Lee, J.A., and Verleysen, M. Nonlinear Dimensionality Reduction. Springer, New York, 2010.