Research on Incomplete Transaction Footprints in Networked Software

Similar documents
A 2k-Kernelization Algorithm for Vertex Cover Based on Crown Decomposition

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Sequences Modeling and Analysis Based on Complex Network

Research on Community Structure in Bus Transport Networks

Performance Assessment of DMOEA-DD with CEC 2009 MOEA Competition Test Instances

Towards New Heterogeneous Data Stream Clustering based on Density

Image segmentation based on gray-level spatial correlation maximum between-cluster variance

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings

Web Data mining-a Research area in Web usage mining

Analysis and Improvement of Encryption Algorithm Based on Blocked and Chaotic Image Scrambling

A Comparative Study of Association Mining Algorithms for Market Basket Analysis

Research Article Apriori Association Rule Algorithms using VMware Environment

Jessica Su (some parts copied from CLRS / last quarter s notes)

Intuitionistic Fuzzy Petri Nets for Knowledge Representation and Reasoning

A Test Sequence Generation Method Based on Dependencies and Slices Jin-peng MO *, Jun-yi LI and Jian-wen HUANG

An Indian Journal FULL PAPER. Trade Science Inc.

Lecture 11: Maximum flow and minimum cut

Copyright 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch.

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

An Agricultural Tri-dimensional Pollution Data Management Platform Based on DNDC Model

Framework Research on Privacy Protection of PHR Owners in Medical Cloud System Based on Aggregation Key Encryption Algorithm

Geometric-Edge Random Graph Model for Image Representation

A Balancing Algorithm in Wireless Sensor Network Based on the Assistance of Approaching Nodes

An Approach to Polygonal Approximation of Digital CurvesBasedonDiscreteParticleSwarmAlgorithm

Advanced Data Management

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

An Automatic Posture Planning Software of Arc Robot Based on SolidWorks API

manufacturing process.

The Establishment of Large Data Mining Platform Based on Cloud Computing. Wei CAI

A hierarchical network model for network topology design using genetic algorithm

Optik 124 (2013) Contents lists available at SciVerse ScienceDirect. Optik. jou rnal homepage:

Forward-backward Improvement for Genetic Algorithm Based Optimization of Resource Constrained Scheduling Problem

Block-based Thiele-like blending rational interpolation

An Adaptive Threshold LBP Algorithm for Face Recognition

and 6.855J March 6, Maximum Flows 2

Application of Nonlinear Later TV Edition in Gigabit Ethernet. Hong Ma

A New Evaluation Method of Node Importance in Directed Weighted Complex Networks

An Evolving Network Model With Local-World Structure

Study on GA-based matching method of railway vehicle wheels

Organization and Retrieval Method of Multimodal Point of Interest Data Based on Geo-ontology

2. Graphical Models. Undirected graphical models. Factor graphs. Bayesian networks. Conversion between graphical models. Graphical Models 2-1

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix

Reliability Problem on All Pairs Quickest Paths

Landslide Monitoring Point Optimization. Deployment Based on Fuzzy Cluster Analysis.

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Creating Time-Varying Fuzzy Control Rules Based on Data Mining

Study on Improving the Quality of Reconstructed NURBS Surfaces

Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis

THE REGULAR PERMUTATION SCHEDULING ON GRAPHS

A New Distance Independent Localization Algorithm in Wireless Sensor Network

A Graph Clustering Algorithm Based on Minimum and Normalized Cut

CMPSCI611: The SUBSET-SUM Problem Lecture 18

Job-shop scheduling with limited capacity buffers

Segmentation: Clustering, Graph Cut and EM

UML-Based Conceptual Modeling of Pattern-Bases

A PMU-Based Three-Step Controlled Separation with Transient Stability Considerations

Branch-and-bound: an example

Mathematical Analysis of Google PageRank

Research Article Optimization of Access Threshold for Cognitive Radio Networks with Prioritized Secondary Users

Automatic Clustering Old and Reverse Engineered Databases by Evolutional Processing

Graphs and Network Flows IE411. Lecture 13. Dr. Ted Ralphs

A Note on Polyhedral Relaxations for the Maximum Cut Problem

Study on Loading-Machine Production Scheduling Algorithm of Forest-Pulp-Paper Enterprise

An Information Hiding Algorithm for HEVC Based on Angle Differences of Intra Prediction Mode

CHAOTIC ANT SYSTEM OPTIMIZATION FOR PATH PLANNING OF THE MOBILE ROBOTS

LEAST COST ROUTING ALGORITHM WITH THE STATE SPACE RELAXATION IN A CENTRALIZED NETWORK

Artificial Intelligence

An Agricultural Tri-dimensional Pollution Data Management Platform based on DNDC Model

Application of Individualized Service System for Scientific and Technical Literature In Colleges and Universities

Grid Resources Search Engine based on Ontology

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Improvements and Implementation of Hierarchical Clustering based on Hadoop Jun Zhang1, a, Chunxiao Fan1, Yuexin Wu2,b, Ao Xiao1

The Gene Modular Detection of Random Boolean Networks by Dynamic Characteristics Analysis

A SDN-like Loss Recovery Solution in Application Layer Multicast Wenqing Lei 1, Cheng Ma 1, Xinchang Zhang 2, a, Lu Wang 2

Research Article QOS Based Web Service Ranking Using Fuzzy C-means Clusters

Iterative Removing Salt and Pepper Noise based on Neighbourhood Information

Flow equivalent trees in undirected node-edge-capacitated planar graphs

An embedded system of Face Recognition based on ARM and HMM

Efficient Algorithm for the Paired-Domination Problem in Convex Bipartite Graphs

Controllability of Complex Power Networks

The Key Technology and Algorithm Design for the Development of Intelligent Examination System

Binary Relations McGraw-Hill Education

Construction and Application of Cloud Data Center in University

AN EFFICIENT DEADLOCK DETECTION AND RESOLUTION ALGORITHM FOR GENERALIZED DEADLOCKS. Wei Lu, Chengkai Yu, Weiwei Xing, Xiaoping Che and Yong Yang

Tasks Scheduling using Ant Colony Optimization

UAV Motion-Blurred Image Restoration Using Improved Continuous Hopfield Network Image Restoration Algorithm

Two Algorithms of Image Segmentation and Measurement Method of Particle s Parameters

Definition 2.3: [5] Let, and, be two simple graphs. Then the composition of graphs. and is denoted by,

Detect tracking behavior among trajectory data

1. Lecture notes on bipartite matching

Open Access Research on the Data Pre-Processing in the Network Abnormal Intrusion Detection

COMP 251 Winter 2017 Online quizzes with answers

A Row-and-Column Generation Method to a Batch Machine Scheduling Problem

Ontology Molecule Theory-based Information Integrated Service for Agricultural Risk Management

732A54/TDDE31 Big Data Analytics

Graph Algorithms (part 3 of CSC 282),

A new fractal algorithm to model discrete sequences

Design of Liquid Level Control System Based on Simulink and PLC

Constructive and destructive algorithms

Computational problems. Lecture 2: Combinatorial search and optimisation problems. Computational problems. Examples. Example

Transcription:

Research Journal of Applied Sciences, Engineering and Technology 5(24): 5561-5565, 2013 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scientific Organization, 2013 Submitted: September 30, 2012 Accepted: December 12, 2012 Published: May 30, 2013 Research on Incomplete Transaction Footprints in Networked Software 1 Junfeng Man, 2 Cheng Peng, 3 Qianqian Li and 1 Changyun Li 1 College of Computer and Communication, Hunan University of Technology, Zhuzhou 412007, 2 School of Information Science and Engineering, Central South University, Changsha 410083, 3 Department of Information and Engineer, Hunan Chemical Vocation Technology College, Zhuzhou 412004, China Abstract: In networked software, interactive behaviors of software entities generate lots of behavioral footprints, some of them may lose tokens or tokens are useless. The paper studies the constructing process of State Transition Model (STM) in which the process of incomplete transactions are satisfied with Markov property, it is pointed that the STM are originally extracted from behavior log files generated by the runtime behaviors of networked software. The contributions of this paper are to mark the partly tokenized behavior footprints in the STM through Maximum Flow (MF) algorithm, then find the original source for each behavior footprint. The experiment results indicate that the maximum flow algorithm can accurately turn the partly tokenized behavior into complete footprint sequences. Keywords: Behavior footprints, incomplete transaction maximum flow algorithm, networked software, state transition model INTRODUCTION The emergence of the Internet has made software from the static, closed and controllable environment into open, dynamic and uncontrollable condition, in order to adapt to this trend, traditional centralized software system gradually turns into the distributed networked software (Yang et al., 2002); the dynamic changes of self state and interactive environment of the runtime networked software is called dynamic evolution. Software dynamic evolution is the process of online adjustment in order to reach the hope state, which happens in runtime, consists of a series of complex activities, e.g., dynamic updating, adding or deleting software components, system structure dynamic configuration etc (Kramer and Magee, 1990). However, we only consider the changes of software entities structure, seldom notice the running state of software entities as individuals in interactive activities and the interaction behaviors they generate. At present, the software failure and fault are becoming more and more serious, not only increase the cost of use and maintenance, but also make people lost the confidence for software (Chen et al., 2003). The recent studies show that fault and failure of software are largely due to the software behaviors can not meet the user expectation. We can get the original message of software behavior (Li et al., 2009) through monitoring, but current monitoring technology cannot enhance the software reliability at once, still need further effectively analysis of the software interaction. Generally, we can use the correlator which is the open-group ARM instrumentation or token to track transaction circulation. But, many transactions are often executed in parallel in networked environment, such events generated by these transactions mix together, which results in the tokens of transaction behavior footprints lost or can not be used, thus software system may not accurately and correctly identify the only source of every footprint. These partly tokenized behavior footprints cause great obstacles for subsequent behavior analysis, behavior prediction and other operation. Therefore, tokenizing the interaction entity behavior is in the important position for guaranteeing the reliability of networked software. STATE TRANSACTION MODEL The concept and definition: Let f x (x) be the Probability Density Function (PDF) of a continuous random variable X and FX ( x): = PX [ > x] its complementary cumulative distribution function (CCDF). For an undirected graph G(V, E), if V = X Y and X Y = Φ, such that every edge E connects a vertex in X to one in Y, G is bigraph, denoted as G = (X, Y, E). For a directed acyclic graph G = (V, E, C), V denotes vertex set, E denotes directed edge set and Corresponding Author: Junfeng Man, College of Computer and Communication, Hunan University of Technology, Zhuzhou 412007, China 5561

Fig. 1: Discover state transition model denotes capacity, then G is called network flow graph (Zhang et al., 2003). If C ij denotes the allowed maximum flow from V i to V j, x ij denotes the real flow at this arc, there is 0 x ij C ij. In the network, except for the start and the end vertices, any other vertices, the total of inflow is equal to the total of the outflow, namely, x ij = x ji, i st,. At this time, f = x ij is known as feasible flow. The maximum network flow can be described as: The monitoring information is inputs for: Online system log files, where footprints left by ongoing transaction instances are being recorded The transaction model discovered through offline processing. Using these inputs, the monitoring system generates dynamic execution profiles of ongoing transaction instances that allow their status to be tracked at individual and aggregate levels. x ji x max fs. t. j i 0 xij Cij ij f, i = s = 0, i s, t f, i = t A Markov chain is a sequence of random variables X 1, X 2, X 3,... with the Markov property, namely that, given the present state, the future and past states are independent (Lee and Ho, 2006). Let S i, I = 0,, N s denote the i th state of the process and let T i,j denote the (random) time to transition from state S i to state S j. A process is said to be semi-markov if the sequence of states visited is a Markov chain, with transition probability matrix P = [P(i, j)] and each transition time T i, j is a random variable that depends only on the states S i and S j involved in the transition. (1) Discover state transition model: Figure 1 depicts the process of discovering state transition model from the history log files. Our approach consists of two phases: off-line processing and online monitoring, the latter refers to on online monitoring of transaction instances. 5562 State creation: Analyzing the records in the log files and creating candidate states will typically involve some form of clustering. For each new log record to be processed, the record is tokenized and placed in a bucket of records with the same number of words; we refer to this number as the dimensionality n of the record. This process creates a partitioning of the sample space of log records into sample subspaces Ω n n = 1,, N, of log records. As shown in Fig. 2. Model discovery: State transition follows Markov, that s to say, the transition probability of each transaction instance under each state, only depends on the current state and the next state. As shown in Fig. 3, except the behavior footprints at state A has tokens (express by different pictures), others behavior footprints at the intermediate state only partly tokenized, the aim of the paper is to find the original source for each behavior footprint that lost token. The transition model is a directed acyclic graph (DAG), which consists of a start states set and an end states set. At start state, the transaction goes into the model; at the end state, the transaction quits the model. The transition time between each state is Independent

Fig. 2: State creation Fig. 3: The instance of state transition model and Identically Distributed (I.I.D.). Each state S k in the model, any valid matching can be denoted by the set of permutation vector π k. Set Y k denote footprint timestamp vector of state S k. When the monitoring engine received behavior footprints by the correct time order, Y k ordered by the ascending, so the latest events arranged at the rear. According to permutation vector π k, let Y πk k denote the permutation of Y k. For convenience, at start state, we assign tokens to footprints Y 0. When the Probability Density Function (PDF) of transaction transition time f T is known, we could transfer the problem of quantitative tracking into figure out the probability of all instances properly match their footprints by MLR. So, for N S+1 states, transaction ML tracing simplified as finding the N s sets of permutation vector: set of successors, So the multi-state system model described in the formula (1) can be transformed in: π π N S (,..., Y Y ) P Y 1 1 NS 0 = m> 0 P π k Yk Sk Bm π l l S P( B ) l Y m (3) Here define a partition of states (B 1,, B m ), m 0 and B 0 = S 0. States in any two sets in the partition do not share a common immediate predecessor, it is p(s k ) p(s l ) = Φ, Sk Bm, Sl B j, m, j 0. So, bipartite graph can be described as (p(b m ), B m ), where start state is p(b m ), end state is B m. According to the definition of state partition, all bipartite graphs are disjoint, a state cannot occur in two systems. arg max ML ML π π 1 N s 1 N PY s 1 YN Y s 0 π1,..., πn s [ ˆˆ π,..., π ]: = (,..., ) (2) IDENTIFY PARTLY TOKENIZED TRANSACTION As the transition time of each transaction is Semi- Markov Process (SMP), the transition time of different transactions is also independent, that is to say, the transition time between states only depends on the current state and the next state, according to this, we can splice the multi-state system into two parts: the first one is the set of predecessors and the other one is the 5563 It s easy to prove that the maximal matching of bigraph G = (X, Y, E). is corresponding to the maximal network flow of graph GG = (s, t, X, Y, E, C)., C is capacity set, as is shown in Fig. 4. When the network flow achieve maximum, if the capacity at (x i, y j ) is one, which means x i and y j is the perfect matching. We

s X Y t {v outi /i = 1, n} when departing state and node set V in = {v ini /i = 1, n} when entering state, n is the total number of all behavior footprints in a single state set of bigraph subsystem. For any node v outi and v in j, if satisfied with the transition time limit between two states, which means footprint i and footprint j form the matching, link the node v outi and v in,j with directed arc, we get connection arc set: E c { e e = ( v, v ), t t t t, i, j i j} =, ij (4) ij outi inj min inj outi max Fig. 4: Transfer bipartite into network flow diagram adopt the algorithm of Ford-Fulkerson to solve the problem of bigraph maximal matching, this algorithm is introduced in literature 9. Note the start time and end time in the two state sets of bigraph subsystem as t out and t in separately and expressed by node v out and v in, we get node set V out = Combining with the three bipartite subsystems and Table 1 mentioned above, we construct three network flow diagrams. Constructing arc sets E c according to formula (4), the state transition time lower limit is t min = 5sec, upper limit is t max, we calculate the state transition time obey index distribution, uniform distribution and normal distribution situation, finally, we get a network flow diagram corresponding to the bipartite matching. Table 1: Footprints and corresponding timestamp Footprints ----------------------------------------------------------------------------------------------------------------------------------------------------------- Time stamp TP01 TP02 TP03 TP04 TP05 TP06 TP07 TP08 Start time 204018 204015 204010 204032 204035 204030 204208 204210 End time 204030 204023 204020 204140 204130 204120 204220 204221 TP09 TP10 TP11 TP12 TP13 TP14 TP015 TP16 Start time 204230 204233 204218 204410 204400 204603 204340 204610 End time 204340 204322 204350 204523 204510 204710 204520 204715 Fig. 5: Maximum network flow matching 5564

instance under networked environment, combined with monitored log files from the interaction, then remark the partly tokenized footprints generated by each state. Using the state splitting algorithm to divide the state transition model into several bigraph subsystems and transfer each subsystem into network flow graph, matching every footprint under each state, finally, forming the complete footprint sequences. The results of simulation analysis show that our method is feasible. ACKNOWLEDGMENT Fig. 6: Algorithm performance comparison We adopt the network flow method to deal with the matching of the bipartite subsystems in the state transition model; finally, splice the matching results of the three subsystems and form three complete behavior footprint sequences, FP01-FP05-FP08-FP10-FP15, FP02-FP06- FP11-FP12-FP14 and FP03-FP04-FP07- FP09-FP13-FP16. In this way, we find the original source of the footprints that are not tokenized. The simulation experiment results are showed in Fig. 5. The best time complexity is O(n(m+n log n)), which is strongly polynomial algorithm, n and m denote nodes and edges separately. By adopting maximum weight matching algorithm, no matter the time complexity and space complexity can get the good results. In this paper, we use the algorithm of maximum network flow, the time complexity is O(nm), the two algorithms performance comparison is showed in Fig. 6. Under the same experiment environment, with the nodes in the state transition model increasing, the algorithm we proposed is better than the traditional bipartite maximum weight matching algorithm in dealing with the problem of tokenizing partly tokenized behavior footprints. CONCLUSION This study researches on the problem of extracting state transition model from E-commerce trade platform This study is supported by the National Natural Science Foundation of China under grant No. 61171192 and 61170102, the Natural Science Foundation of Hunan province in China under grant No. 11JJ4050 and 11JJ3070, the Education Department Foundation of Hunan Province under the grant No. 11B039, 11W002 and 11C0400. REFERENCES Chen, H.W., J. Wang and W. Dong, 2003. High confidence software engineering technologies [J]. Acta Electron. Sinica, 31(12): 1933-1938. Kramer, J. and J. Magee, 1990. The evolving philosophers problem: Dynamic change management [J]. IEEE T. Software Eng., 16(11): 1-33. Lee, C.K.M. and G.T.S. Ho, 2006. A dynamic information schema for supporting product life cycle management [J]. Expert Syst. Appl., 31(1): 30-40. Li, R.J., Z.X. Zhang, H.Y. Jiang and H.M. Wang, 2009. Research and implementation of trusted software constitution based on monitoring [J]. Appl. Res. Comp., 26(21): 4585-4588. Yang, F.Q., H. Mei, J. Lv and Z. Jin, 2002. Some discussion on the development of software technology [J]. Acta Electron. Sinica, 30(z1): 1901-1906. Zhang, X.C., G.L. Chen and Y.Y. Wan, 2003. Research on the maximum network flow problem [J]. J. Comput. Res. Dev., 40(9): 1281-1291. 5565