An Entropy-Based Approach to Integrated Information Needs Assessment

Distribution Statement A: Approved for public release; distribution is unlimited.

June 8, 2004

William J. Farrell
Lockheed Martin Advanced Technology Laboratories
Cherry Hill, New Jersey 08002
wfarrell@atl.lmco.com

ABSTRACT

With an overload of sensory inputs, fusion processing must ultimately be scoped based upon the requirements of the consumer of the fused data. This idea, called computational steering, allows the fusion system to process only the type of information relevant to the consumer's needs. Simple approaches include filtering based on spatial proximity, latency, and data source. Although these methods are useful, the amount of data left for processing after applying these filtering methods may still be enormous. An integrated method for assessing the needs of a fusion process is required to make sure the informative data is processed first. Lockheed Martin Advanced Technology Laboratories is developing an entropy-based approach to identifying information needs so that computational steering can be performed in an intelligent manner. Using the expected reduction in entropy, data is dynamically selected for fusion processing. As a result, the maximum discrimination gain is obtained each time data is consumed by the fusion process. This entropy-based approach ensures that fusion processes handle the most informative data first while minimizing the processing of less informative data.

1. Motivation

With the rapid growth of deployable sensor assets and network connectivity, the battle space has become inundated with information. As a result, the fusion community is faced with increasing amounts of heterogeneous information to process. In the past decade, efforts have been made to reduce the computational load for Level 1 Fusion systems by processing only the data that is necessary and relevant. In some cases, attempts have been made to quantify the value of the available data so that the most informative data is processed first. This concept is called Information Needs Assessment.

In the Level 1 Fusion context, Information Needs Assessment takes the form of real-time sensor cueing [1,2,3]. The success of Level 1 Fusion approaches to Information Needs Assessment has inspired Lockheed Martin Advanced Technology Laboratories (ATL) to develop analogous approaches to the Level 2 and 3 Fusion problem space.

Currently, Level 2 and 3 Fusion approaches typically involve the use of inference methods such as Bayes and Belief Networks. Since these inference techniques are computationally complex and consume enormous amounts of data, Information Needs Assessment is critical. This paper presents an information theoretic approach to Information Needs Assessment for use in inference networks.

2. Background Concepts

This section introduces the definitions and concepts that will be employed throughout the remainder of this paper. First, a quantitative measure, called discrimination gain, is introduced. Second, an attribute-oriented database is presented. In Section 3, both of these information theoretic concepts are combined to derive the Information Needs Assessment algorithm.

2.1 Discrimination Gain

In the field of Information Theory and Statistics, a quantity called the Kullback-Leibler (KL) divergence is often used to assess information content [4]. This measure is often referred to as the discrimination gain and generally has two interpretations. The first interpretation states that the KL divergence quantifies the amount of information per observation for discriminating between two hypotheses. The second interpretation states that the KL divergence is a measure of the distance between two probability density functions. In this paper, the second interpretation is adopted. The Kullback-Leibler divergence is given in discrete form:

$$D(P \,\|\, Q) = \sum_{x} P(x) \ln \frac{P(x)}{Q(x)} \qquad (1)$$

If we want to determine the distance between a probability density function that assumes complete statistical independence and one that assumes some degree of statistical correlation, Equation 1 can be used to arrive at:

$$D(X_i, X_j) = \sum_{x_i, x_j} P(x_i, x_j) \ln \frac{P(x_i, x_j)}{P(x_i)\,P(x_j)} \qquad (2)$$

This is called the mutual information between random variables $X_i$ and $X_j$.

Equations 1 and 2 suffer from a bias: in general, the discrimination gain is larger for random variables that have more values. In order to mitigate this bias, Equation 2 is normalized by the Shannon entropy [4]:

$$H(X_i, X_j) = -\sum_{x_i, x_j} P(x_i, x_j) \ln P(x_i, x_j) \qquad (3)$$

This yields the Information Gain Ratio:

$$D_{ratio}(X_i, X_j) = \frac{D(X_i, X_j)}{H(X_i, X_j)} \qquad (4)$$

2.2 Attribute-Oriented Database

Historically, databases have been oriented around storage indexed by data type. For example, the database for a Level 1 Fusion system typically stores data indexed by track number. This paradigm makes it difficult to assess the value of information maintained in the database. As Level 2 and 3 Fusion systems tend to implement inference techniques based upon attributes of the data, it is more effective to store information indexed by attribute. For example, instead of indexing tracks by track number, the attribute-oriented approach indexes tracks by the track's attributes, such as location and velocity. The attribute-oriented approach simplifies Information Needs Assessment.

Figure 1 illustrates a table in an attribute-oriented database for attribute A. The columns in the table indicate the data type $C_k$ having attribute A. The rows indicate the values $V_j$ (or ranges) for attribute A. Finally, $q_{jk}$ indicates the number of entries within the database of type $C_k$ having value $V_j$.

    Value    C1     C2     ...    Cn     Total
    V1       q11    q12    ...    q1n    q1+
    V2       q21    q22    ...    q2n    q2+
    V3       q31    q32    ...    q3n    q3+
    ...      ...    ...    ...    ...    ...
    Vm       qm1    qm2    ...    qmn    qm+
    Total    q+1    q+2    ...    q+n    q++

    Cn: class n having attribute A. Vm: value m of attribute A.
    qmn: number of instances of class n having value m for attribute A.

Figure 1. Example Table in an Attribute-Oriented Database

Here, the values $V_j$ can represent discrete values, discrete ranges of continuous values, or fuzzy sets. As information is entered into the database, each attribute table is updated appropriately. The values maintained within this attribute table are used to compute Information Gain Ratios for Information Needs Assessment.

3. Information Needs Assessment

This section presents an Information Needs Assessment (INA) algorithm applied to inference networks. Given the current state of an inference network, the best information is identified for subsequent processing. This process is repeated over time in an attempt to minimize the computation required to evaluate a target node within the inference network. The high-level procedure is as follows:

1. Select Target Node: select a node within the inference network for updating
2. Rank Children Nodes: determine which of the target's children nodes are the most informative
3. Iterate: iterate steps 1 and 2 until an input node is reached
4. Select Input Node: select the most informative input node

A portion of an inference network (Figure 2) is used to illustrate the INA algorithm.

[Figure 2. Portion of an Inference Network. Nodes shown: A, B, C, E, F. Legend: Previously Activated Node, Target Node, Child Nodes, Input Node.]

The first step in the INA algorithm is to select a target node (C) within the inference network. The target node may be selected in several ways depending upon the application. A target node may be driven by a user's request or by considering a node's confidence. In general, several target nodes may be selected, in which case a global optimization over all target nodes is required. However, in this paper, the algorithm is presented for a single target node.

Once a target node is selected, the children nodes (A and B) are examined. First, the children nodes are examined to determine the reachability of input nodes. If children nodes are reachable by input nodes without data, then these children nodes should not be considered. With the remaining children nodes, the Information Gain Ratio (Equation 4) is computed pair-wise between the target node and each of the children nodes. This value allows the children nodes to be ranked from most informative to least informative.

In general, evaluating the Information Gain Ratio is non-trivial. To simplify the expression, the law of conditional probabilities [5] is applied to Equation 2 as follows:

$$D(A, C) = \sum_{a, c} P(a, c) \ln \frac{P(a, c)}{P(a)\,P(c)} \qquad (5)$$

$$D(A, C) = \sum_{a, c} P(c \mid a)\,P(a) \ln \frac{P(a \mid c)}{P(a)} \qquad (6)$$

Equation 6 is not readily computable since the distribution of $a$ is not known. However, the current state of the entire inference network is known. Using the current state, backward inference is performed to approximate the distribution of children nodes. If backward propagation does not lead to activation of a child node, a diffuse prior may be assumed. The diffuse prior assumption will tend to favor nodes that have not been activated over ones that have been activated through backward inference. Applying the estimate of the child node's probability distribution $\hat{P}(a)$, the Information Gain Ratio for the target-child pair is given by:

$$D_{ratio}(A, C) = \frac{\sum_{a, c} P(c \mid a)\,\hat{P}(a) \ln \dfrac{P(a \mid c)}{\hat{P}(a)}}{-\sum_{a, c} P(c \mid a)\,\hat{P}(a) \ln \left( P(c \mid a)\,\hat{P}(a) \right)} \qquad (7)$$

Figure 3 illustrates steps 1 and 2 of the INA algorithm.

[Figure 3. Computing Information Gain Ratio for Each Target-Child Pair using Backward Inference Estimates. The links from target node C to child nodes A and B are labeled $D_{ratio}(A, C)$ and $D_{ratio}(B, C)$. Legend: Previously Activated Node, Target Node, Child Nodes, Input Node.]

[Figure 4. Cumulative Information Gain Ratio Computation for Iterative Branch Selection. Node B is marked as the new target node for the next iteration, and its input nodes carry cumulative sums such as $D_{ratio}(B, F) + D_{ratio}(B, C)$. Legend: Previously Activated Node, Target Node, Child Nodes, Input Node.]

Now that the children nodes are ranked, the n most informative nodes will be selected as target nodes for further consideration. As the Information Gain Ratios are computed through the inference network, a cumulative total is maintained and subsequent target nodes are selected. Figure 4 illustrates this iterative process through the inference network.

Continuing the INA process with step 3 (see above), suppose that node B has the highest Information Gain Ratio and is selected as the new target node. The children nodes of node B are now input nodes. The Information Gain Ratio for the target-child pairs is computed using the quantities maintained in the attribute table of the database (Figure 1). The input nodes corresponding to the n highest cumulative Information Gain Ratios are selected for processing.

The INA approach outlined above is generally independent of the type of inference network used. If the network is a Bayes network, for example, the forward and backward conditional probabilities in Equation 7 are readily available. The INA approach is generally applicable to inference networks that define both forward and backward inference.

4. Conclusions

This paper has presented an Information Needs Assessment algorithm that is applicable to inference networks. In general, as long as both forward and backward inference computations are defined, this algorithm may be applied.

Future work in Information Needs Assessment for inference networks will focus on optimization over multiple target nodes within the network. In particular, linear programming may be applicable, with the cumulative Information Gain Ratios serving as a cost function.

References

[1] Sarunic, P.W. Adaptive Variable Update Rate Target Tracking for a Phased Array Radar, IEEE Int. Radar Conf., May 1995, pp. 317-322.
[2] Overfield, B., and Fung, R. A Decision Theoretic Sensor Management Architecture for Advanced Fighter Aircraft, Proc. 9th National Symposium Sensor Fusion, Mar. 1996, pp. 387-395.
[3] Kastella, K. Discrimination Gain to Optimize Detection and Classification, IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-27, Part A, No. 1, Jan. 1997, pp. 112-116.
[4] Kullback, S. Information Theory and Statistics, Dover, 1997.
[5] Saeed Ghahramani. Fundamentals of Probability, Prentice Hall, 1996.
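
Illustrative sketch (not part of the original paper). The Information Gain Ratio of Section 2.1 can be evaluated directly from the counts held in an attribute table such as the one in Figure 1. The Python sketch below assumes the table is stored as a list of rows, one row per value and one column per class, and treats "value" and "class" as the two random variables of the mutual information and joint entropy. The function names `information_gain_ratio` and `rank_attributes`, the example attribute names, and all counts are hypothetical and chosen only for illustration.

```python
import math


def information_gain_ratio(counts):
    """Mutual information divided by joint entropy for a table of counts.

    counts[j][k] is assumed to hold the number of database entries of
    class k having value j (rows are values, columns are classes).
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        return 0.0  # empty table: nothing can be said about it

    row_totals = [sum(row) for row in counts]        # totals per value
    col_totals = [sum(col) for col in zip(*counts)]  # totals per class

    mutual_info = 0.0
    joint_entropy = 0.0
    for j, row in enumerate(counts):
        for k, q in enumerate(row):
            if q == 0:
                continue  # a zero cell contributes nothing (0 * ln 0 -> 0)
            p_jk = q / total
            p_j = row_totals[j] / total
            p_k = col_totals[k] / total
            mutual_info += p_jk * math.log(p_jk / (p_j * p_k))
            joint_entropy -= p_jk * math.log(p_jk)

    if joint_entropy == 0.0:
        return 0.0  # all mass in one cell: the ratio is undefined, report 0
    return mutual_info / joint_entropy


def rank_attributes(tables):
    """Return (attribute name, ratio) pairs, most informative first."""
    scored = [(name, information_gain_ratio(t)) for name, t in tables.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical attribute tables with made-up counts.
    example_tables = {
        "location": [[5, 0], [0, 5]],  # value fully determines class
        "velocity": [[1, 1], [1, 1]],  # value says nothing about class
    }
    for name, ratio in rank_attributes(example_tables):
        print(name, round(ratio, 3))
```

Zero cells are skipped rather than smoothed, which is one possible design choice; a table whose counts all fall in a single cell has zero joint entropy, and the sketch returns 0 in that case instead of dividing by zero.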