An Entropy-Based Approach to Integrated Information Needs Assessment

Distribution Statement A: Approved for public release; distribution is unlimited.

June 8, 2004

William J. Farrell
Lockheed Martin Advanced Technology Laboratories
Cherry Hill, New Jersey 08002
wfarrell@atl.lmco.com

ABSTRACT

With an overload of sensory inputs, fusion processing must ultimately be scoped based upon the requirements of the consumer of the fused data. This idea, called computational steering, allows the fusion system to process only the type of information relevant to the consumer's needs. Simple approaches include filtering based on spatial proximity, latency, and data source. Although these methods are useful, the amount of data left for processing after applying these filtering methods may still be enormous. An integrated method for assessing the needs of a fusion process is required to make sure the most informative data is processed first. Lockheed Martin Advanced Technology Laboratories is developing an entropy-based approach to identifying information needs so that computational steering can be performed in an intelligent manner. Using the expected reduction in entropy, data is dynamically selected for fusion processing. As a result, the maximum discrimination gain is obtained each time data is consumed by the fusion process. This entropy-based approach ensures that fusion processes are processing the most informative data, minimizing the processing of less informative data.

1. Motivation

With the rapid growth of deployable sensor assets and network connectivity, the battle space has become inundated with information. As a result, the fusion community is faced with increasing amounts of heterogeneous information to process. In the past decade, efforts have been made to reduce the computational load for Level 1 Fusion systems by processing only the data that is necessary and relevant. In some cases, attempts have been made to quantify the value of the available data so that the most informative data is processed first. This concept is called Information Needs Assessment.
In the Level 1 Fusion context, Information Needs Assessment takes the form of real-time sensor cueing [1,2,3]. The success of Level 1 Fusion approaches to Information Needs Assessment has inspired Lockheed Martin Advanced Technology Laboratories (ATL) to develop analogous approaches to the Level 2 and 3 Fusion problem space.

Currently, Level 2 and 3 Fusion approaches typically involve the use of inference methods such as Bayes and Belief Networks. Since these inference techniques are computationally complex and consume enormous amounts of data, Information Needs Assessment is critical. This paper presents an information theoretic approach to Information Needs Assessment for use in inference networks.

2. Background Concepts

This section introduces the definitions and concepts that will be employed throughout the remainder of this paper. First, a quantitative measure, called discrimination gain, is introduced. Secondly, an attribute-oriented database is presented. In Section 3, both of these information theoretic concepts are combined to derive the Information Needs Assessment algorithm.

2.1 Discrimination Gain

In the field of Information Theory and Statistics, a quantity called the Kullback-Leibler (KL) divergence is often used to assess information content [4]. This measure is often referred to as the discrimination gain and generally has two interpretations. The first interpretation states that the KL divergence quantifies the amount of information per observation for discriminating between two hypotheses. The second interpretation states that the KL divergence is a measure of the distance between two probability density functions. In this paper, the second interpretation is adopted. The Kullback-Leibler divergence is given in discrete form:

$$D(P \,\|\, Q) = \sum_{x} P(x) \ln \frac{P(x)}{Q(x)} \tag{1}$$

If we want to determine the distance between a probability density function that assumes complete statistical independence and one that assumes some degree of statistical correlation, Equation 1 can be used to arrive at:

$$D(X_1; X_2) = \sum_{x_1, x_2} P(x_1, x_2) \ln \frac{P(x_1, x_2)}{P(x_1)\,P(x_2)} \tag{2}$$

This is called the mutual information between random variables $X_1$ and $X_2$.

Equations 1 and 2 suffer from a bias. In general, the discrimination gain is larger for random variables that have more values. In order to mitigate this bias, Equation 2 is normalized by the Shannon entropy [4]:

$$H(X_1, X_2) = -\sum_{x_1, x_2} P(x_1, x_2) \ln P(x_1, x_2) \tag{3}$$

This yields the Information Gain Ratio:

$$D_{ratio}(X_1; X_2) = \frac{D(X_1; X_2)}{H(X_1, X_2)} \tag{4}$$

2.2 Attribute-Oriented Database

Historically, databases have been oriented around storage indexed by data type. For example, the database for a Level 1 Fusion system typically stores data indexed by track number. This paradigm makes it difficult to assess the value of information maintained in the database. As Level 2 and 3 Fusion systems tend to implement inference techniques based upon attributes of the data, it is more effective to store information indexed by attribute. For example, instead of indexing tracks by track number, the attribute-oriented approach indexes tracks by the track's attributes such as location and velocity. The attribute-oriented approach simplifies Information Needs Assessment.

Figure 1 illustrates a table in an attribute-oriented database for attribute A. The columns in the table indicate the data type $C_k$ having attribute A. The rows indicate the values $V_i$ (or ranges) for attribute A. Finally, $q_{ik}$ indicates the number of entries within the database of type $C_k$ having value $V_i$.

                 C1     C2     ...    Cn     Total V
    V1           q11    q12    ...    q1n    q1+
    V2           q21    q22    ...    q2n    q2+
    V3           q31    q32    ...    q3n    q3+
    ...          ...    ...    ...    ...    ...
    Vm           qm1    qm2    ...    qmn    qm+
    Total C      q+1    q+2    ...    q+n    q++

    Cn:  class n having attribute A
    Vm:  value m of attribute A
    qmn: number of instances of class n having value m of attribute A

Figure 1. Example Table in an Attribute-Oriented Database

Here, the values V can represent discrete values, discrete ranges of continuous values, or fuzzy sets. As information is entered into the database, each attribute table is updated appropriately. The values maintained within this attribute table are used to compute Information Gain Ratios for Information Needs Assessment.

3. Information Needs Assessment

This section presents an Information Needs Assessment (INA) algorithm applied to inference networks. Given the current state of an inference network, the best information is identified for subsequent processing. This process is repeated over time in an attempt to minimize the computation required in order to evaluate a target node within the inference network. The high-level procedure is as follows:

1. Select Target Node: select a node within the inference network for updating
2. Rank Children Nodes: determine which of the target's children nodes are the most informative
3. Iterate: iterate steps 1 and 2 until an input node is reached
4. Select Input Node: select the most informative input node

A portion of an inference network (Figure 2) is used to illustrate the INA algorithm.

[Figure 2: inference network diagram showing nodes C, E, A, B, and F. Legend: Previously Activated Node, Target Node, Child Nodes, Input Node.]

Figure 2. Portion of an Inference Network

The first step in the INA algorithm is to select a target node (C) within the inference network. The target node may be selected in several ways depending upon the application. A target node may be driven by a user's request or by considering a node's confidence. In general, several target nodes may be selected, in which case a global optimization over all target nodes is required. However, in this paper, the algorithm is presented for a single target node.

Once a target node is selected, the children nodes (A and B) are examined. First, the children nodes are examined to determine the reachability of input nodes. If children nodes are reachable by input nodes without data, then these children nodes should not be considered. With the remaining children nodes, the Information Gain Ratio (Equation 4) is computed pair-wise between the target node and each of the children nodes. This value allows the children nodes to be ranked from most informative to least informative.

In general, evaluating the Information Gain Ratio is non-trivial. To simplify the expression, the law of conditional probabilities [5] is applied to Equation 2 as follows:

$$D(A; C) = \sum_{a, c} P(a, c) \ln \frac{P(a, c)}{P(a)\,P(c)} = \sum_{a, c} P(c \mid a)\,P(a) \ln \frac{P(a \mid c)}{P(a)} \tag{6}$$

Equation 6 is not readily computable since the distribution $P(a)$ is not known. However, the current state of the entire inference network is known. Using the current state, backward inference is performed to approximate the distribution of children nodes. If backward propagation does not lead to activation of a child node, a diffuse prior may be assumed. The diffuse prior assumption will tend to favor nodes that have not been activated over ones that have been activated through backward inference. Applying the estimate of the child node's probability distribution $\hat{P}$, the Information Gain Ratio for the target-child pair is given by:

$$D_{ratio}(A; C) = \frac{\sum_{a, c} P(c \mid a)\,\hat{P}(a) \ln \frac{P(a \mid c)}{\hat{P}(a)}}{-\sum_{a, c} P(c \mid a)\,\hat{P}(a) \ln \left( P(c \mid a)\,\hat{P}(a) \right)} \tag{7}$$

Figure 3 illustrates steps 1 and 2 of the INA algorithm.

[Figure 3: inference network diagram. Labels: $D_{ratio}(A; C)$, $D_{ratio}(B; C)$, Backward Inference. Legend: Previously Activated Node, Target Node, Child Nodes, Input Node.]

Figure 3. Computing Information Gain Ratio for Each Target-Child Pair using Backward Inference Estimates

Now that the children nodes are ranked, the n most informative nodes will be selected as target nodes for further consideration. As the Information Gain Ratios are computed through the inference network, a cumulative total is maintained and subsequent target nodes are selected. Figure 4 illustrates this iterative process through the inference network.

[Figure 4: inference network diagram. Labels: $D_{ratio}(A; C)$, $D_{ratio}(B; C)$, New Target Node for Next Iteration, $D_{ratio}(B; F) + D_{ratio}(B; C)$, $D_{ratio}(B; G) + D_{ratio}(B; C)$. Legend: Previously Activated Node, Target Node, Child Nodes, Input Node.]

Figure 4. Cumulative Information Gain Ratio Computation for Iterative Branch Selection

Continuing the INA process with step 3 (see above), suppose that node B has the highest Information Gain Ratio and is selected as the new target node. The children nodes of node B are now input nodes. The Information Gain Ratio for the target-child pairs is computed using the quantities maintained in the attribute table of the database (Figure 1). The input nodes corresponding to the n highest cumulative Information Gain Ratios are selected for processing.

The INA approach outlined above is generally independent of the type of inference network used. If the network is a Bayes network, for example, the forward and backward conditional probabilities in Equation 7 are readily available. The INA approach is generally applicable to inference networks that define both forward and backward inference.

4. Conclusions

This paper has presented an Information Needs Assessment algorithm that is applicable to inference networks. In general, as long as both forward and backward inference computations are defined, this algorithm may be applied. Future work in Information Needs Assessment for inference networks will focus on optimization over multiple target nodes within the network. In particular, linear programming may be applicable with the cumulative Information Gain Ratios serving as a cost function.

References

[1] Sarunic, P.W. Adaptive Variable Update Rate Target Tracking for a Phased Array Radar, IEEE Int. Radar Conf., May 1995, pp. 317-322.
[2] Overfeld, B., and Fung, R. A Decision Theoretic Sensor Management Architecture for Advanced Fighter Aircraft, Proc. 9th National Symposium Sensor Fusion, Mar. 1996, pp. 387-395.
[3] Kastella, K. Discrimination Gain to Optimize Detection and Classification, IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-27, Part A, No. 1, Jan. 1997, pp. 112-116.
[4] Kullback, S. Information Theory and Statistics, Dover, 1997.
[5] Ghahramani, S. Fundamentals of Probability, Prentice Hall, 1996.
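
As an illustration of Sections 2.1 and 3, the Information Gain Ratio of Equation 4 can be sketched in Python, both from attribute-table counts (Figure 1) and from the forward conditional and estimated child distribution used in Equation 7. This is a minimal sketch under stated assumptions, not the ATL implementation. The function names and the example probabilities are hypothetical. Counts are normalized by q++ to obtain a joint distribution. The backward conditional P(a|c) is assumed to follow from Bayes' rule, so Equation 7 is evaluated from the joint P(c|a) P_hat(a) rather than from a separately supplied backward conditional.

```python
import math


def gain_ratio(joint):
    """Information Gain Ratio (Equation 4) of a joint distribution.

    joint[i][k] holds P(x1 = i, x2 = k); the entries are assumed to sum to 1.
    """
    p1 = [sum(row) for row in joint]        # marginal P(x1)
    p2 = [sum(col) for col in zip(*joint)]  # marginal P(x2)
    mutual = 0.0   # Equation 2, mutual information
    entropy = 0.0  # Equation 3, joint Shannon entropy
    for i, row in enumerate(joint):
        for k, p in enumerate(row):
            if p > 0.0:  # terms with p = 0 contribute nothing
                mutual += p * math.log(p / (p1[i] * p2[k]))
                entropy -= p * math.log(p)
    # Assumption: a zero-entropy joint is treated as uninformative.
    return mutual / entropy if entropy > 0.0 else 0.0


def gain_ratio_from_counts(q):
    """Equation 4 from attribute-table counts q[i][k] (value V_i, class C_k)."""
    total = sum(sum(row) for row in q)  # q++ in Figure 1
    if total == 0:
        return 0.0  # assumption: an empty table carries no information
    return gain_ratio([[n / total for n in row] for row in q])


def gain_ratio_from_conditionals(p_c_given_a, p_a_hat):
    """Equation 7 from P(c|a) (one row per value of a) and the estimate P_hat(a).

    Assumption: P(a|c) is consistent with Bayes' rule, so the ratio can be
    evaluated from the joint P(c|a) * P_hat(a).
    """
    joint = [[p_ca * p_a for p_ca in row]
             for row, p_a in zip(p_c_given_a, p_a_hat)]
    return gain_ratio(joint)


def rank_children(ratios):
    """Order child names from most to least informative (step 2)."""
    return sorted(ratios, key=ratios.get, reverse=True)


# Hypothetical example: child A is informative about target C, child B is not.
diffuse = [0.5, 0.5]  # diffuse prior for a child not reached by backward inference
ratios = {
    "A": gain_ratio_from_conditionals([[0.9, 0.1], [0.2, 0.8]], diffuse),
    "B": gain_ratio_from_conditionals([[0.5, 0.5], [0.5, 0.5]], diffuse),
}
ranking = rank_children(ratios)
```

In this example `ranking` places A before B, because the conditional for B is identical for every value of the child and its mutual information with the target is therefore zero. Since mutual information never exceeds the joint entropy, the ratio lies between 0 and 1, which is what makes values from tables of different sizes comparable.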