Quality-of-Content (QoC)-Driven Rate Allocation for Video Analysis in Mobile Surveillance Networks

Xiang Chen, Jenq-Neng Hwang, Kuan-Hui Lee, Ricardo L. de Queiroz

Department of Electrical Engineering, University of Washington, Seattle, WA 98195, USA. Email: {xchen28, hwang, ykhlee}@uw.edu
Department of Computer Science, Universidade de Brasilia, Brasilia, Brazil. Email: queiroz@ieee.org

Abstract

Nowadays, more and more videos are transmitted for video analytics purposes rather than for human perception. In mobile surveillance networks, a cloud server collects videos delivered from multiple moving cameras and detects suspicious people in all the camera views. However, the videos recorded by moving cameras, such as phone or dash cameras, are uploaded through bandwidth-limited wireless networks. Therefore, videos must be encoded with a high compression ratio to satisfy the total rate constraint, which may degrade the performance of video analyses (e.g., human detection/tracking and action recognition) because of the reduced decoding quality at the server side. In this paper, we propose an effective content-driven video source coding rate allocation scheme, which improves the human detection success rate in mobile surveillance networks under a total rate constraint. The proposed scheme allocates an appropriate amount of rate to each moving camera based on the corresponding content information (i.e., human detection results). A model of human detection accuracy based on object area and video quality is provided. The rate allocation problem is formulated as a convex optimization problem and can be solved by standard solvers. Simulations with real video sequences demonstrate the effectiveness of the proposed scheme.

Keywords: rate allocation; video analysis; human detection; visual surveillance; convex optimization

I. INTRODUCTION

The rapidly increasing demand for video streaming applications has boosted the development of wireless video transmission technologies [1], [2].
As predicted in [3], 72 percent of all consumer mobile Internet traffic will be mobile video in 2019, up from 55 percent in 2014. Furthermore, mobile traffic will increase exponentially between 2014 and 2019, representing a compound annual growth rate (CAGR) of 57 percent, which is about three times faster than fixed IP traffic. Due to the bandwidth-limited nature of wireless channels, it is crucial to design efficient wireless video transmission schemes for bandwidth-consuming real-time video streaming services [4].

In traditional wireless video transmission research, the optimization criteria are either quality-of-service (QoS) based design [1] or quality-of-experience (QoE) based design [5]-[9]. For QoS-based design, network parameters such as packet loss, delay, and jitter are jointly considered in order to improve video streaming applications from a network perspective. For QoE-based design, the user perception and experience of decoded videos are combined with the QoS parameters so that video transmission parameters can be adjusted to improve users' satisfaction [10]. Both subjective and objective video quality measurements have been developed to quantify the QoE-based system design [11].

This study is conducted under the 3-EC-7-A-3-S-24 project from the Ministry of Economic Affairs (MOEA) of Taiwan and the Advanced Wireless Broadband System and Inter-networking Application Technology Development Project of the Institute for Information Industry, which is subsidized by the Ministry of Economic Affairs of Taiwan.

Although most video transmission services are designed for human perception, more and more video streams are collected for video analytics purposes. In [12], the authors developed a vehicle tracking system with static surveillance cameras. In [13], a live fish tracking system is developed based on low-contrast and low-frame-rate stereo videos. Based on human detectors, pedestrian tracking systems for a single moving camera are developed in [14], [15]. Moreover, a system for on-road pedestrian tracking across multiple driving recorders in a mobile surveillance network is proposed in [16].
Most existing human-perception-based (QoE-based) wireless video transmission designs may not be optimal for video analytics purposes. Therefore, it is necessary to develop more efficient video transmission schemes for surveillance and computer vision applications.

As intelligent surveillance systems become more and more important for crime investigation and tragedy prevention, mobile surveillance networks with multiple moving cameras, which have more flexible camera views than traditional surveillance systems with static cameras, have been introduced [16]. As indicated in [16], videos are recorded by driving recorders (dash cameras) and uploaded to remote cloud servers for further automatic analyses. Because the cameras are mobile, wireless wide area networks (WWAN) have to be used for video transmission, where efficient rate allocation is necessary because of the limited wireless resources. Among the different applications in intelligent mobile surveillance networks, such as human tracking, action recognition, and behavior understanding, human detection is the first step, and its result critically affects the performance of other
human-related video analysis applications [16]. In [17], [18], image/video features instead of the full video sequences are uploaded to the cloud servers for video analyses. Although transmitting features can save a lot of wireless resources, this approach is not suitable for surveillance purposes, since the full video sequences must be archived in the server for future investigations. In [19], the authors proposed a saliency-based rate control for human detection with a single camera. Based on a properly designed saliency map, this scheme adaptively adjusts the quantization parameters (QPs) to preserve regions with small contrast from excessive smoothing so that the human detection accuracy can be improved.

In this paper, we propose a quality-of-content (QoC)-driven video source coding rate allocation scheme for human detection in mobile surveillance networks with multiple moving cameras. Instead of considering human perception as in traditional video streaming design, the proposed scheme maximizes the overall human detection accuracy at the remote server when multiple moving cameras upload videos via WWAN with a total rate constraint. We analytically evaluate the factors that affect the probability of successful human detection and propose a video source coding rate allocation algorithm based on the human detection results in the previous group of pictures (GoP). To the best of our knowledge, there is no existing QoC-driven work on video encoding rate allocation for human detection in mobile surveillance systems where multiple moving cameras compete for the limited wireless resources.

The rest of this paper is organized as follows. In Section II, we describe the scenario and system structure of the mobile surveillance network. In Section III, an evaluation of the factors that affect successful human detection is provided. Section IV gives the proposed video source coding rate adaptation algorithm. Simulation results are shown in Section V, followed by concluding remarks in Section VI.

II. SCENARIOS AND SYSTEM STRUCTURE

As shown in Fig. 1, a mobile surveillance network consists of multiple moving cameras (mobile nodes), such as dash cameras and smartphone cameras, which are randomly distributed and move around in areas with different pedestrian densities. Each camera can encode and upload videos via a WWAN to a remote cloud server in real time for further video analyses, such as human detection. The system structure is shown in Fig. 2, where the captured camera views are encoded with high efficiency video coding (HEVC) [20] at different encoding rates. To reduce the cost and computational complexity on each mobile node, human detection is performed in the cloud server. After human detection is performed, an upload scheduling and resource allocation module collects the human detection results (contents) and assigns a different source encoding rate target to each camera. The overall encoding rate is constrained such that the transmission can be better supported by the network. Therefore, the human detection accuracy is only related to the source coding rate allocation. In this work, we assume all the video analyses are conducted in the cloud server. Not only does this allow the videos to be archived in the cloud server for further investigations, but it also saves computational cost and power at the mobile nodes, especially for smartphone cameras.

Fig. 1. Scenario of mobile surveillance network.

Fig. 2. Proposed system structure.

III. EFFECT OF VIDEO QUALITY ON HUMAN DETECTOR

Many human detectors have been proposed in the literature. In [21], a human detector that can effectively represent the shape of a human has been proposed based on histogram of oriented gradients (HOG) features.
The implicit shape model (ISM) proposed in [22] applies a voting scheme based on multi-scale interest points to create plenty of detection hypotheses, and a codebook is used to preserve the trained features. The deformable part model (DPM) [23], an extension of the idea in [21], uses a root model and several part models to describe different partitions of an object. Based on a predefined geometry, the part models are spatially connected with the root model so that the object can be precisely depicted. Among the different human detectors, the DPM is a well-accepted, robust, and computationally efficient scheme. Therefore, we adopt the DPM as the human detection scheme in this paper, although a similar concept can be applied to other detection schemes. The DPM object detector is based on HOG features, which can be affected by the artifacts created by video encoders
with different compression ratios [19]. Therefore, the received video quality affects the detection performance in the cloud server. Figure 3 shows a comparison of the DPM detection results with two different video encoding qualities in terms of different QPs. When the video quality is poor, smaller objects in the view have a lower probability of being successfully detected than larger objects, since a large QP may smooth out the detailed shape information of smaller objects.

Fig. 3. Human detection result of DPM. Video clip: BAHNHOF in the ETHZ set [25]. Left: QP=5; Right: QP=39.

Figure 4 illustrates the human detection accuracy with different object areas (in terms of number of pixels) and QPs of the HEVC encoder [24]. Six video clips in the ETHZ set [25] are tested, and each video is encoded with different QPs from 5 to 45. The detection results are compared with the ground-truth coordinate labels of each object in the set. If the overlapped area of the detection result and the ground truth is larger than 50 percent of the ground-truth area, the detected object is regarded as a successful detection [23]. The detection accuracy for a specific object area $a$ is calculated by counting all the true-positive detected objects whose areas are larger than $a$ and dividing by the total number of objects whose areas are larger than $a$. According to Fig. 4, the detection accuracy increases with better video frame quality (smaller QP) and larger object area.

Fig. 4. Human detection accuracy with different object areas (pixels) and QPs.

Suppose $A$ is a random variable representing the object area, and $Q$ is a random variable representing the QP. Due to the independence of $A$ and $Q$, the detection accuracy in Fig. 4 can be expressed as:

$$P_{A,Q}(a, q) = f(A \geq a)\, g(q), \tag{1}$$

where $f(\cdot)$ is the probability of a true-positive detection result when the object's area is larger than $a$, and $g(\cdot)$ is the probability of a true-positive detection result as a function of the video encoding QP $q$. In total, 6 videos with VGA (640 x 480) resolution in the ETHZ set [25] and 2 videos with 720p (1280 x 720) resolution recorded at the University of Washington (UW) are tested. By two-dimensional curve fitting of the detection accuracy in Fig. 4, Eq. (1) can be approximated via regression as:

$$P_{A,Q}(a, q) = \left(1 - 0.2865 \exp\left(-3.3934 \times 10^{-4}\, a\right)\right) \left(-0.0016 \cdot 2^{q/6} + 0.6762\right). \tag{2}$$

The encoding rate $r(q)$ can also be represented as a function of $q$. In this paper, we adopt a simple exponential model to fit the source coding rate with respect to QP, i.e.,

$$r(q) = c_1 \exp(c_2 q), \tag{3}$$

where $c_1$ and $c_2$ are two parameters to be determined. Figure 5 shows the relationship between QP and source coding rate using the HEVC encoder.

Fig. 5. Curve-fitting results of the source encoding rate model in Eq. (3) with different videos of VGA and 720p resolutions. Panels: BAHNHOF, CROSSING, LOEWENPLATZ, JELMOLI, LINTHESCHER, SUNNYDAY, UW 1, UW 2.

IV. PROPOSED VIDEO ENCODING RATE ALLOCATION SCHEME

Since wireless video streaming is bandwidth consuming and the overall wireless resource is limited in WWAN, it is
crucial to design an efficient rate allocation scheme so that the number of true-positive detections is maximized under a certain total rate constraint. Therefore, the objective of our proposed system is to optimally allocate the video encoding rate of each mobile node under a total rate constraint so that the overall true-positive detection probability is maximized, i.e.,

$$\max_{\mathbf{r}} \prod_{m=1}^{M} \prod_{n=1}^{N_m} P\left(a_{m,n}, q_m(r_m)\right) \quad \text{s.t.} \quad \sum_{m=1}^{M} r_m \leq R^{(T)};\; r_m \geq R^{(\min)},\ \forall m, \tag{4}$$

where $M$ is the total number of mobile nodes, $\mathbf{r} = [r_1, r_2, \ldots, r_M]$ is the rate allocation vector, and the element $r_m$ represents the source coding rate of mobile node $m$. $N_m$ is the number of objects (people in the human detection scenario) in the view of mobile node $m$. $R^{(T)}$ is the total available rate of the system. $R^{(\min)}$ is the minimum rate requirement so that a minimum detection capability can be maintained for each mobile node. By taking the logarithm of the objective function, the optimization problem in Eq. (4) can be reformulated as:

$$\max_{\mathbf{r}} \sum_{m=1}^{M} N_m \log\left(g\left(q_m(r_m)\right)\right) + \sum_{m=1}^{M} \sum_{n=1}^{N_m} \log\left(f(a_{m,n})\right) \quad \text{s.t.} \quad \sum_{m=1}^{M} r_m \leq R^{(T)};\; r_m \geq R^{(\min)},\ \forall m. \tag{5}$$

In Eq. (5), the second term of the objective function can be considered constant, since the optimization variable only appears in the first term. Therefore, we remove the second term and substitute $q_m(r_m) = \frac{1}{c_2^{(m)}} \log\left(r_m / c_1^{(m)}\right)$ from Eq. (3), so that the final problem formulation is:

$$\max_{\mathbf{r}} \sum_{m=1}^{M} N_m \log\left(-0.0016 \cdot 2^{\frac{1}{6 c_2^{(m)}} \log\left(\frac{r_m}{c_1^{(m)}}\right)} + 0.6762\right) \quad \text{s.t.} \quad \sum_{m=1}^{M} r_m \leq R^{(T)};\; r_m \geq R^{(\min)},\ \forall m. \tag{6}$$

Note that in our problem formulation, the optimal solution of the source coding rate allocation is affected by the human density indicated by $N_m$. The objective function in Eq. (6) can be proven to be concave [26] (see Appendix A). Since the constraints are linear, maximizing it is a convex optimization problem, which can be effectively solved by existing tools such as CVX [27]. In our implementation, the resource allocation is updated in every GoP time period, and the human density $N_m$ is determined by the human detection results in the last GoP time period.

V. SIMULATION RESULTS

The proposed algorithm is tested in this section.
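As a rough illustration of the allocation step in Eq. (6), the sketch below replaces CVX with a greedy marginal-gain search on a discretized rate grid, a standard technique for separable concave objectives. It is not the implementation used for the results reported here. The function names, the 50 Kbps step, the rate-model parameters (c1, c2), the rate values, and the per-camera human counts are all hypothetical and chosen only for illustration; the coefficients of g(q) are those of Eq. (2).

```python
import math

BETA, GAMMA = 0.0016, 0.6762  # coefficients of g(q), taken from Eq. (2)

def qp_from_rate(rate, c1, c2):
    """Invert the rate model of Eq. (3), r = c1 * exp(c2 * q), for q."""
    return math.log(rate / c1) / c2

def log_quality_term(rate, c1, c2):
    """log g(q(r)): the per-object term of the objective in Eq. (6)."""
    q = qp_from_rate(rate, c1, c2)
    return math.log(-BETA * 2.0 ** (q / 6.0) + GAMMA)

def allocate_qoc(counts, params, total, r_min, step=50):
    """Greedy allocation in integer Kbps (assumed unit): every camera
    starts at r_min, then each remaining step goes to the camera whose
    weighted term N_m * log g(.) increases the most."""
    rates = [r_min] * len(counts)
    budget = total - r_min * len(counts)
    while budget >= step:
        def gain(m):
            c1, c2 = params[m]
            return counts[m] * (log_quality_term(rates[m] + step, c1, c2)
                                - log_quality_term(rates[m], c1, c2))
        best = max(range(len(counts)), key=gain)
        rates[best] += step
        budget -= step
    return rates

# Hypothetical inputs: three cameras with identical rate models and
# 1, 3 and 6 people detected in the previous GoP.
counts = [1, 3, 6]
params = [(20000.0, -0.12)] * 3  # assumed (c1, c2), rates in Kbps
rates = allocate_qoc(counts, params, total=4800, r_min=200)
```

With identical rate models, the camera with the most detected people receives the largest share of the budget. In a per-GoP loop, `counts` would be refreshed from the detection results of the previous GoP before each call.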
Three video clips are used to compete for the limited wireless resources: one video, LINTHESCHER, from the ETHZ set [25] and two videos recorded on the UW campus. The resolutions and human densities of the three videos are listed in Table I. HEVC (x265 implementation) [24] is used as the video encoder. The frame rate of each video is set to 25 fps. The GoP size is set to 16 for all the videos. The encoding pattern in each GoP block is one I-frame followed by 15 P-frames. 25 GoPs (400 frames) are tested for each video. Sample frames of the three videos are shown in Fig. 6.

TABLE I
VIDEO RESOLUTIONS AND HUMAN DENSITIES

Video          Resolution    Human Density
UW 1           1280 x 720    Low
UW 2           1280 x 720    Medium
LINTHESCHER    640 x 480     High

Fig. 6. The sample frames of the three videos. Left: UW 1; Middle: UW 2; Right: LINTHESCHER.

We compare our proposed QoC-driven rate allocation scheme with two other schemes. One is the equal rate allocation scheme, which evenly allocates the total rate to each mobile node. The other is a distortion-driven rate allocation scheme, which tries to minimize the decoding mean-squared error (MSE) of the system. We adopt a rate-distortion model as in [28]:

$$d_m(r) = c_3^{(m)}\, r^{c_4^{(m)}}, \tag{7}$$

where $d_m$ is the distortion in terms of MSE for mobile node $m$, while $c_3^{(m)}$ and $c_4^{(m)}$ are two constants to be determined. The MSE-driven rate allocation problem can be expressed as:

$$\min_{\mathbf{r}} \sum_{m=1}^{M} d_m(r_m) \quad \text{s.t.} \quad \sum_{m=1}^{M} r_m \leq R^{(T)};\; r_m \geq R^{(\min)},\ \forall m. \tag{8}$$

In the simulations, the minimum rate requirement $R^{(\min)}$ for both our proposed QoC-driven scheme and the MSE-driven rate allocation scheme is set to 2 Kbps. Figure 7 shows the source coding rate allocation of these 3 videos when the total rate constraint is 4.8 Mbps. With the MSE-driven rate allocation scheme, the rate is allocated based on the distortion of each video, which is not directly related to the human detection results. With the proposed QoC-driven rate allocation scheme, however, more rate is allocated to the mobile nodes with higher human densities.
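For reference, the MSE-driven baseline of Eq. (8) can be sketched in the same spirit. The sketch assumes the power-law distortion model $d_m(r) = c_3 r^{c_4}$ with $c_3 > 0$ and $c_4 < 0$, and solves the optimality (KKT) conditions by bisection on the Lagrange multiplier instead of calling a general-purpose solver. The function names and all parameter values are hypothetical.

```python
import math

def rate_at_multiplier(lam, c3, c4, r_min):
    """Rate at which the marginal distortion reduction -d'(r) equals lam,
    clamped below at r_min. For d(r) = c3 * r**c4 (assumed model),
    -d'(r) = -c3 * c4 * r**(c4 - 1)."""
    unconstrained = (lam / (-c3 * c4)) ** (1.0 / (c4 - 1.0))
    return max(unconstrained, r_min)

def allocate_mse(params, total, r_min, iters=200):
    """Bisection (in the log domain) on the multiplier lam so that the
    allocated rates use up the total budget."""
    lo, hi = 1e-12, 1e12
    for _ in range(iters):
        lam = math.sqrt(lo * hi)
        used = sum(rate_at_multiplier(lam, c3, c4, r_min)
                   for c3, c4 in params)
        if used > total:
            lo = lam   # rates too high: increase the multiplier
        else:
            hi = lam   # budget respected: try a smaller multiplier
    return [rate_at_multiplier(hi, c3, c4, r_min) for c3, c4 in params]

# Hypothetical (c3, c4) pairs for three videos; rates in Kbps.
params = [(5e4, -1.0), (2e4, -1.0), (8e4, -1.0)]
rates = allocate_mse(params, total=4800.0, r_min=200.0)
```

Under this model the allocation depends only on the distortion parameters of each video, not on how many people appear in its view, which is the behavior contrasted with the QoC-driven scheme in Fig. 7.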
Therefore, the rate of the video clip LINTHESCHER is higher than that of UW 2, and the rate assigned to UW 1 is the lowest. The probabilities of total true-positive detections with different total rate constraints are plotted in Fig. 8. With
more available rate, the video encoding quality becomes better, which improves the true-positive detection rate at the cloud server. Moreover, with the same total rate constraint, the proposed QoC-driven rate allocation scheme has better human detection performance than the equal rate allocation scheme and the MSE-driven rate allocation scheme. It is noticeable that the MSE-driven rate allocation scheme has worse human detection performance than the equal rate allocation scheme. This indicates that transmitting videos based on distortion (decoding quality) may not be a suitable choice if the delivered videos are used for video analysis rather than human perception. Also, the performance gain of the proposed QoC-driven scheme becomes smaller when the total available rate becomes higher. This is because there is less video quality degradation at higher encoding rates.

Fig. 7. Data rate allocation of the 3 videos with the proposed QoC-driven rate allocation scheme (top) and the MSE-driven rate allocation scheme (bottom). Total rate constraint: 4.8 Mbps.

Fig. 8. Probability of true-positive human detections under different total rate constraints.

Fig. 9. False-alarm rate under different total rate constraints.

The human detector may generate some false-alarm detections (i.e., no human exists in the region of the bounding box given by the human detector), which will cause problems for subsequent video analysis techniques based on human detection, such as human tracking and behavior understanding. Therefore, the false-alarm rate is another performance indicator for human detection. Figure 9 shows the probability of false alarms under different total rate constraints.
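The true-positive and false-alarm curves rest on matching detections against ground-truth boxes. A minimal sketch of such a count is given below. It assumes axis-aligned boxes in (x1, y1, x2, y2) form, the overlap rule of Section III with the threshold exposed as a parameter (0.5 in the example), and a greedy one-to-one matching. The matching strategy and all names are illustrative assumptions, not the evaluation code behind the figures.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    return (box[2] - box[0]) * (box[3] - box[1])

def intersection_area(a, b):
    """Overlapped area of two boxes (zero if they do not intersect)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)

def count_detections(detections, ground_truth, threshold=0.5):
    """Return (true positives, false alarms, missed objects).
    A detection is a true positive if its overlap with a not-yet-matched
    ground-truth box exceeds `threshold` times that box's area."""
    unmatched = list(ground_truth)
    true_pos = false_alarms = 0
    for det in detections:
        hit = next((gt for gt in unmatched
                    if intersection_area(det, gt) > threshold * box_area(gt)),
                   None)
        if hit is None:
            false_alarms += 1
        else:
            unmatched.remove(hit)
            true_pos += 1
    return true_pos, false_alarms, len(unmatched)

# Hypothetical frame: two labeled people, two detector outputs.
ground_truth = [(0, 0, 10, 20), (50, 50, 60, 70)]
detections = [(1, 1, 11, 21), (100, 100, 110, 120)]
result = count_detections(detections, ground_truth)
```

In this example the first detection covers most of the first labeled person, while the second detection overlaps nobody, so the frame contributes one true positive, one false alarm, and one missed object.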
As expected, the false-alarm rate becomes smaller when more rate is available and high-quality videos are decoded at the cloud server. With the same total rate constraint, the proposed QoC-driven rate allocation scheme has the lowest false-alarm rate. The videos of the human detection results are available at http://allison.ee.washington.edu/xchen/mmsp QoC/

VI. CONCLUSIONS

In this paper, we proposed a QoC-driven rate allocation scheme for video analytics purposes in mobile surveillance networks with multiple moving cameras. Unlike traditional wireless video transmission design for human perception, our proposed scheme tries to maximize the human detection rate. The DPM object detector is used for human detection, and its accuracy model with respect to object area and video quality is given. Our proposed rate allocation scheme can be formulated as a convex optimization problem, which can be efficiently solved by existing solvers. Simulation results show the effectiveness of our proposed scheme and its favorable performance compared with the equal rate allocation and MSE-driven rate allocation schemes.

Plenty of future work can be conducted in both the computer vision and video transmission areas. In the computer vision area, the effects of video compression and transmission errors on existing video analysis and computer graphics technologies such
as object detection and tracking, pose and event recognition, and 3-D scene reconstruction can be investigated. In the video transmission area, it is necessary to develop novel video coding and transmission schemes that preserve the required features (e.g., [29]) for existing computer vision technologies. As more and more videos are transmitted for video analysis purposes, we believe that combining wireless video transmission and computer vision techniques offers rich research topics and is crucial for next-generation mobile networks based on the Internet of things (IoT) and big data.

APPENDIX A
CONVEXITY OF THE OBJECTIVE FUNCTION IN EQ. (6)

Let $f_1(x)$ be defined as:

$$f_1(x) = \frac{1}{6 c_2} \log\left(\frac{x}{c_1}\right), \tag{9}$$

which is convex with respect to $x$ if $c_2$ is negative, and let $f_2(x)$ be defined as:

$$f_2(x) = -0.0016 \cdot 2^{x} + 0.6762, \tag{10}$$

which is concave and non-increasing with respect to $x$. By the composition rule [26], $f_3(x) = f_2(f_1(x))$ is concave. Similarly, since $f_4(x) = \log(x)$ is concave and non-decreasing, $f_5(x) = f_4(f_3(x))$ is also concave by the composition rule. Also, $N_m$ is the detection result of mobile node $m$, which is non-negative. Therefore, the objective function of Eq. (6) is a non-negative weighted sum of concave functions $f_5(r_m)$, which is also concave [26].

REFERENCES

[1] J.-N. Hwang, Multimedia Networking: From Theory to Practice. Cambridge University Press, 2009.
[2] X. Chen, J.-N. Hwang, P.-H. Wu, H.-J. Su, and C.-N. Lee, "Adaptive mode and modulation coding switching scheme in MIMO multicasting system," in Proc. of IEEE Intl. Symp. on Circuits and Systems, Beijing, China, May 19-23, 2013.
[3] Cisco Visual Networking Index: Forecast and Methodology, 2014-2019, 2015.
[4] X. Chen, J.-N. Hwang, J. A. Ritcey, and C.-N. Lee, "Quality-driven joint rate and power adaptation for scalable video transmissions over MIMO systems," submitted to IEEE Trans. on Circuits and Systems for Video Technology, 2015.
[5] P.-H. Wu, C.-W. Huang, J.-N. Hwang, J.-Y. Pyun, and J. Zhang, "Video-quality-driven resource allocation for real-time surveillance video uplinking over OFDMA-based wireless networks," IEEE Trans. on Vehicular Tech., pp. 3233-3246, 2014.
[6] X. Chen, J.-N. Hwang, C.-Y. Wang, and C.-N. Lee, "A near optimal QoE-driven power allocation scheme for SVC-based video transmissions over MIMO systems," in Proc. of IEEE Intl. Conf. on Communications, Sydney, NSW, June 10-14, 2014.
[7] X. Chen, J.-N. Hwang, C.-N. Lee, and S.-I. Chen, "A near optimal QoE-driven power allocation scheme for scalable video transmissions over MIMO systems," IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 1, pp. 76-88, 2015.
[8] X. Chen, J.-N. Hwang, C.-J. Wu, S.-R. Yang, and C.-N. Lee, "A QoE-based APP layer scheduling scheme for scalable video transmissions over Multi-RAT systems," in Proc. of IEEE Intl. Conf. on Communications, London, UK, 2015.
[9] X. Chen, H. Du, J.-N. Hwang, J. A. Ritcey, and C.-N. Lee, "A QoE-driven FEC rate adaptation scheme for scalable video transmissions over MIMO systems," in Proc. of IEEE Intl. Conf. on Communications, London, UK, 2015.
[10] M. Fiedler, T. Hossfeld, and P. Tran-Gia, "A generic quantitative relationship between quality of experience and quality of service," IEEE Network, vol. 24, no. 2, pp. 36-41, 2010.
[11] A. K. Moorthy, K. Seshadrinathan, R. Soundararajan, and A. C. Bovik, "Wireless video quality assessment: A study of subjective scores and objective algorithms," IEEE Trans. on Circuits and Systems for Video Technology, vol. 20, no. 4, pp. 587-599, 2010.
[12] K.-H. Lee, J.-N. Hwang, and S.-I. Chen, "Model-based vehicle localization based on three-dimensional constrained multiple-kernel tracking," IEEE Trans. on Circuits and Systems for Video Technology, vol. 25, no. 1, pp. 38-50, 2015.
[13] M.-C. Chuang, J.-N. Hwang, K. Williams, and R. Towler, "Tracking live fish from low-contrast and low-frame-rate stereo videos," IEEE Trans. on Circuits and Systems for Video Technology, vol. 25, no. 1, pp. 167-179, 2015.
[14] K.-H. Lee, J.-N. Hwang, G. Okopal, and J. Pitton, "Driving recorder based on-road pedestrian tracking using visual SLAM and constrained multiple-kernel," in Proc. IEEE International Conf. Intelligent Transportation System (ITSC), 2014, pp. 2629-2635.
[15] L. Hou, W. Wan, K.-H. Lee, J.-N. Hwang, G. Okopal, and J. Pitton, "Deformable multiple-kernel based human tracking using a moving camera," in Proc. of IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2015.
[16] K.-H. Lee and J.-N. Hwang, "On-road pedestrian tracking across multiple driving recorders," IEEE Trans. on Multimedia, vol. 17, no. 9, pp. 1429-1438, 2015.
[17] B. Girod, V. Chandrasekhar, D. M. Chen, N.-M. Cheung, R. Grzeszczuk, Y. Reznik, G. Takacs, S. S. Tsai, and R. Vedantham, "Mobile visual search," IEEE Signal Processing Magazine, vol. 28, no. 4, pp. 61-76, 2011.
[18] A. Redondi, M. Cesana, and M. Tagliasacchi, "Rate-accuracy optimization in visual wireless sensor networks," in Proc. of IEEE International Conference on Image Processing, 2012, pp. 1105-1108.
[19] S. Milani, R. Bernardini, and R. Rinaldo, "A saliency-based rate control for people detection in video," in Proc. of IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 2016-2020.
[20] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Trans. on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, 2012.
[21] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. of IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (CVPR). IEEE, 2005, pp. 886-893.
[22] B. Leibe, A. Leonardis, and B. Schiele, "Robust object detection with interleaved categorization and segmentation," International Journal of Computer Vision, vol. 77, no. 1-3, pp. 259-289, 2008.
[23] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627-1645, 2010.
[24] The x265 website. [Online]. Available at http://bitbucket.org/multicoreware/x265/wiki/home.
[25] A. Ess, B. Leibe, K. Schindler, and L. V. Gool, "A mobile vision system for robust multi-person tracking," in Proc. of IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (CVPR). IEEE, 2008, pp. 1-8.
[26] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[27] M. Grant and S. Boyd. CVX: MATLAB software for disciplined convex programming. [Online]. Available at http://stanford.edu/~boyd/cvx.
[28] Y.-H. Huang, T.-S. Ou, P.-Y. Su, and H. H. Chen, "Perceptual rate-distortion optimization using structural similarity index as quality metric," IEEE Trans. on Circuits and Systems for Video Technology, vol. 20, no. 11, pp. 1614-1624, 2010.
[29] J. Chao, R. Huitl, E. Steinbach, and D. Schroeder, "A novel rate control framework for SIFT/SURF feature preservation in H.264/AVC video compression," IEEE Trans. on Circuits and Systems for Video Technology, vol. 25, no. 6, pp. 958-972, 2014.