2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks

Vector Bank Based Multimedia Codec System-on-a-Chip (SoC) Design

Ruei-Xi Chen, Wei Zhao, Jeffrey Fan, and Asad Davari
Computer Science and Information Engineering, St. John's University, Taipei, Taiwan
Electrical and Computer Engineering, Florida International University, Miami, Florida, USA
Electrical and Computer Engineering, West Virginia University Institute of Technology, Montgomery, West Virginia, USA

Abstract — In this paper, we present an architecture that implements a Vector Bank in a video encoder system, namely an H.264 encoder, in order to detect and analyze moving objects within a specific area. We also show that transmission bandwidth can be saved with the Vector Bank design. Motion Estimation is a standard component of today's video codecs. By extracting the vector data from the Motion Estimation block with a motion detection method and applying the Laplacian of Gaussian operator, we can obtain object motion data generated from up to 16 reference frames. This saves bandwidth, processing load, and memory resources dramatically.

Keywords — H.264; Edge Detection; Motion Estimation

I. INTRODUCTION

People have been using visual sensors to set up surveillance systems for years. However, such systems are rarely practical for home security because of limited data transmission and analysis capability. In the simple platform commonly used for institutional security, shown in Figure-1, analog sensors and an analog network deliver TV signals to the eyes of a guard in a security room. Analog signals are easy to transmit, but hard to analyze, store, and encrypt. If real-time alerts are needed, human intervention is in most cases a must. Clearly, that does not work for ordinary families seeking home security.
Things have changed, and the whole world is becoming digital. Today, we can purchase a digital visual sensor on the open market. Prices are reasonably low, and the cameras all carry a video encoding chip inside. Still, the solution is not quite applicable for home use. As shown in Figure-2, most video compression codecs are resource-consuming, so a powerful computer is needed to decompress the video data coming from all the sensors and analyze it in real time. That is clearly not something for home use. The technique we propose here adds a vector bank to a typical video codec core and uses a moving-object detection method plus a boundary detection operator to identify moving objects.

Figure 1. Analog Visual Sensors network for Secure Surveillance System

First, H.264/AVC [1] (also called MPEG-4 Part 10) is becoming the most popular video encoding and decoding standard today. It was developed by the ITU-T (International Telecommunication Union - Telecommunication Standardization Sector) and MPEG (ISO/IEC Moving Picture Experts Group), and it achieves an advanced compression ratio, producing streams about 50% of the size of previous generations such as MPEG-2 [2].

Figure 2. Digital Visual Sensors network for Secure Surveillance System

978-0-7695-3908-9/09 $26.00 © 2009 IEEE. DOI 10.1109/I-SPAN.2009.74

An important part of H.264 is called Motion Estimation (ME), as shown in Figure-3 [3]. The H.264 encoder pushes both
the current frame and the reference frames (i.e., previous frames) to the ME block. The ME block analyzes the similarity of the Macro-Blocks (MBs) between the current frame and several reference frames. The resulting relations produced by the ME block are called Motion Vectors.

Figure 3. General H.264 Encoder Core Architecture

Moving object detection and tracking has been studied for years. It serves widely in different areas, such as video surveillance, human-machine interfaces, and authentication systems. Different algorithms are in use today to track moving objects [4] [5] [6] [7]. Most of them use the frame differences of neighboring frames to detect moving objects. In this paper, we use the Motion Vectors generated by the H.264 encoder to indicate the difference between two frames and thus the moving objects.

After detecting the moving objects using the vector data coming out of the H.264 encoder, we pass those vectors into the edge detection unit. Several edge detection operators [8] are available. In this paper, we suggest two: the first-derivative Sobel operator and the second-derivative Laplacian of Gaussian operator.

The rest of the paper is organized as follows. The Vector Bank and vector-based motion detection are introduced in Section 2. The edge detection operators are described in Section 3. Experimental results are demonstrated in Section 4. Finally, conclusions are drawn in Section 5.

II. VECTOR BASED MOVING OBJECT DETECTION

Basically, the Vector Bank is a memory-based analyzer attached to the Motion Estimation block of a typical H.264 hardware encoder [9] [10].

A. Motion Estimation and Motion Vectors

Figure-4 [3] shows a portion of a typical Motion Vector Map from a video encoding procedure. Given the current frame and the reference frames (decoded previous frames), the Motion Estimation block generates the Macro-Block based Motion Vectors.
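The exhaustive block-matching search that such an ME block performs for one Macro-Block can be sketched in software as follows. This is a simplified illustration only, not the paper's hardware design: the function name, the 16x16 block size, and the ±8-pixel search range are assumptions for the example, and a real H.264 encoder adds sub-pixel refinement and multiple reference frames.

```python
import numpy as np

def motion_vector_for_block(cur, ref, by, bx, block=16, search=8):
    """Full-search block matching: find the displacement in `ref` that
    minimizes the sum of absolute differences (SAD) against the macro-block
    of `cur` whose top-left corner is (by, bx)."""
    h, w = ref.shape
    target = cur[by:by + block, bx:bx + block].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate block falls outside the reference frame
            cand = ref[y:y + block, x:x + block].astype(np.int32)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv  # (vertical, horizontal) displacement in pixels

# Toy example: a bright square moves 4 pixels to the right between frames.
ref = np.zeros((64, 64), dtype=np.uint8)
ref[16:32, 16:32] = 200          # object in the reference frame
cur = np.zeros((64, 64), dtype=np.uint8)
cur[16:32, 20:36] = 200          # same object shifted by (0, +4)
print(motion_vector_for_block(cur, ref, 16, 20))  # → (0, -4)
```

The vector points from the current block back to its best match in the reference frame, which is why a rightward object motion yields a negative horizontal component here.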
These vectors, along with the residues, are then transformed, quantized, and compressed into the video code, and the Motion Estimation block dumps the vectors so that the next MB can come in. In general, if we want to analyze the vector information inside the video code, we have to decompress the video file to see it again. An H.264 Standard Definition (SD) file (e.g., an NTSC 480p file) can typically be decoded in real time by a fully loaded Pentium-4 CPU platform; with just two video cameras, we would need two PCs. And if the video sensor is HD (High Definition, e.g., 1080p), real-time decoding is nearly impossible, let alone running additional algorithms to detect moving objects from the motion vectors.

B. Vector Bank

As shown in Figure-5, the Vector Bank grabs Motion Vectors from the output of the Motion Estimation block. Using a queue, the Vector Bank recovers the block-based motion vectors back into frame-based form. This is important because motion objects are based on frames, not Macro-Blocks.

Figure 5. Vector Bank implemented into a typical H.264 Motion Estimation block

Figure 4. A typical Motion Vector Map in the H.264 encoding procedure

C. Moving Object Detection

Basically, a digital camera for surveillance purposes is placed at a fixed, still location, so the picture has a still or almost steady background. In most cases the background does not move and yields no vectors. However, if any movement occurs, the module with the Vector Bank generates a non-zero interrupt to the CPU. The CPU (or a programmed embedded DSP) then processes the vector data to extract information about the moving object.

Here is an example of how we identify moving objects. Figure-6 shows two frames of a home video taken by a steady
camcorder. The only moving object is a person. The two frames are taken with a time interval of 0.3 seconds, and the field shown in Figure-6 is only a portion of the whole frame. With the H.264 motion estimation algorithm, the vectors of the MBs are generated from the differences between the two frames. After the Vector Bank has collected every Motion Vector, it holds the view shown in Figure-7. As can be seen, the vectors are not all the same for every individual MB, but the still background has no vectors at all in this case. That easily isolates the moving object from the background. If a programmable DSP or an intelligent CPU is provided, object detection becomes smoother, and the direction and speed of the object become predictable.

Figure 6. An example: two frames of a home video

III. EDGE DETECTION OPERATOR

As an important component of Computer Vision theory, Edge Detection is well developed and widely used in the Digital Image Processing field. There are many edge detection algorithms, including first-derivative and second-derivative operators. First-derivative operators, such as Roberts, Prewitt, and Sobel [8], detect the edges of an image in one dimension (horizontal or vertical), while second-derivative operators, such as the Laplacian, detect both dimensions at the same time. In this paper, we apply the two most famous edge operators: Sobel and Laplacian of Gaussian.

Figure 8. The Sobel Operator 3-D Plot in MATLAB

A. Sobel Operator

The Sobel operator is a first-derivative edge detection operator. It is simple and easy to realize in most cases, but to detect a 2-D image edge, Sobel must be run twice in different directions. The typical Sobel bi-directional kernels (also shown in Figure-8) are:

G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \quad (1)

and

G_x = \begin{bmatrix} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \end{bmatrix} \quad (2)

Figure 7.
The Motion Vectors indicating the frame difference between two frames

G_x and G_y can be combined to obtain the absolute magnitude of the gradient:

G = \sqrt{G_x^2 + G_y^2} \quad (3)

For fast computation, the magnitude can also be approximated as:

|G| \approx |G_x| + |G_y| \quad (4)
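The two directional passes and the approximate magnitude of equations (1)-(4) can be sketched in NumPy. This is an illustrative software version, not the paper's hardware implementation; the loop-based convolution favors clarity over speed, and sign conventions for the kernels vary across texts.

```python
import numpy as np

# Sobel kernels from equations (1) and (2).
GY = np.array([[ 1,  2,  1],
               [ 0,  0,  0],
               [-1, -2, -1]])
GX = np.array([[ 1, 0, -1],
               [ 2, 0, -2],
               [ 1, 0, -1]])

def sobel_magnitude(img):
    """Approximate gradient magnitude |G| ~ |Gx| + |Gy| (equation (4)),
    computed on the interior pixels of a 2-D grayscale array."""
    img = img.astype(np.int32)   # avoid uint8 overflow in the products
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.int32)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx = int((GX * patch).sum())
            gy = int((GY * patch).sum())
            out[y, x] = abs(gx) + abs(gy)
    return out

# A vertical step edge: the magnitude is large only along the boundary.
img = np.zeros((5, 6), dtype=np.uint8)
img[:, 3:] = 100
mag = sobel_magnitude(img)
```

Running this on the step image gives a response of 400 in the two columns straddling the edge and zero in the flat regions, which is exactly the one-dimensional sensitivity the text describes.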
Thus, the approximate kernel for the 2-D Sobel detection operator is:

|G| \approx |(z_1 + 2z_2 + z_3) - (z_7 + 2z_8 + z_9)| + |(z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7)| \quad (5)

where z_1, ..., z_9 denote the pixels of the 3x3 neighborhood in row-major order.

B. Laplacian of Gaussian (LoG) Operator

The Laplacian of Gaussian (LoG) operator is a second-derivative edge operator. The 2-D Laplacian is:

\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \quad (6)

The typical Gaussian kernel with width \sigma is:

G_\sigma = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \quad (7)

So the Laplacian of Gaussian is:

\nabla^2 G_\sigma = \frac{\partial^2 G_\sigma}{\partial x^2} + \frac{\partial^2 G_\sigma}{\partial y^2} \quad (8)

Since the expression is symmetric in x and y, we determine the x part first (omitting the constant factor 1/(2\pi\sigma^2) for brevity):

\frac{\partial^2 G_\sigma}{\partial x^2} = \frac{x^2 - \sigma^2}{\sigma^4} e^{-\frac{x^2 + y^2}{2\sigma^2}} \quad (9)

Letting x^2 + y^2 = r^2 and putting x and y back together:

\nabla^2 G_\sigma = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} e^{-\frac{x^2 + y^2}{2\sigma^2}} = \frac{r^2 - 2\sigma^2}{\sigma^4} e^{-\frac{r^2}{2\sigma^2}} \quad (10)

IV. EXPERIMENTAL RESULT

A. Experiment Assumptions

Consider a typical home-security video surveillance sensor monitoring a specific area that contains moving objects, public areas, and restricted areas, as shown in Figure-10. In this case, we monitor every possible movement in addition to the restricted area. If a moving object approaches the restricted area (e.g., a private garden or yard), our system should sound an alert and try to locate the object.

Figure 10. The Experiment Scene Setup

B. LoG result without Motion Vector Bank

As mentioned in the last section, the LoG algorithm works on a still image, using color-differential boundaries to detect objects. With a still image, however, it is easy to imagine that the boundary of the moving object is hard to detect and isolate because of the other high-spatial-frequency color signals. Figure-11 shows the output of the LoG algorithm for a single frame of our video.

Figure 9.
The Laplacian of Gaussian 3-D Plot in MATLAB

Finally, with a chosen \sigma, we find the 9x9 digital approximation of equation (10):

\begin{bmatrix}
0 & 1 & 1 & 2 & 2 & 2 & 1 & 1 & 0 \\
1 & 2 & 4 & 5 & 5 & 5 & 4 & 2 & 1 \\
1 & 4 & 5 & 3 & 0 & 3 & 5 & 4 & 1 \\
2 & 5 & 3 & -12 & -24 & -12 & 3 & 5 & 2 \\
2 & 5 & 0 & -24 & -40 & -24 & 0 & 5 & 2 \\
2 & 5 & 3 & -12 & -24 & -12 & 3 & 5 & 2 \\
1 & 4 & 5 & 3 & 0 & 3 & 5 & 4 & 1 \\
1 & 2 & 4 & 5 & 5 & 5 & 4 & 2 & 1 \\
0 & 1 & 1 & 2 & 2 & 2 & 1 & 1 & 0
\end{bmatrix} \quad (11)

Figure 11. The Laplacian of Gaussian (LoG) output for a single frame

C. LoG result with Motion Vector Bank

Compared to color-based LoG processing, vector-based LoG processing has two important advantages:
1) The vector-difference result contains no still-color noise. No matter how colorful the still background is, the vector-based LoG result contains only the moving object(s).

2) Vector-based LoG processing is more efficient. Instead of processing 16x16 = 256 color points, each 16x16 Macro-Block generates one vector datum: a 256-times saving.

Let us look at the result of the vector-based LoG processing in Figure-12. The only output is the two moving-object borders (the background is shown only to help readers locate those borders).

Figure 13. The Laplacian of Gaussian (LoG) output for a single frame

Figure 12. The vector-based Laplacian of Gaussian (LoG) output

In the real algorithm, we select an appropriate threshold to achieve the best results: threshold = 0.7 for the color-frame LoG and threshold = 0.5 for the vector-frame LoG. The color-frame LoG threshold tolerance is quite small; either 0.6 or 0.8 would mess up the whole picture because of the large variety of color schemes. The Motion Vectors, by contrast, are quite uniform in this case: even with thresholds anywhere from 0.3 to 0.9, the vector result remains the same.

D. Border violation detection and feedback

Border violation detection is based on the moving-object border output of the vector LoG processing. We can easily detect that object 1 is inside the pre-defined restricted area, while object 2 is in the public area. Our system should therefore give feedback that signals the alert, while starting to store the H.264 streaming data from that point on.

V. CONCLUSION

The Vector Bank based H.264 architecture can extract the motion vectors during the encoding process. The proposed approach can also save bandwidth when more than one visual camera shares the network with equal priority.
By using a few steps of mathematical analysis, such as the Laplacian of Gaussian filter, we can locate and identify the moving object. Furthermore, the Vector Bank can tell whether there is movement in the observed area, which becomes the key switch to start recording or transmitting the compressed video data.

1) An extended memory structure for H.264 is not hard to build. The cost of the implementation would be no higher than that of today's video encoder chips, so the proposed approach is affordable and achievable.

2) No separate computer (or person) is needed to monitor the video stream, making a home-based surveillance sensor network possible.

3) Sensors do not need to stream out video data when no moving violation occurs, which saves bandwidth across the overall surveillance network.

4) The surveillance video can be recorded only while a moving violation is taking place, potentially saving memory resources dramatically.

REFERENCES

[1] Joint Video Team of ITU-T and ISO/IEC JTC 1, "Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC)," Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, Document JVT-G050, December 2003.

[2] R. Chen, W. Zhao, Q. Liu, and J. Fan, "Efficient H.264 architecture using modular bandwidth estimation," IEEE 5th International Conference on Embedded Software and Systems (ICESS '08), pp. 277-282, Chengdu, China, July 29-31, 2008.

[3] I. E. G. Richardson, H.264 and MPEG-4 Video Compression, pp. 27-28, August 2003.

[4] D. Li, "Moving objects detection by block comparison," Electronics, Circuits and Systems, vol. 1, pp. 341-344, Dec. 2000.

[5] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati, "Statistic and knowledge-based moving object detection in traffic scenes," IEEE Proceedings, Intelligent Transportation Systems, pp. 27-32, Oct. 2000.
[6] Y. K. Jung, K. W. Lee, and Y. S. Ho, "Content-based event retrieval using semantic scene interpretation for automated traffic surveillance," IEEE Transactions on Intelligent Transportation Systems, vol. 2, pp. 151-163, Sep. 2001.

[7] R. Montoliu and F. Pla, "Multiple parametric motion model estimation and segmentation," ICIP 2001, vol. 2, pp. 933-936, Oct. 2001.

[8] R. C. Gonzalez and R. E. Woods, Digital Image Processing, vol. 10, no. 2, pp. 585-611, 2001.

[9] W. Zhao, Z. Luo, J. Fan, and S. Tan, "Vector edge detection in H.264 implementation," IEEE 5th International Conference on Embedded Software and Systems Symposia (ISHSO '08), pp. 208-212, Chengdu, China, July 29-31, 2008.

[10] W. Zhao, J. Fan, and A. Davari, "Vector bank based target tracking via vision sensors in aviation systems," IEEE 41st Southeastern Symposium on System Theory (SSST '09), pp. 73-76, Tullahoma, TN, March 15-17, 2009.