Moving object detection from video captured by moving surveillance camera

Abstract: This paper presents an effective method for detecting multiple moving objects in a video sequence captured by a moving surveillance camera. Detecting moving objects from a moving camera is difficult because camera motion and object motion are mixed. In the proposed method, we first create a panoramic picture from the moving camera. Then, for each frame captured by the camera, we use template matching to find its location in the panoramic picture. Finally, we detect moving objects by image differencing. Experimental results show that the proposed method performs well, with an average true detection rate of more than 80%.

Scientific and technological research. Tạp chí Nghiên cứu KH&CN quân sự, No. 71, 02-2021, p. 139.

Hoa Tat Thang¹, Tran Binh Minh²*, Doan Van Hoa², Nguyen Van Trung³

Keywords: Moving object detection; Moving camera; Object tracking; Panoramic image; Image difference.

1. INTRODUCTION

Motion analysis in video has many applications, such as moving object detection [1], object tracking [2], motion segmentation [3], and event detection [4]. Many real applications are based on videos taken by either static or moving cameras, for example video surveillance of human activities, visual observation of animals, home care, optical motion capture, and multimedia applications [5]. These applications often require a moving object detection step followed by tracking and recognition steps. Moving object detection is therefore among the most investigated fields in computer vision. Early methods were developed for static cameras, but over the last two decades approaches for moving cameras have attracted considerable interest because they pose more challenging situations.
There are three main categories of approaches for detecting moving objects from static cameras: consecutive frame difference, background subtraction, and optical flow [5]. The most commonly used is background subtraction. In this method, a background image that does not include any moving objects is created first, and moving objects are then detected by subtracting this background from each frame. The method requires two conditions: constant illumination and a static background. Its advantages are low computational cost and easy implementation, but it cannot be applied directly to moving cameras.

Moving object detection from a moving camera is difficult because camera motion and object motion are mixed. Eight different approaches have been applied to detect moving objects in video captured by moving cameras: panoramic background subtraction, dual cameras, motion compensation, subspace segmentation, motion segmentation, plane+parallax, multiple planes, and splitting the image into blocks. Panoramic background subtraction is one of the most effective of these [5]. In this approach, the images captured by a moving camera are stitched together to form a bigger image, called a panoramic picture. The panoramic picture can then be used to model the background and detect moving objects as for a static camera.

In this study, we propose a method for detecting multiple moving objects in a video captured by a moving surveillance camera. The method belongs to the panoramic background subtraction approach. It is suitable for real-time detection and works well with fast-moving objects. In line with the background subtraction approach, a panoramic picture serving as the background model is first created from the moving camera.
Then, for each captured frame, template matching is used to find its location in the panoramic picture, and image differencing is used to find the moving objects.

The rest of this paper is organized as follows. Related work and the methodology are described in Section 2 and Section 3, respectively. Sections 4 and 5 describe the experiments, results, and discussion. Finally, the conclusions are given in Section 6.

2. RELATED WORK

D. Avola et al. [6] proposed a method for background modeling and foreground detection. The method uses spatio-temporal tracking of sets of keypoints to distinguish the background from the foreground. It analyses these sets with a grid strategy to estimate both camera movements and scale changes. The same sets are also used to construct a panoramic background model and to delete possible initial foreground elements from it.

Choi et al. [7] proposed a solution using infrared cameras. This method can locate moving objects in all three dimensions. Its advantage is the ability to accurately determine the moving object and the distance from the object to the cameras. Its disadvantage, however, is that it relies on depth information, which most current cameras do not provide and which is available only from some specialized cameras.

Jodoin et al. [8] proposed a robust moving object detection method for video sequences captured by both fixed and moving cameras that uses a Markov random field (MRF) to obtain label field fusion. Wang [9] proposed using the joint random field (JRF) model for moving vehicle detection in video sequences under different weather conditions. The model, however, is limited because it applies only to grayscale images.

Hu et al. [10] proposed a method to detect moving objects in videos captured by a moving camera.
Feature points in frames are classified into foreground and background with the help of multiple-view geometry. The image difference is computed with the aid of an affine transformation based on the background feature points. Moving object contours are then obtained by consolidating the foreground areas and the image differences. Finally, the motion history of contiguous frames and refinement schemes are used to detect moving objects. This method gives relatively good results in detecting moving objects, but it cannot show the moving objects in the whole panoramic picture.

3. METHODOLOGY

This study proposes a method for detecting moving objects in video captured by a moving camera without additional sensors. In the proposed method, a panoramic picture is created from the moving camera. Then, for each frame captured by the camera, we use template matching to find its location in the whole panoramic picture and image differencing to find the moving objects. The steps of the method are shown in Figure 1:

1. Create a panoramic picture from the moving camera
2. Find the location of the current frame in the panoramic picture
3. Find moving objects by the image differencing method and mask these objects

Figure 1. Steps of the proposed method.

3.1. Create a panoramic picture from a moving camera

To construct the panoramic picture, we use computer vision and image processing techniques: keypoint detection and local invariant descriptors, keypoint matching, RANSAC, and perspective warping. The panoramic stitching algorithm consists of the following steps:

1. From the first images of the video, select a number of images such that any two adjacent images satisfy the overlap condition required for stitching. The first image is the leftmost image, and the last image is the rightmost image;
2. Apply image stitching and panorama construction to the first and second images;
3. Replace the first and second images with the panoramic image created in step 2;
4. Repeat step 2 until the last (rightmost) image is reached.

3.2. Find the location of the current frame in the panoramic picture

To find the location of the current frame in the panoramic picture, we apply template matching to find the area of the panoramic picture that matches the current frame. Let I be the panoramic picture, of size W × H, and let T be the current frame, of size w × h. We slide T over I, moving one pixel at a time (left to right, top to bottom), and compare T against each overlapped w × h patch of I. At each location, a metric is calculated that represents how good or bad the match is, that is, how similar the frame is to that particular area of the panoramic picture. The result is a matrix R of size (W − w + 1) × (H − h + 1). Each location (x, y) in R contains the match metric defined in Eq. (1):

$$R(x, y) = \frac{\sum_{x', y'} T(x', y') \, I(x + x', y + y')}{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} I(x + x', y + y')^2}} \tag{1}$$

where the summation is done over the template image: x′ = 0, ..., w − 1 and y′ = 0, ..., h − 1.

The location of the current frame in the panoramic picture is (x*, y*), where R(x*, y*) is the maximum of R and is greater than a threshold value. We usually set the threshold to 0.85. This value was selected empirically and is just high enough to find the location of the current frame in the panoramic picture. If the threshold is too small, a wrong position may be chosen. If it is too large, the position of the current frame may not be found in the panoramic picture.

3.3. Find moving objects by image differencing method

To find movement in the current frame, we use the image differencing method.
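Differencing presupposes that the current frame has already been located in the panoramic picture, as described in Section 3.2. The following is a minimal sketch of that lookup, not the authors' implementation. It assumes the match metric is the normalized cross-correlation without mean subtraction, evaluates it directly in NumPy, and applies the 0.85 threshold. The function name `locate_frame` and the NumPy formulation are assumptions made for illustration; with OpenCV, a response map of this kind would typically be obtained with `cv2.matchTemplate` using the `cv2.TM_CCORR_NORMED` method, followed by `cv2.minMaxLoc`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def locate_frame(panorama, frame, threshold=0.85):
    """Find the top-left corner of `frame` inside `panorama`.

    Both inputs are 2-D grayscale arrays. Returns (x, y, score) for the
    best match, or None if the best score does not exceed `threshold`.
    `locate_frame` is a hypothetical helper, not part of any library.
    """
    I = panorama.astype(np.float64)
    T = frame.astype(np.float64)

    # All h-by-w patches of the panorama, shape (H-h+1, W-w+1, h, w).
    patches = sliding_window_view(I, T.shape)

    # Numerator: sum over the template of T(x', y') * I(x + x', y + y').
    num = np.einsum("ijkl,kl->ij", patches, T)

    # Denominator: sqrt(sum T^2 * sum I_patch^2).
    patch_energy = np.einsum("ijkl,ijkl->ij", patches, patches)
    den = np.sqrt((T ** 2).sum() * patch_energy)

    # Response map R; positions with zero energy get a score of 0.
    R = np.divide(num, den, out=np.zeros_like(num), where=den > 0)

    # NumPy indexes arrays as [row, column], i.e. [y, x].
    y, x = np.unravel_index(np.argmax(R), R.shape)
    score = float(R[y, x])
    if score > threshold:
        return int(x), int(y), score
    return None
```

For a grayscale panorama `pano` and frame `frame`, `locate_frame(pano, frame)` returns the top-left corner (x, y) of the best match together with its score, or `None` when no position exceeds the threshold. The direct evaluation is slow for large images and is meant only to make the metric explicit.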
The moving object detection procedure consists of the following steps:

- Convert the frame to a grayscale image;
- Subtract the previous image from the current image (difference image = current image − previous image);
- Convert the difference image into a binary image using an optimal threshold;
- Find the foreground (moving) pixels;
- Apply bounding boxes to mark the moving objects;
- Update the panoramic picture by replacing the old region with the current frame.

The image difference between two images is defined as the sum of the absolute differences at each pixel. The current image I_t is compared with the previous image I_{t−1}. The difference value is defined as:

$$D(t) = \sum_{i=1}^{M} \left| I_t(i) - I_{t-1}(i) \right| \tag{2}$$

where M is the resolution, that is, the number of pixels in the image. I_t and I_{t−1} are converted to grayscale before differencing, which reduces the amount of computation without affecting the results. Because the image difference D(t) is noisy and sensitive to camera motion and image degradation, morphological transformations are applied to reduce noise. The difference image is converted to a binary image using an optimal threshold: a pixel of the difference image whose value is greater than the threshold is considered a foreground (moving) pixel. The erosion operator is applied once, and then the dilation operator is applied three times, to obtain more complete moving object regions. The kernel is a 3×3 matrix of ones. Finally, minimum bounding boxes are applied to mark the moving objects, and the panoramic picture is updated by replacing the old region with the current frame for the next detection.

4. EXPERIMENTS

Experiments were conducted on a computer with an Intel Core i7 2.4-GHz CPU and 8 GB of RAM. The algorithms were implemented in Python with the Open Source Computer Vision Library (OpenCV).
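The differencing procedure of Section 3.3 can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the fixed threshold of 30 stands in for the unspecified optimal threshold, all function names are hypothetical, and the two images are assumed to be already aligned grayscale arrays of equal size. It follows the stated operator counts: one erosion, then three dilations, with a 3×3 kernel of ones.

```python
import numpy as np


def _neighbourhood(mask, pad_value):
    """Stack the nine 3x3-window neighbours of every pixel of a bool mask."""
    h, w = mask.shape
    p = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    return np.stack([p[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])


def erode(mask, iterations=1):
    """Binary erosion with a 3x3 kernel of ones."""
    for _ in range(iterations):
        mask = _neighbourhood(mask, True).all(axis=0)
    return mask


def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 kernel of ones."""
    for _ in range(iterations):
        mask = _neighbourhood(mask, False).any(axis=0)
    return mask


def moving_mask(current, previous, threshold=30):
    """Foreground mask from two aligned grayscale images (uint8 arrays)."""
    # Signed arithmetic avoids uint8 wrap-around in the subtraction.
    diff = np.abs(current.astype(np.int16) - previous.astype(np.int16))
    mask = diff > threshold   # binarise; the value 30 is a placeholder
    mask = erode(mask, 1)     # one erosion removes isolated noise pixels
    return dilate(mask, 3)    # three dilations fill out object regions


def bounding_boxes(mask, min_area=0):
    """Return (x, y, w, h) for each 8-connected foreground region."""
    height, width = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    boxes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        # Depth-first flood fill collecting the pixels of one region.
        stack, ys, xs = [(int(y0), int(x0))], [], []
        seen[y0, x0] = True
        while stack:
            y, x = stack.pop()
            ys.append(y)
            xs.append(x)
            for ny in range(max(y - 1, 0), min(y + 2, height)):
                for nx in range(max(x - 1, 0), min(x + 2, width)):
                    if mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
        if len(ys) >= min_area:   # drop regions smaller than min_area pixels
            boxes.append((min(xs), min(ys),
                          max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    return boxes
```

Calling `bounding_boxes(moving_mask(current, previous))` yields one box per detected region; the `min_area` argument is a simple pixel-count filter for discarding small regions. In an OpenCV implementation, the corresponding building blocks would be `cv2.absdiff`, `cv2.threshold`, `cv2.erode`, `cv2.dilate`, `cv2.findContours`, and `cv2.boundingRect`.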
In the first experiment, video sequences were obtained from the surveillance camera at Church Street Marketplace, Burlington (live camera from a YouTube channel). The results are presented as follows:

Figure 2. Panoramic picture created from frames of the moving camera.
Figure 3. Current frame (from the image sequence).
Figure 4. Difference frame (noisy and degraded).
Figure 5. Binary image after applying morphological operations to reduce noise.
Figure 6. Movement detection on the panoramic picture (the moving object is marked by a bounding box).

After applying the steps in Section 3, a moving object was detected and shown on the panoramic picture (marked by a bounding box). For noise removal, we show only objects whose contour size is greater than a given value. In this case, for detecting moving people, we use a threshold value of 1000 with the condition contour height > contour width.

In other experiments, we used images from moving cameras on a ski field in Colorado, Canada, and on a Japanese street (live cameras from YouTube channels) to check the program's performance. The program detected the moving people and ignored the non-moving people in the pictures.

Figure 7. Movement detection on the Colorado ski field (two moving objects are marked by bounding boxes).
Figure 8. Movement detection on the Japanese street (three moving objects are marked by bounding boxes).

5. RESULTS AND DISCUSSION

To illustrate the performance of video object segmentation, the true detection rate TR and the false detection rate FR are adopted, as defined in Eqs. (3) and (4), respectively, where N is the total number of moving objects, TP is the total number of true detections, and FN is the total number of false detections.

$$TR = \frac{TP}{N} \times 100\% \tag{3}$$

$$FR = \frac{FN}{N} \times 100\% \tag{4}$$

Table 1.
Performance evaluation for self-made datasets.

Sequence                    N      TP     FN   TR       FR
Church Street Marketplace   352    279    8    79.26%   2.27%
Colorado ski field          459    371    10   80.83%   2.18%
Japanese street             540    431    11   79.81%   2.04%
All frames                  1351   1081   29   80.01%   2.15%

The experimental results show that the proposed method performs better than the method of Hu et al. [10]. The average true detection rate TR and false detection rate FR obtained with the proposed method are 80.01% and 2.15%, respectively, as shown in Table 1, while the state-of-the-art method of Hu et al. [10] has an average TR of 65.69% and an average FR of 13.27%.

6. CONCLUSION

In this paper, we present a new method for detecting multiple moving objects in a video sequence captured by a moving camera. The experiments show that the method has good performance and efficiency. In addition, the results show that the method not only detects multiple moving objects but also shows all of them in a full-view panoramic picture. The proposed method is therefore a useful tool for intelligent surveillance. The main weakness of the proposed method is that its parameters were fixed in this paper: a threshold value of 0.85 for finding the location of the current frame, one application of the erosion operator, and three applications of the dilation operator. For video sequences from other sources, these parameters may need to be different. In the future, we will improve the method so that these parameters are found automatically.

REFERENCES

[1]. X. Zhou, C. Yang, W. Yu, “Moving object detection by detecting contiguous outliers in the low-rank representation”, IEEE Transactions on Pattern Analysis and Machine Intelligence 35(3) (2013) 597-610.
[2]. W.-C. Hu, C.-Y. Yang, D.-Y. Huang, “Robust real-time ship detection and tracking for visual surveillance of cage aquaculture”, Journal of Visual Communication and Image Representation 22(6) (2011) 543-556.
[3]. F.-L. Lian, Y.-C. Lin, C.-T. Kuo, J.-H. Jean, “Voting-based motion estimation for real-time video transmission in networked mobile camera systems”, IEEE Transactions on Industrial Informatics 9(1) (2013) 172-180.
[4]. D. Tran, J. Yuan, D. Forsyth, “Video event detection: From subvolume localization to spatiotemporal path search”, IEEE Transactions on Pattern Analysis and Machine Intelligence 36(2) (2014) 404-416.
[5]. M.-N. Chapel, T. Bouwmans, “Moving objects detection with a moving camera: A comprehensive review”, Computer Science Review 38 (2020) 1-2.
[6]. D. Avola, L. Cinque, G. Foresti, C. Massaroni, D. Pannone, “A keypoint-based method for background modeling and foreground detection using a PTZ camera”, Pattern Recognition Letters 96 (2017) 96-105.
[7]. W. Choi, C. Pantofaru, S. Savarese, “A general framework for tracking multiple people from a moving camera”, IEEE Transactions on Pattern Analysis and Machine Intelligence 35(7) (2013).
[8]. P. M. Jodoin, M. Mignotte, C. Rosenberger, “Segmentation framework based on label field fusion”, IEEE Transactions on Image Processing 16(10) (2007) 2535-2550.
[9]. Y. Wang, “Joint random field model for all-weather moving vehicle detection”, IEEE Transactions on Image Processing 19(9) (2010) 2491-2501.
[10]. W.-C. Hu, C.-H. Chen, C.-M. Chen, T.-Y. Chen, “Effective moving object detection from videos captured by a moving camera”, in: Proceedings of the First Euro-China Conference on Intelligent Data Analysis and Applications, Vol. 1, 2014, pp. 343-353.

SUMMARY

A METHOD FOR DETECTING MOVING OBJECTS BASED ON IMAGES CAPTURED BY A PANNING SURVEILLANCE CAMERA

This paper introduces an effective method for detecting multiple moving objects in a sequence of frames captured by a moving camera. Detecting moving objects from a moving (panning) camera is a difficult problem because the motion of the camera and the motion of the objects are mixed together. In the proposed method, the authors create a panoramic image from the moving camera. Next, for each frame captured by the camera, the authors use template matching to find the position of the frame in the panoramic image. Finally, image differencing is used to find the moving objects. Experimental results show that the proposed method achieves good results, with an average correct detection rate of over 80%.

Keywords: Moving object detection; Moving camera; Object localization; Panoramic image; Difference between images.

Received 30th December 2020
Revised 27th January 2021
Published 5th February 2021

Author affiliations:
¹ Military Technique Academy;
² MITI, Military Academy of Science and Technology;
³ Military Academy of Logistics.
*Corresponding author: minhchip79@gmail.com.