A2Ba: Adaptive Background Modelling for Visual Aerial Surveillance Conditions

Background modelling algorithms are widely used to identify the part of an image that remains stationary most of the time in a video. In surveillance tasks, this model helps to recognize outlier objects in a monitored area. Setting up a background model on mobile platforms (UAVs, intelligent cars, etc.) is a challenging task due to camera motion during image acquisition. In this paper, we propose A2Ba, a robust method that copes with the instabilities of aerial images by fusing different sources of information about image motion. We use frame difference as a first approximation, then estimate the age of pixels, which gives the level of invariability of a pixel over time. The gradient direction of ages and an adaptive weight are used to reduce the impact of camera motion on background modelling. We tested A2Ba by simulating several conditions that impair aerial image acquisition, such as intentional and unintentional camera motion. Experimental results show improved performance compared to the baseline algorithms GMM and KDE.


Introduction
Visual surveillance has become an important task in recent years. Its main objective is to find changes over time in a sequence of images, specifically to detect objects that do not belong to a fixed scene. Motion-related changes are the most important, but this task is linked with other higher-level tasks such as localization, identification and tracking of moving objects. Therefore, change detection is considered a key preprocessing step of a video surveillance system. Activities in which visual surveillance is used include protection of strategic facilities, pedestrian detection, analysis of suspicious behavior, etc.
When we analyze a sequence of images used in surveillance, we can always observe two main features:
The first feature refers to the part of the image that remains temporally constant, i.e. it does not change unless there are moving objects or the capturing device is displaced, intentionally or not.
The second feature is the statistical majority formed by certain color intensities; for example, in an image taken over the sea, the predominant color will be blue.
These characteristics describe an important concept in image processing called the background of an image.
Background modelling or subtraction (BS) is an important component of a video surveillance system because it helps to distinguish moving or incoming objects in a sequence of images. The main objective of BS algorithms is to classify all moving objects as foreground and then identify interesting areas of an image for further analysis. The simplest case for BS assumes a fixed camera [1][2][3][4]; however, this assumption severely limits the use of other computer vision algorithms. Nowadays, many image sequences are captured by platforms with a mobile camera, such as cellular phones, robots and unmanned aerial vehicles (UAVs). Therefore, there is an increasing need to develop algorithms that separate foreground objects from the background using moving cameras for different tasks.
Many changes happen over time in any kind of image sequence and affect the performance of background modelling. It is vital for background modelling to handle these changes efficiently, either by being invariant to them or by adapting to them. These changes can be local, such as small persistent motions within the scene like waving trees or shadows [5][6], or global, like illumination changes or camera motion [7][8]. Spatial and temporal information of pixels are two fundamental elements for understanding background structure, and they should complement each other to obtain a robust background model [9]. Spatial information is related to the structure (histogram of intensities) of each image, and temporal information to the changes of a sequence of images over time.
In this paper, we present an extension of the algorithm presented in [10]. We propose a new background subtraction method designed to deal with the particular conditions of aerial images. Our method is able to cope with the motion of the UAV while extracting background information.
We use spatial and temporal information to classify pixels as foreground or background. An adaptive method weights spatial or temporal information depending on whether the motion of the UAV is smooth or not.
Experimental results on sequences under different conditions demonstrate the robustness of A2Ba. Evaluation and comparison with existing methods show that A2Ba provides comparable results in motionless sequences and improved performance when sequences are unstable (camera motion).
The rest of this work is organized as follows: Section 2 presents a review of several methods for background modelling. The formulation and description of the proposed approach are presented in Section 3. Section 4 describes the evaluation method and the features of each dataset. The discussion of results and the conclusions are given in Section 5.

Related work
A large number of BS methods have been proposed in recent years, trying to retrieve a background model using different approaches. Basically, they are classified into two types of algorithms [11]: parametric and non-parametric models. Parametric models represent a probability density function (pdf) parametrically, using a specific statistical distribution and its associated parameters, based on previously known distributions [12]. In contrast, a non-parametric model does not need to fit data to a specific distribution; the pdf is estimated directly, which avoids calculating the parameters of a specific distribution. The sections below present some works for each class.

Parametric background models
Many BS algorithms for fixed cameras work by comparing the color or intensity of pixels between an incoming image and a reference. The first BS approaches used the frame difference [13][14] between two consecutive images, where major differences in intensity from the reference image were related to moving objects. Thresholding is applied to obtain the background model. However, the background obtained through this approach is not suitable for sequences with abrupt illumination or motion changes.
The basic statistical method for background modelling consists of representing the color intensity at each pixel by a single Gaussian distribution. The Gaussian mixture model (GMM) [10] is probably the most popular statistical technique for background modelling. Stauffer and Grimson [15] generalized the single Gaussian approach by using an adaptive method that models the color of a pixel as a mixture of Gaussians [16][17][18]. The model is constructed as follows:

$$P(X_t) = \sum_{i=1}^{K} \omega_{i,t} \, \eta(X_t, \mu_{i,t}, \Sigma_{i,t}) \qquad (1)$$

The intensity of a pixel is modeled by a mixture of K Gaussian distributions $\eta$ (typically, K is around 3 to 5), where $\omega_{i,t}$, $\mu_{i,t}$ and $\Sigma_{i,t}$ are the weight, mean and covariance of the i-th Gaussian at time t. Every new pixel is checked against the existing K Gaussians; if the difference between the pixel and the mean of a Gaussian is less than 2.5σ, the weight of that distribution is updated. Several modifications to the original GMM have been proposed, e.g. [19][20][21].
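As an illustration of the matching step described above, the following minimal Python sketch checks a new gray-level pixel against K one-dimensional Gaussians and updates the mixture weights. It is not the implementation of [15]: the function names, the learning rate `lr` and the exact form of the weight update (the matched weight is raised, the others decay) are assumptions made for this example, and the updates of the means and variances are omitted.

```python
def match_gaussian(pixel, means, sigmas, k=2.5):
    """Return the index of the first Gaussian within k standard deviations, or None."""
    for i, (mu, sigma) in enumerate(zip(means, sigmas)):
        if abs(pixel - mu) < k * sigma:
            return i
    return None


def update_weights(weights, matched, lr=0.05):
    """Raise the weight of the matched component, decay the others, renormalise.

    lr is a hypothetical learning rate chosen for this sketch.
    """
    new = [(1.0 - lr) * w + (lr if i == matched else 0.0)
           for i, w in enumerate(weights)]
    total = sum(new)
    return [w / total for w in new]


means, sigmas, weights = [50.0, 120.0, 200.0], [5.0, 10.0, 8.0], [0.5, 0.3, 0.2]
matched = match_gaussian(128.0, means, sigmas)   # within 2.5 * 10 of the mean 120
weights = update_weights(weights, matched)
```

A pixel that matches no component (here, for instance, the value 170) would typically be treated as a foreground candidate.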

Non-parametric background models
The main reference for non-parametric models is kernel density estimation (KDE), proposed in [1][22]. Given a data sample $S = \{x_i\}_{i=1 \ldots N}$ drawn from a distribution with pdf $p(x)$, an approximation $\hat{p}(x)$ is found with (2):

$$\hat{p}(x) = \frac{1}{N} \sum_{i=1}^{N} K_\sigma(x - x_i) \qquad (2)$$

where $K_\sigma$ is a kernel function, centered at the data points in sample space and uniformly weighted. The kernel can be chosen from a wide range of functions, but typically a Gaussian kernel [23][24] is used for its continuity, differentiability and locality properties. Distinguishing between choosing a Gaussian as a kernel function and fitting the distribution to a Gaussian model is critical to understanding non-parametric models [22]: KDE uses the Gaussian only as a function to weight data points and does not make assumptions about the shape of the density function. In conclusion, a pixel is considered a background pixel if $\hat{p}(x) > \varepsilon$, where ε is a global threshold that is the same for the whole image and can be adjusted to obtain a desired proportion of false positives [25][26].
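The background test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a one-dimensional Gaussian kernel with a fixed bandwidth `sigma`, a uniform weight of 1/N per sample, and hypothetical function names and threshold value; it is not the implementation of [1] or [22].

```python
import math


def gaussian_kernel(u, sigma):
    """One-dimensional Gaussian kernel with bandwidth sigma."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def kde(x, samples, sigma):
    """Uniformly weighted kernel density estimate at x."""
    return sum(gaussian_kernel(x - xi, sigma) for xi in samples) / len(samples)


def is_background(x, samples, sigma, eps):
    """A pixel value is background when its estimated density exceeds eps."""
    return kde(x, samples, sigma) > eps


history = [100, 102, 98, 101, 99]        # recent intensities of one pixel
print(is_background(100, history, sigma=2.0, eps=0.01))   # close to the samples
print(is_background(150, history, sigma=2.0, eps=0.01))   # far from the samples
```

The Gaussian appears here only as a weighting function around each sample; no Gaussian shape is imposed on the estimated density itself.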
The key motivation for using non-parametric density estimation for background modelling lies in its robustness in outdoor scenarios. Such scenarios always contain various disturbances [27] such as ocean waves, waving trees, rain, moving clouds, etc. Modelling such persistent motion requires a more flexible representation of the background probability distribution at each pixel. Since non-parametric models are not based on a specific distribution, they can be updated according to the characteristics of each image.

A2Ba: solution description
Typical background subtraction techniques are inefficient on images taken from a mobile observer due to the low temporal consistency of most pixels. We propose a combination of temporal and spatial information. The workflow of the proposed method is depicted in Fig. 1. The first part of our algorithm consists of a basic approximation of the background model using an adaptive method proposed by [28]. However, this model still includes outliers, which are eliminated by further processing.
In a motionless sequence, the frame difference is the easiest way to find all objects that move and are not part of the background model. We therefore first identify all pixels that differ between two consecutive images, i.e. we compute the absolute difference between the previous frame and the nth frame and then threshold it to obtain the frame difference mask, as given in (3). The frame difference mask is a binary image in which pixels that correspond to moving objects have the value 1 and the remaining pixels are 0.
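A minimal Python sketch of this step is given below, with images represented as nested lists of gray levels. The function name and the fixed threshold value are assumptions made for illustration; in the paper the threshold is computed with the method of [29].

```python
def frame_difference_mask(prev_frame, curr_frame, threshold):
    """Binary mask: 1 where two consecutive frames differ by more than threshold."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_frame, curr_frame)]


prev_frame = [[10, 10, 10],
              [50, 50, 50]]
curr_frame = [[12, 90, 10],
              [50, 20, 55]]
mask = frame_difference_mask(prev_frame, curr_frame, threshold=15)
# mask is [[0, 1, 0], [0, 1, 0]]: only the middle column changed noticeably.
```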
To counter the temporal variation in pixels generated by camera motion, the age of each pixel is used. Age refers to the number of frames during which a pixel remains constant; we can infer that a pixel with a greater age has a higher probability of belonging to the background model. A2Ba establishes an upper limit on pixel age to avoid excessive data storage. When a pixel reaches this limit, its age remains constant. The matrix containing the age of pixels is calculated according to (4) for each pair of consecutive images.
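One possible reading of the age update is sketched below in Python. It assumes that the age of a pixel is reset to zero whenever the frame difference mask marks it as changed, and otherwise incremented until it saturates at the age limit; the reset rule and the function name are assumptions of this sketch, not a transcription of (4).

```python
def update_age(age, fd_mask, age_limit=8):
    """Increment the age of unchanged pixels (capped at age_limit), reset changed ones."""
    return [[0 if changed else min(a + 1, age_limit)
             for a, changed in zip(age_row, mask_row)]
            for age_row, mask_row in zip(age, fd_mask)]


age = [[0, 7, 8]]
age = update_age(age, [[0, 0, 0]])   # -> [[1, 8, 8]]: the last pixel stays at the limit
age = update_age(age, [[1, 0, 1]])   # -> [[0, 8, 0]]: changed pixels start over
```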
Then, we estimate the gradient direction of the age matrix, as given in (5). The gradient direction provides reliable information about moving objects. First, the areas of the image with the maximum possible age share the same direction because each point in such an area is a local maximum (its direction is equal to zero). The gradient directions of pixels that move due to camera motion also share the same direction, because all these pixels move in the same way, and they represent a significant portion of the distribution of directions. Gradient directions of moving objects are therefore a minority, because each object has its own direction. After thresholding the sorted histogram of gradient directions, we can separate the pixels with predominant gradient directions, which belong to the background. The gradient direction mask is computed as given in (6).
The background image is subtracted from the current image to obtain another representation of the moving objects in the sequence. The background subtraction mask is formed as shown in (7).
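The idea of separating predominant from minority gradient directions can be illustrated with the Python sketch below. Several choices here are assumptions and not taken from (5) or (6): central differences with replicated borders, directions rounded to whole degrees, a fixed share `min_share` in place of the histogram threshold of [29], and a mask polarity in which 1 marks a minority direction (a moving-object candidate).

```python
import math
from collections import Counter


def gradient_direction(age):
    """Gradient direction in whole degrees (0..359) of an age matrix."""
    h, w = len(age), len(age[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences; neighbours are clamped at the image border.
            gx = age[y][min(x + 1, w - 1)] - age[y][max(x - 1, 0)]
            gy = age[min(y + 1, h - 1)][x] - age[max(y - 1, 0)][x]
            out[y][x] = int(round(math.degrees(math.atan2(gy, gx)))) % 360
    return out


def minority_direction_mask(directions, min_share):
    """1 where a pixel's direction is shared by less than min_share of all pixels."""
    counts = Counter(d for row in directions for d in row)
    total = sum(counts.values())
    return [[1 if counts[d] / total < min_share else 0 for d in row]
            for row in directions]


# 5x5 age matrix at the age limit, except one recently changed pixel in the centre.
age = [[8] * 5 for _ in range(5)]
age[2][2] = 0
mask = minority_direction_mask(gradient_direction(age), min_share=0.1)
```

In this toy case the flat areas share direction zero and form the dominant bin, while a few neighbours of the changed pixel fall into rare directions and are flagged.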
Finally, a function that decides which image to sample is required to update the background model. The frame difference mask and the background subtraction mask can describe the background model perfectly when the camera is static; otherwise, the model still has outliers. Hence, we add the gradient direction mask to strengthen our sampling function, shown in (8). We apply the OR function because the gradient direction mask also shares common areas with the other masks, especially the motionless ones.
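The combination of the three masks can be sketched as follows in Python. The sketch assumes that all masks use 1 for motion and 0 for static pixels, that the background subtraction mask is a thresholded absolute difference, and that the threshold is given; the function names are hypothetical.

```python
def background_subtraction_mask(frame, background, threshold):
    """1 where the current frame differs from the background model by more than threshold."""
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]


def motion_mask(fd_mask, bs_mask, gd_mask):
    """Pixel-wise OR of the frame difference, background subtraction and gradient direction masks."""
    return [[f | b | g for f, b, g in zip(f_row, b_row, g_row)]
            for f_row, b_row, g_row in zip(fd_mask, bs_mask, gd_mask)]


bs = background_subtraction_mask([[10, 200, 30]], [[12, 100, 31]], threshold=20)
combined = motion_mask([[0, 0, 0]], bs, [[1, 0, 0]])
# combined is [[1, 1, 0]]: a pixel is flagged if any of the three masks flags it.
```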
The thresholds used in (3), (6) and (7) are calculated according to the thresholding method in [29], which splits a histogram into two classes and minimizes the variance of the data within each class.
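The experimental section names this thresholding technique as Otsu's method. A compact Python sketch for integer gray levels is given below; it maximizes the between-class variance, which is equivalent to minimizing the within-class variance. The function name and the convention that values less than or equal to the returned threshold form the first class are choices made for this example.

```python
def otsu_threshold(values, levels=256):
    """Return the gray level t that best splits values into {<= t} and {> t}."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_low = 0          # number of samples in the lower class
    sum_low = 0        # sum of gray levels in the lower class
    best_t, best_between = 0, -1.0
    for t in range(levels):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue                  # lower class still empty
        w_high = total - w_low
        if w_high == 0:
            break                     # upper class empty from here on
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between = w_low * w_high * (mean_low - mean_high) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


# Two well separated groups of absolute differences: small (noise) and large (motion).
diffs = [10] * 50 + [12] * 50 + [200] * 50 + [202] * 50
t = otsu_threshold(diffs)   # falls between the two groups
```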
To update the background model, we need to calculate an instantaneous background using the motion mask. A position where the motion mask is 1 indicates that the pixel is part of a moving object, so we sample the last version of the background model. Conversely, where the motion mask is 0, the current frame is sampled to keep the model correctly updated. These updating rules are shown in (9).
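The sampling rule described above translates directly into a per-pixel selection. The Python sketch below uses nested lists for images and a hypothetical function name.

```python
def instantaneous_background(frame, background, mask):
    """Take the previous background where motion is flagged, the current frame elsewhere."""
    return [[b if m else f for f, b, m in zip(f_row, b_row, m_row)]
            for f_row, b_row, m_row in zip(frame, background, mask)]


frame = [[10, 20, 30]]
background = [[1, 2, 3]]
mask = [[0, 1, 0]]
inst = instantaneous_background(frame, background, mask)
# inst is [[10, 2, 30]]: the moving pixel keeps its previous background value.
```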
The background model is a weighted average of the instantaneous background and the previous version of the background model, as given in (10). The weight used in (10) is estimated using an adaptive approach. If an image has a high rate of static pixels, we can prioritize the instantaneous background and build the model from it. Otherwise, priority is given to the model built from previous versions. The number of static pixels is found using the gradient direction mask, as illustrated in (11) and (12).
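The weighted average and the static-pixel rate can be sketched in Python as follows. The sketch assumes that a value of 0 in the gradient direction mask marks a static pixel and that the weight multiplies the instantaneous background; both conventions, like the function names, are assumptions of this example.

```python
def static_ratio(gd_mask):
    """Fraction of pixels marked as static (value 0) in the gradient direction mask."""
    flat = [m for row in gd_mask for m in row]
    return flat.count(0) / len(flat)


def blend_background(inst_bg, prev_bg, weight):
    """Weighted average: weight on the instantaneous background, (1 - weight) on the previous model."""
    return [[weight * i + (1.0 - weight) * p for i, p in zip(i_row, p_row)]
            for i_row, p_row in zip(inst_bg, prev_bg)]


ratio = static_ratio([[0, 0, 1], [0, 0, 0]])                   # 5 of 6 pixels are static
model = blend_background([[100, 200]], [[50, 100]], weight=0.5)   # [[75.0, 150.0]]
```

A weight close to 1 lets the model follow the current scene quickly, whereas a weight close to 0 keeps the accumulated model almost unchanged.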
The parameter α is set so that the weight changes smoothly, to mitigate e.g. the influence of abrupt camera motion or light changes. The weight update is done as given in (13). The variables and parameters used to compute A2Ba are listed in Table 1 with a detailed description.
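One common way to obtain such a smooth behavior is exponential smoothing of the weight towards the current rate of static pixels. The Python sketch below illustrates that idea only; the update rule, the value of `alpha` and the function name are assumptions and are not claimed to be the rule used by A2Ba.

```python
def update_weight(prev_weight, ratio, alpha=0.1):
    """Move the weight a fraction alpha of the way towards the current static-pixel ratio."""
    return (1.0 - alpha) * prev_weight + alpha * ratio


weight = 0.5
for ratio in [0.9, 0.9, 0.1, 0.9]:   # one frame with abrupt motion (ratio 0.1)
    weight = update_weight(weight, ratio)
# A single outlier frame shifts the weight only slightly when alpha is small.
```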

Experimental results
We have tested A2Ba on several scenarios involving conditions similar to those of aerial image acquisition, including unintentional and intentional camera motion. The proposed method is implemented in MATLAB on an Intel Core i5 PC with 8 GB of RAM.
The evaluation was done using the dataset provided by [30], which offers a realistic and diverse set of videos covering a wide range of detection challenges. We chose four categories to evaluate our algorithm, since these categories represent various conditions that affect the performance of background subtraction algorithms in aerial images. The baseline category has videos with subtle background motion, which can represent a static UAV acquiring images. In this category, background subtraction is easy to apply but not trivial; its main purpose is to serve as a reference. The camera jitter category is formed by sequences taken from a sensor affected by unintentional motion, such as that caused by weather conditions and vehicle vibrations. Motionless images are important in BS algorithms to avoid the detection of fixed objects caused by unintentional camera motion. The bad weather category includes sequences with adverse weather conditions, which affect BS algorithms because they constantly change the structure of the images. Lastly, the PTZ category contains videos captured by pan-tilt-zoom cameras, which are commonly used for surveillance tasks. Images acquired from a PTZ camera can represent intentional motion of the UAV; these sequences are challenging because the motion must be removed to reduce misclassification of pixels.
We compare A2Ba against two commonly used methods, the Gaussian mixture model (GMM) [15] and kernel density estimation (KDE) [1]. To evaluate the performance of the methods, we compute the precision of each method as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$

where TP (true positives) is the number of pixels that are correctly detected as background and FP (false positives) is the number of pixels that are wrongly detected as background.
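The precision measure can be computed from a predicted background map and the ground truth as in the following Python sketch. Both maps are assumed to be binary nested lists in which 1 means background; this representation, the function name and the value returned when nothing is predicted as background are choices made for the example.

```python
def background_precision(predicted_bg, truth_bg):
    """Precision of the background class: TP / (TP + FP)."""
    tp = fp = 0
    for p_row, t_row in zip(predicted_bg, truth_bg):
        for p, t in zip(p_row, t_row):
            if p:                 # pixel predicted as background
                if t:
                    tp += 1       # and it really is background
                else:
                    fp += 1       # but the ground truth says foreground
    return tp / (tp + fp) if (tp + fp) else 0.0


predicted = [[1, 1, 0],
             [1, 0, 0]]
truth = [[1, 0, 0],
         [1, 1, 0]]
precision = background_precision(predicted, truth)   # 2 correct out of 3 predicted
```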
To obtain true and false positives, the results of each method are compared to the ground truth, i.e. the correct classification result. The ground truth is provided with each sequence in the dataset. We test all methods on eight different sequences, two for each of the aforementioned categories. The precision for each sequence is shown in Fig. 2.
The sequences "office" and "highway" are part of the baseline category, which represents the simplest case for BS. In these sequences we obtained the highest results. All methods show similar performance due to the constant background with subtle motion; results for this category are shown in Fig. 3. The sequences "zoomInZoomOut" and "intermittentPan" show low temporal consistency between consecutive images because they are captured by a PTZ camera that is constantly panning and zooming. Nevertheless, the proposed method significantly outperforms GMM and KDE in both sequences; BS results are given in Fig. 4. The sequences from the camera jitter category are shown in Fig. 5; here our algorithm has its lowest performance, but it still outperforms the traditional algorithms and copes with intense camera motion. Finally, in Fig. 6, the sequences "snowFall" and "skating" show good performance under adverse weather conditions, which is important because weather can change the background model instantly, as in a big storm or a snowfall.
In A2Ba, several parameters need to be estimated during BS processing, most of them using Otsu's method, which is basically a clustering-based image thresholding. This algorithm assumes two classes of pixels in the image; consequently, a gray-level image is reduced to a binary image, and as a result we obtain the value that separates the pixels into two classes. The next parameter, the age limit of pixels, needs to be defined before BS processing because UAVs generally have limited resources such as data storage. In A2Ba, the age of pixels is an important parameter because it gives the BS algorithm invariability against temporal changes caused by camera motion. However, it is not practical to store the age of a pixel during its whole lifetime; we must establish an age limit to save resources and energy. We tested our algorithm on all sequences, varying the age limit until the precision of the BS algorithm showed a stable behavior; this is shown in Fig. 7. We can observe that precision increases with the age of pixels, but when the age goes beyond eight, precision remains almost constant. Therefore, we chose an age limit of eight, because it yields the maximum precision of our proposed method and saves storage, since the age can be stored using only three bits.
Concerning processing time, our method can estimate the background at around 20 fps in QVGA resolution and 11 fps in VGA resolution; these results are shown in Table 2. Although real-time processing is not reached, most pixel operations are independent and could easily be done in parallel on a dedicated platform such as an FPGA or GPU.

Discussion and conclusions
Nowadays, there is an increasing effort to develop intelligent systems and vehicles to replace human beings in risky activities. An intelligent system or vehicle is defined as an entity that interacts with its environment, i.e. it extracts and processes information through its sensors. However, intelligent systems and vehicles are deployed in non-controlled environments, such as manufacturing chains, cars or aircraft. Background subtraction is used in this type of system to give a first approximation of the composition of an image captured through one of the system's sensors. A2Ba is ideally suited to the aforementioned applications because it was designed to improve background subtraction under conditions of excessive motion, a typical condition in industrial applications.
This work presents a novel method for background modelling with an adaptive update algorithm. Three types of information are fused to obtain a robust background model. The proposed method models variations according to the age of pixels, i.e. it relies on those pixels with low variability over time. We tested A2Ba on different sequences simulating aerial imagery conditions, and the experimental results show better performance than the GMM and KDE algorithms on changing images. Shadow removal can be considered as future work to increase the performance of the proposed algorithm, since in the current evaluation pixels that are part of shadows are counted as false positives, which affects the performance evaluation.
Kernel density estimators have the main property of converging asymptotically to any probability density function; this property is used to model the background.

EAI Endorsed Transactions on Industrial Networks and Intelligent Systems 05-06 2015 | Volume 2 | Issue 4 | e3

Figure 1. Framework of the proposed method

F. Sanchez-Fernandez et al.

Figure 2. Precision of our proposed method, GMM and KDE for all sequences.

Table 1. Variables and parameters used in A2Ba

Table 2. Processing time for different methods of BS