Comparative Study among Bivariate Statistical Models in Landslide Susceptibility Map

The main purpose of this paper is to compare the performance of three bivariate statistical models, i.e. Frequency Ratio, Weight of Evidence, and Information Value, for landslide susceptibility assessment. These models were applied in Cianjur Regency, West Java Province (Indonesia), in order to map the landslide susceptibility and to rate the importance of landslide causal factors. In the first stage, a landslide inventory map and the input layers of the landslide conditioning factors were prepared in a Geographic Information System (GIS), supported by field investigations and remote sensing data. The 298 landslides were randomly divided into a modeling/training dataset (70%) and a validation/test dataset (30%). The landslide conditioning factors considered for the studied area were slope angle, elevation, slope aspect, lithological unit, and land use. Subsequently, the thematic data layers of the conditioning factors were integrated using the frequency ratio (FR), weight of evidence (WofE), and information value (IV) models. Model performance was tested with receiver operating characteristic analysis. The validation showed that all three models gave good accuracy values. The success rates of the FR, WofE, and IV models were 0.920, 0.926, and 0.930, while the prediction rates of the three models were 0.913, 0.912, and 0.895, respectively. However, the FR model, with the highest prediction rate, proved to be relatively superior in estimating landslide susceptibility throughout the studied area.


Introduction
According to the Landslide Inventory Database of Indonesia, almost 40% of the landslides in Indonesia from 2011 to 2015 occurred in West Java Province. Cianjur Regency, with its prominent landslide factors of highly weathered material (lithology) and steep morphology, is one of the landslide hotspots in West Java (Arifianti and Agustin, 2017). Accelerated population growth towards landslide-prone areas has increased the number of casualties from human-induced landslides each year. A significant effort to reduce these losses has therefore been made through landslide disaster mitigation. One of its activities is Landslide Susceptibility Assessment (LSA), which forms the basis of the Landslide Susceptibility Map. LSA plays a significant part in landslide disaster mitigation and has received growing attention, with the highest number of publications in international journals (Gokceoglu and Sezer, 2009).
Many studies have been carried out to assess landslide susceptibility, with increasing application of GIS and a variety of models. The methods used for landslide susceptibility assessment and mapping can be classified into two categories: (i) qualitative and (ii) quantitative methods. Qualitative methods are based on field observations and the prior knowledge of experts, who identify judgment rules or assign weighted values to conditioning factor maps and then overlay them to produce a landslide susceptibility map, as in the analytical hierarchy process (Ghosh, 2011; Kayastha et al., 2012; Mondal and Maiti, 2012). Quantitative methods primarily refer to statistical analyses, which can be categorized into bivariate and multivariate statistical analysis. This study used only bivariate statistical analysis, namely the frequency ratio model (Lee and Pradhan, 2006; Vijith and Madhu, 2007; Constantin et al., 2011; Mezughi et al., 2011; Regmi et al., 2014), the information value model (Yin and Yan, 1988; Lin and Tung, 2004; Sarkar et al., 2008; Conforti et al., 2011; Zhu et al., 2014), and the weight of evidence model (Poli and Sterlacchini, 2007; Dahal et al., 2008; Sharma and Kumar, 2008; Kayastha et al., 2012; Chen and Li, 2014; Teerarungsigul et al., 2015). Geomatics, by taking advantage of modern tools such as Geographic Information System (GIS) and Remote Sensing (RS), provides an excellent opportunity for applying, validating, and comparing different methods to produce a landslide susceptibility map (Vakhshoori and Zare, 2016). Some studies have applied and compared two or more methods in the same region (Pradhan and Lee, 2010a; Ercanoğlu and Temiz, 2011; Yalcin et al., 2011; Regmi et al., 2014; Vakhshoori and Zare, 2016; Akıncı et al., 2017; Chen et al., 2019).
Based on its physiography, West Java Province is divided into four zones, viz. the Jakarta Coastal Plain, the Bogor Zone, the Bandung or Central Depression Zone, and the West Java Southern Mountain Zone (Van Bemmelen, 1949). The Cianjur area is situated in the Bandung Zone; its morphology is mainly steep hills, and its predominant lithology is Quaternary volcanic products.

Methods and Materials
The landslides were sampled as homogeneous georeferenced points (Poli and Sterlacchini, 2007; Neuhäuser et al., 2011; Ozdemir, 2011; Tien Bui et al., 2012; Xu et al., 2014). An approach called seed cell was used to indicate the occurrence or non-occurrence of landslides. The seed cell approach is a neighborhood analysis (a spatial analysis tool) that selects landslide pixels within a buffer zone along the crown and flanks. It describes the pre-failure conditions, i.e. the undisturbed morphological conditions before the landslides occurred (Süzen and Doyuran, 2004; Nefeslioglu et al., 2008; Bai et al., 2010; Dou et al., 2015; Hussin et al., 2015).
A total of 298 landslide points in the Cianjur area were compiled and mapped into the landslide inventory map. The landslide points, as the seed cells, were used to build the models. The points were randomly divided into 196 points (70%) as a training dataset for model building. The other 89 points (30%), as a test/validation dataset, were not used in model building but were used for validation purposes.
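The random 70/30 partition described above can be illustrated with a minimal Python sketch. The function name `split_inventory`, the fixed seed, and the 20 placeholder point IDs are assumptions made for illustration only; they are not the study data, and the original split was presumably carried out inside the GIS software.

```python
import random


def split_inventory(point_ids, train_fraction=0.7, seed=42):
    """Randomly split landslide point IDs into training and validation lists."""
    rng = random.Random(seed)      # fixed seed (assumption) so the split is reproducible
    shuffled = list(point_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical inventory of 20 point IDs (placeholder values, not the study data)
train_ids, test_ids = split_inventory(range(20))
print(len(train_ids), len(test_ids))
```

Keeping the validation points out of model building, as the text specifies, is what allows the prediction rate to be interpreted as an independent check.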
In this study, the conditions considered to be the primary controls on landslide occurrence in the studied area were selected. A set of five landslide-related factors was used and defined as conditioning factors: slope angle, slope aspect, elevation, lithological unit, and land use (Table 1). The factors were converted to raster maps with a grid size (spatial resolution) of 15 x 15 m. The relevant data for this study were collected, processed, and analysed in a GIS environment using ArcGIS 10.6.
In this study, the Frequency Ratio, Weight of Evidence, and Information Value models were applied to landslide susceptibility assessment to generate Landslide Susceptibility Maps (LSMs) of the studied area using the five landslide conditioning factors. All LSMs were classified into four landslide susceptibility zones based on the percentage of the total landslide population: very low (0%-5%), low (5%-10%), moderate (10%-75%), and high (> 75%) (SNI, 2016).

Frequency Ratio (FR)
The FR is a probability model based on the observed spatial relationships between the landslide distribution and each landslide-related conditioning factor (Lee, 2010a, 2010b; Choi et al., 2012; Mohammady et al., 2012; Park et al., 2013; Pardeshi et al., 2013). FR is the ratio of the landslides (the probability of an occurrence and a nonoccurrence) in a desired class (given attributes), as a percentage of all landslides (%Ld), to the area of the class, as a percentage of the entire map (%Cd):

\[ FR = \frac{\%Ld}{\%Cd} \]

The landslide susceptibility index (LSI) for each pixel (Lee and Min, 2001) is the summation of the FR values of the factor classes that overlap at that pixel. It is formulated as:

\[ LSI = \sum_{j=1}^{n} FR_j \]

where FR_j is the frequency ratio of the class of conditioning factor j present at the pixel, and n is the number of conditioning factors.
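As a hedged illustration of the ratio described above, the following Python sketch computes FR per class from landslide counts and class areas, and sums the FR values of the classes present at one pixel to obtain its LSI. The function names, class labels, and all numbers are hypothetical placeholders, not values from Table 2.

```python
def frequency_ratio(landslides_per_class, area_per_class):
    """Return FR = %Ld / %Cd for every class of one conditioning factor."""
    total_landslides = sum(landslides_per_class.values())
    total_area = sum(area_per_class.values())
    fr = {}
    for cls, area in area_per_class.items():
        pct_ld = landslides_per_class.get(cls, 0) / total_landslides  # share of landslides
        pct_cd = area / total_area                                    # share of map area
        fr[cls] = pct_ld / pct_cd
    return fr


def landslide_susceptibility_index(pixel_classes, fr_tables):
    """Sum the FR values of the classes that overlap at one pixel."""
    return sum(fr_tables[factor][cls] for factor, cls in pixel_classes.items())


# Hypothetical counts and areas (placeholders, not the study data)
fr_tables = {
    "slope": frequency_ratio({"0-7": 10, "7-18": 60, "18-33": 30},
                             {"0-7": 500, "7-18": 300, "18-33": 200}),
    "land_use": frequency_ratio({"forest": 20, "settlement": 80},
                                {"forest": 600, "settlement": 400}),
}
lsi = landslide_susceptibility_index({"slope": "7-18", "land_use": "settlement"}, fr_tables)
print(fr_tables["slope"], lsi)
```

An FR above 1 means the class holds a larger share of landslides than of map area, so it raises the LSI of the pixels it covers.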

Weight of Evidence (WofE)
The theory of evidence (Weight of Evidence) is a log-linear version of Bayes' theorem used to calculate probability based on the concepts of prior and posterior probability (Agterberg et al., 1993; Elmoulat et al., 2015). This approach is based on the information obtained from the interrelation between the landslide conditioning factors and the landslide distribution (Barbieri and Cambuli, 2009; Pardeshi et al., 2013). The landslide conditioning factors are the input parameters for the WofE approach and provide the information on what may control the occurrence of landslide-prone areas (Arifianti and Agustin, 2017). The contrast of the weights (C) is added to quantify how significant the overall spatial association is between the landslide conditioning factors and the landslide distribution (Dahal et al., 2008; Neuhäuser et al., 2011). The contrast value is calculated as the difference between the positive and negative weights (Ozdemir, 2011):

\[ C = W^{+} - W^{-} \]

where W^{+} and W^{-} are the positive and negative weights of a factor class.
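The contrast described above can be sketched in Python as follows. The text gives only the contrast as the difference of the two weights; the expressions used here for the positive and negative weights follow the commonly used log-ratio formulation of weights of evidence and are an assumption, as are the function name and the pixel counts. The sketch also assumes that all four counts are non-zero, since the logarithm is undefined otherwise.

```python
import math


def weights_of_evidence(ls_in, ls_out, stable_in, stable_out):
    """Return (W+, W-, C) for one factor class from four pixel counts.

    ls_in / ls_out: landslide pixels inside / outside the class
    stable_in / stable_out: non-landslide pixels inside / outside the class
    """
    p_class_given_ls = ls_in / (ls_in + ls_out)
    p_class_given_stable = stable_in / (stable_in + stable_out)
    w_plus = math.log(p_class_given_ls / p_class_given_stable)
    w_minus = math.log((1 - p_class_given_ls) / (1 - p_class_given_stable))
    contrast = w_plus - w_minus        # C = W+ - W-
    return w_plus, w_minus, contrast


# Hypothetical pixel counts (placeholders, not the study data)
w_plus, w_minus, contrast = weights_of_evidence(30, 70, 100, 800)
print(w_plus, w_minus, contrast)
```

A positive contrast indicates that landslides are over-represented in the class, and a negative contrast indicates the opposite.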

Information Value Model (IVM)
The IVM is a statistical approach that has the advantage of assessing landslide susceptibility in an objective way. The IVM is used to calculate the weight for each class of a factor layer by taking the ratio of the landslide density of each class to the landslide density of the total area. It rests on the general assumption that future landslides will occur under the same conditions as past landslides (Lee and Pradhan, 2006).
The IVM is used to evaluate the spatial relationship between the conditioning factor classes and the probability of landslide occurrence. The higher the information value, the stronger the relationship between the probability of landslide occurrence and the conditioning factor class. The IV can be calculated as follows (Yin and Yan, 1988; Zhu et al., 2004; Wang et al., 2014):

\[ IV_i = \ln\left(\frac{S_i / A_i}{S / A}\right) \qquad (6) \]

where: S_i is the number of landslides in factor class i, A_i is the area of factor class i, S is the total number of landslides, and A is the total area of the entire study area.
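A minimal Python sketch of this density ratio is given below, using the symbols from the definition above (S_i, A_i, S, A). The natural logarithm and the handling of classes without landslides (returned as `None`) are assumptions of this sketch, and the numbers are hypothetical placeholders rather than study data.

```python
import math


def information_value(s_i, a_i, s_total, a_total):
    """Log of the class landslide density relative to the overall landslide density."""
    if s_i == 0:
        return None                     # logarithm undefined for a class with no landslides
    class_density = s_i / a_i
    overall_density = s_total / a_total
    return math.log(class_density / overall_density)


# Hypothetical class: 60 of 100 landslides on 300 of 1000 area units
iv_example = information_value(60, 300, 100, 1000)
print(iv_example)
```

Positive values mark classes where landslides are denser than in the study area as a whole, and negative values mark classes where they are sparser.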

Validation of Landslide Susceptibility Models
The validation of LSMs based on statistical methods reveals the reliability of the modelling process. It also allows the accuracy of different models and the choice of their parameter variables to be compared. The 'Area Under Curve' (AUC) of the 'Receiver Operating Characteristics' (ROC) method was used for the validation. The success rate curve used the training dataset (70% of the whole set) to determine how well the resultant maps had classified the areas of existing landslides (Chung and Fabbri, 1999; Chen et al., 2017). The prediction rate curve, using the validation dataset (30% of the whole set), explains how well the models and conditioning factors predict future landslides (Chung and Fabbri, 2003; Pradhan and Lee, 2010a). The model accuracy ratings are usually given as 0.9-1.0 = excellent, 0.8-0.9 = good, 0.7-0.8 = acceptable, 0.6-0.7 = poor, and 0.5-0.6 = failed (Yilmaz, 2009).
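To make the AUC and its rating scale concrete, the sketch below computes the AUC as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell, and then maps the value to the rating classes listed above. The function names and the scores are hypothetical, and assigning each boundary value to the higher class is an assumption of this sketch. It is not necessarily the procedure used to draw the success and prediction rate curves in the study.

```python
def roc_auc(landslide_scores, stable_scores):
    """AUC as the share of (landslide, stable) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in landslide_scores:
        for q in stable_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(landslide_scores) * len(stable_scores))


def accuracy_rating(auc):
    """Map an AUC value to the rating classes given in the text (Yilmaz, 2009)."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "acceptable"
    if auc >= 0.6:
        return "poor"
    if auc >= 0.5:
        return "failed"
    return "below the rated range"


# Hypothetical susceptibility scores (placeholders, not the study data)
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(auc, accuracy_rating(auc))
```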

Spatial Relationship between Conditioning Factors and Landslides
The conditioning factors were classified into several classes, and weights were assigned to them for the FR, WofE, and IV methods as shown in Figure 2. The spatial relationship between the conditioning factors and landslides is presented in Table 2.

The spatial relationships between landslide occurrence and the conditioning factors obtained from the three models indicate a relatively similar susceptibility for each class. The most susceptible slope angle classes are 7°-18°, 18°-24°, and 24°-33°. The models show that the landslide probability increases with the slope angle, which indicates a strong correlation between the slope angle and landslide occurrence. For the slope aspect and elevation factors, the models show that the most susceptible classes are northeast-facing slopes and elevations of 500-1,000 m a.s.l. The frequency of landslides is relatively lower on south-facing slopes, with the exception of the flat areas. This means that slope aspect and elevation are less strongly correlated with landslide occurrence than the slope angle.
The results for the lithology factor indicate that the most susceptible classes were (1) breccia, lava, tuff, and conglomerate from the old volcanic sediments of Pasir Menteng, (2) clay, marl, and quartz sandstone from the Rajamandala Formation, (3) breccia and lava from old volcanic deposits, and (4) old volcanic lava deposits. These four lithological units are the most prone to landslides in the studied area. The land use factor shows approximately similar susceptibility in the three models. The highest susceptibility is in the vicinity of settlements, shrubs, rain-fed rice fields, agricultural areas, and forestry regions.

Comparison and Validation
Landslide susceptibility maps were constructed from bivariate statistical analysis using the FR, WofE, and IV models. The LSMs obtained from the three models were divided into four zones using the quantile method in ArcGIS: very low, low, moderate, and high (Figure 3).
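The idea of a quantile classification into four zones can be sketched in Python as below. This is only an illustration under stated assumptions: it uses the quartile cut points returned by the standard library function `statistics.quantiles`, which need not coincide with the break values produced by the GIS software, and the index values are hypothetical.

```python
import bisect
import statistics

ZONES = ["very low", "low", "moderate", "high"]


def quantile_zones(index_values):
    """Assign each susceptibility index value to one of four quantile-based zones."""
    cuts = statistics.quantiles(index_values, n=4)   # three cut points between four zones
    return [ZONES[bisect.bisect_right(cuts, value)] for value in index_values]


# Hypothetical susceptibility index values (placeholders, not the study data)
zones = quantile_zones([1, 2, 3, 4, 5, 6, 7, 8])
print(zones)
```

A quantile scheme places roughly the same number of pixels in every zone, so the zone boundaries depend on the distribution of index values in each map.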

The areas and the seed cells in the very low, low, moderate, and high zones of the LSM of each model are shown in Table 3. The table shows that the majority of the seed cells fall in the moderate to high susceptibility zones.
Finally, the AUC of the ROC method was applied in order to reveal which model is more accurate in this study. The AUC was obtained for both the training dataset and the validation dataset (Figure 4). The AUC values of the success rate based on the training dataset are 0.920 for the FR model, 0.926 for the WofE model, and 0.930 for the IV model. The AUC values of the prediction rate based on the validation dataset for the FR, WofE, and IV models are 0.913, 0.912, and 0.895, respectively. The success rate and prediction rate curves show that all three models exhibit a similar performance. The models have an excellent fit to the data with only slight differences: the IV model fits best with a model accuracy of 93%, followed by WofE with 92.6% and FR with 92%. This means that the IV model produced the landslide susceptibility map that best fits the existing landslides in the studied area. In contrast, the model with the highest prediction ability is the FR model with a prediction accuracy of 91.3%, followed by WofE with 91.2% and the IV model with 89.5%. This means that the FR model showed the best accuracy in predicting the landslide susceptibility of the studied area.

Conclusions
As shown in Table 3, the moderate to high susceptibility zones of the LSMs produced by the FR, WofE, and IV models cover 46.83%, 46.12%, and 50.03% of the studied area, respectively. These areas are the most landslide-prone regions and should be prioritized in susceptibility management, particularly in the vicinity of settlements, shrubs, rain-fed rice fields, agricultural areas, and forestry areas with a slope angle of 18°-33° and an elevation of 500-1,000 m a.s.l.
Although these bivariate statistical models assign what Chung et al. (1995) termed "favourability values" to the conditioning factors (e.g., slope units, lithological units, etc.) in order to obtain values that improve on expert opinion, the selection of the models and the landslide-related factors was still based on expert scientific knowledge. This knowledge-based component was applied to judge the relevance, availability, and scale of the data for the studied area.
According to the results, the success rates and prediction rates of the three models are all above 89% (Figure 4). This shows that the landslide susceptibility map of each model in this study has successfully achieved a high degree of reliability. The LSMs of the models will support spatially based decision making by the government of Cianjur Regency and other associated authorities and agencies.