The Application of GIS-Based Logistic Regression and Frequency Ratio Approaches for Landslide Susceptibility Assessment: A Case Study of the Souk Ahras Region, NE Algeria.
Fatna Mahdadi 1*, Abederrahmane Boumezbeur 2.
1 Geology and Environment Laboratory, Department of Geology, University of Constantine1, Constantine, Algeria.
2 Department of Geology, Sciences Faculty, University of Tebessa, Tebessa, Algeria.
Landslide susceptibility assessment (LSA) can be carried out using various statistical modeling techniques, among them the logistic regression (LR) and frequency ratio (FR) models. In this work, landslide susceptibility maps (LSMs) were produced on a geographic information system (GIS) platform using the LR and FR methods in the northwest of the Souk Ahras region, NE Algeria. A landslide inventory map was established from visual interpretation of satellite images and field survey data. Slope instability in this region is related to a wide variety of factors pertaining to the geological, geomorphological, hydrological and climatic characteristics of the terrain. Consequently, a spatial database of seven causal factors was compiled and used to predict landslide-prone areas. The LSMs produced using the LR and FR statistical models were subdivided into five classes according to their degree of susceptibility to landslides: very low, low, moderate, high and very high. These raster-based LSMs were compared and verified with both the training and testing inventory datasets, and the area under the curve (AUC) was used for model evaluation. The results showed that the LR model provides a higher prediction accuracy than the FR model, with an AUC of 0.9045 (90.45 %) for the success rate and 0.9181 (91.81 %) for the prediction rate. In addition, the results showed that about 30 % to 37 % of the study area lies in the high and very high susceptibility classes. The resulting LSMs play an indispensable role in the management of the region and can be used in sustainable development planning.
Keywords: statistical modeling, geographic information system (GIS), landslide inventory, Souk Ahras region, landslide-related factors.
Landslides are natural processes that cause a great deal of damage to people and their environment, especially in the rapidly growing population centers of less developed countries.
With the recent development of computer technologies, GIS can play an important role in landslide prediction. Its distinct advantage is the ability to store, analyze and display large amounts of data, obtained either directly from the field or from remote sensing, in order to predict slope stability within an area.
In the literature, various statistical methods have been used for LSA, including logistic regression (Jacobs et al., 2018), the analytical hierarchy process (Achour et al., 2017), weight of evidence (Teerarungsigul et al., 2016), frequency ratio (Youssef et al., 2015), and many more. These approaches have been successfully applied by several researchers (Lee and Sambath, 2006; Pradhan et al., 2010; Greco and Sorriso-Valvo, 2013; Sivakami and Sundaram, 2014; Chen et al., 2016; Hadji et al., 2016; Le et al., 2017), using GIS software to handle the geospatial database.
As a case study, part of the northwest of the Souk Ahras region, NE Algeria, one of the areas most exposed to landslides in the country, was selected for LSA using a pixel-based mapping unit.
Souk Ahras is a mountainous region known for the widespread occurrence of landslides. Their study requires that the geomorphological, geological and hydro-climatic factors likely to affect slope stability be considered together, with a characteristic weight for each factor.
For this study, seven common causative parameters were prepared for the landslide susceptibility (LS) analysis: slope angle, elevation, slope aspect, lithological units, distance to river, NDVI, and rainfall. They were used to prepare the LSMs using the LR and FR statistical approaches.
The accuracy of the LR and FR models was evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). Data processing and modeling were done using ArcMap 10.4 and XLSTAT-Pro 7.5 software. The results revealed that about 30 % to 37 % of the study area lies in the high and very high susceptibility classes. The resulting LSMs play an indispensable role in the management of the region and can be used in sustainable development planning.
The Souk Ahras region is located in the extreme east of Algeria and occupies an area of 4 360 km². The study area is located in the northwestern part of the Souk Ahras region (figure 1) and was selected for landslide susceptibility assessment and the preparation of susceptibility maps. It lies between latitudes 36°5’18.352”N and 36°11’6.16”N and longitudes 7°18’54.91”E and 7°27’56.89”E, and covers an area of 73 km² (figure 1a). It is a mountainous region that is part of the Tellian mountain belt, with slopes ranging from 0° to more than 66°. The altitude decreases from northeast to southwest, ranging between 1283 m and 675 m.
The climate is of the sub-humid Mediterranean type, characterized by a cold, wet winter and a hot, dry summer, with annual precipitation between 428 mm and 460 mm.
Geologically, this region is essentially formed by sedimentary rocks (figure 2a). The Upper Cretaceous formations are represented by an alternation of limestone and marly limestone. A predominantly marly Miocene cover, with some sandstone and conglomerate, occupies the majority of the study area. The Plio-Quaternary consists of alluvial deposits, sandstones, puddingstones and gravels.
Materials and Methods
Assessing susceptibility to natural disasters such as landslides depends on a good knowledge and deep understanding of how the causative factors interact to bring about instability. It is precisely this knowledge that allows an accurate prediction of the terrain units prone to landslides.
In the statistical approaches, all the landslide conditioning factors that could be mapped were entered into ArcMap 10.4 and converted from vector to raster thematic maps. Subsequently, an overlay approach was adopted to derive the frequency statistics of each factor map relative to the landslide inventory map.
The raster of the study area comprises 1 098 rows and 1 376 columns, with 729 429 pixels covering the study area and a pixel size of 10 m × 10 m. The first step of data gathering was the preparation of the landslide inventory map, in which landslides cover approximately 4 628 pixels of the study area.
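As a quick consistency check on the figures above, the pixel counts can be converted to areas. The sketch below assumes that every counted pixel is a full 10 m × 10 m cell; the variable names are illustrative only and do not come from the original workflow.

```python
# Convert the pixel counts quoted in the text to areas.
# Assumption: every counted pixel is a full 10 m x 10 m cell.
PIXEL_SIZE_M = 10
STUDY_AREA_PIXELS = 729_429
LANDSLIDE_PIXELS = 4_628

pixel_area_m2 = PIXEL_SIZE_M ** 2
study_area_km2 = STUDY_AREA_PIXELS * pixel_area_m2 / 1_000_000
landslide_area_km2 = LANDSLIDE_PIXELS * pixel_area_m2 / 1_000_000
landslide_share_percent = 100 * LANDSLIDE_PIXELS / STUDY_AREA_PIXELS

print(f"study area: {study_area_km2:.2f} km2")
print(f"landslide area: {landslide_area_km2:.4f} km2")
print(f"landslide share: {landslide_share_percent:.2f} %")
```

The study-area figure comes out at about 72.94 km², consistent with the 73 km² quoted for the study area, and mapped landslides occupy less than 1 % of the pixels.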
In addition, seven thematic data layers corresponding to the geomorphological, geological and environmental causal factors mentioned above were prepared to evaluate the relationship between existing landslides and these factors. Weight values were obtained for each parameter using the FR and LR statistical methods, which supports the preparation and evaluation of the LSMs of the region (Lee and Sambath, 2006).
The database used in this study includes the locations of previous landslides recorded in the region and thematic maps of the seven major causal factors, stored as thematic layers on the GIS platform.
The inventory map is the first and most important thematic layer in the LSA procedure. A total of 90 landslides were mapped from the interpretation of satellite images and previous reports, and validated by several field surveys conducted between 2015 and 2018. The resulting map was converted to raster format at a 10 m pixel size. It was used to create a training data set of 3 471 pixels (approximately 75 % of the total landslide area) and a testing data set of 1 157 pixels (approximately the remaining 25 %), which was used to validate the models. The landslide distribution in the study area is shown in figure 1a.
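The text does not state how the landslide pixels were divided between the two data sets. As an illustration only, the sketch below assumes a simple random split at pixel level; the function name, the seed and the integer pixel ids are hypothetical.

```python
import random

def split_landslide_pixels(pixel_ids, train_fraction=0.75, seed=42):
    """Randomly split landslide pixel ids into training and testing sets."""
    shuffled = list(pixel_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical ids standing in for the 4 628 landslide pixels of the inventory.
train_ids, test_ids = split_landslide_pixels(range(4628))
print(len(train_ids), len(test_ids))
```

With 4 628 pixels and a 75 % training fraction, this reproduces the 3 471 and 1 157 pixel counts given above. Splitting by whole landslide rather than by pixel would be an equally plausible design and would avoid placing pixels of the same landslide in both sets.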
Geology is considered to be the most important factor in the occurrence of landslides (Yesilnacar and Topal, 2005; Yalcin et al., 2011). Eleven lithological units (figure 2a) were digitized from the Sedrata geological map at a scale of 1:50 000, produced by the Algerian geological survey. Differences between these units lead to variations in strength and slope stability.
Slope angle is a main predisposing cause of slope failure in mountainous regions (Hadji et al., 2016). In theory, susceptibility is assumed to be greater on steeper slopes. In this work, the slope angle was derived from the DEM over a regular 10 m × 10 m grid; slopes vary from 0° to 66°. The slope map (figure 2b) was then divided into five categories: 35°.
Rainfall is a determining factor in the erosion processes responsible for triggering gravity-driven downslope movements. In this work, the precipitation factor was represented as a thematic layer of average annual precipitation (figure 2c), reclassified into three classes: 428–437 mm/year, 437–447 mm/year and 447–460 mm/year.
Elevation is one of the most important parameters responsible for landslide occurrence in mountainous areas (Conforti et al., 2014). In theory, landslide susceptibility increases with elevation, which is directly related to precipitation in its different forms, such as rainfall and snow. The elevation map (figure 2d) has five classes: 675–800 m, 800–900 m, 900–1000 m, 1000–1100 m and 1100–1283 m.
Slope aspect is also considered an important predisposing factor. Previous research has shown a link between slope aspect and proneness to landslides (Hadji et al., 2016). It can influence landslide distribution through the orientation of tectonic fractures and the concentration of soil moisture (Hadji et al., 2016). The slope aspect map was also derived from the DEM and subdivided into nine classes (figure 2e): flat, north, northeast, east, southeast, south, southwest, west, and northwest.
The hydrographic network map provides information about the distribution of unstable areas: streams modify soil behavior through gully erosion, which can cause slope failure and sometimes ground movements. Buffer zones were therefore defined, and the distance between the drainage line and the vulnerable zone was measured using a multiple buffer analysis with a 50 m interval (figure 2f), presented in seven classes: 500 m.
The normalized difference vegetation index (NDVI) is a determining factor in slope stability and is used to indicate the plant cover of an area (Yusof and Pradhan, 2014). In general, relatively low vegetation cover can favor landslide occurrence. In this study, a Landsat satellite image was used to calculate the NDVI values (figure 2g) using the following formula:
\[ \mathrm{NDVI} = \frac{RI - R}{RI + R} \tag{1} \]
where RI is the value of the infrared portion of the electromagnetic spectrum and R is the value of the red portion. The resulting map has three classes: −0.185425416 to 0.188091246, 0.188091246 to 0.295223932 and 0.295223932 to 0.552921474.
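Equation (1) and the three-class reclassification can be sketched for a single pixel as follows. This is a minimal illustration, not the ArcMap workflow used in the study: the function names and the sample reflectance values are hypothetical, and assigning a value that falls exactly on a class limit to the lower class is an assumption.

```python
def ndvi(infrared, red):
    """Equation (1): (RI - R) / (RI + R); returns None when the denominator is zero."""
    denominator = infrared + red
    if denominator == 0:
        return None
    return (infrared - red) / denominator

# Upper limits of the first two NDVI classes quoted in the text.
CLASS_BREAKS = (0.188091246, 0.295223932)

def ndvi_class(value):
    """Return 1, 2 or 3 for the low, middle and high NDVI class."""
    if value <= CLASS_BREAKS[0]:  # assumption: limits belong to the lower class
        return 1
    if value <= CLASS_BREAKS[1]:
        return 2
    return 3

# Hypothetical reflectance values for a single pixel.
value = ndvi(infrared=0.5, red=0.3)
print(round(value, 3), ndvi_class(value))
```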
Landslide Susceptibility Mapping
Frequency Ratio model
The frequency ratio (FR) method (Lee and Min, 2001) is a bivariate statistical method frequently used to calculate the probabilistic relationship between landslides and landslide conditioning factors. In this work, the FR was calculated for every class of the seven factors used in the landslide susceptibility mapping, as the ratio of the proportion of landslide occurrences in the class to the proportion of the study area occupied by the class. The FR values of the different parameter classes are given in Table 1.
The landslide susceptibility index (LSI) was calculated (equation 2) by summing the FR values of all the conditioning factors (Lee and Talib, 2005):
\[ \mathrm{LSI} = FR_1 + FR_2 + \dots + FR_n \tag{2} \]
where n is the total number of factors. The landslide susceptibility map was prepared by assigning to each class of each causative factor its frequency ratio value and summing the resulting layers. The final LSM was divided into five categories: very low, low, moderate, high, and very high (figure 3a).
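The FR calculation and equation (2) can be sketched as follows. The pixel counts are invented for illustration and are not taken from Table 1; the function names are hypothetical, and every class is assumed to contain at least one pixel.

```python
def frequency_ratios(class_pixels, landslide_pixels):
    """FR per class = (landslide share of the class) / (area share of the class)."""
    total_pixels = sum(class_pixels.values())
    total_landslides = sum(landslide_pixels.values())
    ratios = {}
    for name, n_pixels in class_pixels.items():
        landslide_share = landslide_pixels.get(name, 0) / total_landslides
        area_share = n_pixels / total_pixels
        ratios[name] = landslide_share / area_share
    return ratios

def landslide_susceptibility_index(fr_values):
    """Equation (2): sum of the FR values of the classes a pixel falls in."""
    return sum(fr_values)

# Hypothetical counts for one factor with two classes (not taken from Table 1).
slope_fr = frequency_ratios(
    class_pixels={"gentle": 600, "steep": 400},
    landslide_pixels={"gentle": 20, "steep": 80},
)
print(slope_fr)

# LSI of a pixel on a steep slope, plus a made-up FR of 1.2 from a second factor.
lsi = landslide_susceptibility_index([slope_fr["steep"], 1.2])
print(lsi)
```

An FR above 1 marks a class in which landslides are over-represented relative to its area, and an FR below 1 marks a class in which they are under-represented.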
Logistic regression model
Logistic regression (LR) is a multivariate statistical technique used for the study and prediction of landslide susceptibility. The LR model, fitted by maximum likelihood, can be expressed in the following form:
\[ P = \frac{1}{1 + e^{-z}} \tag{3} \]
where P is the estimated probability of landslide occurrence, which varies from 0 to 1, and z is the linear combination of the causal factors, which varies from −∞ to +∞. It is defined by the following equation:
\[ z = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_n X_n \tag{4} \]
where B0 is the intercept, n is the number of independent variables, Bi (i = 1, 2, …, n) are the regression coefficients of the independent variables, and Xi (i = 1, 2, …, n) are the independent variables.
In this paper, the classes of the seven independent variables mentioned earlier were normalized to the range [0, 1]. The dependent variable was expressed in binary format, representing the presence or absence of landslides (1 or 0). The data were converted into dbf format and imported into the XLSTAT-Pro 7.5 statistical software in order to evaluate the relationship between landslide events and landslide causal factors, which gave the following equation:
\[
\begin{aligned}
z = {} & -0.441247659273368 - 0.0861714381342469 \times \text{Lithological units} + 0.203331123854193 \times \text{Slope angle} \\
& - 0.0449963579116557 \times \text{Rainfall} + 0.273420719917235 \times \text{Elevation} + 0.0436096196834892 \times \text{Distance to river} \\
& - 0.13323468675971 \times \text{NDVI} - 0.0217731355150329 \times \text{Slope aspect}, \\
P = {} & \frac{1}{1 + e^{-z}}.
\end{aligned}
\]
To delineate the areas where landslides can occur, the raster calculator of the analysis module of ArcGIS 10.4 was used to apply the weights to the individual factor maps and sum them into a total weight map (figure 3b). The susceptibility values were divided into five classes: very low, low, moderate, high and very high.
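For a single pixel, the fitted equation above can be evaluated as in the following sketch. The coefficients are those reported in the text; the dictionary keys, the function name and the sample pixel are hypothetical, and each factor value is assumed to be already normalized to the range 0 to 1.

```python
import math

# Intercept and coefficients as reported in the fitted LR equation.
INTERCEPT = -0.441247659273368
COEFFICIENTS = {
    "lithological_units": -0.0861714381342469,
    "slope_angle": 0.203331123854193,
    "rainfall": -0.0449963579116557,
    "elevation": 0.273420719917235,
    "distance_to_river": 0.0436096196834892,
    "ndvi": -0.13323468675971,
    "slope_aspect": -0.0217731355150329,
}

def landslide_probability(factors):
    """Equations (3) and (4): P = 1 / (1 + exp(-z)), with z the weighted sum."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * factors[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pixel whose normalized factor values all equal 0.5.
pixel = {name: 0.5 for name in COEFFICIENTS}
print(landslide_probability(pixel))
```

In a raster workflow the same expression would be applied cell by cell to the seven normalized factor layers.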
Validation of Landslide Susceptibility Maps
The mapping results were verified using ROC curves. The curve is a diagram in which the cumulative percentage of the study area, ranked by decreasing LSI, is plotted against the cumulative percentage of observed landslide occurrence. In this method, the area under the ROC curve (AUC), which ranges from 0.5 (no better than random) to 1.0 (perfect prediction), is used to check the prediction performance of the model.
In this study, the ROC curves of the LR and FR models for the training and testing data sets (figures 5a and 5b) gave AUC values of 0.9045 (90.45 %) for LR and 0.8670 (86.70 %) for FR on the success rate curve, and 0.9181 (91.81 %) for LR and 0.8804 (88.04 %) for FR on the prediction rate curve. These results indicate good agreement between the spatial distribution of landslide events and the LSMs produced, and show that the models used in this work have high accuracy in predicting the potential locations of future landslides in the study area.
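The cumulative curve described above can be approximated with the following sketch, which ranks pixels by decreasing susceptibility and integrates the cumulative landslide share over the cumulative area share with the trapezoidal rule. The function name and the six-pixel example are hypothetical, and tied scores are simply kept in input order. A strict ROC curve would plot the false positive rate on the horizontal axis instead; the two areas are close when landslide pixels are a small fraction of the area.

```python
def success_rate_auc(scores, landslide_flags):
    """Area under the curve of cumulative landslide share vs cumulative area share,
    with pixels ranked by decreasing susceptibility score (trapezoidal rule)."""
    ranked = sorted(zip(scores, landslide_flags), key=lambda pair: pair[0], reverse=True)
    n_pixels = len(ranked)
    n_landslides = sum(flag for _, flag in ranked)
    area = 0.0
    found = 0
    previous_y = 0.0
    for _, flag in ranked:
        found += flag
        y = found / n_landslides
        area += (previous_y + y) / 2 / n_pixels  # each pixel spans 1/n of the x axis
        previous_y = y
    return area

# Hypothetical scores and landslide flags (1 = landslide pixel) for six pixels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
flags = [1, 1, 0, 1, 0, 0]
auc = success_rate_auc(scores, flags)
print(round(auc, 4))
```

Applied to the training pixels this gives a success rate, and applied to the held-out testing pixels it gives a prediction rate.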
Discussion and Conclusions
Landslide susceptibility mapping is an essential procedure for delineating the areas prone to this phenomenon. Recently, many statistical techniques and approaches based on computer technology, GIS and remote sensing have been used by many researchers to prepare LSMs.
In this study, statistical approaches based on the LR and FR models were used to evaluate and map landslide susceptibility in the northwest of the Souk Ahras region, over a total area of 73 km², using GIS technology. This mountainous area in northeastern Algeria is frequently subjected to landslides of different sizes, controlled mainly by the interplay of several influencing parameters: lithological units, elevation, slope aspect, slope angle, distance to river, NDVI, and rainfall.
The analysis of the landslide data of the study area using the LR and FR models resulted in the LSMs shown in figures 3a and 3b, which were classified into five zones according to their degree of susceptibility: very low, low, moderate, high and very high. The resulting maps were compared with both the training and testing data sets of the landslide inventory, and their performance was evaluated using the area under the ROC curve (AUC).
The analysis of the LSMs shows that the highly susceptible sites are located in the northeastern and southern parts of the study area. More than 45 % of landslides occur on slopes between 15° and 35°; they mainly affect scree with a marly matrix, alluvium and Miocene marls.
The susceptibility maps generated from the two statistical models look similar, with minor differences. The map obtained from the LR model shows that susceptibility to landslides in the study area is distributed as follows: high and very high susceptibility areas represent 47.39 %, moderate susceptibility areas 24.09 %, and the rest has low to very low susceptibility, meaning that landslides are unlikely to occur there. The FR model gives 37.59 % for high and very high susceptibility areas, 24.57 % for moderate susceptibility, and the rest has low to very low susceptibility. Field evidence and statistical validation show that the LR model is more reliable than the FR model.
The susceptibility maps produced constitute a useful document that can be used to anticipate potential hazards associated with any type of urban extension, road network development or other activity involving earthworks.