Wednesday, 9 November 2005
16

Can Logistic Regression and/or Feature Selection Methods Be Used to Predict Contaminated Wells? A Case Study of Polk County, Florida.

Nivedita V. Candade, University of South Florida, 140, 7th Ave S., St. Petersburg, FL 33701 and Barnali Dixon, University of South Florida - St. Petersburg, Dept. of Environmental Science & Policy & Geography, 140 Seventh Ave South, 210 Davis Hall, St. Petersburg, FL 33701.

Detection of potentially contaminated wells is an important component of environmental protection and management. However, contamination potential mapping is not an easy task due to inherent uncertainties. This study assesses the suitability of several techniques for predicting contaminated wells: logistic regression, feature selection, Neural Networks (NN) and Support Vector Machines (SVM).

Contamination potential depends on complex interactions of hydro-geological variables. A large number of input variables adds redundancy, cost and time. Logistic regression and feature selection methods were used to identify the variables critical to the transport of contaminants into and through the soil profile. NN and SVM were used to identify contaminated wells. Variables used in this study included DRASTIC parameters, soil structure (pedality), hydrologic group, land use, pH, organic matter and bulk density. Well data (nitrate-N) provided by FLDEP as part of the WSRP were used as the target class.
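For reference, the standard binary logistic regression model can be written as below. This is a generic sketch, not the specific model fitted in the study: the symbols are assumptions introduced here, with y = 1 denoting a contaminated well, x_1, ..., x_k the input variables (for example the DRASTIC parameters or pH), and beta_0, ..., beta_k the fitted coefficients.

```latex
P(y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)\right)}
```

Under this form, a variable whose coefficient is not significantly different from zero contributes little to the predicted probability, which is one common basis for ranking predictors.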

The objective of this study was threefold: (a) analyze the input variables, identify the most significant predictors of well contamination, and perform feature selection to identify the best subset of variables; (b) use all the input variables with the NN and SVM to classify wells and compare their performance; and (c) repeat step (b) with the variable subset from step (a) and compare the results.

Classifiers were compared based on their accuracy, sensitivity and specificity. Free-response Receiver Operating Characteristic (FROC) curves were used to evaluate classifier performance.
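The comparison metrics named above can be illustrated with a minimal Python sketch. It assumes binary labels where 1 means a contaminated well and 0 a non-contaminated well; the function names and the small label lists are hypothetical and are not data from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1      # contaminated well correctly flagged
        elif truth == 0 and pred == 1:
            fp += 1      # clean well incorrectly flagged
        elif truth == 0 and pred == 0:
            tn += 1      # clean well correctly passed
        else:
            fn += 1      # contaminated well missed
    return tp, fp, tn, fn


def classifier_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for 0/1 labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    # Sensitivity is the true positive rate (TPR): TP / (TP + FN).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity is the true negative rate: TN / (TN + FP).
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


# Hypothetical labels for eight wells (illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
sensitivity, specificity, accuracy = classifier_metrics(y_true, y_pred)
```

Two classifiers can share the same accuracy while differing in sensitivity, which is why the metrics are reported separately.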

Preliminary results show comparable performance for the NN and SVM. Feature selection did not improve accuracy, but it increased the sensitivity, or true positive rate (TPR); thus, a higher TPR was obtainable with fewer features. In this study, a higher TPR is desirable since the cost of failing to detect a contaminated well is far higher than that of incorrectly flagging a non-contaminated well.


The ASA-CSSA-SSSA International Annual Meetings (November 6-10, 2005)