Contamination potential depends on complex interactions among hydrogeological variables. A large number of input variables adds redundancy, cost, and time. Logistic regression and feature selection methods were used to identify the variables critical to transporting contaminants into and through the soil profile. Neural networks (NN) and support vector machines (SVM) were used to identify contaminated wells. Variables used in this study included DRASTIC parameters, soil structure (pedality), hydrologic group, land use, pH, organic matter, and bulk density. Well data (nitrate-N) provided by FLDEP as part of the WSRP were used as the target class.
The objective of this study was threefold: (a) analyze the input variables, identify the most significant predictors of well contamination, and perform feature selection to find the best subset of variables; (b) classify wells with the NN and SVM using all input variables and compare their performance; and (c) repeat step (b) with the variable subset from step (a) and compare the results.
Classifiers were compared on accuracy, sensitivity, and specificity. Free-response receiver operating characteristic (FROC) curves were used to evaluate classifier performance.
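As a minimal sketch of how the comparison measures named above are defined, the following Python computes accuracy, sensitivity, and specificity from true and predicted labels. The function names and the ten example labels are hypothetical and chosen only for illustration; they are not data or code from this study. Labels are assumed to be coded 1 for a contaminated well and 0 for a non-contaminated well.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN, assuming labels are 1 (contaminated) or 0 (not)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classifier_metrics(y_true, y_pred):
    """Return accuracy, sensitivity (TPR), and specificity (TNR)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity: fraction of contaminated wells that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of clean wells that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical labels for ten wells (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classifier_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity separately shows which kind of error a classifier makes, which accuracy alone does not.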
Preliminary results show comparable performance for the NN and SVM. Feature selection did not improve accuracy, but it increased sensitivity, i.e., the true positive rate (TPR), so a higher TPR was obtainable with fewer features. A higher TPR is desirable in this study because the cost of a contaminated well going undetected is far higher than the cost of incorrectly flagging a non-contaminated well.
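The cost argument can be illustrated with a small sketch. Two hypothetical classifiers have the same accuracy but different TPR, and the average misclassification cost is compared under an assumed cost ratio. All counts and the 10:1 cost ratio below are invented for illustration and are not results from this study.

```python
def expected_cost(tp, fp, tn, fn, cost_fn, cost_fp):
    """Average misclassification cost per well for given error costs."""
    return (fn * cost_fn + fp * cost_fp) / (tp + fp + tn + fn)


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def tpr(tp, fp, tn, fn):
    return tp / (tp + fn)


# Hypothetical confusion counts on 100 wells, 30 of them contaminated.
all_features = dict(tp=18, fn=12, tn=62, fp=8)    # TPR 0.60
feature_subset = dict(tp=24, fn=6, tn=56, fp=14)  # TPR 0.80

# Assumed costs: a missed contaminated well is 10x as costly as a false alarm.
COST_FN, COST_FP = 10.0, 1.0

for name, counts in [("all features", all_features),
                     ("feature subset", feature_subset)]:
    print(name,
          "accuracy=%.2f" % accuracy(**counts),
          "TPR=%.2f" % tpr(**counts),
          "cost=%.2f" % expected_cost(cost_fn=COST_FN, cost_fp=COST_FP, **counts))
```

Under these assumptions both classifiers are equally accurate, yet the one with the higher TPR has the lower expected cost because it misses fewer contaminated wells.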
Back to The ASA-CSSA-SSSA International Annual Meetings (November 6-10, 2005)