Statistical Applications for Environmental Analysis and Risk Assessment (Wiley)
Statistical Applications for Environmental Analysis and Risk Assessment guides readers through real-world situations and the best statistical methods used to determine the nature and extent of the problem, evaluate the potential human health and ecological risks, and design and implement remedial systems as necessary. Featuring numerous worked examples using actual data and “ready-made” software scripts, Statistical Applications for Environmental Analysis and Risk Assessment also includes:
• Descriptions of basic statistical concepts and principles in an informal style that does not presume prior familiarity with the subject
• Detailed illustrations of statistical applications in the environmental and related water resources fields using real-world data in the contexts that would typically be encountered by practitioners
• Software scripts written in R, a high-powered statistical software system, supplemented by USEPA’s ProUCL and USDOE’s VSP software packages, all of which are freely available
• Coverage of frequent data sample issues such as nondetects, outliers, skewness, and sustained and cyclical trends that habitually plague environmental data samples
• Clear demonstrations of the crucial, but often overlooked, role of statistics in environmental sampling design and subsequent exposure risk assessment.
Joseph Ofungwu, PhD, is an environmental professional with over eighteen years of hands-on experience in environmental practice, including contaminant impact analysis, human health and ecological risk assessment, and pollutant fate and transport modeling in ambient air, soil, groundwater, and surface water. Dr. Ofungwu is also a Visiting Assistant Professor with the Urban Environmental Systems Management Program at Pratt Institute, and he teaches statistics courses that count toward professional engineer license maintenance requirements.
PREFACE xvii
ACKNOWLEDGMENTS xix
1 INTRODUCTION 1
1.1 Introduction and Overview 1
1.2 The Aim of the Book: Get Involved! 2
1.3 The Approach and Style: Clarity, Clarity, Clarity 3
PART I BASIC STATISTICAL MEASURES AND CONCEPTS 5
2 INTRODUCTION TO SOFTWARE PACKAGES USED IN THIS BOOK 7
2.1 R 8
2.1.1 Helpful R Tips 9
2.1.2 Disadvantages of R 10
2.2 ProUCL 10
2.2.1 Helpful ProUCL Tips 11
2.2.2 Potential Deficiencies of ProUCL 12
2.3 Visual Sample Plan 12
2.4 DATAPLOT 13
2.4.1 Helpful Tips for Running DATAPLOT in Batch Mode 13
2.5 Kendall–Theil Robust Line 14
2.6 Minitab® 14
2.7 Microsoft Excel 15
3 LABORATORY DETECTION LIMITS, NONDETECTS, AND DATA ANALYSIS 17
3.1 Introduction and Overview 17
3.2 Types of Laboratory Data Detection Limits 18
3.3 Problems with Nondetects in Statistical Data Samples 19
3.4 Options for Addressing Nondetects in Data Analysis 20
3.4.1 Kaplan–Meier Estimation 21
3.4.2 Robust Regression on Order Statistics 22
3.4.3 Maximum Likelihood Estimation 23
4 DATA SAMPLE, DATA POPULATION, AND DATA DISTRIBUTION 25
4.1 Introduction and Overview 25
4.2 Data Sample Versus Data Population or Universe 26
4.3 The Concept of a Distribution 27
4.3.1 The Concept of a Probability Distribution Function 28
4.3.2 Cumulative Probability Distribution and Empirical Cumulative Distribution Functions 31
4.4 Types of Distributions 34
4.4.1 Normal Distribution 34
4.4.2 Lognormal, Gamma, and Other Continuous Distributions 49
4.4.3 Distributions Used in Inferential Statistics (Student’s t, Chi-Square, F) 53
4.4.4 Discrete Distributions 57
Exercises 64
5 GRAPHICS FOR DATA ANALYSIS AND PRESENTATION 67
5.1 Introduction and Overview 67
5.2 Graphics for Single Univariate Data Samples 68
5.2.1 Box and Whiskers Plot 68
5.2.2 Probability Plots (i.e., Quantile–Quantile Plots for Comparing a Data Sample to a Theoretical Distribution) 72
5.2.3 Quantile Plots 79
5.2.4 Histograms and Kernel Density Plots 82
5.3 Graphics for Two or More Univariate Data Samples 86
5.3.1 Quantile–Quantile Plots for Comparing Two Univariate Data Samples 86
5.3.2 Side-by-Side Box Plots 89
5.4 Graphics for Bivariate and Multivariate Data Samples 91
5.4.1 Graphical Data Analysis for Bivariate Data Samples 91
5.4.2 Graphical Data Analysis for Multivariate Data Samples 95
5.5 Graphics for Data Presentation 98
5.6 Data Smoothing 105
5.6.1 Moving Average and Moving Median Smoothing 105
5.6.2 Locally Weighted Scatterplot Smoothing (LOWESS or LOESS) 108
Exercises 113
6 BASIC STATISTICAL MEASURES: DESCRIPTIVE OR SUMMARY STATISTICS 115
6.1 Introduction and Overview 115
6.2 Arithmetic Mean and Weighted Mean 116
6.3 Median and Other Robust Measures of Central Tendency 117
6.4 Standard Deviation, Variance, and Other Measures of Dispersion or Spread 119
6.4.1 Quantiles (Including Percentiles) 121
6.4.2 Robust Measures of Spread: Interquartile Range and Median Absolute Deviation 124
6.5 Skewness and Other Measures of Shape 124
6.6 Outliers 134
6.6.1 Tests for Outliers 135
6.7 Data Transformations 139
Exercises 141
PART II STATISTICAL PROCEDURES FOR MOSTLY UNIVARIATE DATA 143
7 STATISTICAL INTERVALS: CONFIDENCE, TOLERANCE, AND PREDICTION INTERVALS 145
7.1 Introduction and Overview 145
7.2 Confidence Intervals 146
7.2.1 Parametric Confidence Intervals 151
7.2.2 Nonparametric Confidence Intervals Around the Mean, Median, and Other Percentiles 154
7.2.3 Parametric Confidence Band Around a Trend Line 164
7.2.4 Nonparametric Confidence Band Around a Trend Line 166
7.3 Tolerance Intervals 168
7.3.1 Parametric Tolerance Intervals 169
7.3.2 Nonparametric Tolerance Intervals 170
7.4 Prediction Intervals 173
7.4.1 Parametric Prediction Intervals for Future Individual Values and Future Means 175
7.4.2 Nonparametric Prediction Intervals for Future Individual Values and Future Medians 176
7.5 Control Charts 178
Exercises 178
8 TESTS OF HYPOTHESIS AND DECISION MAKING 181
8.1 Introduction and Overview 181
8.2 Basic Terminology and Procedures for Tests of Hypothesis 182
8.3 Type I and Type II Decision Errors, Statistical Power, and Interrelationships 190
8.4 The Problem with Multiple Tests or Comparisons: Site-Wide False Positive Error Rates 193
8.5 Tests for Equality of Variance 195
Exercises 199
9 APPLICATIONS OF HYPOTHESIS TESTS: COMPARING POPULATIONS, ANALYSIS OF VARIANCE 201
9.1 Introduction and Overview 201
9.2 Single Sample Tests 202
9.2.1 Parametric Single-Sample Tests: One-Sample t-Test and One-Sample Proportion Test 203
9.2.2 Nonparametric Single-Sample Tests: One-Sample Sign Test and One-Sample Wilcoxon Signed Rank Test 205
9.3 Two-Sample Tests 208
9.3.1 Parametric Two-Sample Tests 210
9.3.2 Nonparametric Two-Sample Tests 216
9.4 Comparing Three or More Populations: Parametric ANOVA and Nonparametric Kruskal–Wallis Tests 227
9.4.1 Parametric One-Way ANOVA 228
9.4.2 Nonparametric One-Way ANOVA (Kruskal–Wallis Test) 235
9.4.3 Follow-Up or Post Hoc Comparisons After Parametric and Nonparametric One-Way ANOVA 238
9.4.4 Parametric and Nonparametric Two-Way and Multifactor ANOVA 244
Exercises 255
10 TRENDS, AUTOCORRELATION, AND TEMPORAL DEPENDENCE 257
10.1 Introduction and Overview 257
10.2 Tests for Autocorrelation and Temporal Effects 258
10.2.1 Test for Autocorrelation Using the Sample Autocorrelation Function 259
10.2.2 Test for Autocorrelation Using the Rank Von Neumann Ratio Method 261
10.2.3 An Example on Site-Wide Temporal Effects 264
10.3 Tests for Trend 265
10.3.1 Parametric Test for Trends—Simple Linear Regression 266
10.3.2 Nonparametric Test for Trends—Mann–Kendall Test and Seasonal Mann–Kendall Test 271
10.3.3 Nonparametric Test for Trends—Theil–Sen Trend Test 273
10.4 Correcting Seasonality and Temporal Effects in the Data 279
10.4.1 Correcting Seasonality for a Single Data Series 280
10.4.2 Simultaneously Correcting Temporal Dependence for Multiple Data Sets 281
10.5 Effects of Exogenous Variables on Trend Tests 282
Exercises 285
PART III STATISTICAL PROCEDURES FOR MOSTLY MULTIVARIATE DATA 287
11 CORRELATION, COVARIANCE, GEOSTATISTICS 289
11.1 Introduction and Overview 289
11.2 Correlation and Covariance 290
11.2.1 Pearson’s Correlation Coefficient 292
11.2.2 Spearman’s and Kendall’s Correlation Coefficients 294
11.3 Introduction to Geostatistics 300
11.3.1 The Variogram or Covariogram 300
11.3.2 Kriging 302
11.3.3 A Note on Data Sample Size and Lag Distance Requirements 311
Exercises 312
12 SIMPLE LINEAR REGRESSION 315
12.1 Introduction and Overview 315
12.2 The Simple Linear Regression Model 316
12.2.1 The True or Population X–Y Relationship 317
12.2.2 The Estimated X–Y Relationship Based on a Data Sample 320
12.3 Basic Applications of Simple Linear Regression 324
12.3.1 Description and Graphical Review of the Data Sample for Regression 324
12.4 Verify Compliance with the Assumptions of Conventional Linear Regression 332
12.4.1 Assumptions of Linearity and Homoscedasticity 332
12.4.2 Assumption of Independence 334
12.4.3 Exogeneity Assumption, Normality of the Y Errors, and Absence of Outliers 337
12.5 Check the Regression Diagnostics for the Presence of Influential Data Points 339
12.6 Confidence Intervals for the Predicted Y Values 343
12.7 Regression for Left-Censored Data (Nondetects) 344
Exercises 349
13 DATA TRANSFORMATION VERSUS GENERALIZED LINEAR MODEL 351
13.1 Introduction and Overview 351
13.2 Data Transformation 352
13.2.1 General Approach for Data Transformations 355
13.2.2 The Ladder of Powers 357
13.2.3 The Bulging Rule and Data Transformations for Regression Analysis 359
13.2.4 Facilitating Data Transformations Using Box–Cox Methods 366
13.2.5 Back-Transformation Bias and Other Issues with Data Transformation 367
13.2.6 Transformation Bias Correction 371
13.3 The Generalized Linear Model (GLM) and Applications for Regression 374
13.3.1 Components of the Generalized Linear Model and Inherent Limitations 374
13.3.2 Estimation and Hypothesis Tests of Significance for GLM Parameters 376
13.3.3 Deviance, Null Deviance, Residual Deviance, and Goodness of Fit 377
13.3.4 Diagnostics for GLM 379
13.3.5 Procedural Steps for Regression with GLM in R 380
13.4 Extension of Data Transformation and Generalized Linear Model to Multiple Regression 385
13.4.1 Data Transformation for Multiple Regression 385
13.4.2 Generalized Linear Models for Multiple Regression 387
Exercises 387
14 ROBUST REGRESSION 391
14.1 Introduction and Overview 391
14.2 Kendall–Theil Robust Line 393
14.2.1 Computation of the Kendall–Theil Robust Line Regression 393
14.2.2 Test of Significance for the Kendall–Theil Robust Line 396
14.2.3 Bias Correction for Y Predictions by the Kendall–Theil Robust Line 397
14.3 Weighted Least Squares Regression 398
14.3.1 Procedure for Weighted Least Squares Regression for Known Variances of the Observations 399
14.4 Iteratively Reweighted Least Squares Regression 405
14.4.1 The Iteratively Reweighted Least Squares Procedure 409
14.5 Other Robust Regression Alternatives: Bounded Influence Methods 412
14.5.1 Least Absolute Deviation or Least Absolute Values 412
14.5.2 Quantile Regression 413
14.5.3 Least Median of Squares 413
14.5.4 Least Trimmed Squares 414
14.6 Robust Regression Methods for Multiple-Variable Data 416
Exercises 417
15 MULTIPLE LINEAR REGRESSION 419
15.1 Introduction and Overview 419
15.2 The Need for Multiple Regression 420
15.3 The Multiple Linear Regression (MLR) Model 421
15.4 The Estimated Multivariable X–Y Relationship Based on a Data Sample 422
15.5 Assumptions of Multiple Linear Regression 430
15.5.1 Linearity of the Relationship Between the Dependent and Explanatory Variables 431
15.5.2 Absence of Multicollinearity Among the Explanatory Variables 433
15.5.3 Homoscedasticity or Constancy of Variance of the Y Population Errors 439
15.5.4 Statistical Independence of the Y Population Errors 441
15.5.5 Exogeneity Assumption, Normality of the Y Errors, and Absence of Outliers 445
15.5.6 Absence of Variability or Errors in the Explanatory Variables 446
15.6 Hypothesis Tests for Reliability of the MLR Model 447
15.6.1 ANOVA F Test for Overall Significance of the Regression 447
15.6.2 Partial t and Partial F Tests for Individual Regression Coefficients 452
15.6.3 Complete and Reduced Models 452
15.7 Confidence Intervals for the Regression Coefficients and Predicted Y Values 457
15.8 Coefficient of Multiple Correlation (R), Multiple Determination (R²), Adjusted R², and Partial Correlation Coefficients 458
15.8.1 Coefficient of Multiple Correlation (R) 458
15.8.2 Coefficient of Multiple Determination (R²) and Adjusted R² 459
15.8.3 Partial Correlations and Squared Partial Correlations 460
15.9 Regression Diagnostics 462
15.10 Model Interactions and Multiplicative Effects 467
15.10.1 The Multiple Linear Regression Interaction Model 467
15.10.2 Hypothesis Tests of the Interaction Terms for Significance 468
Exercises 474
16 CATEGORICAL DATA ANALYSIS 477
16.1 Introduction and Overview 477
16.2 Types of Variables and Associated Data 478
16.2.1 Quantitative Variables 479
16.2.2 Qualitative Variables 479
16.3 One-Way Analysis of Variance Regression Model 480
16.3.1 Interpretation of the Regression Results and ANOVA F-Test for Overall Significance of the Regression Model 485
16.4 Two-Way Analysis of Variance Regression Model with No Interactions 486
16.5 Two-Way Analysis of Variance Regression Model with Interactions 490
16.6 Analysis of Covariance Regression Model 491
Exercises 499
17 MODEL BUILDING: STEPWISE REGRESSION AND BEST SUBSETS REGRESSION 501
17.1 Introduction and Overview 501
17.2 Consequences of Inappropriate Variable Selection 502
17.3 Stepwise Regression Procedures 505
17.3.1 Advantages and Disadvantages of Stepwise Procedures 512
17.4 Subsets Regression 513
Exercises 522
18 NONLINEAR REGRESSION 525
18.1 Introduction and Overview 525
18.2 The Nonlinear Regression Model 526
18.3 Assumptions of Nonlinear Least Squares Regression 528
Exercises 545
PART IV STATISTICS IN ENVIRONMENTAL SAMPLING DESIGN AND RISK ASSESSMENT 547
19 DATA QUALITY OBJECTIVES AND ENVIRONMENTAL SAMPLING DESIGN 549
19.1 Introduction and Overview 549
19.2 Sampling Design 550
19.3 Sampling Plans 550
19.3.1 Simple Random Sampling 552
19.3.2 Systematic Sampling 554
19.3.3 Other Sampling Designs 556
19.4 Sample Size Determination 557
19.4.1 Types I and II Decision Errors 558
19.4.2 Variance and Gray Region 559
19.4.3 Width of the Gray Region 560
19.4.4 Computation of the Recommended Minimum Sample Size for Estimating the Population Mean or Median 561
19.4.5 Computation of the Recommended Minimum Sample Size for Comparing a Population Mean or Median with a Fixed Threshold Value 565
19.4.6 Computation of the Recommended Minimum Sample Size for Comparing the Population Means or Medians for Two Populations 568
Exercises 569
20 DETERMINATION OF BACKGROUND AND APPLICATIONS IN RISK ASSESSMENT 571
20.1 Introduction and Overview 571
20.2 When Background Sampling Is Required and When It Is Not 572
20.3 Background Sampling Plans 572
20.4 Graphical and Quantitative Data Analysis for Site Versus Background Data Comparisons 573
20.5 Determination of Exposure Point Concentration and Contaminants of Potential Concern 583
Exercises 585
21 STATISTICS IN CONVENTIONAL AND PROBABILISTIC RISK ASSESSMENT 587
21.1 Introduction and Overview 587
21.2 Conventional or Point Risk Estimation 588
21.3 Probabilistic Risk Assessment Using Monte Carlo Simulation 594
Exercises 598
APPENDIX A: SOFTWARE SCRIPTS 599
APPENDIX B: DATASETS 603
REFERENCES 609
ANSWERS FOR EXERCISES 613
INDEX 619