Statistical Applications for Environmental Analysis and Risk Assessment

  • Wiley

English

Statistical Applications for Environmental Analysis and Risk Assessment guides readers through real-world situations and the best statistical methods used to determine the nature and extent of the problem, evaluate the potential human health and ecological risks, and design and implement remedial systems as necessary. Featuring numerous worked examples using actual data and “ready-made” software scripts, Statistical Applications for Environmental Analysis and Risk Assessment also includes:

• Descriptions of basic statistical concepts and principles in an informal style that does not presume prior familiarity with the subject

• Detailed illustrations of statistical applications in the environmental and related water resources fields using real-world data in the contexts that would typically be encountered by practitioners

• Software scripts using the high-powered statistical software system R, supplemented by USEPA’s ProUCL and USDOE’s VSP software packages, all of which are freely available

• Coverage of frequent data sample issues, such as non-detects, outliers, skewness, and sustained and cyclical trends, that habitually plague environmental data samples

• Clear demonstrations of the crucial, but often overlooked, role of statistics in environmental sampling design and subsequent exposure risk assessment.

Joseph Ofungwu, PhD, is an environmental professional with over eighteen years of hands-on experience in environmental practice, including contaminant impact analysis, human health and ecological risk assessment, and pollutant fate and transport modeling in ambient air, soil, groundwater, and surface water. Dr. Ofungwu is also Visiting Assistant Professor with the Urban Environmental Systems Management Program at Pratt Institute and teaches statistics courses for professional engineer license maintenance requirements.

PREFACE

ACKNOWLEDGMENTS

1 INTRODUCTION

1.1 Introduction and Overview

1.2 The Aim of the Book: Get Involved!

1.3 The Approach and Style: Clarity, Clarity, Clarity

PART I BASIC STATISTICAL MEASURES AND CONCEPTS

2 INTRODUCTION TO SOFTWARE PACKAGES USED IN THIS BOOK

2.1 R

2.1.1 Helpful R Tips

2.1.2 Disadvantages of R

2.2 ProUCL

2.2.1 Helpful ProUCL Tips

2.2.2 Potential Deficiencies of ProUCL

2.3 Visual Sample Plan

2.4 DATAPLOT

2.4.1 Helpful Tips for Running DATAPLOT in Batch Mode

2.5 Kendall–Theil Robust Line

2.6 Minitab®

2.7 Microsoft Excel

3 LABORATORY DETECTION LIMITS, NONDETECTS, AND DATA ANALYSIS

3.1 Introduction and Overview

3.2 Types of Laboratory Data Detection Limits

3.3 Problems with Nondetects in Statistical Data Samples

3.4 Options for Addressing Nondetects in Data Analysis

3.4.1 Kaplan–Meier Estimation

3.4.2 Robust Regression on Order Statistics

3.4.3 Maximum Likelihood Estimation

4 DATA SAMPLE, DATA POPULATION, AND DATA DISTRIBUTION

4.1 Introduction and Overview

4.2 Data Sample Versus Data Population or Universe

4.3 The Concept of a Distribution

4.3.1 The Concept of a Probability Distribution Function

4.3.2 Cumulative Probability Distribution and Empirical Cumulative Distribution Functions

4.4 Types of Distributions

4.4.1 Normal Distribution

4.4.2 Lognormal, Gamma, and Other Continuous Distributions

4.4.3 Distributions Used in Inferential Statistics (Student’s t, Chi-Square, F)

4.4.4 Discrete Distributions

Exercises

5 GRAPHICS FOR DATA ANALYSIS AND PRESENTATION

5.1 Introduction and Overview

5.2 Graphics for Single Univariate Data Samples

5.2.1 Box and Whiskers Plot

5.2.2 Probability Plots (i.e., Quantile–Quantile Plots for Comparing a Data Sample to a Theoretical Distribution)

5.2.3 Quantile Plots

5.2.4 Histograms and Kernel Density Plots

5.3 Graphics for Two or More Univariate Data Samples

5.3.1 Quantile–Quantile Plots for Comparing Two Univariate Data Samples

5.3.2 Side-by-Side Box Plots

5.4 Graphics for Bivariate and Multivariate Data Samples

5.4.1 Graphical Data Analysis for Bivariate Data Samples

5.4.2 Graphical Data Analysis for Multivariate Data Samples

5.5 Graphics for Data Presentation

5.6 Data Smoothing

5.6.1 Moving Average and Moving Median Smoothing

5.6.2 Locally Weighted Scatterplot Smoothing (LOWESS or LOESS)

Exercises

6 BASIC STATISTICAL MEASURES: DESCRIPTIVE OR SUMMARY STATISTICS

6.1 Introduction and Overview

6.2 Arithmetic Mean and Weighted Mean

6.3 Median and Other Robust Measures of Central Tendency

6.4 Standard Deviation, Variance, and Other Measures of Dispersion or Spread

6.4.1 Quantiles (Including Percentiles)

6.4.2 Robust Measures of Spread: Interquartile Range and Median Absolute Deviation

6.5 Skewness and Other Measures of Shape

6.6 Outliers

6.6.1 Tests for Outliers

6.7 Data Transformations

Exercises

PART II STATISTICAL PROCEDURES FOR MOSTLY UNIVARIATE DATA

7 STATISTICAL INTERVALS: CONFIDENCE, TOLERANCE, AND PREDICTION INTERVALS

7.1 Introduction and Overview

7.2 Confidence Intervals

7.2.1 Parametric Confidence Intervals

7.2.2 Nonparametric Confidence Intervals Around the Mean, Median, and Other Percentiles

7.2.3 Parametric Confidence Band Around a Trend Line

7.2.4 Nonparametric Confidence Band Around a Trend Line

7.3 Tolerance Intervals

7.3.1 Parametric Tolerance Intervals

7.3.2 Nonparametric Tolerance Intervals

7.4 Prediction Intervals

7.4.1 Parametric Prediction Intervals for Future Individual Values and Future Means

7.4.2 Nonparametric Prediction Intervals for Future Individual Values and Future Medians

7.5 Control Charts

Exercises

8 TESTS OF HYPOTHESIS AND DECISION MAKING

8.1 Introduction and Overview

8.2 Basic Terminology and Procedures for Tests of Hypothesis

8.3 Type I and Type II Decision Errors, Statistical Power, and Interrelationships

8.4 The Problem with Multiple Tests or Comparisons: Site-Wide False Positive Error Rates

8.5 Tests for Equality of Variance

Exercises

9 APPLICATIONS OF HYPOTHESIS TESTS: COMPARING POPULATIONS, ANALYSIS OF VARIANCE

9.1 Introduction and Overview

9.2 Single Sample Tests

9.2.1 Parametric Single-Sample Tests: One-Sample t-Test and One-Sample Proportion Test

9.2.2 Nonparametric Single-Sample Tests: One-Sample Sign Test and One-Sample Wilcoxon Signed Rank Test

9.3 Two-Sample Tests

9.3.1 Parametric Two-Sample Tests

9.3.2 Nonparametric Two-Sample Tests

9.4 Comparing Three or More Populations: Parametric ANOVA and Nonparametric Kruskal–Wallis Tests

9.4.1 Parametric One-Way ANOVA

9.4.2 Nonparametric One-Way ANOVA (Kruskal–Wallis Test)

9.4.3 Follow-Up or Post Hoc Comparisons After Parametric and Nonparametric One-Way ANOVA

9.4.4 Parametric and Nonparametric Two-Way and Multifactor ANOVA

Exercises

10 TRENDS, AUTOCORRELATION, AND TEMPORAL DEPENDENCE

10.1 Introduction and Overview

10.2 Tests for Autocorrelation and Temporal Effects

10.2.1 Test for Autocorrelation Using the Sample Autocorrelation Function

10.2.2 Test for Autocorrelation Using the Rank Von Neumann Ratio Method

10.2.3 An Example on Site-Wide Temporal Effects

10.3 Tests for Trend

10.3.1 Parametric Test for Trends—Simple Linear Regression

10.3.2 Nonparametric Test for Trends—Mann–Kendall Test and Seasonal Mann–Kendall Test

10.3.3 Nonparametric Test for Trends—Theil–Sen Trend Test

10.4 Correcting Seasonality and Temporal Effects in the Data

10.4.1 Correcting Seasonality for a Single Data Series

10.4.2 Simultaneously Correcting Temporal Dependence for Multiple Data Sets

10.5 Effects of Exogenous Variables on Trend Tests

Exercises

PART III STATISTICAL PROCEDURES FOR MOSTLY MULTIVARIATE DATA

11 CORRELATION, COVARIANCE, GEOSTATISTICS

11.1 Introduction and Overview

11.2 Correlation and Covariance

11.2.1 Pearson’s Correlation Coefficient

11.2.2 Spearman’s and Kendall’s Correlation Coefficients

11.3 Introduction to Geostatistics

11.3.1 The Variogram or Covariogram

11.3.2 Kriging

11.3.3 A Note on Data Sample Size and Lag Distance Requirements

Exercises

12 SIMPLE LINEAR REGRESSION

12.1 Introduction and Overview

12.2 The Simple Linear Regression Model

12.2.1 The True or Population X–Y Relationship

12.2.2 The Estimated X–Y Relationship Based on a Data Sample

12.3 Basic Applications of Simple Linear Regression

12.3.1 Description and Graphical Review of the Data Sample for Regression

12.4 Verify Compliance with the Assumptions of Conventional Linear Regression

12.4.1 Assumptions of Linearity and Homoscedasticity

12.4.2 Assumption of Independence

12.4.3 Exogeneity Assumption, Normality of the Y Errors, and Absence of Outliers

12.5 Check the Regression Diagnostics for the Presence of Influential Data Points

12.6 Confidence Intervals for the Predicted Y Values

12.7 Regression for Left-Censored Data (Non-detects)

Exercises

13 DATA TRANSFORMATION VERSUS GENERALIZED LINEAR MODEL

13.1 Introduction and Overview

13.2 Data Transformation

13.2.1 General Approach for Data Transformations

13.2.2 The Ladder of Powers

13.2.3 The Bulging Rule and Data Transformations for Regression Analysis

13.2.4 Facilitating Data Transformations Using Box–Cox Methods

13.2.5 Back-Transformation Bias and Other Issues with Data Transformation

13.2.6 Transformation Bias Correction

13.3 The Generalized Linear Model (GLM) and Applications for Regression

13.3.1 Components of the Generalized Linear Model and Inherent Limitations

13.3.2 Estimation and Hypothesis Tests of Significance for GLM Parameters

13.3.3 Deviance, Null Deviance, Residual Deviance, and Goodness of Fit

13.3.4 Diagnostics for GLM

13.3.5 Procedural Steps for Regression with GLM in R

13.4 Extension of Data Transformation and Generalized Linear Model to Multiple Regression

13.4.1 Data Transformation for Multiple Regression

13.4.2 Generalized Linear Models for Multiple Regression

Exercises

14 ROBUST REGRESSION

14.1 Introduction and Overview

14.2 Kendall–Theil Robust Line

14.2.1 Computation of the Kendall–Theil Robust Line Regression

14.2.2 Test of Significance for the Kendall–Theil Robust Line

14.2.3 Bias Correction for Y Predictions by the Kendall–Theil Robust Line

14.3 Weighted Least Squares Regression

14.3.1 Procedure for Weighted Least Squares Regression for Known Variances of the Observations

14.4 Iteratively Reweighted Least Squares Regression

14.4.1 The Iteratively Reweighted Least Squares Procedure

14.5 Other Robust Regression Alternatives: Bounded Influence Methods

14.5.1 Least Absolute Deviation or Least Absolute Values

14.5.2 Quantile Regression

14.5.3 Least Median of Squares

14.5.4 Least Trimmed Squares

14.6 Robust Regression Methods for Multiple-Variable Data

Exercises

15 MULTIPLE LINEAR REGRESSION

15.1 Introduction and Overview

15.2 The Need for Multiple Regression

15.3 The Multiple Linear Regression (MLR) Model

15.4 The Estimated Multivariable X–Y Relationship Based on a Data Sample

15.5 Assumptions of Multiple Linear Regression

15.5.1 Linearity of the Relationship Between the Dependent and Explanatory Variables

15.5.2 Absence of Multicollinearity Among the Explanatory Variables

15.5.3 Homoscedasticity or Constancy of Variance of the Y Population Errors

15.5.4 Statistical Independence of the Y Population Errors

15.5.5 Exogeneity Assumption, Normality of the Y Errors, and Absence of Outliers

15.5.6 Absence of Variability or Errors in the Explanatory Variables

15.6 Hypothesis Tests for Reliability of the MLR Model

15.6.1 ANOVA F Test for Overall Significance of the Regression

15.6.2 Partial t and Partial F Tests for Individual Regression Coefficients

15.6.3 Complete and Reduced Models

15.7 Confidence Intervals for the Regression Coefficients and Predicted Y Values

15.8 Coefficient of Multiple Correlation (R), Multiple Determination (R²), Adjusted R², and Partial Correlation Coefficients

15.8.1 Coefficient of Multiple Correlation (R)

15.8.2 Coefficient of Multiple Determination (R²) and Adjusted R²

15.8.3 Partial Correlations and Squared Partial Correlations

15.9 Regression Diagnostics

15.10 Model Interactions and Multiplicative Effects

15.10.1 The Multiple Linear Regression Interaction Model

15.10.2 Hypothesis Tests of the Interaction Terms for Significance

Exercises

16 CATEGORICAL DATA ANALYSIS

16.1 Introduction and Overview

16.2 Types of Variables and Associated Data

16.2.1 Quantitative Variables

16.2.2 Qualitative Variables

16.3 One-Way Analysis of Variance Regression Model

16.3.1 Interpretation of the Regression Results and ANOVA F-Test for Overall Significance of the Regression Model

16.4 Two-Way Analysis of Variance Regression Model with No Interactions

16.5 Two-Way Analysis of Variance Regression Model with Interactions

16.6 Analysis of Covariance Regression Model

Exercises

17 MODEL BUILDING: STEPWISE REGRESSION AND BEST SUBSETS REGRESSION

17.1 Introduction and Overview

17.2 Consequences of Inappropriate Variable Selection

17.3 Stepwise Regression Procedures

17.3.1 Advantages and Disadvantages of Stepwise Procedures

17.4 Subsets Regression

Exercises

18 NONLINEAR REGRESSION

18.1 Introduction and Overview

18.2 The Nonlinear Regression Model

18.3 Assumptions of Nonlinear Least Squares Regression

Exercises

PART IV STATISTICS IN ENVIRONMENTAL SAMPLING DESIGN AND RISK ASSESSMENT

19 DATA QUALITY OBJECTIVES AND ENVIRONMENTAL SAMPLING DESIGN

19.1 Introduction and Overview

19.2 Sampling Design

19.3 Sampling Plans

19.3.1 Simple Random Sampling

19.3.2 Systematic Sampling

19.3.3 Other Sampling Designs

19.4 Sample Size Determination

19.4.1 Types I and II Decision Errors

19.4.2 Variance and Gray Region

19.4.3 Width of the Gray Region

19.4.4 Computation of the Recommended Minimum Sample Size for Estimating the Population Mean or Median

19.4.5 Computation of the Recommended Minimum Sample Size for Comparing a Population Mean or Median with a Fixed Threshold Value

19.4.6 Computation of the Recommended Minimum Sample Size for Comparing the Population Means or Medians for Two Populations

Exercises

20 DETERMINATION OF BACKGROUND AND APPLICATIONS IN RISK ASSESSMENT

20.1 Introduction and Overview

20.2 When Background Sampling is Required and When it is not

20.3 Background Sampling Plans

20.4 Graphical and Quantitative Data Analysis for Site Versus Background Data Comparisons

20.5 Determination of Exposure Point Concentration and Contaminants of Potential Concern

Exercises

21 STATISTICS IN CONVENTIONAL AND PROBABILISTIC RISK ASSESSMENT

21.1 Introduction and Overview

21.2 Conventional or Point Risk Estimation

21.3 Probabilistic Risk Assessment Using Monte Carlo Simulation

Exercises

APPENDIX A: SOFTWARE SCRIPTS

APPENDIX B: DATASETS

REFERENCES

ANSWERS FOR EXERCISES

INDEX
