Statistical Data Analysis Explained - Applied Environmental Statistics with R

  • Wiley



Few books on statistical data analysis in the natural sciences are written at a level that a non-statistician can easily understand. This book is written in colloquial language, avoids mathematical formulae as far as possible, and explains statistical methods through examples and graphics instead. To use the book efficiently, readers should have some computer experience. The book starts with the simplest statistical concepts and carries readers forward to a deeper and more extensive understanding of the use of statistics in environmental sciences. It concerns the application of statistical and other computer methods to the management, analysis and display of spatial data. Such data include locations (geographic coordinates), which makes maps necessary for displaying both the data and the results of the statistical methods. Although the book uses examples from applied geochemistry, and a large geochemical survey in particular, the principles and ideas apply equally well to other natural sciences, e.g., environmental sciences, pedology, hydrology, geography, forestry, ecology, and health sciences/epidemiology.

The book is unique because it supplies direct access to software solutions (based on R, the Open Source version of the S-language for statistics) for applied environmental statistics. For all graphics and tables presented in the book, the code is provided as executable R-scripts. In addition, a graphical user interface for R, called DAS+R, was developed for convenient, fast and interactive data analysis.

Statistical Data Analysis Explained: Applied Environmental Statistics with R provides, on an accompanying website, the software to undertake all the procedures discussed, and the data employed for their description in the book.


Clemens Reimann (born 1952) holds an M.Sc. in Mineralogy and Petrology from the University of Hamburg (Germany), a Ph.D. in Geosciences from Leoben Mining University, Austria, and a D.Sc. in Applied Geochemistry from the same university. He has worked as a lecturer in Mineralogy and Petrology and Environmental Sciences at Leoben Mining University, as an exploration geochemist in eastern Canada, and in contract research in environmental sciences in Austria, and managed the laboratory of an Austrian cement company before joining the Geological Survey of Norway in 1991 as a senior geochemist. From March to October 2004 he was director and professor at the German Federal Environment Agency (Umweltbundesamt, UBA), responsible for Division II, Environmental Health and Protection of Ecosystems. At present he is chairman of the EuroGeoSurveys geochemistry expert group, acting vice president of the International Association of GeoChemistry (IAGC), and associate editor of both Applied Geochemistry and Geochemistry: Exploration, Environment, Analysis.

Peter Filzmoser (born 1968) studied Applied Mathematics at the Vienna University of Technology, Austria, where he also wrote his doctoral thesis and habilitation devoted to the field of multivariate statistics. His research led him to the area of robust statistics, resulting in many international collaborations and various scientific papers in this area. His interest in applications of robust methods resulted in the development of R software packages. He was and is involved in the organisation of several scientific events devoted to robust statistics. Since 2001 he has been dozent at the Statistics Department at Vienna University of Technology. He was visiting professor at the universities of Vienna, Toulouse and Minsk.

Robert G. Garrett (Bob Garrett) studied Mining Geology and Applied Geochemistry at Imperial College, London, and joined the Geological Survey of Canada (GSC) in 1967 following post-doctoral studies at Northwestern University, Evanston. For the next 25 years his activities focused on regional geochemical mapping in Canada, and overseas for the Canadian International Development Agency, to support mineral exploration and resource appraisal. Throughout his work he has used computers and statistics to manage data, assess their quality, and maximise the knowledge extracted from them. In the 1990s he commenced collaboration crops. Since then he has been involved in various Canadian Federal and university-based research initiatives aimed at providing sound science to support Canadian regulatory and international policy activities concerning risk assessments and risk management for metals. He retired in March 2005 but remains active as an Emeritus Scientist.

Rudolf Dutter is senior statistician and full professor at Vienna University of Technology, Austria. He studied Applied Mathematics in Vienna (M.Sc.) and Statistics at Université de Montréal, Canada (Ph.D.). He spent three years as a post-doctoral fellow at ETH, Zurich, working on computational robust statistics. Research and teaching activities followed at the Graz University of Technology, and as a full professor of statistics at Vienna University of Technology, both in Austria. He also taught and consulted at Leoben Mining University, Austria; currently he consults in many fields of applied statistics with main interests in computational and robust statistics, development of statistical software, and geostatistics. He is author and coauthor of many publications and several books, e.g., an early booklet in German on geostatistics.


Preface.

Acknowledgements.

About the Authors.

1. Introduction.

1.1 The Kola Ecogeochemistry Project.

2. Preparing the Data for Use in R and DAS+R.

2.1 Required data format for import into R and DAS+R.

2.2 The detection limit problem.

2.3 Missing values.

2.4 Some “typical” problems encountered when editing a laboratory data report file to a DAS+R file.

2.5 Appending and linking data files.

2.6 Requirements for a geochemical database.

2.7 Summary.

3. Graphics to Display the Data Distribution.

3.1 The one-dimensional scatterplot.

3.2 The histogram.

3.3 The density trace.

3.4 Plots of the distribution function.

3.5 Boxplots.

3.6 Combination of histogram, density trace, one-dimensional scatterplot, boxplot, and ECDF-plot.

3.7 Combination of histogram, boxplot or box-and-whisker plot, ECDF-plot, and CP-plot.

3.8 Summary.

4. Statistical Distribution Measures.

4.1 Central value.

4.2 Measures of spread.

4.3 Quartiles, quantiles and percentiles.

4.4 Skewness.

4.5 Kurtosis.

4.6 Summary table of statistical distribution measures.

4.7 Summary.

5. Mapping Spatial Data.

5.1 Map coordinate systems (map projection).

5.2 Map scale.

5.3 Choice of the base map for geochemical mapping.

5.4 Mapping geochemical data with proportional dots.

5.5 Mapping geochemical data using classes.

5.6 Surface maps constructed with smoothing techniques.

5.7 Surface maps constructed with kriging.

5.8 Colour maps.

5.9 Some common mistakes in geochemical mapping.

5.10 Summary.

6. Further Graphics for Exploratory Data Analysis.

6.1 Scatterplots (xy-plots).

6.2 Linear regression lines.

6.3 Time trends.

6.4 Spatial trends.

6.5 Spatial distance plot.

6.6 Spiderplots (normalized multi-element diagrams).

6.7 Scatterplot matrix.

6.8 Ternary plots.

6.9 Summary.

7. Defining Background and Threshold, Identification of Data Outliers and Element Sources.

7.1 Statistical methods to identify extreme values and data outliers.

7.2 Detecting outliers and extreme values in the ECDF- or CP-plot.

7.3 Including the spatial distribution in the definition of background.

7.4 Methods to distinguish geogenic from anthropogenic element sources.

7.5 Summary.

8. Comparing Data in Tables and Graphics.

8.1 Comparing data in tables.

8.2 Graphical comparison of the data distributions of several data sets.

8.3 Comparing the spatial data structure.

8.4 Subset creation – a mighty tool in graphical data analysis.

8.5 Data subsets in scatterplots.

8.6 Data subsets in time and spatial trend diagrams.

8.7 Data subsets in ternary plots.

8.8 Data subsets in the scatterplot matrix.

8.9 Data subsets in maps.

8.10 Summary.

9. Comparing Data Using Statistical Tests.

9.1 Tests for distribution (Kolmogorov–Smirnov and Shapiro–Wilk tests).

9.2 The one-sample t-test (test for the central value).

9.3 Wilcoxon signed-rank test.

9.4 Comparing two central values of the distributions of independent data groups.

9.5 Comparing two central values of matched pairs of data.

9.6 Comparing the variance of two data sets.

9.7 Comparing several central values.

9.8 Comparing the variance of several data groups.

9.9 Comparing several central values of dependent groups.

9.10 Summary.

10. Improving Data Behaviour for Statistical Analysis: Ranking and Transformations.

10.1 Ranking/sorting.

10.2 Non-linear transformations.

10.3 Linear transformations.

10.4 Preparing a data set for multivariate data analysis.

10.5 Transformations for closed number systems.

10.6 Summary.

11. Correlation.

11.1 Pearson correlation.

11.2 Spearman rank correlation.

11.3 Kendall-tau correlation.

11.4 Robust correlation coefficients.

11.5 When is a correlation coefficient significant?

11.6 Working with many variables.

11.7 Correlation analysis and inhomogeneous data.

11.8 Correlation results following additive logratio or central logratio transformations.

11.9 Summary.

12. Multivariate Graphics.

12.1 Profiles.

12.2 Stars.

12.3 Segments.

12.4 Boxes.

12.5 Castles and trees.

12.6 Parallel coordinates plot.

12.7 Summary.

13. Multivariate Outlier Detection.

13.1 Univariate versus multivariate outlier detection.

13.2 Robust versus non-robust outlier detection.

13.3 The chi-square plot.

13.4 Automated multivariate outlier detection and visualization.

13.5 Other graphical approaches for identifying outliers and groups.

13.6 Summary.

14. Principal Component Analysis (PCA) and Factor Analysis (FA).

14.1 Conditioning the data for PCA and FA.

14.2 Principal component analysis (PCA).

14.3 Factor analysis.

14.4 Summary.

15. Cluster Analysis.

15.1 Possible data problems in the context of cluster analysis.

15.2 Distance measures.

15.3 Clustering samples.

15.4 Clustering variables.

15.5 Evaluation of cluster validity.

15.6 Selection of variables for cluster analysis.

15.7 Summary.

16. Regression Analysis (RA).

16.1 Data requirements for regression analysis.

16.2 Multiple regression.

16.3 Classical least squares (LS) regression.

16.4 Robust regression.

16.5 Model selection in regression analysis.

16.6 Other regression methods.

16.7 Summary.

17. Discriminant Analysis (DA) and Other Knowledge-Based Classification Methods.

17.1 Methods for discriminant analysis.

17.2 Data requirements for discriminant analysis.

17.3 Visualisation of the discriminant function.

17.4 Prediction with discriminant analysis.

17.5 Exploring for similar data structures.

17.6 Other knowledge-based classification methods.

17.7 Summary.

18. Quality Control (QC).

18.1 Randomised samples.

18.2 Trueness.

18.3 Accuracy.

18.4 Precision.

18.5 Analysis of variance (ANOVA).

18.6 Using maps to assess data quality.

18.7 Variables analysed by two different analytical techniques.

18.8 Working with censored data – a practical example.

18.9 Summary.

19. Introduction to R and Structure of the DAS+R Graphical User Interface.

19.1 R.

19.2 R-scripts.

19.3 A brief overview of relevant R commands.

19.4 DAS+R.

19.5 Summary.

References.

Index.
