- Wiley
Data Science Using Python and R
- English
Learn data science by doing data science!
Data Science Using Python and R will get you plugged into the world’s two most widespread open-source platforms for data science: Python and R.
Data science is hot. Bloomberg called data scientist “the hottest job in America.” Python and R are the top two open-source data science tools in the world. In Data Science Using Python and R, you will learn step-by-step how to produce hands-on solutions to real-world business problems, using state-of-the-art techniques.
Data Science Using Python and R is written for the general reader with no previous analytics or programming experience. An entire chapter is dedicated to learning the basics of Python and R. Then, each chapter presents step-by-step instructions and walkthroughs for solving data science problems using Python and R.
Those with analytics experience will appreciate having a one-stop shop for learning how to do data science using Python and R. Topics covered include data preparation, exploratory data analysis, preparing to model the data, decision trees, model evaluation, misclassification costs, naïve Bayes classification, neural networks, clustering, regression modeling, dimension reduction, and association rule mining.
Further, exciting new topics such as random forests and generalized linear models are also included. The book emphasizes data-driven error costs to enhance profitability, helping analysts avoid common pitfalls that can cost a company millions of dollars.
Data Science Using Python and R provides exercises at the end of every chapter, totaling over 500 exercises in the book. Readers will therefore have plenty of opportunity to test their newfound data science skills and expertise. In the Hands-on Analysis exercises, readers are challenged to solve interesting business problems using real-world data sets.
Chantal D. Larose, PhD, is an Assistant Professor of Statistics & Data Science at Eastern Connecticut State University (ECSU). She has co-authored three books on data science and predictive analytics, and helped develop data science programs at ECSU and SUNY New Paltz. Her PhD dissertation, Model-Based Clustering of Incomplete Data, tackles the persistent problem of trying to do data science with incomplete data.
Daniel T. Larose, PhD, is a Professor of Data Science and Statistics and Director of the Data Science programs at Central Connecticut State University. He has published many books on data science, data mining, predictive analytics, and statistics. His consulting clients include The Economist magazine, Forbes Magazine, the CIT Group, and Microsoft.
Preface
Acknowledgements
Chapter 1 Introduction to Data Science
Why Data Science?
What is Data Science?
The Data Science Methodology
Problem Understanding Phase
Data Preparation Phase
Exploratory Data Analysis Phase
Setup Phase
Modeling Phase
Evaluation Phase
Deployment Phase
Data Science Tasks
Description
Estimation
Classification
Clustering
Prediction
Association
Exercises
Chapter 2 The Basics of Python and R
Downloading Python
Basics of Coding in Python
Using Comments in Python
Executing Commands in Python
Importing Packages in Python
Getting Data into Python
Saving Output in Python
Accessing Records and Variables in Python
Setting up Graphics in Python
Downloading R and RStudio
Basics of Coding in R
Using Comments in R
Executing Commands in R
Importing Packages in R
Getting Data into R
Saving Output in R
Accessing Records and Variables in R
Exercises
Chapter 3 Data Preparation
The Bank Marketing Data Set
The Problem Understanding Phase
Clearly Enunciate the Project Objectives
Translate These Objectives into a Data Science Problem
Data Preparation Phase
Adding an Index Field
How to Add an Index Field Using Python
How to Add an Index Field Using R
Changing Misleading Field Values
How to Change Misleading Field Values Using Python
How to Change Misleading Field Values Using R
Re-Expression of Categorical Data as Numeric
How to Re-Express Categorical Field Values Using Python
How to Re-Express Categorical Field Values Using R
Standardizing the Numeric Fields
How to Standardize Numeric Fields Using Python
How to Standardize Numeric Fields Using R
Identifying Outliers Using Z-Values
How to Identify Outliers Using Python
How to Identify Outliers Using R
Exercises
Chapter 4 Exploratory Data Analysis
EDA versus HT
Bar Graphs with Response Overlay
How to Construct a Bar Graph with Overlay Using Python
How to Construct a Bar Graph with Overlay Using R
Contingency Tables
How to Construct Contingency Tables Using Python
How to Construct Contingency Tables Using R
Histograms with Response Overlay
How to Construct Histograms with Response Overlay Using Python
How to Construct Histograms with Response Overlay Using R
Binning Based on Predictive Value
How to Perform Binning Based on Predictive Value Using Python
How to Perform Binning Based on Predictive Value Using R
Exercises
Chapter 5 Preparing to Model the Data
The Story So Far
Partitioning the Data
How to Partition the Data in Python
How to Partition the Data in R
Validating Your Partition
Balancing the Training Data Set
How to Balance the Training Data Set in Python
How to Balance the Training Data Set in R
Establishing Baseline Model Performance
Exercises
Chapter 6 Decision Trees
Introduction to Decision Trees
Classification and Regression Trees (CART)
How to Build CART Decision Trees Using Python
How to Build CART Decision Trees Using R
The C5.0 Algorithm for Building Decision Trees
How to Build C5.0 Decision Trees Using Python
How to Build C5.0 Decision Trees Using R
Random Forests
How to Build Random Forests Using Python
How to Build Random Forests Using R
Exercises
Chapter 7 Model Evaluation
Introduction to Model Evaluation
Classification Evaluation Measures
Sensitivity and Specificity
Precision, Recall, and F_β Scores
Method for Model Evaluation
An Application of Model Evaluation
How to Perform Model Evaluation Using R
How to Perform Model Evaluation Using Python
Accounting for Unequal Error Costs
Accounting for Unequal Error Costs Using R
Comparing Models with and without Unequal Error Costs
Data-Driven Error Costs
Exercises
Chapter 8 Naïve Bayes Classification
Introduction to Naïve Bayes
Bayes Theorem
Maximum a Posteriori Hypothesis
Class Conditional Independence
Application of Naïve Bayes Classification
Naïve Bayes in Python
Naïve Bayes in R
Exercises
Chapter 9 Neural Networks
Introduction to Neural Networks
The Neural Network Structure
Connection Weights and the Combination Function
The Sigmoid Activation Function
Back-Propagation
An Application of a Neural Network Model
How to Use Neural Networks in R
Exercises
Chapter 10 Clustering
What is Clustering?
Introduction to the k-Means Clustering Algorithm
An Application of k-Means Clustering
Cluster Validation
How to Perform k-Means Clustering Using Python
How to Perform k-Means Clustering Using R
Exercises
Chapter 11 Regression Modeling
The Estimation Task
Descriptive Regression Modeling
An Application of Multiple Regression Modeling
How to Perform Multiple Regression Modeling Using Python
How to Perform Multiple Regression Modeling Using R
Model Evaluation for Estimation
How to Perform Estimation Model Evaluation Using Python
How to Perform Estimation Model Evaluation Using R
Stepwise Regression
How to Perform Stepwise Regression Using R
Baseline Models for Regression
Exercises
Chapter 12 Dimension Reduction
The Need for Dimension Reduction
Multicollinearity
Identifying Multicollinearity Using Variance-Inflation Factors
How to Identify Multicollinearity Using Python
How to Identify Multicollinearity Using R
Principal Components Analysis
An Application of Principal Components Analysis
How Many Components Should We Extract?
The Eigenvalue Criterion
The Proportion of Variance Explained Criterion
Performing PCA with k = 4
Validation of the Principal Components
How to Perform Principal Components Analysis Using Python
How to Perform Principal Components Analysis Using R
When is Multicollinearity Not a Problem?
Exercises
Chapter 13 Generalized Linear Models
An Overview of General Linear Models
Linear Regression as a General Linear Model
Logistic Regression as a General Linear Model
An Application of Logistic Regression Modeling
How to Perform Logistic Regression Using Python
How to Perform Logistic Regression Using R
Poisson Regression
An Application of Poisson Regression
How to Perform Poisson Regression Using Python
How to Perform Poisson Regression Using R
Exercises
Chapter 14 Association Rules
Introduction to Association Rules
A Simple Example of Association Rule Mining
Support, Confidence, and Lift
Mining Association Rules
How to Mine Association Rules Using R
Confirming Our Metrics
The Confidence Difference Criterion
How to Apply the Confidence Difference Criterion Using R
The Confidence Quotient Criterion
How to Apply the Confidence Quotient Criterion Using R
Exercises
Appendix
Index