Data Science Using Python and R

  • Wiley

English

Learn data science by doing data science! 

Data Science Using Python and R will get you plugged into the world’s two most widespread open-source platforms for data science: Python and R.

Data science is hot. Bloomberg called data scientist “the hottest job in America.” Python and R are the top two open-source data science tools in the world. In Data Science Using Python and R, you will learn step-by-step how to produce hands-on solutions to real-world business problems, using state-of-the-art techniques. 

Data Science Using Python and R is written for the general reader with no previous analytics or programming experience. An entire chapter is dedicated to learning the basics of Python and R. Then, each chapter presents step-by-step instructions and walkthroughs for solving data science problems using Python and R.

Those with analytics experience will appreciate having a one-stop shop for learning how to do data science using Python and R. Topics covered include data preparation, exploratory data analysis, preparing to model the data, decision trees, model evaluation, misclassification costs, naïve Bayes classification, neural networks, clustering, regression modeling, dimension reduction, and association rules mining.

Exciting newer topics such as random forests and generalized linear models are also included. The book emphasizes data-driven error costs to enhance profitability, helping readers avoid common pitfalls that can cost a company millions of dollars.

Data Science Using Python and R provides exercises at the end of every chapter, totaling over 500 exercises in the book. Readers will therefore have plenty of opportunity to test their newfound data science skills and expertise. In the Hands-on Analysis exercises, readers are challenged to solve interesting business problems using real-world data sets.

Chantal D. Larose, PhD, is an Assistant Professor of Statistics & Data Science at Eastern Connecticut State University (ECSU). She has co-authored three books on data science and predictive analytics, and helped develop data science programs at ECSU and SUNY New Paltz. Her PhD dissertation, Model-Based Clustering of Incomplete Data, tackles the persistent problem of trying to do data science with incomplete data.

Daniel T. Larose, PhD, is a Professor of Data Science and Statistics and Director of the Data Science programs at Central Connecticut State University. He has published many books on data science, data mining, predictive analytics, and statistics. His consulting clients include The Economist magazine, Forbes Magazine, the CIT Group, and Microsoft.

Preface

Acknowledgements

Chapter 1 Introduction to Data Science

Why Data Science?

What is Data Science?

The Data Science Methodology

Problem Understanding Phase

Data Preparation Phase

Exploratory Data Analysis Phase

Setup Phase

Modeling Phase

Evaluation Phase

Deployment Phase

Data Science Tasks

Description

Estimation

Classification

Clustering

Prediction

Association

Exercises

Chapter 2 The Basics of Python and R

Downloading Python

Basics of Coding in Python

Using Comments in Python

Executing Commands in Python

Importing Packages in Python

Getting Data into Python

Saving Output in Python

Accessing Records and Variables in Python

Setting up Graphics in Python

Downloading R and RStudio

Basics of Coding in R

Using Comments in R

Executing Commands in R

Importing Packages in R

Getting Data into R

Saving Output in R

Accessing Records and Variables in R

Exercises

Chapter 3 Data Preparation

The Bank Marketing Data Set

The Problem Understanding Phase

Clearly Enunciate the Project Objectives

Translate These Objectives into a Data Science Problem

Data Preparation Phase

Adding an Index Field

How to Add an Index Field Using Python

How to Add an Index Field Using R

Changing Misleading Field Values

How to Change Misleading Field Values Using Python

How to Change Misleading Field Values Using R

Re-Expression of Categorical Data as Numeric

How to Re-Express Categorical Field Values Using Python

How to Re-Express Categorical Field Values Using R

Standardizing the Numeric Fields

How to Standardize Numeric Fields Using Python

How to Standardize Numeric Fields Using R

Identifying Outliers Using Z-Values

How to Identify Outliers Using Python

How to Identify Outliers Using R

Exercises

Chapter 4 Exploratory Data Analysis

EDA versus HT

Bar Graphs with Response Overlay

How to Construct a Bar Graph with Overlay Using Python

How to Construct a Bar Graph with Overlay Using R

Contingency Tables

How to Construct Contingency Tables Using Python

How to Construct Contingency Tables Using R

Histograms with Response Overlay

How to Construct Histograms with Response Overlay Using Python

How to Construct Histograms with Response Overlay Using R

Binning Based on Predictive Value

How to Perform Binning Based on Predictive Value Using Python

How to Perform Binning Based on Predictive Value Using R

Exercises

Chapter 5 Preparing to Model the Data

The Story So Far

Partitioning the Data

How to Partition the Data in Python

How to Partition the Data in R

Validating Your Partition

Balancing the Training Data Set

How to Balance the Training Data Set in Python

How to Balance the Training Data Set in R

Establishing Baseline Model Performance

Exercises

Chapter 6 Decision Trees

Introduction to Decision Trees

Classification and Regression Trees (CART)

How to Build CART Decision Trees Using Python

How to Build CART Decision Trees Using R

The C5.0 Algorithm for Building Decision Trees

How to Build C5.0 Decision Trees Using Python

How to Build C5.0 Decision Trees Using R

Random Forests

How to Build Random Forests Using Python

How to Build Random Forests Using R

Exercises

Chapter 7 Model Evaluation

Introduction to Model Evaluation

Classification Evaluation Measures

Sensitivity and Specificity

Precision, Recall, and F_β Scores

Method for Model Evaluation

An Application of Model Evaluation

How to Perform Model Evaluation Using R

How to Perform Model Evaluation Using Python

Accounting for Unequal Error Costs

Accounting for Unequal Error Costs Using R

Comparing Models with and without Unequal Error Costs

Data-Driven Error Costs

Exercises

Chapter 8 Naïve Bayes Classification

Introduction to Naïve Bayes

Bayes Theorem

Maximum a Posteriori Hypothesis

Class Conditional Independence

Application of Naïve Bayes Classification

Naïve Bayes in Python

Naïve Bayes in R

Exercises

Chapter 9 Neural Networks

Introduction to Neural Networks

The Neural Network Structure

Connection Weights and the Combination Function

The Sigmoid Activation Function

Back-Propagation

An Application of a Neural Network Model

How to Use Neural Networks in R

Exercises

Chapter 10 Clustering

What is Clustering?

Introduction to the k-Means Clustering Algorithm

An Application of k-Means Clustering

Cluster Validation

How to Perform k-Means Clustering Using Python

How to Perform k-Means Clustering Using R

Exercises

Chapter 11 Regression Modeling

The Estimation Task

Descriptive Regression Modeling

An Application of Multiple Regression Modeling

How to Perform Multiple Regression Modeling Using Python

How to Perform Multiple Regression Modeling Using R

Model Evaluation for Estimation

How to Perform Estimation Model Evaluation Using Python

How to Perform Estimation Model Evaluation Using R

Stepwise Regression

How to Perform Stepwise Regression Using R

Baseline Models for Regression

Exercises

Chapter 12 Dimension Reduction

The Need for Dimension Reduction

Multicollinearity

Identifying Multicollinearity Using Variance-Inflation Factors

How to Identify Multicollinearity Using Python

How to Identify Multicollinearity Using R

Principal Components Analysis

An Application of Principal Components Analysis

How Many Components Should We Extract?

The Eigenvalue Criterion

The Proportion of Variance Explained Criterion

Performing PCA with k = 4

Validation of the Principal Components

How to Perform Principal Components Analysis Using Python

How to Perform Principal Components Analysis Using R

When is Multicollinearity Not a Problem?

Exercises

Chapter 13 Generalized Linear Models

An Overview of General Linear Models

Linear Regression as a General Linear Model

Logistic Regression as a General Linear Model

An Application of Logistic Regression Modeling

How to Perform Logistic Regression Using Python

How to Perform Logistic Regression Using R

Poisson Regression

An Application of Poisson Regression

How to Perform Poisson Regression Using Python

How to Perform Poisson Regression Using R

Exercises

Chapter 14 Association Rules

Introduction to Association Rules

A Simple Example of Association Rule Mining

Support, Confidence, and Lift

Mining Association Rules

How to Mine Association Rules Using R

Confirming Our Metrics

The Confidence Difference Criterion

How to Apply the Confidence Difference Criterion Using R

The Confidence Quotient Criterion

How to Apply the Confidence Quotient Criterion Using R

Exercises

Appendix

Index
