- Wiley
Classification Analysis of DNA Microarrays
- English
English
Wide coverage of traditional unsupervised and supervised methods and newer contemporary approaches that help researchers handle the rapid growth of classification methods in DNA microarray studies
Proliferating classification methods in DNA microarray studies have resulted in a body of information scattered throughout the literature, conference proceedings, and elsewhere. This book unites many of these classification methods in a single volume. In addition to traditional statistical methods, it covers newer machine-learning approaches such as fuzzy methods, artificial neural networks, evolutionary-based genetic algorithms, support vector machines, swarm intelligence involving particle swarm optimization, and more.
Classification Analysis of DNA Microarrays provides highly detailed pseudo-code and rich, graphical programming features, plus ready-to-run source code. Along with primary methods that include traditional and contemporary classification, it offers supplementary tools and data preparation routines for standardization and fuzzification; dimensional reduction via crisp and fuzzy c-means, PCA, and non-linear manifold learning; computational linguistics via text analytics and n-gram analysis; recursive feature extraction during ANN; kernel-based methods; and ensemble classifier fusion.
This powerful new resource:
- Explains how to apply classification analysis to DNA microarrays used in large-scale, high-throughput transcriptional studies
- Serves as a historical repository of general-use supervised classification methods as well as newer contemporary methods
- Brings readers quickly up to speed on the various classification methods as they implement the pseudo-code and source code provided in the book
- Describes implementation methods that help shorten discovery times
Classification Analysis of DNA Microarrays is useful for professionals and graduate students in computer science, bioinformatics, biostatistics, systems biology, and many related fields.
LEIF E. PETERSON, PhD, is Associate Professor of Public Health, Weill Cornell Medical College, Cornell University, and is with the Center for Biostatistics, The Methodist Hospital Research Institute (Houston). He is a member of the IEEE Computational Intelligence Society, and Editor-in-Chief of the BioMed Central Source Code for Biology and Medicine.
Abbreviations
1 Introduction
1.1 Class Discovery
1.2 Dimensional Reduction
1.3 Class Prediction
1.4 Classification Rules of Thumb
1.5 DNA Microarray Datasets Used
References
PART I CLASS DISCOVERY
2 Crisp K-Means Cluster Analysis
2.1 Introduction
2.2 Algorithm
2.3 Implementation
2.4 Distance Metrics
2.5 Cluster Validity
2.6 V-Fold Cross-Validation
2.7 Cluster Initialization
2.8 Cluster Outliers
2.9 Summary
References
3 Fuzzy K-Means Cluster Analysis
3.1 Introduction
3.2 Fuzzy K-Means Algorithm
3.3 Implementation
3.4 Summary
References
4 Self-Organizing Maps
4.1 Introduction
4.2 Algorithm
4.3 Implementation
4.4 Cluster Visualization
4.5 Unified Distance Matrix (U Matrix)
4.6 Component Map
4.7 Map Quality
4.8 Nonlinear Dimension Reduction
References
5 Unsupervised Neural Gas
5.1 Introduction
5.2 Algorithm
5.3 Implementation
5.4 Nonlinear Dimension Reduction
5.5 Summary
References
6 Hierarchical Cluster Analysis
6.1 Introduction
6.2 Methods
6.3 Algorithm
6.4 Implementation
References
7 Model-Based Clustering
7.1 Introduction
7.2 Algorithm
7.3 Implementation
7.4 Summary
References
8 Text Mining: Document Clustering
8.1 Introduction
8.2 Duo-Mining
8.3 Streams and Documents
8.4 Lexical Analysis
8.5 Stemming
8.6 Term Weighting
8.7 Concept Vectors
8.8 Main Terms Representing Concept Vectors
8.9 Algorithm
8.10 Preprocessing
8.11 Summary
References
9 Text Mining: N-Gram Analysis
9.1 Introduction
9.2 Algorithm
9.3 Implementation
9.4 Summary
References
PART II DIMENSION REDUCTION
10 Principal Components Analysis
10.1 Introduction
10.2 Multivariate Statistical Theory
10.3 Algorithm
10.4 When to Use Loadings and PC Scores
10.5 Implementation
10.6 Rules of Thumb For PCA
10.7 Summary
References
11 Nonlinear Manifold Learning
11.1 Introduction
11.2 Correlation-Based PCA
11.3 Kernel PCA
11.4 Diffusion Maps
11.5 Laplacian Eigenmaps
11.6 Local Linear Embedding
11.7 Locality Preserving Projections
11.8 Sammon Mapping
11.9 NLML Prior to Classification Analysis
11.10 Classification Results
11.11 Summary
References
PART III CLASS PREDICTION
12 Feature Selection
12.1 Introduction
12.2 Filtering versus Wrapping
12.3 Data
12.4 Data Arrangement
12.5 Filtering
12.6 Selection Methods
12.7 Multicollinearity
12.8 Summary
References
13 Classifier Performance
13.1 Introduction
13.2 Input–Output, Speed, and Efficiency
13.3 Training, Testing, and Validation
13.4 Ensemble Classifier Fusion
13.5 Sensitivity and Specificity
13.6 Bias
13.7 Variance
13.8 Receiver–Operator Characteristic (ROC) Curves
References
14 Linear Regression
14.1 Introduction
14.2 Algorithm
14.3 Implementation
14.4 Cross-Validation Results
14.5 Bootstrap Bias
14.6 Multiclass ROC Curves
14.7 Decision Boundaries
14.8 Summary
References
15 Decision Tree Classification
15.1 Introduction
15.2 Features Used
15.3 Terminal Nodes and Stopping Criteria
15.4 Algorithm
15.5 Implementation
15.6 Cross-Validation Results
15.7 Decision Boundaries
15.8 Summary
References
16 Random Forests
16.1 Introduction
16.2 Algorithm
16.3 Importance Scores
16.4 Strength and Correlation
16.5 Proximity and Supervised Clustering
16.6 Unsupervised Clustering
16.7 Class Outlier Detection
16.8 Implementation
16.9 Parameter Effects
16.10 Summary
References
17 K Nearest Neighbor
17.1 Introduction
17.2 Algorithm
17.3 Implementation
17.4 Cross-Validation Results
17.5 Bootstrap Bias
17.6 Multiclass ROC Curves
17.7 Decision Boundaries
17.8 Summary
References
18 Naïve Bayes Classifier
18.1 Introduction
18.2 Algorithm
18.3 Cross-Validation Results
18.4 Bootstrap Bias
18.5 Multiclass ROC Curves
18.6 Decision Boundaries
18.7 Summary
References
19 Linear Discriminant Analysis
19.1 Introduction
19.2 Multivariate Matrix Definitions
19.3 Linear Discriminant Analysis
19.4 Quadratic Discriminant Analysis
19.5 Fisher’s Discriminant Analysis
19.6 Summary
References
20 Learning Vector Quantization
20.1 Introduction
20.2 Cross-Validation Results
20.3 Bootstrap Bias
20.4 Multiclass ROC Curves
20.5 Decision Boundaries
20.6 Summary
References
21 Logistic Regression
21.1 Introduction
21.2 Binary Logistic Regression
21.3 Polytomous Logistic Regression
21.4 Cross-Validation Results
21.5 Decision Boundaries
21.6 Summary
References
22 Support Vector Machines
22.1 Introduction
22.2 Hard-Margin SVM for Linearly Separable Classes
22.3 Kernel Mapping into Nonlinear Feature Space
22.4 Soft-Margin SVM for Nonlinearly Separable Classes
22.5 Gradient Ascent Soft-Margin SVM
22.6 Least-Squares Soft-Margin SVM
22.7 Summary
References
23 Artificial Neural Networks
23.1 Introduction
23.2 ANN Architecture
23.3 Basics of ANN Training
23.4 ANN Training Methods
23.5 Algorithm
23.6 Batch versus Online Training
23.7 ANN Testing
23.8 Cross-Validation Results
23.9 Bootstrap Bias
23.10 Multiclass ROC Curves
23.11 Decision Boundaries
23.12 RPROP versus Backpropagation
23.13 Summary
References
24 Kernel Regression
24.1 Introduction
24.2 Algorithm
24.3 Cross-Validation Results
24.4 Bootstrap Bias
24.5 Multiclass ROC Curves
24.6 Decision Boundaries
24.7 Summary
References
25 Neural Adaptive Learning with Metaheuristics
25.1 Multilayer Perceptrons
25.2 Genetic Algorithms
25.3 Covariance Matrix Self-Adaptation–Evolution Strategies
25.4 Particle Swarm Optimization
25.5 Ant Colony Optimization
25.6 Summary
References
26 Supervised Neural Gas
26.1 Introduction
26.2 Algorithm
26.3 Cross-Validation Results
26.4 Bootstrap Bias
26.5 Multiclass ROC Curves
26.6 Class Decision Boundaries
26.7 Summary
References
27 Mixture of Experts
27.1 Introduction
27.2 Algorithm
27.3 Cross-Validation Results
27.4 Decision Boundaries
27.5 Summary
References
28 Covariance Matrix Filtering
28.1 Introduction
28.2 Covariance and Correlation Matrices
28.3 Random Matrices
28.4 Component Subtraction
28.5 Covariance Matrix Shrinkage
28.6 Covariance Matrix Filtering
28.7 Summary
References
APPENDIXES
A Probability Primer
B Matrix Algebra
C Mathematical Functions
D Statistical Primitives
E Probability Distributions
F Symbols And Notation
Index