Machine learning contests : a guidebook
This book systematically introduces algorithm and machine learning competitions. The first author has won 5 championships and 5 runner-up places in domestic and international algorithm competitions. Firstly, it takes common competition scenarios as a guide by giving the mai...
Main Author: | Wang he, 1946- |
---|---|
Other Authors: | Liu, Peng; Qian, Qian; SpringerLink (Online service) |
Format: | Electronic |
Language: | English |
Published: | Singapore : Springer, 2023. |
Physical Description: | 1 online resource (398 pages) |
Table of Contents:
- Intro
- Preface
- Algorithm Competition Era
- Why Write
- Features of the Book
- Target Readers
- Welcome to Contact with Us
- Acknowledgments
- Contents
- Part I: Half the Work, Twice the Effect
- Chapter 1: Guide to the Competitions
- 1.1 Competition Platforms
- 1.1.1 Kaggle
- 1.1.2 Tianchi
- 1.1.2.1 Registration
- 1.1.2.2 Competition System
- 1.1.2.3 Points
- 1.1.3 DF
- 1.1.4 DC
- 1.1.5 Kesci
- 1.1.6 JDATA
- 1.1.7 Corporate Websites
- 1.2 Competition Procedures
- 1.2.1 Problem Modeling
- 1.2.2 Data Exploration
- 1.2.3 Feature Engineering
- 1.2.4 Model Training
- 1.2.5 Model Integration
- 1.3 Competition Types
- 1.3.1 Data Types
- 1.3.2 Task Types
- 1.3.3 Application Scenarios
- 1.4 Thinking Exercises
- Chapter 2: Problem Modeling
- 2.1 Understanding the Competition Question
- 2.1.1 Business Background
- 2.1.1.1 Go Deep into the Business
- 2.1.1.2 Be Clear About the Goals
- 2.1.2 Understanding Data
- 2.1.3 Evaluation Indicators
- 2.1.3.1 Classification Indicators
- Error Rate and Accuracy
- Precision and Recall
- F1-score
- ROC Curve
- AUC
- Logarithmic Loss
- 2.1.3.2 Indicators of Regression
- Mean Absolute Error
- Mean Squared Error
- Root Mean Squared Error
- Average Absolute Percentage Error
- 2.2 Sample Selection
- 2.2.1 Main Reasons
- 2.2.1.1 Too Large Data Set
- 2.2.1.2 Data Noise
- 2.2.1.3 Data Redundancy
- 2.2.1.4 Uneven Distribution of Positive and Negative Samples
- 2.2.2 Accurate Methods
- 2.2.3 Application Scenarios
- 2.3 Offline Evaluation Strategy
- 2.3.1 Strong Time Sequence Problems
- 2.3.2 Weak Time Sequence Problems
- 2.4 Cases in Practice
- 2.4.1 Understanding the Competition Question
- 2.4.2 Offline Verification
- 2.5 Thinking Exercises
- Chapter 3: Data Exploration
- 3.1 Preliminary Data Exploration
- 3.1.1 Analytical Thinking
- 3.1.2 Analysis Methods
- 3.1.3 Purpose Clarification
- 3.2 Variable Analysis
- 3.2.1 Univariate Analysis
- 3.2.1.1 Labels
- 3.2.1.2 Continuous Type
- 3.2.1.3 Category Type
- 3.2.2 Multivariate Analysis
- 3.3 Model Analysis
- 3.3.1 Learning Curve
- 3.3.1.1 Underfitting Learning Curve
- 3.3.1.2 Overfitting Learning Curve
- 3.3.2 Feature Importance Analysis
- 3.3.3 Error Analysis
- 3.4 Thinking Exercises
- Chapter 4: Feature Engineering
- 4.1 Data Preprocessing
- 4.1.1 Processing Missing Values
- 4.1.1.1 Distinguishing Missing Values
- 4.1.1.2 Processing Method
- 4.1.2 Dealing with Outliers
- 4.1.2.1 Looking for Outliers
- 4.1.2.2 Coping with Outliers
- 4.1.3 Optimizing Memory
- 4.2 Feature Transformation
- 4.2.1 Non-dimensionalization Processing of Continuous Variables
- 4.2.2 Data Transformation of Continuous Variables
- 4.2.2.1 log Transformation
- 4.2.2.2 Discretization of Continuous Variables
- 4.2.3 Category Feature Transformation
- 4.2.4 Irregular Feature Transformation
- 4.3 Feature Extraction
- 4.3.1 Statistics Features Related to Categories
- 4.3.1.1 Target Coding