Machine learning contests: a guidebook

This book systematically introduces competitions in the field of algorithms and machine learning. The first author of the book has won 5 championships and finished runner-up 5 times in domestic and international algorithm competitions. Firstly, it takes common competition scenarios as a guide by giving the mai...

Main Author: Wang, He, 1946-
Other Authors: Liu, Peng; Qian, Qian; SpringerLink (Online service)
Format: Electronic
Language: English
Published: Singapore : Springer, 2023.
Physical Description: 1 online resource (398 pages)
Table of Contents:
  • Intro
  • Preface
  • Algorithm Competition Era
  • Why Write
  • Features of the Book
  • Target Readers
  • Welcome to Contact with Us
  • Acknowledgments
  • Contents
  • Part I: Half the Work, Twice the Effect
  • Chapter 1: Guide to the Competitions
  • 1.1 Competition Platforms
  • 1.1.1 Kaggle
  • 1.1.2 Tianchi
  • 1.1.2.1 Registration
  • 1.1.2.2 Competition System
  • 1.1.2.3 Points
  • 1.1.3 DF
  • 1.1.4 DC
  • 1.1.5 Kesci
  • 1.1.6 JDATA
  • 1.1.7 Corporate Websites
  • 1.2 Competition Procedures
  • 1.2.1 Problem Modeling
  • 1.2.2 Data Exploration
  • 1.2.3 Feature Engineering
  • 1.2.4 Model Training
  • 1.2.5 Model Integration
  • 1.3 Competition Types
  • 1.3.1 Data Types
  • 1.3.2 Task Types
  • 1.3.3 Application Scenarios
  • 1.4 Thinking Exercises
  • Chapter 2: Problem Modeling
  • 2.1 Understanding the Competition Question
  • 2.1.1 Business Background
  • 2.1.1.1 Go Deep into the Business
  • 2.1.1.2 Be Clear About the Goals
  • 2.1.2 Understanding Data
  • 2.1.3 Evaluation Indicators
  • 2.1.3.1 Classification Indicators
  • Error Rate and Accuracy
  • Precision and Recall
  • F1-score
  • ROC Curve
  • AUC
  • Logarithmic Loss
  • 2.1.3.2 Indicators of Regression
  • Mean Absolute Error
  • Mean Squared Error
  • Root Mean Squared Error
  • Average Absolute Percentage Error
  • 2.2 Sample Selection
  • 2.2.1 Main Reasons
  • 2.2.1.1 Too Large Data Set
  • 2.2.1.2 Data Noise
  • 2.2.1.3 Data Redundancy
  • 2.2.1.4 Uneven Distribution of Positive and Negative Samples
  • 2.2.2 Accurate Methods
  • 2.2.3 Application Scenarios
  • 2.3 Offline Evaluation Strategy
  • 2.3.1 Strong Time Sequence Problems
  • 2.3.2 Weak Time Sequence Problems
  • 2.4 Cases in Practice
  • 2.4.1 Understanding the Competition Question
  • 2.4.2 Offline Verification
  • 2.5 Thinking Exercises
  • Chapter 3: Data Exploration
  • 3.1 Preliminary Data Exploration
  • 3.1.1 Analytical Thinking
  • 3.1.2 Analysis Methods
  • 3.1.3 Purpose Clarification
  • 3.2 Variable Analysis
  • 3.2.1 Univariate Analysis
  • 3.2.1.1 Labels
  • 3.2.1.2 Continuous Type
  • 3.2.1.3 Category Type
  • 3.2.2 Multivariate Analysis
  • 3.3 Model Analysis
  • 3.3.1 Learning Curve
  • 3.3.1.1 Underfitting Learning Curve
  • 3.3.1.2 Overfitting Learning Curve
  • 3.3.2 Feature Importance Analysis
  • 3.3.3 Error Analysis
  • 3.4 Thinking Exercises
  • Chapter 4: Feature Engineering
  • 4.1 Data Preprocessing
  • 4.1.1 Processing Missing Values
  • 4.1.1.1 Distinguishing Missing Values
  • 4.1.1.2 Processing Method
  • 4.1.2 Dealing with Outliers
  • 4.1.2.1 Looking for Outliers
  • 4.1.2.2 Coping with Outliers
  • 4.1.3 Optimizing Memory
  • 4.2 Feature Transformation
  • 4.2.1 Non-dimensionalization Processing of Continuous Variables
  • 4.2.2 Data Transformation of Continuous Variables
  • 4.2.2.1 log Transformation
  • 4.2.2.2 Discretization of Continuous Variables
  • 4.2.3 Category Feature Transformation
  • 4.2.4 Irregular Feature Transformation
  • 4.3 Feature Extraction
  • 4.3.1 Statistics Features Related to Categories
  • 4.3.1.1 Target Coding