In machine learning and statistics, feature selection (특징 선택), also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. (https://en.wikipedia.org/wiki/Feature_selection)
Feature selection picks a subset of the existing features that matter to the model, whereas feature extraction derives new features from the original ones.
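A minimal sketch of the distinction, assuming scikit-learn (the page does not name a library): selection keeps original columns, while extraction (here PCA) builds new columns that are combinations of all of them.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 original features

# Feature selection: keep the 2 original columns with the best ANOVA F-score.
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: build 2 new columns (principal components).
X_ext = PCA(n_components=2).fit_transform(X)

print(X_sel.shape, X_ext.shape)  # (150, 2) (150, 2)
```

Both results have two columns, but the selected ones are still interpretable original measurements, while the extracted ones are synthetic axes.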
Benefits of performing feature selection before building a model
- Reduces overfitting: removes redundant data
- Improves accuracy: less misleading data improves model accuracy
- Reduces training time: fewer features mean faster training
Methods
- Univariate selection: run a Chi-squared test on each feature's counts and keep the statistically significant features
- Recursive feature elimination: recursively remove the least important features
- PCA weights
- Some machine learning algorithms report feature importances (Random forest, Extra trees classifiers, ...)
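A univariate-selection sketch for the first method above, assuming scikit-learn. The Chi-squared test requires non-negative features (counts or frequencies), which iris satisfies.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)      # 4 features, all non-negative
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)   # keep the 2 highest-scoring features

print(X.shape, X_new.shape)            # (150, 4) (150, 2)
print(selector.get_support())          # boolean mask of the kept features
```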
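Recursive feature elimination can be sketched the same way; the wrapped estimator (here logistic regression, an assumed choice) is refit repeatedly and the weakest feature dropped each round.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Fit, drop the lowest-weight feature, refit, and repeat
# until only n_features_to_select features remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=2)
rfe.fit(X, y)

print(rfe.support_)   # mask of the surviving features
print(rfe.ranking_)   # 1 = selected; larger = eliminated earlier
```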
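For the last method, a sketch of reading built-in importances from a random forest (impurity-based importances, which are normalized to sum to 1):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

importances = forest.feature_importances_     # one score per feature
order = np.argsort(importances)[::-1]         # features, most important first
print(importances.round(3))
print(order)
```

Extra trees classifiers expose the same `feature_importances_` attribute, so the snippet applies unchanged.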
Related links
- Beginner's Guide to Feature Selection in Python
- Feature Selection For Machine Learning in Python : Using Python
- Principal Component Analysis (PCA) for Feature Selection and some of its Pitfalls : feature selection with PCA
- What are dimensionality reduction and feature selection?
Related papers
- Feature selection methods for big data bioinformatics: A survey from the search perspective
- Feature Selection: A Data Perspective
- Feature Selection Using a Semantic-Based Genetic Algorithm, Journal of Korean Society for Internet Information
- An Introduction to Variable and Feature Selection, Journal of Machine Learning Research