sklearn standard scaler
python basics | 2024. 8. 27. 21:19
When you standardize data with sklearn's StandardScaler, it divides by the biased standard deviation, computed with ddof=0.
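Written out, StandardScaler maps each value x_i of a column with n samples and mean \bar{x} as below. ddof ("delta degrees of freedom") is the amount subtracted from n in the denominator, as in NumPy and pandas. So ddof=0 divides by n, and ddof=1 divides by n - 1, which is the sample standard deviation built from the unbiased variance. The last line gives the relation between the two:

```latex
z_i = \frac{x_i - \bar{x}}{s_0}, \qquad
s_0 = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} \quad (\text{ddof}=0), \qquad
s_1 = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \quad (\text{ddof}=1)

s_1 = s_0\sqrt{\frac{n}{n-1}}
```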
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
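The StandardScaler documentation notes that it uses a biased estimator for the standard deviation, equivalent to numpy.std(x, ddof=0). If we build a small sample as below and apply StandardScaler to it, the result matches a manual calculation using the ddof=0 std.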
```python
import numpy as np
import pandas as pd
import sklearn
```
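```python
# Three columns of uniform random values on different ranges
# (no seed is set, so the numbers change on every run)
A = np.random.sample(10) * 5 + 2   # values in [2, 7)
B = np.random.sample(10) * 2 + 3   # values in [3, 5)
C = np.random.sample(10) * 3 - 1   # values in [-1, 2)
df = pd.DataFrame({'A': A, 'B': B, 'C': C})
df
```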
```
          A         B          C
0  3.609154  3.408440   1.151811
1  6.227276  3.419386   0.669558
2  6.363483  4.949257   0.655796
3  4.491161  3.223420  -0.369682
4  2.144723  4.559320   1.054213
5  2.170273  4.978961   1.713408
6  5.584039  3.042569   0.886624
7  5.224212  4.663551  -0.564353
8  4.030533  3.552307   1.247374
9  5.134493  3.350118   1.917439
```

```python
from sklearn.preprocessing import StandardScaler

std_scaler = StandardScaler()
# StandardScaler: subtract each column's mean, divide by its std
print('sklearn standard scaler\n', std_scaler.fit_transform(df))
# Manual standardization with the sample std (ddof=1, pandas' default)
print('\nddof=1\n', (df - df.mean()) / df.std(ddof=1))
# Manual standardization with the population std (ddof=0)
print('\nddof=0\n', (df - df.mean()) / df.std(ddof=0))
```
```
sklearn standard scaler
 [[-0.62003685 -0.69093336  0.41679539]
 [ 1.20643463 -0.67599531 -0.22010499]
 [ 1.30145605  1.41180608 -0.2382801 ]
 [-0.00472555 -0.94342769 -1.59260461]
 [-1.64166337  0.87966226  0.28790019]
 [-1.62383922  1.45234197  1.15848269]
 [ 0.75769539 -1.19023447  0.06656856]
 [ 0.50667029  1.02190528 -1.84970211]
 [-0.32607169 -0.49459958  0.54300297]
 [ 0.44408032 -0.7705252   1.42794202]]

ddof=1
           A          B          C
0  -0.588219  -0.655477   0.395407
1   1.144524  -0.641305  -0.208810
2   1.234670   1.339357  -0.226052
3  -0.004483  -0.895014  -1.510877
4  -1.557419   0.834521   0.273126
5  -1.540509   1.377813   1.099033
6   0.718813  -1.129156   0.063152
7   0.480670   0.969464  -1.754781
8  -0.309339  -0.469218   0.515138
9   0.421292  -0.730984   1.354665

ddof=0
           A          B          C
0  -0.620037  -0.690933   0.416795
1   1.206435  -0.675995  -0.220105
2   1.301456   1.411806  -0.238280
3  -0.004726  -0.943428  -1.592605
4  -1.641663   0.879662   0.287900
5  -1.623839   1.452342   1.158483
6   0.757695  -1.190234   0.066569
7   0.506670   1.021905  -1.849702
8  -0.326072  -0.494600   0.543003
9   0.444080  -0.770525   1.427942
```
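The same conclusion can be checked without comparing printouts by eye. Below is a minimal sketch that assumes a fixed seed via np.random.default_rng, which the original does not use, so that the run is reproducible. It compares the fitted scaler's scale_ attribute, the per-column std it divides by, against pandas' std with ddof=0 and ddof=1. It also confirms that the ddof=1 result is just the ddof=0 result scaled by sqrt((n - 1) / n). With n = 10 the factor is about 0.948683, and indeed -0.620037 × 0.948683 ≈ -0.588219 in row 0, column A above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Data in the spirit of the example above; the fixed seed is an
# assumption added here so the run is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.random(10) * 5 + 2,
    "B": rng.random(10) * 2 + 3,
    "C": rng.random(10) * 3 - 1,
})
n = len(df)

scaler = StandardScaler().fit(df)
z_ddof0 = scaler.transform(df)                              # sklearn result
z_ddof1 = ((df - df.mean()) / df.std(ddof=1)).to_numpy()    # sample-std result

# 1) scale_ is the per-column std the scaler divides by: ddof=0, not ddof=1.
print(np.allclose(scaler.scale_, df.std(ddof=0)))  # True
print(np.allclose(scaler.scale_, df.std(ddof=1)))  # False

# 2) s_1 = s_0 * sqrt(n / (n - 1)), so z_1 = z_0 * sqrt((n - 1) / n).
factor = np.sqrt((n - 1) / n)
print(round(factor, 6))                        # 0.948683
print(np.allclose(z_ddof1, z_ddof0 * factor))  # True
```

When reproducing the scaler by hand, note that pandas' df.std() defaults to ddof=1 while NumPy's np.std defaults to ddof=0. That is why the notebook above passes ddof explicitly in both calls.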