StratifiedKFold

Let's take a look at scikit-learn's StratifiedKFold.

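Cross-validation is needed when there is too little data to set aside separate train and test sets, or when the model overfits.
For cross-validation, a common choice is scikit-learn's StratifiedKFold, a K-fold splitter that keeps the proportion of each class label roughly the same in every fold.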
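To see what stratification buys you, here is a minimal comparison between plain KFold and StratifiedKFold. It assumes hypothetical labels y that are sorted and imbalanced (six samples of class 0, three of class 1). These labels are made up for illustration and are not the data used later in this post. The helper counts how many samples of each class land in each test fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical labels, sorted by class: 6 samples of class 0, 3 of class 1
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])
X = np.arange(len(y)).reshape(-1, 1)

def fold_class_counts(cv):
    """Return the per-class label counts of each test fold."""
    return [np.bincount(y[test], minlength=2).tolist()
            for _, test in cv.split(X, y)]

kfold_counts = fold_class_counts(KFold(n_splits=3))
strat_counts = fold_class_counts(StratifiedKFold(n_splits=3))
print("KFold:          ", kfold_counts)
print("StratifiedKFold:", strat_counts)
```

With these labels, plain KFold cuts the sorted data into consecutive blocks, so the first two test folds contain only class 0 and the last one only class 1. StratifiedKFold gives every test fold two class-0 samples and one class-1 sample, matching the overall 2:1 ratio.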


from sklearn.model_selection import StratifiedKFold
import numpy as np

Import StratifiedKFold from scikit-learn, along with NumPy.

data = np.arange(1, 10).reshape(9, 1)
col = [[1], [2], [3], [1], [2], [3], [1], [2], [3]]
data = np.append(data, col, 1)
data

Result:
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 1],
       [5, 2],
       [6, 3],
       [7, 1],
       [8, 2],
       [9, 3]])

np.arange(1, 10) produces the numbers 1 through 9, and reshape(9, 1) turns them into an array with 9 rows and 1 column.

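Then add a column:
np.append(data, col, 1)
The third argument is axis=1, so col is attached as a second column. The first column (1 to 9) is used as X and the second column (labels 1, 2, 3) as Y.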
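The axis argument of np.append matters. The following sketch builds the same 9x2 array two ways and shows what happens when the axis is omitted. np.column_stack is used only as an alternative for comparison and is not part of the original code.

```python
import numpy as np

data = np.arange(1, 10).reshape(9, 1)                 # shape (9, 1): values 1..9
col = [[1], [2], [3], [1], [2], [3], [1], [2], [3]]   # shape (9, 1): labels

with_axis = np.append(data, col, axis=1)    # joined column-wise -> shape (9, 2)
stacked = np.column_stack([data.ravel(), np.ravel(col)])  # same array, built differently
flat = np.append(data, col)                 # no axis: both inputs flattened -> shape (18,)

print(with_axis.shape, stacked.shape, flat.shape)  # (9, 2) (9, 2) (18,)
print(np.array_equal(with_axis, stacked))          # True
```

Without axis=1, np.append silently returns a flat 1-D array, and data[:, 0] would then fail. Passing the axis explicitly by keyword makes the intent easier to read.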

X_data = data[:, 0]
Y_data = data[:, 1]

skf = StratifiedKFold(n_splits=3)
#skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1234)

for train_index, test_index in skf.split(X_data, Y_data):
  print('--------------------')
  X, test_X = X_data[train_index], X_data[test_index]
  print('X=', X, 'test_X=', test_X)

Result:
--------------------
X= [4 5 6 7 8 9] test_X= [1 2 3]
--------------------
X= [1 2 3 7 8 9] test_X= [4 5 6]
--------------------
X= [1 2 3 4 5 6] test_X= [7 8 9]

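Here n_splits=3 divides the data into 3 folds.
The split is repeated 3 times. Each time, one fold is the test set and the other two folds are the training set, so every sample is tested exactly once. Because the split is stratified, each test fold contains one sample of each label (1, 2, 3).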
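The output above prints only the X values. The sketch below also prints the labels of each test fold, for both the unshuffled splitter and the commented-out shuffled variant (shuffle=True, random_state=1234), and checks that each sample lands in exactly one test fold. describe_folds is a hypothetical helper written for this check.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Same toy data as above: X values 1..9, labels 1, 2, 3 repeating
X_data = np.arange(1, 10)
Y_data = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])

def describe_folds(skf):
    """Return (test X values, test labels) for each fold of the splitter."""
    return [(X_data[test].tolist(), Y_data[test].tolist())
            for _, test in skf.split(X_data, Y_data)]

ordered = describe_folds(StratifiedKFold(n_splits=3))
shuffled = describe_folds(StratifiedKFold(n_splits=3, shuffle=True, random_state=1234))

for name, folds in [("ordered", ordered), ("shuffled", shuffled)]:
    for i, (test_x, test_y) in enumerate(folds, start=1):
        print(f"{name} fold {i}: test_X={test_x} labels={test_y}")

# Every sample lands in exactly one test fold, with or without shuffling
for folds in (ordered, shuffled):
    all_test_x = sorted(x for test_x, _ in folds for x in test_x)
    print(all_test_x == X_data.tolist())
```

Without shuffling, the folds are [1 2 3], [4 5 6], [7 8 9]. With shuffling, which samples go together depends on random_state, but every test fold still holds one sample of each label.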

If you train a model with Keras, the procedure looks like this:

First, n_splits folds are created.
Second, training and evaluation are repeated n_splits times, once per fold.

If StratifiedKFold is created without shuffle=True, the samples are split in their original order (random_state only has an effect when shuffle=True):

Fold 1: Test 1 2 3, Train 4 5 6, Train 7 8 9
Fold 2: Train 1 2 3, Test 4 5 6, Train 7 8 9
Fold 3: Train 1 2 3, Train 4 5 6, Test 7 8 9

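Each fold yields one evaluation score:
Evaluation 1
Evaluation 2
Evaluation 3

The final score is their average: Average(Evaluation 1, 2, 3)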
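The train, evaluate, and average procedure can be sketched as a loop. To keep the example runnable without Keras, a scikit-learn KNeighborsClassifier stands in for the Keras model, and build_model is a hypothetical helper. With Keras, you would build and compile a fresh network inside it, then call model.fit and model.evaluate in the loop.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Same toy data as above; estimators expect 2-D features, so X is (9, 1)
X_data = np.arange(1, 10).reshape(-1, 1)
Y_data = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])

def build_model():
    """Hypothetical model factory (stand-in for building a Keras model).

    A fresh, untrained model is created for every fold so that no fold
    sees weights learned on another fold's test data.
    """
    return KNeighborsClassifier(n_neighbors=1)

skf = StratifiedKFold(n_splits=3)
scores = []
for fold, (train_index, test_index) in enumerate(skf.split(X_data, Y_data), start=1):
    model = build_model()
    model.fit(X_data[train_index], Y_data[train_index])          # train on 2 folds
    score = model.score(X_data[test_index], Y_data[test_index])  # accuracy on held-out fold
    scores.append(score)
    print(f"Evaluation {fold}: {score:.3f}")

print(f"Average: {np.mean(scores):.3f}")

# For scikit-learn estimators, cross_val_score runs the same loop
cv_scores = cross_val_score(build_model(), X_data, Y_data, cv=skf)
```

The scores on nine toy points are not meaningful in themselves. The point is the structure: one fresh model per fold, one score per fold, and the mean as the final estimate.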



Full source
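from sklearn.model_selection import StratifiedKFold
import numpy as np

# Values 1..9 as a 9x1 array, then append the labels as a second column
data = np.arange(1, 10).reshape(9, 1)
col = [[1], [2], [3], [1], [2], [3], [1], [2], [3]]
data = np.append(data, col, 1)

X_data = data[:, 0]  # first column: sample values 1..9
Y_data = data[:, 1]  # second column: class labels 1, 2, 3

# 3 stratified folds without shuffling: folds follow the original order
skf = StratifiedKFold(n_splits=3)
# Uncomment to shuffle each class before splitting (reproducible via random_state)
#skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1234)

for train_index, test_index in skf.split(X_data, Y_data):
  print('--------------------')
  # train_index / test_index are positions; use them to slice the data
  X, test_X = X_data[train_index], X_data[test_index]
  print('X=', X, 'test_X=', test_X)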