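Let's take a look at scikit-learn's StratifiedKFold.

Cross-validation is needed when there is too little data for separate train and test sets, or when overfitting occurs. When you need cross-validation, you can use scikit-learn's StratifiedKFold, which keeps the class ratio of the labels roughly the same in every fold.

from sklearn.model_selection import StratifiedKFold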
import numpy as np

Import the StratifiedKFold class, along with NumPy.

data = np.arange(1, 10).reshape(9, 1)
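col = [[1], [2], [3], [1], [2], [3], [1], [2], [3]]
data = np.append(data, col, 1)
data

Result)
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 1],
       [5, 2],
       [6, 3],
       [7, 1],
       [8, 2],
       [9, 3]])

The array holding 1 through 9 is reshaped into an array of 9 rows and 1 column. Then np.append(data, col, 1) adds col as a second column (the last argument, 1, is the axis). The first column is used as the feature and the second as the class label.

X_data = data[:, 0]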
Y_data = data[:, 1]
skf = StratifiedKFold(n_splits=3)
#skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1234)
for train_index, test_index in skf.split(X_data, Y_data):
    print('--------------------')
    X, test_X = X_data[train_index], X_data[test_index]
    print('X=', X, 'test_X=', test_X)

Result)
--------------------
X= [4 5 6 7 8 9] test_X= [1 2 3]
--------------------
X= [1 2 3 7 8 9] test_X= [4 5 6]
--------------------
X= [1 2 3 4 5 6] test_X= [7 8 9]

Here, n_splits=3 divides the data into 3 sets. Out of the 3 sets, one is used as the test set and the other two as training data, and this is repeated 3 times so that each set serves as the test set once.

If you are training with Keras:
One, as many sets as n_splits are created.
Two, training and evaluation are repeated as many times as n_splits.

If StratifiedKFold is used without shuffle=True, the data is split in its original order (random_state only takes effect together with shuffle=True).
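To make the "stratified" part visible, the following minimal sketch counts how many test samples of each class land in each fold, once without shuffling and once with the commented-out shuffle=True, random_state=1234 variant. The helper name class_counts_per_fold is hypothetical and only there for illustration; the data is the same 9-sample example as above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Same toy data as in the post: features 1..9, labels cycling through 1, 2, 3.
X_data = np.arange(1, 10)
Y_data = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])


def class_counts_per_fold(skf):
    """Hypothetical helper: per fold, count test samples of each class."""
    counts = []
    for train_index, test_index in skf.split(X_data, Y_data):
        labels, n = np.unique(Y_data[test_index], return_counts=True)
        counts.append(dict(zip(labels.tolist(), n.tolist())))
    return counts


ordered = StratifiedKFold(n_splits=3)
shuffled = StratifiedKFold(n_splits=3, shuffle=True, random_state=1234)

print('ordered :', class_counts_per_fold(ordered))
print('shuffled:', class_counts_per_fold(shuffled))
```

In both cases every test fold should contain one sample of each class; shuffling changes which samples end up in which fold, not the class balance.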
Full source

from sklearn.model_selection import StratifiedKFold
import numpy as np

data = np.arange(1, 10).reshape(9, 1)
col = [[1], [2], [3], [1], [2], [3], [1], [2], [3]]
data = np.append(data, col, 1)

X_data = data[:, 0]
Y_data = data[:, 1]

skf = StratifiedKFold(n_splits=3)
#skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1234)
for train_index, test_index in skf.split(X_data, Y_data):
    print('--------------------')
    X, test_X = X_data[train_index], X_data[test_index]
    print('X=', X, 'test_X=', test_X)
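The two steps mentioned for Keras (create n_splits sets, then train and evaluate once per set) can be sketched roughly as below. This is only an illustration under assumptions: scikit-learn's LogisticRegression stands in for a Keras model so the example stays self-contained, and the 30-sample toy data is made up for the sketch rather than taken from the post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Made-up toy data (an assumption for this sketch): 30 samples, 2 features,
# 3 classes whose centers are shifted apart so they are easy to separate.
rng = np.random.default_rng(0)
Y_data = np.repeat([0, 1, 2], 10)
X_data = rng.normal(size=(30, 2)) + Y_data[:, None] * 5.0

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1234)
scores = []
for train_index, test_index in skf.split(X_data, Y_data):
    # Build a fresh model for every fold so nothing learned on one
    # fold carries over into the next.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_data[train_index], Y_data[train_index])
    # Evaluate on the held-out fold only.
    scores.append(model.score(X_data[test_index], Y_data[test_index]))

print('fold accuracy =', scores)
print('mean accuracy =', np.mean(scores))
```

With a Keras model the loop would have the same shape: build and compile a new model inside the loop, fit it on the training indices, and evaluate it on the test indices.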