supervised learning - What is the reason for splitting data into training/testing sets in SOM? -


I am doing research and reading papers that use the SOM algorithm, and I don't understand the logic behind splitting the dataset into training/test sets for a SOM. I mean, for example, when C4.5 decision trees are used, the trained structure includes rules that are applied when a new dataset (the test set) arrives, in order to classify the data in it. But what kind of rules, or anything similar, is generated after a system is trained via SOM? What is the difference if I apply 100% of the data to the SOM system instead of using 30% for training first and 70% for testing? Thanks for any answers in advance.

For every data-dependent system that is supposed to be exposed to new data in the future, holding out part of the existing data for testing gives you the ability to robustly predict how it will perform once deployed. A SOM learns an embedding of the specific data it was trained on. If you use all the data for training and later want to apply the trained SOM to never-before-seen data, you have no guarantees about how it will behave (how good its representation is for the task at hand). Having a hold-out set lets you test this in a controlled environment: you train the SOM representation on part of the data and then apply it to embed the hold-out (test) part, which simulates "what will happen if new data arrives and I want to use my SOM on it". The same applies to every single algorithm that consumes data, no matter whether it is supervised or not: if you are going to deploy anything based on your model, you need a test set to build confidence in your own solution. If, on the other hand, you are doing exploratory analysis of a "closed" set of data, then unsupervised methods can be applied to all of it (since you are only asking "what structure is in this particular dataset?").
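To make this concrete, here is a minimal pure-Python sketch of the train/hold-out workflow for a SOM. The SOM implementation, hyperparameters, and the two-blob synthetic dataset are all my own illustrative choices, not taken from any particular paper or library: the point is simply that the map is fitted on the training split only, and the quantization error on the unseen test split estimates how well the learned representation will fit future data.

```python
import math
import random

def bmu(weights, x):
    """Return the best-matching unit's grid position and its distance to x."""
    best, best_d2 = None, float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d2 = sum((a - b) ** 2 for a, b in zip(w, x))
            if d2 < best_d2:
                best_d2, best = d2, (i, j)
    return best, math.sqrt(best_d2)

def train_som(data, rows=4, cols=4, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a tiny SOM with a Gaussian neighborhood and linearly decaying
    learning rate / neighborhood radius (a common, simple schedule)."""
    rng = random.Random(seed)
    # initialize each unit's weight vector from a random training sample
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    for t in range(epochs):
        frac = 1.0 - t / epochs
        lr, sigma = lr0 * frac, sigma0 * frac + 1e-3
        for x in data:
            (bi, bj), _ = bmu(weights, x)
            for i in range(rows):
                for j in range(cols):
                    g = math.exp(-((i - bi) ** 2 + (j - bj) ** 2)
                                 / (2 * sigma * sigma))
                    w = weights[i][j]
                    for k in range(len(w)):
                        w[k] += lr * g * (x[k] - w[k])
    return weights

def quantization_error(weights, data):
    """Mean distance from each sample to its best-matching unit."""
    return sum(bmu(weights, x)[1] for x in data) / len(data)

# Synthetic 2-D data: two noisy blobs (made up purely for illustration).
rng = random.Random(42)
data = ([[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(100)]
        + [[rng.gauss(1.0, 0.1), rng.gauss(1.0, 0.1)] for _ in range(100)])
rng.shuffle(data)

# 70/30 split: the SOM only ever sees `train`; `test` simulates future data.
cut = int(0.7 * len(data))
train, test = data[:cut], data[cut:]

som = train_som(train)
qe_train = quantization_error(som, train)
qe_test = quantization_error(som, test)
print(f"quantization error: train={qe_train:.3f} test={qe_test:.3f}")
```

If the test-set quantization error is close to the training-set error, the learned map generalizes; if it is much larger, the SOM has memorized the training data's quirks, which is exactly what the hold-out split is there to detect.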
