This paper investigates the problem of reducing the cost of data collection. To select an adequate regression or classification model, a sample set of minimum sufficient size must be collected. The sample set is assumed to follow a data generation hypothesis. In particular, generalised linear regression models assume that the target variable is independent and identically distributed. The paper analyses several numerical methods of sample size estimation and compares them in practical terms. These include statistical, heuristic and Bayesian methods. Since the practical goal of collecting a sample set is modelling, some of the methods involve analysis of the model parameters. The computational experiment uses widely used sample sets. Open-source code and software are provided for practitioners to use when planning data collection.