Network-based dimensionality reduction of high-dimensional, low-sample-size datasets

In data science, many datasets suffer from the high-dimensional, low-sample-size (HDLSS) problem; however, only a few dimensionality reduction methods are applicable to this type of problem, and to date there is no nonparametric solution. The purpose of this work is to develop a novel network-based (nonparametric) dimensionality reduction analysis (NDA) method that can be effectively applied to HDLSS data. First, the NDA method specifies the correlation graph of the variables. Next, a modularity-based community detection method partitions this graph into a set of modules. Then, the latent variables (LVs) are determined as linear combinations of the variables in each module, weighted by their eigenvector centralities (EVCs). In an optional variable-selection phase, variables with low EVCs and low communality are discarded. The result is a set of LVs together with the set of indicators belonging to each LV. NDA is applied to publicly available databases and compared with principal factor analysis (PFA) methods. The results show that NDA can be effectively applied to HDLSS datasets and outperforms the existing methods in terms of interpretability. In addition, NDA is easier to apply: because it is nonparametric, there is no need to specify the number of latent variables.
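The pipeline described in the abstract (correlation graph, modularity-based modules, EVC-weighted linear combinations) can be illustrated with a small dependency-free sketch. This is not the authors' implementation, and several choices are assumptions made only for illustration: Pearson correlations; an edge only where r is at or above a hypothetical `threshold` cut-off (negative correlations are ignored); a naive greedy agglomerative modularity maximisation in the spirit of Clauset-Newman-Moore standing in for whatever community detection method the authors use; eigenvector centrality computed within each module by power iteration on A + I and normalised to sum to 1; and z-standardised variables. The optional variable-selection step (low EVC, low communality) is omitted. All function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_graph(cols, threshold):
    """Adjacency matrix with an edge i-j of weight r_ij when r_ij >= threshold.

    Assumption: only positive correlations above a hypothetical cut-off are kept.
    """
    p = len(cols)
    adj = [[0.0] * p for _ in range(p)]
    for i, j in combinations(range(p), 2):
        r = pearson(cols[i], cols[j])
        if r >= threshold:
            adj[i][j] = adj[j][i] = r
    return adj


def greedy_modularity(adj):
    """Naive greedy agglomerative modularity maximisation (CNM-style, no heaps).

    Repeatedly merges the pair of communities with the largest positive
    modularity gain dQ = w_ab / m - d_a * d_b / (2 m^2).
    """
    p = len(adj)
    deg = [sum(row) for row in adj]
    m = sum(deg) / 2.0  # total edge weight
    comms = [{i} for i in range(p)]
    if m == 0:
        return comms
    while True:
        best, pair = 0.0, None
        for a, b in combinations(range(len(comms)), 2):
            w_ab = sum(adj[i][j] for i in comms[a] for j in comms[b])
            d_a = sum(deg[i] for i in comms[a])
            d_b = sum(deg[j] for j in comms[b])
            gain = w_ab / m - d_a * d_b / (2.0 * m * m)
            if gain > best:
                best, pair = gain, (a, b)
        if pair is None:  # no merge improves modularity any further
            return comms
        a, b = pair  # a < b, so popping b leaves index a valid
        comms[a] |= comms.pop(b)


def eigenvector_centrality(adj, module, iters=200):
    """Power iteration on (A + I) restricted to one module; scores sum to 1."""
    nodes = sorted(module)
    x = [1.0] * len(nodes)
    for _ in range(iters):
        y = [x[k] + sum(adj[i][j] * x[l] for l, j in enumerate(nodes))
             for k, i in enumerate(nodes)]
        total = sum(y)
        x = [v / total for v in y]
    return dict(zip(nodes, x))


def standardize(col):
    """z-scores using the sample standard deviation (needs at least 2 samples)."""
    n = len(col)
    mu = sum(col) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in col) / (n - 1))
    return [(v - mu) / sd for v in col]


def nda_sketch(rows, threshold=0.5):
    """rows: samples x variables. Returns (module, EVC weights, LV scores) per module."""
    cols = [list(c) for c in zip(*rows)]
    adj = correlation_graph(cols, threshold)
    z = [standardize(c) for c in cols]
    latent = []
    for module in greedy_modularity(adj):
        w = eigenvector_centrality(adj, module)
        # LV score of each sample: EVC-weighted sum of standardised indicators
        scores = [sum(w[j] * z[j][s] for j in w) for s in range(len(rows))]
        latent.append((sorted(module), w, scores))
    return latent
```

On a toy HDLSS-style matrix with six samples and two groups of three strongly inter-correlated columns, the sketch should return two modules, each with its own LV. Note that the number of modules is not passed in anywhere; it falls out of the modularity optimisation, which mirrors the nonparametric property claimed for NDA. The naive pairwise merging is cubic or worse in the number of variables, so a real analysis would use an optimised community detection library instead.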
