Special Session 78: Special Session on Mathematics of Data Science and Applications

Beyond Unconstrained Features: Neural Collapse for Shallow Neural Networks with General Data
Shuyang Ling
NYU Shanghai
Peoples Rep of China
Co-Author(s):    Wanli Hong
Abstract:
Neural collapse (${\cal NC}$) is a phenomenon that occurs at the terminal phase of training of deep neural networks (DNNs): the features of data in the same class collapse to their respective sample means, and the sample means exhibit an equiangular tight frame (ETF). In the past few years, there has been a surge of works focused on explaining why ${\cal NC}$ occurs and how it affects generalization. Since DNNs are notoriously difficult to analyze, most works focus on the unconstrained feature model (UFM). While the UFM explains ${\cal NC}$ to some extent, it fails to provide a complete picture of how the network architecture and the dataset affect ${\cal NC}$. In this work, we focus on shallow ReLU neural networks and try to understand how the width, depth, data dimension, and statistical properties of the training dataset influence neural collapse. We provide a complete characterization of when ${\cal NC}$ occurs for two- or three-layer neural networks. This sufficient condition depends on the data dimension, sample size, and signal-to-noise ratio (SNR) in the data, rather than on the width. For three-layer neural networks, we show that ${\cal NC}$ occurs as long as the second layer is sufficiently wide. Moreover, we show that even if ${\cal NC}$ occurs, generalization can still be poor provided that the SNR in the data is sufficiently low. Our results significantly extend the state-of-the-art theoretical analysis of ${\cal NC}$ under the UFM to shallow nonlinear feature models, and characterize the emergence of ${\cal NC}$ via data properties and network architecture.
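
For reference, the two properties mentioned in the abstract can be sketched in their commonly used form (standard definitions of ${\cal NC}$, not notation taken from the paper itself): writing $h_{k,i}$ for the last-layer feature of the $i$-th sample in class $k$, $\mu_k$ for the class-$k$ sample mean, $\mu_G$ for the global mean, and $K$ for the number of classes,
\[
\text{(NC1)}\quad h_{k,i} \to \mu_k \ \text{for all } i,
\qquad
\text{(NC2)}\quad
\Big\langle \tfrac{\mu_k-\mu_G}{\|\mu_k-\mu_G\|},\, \tfrac{\mu_{k'}-\mu_G}{\|\mu_{k'}-\mu_G\|} \Big\rangle \to \tfrac{K\,\delta_{kk'}-1}{K-1}.
\]
Condition (NC1) is within-class variability collapse, and the limiting configuration of normalized class means in (NC2) is a simplex ETF.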