Abstract:
We study active learning of homogeneous ss-sparse halfspaces in RdRd under the setting where the unlabeled data distribution is isotropic log-concave and each label is flipped with probability at most ηη for a parameter η∈[0,12)η∈[0,12), known as the bounded noise. Even in the presence of mild label noise, i.e. ηη is a small constant, this is a challenging problem and only recently have label complexity bounds of the form ˜O(s⋅polylog(d,1ϵ))~O(s⋅polylog(d,1ϵ)) been established in [Zhang 2018] for computationally efficient algorithms. In contrast, under high levels of label noise, the label complexity bounds achieved by computationally efficient algorithms are much worse: the best known result [Awasthi et al. 2016] provides a computationally efficient algorithm with label complexity ˜O((slnd/ϵ)poly(1/(1−2η)))~O((slnd/ϵ)poly(1/(1−2η))), which is label-efficient only when the noise rate ηη is a fixed constant. In this work, we substantially improve on it by designing a polynomial time algorithm for active learning of ss-sparse halfspaces, with a label complexity of ˜O(s(1−2η)4polylog(d,1ϵ))~O(s(1−2η)4polylog(d,1ϵ)). This is the first efficient algorithm with label complexity polynomial in 11−2η11−2η in this setting, which is label-efficient even for ηη arbitrarily close to 1212. Our active learning algorithm and its theoretical guarantees also immediately translate to new state-of-the-art label and sample complexity results for full-dimensional active and passive halfspace learning under arbitrary bounded noise.
Chat is not available.