Several lines of research in linguistics and cognitive science suggest that humans process continuous speech by routinely chunking it into smaller units. The nature of this process remains debated, and the debate is complicated by the apparent existence of two distinct chunking processes, both of which seem to be motivated by the limitations of working memory. To cope with these limitations, humans appear both to combine items into larger units for later retrieval (usage-based chunking) and to partition incoming streams into temporal groups (perceptual chunking). To determine the linguistic properties and cognitive constraints of perceptual chunking, most previous research has employed short, constructed stimuli modeled on written language. In contrast, we presented linguistically naïve listeners with excerpts of natural speech drawn from corpora and collected their intuitive perceptions of chunk boundaries. We then used mixed-effects logistic regression models to determine the extent to which pauses, prosody, syntax, chunk duration, and surprisal predict chunk boundary perception. The results showed that all cues were important, suggesting cue degeneracy, but with substantial variation across listeners and speech excerpts. Chunk duration had a strong effect, supporting the cognitive constraint hypothesis. The direction of the surprisal effect supported the distinction between perceptual and usage-based chunking.
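As a schematic illustration only (the exact predictor coding and random-effects structure are assumptions, not reported in this section), a mixed-effects logistic regression of this kind can be written as

\[
\operatorname{logit} P(\text{boundary}_{ij} = 1) \;=\; \beta_0 \;+\; \mathbf{x}_j^\top \boldsymbol{\beta} \;+\; u_i \;+\; w_{e(j)},
\]

where \(i\) indexes listeners, \(j\) indexes candidate boundary positions, \(\mathbf{x}_j\) stacks the pause, prosodic, syntactic, chunk-duration, and surprisal predictors at position \(j\), and \(u_i\) and \(w_{e(j)}\) are random intercepts for the listener and the speech excerpt containing position \(j\), capturing the across-listener and across-excerpt variation noted above.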