Discovering words and rules from speech input: an investigation into early morphosyntactic acquisition mechanisms
Marchetto, Erika
2009-12-14
Abstract
To acquire language proficiently, learners have to segment fluent speech
into units (that is, words) and to discover the structural regularities underlying
word structure. Yet these problems are not independent: to varying degrees, all
natural languages express syntax as relations between nonadjacent word
subparts. This thesis explores how developing infants come to successfully solve
both tasks. The experimental work contained in the thesis approaches this issue
from two complementary directions: investigating the computational abilities of
infants, and assessing the distributional properties of the linguistic input directed
to children.
To study the nature of the computational mechanisms infants use to
segment the speech stream into words, and to discover the structural regularities
underlying words, I conducted seventeen artificial grammar studies. Across these
experiments, I test the hypothesis that infants may use different mechanisms to
learn words and word-internal rules. These mechanisms are hypothesized to be
triggered by different signal properties, and possibly to become available at
different stages of development. One mechanism is assumed to compute the
distributional properties of the speech input. The other mechanism is
hypothesized to be non-statistical in nature, and to project structural regularities
without relying on the distributional properties of the speech input.
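
As an illustration of what computing the distributional properties of a speech stream can mean, the sketch below estimates forward transitional probabilities between adjacent syllables. This is one common operationalization in the artificial grammar literature and is an assumption here: the abstract does not state which statistic the experiments manipulate. The syllables, the three "words", and the function name `transitional_probabilities` are all invented for the example.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Forward TP(a -> b) = count(a followed by b) / count(a in non-final position)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


# Hypothetical toy lexicon: three invented trisyllabic "words" (not from the thesis).
words = {
    "A": ["pu", "ra", "ki"],
    "B": ["be", "li", "ga"],
    "C": ["ta", "fo", "du"],
}

# Concatenate the words without pauses to mimic a continuous stream.
order = "ABCACBABCBAC"
stream = [syllable for label in order for syllable in words[label]]

tps = transitional_probabilities(stream)

# Syllable pairs inside a word always co-occur; pairs straddling a boundary do not.
within_word = tps[("pu", "ra")]
across_boundary = tps[("ki", "be")]
print(within_word, across_boundary)
```

In this toy stream, within-word transitions are fully predictable while transitions across word boundaries are not, which is the kind of contrast a purely statistical mechanism could exploit to segment a continuous stream.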
Infants at different ages (namely, 7, 12 and 18 months) are tested on their
ability to detect statistically defined patterns, and to generalize structural
regularities appearing inside word-like units. Results show that 18-month-old
infants can both extract statistically defined sequences from a continuous stream
(Experiment 12) and find word-internal rules, the latter only if the familiarization
stream is segmented (Experiments 13 and 14). Twelve-month-olds can also segment
words from a continuous stream (Experiment 5), but they cannot detect
word-straddling sequences even if they are statistically informative (Experiments 15
and 16). In contrast, they readily generalize word-internal regularities to novel
instances after exposure to a segmented stream (Experiments 1-3 and 17), but not
after exposure to a continuous stream (Experiment 4). Seven-month-olds, by
comparison, compute neither statistics (Experiments 10 and 11) nor within-word
relations (Experiments 6 and 7), regardless of input properties. Overall, the results suggest
that word segmentation and structural generalization rely on distinct
mechanisms that require different signal properties to be activated: the
presence of segmentation cues is mandatory for the discovery of structural
properties, while a continuous stream supports the extraction of statistically
defined patterns. Importantly, the two mechanisms have different
developmental trajectories: generalizations become readily available from 12
months, while statistical computations remain rather limited throughout the first year.
To understand how the selectivities and limits of these computational
mechanisms match up with the limitations and properties of
natural language, I evaluate the distributional properties of speech directed to
children. These analyses aim to assess, with quantitative and qualitative
measures, whether the input children hear offers a reliable basis for the
acquisition of morphosyntactic rules. I examine Italian, a language with
a rich and complex morphology, evaluating whether the word forms used in
speech directed to children provide sufficient evidence of the
morphosyntactic rules of this language. Results show that the speech directed to
children is highly systematic and consistent. The most frequently used word
forms are also morphologically well-formed words in Italian: thus, frequency
information correlates with structural information, such as the morphological
structure of words. While a statistical analysis of the speech input may provide a
small set of words occurring with high frequency, how learners come to extract
structural properties from them is a separate problem. In accordance with the results of
the infant studies, I propose that structural generalizations are projected on a
different basis than statistical computations.
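
The first step of such a corpus analysis, counting how often each word form occurs, can be sketched as follows. This is a minimal illustration only: the utterances are invented for the example rather than drawn from the corpus analyzed in the thesis, and the helper name `word_form_frequencies` is hypothetical.

```python
from collections import Counter


def word_form_frequencies(utterances):
    """Count how often each whitespace-separated word form occurs."""
    counts = Counter()
    for utterance in utterances:
        counts.update(utterance.lower().split())
    return counts


# Invented child-directed utterances (illustrative, not real corpus data).
utterances = [
    "guarda la palla",
    "prendi la palla",
    "guarda il cane",
    "mangia la pappa",
]

freqs = word_form_frequencies(utterances)

# The most frequent forms are the candidates a frequency-based learner would pick up first.
for form, count in freqs.most_common(3):
    print(form, count)
```

A frequency ranking of this kind identifies which forms are most available to the learner, but it says nothing about their internal morphological structure, which is the gap the paragraph above points to.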
Overall, the results of both the artificial grammar studies and the corpus
analysis are compatible with the hypothesis that segmenting words from fluent
speech and learning the structural regularities underlying word structure rely on
statistical and non-statistical cues, respectively. This places constraints on
computational mechanisms that differ in nature and selectivity in early
development.
Rights
open access
Date acquired
Apr 19, 2024