Desiderata for a morphology learner
Definitions
First, we define some concepts:
- stem
The underlying base of a non-compound word, possibly corresponding to the word’s lemma form or root, but we view the choice of stem as a modeling decision. We use the word “stem” and not “root,” to include more shallow types of analysis that may seek only to identify the affixes (and what’s left behind), and not to recover the fundamental word root.
- stem class
The type of the stem, often related to the familiar notion of part-of-speech, which determines which affixes may join with it. For more limited morphological analysis, we may constrain the stem and final word classes to be equal.
- surface affix
What is normally referred to as an affix, realized as zero or more characters of text.
- functional affix
An abstract affix that expresses a single underlying function of a surface affix, e.g. Number, Person, Tense, Aspect, etc., which may determine the final word class. Note that multiple functional affixes often map to a single surface affix.
- affix vector
A value in the space (either surface or functional) of all possible affix combinations allowed in a language.
- affix position
A set of one or more affixes that are mutually dependent and thus modeled jointly, occupying a single index in an affix vector. The concept of a position is also useful for specifying the mapping between functional and surface affixes. For example, in Spanish we may have multiple function positions for person, tense, etc. mapping to a single, “verb suffix” surface position.
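To make the definitions concrete, here is a minimal Python sketch of how a stem, a functional affix vector, and a surface affix vector might be represented, using the Spanish case where several functional positions collapse into one surface suffix position. The names `Stem`, `SURFACE_MAP`, and `realize`, the class label "verb-ar", and the two table entries are assumptions made for illustration, not part of any established analysis.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# One (position, value) pair per functional position, e.g. ("person", "1").
FunctionalVector = Tuple[Tuple[str, str], ...]
# One string per surface position; here there is a single "verb suffix" position.
SurfaceVector = Tuple[str, ...]


@dataclass(frozen=True)
class Stem:
    form: str        # the stem's characters, e.g. "habl"
    stem_class: str  # hypothetical class label, e.g. "verb-ar"


# Hypothetical mapping from (stem class, functional vector) to a surface vector.
# Three functional positions map to one surface position.
SURFACE_MAP: Dict[Tuple[str, FunctionalVector], SurfaceVector] = {
    ("verb-ar", (("person", "1"), ("number", "pl"), ("tense", "pres"))): ("amos",),
    ("verb-ar", (("person", "3"), ("number", "sg"), ("tense", "pres"))): ("a",),
}


def realize(stem: Stem, f: FunctionalVector) -> str:
    """Concatenate the stem with the surface affixes selected by f."""
    surface = SURFACE_MAP[(stem.stem_class, f)]
    return stem.form + "".join(surface)


hablar = Stem(form="habl", stem_class="verb-ar")
first_plural = (("person", "1"), ("number", "pl"), ("tense", "pres"))
word = realize(hablar, first_plural)  # stem "habl" plus surface suffix "amos"
```

Keying the table on the stem class as well as the functional vector reflects the definition above: the class determines which affixes may join with the stem.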
Word Formation
Next, we describe a simplified model of word formation that, as suggested above, ignores affixes that change the class of the stem. This simplification is sufficient for many applications, such as smoothing to address data sparsity. The framework allows us to model, for example, “working” as verbal stem “work” plus progressive suffix “ing,” but not “worker” as “work” plus nominalizing suffix “er.” Voilà:
- Select a word class t, given the utterance context.
- Select a word stem s in t, given the utterance context.
- Select a vector of functional affixes f, given (t, s) and the utterance context.
- Given (t, s, f), select a stem transformation (possibly identity) to produce the final stem s’.
- Given (t, s’, f), map f to a surface affix vector a.
- Generate the final word w from (t, s’, a). For most familiar languages, this step is deterministic, but some languages allow affixes to be permuted with each other and even with the stem.
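The six steps above can be sketched as a small sampler. This is only an illustration under strong assumptions: the utterance context is ignored, every choice is uniform, the final step is plain concatenation, and the tables `STEMS`, `FUNCTIONAL_VECTORS`, and `SURFACE_VECTORS` are toy data invented for the example.

```python
import random

# Toy lexicon and affix tables; all entries are illustrative assumptions.
STEMS = {"verb": ["work", "bake"]}
FUNCTIONAL_VECTORS = {"verb": [("bare",), ("progressive",), ("past",)]}
SURFACE_VECTORS = {
    ("verb", ("bare",)): ("",),
    ("verb", ("progressive",)): ("ing",),
    ("verb", ("past",)): ("ed",),
}


def transform_stem(t, s, f):
    """Step 4: drop a final 'e' for any non-bare affix vector, else identity."""
    if s.endswith("e") and f != ("bare",):
        return s[:-1]
    return s


def realize(t, s, f):
    """Steps 4 to 6: transform the stem, map f to surface affixes, concatenate."""
    s_final = transform_stem(t, s, f)
    a = SURFACE_VECTORS[(t, f)]
    return s_final + "".join(a)


def generate(rng):
    """Steps 1 to 3 as uniform draws (context is ignored), then realize."""
    t = rng.choice(sorted(STEMS))          # step 1: word class
    s = rng.choice(STEMS[t])               # step 2: stem in that class
    f = rng.choice(FUNCTIONAL_VECTORS[t])  # step 3: functional affix vector
    return t, s, f, realize(t, s, f)


t, s, f, w = generate(random.Random(0))
```

A real learner would replace the uniform draws with distributions conditioned on context, and the stem transformation with something learned rather than a single hand-written rule.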
Now we extend this process with a few modifications to handle affixes that change the class of the stem:
- Select a word class t, given the utterance context.
- Select a stem class c such that t is derivable, that is, c admits one or more affix vectors that will produce a final word class t when adjoined to a stem of class c.
- Select a word stem s in c, given the utterance context.
- Select a vector of functional affixes f, given (t, c, s) and the utterance context, such that f maps the stem class c to the final word class t.
- Given (t, c, s, f), select a stem transformation (possibly identity) to produce the final stem s’.
- Given (t, c, s’, f), map f to a surface affix vector a.
- Generate the final word w from (t, c, s’, a).
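The extra constraint in the extended process, that the stem class c and the affix vector f must together yield the word class t, can be sketched with a lookup table. The table `RESULT_CLASS`, the affix labels, and the helper names below are hypothetical, and stem transformations are left as identity to keep the example short.

```python
# Hypothetical table: (stem class, functional affix vector) -> final word class.
RESULT_CLASS = {
    ("verb", ()): "verb",
    ("verb", ("progressive",)): "verb",
    ("verb", ("agentive",)): "noun",   # class-changing affix
    ("noun", ()): "noun",
    ("noun", ("plural",)): "noun",
}

# Hypothetical surface realization of each functional vector.
SURFACE_VECTORS = {
    ("verb", ()): ("",),
    ("verb", ("progressive",)): ("ing",),
    ("verb", ("agentive",)): ("er",),
    ("noun", ()): ("",),
    ("noun", ("plural",)): ("s",),
}


def stem_classes_for(t):
    """Stem classes c from which word class t is derivable."""
    return sorted({c for (c, _f), out in RESULT_CLASS.items() if out == t})


def vectors_for(c, t):
    """Functional affix vectors that map stem class c to word class t."""
    return [f for (c2, f), out in RESULT_CLASS.items() if c2 == c and out == t]


def realize(c, s, f):
    """Concatenate stem and surface affixes (identity stem transformation)."""
    return s + "".join(SURFACE_VECTORS[(c, f)])


noun_sources = stem_classes_for("noun")        # both noun and verb stems qualify
verb_to_noun = vectors_for("verb", "noun")     # only the agentive vector
worker = realize("verb", "work", ("agentive",))
```

With this table, “worker” becomes analyzable as a verb stem plus a nominalizing suffix, which the simplified process could not express.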
Desiderata
Finally, the promised desiderata for a morphology learner:
- If an analysis for word w with the stem (c, s) is correct, we expect to observe a number of other words that also have an analysis with (c, s), but with different, valid affix vectors.
- Conversely, the hypothesis that a word is atomic (unanalyzed) is an implicit claim that the word will not appear as the stem in other words with affixes.
- Homographs make things more difficult, as one word token may appear with different classes, some atomic and some not, e.g. “get caught up in the ins and outs of something” or Persian “در در(ها)” (“dr dr(hA)” = “at the door(s)”, an atomic preposition plus inflecting noun). Thus context must be part of the model and learning process, and a word’s marginal hypothesis distribution should never be too skewed, especially between atomic and analyzed classes.
- It would be nice to be able to integrate a discovery process, such as the method of Yarowsky and Wicentowski, with (possibly partial) knowledge of the morphological system.
- Another aspect of Y&W that would be nice: modeling the expectation of regularity in the relative frequencies of related word pairs, e.g. we expect
count(walk)/count(walking) to be roughly equal to count(singe)/count(singeing).
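The frequency-ratio expectation in the last item can be checked with a few lines of Python. This is a rough sketch rather than the Y&W method itself: it compares the log count ratio of a candidate (stem, inflected form) pair against the mean log ratio of pairs assumed to be correct. The counts in `COUNTS` are invented for illustration, and `log_ratio` and `ratio_deviation` are hypothetical helper names.

```python
import math

# Invented corpus counts, for illustration only.
COUNTS = {
    "walk": 1000, "walking": 400,
    "talk": 800, "talking": 320,
    "singe": 10, "singeing": 4,
    "sing": 900,
}


def log_ratio(counts, base, derived):
    """Log of count(base) / count(derived)."""
    return math.log(counts[base] / counts[derived])


def ratio_deviation(counts, candidate, reference_pairs):
    """Absolute gap between the candidate's log ratio and the reference mean."""
    reference = [log_ratio(counts, b, d) for b, d in reference_pairs]
    expected = sum(reference) / len(reference)
    return abs(log_ratio(counts, *candidate) - expected)


reference_pairs = [("walk", "walking"), ("talk", "talking")]

# "singeing" paired with its true stem "singe" matches the reference ratio...
good = ratio_deviation(COUNTS, ("singe", "singeing"), reference_pairs)
# ...while pairing it with the unrelated, far more frequent "sing" does not.
bad = ratio_deviation(COUNTS, ("sing", "singeing"), reference_pairs)
```

Working in log space makes the comparison symmetric: a ratio twice as large as expected is penalized the same as one half as large.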