Plan B
Current processing cycles are being devoted to the following basic question: Should I try to straddle two difficult topics, morphology and syntax, for my impending quals, or go for expediency and stick with one, staying the course on morphology?
Put more cynically, should I cling to that last idealistic drop of PhD motivation in my body, the drive to do something novel and exciting, the last, tenuous hope for a home run that will make the last 9 1/2 innings of drudgery seem worthwhile? Or should I just accept that those dreams are done and that now all I want is the paper reward, that piece of parchment suitable for framing and the little acronym that says: resistiré?
The Dreamer declaims the following:
- These ideas are exciting! They are novel, with nice linguistic foundations (albeit unorthodox), and could be a strong development in unsupervised and low-resource grammar learning, and in MT.
- The high bar for the quals is a bit self-imposed.
- The two morphology chapters plus the syntax smoothing (probably feasible for the spring) are sufficient for the quals, so I can still meet that deadline.
- With the smoothing completed, the grammar transformations are mostly done.
- Then I spend most of the final year on unsupervised learning, with the MT results limited to the most straightforward applications of it.
- If not syntax, what then? What novel work would you do in morphology to fill out a thesis? Especially since everyone and their cousin has taken a pass at it!
To which The Pragmatist retorts:
- They are exciting, but extremely speculative and risky. If you’d developed them in year two or even three, that would have been a great time to try something big. But we’re starting year five now, and it’s time to finish, not to finesse.
- Yes, but then you push more work until after the quals, and do you really want to be here past May 2009?
- A bit hopeful, and that's assuming mountains of SpeechLinks work don’t come crashing down. Also, no chance for a COLING paper, because it’s pretty clear that the current papers will occupy me fully through January 10th.
- Yes, but again no small piece of work. 6 months is a safe estimate, so that takes us through the NAACL deadline, without starting on the unsupervised learning, which is harder!
- I’d call it 18 months after the quals. Want to stay through December?
- Ah, you have me there a bit, but I can come up with something. Just watch me….
The Pragmatist’s Plan B
The mantra of Plan B is this: My hammer is morphology, so what can I nail? Put more concretely, we find languages with interesting morphology, model them, and evaluate the performance of the segmentation and its use in applications.
A little brainstorming:
- Models
  - Basic concatenative morphology
  - Functional and surface affixes
  - Agglutinative morphology
  - Template morphology
  - Affix ordering (e.g. Chintang)
  - Phonological modeling
- Learning
  - EM and variants
  - Log-linear: contrastive, etc.
  - Full semi-supervised, i.e. combine small supervised model with unsupervised.
  - Using Y&W approaches to discover patterns, in addition to specification.
- MT
  - Segmentation and training techniques (IN PROGRESS).
  - Factored morphological models.
  - Translation with isolating language, mapping morphemes to particles (Vietnamese, Chinese).
- Other applications
  - Dependency parsing? Train McDonald on segmented data.
  - Joint tagging and segmentation.
Now let’s translate this into chapters for the quals:
- Models of Morphological Systems
- Learning Morphology
  - EM and variants
  - Contrastive estimation
  - Partial specification (Y&W proposals)
- Translating Morphemes
  - Persian and Spanish, maybe Czech (but what to offer over G&M).
- Proposed: Factored Translation
  - Basic improvements
  - Isolating languages: identifying particles, etc.
- Proposed: Other Applications
  - Joint tagging
  - Dependency parsing
I’m not sure I like this division of the work, especially separating the models from the learning. Let’s try another:
- Learning Morphology by Specification
  - Basic concatenative model, with stem change, orders 1-3.
  - EM and variants.
  - Contrastive estimation (note: this requires generation!)
- Beyond Concatenative Morphology
  - Agglutinative, templatic, permutative.
  - Functional and surface affix modeling.
  - Phonological modeling.
- Translating Morphemes
  - Persian work, with patterns and above models.
  - Some other exciting language(s).
- Proposed: Learning from Partial Specifications
  - Use hypothesis proposals from Y&W models and others, extend specification.
  - Also semi-supervised training, with small annotated data set.
- Proposed: Factored Translation
  - Basic improvements on morpheme translation.
  - Isolating languages: identifying particles, etc.
- Proposed: Other Applications
  - Dependency parsing of morphemes (McDonald plus certain attachment constraints).
  - Joint tagging and segmentation (anything really to be done here?).
Ok, I think that’s better. Call it a plan.