Plan B

Current processing cycles are being devoted to the following basic question: Should I try to straddle two difficult topics, morphology and syntax, for my impending quals, or go for expediency and stick with one, staying the course on morphology?

Put more cynically, should I cling to that last idealistic drop of PhD motivation in my body, the drive to do something novel and exciting, the last, tenuous hope for a home run that will make the last 9 1/2 innings of drudgery seem worthwhile? Or just accept that those dreams are done and that now all I want is the paper reward, that piece of parchment suitable for framing and the little acronym that says: resistiré.

The Dreamer declaims the following:

  1. These ideas are exciting! They are novel, with nice linguistic foundations (albeit unorthodox), and could be a strong development in unsupervised and low-resource grammar learning, and in MT.
  2. The high bar for the quals is a bit self-imposed.
    1. The two morphology chapters plus the syntax smoothing (probably feasible for the spring) are sufficient for the quals, so I can still meet that deadline.
    2. With the smoothing completed, the grammar transformations are mostly done.
    3. Then I spend most of the final year on unsupervised learning, with the MT results limited to the most straightforward applications of it.
  3. If not syntax, what then? What novel work would you do in morphology to fill out a thesis? Especially since everyone and their cousin has taken a pass at it!

To which The Pragmatist retorts:

  1. They are exciting, but extremely speculative and risky. If you’d developed them in year two or even three, that would have been a great time to try something big. But we’re starting year five now, and it’s time to finish, not to finesse.
  2. Yes, but then you push more work past the quals, and do you really want to be here past May 2009?
    1. A bit hopeful, assuming mountains of SpeechLinks work don’t come crashing down. There is also no chance for a COLING paper, because it’s pretty clear that the current papers will occupy me fully through January 10th.
    2. Yes, but again no small piece of work. Six months is a safe estimate, so that takes us through the NAACL deadline, without starting on the unsupervised learning, which is harder!
    3. I’d call it 18 months after the quals. Want to stay through December?
  3. Ah, you have me there a bit, but I can come up with something. Just watch me….

The Pragmatist’s Plan B

The mantra of Plan B is this: My hammer is morphology, so what can I nail? Put more concretely, we find languages with interesting morphology, model them, and evaluate the performance of the segmentation and its use in applications.

A little brainstorming:

  1. Models
    1. Basic concatenative morphology
    2. Functional and surface affixes
    3. Agglutinative morphology
    4. Template morphology
    5. Affix ordering (e.g. Chintang)
    6. Phonological modeling
  2. Learning
    1. EM and variants
    2. Log-linear: contrastive, etc.
    3. Full semi-supervised, i.e. combine small supervised model with unsupervised.
    4. Using Y&W approaches to discover patterns, in addition to specification.
  3. MT
    1. Segmentation and training techniques (IN PROGRESS).
    2. Factored morphological models.
    3. Translation with isolating languages, mapping morphemes to particles (Vietnamese, Chinese).
  4. Other applications
    1. Dependency parsing? Train McDonald on segmented data.
    2. Joint tagging and segmentation.

Now let’s translate this into chapters for the quals:

  1. Models of Morphological Systems
  2. Learning Morphology
    1. EM and variants
    2. Contrastive estimation
    3. Partial specification (Y&W proposals)
  3. Translating Morphemes
    1. Persian and Spanish, maybe Czech (but what to offer over G&M?).
  4. Proposed: Factored Translation
    1. Basic improvements
    2. Isolating languages: identifying particles, etc.
  5. Proposed: Other Applications
    1. Joint tagging
    2. Dependency parsing

I’m not sure I like this division of the work, especially separating the models from the learning. Let’s try another:

  1. Learning Morphology by Specification
    1. Basic concatenative model, with stem change, orders 1-3.
    2. EM and variants.
    3. Contrastive estimation (note: this requires generation!)
  2. Beyond Concatenative Morphology
    1. Agglutinative, templatic, permutative.
    2. Functional and surface affix modeling.
    3. Phonological modeling.
  3. Translating Morphemes
    1. Persian work, with patterns and above models.
    2. Some other exciting language(s).
  4. Proposed: Learning from Partial Specifications
    1. Use hypothesis proposals from Y&W models and others to extend the specification.
    2. Also semi-supervised training, with small annotated data set.
  5. Proposed: Factored Translation
    1. Basic improvements on morpheme translation.
    2. Isolating languages: Identifying particles, etc.
  6. Proposed: Other Applications
    1. Dependency parsing of morphemes (McDonald plus certain attachment constraints).
    2. Joint tagging and segmentation (anything really to be done here?).

Ok, I think that’s better. Call it a plan.
