CollexDesignDecisions

From MWCSWiki


Design decisions 03/31/09:

  1. The lexeme page is a sort of one-stop shop, with links to the form and sense pages. It shows forms and senses clearly grouped by POS. All forms and all senses for a given POS are shown, even in the rare case where some of the forms do not apply to all of the senses, or vice versa.
  2. The search results page will show two sections: the lexemes matched by the query string and the forms matched by it. Each lexeme shown will have its gloss in parentheses, so that lexemes with the same spelling can be told apart. Each form shown will also display its parent lexeme, gloss, and POS.
  3. Forms are EITHER associated with a sense, OR associated with a lexeme and POS. (Note: the DB doesn't model it exactly this way; there, a form is associated with a lexeme and has a POS attribute.)
  4. A sense page should show only (a) forms associated with that sense and (b) forms associated with ALL senses of that POS of the parent lexeme that are NOT explicitly invalidated for that sense. (E.g., the "to hang a person" sense page shows "hanged" because it's an (a), and "hangs" because it's a (b), but not "hung", because although it's a (b), it is invalidated for "to hang a person".)
  5. A form page should show only (a) senses associated with that form and (b) senses of that POS of the lexeme that apply to ALL forms and are NOT explicitly invalidated for that form. (E.g., the "hanged" form page shows "to hang a person" because it's an (a), but not "to hang your clothes", because "hanged" is explicitly invalidated for that sense.)
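Rules 3 to 5 can be read as a single predicate shared by the sense page and the form page. The sketch below is one possible reading, not the actual DB design: it assumes a form stores an optional sense_id (None meaning "all senses of this POS") and that invalidations are kept as a hypothetical set of (form ID, sense ID) pairs. All class, field, and method names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Sense:
    id: int
    pos: str
    gloss: str


@dataclass(frozen=True)
class Form:
    id: int
    pos: str
    text: str
    # None means the form applies to ALL senses of its POS (rule 3).
    sense_id: Optional[int] = None


@dataclass
class Lexeme:
    headword: str
    senses: list[Sense]
    forms: list[Form]
    # Hypothetical store of explicit (form_id, sense_id) invalidations.
    invalidated: set[tuple[int, int]] = field(default_factory=set)

    def applies(self, form: Form, sense: Sense) -> bool:
        """True if the form is listed with the sense (and vice versa)."""
        if form.pos != sense.pos:
            return False
        if form.sense_id is not None:
            # Case (a): the form is tied to exactly one sense.
            return form.sense_id == sense.id
        # Case (b): a general form is shown unless explicitly invalidated.
        return (form.id, sense.id) not in self.invalidated

    def forms_for_sense(self, sense: Sense) -> list[str]:
        """Forms to list on a sense page (rule 4)."""
        return [f.text for f in self.forms if self.applies(f, sense)]

    def senses_for_form(self, form: Form) -> list[str]:
        """Senses to list on a form page (rule 5)."""
        return [s.gloss for s in self.senses if self.applies(form, s)]


person = Sense(1, "verb", "to hang a person")
clothes = Sense(2, "verb", "to hang your clothes")
hanged = Form(10, "verb", "hanged", sense_id=1)
hung = Form(11, "verb", "hung")
hangs = Form(12, "verb", "hangs")
hang = Lexeme("hang", [person, clothes], [hanged, hung, hangs], invalidated={(11, 1)})

print(hang.forms_for_sense(person))  # ['hanged', 'hangs']
print(hang.senses_for_form(hanged))  # ['to hang a person']
```

Driving both pages from the same applies check means a form listed on a sense page always lists that sense on its own form page, and vice versa, so the two views cannot drift apart.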

  • Lexeme
      • Lexemes have forms
      • Each form can be given with no further information, or can be annotated with one or more standard linguistic tags (past tense, plural, etc.) (this is pending confirmation from Judith/Paul/whoever)
      • Sample sentences are associated with FORMS, not lexemes
  • Morphemes
      • When the user enters a word, we gently and proactively invite them to specify morphemes. Either we ask, "Hey bud, are there any morphemes here you'd like to specify?" or "Hey, this word begins with 'un-', and I have an 'un-' on record; is that applicable here?"
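The "this word begins with 'un-'" prompt amounts to matching the entered word against affixes already on record. A minimal sketch, assuming (hypothetically) that affixes are stored as strings with a hyphen marking the attachment side ('un-' is a prefix, '-ness' a suffix); the function name is illustrative. It only produces suggestions; the user still confirms whether each one applies.

```python
def suggest_morphemes(word: str, known: list[str]) -> list[str]:
    """Return recorded affixes that the entered word might contain."""
    w = word.lower()
    suggestions = []
    for m in known:
        stem = m.strip("-").lower()
        # An affix must leave something behind, so skip ones as long as the word.
        if not stem or len(stem) >= len(w):
            continue
        if m.endswith("-") and not m.startswith("-") and w.startswith(stem):
            suggestions.append(m)  # prefix such as 'un-'
        elif m.startswith("-") and not m.endswith("-") and w.endswith(stem):
            suggestions.append(m)  # suffix such as '-ness'
    return suggestions


print(suggest_morphemes("unhappiness", ["un-", "re-", "-ness", "-ly"]))  # ['un-', '-ness']
```

This naive string match will also flag 'un-' for a word like "under", which is why the prompt asks rather than assumes.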

Open Questions:

  • What does the user think they're doing when they enter a "word"? Do they think they're entering a lexeme or a form? Do we offer both options, or do we defer that until we find out whether they checked the "this is a form of" option?

Should we have a separate DB for each language? Or one common to all languages?

Pros for separate DBs:

  • Referential integrity
  • Easier to code
  • No development changes required (as of 2/10)
  • Bulk import: a bad bulk import would affect only one language (with a common DB, could it jeopardize all languages?)

Pros for one common DB:

  • Admin intervention not needed for creating new DB
  • Searches could be cross-language
  • Words could be linked between languages

Conclusion on 2/10/09: speed to market is the driving factor, so we'll go with separate DBs.
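A per-language layout could look roughly like this. It is a minimal sketch assuming SQLite, with one file per language named collex_<code>.sqlite3 under a hypothetical collex_data/ directory; neither the engine, the naming, nor the schema is specified on this page. Creating a database is a separate admin step (the intervention a common DB would have avoided), and connect opens with mode=rw so a missing language fails loudly instead of silently creating an empty DB.

```python
import re
import sqlite3
from pathlib import Path

# Hypothetical location for the per-language databases.
DB_DIR = Path("collex_data")


def db_path_for(lang_code: str, base_dir: Path = DB_DIR) -> Path:
    """Map a language code (assumed: 2-3 lowercase letters) to its DB file."""
    if not re.fullmatch(r"[a-z]{2,3}", lang_code):
        raise ValueError(f"unexpected language code: {lang_code!r}")
    return base_dir / f"collex_{lang_code}.sqlite3"


def create_language_db(lang_code: str, base_dir: Path = DB_DIR) -> Path:
    """Admin-only step: create the database for a new language."""
    path = db_path_for(lang_code, base_dir)
    if path.exists():
        raise FileExistsError(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path)
    try:
        # Hypothetical minimal schema; the real tables are not defined here.
        conn.execute("CREATE TABLE lexeme (id INTEGER PRIMARY KEY, headword TEXT NOT NULL)")
        conn.commit()
    finally:
        conn.close()
    return path


def connect(lang_code: str, base_dir: Path = DB_DIR) -> sqlite3.Connection:
    """Open an existing language DB; raise sqlite3.OperationalError if it is missing."""
    uri = db_path_for(lang_code, base_dir).resolve().as_uri() + "?mode=rw"
    conn = sqlite3.connect(uri, uri=True)
    # Enforce foreign keys within this one language's data.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn
```

Because every language lives in its own file, foreign keys only ever point within one language, which is also why cross-language search and linking (listed under the common-DB option) would need extra work.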


Sample sentences:

Resolved:

  1. Every sample sentence must have at least one of: native, roman, IPA.
  2. When the user enters the sentence, they will be presented with the "do you want to link to these words?" page, which should use only one of the script types.
  3. If a word is linked to a sentence, it is linked to the sentence, period. (It doesn't matter whether it was linked via the roman, native, or any other script.)
  4. We'll store the sentence with inline markup rather than keeping a separate DB field called "offsetWithinSentence" or similar. For example, we might store: "I am <wordID:1967>really</wordID:1967> <wordID:9987>hungry</wordID:9987>!!"
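The inline markup in item 4 can be produced and parsed back with a regular expression. A minimal sketch, assuming the exact <wordID:N>text</wordID:N> shape shown above and that link positions are supplied as (start, end, word ID) offsets into the plain sentence; the function names are hypothetical.

```python
import re

# <wordID:N>text</wordID:N>; the backreference \1 requires matching IDs.
_LINK = re.compile(r"<wordID:(\d+)>(.*?)</wordID:\1>")


def mark_up(plain: str, links: list[tuple[int, int, int]]) -> str:
    """Wrap plain[start:end] in wordID tags for each (start, end, word_id)."""
    out = []
    pos = 0
    for start, end, word_id in sorted(links):
        if start < pos or end <= start or end > len(plain):
            raise ValueError(f"bad or overlapping link: {(start, end, word_id)}")
        out.append(plain[pos:start])
        out.append(f"<wordID:{word_id}>{plain[start:end]}</wordID:{word_id}>")
        pos = end
    out.append(plain[pos:])
    return "".join(out)


def linked_words(marked: str) -> list[tuple[int, str]]:
    """Return (word_id, linked text) pairs in sentence order."""
    return [(int(word_id), text) for word_id, text in _LINK.findall(marked)]


def plain_text(marked: str) -> str:
    """Strip the link markup, leaving the sentence as the user typed it."""
    return _LINK.sub(r"\2", marked)


marked = mark_up("I am really hungry!!", [(5, 11, 1967), (12, 18, 9987)])
print(marked)
# I am <wordID:1967>really</wordID:1967> <wordID:9987>hungry</wordID:9987>!!
print(linked_words(marked))  # [(1967, 'really'), (9987, 'hungry')]
```

One consequence of storing markup instead of offsets: sentence text that itself contains a literal "<wordID:" sequence would need escaping, which this sketch does not handle.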