SemanticQueryProject

From MWCSWiki

Fall semester planning

  1. Find a way to get the quantity of data in the inspector pane under control. (Note that this is "okay" since our first experiment is going to focus on the formulating-pivot-query question, not the deal-with-dirty-data question.)
    1. Choose simpler resources for experiment.
    2. Do some data "cleansing" to remove Wikipedia bookkeeping garbage.
    3. Prepare a list of predicates that they should consider.
  2. Implement the save/load feature
  3. Estimate size of "180 degree" task
    1. If not hard, implement this task. :)
  4. Implement session signon form
  5. Compare existing sem web browsers (Tabulator, Marbles, etc.) enough to pick one for our control group. Note that it must (a) be easy to use, (b) follow the see-one-resource-per-page paradigm, and (c) be pointable at dbpedia.
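
The data "cleansing" step in item 1 could be prototyped roughly as below. This is only a sketch: it assumes triples arrive as (subject, predicate, object) tuples of strings, and that Wikipedia bookkeeping predicates can be recognized by a local-name prefix such as `wikiPage`. The prefix list and the helper names (`local_name`, `is_bookkeeping`, `cleanse`) are hypothetical and would need to be checked against what actually shows up in the inspector pane.

```python
# Assumption: bookkeeping predicates share a recognizable local-name prefix.
# This list is illustrative, not taken from the project.
BOOKKEEPING_PREFIXES = ("wikiPage",)


def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#' or '/'."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def is_bookkeeping(predicate: str) -> bool:
    """True if the predicate looks like Wikipedia bookkeeping."""
    return local_name(predicate).startswith(BOOKKEEPING_PREFIXES)


def cleanse(triples):
    """Drop triples whose predicate is bookkeeping; keep the rest in order."""
    return [t for t in triples if not is_bookkeeping(t[1])]
```

The same filter could later be inverted into an allow-list if we decide to show subjects only the prepared list of predicates from item 1.3.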

Experimental design

Control group and experimental group. Question is: does Smeagol allow users to pose pivot queries more effectively than a generic sem web browser does?

  • Control group: Marbles, demo on how to use Marbles, 4 progressively less-scripted examples, the first two of which point and click them to a small number of concrete results. Then, we measure the remaining two questions by "how often can people get a small number of answers by browsing?"
  • Experimental group: Smeagol, demo on how to use Smeagol, 4 progressively less-scripted examples, the first two of which point and click them to an exhaustive list of results. Then, we measure the remaining two questions by "how often can people get a whole list of results by pivoting?" (Note: if a user instead uses Smeagol in a Marbles fashion and gets a small list of concrete results, this is marked as a failure.)

One idea of how to measure this is: subjects electronically submit their answers, and we measure how many answers they get in a given timeframe. Marbles and non-pivot Smeagol people will have to do lots of traversing and cut/paste, whereas Smeagol pivot people can copy a list.
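
A rough sketch of how that measurement might be scored, assuming each electronic submission records who submitted it, the answer text, and how many seconds into the task it arrived. The `Submission` record, the `normalize` rule, and the idea of comparing against a prepared answer key are all assumptions for illustration, not decisions we have made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    subject_id: str         # hypothetical identifier for the test subject
    answer: str             # answer text as the subject typed or pasted it
    seconds_elapsed: float  # time since the task started


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(answer.lower().split())


def count_correct(submissions, answer_key, time_limit_s):
    """Per subject, count distinct correct answers submitted within the limit."""
    key = {normalize(a) for a in answer_key}
    found = {}
    for sub in submissions:
        # Every subject gets an entry, even with zero correct answers.
        hits = found.setdefault(sub.subject_id, set())
        answer = normalize(sub.answer)
        if sub.seconds_elapsed <= time_limit_s and answer in key:
            hits.add(answer)
    return {subject: len(hits) for subject, hits in found.items()}
```

Counting distinct answers means that pasting the same result twice does not inflate a score, which matters if pivot users copy a whole list at once.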

Next steps as of 7/15

  • Implement high-priority changes
  • Do some deep thinking about research question (or are we even there yet?)
  • Redo packet of materials to script Dickens more fully.
  • Take a nap until end of August. :)

Notes from first experiment

  1. Lessons learned from first experiment
  2. wildcarding is butt hard to understand
    1. possible reasons:
      1. hard to grasp what it is
      2. hard to grasp why it would be useful
      3. hard to grasp how to do it for a particular case
    2. we don't want examples (probably) that make it easy to avoid wildcarding
  3. dirty/incomplete/inconsistent data is a problem
  4. Tweaks to demo
    1. We need to explain sem web concepts better
      1. triple
        1. "they're all out there" (but do they need to know that?)
      2. pivot
      3. draw/show a graph
    2. We need to explain wildcarding better
      1. don't even use the term wildcard or any other noun. just speak of it as "see others like this"
  5. New experiment coming in early Sept.
    1. Demo (basically how Aaron had it, with more sem web concepts. includes UMW example.)
    2. Group walkthrough (Aaron does it and everyone else carries out the operations with him. includes a second example.)
    3. Packet item #1: heavily scripted Dickens.
    4. Packet item #2: Batman

Short-term task list

Lit review

High-risk items:

  1. Which toolkit/technology to use? (Ajax? Toolkit? Java app?) (AC)
  2. Connecting programmatically to Sindice/Falcons, and making sure there are no rate limits, latency problems, licensing issues, etc. (SD)
  3. Connecting to the dbpedia SPARQL endpoint, and making sure there are no rate limits, latency problems, licensing issues, etc. (JH)
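
For item 3, a quick latency probe might look like the sketch below. It assumes the endpoint lives at `https://dbpedia.org/sparql` and accepts the standard SPARQL protocol `query` parameter over HTTP GET; the URL and the helper names `build_request` and `timed_query` are assumptions to verify, and this does not by itself tell us anything about rate limits or licensing.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: public dbpedia endpoint URL; confirm before relying on it.
ENDPOINT = "https://dbpedia.org/sparql"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def timed_query(query: str, timeout_s: float = 10.0):
    """Run the query and return (parsed JSON results, seconds taken)."""
    start = time.monotonic()
    with urllib.request.urlopen(build_request(query), timeout=timeout_s) as resp:
        results = json.load(resp)
    return results, time.monotonic() - start
```

Calling `timed_query("SELECT * WHERE { ?s ?p ?o } LIMIT 1")` a few dozen times with a pause between calls would give a first feel for latency and for whether the endpoint starts refusing requests.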

Areas to explore:

  1. Anything that advertises itself as "a user-friendly alternative to SPARQL"
    • iSPARQL
    • MashQL
  2. Things that relate to "being able to see/understand the data before (or while) querying"
    • Older design efforts like:
      • RABBIT
      • QBE
  3. Psychological research into how people formulate questions

Technology choices

  • Identify one or two key "hard" aspects the GUI is likely to have
  • Investigate language/platform alternatives: roughly prototype the "hard" aspect(s) in:
    • Java Swing app
    • JavaScript/Ajax
    • JavaScript/Ajax with a toolkit like ExtJS or Dojo

Possible research questions

<table border=1 cellpadding=5>
<tr><th>key term</th><th>question</th><th>novelty</th><th>utility</th><th>doability</th></tr>
<tr><td>term discovery</td><td>How does the user discover appropriate terms for a query they already have in mind?</td><td>low</td><td>high</td><td>medium</td></tr>
<tr><td>construction ease</td><td>How does the user construct a simple query in something other than SPARQL?</td><td>low</td><td>high</td><td>medium</td></tr>
<tr><td>complex construction</td><td>How does the user construct a complex (pivot) query?</td><td>medium</td><td>???</td><td>medium-high</td></tr>
<tr><td>query discovery</td><td>How does the user discover what kinds of things are useful/possible to ask about a topic?</td><td>high (???)</td><td>medium-high</td><td>medium-low</td></tr>
<tr style="color:grey;"><td>statistics</td><td>Is it useful to present the user with statistical information about predicate/class/whatever usage, so they know what's commonly in use? (tied into query discovery)</td><td>high</td><td>high</td><td>high</td></tr>
<tr><td>reformulation</td><td>Building an interface which helps a user successively get to the query (and answer) they intend</td><td>medium-low</td><td>medium</td><td>???</td></tr>
</table>

High-level design issues

  • Are we querying a KB, or are we querying "the Sem Web?"
  • How does the user find predicates/types/resources/whatever to include in the query? (Step 1)
    • How does the user visualize what the graph actually looks like?
  • How does the user compose a complex query without using a query language? (Step 2)
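
One possible reading of Step 2, sketched under a loud assumption: that "wildcarding" a triple the user is looking at means replacing one of its positions with a variable, so the GUI can generate the SPARQL for "see others like this" behind the scenes. The function name `wildcard_query` is hypothetical, the sketch handles only URI terms (no literals), and the dbpedia URIs in the usage note are illustrative.

```python
def wildcard_query(subject, predicate, obj, limit=100):
    """Build a SPARQL SELECT from one triple; None marks a wildcarded position."""
    terms = []
    variables = []
    for name, value in (("s", subject), ("p", predicate), ("o", obj)):
        if value is None:
            # Wildcarded position becomes a query variable.
            terms.append(f"?{name}")
            variables.append(f"?{name}")
        else:
            # Assumption: every fixed term is a URI, written as <...>.
            terms.append(f"<{value}>")
    if not variables:
        raise ValueError("at least one position must be wildcarded")
    pattern = " ".join(terms)
    return (
        f"SELECT DISTINCT {' '.join(variables)} "
        f"WHERE {{ {pattern} . }} LIMIT {limit}"
    )
```

For example, `wildcard_query(None, "http://dbpedia.org/ontology/author", "http://dbpedia.org/resource/Charles_Dickens")` yields a query for every resource that has Dickens as its author, which is the kind of whole-list result the user would otherwise have to collect by browsing one page at a time.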