SemanticQueryProject
From MWCSWiki
Fall semester planning
- Find a way to get the quantity of data in the inspector pane under control. (Note that this is "okay" since our first experiment is going to focus on the formulating-pivot-query question, not the deal-with-dirty-data question.)
- Choose simpler resources for experiment.
- Do some data "cleansing" to remove Wikipedia bookkeeping garbage.
- Prepare a list of predicates that they should consider.
- Implement the save/load feature
- Estimate size of "180 degree" task
- If not hard, implement this task. :)
- Implement session signon form
- Compare existing sem web browsers (Tabulator, Marbles, etc.) enough to pick one for our control group. Note that it must (a) be easy to use, (b) follow the see-one-resource-per-page paradigm, and (c) be pointable at DBpedia.
Experimental design
Control group and experimental group. The question is: does Smeagol let users pose pivot queries more effectively than a generic sem web browser does?
- Control group: Marbles, demo on how to use Marbles, 4 progressively less-scripted examples, the first two of which lead them by point and click to a small number of concrete results. Then we measure the other two questions by "how often can people get a small number of answers by browsing?"
- Experimental group: Smeagol, demo on how to use Smeagol, 4 progressively less-scripted examples, the first two of which lead them by point and click to an exhaustive list of results. Then we measure the other two questions by "how often can people get a whole list of results by pivoting?" (Note: if a user instead uses Smeagol in a Marbles fashion and gets a small list of concrete results, this is marked as a failure.)
One idea of how to measure this: subjects electronically submit their answers, and we measure how many answers they get in a given timeframe. Marbles and non-pivot Smeagol users will have to do lots of traversing and cut/paste, whereas Smeagol pivot users can copy a list.
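One way to make this measure concrete is sketched below in Python. The submission format (a subject id, a time in seconds since the session started, and an answer string), the function name `count_answers`, and the sample data are all assumptions made for illustration; none of them reflects an existing logging format.

```python
from collections import defaultdict

def count_answers(submissions, correct, time_limit_s):
    """Count distinct correct answers per subject submitted within the time limit.

    submissions: iterable of (subject_id, seconds_since_start, answer) tuples
                 (hypothetical format).
    correct: set of acceptable answers, lowercased.
    """
    seen = defaultdict(set)
    for subject, t, answer in submissions:
        normalized = answer.strip().lower()
        if t <= time_limit_s and normalized in correct:
            seen[subject].add(normalized)
    return {subject: len(answers) for subject, answers in seen.items()}

# Made-up example data.
correct = {"oliver twist", "bleak house", "hard times"}
submissions = [
    ("s1", 40, "Oliver Twist"),
    ("s1", 95, "Bleak House"),
    ("s1", 700, "Hard Times"),    # submitted after the limit, not counted
    ("s2", 60, "oliver twist"),
    ("s2", 61, "Oliver Twist "),  # duplicate answer, counted once
]
scores = count_answers(submissions, correct, time_limit_s=600)
```

Subjects with no correct answers inside the window do not appear in the result, so a real analysis would probably want to fill in zeros for them.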
Next steps as of 7/15
- Implement high-priority changes
- Do some deep thinking about research question (or are we even there yet?)
- Redo packet of materials to script Dickens more fully.
- Take a nap until end of August. :)
Notes from first experiment
- Lessons learned from first experiment
- wildcarding is butt hard to understand
- possible reasons:
- hard to grasp what it is
- hard to grasp why it would be useful
- hard to grasp how to do it for a particular case
- we don't want examples (probably) that make it easy to avoid wildcarding
- dirty/incomplete/inconsistent data is a problem
- Tweaks to demo
- We need to explain sem web concepts better
- triple
- "they're all out there" (but do they need to know that?)
- pivot
- draw/show a graph
- We need to explain wildcarding better
- don't even use the term wildcard or any other noun. just speak of it as "see others like this"
- New experiment coming in early Sept.
- Demo (basically how Aaron had it, with more sem web concepts. includes UMW example.)
- Group walkthrough (Aaron does it and everyone else carries out the operations with him. includes a second example.)
- Packet item #1: heavily scripted Dickens.
- Packet item #2: Batman
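The "see others like this" idea from the notes above can be illustrated on a tiny in-memory set of triples. This is only a sketch of the concept, not Smeagol's implementation; the triples, the predicate names, and the helper `see_others_like` are made up for illustration.

```python
def see_others_like(triples, triple, position):
    """Return all triples that match `triple` with one position wildcarded.

    position: 0 = subject, 1 = predicate, 2 = object.
    """
    pattern = list(triple)
    pattern[position] = None  # None acts as the wildcard
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]

# Made-up sample data.
triples = [
    ("Charles Dickens", "author of", "Oliver Twist"),
    ("Charles Dickens", "author of", "Bleak House"),
    ("Charles Dickens", "born in", "Portsmouth"),
    ("Bob Kane", "creator of", "Batman"),
]

# Start from one concrete fact and ask for "others like this" in the object slot.
others = see_others_like(
    triples, ("Charles Dickens", "author of", "Oliver Twist"), 2
)
```

Here `others` holds both "author of" triples for Charles Dickens, which is the whole-list result a pivot is meant to produce, as opposed to visiting each work one page at a time.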
Short-term task list
Lit review
High-risk items:
- Which toolkit/technology to use? (Ajax? Toolkit? Java app?) (AC)
- Connecting programmatically to Sindice/Falcons, and making sure there are no rate limits, latency problems, licensing issues, etc. (SD)
- Connecting to the DBpedia SPARQL endpoint, and making sure there are no rate limits, latency problems, licensing issues, etc. (JH)
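As a starting point for the DBpedia item, here is a rough sketch of building a SPARQL protocol GET request URL in Python. The endpoint URL, the `format` parameter value, and the sample query are assumptions to verify against the current DBpedia documentation; `build_request_url` is a hypothetical helper, and the sketch deliberately does not send anything over the network.

```python
from urllib.parse import urlencode

# Assumed public endpoint; verify before relying on it.
ENDPOINT = "https://dbpedia.org/sparql"

def build_request_url(query, endpoint=ENDPOINT):
    """Build a GET URL carrying the query in the standard `query` parameter."""
    params = {
        "query": query,
        # Assumed way of asking for JSON results; content negotiation via an
        # Accept header is the other common option.
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)

# Illustrative query: works whose author is Charles Dickens.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?work WHERE { ?work dbo:author dbr:Charles_Dickens } LIMIT 20
"""
url = build_request_url(query)
```

Fetching a handful of such URLs and timing the responses would be one cheap way to get a first feel for latency and rate limits before committing to the endpoint.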
Areas to explore:
- Anything that advertises itself as "a user-friendly alternative to SPARQL"
- iSPARQL
- MashQL
- Things that relate to "being able to see/understand the data before (or while) querying"
- Older design efforts like:
- RABBIT
- QBE
- Psychological research into how people formulate questions
Technology choices
- Identify one or two key "hard" aspects the GUI is likely to have
- Investigate language/platform alternatives: roughly prototype the "hard" aspect(s) in:
- Java Swing app
- JavaScript/Ajax
- JavaScript/Ajax with a toolkit like ExtJS or Dojo
Possible research questions
<table border=1 cellpadding=5>
<tr><th>key term</th><th>question</th><th>novelty</th><th>utility</th><th>doability</th></tr>
<tr><td>term discovery</td><td>How does the user discover appropriate terms for a query they already have in mind?</td><td>low</td><td>high</td><td>medium</td></tr>
<tr><td>construction ease</td><td>How does the user construct a simple query in something other than SPARQL?</td><td>low</td><td>high</td><td>medium</td></tr>
<tr><td>complex construction</td><td>How does the user construct a complex (pivot) query?</td><td>medium</td><td>???</td><td>medium-high</td></tr>
<tr><td>query discovery</td><td>How does the user discover what kinds of things are useful/possible to ask about a topic?</td><td>high (???)</td><td>medium-high</td><td>medium-low</td></tr>
<tr style="color:grey;"><td>statistics</td><td>Is it useful to present the user with statistical information about predicate/class/whatever usage, so they know what's commonly in use? (tied into query discovery)</td><td>high</td><td>high</td><td>high</td></tr>
<tr><td>reformulation</td><td>Building an interface which helps a user successively get to the query (and answer) they intend</td><td>medium-low</td><td>medium</td><td>???</td></tr>
</table>
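For the "statistics" question in the table above, a minimal sketch of what predicate usage statistics could look like, computed over an in-memory list of triples. The data and the helper name `predicate_usage` are made up for illustration; a real version would presumably run against a SPARQL endpoint or a dump instead.

```python
from collections import Counter

def predicate_usage(triples):
    """Return (predicate, count) pairs, most frequently used first."""
    return Counter(p for _, p, _ in triples).most_common()

# Made-up sample data.
triples = [
    ("Charles Dickens", "author of", "Oliver Twist"),
    ("Charles Dickens", "author of", "Bleak House"),
    ("Charles Dickens", "born in", "Portsmouth"),
    ("Bob Kane", "creator of", "Batman"),
]
usage = predicate_usage(triples)
```

Showing the user the top few entries of `usage` is one possible way to tell them which predicates are commonly in use before they start composing a query.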
High-level design issues
- Are we querying a KB, or are we querying "the Sem Web?"
- How does the user find predicates/types/resources/whatever to include in the query? (Step 1)
- How does the user visualize what the graph actually looks like?
- How does the user compose a complex query without using a query language? (Step 2)

