RevisedKR2010paper
Reviewers' Points
Reviewer 1
Strengths
- Related works section
- Motivation
Weaknesses
- Remove 'General Psychology' (not representative of general public)
- Add discussion of why this generally unrepresentative sample is acceptable.
- e.g., that people who will actually attempt to write RDF will have a relevantly similar background, or that the people we should focus on supporting are those with at least this much background since less background isn't going to make things easier; contrariwise, it's a weakness that people with more training weren't in the mix
- Revise sentences such as the following, since the design of the study makes it obvious that representing n-ary relations is the focus
- "one particularly revealing finding, which we focus on this paper, is that knowledge...is especially problematic to represent"
- Make clearer how the hypotheses are proved or disproved. Refer to them in places other than just the hypotheses section.
- Better link hypotheses section with results section.
- Better state and explain hypotheses.
- Revise user study description.
- Separate into "procedure", "participants", "materials"
- Confused about what the sentences were in Part A
- Were the subjects male or female?
- What were the examples?
- Were participants given principles for formal mapping?
- How systematically did we choose the sentences used?
- Results may be biased because the sentence topics were relevant to students' interests
- Are psych students taught similar material in one of their classes?
- Does "after the first half of the test was collected" mean the experiment had two parts?
- Check results again to make sure significant results can be backed by statistical test results
- Results were classified as "correct, partial, incorrect, invalid"; did the authors create this classification?
- Discuss whether this biased the results
- Suggested double-blind approach
- Possible learning effect because sentence 4 was always seen after sentence 3
- All findings in analysis section, "Chains and triangles", are related to not knowing principles
- Discuss whether students were taught principles in experiment section
- Section "predicate reification vs. predicate modifiers", authors say "participants using the visual representation found it much easier to modify a predicated..."
- If the authors did not explicitly ask participants about ease of use, then this claim is not valid. Results show that they made more mistakes, which might indicate a lack of understanding or knowledge
Reviewer 2
Strengths
- Motivating representation of n-ary relations
- Presenting examples of sentences using n-ary relations
- Discussing techniques of predicate reification, predicate modifiers, relation reification
- Present an actual empirical study on knowledge formalization by average people
- Has experimental design, data, analysis, discussion of few past empirical user studies
- Relevant to KR
- Original contribution
- Related work discussed in sufficient detail
- Significant conceptual and technical contribution to KR user-tools
Weaknesses
- Description of experiment design is confusing
- Structure description more clearly
- "two-part test: the first part with paragraphs...that contained no n-ary relations...For the first part, the subject was directed...This section was designed...The subjects were then asked to read and express the following two sentences that do contain n-ary relations" -- Confused about which part we are discussing and when it switches to second part; does first half refer to first part?
- Perhaps the reason that the predicate-modifier-graph group had a harder time than the predicate-modifier-text group is that the former, but not the latter, had to distinguish attributes like "began" from relations like "located". This distinction might make sense to an implementer, but might not make sense to an average person.
Reviewer 3
Strengths
- Good to see work on humans encoding knowledge for the semantic web
Weaknesses
- "compatibility" section is unnecessary
- Missing statistical tests
- Incorrect analyses
- Reporting on statistics is sloppy
- ANOVA test
- Has to be done before pairwise comparisons
- Shows there are statistically significant differences in the population
- Report ANOVA results: sample size, degrees of freedom, F statistic, etc.
- More detail about comparisons
- In the "Predicate reification vs predicate modifiers" section, we compare groups B & D; are we comparing A to B & C to D?
- What did we run the significance test on?
- "Significance" means the results are real; cannot say: "Each of these statements was actually more likely to be expressed completely by Group B than Group D, though this failed to reach a statistical significance threshold of 90%. (See figure 4.)"
- If we didn't reach the significance threshold, we cannot say the statements are more likely to be expressed completely by B than by D
- If the test fails, we cannot claim a difference
- Confidence should be expressed as p values, not as % confidence
- Don't put 99.9995% confidence
- Pick a significance level as a threshold
- More information
- Sample sizes for analyses
- Standard deviations
- Don't ignore all decimal values in reporting values as percentages (did we round up to whole numbers for a reason?)
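Reviewer 3's reporting requests (ANOVA before pairwise comparisons, with sample sizes, degrees of freedom, F statistic, and standard deviations) can be sketched as below. This is only an illustrative sketch, not the analysis used in the paper: the group names A-D follow the notes above, but the scores are hypothetical placeholders, and the helper names `describe` and `one_way_anova` are made up for this example. The p-value is not computed here; it would come from the F distribution in a statistics package and then be compared against a significance level chosen in advance.

```python
from statistics import mean, stdev


def describe(scores):
    """Return (sample size, mean, sample standard deviation) for one group."""
    return len(scores), mean(scores), stdev(scores)


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of group name -> scores."""
    all_scores = [x for g in groups.values() for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)            # number of groups
    n_total = len(all_scores)  # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-participant scores; NOT data from the study.
groups = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 3.0, 4.0],
    "C": [3.0, 4.0, 5.0],
    "D": [4.0, 5.0, 6.0],
}

# Report n, mean, and SD per group, keeping decimals.
for name, scores in groups.items():
    n, m, sd = describe(scores)
    print(f"Group {name}: n = {n}, M = {m:.2f}, SD = {sd:.2f}")

f_stat, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Pairwise comparisons between groups would only be run, and reported with p values, if this omnibus test is significant at the chosen level.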

