References

Categorization

  • Cohen, W. W. and Singer, Y. 1999. Context-sensitive learning methods for text categorization. ACM Trans. Inf. Syst. 17, 2 (Apr. 1999), 141-173. DOI= http://doi.acm.org.ezproxy.umw.edu:2048/10.1145/306686.306688 "Context-sensitive" in effect means non-linear: a free-text word (for us, an explicit keyword) no longer contributes to category membership independently of all the other words. The paper presents algorithms for learning such context-sensitive classifiers.

  • Bauer and Leake
    • Bauer, T. and Leake, D. B. 2001. Real time user context modeling for information retrieval agents. In Proceedings of the Tenth international Conference on information and Knowledge Management (Atlanta, Georgia, USA, October 05 - 10, 2001). H. Paques, L. Liu, and D. Grossman, Eds. CIKM '01. ACM, New York, NY, 568-570. DOI= http://doi.acm.org/10.1145/502585.502693
    • Bauer, T. L. and Leake, D. B. 2003. Detecting context-differentiating terms using competitive learning. SIGIR Forum 37, 2 (Sep. 2003), 4-17. DOI= http://doi.acm.org.ezproxy.umw.edu:2048/10.1145/959258.959259

Bauer and Leake's work is similar to ours in that (a) it's based on categories, and (b) it's based on content (not collaborative filtering). It's different in that (a) it's objective, not subjective, and (b) it's based on free text, not structured features. Another key difference between us and Bauer/Leake is that their scheme relies on implicit observation: they snoop on the documents the user is accessing and try to infer the context from them. We, on the other hand, let the user explicitly assert category membership for items.

Collaborative filtering

Surveys:

  • Adomavicius, G.; Tuzhilin, A. (June 2005), "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions", IEEE Transactions on Knowledge and Data Engineering 17 (6): 734–749, doi:10.1109/TKDE.2005.99, ISSN 1041-4347, http://portal.acm.org/citation.cfm?id=1070611.1070751 .
  • Herlocker, J. L.; Konstan, J. A.; Terveen, L. G.; Riedl, J. T. (January 2004), "Evaluating collaborative filtering recommender systems", ACM Trans. Inf. Syst. 22 (1): 5–53, doi:10.1145/963770.963772, ISSN 1046-8188, http://portal.acm.org/citation.cfm?id=963772 . This survey includes a list of tasks/reasons for using recommender systems. I'm thinking that AQL is a new task/reason not on this list. It also provides a taxonomy of "ways to evaluate" recommender systems, which includes some non-traditional measures (i.e., beyond just accuracy). Worth thinking about for the user testing.

Recommendation Systems

  • Schafer, J. B., Konstan, J. A., and Riedl, J. 2002. Meta-recommendation systems: user-controlled integration of diverse recommendations. In Proceedings of the Eleventh international Conference on information and Knowledge Management (McLean, Virginia, USA, November 04 - 09, 2002). CIKM '02. ACM, New York, NY, 43-51. DOI= http://doi.acm.org.ezproxy.umw.edu:2048/10.1145/584792.584803
  • Han, E.-H. (S.) and Karypis, G. 2005. Feature-based recommendation system. In Proceedings of the 14th ACM international Conference on information and Knowledge Management (Bremen, Germany, October 31 - November 05, 2005). CIKM '05. ACM, New York, NY, 446-452. DOI= http://doi.acm.org.ezproxy.umw.edu:2048/10.1145/1099554.1099683

Sparse data

  • H. Luo, C. Niu, R. Shen, and C. Ullrich, “A collaborative filtering framework based on both local user similarity and global user similarity,” Machine Learning, vol. 72, 2008, pp. 231-245. http://www.springerlink.com/content/e66708w753127510/ The authors address the "sparse data" problem as it applies to collaborative filtering: suppose you want to find users similar to X, but there aren't any (or enough), because too few people rated the same items. Their answer is a friend-of-a-friend approach: instead of using only the users similar to you, they also include users who are similar to users who ARE similar to you, and so on (a transitive closure). (Doesn't apply to us, since we're content-based, not collaborative.)
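
The friend-of-a-friend idea can be illustrated with a small sketch. This is not the algorithm from the Luo et al. paper; it only shows the transitive expansion of a neighbourhood, assuming a hypothetical `similar` mapping from each user to the users already judged directly similar to them.

```python
def expand_neighbors(similar, user, max_hops=2):
    """Breadth-first expansion of a user's neighbourhood.

    `similar` is a hypothetical dict: user -> list of directly similar users.
    Returns a dict mapping each reached user to the hop count at which
    they were first reached (1 = directly similar, 2 = similar to a
    similar user, and so on).
    """
    seen = {user}
    frontier = [user]
    reached = {}
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for u in frontier:
            for v in similar.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    reached[v] = hop
                    next_frontier.append(v)
        frontier = next_frontier
    return reached


# Toy data (made up): x is directly similar only to a.
similar = {"x": ["a"], "a": ["x", "b"], "b": ["a", "c"], "c": ["b"]}
neighbors = expand_neighbors(similar, "x", max_hops=2)
```

A real system would presumably weight more distant users less heavily; the hop count is kept here so that such a weighting could be applied.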


Category building

Sequential analysis

How do we know when to make a ruling / give up?
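
One standard tool from sequential analysis for this kind of stopping question is Wald's sequential probability ratio test (SPRT). The sketch below is only an illustration, not something these notes commit to: it assumes each user response can be treated as a yes/no observation, and the hypothesised rates `p0` and `p1` and the error rates `alpha` and `beta` are placeholder parameters.

```python
import math


def sprt(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for a yes/no stream.

    H0: P(yes) = p0, H1: P(yes) = p1 (with 0 < p0 < p1 < 1).
    alpha and beta are the tolerated type I and type II error rates.
    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1 - beta) / alpha)   # cross this: rule for H1
    lower = math.log(beta / (1 - alpha))   # cross this: rule for H0
    llr = 0.0                              # running log-likelihood ratio
    for n, hit in enumerate(observations, start=1):
        if hit:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    # Ran out of data before either threshold was crossed.
    return "undecided", len(observations)


decision, used = sprt([True, True, True, True], p0=0.2, p1=0.8)
```

The "undecided" outcome corresponds to the give-up case: a cap on the number of questions would have to be chosen separately.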

Active learning

How do we know which instance is best to put in a trial list?

Summary:

  1. You can choose the sample based on:
    1. only samples that exceed some threshold on an "informativeness measure," or:
    2. only samples within an explicitly calculated "region of uncertainty," or:
    3. the sample with the highest informativeness measure.
  2. What kinds of informativeness measures are there?
    1. Choose the sample you're least sure how to label (for binary classification, the one where P(yes) is closest to 0.5). This doesn't work for us, because there will be so many films with no keywords in common with the canonical set.
    2. Query By Committee (QBC): have multiple raters, and choose the sample that they most disagree about.
    3. Choose the sample that would impart the greatest change to the model if its label were known.
    4. Choose the sample that, if its label were known, would reduce our expected future error the most.
  3. Batch-mode active learning means asking the user to label a bunch of things at once. Note that just querying the "n best" might not work well, because the n best often share many features and so their labels are largely redundant.
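
A few of the options summarized above can be sketched in code. This is a minimal illustration under stated assumptions, not a proposed design: the film names, probabilities, and keyword sets are made up, and the Jaccard-overlap penalty in `select_batch` is just one simple way to discourage redundant picks in batch mode.

```python
import math


def uncertainty(p_yes):
    """Binary uncertainty score: 1.0 when P(yes) = 0.5, 0.0 at 0 or 1."""
    return 1.0 - 2.0 * abs(p_yes - 0.5)


def vote_entropy(votes):
    """QBC disagreement: entropy of the labels voted by the committee."""
    n = len(votes)
    counts = {}
    for v in votes:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def select_above_threshold(scores, threshold):
    """Option 1.1: every sample whose informativeness exceeds a threshold."""
    return [item for item, s in scores.items() if s >= threshold]


def select_most_informative(scores):
    """Option 1.3: the single sample with the highest informativeness."""
    return max(scores, key=scores.get)


def jaccard(a, b):
    """Feature overlap between two keyword sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_batch(scores, features, n, overlap_penalty=0.5):
    """Greedy batch pick: penalize overlap with items already chosen."""
    chosen = []
    remaining = set(scores)
    while remaining and len(chosen) < n:
        def adjusted(item):
            overlap = max(
                (jaccard(features[item], features[c]) for c in chosen),
                default=0.0,
            )
            return scores[item] - overlap_penalty * overlap
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=adjusted)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical model outputs and keyword sets.
p_yes = {"film_a": 0.5, "film_b": 0.55, "film_c": 0.7}
features = {
    "film_a": {"noir", "heist"},
    "film_b": {"noir", "heist"},
    "film_c": {"western"},
}
scores = {film: uncertainty(p) for film, p in p_yes.items()}
batch = select_batch(scores, features, n=2)
```

In this toy data the plain "n best" would be film_a and film_b, which have identical keywords; the penalized batch picks film_a and film_c instead.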