CPSC 370P Database Essentials and Data Mining
From MWCSWiki
Prerequisite: CPSC 230
About Databases
Few technologies today are as prevalent and practical as database systems. Nearly every application domain of computer science generates large volumes of information – whether scientific observations, marketing data, or English text – and demands that it be stored, retrieved, and analyzed. Indeed, the primary function of many of today’s computer systems is to help users access their information.
Database design is the art of modeling and representing large quantities of information so that software applications - and people - can use it effectively. It requires insight into how data is used and how it should be structured. It's an exciting and challenging area that crosses over into everything from medical research to space exploration to baseball scouting to voting patterns.
About Data Mining
You hear talk today about the "information explosion": computer systems are being used to collect data far more rapidly than it can be analyzed. Consider the atmospheric readings that are taken every minute from every corner of the globe, or the purchasing information that supermarkets collect every time a customer swipes their card at the checkout counter. What good does it do to faithfully store zillions of data records if they just end up as meaningless gigabytes on a disk? How does that help us actually learn anything?
Data mining is the study of extracting useful knowledge from large amounts of information. Countless individual readings are worthless on their own, but if they can be examined and patterns can be found, those patterns can lead to a deeper understanding of the issues concealed within. What is the true cause of a cancer outbreak? Which of a group of stocks is most likely to grow in value? Which potential drilling site is the best bet for finding crude oil? These kinds of questions cannot be answered by simply collecting data: the data must be analyzed, or "mined," to draw broader, more general conclusions.
About the class
We will first cover the "essentials" of databases, including how to use database systems like Oracle and MySQL to store and retrieve information effectively. We will then learn how the data in such systems can be accessed over a network with cutting-edge Web technologies like PHP and Ruby on Rails. Finally, we will survey a collection of data mining techniques that can be used to find broad patterns in large quantities of data and draw meaningful conclusions from them. The focal point throughout the course will be a team-based project that puts all of these ideas into practice and applies them to a compelling, real-world problem.

