Exercise: MySQL

For this assignment, you’ll perform some basic MySQL database queries on sample data from a literary text mining project.

First, read w3schools’ introduction to SQL. Note that the w3schools tutorial treats several “flavors” of SQL; pay special attention to the MySQL examples, as that’s the type you’ll use for the exercises. W3schools provides a testing database with a web front end to help you follow their examples. But you’ll need to access a real MySQL server for the exercises, so you’ll eventually need a full MySQL client. On Macs, I recommend the excellent Sequel Pro. On Windows or Linux, you might try MySQL Workbench. Or you can download the full MySQL server for your platform, which includes the text-based mysql client (but watch out for setup and security implications of running a database server; you probably don’t want to do this unless you’re planning to develop a database of your own). On Linux, you can likely just install mysql-client.

The assignment

Log in to the geo database (see the corresponding assignment in Sakai for details of the server address and access details; not posting this publicly for security reasons) and use it to answer these questions. Except as specified, you probably want to query the results table. The most basic query looks like this:

SELECT * FROM results;

Most everything else is a variation on this basic form of the SELECT statement.

For each of the numbered items, submit just the SQL query used to produce the specified output.

  1. Write a query that selects from the results table all rows containing a location in France.
  2. How many times did New York City (and its sublocations) occur in books published in 1870? (Hint: The result should be 455; use COUNT(*).)
  3. How many of the locations in the last question were specific to Manhattan?
  4. Write a query that generates a list of counts for the number of locations in each US state (including “bare” uses of the state itself, i.e., those without any more specific location within the state) over the period 1851 to 1875 (inclusive, i.e., including the start and end years), sorted by total count from high to low. (Hint: You’ll need BETWEEN, GROUP BY, and ORDER BY.)
  5. Write a query that JOINs the authors and books tables and SELECTs only those books authored by Herman Melville. There are a couple of ways you could do this, but don’t just look up Melville’s auth_id by hand and search against it; the idea is to join the tables and search against the result by author name. You should find that there are eleven Melville volumes in the database, though only five of them were published between 1851 and 1875 (so only those later five are included in the geolocation results, FYI.)
  6. Optional challenge: Modify the answer to question five so that it returns (only) the title, publication date, publisher name, and publication location for all Melville volumes in the database. (Hint: You’ll need to use three JOINs.)

Good luck! Remember to submit your five (or six) query statements via Sakai.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s