For this assignment, you’ll perform some basic MySQL database queries on sample data from a literary text mining project.
First, read w3schools’ introduction to SQL. Note that the w3schools tutorial treats several “flavors” of SQL; pay special attention to the MySQL examples, as that’s the type you’ll use for the exercises. W3schools provides a testing database with a web front end to help you follow their examples. But you’ll need to access a real MySQL server for the exercises, so you’ll eventually need a full MySQL client. On Macs, I recommend the excellent Sequel Pro. On Windows or Linux, you might try MySQL Workbench. Or you can download the full MySQL server for your platform, which includes the text-based mysql
client (but watch out for setup and security implications of running a database server; you probably don’t want to do this unless you’re planning to develop a database of your own). On Linux, you can likely just install mysql-client
.
The assignment
Log in to the geo
database (see the corresponding assignment in Sakai for details of the server address and access details; not posting this publicly for security reasons) and use it to answer these questions. Except as specified, you probably want to query the results
table. The most basic query looks like this:
SELECT * FROM results;
Most everything else is a variation on this basic form of the SELECT
statement.
For each of the numbered items, submit just the SQL query used to produce the specified output.
- Write a query that selects from the
results
table all rows containing a location in France. - How many times did New York City (and its sublocations) occur in books published in 1870? (Hint: The result should be 455; use
COUNT(*)
.) - How many of the locations in the last question were specific to Manhattan?
- Write a query that generates a list of counts for the number of locations in each US state (including “bare” uses of the state itself, i.e., those without any more specific location within the state) over the period 1851 to 1875 (inclusive, i.e., including the start and end years), sorted by total count from high to low. (Hint: You’ll need
BETWEEN
,GROUP BY
, andORDER BY
.) - Write a query that
JOIN
s the authors and books tables andSELECT
s only those books authored by Herman Melville. There are a couple of ways you could do this, but don’t just look up Melville’sauth_id
by hand and search against it; the idea is to join the tables and search against the result by author name. You should find that there are eleven Melville volumes in the database, though only five of them were published between 1851 and 1875 (so only those later five are included in the geolocation results, FYI.) - Optional challenge: Modify the answer to question five so that it returns (only) the title, publication date, publisher name, and publication location for all Melville volumes in the database. (Hint: You’ll need to use three
JOIN
s.)
Good luck! Remember to submit your five (or six) query statements via Sakai.