Problem 1. Data Modeling and Basic SQL
Choose one of the following five flat (CSV) data files in the Supporting Materials for Week 1. The datasets
are:
1. Sacramento: 985 real estate transactions over a 5-day period (source:
https://support.spatialkey.com/spatialkey-sample-csv-data/)
2. Cars: various makes of cars and their attributes (adapted from
https://perso.telecomparistech.
fr/eagan/class/igr204/datasets)
3. Ebay: 5-day auctions for Xbox consoles on ebay (each row is a bid; source:
http://www.modelingonlineauctions.com/datasets)
4. Movies: Films, with leading actor/actress and other attributes (source:
https://perso.telecomparistech.
fr/eagan/class/igr204/datasets)
5. Airport: Traffic (no. of flights) for various US airports in one month (soure: Plotly examples)
You may follow these links to learn more about each set. But you are encouraged to load the one that
interests you in, e.g., Excel and inspect manually. You can learn a lot this way.
For your dataset:
1) What are the columns (attributes)? What is the appropriate MySQL data type for each column
based on what you see? There may be more than one right answer! Explain your choices
2) Create a new schema in MySQL workbench with an appropriate name. Manually create a new
table within this schema that implements the data model you devised using the CREATE TABLE
syntax.
3) Following the Week 1 reading, Use the Table Import Data Wizard in MySQL workbench to import
your flat file into the table you just created. (Note: do not use the “create new table functionality
of the Wizard! You must insert the data into a pre-existing table you created above).
4) Based on the nature of your particular dataset, propose three (or more!) meaningful questions
you can ask and answer using the basic SQL we have learned thus far. For example, with movies,
you might ask who are the most frequent leading actor/actress in a certain range of years.
Between your three questions you should cover each of the following language elements at least
once: WHERE (in combination with a comparison operator, BETWEEN, (NOT) IN, IS (NOT),
etc.), ORDER BY, MAX, MIN, AVG, COUNT, GROUP BY, HAVING, and AS (for example, to
name the result of one of the aggregate functions applied following a GROUP BY).
You should show all SQL code and the results by, for example, taking a screenshot of your Workbench
session. All discussion should appear in the companion work document.
Problem 2. Normalization.
This problem will use the file techcrunch.csv in the Supporting Materials. This dataset consists of
company funding records reported by TechCrunch. Each row represents one funding event for a company.
You can assume that the (composite) key is {company, round} where round denotes the given round of
funding.
1) Why is {company, round} the key? Why doesn’t {company} work?
2) Do the data satisfy 1NF? Why or why not?
3) Do the data satisfy 2NF? Why or why not?
4) Do the data sastify 3NF? Why or why not?
If the dataset fails to be in any of the normal forms above, you should document all the ways in which it so
fails, and give a specific example (row or rows in the file) for each. If you answered “no” to (4), sketch a
proposed Entity-Relationship diagram that would bring this dataset into 3NF. Explain why your proposal
satisfies all the requirements of that form.
Problem 3. Case study (adapted from Comeau, Chapter 9)
This problem uses the Recipe Database case study from the textbook. You can find the .sql file for this
database in the Supporting Materials. You will need to load this file through Workbench in order to test
your answers (which you should!). This question is designed to exercise your ability to use JOIN
statements to “undo” the often-complex relationships that result from normalization.
1) Write a set of queries to return all information on a specific recipe including main details,
ingredients, recipe tags, nutrition, comments, food warnings and any available substitutions. Use
as few queries as possible. Your set of queries should be designed with creating something like
Figure 9.14 in mind—a complete recipe page (in that case, Chicken Marsala) that might appear on
a website. Remember to use aliases on the field names that are returned so that the raw query
results will be more readable.
Show the results of your set of queries for one (1) recipe of your choosing.
2) Write a SELECT query that would supply enough information to create an index of all recipes in
the system with recipe name, category and all tag values. The index should be sorted first by tag
value and then by category. Show the first 25 rows of the result of your query.