0 like 0 dislike
61 views
in Computer Science by (1.0m points)
Write an ingredients of data modeling?

1 Answer

0 like 0 dislike
by (1.0m points)
Problem 1. Data Modeling and Basic SQL

Choose one of the following five flat (CSV) data files in the Supporting Materials for Week 1. The datasets

are:

1. Sacramento: 985 real estate transactions over a 5-day period (source:

https://support.spatialkey.com/spatialkey-sample-csv-data/)

2. Cars: various makes of cars and their attributes (adapted from https://perso.telecomparistech.

fr/eagan/class/igr204/datasets)

3. Ebay: 5-day auctions for Xbox consoles on ebay (each row is a bid; source:

http://www.modelingonlineauctions.com/datasets)

4. Movies: Films, with leading actor/actress and other attributes (source: https://perso.telecomparistech.

fr/eagan/class/igr204/datasets)

5. Airport: Traffic (no. of flights) for various US airports in one month (soure: Plotly examples)

You may follow these links to learn more about each set. But you are encouraged to load the one that

interests you in, e.g., Excel and inspect manually. You can learn a lot this way.

For your dataset:

1) What are the columns (attributes)? What is the appropriate MySQL data type for each column

based on what you see? There may be more than one right answer! Explain your choices

2) Create a new schema in MySQL workbench with an appropriate name. Manually create a new

table within this schema that implements the data model you devised using the CREATE TABLE

syntax.

3) Following the Week 1 reading, Use the Table Import Data Wizard in MySQL workbench to import

your flat file into the table you just created. (Note: do not use the “create new table functionality

of the Wizard! You must insert the data into a pre-existing table you created above).

4) Based on the nature of your particular dataset, propose three (or more!) meaningful questions

you can ask and answer using the basic SQL we have learned thus far. For example, with movies,

you might ask who are the most frequent leading actor/actress in a certain range of years.

Between your three questions you should cover each of the following language elements at least

once: WHERE (in combination with a comparison operator, BETWEEN, (NOT) IN, IS (NOT),

etc.), ORDER BY, MAX, MIN, AVG, COUNT, GROUP BY, HAVING, and AS (for example, to

name the result of one of the aggregate functions applied following a GROUP BY).

You should show all SQL code and the results by, for example, taking a screenshot of your Workbench

session. All discussion should appear in the companion work document.

Problem 2. Normalization.

This problem will use the file techcrunch.csv in the Supporting Materials. This dataset consists of

company funding records reported by TechCrunch. Each row represents one funding event for a company.

You can assume that the (composite) key is {company, round} where round denotes the given round of

funding.

1) Why is {company, round} the key? Why doesn’t {company} work?

2) Do the data satisfy 1NF? Why or why not?

3) Do the data satisfy 2NF? Why or why not?

4) Do the data sastify 3NF? Why or why not?

If the dataset fails to be in any of the normal forms above, you should document all the ways in which it so

fails, and give a specific example (row or rows in the file) for each. If you answered “no” to (4), sketch a

proposed Entity-Relationship diagram that would bring this dataset into 3NF. Explain why your proposal

satisfies all the requirements of that form.

Problem 3. Case study (adapted from Comeau, Chapter 9)

This problem uses the Recipe Database case study from the textbook. You can find the .sql file for this

database in the Supporting Materials. You will need to load this file through Workbench in order to test

your answers (which you should!). This question is designed to exercise your ability to use JOIN

statements to “undo” the often-complex relationships that result from normalization.

1) Write a set of queries to return all information on a specific recipe including main details,

ingredients, recipe tags, nutrition, comments, food warnings and any available substitutions. Use

as few queries as possible. Your set of queries should be designed with creating something like

Figure 9.14 in mind—a complete recipe page (in that case, Chicken Marsala) that might appear on

a website. Remember to use aliases on the field names that are returned so that the raw query

results will be more readable.

Show the results of your set of queries for one (1) recipe of your choosing.

2) Write a SELECT query that would supply enough information to create an index of all recipes in

the system with recipe name, category and all tag values. The index should be sorted first by tag

value and then by category. Show the first 25 rows of the result of your query.

Related questions

0 like 0 dislike
1 answer 55 views
asked Feb 14, 2019 in Computer Science by danish (1.0m points)
0 like 0 dislike
1 answer 45 views
0 like 0 dislike
1 answer 38 views
0 like 0 dislike
1 answer 65 views
0 like 0 dislike
1 answer 114 views
asked Feb 12, 2019 in Computer Science by danish (1.0m points)
0 like 0 dislike
1 answer 32 views
0 like 0 dislike
1 answer 55 views
0 like 0 dislike
1 answer 59 views
0 like 0 dislike
1 answer 55 views
0 like 0 dislike
1 answer 73 views
asked Apr 2, 2019 in Computer Science by danish (1.0m points)
Welcome to Free Homework Help, where you can ask questions and receive answers from other members of the community. Anybody can ask a question. Anybody can answer. The best answers are voted up and rise to the top. Join them; it only takes a minute: School, College, University, Academy Free Homework Help

19.4k questions

18.3k answers

8.7k comments

6.3k users

Free Hit Counters
...