Quantitative Methods (M) & (H) Semester 2, 2020

Major Project  

Project Instructions

  • The project is separated into 3 interrelated tasks. All 3 tasks are due at the same time, in report format. The final due date is Friday 6th of November at 1pm Adelaide time (ACDT, UTC+10:30). Please submit your project online via MyUni by uploading a single file in Microsoft Word format (either .doc or .docx) through the Assignments tab and associated link. If your assignment is late, please email it to George personally.
  • Tasks 1, 2 and 3 most closely relate to Topics 2, 8 and 9 as detailed in the course outline.
  • This is an individual project (you can discuss it with other students but everyone needs to make an individual report submission).
  • The project comprises 50% of your final Quantitative Methods (M) grade with the following assessment breakdown:
    • Headline Regression Model Derivation                       10%
    • Task 1: Data Summary                                               20%
    • Task 2: Regression Model – Build and Use            40%
    • Task 3: Regression Model – Evaluation                       20%
    • Report Structure and Written Presentation Quality           10%
  • Your final report will be processed through Turnitin as a check for plagiarism, so please ensure that you present only your own work. In addition, your final report needs to meet all of the following criteria:
    • Font:                      Times New Roman
    • Font size:               12 point
    • Page margins:          2.54cm all around (Normal)
    • Page limit:           5 pages only (A4, single-sided)
    • Appropriate font, font size, and page margins are all graded against ‘Written Presentation Quality’. Submissions which exceed the page limit will only have their first 5 pages graded.
  • Further Advice:
    • Do not use a title page.
    • Do not use a table of contents.
    • Only provide a short introduction.
    • Ensure your report is free from spelling and grammatical errors.
    • Ensure your report is clear and well-structured.
    • Ensure your report is written in context and answers questions in context.
    • Make sure it is clear which model is your “Headline Regression Model” (final answer).

Loxton is a small town with two suburbs. The data file “Major Project – Data Set” contains data on 545 houses sold in Loxton between 2015 and 2020. This data includes the price at which the house was sold, which of two agents sold the house (all houses are sold through an agent by law), the year in which the house was sold as well as data on various characteristics of each house sold (age, size, number of stories etc.). These characteristics serve as possible explanatory variables of sale price.

Data definitions follow:

OBS=   observation
AGE=   age of house in years
SHOPS=   1 if house is close to shopping precinct, 0 otherwise
CRIME=   crime rate of the suburb within which the house is located
TOWN=   distance in kilometres to the town centre
STORIES=   number of dwelling stories
OCEAN=   1 if house has an ocean view, 0 otherwise
POOL=   1 if house has a pool, 0 otherwise
PRICE=   price at which the house was sold (in dollars)
SELLER=   selling agent – “W&M” (0) or “A&B” (1)
SIZE=   size of the house in square metres
SUBURB=   Mayfair (0) or Claygate (1)
TENNIS=   1 if house has a tennis court, 0 otherwise
SOLD=   year of last sale (2015 to 2020)
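The definitions above imply simple validity rules: the dummy variables take only 0 or 1, SOLD lies between 2015 and 2020, STORIES is a positive whole number, and prices and sizes are positive. A minimal Python sketch of a row-level check, assuming each observation has been loaded as a dict keyed by the column names above (the constants and function names are hypothetical, not part of the brief):

```python
import math

# Hypothetical column groupings derived from the data definitions above.
BINARY = ("SHOPS", "OCEAN", "POOL", "TENNIS", "SELLER", "SUBURB")
POSITIVE = ("PRICE", "SIZE")
NON_NEGATIVE = ("AGE", "CRIME", "TOWN")


def _is_number(value):
    """True for an int/float that is not NaN (None and strings fail)."""
    return isinstance(value, (int, float)) and not math.isnan(value)


def check_record(rec):
    """Return a list of human-readable problems found in one observation."""
    problems = []
    for col in BINARY:
        if rec.get(col) not in (0, 1):
            problems.append(f"{col}={rec.get(col)!r} is not 0 or 1")
    for col in POSITIVE:
        value = rec.get(col)
        if not _is_number(value) or value <= 0:
            problems.append(f"{col}={value!r} is missing or not positive")
    for col in NON_NEGATIVE:
        value = rec.get(col)
        if not _is_number(value) or value < 0:
            problems.append(f"{col}={value!r} is missing or negative")
    stories = rec.get("STORIES")
    if not _is_number(stories) or stories < 1 or stories != int(stories):
        problems.append(f"STORIES={stories!r} is not a positive whole number")
    sold = rec.get("SOLD")
    if not _is_number(sold) or sold != int(sold) or not 2015 <= sold <= 2020:
        problems.append(f"SOLD={sold!r} is not a year from 2015 to 2020")
    return problems
```

Running check_record over all 545 rows and keeping the non-empty results gives a list of rows worth inspecting by hand; it only catches values that contradict the stated definitions, not values that are merely implausible.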

Your tasks

Task 1 – 20% of project grade (recommended length of 1 page)

You are required to provide a comprehensive summary of the data set contained in the “Major Project – Data Set” file. How you choose to do this is entirely at your discretion. However, it is recommended that you consider using both summary statistics and graphical methods, while also noting any peculiarities within the data set.
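The numerical half of that suggestion can be sketched with the Python standard library alone, assuming the data have been read into a list of dicts keyed by the column names defined earlier; the helper names are illustrative. The graphical half (histograms, scatter plots of PRICE against each predictor) would typically be produced in a spreadsheet or plotting tool.

```python
import math
import statistics
from collections import Counter


def _numeric(records, col):
    """Collect the usable (present, numeric, non-NaN) values of one column."""
    return [
        r[col]
        for r in records
        if isinstance(r.get(col), (int, float)) and not math.isnan(r[col])
    ]


def summarise(records, columns):
    """Count, mean, median, sample SD, min and max for each listed column."""
    summary = {}
    for col in columns:
        values = _numeric(records, col)
        if not values:
            summary[col] = {"n": 0}
            continue
        summary[col] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "sd": statistics.stdev(values) if len(values) > 1 else float("nan"),
            "min": min(values),
            "max": max(values),
        }
    return summary


def frequencies(records, col):
    """Counts of each distinct value, e.g. houses per SOLD year or per SUBURB."""
    return Counter(r.get(col) for r in records)
```

Comparing each column's n with the 545 observations exposes missing entries, and the min/max values are a quick first screen for peculiar observations.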

Task 2 (including Headline Regression Model) – 50% of project grade (recommended length of 2.5 pages)

You have been hired by Jane, the wealthy owner of a house on Elm Street in Loxton (not included in the data set), to predict the price at which her house will sell. Her house has two stories, is in Claygate, has a size of 192 square metres, is not near a shopping precinct and is 10 km from the town centre. She estimates that the house is about 10 years old and, based on her experience, that it is in a low-crime area. Jane inherited the house from her uncle and is therefore unsure when it was last sold. Some other features of the property can be seen below:


[Figure: Views of and from the house whose sale price you are to predict]

You are expected to build a regression model of house prices. In doing so, make sure that you use an appropriate number of predictors to develop your estimates. Once you have constructed an appropriate model, use it to obtain and provide for Jane’s house:

  1. A point prediction of the sale price that it can be expected to fetch
  2. A 95% interval prediction for this sale price
  3. An estimate of the marginal effect of house size on this sale price
  4. Financial advice on whether Jane should use “W&M” or “A&B” to sell her house. “W&M” charges a commission of 5% of the final sale price, whereas “A&B” charges a commission of 10%.
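The four deliverables above rest on standard OLS results. For a new house with predictor vector x0 (with a leading 1 for the intercept), the point prediction is ŷ0 = x0'b and the 95% prediction interval is ŷ0 ± t × s × sqrt(1 + x0'(X'X)^-1 x0), where t is the 97.5th percentile of the t distribution with n - k degrees of freedom and k counts the estimated coefficients. A minimal numpy sketch follows; the function names are hypothetical, the critical value t_crit must be supplied (for example from scipy.stats.t.ppf(0.975, df)), and the agent comparison assumes the model includes the SELLER dummy, so that the predicted A&B price is the predicted W&M price plus the SELLER coefficient.

```python
import numpy as np


def fit_ols(X, y):
    """OLS with an intercept prepended. Returns (b, s2, XtX_inv, df)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    n, k = X.shape                   # k includes the intercept
    s2 = resid @ resid / (n - k)     # unbiased estimate of the error variance
    return b, s2, XtX_inv, n - k


def predict_interval(b, s2, XtX_inv, x0, t_crit):
    """Point prediction and prediction interval for one new observation.

    x0 holds the predictor values only; the intercept term is added here.
    """
    x0 = np.concatenate([[1.0], np.asarray(x0, dtype=float)])
    y_hat = x0 @ b
    # The "1 +" term adds the new house's own error variance; it is what makes
    # this a prediction interval rather than a confidence interval for the mean.
    se_pred = np.sqrt(s2 * (1.0 + x0 @ XtX_inv @ x0))
    return y_hat, y_hat - t_crit * se_pred, y_hat + t_crit * se_pred


def net_after_commission(price_wm, price_ab, rate_wm=0.05, rate_ab=0.10):
    """Vendor's proceeds under each agent, using the commission rates in the brief."""
    return {"W&M": price_wm * (1 - rate_wm), "A&B": price_ab * (1 - rate_ab)}
```

In a model that is linear in SIZE, the marginal effect of house size is simply the SIZE coefficient; if SIZE enters in logs or with a squared term, the effect has to be evaluated at Jane's 192 square metres instead.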

Jane, who claims to have some knowledge of regression analysis, has stressed that she thinks you should use a regression model with an R² of at least 85%.
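R² can never fall when a predictor is added to an OLS model, so an R² target on its own can be met simply by adding variables; adjusted R² penalises each extra predictor. A standard-library sketch of both measures (function names hypothetical), where k is the number of predictors excluding the intercept:

```python
def r_squared(y, y_hat):
    """1 - SSE/SST; matches the usual R^2 for an OLS fit that includes an intercept."""
    y_bar = sum(y) / len(y)
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    sst = sum((obs - y_bar) ** 2 for obs in y)
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
```

Reporting both figures side by side makes it easier to judge whether a higher R² reflects a better model or just more predictors.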

Note: Task 1 directed you to take note of any peculiarities in the data set. The data set also contains additional errors that you may not have picked up on in Task 1; these will only become clear once you start working on Task 2. Several problems can result if you fail to handle these issues correctly, so be sure to address them both in your regression application and in your final report. If resolving any of the errors in the data set requires you to make assumptions, clearly state your reasoning and approach in your report.
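One kind of problem that typically surfaces only once a regression is run is perfect multicollinearity: if a predictor is an exact linear combination of the intercept and other predictors, X'X is singular and the OLS coefficients are not uniquely defined, so software either fails or silently drops a variable. A hedged numpy sketch that flags such columns by tracking the rank of the design matrix as predictors are added one at a time (the function name is hypothetical):

```python
import numpy as np


def redundant_columns(X, names, tol=None):
    """Names of predictors that add no rank to [intercept, earlier predictors]."""
    X = np.asarray(X, dtype=float)
    kept = np.ones((X.shape[0], 1))        # start from the intercept column
    redundant = []
    for j, name in enumerate(names):
        candidate = np.column_stack([kept, X[:, j]])
        if np.linalg.matrix_rank(candidate, tol=tol) > kept.shape[1]:
            kept = candidate               # column brings new information
        else:
            redundant.append(name)         # exact linear combination: flag it
    return redundant
```

The column reported is whichever of a collinear group comes later in names, so the order of the list decides which variable is flagged; which one to drop remains a modelling decision to justify in the report.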

Task 3 – 20% of project grade (recommended length of 1.5 pages)

Please provide a reflective discussion on how you executed Task 2 of the project above. Specifically consider the following:

  1. Verify that your regression model does not suffer from any misspecification errors, and provide the relevant regression diagnostics that support your findings.
  2. If, in part (1) above, you found that your model is in fact partially misspecified, explain what you did to ensure that the misspecification has only a minimal impact on your results in Task 2.
  3. Were there any other oddities in the data set or your model? Explain.
  4. Is there anything else worth mentioning which is relevant to your work or to your results for Jane?
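Which diagnostics are expected for part (1) is set by the course material; as one example, heteroskedasticity is often checked with the Breusch-Pagan test. The sketch below computes Koenker's studentised form of the statistic, LM = n × R² from regressing the squared OLS residuals on the model's predictors, which is approximately chi-squared with as many degrees of freedom as there are slope predictors when the error variance is constant. It uses numpy only and the function names are hypothetical; a p-value could be taken from scipy.stats.chi2.sf(lm, df), and statsmodels offers a ready-made het_breuschpagan.

```python
import numpy as np


def _ols_residuals(X, y):
    """Residuals from a least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def breusch_pagan_lm(X, y):
    """Koenker's studentised Breusch-Pagan LM statistic and its degrees of freedom."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    u2 = _ols_residuals(X, y) ** 2          # squared residuals of the main model
    aux = _ols_residuals(X, u2)             # auxiliary regression on the same predictors
    r2_aux = 1.0 - (aux @ aux) / np.sum((u2 - u2.mean()) ** 2)
    n, k = X.shape
    return n * r2_aux, k - 1                # df = number of slope predictors
```

A large LM relative to the chi-squared critical value suggests the error variance changes with the predictors, in which case heteroskedasticity-robust standard errors are a common remedy.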