|
| 1 | +package com.sparkTutorial.sparkSql; |
| 2 | + |
| 3 | + |
| 4 | +public class HousePriceProblem { |
| 5 | + |
| 6 | + /* TODO: Create a Spark program to read the house data from in/RealEstate.csv, group by location, aggregate the average price per SQ Ft and max price, and sort by average price per SQ Ft. |
| 7 | +
|
| 8 | + The HOUSES dataset contains a collection of recent real estate listings in San Luis Obispo county and |
| 9 | + around it. The dataset is provided in two formats: as a CSV file and as a Microsoft Excel (1997-2003) |
| 10 | + spreadsheet. |
| 11 | +
|
| 12 | + The dataset contains the following fields: |
| 13 | + 1. MLS: Multiple listing service number for the house (unique ID). |
| 14 | + 2. Location: city/town where the house is located. Most locations are in San Luis Obispo county and |
| 15 | + northern Santa Barbara county (Santa Maria-Orcutt, Lompoc, Guadalupe, Los Alamos), but there |
| 16 | + are some out-of-area locations as well. |
| 17 | + 3. Price: the most recent listing price of the house (in dollars). |
| 18 | + 4. Bedrooms: number of bedrooms. |
| 19 | + 5. Bathrooms: number of bathrooms. |
| 20 | + 6. Size: size of the house in square feet. |
| 21 | + 7. Price/SQ.ft: price of the house per square foot. |
| 22 | + 8. Status: type of sale. Three types are represented in the dataset: Short Sale, Foreclosure and Regular. |
| 23 | +
|
| 24 | + Each field is comma separated. |
| 25 | +
|
| 26 | + Sample output: |
| 27 | +
|
| 28 | + +----------------+-----------------+----------+ |
| 29 | + | Location| avg(Price SQ Ft)|max(Price)| |
| 30 | + +----------------+-----------------+----------+ |
| 31 | + | Oceano| 1145.0| 1195000| |
| 32 | + | Bradley| 606.0| 1600000| |
| 33 | + | San Luis Obispo| 459.0| 2369000| |
| 34 | + | Santa Ynez| 391.4| 1395000| |
| 35 | + | Cayucos| 387.0| 1500000| |
| 36 | + |.............................................| |
| 37 | + |.............................................| |
| 38 | + |.............................................| |
| 39 | +
|
| 40 | + */ |
| 41 | +} |
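The aggregation the TODO asks for (group by location, average the price per square foot, take the maximum price, sort by the average) can be sketched in plain Java before wiring it into Spark. This is not the Spark solution: it uses `java.util.stream` as a stand-in to show the expected semantics, and in Spark the same shape would normally be expressed with `groupBy`, `agg`, and `orderBy` on a `Dataset<Row>`. The class name `HousePriceSketch`, the `Listing` and `LocationStats` types, and the sample rows are all hypothetical. The descending sort order is an assumption inferred from the sample output in the comment above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class HousePriceSketch {

    /** One listing, reduced to the three fields the aggregation needs (hypothetical type). */
    public static final class Listing {
        public final String location;
        public final long price;
        public final double pricePerSqFt;

        public Listing(String location, long price, double pricePerSqFt) {
            this.location = location;
            this.price = price;
            this.pricePerSqFt = pricePerSqFt;
        }
    }

    /** One output row: location, average price per sq ft, maximum price (hypothetical type). */
    public static final class LocationStats {
        public final String location;
        public final double avgPricePerSqFt;
        public final long maxPrice;

        public LocationStats(String location, double avgPricePerSqFt, long maxPrice) {
            this.location = location;
            this.avgPricePerSqFt = avgPricePerSqFt;
            this.maxPrice = maxPrice;
        }
    }

    /** Groups listings by location, aggregates, and sorts by average price per sq ft, highest first. */
    public static List<LocationStats> summarize(List<Listing> listings) {
        // Step 1: group by location.
        Map<String, List<Listing>> byLocation =
                listings.stream().collect(Collectors.groupingBy(l -> l.location));

        // Step 2: one aggregate row per location. Step 3: sort descending (assumed from the sample output).
        return byLocation.entrySet().stream()
                .map(e -> new LocationStats(
                        e.getKey(),
                        e.getValue().stream().mapToDouble(l -> l.pricePerSqFt).average().orElse(0.0),
                        e.getValue().stream().mapToLong(l -> l.price).max().orElse(0L)))
                .sorted(Comparator.comparingDouble((LocationStats s) -> s.avgPricePerSqFt).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only; these are not taken from in/RealEstate.csv.
        List<Listing> sample = Arrays.asList(
                new Listing("Town A", 300000L, 200.0),
                new Listing("Town A", 500000L, 300.0),
                new Listing("Town B", 900000L, 600.0));

        for (LocationStats row : summarize(sample)) {
            System.out.println(row.location + " | " + row.avgPricePerSqFt + " | " + row.maxPrice);
        }
    }
}
```

Working the expected rows out on a few hand-made listings like this gives a reference to compare against once the Spark version reads the real CSV.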