## Introduction and Business Understanding:

In this assignment, I perform a regression analysis to build a model of housing prices in Baton Rouge, Louisiana, with the goal of predicting listing prices for single-family homes and townhouses. JMP software is used to extract and analyze data on the 2,129 houses listed on Redfin.com. The tables in the Appendix present the variables used in the analysis: Price, Beds, Baths, Square Feet, Lot Size, Year Built, and $/SF.

## Data Understanding and Model Assumptions:

The Redfin.com file contains one continuous response variable, Price, along with approximately fifteen potential predictor variables. Because several of these showed little variation, the set was reduced to eleven (11) predictor variables for building the regression model. Ninety-nine (99) missing values were noted, caused by columns with incomplete data. Lot Size, Year Built, and HOA/Month were excluded because their data was missing. Listings with more than six (6) bedrooms or bathrooms were excluded as well. $/Sq. Ft. was not included because of bias and variation in its values.

Before the regression analysis is performed, the model assumptions must be checked. First, the relationship between each predictor variable and the response Y (Price) was examined using the Fit Y by X platform; the scatter plots show linear relationships (see the Appendix). Second, an ordinary least squares (OLS) model was fit with the predictor variables to plot the residuals against the predicted values; the plot shows no clear pattern, which is consistent with the assumption of constant error variance. Third, the studentized residuals show no indication of outliers. Fourth, the normal quantile plot of the residuals follows a roughly straight line, so the residuals may be normally distributed (see Appendix C). The final assumption concerns multicollinearity: no variable has a variance inflation factor (VIF) of approximately ten (10) or more.
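
The VIF check can be illustrated with a small sketch. JMP reports VIFs directly, so the Python below is only an illustration of the idea, not the procedure used in this report. It assumes a model with exactly two predictors, where the VIF reduces to 1 / (1 - r²) and r is the correlation between the two predictors. The function names and the sample values are hypothetical and are not taken from the Redfin data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF of either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical values for illustration only (not the Redfin extract).
sqft = [1200, 1500, 1800, 2100, 2600, 3000]
beds = [3, 2, 3, 4, 3, 5]

vif = vif_two_predictors(sqft, beds)
print(f"VIF = {vif:.2f}")
print("possible multicollinearity" if vif >= 10 else "below the usual cutoff of 10")
```

With more than two predictors, each VIF would instead come from regressing one predictor on all of the others, which is what statistical software does internally.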

## Modeling- Explaining:

When multicollinearity occurs, the set of variables must be reduced before the response can be explained. To select a model, I looked for a high R² using the stepwise stopping rules. Backward selection with the minimum BIC rule selected Sq. Ft., Longitude, Latitude, Bedrooms, and Bathrooms. This model has an R² of 0.7867, an adjusted R² of 0.7854, and an RMSE of 143,604. Forward selection with minimum BIC produced the same results. The minimum AIC rule produced an R² of 0.7874 and an adjusted R² of 0.7859. The regression model explains price in terms of these factors. The adjusted R² is reported as 77.0%; it is lower than R² because it accounts for the number of variables, whereas R² itself increases whenever a variable is added.
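
The relationship between R² and adjusted R² can be made concrete with the standard formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of predictors. The sketch below is illustrative only: the R² value comes from the minimum AIC model above, but the sample size `n_rows` is a hypothetical placeholder, since the number of rows actually used in the fit is not stated here.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fit on n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

r2_min_aic = 0.7874   # R^2 reported for the minimum AIC model
n_rows = 1000         # hypothetical sample size, NOT taken from the report
n_predictors = 6      # assumed number of predictors in the model

adj = adjusted_r2(r2_min_aic, n_rows, n_predictors)
print(f"R^2 = {r2_min_aic:.4f}, adjusted R^2 = {adj:.4f}")
```

Because the penalty term grows with p, adding a weak predictor can raise R² slightly while lowering adjusted R².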

The RMSE is approximately 143,534, which was compared with the models generated using the stopping rules. Most importantly, the coefficients explain the association between each predictor and price, which may or may not reflect causation. For example, Sq. Ft. has a coefficient of 272.30: on average, if the size of a home increases by one (1) square foot, with the other predictors held constant, the price of the home increases by approximately $272. This shows how multiple linear regression is used to estimate the average effect of each predictor variable on the response variable.
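
As a worked illustration of the Sq. Ft. coefficient, the sketch below computes the expected change in price for a given change in square footage, holding the other predictors constant. The coefficient is the one reported above; the function name and the example of 100 extra square feet are hypothetical.

```python
SQFT_COEF = 272.30  # dollars per additional square foot, as reported above

def expected_price_change(extra_sqft, coef=SQFT_COEF):
    """Average change in predicted price for a change in square footage,
    with all other predictors held constant."""
    return coef * extra_sqft

# Hypothetical example: a home that is 100 square feet larger.
change = expected_price_change(100)
print(f"Expected price difference: ${change:,.2f}")
```

This is an average association across the listings, so it should not be read as the amount any single home would gain from an addition.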

## Predictive Regression Model:

The final step is to build a predictive model for the prices of homes listed on the market in Baton Rouge. The Validation column assigned approximately sixty percent of the data to training and forty percent to validation. I selected the model that best predicted the validation set. In addition to the previous stopping rules, the Max Validation R² rule was applied, along with K-fold cross-validation with K = 3. The R² and RASE values on the validation set were used to compare the models and determine the best choice (see Appendix G). The Max Validation R² model had a greater R² of 0.8093 and a lower RASE of 145,866, compared with the minimum BIC model's R² of 0.809 and RASE of 146,005. The variables in the Max Validation R² model are Sq. Ft., Zip Code, Longitude, Latitude, Bedrooms, and Bathrooms. Multiple linear regression models were thus used both to explain how specific predictor variables determine price and to predict the prices of new homes added to the market in Baton Rouge, Louisiana.
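
The validation ideas in this section can be sketched in a few lines of Python. JMP handles the validation column, K-fold splitting, and RASE internally, so the code below is only an illustration under simple assumptions: a random 60/40 split, folds formed by taking every K-th row, and RASE computed as the square root of the average squared prediction error. All function names are hypothetical.

```python
import math
import random

def rase(actual, predicted):
    """Root average squared error: sqrt(mean((y - y_hat)^2))."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))

def train_validation_split(n_rows, train_fraction=0.6, seed=0):
    """Randomly assign row indices to training and validation sets."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = round(n_rows * train_fraction)
    return idx[:cut], idx[cut:]

def k_fold_indices(n_rows, k=3):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

# Hypothetical usage on ten rows.
train_rows, valid_rows = train_validation_split(10)
print(len(train_rows), "training rows,", len(valid_rows), "validation rows")
for train, valid in k_fold_indices(9, k=3):
    print("validation fold:", valid)
```

A lower RASE on the validation rows, rather than on the training rows, is what indicates that a model generalizes better to new listings.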

## Conclusion:

In conclusion, multiple linear regression models were used in both scenarios: to explain the influence of the predictor variables on price, and to predict the prices of homes newly added to the listings in Baton Rouge. In each case, stopping rules in JMP were applied to reduce dimensionality where multicollinearity existed. For instance, the adjusted R² was lower in the Standard Least Squares model. The minimum AIC model, with six (6) of seven (7) variables, was selected because it had a more reasonable RASE and R² on the validation column. Most importantly, the model could be improved by resolving the missing values for variables such as Lot Size and Year Built. These variables carry a lot of weight in determining the price of a home on the MLS market. The minimum AIC analysis produced an R² of 0.7874, an adjusted R² of 0.7859, and an RMSE of 143,453. Finally, the p-value threshold rule produced results similar to minimum AIC. The lower RMSE and AIC were obtained with the least squares regression run on all variables.