# How to Do Multiple Linear Regression in Power BI ## View the tutorial in the Power BI Dashboard or keep scrolling for text!

Updated July 2022 with instructions on creating visual in “Results” page!

## What is Multiple Linear Regression?

Multiple Linear regression is a statistical model that predicts an event based on several explanatory variables. Each variable represents a factor that influences the event and is assigned a weighted value.

Multiple linear regression is an extension of simple linear regression. Simple Linear regression predicts the outcome of an event based on a dependent variable. Calculating each weighted value is done using historical data.

If you would like to learn more about Simple Linear Regression, check out this blog dashboard for a tutorial on How to implement Simple Linear Regression in Power BI.

Use Cases for Multiple Linear Regression:

Estimate blood pressure. We can use multiple linear regression to predict blood pressure for a person based on factors like height, weight, age, and hours of exercise per week.

Forecast crop growth. Agriculturists can use multiple linear regression to create the equation of a line formula using factors such as rainfall, temperature, and amount of fertilizer to predict the effect on crop growth.

Sports Scoring. Sports analysts can use multiple linear regression to predict scoring points for players based on how many hours of yoga, weightlifting, cardio, or low-intensity exercises they do.

## Let’s Do Some Predictions!

Multiple Linear Regression uses the equation:

Y = b0 + b1x1 + b2x2 + … + bpxp

Where:

Y = The predicted value (Price)

b0 = Y – intercept

b1 … bp = Slope coefficients for each explanatory variable

x1… xp = Explanatory variables (Bedrooms, Bathrooms, square footage)

We are going to predict the estimated price of a house based on the number of bedrooms, bathrooms, and square footage selected.

To do this, we calculate the values of the slope coefficients (b1 …bp) and the Y – intercept(b0) using R script within Power Query Editor. Then, we are going to create what if parameters as our explanatory variables (x1… xp). And finally, we will complete the formula as a measure and visualize our predictions.

## Steps We Need to Take:

1. Create R script in Power Query Editor

2. Create What if Parameters

3. Create measure of Multiple regression Formula

## Data Prep

For this tutorial I have chosen a simple dataset based on housing prices in King Country USA from Kaggle.com.

Disclaimer: This dataset was chosen solely for a simplified demonstration of multiple Linear Regression. To implement successful multiple linear regression, your dataset MUST follow the 4 assumptions of regression. To Learn more about how to test for the 4 assumptions, click here

For multiple regression analysis, it’s best to remove all columns from the dataset that are not included in our formula. These columns can cause an impact on our results. Our formula is Price = bedrooms(x) + bathrooms(y) + square-feet(z), therefore I have removed all columns that are not Price, bedrooms, bathrooms, or square feet. You can do this in excel before importing your data or in Power Query Editor in the Power BI Desktop.

## Step 1: Create R Script in Power Query Editor

Load your dataset into Power BI. On the Home ribbon, click Transform Data

In Power Query Editor, with your table selected in the queries pane, go to the Transform ribbon and Click Run R Script.

Replace the default text that appears in the R script box, with the script below:

```model <- lm(price ~ . , dataset)
df<- data.frame(coef(model))
names(df)[names(df)=="coef.model."] <- "coefficients"
df['variables'] <- row.names(df)
```

For more information on Using R in Power BI, check out this blog post by Stephen Zielke.

Press Ok, and your result should be a table that looks like the one below:

The Coefficients from the table fit our multiple linear regression equation!

Y = b0 + b1x1 + b2x2 + … + bpxp

Y(Price) = 74662.1 – 57906.6(bedrooms) + 7928.7(bathrooms) +309.6(Sqft_living)

# Step 2: Create What If Parameters

Now that we have our coefficients(b0,b1…bp), we will create What if parameters for our variables(x1…xp).

A What-if parameter allows you to create an interactive slicer with a variable to visualize and quantify another value in your report.

In the Modeling ribbon click New Parameter

Name: Bathrooms
Data type: Whole Number
Minimum: 0
Maximum: 15
Increment: 1

Name: Bedrooms
Data type: Whole Number
Minimum: 0
Maximum: 15
Increment: 1

Name: Square footage
Data type: Whole Number
Minimum: 0
Maximum: 15
Increment: 1

These what if parameters will have generated a series of possible values for bedrooms, bathrooms and square footage that we can select based on our preference.

## Step 3: Create a Measure for the Regression Formula

With our Multiple Regression formula from Step 1:

Y(Price) = 74662.1 – 57906.6(bedrooms) + 7928.7(bathrooms) +309.6(Sqft_living)

We can now create a DAX measure of this formula using our what if parameters as the variables.

Create the DAX measure below:

`Price =74662.1 - 57906.6 * ( 'Bedrooms'[Bedrooms Value] ) + 7928.7 * ( 'Bathrooms'[Bathrooms Value] ) + 309.6 * ( 'Sqft_Living'[Sqft_Living Value] )`

Insert a card visual in the report and select the measure you have just created for the field.

Check out the Power BI Dashboard at the top of the blog to try out the House Price Predict-o-matic!

## Bonus Challenge!

Try adding these variables to the Multiple Linear Regression Equation!

Number of floors
Number of views
Square footage of the lot

Hint! Here is the new DAX formula:

`Price =76114.2 + 5237.4 * ( Bathrooms[Bathrooms Value] ) - 49819.3 * ( Bedrooms[Bedrooms Value] ) + 9491.5 * ( Floors[Floors Value] ) + 285.6 * ( Sqft_Living[Sqft_Living Value] ) - 0.81 * ( Sqft_lot[Sqft_lot Value] ) + 98795.2 * ( Views[Views Value] )`

## Conclusion

In this blog, we harnessed the power of the R engine within Power BI to create a multiple regression formula and predict the price of a house. Multiple linear regression is used for forecasting future events based on historical data and is extremely useful for preparation and decision making. I hope you enjoyed this tutorial on how to use Multiple Regression in Power BI. Try it yourself and let me know your results!

## Resources

Housing Dataset

Four assumptions of multiple regression that researchers should always test

What is Multiple Linear Regression

Multiple Linear Regression in Power BI

## More Stats Blogs!

How to do Linear Regression in Power BI

An Introduction to DAX Statistical Functions

Creating a Simple Naïve Bayes Predictive Model in Power BI

• Hi , thank you for such a lovely article. Could you please share the pbix file please.
Thank you

• Hi Sakeenah,

Thank you for your inquiry! We are working to create a .pbix file to share publicly which should be added to the article in a few weeks’ time.

Renata.

• Hi,

Thank you for such a helpful analysis.

Is there any way of getting the p-values of the coefficients?

Thank you!

• Hello Monica,

I followed up with Diana and this was her response:

Thank you for reading the article!

Unfortunately, I have not found a way to get the p-values of the coefficients in Power BI, however here are some resources I found helpful for getting the P-values in R!

Hope this helps!

• Bravo on the excellent article! I had the same question about getting p-value for coefficients. Do you know what the impediment is? Because obviously you can get them in the R output. Why can’t you get the p-value in the output of the R script in the query editor?

• Hello Michael!
Thank you so much!
Unfortunately, R script within Power BI does not have the full functional capabilities of the R program itself. For multiple linear regression it was easier for Power BI to provide us with a clear table of our values.
Let’s hope that in the future this will be possible!

• Hello Diana,

Thank you and congratulations for the article, it is great! I am new in this kind of analytics, and my company ask me to create a model to predict employee leavings based on several variables per employee like year with the same salary, with the same manager etc.

Is your model the base to get what I need?, thank you in advance

Regards

• Hello Ivan!

I am so happy you enjoyed the article.

This is a model that you can use for predictive analytics however you must validate that a Multiple Linear Regression model can be fit to the data before using it. If your data is not eligible for this type of model, your analysis will not be valid.
This article by Neal A. Akyildirim gives a great summary of the steps (using R) in order to validate that your data meets the requirements and also creating a multiple linear regression model.

• Thank you Diana

I appreciate your answer, i have seen the related link that you shared with, it look very nice, I have a lot of concepts to study but it look that I am in the right way! 🙂

Thank you gain for your help

Best regards