How to Do Multiple Linear Regression in Power BI

multiple linear regression

Sections 

View the tutorial in the Power BI Dashboard or keep scrolling for text!

Updated July 2022 with instructions on creating visual in “Results” page!

What is Multiple Linear Regression? 

Multiple Linear regression is a statistical model that predicts an event based on several explanatory variables. Each variable represents a factor that influences the event and is assigned a weighted value.

Multiple linear regression is an extension of simple linear regression. Simple Linear regression predicts the outcome of an event based on a dependent variable. Calculating each weighted value is done using historical data.   

If you would like to learn more about Simple Linear Regression, check out this blog dashboard for a tutorial on How to implement Simple Linear Regression in Power BI.  

Use Cases for Multiple Linear Regression: 

Estimate blood pressure. We can use multiple linear regression to predict blood pressure for a person based on factors like height, weight, age, and hours of exercise per week. 

Forecast crop growth. Agriculturists can use multiple linear regression to create the equation of a line formula using factors such as rainfall, temperature, and amount of fertilizer to predict the effect on crop growth. 

Sports Scoring. Sports analysts can use multiple linear regression to predict scoring points for players based on how many hours of yoga, weightlifting, cardio, or low-intensity exercises they do. 

Let’s Do Some Predictions! 

Download the pbix file to follow along here.

Multiple Linear Regression uses the equation:  

Y = b0 + b1x1 + b2x2 + … + bpxp 

Where:  

Y = The predicted value (Price) 

b0 = Y – intercept  

b1 … bp = Slope coefficients for each explanatory variable 

x1… xp = Explanatory variables (Bedrooms, Bathrooms, square footage) 

We are going to predict the estimated price of a house based on the number of bedrooms, bathrooms, and square footage selected.  

To do this, we calculate the values of the slope coefficients (b1 …bp) and the Y – intercept(b0) using R script within Power Query Editor. Then, we are going to create what if parameters as our explanatory variables (x1… xp). And finally, we will complete the formula as a measure and visualize our predictions. 

Steps We Need to Take: 

1. Create R script in Power Query Editor  

2. Create What if Parameters 

3. Create measure of Multiple regression Formula  

Data Prep 

For this tutorial I have chosen a simple dataset based on housing prices in King Country USA from Kaggle.com. 

Disclaimer: This dataset was chosen solely for a simplified demonstration of multiple Linear Regression. To implement successful multiple linear regression, your dataset MUST follow the 4 assumptions of regression. To Learn more about how to test for the 4 assumptions, click here 

For multiple regression analysis, it’s best to remove all columns from the dataset that are not included in our formula. These columns can cause an impact on our results. Our formula is Price = bedrooms(x) + bathrooms(y) + square-feet(z), therefore I have removed all columns that are not Price, bedrooms, bathrooms, or square feet. You can do this in excel before importing your data or in Power Query Editor in the Power BI Desktop.  

multiple linear regression
All rows of the dataset are not represented in this picture.

To Download a version of this dataset that is prepped for Step 1 of the tutorial, click here.

Step 1: Create R Script in Power Query Editor  

Load your dataset into Power BI. On the Home ribbon, click Transform Data

multiple linear regression

In Power Query Editor, with your table selected in the queries pane, go to the Transform ribbon and Click Run R Script. 

multiple linear regression

Replace the default text that appears in the R script box, with the script below:  

model <- lm(price ~ . , dataset) 
df<- data.frame(coef(model)) 
names(df)[names(df)=="coef.model."] <- "coefficients" 
df['variables'] <- row.names(df) 

For more information on Using R in Power BI, check out this blog post by Stephen Zielke. 

multiple linear regression

Press Ok, and your result should be a table that looks like the one below:

multiple linear regression

The Coefficients from the table fit our multiple linear regression equation! 

Y = b0 + b1x1 + b2x2 + … + bpxp 

Y(Price) = 74662.1 – 57906.6(bedrooms) + 7928.7(bathrooms) +309.6(Sqft_living) 

To return to the Power BI Desktop, click Close and Apply 

multiple linear regression

Step 2: Create What If Parameters  

Now that we have our coefficients(b0,b1…bp), we will create What if parameters for our variables(x1…xp).

A What-if parameter allows you to create an interactive slicer with a variable to visualize and quantify another value in your report.

In the Modeling ribbon click New Parameter

multiple linear regression

Name: Bathrooms  
Data type: Whole Number  
Minimum: 0 
Maximum: 15 
Increment: 1 
Add slicer to this page: yes

multiple linear regression

Name: Bedrooms  
Data type: Whole Number  
Minimum: 0 
Maximum: 15 
Increment: 1 
Add slicer to this page: yes 

multiple linear regression

Name: Square footage 
Data type: Whole Number  
Minimum: 0 
Maximum: 15 
Increment: 1 
Add slicer to this page: yes

multiple linear regression

These what if parameters will have generated a series of possible values for bedrooms, bathrooms and square footage that we can select based on our preference.  

Step 3: Create a Measure for the Regression Formula  

With our Multiple Regression formula from Step 1:  

Y(Price) = 74662.1 – 57906.6(bedrooms) + 7928.7(bathrooms) +309.6(Sqft_living) 

We can now create a DAX measure of this formula using our what if parameters as the variables.  

Create the DAX measure below:  

Price =
74662.1 - 57906.6 * ( 'Bedrooms'[Bedrooms Value] ) + 7928.7 * ( 'Bathrooms'[Bathrooms Value] ) + 309.6 * ( 'Sqft_Living'[Sqft_Living Value] )

Insert a card visual in the report and select the measure you have just created for the field.  

multiple linear regression

Check out the Power BI Dashboard at the top of the blog to try out the House Price Predict-o-matic!

multiple linear regression

Bonus Challenge! 

Try adding these variables to the Multiple Linear Regression Equation!

Number of floors
Number of views
Square footage of the lot

To Download a version of this dataset that is prepped for the Bonus challenge, click here.

Hint! Here is the new DAX formula:

Price =
76114.2 + 5237.4 * ( Bathrooms[Bathrooms Value] ) - 49819.3 * ( Bedrooms[Bedrooms Value] ) + 9491.5 * ( Floors[Floors Value] ) + 285.6 * ( Sqft_Living[Sqft_Living Value] ) - 0.81 * ( Sqft_lot[Sqft_lot Value] ) + 98795.2 * ( Views[Views Value] )
multiple linear regression

Conclusion  

In this blog, we harnessed the power of the R engine within Power BI to create a multiple regression formula and predict the price of a house. Multiple linear regression is used for forecasting future events based on historical data and is extremely useful for preparation and decision making. I hope you enjoyed this tutorial on how to use Multiple Regression in Power BI. Try it yourself and let me know your results! 

Resources

Housing Dataset

Multiple Linear Regression Equation  

Four assumptions of multiple regression that researchers should always test

What is Multiple Linear Regression

4 Examples of Using Regression in Real Life  

Multiple Linear Regression in Power BI

More Stats Blogs!

How to do Linear Regression in Power BI

An Introduction to DAX Statistical Functions

Creating a Simple Naïve Bayes Predictive Model in Power BI

10 Comments

  • Hi , thank you for such a lovely article. Could you please share the pbix file please.
    Thank you

    Reply
    • Hi Sakeenah,

      Thank you for your inquiry! We are working to create a .pbix file to share publicly which should be added to the article in a few weeks’ time.

      Renata.

      Reply
  • Hi,

    Thank you for such a helpful analysis.

    Is there any way of getting the p-values of the coefficients?

    Thank you!

    Reply
      • Bravo on the excellent article! I had the same question about getting p-value for coefficients. Do you know what the impediment is? Because obviously you can get them in the R output. Why can’t you get the p-value in the output of the R script in the query editor?

        Reply
        • Hello Michael!
          Thank you so much!
          Unfortunately, R script within Power BI does not have the full functional capabilities of the R program itself. For multiple linear regression it was easier for Power BI to provide us with a clear table of our values.
          Let’s hope that in the future this will be possible!

          Reply
  • Hello Diana,

    Thank you and congratulations for the article, it is great! I am new in this kind of analytics, and my company ask me to create a model to predict employee leavings based on several variables per employee like year with the same salary, with the same manager etc.

    Is your model the base to get what I need?, thank you in advance

    Regards

    Reply
    • Hello Ivan!

      I am so happy you enjoyed the article.

      This is a model that you can use for predictive analytics however you must validate that a Multiple Linear Regression model can be fit to the data before using it. If your data is not eligible for this type of model, your analysis will not be valid.
      This article by Neal A. Akyildirim gives a great summary of the steps (using R) in order to validate that your data meets the requirements and also creating a multiple linear regression model.

      Reply
      • Thank you Diana

        I appreciate your answer, i have seen the related link that you shared with, it look very nice, I have a lot of concepts to study but it look that I am in the right way! 🙂

        Thank you gain for your help

        Best regards

        Reply
  • Thanks very much for the very helpful explanation.

    Is there a way to use the coefficients of the regression automatically in the DAX formula? Instead of writing the values “hard” into the DAX formula?

    Because if we can only write them in the DAX formula hard, we have to do it always again, if the data changes, and then there is no automation in it. And then again, the regression is done much easier in Excel itself with the data analysis tool than with R.

    Reply

Leave a Reply to Michael Cancel reply

Your email address will not be published.

Diana Arciniega is an Analytics Consultant for Iteration Insight. She has a Bachelor’s of Science from the University of Calgary, majoring in Actuarial Science and Minoring in Statistics and a certificate in Business Intelligence and Data Analytics from SAIT. She is very passionate about Storytelling with data and empowering others throughout their data-driven journey. In her spare time, she loves reading fantasy and science fiction and playing board games with friends.

Iterate With Us

Signup for the Iteration Insights newsletter for access to recent news and other updates.

Related Posts

Preparing Your Company for AI

Artificial Intelligence (AI) is big news, bringing the promise and peril of transformational change. Companies

Share on

Next Event

Let's get started with a consultation

get started with training

Sign up for our newsletter