Fix the Date! String Manipulation and Date Formatting in Power BI

string manipulation

When it comes to data types, Power BI provides us with the basics for efficiently storing measurable aspects of our entities and business processes.

Here are the different data types in Power BI:

  • Whole Number
  • Decimal and Percentage
  • Date, Date/Time, Date/Time/Time-Zone and Time
  • Text
  • Boolean (True/False)
  • Currency

The most dynamic of these will be the Text Data type, which is commonly known as strings in most programming languages. The terms text and strings would be used interchangeably for the duration of this article.

Strings are a series of characters used to represent a value or classification of an attribute or label an aspect of an entity.

They differ from other data types as they can represent every other data type. This flexibility makes them the ideal data type for intermediary storage and manipulation of other data types, for example, dates or serial numbers.

In this blog, we will look at an example of string manipulation to modify a date format in Power BI to achieve consistency and learn a few Text Functions in M and a try-otherwise statement.

This tutorial assumes that you have experience cleaning and transforming data using Power Query Editor in Power BI.

Possible Scenarios: When could we need string format manipulation?

There are many scenarios where you will need to convert data formats in Power BI.

Imagine a situation where you are provided data that is combined from various geographic regional settings, each with a localized date/time format. The data is being aggregated without returning to a generic format first.

Another example would be a change of format in the interface of your existing data entry program from DD/MM/YYYY to MM/DD/YY and the old data is not updated to match the new.

Or worse still, the data is being entered manually with no way to enforce standards. As a result, the data is inconsistent, and we are unable to figure out the correct values. In all of these scenarios, it is best to use string manipulation in order to create consistency in the data.

In this example, we have a dataset that is based on the last scenario. Sometimes the values in the table are presented as Date values, and other times as Date-Time values, as showcased in Figure 1.

Figure 1 Mixed Formats

A table of Dates that show the inconsistent formats.
Row 1 2020-02-13T08:22:23-06:00
Row 1 5/13/20
Figure 1. Mixed Formats

By default, Power BI reads this column as String due to its inconsistent format. Forcing the column to a Date type will not work as Power BI only supports one data type per Column.

A screenshot of a Power BI Column that shows each row and a error.
Figure 2. Power BI Fails to convert to Date

If you have had any experience with data clean-up in Power BI, you might reach for the powerful Columns From Example feature.

string manipulation
Figure 3. Hardcoded transformations based on conditional tests

To our collective disappointment, the tool does not pick up any pattern. Meaning, we are forced to define every possible representation of each date. As such, the solution cannot be automated because it must be custom-tailored to suit each occurrence of the problem. This is a time-consuming approach that has zero scalability.

One viable alternative is to take multiple steps through the tools available in the Power Query Editor and then split the String (Date) into its components Day/Month/Year based on the presence of the “-“ or “/” delimiters. Then reassemble as desired.

This is the basis of the solution chosen in this blog.

Should this solution be implemented using the Power Query Editor drop-down options, the multiple button clicks will clutter the Applied Steps section in the editor. It gets terribly busy, if not messy when the process is repeated multiple times on the same table.

For a more manageable solution, we can implement the process using M.

Related Articles

Solution and Implementation

Before we begin, it is crucial, that we spend some time thinking through the steps needed to complete this transformation in Power BI.

We exploit the ability to chain multiple Text functions provided by Power BI as well as Concatenation so we can tear and rebuild.

Through Iterate, we apply the solution to each entry of the selected column. Then, convert the data type of the Date values and trim off the Time/Zone aspects of the Date/Time/Time-Zone format.

Before making any transformations on Text columns, it is considered good practice to trim them; that is, to remove the whitespaces that lead or trail the String for consistency and sanity preservation purposes.

These are the steps we will follow:

Step 1: Search for the column delimiter using the function Text.Contains([column], text).

Looking at our Date Column we notice there are two distinct data types:

Date/Time/TimeZone and Date.

The date aspects in both examples are of different formats. To rectify this, we must ensure we apply the proper fixes based on the specific format. One can quickly spot that the Date type used the “/” delimiter to separate its components.

Figure 1 Mixed Formats

A table of Dates that show the inconsistent formats.
Row 1 2020-02-13T08:22:23-06:00
Row 1 5/13/20
Figure 4. Mixed Formats

To search for this, we can check if a character at a specific index is “/”.

This works only if the entered data is predictable, and the character consistently appears at that position in the String.

In this case, we will employ the Text.Contains() function as it is a flexible tool capable of checking the existence of a set of characters in a String without having to know the specifics of its contents.

If Text.Contains([Date],”/”) then …

Step 2: Split apart the column components using the function Text.Split(text, delimiter).

Should the If statement return true, meaning it found the “/” character in the String, we split the date into its parts (Day/Month/Year).

Text.Split([Date],”/”)

When the String is split, we need an index to access each section of its division. In some cases, we do not know how many fragments the String will be divided into but since we are familiar with this data we know that it should look like this:

DateDayMonthYear
5/13/2013520
Date split into its component parts.

In this example, I decided against using variables to store temporary calculations in M and so we are forced to relate each component to an index number. We then run the function and refer to the component by selecting the index:

IndexTextCode
0MonthText.Split([Date], “/”) {0}
1DayText.Split([Date], “/”) {1}
2YearText.Split([Date], “/”) {2}

Using our derived indices, we can now go ahead with reordering the date components and combining them via Concatenation only when it is the right data as dictated by the earlier If-statement.

#"New Date" =
  "20"                                          //Adds the ‘20’ prefix                      
   &
 Text.Split([Dates], "/"){2}                              //Year        
   &
 "-" & Text.PadStart(Text.Split([Dates], "/"){0}, 2, "0") //Month       
   &
 "-" & Text.PadStart(Text.Split([Dates], "/"){1}, 2, "0") //Day     

Step 3: Use Text.PadStart(text, size, padValue) to insert a 0 before single digits.

Since our desired format is a DD/MM/YYYY we want all single digits to be preceded by a “0”. This is achieved through the function Text.PadStart.

Refer to Figure 5 for the transformation code for this step.

Step 4: Use Text.Range(Column, startIndex, length) to Trim the column.

This code is used to remove Time/Time-Zone from the Date/Time/Time-Zone format.

Text.Range([Date], 0, 10)

The entire If Statement is wrapped in a Try-Otherwise Statement in case the model of our solution does not account for every single variation, so it fails gracefully. Meaning that it returns a null if it encounters an unknown date format.

// MM/DD/YY to YYYY-MM-DD {0}/{1}/{2} to {2}-{0}-{1}                                                    
  #"New Date" = Table.AddColumn(
    #"Trimmed Text",
    "tempDate",
    each try

        if Text.Contains([Dates], "/") then
          "20"
            &
 Text.Split([Dates], "/"){2}
            &
 "-"
            &
 Text.PadStart(Text.Split([Dates], "/"){0}, 2, "0")
            &
 "-"
            &
 Text.PadStart(Text.Split([Dates], "/"){1}, 2, "0")
        else

          Text.Range([Dates], 0, 10)
    otherwise
      null

  )

in

  #"New Date"

Once the step is finished, you should see the following output.

string manipulation
Figure 6. Transformed Data

Step 5: Delete the source column and rename the tempDate column.

The Final steps involve us dropping the Dates column and renaming the tempDate column back to Dates. We then converting from the data type from String to Date. In the end, you will be left with the transformed data in a column named Dates.

string manipulation
Figure 7. Final output

Conclusion

We can see from this simple example just how flexible Strings are and the level of manipulation Power BI provides us through M.

While this works, the ideal string manipulation solution should involve saving these steps as a custom function and invoking it on individual columns as opposed to re-writing the steps for each column in need of a date format change.

Stay Up to Date with Our Latest Blogs

Subscribe to our mailing list to be notified when a new article is released.

What Next?

This solution relies on detecting special characters in the a String, an alternative could involve working solely off string length.

Aside from dates, consider using this technique to generate a username based on the text that comes before the “@” in an email address. This is a much simpler problem but utilizes the same tools and approach to solving it.

Tips

Here are a few things to consider when working in Power Query:

Tip 1 – Wrap Uncertain Code (M)

Due to uncertainty of data quality and large sizes of data to be consider, it is generally good practice to wrap uncertain code in a Try-Otherwise Statement as it allows the code to fail graciously.

Tip 2 – Trim String Values

Trim Text files before manipulating them as white space characters will influence transformations. It also grants you precision when comparing values.

Tip 3 – Code Readability

M code can be indented for readability. See Figure 5 for an example of code indentation. This is particularly useful when the code is long and involving multiple branches.

Tip 4 – Step Name and Description

When writing M code as a step in your transformation, name the steps appropriately so they can be easily identified in the Applied Steps panel. Adding comments to the code and descriptions to the steps assist with understanding the complexity of the applied steps, especially when working in a collaborative environment.

Don’t miss out on the opportunity to take your Power BI skills to the next level! Click here now to explore our training and events, and learn how to unleash the full potential of this powerful data tool!

Stay Up to Date with Our Latest Blogs

Subscribe to our mailing list to be notified when a new article is released.

8 Comments

  • Thank you, Victor. It is very informative. I had this issue before and did try with split the columns. It was completely yet not as elegantly as yours. I tried to keep MMM in month column. It turned out the month column is not updated when new data appended and month names change.
    I will try your method.
    Thanks for sharing.
    Ruth

    Reply
    • Hi Ruth,
      Thanks for the kind compliments. Let me know if this solution worked for you.
      I am can provide tips if I can see some sample data.
      Regards
      Victor

      Reply
  • Victor, I love this post and the idea behind it. Unfortunately I have been trying your code out in the Advanced Editor all day. I don’t get any errors but it seems to not add the new column. Just wondering if anyone else has had any problems. I basically have the code in step 4 word for word with proper indentation.
    Anyway, Iteration Insights, keep the great posts coming. Love the learning.
    Regards,
    Steve

    Reply
    • Hi Steve,
      Thanks for attempting the tutorial and for the kind words. Would you be willing to share the entire Power Query you wrote?
      Without seeing your code, I can only assume you have a different name for the step you applied before the #”Trim Text” or #”Add Column” steps.
      The flow should be something like this:
      Source = ...
      #"Remove Other Columns" = Operation (Source,....
      #"Trim Text" = Operation (#"Remove Other Columns",....
      #"Add Column" = Operation (#"Trim Text",....

      It is important to ensure continuity between steps i.e. The current step must reference the previous step. Power Query discards the results of unconnected steps.

      Looking forward to your reply.
      Regards
      Victor

      Reply
      • Thanks Victor, here is my code(btw, I kept your comments in my text as well, good practice):
        let
        Source = Csv.Document(File.Contents(“C:\Users\Steve\Downloads\dateSample.csv”),[Delimiter=”,”, Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
        #”Promoted Headers” = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        #”Changed Type” = Table.TransformColumnTypes(#”Promoted Headers”,{{“id”, Int64.Type}, {“Dates”, type text}}),
        #”Removed Other Columns” = Table.SelectColumns(#”Changed Type”,{“Dates”}),
        #”Trimmed Text” = Table.TransformColumns(#”Removed Other Columns”,{{“Dates”,Text.Trim, type text}}),
        // MM/DD/YY to YYYY-MM-DD {0}/{1}/{2} to {2}-{0}-{1}
        NewDate = Table.AddColumn(#”Trimmed Text”, “tempDate”,
        each
        try
        if Text.Contains([Date],”/”) then
        “20” & Text.Split([Date],”/”){2}
        & “-”
        & Text.PadStart(Text.Split([Date],”/”){0},2,”0″)
        & “-”
        & Text.PadStart(Text.Split([Date],”/”){1},2,”0″)
        else Text.Range([Date], 0, 10)
        otherwise
        null
        )
        in
        #”Trimmed Text”

        Reply
        • Hi Steve,
          Thanks for the code. There were 2 small errors.
          I had a name inconsistencies in the sample code. Date instead of Dates. This has been fixed.
          The second error comes from your code returning the values of a previous step #”Trimmed Text” and not the latest step NewDate
          Ensure the very last line in the code says NewDate.
          Hope this helps
          Regards
          Victor

          Reply
          • Hi Victor!
            Thank you so much! I tried with ‘Dates’ before as well and it did not work. It was the NewDate that did it!! Absolutely practical solution to a problem we routinely face.
            Keep up the great work!
            Regards,
            Steve

Leave a Reply to Victor Lajide Cancel reply

Your email address will not be published.

Victor is an Analytics Consultant for Iteration Insights. With a Computer Science background and a certificate in Business Intelligence and Data Analytics, Victor finds elegant and efficient solutions that scale to serve his client's needs. Be it on the Front-End or Back-End of analytics development.

Iterate With Us

Signup for the Iteration Insights newsletter for access to recent news and other updates.

Related Posts

Share on

Next Event

Let's get started with a consultation

get started with training

Sign up for our newsletter