string article

Fix the Date! String Manipulation and Date Formatting in Power BI

When it comes to data types, Power BI provides us with the basics for efficiently storing measurable aspects of our entities and business processes.

Here are the different data types in Power BI:

  • Whole Number
  • Decimal and Percentage
  • Date, Date/Time, Date/Time/Time-Zone and Time
  • Text
  • Boolean (True/False)
  • Currency

The most dynamic of these will be the Text Data type, which is commonly known as strings in most programming languages. The terms text and strings would be used interchangeably for the duration of this article.

Strings are a series of characters used to represent a value or classification of an attribute or label an aspect of an entity.

They differ from other data types as they can represent every other data type. This flexibility makes them the ideal data type for intermediary storage and manipulation of other data types, for example, dates or serial numbers.

In this blog, we will look at an example of string manipulation to modify a date format in Power BI to achieve consistency and learn a few Text Functions in M and a try-otherwise statement.

This tutorial assumes that you have experience cleaning and transforming data using Power Query Editor in Power BI.

Possible Scenarios: When could we need string format manipulation?

There are many scenarios where you will need to convert data formats in Power BI.

Imagine a situation where you are provided data that is combined from various geographic regional settings, each with a localized date/time format. The data is being aggregated without returning to a generic format first.

Another example would be a change of format in the interface of your existing data entry program from DD/MM/YYYY to MM/DD/YY and the old data is not updated to match the new.

Or worse still, the data is being entered manually with no way to enforce standards. As a result, the data is inconsistent, and we are unable to figure out the correct values. In all of these scenarios, it is best to use string manipulation in order to create consistency in the data.

In this example, we have a dataset that is based on the last scenario. Sometimes the values in the table are presented as Date values, and other times as Date-Time values, as showcased in Figure 1.

Figure 1 Mixed Formats

A table of Dates that show the inconsistent formats.
Row 1 2020-02-13T08:22:23-06:00
Row 1 5/13/20
Figure 1. Mixed Formats

By default, Power BI reads this column as String due to its inconsistent format. Forcing the column to a Date type will not work as Power BI only supports one data type per Column.

A screenshot of a Power BI Column that shows each row and a error.
Figure 2. Power BI Fails to convert to Date

If you have had any experience with data clean-up in Power BI, you might reach for the powerful Columns From Example feature.

string manipulation
Figure 3. Hardcoded transformations based on conditional tests

To our collective disappointment, the tool does not pick up any pattern. Meaning, we are forced to define every possible representation of each date. As such, the solution cannot be automated because it must be custom-tailored to suit each occurrence of the problem. This is a time-consuming approach that has zero scalability.

One viable alternative is to take multiple steps through the tools available in the Power Query Editor and then split the String (Date) into its components Day/Month/Year based on the presence of the โ€œ-โ€œ or โ€œ/โ€ delimiters. Then reassemble as desired.

This is the basis of the solution chosen in this blog.

Should this solution be implemented using the Power Query Editor drop-down options, the multiple button clicks will clutter the Applied Steps section in the editor. It gets terribly busy, if not messy when the process is repeated multiple times on the same table.

For a more manageable solution, we can implement the process using M.

Solution and Implementation

Before we begin, it is crucial, that we spend some time thinking through the steps needed to complete this transformation in Power BI.

We exploit the ability to chain multiple Text functions provided by Power BI as well as Concatenation so we can tear and rebuild.

Through Iterate, we apply the solution to each entry of the selected column. Then, convert the data type of the Date values and trim off the Time/Zone aspects of the Date/Time/Time-Zone format.

Before making any transformations on Text columns, it is considered good practice to trim them; that is, to remove the whitespaces that lead or trail the String for consistency and sanity preservation purposes.

These are the steps we will follow:

Step 1: Search for the column delimiter using the function Text.Contains([column], text).

Looking at our Date Column we notice there are two distinct data types:

Date/Time/TimeZone and Date.

The date aspects in both examples are of different formats. To rectify this, we must ensure we apply the proper fixes based on the specific format. One can quickly spot that the Date type used the โ€œ/โ€ delimiter to separate its components.

Figure 1 Mixed Formats

A table of Dates that show the inconsistent formats.
Row 1 2020-02-13T08:22:23-06:00
Row 1 5/13/20
Figure 4. Mixed Formats

To search for this, we can check if a character at a specific index is โ€œ/โ€.

This works only if the entered data is predictable, and the character consistently appears at that position in the String.

In this case, we will employ the Text.Contains() function as it is a flexible tool capable of checking the existence of a set of characters in a String without having to know the specifics of its contents.

If Text.Contains([Date],โ€/โ€) then โ€ฆ

Step 2: Split apart the column components using the function Text.Split(text, delimiter).

Should the If statement return true, meaning it found the โ€œ/โ€ character in the String, we split the date into its parts (Day/Month/Year).

Text.Split([Date],โ€/โ€)

When the String is split, we need an index to access each section of its division. In some cases, we do not know how many fragments the String will be divided into but since we are familiar with this data we know that it should look like this:

DateDayMonthYear
5/13/2013520
Date split into its component parts.

In this example, I decided against using variables to store temporary calculations in M and so we are forced to relate each component to an index number. We then run the function and refer to the component by selecting the index:

IndexTextCode
0MonthText.Split([Date], โ€œ/โ€) {0}
1DayText.Split([Date], โ€œ/โ€) {1}
2YearText.Split([Date], โ€œ/โ€) {2}

Using our derived indices, we can now go ahead with reordering the date components and combining them via Concatenation only when it is the right data as dictated by the earlier If-statement.

#"New Date" =
  "20"                                          //Adds the โ€˜20โ€™ prefix                      
   &
 Text.Split([Dates], "/"){2}                              //Year        
   &
 "-" & Text.PadStart(Text.Split([Dates], "/"){0}, 2, "0") //Month       
   &
 "-" & Text.PadStart(Text.Split([Dates], "/"){1}, 2, "0") //Day     

Step 3: Use Text.PadStart(text, size, padValue) to insert a 0 before single digits.

Since our desired format is a DD/MM/YYYY we want all single digits to be preceded by a โ€œ0โ€. This is achieved through the function Text.PadStart.

Refer to Figure 5 for the transformation code for this step.

Step 4: Use Text.Range(Column, startIndex, length) to Trim the column.

This code is used to remove Time/Time-Zone from the Date/Time/Time-Zone format.

Text.Range([Date], 0, 10)

The entire If Statement is wrapped in a Try-Otherwise Statement in case the model of our solution does not account for every single variation, so it fails gracefully. Meaning that it returns a null if it encounters an unknown date format.

// MM/DD/YY to YYYY-MM-DD {0}/{1}/{2} to {2}-{0}-{1}                                                    
  #"New Date" = Table.AddColumn(
    #"Trimmed Text",
    "tempDate",
    each try

        if Text.Contains([Dates], "/") then
          "20"
            &
 Text.Split([Dates], "/"){2}
            &
 "-"
            &
 Text.PadStart(Text.Split([Dates], "/"){0}, 2, "0")
            &
 "-"
            &
 Text.PadStart(Text.Split([Dates], "/"){1}, 2, "0")
        else

          Text.Range([Dates], 0, 10)
    otherwise
      null

  )

in

  #"New Date"

Once the step is finished, you should see the following output.

string manipulation
Figure 6. Transformed Data

Step 5: Delete the source column and rename the tempDate column.

The Final steps involve us dropping the Dates column and renaming the tempDate column back to Dates. We then converting from the data type from String to Date. In the end, you will be left with the transformed data in a column named Dates.

string manipulation
Figure 7. Final output

Conclusion

We can see from this simple example just how flexible Strings are and the level of manipulation Power BI provides us through M.

While this works, the ideal string manipulation solution should involve saving these steps as a custom function and invoking it on individual columns as opposed to re-writing the steps for each column in need of a date format change.

Stay Up to Date with Our Latest Blogs

Subscribe to our mailing list to be notified when a new article is released.

What Next?

This solution relies on detecting special characters in the a String, an alternative could involve working solely off string length.

Aside from dates, consider using this technique to generate a username based on the text that comes before the โ€œ@โ€ in an email address. This is a much simpler problem but utilizes the same tools and approach to solving it.

Tips

Here are a few things to consider when working in Power Query:

Tip 1 โ€“ Wrap Uncertain Code (M)

Due to uncertainty of data quality and large sizes of data to be consider, it is generally good practice to wrap uncertain code in a Try-Otherwise Statement as it allows the code to fail graciously.

Tip 2 โ€“ Trim String Values

Trim Text files before manipulating them as white space characters will influence transformations. It also grants you precision when comparing values.

Tip 3 โ€“ Code Readability

M code can be indented for readability. See Figure 5 for an example of code indentation. This is particularly useful when the code is long and involving multiple branches.

Tip 4 โ€“ Step Name and Description

When writing M code as a step in your transformation, name the steps appropriately so they can be easily identified in the Applied Steps panel. Adding comments to the code and descriptions to the steps assist with understanding the complexity of the applied steps, especially when working in a collaborative environment.

Don’t miss out on the opportunity to take your Power BI skills to the next level! Click here now to explore our training and events, and learn how to unleash the full potential of this powerful data tool!

Stay Up to Date with Our Latest Blogs

Subscribe to our mailing list to be notified when a new article is released.

Share this post

Let's get started with a consultation

Get started with training

Sign up for our newsletter