PQ text replacement competition

In my previous blog post, I wrote about the then-new M function, Text.ReplaceAll. Unfortunately, this function is not available anymore. A certain someone from Microsoft contacted me and said that this function was made visible by mistake; it was part of a certified connector and should not have been exposed. They were kind enough to provide the actual code behind the function though. I was quite impressed by the way the function was written, and I decided to run a competition (with a prize!) to write the improved version of the function in the shortest possible way.

To recap, the function performed multiple text replacements, which was exactly what Chris Webb‘s and Ivan Bondarenko‘s functions also did. Imke Feldmann‘s implementation of a similar function was different: it replaced whole words only.

In this competition, contestants need to write a function that combines both techniques, depending on the value of an optional parameter.

Rules

  • The function must have three parameters:
    • String (type text)
      The text to make replacements in.
      Example: "the cat sat on a mat"
    • Replacements (type list)
      A list of lists that contains pairs of replacement values
      Example: {{"cat", "bear"}, {"dog", "dragon"}}
    • ReplaceFullWords (type logical, optional)
      The parameter that decides whether replacements should follow Imke’s logic (full words only) or Chris’s and Ivan’s (substrings)
      Example: true
  • The entries will be ranked by the number of characters excluding non-essential spaces (the shortest function wins)
  • Readability and performance are not important at all
  • The function should assume that words are separated by spaces only
  • Using Expression.Evaluate and referencing #shared directly is not allowed
  • The function must be contained in a single query and external references, such as GitHub, are not allowed
  • Not only the above data may be used to check whether the function works correctly, so hardcoded replacements are not permitted
  • By taking part in this competition, you agree that your code may be published in this blog
  • Anyone can enter the competition, including people from North Korea and other countries commonly excluded from web competitions
  • The competition runs until UTC 12:00 pm Monday, 24 June 2019.
  • Entries are accepted via email to daniil at xxlbi dot com

Sample data

Download Replacements.pbix

In the file, the Starting point query contains some sample text, and the Dictionary as list query should be used as the second argument in your function.

The Dictionary as table query exists purely for your convenience.

Expected results

When the third parameter is true

When the third parameter is false, null or omitted:

The prize

Given that this is a competition that involves Power Query, I thought it would be appropriate to have a Power Query book as a prize. There are not that many books on Power Query; Gil Raviv’s book, published by Microsoft, is the latest book on the topic, and it includes many exercises to further improve anyone’s skills.

Collect, Combine, and Transform Data Using Power Query in Excel and Power BI

Gil was very kind and provided a digital product voucher for his book, which you can redeem at Microsoft Press Store, or you can give it to someone else! 🙂

I may publish your code here once the competition is over. I would also prefer to publish the full name of the code author — please let me know if you’d like to stay anonymous. Also, please send me a LinkedIn/blog/website link if you’d like me to link your name.

Update: the results!

In total, I’ve received submissions from eight contestants (in order of submission):

  • Maxim Zelensky
  • Imke Feldmann
  • Bill Szysz
  • Aleksei Zhigulin
  • Zubair Muhhamad
  • Tanuki Phoenix
  • Andrey Minakov
  • Alexandr Morin

Unfortunately, I was a bit unclear in the rules: I didn’t specify whether declaring parameter types is mandatory. For this reason, I decided to count the lengths of function after the first occurrence of “=>”, excluding non-essential whitespace characters.

Some functions below may still have non-essential whitespace, which I left for readability purposes but excluded it when counting the length of functions.

Maxim Zelensky

Maxim is a Microsoft MVP in the Data Platform category (profile) from Russia, who has a blog called Excel Inside and tweets as @Hohlick. Here’s his formula:

(String as text, Replacements as list, optional ReplaceFullWords as nullable logical) as text =>
let p=if ReplaceFullWords=true then" "else""in Text.Trim(List.Accumulate(Replacements,p&String&p,(s,r)=>Text.Replace(s,p&r{0}&p,p&r{1}&p)))

According to Maxim, this is a “typical Excel approach” to replacing words. It uses the List.Accumulate function, which is a perfect fit here. You can read more about the function in Gil Raviv’s blog: Power Query List.Accumulate – Unleashed.

Length

139 characters

Imke Feldmann

Imke, who is from Germany, is also a Microsoft Data Platform MVP (profile). She blogs at The BIccountant and tweets under the same name — @TheBIccountant. Here’s Imke’s approach:

(String, Replacements, optional ReplaceFullWords) =>
let r=(z)=>if ReplaceFullWords=true then" "&z&" "else z,e=Text.Trim(List.Accumulate(Replacements," "&String&" ",(x,y) =>Text.Replace(x,r(y{0}),r(y{1}))))in e

Great minds think alike! Imke’s approach is similar to Maxim’s, she just chose to define the steps a bit differently. Maxim saved a few characters by putting the result directly into the in statement, while Imke used a separate variable for that, e.

Length

156 characters

Bill Szysz

Bill (LinkedIn) is a very knowledgeable Polish expert on Power Query, who has a YouTube channel and is a moderator on the largest Polish Excel forum — ExcelForum.pl. You may also have seen his comments on Power Query all over various community websites. Here is Bill’s formula:

(String,Replacements,optional ReplaceFullWords)=>Text.Trim([S=if ReplaceFullWords=true then" "else"",R=List.Accumulate(Replacements,S&String&S,(F,L)=>Text.Replace(F,S&L{0}&S,S&L{1}&S))][R])

Again, the approach is similar, but there’s an important difference: instead of using the let/in construction, Bill uses fields of a record, then calls a field. Not something you see every day!

Length

140 characters

Aleksei Zhigulin

Aleksei (LinkedIn) actually submitted two variants:

(String as text, Replacements as list, optional ReplaceFullWords as logical) => 
                        [a = Replacements,
                         b = String,
                         c = if ReplaceFullWords = true
                             then Text.Combine(List.ReplaceMatchingItems(Text.Split(b, " "), a), " ")                              
                             else List.Accumulate(a, b, (x,y)=>Text.Replace(x,y{0},y{1}))]
                        [c]
(String as text, Replacements as list, optional ReplaceFullWords as logical) => 
    List.Accumulate(Replacements, String, (a,b)=>
            if ReplaceFullWords = true
            then Text.Combine(List.ReplaceValue(Text.Split(a, " "), b{0},b{1}, Replacer.ReplaceValue), " ")                              
            else Text.Replace(a,b{0},b{1}))

We see List.Accumulate again, and Aleksei decided to rename the parameters, which added a few characters.

Length

183 and 190 characters, respectively.

Zubair Muhammad

Zubair (website: Excel & Access) wrote his function without List.Accumulate:

(string as text, ReplacementsList as list, optional ReplaceFullWords as logical)  => 
    
     let
               Replacements=#table(
                    type table [
                         Word to Replace = text,
                         Replace With = text
                    ],
                    ReplacementsList
               ),     
               List1=Replacements[Replace With],
               List2=Replacements[Word to Replace],
               myrecord=Record.FromList(List1,List2),
               myresult= 
               if 
                         ReplaceFullWords = true then
                         Text.Combine(
                         List.Transform(
                         Text.Split(string," "),each 
                         Record.FieldOrDefault(myrecord, _, _))," ")
               else
                         List.Last(
                         List.Generate(()=>
                         [a=0,
                         b=List2{a},
                         c=List1{a},
                         d=Text.Replace(string,b,c)
                         ], 
                         each [a] < Table.RowCount(Replacements),
                         each [a=[a]+1,
                              b=List2{a},
                         c=List1{a},
                         d=Text.Replace([d],b,c)
                         ]
                         , each [d]))
     in
     myresult

In this code, there’s List.Generate, which requires many more characters than List.Accumulate — at least because the result is a list, so you need to pick the last item.

Length

200+ characters

Tanuki Phoenix

Tanu, who tweets as @tanuki_phoenix, also used List.Accumulate, but in his if-then-else construct he used different replacer functions: Replacer.ReplaceText for partial replacements and Replacer.ReplaceValue for full-word replacements.

Text.Combine(List.Accumulate(Replacements,Text.Split(String," "),(a,b)=>List.ReplaceValue(a,b{0},b{1},if List.Contains({null,false},ReplaceFullWords)then Replacer.ReplaceText else Replacer.ReplaceValue))," ")

Length

200+ characters

Andrey Minakov

Andrey (LinkedIn) used a combination of Imke’s original technique with  List.ReplaceMatchingItems (for full-word replacements) and List.Accumulate (for partial replacements):

(String as text, Replacements as list, ReplaceFullWords as logical) as text => if ReplaceFullWords then Text.Combine(List.ReplaceMatchingItems(Text.Split(String, " "), Replacements), " ") else List.Accumulate(Replacements, String, (v, i) => Text.Replace(v, i{0}, i{1}))

Note that Andrey is using “if ReplaceFullWords” here without “= true” as most contestants use: this makes the function fail when the third parameter is null or omitted.

Length

180 characters (not adherent to competition rules)

Alexandr Morin

Alexandr is a digital marketing expert, who has a Facebook page devoted to Power BI (Power BI // Power Query // DAX) and a Telegram channel. Alexandr is the only contestant who used recursion:

(String,Replacements,optional ReplaceFullWords)=>
let R=Replacements,S=Text.Split(String," "),F=(L,P)=>let E=List.ReplaceValue(L,R{P}{0},R{P}{1},Replacer.ReplaceText)in if P=List.Count(R)-1 then E else @F(E,P+1)in Text.Combine(if ReplaceFullWords=true then List.ReplaceMatchingItems(S,R)else F(S,0)," ")

Here, renaming the parameters is justified because they are referenced multiple times.

Length

200+ characters

Winner

Congratulations to Maxim Zelensky, whose function is only 139 characters long!

I certainly have learned a few new tricks from this competition 🙂