Wednesday, 13 February 2019

Variable-Length Lookbehinds: actually possible in Perl/PCRE!

Get strapped in, because this is as close as you're going to get to breaking news in the world of regular expressions.



You know what the best thing is about being involved in regex circles? It's not the money or the fame or the women.. no no; it's that, on occasion, you get to witness incredible discoveries unfold before your very eyes. This is exactly what happened in a small StackExchange chatroom earlier, when fellow regex tinkerer Grimy cracked a problem that previously seemed insoluble. In doing so, he uncovered a technique for emulating a feature that has been coveted for years by Perl and PCRE users alike: variable-length lookbehinds.

Now, forget \K. That doesn't come close to having the expressive power we want. I'm talking about truly emulating variable-length lookbehinds; that is, doing so for all conceivable scenarios.

The Revelation


In response to a challenge to match an arbitrary string (in the domain of ^.*$) that contains no unique characters, a task I once dismissed as impossible in the general case, Grimy ingeniously came up with the following:

Wonderfully elegant and simple in principle. Let us begin by examining the core concept.

The Concept

((?<=(?= ... |(?3)).))
The trick is to remember that, while lookbehinds can only peek a fixed number of characters behind the current matching point, the lookaheads they contain are not subject to any such restriction. If \1 is a single character, (?<=(?=\1).) succeeds where (?<=\1) fails because in the former case you are telling the regex engine that it needs to go back precisely one character to look for \1.

Because recursive algorithms are tricky to follow, let us expand this subexpression a couple of levels deep to observe what's happening:
((?<=(?= ... |(?<=(?= ... |(?<=(?= ... |(?3)).)).)).))
Notice how we're moving back one character each step of the way. So if (?= ... ) fails to match, move back one character and try again. Thus, it is equivalent in essence to using (?<=(?= ...).*) if actual variable-length lookbehinds were possible.

The Nanny


If you've read and understood everything so far, you may have thought to look back at Grimy's original expression to try and understand it in full. What, then, you ask, is the purpose of \n in the lookahead? After all, \n, which matches a new-line character, should not exist in the domain of ^.*$ since `.` does not match a new-line by default. So why is it there?

Regex engines often have a bunch of nannies in place to prevent you from hurting yourself. One of these employed by PCRE is a compile-time check for recursive calls that appear, from a cursory static inspection, as though they may not terminate. The simplest example of this is the expression (?R), which yields the compile-time error:
40  recursive call could loop indefinitely
With Grimy's expression, the PCRE engine doesn't realize that \2 is guaranteed to match a single character. To work around this, we need to give the engine at least one non-empty path to follow. \n is a contradiction in the domain of ^.*$ and, more importantly, it functions as a non-empty alternative that convinces the nanny that we know what we're doing.

Note that in the more general domain of ^[\s\S]*$, which may include new-line characters, x^ serves as a non-empty contradiction.

The Emulation


Perspicacious readers may have observed that the method above involves an unbounded lookahead, one that may peer far past the position at which we wish to place our emulated lookbehind. This is fine for the problem it aimed to solve, but if we want to expand the method to work in the general case, this issue needs to be addressed.

And so, for a variable-length subexpression X, (?<=X) can be implemented in both Perl and PCRE as:
(?=(?'a'[\s\S]*))(?'b'X(?=\k'a'\z)|(?<=(?=x^|(?&b))[\s\S]))
See it in action on regex101

To negate it and emulate (?<!X), simply enclose the whole thing in a negative lookahead:
(?!(?=(?'a'[\s\S]*))(?'b'X(?=\k'a'\z)|(?<=(?=x^|(?&b))[\s\S])))
Aside from the fact that I've used two capturing groups, that snippet is basically plug-and-play: you may insert it anywhere in your expression where a variable-length lookbehind is desired.

The main difference here is that a new group "a" has been added to capture the portion of the string after the current matching point, so that when we travel backwards and look ahead for X we know where to stop. This solves the issue mentioned above.

Next, X has been moved to the outer group "b" as it would fail to match if X were zero-width since the lookahead only begins at the position one character behind the current matching point, and only moves backwards thereafter.

Here is a breakdown of the various components:
(?=(?'a'[\s\S]*))  # Capture the rest of the string in "a"
(?'b'
  X(?=\k'a'\z)     # Match X followed by the contents of "a" to ensure
                   # the emulated lookbehind stops at the correct point.
  |                # OR
  (?<=             # Look behind (one character) match either:
    (?=
      x^           # A contradiction; non-empty to appease the nanny
      |            # OR
      (?&b)        # Recurse (match X OR look behind (one character)) etc..
    )
    [\s\S]         # How far we go back each step: one single character
  )
) 

Note that by emulating a positive lookbehind in this way, you lose the luxury of having capture groups in X that return their values to the caller. But are you complaining?

The End


There you have it: another impossibility made possible by the creative application of old tools. I hope you enjoy using this new toy as much as I certainly will. Big shout out once again to Grimy for his stroke of genius.

As usual, I welcome you to follow me on twitter to stay apprised of more regex wackiness in the future.

The Bonus


What about problems that require comparing the substring at the current matching point to multiple substrings at later positions in the string - can we now handle those? Turns out we can! By iterating forwards with a lookahead and using our new toy to look back to the original matching point, we can make short work of any problem of that nature. For example, "match the longest word in a line" can now be solved in the general case:
\b(\w++)(?=(.*))(?!(.*\W)\b((?<=(?=(?=\1\2$)(?:(?=\w*+\3(\5?+\w))\w)++\b|(?4)).)))
Figuring out how that works is left as an exercise to the reader :) Suffice it to say, there is now a whole new set of problems for which we can irresponsibly abuse regex.

74 comments:

  1. Your choice of destination for higher education plays a major role in Study in UK as it defines your career and education choices.

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
  3. Students who are looking for assignment help australia or assignment help perth are more likely to order their projects through us, because they know what quality they will be getting.

    ReplyDelete
  4. Very visited administration in uk is Online assignment help administration. Spare yourself from low evaluations and accomplish a high score in your scholastics at any rate.
    Dissertation Writing Service
    online assignment help

    ReplyDelete
  5. There are tons of things that you could do after returning from school, but you have your homework to solve. Don’t you wish there was a reliable, professional Professional Homework Writing Service service that could take care of your homework?

    ReplyDelete
  6. I think this is one of the most significant information for me. And I’m glad reading your article. But should remark on some general things, The web site style is perfect, the articles is really great: D. Good job, cheers. tree removal services palm beach county

    ReplyDelete
  7. Very Nice Blog! This Blog is Totally Informative About Management Assignment help in Australia. If You Are Looking For The Management Assignment help Provider Then You Should Definitely Go For myassignmenthelpau.com

    ReplyDelete
  8. A superbly written article, if only all bloggers offered the same content as you, the internet would be a far better place.
    tree removal miami fl

    ReplyDelete
  9. Thanks for the wonderful share. Your article has proved your hard work and experience you have got in this field. Brilliant .i love it reading. tree removal services west palm beach

    ReplyDelete
  10. I heard you have a nice information with your blog. This is greatly amazing! click here

    ReplyDelete
  11. This the nice article I have ever seen. Kindly visit Digital marketing services for expanding your business.

    ReplyDelete
  12. Amazing post you shared. Contact digital marketing agency in Dubai to reach your target audience.

    ReplyDelete
  13. I have read your article, it is very informative and helpful for me.I admire the valuable information you offer in your articles. Thanks for posting it..tree trimming north lauderdale

    ReplyDelete
  14. The first thing one must remember when attempting a case study help is to interpret what the nature of the case study is

    ReplyDelete
  15. Great article and a nice way to promote online. I’m satisfied with the information that you provided tree trimming services hallandale beach

    ReplyDelete
  16. Thanks for Sharing This Article.It is very so much valuable content. I hope these Commenting lists will help to my website
    best angular js online training
    best angular js online training
    angular js online training

    ReplyDelete
  17. At myassignmenthelp, we make sure that all our projects are of quality and without any plagiarism, so that students can focus on the projects and get comfortable with it. the best part is that with my assignment help service, our expert writers are available round the clock so that if you have any queries that can be answered.

    ReplyDelete
  18. Hello, I have browsed most of your posts. This post is probably where I got the most useful information for my research. Thanks for posting, maybe we can see more on this. Are you aware of any other websites on this subject?
    residential tree services oakland park fl

    ReplyDelete
  19. A superbly written article, if only all bloggers offered the same content as you, the internet would be a far better place. tree services near me in margate

    ReplyDelete
  20. BSc in Optometry – Here is the details about Best BSc Optometry Colleges In Bangalore. If you are looking to study in Bangalore, the below link will redirect to a page that will show the best OPT colleges in Bangalore
    Optometry Colleges In Bangalore

    ReplyDelete

  21. Nice blog.Thanks for sharing this information.

    intechapp

    ReplyDelete
  22. A superbly written article, if only all bloggers offered the same content as you, the internet would be a far better place. tree care companies weston fl

    ReplyDelete
  23. Our writers are professionally experienced as they have year’s long experience in writing CDRs, which makes it quite easy for them to understand the requirements for composing a captivating cdr assignment help.

    ReplyDelete
  24. Great article and a nice way to promote online. I’m satisfied with the information that you provided emergency tree services deerfield beach

    ReplyDelete
  25. Thanks for the wonderful share. Your article has proved your hard work and experience you have got in this field. Brilliant .i love it reading. commercial tree services pompano beach

    ReplyDelete
  26. This post is good enough to make somebody understand this amazing thing, and I’m sure everyone will appreciate this interesting things.stump grinding parkland

    ReplyDelete
  27. Get Assignment Help Online   is the best helper for submit hectic assignments on time with the best grade. Check out our website for more information about online academic writing services in Australia. we have many experts who serve you for   Assignment Help Experts

    ReplyDelete
  28. You have a good point here!I totally agree with what you have said!! Thanks for sharing your views. hope more people will read this article!!! stump removal lauderdale lakes

    ReplyDelete
  29. I have been reading for the past two days about your blogs and topics, still on fetching! Wondering about your words on each line was massively effective. Techno-based information has been fetched in each of your topics. Sure it will enhance and fill the queries of the public needs. Feeling so glad about your article. Thanks…!
    best software testing training in chennai
    best software testing training institute in chennai with placement
    software testing training
    courses

    software testing training and placement
    software testing training online
    software testing class
    software testing classes in chennai
    best software testing courses in chennai
    automation testing courses in chennai

    ReplyDelete
  30. At myassignmenthelp, we ensure that our customers receive original work every time they place order. Our expert writers are expected to produce customized assignment papers to reassure quality and content originality. Our expert writers are well aware of the rule concerns of the top universities in Australia, so they go extra mile to reassure 100% plagiarism free content.

    ReplyDelete
  31. I appreciate your efforts to do this difficult task before my eyes and getting help from PCRE which is a library and helps the pattern for matching.
    Dissertation Help UK

    ReplyDelete
  32. The whole study of crisis management can be split into core action plans and strategies. Being preventive is a modus operandi where preparedness and anticipation are key to reducing the likelihood of crisis occurring or at least minimizing its impact. Avail strategic crisis management help by our academic writers based in Australia.

    ReplyDelete
  33. Great blog! It provides you ideas about Mount Everest Base Camp Trek Cost. You can know important tips how to choose Everest Trek package. Thanks for sharing this nice blog.

    ReplyDelete
  34. Great post i must say and thanks for the information. Education is definitely a sticky subject. However, is still among the leading topics of our time. I appreciate your post and look forward to more.
    data analytics courses in mumbai
    data science interview questions

    ReplyDelete
  35. I finally found great post here.I will get back here. I just added your blog to my bookmark sites. thanks.Quality posts is the crucial to invite the visitors to visit the web page, that's what this web page is providing.
    data science course Mumbai
    data science interview questions
    data analytics course in mumbai

    ReplyDelete