
Using Regular Expressions To Parse Functions


turbopowerdmaxsteel

I have a hierarchy of nested functions represented like this:

 

@BeforeText(@AfterText(@ReadURL('http://www.quotationspage.com/quote/1.html')@,'<dt>',True)@,'</dt>',True)@

 

Each function follows the format @FunctionName(<ParamString>)@. The ParamString is a comma-separated list of parameters, each of which can be a string, a number or a boolean value. For example:

 

@AfterText('String', "st", True)@ - This function returns the portion of the text that follows the specified substring. It takes three parameters: the string being worked on, the substring being sought, and a boolean that denotes whether to ignore case.

 

The code above evaluates to ring.
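A minimal C# sketch of the two functions used in these examples may make the intended semantics concrete. The class name TextFunctions is hypothetical, and two behaviours are assumptions not stated in the post: the first occurrence of the substring is used, and an empty string is returned when the substring is not found.

```csharp
using System;

// Hypothetical sketch of AfterText/BeforeText as described in the post.
// Assumptions: the FIRST occurrence of `search` is used, and an empty
// string is returned when `search` does not occur in `text`.
public static class TextFunctions
{
    // Returns the text that follows the first occurrence of `search`.
    public static string AfterText(string text, string search, bool ignoreCase)
    {
        StringComparison cmp = ignoreCase
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;
        int index = text.IndexOf(search, cmp);
        return index < 0 ? string.Empty : text.Substring(index + search.Length);
    }

    // Returns the text that precedes the first occurrence of `search`.
    public static string BeforeText(string text, string search, bool ignoreCase)
    {
        StringComparison cmp = ignoreCase
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;
        int index = text.IndexOf(search, cmp);
        return index < 0 ? string.Empty : text.Substring(0, index);
    }
}
```

With these, TextFunctions.AfterText("String", "st", true) gives ring, while with ignoreCase set to false it gives an empty string, because "String" contains St rather than st. Nesting is then just function composition: BeforeText(AfterText("Microsoft", "Micro", true), "ft", true) gives so.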

 

Trouble starts with function nesting. The following example should evaluate to so.

 

@BeforeText(@AfterText('Microsoft', "Micro", True)@, "ft", True)@

 

This kind of nesting should be possible to arbitrary depth. I could do this using loops and conditional branching statements, but I would rather use regular expressions and, in the process, learn some more nuances of .NET's regular expression support.

 

Classes for evaluating the functions have already been designed. What I need to do is parse the functions starting from the deepest level, just as programming languages do, execute each one, and use its output as a parameter to the enclosing function. But I am not sure where to begin. Does anybody care to give me a head start?


faulty.lee

Why don't you consider using .NET scripting or PowerShell instead of racking your brain to reinvent the wheel? For .NET scripting, the script itself is plain .NET code, be it VB.NET or C#. The main thing is to use System.CodeDom.Compiler.ICodeCompiler to compile and run the code on the fly. For Windows PowerShell, see http://forums.xisto.com/no_longer_exists/

It also relies on .NET, but its scripting engine is much more powerful and is suited to running either interactively in a console or from scripts.


turbopowerdmaxsteel

I have tried my hand at the ICodeCompiler interface; it forms a part of the package I am building for Pika Bot. But these functions have to be written in this custom language, and regular expressions seemed to be the best option. I could, however, convert these functions into VB or C# representations and execute the code using the ICodeCompiler interface.


turbopowerdmaxsteel

Given below is a simplified version of the code that implements the function evaluation. Code irrelevant to the problem has been omitted to avoid clutter.

 

public static string Substitute(string Message)
{
    string Pat = @"@(?<name>[a-z0-9]+)\((?<paramstring>.*)\)@";
    Match M = Regex.Match(Message, Pat, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    if (M.Success)
    {
        // Pre and Post contain the text before and after the matched text.
        string Pre = Message.Substring(0, M.Index);
        string Post = Message.Substring(M.Index + M.Length);

        // BEGIN Parameter determination Code
        // (omitted: splits M.Groups["paramstring"].Value into string[] Params)
        // END Parameter determination Code

        // Substitute parameters, in case they contain functions (nested functions).
        for (int i = 0; i < Params.Length; i++)
        {
            Params[i] = Substitute(Params[i]);
        }

        // Func is an object of the appropriate function class.
        // The parameters are added to it after they are substituted.
        Message = Pre + Func.Invoke() + Post;
    }
    return Message;
}

The pattern matches the outermost function (because the .* in the paramstring group is greedy), which allows nested functions to be matched in subsequent recursive calls. A lot of code exists between the // BEGIN Parameter determination Code and // END Parameter determination Code comments. It splits the ParamString on , to determine the parameters and then recombines entries that were split incorrectly (in case a , occurs inside a string parameter). The loop then calls Substitute on each parameter, which takes care of nested functions. The Func object returns the result of evaluating the function; for example, @Log10(100)@ results in 2. Pre and Post contain the text before and after the matched text: in the input ABC@Rnd(1,100)@DEF, Pre is ABC and Post is DEF.
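The parameter determination step can also be done in a single pass instead of splitting on , and recombining the pieces. The sketch below is a swapped-in technique, not the code from the post: a comma only separates parameters when it is outside a quoted string and outside the parentheses of a nested call. ParamSplitter and SplitParams are hypothetical names, and escaped quotes inside strings are assumed not to occur.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical single-pass parameter splitter (not the thread's code).
// A ',' separates parameters only when it is outside quotes AND at
// parenthesis depth 0, so nested calls such as @AfterText('a','b')@ stay whole.
// Assumption: string literals never contain escaped quote characters.
public static class ParamSplitter
{
    public static List<string> SplitParams(string paramString)
    {
        var result = new List<string>();
        var current = new StringBuilder();
        char quote = '\0';   // quote character we are currently inside, or '\0'
        int depth = 0;       // parenthesis depth of nested calls

        foreach (char c in paramString)
        {
            if (quote != '\0')
            {
                // Inside a string literal: only the matching quote closes it.
                if (c == quote) quote = '\0';
            }
            else if (c == '\'' || c == '"')
            {
                quote = c;
            }
            else if (c == '(')
            {
                depth++;
            }
            else if (c == ')')
            {
                depth--;
            }
            else if (c == ',' && depth == 0)
            {
                // Top-level separator: finish the current parameter.
                result.Add(current.ToString().Trim());
                current.Clear();
                continue;
            }
            current.Append(c);
        }
        result.Add(current.ToString().Trim());
        return result;
    }
}
```

For the paramstring @AfterText('School','S')@,'l' this yields two parameters, @AfterText('School','S')@ and 'l', so the nested call reaches the recursive Substitute call intact.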

 

Consider the following input:

 

@BeforeText(@AfterText('School','S')@,'l')@

 

In the first call to Substitute, the pattern matches the whole input: @BeforeText(@AfterText('School','S')@,'l')@. Here, BeforeText is captured by the name group, while @AfterText('School','S')@,'l' is captured by the paramstring group. After the parameters are split, the next recursive call to Substitute receives the input @AfterText('School','S')@.

 

The problem now is that multiple functions at the same level cannot be evaluated separately. For example:

 

@Log10(0)@ Some Intermediate Text @Log(0)@

 

The pattern matches the entire message, @Log10(0)@ Some Intermediate Text @Log(0)@, but what I want is for it to match @Log10(0)@ and @Log(0)@ separately. Excluding the symbols @ ( ) from the paramstring will not work, as that would prevent nested functions from being matched. I am wondering whether .NET has something like recursive pattern matching, and whether it could help here.
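The dilemma can be reproduced with Regex.Matches. This sketch reuses the pattern from Substitute (greedy .*) alongside a lazy .*? variant for comparison; GreedyVsLazy and MatchAll are hypothetical names. The greedy version returns a single match spanning both @Log calls, while the lazy version separates them but stops the outer call at the first )@, cutting @BeforeText(@AfterText('School','S')@,'l')@ short at @BeforeText(@AfterText('School','S')@.

```csharp
using System.Text.RegularExpressions;

// Illustration only (hypothetical names): the greedy pattern from Substitute
// versus a lazy variant, showing why neither handles both cases.
public static class GreedyVsLazy
{
    // Pattern used in the Substitute method: .* runs to the LAST ")@".
    public const string Greedy = @"@(?<name>[a-z0-9]+)\((?<paramstring>.*)\)@";

    // Same pattern with a lazy .*?: stops at the FIRST ")@" it can reach.
    public const string Lazy = @"@(?<name>[a-z0-9]+)\((?<paramstring>.*?)\)@";

    const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Returns the text of every non-overlapping match, left to right.
    public static string[] MatchAll(string pattern, string input)
    {
        MatchCollection matches = Regex.Matches(input, pattern, Opts);
        var values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            values[i] = matches[i].Value;
        }
        return values;
    }
}
```

In other words, no choice of quantifier alone can tell which )@ closes which call; that requires tracking nesting depth.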


faulty.lee

I'm not that good with regex, but I do have a suggestion: maybe you can count matching '@(' and ')@' pairs. For every '@(' you increment a counter, and for every ')@' you decrement it, like reference counting in C++. When the counter returns to 0, you should look for the next function instead of skipping it. Get the position of the last matching '@(' ')@' pair; then you can either chop off the matched '@(' ... ')@' set and match the regex again, or use the Match overload where you can specify the starting position for matching (though that overload does not take a RegexOptions argument).
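A sketch of the counting idea in C#, assuming the call syntax @Name(...)@ from the original post. Because a name sits between @ and (, this version counts ( and ) after the opening @Name( rather than literal @( pairs, and it does not treat parentheses inside quoted strings specially. CallScanner and FindTopLevelCalls are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the counter-based approach: walk the text, count
// parentheses, and report each top-level @Name(...)@ call as (Start, Length).
// Assumptions: a call starts with '@', then letters/digits, then '(' and ends
// with ")@"; parentheses inside quoted strings are NOT special-cased.
public static class CallScanner
{
    public static List<(int Start, int Length)> FindTopLevelCalls(string text)
    {
        var spans = new List<(int Start, int Length)>();
        int i = 0;
        while (i < text.Length)
        {
            int open = CallOpenParen(text, i);
            if (open < 0) { i++; continue; }

            // Count parentheses from the opening one until depth returns to 0.
            int depth = 0;
            int j = open;
            for (; j < text.Length; j++)
            {
                if (text[j] == '(') depth++;
                else if (text[j] == ')' && --depth == 0) break;
            }

            // A complete call must end with ")@".
            if (j + 1 < text.Length && text[j + 1] == '@')
            {
                spans.Add((i, j + 2 - i));
                i = j + 2;   // resume scanning after this call
            }
            else
            {
                i++;         // unbalanced or not a call; move on
            }
        }
        return spans;
    }

    // If text[pos] begins "@Name(", returns the index of '('; otherwise -1.
    static int CallOpenParen(string text, int pos)
    {
        if (text[pos] != '@') return -1;
        int k = pos + 1;
        while (k < text.Length && char.IsLetterOrDigit(text[k])) k++;
        return (k > pos + 1 && k < text.Length && text[k] == '(') ? k : -1;
    }
}
```

Each returned span covers one top-level call, so @Log10(0)@ and @Log(0)@ come back separately, and each span's text could then be handed to the existing regex-based Substitute method on its own.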


turbopowerdmaxsteel

I would have done it that way back in the old days, but ever since I realized the power of regex, it just doesn't feel right. It is my last resort, though. I have just come across some interesting ways of matching such constructs, but I can't seem to get them to work. Any ideas?


Moudey

Please, how can I parse this function?


iGuest

Replying to turbopowerdmaxsteel

Hi there,

I know this reply comes a significant amount of time after the initial post was made, but I came across this post while trying to solve the same problem, so I thought I would post the solution I eventually found. The regular expression to do what you are looking for would be:

(?[a-z0-9]+)((?(?>((?)|)(?<-DEPTH>)|.?)*(?(DEPTH)(?!))))

Note that with this expression the @ symbols are not needed to designate a function/method; instead, the opening and closing parentheses attached to the end of a word are used.

I have to give all credit to this excellent article I found on CodeProject; the author explains how it works brilliantly.

http://forums.xisto.com/no_longer_exists/

by Brendon

iGuest

Faulty expression

The expression seems to have come out faulty because some of the items between <>'s were registered as tags. Here is the correct one:
(?<name>[a-z0-9]+)\((?<Params>(?>\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!)))\)

 

-reply by Brendon
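For reference, here is a sketch of how a balancing-group pattern in the spirit of Brendon's expression might be adapted back to the @Name(...)@ syntax from the original post. One deliberate change, not taken from Brendon's post, is that ordinary characters are matched with [^()]+ rather than .?, so an unbalanced ) cannot be absorbed into Params. Parentheses inside quoted strings are still counted, so a parameter such as '(' would break the match. BalancedCalls and TopLevelCalls are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: a .NET balancing-group pattern for @Name(...)@ calls.
// Each '(' pushes an empty capture onto the DEPTH stack, each ')' pops one,
// and (?(DEPTH)(?!)) fails the match if any '(' is left unclosed.
// Variation (not from the thread): [^()]+ replaces .? for ordinary characters.
// Limitation: parentheses inside quoted strings are still counted.
public static class BalancedCalls
{
    public const string Pattern =
        @"@(?<name>[a-z0-9]+)\(" +
        @"(?<Params>(?>[^()]+|\((?<DEPTH>)|\)(?<-DEPTH>))*(?(DEPTH)(?!)))" +
        @"\)@";

    static readonly Regex CallRegex =
        new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the name and raw parameter string of every top-level call.
    // Matches never overlap, so nested calls stay inside the Params of
    // their enclosing call.
    public static List<(string Name, string Params)> TopLevelCalls(string input)
    {
        var calls = new List<(string Name, string Params)>();
        foreach (Match m in CallRegex.Matches(input))
        {
            calls.Add((m.Groups["name"].Value, m.Groups["Params"].Value));
        }
        return calls;
    }
}
```

On @Log10(0)@ Some Intermediate Text @Log(0)@ this returns two calls, Log10 and Log; on the nested School example it returns one call whose Params is @AfterText('School','S')@,'l', which can then be split and evaluated recursively in the same way as in Substitute.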