
Monday, October 22, 2012

String pattern matching in Java and Python

I needed to extract substrings from a string. The input is:
rename_method("%.Arith#add()","%.Arith#addRefactored()","%.Arith")
From this string I need to extract the substrings inside the quotation marks:
%.Arith#add(), %.Arith#addRefactored(), %.Arith
The Java code could be implemented as:
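  // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
  // refactoring, beforeMethodName and afterMethodName are declared elsewhere.
  Pattern p = Pattern.compile("rename_method\\(" + // match the literal 'rename_method('
    "\"([^\"]*)\"," +                        // group 1: first quoted argument
    "\"([^\"]*)\"," +                        // group 2: second quoted argument
    "\"([^\"]*)\""  +                        // group 3: third quoted argument
    "\\)");                                  // match the closing ')'
  Matcher m = p.matcher(refactoring);

  if (m.find())
  {
    beforeMethodName = m.group(1);
    afterMethodName = m.group(2);
  }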
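Python makes the code simpler and easier to read: adjacent string literals inside parentheses are concatenated automatically, so each piece of the pattern can sit on its own line with its own comment.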
import re

matchstring = (
    r'rename_method\('   # literal 'rename_method('
    r'"([^"]*)",'        # %.Arith#add()
    r'"([^"]*)",'        # %.Arith#addRefactored()
    r'"([^"]*)"'         # %.Arith
    r'\)'                # closing ')'
)
inputstring = r'rename_method("%.Arith#add()","%.Arith#addRefactored()","%.Arith")'

m = re.search(matchstring, inputstring)
print(m)
if m:
    beforeMethodName = m.group(1)
    afterMethodName = m.group(2)
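To make the implicit concatenation concrete, here is a minimal sketch that checks the parenthesized pieces really are one ordinary string, and then pulls out all three captures at once with m.groups(). The variable names pattern, text and third_arg are my own choices for illustration; the input only tells us the third value is %.Arith, not what it stands for.

```python
import re

# Adjacent string literals inside parentheses are joined into one string,
# so every fragment can carry its own comment.
pattern = (
    r'rename_method\('   # literal prefix
    r'"([^"]*)",'        # first quoted argument
    r'"([^"]*)",'        # second quoted argument
    r'"([^"]*)"'         # third quoted argument
    r'\)'                # closing parenthesis
)

# The fragments above are equivalent to this single literal.
assert pattern == r'rename_method\("([^"]*)","([^"]*)","([^"]*)"\)'

text = 'rename_method("%.Arith#add()","%.Arith#addRefactored()","%.Arith")'
m = re.search(pattern, text)

# groups() returns every capture as a tuple, in order.
before, after, third_arg = m.groups()
print(before, after, third_arg)
```

Unpacking m.groups() avoids repeating m.group(n) for each value, at the cost of raising an error if the number of groups in the pattern ever changes.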
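Alternatively, you can compile the pattern with the re module's re.VERBOSE flag. Whitespace and # comments inside the pattern are then ignored, so the whole expression fits in a single triple-quoted raw string and you can drop the per-line quotes and the wrapping parentheses. Note that match only matches at the start of the input, whereas search scans the whole string.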
p = re.compile(r"""
    rename_method\(   # literal 'rename_method('
    \"([^\"]*)\",     # %.Arith#add()
    \"([^\"]*)\",     # %.Arith#addRefactored()
    \"([^\"]*)\"      # %.Arith
    \)                # closing ')'
""", re.VERBOSE)

m = p.match(inputstring)
if m:
    beforeMethodName = m.group(1)
    afterMethodName = m.group(2)
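One thing to watch with match is that it only succeeds when the pattern starts at the very beginning of the input. The sketch below illustrates the difference from search, and also uses named groups, which combine nicely with re.VERBOSE. The group names before, after and owner, and the "refactoring: " prefix on the sample line, are made up for this example.

```python
import re

p = re.compile(r"""
    rename_method\(
      "(?P<before>[^"]*)",   # original method
      "(?P<after>[^"]*)",    # renamed method
      "(?P<owner>[^"]*)"     # third argument (name is a guess)
    \)
""", re.VERBOSE)

# Hypothetical input with some text in front of the call.
line = 'refactoring: rename_method("%.Arith#add()","%.Arith#addRefactored()","%.Arith")'

# match() is anchored at position 0, so the leading text makes it fail.
assert p.match(line) is None

# search() scans forward until the pattern fits.
m = p.search(line)
if m:
    beforeMethodName = m.group("before")
    afterMethodName = m.group("after")
    print(beforeMethodName, afterMethodName)
```

If the input always begins with rename_method(, as in the post, match and search behave the same; search is the safer choice when the call may be embedded in a longer line.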
For concatenating strings in Python with comments, you can check this post. For re.VERBOSE, check this site. For regular expressions in Python, check this site, and this one.
