Pages

Saturday, November 3, 2012

XML parsing/generation with ElementTree

Parsing XML

Let's say we want to parse this XML code.
  • The XML has the root element of "data".
  • Inside the root element, we have "TestA".
  • Inside the "TestA" element, we have multiple "Source" elements.
  • The "Source" element has three attributes.

 
  
 

Maybe the most convenient way to parse this XML is using ElementTree with Python. You can get the root with getroot() method.
import xml.etree.ElementTree as ET

tree = ET.parse('setup.xml')
root = tree.getroot()
From the root (or any element object), you can get the element with find() method with the element name as parameter. You can get the element whatever you want likewise.

The interesting thing you notice is that the element object returns iterator that gives all the sub elements.

The element object also has the attrib dictionary that contains the attributes.
testA = root.find('TestA')
sources = testA.find('Source')

for source in testA: # testA returns all the sub elements. 
    print source.attrib
    print source.attrib['numberOfParams']
    print source.attrib['filePath']
    print source.attrib['method']
This is the result to run the script.
{'numberOfParams': '2', 'method': 'method', 'filePath': 'Charlie'}
2
Charlie
method

Generating XML

You can crete the Element object with Element() with name of the element as a parameter.
root = ET.Element('data')
With SubElement with first parameter as the element object, and the second as the element name, one can add sub elements. As element object has the attrib dictionary, by filling in the dictionary one can add attributes.
testChoice = ET.SubElement(root, 'TestA')
source = ET.SubElement(testChoice, 'Source')
source.attrib[key] = sourceInfo[key]
And to string method generates string, and you can just save the string into a file.
xml_str = ET.tostring(root)
with open("./hello.xml","w")  as f:
            f.write(xml_str)
For element object you can use the dump() method
ET.dump(root)
The code to generate the xml can be as follows:
import xml.etree.ElementTree as ET
import re

class Source:
    def __init__(self):
        self.method = "method"
        self.filePath = "filePath"
        self.numberOfParams = 2
        
    def buildSource(self, data, testChoice, **sourceInfo):
        testChoice = ET.SubElement(data, testChoice)
        source = ET.SubElement(testChoice, 'Source')
        for key in sourceInfo.keys():
            print key, sourceInfo[key]
            source.attrib[key] = sourceInfo[key]
        
    def buildXml(self):
        root = ET.Element('data')
        sourceInfo = {}
        sourceInfo['filePath'] = "Charlie"
        sourceInfo['numberOfParams'] = "%d" % 2
        sourceInfo['method'] = "method"
        self.buildSource(root, 'TestA', **sourceInfo)
        
        xml_str = ET.tostring(root)
        
        #root = ET.fromstring(xml_str)
        ET.dump(root)
        with open("./hello.xml","w")  as f:
            f.write(xml_str)

source = Source()
source.buildXml()

No comments:

Post a Comment