"; */ ?>

SED to parse and modify XML element nodes

In one of my previous articles I showed how AWK can be used to pull very useful statistics out of a server log. Today I want to introduce my other friend – SED, which will help us modify the values of element nodes within an XML file.

Here is a little info on SED from Wikipedia:

sed (which stands for Stream EDitor) is a simple and powerful computer program used to apply various textual transformations to a sequential stream of text data. It reads input files line by line, applying the operation which has been specified via the command line (or the sed script), and then finally outputs the line. It was originally developed from 1973 to 1974 as a Unix utility by Lee E. McMahon of Bell Labs, but today sed is now available for Unix (BSD, Mac OS X), Linux, and Win32, as well as many other platforms.

OK, let’s see what we have. By complete accident we have access to an XML request (file) from Yanik’s bank (ING) that will transfer $1,000,000.00 to his account today, in exactly one hour. Here is what the request looks like (goodnews.xml):

<?xml version="1.0" encoding="ISO-8859-1"?>
      <from>ING Bank</from>
      <message>We are pleased to inform you that the above amount was transferred to your bank account</message>

Now, what if we could make a slight change to this request and become the receiver of that million dollars – wouldn’t that be cool!? Well, I am not asking Yanik here, for an obvious reason… The answer is – yes, it’d be cool, and SED can help us achieve our goal. Here is how.

Below is a small shell script that uses SED. The script takes three parameters from the command line:

    “xml filename”, “element name” and “new value”

Then it will extract the value from the “element name”, and substitute it with a “new value” – that’s it – that is how simple it is. Does it smell like a million dollars already? :)

Here is the code (relement.sh):

#!/bin/sh
# Check that exactly 3 values were passed in
if [ $# -ne 3 ]; then
    echo 1>&2 "This script replaces an XML element's value with the one provided as a command parameter"
    echo 1>&2 "    Usage: $0 <xml filename> <element name> <new value>"
    exit 127
fi
echo "DEBUG: Starting... [Ok]"
echo "DEBUG: searching $1 for tagname <$2> and replacing its value with '$3'"
# Creating a temporary file for sed to write the changes to
# (the temp file name is an arbitrary choice)
temp_file="$1.tmp"
# Elegance is the key -> adding an empty last line for Mr. "sed" to pick up
echo " " >> $1
# Extracting the value from the <$2> element
el_value=`grep "<$2>.*<.$2>" $1 | sed -e "s/^.*<$2/<$2/" | cut -f2 -d">"| cut -f1 -d"<"`
echo "DEBUG: Found the current value for the element <$2> - '$el_value'"
# Replacing the element's value with $3
sed -e "s/<$2>$el_value<\/$2>/<$2>$3<\/$2>/g" $1 > $temp_file
# Writing our changes back to the original file ($1)
chmod 666 $1
mv $temp_file $1
echo "DEBUG: <$2>$el_value</$2> was successfully changed to <$2>$3</$2>"
echo "DEBUG: Exiting... [Ok]"

Let’s run it now and get that million dollars we are after:

[me at server]~: ./relement.sh goodnews.xml account my-secure-account
DEBUG: Starting... [Ok]
DEBUG: searching goodnews.xml for tagname <account> and replacing its value with 'my-secure-account'
DEBUG: Found the current value for the element <account> - '0024549Y48K3-843'
DEBUG: <account>0024549Y48K3-843</account> was successfully changed to <account>my-secure-account</account>
DEBUG: Exiting... [Ok]
[me at server]~: cat goodnews.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
      <from>ING Bank</from>
      <message>We are pleased to inform you that the above amount was transferred to your bank account</message>

Now we are getting all the money and not Yanik (well it is MY-secure-account, so technically I get it :) ).

Here are the nitty-gritty details of how that financial operation was possible…

Of course, the heart of this script is this line:

el_value=`grep "<$2>.*<.$2>" $1 | sed -e "s/^.*<$2/<$2/" | cut -f2 -d">"| cut -f1 -d"<"`

And here is what happens:

    1. We grep “<element>whatever</element>” from the file ($1)

    2. Then we use sed to strip everything from the beginning of the line up to “<element”

    3. And finally we cut out the value of this element, which sits between the greater-than and less-than signs: “>value<”

Easy, right?
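
To see each stage at work, here is a small sketch that runs the same pipeline one step at a time. The sample line is built from the account value shown in the output above, and the variable names (`line`, `tag`, `step1`…) are mine, not part of relement.sh:

```shell
# Sample input line and tag name (assumed for illustration)
line='      <account>0024549Y48K3-843</account>'
tag='account'

# Step 1: grep keeps only lines containing <tag>...</tag>
step1=$(echo "$line" | grep "<$tag>.*<.$tag>")

# Step 2: sed drops everything before "<tag" (here, the leading spaces)
step2=$(echo "$step1" | sed -e "s/^.*<$tag/<$tag/")

# Step 3: the first cut keeps what follows the first ">",
# the second cut keeps what precedes the next "<"
step3=$(echo "$step2" | cut -f2 -d">" | cut -f1 -d"<")

echo "after grep: [$step1]"
echo "after sed:  [$step2]"
echo "after cut:  [$step3]"
```

Note that the pipeline assumes the opening and closing tags sit on the same line and that the element appears only once in the file; with several matches, `el_value` would hold several lines.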

Now let us look at this line:

sed -e "s/<$2>$el_value<\/$2>/<$2>$3<\/$2>/g" $1 > $temp_file

which uses sed’s (or vi’s) “s/search/replace/g” pattern to do the job – to replace every “<element>oldvalue</element>” with “<element>newvalue</element>”. The result is saved in a temp file, which then replaces the original file.
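
One caveat: the old and new values are pasted straight into the sed expression, so a new value containing “/” or “&” would change what the expression means. Here is a minimal sketch of one way around that, assuming we escape the replacement text before using it. The `escape_repl` helper and the sample values are hypothetical, not part of relement.sh:

```shell
# Hypothetical helper: backslash-escape the characters that are special in
# the replacement half of s/search/replace/ (backslash, "/" and "&")
escape_repl() {
    printf '%s\n' "$1" | sed -e 's|[\\/&]|\\&|g'
}

new_value='acct/42&co'                    # made-up value with "/" and "&"
safe_value=$(escape_repl "$new_value")

# Same substitution shape as in the script, with the escaped value
result=$(echo '<account>old</account>' |
    sed -e "s/<account>old<\/account>/<account>$safe_value<\/account>/g")
echo "$result"
```

The search half would need its own, longer escape list (regex characters such as “.” and “*”), which this sketch leaves out.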

The line

echo " " >> $1

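makes sure that the source file ends with a newline-terminated line (strictly speaking it appends a line holding a single space, not an empty one), because some older sed implementations skip a last line that lacks the trailing newline.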
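
A side effect of that line is that every run appends one more line to the file. A minimal sketch of a gentler variant follows, assuming the only goal is for the file to end with a newline. The `ensure_trailing_newline` function and the demo file are hypothetical, and `mktemp` is assumed to be available:

```shell
# Hypothetical helper: append a newline only when the file's last byte is not one
ensure_trailing_newline() {
    # $(...) strips a trailing newline, so the result is empty exactly
    # when the last byte already is a newline (or the file is empty)
    if [ -n "$(tail -c 1 "$1")" ]; then
        echo >> "$1"
    fi
}

demo_file=$(mktemp)
printf '<from>ING Bank</from>' > "$demo_file"   # no trailing newline
ensure_trailing_newline "$demo_file"            # appends the missing newline
ensure_trailing_newline "$demo_file"            # second call changes nothing
lines=$(wc -l < "$demo_file")
echo "line count:" $lines
rm -f "$demo_file"
```

Calling it in place of `echo " " >> $1` would keep the file from growing by one line on every run.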

There is also one more thing to mention – if you export a Microsoft (e.g. M$ Word) document to XML and want to change anything (e.g. properties) in it, add these lines:

sed -e "s/<w:t>$el_value</<w:t>$3</g" $1 > $temp_file
chmod 666 $1
mv $temp_file $1

This changes every <w:t> element holding the old value to the new value you passed in.
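
As a quick illustration of that substitution, here is a sketch run against a single made-up line of Word XML (the sample line and the `Draft`/`Final` values are assumptions for the demo):

```shell
el_value='Draft'      # value currently inside <w:t>
new_value='Final'     # value we want there instead

# The pattern ends at the "<" that opens </w:t>, so the closing tag is untouched
result=$(printf '%s\n' '<w:r><w:t>Draft</w:t></w:r>' |
    sed -e "s/<w:t>$el_value</<w:t>$new_value</g")
echo "$result"
```

Because the pattern matches on the value rather than on a unique element name, every `<w:t>` holding that exact text is changed.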

Now we are completely empowered! Quit your job – learn SED, and earn millions! No.. rather – billions!

Feel free to ask questions or leave comments.