In one of my previous articles I showed how AWK can be used to get very useful statistics from a server log. Today I want to introduce my other friend, SED, which will help us modify the values of element nodes within an XML file.
Here is a little info on SED from Wikipedia:
sed (which stands for Stream EDitor) is a simple and powerful computer program used to apply various textual transformations to a sequential stream of text data. It reads input files line by line, applying the operation which has been specified via the command line (or the sed script), and then finally outputs the line. It was originally developed from 1973 to 1974 as a Unix utility by Lee E. McMahon of Bell Labs, but today sed is now available for Unix (BSD, Mac OS X), Linux, and Win32, as well as many other platforms.
Ok, let's see what is given. By complete accident we have access to an XML request (file) from Yanik's bank (ING) that performs a transfer of $1,000,000.00 to his account today, in exactly one hour. Here is what the request looks like (goodnews.xml):
<?xml version="1.0" encoding="ISO-8859-1"?>
<goodnews>
  <to>Yanik</to>
  <from>ING Bank</from>
  <date>04/01/2007</date>
  <amount>$1,000,000.00</amount>
  <account>0024549Y48K3-843</account>
  <message>We are pleased to inform you that the above amount was transferred to your bank account</message>
</goodnews>
Now, what if we could just make a slight change to this request and become the receiver of that million dollars? Would that not be cool!? Well, I am not asking Yanik here, for an obvious reason… The answer is: yes, it'd be cool, and SED can help us achieve our goal. Here is how.
Below, I wrote a small shell script that will be using SED. The script will take three parameters from a command line:
- “xml filename”, “element name” and “new value”
Then it will extract the value from the “element name”, and substitute it with a “new value” – that’s it – that is how simple it is. Does it smell like a million dollars already?
Here is the code (relement.sh):
# Check that exactly 3 values were passed in
if [ $# -ne 3 ]; then
  echo 1>&2 "This script replaces xml element's value with the one provided as a command parameter \n\n\tUsage: $0 <xml filename> <element name> <new value>"
  exit 127
fi

echo "DEBUG: Starting... [Ok]\n"
echo "DEBUG: searching $1 for tagname <$2> and replacing its value with '$3'"

# Creating a temporary file for sed to write the changes to
temp_file="repl.temp"

# Elegance is the key -> adding an empty last line for Mr. "sed" to pick up
echo " " >> $1

# Extracting the value from the <$2> element
el_value=`grep "<$2>.*<.$2>" $1 | sed -e "s/^.*<$2/<$2/" | cut -f2 -d">"| cut -f1 -d"<"`

echo "DEBUG: Found the current value for the element <$2> - '$el_value'"

# Replacing element's value with $3
sed -e "s/<$2>$el_value<\/$2>/<$2>$3<\/$2>/g" $1 > $temp_file

# Writing our changes back to the original file ($1)
chmod 666 $1
mv $temp_file $1
Let's run it now and get that million dollars we are after:
[me at server]~: ./relement.sh goodnews.xml account my-secure-account
DEBUG: Starting... [Ok]
DEBUG: searching goodnews.xml for tagname <account> and replacing its value with 'my-secure-account'
DEBUG: Found the current value for the element <account> - '0024549Y48K3-843'
DEBUG: <account>0024549Y48K3-843</account> was successfully changed to <account>my-secure-account</account>
DEBUG: Exiting... [Ok]
[me at server]~: cat goodnews.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<goodnews>
  <to>Yanik</to>
  <from>ING Bank</from>
  <date>04/01/2007</date>
  <amount>$1,000,000.00</amount>
  <account>my-secure-account</account>
  <message>We are pleased to inform you that the above amount was transferred to your bank account</message>
</goodnews>
Now we are getting all the money and not Yanik (well, it is MY-secure-account, so technically I get it).
Here are the nitty-gritty details of how that financial operation was possible…
Of course, the heart of this script is this line:
el_value=`grep "<$2>.*<.$2>" $1 | sed -e "s/^.*<$2/<$2/" | cut -f2 -d">"| cut -f1 -d"<"`
And here is what happens here:
1. We grep the line containing "<element>whatever</element>" from the file ($1)
2. Then we use sed to strip everything from the beginning of the line up to "<element"
3. And finally we cut out the value of this element, which sits between the greater-than and less-than signs: ">value<"
Easy, right?
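To make the three steps more tangible, here is a minimal sketch that runs each stage of the pipeline separately on one sample line and keeps the intermediate results. The variable names (line, tag, stage1, stage2) are my own for illustration and are not part of relement.sh.

```shell
#!/bin/sh
# Sample line taken from goodnews.xml; "line", "tag", "stage1" and "stage2"
# are illustrative names, not part of the original script.
line='  <account>0024549Y48K3-843</account>'
tag='account'

# Stage 1: keep only lines that contain <tag>...</tag>
stage1=$(printf '%s\n' "$line" | grep "<$tag>.*<.$tag>")

# Stage 2: drop everything before the opening tag
stage2=$(printf '%s\n' "$stage1" | sed -e "s/^.*<$tag/<$tag/")

# Stage 3: take the text between the first ">" and the next "<"
el_value=$(printf '%s\n' "$stage2" | cut -f2 -d'>' | cut -f1 -d'<')

printf 'stage1: [%s]\n' "$stage1"
printf 'stage2: [%s]\n' "$stage2"
printf 'value:  [%s]\n' "$el_value"
```

Note that the pipeline works line by line, so it assumes the opening tag, the value and the closing tag all sit on the same line.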
Now let us look at this line:
sed -e "s/<$2>$el_value<\/$2>/<$2>$3<\/$2>/g" $1 > $temp_file
which uses sed's (or vi's) 's/search/replace/g' pattern to do the job: it replaces every "<element>oldvalue</element>" with "<element>newvalue</element>". The result is saved in a temp file, which then replaces the original file.
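The substitution can be tried in isolation, without touching any file. The sketch below wraps it in a small function that reads XML on stdin and writes the result to stdout; replace_value is a hypothetical helper name, not something relement.sh defines.

```shell
#!/bin/sh
# replace_value is a hypothetical helper, not part of relement.sh.
# Usage: replace_value <element name> <old value> <new value>
# Reads XML on stdin, writes the modified XML to stdout.
replace_value() {
    tag=$1; old=$2; new=$3
    # Same s/search/replace/g command as in the script, with named variables.
    sed -e "s/<$tag>$old<\/$tag>/<$tag>$new<\/$tag>/g"
}

result=$(printf '%s\n' '<to>Yanik</to>' '<account>0024549Y48K3-843</account>' |
    replace_value account 0024549Y48K3-843 my-secure-account)
printf '%s\n' "$result"
```

Because the old and new values are pasted straight into the sed expression, characters that sed treats specially (such as "/", "&" or ".") in either value will change how the command behaves.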
The line
echo " " >> $1
appends one more line (containing a single space) to the source file, which guarantees that the file ends with a newline, so sed can identify the "end of file" correctly.
There is also one thing to mention: if you export a Microsoft (e.g. M$ Word) document to XML and want to change anything (properties) there, add these lines:
sed -e "s/<w:t>$el_value</<w:t>$3</g" $1 > $temp_file
chmod 666 $1
mv $temp_file $1
They will change all the corresponding elements to the property you need to change.
Now we are completely empowered! Quit your job – learn SED, and earn millions! No.. rather – billions!
Feel free to ask questions or leave comments.
Although this approach is good for small tasks, like getting a million bucks, it is not a full-blown XML parser. How will it work with nesting? What if elements include attributes?
- Gregg
Hi Gregg,
Very good point about elements' attributes. Here is what needs to be changed (just a couple of lines) to achieve the desired behavior:
# Extracting the value from the <$2> element
el_value=`grep "<$2>.*<" $1 | sed -e "s/^.*<$2/<$2/" | cut -f2 -d">"| cut -f1 -d"<"`
and
# Replacing element's value with $3
sed -e "s/<$2>$el_value</<$2>$3</g" $1 > $temp_file
you can also change a debug message to:
echo "DEBUG: <$2>'s value '$el_value' was successfully changed to '$3'"
Now let's see it in action. Let's say an account element has an attribute:
<account type="checking">9283745-F87DS-GS</account>
then the script should be executed like this:
./relwithattr.sh goodnews.xml "account type=\"checking\"" my-secure-account
Now it will handle elements with attributes as well.
The only caveat to this approach is if a “value” contains a “<” sign..
As to nesting, I do not see the problem, as long as the nested elements do not contain an element with the same name as the enclosing ("root") element.
And of course it is not a full-blown XML parser (it does not even create a node tree), but it is a very fast, cheap and elegant "way to get a million".
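As an alternative to passing the attributes on the command line, the opening tag can be matched with or without attributes by the regular expression itself. The following is only a sketch under my own assumptions: replace_with_attrs is a hypothetical function name, and instead of looking up the old value first it simply replaces whatever text sits between the tags, which is a different technique from the one in the script above.

```shell
#!/bin/sh
# replace_with_attrs is a hypothetical helper, not part of relwithattr.sh.
# Usage: replace_with_attrs <element name> <new value>
# Reads XML on stdin, writes the modified XML to stdout.
replace_with_attrs() {
    tag=$1; new=$2
    # \1 = opening tag, optionally followed by a space and attributes
    # \3 = closing tag; [^<]* is the old value (must not contain "<")
    sed -e "s/\(<$tag\( [^>]*\)\{0,1\}>\)[^<]*\(<\/$tag>\)/\1$new\3/g"
}

with_attr=$(printf '%s\n' '<account type="checking">9283745-F87DS-GS</account>' |
    replace_with_attrs account my-secure-account)
without_attr=$(printf '%s\n' '<account>0024549Y48K3-843</account>' |
    replace_with_attrs account my-secure-account)
printf '%s\n' "$with_attr" "$without_attr"
```

The same caveat applies here: a value that contains a "<" sign will not be matched, and the element still has to open and close on one line.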
This script works fine for a single element replacement. What if you want to replace two or three different elements in the same XML file? Would there be that many sed statements, one for each element? And what if an element repeats?
E.g., an element can be listed more than twice with different values. What if you want to replace those values with new values?
Hey cyber,
Valid concern
1. The same element that repeats will be changed by this script as well due to the “grep” property to grep ALL lines that match.
2. If you’d like to replace 2 or more different elements, then, the script could be modified to introduce a (for/while) loop. The only question now is to feed it with parameters. The way I would do it – create a file in the following (YAML ?) format:
So now there are two parts to it: Provider and Processor. Where Provider provides (reads) a file name and elements, and the processor has that (for/while) loop to call “sed” on grepped lines.
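Here is one way the Provider/Processor idea could be sketched. Everything specific in it is an assumption made for illustration: the list file uses a plain "element=value" line format (not necessarily the format meant above), apply_replacements is a hypothetical function name, and the old value is matched with [^<]* rather than being grepped first.

```shell
#!/bin/sh
# apply_replacements is a hypothetical "Processor". The "element=value"
# list format is an assumption made for this sketch.
# Usage: apply_replacements <xml file> <list file>
apply_replacements() {
    xml_file=$1; list_file=$2
    tmp="$xml_file.tmp"
    # "Provider": read one element name and its new value per line
    while IFS='=' read -r name value; do
        [ -n "$name" ] || continue   # skip empty lines
        sed -e "s/<$name>[^<]*<\/$name>/<$name>$value<\/$name>/g" \
            "$xml_file" > "$tmp" && mv "$tmp" "$xml_file"
    done < "$list_file"
}

# Small demonstration in a scratch directory
work=$(mktemp -d)
printf '%s\n' '<goodnews>' '<to>Yanik</to>' \
    '<account>0024549Y48K3-843</account>' '</goodnews>' > "$work/goodnews.xml"
printf '%s\n' 'to=Toly' 'account=my-secure-account' > "$work/changes.txt"

apply_replacements "$work/goodnews.xml" "$work/changes.txt"
cat "$work/goodnews.xml"
```

Running sed once per element keeps the loop simple; for a long list it would likely be cheaper to build a single sed script and run it once.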
Let me know if you need clarification on anything,
Thanks for your comment,
Toly
The format of the xml file is like this
I need to search for this property name and replace the value of location= with a new value, only on this line, without touching any other lines which have the same string dir.sdk.
XML files do not follow a standard format. That's where the issues arise.
It's great when given simple examples of XML. Unfortunately, most XML is not simple. This may be a job for awk rather than sed.
redherring
red
ourtarget
value
still the value
We only want to change the value for key, where name=ourtarget. The problems are that name may not precede value in the block, name may not be an attribute but rather a tag.. and value can be multiline.
and now with escaped tags
This does not work if the string has special characters. for example //jdbc:db2:localhost:/test1:30000
Is there any solution for this?
Thanks for the post…
It really solved one of my issue…
"# Elegance is the key -> adding an empty last line for Mr. "sed" to pick up
echo " " >> $1"
GUYS, ABOVE POINT IS REALLY VERY IMPORTANT
Thanks a ton
Mohit M Makhija
Yes. I am also facing the issue of special characters … my value in xml is jdbc:mysql://localhost:3306/my_db … This script doesn’t replace …. any help ?
Nik,
Try putting the escape character "\" before each forward slash.
For me this worked: 's/INACTIVE/ACTIVE/g'
Use this: jdbc:mysql:\/\/localhost:3306\/my_db
Thanks,
Mohit M Makhija
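The manual escaping suggested above can also be automated. The sketch below uses two small helpers, escape_pattern and escape_replacement (both names are hypothetical, and the <url> element is made up for the example), that put a backslash in front of the characters sed treats specially on the search side and on the replacement side of s/search/replace/.

```shell
#!/bin/sh
# escape_pattern and escape_replacement are hypothetical helper names.

# Escape basic-regex metacharacters and the "/" delimiter in a search string.
escape_pattern() {
    printf '%s\n' "$1" | sed -e 's/[]\/$*.^[]/\\&/g'
}

# In the replacement part only "\", "&" and the "/" delimiter are special.
escape_replacement() {
    printf '%s\n' "$1" | sed -e 's/[\/&]/\\&/g'
}

old=$(escape_pattern 'jdbc:db2://localhost/test1')
new=$(escape_replacement 'jdbc:mysql://localhost:3306/my_db')

# <url> is an illustrative element name, not taken from the comments.
result=$(printf '%s\n' '<url>jdbc:db2://localhost/test1</url>' |
    sed -e "s/<url>$old<\/url>/<url>$new<\/url>/")

printf '%s\n' "$new" "$result"
```

In relement.sh the same idea would mean escaping $el_value with the first helper and $3 with the second before they are placed into the sed expression.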
Is there a way to edit in between the tags. I mean ignore a part in the tag and change the rest.
I want to change mm to m in the below:
167.8 mm
Any suggestions please.
-J
Is there a way to edit in between the tags. I mean ignore a part in the tag and change the rest.
I want to change mm to m in the below:
tag: avg 167.8 mm /avg
Any suggestions please.
-J
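One possible approach to this question, assuming the line really looks like <avg>167.8 mm</avg> (the tags appear to have been stripped from the comment, so the element name is a guess): capture the part that should stay in a group and put it back with a backreference, so only the unit text changes.

```shell
#!/bin/sh
# Assumes the element is <avg>...</avg> with the unit right before the
# closing tag; both are guesses based on the comment above.
result=$(printf '%s\n' '<avg>167.8 mm</avg>' |
    sed -e 's/\(<avg>[^<]*\) mm\(<\/avg>\)/\1 m\2/')
printf '%s\n' "$result"
```

Here \1 holds the opening tag plus the number and \2 holds the closing tag, so everything except " mm" is copied through unchanged. This only rewrites the unit text; it does not convert the number.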
i had a one file with many messages in it as :
Message
Message
…..
…..
…..
Here, I wanted to write an individual file for every message
like
>cat 1.txt
aaaaaaaaaaaa
bbbbbbbbbbbb
cccccccccccccc
dddddddddddd
>cat 2.txt
eeeeeeeeeeee
fffffffffffffffffffffff
gggggggggggg
hhhhhhhhhhhh
Please suggest
I am new to this stuff and am in a class right now learning SED. I get the very basics of UNIX and I really need a plain-English explanation of SED. Nothing I have read yet has summed up even what it is, so I don't know what to start looking for…? Can you help me here?