xisto Community
beatgammit

How To Use XML For Data Storage
Explores the power of XML (Extensible Markup Language) and XPath


Introduction

Last summer, I worked as an intern for a small software development company. Only three people besides me were in the office every day, while a few others sold the product around the country. My job was to convert the company's data storage from a binary format to XML. At first, I thought this was a waste of time because, after all, "if it ain't broke, don't fix it". That was my attitude until I realized the power behind XML, and I soon started using it for all of my pet projects at home. You may be asking, "What is XML?" or, at least, "What makes it so cool?" Thank you for asking; I'll tell you.

 

Definition

XML (Extensible Markup Language) is not a programming language; it is a markup language, a specification for describing structured data as text. The format is similar to HTML (HyperText Markup Language), so if you know HTML, you already know the basic syntax of XML. XML gives data a common, standardized storage format. Because it is text-based, an XML file is larger than an equivalent binary file, and therefore slower for computers to process, but it is efficient for what it is. I cannot think of a text-based way to store data that is more efficient than XML. The beauty of XML is the ease and efficiency of extracting the data. Hopefully, by the end of this tutorial, you will understand where I'm coming from.

 

Basics of XML

To explain the power of XML, I'll teach by example. Let's say that we need to store information about a musical band. To do this, we must specify the various parts of a band, which include:

Name

Albums

Members

Alright, we have the basic format of a band down now, but we must define what constitutes a member or an album. Here's a list of member attributes:

Name

Instrument

Here is a list of album attributes:

Title

Date Released

Tracks

Alright, now we need to say what constitutes a track:

Title

Length

Lyrics

Track Number

As you can see, this is a very basic definition of what constitutes a band. In a typical relational database, we would probably need four tables in this instance:

Band table- band definitions and references to the member table and album table

Member table- member definitions

Album table- album definitions and references to the track table

Track table- track definitions (and possibly a reference to the album table)

It is possible to put all of this in one table, but that would be pretty messy, and it would require some advanced database scripting or a database that allows entries within entries within entries. With XML, we can do this all in one file. I'll give you an XML file that stores this data with some values, and then we'll examine the file together.

<Band name="Modest Mouse">
    <Member name="Isaac Brock" instrument="Vocals/Lead Guitar" />
    <Member name="Eric Judy" instrument="Bass" />
    <Member name="Jeremiah Green" instrument="Percussion" />
    <Album title="Good News For People Who Love Bad News" releaseDate="Apr 06, 2004">
        <Track title="The World at Large" trackNumber="02" length="4:31">
            <Lyrics>
            </Lyrics>
        </Track>
        <Track title="Float On" trackNumber="03" length="3:27">
            <Lyrics>
                I backed my car into a cop-car, the other day.
                Well, he just drove off, sometimes life's okay.
            </Lyrics>
        </Track>
    </Album>
    <Album title="The Moon and Antarctica" releaseDate="Mar 09, 2000">
        <Track title="3rd Planet" trackNumber="01" length="3:58">
            <Lyrics>
                The third planet is sure that they're being watched
                By an eye in the sky, that cannot be stopped.
            </Lyrics>
        </Track>
        <Track title="Gravity Rides Everything">
            <Lyrics>
            </Lyrics>
        </Track>
        <Track title="Dark Center of the Universe" trackNumber="03" length="5:02" />
    </Album>
</Band>
If you know HTML, this may seem pretty familiar. Let's take a look at the basic format of XML. The basic building block of XML is the Element. An Element is written with tags: each tag starts at a '<' and ends at a '>', just like in HTML. The opening tag holds the Element's name, followed by its Attributes. An Element follows this pattern:

<NameOfElement Attribute1="Value" Attribute2="Value" Attribute3="Value"></NameOfElement>

You may have noticed that in my example, the "</NameOfElement>" did not appear after some Elements. Only Elements that contain something, either child Elements or text, need a separate closing tag, which wraps around that content. An empty Element can instead be closed by placing a '/' just before the '>' of its opening tag. Now, back to the example. In the example, the first Element reads: <Band name="Modest Mouse">.
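To check that the two closing styles really mean the same thing, here is a minimal sketch. It assumes Python and its standard-library xml.etree.ElementTree parser, which is only my pick for illustration; the variable names are made up.

```python
import xml.etree.ElementTree as ET

# The same Member written with a self-closing tag and with an explicit closing tag.
self_closed = ET.fromstring('<Member name="Eric Judy" instrument="Bass" />')
explicit = ET.fromstring('<Member name="Eric Judy" instrument="Bass"></Member>')

# Both forms parse to an Element with the same name, the same Attributes,
# and no child Elements.
same_tag = self_closed.tag == explicit.tag
same_attributes = self_closed.attrib == explicit.attrib
both_childless = len(self_closed) == 0 and len(explicit) == 0
```

Which form you write is a matter of taste; a parser should hand you the same Element either way.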

 

The word Band is the NameOfElement in my little template, name is the name of the Attribute, and "Modest Mouse" is its value. All data is stored as strings, so even integers are written as strings. This is really handy for debugging, but it does take up a bit of space.
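As a rough illustration of the "everything is a string" point, the sketch below reads an Element name and some Attributes. It assumes Python's standard xml.etree.ElementTree module and a trimmed, hypothetical copy of the band file with lower-camel-case Attribute names.

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical copy of the band file used in this tutorial.
BAND_XML = """
<Band name="Modest Mouse">
    <Album title="Good News For People Who Love Bad News" releaseDate="Apr 06, 2004">
        <Track title="Float On" trackNumber="03" length="3:27" />
    </Album>
</Band>
"""

band = ET.fromstring(BAND_XML)

# The Element name and its Attributes are exposed separately.
element_name = band.tag          # the NameOfElement part
band_name = band.get("name")     # the value of the name Attribute

# Attribute values always come back as strings, so numbers must be converted.
track = band.find("Album/Track")
raw_number = track.get("trackNumber")   # still a string, leading zero included
track_number = int(raw_number)          # converted to a real integer
```

The conversion step is your code's job; the parser never guesses at types.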

 

Inside this Band Element, we find a bunch of other Elements. We call these child Elements. Some of these child Elements share the same Element name because they are related: each is of type Member or Album, and Elements of the same type have similar Attributes. It is possible for some to carry more or less information than others, but usually they are pretty similar (it depends on the code, which I'll get to later).

 

Let's take a look at the Album Elements. I included two of them, just to show that I can include as many as I want. Let's look at the first one. It has two Track Elements with different values in their Attributes, but the same set of Attributes. Each Track has a "Lyrics" Element, and neither Lyrics Element has any Attributes. So what good is an Element without an Attribute to hold the lyrics? Take a look at the second Track, track number 3. Between the opening and closing tags (the <Lyrics> and the </Lyrics> tags) there is some text. This text is called "inner text". It is not an Attribute; it can be whatever information you want to store. Every Element can have inner text, even those with child Elements.
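Here is a small sketch of reading inner text, again assuming Python's xml.etree.ElementTree (where inner text is exposed as the .text property) and a cut-down, hypothetical copy of the first album.

```python
import xml.etree.ElementTree as ET

# A cut-down, hypothetical copy of the first album from the band file.
ALBUM_XML = """
<Album title="Good News For People Who Love Bad News">
    <Track title="The World at Large" trackNumber="02" length="4:31">
        <Lyrics></Lyrics>
    </Track>
    <Track title="Float On" trackNumber="03" length="3:27">
        <Lyrics>
            I backed my car into a cop-car, the other day.
            Well, he just drove off, sometimes life's okay.
        </Lyrics>
    </Track>
</Album>
"""

album = ET.fromstring(ALBUM_XML)

lyrics_by_title = {}
for track in album.findall("Track"):
    lyrics = track.find("Lyrics")
    # .text is None for an empty Element, so fall back to an empty string.
    raw_text = lyrics.text or ""
    # Drop the indentation that is only there to keep the file readable.
    lines = [line.strip() for line in raw_text.strip().splitlines()]
    lyrics_by_title[track.get("title")] = lines
```

Note that the parser keeps the whitespace inside the tags exactly as written, which is why the sketch strips each line itself.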

 

Now let's take a look at the second album. This album has three Track Elements. The first has lyrics entered into its inner text, the second is missing the track number and length Attributes, and the third has no Lyrics Element. All of these Elements are well-formed XML, but the inconsistency is probably not preferred. Everything depends on the implementation, which is my next topic. At the end of the file, we find the closing Band tag.

 

Implementing XML- unleashing the power

By now, you hopefully understand the basic structure of XML, but you probably don't understand how to implement it. Implementing XML is the fun part. To pull the data out, we use something called XPath (I'll explain it after I explain how XML can be used). The full XPath documentation is quite lengthy, so I'll give you the basics that you'll need.

 

To start using XML, you must first have some data to store. You must decide on a storage hierarchy that makes the most sense for your data (such as the Band example). Once you have decided on a structure, you can begin writing code that uses XML. Most programming languages support XML and provide you with XML parsers. An XML parser "translates" the XML into a structure that makes sense to your program. This structure is made up of Elements, each of which can have child Elements. The parser lets you "iterate" through the list and pull out the Elements. Once an Element is pulled out, you can pull out its individual Attributes and store them. Then you can get that Element's child Elements, and so on, until you have all of the information you need. This gets tedious if, say, you wanted all of the Tracks in a band with a length greater than three minutes. You would have to walk through all of the XML data yourself in order to get at the information you care about, which is not very efficient code to write. Well, you're in luck; this problem was foreseen, and the W3C (the group that standardized XML) also published a standard called XPath.
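To make the "tedious" version concrete, here is a sketch of walking the hierarchy by hand to find every Track longer than three minutes. It assumes Python's xml.etree.ElementTree and a shortened, hypothetical copy of the band file; length_in_seconds is a helper I made up for this example.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical copy of the band file.
BAND_XML = """
<Band name="Modest Mouse">
    <Member name="Isaac Brock" instrument="Vocals/Lead Guitar" />
    <Album title="Good News For People Who Love Bad News">
        <Track title="The World at Large" trackNumber="02" length="4:31" />
        <Track title="Float On" trackNumber="03" length="3:27" />
    </Album>
    <Album title="The Moon and Antarctica">
        <Track title="3rd Planet" trackNumber="01" length="3:58" />
        <Track title="Gravity Rides Everything" />
    </Album>
</Band>
"""


def length_in_seconds(text):
    """Convert a "minutes:seconds" string such as "4:31" into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


band = ET.fromstring(BAND_XML)

long_tracks = []
for child in band:                      # first level: children of Band
    if child.tag != "Album":
        continue                        # skip the Member Elements
    for track in child:                 # second level: children of Album
        if track.tag != "Track":
            continue
        length = track.get("length")    # None when the Attribute is missing
        if length is not None and length_in_seconds(length) > 3 * 60:
            long_tracks.append(track.get("title"))
```

Every extra level in the hierarchy means another nested loop and another tag check, which is exactly the bookkeeping a path-style query is meant to remove.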

 

With XPath, you describe the "criteria" for the data that you want, and the parser returns only the matching Elements. As mentioned above, XPath is quite lengthy and arduous to learn in full, but I'll give you the basics.

 

XPath is based on strings containing patterns recognized by the XPath parser (the thing that interprets what you ask for and gives you what you want). Here are some basics:

 

Get children by Element name: Most parsers have built-in functionality like GetNodes(ElementName), and you can use XPath to get further down the hierarchy than the first level. Just write the string like a file path, in the format "Child/Child/Child", where each Child is a child of the Element before it. This returns every Element that matches the path, starting from the current place in the XML file.
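A quick sketch of the path format, assuming Python's xml.etree.ElementTree. Its findall method plays roughly the role of the GetNodes call mentioned above and understands a limited subset of XPath; the XML is a shortened, hypothetical copy of the band file.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical copy of the band file.
BAND_XML = """
<Band name="Modest Mouse">
    <Member name="Isaac Brock" instrument="Vocals/Lead Guitar" />
    <Member name="Eric Judy" instrument="Bass" />
    <Album title="Good News For People Who Love Bad News">
        <Track title="The World at Large" />
        <Track title="Float On" />
    </Album>
    <Album title="The Moon and Antarctica">
        <Track title="3rd Planet" />
    </Album>
</Band>
"""

band = ET.fromstring(BAND_XML)

# One level down: every Member that is a direct child of Band.
member_names = [m.get("name") for m in band.findall("Member")]

# Two levels down: every Track inside every Album, in document order.
track_titles = [t.get("title") for t in band.findall("Album/Track")]
```

The path is relative to the Element you call findall on, so "Album/Track" here starts from Band.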

 

Get child by Element index (or children of a specific Element index): To get a specific child (if you know the order of the children), use the format "Child[ChildIndex]/Child[ChildIndex]/Child", and so on, until you reach the specific child, or children, that you want. Note that XPath indices start at 1, not 0.
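The index form might look like this in practice. This is a sketch assuming Python's xml.etree.ElementTree, whose XPath subset supports positional predicates, with a shortened, hypothetical copy of the band file.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical copy of the band file.
BAND_XML = """
<Band name="Modest Mouse">
    <Album title="Good News For People Who Love Bad News">
        <Track title="The World at Large" />
        <Track title="Float On" />
    </Album>
    <Album title="The Moon and Antarctica">
        <Track title="3rd Planet" />
        <Track title="Gravity Rides Everything" />
        <Track title="Dark Center of the Universe" />
    </Album>
</Band>
"""

band = ET.fromstring(BAND_XML)

# Positions count from 1: the first Track of the second Album.
first_of_second = band.find("Album[2]/Track[1]").get("title")

# Leaving the index off the last step returns all Tracks of the first Album.
first_album_titles = [t.get("title") for t in band.findall("Album[1]/Track")]

# last() picks the final Track without knowing how many there are.
last_of_second = band.find("Album[2]/Track[last()]").get("title")
```

Index lookups only make sense when the order of the children is meaningful, so they tend to be more fragile than lookups by Attribute.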

 

Get children by attribute value: To get a child with a specific attribute value, you use syntax similar to the index form described above, except that instead of the index number you supply the attribute name and the value you are looking for. This follows the format "Child[@AttributeName='Value']": an '@', then the attribute name, an '=' sign, and then the value you are looking for in quotes. This can be extended as far down the hierarchy as you need, but remember, if you make your criteria too strict, you may get no results.
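Here is a sketch of attribute criteria, assuming Python's xml.etree.ElementTree and a shortened, hypothetical copy of the band file. The ".//" prefix used in one query is ElementTree's way of searching at any depth below the current Element.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical copy of the band file.
BAND_XML = """
<Band name="Modest Mouse">
    <Album title="Good News For People Who Love Bad News">
        <Track title="The World at Large" trackNumber="02" />
        <Track title="Float On" trackNumber="03" />
    </Album>
    <Album title="The Moon and Antarctica">
        <Track title="3rd Planet" trackNumber="01" />
        <Track title="Dark Center of the Universe" trackNumber="03" />
    </Album>
</Band>
"""

band = ET.fromstring(BAND_XML)


def titles(path):
    """Run a path query against the band and return the matching titles."""
    return [element.get("title") for element in band.findall(path)]


# A single criterion: the Track named "Float On" on any Album.
float_on = titles("Album/Track[@title='Float On']")

# The same criterion at any depth: every Track numbered "03".
all_thirds = titles(".//Track[@trackNumber='03']")

# Criteria at two levels of the hierarchy.
one_third = titles("Album[@title='The Moon and Antarctica']/Track[@trackNumber='03']")

# Values are compared as strings, so '3' does not match "03" and nothing comes back.
too_strict = titles("Album/Track[@trackNumber='3']")
```

The last query shows how easy it is to end up with no results: the criterion is not wrong, just stricter than the data.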

 

Get the parent of the current Element: To get the parent, you can use whatever your parser provides, such as Element.GetParent() or something like that. In XPath itself, use ".." to go up one level, or chain it ("../..") to go up as far as you like.

 

Get a specific attribute of the parent of this Element: To get this, XPath provides some cool functionality. Use the syntax- "../@AttributeName"- to get the parent's attribute AttributeName. Make sure to put the '@' in there. You can go as far up the hierarchy as you like.
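The two parent lookups can be sketched as follows, assuming Python's xml.etree.ElementTree and a shortened, hypothetical copy of the band file. ElementTree's XPath subset understands ".." but can only return Elements, not Attributes, so the "../@AttributeName" step is imitated here by reading the Attribute off the parent; a full XPath engine would accept the path directly.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical copy of the band file.
BAND_XML = """
<Band name="Modest Mouse">
    <Album title="Good News For People Who Love Bad News">
        <Track title="The World at Large" trackNumber="02" />
        <Track title="Float On" trackNumber="03" />
    </Album>
    <Album title="The Moon and Antarctica">
        <Track title="3rd Planet" trackNumber="01" />
    </Album>
</Band>
"""

band = ET.fromstring(BAND_XML)

# ".." steps from the matching Track up to the Album that contains it.
parent_album = band.find(".//Track[@title='Float On']/..")

# Stand-in for "../@title": read the Attribute from the parent Element.
album_title = parent_album.get("title")
```

This answers questions like "which album is this track on?" without storing the album title on every Track.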

 

Conclusion

XML is a very powerful and standardized method for data storage. Many large corporations use it to simplify their data storage. XML is also increasingly popular online, especially for RSS feeds. One of the best parts about XML is that it is easy to modify and easy to get information out of. It is also really easy to create software that works with files in XML format. I hope you understand the power behind XML, and if you do not, it is most likely due to my inability to explain it, not to any shortcoming of the XML standard.

 

 

There are many more things that can be done with XPath, but these are the ones that I use the most. I apologize if this tutorial was a bit confusing; it is my first one. Please respond if you would like clarification on any of the topics I discussed or help with XPath.

 

Thank you for reading my Tutorial on XML and XPath. Happy coding!!


Very nice tut there, Beatgammit. You describe things well in a logical manner :) How about making a tut on XPath as well? Then you could later come back and edit this one with references to the XPath one. Just a suggestion, keep up the good work!!


Wow... this is really an excellent tutorial. I never thought that XML would be that strong... very impressive. I think I'm gonna start learning this one soon :) Keep up the good work :)


A very good tutorial. When I tried to learn XML, the first days were quite horrible, because reading different tutorials I always thought, "Why the hell do I need this 'XML'? What is the purpose of it? I really can live without it." So I quit, but eventually, after some time, I went back to it, and only when I tried to do something with it myself did I find out how good it can be and what you can do with it. The best part is that the syntax is like HTML, which is very simple. :) Nice job on the tutorial, keep up the good work, and yeah, the tut on XPath would be awesome :)

