Why is OpenAustralia not getting updated?
Published by Matthew Landauer | Filed under Announcement, Development
It’s only temporary, but OpenAustralia is currently not being updated with the latest speeches in the House of Representatives and the Senate.
Why? Well, let me explain. It’s been a tumultuous few weeks behind the scenes here. If you use OpenAustralia you’re probably blissfully unaware of some changes that have taken place at the official online home of the Hansard at aph.gov.au which have caused us a great deal of grief.
Several weeks ago, a new system for accessing the Hansard at aph.gov.au was made live and the old system was immediately switched off. We had some warning that this was going to happen. Also, we were told by a person at the Department of Parliamentary Services (DPS) that the old system would be kept online for about a month after the switchover. Unfortunately, this isn’t what actually happened.
After the switchover nothing worked for us. Our parser, which scrapes all the Hansard information, depended very tightly on how the information was structured, and everything had changed!
Many conversations ensued with the DPS, in which we implored them to turn the old system on again and at least give us a grace period to rewrite our parser to work with the new parlinfo search. Thankfully, after a few days they agreed to put the old system back up for a short period of time.
That allowed OpenAustralia to keep on working for a little while.
Then, for me, the fun truly started. I was faced with a new system that bore only a passing resemblance to the old one. The way that the Hansard was split into multiple pages had changed; the structure of the HTML markup had changed; the metadata associated with the pages had changed. Everything had changed! Worse still, I soon discovered some absolutely fundamental problems. Information was missing, such as whether a particular page is “procedural text”. Most pages are not valid XHTML: a typical page, when put through an HTML validator, comes up with over 600 errors. I also discovered instances where the text was in the wrong order, and even cases where several sections of text from different places had been combined into one section.
Somehow I tried to work my way around each of these problems. I battled away at this for a few weeks making very slow and painful progress.
Then, I heard murmurings from the DPS that another solution might be coming. What might this be?
Three days ago, on Friday last week, they added a new link to Hansard pages that allows you to download an XML file. This XML file is the underlying data that until now has only been used internally within the DPS. It is what comes out of the “Hansard Production System”, the people and systems that record and annotate the Hansard, and it is what goes into the web system. So, it has all the information required to truly make sense of the Hansard.
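To give a rough idea of why structured XML is so much easier to work with than scraped HTML, here is a minimal Python sketch. The element and attribute names (`hansard`, `debate`, `speech`, `speaker`, `procedural`) are made up for illustration and are not the real Hansard schema, and this is not the actual OpenAustralia parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The real Hansard XML schema is not
# described in this post, so every tag and attribute here is an assumption.
SAMPLE = """<hansard>
  <debate title="Questions Without Notice">
    <speech speaker="Member A" procedural="false">First speech text.</speech>
    <speech speaker="The Speaker" procedural="true">Order!</speech>
  </debate>
</hansard>"""


def extract_speeches(xml_text):
    """Return one dict per speech, read from explicit XML fields."""
    root = ET.fromstring(xml_text)
    speeches = []
    for debate in root.iter("debate"):
        for speech in debate.iter("speech"):
            speeches.append({
                "debate": debate.get("title"),
                "speaker": speech.get("speaker"),
                # The procedural flag is stored as data, not guessed from layout.
                "procedural": speech.get("procedural") == "true",
                "text": (speech.text or "").strip(),
            })
    return speeches


if __name__ == "__main__":
    for item in extract_speeches(SAMPLE):
        print(item)
```

The point is that details such as who is speaking, or whether a section is procedural, are read from named fields rather than inferred from page layout, so a cosmetic change to the website should not break the parser.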
I had asked for access to the XML data in November of last year when I started working on what became OpenAustralia. I never heard anything back. I also brought it up again during phone calls with the DPS, but I never expected it to get anywhere. It turned out that, at the same time, Jason Wilson from GetUp’s Project Democracy had been asking for the same thing. So, huge thanks go to Jason Wilson and his team at GetUp for helping to get the DPS to give us the Hansard XML data.
I dropped everything and have spent every waking moment since then rewriting the parser to work from the XML file. I’ve made good progress. Now it’s Monday, but I don’t realistically think that it’s going to be anywhere near ready by tomorrow, when the first of the Hansard from the most recent parliamentary sitting day will appear.
So, please be patient while we fix this. We’ll do everything we can to make it as quick as possible.
And, of course, we’ll keep you posted.



October 13th, 2008 at 3:46 pm
Gah! Well at least they won’t be changing the XML format dramatically without having to change their own internal systems too.
October 13th, 2008 at 11:40 pm
Many thanks for all your hard work. I’m sorry to hear you had to waste all that time, why can’t they get their act together? Wonderful stuff.
October 15th, 2008 at 5:20 pm
Ditto. It’s really great to see this site up and much respect for your efforts.
October 21st, 2008 at 3:48 am
I am seaching for some idea to write in my blog… somehow come to your blog. best of luck. Eugene
November 3rd, 2008 at 11:49 pm
Everything’s up and running again! See http://www.openaustralia.org/news/archives/2008/11/03/government_websi