eParliament

Thinking about Parliament on the Net

Searching the Houses of the Oireachtas Web sites

Posted by bollinsl on 4 September 2008

Various search tools have been used to allow users to search the Houses Web sites.

PLWeb

An old but effective search engine, “PLWeb”, is used to provide searching on the following huge Web sites:

The search index is updated whenever new data is added. Dublin Core (DC) metadata is embedded in the pages, which extends the search functionality.

dtSearch

To let users see which Acts are available on-line in both Irish and English, an alternative version of Achtanna.ie, built on the dtSearch search engine, is provided at achtanna2.oireachtas.ie or acts2.oireachtas.ie:

http://acts2.oireachtas.ie/dtSearch_english.html

Google

A Google appliance and on-line Google tools are available for users who wish to search the other Oireachtas sites:

Lucene

Lucene is used for the committee reports site under development.

Other search engines

A number of other search engines were used over the years, including:

  • A Lotus Notes-based search may have been available in 1996, when the site was Notes-based.
  • 1997 … A search tool provided by gov.ie, designed by a student, I believe!
  • FrontPage search – specific to a particular Web – the site soon grew too big for this search engine
  • The Open Objects search engine was used before Google
  • Current debates has a search facility within each day’s debates
    see Dáil 10 July 2008 - “Find Text Today”
  • The members database has a search facility built in
    http://www.oireachtas.ie/members-hist/
  • The order papers page has its own search facility
  • You can search within a PDF file as long as it is not a scanned image. Be careful with Irish words: DíOSPóIREACHTAí, for example, appears as DI´OSPO´ IREACHTAI´
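
The garbling in the last point follows a regular pattern: each accented vowel comes out as the plain vowel followed by a stand-alone acute accent. A minimal Python sketch of how extracted text might be repaired is given here. It assumes the stray accent is the character U+00B4, and the function name repair_acute is made up for illustration; it is not something the Oireachtas sites are known to use.

```python
import re
import unicodedata

SPACING_ACUTE = "\u00b4"    # assumption: the stand-alone accent left by PDF extraction
COMBINING_ACUTE = "\u0301"  # the combining form that attaches to the previous letter

# A plain vowel immediately followed by the stand-alone accent.
_VOWEL_THEN_ACCENT = re.compile("([AEIOUaeiou])" + SPACING_ACUTE)


def repair_acute(text: str) -> str:
    """Turn 'vowel + stand-alone accent' back into a single accented vowel."""
    # Swap the stand-alone accent for the combining one ...
    combined = _VOWEL_THEN_ACCENT.sub(
        lambda m: m.group(1) + COMBINING_ACUTE, text
    )
    # ... then let NFC normalisation compose vowel + accent into one character.
    return unicodedata.normalize("NFC", combined)


if __name__ == "__main__":
    print(repair_acute("DI\u00b4OSPO\u00b4 IREACHTAI\u00b4"))
```

The stray space that sometimes follows the accent (as in the example above) is left alone, since removing it automatically could also join two separate words.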

Issues

  1. The Oireachtas Web site is very large, with millions of pages. Search engines can get very expensive when required to index so many pages.
  2. Configuring the search engine to index pages served up by a CMS or XML based system, where multiple versions of the same page are available on-line, and where you have heterogeneous technologies on a site, can be problematic, to say the least!
  3. Where PDF files are used extensively on the site, and where Irish language characters are used in PDF documents, the search engines seem unable to interpret the Irish text. Such characters include ÁÉÍÓÚ and áéíóú. This makes Irish-language searching difficult.
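
One possible way to soften issue 3 is to compare queries and page text under an accent-free key, so that a query matches whether or not the accents survived extraction. The Python sketch below is only an illustration under stated assumptions: the names fold_for_search and matches are hypothetical, the stray accent is assumed to be U+00B4, and none of the search engines named above is known to work this way.

```python
import unicodedata

SPACING_ACUTE = "\u00b4"  # assumption: stray accent left behind by PDF extraction


def fold_for_search(text: str) -> str:
    """Return a lower-cased, accent-free key for comparing search terms."""
    # NFD splits an accented vowel into the base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch
        for ch in decomposed
        if not unicodedata.combining(ch) and ch != SPACING_ACUTE
    ]
    return "".join(kept).casefold()


def matches(query: str, document_text: str) -> bool:
    """Accent-insensitive, case-insensitive substring match."""
    return fold_for_search(query) in fold_for_search(document_text)


if __name__ == "__main__":
    # The query has proper accents; the "document" has the garbled form.
    print(matches("d\u00edosp\u00f3ireachta\u00ed",
                  "DI\u00b4OSPO\u00b4IREACHTAI\u00b4"))
```

The trade-off is that folding conflates words that differ only by an accent, such as fear (man) and féar (grass), and it does nothing about stray spaces inserted inside a word.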
