Blog

Ponderings of a kind

This is my own personal blog. Each article is an XML document, and the code powering it is hand-cranked in XQuery and XSLT. It is fairly simple and has evolved only as I have needed additional functionality. I plan to open-source the code once it is a bit more mature; however, if you would like a copy in the meantime, drop me a line.

Keeping GitHub pages up to date with your master

Auto-sync from master to gh-pages

As part of my EXQuery efforts, I need to write up a "formal" specification for RESTXQ. The EXQuery RESTXQ code base lives on GitHub (http://github.com/exquery/exquery), and the specification has been authored in the exquery-restxq-specification module.

The RESTXQ specification is authored in HTML using Robin Berjon's excellent ReSpec tool. As specifications are arguably meant to be read by people, it would be nice if we could present the work in progress from the source repository to users as a web page.

Fortunately GitHub provides a nice facility for web pages called GitHub Pages. However, the pages are taken from a branch of your GitHub repository called gh-pages. The advantage of this is that your 'pages' can contain different content to your main source code base (i.e. your master branch). If your page content is in your master branch though, you need a facility for keeping the copy in your gh-pages branch up to date with your commits to master.

I will not detail how to set up GitHub Pages; that is better covered here.

I simply wanted to be able to keep a single folder called exquery-restxq-specification from master in sync with my gh-pages branch. When creating my gh-pages branch, following the instructions linked above, rather than deleting everything in the branch, I deleted everything except the exquery-restxq-specification folder, and then committed and pushed.

To keep the folder in sync across both branches, we can add a post-commit hook to our local clone of the repository, so that when we commit changes to that folder in master, the changes are propagated to the gh-pages branch. Hooks live in the local .git directory and are not pushed, so each clone needs its own copy.

To add the post-commit hook, create the script: your-repo/.git/hooks/post-commit

    #!/bin/sh
    # Hooks fire for commits on every branch (including the commit made below), so only act on master
    [ "$(git rev-parse --abbrev-ref HEAD)" = "master" ] || exit 0
    git checkout gh-pages                                           # switch to the gh-pages branch
    git checkout master -- exquery-restxq-specification             # check out just the exquery-restxq-specification folder from master
    git commit -m "Merge in of specification from master branch"    # commit the changes (reports "nothing to commit" if the folder is unchanged)
    # git push                                                      # uncomment if you want to auto-push
    git checkout master                                             # switch back to the master branch

If you are on Linux, Mac OS X or another Unix, you must ensure that the script has execute permissions, otherwise Git will not execute it.

Now, changing something in the exquery-restxq-specification folder on the master branch and committing it will cause Git to also sync the changes to the gh-pages branch.
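
To see the hook in action without touching a real project, here is a minimal sketch that builds a throwaway repository and exercises it. The folder name spec, the file contents and the demo identity are stand-ins chosen purely for illustration, and the embedded hook is a cut-down variant of the script above that only acts on commits made to master.

```shell
#!/bin/sh
# Throwaway demonstration in a scratch repository; "spec" is a stand-in folder name.
set -e
repo="$(mktemp -d)"
cd "$repo"
git init --quiet .
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Demo User"             # local identity, only for this scratch repo
git config user.email "demo@example.com"

mkdir spec
echo "draft 1" > spec/index.html
git add spec
git commit --quiet -m "Add specification"
git branch gh-pages                          # gh-pages starts out holding just the spec folder

# Install the hook and give it execute permissions
mkdir -p .git/hooks
cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
[ "$(git rev-parse --abbrev-ref HEAD)" = "master" ] || exit 0
git checkout --quiet gh-pages
git checkout master -- spec
git commit --quiet -m "Merge in of specification from master branch"
git checkout --quiet master
EOF
chmod +x .git/hooks/post-commit

# Commit a change on master; the hook should copy it across to gh-pages
echo "draft 2" > spec/index.html
git commit --quiet -am "Revise specification"

# Print the file as it now exists on the gh-pages branch
git show gh-pages:spec/index.html
```

If the hook ran, the file printed from gh-pages should match what was just committed on master, and you should still be on the master branch afterwards.
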
As a further exercise it might be interesting to take the commit message for gh-pages from the last commit message of master...
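
One way to approach that exercise is sketched below. It relies on git log -1 --pretty=%B master, which prints the full message of the most recent commit on master. The function name sync_folder_to_gh_pages is hypothetical, and both the branch check and the "only commit if something changed" test are assumptions of this sketch; treat it as a starting point rather than a tested replacement for the hook above.

```shell
#!/bin/sh
# Hypothetical helper for .git/hooks/post-commit: copy one folder from master
# to gh-pages, reusing the message of master's most recent commit.
sync_folder_to_gh_pages() {
    folder="$1"

    # The commit made below fires the hook again while on gh-pages; do nothing then.
    [ "$(git rev-parse --abbrev-ref HEAD 2>/dev/null)" = "master" ] || return 0

    # Full message (subject and body) of the latest commit on master.
    message="$(git log -1 --pretty=%B master)"

    git checkout --quiet gh-pages || return 1
    git checkout master -- "$folder"

    # Commit only when the folder differs from what gh-pages already holds.
    if ! git diff --cached --quiet; then
        git commit --quiet -m "$message"
    fi

    git checkout --quiet master
}

# When installed as the hook, finish the script with a call such as:
#   sync_folder_to_gh_pages exquery-restxq-specification
```

With this in place, the history of gh-pages would carry the same commit messages as the commits on master that changed the folder, instead of one fixed message.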

Adam Retter posted on Sunday, 19th August 2012 at 12.03 (GMT+01:00)
Updated: Sunday, 19th August 2012 at 12.03 (GMT+01:00)

tags: XQuery, Full-Text, Search, eXist-db

XQuery Matching Based on Word Distance

A distraction from work

Whilst at this moment I am meant to be preparing my sessions for the XML Summer School this year, I was reviewing Priscilla Walmsley's slides from last year and saw the following example given as a 'Search and Browse' use-case for XQuery:

"What medical journal articles since 2004 mention "artery" and "plaque" within 3 words of each other?"

I immediately thought to myself, 'Hmm... that would be a tricky one to code in XQuery!' Of course, the easy answer would be to use the W3C XPath and XQuery Full-Text extensions, for example:

            
    /journal[xs:date(@date) ge xs:date("2004-01-01")][. contains text "artery" ftand "plaque" distance at most 3 words]

Sadly however, eXist-db, which is the XQuery platform I like to use, does not implement the W3C Full-Text extensions yet. Instead it has its own full-text extensions based on Lucene, so in eXist-db the equivalent would be:

   
    /journal[xs:date(@date) ge xs:date("2004-01-01")][ft:query(., '"artery plaque"~3')]

If I stopped there, however, it would be quite a short blog post. It also appears from the implementation test results that the W3C XPath and XQuery Full-Text specification is not widely implemented. So how about implementing this in pure XQuery? I took the challenge, and my solution is below.
I would be interested to see attempts at a more elegant implementation or suggestions for improvements.

(:~
: Simple search for words within a distance of each other
:
: Adam Retter
:)
xquery version "1.0";
        
declare function local:following-words($texts as text()*, $current-pos, $distance, $first as xs:boolean) {
        
    if(not(empty($texts)))then
        let $text := $texts[$current-pos],
        $next-tokens :=
            if($first)then
                (: ignore first word on first invocation, as it is our current word :)
                let $all-tokens := tokenize($text, " ") return
                    subsequence($all-tokens, 2, count($all-tokens))
            else
                tokenize($text, " ")
        return
        
            if(count($next-tokens) lt $distance)then
            (
                $next-tokens,
                if($current-pos + 1 le count($texts))then
                    local:following-words($texts, $current-pos + 1, $distance - count($next-tokens), false())
                else()
            )	
            else
                subsequence($next-tokens, 1, $distance)
    else()
};
        
declare function local:following-words($texts as text()*, $current-pos, $distance) {
    local:following-words($texts, $current-pos, $distance, true())
};
        
declare function local:preceding-words($texts as text()*, $current-pos, $distance) {
        
    let $prev := $texts[$current-pos - 1] return
        if(not(empty($prev)))then
            let $prev-tokens := tokenize($prev, " ") return
                if(count($prev-tokens) lt $distance)then
                (
                    local:preceding-words($texts, $current-pos - 1, $distance - count($prev-tokens)),
                    $prev-tokens
                )	
                else
                    subsequence($prev-tokens, count($prev-tokens) - $distance + 1, count($prev-tokens))
        else()
};
        
(:~
: Performs a search within the text nodes for words within a distance of each other
:)
declare function local:found-within($texts as text()*, $distance, $words-to-find as xs:string+) as xs:boolean {

    let $results := 

        for $text at $current-pos in $texts
        let $current-word := tokenize($text, " ")[1],
        $preceding-words := local:preceding-words($texts, $current-pos, $distance),
        $following-words := local:following-words($texts, $current-pos, $distance)
        return

            for $word at $i in $words-to-find
            let $other-words-to-find := $words-to-find[position() ne $i]
            return
                if($current-word eq $word and ($other-words-to-find = $preceding-words or $other-words-to-find = $following-words))then
                    true()
                else
                    false()

    return $results = true()            
};
        
(:~
: Just for debugging to help people understand
:)
declare function local:debug($texts as text()*, $current-pos, $distance) {
    <group distance="{$distance}">
        <text>{$texts[$current-pos]}</text>
        <preceding-words>
            {
            let $preceding := local:preceding-words($texts, $current-pos, $distance) return
                $preceding
            }
        </preceding-words>
        <current-word>{tokenize($texts[$current-pos], " ")[1]}</current-word>
        <following-words>
        {
            let $following := local:following-words($texts, $current-pos, $distance) return
                $following
        }
        </following-words>
    </group>
};
        
(: params :)
let $words-to-find := ("artery", "plaque"),
$distance := 3 return
        
    (: main :)    
    for $journal in /journal
    [xs:date(@date) ge xs:date("2004-01-01")]
    [local:found-within(.//text(), $distance, $words-to-find)]
    return
        $journal
        
        (: comment out the above flwor and uncomment below, to see debugging output :)
        (:
        for $journal in /journal[xs:date(@date) ge xs:date("2004-01-01")]
        let $texts := $journal//text() return
            for $current-pos in (1 to count($texts)) return 
                local:debug($texts, $current-pos, $distance)	  
        :)
        

Adam Retter posted on Saturday, 18th August 2012 at 18.43 (GMT+01:00)
Updated: Sunday, 19th August 2012 at 16.36 (GMT+01:00)

tags: XQuery, Full-Text, Search, eXist-db
