Good news! MarkLogic has been listening to its customers and is working on adding XSLT 2.0 support. Norman Walsh announced the news at a MarkLogic user group meeting in New York – see his blog post for details. In the past the argument has always been that XSLT is unnecessary because XQuery is enough, so it’s nice to see acceptance that XQuery isn’t the best tool for every job.
I can see this being of great benefit at our company where we have far more experience with XSLT compared to XQuery. The separation of query and output will allow us to write cleaner, more focussed XQueries and leave the specifics of different output formats to XSLT. This should make debugging and performance tuning XQuery much simpler.
Only downside is we can’t start using it straight away, and as yet there’s no information on when it will be released.
Our development database had been allowed to grow unchecked and the single forest was well over the recommended 200GB max size. So I created 2 new forests to move all the documents into (I think that’s faster than moving half into one new forest and then having loads of deleted fragments to clean up in the initial forest).
The technique is to first set the initial forest as delete-only and then re-insert all the documents without changing them. Inserting a document at a URI that already exists will normally update it, but because the forest is delete-only, it will be saved in one of the new forests instead.
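The delete-only step can be done in the Admin UI, but it could also be scripted. The following is only a sketch, assuming the Admin API library module is available in your MarkLogic version; the forest name "original-forest" is a made-up placeholder, not the real name from our setup.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "original-forest" is a hypothetical name; substitute your own forest :)
let $forest-id := xdmp:forest("original-forest")
let $config := admin:get-configuration()
(: allow deletes but no new inserts or updates in this forest :)
let $config :=
  admin:forest-set-updates-allowed($config, $forest-id, "delete-only")
return admin:save-configuration($config)
```

Setting the value back to "all" with the same call should return the forest to normal behaviour if the move has to be abandoned.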
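Here’s the code to move a single document:
xquery version '1.0-ml';

(: Convenience form: re-insert the document and let MarkLogic place it in
   any forest of the current database that still accepts updates. :)
declare function local:document-move-forest($uri as xs:string)
{
  local:document-move-forest(
    $uri,
    xdmp:database-forests(xdmp:database())
  )
};

(: Re-insert the document unchanged, keeping its permissions, collections
   and quality, into one of the given forests. :)
declare function local:document-move-forest($uri as xs:string,
  $forest-ids as xs:unsignedLong*)
{
  xdmp:document-insert(
    $uri,
    fn:doc($uri),
    xdmp:document-get-permissions($uri),
    xdmp:document-get-collections($uri),
    xdmp:document-get-quality($uri),
    $forest-ids
  )
};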
This will keep the document’s permissions, collections and quality. And the properties document is maintained and moved with it.
Call this function using Corb on all the documents in the original forest, and then just remove the now empty forest.
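For the Corb run you need a module that selects the URIs to process. Here is a rough sketch of one, assuming the URI lexicon is enabled on the database and that Corb expects the count of URIs followed by the URIs themselves. The forest name "original-forest" is a placeholder.

```xquery
xquery version "1.0-ml";

(: "original-forest" is a hypothetical name; substitute your own forest :)
let $forest-id := xdmp:forest("original-forest")
(: cts:and-query(()) matches every document; the last argument limits
   the lexicon lookup to the forest being emptied :)
let $uris := cts:uris((), (), cts:and-query(()), (), $forest-id)
return (fn:count($uris), $uris)
```

The processing module would then call local:document-move-forest with the URI that Corb passes in for each document.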
Although MarkLogic isn’t officially supported on Ubuntu, it will run fine – after jumping through some hoops to get it installed.
Thankfully someone has already detailed the process:
“MarkLogic install on Ubuntu 9.04 and libbteuclid and libbtunicode”
A port of common spreadsheet functions to XQuery by the FLWOR foundation:
http://sourceforge.net/apps/mediawiki/zorba/index.php?title=EXcel_Function_Library
Contains loads of useful functions, especially for statistics. It says tested with Zorba, Saxon, and eXist. By the looks of it they don’t use any proprietary XQuery, so should be fine in MarkLogic too.
A little trick I picked up recently..
XQuery doesn’t have a coalesce function so I was writing code like:
if ($x) then $x else 'default value'
But there’s a nicer way which shows why there’s no need for a coalesce function:
($x, 'default value')[1]
Everything is a sequence…
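One thing to watch: the two forms are not quite equivalent. The if version tests the effective boolean value of $x, so an empty string (or zero, or false) falls through to the default, whereas the sequence version only falls back when $x is the empty sequence. A small sketch, with $missing and $blank as made-up inputs:

```xquery
xquery version "1.0";

(: $missing and $blank are hypothetical inputs for illustration :)
let $missing := ()
let $blank := ''
return (
  (: empty sequence, so the default is picked :)
  ($missing, 'default value')[1],
  (: '' is still an item, so it wins over the default :)
  ($blank, 'default value')[1],
  (: '' has an effective boolean value of false, so the default is picked :)
  if ($blank) then $blank else 'default value'
)
```

So the sequence trick is the closer match to a SQL-style coalesce, which only skips missing values.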
I’ve found a new project that adds XQuery support to Eclipse – XQDT.
It’s early days for the plugin but already it looks very good. It has proper syntax highlighting, live code validation, auto-complete and crucially can connect to MarkLogic to execute the query within Eclipse. It looks like it validates code as XQuery 1.1 – so anything written in 0.9 will show false errors. Don’t think this will cause problems with 1.0-ml though, and it recognises the MarkLogic API functions.
I recommend giving it a try – I’ve already switched from Notepad++ and so far so good.
I may be blind and all this was for nothing, but I couldn’t find an easy way to format a number with thousand separators – e.g. 1,234.
So I wrote this little function:
declare function local:format-int($i as xs:int) as xs:string
{
  (: work on the digits only; the sign is added back at the end :)
  let $digits :=
    if ($i lt 0) then fn:substring(fn:string($i), 2)
    else fn:string($i)
  (: reverse so that groups of three are counted from the right :)
  let $rev := fn:reverse(fn:string-to-codepoints($digits))
  let $comma := fn:string-to-codepoints(',')
  let $chars :=
    for $c at $pos in $rev
    return (
      $c,
      (: a comma after every third digit, except after the last one :)
      if ($pos mod 3 eq 0 and $pos ne fn:count($rev))
      then $comma else ()
    )
  return fn:concat(
    if ($i lt 0) then '-' else (),
    fn:codepoints-to-string(fn:reverse($chars))
  )
};
tests:
local:format-int((1, 12, 123, 1234, 12345, 123456, 1234567,
123456789, 1234567890, -12, -12345, -1234567))
(By the way, I think that’s the first time I’ve found a use for function mapping)
I guess I could go further and support different format specifications, and doubles etc., but for now this is all I need. And I’m suspicious that it’s just existing functionality I can’t find… any pointers welcome!
I’ve finally updated the Notepad++ language definition for XQuery 1.0 and MarkLogic 4.1. It adds XQuery to the language menu, and enables syntax highlighting and auto-complete of XQuery 1.0, as well as the MarkLogic 4.1 API (including the library modules).
And it comes with a new bonus feature – support for the Function List Plugin by Jens Lorenz. See the readme file in the download for instructions (it’s easy).
There are still some limitations, but as far as I know this is all that’s out there right now. If anyone knows of a better implementation please let me know!
Project on GitHub
Today I found some queries were erroring with:
XDMP-LISTCACHEFULL: List cache full on host …
Googling “LISTCACHEFULL” returns a paltry 1 result – someone asking about the error on the MarkLogic mailing list but getting no response. Great.
I simplified the query down to the part causing the error:
cts:search(
  fn:doc(),
  cts:element-query(
    fn:expanded-QName("http://www.springer.com/app/meta", "OrgName"),
    cts:word-query("university of exeter", ("unstemmed"))
  )
)[1]
Nothing really complex about that query – but playing around with it, I found that changing “unstemmed” to “stemmed” fixed it. I have no idea why an unstemmed query causes more trouble than a stemmed query which has much more work to do. Changing to a stemmed query wasn’t an option because that would limit the search to content in one language (see this post for more details on this weird behaviour).
Then the error was magically gone after a restart of the MarkLogic service. The classic ‘switch it off and on again’ fix. Although it’s a relief the problem is gone, it’s also really frustrating because it’s hard to carry on investigating the cause without being able to reproduce the error, and I just know it will come back again at some unknown point in the future. If I ever do get to the bottom of it I’ll post an update…
MarkLogic have released version 4.1, not long after 4.0 came out. I haven’t had a chance to use it yet but it looks like there are a few cool new features:
- XML schema validation
- Japanese language support
- Task scheduler
- JSON support
- HTTP app servers support REST, URL rewriting and HTTPS
- MarkLogic Application Services – includes a new search API that seems to
be a move to provide built-in functionality like lib-search, and a GUI
tool to develop demos.
The JSON support would have been really valuable in the project I just finished, and I can see how useful a task scheduler would be for maintenance jobs etc. The improvements to the HTTP app server don’t really interest me. Why would you want to use MarkLogic as an HTTP server in any real production environment? Despite these improvements, it’s woefully under-featured compared to Apache or IIS, and is an expensive MarkLogic cluster really the right place to be handling HTTP requests?
And as for the application builder GUI… I haven’t checked it out yet, but the idea scares me to death. I can just imagine people knocking something up that nearly works and then thinking that creating a proper, scalable, production-ready application should be just as quick and easy.
Now just need to find a way to justify upgrading our cluster – doubt it will happen any time soon unfortunately..