MarkLogic: Move documents to new forests

Our development database had been allowed to grow unchecked, and its single forest was well over the recommended 200GB maximum size. So I created two new forests and moved all the documents into them. (I think that’s faster than moving half into one new forest and then having loads of deleted fragments to clean up in the initial forest.)

The technique is to first set the initial forest to delete-only and then re-insert all the documents without changing them. Inserting a document at a URI that already exists normally updates it in place, in the forest it already lives in. But because that forest is delete-only, and the insert names forests that do accept updates, the document is saved in one of the new forests instead.
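The delete-only setting can be changed in the Admin UI, or scripted with the Admin API. Here is a minimal sketch of the scripted route, assuming the initial forest is called "original-forest" (a placeholder name, not the real one) and that the query runs as a user with admin privileges:

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "original-forest" is a placeholder; substitute the forest to be emptied. :)
let $forest-id := xdmp:forest("original-forest")
let $config := admin:get-configuration()
(: Allow deletes on this forest but no new inserts or in-place updates. :)
let $config :=
  admin:forest-set-updates-allowed($config, $forest-id, "delete-only")
return admin:save-configuration($config)
```

Setting the value back to "all" would restore normal behaviour if the move had to be abandoned part-way.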

Here’s the code to move a single document:

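xquery version '1.0-ml';

(: Move a document, offering every forest attached to the current
   database. The delete-only forest is in that list but cannot accept
   the insert, so the document lands in a forest that allows updates. :)
declare function local:document-move-forest($uri as xs:string)
as empty-sequence()
{
  local:document-move-forest(
    $uri,
    xdmp:database-forests(xdmp:database())
  )
};

(: Re-insert the document unchanged at the same URI, carrying over its
   permissions, collections and quality, into one of $forest-ids. :)
declare function local:document-move-forest($uri as xs:string,
  $forest-ids as xs:unsignedLong*)
as empty-sequence()
{
  xdmp:document-insert(
    $uri,
    fn:doc($uri),
    xdmp:document-get-permissions($uri),
    xdmp:document-get-collections($uri),
    xdmp:document-get-quality($uri),
    $forest-ids
  )
};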

This keeps the document’s permissions, collections and quality. The properties document is also maintained and moved with it.
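One way to spot-check this is to look up which forest holds a document and read back its properties once the move has committed. A minimal sketch, assuming a hypothetical URI "/example.xml" and run as a separate request from the move itself:

```xquery
xquery version "1.0-ml";

(: "/example.xml" is a placeholder URI for a document that has been moved. :)
let $uri := "/example.xml"
return (
  (: Name of the forest that currently holds the document. :)
  xdmp:forest-name(xdmp:document-forest($uri)),
  (: The properties document, which should be unchanged by the move. :)
  xdmp:document-properties($uri)
)
```

If the forest name is one of the new forests and the properties come back as before, the move behaved as described for that document.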

Call this function using Corb on all the documents in the original forest, and then just remove the now empty forest.
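For the Corb run, the URIs module has to select only the documents still sitting in the original forest. The following is a rough sketch rather than the exact module I used: it assumes the URI lexicon is enabled on the database and again uses "original-forest" as a placeholder forest name.

```xquery
xquery version "1.0-ml";

(: Corb URIs module sketch: return the count first, then the URIs. :)
(: "original-forest" is a placeholder; cts:uris needs the URI lexicon. :)
let $uris := cts:uris((), (), (), (), xdmp:forest("original-forest"))
return (fn:count($uris), $uris)
```

The transform module would then declare Corb's external variable, `declare variable $URI as xs:string external;`, and pass it to `local:document-move-forest($URI)`.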


2 Responses to MarkLogic: Move documents to new forests

  1. David Lee says:

    Curious: how does this code preserve the properties associated with the document?
    In similar code that I have that renames a document I have had to explicitly get and then insert the properties document separately.

    For example here’s the code I pulled from an old mailing list entry to move a document. It seems to imply that not all property objects are maintained in an insert.

    declare function local:document-rename(
      $old-uri as xs:string, $new-uri as xs:string)
    as empty-sequence()
    {
      xdmp:document-delete($old-uri)
      ,
      let $permissions := xdmp:document-get-permissions($old-uri)
      let $collections := xdmp:document-get-collections($old-uri)
      return xdmp:document-insert(
        $new-uri, doc($old-uri),
        if ($permissions) then $permissions
        else xdmp:default-permissions(),
        if ($collections) then $collections
        else xdmp:default-collections(),
        xdmp:document-get-quality($old-uri)
      )
      ,
      (: copy only user-defined properties, skipping the prop: namespace :)
      let $prop-ns := namespace-uri(<prop:properties/>)
      let $properties :=
        xdmp:document-properties($old-uri)/node()
          [ namespace-uri(.) ne $prop-ns ]
      return xdmp:document-set-properties($new-uri, $properties)
    };

    -David

  2. admin says:

    Hi David,

    I believe the difference is that the move-forest function doesn’t change a document’s URI or explicitly delete it. Because the URI doesn’t change, the properties document is maintained (and under the covers MarkLogic moves it with the document to the new forest).

    Rob
