Migrating to v3

Warning

Invenio v3 is significantly different from v1 and thus migrating from v1 to v3 is a complex operation.

This guide will help you dump records and files from your v1 installation. You will need to write code to import the dumped data into your v3 installation. This is necessary because v3 supports many different data models, so you need to map your v1 MARC21 records into your new data model in v3.

Dumping data from v1.2

The Invenio-Migrator module will help you dump your v1 data and also import it into v3.

Install Invenio-Migrator in v1

There are several ways of installing Invenio-Migrator in your Invenio v1.2 or v2.1 production environment. We recommend using Virtualenv to avoid any interference with the currently installed libraries:

$ sudo pip install virtualenv virtualenvwrapper
$ source /usr/local/bin/virtualenvwrapper.sh
$ mkvirtualenv migration --system-site-packages
$ workon migration
$ pip install invenio-migrator --pre
$ inveniomigrator dump --help

It is important to use the --system-site-packages option, because Invenio-Migrator uses the Invenio legacy Python APIs to perform the dump. The virtualenvwrapper package is not required, but it is quite convenient.

Dump records and files

$ mkdir /vagrant/dump
$ cd /vagrant/dump/
$ inveniomigrator dump records

This will generate one or more JSON files containing 1000 records each, with the following information:

  • The record id.
  • The record metadata, stored under the record key as a list with one item for each revision of the record. Each item contains the MARC21 representation of the record plus the optional JSON.
  • The files linked with the record. As with the record metadata, this is a list with all the revisions of the files.
  • Optionally it also contains the collections the record belongs to.
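
To check a dump before writing the import code, it can help to count the revisions stored for each record. The following is a minimal sketch, not part of Invenio-Migrator. It assumes each dump file holds a JSON list of entries, and that the record id and the file revisions are stored under the keys recid and files. Only the record key is named in this guide, so verify the other two key names against your own dump.

```python
import json


def summarize_dump(path):
    """Return one (record id, record revisions, file revisions) tuple per entry."""
    with open(path) as fp:
        entries = json.load(fp)  # assumption: the file is a JSON list of entries

    summary = []
    for entry in entries:
        summary.append((
            entry.get("recid"),            # assumed key name for the record id
            len(entry.get("record", [])),  # one item per record revision
            len(entry.get("files", [])),   # assumed key name for file revisions
        ))
    return summary
```

Calling summarize_dump on one of the generated files returns a list that is easy to scan for records with unexpectedly many or zero revisions.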

For more information about how to dump records and files see the Usage section of the Invenio-Migrator documentation.

The file path inside the Invenio legacy installation will be included in the dump and used as the file location for the new Invenio v3 installation. If you can mount the file system following the same pattern on your Invenio v3 machines, there should be no problem. Otherwise, you need to copy the files folder over manually using your preferred method, e.g.:

$ cd /opt/invenio/var/data
$ tar -zcvf /vagrant/dump/files.tar.gz files

Pro-tip: You may want different data models in your new installation depending on the nature of the record, e.g. bibliographic records vs. authority records. In this case, one option is to dump them into different files using the --query argument when dumping from your legacy installation:

$ inveniomigrator dump records --query '-980__a:AUTHORITY' --file-prefix bib
$ inveniomigrator dump records --query '980__a:AUTHORITY' --file-prefix auth
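
Once the dumps are split by prefix, the import code needs to pick up each group of files separately. The following is a minimal sketch of one way to do that. It assumes the generated file names start with the chosen prefix and end with a running number followed by .json. That naming is an assumption, so check the file names in your dump folder. The helper name dump_files is hypothetical.

```python
import glob
import os
import re


def dump_files(directory, prefix):
    """Return the dump files in `directory` starting with `prefix`, in numeric order."""
    def index(path):
        # Sort by the trailing number so that file 10 comes after file 2.
        match = re.search(r"(\d+)\.json$", path)
        return int(match.group(1)) if match else -1

    pattern = os.path.join(glob.escape(directory), glob.escape(prefix) + "*.json")
    return sorted(glob.glob(pattern), key=index)
```

For example, dump_files("/vagrant/dump", "bib") and dump_files("/vagrant/dump", "auth") would give two separate lists, each of which can be mapped to its own data model.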

Things

The dump command of Invenio-Migrator works with what we call things. A thing is an entity you want to dump from your Invenio legacy installation, e.g. in the previous example the thing was records.

The things Invenio-Migrator can dump by default are registered via entry-points in its setup.py. This not only helps us add new dump scripts easily, but also allows anyone to create their own dump scripts outside Invenio-Migrator.
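
To see which things are registered in a given environment, the entry-points can be listed from Python. The following is a minimal sketch using the standard library on Python 3. The group name invenio_migrator.things is an assumption, not something this guide states, so confirm it in the setup.py of your installed version. On a Python 2 legacy installation, pkg_resources.iter_entry_points offers the equivalent lookup.

```python
from importlib import metadata


def list_things(group="invenio_migrator.things"):
    """Return {name: target} for every entry-point registered in `group`.

    The default group name is an assumption; check the package's setup.py.
    """
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are filtered with select().
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}
```

The function returns an empty dictionary when no package registers anything under the given group.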

You can read more about which things are already supported in the Invenio-Migrator documentation.

Loading data in v3

Install Invenio-Migrator in v3

Invenio-Migrator can be installed in any Invenio v3 environment using PyPI and the extra dependencies loader:

$ pip install invenio-migrator[loader]

Depending on what you want to load, you might need to install other packages, e.g. to load communities from Invenio v2.1 you need invenio-communities installed.

Installing Invenio-Migrator adds a new set of commands to your Invenio application under dumps:

$ invenio dumps --help

Load records and files

$ invenio dumps loadrecords /vagrant/dump/records_dump_0.json

This will generate one Celery task for each of the records inside the dump, and each task imports one record.

Pro-tip: By default Invenio-Migrator uses the bibliographic MARC21 standard to transform and load the records. We know that this might not fit every Invenio v3 installation, e.g. one that holds authority records. By changing MIGRATOR_RECORDS_DUMP_CLS and MIGRATOR_RECORDS_DUMPLOADER_CLS you can customize the behavior of the loading command. There is a full chapter in the Invenio-Migrator documentation about customizing loading if you want more information.
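
As an illustration, the two settings could be overridden in the application configuration along these lines. This is only a sketch. The module path mysite.migration and the class names AuthorityDump and AuthorityDumpLoader are hypothetical placeholders, and the "module:Class" import-string format is an assumption to verify in the customizing chapter of the Invenio-Migrator documentation.

```python
# Application configuration fragment (sketch, all values hypothetical).
# `mysite.migration` stands for a module of your own that holds the classes.

# Class that interprets one dumped record (assumed "module:Class" format).
MIGRATOR_RECORDS_DUMP_CLS = "mysite.migration:AuthorityDump"

# Class that creates the v3 record from the interpreted dump.
MIGRATOR_RECORDS_DUMPLOADER_CLS = "mysite.migration:AuthorityDumpLoader"
```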

Loaders

Each of the entities that can be loaded by Invenio-Migrator has a companion command, generally prefixed by load, e.g. loadrecords.

The loaders are similar to the things described previously, but they are not registered via entry-points. To extend the default list of loaders, add a new command to dumps. More information about the loaders can be found in the Invenio-Migrator documentation, and the click documentation explains how to add more commands.