Managing access to records¶
Invenio comes with an access control system that is powerful and flexible, but it can seem overwhelming at first. This guide walks through a basic example of how to protect the REST API so that only some users can see certain records.
If you haven’t already done so, make sure you’ve followed the Quickstart so you have an Invenio instance to work on.
Endpoints¶
Your data model’s REST API has two main endpoints:
- Search endpoint (e.g. /api/records)
- Detail endpoint (e.g. /api/records/<id>)
The goal is to ensure that a) only an owner of a record can retrieve their record from the detail endpoint and b) that the search endpoint only shows records that a given user owns.
In order to protect the two endpoints we will:
- Store permissions in a record by adding a new field owner to our data model.
- Require permissions by writing:
  - a permission factory to protect the detail endpoint.
  - a search filter to filter results in the search endpoint.
- Configure endpoints to use the permission factory and search filter.
Storing permissions¶
First, below is an example of how we could store an owner inside a record by adding a new field owner:
{
    "title": "My secret publication",
    "owner": 1
}
To store the owner property in our data model, you must add this new field to the data model’s:
- JSONSchema (think of the JSONSchema as the database table structure that you add a new column to).
- Elasticsearch mapping (think of the mapping as a description of how your data should be indexed).
- Marshmallow schema (think of the Marshmallow schema as a description of how you would render one or more rows in a database table to an end-user).
JSONSchema¶
In the quickstart example, our JSONSchema is located in .../records/jsonschemas/records/record-v.1.0.0.json, and you would add something like the following to the schema:
{
    "owner": {
        "type": "integer"
    }
}
Elasticsearch mapping¶
In the quickstart example, our Elasticsearch mapping is likely located in .../records/mappings/v7/records/record-v.1.0.0.json, and you would add something like the following to the mapping:
{
    "owner": {
        "type": "integer"
    }
}
Requiring permissions¶
Once you have added the new field(s) to your data model, you need to make use of the field to protect the detail endpoint and the search endpoint. You do that by writing:
- a permission factory to protect the detail endpoint.
- a search filter to filter results in the search endpoint.
Permission factory¶
The purpose of the permission factory is to create a permission object from a record which is then used by the detail endpoint to check if the current user has permission to view the current record. Below is a simple example of a permission factory:
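from flask_principal import UserNeed
from invenio_access import Permission

def my_permission_factory(record=None):
    """Return a permission that only the record's owner satisfies."""
    # The current user must have the same id as the record's "owner" field.
    return Permission(UserNeed(record["owner"]))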
The permission factory function takes a record as input and creates a Permission object from it.
The permission, when checked, requires that the current user has the same id as the id stored in the record’s owner field. This is expressed with the UserNeed.
Permissions and needs
The concept of needs can be somewhat hard to grasp, but essentially a need expresses the smallest unit of access control. For instance, UserNeed(1) expresses the statement “has user id 1”, and RoleNeed('admin') expresses the statement “has admin role”.
A permission represents a set of required needs, any one of which is sufficient. For instance, Permission(UserNeed(1), RoleNeed('admin')) expresses the statement “has user id 1 or has admin role”.
Thus, with a permission factory you can build arbitrarily complex permissions from the information stored in your records.
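The “or” semantics of needs described above can be modelled in a few lines of plain Python. This is only a sketch for building intuition, not the flask_principal or invenio_access implementation: Need, UserNeed and RoleNeed are redefined locally as stand-ins, and allows is a hypothetical helper introduced for this example.

```python
from collections import namedtuple
from functools import partial

# Local stand-ins for illustration only; the real classes live in
# flask_principal and invenio_access.
Need = namedtuple("Need", ["method", "value"])
UserNeed = partial(Need, "id")
RoleNeed = partial(Need, "role")


def allows(required_needs, provided_needs):
    """Return True if the user provides at least one of the required needs."""
    return bool(set(required_needs) & set(provided_needs))


# "has user id 1 or has admin role"
required = {UserNeed(1), RoleNeed("admin")}

owner = {UserNeed(1)}                     # user 1, no roles
admin = {UserNeed(7), RoleNeed("admin")}  # another user with the admin role
stranger = {UserNeed(9)}                  # neither the owner nor an admin
```

In this model, allows(required, owner) and allows(required, admin) both hold because a single matching need is enough, while allows(required, stranger) does not.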
Search filter¶
For searches over possibly millions of records we need to be able to efficiently check permissions of all records. This is done with a search filter which is applied when executing a query. In comparison, a permission factory only deals with one record at a time.
Below is an example of a search filter which is applied to all queries on the search endpoint:
from elasticsearch_dsl import Q
from flask_security import current_user
from invenio_search.api import DefaultFilter, RecordsSearch

def permission_filter():
    """Match only records whose owner is the current user."""
    return [Q('match', owner=current_user.get_id())]

class MyRecordSearch(RecordsSearch):
    """Search class used for all queries on the search endpoint."""

    class Meta:
        index = 'records'
        default_filter = DefaultFilter(permission_filter)
The function permission_filter, when called, creates an Elasticsearch DSL Q() (query object) which matches all records where the property owner equals the current user’s id (current_user is an object that holds the current request’s authenticated user).
The class MyRecordSearch is responsible for executing all queries on the search endpoint. In the above example, we set the name of the Elasticsearch index it should use, and the search filter it should apply (in our case the permission filter).
Search filter vs permission factory
There’s a subtle difference between the search filter and the permission factory which is worth noting.
The permission factory takes a record as input, while the search filter takes the current user as input. With the permission factory, the created permission is checked against the current user; with the search filter, the current user is checked against the records. In other words, the two approach the same permission check from opposite ends.
It is therefore very important that the search filter and the permission factory produce identical results: a user should find a record in the search results exactly when they can also retrieve it from the detail endpoint.
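One way to guard against the two drifting apart is a small consistency test. The sketch below models both checks in plain Python rather than calling Invenio or Elasticsearch: records are dicts, users are integer ids, and the names RECORDS, can_read, search_visible and is_consistent are hypothetical helpers made up for this example.

```python
# Plain-Python stand-ins: records are dicts, a user is just an integer id.
RECORDS = [
    {"id": "a", "title": "My secret publication", "owner": 1},
    {"id": "b", "title": "Another publication", "owner": 2},
    {"id": "c", "title": "Third publication", "owner": 1},
]


def can_read(record, user_id):
    """Record-side check, mirroring the permission factory."""
    return record["owner"] == user_id


def search_visible(records, user_id):
    """User-side check, mirroring the search filter."""
    return [r for r in records if r.get("owner") == user_id]


def is_consistent(records, user_ids):
    """Return True if both checks agree for every given user."""
    for user_id in user_ids:
        via_search = {r["id"] for r in search_visible(records, user_id)}
        via_detail = {r["id"] for r in records if can_read(r, user_id)}
        if via_search != via_detail:
            return False
    return True
```

In a real instance the same idea would be expressed as an integration test that compares the search endpoint’s hits with the detail endpoint’s responses for a handful of test users.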
Configuring endpoints¶
The last part of the puzzle is to tell our detail/search endpoints to use our newly created permission factory and search filter:
RECORDS_REST_ENDPOINTS = {
    'recid': dict(
        # ...
        search_class=MyRecordSearch,
        read_permission_factory_imp=my_permission_factory,
        # ...
    ),
}
In our case we are protecting only the read operation on the view. Since the REST API also supports the other CRUD operations (create, update and delete), you should protect those with their own permission factories as well.
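As a rough sketch of what protecting further operations could look like, the endpoint configuration might name one permission factory per operation. The option names update_permission_factory_imp and delete_permission_factory_imp are assumed to follow the same pattern as read_permission_factory_imp and should be checked against the invenio-records-rest version you run. my_permission_factory is redefined here as a placeholder only so that the snippet is self-contained.

```python
def my_permission_factory(record=None):
    """Placeholder standing in for the owner-only permission factory."""
    raise NotImplementedError("replace with the real permission factory")


RECORDS_REST_ENDPOINTS = {
    "recid": dict(
        # ... other endpoint options elided ...
        read_permission_factory_imp=my_permission_factory,
        # Assumed option names; verify them against your installed version.
        update_permission_factory_imp=my_permission_factory,
        delete_permission_factory_imp=my_permission_factory,
    ),
}
```

Reusing the owner-only factory for update and delete is the simplest choice. Creating a record is a different case, since a new record has no stored owner yet, so it would likely need a factory of its own.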
Complex access rights¶
The toy example presented in this guide is too simple for most real-world requirements. To provide some inspiration, here are two more complex ways you could store access rights in records:
Computed rights¶
In some cases, it can be an advantage to use existing properties in your record to manage access rights. This way, you ensure that access rights do not get out of sync with other properties. An example of such a record could be:
{
    "visibility": "restricted",
    "owners": [1, 2],
    "communities": ["blr"]
}
For the above record, a permission factory could then compute different permission objects for different types of actions.
For reading the record, the permission could be:
Permission(any_user)
For seeing the files in the record, the permission could be:
Permission(UserNeed(1), UserNeed(2), RoleNeed('blr-curators'))
For editing the record, the permission could be:
Permission(UserNeed(1), UserNeed(2))
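The three permissions above could be derived from the record by a single function. The following is a minimal sketch in plain Python that represents needs as (kind, value) tuples instead of using the Invenio classes. The function needs_for, the action names, the "<community>-curators" role naming and the rule that files are only restricted when visibility is "restricted" are all assumptions made for this example.

```python
# Stand-in for the any_user need (illustration only).
ANY_USER = ("system_role", "any_user")

RECORD = {
    "visibility": "restricted",
    "owners": [1, 2],
    "communities": ["blr"],
}


def needs_for(record, action):
    """Compute the set of needs that grant `action` on `record`."""
    owners = {("id", user_id) for user_id in record.get("owners", [])}
    curators = {
        ("role", f"{community}-curators")
        for community in record.get("communities", [])
    }
    if action == "read":
        # Metadata is readable by everyone in this example.
        return {ANY_USER}
    if action == "read-files":
        # Files are limited to owners and curators when restricted.
        if record.get("visibility") == "restricted":
            return owners | curators
        return {ANY_USER}
    if action == "update":
        return owners
    raise ValueError(f"unknown action: {action}")
```

A real permission factory would wrap the computed needs in a Permission object. The point of the sketch is only that every need is derived from properties the record already has.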
Explicit rights¶
In some cases, it is an advantage to have explicit rights defined on your record so that even if the code changes, it is still obvious who should have access for which actions. An example of such a record could be:
{
    "_access": {
        "read": {
            "systemroles": ["campus_user"]
        },
        "update": {
            "users": [1],
            "roles": ["curators"]
        }
    }
}
This way, changes to rights can also be explicitly tracked via the record’s revision history and thus be audited.
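Reading such explicit rights back out of a record can be sketched as follows. As before, this is plain Python with needs represented as (kind, value) tuples rather than the Invenio classes. The helper needs_from_access and the NEED_KINDS mapping are hypothetical names introduced for this example.

```python
# Maps the keys used under "_access" to need kinds (assumed naming).
NEED_KINDS = {"users": "id", "roles": "role", "systemroles": "system_role"}

RECORD = {
    "_access": {
        "read": {"systemroles": ["campus_user"]},
        "update": {"users": [1], "roles": ["curators"]},
    }
}


def needs_from_access(record, action):
    """Return the explicit needs stored for `action`, or an empty set."""
    rights = record.get("_access", {}).get(action, {})
    # An unknown key under an action raises KeyError instead of being ignored.
    return {
        (NEED_KINDS[key], value)
        for key, values in rights.items()
        for value in values
    }
```

An action that is not listed yields an empty set, which in this sketch means that nobody is granted access. Denying by default is usually the safer choice when rights are stored explicitly.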