Welcome to the documentation of Fenix¶
Contents:
Fenix¶
OpenStack host maintenance and upgrade in interaction with the application
Fenix implements rolling infrastructure maintenance and upgrade in interaction with the application running on top of it. In the telco world this manager is called a VNFM, but one can implement one's own simple manager for any application.
An infrastructure admin can call the Fenix API to start a maintenance workflow session. This session performs the needed maintenance and upgrade operations on the infrastructure in interaction with the application manager, to guarantee zero downtime for the application's service. The interaction lets the application manager know about new capabilities coming with the maintenance, so it can plan its own upgrade. The application can have a time window to finish what it is doing, take its own action to re-instantiate its instances, or have Fenix make the migration. Scaling applications and retirement are also possible.
While Fenix has project-specific messaging that tells the application manager about affected instances, it also has admin-level messaging. This messaging can tell which host is down for maintenance, back in use, added or retired. Any infrastructure component can catch this information as needed.
Fenix also works with “one click”. The infrastructure admin just creates the wanted workflow session: all needed software changes are automatically downloaded and the workflow is run on the wanted hosts according to the request, depending on how the used workflow plug-in and action plug-ins are implemented.
In NFV, Fenix needs to be supported by the infrastructure admin UI, the VNFM and the VNF implementation. Fenix itself should be integrated into the infrastructure to be used in infrastructure maintenance, upgrade, scaling and life-cycle operations.
- Free software: Apache license
- Documentation: https://fenix.readthedocs.io/en/latest/index.html
- Developer Documentation: https://wiki.openstack.org/wiki/Fenix
- Source: https://opendev.org/x/fenix
- Running sample workflows: https://opendev.org/x/fenix/src/branch/master/fenix/tools/README.md
- Bug tracking and Blueprints: https://storyboard.openstack.org/#!/project/x/fenix
- How to contribute: https://docs.openstack.org/infra/manual/developers.html
- Fenix Specifications
Fenix service installation guide¶
Fenix service overview¶
The Fenix service provides…
The Fenix service consists of the following components:
- fenix-api service: Accepts and responds to end user API calls.
- fenix-engine service: Runs the pluggable maintenance sessions.
Install and configure¶
This section describes how to install and configure the host maintenance service, code-named Fenix, on the controller node.
This section assumes that you already have a working OpenStack environment.
Note that installation and configuration vary by distribution. Currently Fenix is not included in any distribution. Instead, this guide shows the generic way of installing from source and how to install via DevStack.
Install and configure for Red Hat Enterprise Linux and CentOS¶
This section describes how to install and configure the Fenix service for Red Hat Enterprise Linux and CentOS.
Prerequisites¶
Before you install and configure the Fenix service, you must create a database, service credentials, and API endpoints.
To create the database, complete these steps:
Use the database access client to connect to the database server as the root user:

$ mysql -u root -p

Create the fenix database:

CREATE DATABASE fenix;

Grant proper access to the fenix database:

GRANT ALL PRIVILEGES ON fenix.* TO 'fenix'@'localhost' \
  IDENTIFIED BY 'FENIX_DBPASS';
GRANT ALL PRIVILEGES ON fenix.* TO 'fenix'@'%' \
  IDENTIFIED BY 'FENIX_DBPASS';

Replace FENIX_DBPASS with a suitable password.
Exit the database access client:

exit;

Source the admin credentials to gain access to admin-only CLI commands:

$ . admin-openrc
To create the service credentials, complete these steps:
Create the fenix user:

$ openstack user create --domain default --password-prompt fenix

Add the admin role to the fenix user:

$ openstack role add --project service --user fenix admin
Create the Fenix service entities:
$ openstack service create --name fenix --description "fenix" fenix
Note! In a Fenix workflow you may want to have SSH access to all nodes, so that your Fenix action plug-ins can scp files to those nodes and execute scripts locally on them. This means you may want to have passwordless SSH configured for the Fenix service user.
Create the Fenix service API endpoints:
$ openstack endpoint create --region RegionOne \
  fenix public http://controller:XXXX/vY/%\(tenant_id\)s
$ openstack endpoint create --region RegionOne \
  fenix internal http://controller:XXXX/vY/%\(tenant_id\)s
$ openstack endpoint create --region RegionOne \
  fenix admin http://controller:XXXX/vY/%\(tenant_id\)s
Installation¶
Note! Fenix is currently not included in Linux distributions. You need to clone and install it from source.
$ git clone https://opendev.org/x/fenix
$ cd fenix
$ sudo python setup.py install
Configuration files¶
Configuration options. All options have default values. Mandatory options are pointed out, as those are usually at least the ones that need to be defined to match the current system.
Edit the /etc/fenix/fenix-api.conf file to configure fenix-api:

[DEFAULT]
# Mandatory configuration options
# Host where the API is running. default="127.0.0.1"
host = <hostname>
# API port. default=5000
port = <port>
# A URL representing the messaging driver to use and its full configuration.
transport_url = <transport URL>

[keystone_authtoken]
# OpenStack Identity service URL.
auth_url = http://127.0.0.1/identity
# Authentication type.
auth_type = password
# PEM encoded Certificate Authority to use when verifying HTTPS connections.
cafile = /opt/stack/data/ca-bundle.pem
# The Fenix admin project domain.
project_domain_name = Default
# The Fenix admin project.
project_name = admin
# A domain name the os_username belongs to.
user_domain_name = Default
# Fenix admin user password.
password = admin
# Fenix user. Must have admin role.
username = admin

Edit the /etc/fenix/fenix.conf file to configure fenix-engine:

[DEFAULT]
# Mandatory configuration options
# Host where the engine is running. default="127.0.0.1"
host = <hostname>
# API port. default=5000
port = <port>
# A URL representing the messaging driver to use and its full configuration.
transport_url = <transport URL>
# Optional configuration options
# How long to wait for a project reply after a message is sent to the project. default 120
wait_project_reply = 120
# Project maintenance reply confirmation time in seconds. default 40
project_maintenance_reply = 40
# Project scale-in reply confirmation time in seconds. default 60
project_scale_in_reply = 60
# Number of live migration retries. default 5
live_migration_retries = 5
# How long to wait for a live migration to be done. default 600
live_migration_wait_time = 600

[database]
# Database connection URL.
connection = mysql+pymysql://fenix:FENIX_DBPASS@controller/fenix

[service_user]
# OpenStack Identity service URL. Defaults to environment variable OS_AUTH_URL.
os_auth_url = http://127.0.0.1/identity
# Fenix user. Must have admin role. Defaults to environment variable OS_USERNAME.
os_username = admin
# Fenix admin user password. Defaults to environment variable OS_PASSWORD.
os_password = admin
# A domain name the os_username belongs to. Defaults to environment variable OS_USER_DOMAIN_NAME.
os_user_domain_name = default
# The Fenix admin project. Defaults to environment variable OS_PROJECT_NAME.
os_project_name = admin
# The Fenix admin project domain. Defaults to environment variable OS_PROJECT_DOMAIN_NAME.
os_project_domain_name = default
Finalize installation¶
Start the fenix services and configure them to start when the system boots:
# sudo systemctl enable openstack-fenix-api.service
# sudo systemctl start openstack-fenix-api.service
# sudo systemctl enable openstack-fenix-engine.service
# sudo systemctl start openstack-fenix-engine.service
Install and configure for Ubuntu¶
This section describes how to install and configure the Fenix service for Ubuntu.
Prerequisites¶
Before you install and configure the Fenix service, you must create a database, service credentials, and API endpoints.
To create the database, complete these steps:
Use the database access client to connect to the database server as the root user:

$ mysql -u root -p

Create the fenix database:

CREATE DATABASE fenix;

Grant proper access to the fenix database:

GRANT ALL PRIVILEGES ON fenix.* TO 'fenix'@'localhost' \
  IDENTIFIED BY 'FENIX_DBPASS';
GRANT ALL PRIVILEGES ON fenix.* TO 'fenix'@'%' \
  IDENTIFIED BY 'FENIX_DBPASS';

Replace FENIX_DBPASS with a suitable password.
Exit the database access client:

exit;

Source the admin credentials to gain access to admin-only CLI commands:

$ . admin-openrc
To create the service credentials, complete these steps:
Create the fenix user:

$ openstack user create --domain default --password-prompt fenix

Add the admin role to the fenix user:

$ openstack role add --project service --user fenix admin
Create the Fenix service entities:
$ openstack service create --name fenix --description "fenix" fenix
Note! In a Fenix workflow you may want to have SSH access to all nodes, so that your Fenix action plug-ins can scp files to those nodes and execute scripts locally on them. This means you may want to have passwordless SSH configured for the Fenix service user.
Create the Fenix service API endpoints:
$ openstack endpoint create --region RegionOne \
  fenix public http://controller:XXXX/vY/%\(tenant_id\)s
$ openstack endpoint create --region RegionOne \
  fenix internal http://controller:XXXX/vY/%\(tenant_id\)s
$ openstack endpoint create --region RegionOne \
  fenix admin http://controller:XXXX/vY/%\(tenant_id\)s
Installation¶
Note! Fenix is currently not included in Linux distributions. You need to clone and install it from source.
$ git clone https://opendev.org/x/fenix
$ cd fenix
$ sudo python setup.py install
Configuration files¶
Configuration options. All options have default values. Mandatory options are pointed out, as those are usually at least the ones that need to be defined to match the current system.
Edit the /etc/fenix/fenix-api.conf file to configure fenix-api:

[DEFAULT]
# Mandatory configuration options
# Host where the API is running. default="127.0.0.1"
host = <hostname>
# API port. default=5000
port = <port>
# A URL representing the messaging driver to use and its full configuration.
transport_url = <transport URL>

[keystone_authtoken]
# OpenStack Identity service URL.
auth_url = http://127.0.0.1/identity
# Authentication type.
auth_type = password
# PEM encoded Certificate Authority to use when verifying HTTPS connections.
cafile = /opt/stack/data/ca-bundle.pem
# The Fenix admin project domain.
project_domain_name = Default
# The Fenix admin project.
project_name = admin
# A domain name the os_username belongs to.
user_domain_name = Default
# Fenix admin user password.
password = admin
# Fenix user. Must have admin role.
username = admin

Edit the /etc/fenix/fenix.conf file to configure fenix-engine:

[DEFAULT]
# Mandatory configuration options
# Host where the engine is running. default="127.0.0.1"
host = <hostname>
# API port. default=5000
port = <port>
# A URL representing the messaging driver to use and its full configuration.
transport_url = <transport URL>
# Optional configuration options
# How long to wait for a project reply after a message is sent to the project. default 120
wait_project_reply = 120
# Project maintenance reply confirmation time in seconds. default 40
project_maintenance_reply = 40
# Project scale-in reply confirmation time in seconds. default 60
project_scale_in_reply = 60
# Number of live migration retries. default 5
live_migration_retries = 5
# How long to wait for a live migration to be done. default 600
live_migration_wait_time = 600

[database]
# Database connection URL.
connection = mysql+pymysql://fenix:FENIX_DBPASS@controller/fenix

[service_user]
# OpenStack Identity service URL. Defaults to environment variable OS_AUTH_URL.
os_auth_url = http://127.0.0.1/identity
# Fenix user. Must have admin role. Defaults to environment variable OS_USERNAME.
os_username = admin
# Fenix admin user password. Defaults to environment variable OS_PASSWORD.
os_password = admin
# A domain name the os_username belongs to. Defaults to environment variable OS_USER_DOMAIN_NAME.
os_user_domain_name = default
# The Fenix admin project. Defaults to environment variable OS_PROJECT_NAME.
os_project_name = admin
# The Fenix admin project domain. Defaults to environment variable OS_PROJECT_DOMAIN_NAME.
os_project_domain_name = default
Finalize installation¶
Restart the fenix services:
# sudo service openstack-fenix-api restart
# sudo service openstack-fenix-engine restart
Install and configure for openSUSE and SUSE Linux Enterprise¶
This section describes how to install and configure the Fenix service for openSUSE and SUSE Linux Enterprise Server.
Prerequisites¶
Before you install and configure the Fenix service, you must create a database, service credentials, and API endpoints.
To create the database, complete these steps:
Use the database access client to connect to the database server as the root user:

$ mysql -u root -p

Create the fenix database:

CREATE DATABASE fenix;

Grant proper access to the fenix database:

GRANT ALL PRIVILEGES ON fenix.* TO 'fenix'@'localhost' \
  IDENTIFIED BY 'FENIX_DBPASS';
GRANT ALL PRIVILEGES ON fenix.* TO 'fenix'@'%' \
  IDENTIFIED BY 'FENIX_DBPASS';

Replace FENIX_DBPASS with a suitable password.
Exit the database access client:

exit;

Source the admin credentials to gain access to admin-only CLI commands:

$ . admin-openrc
To create the service credentials, complete these steps:
Create the fenix user:

$ openstack user create --domain default --password-prompt fenix

Add the admin role to the fenix user:

$ openstack role add --project service --user fenix admin
Create the Fenix service entities:
$ openstack service create --name fenix --description "fenix" fenix
Note! In a Fenix workflow you may want to have SSH access to all nodes, so that your Fenix action plug-ins can scp files to those nodes and execute scripts locally on them. This means you may want to have passwordless SSH configured for the Fenix service user.
Create the Fenix service API endpoints:
$ openstack endpoint create --region RegionOne \
  fenix public http://controller:XXXX/vY/%\(tenant_id\)s
$ openstack endpoint create --region RegionOne \
  fenix internal http://controller:XXXX/vY/%\(tenant_id\)s
$ openstack endpoint create --region RegionOne \
  fenix admin http://controller:XXXX/vY/%\(tenant_id\)s
Installation¶
Note! Fenix is currently not included in Linux distributions. You need to clone and install it from source.
$ git clone https://opendev.org/x/fenix
$ cd fenix
$ sudo python setup.py install
Configuration files¶
Configuration options. All options have default values. Mandatory options are pointed out, as those are usually at least the ones that need to be defined to match the current system.
Edit the /etc/fenix/fenix-api.conf file to configure fenix-api:

[DEFAULT]
# Mandatory configuration options
# Host where the API is running. default="127.0.0.1"
host = <hostname>
# API port. default=5000
port = <port>
# A URL representing the messaging driver to use and its full configuration.
transport_url = <transport URL>

[keystone_authtoken]
# OpenStack Identity service URL.
auth_url = http://127.0.0.1/identity
# Authentication type.
auth_type = password
# PEM encoded Certificate Authority to use when verifying HTTPS connections.
cafile = /opt/stack/data/ca-bundle.pem
# The Fenix admin project domain.
project_domain_name = Default
# The Fenix admin project.
project_name = admin
# A domain name the os_username belongs to.
user_domain_name = Default
# Fenix admin user password.
password = admin
# Fenix user. Must have admin role.
username = admin

Edit the /etc/fenix/fenix.conf file to configure fenix-engine:

[DEFAULT]
# Mandatory configuration options
# Host where the engine is running. default="127.0.0.1"
host = <hostname>
# API port. default=5000
port = <port>
# A URL representing the messaging driver to use and its full configuration.
transport_url = <transport URL>
# Optional configuration options
# How long to wait for a project reply after a message is sent to the project. default 120
wait_project_reply = 120
# Project maintenance reply confirmation time in seconds. default 40
project_maintenance_reply = 40
# Project scale-in reply confirmation time in seconds. default 60
project_scale_in_reply = 60
# Number of live migration retries. default 5
live_migration_retries = 5
# How long to wait for a live migration to be done. default 600
live_migration_wait_time = 600

[database]
# Database connection URL.
connection = mysql+pymysql://fenix:FENIX_DBPASS@controller/fenix

[service_user]
# OpenStack Identity service URL. Defaults to environment variable OS_AUTH_URL.
os_auth_url = http://127.0.0.1/identity
# Fenix user. Must have admin role. Defaults to environment variable OS_USERNAME.
os_username = admin
# Fenix admin user password. Defaults to environment variable OS_PASSWORD.
os_password = admin
# A domain name the os_username belongs to. Defaults to environment variable OS_USER_DOMAIN_NAME.
os_user_domain_name = default
# The Fenix admin project. Defaults to environment variable OS_PROJECT_NAME.
os_project_name = admin
# The Fenix admin project domain. Defaults to environment variable OS_PROJECT_DOMAIN_NAME.
os_project_domain_name = default
Finalize installation¶
Start the fenix services and configure them to start when the system boots:
# sudo systemctl enable openstack-fenix-api.service
# sudo systemctl start openstack-fenix-api.service
# sudo systemctl enable openstack-fenix-engine.service
# sudo systemctl start openstack-fenix-engine.service
Verify operation¶
Verify operation of the fenix service.
Note
Perform these commands on the controller node.
List service components to verify successful launch and registration of each process. Example for DevStack:
$ sudo systemctl status devstack@fenix*
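If the service units look healthy, you can also check that fenix-api answers. Below is a minimal sketch, assuming the host and port configured for fenix-api, the ‘/v1/maintenance’ route of the admin API described later in this documentation, and a Keystone token already exported in OS_TOKEN; adapt these to your deployment:

# check_fenix_api.py: minimal liveness check against fenix-api.
# FENIX_URL and the /v1/maintenance route are assumptions to adapt.
import os

import requests

FENIX_URL = os.environ.get("FENIX_URL", "http://controller:5000")
TOKEN = os.environ["OS_TOKEN"]  # e.g. from: openstack token issue -c id -f value

# List ongoing maintenance sessions; an empty list means Fenix is up but idle.
resp = requests.get("%s/v1/maintenance" % FENIX_URL,
                    headers={"X-Auth-Token": TOKEN})
resp.raise_for_status()
print(resp.json())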
The host maintenance service (Fenix) provides…
This chapter assumes a working setup of OpenStack following the OpenStack Installation Tutorial.
Contributor Guide¶
If you would like to contribute to the development of OpenStack, you must follow the steps in this page:
If you already have a good understanding of how the system works and your OpenStack accounts are set up, you can skip to the development workflow section of this documentation to learn how changes to OpenStack should be submitted for review via the Gerrit tool:
Pull requests submitted through GitHub will be ignored.
Bugs should be filed on Storyboard https://storyboard.openstack.org/#!/project/x/fenix
More project information can be found at https://wiki.openstack.org/wiki/Fenix
Configuration¶
Configuration files¶
Configuration options. All options have default values. Mandatory options are pointed out, as those are usually at least the ones that need to be defined to match the current system.
Edit the /etc/fenix/fenix-api.conf file to configure fenix-api:

[DEFAULT]
# Mandatory configuration options
# Host where the API is running. default="127.0.0.1"
host = <hostname>
# API port. default=5000
port = <port>
# A URL representing the messaging driver to use and its full configuration.
transport_url = <transport URL>

[keystone_authtoken]
# OpenStack Identity service URL.
auth_url = http://127.0.0.1/identity
# Authentication type.
auth_type = password
# PEM encoded Certificate Authority to use when verifying HTTPS connections.
cafile = /opt/stack/data/ca-bundle.pem
# The Fenix admin project domain.
project_domain_name = Default
# The Fenix admin project.
project_name = admin
# A domain name the os_username belongs to.
user_domain_name = Default
# Fenix admin user password.
password = admin
# Fenix user. Must have admin role.
username = admin

Edit the /etc/fenix/fenix.conf file to configure fenix-engine:

[DEFAULT]
# Mandatory configuration options
# Host where the engine is running. default="127.0.0.1"
host = <hostname>
# API port. default=5000
port = <port>
# A URL representing the messaging driver to use and its full configuration.
transport_url = <transport URL>
# Optional configuration options
# How long to wait for a project reply after a message is sent to the project. default 120
wait_project_reply = 120
# Project maintenance reply confirmation time in seconds. default 40
project_maintenance_reply = 40
# Project scale-in reply confirmation time in seconds. default 60
project_scale_in_reply = 60
# Number of live migration retries. default 5
live_migration_retries = 5
# How long to wait for a live migration to be done. default 600
live_migration_wait_time = 600

[database]
# Database connection URL.
connection = mysql+pymysql://fenix:FENIX_DBPASS@controller/fenix

[service_user]
# OpenStack Identity service URL. Defaults to environment variable OS_AUTH_URL.
os_auth_url = http://127.0.0.1/identity
# Fenix user. Must have admin role. Defaults to environment variable OS_USERNAME.
os_username = admin
# Fenix admin user password. Defaults to environment variable OS_PASSWORD.
os_password = admin
# A domain name the os_username belongs to. Defaults to environment variable OS_USER_DOMAIN_NAME.
os_user_domain_name = default
# The Fenix admin project. Defaults to environment variable OS_PROJECT_NAME.
os_project_name = admin
# The Fenix admin project domain. Defaults to environment variable OS_PROJECT_DOMAIN_NAME.
os_project_domain_name = default
Dependencies and special configuration¶
The Fenix default workflow's VNFM interaction also assumes AODH is installed. Beyond that, this section mentions what you may want to configure when using Fenix.
Fenix external dependencies¶
Nova¶
Fenix will normally use cold and live migrations. For these to work, the Nova service user should be configured to be able to SSH between compute nodes. You may also want to change some other related configuration parameters.
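Since the workflows rely on migrations, it can be worth verifying that Nova migrations work before running a maintenance session. Below is a hedged sketch using python-novaclient; the credentials, microversion and server name are placeholders, and letting the scheduler pick the target host is just one option:

# Sketch: exercise a Nova live migration outside Fenix.
from keystoneauth1 import loading, session
from novaclient import client

loader = loading.get_plugin_loader("password")
auth = loader.load_from_options(
    auth_url="http://controller/identity",     # placeholder
    username="admin", password="secret",       # placeholders
    project_name="admin",
    user_domain_name="Default", project_domain_name="Default")
sess = session.Session(auth=auth)
# Microversion >= 2.25 lets Nova decide about block migration ("auto").
nova = client.Client("2.25", session=sess)

server = nova.servers.find(name="test_vm")     # placeholder VM
# host=None leaves the target compute node to the scheduler.
nova.servers.live_migrate(server, host=None, block_migration="auto")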
AODH and Ceilometer configuration¶
When you want to utilize the VNF(M)/EM interaction with Fenix, the VNF needs to subscribe to an AODH event alarm for the ‘maintenance.scheduled’ notification type.
Any service may also want to know when a host is added, retired, in maintenance or back from maintenance. For this, those services can subscribe to an AODH event alarm for the ‘maintenance.host’ notification type.
/etc/ceilometer/event_definitions.yaml:

- event_type: maintenance.scheduled
  traits:
    actions_at:
      fields: payload.maintenance_at
      type: datetime
    allowed_actions:
      fields: payload.allowed_actions
    host_id:
      fields: payload.host_id
    instances:
      fields: payload.instances
    metadata:
      fields: payload.metadata
    project_id:
      fields: payload.project_id
    reply_url:
      fields: payload.reply_url
    session_id:
      fields: payload.session_id
    state:
      fields: payload.state
- event_type: maintenance.host
  traits:
    host:
      fields: payload.host
    project_id:
      fields: payload.project_id
    session_id:
      fields: payload.session_id
    state:
      fields: payload.state

/etc/ceilometer/event_pipeline.yaml (publishers in the event sink):

- notifier://
- notifier://?topic=alarm.all
For the AODH and Ceilometer configuration to take effect, you may want to restart the corresponding services:

$ sudo systemctl restart openstack-aodh-listener.service
$ sudo systemctl restart openstack-ceilometer-notification.service
In DevStack you may want to enable Ceilometer and AODH in local.conf:

enable_plugin ceilometer https://opendev.org/openstack/ceilometer
enable_plugin aodh https://opendev.org/openstack/aodh
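As an example of the subscription itself, a VNFM (or any interested service) can create an AODH event alarm whose action points at its own endpoint. Below is a sketch against the plain AODH v2 REST API; the AODH URL, token handling and the callback URL are assumptions to adapt:

# Sketch: subscribe to Fenix maintenance events via an AODH event alarm.
import json
import os

import requests

AODH_URL = os.environ.get("AODH_URL", "http://controller:8042")
headers = {"X-Auth-Token": os.environ["OS_TOKEN"],
           "Content-Type": "application/json"}

alarm = {
    "name": "maintenance_scheduled",
    "type": "event",
    # Fire on the project-specific maintenance notifications from Fenix.
    "event_rule": {"event_type": "maintenance.scheduled"},
    # AODH posts the alarm payload to this URL; the VNFM serves it.
    "alarm_actions": ["http://vnfm.example.com:12348/maintenance"],
}
resp = requests.post("%s/v2/alarms" % AODH_URL,
                     data=json.dumps(alarm), headers=headers)
resp.raise_for_status()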
API¶
API v1¶
Admin API¶
These APIs are meant for the infrastructure admin who is in charge of triggering rolling maintenance and upgrade workflow sessions.
Admin workflow session API¶
Create a new maintenance session. You can specify a list of ‘hosts’ to be maintained, or give an empty list to indicate that hosts should be self-discovered. You need to give an initial state for the workflow in ‘state’. ‘workflow’ indicates the name of the Python plug-in to be used in the maintenance.
Request¶
Name | In | Type | Description |
---|---|---|---|
hosts (Optional) | body | list of strings | Hosts to be maintained. An empty list can indicate hosts are to be discovered. |
state | body | string | Maintenance workflow state (States explained in the user guide) |
maintenance_at | body | string | Maintenance workflow start time. |
workflow | body | string | Maintenance workflow to be used. |
metadata | body | dictionary | Hint for the project/tenant/VNF about what capability the infrastructure is offering to an instance when it moves to an already maintained host in the ‘PLANNED_MAINTENANCE’ state action. This may have an impact on how the instance is to be moved, or the instance may be upgraded, with the VNF re-instantiating it as its ‘OWN_ACTION’. This could be the case with new hardware, or when the instance should anyhow be upgraded at the time of the infrastructure maintenance. |
download (Optional) | body | list of dictionaries | List of needed SW upgrade packages, given as URLs as in the example below. |
actions (Optional) | body | list of dictionaries | List of action plug-ins. |
actions.plugin | body | string | Plug-in name. The default workflow executes plug-ins of the same type in alphabetical order. |
actions.type | body | string | Type of the action plug-in. The default workflow supports the types ‘pre’, ‘compute’, ‘controller’, ‘host’ and ‘post’. |
actions.metadata | body | dictionary | Metadata; hints to plug-ins. |
{
"hosts": [],
"state": "MAINTENANCE",
"maintenance_at": "2018-02-28 06:06:03",
"metadata": {"openstack_release": "Stein"},
"workflow": "default",
"download": ["https://my.sw.upgrades.com/compute.tar.gz",
"https://my.sw.upgrades.com/controller.tar.gz",
"https://my.sw.upgrades.com/esw.tar.gz",
"https://my.sw.upgrades.com/os.tar.gz",
"https://my.sw.upgrades.com/actions.tar.gz"],
"actions": [
{"plugin": "prepare", "type": "pre"},
{"plugin": "compute", "type": " compute ", "metadata": {"upgrade": " compute.tar.gz "}},
{"plugin": "controller", "type": " controller ", "metadata": {"upgrade": " controller.tar.gz "}},
{"plugin": "esw_upgrade", "type": "host", "metadata": {"upgrade": "esw.tar.gz"}},
{"plugin": "os_upgrade", "type": "host", "metadata": {"upgrade": "os.tar.gz"}},
{"plugin": "finalize", "type": "post"}]
}
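As a usage sketch, the request above could be sent programmatically. The ‘POST /v1/maintenance’ route and the token handling are assumptions consistent with the endpoint created in the installation guide; adapt them to your deployment:

# Sketch: create a maintenance session via fenix-api.
import json
import os

import requests

FENIX_URL = os.environ.get("FENIX_URL", "http://controller:5000")
headers = {"X-Auth-Token": os.environ["OS_TOKEN"],
           "Content-Type": "application/json"}

session_request = {
    "hosts": [],                               # empty: self-discover hosts
    "state": "MAINTENANCE",
    "maintenance_at": "2018-02-28 06:06:03",
    "metadata": {"openstack_release": "Stein"},
    "workflow": "default",
}
resp = requests.post("%s/v1/maintenance" % FENIX_URL,
                     data=json.dumps(session_request), headers=headers)
resp.raise_for_status()
print(resp.json()["session_id"])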
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Example response:

{
"session_id": "695030ee-1c4d-11e8-a9b0-0242ac110002"
}

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
509 - Unknown | There are too many parallel sessions.
Update an existing maintenance session. This can be used to continue a failed session after manually fixing what failed. The workflow should then run successfully to the end.
Request¶
Name | In | Type | Description |
---|---|---|---|
session_id | path | string | Session ID |
state (Optional) | body | string | Maintenance workflow state; if not given, the previous state is used. The workflow will continue from this state. |
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Example response:

{
"state": "PLANNED_MAINTENANCE"
}

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
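A sketch of continuing a failed session, assuming the ‘PUT /v1/maintenance/{session_id}’ route and the same conventions as the creation sketch above:

# Sketch: continue a failed maintenance session from a given state.
import json
import os

import requests

FENIX_URL = os.environ.get("FENIX_URL", "http://controller:5000")
headers = {"X-Auth-Token": os.environ["OS_TOKEN"],
           "Content-Type": "application/json"}
session_id = "695030ee-1c4d-11e8-a9b0-0242ac110002"  # from session creation

resp = requests.put("%s/v1/maintenance/%s" % (FENIX_URL, session_id),
                    data=json.dumps({"state": "PLANNED_MAINTENANCE"}),
                    headers=headers)
resp.raise_for_status()
print(resp.json())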
Get all ongoing maintenance sessions.
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Example response:

{
"session_id": ["695030ee-1c4d-11e8-a9b0-0242ac110002"]
}

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
Get a maintenance session state.
Request¶
Name | In | Type | Description |
---|---|---|---|
session_id | path | string | Session ID |
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Example response:

{
"state": "MAINTENANCE_DONE"
}

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
404 - Not Found | The requested resource could not be found.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
Get maintenance session details. This information can be useful for seeing the detailed status of a maintenance session, or for troubleshooting a failed session. Usually a session fails on a simple problem that can be quickly fixed manually. One can then update the maintenance session state to continue from ‘prev_state’.
Request¶
Name | In | Type | Description |
---|---|---|---|
session_id | path | string | Session ID |
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Example response:
{
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"instances": [
{
"instance_id": "da8f96ae-a1fe-4e6b-a852-6951d513a440",
"action_done": false,
"host": "overcloud-novacompute-2",
"created_at": "2020-04-15T11:43:09.000000",
"project_state": "INSTANCE_ACTION_DONE",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"instance_name": "demo_nonha_app_2",
"state": "active",
"details": null,
"action": null,
"project_id": "444b05e6f4764189944f00a7288cd281",
"id": "73190018-eab0-4074-bed0-4b0c274a1c8b"
},
{
"instance_id": "22d869d7-2a67-4d70-bb3c-dcc14a014d78",
"action_done": false,
"host": "overcloud-novacompute-4",
"created_at": "2020-04-15T11:43:09.000000",
"project_state": "ACK_PLANNED_MAINTENANCE",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"instance_name": "demo_nonha_app_3",
"state": "active",
"details": null,
"action": "MIGRATE",
"project_id": "444b05e6f4764189944f00a7288cd281",
"id": "c0930990-65ac-4bca-88cb-7cb0e7d5c420"
},
{
"instance_id": "89467f5c-d5f8-461f-8b5c-236ce54138be",
"action_done": false,
"host": "overcloud-novacompute-2",
"created_at": "2020-04-15T11:43:09.000000",
"project_state": "INSTANCE_ACTION_DONE",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"instance_name": "demo_nonha_app_1",
"state": "active",
"details": null,
"action": null,
"project_id": "444b05e6f4764189944f00a7288cd281",
"id": "c6eba3ae-cb9e-4a1f-af10-13c66f61e4d9"
},
{
"instance_id": "5243f1a4-9f7b-4c91-abd5-533933bb9c90",
"action_done": false,
"host": "overcloud-novacompute-3",
"created_at": "2020-04-15T11:43:09.000000",
"project_state": "INSTANCE_ACTION_DONE",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"instance_name": "demo_ha_app_0",
"state": "active",
"details": "floating_ip",
"action": null,
"project_id": "444b05e6f4764189944f00a7288cd281",
"id": "d67176ff-e2e4-45e3-9a52-c069a3a66c5e"
},
{
"instance_id": "4e2e24d7-0e5d-4a92-8edc-e343b33b9f10",
"action_done": false,
"host": "overcloud-novacompute-3",
"created_at": "2020-04-15T11:43:09.000000",
"project_state": "INSTANCE_ACTION_DONE",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"instance_name": "demo_nonha_app_0",
"state": "active",
"details": null,
"action": null,
"project_id": "444b05e6f4764189944f00a7288cd281",
"id": "f2f7fd7f-8900-4b24-91dc-098f797790e1"
},
{
"instance_id": "92aa44f9-7ce4-4ba4-a29c-e03096ad1047",
"action_done": false,
"host": "overcloud-novacompute-4",
"created_at": "2020-04-15T11:43:09.000000",
"project_state": "ACK_PLANNED_MAINTENANCE",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"instance_name": "demo_ha_app_1",
"state": "active",
"details": null,
"action": "MIGRATE",
"project_id": "444b05e6f4764189944f00a7288cd281",
"id": "f35c9ba5-e5f7-4843-bae5-7df9bac2a33c"
},
{
"instance_id": "afa2cf43-6a1f-4508-ba59-12b773f8b926",
"action_done": false,
"host": "overcloud-novacompute-0",
"created_at": "2020-04-15T11:43:09.000000",
"project_state": "ACK_PLANNED_MAINTENANCE",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"instance_name": "demo_nonha_app_4",
"state": "active",
"details": null,
"action": "MIGRATE",
"project_id": "444b05e6f4764189944f00a7288cd281",
"id": "fea38e9b-3d7c-4358-ba2e-06e9c340342d"
}
],
"state": "PLANNED_MAINTENANCE",
"session": {
"workflow": "vnf",
"created_at": "2020-04-15T11:43:09.000000",
"updated_at": "2020-04-15T11:44:04.000000",
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"maintenance_at": "2020-04-15T11:43:28.000000",
"state": "PLANNED_MAINTENANCE",
"prev_state": "START_MAINTENANCE",
"meta": "{'openstack': 'upgrade'}"
},
"hosts": [
{
"created_at": "2020-04-15T11:43:09.000000",
"hostname": "overcloud-novacompute-3",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"disabled": false,
"maintained": true,
"details": "3de22382-5500-4d13-b9a2-470cc21002ee",
"type": "compute",
"id": "426ea4b9-4438-44ee-9849-1b3ffcc42ad6",
},
{
"created_at": "2020-04-15T11:43:09.000000",
"hostname": "overcloud-novacompute-2",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"disabled": false,
"maintained": true,
"details": "91457572-dabf-4aff-aab9-e12a5c6656cd",
"type": "compute",
"id": "74f0f6d1-520a-4e5b-b69c-c3265d874b14",
},
{
"created_at": "2020-04-15T11:43:09.000000",
"hostname": "overcloud-novacompute-5",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"disabled": false,
"maintained": true,
"details": "87921762-0c70-4d3e-873a-240cb2e5c0bf",
"type": "compute",
"id": "8d0f764e-11e8-4b96-8f6a-9c8fc0eebca2",
},
{
"created_at": "2020-04-15T11:43:09.000000",
"hostname": "overcloud-novacompute-1",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"disabled": false,
"maintained": true,
"details": "52c7270a-cfc2-41dd-a574-f4c4c54aa78d",
"type": "compute",
"id": "be7fd08c-0c5f-4bf4-a95b-bc3b3c01d918",
},
{
"created_at": "2020-04-15T11:43:09.000000",
"hostname": "overcloud-novacompute-0",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"disabled": true,
"maintained": false,
"details": "ea68bd0d-a5b6-4f06-9bff-c6eb0b248530",
"type": "compute",
"id": "ce46f423-e485-4494-8bb7-e1a2b038bb8e",
},
{
"created_at": "2020-04-15T11:43:09.000000",
"hostname": "overcloud-novacompute-4",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"disabled": true,
"maintained": false,
"details": "d5271d60-db14-4011-9497-b1529486f62b",
"type": "compute",
"id": "efdf668c-b1cc-4539-bdb6-aea9afbcc897",
},
{
"created_at": "2020-04-15T11:43:09.000000",
"hostname": "overcloud-controller-0",
"updated_at": null,
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"disabled": false,
"maintained": true,
"details": "9a68c85e-42f7-4e40-b64a-2e7a9e2ccd03",
"type": "controller",
"id": "f4631941-8a51-44ee-b814-11a898729f3c",
}
],
"percent_done": 71,
"action_plugin_instances": [
{
"created_at": "2020-04-15 11:12:16",
"updated_at": null,
"id": "4e864972-b692-487b-9204-b4d6470db266",
"session_id": "47479bca-7f0e-11ea-99c9-2c600c9893ee",
"hostname": "overcloud-novacompute-4",
"plugin": "dummy",
"state": null
}
]
}
Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
404 - Not Found | The requested resource could not be found.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
Delete a maintenance session. Usually called after the session is successfully finished.
Success¶
Code | Reason
---|---
200 - OK | Request was successful.
Error¶
Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
Project API¶
These APIs are meant for projects (tenant/VNF) having instances on top of the infrastructure under the corresponding rolling maintenance or upgrade session. Usage of these APIs expects there to be an application manager (VNFM) that can interact with the Fenix workflow via these APIs. If this is not the case, the workflow should have a default behavior for instances owned by projects that are not interacting with Fenix.
Project workflow session API¶
These APIs are generic for any cloud as instance ID should be something that can be matched to virtual machines or containers regardless of the cloud underneath.
Get the project instances belonging to the current state of the maintenance session. The project manager receives an AODH event alarm telling about the different maintenance states. The event data field length is very limited, so instances cannot be given as a list in the event. Instead, the event contains a URL to the API below, which returns the project-specific list of instances.
Request¶
Name | In | Type | Description |
---|---|---|---|
session_id | path | string | uuid |
project_id | path | string | uuid |
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Example response:

{
"instance_ids": ["109e14d9-6566-42b3-93e4-76605f264d8f",
                 "71285107-f0fc-4428-a8b2-0b3edd64bcad"]
}

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
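In practice, the AODH event payload carries a ‘reply_url’ pointing at this resource. Below is a sketch of the query, assuming the ‘/v1/maintenance/{session_id}/{project_id}’ route and token handling as in the admin sketches above:

# Sketch: a project manager fetching its affected instances.
import os

import requests

FENIX_URL = os.environ.get("FENIX_URL", "http://controller:5000")
headers = {"X-Auth-Token": os.environ["OS_TOKEN"]}
session_id = "695030ee-1c4d-11e8-a9b0-0242ac110002"
project_id = "444b05e6f4764189944f00a7288cd281"

resp = requests.get("%s/v1/maintenance/%s/%s"
                    % (FENIX_URL, session_id, project_id), headers=headers)
resp.raise_for_status()
print(resp.json()["instance_ids"])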
A project having instances on top of infrastructure handled by a maintenance session might need to take its own action for instances on a host that is going into maintenance next, or reply with an admin action to be done. This is because the host can go down, or even be removed, and the instances should by then be running safely somewhere else. The project manager receives an AODH event alarm telling which instances are affected; when the project is ready, it takes its own action or replies back with an action that needs admin privileges.
Request¶
Name | In | Type | Description |
---|---|---|---|
session_id | path | string | uuid |
project_id | path | string | uuid |
instance_actions | body | dictionary | instance ID : action string. This variable is not needed in reply to state MAINTENANCE, SCALE_IN or MAINTENANCE_COMPLETE. |
state | body | string | This can have different values depending on the maintenance session state being replied to. In the example below, the maintenance state is ‘PLANNED_MAINTENANCE’ and the reply state is formed by adding an ‘ACK_’ or ‘NACK_’ prefix to that value. |
{
"instance_actions": {"109e14d9-6566-42b3-93e4-76605f264d8f": "MIGRATE",
"71285107-f0fc-4428-a8b2-0b3edd64bcad": "MIGRATE"},
"state": "ACK_PLANNED_MAINTENANCE"
}
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
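A sketch of the VNFM side of this reply, with the same route and auth assumptions as in the earlier sketches:

# Sketch: acknowledge PLANNED_MAINTENANCE and request migration.
import json
import os

import requests

FENIX_URL = os.environ.get("FENIX_URL", "http://controller:5000")
headers = {"X-Auth-Token": os.environ["OS_TOKEN"],
           "Content-Type": "application/json"}
session_id = "695030ee-1c4d-11e8-a9b0-0242ac110002"
project_id = "444b05e6f4764189944f00a7288cd281"

reply = {
    "instance_actions": {
        "109e14d9-6566-42b3-93e4-76605f264d8f": "MIGRATE",
        "71285107-f0fc-4428-a8b2-0b3edd64bcad": "MIGRATE"},
    "state": "ACK_PLANNED_MAINTENANCE",
}
resp = requests.put("%s/v1/maintenance/%s/%s"
                    % (FENIX_URL, session_id, project_id),
                    data=json.dumps(reply), headers=headers)
resp.raise_for_status()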
Project NFV constraints API¶
These APIs are for VNFs, VNFMs and EMs that are made to support the ETSI defined standard VIM interface, allowing sophisticated interaction to optimize rolling maintenance, upgrade, scaling and lifecycle management. These interface enhancements guarantee zero impact on the VNF service during these operations and define real-time constraints for optimal operation performance.
When using a workflow utilizing the ETSI constraints, the ‘PREPARE_MAINTENANCE’ and ‘PLANNED_MAINTENANCE’ state notifications are instance specific. This means the reply also needs to be instance specific, instead of project specific as above.
Request¶
Name | In | Type | Description |
---|---|---|---|
session_id | path | string | uuid |
project_id | path | string | uuid |
instance_id | path | string | uuid |
instance_action | body | string | Action string |
state | body | string | This can have different values depending on the maintenance session state being replied to. In the example below, the maintenance state is ‘PLANNED_MAINTENANCE’ and the reply state is formed by adding an ‘ACK_’ or ‘NACK_’ prefix to that value. |
{
"instance_action": "MIGRATE",
"state": "ACK_PLANNED_MAINTENANCE"
}
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
Get the instance constraints saved in the Fenix DB. Initially this information comes from the VNF(M) and needs to be synchronized to Fenix.
Request¶
Name | In | Type | Description |
---|---|---|---|
project_id | body | string | uuid |
instance_id | path | string | uuid |
instance_id | body | string | uuid |
group_id | body | string | Instance group uuid. Should match with OpenStack server group if one exists. |
instance_name | body | string | Instance name. |
migration_type | body | string | ‘LIVE_MIGRATE’, ‘MIGRATE’ or ‘OWN_ACTION’. Own action means the VNF creates a new instance and deletes the old one. Note! The VNF needs to obey resource_mitigation with its own action, as this affects the order of deleting the old and creating the new instance so that resources are not overcommitted. In Kubernetes ‘EVICTION’ is also supported: there the admin deletes the instance and VNF automation, such as a ReplicaSet, makes a new instance. |
max_interruption_time | body | integer | Seconds for how long the live migration can take. |
resource_mitigation | body | boolean | Instance needs double allocation when being migrated. This is true also if the instance is first scaled out and only then the old instance removed. It must be True also if the VNF needed to scale down, since we go over that scaled-down capacity. |
lead_time | body | integer | How long a lead time the VNF needs for the ‘migration_type’ operation. The VNF needs to report back to Fenix as soon as it is ready, but at least within this time. Reporting as fast as possible is crucial for optimizing the infrastructure upgrade/maintenance. A zero value means interaction with the VNFM is not used for this instance; instead the instance_group recovery_time needs to be obeyed towards max_impacted_members. |
{
"instance_id": "28d226f3-8d06-444f-a3f1-c586d2e7cb39",
"project_id": "1ad1154137ac41799cefd5caebae379b",
"group_id": "a01d192c-328e-4708-9b3c-9d716cd24a92",
"instance_name": "VM1",
"max_interruption_time": 120,
"migration_type": "LIVE_MIGRATION",
"resource_mitigation": True,
"lead_time": 40
}
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
404 - Not Found | The requested resource could not be found.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
Update the instance constraints in the Fenix DB. Initially this information comes from the VNF(M) and needs to be synchronized to Fenix.
Request¶
Name | In | Type | Description |
---|---|---|---|
project_id | body | string | uuid |
instance_id | path | string | uuid |
instance_id | body | string | uuid |
group_id | body | string | Instance group uuid. Should match with OpenStack server group if one exists. |
instance_name | body | string | Instance name. |
migration_type | body | string | ‘LIVE_MIGRATE’, ‘MIGRATE’ or ‘OWN_ACTION’. Own action means the VNF creates a new instance and deletes the old one. Note! The VNF needs to obey resource_mitigation with its own action, as this affects the order of deleting the old and creating the new instance so that resources are not overcommitted. In Kubernetes ‘EVICTION’ is also supported: there the admin deletes the instance and VNF automation, such as a ReplicaSet, makes a new instance. |
max_interruption_time | body | integer | Seconds for how long the live migration can take. |
resource_mitigation | body | boolean | Instance needs double allocation when being migrated. This is true also if the instance is first scaled out and only then the old instance removed. It must be True also if the VNF needed to scale down, since we go over that scaled-down capacity. |
lead_time | body | integer | How long a lead time the VNF needs for the ‘migration_type’ operation. The VNF needs to report back to Fenix as soon as it is ready, but at least within this time. Reporting as fast as possible is crucial for optimizing the infrastructure upgrade/maintenance. A zero value means interaction with the VNFM is not used for this instance; instead the instance_group recovery_time needs to be obeyed towards max_impacted_members. |
{
"instance_id": "28d226f3-8d06-444f-a3f1-c586d2e7cb39",
"project_id": "1ad1154137ac41799cefd5caebae379b",
"group_id": "a01d192c-328e-4708-9b3c-9d716cd24a92",
"instance_name": "VM1",
"max_interruption_time": 120,
"migration_type": "LIVE_MIGRATION",
"resource_mitigation": True,
"lead_time": 40
}
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
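A sketch of a VNFM pushing these constraints, assuming a ‘PUT /v1/instance/{instance_id}’ route and the auth conventions used in the earlier sketches:

# Sketch: synchronize instance constraints from a VNFM to Fenix.
import json
import os

import requests

FENIX_URL = os.environ.get("FENIX_URL", "http://controller:5000")
headers = {"X-Auth-Token": os.environ["OS_TOKEN"],
           "Content-Type": "application/json"}

constraints = {
    "instance_id": "28d226f3-8d06-444f-a3f1-c586d2e7cb39",
    "project_id": "1ad1154137ac41799cefd5caebae379b",
    "group_id": "a01d192c-328e-4708-9b3c-9d716cd24a92",
    "instance_name": "VM1",
    "max_interruption_time": 120,
    "migration_type": "LIVE_MIGRATE",
    "resource_mitigation": True,   # serialized to JSON true
    "lead_time": 40,
}
resp = requests.put("%s/v1/instance/%s"
                    % (FENIX_URL, constraints["instance_id"]),
                    data=json.dumps(constraints), headers=headers)
resp.raise_for_status()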
When an instance is deleted, the constraints should also be deleted from the Fenix DB. As Fenix is aware of existing instances, this could later be enhanced so that Fenix housekeeping takes care of removing the constraints of deleted instances.
Request¶
Name | In | Type | Description |
---|---|---|---|
instance_id | path | string | uuid |
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
Get the instance group constraints saved in the Fenix DB. Initially this information comes from the VNF(M) and needs to be synchronized to Fenix.
Request¶
Name | In | Type | Description |
---|---|---|---|
group_id | path | string | Instance group uuid. Should match with OpenStack server group if one exists. |
group_id | body | string | Instance group uuid. Should match with OpenStack server group if one exists. |
project_id | body | string | uuid |
group_name | body | string | Instance group name. Should match with OpenStack server group if one exists. |
anti_affinity_group | body | boolean | Boolean |
max_instances_per_host | body | integer | Describes how many instances can be on the same host if anti_affinity_group is True. This already exists in OpenStack as ‘max_server_per_host’, but might not exist in different clouds. |
max_impacted_members | body | integer | Maximum amount of instances that can be impacted at the same time. Note! This can be dynamic relative to VNF load. This is important for knowing how many instances can be scaled down while still keeping this value above zero, so that VMs can be moved between nodes. |
recovery_time | body | integer | VNF recovery time after an operation on an instance. The workflow needs to take recovery_time into account for the previously moved instance, and only then start moving the next one, obeying max_impacted_members. Note! This applies regardless of whether the group is an anti-affinity group or not. |
resource_mitigation | body | boolean | Instance needs double allocation when being migrated. This is true also if the instance is first scaled out and only then the old instance removed. It must be True also if the VNF needed to scale down, since we go over that scaled-down capacity. |
{
"project_id": "1ad1154137ac41799cefd5caebae379b",
"group_id": "a01d192c-328e-4708-9b3c-9d716cd24a92",
"group_name": "vm_ha_group",
"anti_affinity_group": True,
"max_instances_per_host": 1,
"max_impacted_members": 1,
"recovery_time": 15,
"resource_mitigation": True,
}
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
Update the instance group constraints in the Fenix DB. Initially this information comes from the VNF(M) and needs to be synchronized to Fenix.
Request¶
Name | In | Type | Description |
---|---|---|---|
group_id | path | string | Instance group uuid. Should match with OpenStack server group if one exists. |
group_id | body | string | Instance group uuid. Should match with OpenStack server group if one exists. |
project_id | body | string | uuid |
group_name | body | string | Instance group name. Should match with OpenStack server group if one exists. |
anti_affinity_group | body | boolean | Boolean |
max_instances_per_host | body | integer | Describes how many instances can be on the same host if anti_affinity_group is True. This already exists in OpenStack as ‘max_server_per_host’, but might not exist in different clouds. |
max_impacted_members | body | integer | Maximum amount of instances that can be impacted at the same time. Note! This can be dynamic relative to VNF load. This is important for knowing how many instances can be scaled down while still keeping this value above zero, so that VMs can be moved between nodes. |
recovery_time | body | integer | VNF recovery time after an operation on an instance. The workflow needs to take recovery_time into account for the previously moved instance, and only then start moving the next one, obeying max_impacted_members. Note! This applies regardless of whether the group is an anti-affinity group or not. |
resource_mitigation | body | boolean | Instance needs double allocation when being migrated. This is true also if the instance is first scaled out and only then the old instance removed. It must be True also if the VNF needed to scale down, since we go over that scaled-down capacity. |
{
"project_id": "1ad1154137ac41799cefd5caebae379b",
"group_id": "a01d192c-328e-4708-9b3c-9d716cd24a92",
"group_name": "vm_ha_group",
"anti_affinity_group": True,
"max_instances_per_host": 1,
"max_impacted_members": 1,
"recovery_time": 15,
"resource_mitigation": True,
}
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
When an instance group is deleted, the constraints should also be deleted from the Fenix DB. As Fenix is aware of existing instances, this could later be enhanced so that Fenix housekeeping takes care of removing the constraints of deleted instance groups.
Request¶
Name | In | Type | Description |
---|---|---|---|
group_id | path | string | Instance group uuid. Should match with OpenStack server group if one exists. |
Response codes¶
Code | Reason
---|---
200 - OK | Request was successful.

Code | Reason
---|---
400 - Bad Request | Some content in the request was invalid.
422 - Unknown | The entity of the request is not in line with the resource schema.
500 - Internal Server Error | Something went wrong with the service which prevents it from fulfilling the request.
Command line interface reference¶
CLI reference of Fenix.
Currently, Fenix does not implement a CLI. Real product integration is expected to have a GUI and a VNFM (application manager) supporting Fenix. In OPNFV Doctor, the maintenance test case implements the infrastructure admin and VNFM behavior. In Fenix, you can find ‘infra_admin.py’ and ‘vnfm.py’, which implement the same and are always kept up to date to be tested against the Fenix example workflows. Those are anyhow just for testing against the sample application, but they work for getting the idea for a product implementation.
User guide¶
Fenix Architecture¶
Fenix is an engine designed to make a rolling infrastructure maintenance and upgrade possible with zero downtime for the application running on top of it. Interfaces are designed to be generic, so they can work with different clouds, virtual machines and containers. Current workflows are for OpenStack and Kubernetes, but the workflow plug-in implementation defines what kind of cloud you want to support.
The key to Fenix providing zero downtime is its ability to communicate with an application manager (VNFM). As the application is aware of maintenance affecting its instances, it can safely be running somewhere else when it happens. The application also gets to know about new capabilities coming with the infrastructure maintenance/upgrade and can plan its own upgrade at the same time. As Fenix can also send scaling requests to applications, it is possible to make upgrades without adding more resources.
Fenix has the ability to tell any infrastructure service when a host is down for maintenance or back in use. This is handy for different things, like enabling/disabling self-healing or billing. The same interface could also be used for adding/removing hosts.
The design makes it possible to do everything with ‘one click’. A generic API, notifications and tracking in a database are provided by Fenix, together with example workflow and action plug-ins. Anyhow, to build for a specific cloud deployment, one can provide workflow and action plug-ins to Fenix to fit any use case one can think of.
Internal design¶
Fenix design is pluggable:
fenix-api is used to make maintenance workflow sessions and to provide admin and project owners an API to communicate to Fenix.
fenix-engine is running the maintenance workflow sessions and keeping track in database.
base workflow is providing basic Fenix functionality that can be inherited by the workflow plug-in used in each maintenance session.
workflow plug-in is the workflow for your maintenance session. Different plug-ins can be implemented for different clouds and deployments.
action plug-ins are called by the workflow plug-in. It is possible to have different types of plug-ins, and if there is more than one plug-in of a specific type, one can also define the order in which they are executed. The following types are currently in use in the Fenix example workflows; you can always define your own types according to your workflow implementation (a sketch of a plug-in follows below this list):
- pre plug-in is run first
- host plug-in is run for each host
- compute plug-in is run on each compute host
- controller plug-in is run on each controller host
- post plug-in is run last
There is a possibility to define ‘metadata’ to further indicate plug-in specifics.
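To make the plug-in types concrete, below is a hedged sketch of what a ‘host’ type action plug-in might do. The class shape and hook names are illustrative only; the real interface comes from the workflow plug-in in use, so check the sample action plug-ins shipped with Fenix:

# Illustrative only: not the actual Fenix plug-in base class.
import logging
import subprocess

LOG = logging.getLogger(__name__)


class EswUpgrade(object):
    """Hypothetical 'host' type action plug-in, run host by host."""

    def __init__(self, session, hostname, metadata):
        self.session = session    # maintenance session data
        self.hostname = hostname  # host this action currently runs against
        self.metadata = metadata  # hints, e.g. {"upgrade": "esw.tar.gz"}

    def run(self):
        # The installation guide notes that action plug-ins may scp files
        # and run scripts over passwordless SSH as the Fenix service user.
        package = self.metadata["upgrade"]
        subprocess.check_call(["scp", package, "%s:" % self.hostname])
        subprocess.check_call(["ssh", self.hostname,
                               "sudo ./upgrade.sh %s" % package])
        LOG.info("%s upgraded with %s", self.hostname, package)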
Interface design¶
Fenix has API and notifications that can be caught by different endpoint interfaces by subscribing to corresponding event alarm:
The infrastructure admin has an API to trigger, query, update and delete maintenance sessions. The admin can also receive the status of a maintenance session via the ‘maintenance.session’ notification through ‘oslo.notification’. It is also possible to get the same information by subscribing to the corresponding event alarm. This is handy for getting the event to one’s own favorite API endpoint.
A project/application having instances on top of the infrastructure under maintenance can have a manager (VNFM) to communicate with the maintenance session workflow. The manager can subscribe to project-specific ‘maintenance.planned’ event alarms to get information about the maintenance session states affecting its instances. The subscription also tells the workflow that the project has a manager capable of communicating with the workflow. Otherwise, the workflow should have a default behavior towards the project’s instances, or fail if communication is mandatory in your cloud use case. There is also a project-specific API for querying the project’s instances under the current maintenance workflow session state and for answering back to the workflow.
Any infrastructure service can also be made to support the ‘maintenance.host’ notification. This notification tells whether a host is in maintenance or back in normal use. This might be important for enabling/disabling self-healing or billing. The notification can also be used to indicate when a host is added or removed.
High level sequence diagram¶
This is the original design diagram, not utilizing the ETSI defined instance and instance group constraints.
This advanced diagram utilizes the ETSI defined instance and instance group constraints.
Fenix BaseWorkflow¶
The BaseWorkflow class implemented in ‘/fenix/workflow/workflow.py’ is the one you inherit when creating your own workflow. The example workflow ‘default.py’ using this can be found in the workflow directory ‘/fenix/workflow/workflows’.
The class provides the access to all maintenance session related data and the ability to send Fenix notifications and process the incoming API requests.
There is also a dictionary describing the generic workflow states that should be supported:
{
"MAINTENANCE": "maintenance",
"SCALE_IN": "scale_in",
"PREPARE_MAINTENANCE": "prepare_maintenance",
"START_MAINTENANCE": "start_maintenance",
"PLANNED_MAINTENANCE": "planned_maintenance",
"MAINTENANCE_COMPLETE": "maintenance_complete",
"MAINTENANCE_DONE": "maintenance_done",
"MAINTENANCE_FAILED": "maintenance_failed"
}
The key is the state name and the value is the internal method that you implement in your workflow to handle that state. When the method returns, the class variable ‘self.state’ is checked to know the next method to be called. So your state-related method should change ‘self.state’ to what you want to do next. The method should also implement the calling of any action plug-ins and other state-related functionality, like sending notifications.
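As a minimal sketch (not the actual Fenix code), a workflow plug-in could look like the following. The method ‘projects_listen_alarm’ is assumed to be provided by BaseWorkflow, and the notification and waiting logic is elided:

from fenix.workflow.workflow import BaseWorkflow


class Workflow(BaseWorkflow):

    def maintenance(self):
        # Initial state: check that every project subscribed to the
        # 'maintenance.planned' event alarm (method name assumed).
        if not self.projects_listen_alarm('maintenance.planned'):
            self.state = 'MAINTENANCE_FAILED'
            return
        # ...send 'maintenance.planned' notifications, wait for replies
        # and run 'pre' type action plug-ins here...
        self.state = 'START_MAINTENANCE'

    def start_maintenance(self):
        # ...send 'maintenance.host' notifications and run 'host' and
        # 'compute' type action plug-ins for each empty compute host...
        self.state = 'PLANNED_MAINTENANCE'

    def maintenance_failed(self):
        # Stay idle; the admin fixes and continues the session, or deletes it.
        pass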
States¶
Here is what is supposed to be done in the different states when utilizing the default workflow.
MAINTENANCE¶
This is the initial state right after infrastructure admin has created the maintenance session.
Here one should check if all projects are subscribed to the AODH event alarm for event type ‘maintenance.planned’. If a project supports this, one can assume we can have interaction with that project’s manager (VNFM). If not, we should have some default handling for project instances during the rolling maintenance, or we should decide to go to state ‘MAINTENANCE_FAILED’ as we do not support that kind of project. From here onwards, we assume projects support this interaction, so we can better define the remaining states.
Next, we send a ‘maintenance.planned’ notification with state ‘MAINTENANCE’ to each project. We wait up to ‘self.conf.project_maintenance_reply’ for the replies, or fail if some project did not reply. After all projects are in state ‘ACK_MAINTENANCE’ we can wait until the time is ‘self.session.maintenance_at’ and then start the actual maintenance.
When it is time to start, we might call the ‘pre’ type action plug-ins to take actions needed before rolling forwards host by host. This might include downloading the needed software changes and already making some actions on controllers, in case of a maintenance operation like an OpenStack upgrade.
If currently all the compute capacity is in use and we want to have an empty compute host that we can maintain first, we should set ‘self.state’ to ‘SCALE_IN’ to scale down the application. If there is capacity, but no empty host (assuming we want to make maintenance only to an empty host), we can set ‘self.state’ to ‘PREPARE_MAINTENANCE’ to move instances around to have an empty host if possible. In case we had an empty host, we can set ‘self.state’ straight to ‘START_MAINTENANCE’ to start maintenance on that host.
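The decision could be sketched as follows; the inputs ‘all_capacity_in_use’ and ‘have_empty_host’ are illustrative, not part of the Fenix API:

def next_state(all_capacity_in_use, have_empty_host):
    # Pick the next workflow state as described above.
    if all_capacity_in_use:
        return 'SCALE_IN'             # scale down to free compute capacity
    elif not have_empty_host:
        return 'PREPARE_MAINTENANCE'  # migrate instances to free up one host
    else:
        return 'START_MAINTENANCE'    # an empty host can be maintained now

assert next_state(False, True) == 'START_MAINTENANCE'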
SCALE_IN¶
We send a ‘maintenance.planned’ notification with state ‘SCALE_IN’ to each project. We wait up to ‘self.conf.project_scale_in_reply’ for the replies, or fail if some project did not reply. After all projects are in the state ‘ACK_SCALE_IN’ we can repeat the same checks as in state ‘MAINTENANCE’ to decide whether ‘self.state’ should be ‘SCALE_IN’, ‘PREPARE_MAINTENANCE’ or ‘START_MAINTENANCE’. Again, on any error we always set ‘self.state’ to ‘MAINTENANCE_FAILED’.
PREPARE_MAINTENANCE¶
As we have some logic to figure out a host that we can make empty, we can send a ‘maintenance.planned’ notification with state ‘PREPARE_MAINTENANCE’ to each project having instances on that host. We wait up to ‘self.conf.project_maintenance_reply’ for the replies, or fail if some project did not reply. After all affected projects are in state ‘ACK_PREPARE_MAINTENANCE’ we can check the project- and instance-specific answer and perform the given action, such as ‘migrate’, to move instances away from the host. After the action is done we will send ‘maintenance.planned’ for each instance with the state ‘INSTANCE_ACTION_DONE’ and with the corresponding ‘instance_id’.
Next, we should be able to set ‘self.state’ to ‘START_MAINTENANCE’.
START_MAINTENANCE¶
In case no hosts are maintained yet, we can go through all empty compute hosts in the maintenance session:
We send ‘maintenance.host’ notification with state ‘IN_MAINTENANCE’ for each host before we start to maintain it. Then we run action plug-ins of type ‘host’ in the order they are defined to run. After we are ready with the maintenance actions we send ‘maintenance.host’ notification with state ‘MAINTENANCE_COMPLETE’.
When all empty computes are maintained we can set ‘self.state’ to ‘PLANNED_MAINTENANCE’.
In case all empty hosts were already maintained, we could pick an empty host that we have after ‘PLANNED_MAINTENANCE’ is run on some compute host:
We send ‘maintenance.host’ notification with state ‘IN_MAINTENANCE’ before we start to maintain the host. Then we run action plug-ins of type ‘host’ in the order they are defined to run. After we are ready with the maintenance actions we send ‘maintenance.host’ notification with state ‘MAINTENANCE_COMPLETE’.
When all empty computes are maintained we can set ‘self.state’ to ‘PLANNED_MAINTENANCE’, or if all compute hosts are maintained we can set ‘self.state’ to ‘MAINTENANCE_COMPLETE’.
PLANNED_MAINTENANCE¶
We find a host that has not been maintained yet and contains instances. After choosing the host, we can send a ‘maintenance.planned’ notification with state ‘PLANNED_MAINTENANCE’ to each project having instances on the host. After all affected projects are in state ‘ACK_PLANNED_MAINTENANCE’ we can check the project- and instance-specific answer and perform the given action, such as ‘migrate’, to move instances away from the host. After the action is done we will send ‘maintenance.planned’ with the state ‘INSTANCE_ACTION_DONE’ and with the ‘instance_id’ of the instance whose action was completed. It might also be that the project manager already made its own action to re-instantiate, so we do not have to do any action.
When the project manager receives ‘PLANNED_MAINTENANCE’ it also knows that instances will now be moved to the already maintained host. With the payload, there will also go ‘metadata’ that can indicate new capabilities the project is getting when instances are moving. It might be for example:
“metadata”: {“openstack_version”: “Queens”}
It might be convenient to make the application (VNF) upgrade now, at the same time when instances are anyhow moved to a new compute host with new capabilities.
Next, when all instances are moved and the host is empty, we can set ‘self.state’ to ‘START_MAINTENANCE’.
MAINTENANCE_COMPLETE¶
Now all instances have been moved to already maintained compute hosts and all compute hosts are maintained. Next, we might run ‘post’ type action plug-ins to finalize the maintenance.
When this is done we can send a ‘maintenance.planned’ notification with state ‘MAINTENANCE_COMPLETE’ to each project. In case projects scaled down at the beginning of the maintenance, they can now scale back to full operation. After all projects are in state ‘ACK_MAINTENANCE_COMPLETE’ we can change ‘self.state’ to ‘MAINTENANCE_DONE’.
MAINTENANCE_DONE¶
This makes the maintenance session idle until the infrastructure admin deletes it.
MAINTENANCE_FAILED¶
This makes the maintenance session idle until the infrastructure admin fixes and continues the session, or deletes it.
Future¶
Currently, the infrastructure admin needs to poll the Fenix API to know the session state. When the notification with event type ‘maintenance.session’ gets implemented, the infrastructure admin will receive a notification whenever the state changes.
Fenix Advanced Workflow¶
The example advanced workflow is implemented in ‘fenix/workflow/workflows/vnf.py’. This workflow utilizes the ETSI defined instance and instance group constraints. Later, there needs to be a workflow also supporting NUMA and CPU pinning. That will be very similar, but it will need more specific placement decisions, which means scaling has to target exact instances and moving operations have to be calculated so that the same pinning is obeyed.
Workflow states are similar to the ‘default’ workflow, but there are some differences, addressed below.
The major difference is that the VNFM is supposed to update the VNF instance and instance group constraints dynamically, so they always match the current VNF state. The constraints can be seen in the API documentation, as those APIs are used to update the constraints to the Fenix DB. The constraints help the Fenix workflow to optimize itself as far as possible, since it knows how many instances can be affected at a time, together with all the other constraints that make sure there is zero impact to the VNF service.
States¶
MAINTENANCE¶
The difference to the default workflow here is that, by the time the maintenance is called and we enter this first state, all affected VNFs need to have their instance and instance group constraints updated to Fenix. A perfect VNFM side implementation should always make sure that changes in the VNF are reflected here.
SCALE_IN¶
As Fenix is now aware of all the constraints, it can optimize many things. One is to scale in exact instances: as we know ‘max_impacted_members’ for each instance group, we can optimize how much we scale down to have an optimal amount of empty compute nodes, while still keeping an optimal amount of instances within ‘max_impacted_members’. Another thing is when using NUMA and CPU pinning. We definitely need to dictate the scaled-down instances, as we need exact NUMA nodes and CPUs free to be able to have an empty compute host. Also, when making the move operations on pinned instances, we then know they will always succeed. A special need might also arise in an edge cloud system, where there are very few compute hosts available.
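As a purely illustrative example (all numbers hypothetical), the constraints bound the scale-in decision roughly like this:

# Illustrative arithmetic only; not the actual Fenix workflow logic.
group_size = 10             # instances currently in the anti-affinity group
max_impacted_members = 2    # at most this many instances affected at once
max_instances_per_host = 1  # anti-affinity: one instance per host

# Scaling in by at most 'max_impacted_members' stays within what the
# constraint already allows, while freeing hosts for parallel maintenance.
scale_in = max_impacted_members
freed_hosts = scale_in // max_instances_per_host  # 2 hosts can be emptied
serving = group_size - scale_in                   # 8 instances keep serving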
After the Fenix workflow has made its math, it may suggest the instances to be scaled in. If the VNFM rejects this, a retry can let the VNFM decide itself how it scales down, although the result might not be optimal.
VNFM needs to update instance and instance group constraints after scaling.
PREPARE_MAINTENANCE¶
After the state ‘SCALE_IN’ the empty compute capacity can be scattered. Now the workflow needs to calculate how to get empty compute nodes in the best possible way. As we have all the constraints, we can do operations in parallel for different compute nodes, VNFs and their instances in different instance groups.
Compared to the default workflow, the ‘maintenance.planned’ notification is always for a single instance only.
START_MAINTENANCE¶
The biggest enhancement here is that hosts can be handled in parallel when feasible.
PLANNED_MAINTENANCE¶
As we have all the constraints, we can do operations in parallel for different compute nodes, VNFs and their instances in different instance groups.
Compared to the default workflow, the ‘maintenance.planned’ notification is always for a single instance only.
MAINTENANCE_COMPLETE¶
This is the same as in the default workflow, but the VNFM needs to update the instance and instance group constraints after scaling.
MAINTENANCE_DONE¶
This makes the maintenance session idle until the infrastructure admin deletes it.
MAINTENANCE_FAILED¶
This makes the maintenance session idle until the infrastructure admin fixes and continues the session, or deletes it.
Notifications¶
Similarly to other OpenStack services, Fenix emits notifications to the message bus with the Notifier class provided by oslo.messaging [1]. From the notification consumer point of view, a notification consists of two parts: an envelope with a fixed structure defined by oslo.messaging and a payload defined by the service emitting the notification. The envelope format is the following:
{
"priority": "string, selected from a predefined list by the sender.
Fenix currently uses 'info'.",
"event_type": "string, defined by the sender. Event types are defined
later in this docuemnt.",
"timestamp": "string, the isotime of when the notification emitted",
"publisher_id": "string, defined by the sender. Fenix uses 'fenix'",
"message_id": "uuid, generated by oslo.",
"payload": "json serialized dict, defined by the sender. This is
defined for each event type later in this document."
}
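For illustration, an infrastructure service could consume these notifications with oslo.messaging roughly as below; the transport URL and the ‘notifications’ topic are deployment-specific assumptions:

import oslo_messaging
from oslo_config import cfg


class FenixEndpoint(object):
    # Only handle the Fenix event types described in this document.
    filter_rule = oslo_messaging.NotificationFilter(
        event_type='^maintenance\\.(host|session|planned)$')

    def info(self, ctxt, publisher_id, event_type, payload, metadata):
        print(event_type, payload)
        return oslo_messaging.NotificationResult.HANDLED


transport = oslo_messaging.get_notification_transport(
    cfg.CONF, url='rabbit://guest:guest@controller:5672/')  # assumed URL
targets = [oslo_messaging.Target(topic='notifications')]
listener = oslo_messaging.get_notification_listener(
    transport, targets, [FenixEndpoint()], executor='threading')
listener.start()
listener.wait()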
Admin¶
These notifications are meant for admin-level users: the infrastructure admin who is in charge of triggering the rolling maintenance and upgrade workflows, and any infrastructure service needing to know about a host being in maintenance. The latter might be used for enabling/disabling self-healing or billing.
Event type ‘maintenance.host’¶
This event type is meant for infrastructure services to know that a host might be down and taken out of normal usage. Also, after the host is back or a new host is added, there is another message to tell that the host is back in use or added. This might be meaningful for self-healing or billing.
payload¶
Name | Type | Description |
---|---|---|
service | string | Origin service name: Fenix |
state | string | Maintenance state. Values can be ‘IN_MAINTENANCE’ or ‘MAINTENANCE_COMPLETE’. In the future this might also have values like ‘HOST_ADDED’ or ‘HOST_REMOVED’. |
session_id | string | UUID of the related maintenance session |
host | string | Host name |
project_id | string | workflow admin project ID |
Example:
{
"service": "fenix",
"state": "IN_MAINTENANCE",
"session_id": "76e55df8-1c51-11e8-9928-0242ac110002",
"host": "overcloud-novacompute-0.opnfvlf.org",
"project_id": "ead0dbcaf3564cbbb04842e3e54960e3"
}
Event type ‘maintenance.session’¶
This event type is meant for the infrastructure admin to know about changes in the ongoing maintenance workflow session. This can be used instead of polling the API. Via the API you will get more detailed information if you need to troubleshoot.
payload¶
Name | Type | Description |
---|---|---|
service | string | Origin service name: Fenix |
state | string | Maintenance workflow state (States explained in the user guide) |
session_id | string | UUID of the related maintenance session |
percent_done | string | How many percent of hosts are maintained |
project_id | string | workflow admin project ID |
Example:
{
"service": "fenix",
"state": "IN_MAINTENANCE",
"session_id": "76e55df8-1c51-11e8-9928-0242ac110002",
"percent_done": 34,
"project_id": "ead0dbcaf3564cbbb04842e3e54960e3"
}
Project¶
These notifications are meant for a project level user to know about a maintenance session affecting its instances.
The project/application manager (VNFM) can have a ‘POST /maintenance’ API to catch the notification through an AODH event alarm [2].
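For illustration, the subscription could be made with python-aodhclient roughly as follows; the alarm name, the VNFM callback URL and the Keystone session are assumptions:

from aodhclient import client as aodh_client

# 'keystone_session' is an assumed, previously created keystoneauth session.
aodh = aodh_client.Client('2', session=keystone_session)
aodh.alarm.create({
    'name': 'maintenance_planned',                            # illustrative
    'type': 'event',
    'event_rule': {'event_type': 'maintenance.planned'},
    'alarm_actions': ['http://vnfm-host:12348/maintenance'],  # assumed VNFM URL
})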
Event type ‘maintenance.planned’¶
This event type is meant for a project-level user to know about a maintenance session state affecting its instances. Based on this event, the project manager (VNFM) knows to take actions on its instances affected by the maintenance and to reply back to Fenix.
payload¶
Name | Type | Description |
---|---|---|
service | string | Origin service name: fenix |
allowed_actions | list | A list of allowed actions for an instance. Allowed values are: ‘MIGRATE’, ‘LIVE_MIGRATE’ and ‘OWN_ACTION’. ‘OWN_ACTION’ means an action the project manager can do itself. Usually this could be re-instantiation, even with a new flavor. Other actions are done by Fenix, as they need admin privileges. In Kubernetes, ‘EVICTION’ is also supported. There the admin will delete the instance, and VNF automation like a ReplicaSet will make a new instance. Valid for states: ‘SCALE_IN’, ‘PREPARE_MAINTENANCE’ and ‘PLANNED_MAINTENANCE’. |
instance_ids | string | Link to the Fenix maintenance session and project specific API to get the instance IDs related to the current maintenance workflow ‘state’. A special case is the ‘state’ ‘INSTANCE_ACTION_DONE’, where the value is a single instance_id only. When using the Telco workflow with ETSI defined constraints, the value is also just a single instance_id in the ‘state’ ‘PREPARE_MAINTENANCE’ and ‘PLANNED_MAINTENANCE’. |
reply_url | string | Link to the Fenix maintenance session and project specific API to send the reply corresponding to this notification. When using the Telco workflow with ETSI defined constraints, the reply URL is instance specific in the ‘state’ ‘PREPARE_MAINTENANCE’ and ‘PLANNED_MAINTENANCE’. |
state | string | Maintenance workflow state this notification relates to. States used in this document are ‘MAINTENANCE’, ‘SCALE_IN’, ‘PREPARE_MAINTENANCE’, ‘PLANNED_MAINTENANCE’, ‘MAINTENANCE_COMPLETE’ and ‘INSTANCE_ACTION_DONE’. |
session_id | string | UUID of the related maintenance session |
reply_at | string | time by when the reply to Fenix is needed |
actions_at | string | time when Fenix triggers its actions |
project_id | string | workflow admin project ID |
metadata | dictionary | Can give hints, like new capabilities coming as a result of the ‘state’ ‘PLANNED_MAINTENANCE’, when instances will be moving to an already maintained host. Knowing these capabilities, the project manager can plan its own upgrade at the same time or later. This is handy even for re-instantiating instances with a new flavor to take a new type of hardware into use. |
Example of notification for many instances:
{
"service": "fenix",
"allowed_actions": ["MIGRATE", "LIVE_MIGRATE", "OWN_ACTION"],
"instance_ids": "http://0.0.0.0:12347/v1/maintenance/76e55df8-1c51-11e8-9928-0242ac110002/ead0dbcaf3564cbbb04842e3e54960e3",
"reply_url": "http://0.0.0.0:12347/v1/maintenance/76e55df8-1c51-11e8-9928-0242ac110002/ead0dbcaf3564cbbb04842e3e54960e3",
"state": "MAINTENANCE",
"session_id": "76e55df8-1c51-11e8-9928-0242ac110002",
"reply_at": "2018-02-28T06:40:16",
"actions_at": "2018-02-29T00:00:00",
"project_id": "ead0dbcaf3564cbbb04842e3e54960e3",
"metadata": {"openstack_release": "Queens"}
}
Example of notification for single instance. Note the instance specific ‘reply_url’:
{
"service": "fenix",
"allowed_actions": ["MIGRATE", "LIVE_MIGRATE", "OWN_ACTION"],
"instance_ids": ["28d226f3-8d06-444f-a3f1-c586d2e7cb39"],
"reply_url": "http://0.0.0.0:12347/v1/maintenance/76e55df8-1c51-11e8-9928-0242ac110002/ead0dbcaf3564cbbb04842e3e54960e3/28d226f3-8d06-444f-a3f1-c586d2e7cb39",
"state": "PREPARE_MAINTENANCE",
"session_id": "76e55df8-1c51-11e8-9928-0242ac110002",
"reply_at": "2018-02-28T06:40:16",
"actions_at": "2018-02-29T00:00:00",
"project_id": "ead0dbcaf3564cbbb04842e3e54960e3",
"metadata": {"openstack_release": "Queens"}
}
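As an illustration, the VNFM could answer the single-instance notification above via its ‘reply_url’ roughly as follows; ‘auth_token’ is an assumed, previously obtained Keystone token, and the body follows the instance-specific reply format specified later in this document:

import requests

reply_url = ("http://0.0.0.0:12347/v1/maintenance/"
             "76e55df8-1c51-11e8-9928-0242ac110002/"
             "ead0dbcaf3564cbbb04842e3e54960e3/"
             "28d226f3-8d06-444f-a3f1-c586d2e7cb39")
reply = {"instance_action": "MIGRATE",
         "state": "ACK_PREPARE_MAINTENANCE"}
# Reply as fast as possible; Fenix waits only until 'reply_at'.
requests.put(reply_url, json=reply,
             headers={"X-Auth-Token": auth_token},
             timeout=10)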
Example of notification for single instance in Kubernetes. Note the instance specific ‘reply_url’ and allowed actions for Kubernetes:
{
"service": "fenix",
"allowed_actions": ["OWN_ACTION", "EVICTION"],
"instance_ids": ["28d226f3-8d06-444f-a3f1-c586d2e7cb39"],
"reply_url": "http://0.0.0.0:12347/v1/maintenance/76e55df8-1c51-11e8-9928-0242ac110002/ead0dbcaf3564cbbb04842e3e54960e3/28d226f3-8d06-444f-a3f1-c586d2e7cb39",
"state": "PREPARE_MAINTENANCE",
"session_id": "76e55df8-1c51-11e8-9928-0242ac110002",
"reply_at": "2018-02-28T06:40:16",
"actions_at": "2018-02-29T00:00:00",
"project_id": "ead0dbcaf3564cbbb04842e3e54960e3",
"metadata": {"openstack_release": "Queens"}
}
[1] http://docs.openstack.org/developer/oslo.messaging/notifier.html
[2] https://docs.openstack.org/aodh/latest/admin/telemetry-alarms.html#event-based-alarm
Administrators guide¶
Fenix¶
Fenix should be deployed to the infrastructure controller or manager node and be used for any cloud infrastructure maintenance, upgrade, scaling or life-cycle operations.
VNF and VNFM¶
In the NFV use case, the VNF and VNFM need to support Fenix to optimize the workflows and to guarantee zero impact to VNF during different infrastructure operations. This means instance and instance group constraints and the interaction with Fenix. Operations are most optimal when the VNF can be scaled according to its current utilization level. This allows for the smallest possible maintenance window, with zero impact on the VNF service.
Infrastructure Admin UI¶
The infrastructure admin UI needs to support Fenix. The UI needs to be able to call the Fenix admin APIs and preferably to listen to Fenix admin events. The UI should be able to call different infrastructure maintenance and upgrade workflows with the needed parameters. The APIs and events also make it possible to get detailed information about the workflow progress and to troubleshoot possible errors. Even in complex clouds, errors can be simple and quickly corrected, even manually. In this kind of special case, the UI can also support updating the Fenix workflow session to continue exactly where it failed. This is explained in GET /v1/maintenance/{session_id}/detail.
Integration¶
The above-mentioned UI, VNF and VNFM are currently not in the scope of Fenix. The implementation should be in other open source projects or in one's own proprietary solutions. Currently, at least OpenStack Tacker is looking to support Fenix.
For testing the integration of Fenix, there is a tools directory including definitions of sample VNFs, a VNFM and an admin UI. There are also instructions on how to test the Fenix example workflows. These tools give an idea of what needs to be supported and how to integrate Fenix in a production environment. Note that the tools only provide what needs to be on the VNFM and VNF side for Fenix testing purposes. They do not try to have a standard implementation, like a VNFD, or other needed interactions on the VNF side. What is standard is the interfacing against Fenix.
References¶
References of Fenix.
Fenix specifications¶
ETSI NFVI software modification specification¶
https://storyboard.openstack.org/#!/story/2006557
Implement the needed interfacing between VNFM and Fenix that is specified in the ETSI FEAT03 related documentation. Limit current changes to instances and instance groups.
Problem description¶
This feature addresses the support for the coordination of the NFVI software modification process with the VNFs hosted on the NFVI in order to minimize impact on service availability.
Use Cases¶
Guarantee zero impact to the VNF service during Fenix infrastructure maintenance, upgrade and scaling workflow operations. This implies that the VNF and VNFM support the ETSI specification and the Fenix interaction.
Proposed change¶
Implement APIs to set VNF specific instance and instance group variables.
The new APIs are for changing the VNF project instance and instance group data in the Fenix database. These constraints might be set in the VNFD, or the VNF element manager can change them at any time according to the current VNF load level. Having the constraints gives the ability to optimize the infrastructure maintenance operation, as we can scale down the VNFs as much as possible and therefore maintain as many compute nodes in parallel as possible. An instance group can be the instances belonging to a certain anti-affinity group, but all instances need to be grouped, so we know how many of them are at least needed and how many of them can be exposed to maintenance at the same time. If nothing else, a group can mean the instances of a certain flavor.
Make an example workflow that supports the usage of these APIs. The workflow should implement one example rolling maintenance use case. The existing Fenix interaction towards the VNFM will be utilized with small changes.
The variables common to instance and instance group can be overridden in the instance object. Both objects can be updated at any time. An update is taken into account in any action that is not currently ongoing; an existing timer would not be updated. These objects alone are not enough to optimize the infrastructure workflow; the existing Fenix interaction is also needed to make the maintenance window as small as possible. This also allows upgrading the VNF with new infrastructure capabilities, with no additional impact on VNF service availability, if done at the same time as the infrastructure upgrade.
The diagram illustrates the existing Fenix workflow, where the application manager updates the instance and instance group constraints whenever instances are created or deleted. The constraints can also be updated at any time, if the level of VNF service allows a different amount of instances at that time.
Alternatives¶
N/A
Data model impact¶
Fenix database will need to have new tables to support instance and instance group objects.
REST API impact¶
All APIs return 200 OK. Error codes will be defined during the implementation.
The API PUT /v1/instance/{instance_id} is used to update the instance object. The API GET /v1/instance/{instance_id} is used to get the instance object. The PUT API should have this structure as input and the GET API returns the same structure:
{
"instance_id": "instance_UUId string",
"project_id": "Project UUID string",
"group_id": "group_UUID string",
"instance_name": "Name string",
"max_interruption_time": 120, # seconds
# How long live migration can take
"migration_type": "LIVE_MIGRATION",
# LIVE_MIGRATION, MIGRATION or OWN_ACTION
# Own action is create new and delete old instance.
# Note! VNF needs to obey resource_mitigation with own action.
# This affects the order of deleting the old and creating the new instance,
# to not overcommit the resources.
"resource_mitigation": "True", # True or False
# Current instance needs double allocation when being migrated.
# This is true also if instance first scaled out and only then the old
# instance is removed. It must be True also if VNF needed to scale
# down, since we go over that scaled down capacity.
"lead_time": 60 # seconds
# How long lead time VNF needs for 'migration_type' operation. VNF needs to
# report back to Fenix as soon as it is ready, but at least within this
# time. Reporting as fast as can is crucial for optimizing
# infrastructure upgrade/maintenance.
}
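For illustration only, a VNFM (or its proxy) could update the instance object like this; the endpoint, token and values are assumptions:

import requests

instance = {
    "instance_id": "28d226f3-8d06-444f-a3f1-c586d2e7cb39",
    "project_id": "ead0dbcaf3564cbbb04842e3e54960e3",
    "group_id": "a6807a0d-b26d-4b2c-8e0e-0f86d0a4fca2",  # hypothetical group
    "instance_name": "VNF-A-instance-1",                 # hypothetical name
    "max_interruption_time": 120,
    "migration_type": "LIVE_MIGRATION",
    "resource_mitigation": "True",
    "lead_time": 60,
}
# 'auth_token' is an assumed, previously obtained Keystone token.
requests.put("http://0.0.0.0:12347/v1/instance/%s" % instance["instance_id"],
             json=instance,
             headers={"X-Auth-Token": auth_token},
             timeout=10)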
The API DELETE /v1/instance/{instance_id} is used to delete the instance object.
The API PUT /v1/instance_group/{group_id} is used to update the instance group object:
{
"group_id": "group_UUID string",
"project_id": "Project UUID string",
"group_name": "Name string",
"anti_affinity_group": "True", # True or False
"max_instances_per_host": 2, # 1..N
# Describes how many instances can be on the same host with
# anti_affinity_group: True
# Already exists in OpenStack as 'max_server_per_host', but might not
# exist in different clouds.
"max_impacted_members": 2, # 1..N
# Maximum amount of instances that can be impacted
# Note! This can be dynamic to VNF load
"recovery_time": 10, # seconds
# Instances impacted by a previous action still count against
# max_impacted_members until the recovery time has passed.
# Note! This applies regardless of anti_affinity.
"resource_mitigation": "True", # True or False
# Instances in the group need double allocation when affected.
# This is true for migrations, but also if an instance is first scaled out
# and only then the old instance removed.
# It must be True also if VNF needed to scale down, since we go over
# that scaled down capacity.
}
The API GET /v1/instance_group/{group_id} is used to get the instance group object. Compared to PUT, this structure also has the instance_ids:
{
"group_id": "group_UUID string",
"project_id": "Project UUID string",
"group_name": "Name string",
"anti_affinity_group": "True", # True or False
"max_instances_per_host": 2, # 1..N
# Describes how many instances can be on the same host with
# anti_affinity_group: True
# Already exists in OpenStack as 'max_server_per_host', but might not
# exist in different clouds.
"max_impacted_members": 2, # 1..N
# Maximum amount of instances that can be impacted
# Note! This can be dynamic to VNF load
"recovery_time": 10, # seconds
# max_impacted_members needs to take into account counting previous
# action members before the recovery time passes
# Note! regardless anti_affinity
"resource_mitigation": "True", # True or False
# Instances in group needs double allocation when affected.
# This is true in migrations, but also if instance first scaled out and
# only then the old instance removed.
# It must be True also if VNF needed to scale down, since we go over
# that scaled down capacity.
"instance_ids": [] # List of instances belonging to this group
}
The API DELETE /v1/instance_group/{group_id} is used to delete the instance group object.
A new API is needed for the project instance-specific reply. This API will now be used to reply to ‘state’ ‘PREPARE_MAINTENANCE’ and ‘PLANNED_MAINTENANCE’ notifications, which will be instance specific:

PUT /v1/maintenance/<session_id>/<project_id>/<instance_id>:
{
"instance_action": "MIGRATE",
"state": "ACK_PLANNED_MAINTENANCE"
}
Notifications impact¶
The event type ‘maintenance.planned’ notification will need changes.

A new ‘state’ value ‘INSTANCE_ACTION_FALLBACK’ should be added to tell that live migration was not possible and Fenix will force the migration to complete. After that, the normal ‘INSTANCE_ACTION_DONE’ or ‘INSTANCE_ACTION_FAILED’ will be expected.

‘instance_ids’ is currently limited to either a single instance_id or a link to get all affected instances. Now this should always be a single instance, except in the ‘state’ values ‘MAINTENANCE’ and ‘SCALE_IN’. ‘MAINTENANCE’ should always have the link to the Fenix API to get all instances that may be affected during the maintenance session. ‘SCALE_IN’ can mention only one exact instance, as it may be needed to allow another pinned instance to have a target host with the needed resources. This can happen in a small edge deployment. An empty string indicates the VNF can decide how it scales down. The workflow may then need to send several ‘SCALE_IN’ notifications to finally have enough unused resources to execute the workflow further. ‘state’ having the value ‘MAINTENANCE_COMPLETE’ should have an empty string as the ‘instance_ids’ value. In this ‘state’ the VNF should scale back to the instances it had in the beginning of the maintenance session.
Other end user impact¶
The VNFD and EM need to support defining and updating the instance and instance group variables.
Other deployer impact¶
The VNFM needs to proxy the updating of instance and instance group variables.
Implementation¶
Assignee(s)¶
- Primary assignee:
- Tomi Juvonen <tomi.juvonen@nokia.com>
Work Items¶
- APIs to set instance and instance group objects
- Example workflow
- Testing
- Documentation changes
Dependencies¶
There can be enhancements to other projects later on. However, the initially needed functionality can be handled completely inside Fenix.
Testing¶
There is a huge amount of combinations of VNF deployments, and the used variables can be changed during the operations. Fenix will support all these variables and their changes. A Fenix workflow is always an example and limits what it can support and is tested against. The main thing to test is that all variables and their changes are supported and validated. The testing of VNF deployments might be limited to the example use case supported by the example workflow.
Documentation Impact¶
Fenix documentation needs to be updated after the implementation is ready.