The OCI Observability and Management (O&M) platform consists of several cloud services that you can enable to monitor, analyze, and manage applications and infrastructure environments with full-stack visibility, prebuilt analytics, and automation capabilities.
In this blog, I will create an alarm that is triggered when a service metric reaches a designated threshold, and use OCI Notifications to deliver the alert.
Steps:
1. Confirm the Compute Instance Monitoring Plug-in Is Enabled
Navigate to Main Menu -> Compute -> Instances and select your instance.
Click on the Oracle Cloud Agent tab.
The Compute Instance Monitoring plug-in should be enabled and running.
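The same check can be scripted once the plug-in list has been fetched by whatever means you prefer. The following is a minimal sketch, assuming each plug-in is represented as a dict with `name` and `status` keys and that a healthy plug-in reports `RUNNING`. The dict shape, the helper name `monitoring_plugin_running`, and the sample data are hypothetical and chosen only for illustration; they are not the response format of any OCI API.

```python
# Name of the plug-in as it appears on the Oracle Cloud Agent tab.
PLUGIN_NAME = "Compute Instance Monitoring"


def monitoring_plugin_running(plugins):
    """Return True if the monitoring plug-in is listed and its status is RUNNING."""
    for plugin in plugins:
        if plugin.get("name") == PLUGIN_NAME:
            return plugin.get("status") == "RUNNING"
    # The plug-in is not in the list at all.
    return False


# Hypothetical sample data (assumption, not real API output).
sample_plugins = [
    {"name": "Vulnerability Scanning", "status": "STOPPED"},
    {"name": "Compute Instance Monitoring", "status": "RUNNING"},
]

print(monitoring_plugin_running(sample_plugins))
```

If the function returns False, the oci_computeagent metrics used later in this blog will most likely not be emitted for that instance.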
2. Create a Topic and a Subscription for Notifications
Before creating an alarm, I need to set up a notification so that the alarm has a way to notify the relevant parties.
OCI Notifications has two main elements: a topic (a communication channel for sending messages) and a subscription (an endpoint that receives those messages).
Navigate to Main Menu -> Developer Services -> Application Integration, and select Notifications.
Click Create Topic, enter a topic name (e.g. CPU stress-topic) and a description, and click Create.
After the topic state changes to Active, click the topic name.
Now click Create Subscription and provide the protocol and its details.
Click Create and you will see your subscription OCID.
The subscription details screen is displayed with the subscription status Pending.
I should then receive an email containing a Confirm subscription verification link, like below.
Verify the subscription by clicking Confirm subscription; the subscription status then changes to Active in the OCI console.
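To make the topic and subscription relationship concrete, here is a small toy model in Python. It is only a sketch of the lifecycle described above (a subscription starts as pending and receives messages only after it is confirmed) and does not call OCI. The class names, the `publish` method, and the address `ops@example.com` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    protocol: str
    endpoint: str
    state: str = "PENDING"  # stays PENDING until the recipient confirms

    def confirm(self):
        """Simulate clicking the Confirm subscription link."""
        self.state = "ACTIVE"


@dataclass
class Topic:
    name: str
    description: str = ""
    subscriptions: list = field(default_factory=list)

    def subscribe(self, protocol, endpoint):
        """Attach a new (pending) subscription to this topic."""
        sub = Subscription(protocol, endpoint)
        self.subscriptions.append(sub)
        return sub

    def publish(self, message):
        """Deliver the message only to confirmed (ACTIVE) subscriptions."""
        return [(s.endpoint, message) for s in self.subscriptions if s.state == "ACTIVE"]


topic = Topic("CPU stress-topic", "Alerts for high CPU")
sub = topic.subscribe("EMAIL", "ops@example.com")  # hypothetical address

before = topic.publish("test alert")  # nothing delivered while pending
sub.confirm()
after = topic.publish("test alert")   # delivered once the subscription is active

print(before, after)
```

This is why the confirmation email matters: until the subscription is active, an alarm that publishes to the topic has nobody to reach.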
3. Create an Alarm for CPU Utilization
Navigate to Main Menu -> Observability & Management -> Monitoring and click Alarm Definitions.
Click Create Alarm.
a. Define the alarm.
b. Define the metric description: compartment, metric namespace (oci_computeagent), metric name, interval, and statistic, like below.
I leave the metric dimensions blank because I have only one compute instance; if you have several instances, you can use dimensions to filter the metric down to specific ones.
c. Define the trigger rule: choose the operator and value that fit your requirement (this walkthrough uses >= 70).
d. Define alarm notifications: select the topic created in step 2.
e. Optionally, select the message format and repeat the notification while the alarm keeps firing, or suppress notifications.
f. Select Enable this alarm and save it.
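The choices made in steps a to f boil down to a query string plus a handful of settings. The sketch below assembles them as plain Python data so the pieces are easy to see. It rests on several assumptions: the query follows the Monitoring Query Language pattern `metric[interval]{dimensions}.statistic() operator value`, the metric name is `CpuUtilization`, the interval is `1m`, the statistic is `mean`, and the dict keys mirror the field names I understand the OCI SDK to use for alarm creation (verify them against the SDK reference before relying on them). The helper names, the alarm name, the severity, and all OCIDs are hypothetical placeholders.

```python
def build_query(metric, interval, statistic, operator, threshold, dimensions=None):
    """Build an MQL-style alarm query, with an optional dimension filter."""
    dim_filter = ""
    if dimensions:
        pairs = ", ".join(f'{key} = "{value}"' for key, value in dimensions.items())
        dim_filter = "{" + pairs + "}"
    return f"{metric}[{interval}]{dim_filter}.{statistic}() {operator} {threshold}"


def build_alarm(display_name, compartment_id, topic_id, query):
    """Collect the alarm settings from steps a to f in one dict (assumed field names)."""
    return {
        "display_name": display_name,             # step a
        "compartment_id": compartment_id,
        "metric_compartment_id": compartment_id,  # step b
        "namespace": "oci_computeagent",          # step b
        "query": query,                           # steps b and c
        "severity": "CRITICAL",                   # assumed value
        "destinations": [topic_id],               # step d
        "is_enabled": True,                       # step f
    }


# Dimensions left blank, as in this walkthrough.
query_all = build_query("CpuUtilization", "1m", "mean", ">=", 70)

# Filtering to one instance via a dimension (placeholder OCID).
query_one = build_query(
    "CpuUtilization", "1m", "mean", ">=", 70,
    dimensions={"resourceId": "ocid1.instance.oc1..example"},
)

alarm = build_alarm(
    "cpu-utilization-alarm",           # hypothetical alarm name
    "ocid1.compartment.oc1..example",  # placeholder compartment OCID
    "ocid1.onstopic.oc1..example",     # placeholder topic OCID from step 2
    query_all,
)

print(query_all)
print(query_one)
```

Keeping the query builder separate from the alarm settings makes it simple to reuse the same alarm template for different instances by changing only the dimension filter.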
You should now be able to see the alarm's details when you click on the alarm, like below.
When CPU utilization reaches the threshold (>= 70 as per the alarm setup), we get an email notification like below.
The email provides details such as the alarm OCID, the number of metrics breaching the threshold, and the dimensions.
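To illustrate what "breaching the threshold" means for the trigger rule, here is a small sketch that applies the >= 70 rule to a series of data points. It is a simplified local model, not how the Monitoring service evaluates alarms internally; the function name `breaching_points` and the sample values are hypothetical.

```python
import operator

# Comparison operators a trigger rule might use.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def breaching_points(datapoints, op=">=", threshold=70):
    """Return the (timestamp, value) pairs that satisfy the trigger rule."""
    compare = OPERATORS[op]
    return [(ts, value) for ts, value in datapoints if compare(value, threshold)]


# Hypothetical one-minute mean CPU utilization values (percent).
samples = [("10:00", 35.2), ("10:01", 68.9), ("10:02", 70.0), ("10:03", 91.4)]

hits = breaching_points(samples)
state = "FIRING" if hits else "OK"

print(state, hits)
```

Note that a value of exactly 70 counts as a breach because the rule uses >= rather than >.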
Stay Tuned for my next O&M Blog!