OCI FortiGate HA Cluster – Reference Architecture: Code Review and Fixes
Kosseila Hd
Jan 17, 2024 7:26:00 PM
OCI Quick Start repositories on GitHub are collections of Terraform scripts and configurations provided by Oracle. These repositories are designed to help organizations quickly deploy common infrastructure setups on the OCI platform. Each Quick Start focuses on a specific use case or workload, which simplifies provisioning on OCI with Terraform. Think of them as IaC-based reference architectures.
Today, we will code review one of those reference architectures: a Fortinet firewall solution deployed in OCI.
Note: This article won’t discuss the architecture, but will rather address the flaws in its Terraform code and how to fix them.
Certain Terraform errors may never surface in an OCI Resource Manager (RM) stack because of how RM works. For instance, RM lets you hardcode specific variables, like availability domains, directly in its interface. This sidesteps the need for these variables to be checked by native conditions in the Terraform code.
Moreover, RM reads these variables from the schema.yaml file, altering the behavior compared to local Terraform CLI execution. This approach can result in certain errors being handled or bypassed within the RM environment, creating a distinction from standard Terraform workflows.
The stack is a result of the collaboration of both Oracle and Fortinet. This architecture is based on a Hub and Spoke topology, using a FortiGate firewall from OCI Marketplace. I actually deployed it while working on one of my projects.
For details of the architecture, see Set up a hub-and-spoke network topology.
You will find this Terraform config in the main oci-fortinet GitHub repository, but not in the root directory.
The folder in question is drg-ha-use-case, under oracle-quickstart/oci-fortinet/use-cases/drg-ha-use-case
At the time of writing, the errors were still not fixed despite my opening issues and sharing the fixes. The last commit dates back two years. You will need to clone the repo and cd to the drg-ha-use-case subdirectory:
$ git clone https://github.com/oracle-quickstart/oci-fortinet.git
$ cd oci-fortinet/use-cases/drg-ha-use-case
$ terraform init
The first issue appears in any region with only one availability domain (e.g. ca-toronto-1): the availability domain lookup fails during the Terraform execution plan.
CAUSE: See issue #8
In the error above, Terraform complains that the availability domains data source has only one element.
This impacts 2 of the “oci_core_instance” resource blocks (2 web-vms, 2 db-vms).
compute.tf => line 235 & line 276
Problem?
$ vi data_source.tf
# ------ Get list of availability domains
data "oci_identity_availability_domains" "ADs" {
  compartment_id = var.tenancy_ocid
}
…
Reason:
In Terraform, count.index always starts at 0. If you have a resource with a count of 4, count.index takes the values 0, 1, 2, and 3.
Take, for example, the “web-vms” oci_core_instance block in compute.tf > line 235.
Suppose we evaluate its availability domain condition when:
– The variable availability_domain_name is empty
– The ads data source contains 1 element
The AD name is then read from the ads data source collection at index [0+1] = 1.
data…ads.availability_domains[1] doesn’t exist, because the collection contains only 1 element (index 0).
The Solution
Complete the availability domain conditional expression on line 235 and line 276 (web-vms/db-vms).
Add the case where the data source ads.availability_domains has 1 element (the region has one AD only), and use index 0 in that case.
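The completed condition can be sketched as follows. This is a minimal illustration rather than the repository's code: it uses plain variables and locals so it can be evaluated without OCI credentials, and ad_names, vm_count and vm_ads are hypothetical stand-ins (ad_names plays the role of the names returned by the availability domains data source).

```hcl
# Sketch only: hypothetical names, no OCI provider required.

variable "availability_domain_name" {
  description = "Optional explicit AD name; empty means pick one automatically"
  type        = string
  default     = ""
}

variable "ad_names" {
  description = "Stand-in for the AD names returned by the data source"
  type        = list(string)
  default     = ["AD-1"] # a single-AD region such as ca-toronto-1
}

locals {
  vm_count = 2

  # Per VM index: use the explicit name if set, else element 0 when the
  # region has a single AD, else fall through to the original index + 1 lookup.
  vm_ads = [
    for i in range(local.vm_count) :
    var.availability_domain_name != "" ? var.availability_domain_name : (
      length(var.ad_names) == 1 ? var.ad_names[0] : var.ad_names[i + 1]
    )
  ]
}

output "vm_ads" {
  value = local.vm_ads
}
```

With the defaults above, both VMs resolve to AD-1 instead of failing on a missing index. In the real resource block, the same three-way condition would go in the availability_domain argument, with count.index in place of i.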
Bad Logic
Looking up the availability domain at count.index+1 is still wrong when the region has more than 1 AD.
Example: say you want to create 3 VMs and your region has 2 availability domains.
The first iteration (count.index = 0) sets count.index+1 = 1 (2nd data source element = AD2).
The second iteration sets count.index+1 = 2 (3rd data source element = AD3, which doesn't exist).
The 2nd and 3rd iterations will always fail because there are only 2 ADs (valid indexes are [0,1]).
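One possible way around this, which is my own suggestion and not something in the repository or the opened issues, is to wrap the index around the list with the modulo operator. The sketch below again uses hypothetical stand-in names (ad_names, vm_count, vm_ads) so it can be evaluated without the OCI provider.

```hcl
# Sketch only: hypothetical names, no OCI provider required.

variable "ad_names" {
  description = "Stand-in for the AD names returned by the data source"
  type        = list(string)
  default     = ["AD-1", "AD-2"]
}

locals {
  vm_count = 3

  # i % length(...) cycles through 0, 1, 0, ... so the index can never
  # exceed the last valid position, whatever the VM count or AD count.
  vm_ads = [
    for i in range(local.vm_count) :
    var.ad_names[i % length(var.ad_names)]
  ]
}

output "vm_ads" {
  value = local.vm_ads
}
```

With 3 VMs and 2 ADs this spreads the instances as AD-1, AD-2, AD-1. It also covers the single-AD case, since any index modulo 1 is 0. In a resource block the equivalent index would be count.index % length(...) applied to the data source's availability_domains list.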
Another issue you will run into is a failure to deploy subnets due to data source collection being empty (no element).
CAUSE: See issue #9
In the error above, Terraform complains that the allow_all_security data source is empty.
This impacts all FortiGate subnet blocks in the config as they all share the same security lists.
network.tf => line 240 & more
Reason:
In this configuration, there are 2 compartments: one for compute and another for network resources.
If you take a look at the “allow_all_security” block in datasource.tf > lines 64 to 74,
you’ll notice that the security lists data source uses the wrong compartment ID (compute instead of network).
Solution
This was a silly mistake, but it took me a day to figure out while digging through a pile of unfamiliar Terraform files.
All you need to do is replace the compute compartment variable with var.network_compartment_ocid.
Edit datasource.tf lines 64-74:
# ------ Get the Allow All Security Lists for Subnets in Firewall VCN
data "oci_core_security_lists" "allow_all_security" {
  compartment_id = var.network_compartment_ocid # <--- CORRECT compartment
  vcn_id         = local.use_existing_network ? var.vcn_id : oci_core_vcn.hub.0.id
...
I wasn’t done debugging yet, as I found other misplaced compartment variables in some VNIC attachment data sources.
See datasource.tf, lines 103-115 and 118-130: there you need to replace the compartment variable with var.compute_compartment_ocid.
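For illustration, a corrected VNIC attachments lookup would look roughly like the sketch below. The data source label example_vnics and the instance_id variable are hypothetical placeholders, not the names used in the repository; only var.compute_compartment_ocid comes from the stack. It needs the OCI provider configured to run.

```hcl
# Sketch only: the label and the instance_id variable are hypothetical.

variable "compute_compartment_ocid" {
  description = "Compartment that holds the compute instances"
  type        = string
}

variable "instance_id" {
  description = "Placeholder for the OCID of the instance whose VNICs are listed"
  type        = string
}

data "oci_core_vnic_attachments" "example_vnics" {
  # A VNIC attachment lives in the instance's compartment, so the lookup
  # must target the compute compartment, not the network one.
  compartment_id = var.compute_compartment_ocid
  instance_id    = var.instance_id
}
```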
This type of undetected code issue is why I never trust a first deployment made in Resource Manager.
In order to avoid problems in the future, especially if you decide to migrate out of RM at some point, I suggest the following workflow:
Run locally and validate any code bug
Run on Resource Manager
Store to git repo (blueprint with eventual versioning)
I hope this was helpful, as the issues I opened in their GitHub repo have remained unresolved for over a year.