Azure AI Search (previously Azure Cognitive Search) is a powerful search engine that provides full-text search, among other features.
In this blog post we'll go through:
Setting up Azure AI search and related resources with terraform
In the next blog post we'll go through:
Extracting static pages and processing them
Uploading the static pages to a storage account
Indexing the static pages in Azure AI search
Searching the pages
Getting started
Create a git repository to work from:
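A minimal sketch, assuming a repository named azure-ai-search (the name is just a placeholder):

mkdir azure-ai-search
cd azure-ai-search
git init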
Now we need to set up the Terraform providers. Create a file called providers.tf:
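A sketch of what providers.tf could look like; the version constraints here are assumptions, so pin whatever versions suit you:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    restapi = {
      source  = "Mastercard/restapi"
      version = "~> 1.18"
    }
  }
}

provider "azurerm" {
  # no provider-level options needed for this example
  features {}
}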
The AzureRM provider will be used to create most of the resources. The Mastercard/restapi provider is used to configure Azure AI Search, as there is currently no built-in provider that can manage Azure AI Search resources (indexes, data sources, indexers) after the service itself is created (see this issue from 2021).
After creating the providers file, let's initialise Terraform:
terraform init
You should see output that looks like:
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
The resources
First we'll create a resource group:
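For example (the resource group name and location are placeholders):

resource "azurerm_resource_group" "search" {
  name     = "rg-search-blog"
  location = "westeurope"
}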
A storage account that we will use to store the data in part 2:
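A sketch with blob soft delete enabled, since the data source we create later detects deletions through it; the account name must be globally unique, so treat it as a placeholder:

resource "azurerm_storage_account" "search" {
  name                     = "stsearchblog" # placeholder, must be globally unique
  resource_group_name      = azurerm_resource_group.search.name
  location                 = azurerm_resource_group.search.location
  account_tier             = "Standard"
  account_replication_type = "LRS"

  blob_properties {
    # native blob soft delete, used later by the index deletion detection policy
    delete_retention_policy {
      days = 7
    }
  }
}

# container that the static pages will be uploaded to in part 2
resource "azurerm_storage_container" "pages" {
  name                  = "pages"
  storage_account_name  = azurerm_storage_account.search.name
  container_access_type = "private"
}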
Then a search service:
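A sketch of the search service; the sku values are assumptions, and the system-assigned identity plus the role assignment are what let the service read blobs without credentials later on:

resource "azurerm_search_service" "search" {
  name                = "srch-search-blog" # placeholder, must be globally unique
  resource_group_name = azurerm_resource_group.search.name
  location            = azurerm_resource_group.search.location
  sku                 = "basic"
  semantic_search_sku = "free" # enables the semantic search configuration used below

  identity {
    type = "SystemAssigned"
  }
}

# grant the search service read access to the blobs via Azure RBAC
resource "azurerm_role_assignment" "search_blob_reader" {
  scope                = azurerm_storage_account.search.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_search_service.search.identity[0].principal_id
}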
Now that we've created the search service, we can configure the restapi provider that we'll need in a minute. Update the providers.tf file, adding this to the bottom:
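A sketch of that provider block; it points at the search service endpoint and authenticates with the admin API key (api-key is the header the Azure AI Search REST API expects):

provider "restapi" {
  uri                  = "https://${azurerm_search_service.search.name}.search.windows.net"
  write_returns_object = true

  headers = {
    "api-key"      = azurerm_search_service.search.primary_key
    "Content-Type" = "application/json"
  }
}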
To store the data we'll need an index. We'll pull in a few fields that are auto-generated by the indexer (the index definition itself follows this list):
metadata_storage_name - name of the file
metadata_storage_path - full storage path including the storage account name
metadata_storage_content_md5 - used to see if the file has changed for indexing
content - the text content that has been extracted from the source file
title - a metadata attribute that will be set on the file in part 2
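Here's a sketch of the index as a restapi_object; the index name, the choice of metadata_storage_path as the key, and the api-version are all assumptions, so adjust them to your setup:

resource "restapi_object" "index" {
  path         = "/indexes"
  query_string = "api-version=2023-11-01"
  id_attribute = "name"

  data = jsonencode({
    name = "pages"
    fields = [
      # metadata_storage_path doubles as the document key here (base64-encoded by the indexer)
      { name = "metadata_storage_path", type = "Edm.String", key = true, searchable = false },
      { name = "metadata_storage_name", type = "Edm.String", searchable = false },
      { name = "metadata_storage_content_md5", type = "Edm.String", searchable = false },
      { name = "content", type = "Edm.String", searchable = true },
      { name = "title", type = "Edm.String", searchable = true }
    ]
    semantic = {
      configurations = [
        {
          name = "default"
          prioritizedFields = {
            titleField               = { fieldName = "title" }
            prioritizedContentFields = [{ fieldName = "content" }]
          }
        }
      ]
    }
  })
}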
We've configured semantic search, which provides a secondary ranking of the results by using language understanding to re-evaluate the result set.
Now that we've defined an index, we need a data source that an indexer can pull from:
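A sketch of the data source; the connection string uses the storage account's resource ID rather than an access key, which tells the service to authenticate with its managed identity:

resource "restapi_object" "datasource" {
  path         = "/datasources"
  query_string = "api-version=2023-11-01"
  id_attribute = "name"

  data = jsonencode({
    name = "pages"
    type = "azureblob"
    credentials = {
      # resource ID instead of an access key: the managed identity is used
      connectionString = "ResourceId=${azurerm_storage_account.search.id};"
    }
    container = {
      name = azurerm_storage_container.pages.name
    }
    # remove documents from the index when their blobs are soft deleted
    dataDeletionDetectionPolicy = {
      "@odata.type" = "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
    }
  })
}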
Notice that there's a data deletion policy based on native blob soft delete, which we configured earlier in the storage account's retention settings. If files are deleted from the container, they will be deleted from the index as well.
Also see that no credentials are required to access the storage account; that's because the search service's system-assigned managed identity accesses it using Azure RBAC with the 'Storage Blob Data Reader' role.
Finally, we'll create an indexer that extracts content from the files in the storage account:
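A sketch of the indexer; the field mapping base64-encodes metadata_storage_path so it is valid as a document key, which follows from the key choice assumed in the index sketch above:

resource "restapi_object" "indexer" {
  path         = "/indexers"
  query_string = "api-version=2023-11-01"
  id_attribute = "name"

  data = jsonencode({
    name            = "pages"
    dataSourceName  = restapi_object.datasource.id # resolves to the data source name
    targetIndexName = restapi_object.index.id      # resolves to the index name
    fieldMappings = [
      {
        sourceFieldName = "metadata_storage_path"
        targetFieldName = "metadata_storage_path"
        mappingFunction = { name = "base64Encode" }
      }
    ]
    # note: no schedule block, the indexer is triggered via the API in part 2
  })
}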
You will see that there's no schedule configured. That's because this example is for a static site: in part 2 we'll trigger the indexer via the API as part of the static site deployment, so there's no need to schedule it.
You should be able to run the indexer now and get a successful run; it will just import 0 documents.
Click the Run button in the top left, and if everything has worked you should see a success:
For a complete reference of all the code used in this blog, please see: