How API Operational Intelligence solves critical business challenges
Nowadays, businesses have a golden opportunity to use API call metadata to gain valuable insights that, if acted upon, would allow them to be proactive in responding to customer experience issues, threats and strategic growth.
In today's Digital Transformation (DX) era, APIs have become one of the core building blocks powering our highly connected world. Whether you're shopping online, chatting with friends on WhatsApp or using Google Maps to find your way to a new place, you're interacting with several different businesses through hundreds of API calls made behind the scenes.
APIs have become a strategic investment for any business going through digital transformation because they allow business data and functionality to be accessed from any device, anytime, anywhere. That, in turn, opens up a whole new, data-driven revenue stream for businesses.
Many businesses going through digital transformation have successfully exposed their data (Business Information Assets) as APIs to customers, employees, business partners and IoT devices. These businesses can now mine API call metadata for insights that, if acted upon, let them respond proactively to customer experience issues, threats and strategic growth opportunities.
Gathering and producing intelligence from API calls is called API Operational Intelligence (API OI): continuous, real-time analytics that deliver visibility into, and insight about, business operations in the digital age.
In this article, I will walk you through how easily you can stand up an API Operational Intelligence capability. If designed and implemented properly, such a capability puts timely, actionable intelligence at the fingertips of decision-makers and enables them to make better-informed critical decisions about customer experience issues, threats to the organisation and strategic growth.
Intelligence gathering in The Digital Age
Without a comprehensive understanding of who is doing what, when and from where, organisations lack the ability to be proactive in responding to customer experience issues, threats and strategic growth. In modern intelligence gathering, who is doing what, when and from where is called Raw Intelligence. Actionable intelligence is information that can be acted upon; in API Operational Intelligence it takes the form of dashboards and alerts.
Raw Intelligence collected about and from API consumers (customers, employees, partners and IoT devices) provides valuable insights into their usage patterns and helps businesses custom-build products based on individual requirements. Using these insights, one-to-one (personalised) marketing becomes possible on a massive scale.
API call metadata (Raw Intelligence) is the data that describes an API call. It includes, but is not limited to, the following:
- Who is calling the API (subject ID)
- Which API is being called (API resource ID and operation)
- When the API is called (date and time)
- From where the API is being called (geolocation based on IP address, GPS coordinates, or indoor location using Wi-Fi or BLE beacons)
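To make the who/what/when/where idea concrete, here is a minimal Python sketch of one metadata record. The class name and field names (`ApiCallMetadata`, `subject_id`, `resource_id`, `indoor_zone`, etc.) are illustrative assumptions, not a Splunk or API Gateway schema. Note that the record deliberately has no place for request or response bodies.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ApiCallMetadata:
    """One Raw Intelligence record: who called what, when and from where.

    Field names are illustrative assumptions, not a Splunk or gateway schema.
    There is deliberately no field for request or response bodies.
    """

    subject_id: str                     # who: authenticated caller (customer, employee, partner, device)
    resource_id: str                    # what: API resource being called
    operation: str                      # what: operation/HTTP method on that resource
    timestamp: datetime                 # when: time the call was received (timezone-aware)
    source_ip: str                      # where: basis for IP-based geolocation
    latitude: Optional[float] = None    # where: optional GPS fix (requires user consent)
    longitude: Optional[float] = None
    indoor_zone: Optional[str] = None   # where: optional Wi-Fi/BLE indoor location

    def to_json(self) -> str:
        """Serialise to JSON, normalising time to UTC and omitting uncollected fields."""
        record = asdict(self)
        record["timestamp"] = self.timestamp.astimezone(timezone.utc).isoformat()
        return json.dumps({k: v for k, v in record.items() if v is not None}, sort_keys=True)


if __name__ == "__main__":
    call = ApiCallMetadata(
        subject_id="cust-1042",
        resource_id="/bookings",
        operation="POST",
        timestamp=datetime(2021, 1, 15, 9, 30, tzinfo=timezone.utc),
        source_ip="203.0.113.7",
    )
    print(call.to_json())
```

Leaving the optional location fields as `None` (and dropping them from the JSON) keeps the more invasive GPS and indoor data out of the record unless the user has consented to it.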
It is better to refrain from collecting the actual request and response payloads of the API: from a privacy standpoint this is considered invasive, and it would expose the business to new threats and ongoing risks.
Many modern intelligence-gathering methods are highly invasive by nature. The focus of this article is to stand up an API Operational Intelligence capability that delivers deep and valuable insights without being invasive. Invasive intelligence-gathering techniques are usually used in corporate counterintelligence to protect highly sensitive business information assets (e.g. intellectual property, trade secrets, business processes, strategic goals). They are outside the scope of this article, but I will cover them in a separate article in the future. If you would like to stay up to date with the latest articles published on my blog, you can subscribe here.
API Operational Intelligence is the natural evolution of solutions like SIEM, log management and monitoring. As businesses advance in their digital transformation journey, API OI could partially or entirely replace these solutions. The diagram below outlines the typical way of gathering and producing intelligence in the digital age.
In an age of increasing connectivity and data-collection technologies, businesses need to find the right balance between gaining valuable insights about individuals (customers or employees) and respecting those individuals' privacy. It is in the best interest of a business going through digital transformation to find innovative ways of striking that balance, as the consequences of a single data leak could threaten its very survival. For example, Home Depot spent $43 million in one quarter managing the consequences of a single data leak. The money went on investigations, identity theft protection services for consumers, increased call center staffing, and other legal and professional services.
The Business Scenario
Let's take a business scenario, walk through its pressing operational needs and see how an API Operational Intelligence capability can satisfy them.
Earth2 Golden View Beach Resort (a fictitious name) is one of the best diving and snorkeling spots on earth. A hotel, villas, a parking lot, bars, restaurants, two pools, a small casino, a kids' play park, entertainment shows, a fitness center, a spa, a beauty salon, a shopping arcade and an on-site diving center are all there to ensure a relaxing getaway.
In terms of digital capabilities, the resort has:
- A Mobile App, Website and Kiosks (on-site and in downtown shopping centres) for customers to check in, make bookings, order food, shop, etc.
- A Mobile App for employees to manage customer interactions within the resort (bookings, hotel, customer checks, alerts, etc.). It provides far more functionality than the Kiosk.
- A channel for hotel booking partners to access and book rooms and activities within the resort.
- Park Assist (a camera-based Parking Guidance System): an IoT solution installed in the parking lot, with one IoT device (camera and processor) on every parking spot. There are also some security cameras scattered around the complex.
The resort management team faces many challenges every day and struggles to cope with them because they don't have enough insights to make informed decisions quickly. They cannot be proactive in facing these challenges or give customer-facing employees insights to guide their more complex decisions. Some of the insights that would help are:
- What is the distribution of digital interactions with customers across channels (Mobile, Website and Kiosk)?
- How many incidents were reported across the compound, and how many are outstanding with no one looking at them?
- What is the experience like across the channels? Are customers having difficulty logging in, making a booking or shopping online?
- What is the performance of our APIs servicing the digital channels?
- How many customers in the loyalty program are taking benefits (free drinks, free rooms, etc.) on a daily basis, and what is the associated cost?
- When do we expect bottlenecks in the car park on busy days?
One of their challenges is that most of the Earth2 Resort team move around the resort a lot rather than being office-based. Even office-based staff move around for meetings in places scattered across the resort. It is very much a mobile-workforce use case.
You can think of the resort as a small town: customers are its residents, resort management is the city council, employees are the council's staff, and IoT devices are the cameras and sensors all over town. Business partners are the businesses that offer services through the city council.
Solution: Splunk-based API Operational Intelligence Capability
Standing up an API OI capability requires:
- API call metadata (Raw Intelligence) collectors placed at the resort's API Gateway. A collector is just a script embedded in the API Gateway's security policy that defines which metadata to collect while the API call is in progress. The script should be non-blocking, meaning it shouldn't hold up the processing of the API call. This script usually takes around 6 milliseconds to complete, adding about 6 milliseconds to the overall time taken for the API call to complete and return a response to the calling App (Mobile, Kiosk, Website, IoT), which is insignificant.
- An OI platform that can produce actionable intelligence and distribute it to the resort's decision-makers as dashboards and alerts. The platform should also expose APIs that provide actionable intelligence to machines so they can make accurate decisions immediately.
- A secure Mobile App to receive alerts and consume dashboards.
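The non-blocking requirement for the collector can be illustrated with a small Python sketch. Real collectors are written in the gateway's own policy language, so this is only an analogy under stated assumptions: `NonBlockingCollector` and the injected `send` callable are hypothetical names, and delivery happens on a background thread pool so the API call's own processing is never held up by the OI platform.

```python
import json
import logging
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

log = logging.getLogger("api_oi_collector")


def _log_failure(future: Future) -> None:
    # Delivery errors are logged, never propagated to the API call.
    exc = future.exception()
    if exc is not None:
        log.warning("API OI delivery failed: %s", exc)


class NonBlockingCollector:
    """Fire-and-forget dispatch of API call metadata (hypothetical helper).

    `send` delivers one JSON payload, e.g. an HTTP POST to a Splunk HTTP
    Event Collector. It is injected so the pattern is transport-agnostic.
    """

    def __init__(self, send: Callable[[str], None], max_workers: int = 4) -> None:
        self._send = send
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def collect(self, metadata: dict) -> None:
        """Queue the metadata for delivery and return immediately."""
        payload = json.dumps(metadata)
        self._pool.submit(self._send, payload).add_done_callback(_log_failure)

    def close(self) -> None:
        """Wait for queued deliveries to finish, e.g. on gateway shutdown."""
        self._pool.shutdown(wait=True)
```

Because failures are only logged, an outage of the OI platform reduces visibility but does not fail customer-facing API calls; the trade-off is that some metadata may be lost.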
There are several products on the market that can achieve some or all of what we need here. Based on my experience in this domain, I found that Splunk stands out from the rest.
Splunk is one of the top software products in the Operational Intelligence domain. Some businesses perceive it as extremely expensive; usually, businesses with that perception lack the right expertise to guide them on their journey toward API Operational Intelligence.
Splunk becomes expensive if you use it as a logging platform, ingesting an endless stream of logs from everywhere so that at some later point (who knows when) you might get something out of it. Organisations also often do this to comply with a requirement from the IT Security team.
If you take that approach, regardless of which software product you choose, the solution will end up really expensive and, in most cases, totally useless, and the business will miss many opportunities to be proactive in responding to digital experience issues, threats and strategic growth. On top of that come the risks and new threats associated with collecting massive amounts of data.
My approach is different from the usual approach described above. The focus here is to stand up an API Operational Intelligence capability, not a logging capability. We will collect only the raw metadata that has the potential to be used to produce actionable intelligence. It is also good practice to have a business process in place to retire data periodically after a set amount of time, mitigating the privacy risks that result from accumulating data.
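Collecting only what you need can be enforced with an allowlist at the point of collection, and retention can be expressed as a simple age check. The sketch below assumes a hypothetical `ALLOWED_FIELDS` set and a 90-day `RETENTION` period chosen purely for illustration. In Splunk itself, retention is normally configured per index (on Splunk Enterprise through the `frozenTimePeriodInSecs` setting in `indexes.conf`; Splunk Cloud exposes an equivalent retention setting on the index) rather than in application code.

```python
from datetime import datetime, timedelta
from typing import Any, Dict

# Hypothetical allowlist: the only metadata the OI capability may keep.
ALLOWED_FIELDS = frozenset(
    {"subject_id", "resource_id", "operation", "timestamp", "source_ip", "app_id", "status"}
)

# Illustrative retention period; choose one your privacy policy supports.
RETENTION = timedelta(days=90)


def minimise(raw_context: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted metadata; bodies, headers and tokens are dropped."""
    return {key: value for key, value in raw_context.items() if key in ALLOWED_FIELDS}


def is_expired(event_time: datetime, now: datetime, retention: timedelta = RETENTION) -> bool:
    """True if an event is older than the retention period and should be retired."""
    return now - event_time > retention


if __name__ == "__main__":
    context = {
        "subject_id": "cust-1042",
        "resource_id": "/bookings",
        "operation": "POST",
        "request_body": '{"card": "..."}',   # never leaves the gateway
        "authorization": "Bearer ...",       # never leaves the gateway
    }
    print(minimise(context))
```

An allowlist (rather than a blocklist) fails safe: a new, possibly sensitive field added to the request context later is dropped by default until someone deliberately decides to collect it.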
By collecting only what we need, retiring data periodically and using Splunk's latest cloud offering, we can dramatically cut both the cost and the delivery timeline.
One of the main challenges in standing up API OI as a capability is working out which dashboards and alerts to define, as this requires a lot of experience and knowledge in the intelligence-gathering domain. In this section, I will walk you through some example dashboards and alerts and the situations in which you can use them to make better-informed decisions.
API Operational Intelligence Dashboards (Consumed By Humans)
- Max TPS (Transactions per Second): This dashboard shows the maximum TPS that occurred within a timeframe (e.g. the last 24 hours or the last 4 hours). It helps decision-makers when API traffic suddenly increases and immediate action must be taken to bring the load down. In an environment where capacity management is done properly, a sudden load spike could be caused by an attack on the APIs. An alert can be configured on this dashboard to fire when TPS reaches a critical level so that further action can be taken.
- Top 10 Peak Times (15-minute time span): This dashboard splits the chosen timeframe into 15-minute chunks, counts the API calls in each chunk and then picks the 10 chunks with the most calls. This lets decision-makers find the busiest 15 minutes of the day, week, etc., so that changes to the APIs can be scheduled outside the busiest hours.
- API Consumption by Front-End Apps: This dashboard shows a pie chart with a slice for every front-end App consuming the API, so you can see how heavily each front-end App uses the APIs. This can help you decide where to invest in front-end Apps. For example, you might find that the Website App is used far less than the Mobile App and decide to allocate all new funding to the Mobile App (a Mobile First strategy).
- Max TPS per Day (last 7 days): This dashboard shows a pie chart with a slice for each of the last 7 days, telling you at a glance which day of the week is busiest. This can help you decide when to schedule API changes and establish a pattern of consumer usage across the week.
- Top 10 APIs: For a customer account management microservice, this dashboard shows the 10 most-used APIs within the microservice. This helps you decide which APIs to focus on in terms of performance and capacity: the higher the number in the count column, the more attention the API needs.
- How many API calls succeeded/failed: This dashboard shows a pie chart with a slice for success and for each failure reason. It helps you assess how the API is performing: the larger the success slice, the less you need to worry. The same dashboard can also break success/failure down by front-end App, letting you trace errors to specific Apps. If an App is producing many errors, you can cut off its access immediately or talk to the App's product owner, depending on the criticality and number of errors.
- API Requests by Location (map): This dashboard shows a world map of where API requests originate. Hovering over the green circles on the map shows the number of requests from each location. Location can be determined from the source IP address, source GPS coordinates (collected by front-end Apps) or indoor location determined by BLE beacons or Wi-Fi hotspots. Note that GPS coordinates and indoor locations are far more accurate than IP addresses. However, collecting users' GPS coordinates or indoor location is invasive and requires user consent and a whole process for managing access to that data. Knowing where users access the APIs from can inform decisions on threat protection, strategic growth and customer experience.
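The logic behind several of these dashboards is simple counting over the collected metadata. In Splunk you would express it as searches (using commands such as `timechart`, `top` and `stats`); the Python sketch below shows the same arithmetic over an in-memory list, assuming a hypothetical `Call` record with `timestamp`, `app_id`, `api` and `outcome` fields. It is meant to clarify what each dashboard computes, not to replace Splunk.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Call(NamedTuple):
    """Minimal view of one collected API call (field names are assumptions)."""
    timestamp: datetime
    app_id: str    # front-end App, e.g. "mobile", "website", "kiosk"
    api: str       # API resource and operation, e.g. "GET /bookings"
    outcome: str   # "success" or a failure reason


def _per_second(calls: Iterable[Call]) -> Counter:
    """Number of calls in each one-second bucket."""
    return Counter(c.timestamp.replace(microsecond=0) for c in calls)


def max_tps(calls: Iterable[Call]) -> int:
    """Max TPS: the highest call count in any single second."""
    return max(_per_second(calls).values(), default=0)


def max_tps_per_day(calls: Iterable[Call]) -> Dict[str, int]:
    """Max TPS for each calendar day (ISO date string -> TPS)."""
    result: Dict[str, int] = {}
    for second, count in _per_second(calls).items():
        day = second.date().isoformat()
        result[day] = max(result.get(day, 0), count)
    return result


def top_peak_windows(calls: Iterable[Call], n: int = 10,
                     minutes: int = 15) -> List[Tuple[datetime, int]]:
    """Top-n busiest windows, keyed by window start.

    Assumes `minutes` divides 60 evenly (15 does).
    """
    def window_start(ts: datetime) -> datetime:
        floored = ts.replace(second=0, microsecond=0)
        return floored - timedelta(minutes=floored.minute % minutes)

    return Counter(window_start(c.timestamp) for c in calls).most_common(n)


def share_by(calls: Iterable[Call], field: str) -> Dict[str, float]:
    """Pie-chart slices: percentage of calls per value of `field`
    ("app_id" for consumption by App, "outcome" for success/failure)."""
    counts = Counter(getattr(c, field) for c in calls)
    total = sum(counts.values())
    return {value: 100.0 * count / total for value, count in counts.items()}


def top_apis(calls: Iterable[Call], n: int = 10) -> List[Tuple[str, int]]:
    """Top-n most-called APIs with their call counts."""
    return Counter(c.api for c in calls).most_common(n)


def outcomes_by_app(calls: Iterable[Call]) -> Dict[str, Counter]:
    """Success/failure counts per front-end App, to trace errors to Apps."""
    result: Dict[str, Counter] = {}
    for c in calls:
        result.setdefault(c.app_id, Counter())[c.outcome] += 1
    return result
```

Each function makes a single pass over its input, so the same list of calls can be passed to all of them; in Splunk the equivalent searches run over the index for the dashboard's selected timeframe.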
API Operational Intelligence Alerts
Defining alerts has proven invaluable for being proactive and taking action quickly when a critical event occurs. Splunk can send alerts to the Splunk Mobile App, which in turn can display them on a smartwatch paired with the app. Splunk can also easily be integrated with Slack to send alerts, keeping all of your communication in one place if your organisation relies heavily on Slack.
API Operational Intelligence APIs (Consumed by Machines)
APIs will always be vulnerable to attacks and unauthorised access. You could have dozens of incidents per year just dealing with these kinds of attacks and trying to block them, and sometimes you get false alarms too. What if a machine could take care of that for you and report back? Wouldn't that be super awesome!
Let's see how we can block attacks automatically and accurately without any human intervention.
- Front-end Apps (e.g. Mobile Apps) send API requests that pass through the WAF and the API Gateway before reaching the API itself. As a request passes through the WAF (Akamai, Imperva Incapsula, or others), its source IP is checked against an IP blacklist, and the request is rejected if the IP is on that list.
- As the request passes through the API Gateway (IBM DataPower, Apigee, Axway, AWS, or others), it is inspected to determine whether it is authorised. If all is good, the request is sent to the API on the backend server, and the backend's response is sent back to the calling App (e.g. the Mobile App). After the response is sent, and before the API call finishes processing, a script kicks in to collect the API call metadata (Raw Intelligence) and send it to API OI (Splunk) by calling the Splunk collector API.
- When an authentication error occurs at the gateway, the API Gateway calls the OI APIs (Splunk) with the source IP address to check whether there have been previous authentication errors from the same IP address. The rule here is: if the number of errors from the same IP is more than 3, block the request.
- If the API OI (Splunk) response shows more than 3 authentication errors from the same IP, the API Gateway invokes the WAF API (e.g. Akamai, Imperva Incapsula, or AWS) to add the IP address to the list of blocked IP addresses and sends an AttackDetectedAndBlocked event with the details to API OI (Splunk).
- In API OI (Splunk), an alert is configured to check whether an AttackDetectedAndBlocked event has occurred; if it has, the alert is sent to the Splunk Mobile App and a Slack channel (integrated with Splunk). The alert also appears on wearable devices (e.g. Apple Watch) paired with the phone that received it.
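The gateway-side decision in this flow can be sketched in Python as follows. The three callables (`count_recent_auth_errors`, `block_ip_at_waf`, `send_event`) are hypothetical stand-ins for the OI query API, the WAF vendor's block-list API and the Splunk collector; the threshold of more than 3 errors comes from the rule above. The time window over which errors are counted is not specified in the flow, so it is left to the OI query, which is also where you would tune against false positives.

```python
from typing import Callable

# From the rule above: block when an IP has MORE than 3 authentication errors.
AUTH_ERROR_THRESHOLD = 3


def on_authentication_error(
    source_ip: str,
    count_recent_auth_errors: Callable[[str], int],  # hypothetical OI (Splunk) query API
    block_ip_at_waf: Callable[[str], None],          # hypothetical WAF block-list API
    send_event: Callable[[dict], None],              # hypothetical Splunk collector call
) -> bool:
    """Gateway-side handling of an authentication error. Returns True if the IP was blocked."""
    # Ask API OI how many authentication errors this IP has produced so far.
    errors = count_recent_auth_errors(source_ip)
    if errors <= AUTH_ERROR_THRESHOLD:
        return False

    # Add the IP to the WAF block list so future requests are rejected at the edge.
    block_ip_at_waf(source_ip)

    # Report the action; a Splunk alert watching for this event notifies people.
    send_event({
        "event_type": "AttackDetectedAndBlocked",
        "source_ip": source_ip,
        "auth_errors": errors,
    })
    return True
```

Injecting the integrations as callables keeps the blocking rule testable on its own, without a live Splunk instance or WAF.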
The above is just one small example of the many opportunities that are available in a typical API Operational Intelligence platform. You need to be careful while defining patterns for blocking traffic to avoid false positives.
The 4-hour implementation
The steps below assume that Earth2 Resort IT already has an API Gateway capability stood up, with enough maturity for the OI policies to be embedded smoothly. I will be writing another article to walk you through establishing API Channels and an API Gateway as a capability to support digital initiatives; if you are interested, you can subscribe to my blog to get notified when I do.
The setup takes around 4 hours, if not less, and can be summarised in the following 11 steps:
1. Sign up for Splunk Cloud to carry out the configurations below.
2. Create an index and call it b2c_api_index.
3. Create a new JSON source type by cloning the _json source type and adjust its settings accordingly.
4. Create an HTTP Event Collector and get its API key (token).
5. Create a new App and call it "API Operational Intelligence".
6. On the API Gateway, create a non-blocking policy that is embedded inside the existing policies protecting the existing APIs. The policy calls the Splunk collector and adds the metadata for each API call using the API key created in step 4.
7. Install the Splunk Add-on for Mobile Access to allow the mobile app to connect to Splunk, consume dashboards and receive alerts.
8. Create dashboards using the index created in step 2 as the data source.
9. Create users to access the Splunk dashboards.
10. Adjust security settings for apps and dashboards accordingly.
11. Use the Splunk Mobile App, signing in with the credentials of one of the users created above.
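Once the index, source type and HTTP Event Collector are in place, a quick way to check that the token and index work before touching the gateway policy is to send one test event to the collector. The sketch below uses Python's standard library and Splunk's documented `/services/collector/event` endpoint with an `Authorization: Splunk <token>` header. The base URL, the `api_oi_json` source type name and the test event fields are assumptions: the collector's host and port differ between Splunk Cloud and Splunk Enterprise deployments, and the token must be allowed to write to `b2c_api_index`.

```python
import json
import urllib.request


def build_hec_request(base_url: str, token: str, event: dict,
                      index: str = "b2c_api_index",
                      sourcetype: str = "api_oi_json") -> urllib.request.Request:
    """Build a POST to the HTTP Event Collector's /services/collector/event endpoint.

    `base_url` (scheme, host, port) depends on your deployment;
    `api_oi_json` is a hypothetical name for the cloned JSON source type.
    """
    body = {"event": event, "index": index, "sourcetype": sourcetype}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Splunk {token}", "Content-Type": "application/json"},
        method="POST",
    )


def send_test_event(base_url: str, token: str) -> int:
    """Send one test event; returns the HTTP status (urlopen raises on 4xx/5xx)."""
    request = build_hec_request(base_url, token, {"subject_id": "setup-test", "resource_id": "/test"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

If `send_test_event` returns 200, searching `index=b2c_api_index` in Splunk should show the test event, which confirms the collector side before the gateway policy is wired up.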
Conclusion & Takeaways
Nowadays, businesses have a golden opportunity to use API call metadata to gain valuable insights that, if acted upon, would allow them to be proactive in responding to customer experience issues, threats and strategic growth.
Striking a balance between what you collect about individuals (customers or employees) and their privacy is key to success in this area. That balance can be difficult to strike, but it is not impossible if you have the right expertise to guide you through your journey.
Notice of Non-Affiliation and Disclaimer: The author of the article is not affiliated, associated, authorized, endorsed by, or in any way officially connected with any of the product vendors (Splunk, Axway, Akamai, Imperva InCapsula, Optus, IBM, Amazon AWS, WhatsApp, or Google) mentioned in this article, or any of its subsidiaries or its affiliates.
About the Author
Adam Ali 🦋
I'm an End-2-End Digital Solution Architect with 16+ years of experience in the design, development and integration of robust end-2-end solutions, with particular attention to security, high performance, scalability and high availability.