SaaS Availability Management

In another example, on December 20, 2005, Salesforce.com (the on-demand customer relationship management service) said it had suffered a system outage that prevented users from accessing the system during business hours. Users “experienced intermittent access” from 9:30 a.m. to 12:41 p.m. Eastern time and from 2:00 p.m. to 4:45 p.m. Eastern time because of a database cluster error in one of the company’s four global network nodes, company officials said in a statement the day after the outage. The statement added that “Salesforce.com addressed the issue with the database vendor” so that service could be restored in the afternoon.

Factors Impacting Availability

Cloud service resiliency and availability depend on several factors, including the CSP’s data center architecture (load balancers, networks, systems), application architecture, hosting location redundancy, diversity of Internet service providers (ISPs), and data storage architecture. The major factors are:

• SaaS and PaaS application architecture and redundancy.
• Cloud service data center architecture, and network and systems architecture, including geographically diverse and fault-tolerant architecture.
• Reliability and redundancy of Internet connectivity used by the customer and the CSP.
• Customer’s ability to respond quickly and fall back on internal applications and other processes, including manual procedures.
• Customer’s visibility of the fault. In some downtime events, if the impact affects only a small subset of users, it may be difficult to get a full picture of the impact, which makes the situation harder to troubleshoot.
• Reliability of hardware and software components used in delivering the cloud service.
• Efficacy of the security and network infrastructure to withstand a distributed denial of service (DDoS) attack on the cloud service.
• Efficacy of security controls and processes that reduce human error and protect infrastructure from malicious internal and external threats, e.g., privileged users abusing privileges.
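To see why redundancy figures so prominently in the factors above, it can help to work through the standard availability arithmetic. The following is a minimal sketch, not taken from any provider's documentation: it assumes components fail independently, and the function names and sample figures are illustrative only.

```python
import math
from typing import Sequence


def series_availability(availabilities: Sequence[float]) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures; each value is a fraction such as 0.999.
    """
    return math.prod(availabilities)


def parallel_availability(availabilities: Sequence[float]) -> float:
    """Availability of redundant components where any one being up is enough.

    Assumes independent failures, which real ISPs or data centers that
    share infrastructure may not satisfy.
    """
    return 1.0 - math.prod(1.0 - a for a in availabilities)


# Illustrative figures only: two ISP links at 99% each, feeding a service
# whose own availability is 99.9%.
redundant_links = parallel_availability([0.99, 0.99])
end_to_end = series_availability([redundant_links, 0.999])
```

Under these assumptions, two independent 99% links behave like a single 99.99% link, while every additional component placed in series lowers the end-to-end figure.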

SaaS Availability Management

By virtue of the service delivery and business model, SaaS service providers are responsible for business continuity, application, and infrastructure security management processes. This means the tasks your IT organization once handled will now be handled by the CSP. Some mature organizations that are aligned with industry standards, such as ITIL, will be faced with new challenges of governance of SaaS services as they try to map internal service-level categories to a CSP. For example, if a marketing application is considered critical and has a high service-level requirement, how can the IT or business unit meet the internal marketing department’s availability expectation based on the SaaS provider’s SLA?

In some cases, SaaS vendors may not offer SLAs and may simply address service terms via terms and conditions. For example, Salesforce.com does not offer a standardized SLA that describes and specifies performance criteria and service commitments. However, another CRM SaaS provider, NetSuite, offers the following SLA clauses:

Uptime Goal—NetSuite commits to provide 99.5% uptime with respect to the NetSuite application, excluding regularly scheduled maintenance times.

Scheduled and Unscheduled Maintenance—Regularly scheduled maintenance time does not count as downtime. Maintenance time is regularly scheduled if it is communicated at least two full business days in advance of the maintenance time. Regularly scheduled maintenance time typically is communicated at least a week in advance, scheduled to occur at night on the weekend, and takes less than 10–15 hours each quarter.

NetSuite hereby provides notice that every Saturday night 10:00pm–10:20pm Pacific Time is reserved for routine scheduled maintenance for use as needed.
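An uptime percentage is easier to judge once it is converted into a downtime allowance. The sketch below does that conversion. The 30-day measurement period is an assumption made for illustration, since the clauses quoted above do not state one, and the function name is hypothetical.

```python
def downtime_budget_minutes(uptime_pct: float, period_hours: float) -> float:
    """Minutes of unscheduled downtime permitted at a given uptime percentage.

    Scheduled maintenance is assumed to be excluded already, as the
    quoted clauses say it does not count as downtime.
    """
    if not 0.0 <= uptime_pct <= 100.0:
        raise ValueError("uptime_pct must be between 0 and 100")
    return period_hours * 60.0 * (1.0 - uptime_pct / 100.0)


# Assumed measurement period: a 30-day month (720 hours).
HOURS_IN_30_DAYS = 30 * 24

budget_995 = downtime_budget_minutes(99.5, HOURS_IN_30_DAYS)  # about 216 minutes
budget_999 = downtime_budget_minutes(99.9, HOURS_IN_30_DAYS)  # about 43.2 minutes
```

Over a 30-day month, a 99.5% commitment therefore leaves room for roughly 3.6 hours of unscheduled downtime, compared with about 43 minutes at 99.9%.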

Here is another SLA example:

During the Term of the applicable Google Apps Agreement, the Google Apps Covered Services web interface will be operational and available to Customer at least 99.9% of the time in any calendar month (the “Google Apps SLA”). If Google does not meet the Google Apps SLA, and if Customer meets its obligations under this Google Apps SLA, Customer will be eligible to receive the Service Credits described below. This Google Apps SLA states Customer’s sole and exclusive remedy for any failure by Google to provide the Service.

| Monthly Uptime Percentage | Days of Service added to the end of the Service term, at no charge to Customer |
|---|---|
| < 99.9% – ≥ 99.0% | 3 |
| < 99.0% – ≥ 95.0% | 7 |
| < 95.0% | 15 |

Customer Must Request Service Credit. In order to receive any of the Service Credits described above, Customer must notify Google within thirty days from the time Customer becomes eligible to receive a Service Credit. Failure to comply with this requirement will forfeit Customer’s right to receive a Service Credit.

Maximum Service Credit. The aggregate maximum number of Service Credits to be issued by Google to Customer for any and all Downtime Periods that occur in a single calendar month shall not exceed fifteen days of Service added to the end of Customer’s term for the Service. Service Credits may not be exchanged for, or converted to, monetary amounts.

Google Apps SLA Exclusions. The Google Apps SLA does not apply to any services that expressly exclude this Google Apps SLA (as stated in the documentation for such services) or any performance issues: (i) caused by factors outside of Google’s reasonable control; or (ii) that resulted from Customer’s equipment or third party equipment, or both (not within the primary control of Google).
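The credit tiers and the thirty-day notification window quoted above can be expressed as a small lookup, which a customer might use to check eligibility after an outage. This is a sketch of the quoted terms only. The function names are hypothetical, and how downtime is measured and which exclusions apply are left to the provider's own definitions.

```python
from datetime import date, timedelta


def service_credit_days(monthly_uptime_pct: float) -> int:
    """Days of service credit for a month, per the tiers quoted above."""
    if monthly_uptime_pct >= 99.9:
        return 0  # SLA met, no credit
    if monthly_uptime_pct >= 99.0:
        return 3
    if monthly_uptime_pct >= 95.0:
        return 7
    return 15  # also the stated monthly maximum


def credit_request_deadline(eligible_on: date) -> date:
    """Last day to notify the provider, taking 'within thirty days' literally.

    Assumption: the window is counted in calendar days from the date the
    customer becomes eligible.
    """
    return eligible_on + timedelta(days=30)
```

For example, a month measured at 99.5% uptime falls in the first tier and earns 3 days, provided the request is made before the deadline.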

There is no such thing as a standard SLA among cloud service providers. Uptime guarantees, service credits, and service exclusion clauses vary from provider to provider.

Customer Responsibility

Customers should understand the SLA and communication methods (e.g., email, RSS feed, website URL with outage information) to stay informed on service outages. When possible, customers should use automated tools such as Nagios or Siteuptime.com to verify the availability of the SaaS service.
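As a rough illustration of what such automated verification involves, the following sketch polls a URL and summarizes the results. It is not how Nagios or Siteuptime.com work internally. It uses only the Python standard library, and the function names and the URL in the usage comment are placeholders.

```python
import urllib.error
import urllib.request
from typing import Sequence


def probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections,
        # and timeouts.
        return False


def observed_availability_pct(samples: Sequence[bool]) -> float:
    """Percentage of probes that succeeded."""
    if not samples:
        raise ValueError("at least one probe result is required")
    return 100.0 * sum(samples) / len(samples)


# Usage (placeholder URL; run from a scheduler at a fixed interval):
#   results.append(probe("https://saas.example.com/login"))
#   print(observed_availability_pct(results))
```

A probe from a single location measures reachability from that location only, so the result reflects the customer's own Internet connectivity as well as the provider's service.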

As of this writing, customers of a SaaS service have a limited number of options to support availability management. Hence, customers should seek to understand the availability management factors, including the SLA of the service, and clarify with the CSP any gaps in SLA exclusions and service credits when disruptions occur. In a recently published white paper by the U.S.-based Software & Information Industry Association (SIIA), the efficacy of SaaS SLAs was analyzed in the context of software vendors moving to a SaaS delivery model. The paper concluded that certain elements are necessary to make the SLA an effective document, and states that:

Communication and clear expectations are required from both the service provider and their customers to identify what is important and realistic with respect to standards and expectations.

Customers of cloud services should note that a multitenant service delivery model is usually designed with a “one size fits all” operating principle, which means CSPs typically offer a standard SLA for all customers. Thus, CSPs may not be amenable to providing custom SLAs if the standard SLA does not meet your service-level requirements. However, if you are a medium or large enterprise with a sizable budget, a custom SLA may still be feasible.

Since most SaaS providers use virtualization technologies to deliver a multitenant service, customers should also understand how resource democratization occurs within the CSP to best predict the likelihood of system availability and performance during business fluctuations. If the resources (network, CPU, memory, storage) are not allocated in a fair manner across the tenants to perform the workload, it is conceivable that a highly demanding tenant may starve other tenants, which can result in lower service levels or poor user experience.
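One common way to define a fair allocation is max-min fairness, in which small demands are met in full and the remaining capacity is split evenly among the tenants that want more. The sketch below illustrates that policy. It is not a description of how any particular CSP allocates resources, and the tenant names and capacity units are made up.

```python
from typing import Dict


def max_min_fair_share(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Allocate capacity across tenants using max-min fairness."""
    allocation: Dict[str, float] = {}
    remaining = capacity
    # Serve tenants from the smallest demand upward.
    pending = sorted(demands.items(), key=lambda item: item[1])
    while pending:
        equal_share = remaining / len(pending)
        tenant, demand = pending[0]
        if demand <= equal_share:
            # Demand fits within an equal share: satisfy it fully.
            allocation[tenant] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining tenant wants more than an equal share,
            # so each is capped at that share.
            for name, _ in pending:
                allocation[name] = equal_share
            break
    return allocation


# Hypothetical tenants competing for 100 units of CPU.
shares = max_min_fair_share(100, {"tenant_a": 10, "tenant_b": 30, "tenant_c": 200})
```

In this example the demanding tenant is capped at 60 units while the two smaller tenants receive everything they asked for, which is the opposite of the starvation scenario described above.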
