Who's a SOAsaurus?

Image of a dead dinosaur

I told you I was ill.

The phrase “Don’t be a SOAsaurus” is being bandied about on Twitter and the like, and it got me thinking about using that particular analogy to describe SOA Web Services practices and contrast them with the clever little RESTful API Service mammals that maybe saw off the big, ugly lizards.

Before getting into computing I did spend some time in Geology so I’m coming at this argument from a slightly odd standpoint. For any Geologists reading, I was structural, ophiolites and terrane docking. We used to look down on this palaeontology stuff and everyone looked down on the geophysicists.

To recap the dinosaurs then. We know them from the fossil record. To become a fossil is a one in a bazillion chance so there have to be a lot of you about. By definition any fossil you find must have come from a wildly successful species. So SOAsaurus must be some form of compliment. On top of that, dinosaurs (as we commonly refer to them) lasted over one hundred million years, dominated the land, sea and sky AND gave birth to the mammals. By that analogy REST wouldn’t be here but for SOA. In the same way SOA has had its time in the press and still continues to have its time in enterprise. Fully half of the enquiries Gartner specialists like Paolo Malinverno get are from people working on SOA, installing service-based architectures, XML and developing new services.

The analogy extends to our RESTful mammals as well. At night they had the advantage of heat to go out scavenging dinosaurs and stealing eggs. In the same way I see technologies scavenged from SOA: the sledgehammer-to-crack-a-nut UDDI re-emerges as the API portal, and WSDL starts to re-emerge as WADL. Vendors see that the wheel is being reinvented, so technologies like service and security gateways extend their functionality to encompass both worlds.

When the dinosaurs did go it had taken the combined effects of millennia of climate change from the volcanic eruptions forming a decent portion of what is now India plus the impact of a meteorite big enough to form a 200km wide crater. That was big enough to wipe out two thirds of all species on the planet. It reminds me of a major UK banking group I’ve worked with whose mainframe still ran on token ring and used a protocol older than I am! Big, successful technologies are hard to kill off for several reasons, primarily because they work for the frame of reference they were put in for. We’ll be living with SOA practices running organisations for many decades yet.

But maybe I got the analogy wrong. REST isn’t the mammal after all and the dinosaurs never died out. They survived by ditching the weight and becoming more agile with less in the way of teeth. In the same way it’s not hard to imagine why developers want to get rid of the stack of J2EE, XML, SOAP, WS-ReliableMessaging, WS-PolicyForSecureReliablePolicyIdentityFederationPolicy…. REST represents the birds and one’s about to crap on your shoulder sometime soon.

I think I agree with SOAsaurus, I like the term. SOA gets to live as Argentinosaurus, Compsognathus or Protoceratops. REST could be Albatross, Turkey or Penguin (okay, now I’m poking fun). As Archaeopteryx was no doubt fond of saying; there’s room for a bit of both.

Of course it is all very well discussing the mammals and the dinosaurs, but it’s the bacteria that use us and let us live in the end. Any suggestions as to the computing equivalent?

Whether you’re a SOAsaurus seeking SOA governance, are looking to evolve using REST/SOA mediation, or are already walking upright with REST and need to manage APIs, we’ve got you covered.

Follow me @PeteL0gan uk.linkedin.com/in/petelogan/.

Elastic Scaling of APIs in the Cloud

As an Enterprise Architect for Intel IT, I worked with IT Engineering and our Software and Services group on the elastic scaling of the APIs that power the Intel AppUp® center. Our goal was to scale our APIs to at least 10x our baseline capacity (measured in transactions per second) by moving them to our private cloud, and ultimately to be able to connect to a public cloud provider for additional availability and scalability. Here’s a quick set of practices we used to achieve our goal:

  1. Virtualize everything.  This may seem obvious and is probably a no-op for new APIs, but in our case we were using bare-metal installs at our gateway and database layers (the API servers themselves were already running as VMs). While our gateway hardware appliance had very good scalability, we knew we were ultimately targeting the public cloud and that our need for dynamic scaling could exceed our ability to add new physical servers.  Using a gateway that scales in pure software virtual machines without the need for special purpose-built hardware helped us achieve our goal here.
  2. Instrument everything.  We needed to be able to correlate leading indicators like transactions per second to system load at each layer so we could begin to identify bottlenecks. We also needed to characterize our workload for testing – understanding a real-world sequence of API methods and mix/ordering of reads and writes. This allowed us to create a viable set of load tests.
  3. Identify bottlenecks.  We used Apache JMeter to generate load and identify points where latency became an issue, correlating that against system loads to find out where we had reached saturation and needed to scale.
  4. Define a scaling unit.  In our case, we were using dedicated DB instances rather than database-as-a-service, so we decided to scale all three layers together. We identified how many API servers would saturate the DB layer, and how many gateways we would need to manage the traffic. We then defined a collection that would provision all of these VMs together. We might have scaled each layer independently had our API been architected differently, or if we were building from scratch on database-as-a-service.

    Example collection for elastic scaling

  5. Repeat. The above let us scale from 1x to about 5x or 6x without any problem. However, when we hit 6x scaling we discovered a new bottleneck: the overhead of replicating commits across the database instances. We went back to the drawing board and redesigned the back end for eventual consistency so we could reduce database load.
  6. Automate everything.  We use Nagios and Puppet to monitor and respond to health changes. A new scaling unit is provisioned when we hit predefined performance thresholds.

    Automation/Orchestration workflow

  7. Don’t forget to test scaling down.  If you set a threshold for removing capacity, it’s important to make sure that your workflow allows for a graceful shutdown and doesn’t impact calls that are in progress.
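The scaling-unit and threshold ideas in the list above can be sketched as a small decision function. This is a minimal illustration, not the workflow Intel actually ran: the names `ScalingUnit` and `desired_units`, the per-unit capacity figure and both threshold values are hypothetical. It assumes one unit bundles all three layers, and it only removes a unit when the remaining units would still sit below the lower threshold, so capacity does not flap up and down.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingUnit:
    """One coarse-grained unit: all three layers are provisioned together."""
    gateways: int
    api_servers: int
    db_instances: int
    capacity_tps: float  # throughput one unit sustains (assumed, from load tests)


def desired_units(current_units: int, observed_tps: float, unit: ScalingUnit,
                  scale_up_at: float = 0.80, scale_down_at: float = 0.50,
                  min_units: int = 1) -> int:
    """Return how many scaling units should be running for the observed load."""
    utilization = observed_tps / (current_units * unit.capacity_tps)
    if utilization > scale_up_at:
        # Add enough units to bring utilization back under the upper threshold.
        needed = math.ceil(observed_tps / (unit.capacity_tps * scale_up_at))
        return max(current_units + 1, needed)
    if current_units > min_units:
        remaining = current_units - 1
        # Remove one unit only if the rest would still be lightly loaded.
        if observed_tps / (remaining * unit.capacity_tps) < scale_down_at:
            return remaining
    return current_units
```

Removing one unit at a time leaves room for the graceful shutdown described in step 7: the orchestration layer can stop routing new calls to the unit and wait for calls in progress to finish before the VMs are released.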

The above approach got us to 10x our initial capacity in a single data center. Because of some of our architecture decisions (coarse-grained scaling units and eventual consistency) we were then able to add a GLB and scale out to multiple data centers – first to another internal private cloud and then to a public cloud provider.

What's in a Composite API Platform?

Intel recently released what we call a composite API platform with our new API Manager product. What exactly do we mean by this?

A composite platform is a single platform for API management that handles both Public (sometimes called “Open”) APIs and Enterprise APIs. It’s composite because it exhibits both the cost savings of “cloud” through a multi-tenant SaaS partner portal coupled with the control of an on-premises gateway for traffic management. Like a composite material, the mingling of two or more constituents gives the final solution different properties not found in either alone.

For a public or open API it’s important to have developers interact in a shared manner, generally done through a public SaaS partner management portal. True multi-tenant SaaS offerings give the Enterprise cost advantages, as the partner management piece is akin to running a website for potentially thousands of developers. Running a successful website means people, resources, archival and a higher cost of ownership.

Further, multi-tenant SaaS means developers may be using more than just your API as they may also be finding other APIs they are interested in advertised from other tenants. This is a good thing as these are the caliber of developers you want. After all, experienced developers can bring more to the table – they may even come up with an awesome app that mixes your data with a partner’s in a new way.

As flashy as the cloud is, not all Enterprises can risk complete movement to a public cloud environment, especially for security and compliance. The set of applications bound to the enterprise are sometimes called “gravity bound”, as they are part of an information system tied to core business processes or cannot be outsourced due to compliance, privacy or security issues.

How do these applications gain the benefits of the API economy? What if you want to build a mobile app or partner app that interacts with a mainframe or legacy system? How do you ensure compliance for API traffic that involves sensitive information? What about security?

For these types of large scale environments, the Enterprise has good reasons to buy and own some of the components used to expose the API. Overall, the composite API platform really mixes the concepts of Public APIs and Enterprise APIs together.

All APIs are really Enterprise APIs. It’s the manner in which they are exposed and their purpose that labels them “Public” or “Enterprise”, but in reality they both support an Enterprise’s API strategy, and we might argue that the most successful enterprises will actually have both.

An Anecdote: Is the Web Clunky?

I was at dinner with a friend who was considering enrolling in a survey class on client side web technologies. The course would cover things like JavaScript, Silverlight, HTML5, Adobe Flash and the like. As she was talking, I was playing with my new Samsung Galaxy Note 2, which, if you are not familiar, is somewhere between a traditional smartphone and a tablet. As a side note, that phone is pretty awesome in my book.

As she was talking about the course I gave her the phone and told her to look up the definition of a word, any word, first using the Internet and then next using an “App.” This is a simple task of course, but the experience of doing this using the web versus an app has an extremely high ‘clunkiness’ factor to it.

For the web experience, you press “Internet”, go to Google, search for “dictionary”, pick one of the top-ranked dictionary sites, wait for it to load, type in your word and find the answer amongst a panoply of side-rail ads. If you are a Google power user you can use the “definition” keyword, but not all users know about that.

For the app experience, you open the dictionary app, put in the word and get the answer. Simple, smooth and fast. This experience is fueled by APIs, specifically an API call from the native app to the Internet or other back-end system providing the answer.

I told her, “Well, in that class you are considering, you are learning all of the technologies that enable the first experience.”

“Why would I do that?” She responded. “The first way seems so clunky.”

I considered responding with various arguments about how the web has fueled tremendous growth and the value of open standards for mark-up and a common syntax for universal resources – but then I thought of the cash-value of the task at hand: getting the definition of a word. While the web experience gets you the answer, the app experience gets the answer faster and with a better experience. When the task is well defined to a single purpose, the app shines.

The big question now is, will that native experience be restricted to devices or will it spread to ultra-books and desktops? How much of that native app experience will spill to the larger computing devices?

Blake

Composite Distributed Applications and RESTful APIs

I was at Gartner Catalyst last week in San Diego for a luncheon keynote where I explored the concept of a composite distributed application. This is an idea that I have been chewing on for some time and is a direct result of how Enterprises are thinking about application architecture in light of “cloud” and “big data” as well as some of the trends we are seeing in our own customer base for Intel(R) Expressway Service Gateway.

First question: Where do Enterprise applications begin and end in 2012? Let’s state the obvious: the definition of an application as a monolithic piece of object code is ancient history. Let’s try the next definition, a standard n-tier shared nothing web application. This is certainly more timely, but I would also consider it dated.

If we add external cloud services, such as xPaaS (to use Gartner’s terminology) and disparate data warehouses or “big data”, located in geographically dispersed data-centers, this n-tier definition located in a single place doesn’t quite capture all of the application and may leave out important pieces. Key pieces of functionality may live “elsewhere”, and this is where our standard enterprise application becomes distributed, with pieces in different physical locations, as well as composite, which means the inclusion of external xPaaS services such as storage, queuing, authentication or similar services.

So when we think about the larger boundaries of a composite distributed application, what are some salient properties? I came up with the following list for my talk:

Composite Distributed Application Properties

Hybridized – Includes new feature development as well as the integration of legacy code, which can be done by integrating legacy message or document formats and protocols. In other words, Enterprises don’t want to throw out existing functionality, even if it happens to be written in a different programming language.

Location Independent – Important pieces of logic, persistence and functionality may be split across 1-n clouds, a mix of standard data center deployment, private cloud and public cloud. The application is essentially living across different clouds. All clouds can win.

Knowledge Complete –  As traditional enterprises emulate web companies with big data analytics and web intelligence, distributed applications must access the results of “Big Data” analytics, which are possibly owned by different factions in the Enterprise. The composite distributed application will need to aggregate results and make important predictions across these sources as well as include any relevant data warehouse and JDBC sources.

Contextual – Produces just-in-time results based on client context, device and identity. For example, the application I/O model must meet the demands of mobile devices, for instance through REST APIs, as well as those of internal enterprise stakeholders.

Accessible & Performs – Produces data compatible with any client on any operating system, with minimal latency. Scales to hundreds of thousands of users where clients are a mix of smart phones, tablets, browsers, or devices.

Secure and Compliant – Meets compliance and security requirements for data in transit and data at rest, such as PCI, HIPAA and other requirements. This may involve a mix of traditional “coded-in” security, security at the message level (via a proxy), standard transport level security, and data tokenization prior to analytics.

Common Service Layer

A common theme of current Intel service gateway customers is the creation of a common service layer that unifies existing back-end services. What happens is that services grow organically on different platforms and operating systems, written in different languages, but can be orchestrated under a common RESTful theme (for more background on REST fundamentals see DZone’s REST Reference Card paper). For instance, many of our customers have a mix of REST-style and SOAP web services and then use a gateway facade or layer to unify these. Unification, however, is only one of the requirements. The second requirement is external exposure to new clients and partners with appropriate performance, trust, threat, and increasingly, throttling/SLA features. Trending right now are OAuth and API key mechanisms, especially when the clients are expected to be mobile devices.
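As a rough illustration of the facade idea, the sketch below maps public REST paths onto a mix of back-end services and checks an API key before routing. It is not how Intel Expressway Service Gateway is configured; the route table, the internal URLs, the key values and the `resolve` function are all hypothetical, and a real gateway would also translate between REST and SOAP rather than just pick a target.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Backend:
    name: str
    base_url: str  # hypothetical internal address
    style: str     # "rest" or "soap"


# Hypothetical routing table: public path prefix -> back-end service.
ROUTES = {
    "/v1/orders": Backend("orders", "http://orders.internal:8080", "rest"),
    "/v1/billing": Backend("billing", "http://billing.internal:9090/soap", "soap"),
}

# Hypothetical API keys issued to client applications.
API_KEYS = {"key-mobile-123": "mobile-app", "key-partner-456": "partner-portal"}


def resolve(path: str, api_key: Optional[str]) -> Tuple[int, str]:
    """Return (HTTP status, back-end URL or error text) for a facade request."""
    if API_KEYS.get(api_key or "") is None:
        return 401, "unknown or missing API key"
    # Longest matching prefix wins, so more specific routes take priority.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return 200, ROUTES[prefix].base_url + path[len(prefix):]
    return 404, "no back-end service registered for this path"
```

Because clients only ever see the `/v1/...` paths, a service can later move behind the same facade from the data center to a cloud provider by changing its entry in the route table.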

How does this architecture grow into a composite distributed application? This is where location can play a role: as enterprises adopt more cloud PaaS services, their existing services will grow beyond what is found in the data-center, to what is found outside the data-center.

For example, one large service provider that we work with uses Intel Expressway Service Gateway to create a facade for 50+ RESTful services. In the future as they adopt cloud, additional services may also be delivered from the cloud that fit under the facade,  so the RESTful facade and services together may all be properly called “the application”  – here the application is a mash-up of services split among clouds.

We call it “the application” because it’s all three pieces together: the gateway, the internal services and the cloud services.  The next question here is how to secure these API interactions and ensure this new breed of application meets performance and compliance requirements.

I think the answer here is that you have to focus on the data itself sent and received at each API hop. This means more emphasis on tokenization and encryption, as well as an understanding of the relevant authentication and authorization controls and how they apply depending on who needs to access the data. For “Big Data” this may mean pre-processing map/reduce input to provide tokenization or encryption prior to performing analytics, essentially ensuring compliance prior to processing.
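To make the pre-processing idea concrete, here is a minimal sketch in which a sensitive field is replaced before the records reach an analytics job. The field names, the key and the toy aggregation are assumptions made for illustration. A keyed hash (HMAC) stands in for a real tokenization service here; it is chosen because equal inputs give equal tokens, so grouping and counting still work on the protected data.

```python
import hashlib
import hmac
from collections import Counter

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical; real keys live in a key store


def tokenize(value: str) -> str:
    """Deterministic keyed token: equal inputs always give equal tokens."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def preprocess(records):
    """Replace the sensitive field before records reach the analytics job."""
    for record in records:
        # Build a new dict so the caller's original record is left untouched.
        yield {**record, "card_number": tokenize(record["card_number"])}


def spend_per_card(records):
    """Toy stand-in for a map/reduce job: total amount per tokenized card."""
    totals = Counter()
    for record in records:
        totals[record["card_number"]] += record["amount"]
    return dict(totals)
```

Calling `spend_per_card(preprocess(raw_records))` gives per-card totals without the analytics code ever seeing a raw card number.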

-Blake

How to Harden Your APIs by Andy Thurai

The market for APIs has experienced explosive growth in recent years, yet one of the major issues that providers still face is the protection and hardening of the APIs that they expose to users. In particular, when you are exposing APIs from a cloud based platform, this becomes very difficult to achieve given the various cloud provider constraints. To achieve it you need a solution that provides the hardening capabilities out of the box but still permits customization of the granular settings to meet the needs of your solution. Intel has such a solution and it has been well thought out. If this is something you are looking for, this article may help you see its many uses and its versatility.

Identify sensitive data and sensitivity of your API

The first step in protecting sensitive data is identifying it as such. This could be anything like PII, PHI and PCI data. Perform a complete analysis of your inbound and outbound data to your API, including all parameters, to figure this out.

Once identified, make sure only authorized people can access the data.

This will require solid identity, authentication, and authorization systems to be in place. These all can be provided by the same system. Your API should be able to identify multiple types of identities. In order to achieve an effective identity strategy, your system will need to accept identities in the older formats such as X.509, SAML and WS-Security as well as the newer breed of OAuth, OpenID, etc. In addition, your identity system must mediate the identities, acting as an Identity Broker, so it can securely and efficiently relay these credentials for your API to consume.

You will need to have identity-based governance policies in place. These policies need to be enforced globally, not just locally. Effectively this means you need to have predictable results that are reproducible regardless of where you deploy your policies. Once the user is identified and authenticated, you can use that result to authorize the user based not only on that credential, but also on the location where the invocation came from, time of the day, day of the week, etc. Furthermore, for highly sensitive systems the data or user can be classified as well. Top secret data can be accessed only by top classified credentials, etc. In order to build very effective policies and govern them at run time, you need to integrate with a mature policy decision engine. It can be either standards-based, such as XACML, or integrated with an existing legacy system provider.
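A decision that combines credential, data classification, location and time could look roughly like the sketch below. It is an illustration only: the classification levels, the office-hours window and the names `Request` and `authorize` are invented for the example, and a real deployment would delegate this to a policy decision engine (for example an XACML one) rather than hard-code rules.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple

# Hypothetical classification levels, ordered from least to most sensitive.
CLEARANCE = {"public": 0, "confidential": 1, "top_secret": 2}


@dataclass(frozen=True)
class Request:
    user_clearance: str       # taken from the authenticated credential
    data_classification: str  # label on the data being requested
    source_network: str       # e.g. "intranet" or "mobile"
    when: datetime


def authorize(req: Request) -> Tuple[bool, str]:
    """Return (permitted, reason) for an already authenticated request."""
    if CLEARANCE[req.user_clearance] < CLEARANCE[req.data_classification]:
        return False, "clearance too low for this data"
    if req.data_classification == "top_secret" and req.source_network != "intranet":
        return False, "top secret data is intranet-only"
    # weekday(): Monday is 0 and Sunday is 6, so 5 and 6 are the weekend.
    if req.when.weekday() >= 5 or not (8 <= req.when.hour < 18):
        return False, "outside permitted hours"
    return True, "permit"
```

Keeping the decision in one pure function is one way to get the reproducible results the paragraph above asks for: the same request attributes give the same answer wherever the policy is deployed.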

Protect Data

Protect your data as if your business depends on it, as it often does, or should. Make sure that the sensitive data, whether in transit or at rest (storage), is not in an unprotected original format. While there are multiple ways the data can be protected, the most common ones are encryption or tokenization. In the case of encryption, the data will be encrypted, so only authorized systems can decrypt the data back to its original form. This allows the data to circulate encrypted and be decrypted as necessary along the way by secured steps. While this is a good solution for many companies, you need to be careful about the encryption standard you choose and about your key management and key rotation policies.

The other approach, “tokenization”, is based on the fact that you can’t steal what is not there. You can tokenize basically anything, whether PCI, PII or PHI information. The original data is stored in a secure vault and a token (or pointer, representing the data) is sent in transit downstream. The advantage is that if any unauthorized party gets hold of the token, they wouldn’t know where to go to get the original data, let alone have access to it. Even if they do know where the original data is located, they are not whitelisted, so the original data is not available to them. The greatest advantage of tokenization is that it reduces the exposure scope throughout your enterprise: by removing the sensitive and critical data from the stream you eliminate vulnerabilities throughout the system, and you can centralize your focus and security on the stationary token vault rather than on active, dynamic and pliable data streams.

While you’re at it, you might want to consider a mechanism such as DLP, which is highly effective in monitoring for sensitive data leakage. This process can automatically tokenize or encrypt the sensitive data that is going out. You might also want to consider policy based information traffic control. While certain groups of people may be allowed to communicate certain information (such as company financials by an auditor, etc.), other groups may not be allowed to send that information. You can also enforce that by location based invocation (i.e. intranet users vs. mobile users who are allowed to get certain information).
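The vault model described above can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not a product API: the `TokenVault` class, its method names and the caller names are hypothetical, and a real vault would add encrypted storage, auditing and authenticated callers.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a secure token vault (illustration only)."""

    def __init__(self, allowed_callers):
        self._store = {}                     # token -> original value
        self._allowed = set(allowed_callers)  # callers allowed to detokenize

    def tokenize(self, value: str) -> str:
        """Store the value and hand back a random token to send downstream."""
        token = "tok_" + secrets.token_hex(8)  # random, so it reveals nothing
        self._store[token] = value
        return token

    def detokenize(self, token: str, caller: str) -> str:
        """Return the original value, but only to an allowed caller."""
        if caller not in self._allowed:
            raise PermissionError(f"{caller} is not allowed to detokenize")
        return self._store[token]
```

For example, with `vault = TokenVault({"billing-service"})`, any system can pass the token around, but only a call such as `vault.detokenize(token, "billing-service")` gets the original value back; every other caller gets a `PermissionError`.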

QOS

While APIs exposed in the cloud can let you get away with scaling out for an expansion or a burst during peak hours, it is still a good architectural design principle to make sure that you limit or rate-control access to your API. This is especially valuable if you are offering an open API that is exposed to anyone. There are two sides to this: a business side and a technical side. The technical side will allow your APIs to be consumed in a controlled way, and the business side will let you negotiate better SLA contracts based on the usage model you have to hand. You also need a flexible throttling mechanism that offers several responses: just notify, throttle the excessive traffic, shape the traffic by holding the messages until the next sampling period starts, etc. In addition, there should be a mechanism to monitor and manage traffic both for the long term and for the short term, which can be based on two different policies.
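The three responses mentioned above (notify, throttle, shape) can be illustrated with a simple fixed-window limiter. This is a sketch under assumptions, not a gateway feature description: the class and enum names are invented, and a fixed window is only one of several possible algorithms. The clock is injectable so the behaviour can be exercised without waiting.

```python
import time
from enum import Enum


class Action(Enum):
    NOTIFY = "notify"      # let the call through but flag it
    THROTTLE = "throttle"  # reject the excess call
    SHAPE = "shape"        # hold the call until the next sampling period


class WindowLimiter:
    """Fixed-window limiter: at most `limit` calls per `period` seconds."""

    def __init__(self, limit, period, action, clock=time.monotonic):
        self.limit, self.period, self.action = limit, period, action
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def check(self):
        """Return (allowed, delay_seconds, note) for one incoming call."""
        now = self.clock()
        if now - self.window_start >= self.period:
            self.window_start, self.count = now, 0  # new sampling period
        self.count += 1
        if self.count <= self.limit:
            return True, 0.0, "ok"
        if self.action is Action.NOTIFY:
            return True, 0.0, "over limit: notify only"
        if self.action is Action.THROTTLE:
            return False, 0.0, "over limit: rejected"
        # SHAPE: the caller should wait until the current window ends.
        return True, self.window_start + self.period - now, "over limit: delayed"
```

The long-term and short-term policies could be modelled by running two limiters side by side, for instance one per second and one per day, and applying the stricter result.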

Protect your API

Attacks on or misuse of your publicly exposed API can be intentional or accidental. Either way you can’t afford for anyone to bring your API down. You need to have application aware firewalls that can look into the application level messages and prevent attacks. Generally, application attacks tend to fall under injection attacks (SQL injection, XPath injection, etc.), script attacks, or attacks on the infrastructure itself.

Message Security

You also need to provide both transport level and message level security features. While transport security features such as SSL and TLS provide some data privacy, you need to have an option to encrypt and sign message traffic, so it will reach the end systems safely and securely and so those systems can authenticate the end user who sent the message.

Imagine if you could provide all of the above in one package: just take it out of the packaging, power it up, and with a few configuration steps provide most of what we have discussed above. More importantly, in a matter of hours you’ve hardened your API to your enterprise level (or in some cases better than that). Intel has such a solution to offer.
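Message-level signing, as opposed to transport security, can be sketched with a keyed hash over the message body. This is a simplified stand-in: it assumes a shared secret key between sender and receiver, the `sign` and `verify` helpers are hypothetical, and production systems would more likely use a standard such as WS-Security XML Signature or JSON Web Signature, often with public-key signatures. Encryption of the body is not shown.

```python
import hashlib
import hmac
import json


def sign(message: dict, key: bytes) -> dict:
    """Serialize the message canonically and attach an HMAC-SHA256 signature."""
    # Sorted keys and fixed separators give the same bytes for the same content.
    body = json.dumps(message, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify(envelope: dict, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(key, envelope["body"].encode("utf-8"),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])
```

Because the signature travels with the message, it still protects the content after the TLS connection has been terminated at an intermediary, which is the gap transport security alone leaves open.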

Check out our Intel API gateway solution which offers all of those hardening features, in one package and a whole lot more. Feel free to reach out to me if you have any questions or need additional info.

http://cloudsecurity.intel.com/solutions/cloud-service-brokerage-api-resource-center

 

Andy Thurai — Chief Architect & CTO, Application Security and Identity Products, Intel

Andy Thurai is Chief Architect and CTO of Application Security and Identity Products with Intel, where he is responsible for architecting SOA, Cloud, Governance, Security, and Identity solutions for their major corporate customers. In his role, he is responsible for helping Intel/McAfee field sales, technical teams and customer executives. Prior to this role, he has held technology architecture leadership and executive positions with L-1 Identity Solutions, IBM (Datapower), BMC, CSC, and Nortel. His interests and expertise include Cloud, SOA, identity management, security, governance, and SaaS. He holds a degree in Electrical and Electronics engineering and has over 20 years of IT experience.

He blogs regularly at www.thurai.net/securityblog on Security, SOA, Identity, Governance and Cloud topics. You can find him on LinkedIn.

Next Gen Enterprise API Architecture for Mobile

The Enterprise software industry has grown up around the standard three-tier architecture for web applications, which was pioneered circa 1995. This architecture is ideal for web browsers, which have become the universal client of the Enterprise.

With the introduction of Enterprise mobile applications, we are seeing new avenues for innovation, new user experiences and increased convenience. In some ways, however, we are rolling back the clock.

Allow me to clarify: If we accept the premise that native mobile applications deliver the best functionality on disparate mobile platforms, we are at the cusp of re-introducing “thick client” applications back into the enterprise. Native mobile applications are rich in their design and functionality but behave like monolithic applications: They provide their own persistence tier, slick user-interfaces, natively compiled code, require upgrades and updates on the client device, and utilize a mix of synchronous and asynchronous communication. Sure they use REST for communication, but is this due to historical accident?

Other than the physical platform itself (which is a smartphone or tablet), native mobile applications may have more in common with old “Win32 client/server apps” that existed before the browser revolution. Are we moving forwards or backwards?

Further, what about web mobile applications that run in the browser on the mobile device? How do they factor in? How do new technologies like HTML5 affect these types of applications? How do REST APIs affect the mobile architecture?

Is the Enterprise ready for mobile? How does the standard three tier architecture fare in light of these trends?

I try to get a handle on these issues in our new whitepaper, “A Unified Mobile Architecture for the Modern Data Center”.

Happy Reading,

Blake