Mark Russinovich: How Microsoft is building its cloud future

In an exclusive interview, Mark Russinovich opens the hood of Windows Azure and discusses how IT should prepare for its inevitable cloud transition

Page 2 of 4

InfoWorld: Right. The Azure fabric.

Russinovich: The Azure fabric. And that manages a pool of machines. And then there's an application front-end, a virtual machine deployment front-end we call RDFE -- Red Dog Front End. Red Dog is a carryover from Azure's code name at Microsoft.

Here's what happens when a customer deploys a PaaS application (what we call a Cloud Service, a collection of virtual machines) or when they deploy IaaS as virtual machines: It goes to RDFE. RDFE then finds a fabric controller that, based on heuristics, has the best utilization and capacity available for the deployment, and gives the deployment to that fabric controller, which then goes and finds servers to deploy the virtual machines onto.

It uses a bunch of heuristics as well as constraint satisfaction to figure out which servers are the ones that the virtual machines should land on. We've got the concept of update domains and fault domains, so that when the infrastructure is being updated we don't take down the whole application. We split the application across different servers so that when we're servicing the infrastructure of the servers, it's only taking down a slice of the application.
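The two-stage flow described above (a front-end picks a fabric controller, and the controller then picks servers while spreading the application out) can be sketched roughly as follows. This is a minimal illustration, not Azure's actual algorithm: the class names, the "most free cores" rule, and the greedy "least-loaded fault domain first" heuristic are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    fault_domain: int  # hypothetical: a group of servers that can fail together
    free_cores: int


@dataclass
class FabricController:
    name: str
    servers: list

    def free_capacity(self):
        return sum(s.free_cores for s in self.servers)


def pick_controller(controllers, cores_needed):
    """Front-end step: choose the controller with the most spare capacity."""
    candidates = [c for c in controllers if c.free_capacity() >= cores_needed]
    if not candidates:
        raise RuntimeError("no fabric controller has enough capacity")
    return max(candidates, key=lambda c: c.free_capacity())


def place_instances(controller, instance_count, cores_per_instance):
    """Controller step: greedy placement that spreads across fault domains."""
    placement = []
    used_per_domain = {}
    for i in range(instance_count):
        # Constraint: the server must have room for the instance.
        fitting = [s for s in controller.servers
                   if s.free_cores >= cores_per_instance]
        if not fitting:
            raise RuntimeError("not enough server capacity")
        # Heuristic: least-used fault domain first, then the emptiest server.
        best = min(fitting, key=lambda s: (used_per_domain.get(s.fault_domain, 0),
                                           -s.free_cores))
        best.free_cores -= cores_per_instance
        used_per_domain[best.fault_domain] = used_per_domain.get(best.fault_domain, 0) + 1
        placement.append((i, best.name, best.fault_domain))
    return placement


controllers = [
    FabricController("fc-a", [Server("a1", 0, 2), Server("a2", 1, 2)]),
    FabricController("fc-b", [Server("b1", 0, 8), Server("b2", 1, 8), Server("b3", 2, 8)]),
]
chosen = pick_controller(controllers, cores_needed=8)
for instance, server, domain in place_instances(chosen, 4, 2):
    print(instance, server, domain)
```

In this toy setup the four instances end up on three different fault domains, so losing any one domain takes down only part of the application.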

InfoWorld: Is that through componentizing applications as well?

Russinovich: If you look, for example, at our PaaS platform, we've got two virtual machine object types called Worker Role and Web Role. They're really layered on top of virtual machines. So you would take a piece of code, you would say: I want to run it in the Web front-end of my application. The developer writes it, packages it up, and gives it to us.

Now, what we do is create a virtual machine and then stick that code in it. And that code follows a stateless programming model, so anything that it writes to the local store on that server is treated as cache -- temporary, ephemeral storage. The application would use an external durable store like Windows Azure Storage or Windows Azure Database to store its data. And then when they're in that programming model -- the PaaS programming model -- a developer who wants to scale out can simply say: I want ten of those, I want 100 of those. And then the fabric would go and scale that out to 100 virtual machines.

As it's scaling that out, you can request up to 20 update domains, which means we would spread you across at least 20 servers -- likely way more than that because of the way the allocation works. But then that means that when we update a server only one twentieth of your front-end will go down while it's being updated.
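The "one twentieth" arithmetic can be checked with a small sketch. It assumes instances are assigned to update domains round-robin and that one update domain is serviced at a time; both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
from collections import Counter


def assign_update_domains(instance_count, update_domains=20):
    """Round-robin mapping of instance index to update domain (an assumption)."""
    return [i % update_domains for i in range(instance_count)]


def worst_case_fraction_down(instance_count, update_domains=20):
    """Largest share of instances offline while one update domain is serviced."""
    counts = Counter(assign_update_domains(instance_count, update_domains))
    return max(counts.values()) / instance_count


print(worst_case_fraction_down(100))  # 0.05, i.e. one twentieth
print(worst_case_fraction_down(10))   # 0.1, fewer instances than domains
```

With 100 instances across 20 update domains, each domain holds 5 instances, so servicing one domain takes down 5 percent of the front-end. With only 10 instances, each update still takes down a full instance, which is 10 percent.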

InfoWorld: So it's a total scale-out replicated infrastructure. That must have been a big computer science problem for you.

Russinovich: It still is. Actually, this is what's so exciting about being at Azure right now. When I joined Microsoft, I'd done a lot of Windows stuff before, but operating systems had already pretty much matured. I mean, Windows today in the internals isn't very different than 20 years ago, and Linux is the same way -- just like UNIX back in the '70s.

This cloud operating system, data center operating system, is brand new. So the problems are new, the algorithms are new, the computer science is new. How do you detect failures quickly? How do you respond to them? How do you best do resource allocation?

InfoWorld: That must be exciting. But at Microsoft, while you've been there, hasn't there been a sort of a religious change regarding Azure? At first, it was all about PaaS. Then 18 months ago it was "we're going to do IaaS after all."

Russinovich: Actually, to go back a step further, when the project started it was focused internally on building a platform for our own Microsoft Services. We're going to build new services on this thing so let's do PaaS, because that's the way to create great, scalable, highly available cloud services and we want to push developers inside Microsoft to do things the right way from the start.

Steve Ballmer then says: Hey, you know, this Azure thing -- we should actually make it public. The future is public cloud computing; customers can write their own services and deploy them on our infrastructure. Once we made it public we started to realize ... people have a ton of existing code.

This is where we started to run up against the app model that Azure launched with, which was pure .NET, partial trust only, which meant no native code. People would say: I've got a native code library I want to use. How can I get that in? When the answer was no, they couldn't, they were like: OK, well, I can't move.

So we started to open up these things. You can do native code. You can have admin access in the virtual machine. One by one we relaxed these things to allow more existing code to come in. Well, the primary requirement of existing server code is persistent storage, and so that's the big step function -- to go from new code written for the specific platform to running existing server code like a server database.

Because in the PaaS stateless model, yeah, you can install SQL in that thing and it's going to create a database and it's going to write data into it. But if that server fails, the virtual machine gets reincarnated on the next server and it's got amnesia -- the data is gone. So that's where we said the ultimate on-ramp to the platform is persistent disk, and that's what the world calls infrastructure-as-a-service. Then you're able to bring your own OS image.
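The "amnesia" behavior can be illustrated with a toy model. The `RoleInstance` class and its attributes are hypothetical names for this sketch; plain dictionaries stand in for the local disk and for an external durable store.

```python
class RoleInstance:
    """Toy model of a stateless PaaS instance (hypothetical, for illustration)."""

    def __init__(self, durable_store):
        self.local_disk = {}                # ephemeral: lost when the server fails
        self.durable_store = durable_store  # external: survives the instance

    def reincarnate(self):
        """Simulate a server failure: a fresh instance on another server."""
        return RoleInstance(self.durable_store)


durable = {}
vm = RoleInstance(durable)
vm.local_disk["orders.db"] = "database file written to the local disk"
vm.durable_store["orders"] = "rows written to external storage"

vm = vm.reincarnate()
print("orders.db" in vm.local_disk)  # False: the local database is gone
print("orders" in vm.durable_store)  # True: externally stored data survives
```

A persistent disk, in these terms, makes the local disk itself behave like the durable store, which is what lets existing server software such as a database run unchanged.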
