

CloudViews 2010 - Cloud Ecosystem


Proceedings of the CloudViews 2010 2nd Cloud Computing International Conference May 20-21, 2010, Porto, Portugal

Editors
Benedita Malheiro
Miguel Leitão
Paulo Calçada
Pedro Assis

Program Committee
António Costa (ISEP)
António Pinto (ESTGF)
Benedita Malheiro (ISEP)
Miguel Leitão (ISEP)
Inês Dutra (UP)
Paulo Calçada (IPP)
Ricardo Costa (ESTGF)
Sérgio Lopes (IPP)
Hugo Magalhães
Juan Burguillo (UVigo)

Design: antoniocruz.eu

Print: Minerva, Artes Gráficas
ISBN: 978-989-96985-0-5
Depósito Legal:

Copyright © 2010 EuroCloud Portugal. Copying is permitted provided that copies are not made or distributed for direct commercial advantage and credit to the source is given. Abstracting is permitted with credit to the source. Contact the editor or the publisher for other uses.

A publication of the EuroCloud Portugal Association
Campus ISEP
Rua Dr. António Bernardino de Almeida, 431
4200-072 Porto, Portugal




Contents

VI   Authors
VII  Message from Editors
1    How Hardware Virtualization Works (Gregory Pfister)
19   Using Private Clouds to Increase Service Availability Levels and Reduce Operational Costs (Narciso Monteiro)
29   CERN's Virtual Batch Farm (Tony Cass, Sebastien Goasguen, Belmiro Moreira, Ewan Roche, Ulrich Schwickerath, Romain Wartel)
41   GRID, PaaS for e-science (J. Gomes, G. Borges, M. David)
49   Privacy for Google Docs: Implementing a Transparent Encryption Layer (Lilian Adkinson-Orellana, Daniel A. Rodríguez-Silva, Felipe Gil-Castiñeira, Juan C. Burguillo-Rial)
57   Web-based collaborative editor for LaTeX documents (Fábio Costa, António Pinto)
63   Managing Cloud Frameworks through Mainstream and Emerging NSM Platforms (Pedro Assis)



Authors

António Pinto
CIICESI, Escola Superior de Tecnologia e Gestão de Felgueiras, Politécnico do Porto
INESC Porto, Portugal
apinto@inescporto.pt

Belmiro Moreira
European Organization for Nuclear Research, CERN CH-1211, Geneva, Switzerland

Daniel A. Rodríguez-Silva
R&D Centre in Advanced Telecommunications, Lagoas-Marcosende s/n, 36310, Vigo, Spain
darguez@gradiant.org

Ewan Roche
European Organization for Nuclear Research, CERN CH-1211, Geneva, Switzerland

Fábio Costa
CIICESI, Escola Superior de Tecnologia e Gestão de Felgueiras, Politécnico do Porto
8050148@estgf.ipp.pt

Felipe Gil-Castiñeira
Engineering Telematics Department, Universidade de Vigo, C/ Maxwell, s/n, Campus Universitario de Vigo, 36310, Vigo, Spain
xil@det.uvigo.es

G. Borges
Laboratório de Instrumentação em Física Experimental de Partículas, Lisboa, Portugal

Gregory Pfister
Research Professor, Colorado State University, USA
IBM Distinguished Engineer (retired)
greg.pfister@gmail.com
http://perilsofparallel.blogspot.com

J. Gomes
Laboratório de Instrumentação em Física Experimental de Partículas, Lisboa, Portugal
jorge@lip.pt

Juan C. Burguillo-Rial
Engineering Telematics Department, Universidade de Vigo, C/ Maxwell, s/n, Campus Universitario de Vigo, 36310, Vigo, Spain
jrial@det.uvigo.es

Lilian Adkinson-Orellana
R&D Centre in Advanced Telecommunications, Lagoas-Marcosende s/n, 36310, Vigo, Spain
ladkinson@gradiant.org

M. David
Laboratório de Instrumentação em Física Experimental de Partículas, Lisboa, Portugal

Narciso Monteiro
Porto, Portugal
narciso.monteiro@iol.pt
http://narcisomonteiro.gamagt.com

Pedro Assis
School of Engineering, Porto Polytechnic Institute, Portugal
pfa@isep.ipp.pt

Romain Wartel
European Organization for Nuclear Research, CERN CH-1211, Geneva, Switzerland

Sebastien Goasguen
European Organization for Nuclear Research, CERN CH-1211, Geneva, Switzerland
Clemson University, Clemson, SC 29634, USA
sebgoa@clemson.edu

Tony Cass
European Organization for Nuclear Research, CERN CH-1211, Geneva, Switzerland

Ulrich Schwickerath
European Organization for Nuclear Research, CERN CH-1211, Geneva, Switzerland




Message from Editors

The motto of the 2nd Cloud Computing International Conference, which took place at the Instituto Superior de Engenharia do Porto on the 20th and 21st of May 2010, was the development of a seamless cloud ecosystem. The conference was organised by the EuroCloud Portugal Association as an open forum for the exchange of knowledge, ideas and technology, contributing to and promoting the materialization of a true cloud ecosystem. It included several activities involving companies, researchers, managers and students and was organised into two main streams addressing, respectively, the state of the art (main event) and new challenges and ideas (Business Opportunity Forum).

The main event hosted presentations of current scientific, technical and commercial solutions by some of the most relevant cloud computing players from industry (CISCO, IBM, SalesForce, EMC2, Microsoft, Novell, PHC, Primavera, Galileu, etc.) and from academia (OpenNebula, CERN, LIP, CMU Portugal, etc.). The Business Opportunity Forum hosted the presentation and debate of new business ideas and commercial products as well as scientific proposals and prototypes, promoting the exchange of knowledge and fostering cooperation between academia and industry. The best commercial idea/product was elected by an invited panel of experts.

This book gathers a selection of papers presented at the 2nd Cloud Computing International Conference. They cover distinct and relevant cloud ecosystem aspects such as hardware virtualization ["How Hardware Virtualization Works", G. Pfister], service availability ["Using Private Clouds to Increase Service Availability Levels and Reduce Operational Costs", N. Monteiro], cloud infrastructure development ["CERN's Virtual Batch Farm", T. Cass et al.], the evolution of and complementarities between the grid and cloud paradigms ["GRID, PaaS for e-science", J. Gomes et al.], privacy ["Privacy for Google Docs: Implementing a Transparent Encryption Layer", L. Adkinson-Orellana et al.], collaboration ["Web based collaborative editor for LaTeX documents", F. Costa et al.] and management ["Managing Cloud Frameworks through Mainstream and Emerging NSM Platforms", P. Assis].

The editors would like to express their gratitude to the authors for their valuable contributions to the conference and to this publication, as well as to the participants of the 2nd Cloud Computing International Conference for the lively and interesting discussions.

Porto, May 2010
EuroCloud Portugal Association




How Hardware Virtualization Works

Gregory Pfister
Research Professor, Colorado State University, USA
IBM Distinguished Engineer (retired)
greg.pfister@gmail.com
http://perilsofparallel.blogspot.com

Abstract. Zero. Zilch. Nada. Nothing. Rien. That's the best approximation to the intrinsic overhead for computer hardware virtualization, with the most modern hardware and adequate resources. Judging from comments and discussions I've seen, there are many people who don't understand this. So I'll try to explain here how this trick is pulled off.

Keywords: Virtualization, Cloud Computing

1 Virtualization and Cloud Computing

Virtualization is not a mathematical prerequisite for cloud computing; there are cloud providers who do serve up whole physical servers on demand. However, it is very common, for two reasons:

− First, it is an economic requirement. Cloud installations without virtualization are like corporate IT shops prior to virtualization; there, the average utilization of commodity and RISC/UNIX servers is about 12%. (While this seems insanely low, there is a lot of data supporting that number.) If a cloud provider could only hope for 12% utilization at best, when all servers were in use, the provider would have to charge a price well above competitors who do not have that disadvantage.

− Second, it is a management requirement. One of the key things virtualization does is reduce a running computer system to a big bag of bits, which can then be treated like any other bag o' bits. Examples: it can be filed or archived; it can be restarted after being filed or archived; it can be moved to a different physical machine; and it can be used as a template to make clones, additional instances of the same running system, thus directly supporting one of the key features of cloud computing: elasticity, expansion on demand.

Notice that I claimed the above advantages for virtualization in general, not just the hardware virtualization that creates a virtual computer. Virtual computers, or "virtual machines," are used by Amazon AWS and other providers of Infrastructure as a Service (IaaS); they lease you your own complete virtual computers, on which you can load and run essentially anything you want.



In contrast, systems like Google App Engine and Microsoft Azure provide you with a complete, isolated, virtual programming platform – Platform as a Service (PaaS). This removes some of the pain of use, like licensing, configuring and maintaining your own copy of an operating system, database system, and so on. However, it restricts you to using their platform, with their choice of programming languages and services.

In addition, there are virtualization technologies that target a point intermediate between IaaS and PaaS, such as the containers implemented in Oracle Solaris, or the WPARs of IBM AIX. These provide independent virtual copies of the operating system within one actual instantiation of the operating system.

The advantages of virtualization apply to all the variations discussed above. And if you feel like stretching your brain, imagine using all of them at the same time. It's perfectly possible: .NET running within a container running on a virtual machine.

Here, however, I will only be discussing hardware virtualization, the implementation of virtual machines as done by VMware and many others. Also, within that area, I am only going to touch lightly on virtualization of input/output functions, primarily to keep this article a reasonable length. So, on we go to the techniques used to virtualize processors and memory.

2 The Goal

The goal of hardware virtualization is to maintain, for all the code running in a virtual machine, the illusion that it is running on its own, private, stand-alone piece of hardware. What a provider is giving you is a lease on your own private computer, after all. "All code" includes all applications, all middleware like databases or LAMP stacks, and, crucially, your own operating system – including the ability to run different operating systems, like Windows and Linux, on the same hardware, simultaneously.

Hence: isolation of virtual machines from each other is key. Each should think it still "owns" all of its own hardware.

The result isn't always perfect. With sufficient diligence, operating system code can figure out that it isn't running on bare metal. Usually, however, that is the case only when specific programming is done with the aim of finding that out.

3 Trap and Map

The basic technique used is often referred to as "trap and map." Imagine you are a thread of computation in a virtual machine, running on one processor of a multiprocessor that is also running other virtual machines.




Fig. 1. Trap and Map

So off you go, pounding away, directly executing instructions on your own processor, running directly on bare hardware. There is no simulation or, at this point, software of any kind involved in what you are doing; you manipulate the real physical registers, use the real physical adders, floating-point units, cache, and so on. You are pounding on cache, playing with pointers, keeping hardware pipelines full, running as fast as the hardware will go. Fast as you can. Until… BAM!

You attempt to execute an instruction that would change the state of the physical machine in a way that would be visible to other virtual machines. (See Figure 1.) Just altering the value in your own register file doesn't do that, and neither does, for example, writing into your own section of memory. That's why you can do such things at full-bore hardware speed. Suppose, however, you attempt to do something like set the real-time clock – the one master real-time clock for the whole physical machine. Having that clock altered out from under other running virtual machines would not be very good at all for their health. You aren't allowed to do things like that.

So, BAM, you trap. You are wrenched out of user mode, or out of supervisor mode, up into a new, higher privilege mode; call it hypervisor mode. There, the hypervisor looks at what you wanted to do – change the real-time clock – and looks in a bag of bits it keeps that holds the description of your virtual machine. In particular, it grabs the value showing the offset between the hardware real-time clock and your real-time clock, alters that offset appropriately, returns the appropriate settings to you, and gives you back control. Then you start running as fast as you can again. If you later read the real-time clock, the analogous sequence happens, adding that stored offset to the value in the hardware real-time clock.

Not every such operation is as simple as computing an offset, of course. For example, a client virtual machine's supervisor attempting to manipulate its virtual memory mapping is a rather more complicated case to deal with, a case that involves maintaining an additional layer of mapping (kept in the bag o' bits): a map from the hardware real memory space to the "virtually real" memory space seen by the client virtual machine. All the mappings involved can be, and are, ultimately collapsed into a single mapping step, so execution directly uses the hardware that performs virtual memory mapping.
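To make that bookkeeping concrete, here is a minimal sketch in Python of the two ideas just described: a per-VM "bag of bits" holding a clock offset, a trap handler that turns a guest's attempt to set the shared clock into a mere offset update, and the collapse of the two mapping layers into the single map the hardware actually walks. All names and numbers are invented for illustration; real hypervisors do this in privileged kernel or firmware code.

```python
import time

class VMState:
    """The hypervisor's per-VM 'bag of bits' (illustrative only)."""
    def __init__(self):
        self.clock_offset = 0.0   # guest clock = hardware clock + offset
        self.guest_map = {}       # guest-"real" page -> host-physical page

def hardware_clock():
    return time.time()            # stands in for the one physical real-time clock

def trap_set_clock(vm, requested_time):
    # Guest tried to set "its" clock: record an offset instead of touching
    # the shared hardware clock that other VMs depend on.
    vm.clock_offset = requested_time - hardware_clock()

def trap_read_clock(vm):
    # Guest reads "its" clock: hardware value plus the stored offset.
    return hardware_clock() + vm.clock_offset

def collapse_mappings(guest_page_table, vm):
    # guest-virtual -> guest-"real" -> host-physical, folded into one step
    # so the hardware MMU performs a single translation at full speed.
    return {gva: vm.guest_map[gpa]
            for gva, gpa in guest_page_table.items() if gpa in vm.guest_map}

vm = VMState()
trap_set_clock(vm, hardware_clock() + 3600)        # guest thinks it is an hour ahead
vm.guest_map = {0x5000: 0x9000, 0x6000: 0xA000}    # assigned by the hypervisor
shadow = collapse_mappings({0x1000: 0x5000, 0x2000: 0x6000}, vm)

print(round(trap_read_clock(vm) - hardware_clock()))   # ~3600, other VMs see no change
print(hex(shadow[0x1000]))                             # 0x9000: one lookup, not two
```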

4 Concerning Efficiency

How often do you BAM? Unhelpfully, this is clearly application dependent. But the answer in practice, setting aside input/output for the moment, is: not often at all. It's usually a small fraction of the total time spent in the supervisor, which itself is usually a small fraction of the total run time. As a coarse guide, think in terms of overhead that is well less than 5%, or in other words, for most purposes, negligible. Programs that are IO intensive can see substantially higher numbers, though, unless you have access to the very latest in hardware virtualization support; then it's negligible again. A little more about that later.

I originally asked you to imagine you were a thread running on one processor of a multiprocessor. What happens when this isn't the case? You could be running on a uniprocessor, or, as is commonly the case, there could be more virtual machines than physical processors or processor hardware threads. For such cases, hypervisors implement a time-slicing scheduler that switches among the virtual machine clients. It's usually not as complex as the schedulers in modern operating systems, but it suffices.

This might be pointed to as a source of overhead: you're only getting a fraction of the whole machine! But assuming we're talking about a commercial server, you were only using 12% or so of it anyway, so that's not a problem. A more serious problem arises when you have less real memory than all the machines need; virtualization does not reduce aggregate memory requirements. But with enough memory, many virtual machines can be hosted on a single physical system with negligible degradation.
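As a toy illustration of that time-slicing idea (not any particular hypervisor's scheduler), a simple round-robin loop is enough to give several virtual machines their turns on one physical core:

```python
from collections import deque

def run_time_slices(vms, slices):
    # Round-robin time slicing: each VM runs for one slice, then goes to
    # the back of the ready queue until its next turn.
    ready = deque(vms)
    for _ in range(slices):
        vm = ready.popleft()
        print(f"dispatching {vm} for one slice")
        ready.append(vm)

run_time_slices(["vm-a", "vm-b", "vm-c"], slices=6)   # 3 VMs sharing 1 core
```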

5 Translate, Trap and Map

The technique described above depends crucially on a hardware feature: the hardware must be able to trap on every instruction that could affect other virtual machines. Prior to the introduction of Intel's and AMD's specific additional hardware virtualization support, that was not true. For example, setting the real-time clock was, in fact, not a trappable instruction. It wasn't even restricted to supervisors. (Note that not all Intel processors have virtualization support today; this is apparently done to segment the market.)

Yet VMware and others did provide, and continue to provide, hardware virtualization on such older systems. How? By using a load-time binary scan and patch. (See Figure 2.) Whenever a section of memory was marked executable – making that marking was, thankfully, trappable – the hypervisor would immediately scan the executable binary for troublesome instructions and replace each one with a trap instruction. In addition, of course, it augmented the bag o' bits for that virtual machine with information saying what each of those traps was supposed to do originally.

Fig. 2. Translate, Trap and Map

Now, many software companies are not fond of the idea of someone else modifying their shipped binaries, and can even get sticky about things like support if that is done. Also, my personal reaction is that this is a horrendous kluge. But it is a necessary kluge, needed to get around hardware deficiencies, and it has proven to work well in thousands, if not millions, of installations. Thankfully, it is not necessary on more recent hardware releases.
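A minimal sketch of that load-time scan-and-patch step, in Python over a toy byte string. The opcode values are stand-ins, not a faithful x86 model: sensitive instructions are overwritten with a trapping one, and the original intent is recorded in the VM's side table.

```python
SENSITIVE = {0xF4, 0xFA}   # stand-ins for instructions that could affect other VMs
TRAP      = 0xCC           # stand-in for an opcode guaranteed to trap to the hypervisor

def patch_executable(code, side_table):
    # Called when a region is marked executable: replace each sensitive
    # instruction and remember what it was supposed to do originally.
    for i, op in enumerate(code):
        if op in SENSITIVE:
            side_table[i] = op   # augment the VM's bag o' bits
            code[i] = TRAP
    return code

region = bytearray([0x90, 0xFA, 0x90, 0xF4])
table = {}
patch_executable(region, table)
print(region.hex(), table)       # '90cc90cc' {1: 250, 3: 244}
```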

6 Paravirtualization

Whether or not the hardware traps all the right things, there is still unavoidable overhead in hardware virtualization. For example, think back to my prior comments about dealing with virtual memory. You can imagine the complex hoops a hypervisor must repeatedly jump through when the operating system in a client machine is setting up its memory map at application startup, or adjusting the working sets of applications by manipulating its map of virtual memory.

One way around overhead like that is to take a long, hard look at how prevalent you expect virtualization to be, and seriously ask: Is this operating system ever really going to run on bare metal? Or will it almost always run under a hypervisor? Some operating system development streams decided the answer to that question is: no bare metal. A hypervisor will always be there. Examples: Linux with the Xen hypervisor, IBM AIX, and of course the IBM mainframe operating system z/OS (no mainframe has been shipped without virtualization since the mid-1980s).




Fig. 3. Paravirtualization

If that's the case, things can be more efficient. If you know a hypervisor is always really behind memory mapping, for example, provide an actual call to the hypervisor to do things that have substantial overhead. For example: don't do your own memory mapping, just ask the hypervisor for a new page of memory when you need it. Don't set the real-time clock yourself, tell the hypervisor directly to do it. (See Figure 3.) This kind of technique has become known as paravirtualization, and can lower the overhead of virtualization significantly. A set of "para-APIs" invoking the hypervisor directly has even been standardized, and is available in Xen, VMware, and other hypervisors.

The concept of paravirtualization actually dates back to around 1973 and the VM operating system developed at the IBM Cambridge Scientific Center. They had the not unreasonable notion that the right way to build a time-sharing system was to give every user his or her own virtual machine, a notion somewhat like today's virtual desktop systems. The operating system run in each of those VMs used paravirtualization, but it wasn't called that back in the Computer Jurassic. Virtualization is, in computer industry terms, a truly ancient art.
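As an illustration of the shape such a para-API might take (the call names are invented, not the real Xen or VMware interface): the guest kernel, knowing a hypervisor is present, asks it directly for a page rather than manipulating page tables and trapping repeatedly.

```python
class Hypervisor:
    def __init__(self, free_frames):
        self.free_frames = list(free_frames)
        self.owner = {}                      # host frame -> owning VM

    def hypercall_alloc_page(self, vm_id):
        # One explicit call replaces a series of trapped map manipulations.
        frame = self.free_frames.pop()
        self.owner[frame] = vm_id
        return frame

    def hypercall_set_clock(self, vm_id, offset_seconds):
        # Likewise for the clock: the guest states its intent directly.
        print(f"{vm_id}: clock offset set to {offset_seconds} seconds")

hv = Hypervisor(free_frames=[0x9000, 0xA000, 0xB000])
frame = hv.hypercall_alloc_page("vm-a")      # called from the guest kernel
hv.hypercall_set_clock("vm-a", 3600)
print(hex(frame), hv.owner)                  # e.g. 0xb000 {45056: 'vm-a'}
```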

7 Drown It In Silicon

In the previous discussion I might have led you to believe that paravirtualization is widely used in mainframes (IBM zSeries and clones). Sorry. That turns out not to be the case. Another method is used.

Consider the example of reading the real-time clock. All that has to happen is that a silly little offset is added. It is perfectly possible to build hardware that adds an offset all by itself, without any "help" from software. So that's what they did. (See Figure 4.) They embedded nearly the whole shooting match directly into silicon. This implies that the bag o' bits I've been glibly referring to becomes part of the hardware architecture: now it's hardware that has to reach in and know where the clock offset resides. What happens with the memory mapping gets, to me anyway, a tad scary in its complexity. But, of course, it can be made to work.

Fig. 4. Do It All in Hardware

Nobody else is willing to invest a pound or so of silicon into doing this. Yet. As Moore's Law keeps providing us with more and more transistors, perhaps at some point the industry will tire of providing even more cores, and spend some of those transistors on something that might actually be immediately usable.

8 A Bit About Input and Output

One reason for all this mainframe talk is that it provides an existence proof: mainframes have been virtualizing IO basically forever, allowing different virtual machines to think they completely own their own IO devices when in fact they're shared. And, of course, it is strongly supported in yet more hardware. A virtual machine can issue an IO operation, have it directed to its address for an IO device (which may not be the "real" address), get the operation performed, and receive a completion interrupt, or an error, all without involving a hypervisor, at full hardware efficiency. So it can be done.

But until very recently, it could not be readily done with PCI and PCIe (PCI Express) IO. Both the IO interface and the IO devices need hardware support for this to work. As a result, IO operations on commodity and RISC systems have been done interpretively, by the hypervisor. This obviously increases overhead significantly. Paravirtualization can clearly help here: just ask the hypervisor to go do the IO directly. However, even with paravirtualization this requires the hypervisor to have its own IO driver set, separate from that of the guest operating systems. This is a redundancy that adds significant bulk to a hypervisor and isn't as reliable as one would like, for the simple reason that no IO driver is ever as reliable as one would like. And reliability is very strongly desired in a hypervisor; errors within it can bring down all the guest systems running under it.

Another thing that can help is direct assignment of devices to guest systems. This gives a guest virtual machine sole ownership of a physical device. Together with hardware support that maps and isolates IO addresses, so a virtual machine can only access the devices it owns, this provides full-speed operation using the guest operating system drivers, with no hypervisor involvement. However, it means you do need dedicated devices for each virtual machine, something that clearly inhibits scaling: imagine 15 virtual servers, all wanting their own physical network card. This support is also not an industry standard.

What we want is some way for a single device to act like multiple virtual devices. Enter the PCI SIG. It has recently released a collection – yes, a collection – of specifications to deal with this issue. I'm not going to attempt to cover them all here. The net effect, however, is that they allow industry-standard creation of IO devices with internal logic that makes them appear as if they are several, separate, "virtual" devices (the SR-IOV and MR-IOV specifications), and add features supporting that concept, such as multiple different IO addresses for each device.

A key point here is that this requires support by the IO device vendors. It cannot be done just by a purveyor of servers and server chipsets. So its adoption will be gated by how soon those vendors roll this technology out, how good a job they do, and how much of a premium they choose to charge for it. I am not especially sanguine about this. We have done too good a job beating a low-cost mantra into too many IO vendors for them to be ready to jump on anything like this, which increases cost without directly improving their marketing numbers (GBs stored, bandwidth, etc.).

9 Conclusion

There is a joke, or a deep truth, expressed by the computer pioneer David Wheeler, co-inventor of the subroutine, as "All problems in computer science can be solved by another level of indirection." Virtualization is not going to prove that false. It is effectively a layer of indirection or abstraction added between physical hardware and the systems running on it. By providing that layer, virtualization enables a collection of benefits that were recognized long ago, benefits that are now being exploited by cloud computing.

In fact, virtualization is so often embedded in cloud computing discussions that many have argued, vehemently, that without virtualization you do not have cloud computing. As explained previously, I don't agree with that statement, especially when "virtualization" is used to mean "hardware virtualization," as it usually is. However, there is no denying that the technology of virtualization makes cloud computing tremendously more economic and manageable.

Virtualization is not magic. It is not even all that complicated in its essence. (Of course its details, like the details of nearly anything, can be mind-boggling.) And despite what might first appear to be the case, it is also efficient; resources are not wasted by using it. There is still a hole to plug in IO virtualization, but solutions there are developing gradually if not necessarily expeditiously.

There are many other aspects of this topic that have not been touched on here, such as where the hypervisor actually resides (on the bare metal? inside an operating system?), the role virtualization can play when migrating between hardware architectures, and the deep relationship that can, and will, exist between virtualization and security. But hopefully this discussion has provided enough background to enable some of you to cut through the marketing hype and the thicket of details that usually accompany most discussions of this topic. Good luck.




Using Private Clouds to Increase Service Availability Levels and Reduce Operational Costs

Narciso Monteiro
Porto, Portugal
narciso.monteiro@iol.pt
http://narcisomonteiro.gamagt.com

Abstract. In an era marked by the so-called technological revolution, computing power has become a critical production factor, much as manpower was in the industrial revolution. The dispersal of that computing power across numerous facilities and sites quickly became unsustainable, both economically and in terms of maintenance and management. This constraint led to the need to adopt new strategies for the provision of information systems that are equally capable but more consolidated, while also increasing the desired levels of the classic set of information assurances: availability, confidentiality, authenticity and integrity. At this point enters a topic not as new as it may seem, virtualization, which after two decades of oblivion presents itself as the best candidate to solve the problem technology faces. Throughout this document it is explained what this technology represents and what benefits are obtained from the evolution of a classical model of systems architecture into a model based on optimization, high availability, redundancy and consolidation of information and communications technology, using the virtualization of networks and systems. A number of good principles to adopt are also considered to ensure the correct deployment of the technology, culminating with the analysis of a case study from the company Metro do Porto, SA.

Keywords: Virtualization, High Availability, Private Cloud

1 Introduction

The term virtualization is commonly seen as an emerging and revolutionary technology, arising in recent years as a capability never seen before and only made possible by the technological revolution of the 2000s. However, this concept was first launched in the 1960s by Christopher Strachey, an Oxford University professor, with the aim of creating a way to take better advantage of computing by sharing time, resources, processes and peripherals. Following this theory, two computers considered essential landmarks in the history of virtualization were born: the Atlas and IBM's M44/44X project. If, on the one hand, the Atlas pioneered the concepts of virtual memory and paging techniques, the M44/44X project was actually the first physical equipment to run multiple virtual machines. The cost and size of the then-current mainframes were IBM's main incentives, as it sought the best way to take advantage of its large investment in hardware, earning it the credit for creating the "virtual machine" concept [1].

Interestingly, perhaps through a lack of vision or long-term planning, in the 80s and 90s the proliferation and drastic reduction in the cost of computer equipment, coupled with the birth of more capable and easier to use operating systems, led to the abandonment of the idea of sharing resources in favor of the massive distribution of computing. What the industry did not realize at the time were the operational costs of managing physically large infrastructures:

− Maintenance contracts
− Logistics and power consumption
− Complexity and diversity of the architecture, and the consequent need for a large number of skilled personnel dedicated solely to maintenance tasks, often manual
− Low usage of the infrastructure's total capacity
− Over-ambitious disaster recovery and high availability scenarios
− Lack of scalability, leading to a continuous need for hardware investment and causing a snowball effect in the above factors

By the 2000s, when the management of these infrastructures had reached an unbearable and limiting point, very similar to that which drove IBM in the 60s (it should be noted that IBM never abandoned the idea and continued research and development in this area [1]), virtualization once again had a word to say, largely thanks to a company called VMware.

2 Virtualization Principles and State of the Art

Virtualization consists of inserting a software layer between the operating system and the underlying hardware, which implements an abstraction of that hardware and makes it available in a controlled manner to the upper layer. This layer is called the hypervisor, a name derived from the original concept of the supervisor, introduced primarily by the Atlas: a component created to manage system processes and the provisioning of resources such as memory and time sharing, separated from the component responsible for the actual execution of applications.

The hypervisor layer can be of two types: type 1 or type 2. In the first case, the virtualization software acts as an operating system, executing directly on the hardware, so that virtual machines operate on the second layer above the hardware (Fig. 1). Naturally, this solution bears the closest resemblance to a real machine and allows better performance of the guest operating systems.




Fig. 1. Hypervisor Type 1

On the other hand, a type 2 hypervisor is unable to operate autonomously and requires another operating system to mediate the communication between the virtualization software and the physical host, placing virtual machines on the third layer above the hardware (Fig. 2). The introduction of an additional layer naturally reduces the smoothness and performance of the interaction between the host and guest systems, but in return it becomes useful in testing and research scenarios, as it allows the intermediate operating system to capture and/or inject low-level virtual machine instructions, typically for diagnosis and simulation purposes.

Fig. 2. Hypervisor Type 2

In both cases, the hypervisor's function consists of making physical resources available to virtual machines in a transparent and secure manner: a sort of invisible mediator that guarantees the virtual abstraction of the hardware to each system and its applications, which are totally unaware of the presence of a logical layer and act as if a whole physical machine were under their control. One of the virtues of this abstraction is the possibility of presenting a device or peripheral that does not actually exist, as well as emulating an interface type (for instance, the hard drive interface technology) that hides the true technological form of the hardware. Even so, there are known limitations to the types of controllers that can be emulated, especially at the graphical level and with more specific or less common devices.

Following the general growth in demand and interest in virtualization, many manufacturers and solutions have appeared on the market, with many flavors and goals. At least 73 different developments have been identified, ranging from proprietary to the most common systems, including the world of workstations. For the virtualization of servers with high demands on performance and availability, three type 1 hypervisor products stand out in the market, having the largest number of implementations: ESX from VMware, Hyper-V from Microsoft, and Citrix's XenServer. Comparing each solution's specifications, ESX appears as the most complete and capable product, which justifies its market leadership [2].

Based on the work already developed by IBM, VMware directed every effort at a big challenge: the virtualization of x86 systems based on the Intel 32-bit architecture. The execution scheme of input/output instructions was not designed for this purpose, causing serious obstacles to the abstraction of resources. This is due to an identified set of instructions that require execution in privileged mode, in other words, direct contact with the hardware instead of communication with an interface between the virtual operating system and the physical host, creating protection exceptions that lead to system blocking. This was a key point, because the technology would not succeed if the stability of the system could not be guaranteed. As the result of a proprietary mechanism to monitor processing, VMware was able to circumvent this constraint and thus successfully implement its conceptual model, becoming a pioneer in x86 virtualization and later the market leader. This mechanism consists of dynamically rewriting key parts of the operating system kernel in order to capture these sensitive instructions and allow their interpretation to be performed by the virtual machine supervisor [6].

3 Creating a Private Cloud through the Implementation of a High Availability Cluster: Key Points and Expected Achievements

This implementation involves the deployment of a private cloud in the critical services infrastructure of Metro do Porto, SA. The company has a classical architecture, which provides good results in terms of availability and, due to the company's recent creation, is expanding at a high rate. It is against this background that the company wants this growth, and its management, to be handled in a sustainable manner and with a clear improvement in service quality. To this end, virtualization presents itself as the technology with the right characteristics and, based on a previous cost comparison study, it is also the most economically advantageous option. The aim is a deployment that achieves higher scalability, lower operation and maintenance costs and an increase in the availability rate, while also improving the data/services backup methodologies. In the end, the results must achieve the following goals:

− At least 99.99% annual average service availability
− Consolidation of the infrastructure management
− Reduction in the annual costs of operation and maintenance
− Capacity growth without hardware investment over the medium term

To support this technology, it was decided to purchase a cluster of three servers with VMware ESX and the high availability module, based on an HP Blade Center platform and using an HP EVA 4400 Storage Area Network (SAN) storage system. This SAN is composed of four drawers with twelve 300GB drives, totaling approximately 12TB of raw space. This capacity was estimated to be able to store all systems and their data and to provide growth opportunities for a year without adding disks. It should be noted that the company, due to its business area, has atypical storage needs, culminating in an average quarterly growth of about 11%. At the processing level, the blade servers are equipped with dual quad-core Intel Xeon 2.83 GHz processors and 32GB of memory. Abstracting these resources in the cluster, a total of 128GB of memory and 24 processing cores are made available. This central unit will initially host 9 standard servers but, with these features, growth in the medium term without any further investment is expected.

Before setting up the cluster, storage volumes must be configured in the SAN according to the provisioning requirements. VMware good practice recommends that no more than 16 virtual machines share the same volume (defined as a datastore), even though this allocation depends on each system's purpose, according to its data type and required I/O capabilities. In this provisioning, it is necessary to take into account that one of the capabilities of virtualization is the possibility of taking system snapshots. This method, as its name suggests, allows a sort of picture to be taken of the binary structure of a file system at a specific point in time. With this resource, it is possible to recover the system to that point, allowing the rollback of upgrade and maintenance tasks. It also makes it possible to copy the main file, as it is unlocked for reading and writing during that period. For this reason, each presented volume should have an excess size of 30%, as recommended by the manufacturer, even though this value may be excessive in situations where storage capacity is not abundant.

The networking architecture must also be defined. The blade center used in this deployment is equipped with four network interface cards and, following security recommendations [5], two of them will be exclusively dedicated to the management network, which includes the service console. It is on this network that inter-virtual-machine traffic occurs and tasks like VMotion are executed, so it requires isolation from the other networks. The other network cards are dedicated, in load-balancing mode, to regular traffic and are connected to the physical network devices. It is also advisable, in order to guard against unusual situations, to set up each pair of network cards as an alternative to the other pair, allowing an additional fault-tolerance scenario.

The process of creating virtual machines is a rather easy step and there are three ways of introducing them to the cluster: starting from scratch, physical-to-virtual conversion, or importing vApps. The physical-to-virtual process, or P2V, is provided by an additional application called VMware Converter. This kind of conversion usually completes without any problem on Windows systems and most Linux releases, as long as they are conveniently prepared. This preparation means the removal of specific drivers which are not automatically recognized by the operating system without additional software (hard locks, special graphics cards, etc.) as well as hardware management products inherent to the physical platform, like Systems Homepage from HP. The Converter is also capable of importing into ESX files created on VMware Server or even by third-party software, like Acronis True Image and Symantec Ghost images.
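Returning to the datastore sizing guidance given earlier in this section, a rough sizing helper could look like the following sketch. It assumes only the two rules quoted above, at most 16 virtual machines per datastore and roughly 30% headroom for snapshots; the VM disk figures are illustrative and not taken from the case study.

```python
MAX_VMS_PER_DATASTORE = 16   # VMware good-practice limit quoted above
SNAPSHOT_HEADROOM = 0.30     # manufacturer-recommended excess size

def datastores_needed(num_vms):
    # Ceiling division: how many datastores for this many VMs.
    return -(-num_vms // MAX_VMS_PER_DATASTORE)

def datastore_size_gb(vm_disk_sizes_gb):
    # Raw VM disk usage plus the 30% snapshot headroom.
    raw = sum(vm_disk_sizes_gb)
    return raw * (1 + SNAPSHOT_HEADROOM)

print(datastores_needed(9))                    # 1 datastore suffices for the initial 9 servers
print(datastore_size_gb([80, 80, 120, 200]))   # 624.0 GB including snapshot headroom
```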




Before considering the cluster fully implemented, there is another security recommendation that should be taken into account. It is highly advisable to use the native ESX functionality that restricts the resources used by each virtual machine, in order to prevent extraordinary situations that could affect the other platforms. This process requires some learning time, so the most effective option is to define limits that seem acceptable and in accordance with the characteristics assigned to each service. Within the normal course of production, it becomes easier to understand the needs of each system and thus adjust the restrictions to a more appropriate level.

This kind of migration process, when performed with a comprehensive study of needs and with the implementation scheme properly defined and planned, has every condition to run as smoothly as in this case study, where no abnormal situation occurred and the commitment to carry it out without any impact on users was fully met. After reaching the optimum configuration, the implementation is considered complete and outcome indicators can be collected, allowing a comparative analysis of results.

3.1 Outcome Assessment Indicators

The process of evaluating outcomes consists of a direct comparison between the values observed before and after the conversion of the infrastructure. Naturally, such data is only meaningful if it has been collected over an acceptable period of time, so that results can effectively be assessed over the medium term. In this case, figures are available for the classical model since the first half of 2007, so it is easy to compute an average over a reasonable period. For the virtual infrastructure, however, given its recent implementation, the period under review is just the first quarter of 2010. Although the implementation occurred earlier, this is the period in which all the adjustments that are part of the learning process were considered to have reached a stable point, so it is only legitimate to compare data from this state onwards.

Using the reporting module of the application the company uses for monitoring systems and services, data can be exported in spreadsheet format for a given period of time. These values include, among others, the response time and service status, which allow the availability observed in that period to be evaluated. This evaluation is performed by considering a service as available whenever it is running within a given range of performance values.

3.1.1 Before Virtualization

Before this process, every new service required the acquisition of new hardware and its corresponding maintenance contract. This also meant that every added service would increase the data center cooling and power needs, while adding another point of failure to the infrastructure. Note that this need for hardware is not just the acquisition of the server itself: each new piece of equipment requires fiber optic connections to the SAN and to the backup devices (cables and interface cards), along with Ethernet connections and the corresponding switch ports. Altogether, this increased the infrastructure's management complexity and dispersion, making it harder to keep under control.

As for service availability, the values obtained in the three years before virtualization were as follows. In 2007, the first full year of operation of the Office of Information Systems, an average availability rate of 99.95% was achieved, representing a total of approximately 1836 minutes without service, as shown in Table 1.

Table 1. Availability data for 2007

Servers - January 01, 2007 00:00:00 - December 31, 2007 00:00:00

Service                           Unavailability time (minutes)   Availability (%)
Domain Controller 1                                     1253.15             99.73
Domain Controller 2                                       80.10             99.98
E-mail Service                                            30.02             99.99
ERP                                                       40.01             99.99
Maintenance Service                                       20.44             99.98
File Server                                               80.04             99.98
Print Server                                              50.63             99.99
Geographical Information System                          220.92             99.95
Webmail                                                   60.36             99.99
Total                                                   1835.67             99.95

The following year, due to some technological changes and various unexpected equipment failures, there was a decline to 99.83%, totaling about 6503 minutes of accumulated downtime (Table 2).

Table 2. Availability data for 2008

Servers - January 01, 2008 00:00:00 - December 31, 2008 00:00:00

Service                           Unavailability time (minutes)   Availability (%)
Domain Controller 1                                      352.54             99.92
Domain Controller 2                                      810.82             99.81
E-mail Service                                          1456.91             99.67
ERP                                                      472.42             99.89
Maintenance Service                                       80.05             99.98
File Server                                               80.06             99.98
Print Server                                             133.89             99.97
Geographical Information System                         1558.21             99.64
Webmail                                                 1558.07             99.64
Total                                                   6502.97             99.83




In 2009, the year in which the infrastructure was converted, there was an improvement on the previous year, though still below the desired levels. The rate was 99.90%, which represents about 4420 total minutes without service (Table 3).

Table 3. Availability data for 2009

Servers - January 01, 2009 00:00:00 - December 31, 2009 23:30:00

Service                           Unavailability time (minutes)   Availability (%)
Domain Controller 1                                      270.81             99.95
Domain Controller 2                                      250.75             99.95
E-mail Service                                           601.19             99.88
ERP                                                      410.89             99.92
Maintenance Service                                      711.74             99.86
File Server                                              790.98             99.84
Print Server                                             570.89             99.89
Geographical Information System                          180.84             99.96
Webmail                                                  631.71             99.87
Total                                                   4419.80             99.90

3.1.2 After Virtualization

As indicated, the values for this scenario are still premature for a medium-term contrast. However, if the present situation continues and these data are extrapolated to the rest of the year, the outlook is fairly encouraging. In the first quarter of 2010, the much sought-after rate of 99.99% was obtained, which culminates in less than 153 total minutes without service (Table 4).

Table 4. Availability data for 2010

Servers - January 01, 2010 00:00:00 - March 31, 2010 23:30:00

Service                           Unavailability time (minutes)   Availability (%)
File Server                                               70.40             99.95
Print Server                                              10.01             99.99
ERP                                                        0.00            100.00
Domain Controller 1                                        0.33            100.00
Domain Controller 2                                       10.34             99.99
E-mail Service                                            20.33             99.98
Maintenance                                               10.34             99.99
Geographical Information System                           20.34             99.98
Webmail                                                   10.33             99.99
Total                                                    152.42             99.99
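As an illustration of how the percentages in these tables can be related to the unavailability minutes (this is an assumption about the calculation; the exact counting window used by the company's monitoring tool is not described), per-service availability can be taken as one minus the ratio of downtime to minutes in the period, with the bottom line being the average across services. Using the Table 4 figures:

```python
# Minutes in the first quarter of 2010 (January 1 to March 31).
MINUTES_Q1_2010 = 90 * 24 * 60

downtime = {                           # unavailability in minutes, per service (Table 4)
    "File Server": 70.4, "Print Server": 10.01, "ERP": 0.0,
    "Domain Controller 1": 0.33, "Domain Controller 2": 10.34,
    "E-mail Service": 20.33, "Maintenance": 10.34,
    "Geographical Information System": 20.34, "Webmail": 10.33,
}

# Availability per service, then the overall average across services.
availability = {s: 100 * (1 - m / MINUTES_Q1_2010) for s, m in downtime.items()}
overall = sum(availability.values()) / len(availability)
print(f"{overall:.2f}%")               # ~99.99%, consistent with the reported rate
```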



Values of this kind are only possible thanks to the ability the IT staff gained to perform maintenance tasks without any downtime window, as the result of a cluster that can migrate and change the characteristics of services without interruption. In addition to these data, it should be noted that no subsequent purchase of equipment was made and five new services were added, with a few more already in line to enter production. Nine physical servers were decommissioned, along with their corresponding maintenance contracts, and the power consumption of the data center was reduced by 42%.

With this type of solution, IT management gains a new, consolidated way of dealing with the infrastructure, not only because of its ease of use but also because most maintenance tasks can now be performed at any time, without the need for hard and stressful overtime interventions, sometimes just to perform small operations. Every time one of the nodes is placed in maintenance mode (or when it actually fails), ESX automatically evacuates the virtual machines it contains to the other nodes, informing the high availability module that there is an unavailable node so that the whole resource provisioning plan can be reorganized. All of this is done without any service disruption. It is a solution of this nature that makes it possible to establish an official SLA that can actually be met without the need for an incredibly high budget.
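The evacuation behaviour just described can be pictured with the following sketch (purely illustrative; this is not VMware's placement algorithm and the host and VM names are invented): when a node enters maintenance mode or fails, its virtual machines are redistributed across the remaining nodes, here by always choosing the least-loaded host.

```python
cluster = {
    "esx-1": ["dc1", "erp", "mail"],
    "esx-2": ["dc2", "fileserver"],
    "esx-3": ["print", "gis", "webmail", "maintenance"],
}

def evacuate(cluster, node):
    # Remove the node from service and re-home each of its VMs on the
    # remaining host with the fewest VMs.
    evicted = cluster.pop(node)
    for vm in evicted:
        target = min(cluster, key=lambda n: len(cluster[n]))
        cluster[target].append(vm)
    return cluster

print(evacuate(cluster, "esx-1"))   # esx-1's VMs now run on esx-2 / esx-3
```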

4 Conclusion

Desirably, this work will provide a basis for a better understanding of the sustainability problem that the world of IT faces. The rapid changes that professionals in this area confront every day make their speed of adaptation and their ability to respond to these challenges key success factors. And as all the parties involved become more aware of the need to align the business with its information systems, the idea of supplying large-scale, uninterrupted and high-quality IT services becomes ever more critical. The focus must necessarily shift from the technology itself to the results it produces.

As Nascimento notes in his notable work on the changes that information systems professionals have faced over recent years, the informatics department is no longer an area that independently defines and implements working methods but instead an area that, just like every other department in an organization, works in a collaborative way. In this scenario, IT is only the vehicle that provides these services to the upper layer, the information systems, which in turn supply essential data for development and decision making. Supplying these services is unarguably a complex task that requires specialized professionals and high-level practices, but the emphasis must not be put on the vehicle instead of the data it produces.

Ideally, there will come a time when accessing a service will be like turning on a light. Everyone expects it to work without questioning the method or the means by which the energy is supplied. Its existence and purpose will be taken as given, and its unavailability will be unacceptable [3]. Just as virtualization abstracts resources and makes them available to services in a controlled manner, so IT is moving towards the creation of an abstract cloud whose working details only its managers and technicians understand, leaving space for the clients of that cloud to benefit from it in a correct and continuous way, with a single concern: productivity.




References

1. Dittner, R., Rule, D. (2007) "The Best Damn Server Virtualization Book Period". Syngress Publishing, Inc., Elsevier, Inc.
2. IT 2.0: Next Generation IT Infrastructures. [Online]. Available: http://www.it20.info/misc/virtualizationscomparison.htm [Accessed May 2010]
3. Nascimento, J. C. (2006) "Gestão de Sistemas de Informação e os Seus Profissionais". FCA
4. Oltsik, J. (2009) "The New Security Management Model". Enterprise Strategy Group
5. VMware (2009) "Network Segmentation in Virtualized Environments"
6. Virtualization Basics: History of Virtualization. [Online]. Available: http://www.vmware.com/virtualization/history.html [Accessed December 2009]



CERN's Virtual Batch Farm

Tony Cass, Sebastien Goasguen, Belmiro Moreira, Ewan Roche, Ulrich Schwickerath, Romain Wartel
European Organization for Nuclear Research, CERN CH-1211, Geneva, Switzerland

[The body of this paper was not recoverable from the source text; only page headers and affiliation markers survived extraction.]



GRID, PaaS for e-science

J. Gomes, G. Borges, M. David
Laboratório de Instrumentação em Física Experimental de Partículas, Lisboa, Portugal
jorge@lip.pt

Abstract. Grid computing shares many of the characteristics of a platform as a service cloud. Still, grid computing has remained confined to large scientific communities that need access to vast amounts of distributed resources, while cloud computing is gaining adoption and emerging as a more flexible way to use remote computing resources. In this paper we highlight some of the achievements and shortcomings of grid computing and provide some insights as to how they can be improved through the combined use of infrastructure as a service clouds.

Keywords: Cloud Computing, Grid Computing, PaaS

1 Introduction

The grid computing paradigm is widely used to address the needs of demanding computational applications in the scientific research domain. Europe has been very successful in the development of grid technologies applied to e-science, making possible the deployment of large production infrastructures such as the one operated by the project Enabling Grids for E-science (EGEE) [1]. The EGEE project enabled the integration of computing resources across Europe and elsewhere, creating the largest multidisciplinary grid infrastructure for scientific computing worldwide. The infrastructure was built using the gLite [2] grid middleware, which made possible the integration of computing clusters in distinct geographic locations.

However, the adoption of grid computing by smaller and less organized research communities has been slow. Low flexibility, high complexity and inadequate business models are often mentioned as barriers to the adoption of grid technologies. Cloud computing is a possible approach to complement grid computing and address some of its current limitations. It has the potential to provide a more flexible environment for grid computing and other paradigms, enabling greater elasticity and better optimization of the underlying infrastructures.

2 Grid and cloud computing

Cloud computing is based on the concept that user needs can be satisfied by the provisioning of remote computing services through the Internet. The user can instantiate and use these services without caring about the infrastructure behind them. Grid computing and cloud computing have much in common; in fact, grid computing can be seen as a Platform as a Service (PaaS) cloud. A grid infrastructure is a platform to manage data and execute processing jobs. The users access the resources through the Internet regardless of their location or nature and without having to worry about what is behind them. The technologies that support grid computing and cloud computing interfaces are in many respects similar.

For the end user, the biggest conceptual difference is that in grid computing the infrastructure architecture and details are more open, while cloud computing infrastructures tend to be a black box where the underlying details are much more hidden. Since the first clouds appeared as commercial services, it was natural that the architectures and software behind them were closed and proprietary. As interest grew, open implementations appeared that made possible the deployment of private clouds. In contrast, grid computing was born in the academic domain and as such its architectures and implementations were open from the beginning.

Although PaaS clouds are emerging, what has made cloud computing gain momentum are the Infrastructure as a Service (IaaS) clouds, in which users can instantiate and access remote computer systems usually provided as virtual machines. This type of service is highly flexible. Users can gain immediate access to virtual machines that can be tailored and customized to perform whatever task is needed. The user has access to the operating system and is no longer restricted to specific software interfaces as in the grid or in PaaS clouds. Finally, users can instantiate more resources as needed.

In architectural terms, grid computing sits on top of a layer constituted by physical processing clusters and storage systems. It is conceivable that this layer can be replaced by a virtual one composed of resources provided by IaaS clouds. In this way the advantages of both paradigms could be combined and fully exploited.

3 The grid achievements

Grid computing provides a software layer between the user and the actual computing resources. By implementing common interfaces the grid middleware can hide the specificities of each resource (processing, storage, instruments) and promote their integration under a unified computing infrastructure. In this way the users can have transparent access to a wider range of resources regardless of their location, ownership and characteristics. The grid middleware effectively simplifies the access to distributed resources making possible their combined use to solve complex and demanding computational problems. The grid middleware can also facilitate the data management in data intensive applications. The gLite middleware can keep track of files, manage data replication, schedule data transfers and provide transparent and efficient access to many types of storage systems.




Grid computing has attained many achievements. The standards effort organized around the Open Grid Forum (OGF) [3] contributed to interoperability among different grid middleware stacks and to better user and programming interfaces. The development of sophisticated data and resource management capabilities has made possible a high degree of efficiency in distributed computing. The International Grid Trust Federation (IGTF) [4] created a global authentication domain for grid users and services based on national certification authorities. Common usage and security policies contributed to establishing clear responsibilities and promoting trust between users, resource providers and infrastructures. The developments around the virtual organization concept enabled the creation of structured user communities that share common resources, in which users may have different roles and responsibilities and be further structured into groups, with excellent access-rights granularity. The Géant [5] European academic network and the national research and education networks (NREN) made possible the deployment of high-performance international scientific computing grids supporting distributed processing for data-intensive applications at an unprecedented scale.

Grid computing has become a vital tool for many research communities that rely on it as an integration technology to unify and provide seamless access to their computing resources. Consequently, a sustainable model for grid computing infrastructures in Europe was needed to ensure the long-term availability of the relevant technologies and services. A two-layer model was defined, with National Grid Initiatives (NGI) in each country supported by the governments, and the European Grid Initiative (EGI) [6], a body established to unify the NGIs and operate a pan-European grid. EGI is now taking over the role of EGEE, ensuring a smooth transition and the continuous growth of the European grid infrastructure.

Most national grid initiatives also encompass other distributed computing technologies as a complement or alternative to traditional grids. In this context there is a growing interest in cloud computing by both the NGIs and the communities that they serve. This interest extends to the European Grid Initiative, which in its EGI-InSPIRE [7] project highlights cloud computing as one of the new distributed computing technologies that it seeks to integrate. EGI will establish a roadmap for how clouds and virtualization can be integrated into EGI, exploring not only the technology issues but also the total cost of ownership under multiple scenarios. This trend is not completely new, as the EGEE project had already started some exploratory work on the provisioning of gLite services on top of clouds within the context of the RESERVOIR [8] project and the StratusLab [9] collaboration. Europe now has an excellent framework for distributed computing that can potentially be exploited by several paradigms and technologies.

4 Grid user communities

The grid has been very successful and effective for many user communities. Some communities, such as High Energy Physics, match the grid computing paradigm perfectly. Some of the ideal grid community properties are: a very structured and well organized community, a large user base, geographically dispersed users and resources, good technical skills, shared common goals and common data, huge processing and data management requirements, and the will to share and collaborate to achieve common goals. In short, an evident motivation and a reward for sharing computing resources within the community is needed. However many communities do not match these properties so well. Small communities may not have the necessary technical skills and resources. Communities that are not structured will have more difficulty adhering to the virtual organizations model. Communities with fierce competition or without a tradition of cooperation lack the will to share. Communities owning some computing capacity but having isolated peaks of load may find the grid too much of an effort for their needs. These communities have a low motivation for grid computing. Their needs can be better satisfied through the use of other paradigms.

5 Grid business model

Scientific grids are based on the virtual organizations model. In this model users organize themselves and create virtual organizations (VOs), which are basically user communities that share their own resources. The grid infrastructure can therefore be seen as a bus into which the computing resources and the virtual organizations are plugged so that resource sharing can happen. This model is ideal for distributed user communities that have their own resources and want to share them to achieve some common goal. However it does not promote resource sharing outside of the VO boundaries, because there is no compensation or reward for sharing resources with other VOs with which there are no common goals. An economic model promoting compensation for resource sharing is missing. This issue extends to the computing resource owners, who, in the absence of local users pushing for grid integration, have no motivation to share their resources in a grid infrastructure. Even if they have such motivation, they tend to commit and share the fewest possible resources. As an extreme consequence, the VOs become isolated islands and there is no resource sharing outside of their scope. This behavior tremendously reduces the elasticity of grid infrastructures. Consequently, although some VOs may have idle resources, others that could benefit from this capacity will not be able to access it. Although the aggregated grid capacity can be large, small user communities that do not own computing resources may not benefit from joining the grid. Cloud computing offers a more generic solution that fits a much wider range of user requirements. Therefore cloud computing has the potential to attract more users and resource owners that could share their capacity through a scientific cloud. A large pool of cloud resources could be built by joining resources from academic and research organizations, leading to a better optimization of the installed capacity. This capacity could then be exploited for grid computing and other uses.


6 Grid adoption barriers

From user feedback and experience we identified a number of limitations that constitute barriers to the adoption of the current grid computing technologies:

– Mostly oriented to batch processing
– Limited support for HPC applications
– Steep learning curve
– Hard to deploy, maintain and operate
– May require considerable human resources
– Creation of new VOs is a heavy task
– First time user induction is hard
– Several middleware stacks without full interoperability
– Not very user friendly
– Reduced range of supported operating systems
– Too heavy for small sites
– Too heavy for users without very large processing requirements

Many improvements addressing these and other concerns have been introduced to simplify the deployment and use of the current infrastructures. Nevertheless many of these issues are still problematic. We believe this is one of the reasons why the number of active VOs in infrastructures such as EGEE is leveling off. Further simplification of the processes and technology may expand the grid to new communities. Still, there will always be users and applications that do not match the grid computing model well. For these, cloud computing or other computing paradigms can provide a better solution.

7 Combining clouds and grids

Several scenarios for the combined use of IaaS clouds and grids are being studied. Notably, the StratusLab [9] collaboration has tested several methods for running grids on top of clouds. Here we provide an overview of some models, their advantages and disadvantages, in the scope of scientific computing infrastructures.

7.1 Partially clouded grid

The elasticity of grid infrastructures can be increased using one or more clouds. When the grid resources become saturated, clouds could be used to provide additional computing systems to run grid processing nodes. The grid would expand and run on top of the cloud infrastructures. This model is economically appealing because the capacity of the native grid resources could be minimized and dimensioned to sustain only the typical load scenarios, with the cloud used to sustain the usage peaks. This model would also minimize cloud usage costs and dependency, and as such can be more safely used with commercial clouds. It can be applied both to expand the capacity of individual grid computing sites, making them elastic, and to create virtual grid computing sites that would be nothing more than front ends to computing nodes running on top of clouds. In figure 1 a grid site composed of a computing element (CE) giving access to a computing cluster composed of worker nodes (WN) can be expanded by joining additional worker nodes instantiated from cloud providers. Optionally, all worker nodes could be instantiated from cloud providers.

Fig. 1. Expanding grid sites to the cloud.
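The expansion policy sketched in Figure 1 can be summarised as a simple control loop; the thresholds, function names and provider calls below are illustrative assumptions for a sketch, not an actual grid or cloud API.

# Sketch of the "partially clouded grid" policy: expand a grid site with
# cloud-provisioned worker nodes at peak load, shrink it afterwards.
# All callables passed in are placeholders for site-specific integrations
# (batch system queries, cloud provider API calls, CE registration).

MAX_WAITING_JOBS = 100   # assumed threshold: queue depth that triggers expansion
MIN_WAITING_JOBS = 10    # assumed threshold: queue depth that allows shrinking
CLOUD_WN_BATCH = 5       # worker nodes added per expansion step

cloud_workers = []       # identifiers of worker nodes currently leased

def scale_site(waiting_jobs, start_instances, terminate_instance, register_wn, deregister_wn):
    """One iteration of the elasticity loop."""
    if waiting_jobs > MAX_WAITING_JOBS:
        for vm_id in start_instances(CLOUD_WN_BATCH):   # lease VMs from the cloud
            register_wn(vm_id)                          # attach them to the CE / batch system
            cloud_workers.append(vm_id)
    elif waiting_jobs < MIN_WAITING_JOBS and cloud_workers:
        vm_id = cloud_workers.pop()
        deregister_wn(vm_id)                            # drain and remove from the batch system
        terminate_instance(vm_id)                       # stop paying for the instance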

7.2 Fully clouded grid

A grid fully based on cloud services, where the whole infrastructure would be running on top of clouds, could be used to provide a fully dynamic allocation of resources. There can be several advantages in basing the full grid infrastructure on clouds: no need for a physical infrastructure, all management efforts can be focused on running the grid service, easier deployment without the trouble of installing physical machines, and possibly load balancing and higher resiliency if supported by the cloud service provider. For sustained high usage this model may become very expensive when implemented on top of commercial clouds, so it requires a careful estimation of the expected usage and related costs.
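Such an estimate can be reduced to comparing the amortised cost per busy hour of owned capacity with the hourly price of leased instances; every figure in the sketch below is a hypothetical placeholder, not a quoted price.

# Back-of-envelope comparison for the "fully clouded grid" decision.
# All numbers are hypothetical placeholders used only for illustration.
owned_server_cost = 4000.0    # purchase price per server (EUR)
owned_lifetime_h = 3 * 8760   # three years of wall-clock hours
owned_overhead = 1.5          # power, cooling and staff as a multiplier

cloud_price_per_h = 0.20      # leased instance price (EUR/hour)
utilisation = 0.60            # fraction of time the capacity is actually busy

owned_cost_per_busy_hour = owned_server_cost * owned_overhead / (owned_lifetime_h * utilisation)
cloud_cost_per_busy_hour = cloud_price_per_h  # pay only for the hours actually used

print(f"owned: {owned_cost_per_busy_hour:.3f} EUR/h, cloud: {cloud_cost_per_busy_hour:.3f} EUR/h")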

8 Potential issues for scientific clouds

For scientific computing the described models present the following additional concerns. For data intensive applications the network bandwidth between the computing nodes, the storage systems and the users is extremely important. In Europe, grids such as EGEE were built on top of Géant, the European academic and research backbone, possibly the world's highest-performing network backbone. Commercial clouds are outside of the Géant backbone and therefore must be accessed through the commercial Internet, with much higher costs and less available bandwidth. We believe that for data intensive applications the computing and storage resources must be inside the Géant network or directly accessible without additional costs. Data storage is also a concern for data intensive applications. For efficiency reasons the scientific data must be stored near the processing nodes, and for very large storage requirements the commercial clouds may provide neither the required capacity nor a competitive price. Even if they do, there are concerns regarding data privacy, data availability, and long term data access.


High performance computing applications, namely parallel applications, frequently require low latency communication and parallel file systems. These applications may need specialized low latency hardware interconnects and software setups that are not commonly available in cloud infrastructures. Even when the application latency requirements can be satisfied with Ethernet networks, the use of virtual machines considerably increases the communication latency, making it unacceptable for many parallel computing applications. The clouds currently available may only be usable for high throughput computing applications. The black box approach of the commercial cloud providers prevents a full understanding of the architecture and scalability of these cloud services, which raises concerns given the complexity of the scientific application requirements. The lack of standards among cloud providers reduces market competitiveness and increases the fear of dependence on a specific provider (vendor lock-in). Furthermore, it limits interoperability and increases the application porting effort. The service level agreements offered by most cloud providers fall short of providing enough confidence in the availability and reliability of the services. For the scientific community, financial sustainability is also a concern. For long term projects the payment of commercial cloud services would depend on sustained funding over the years, which in many countries is not easy to ensure. A project may receive considerable funding for the first year but be severely cut in the next. In this context, buying hardware when there is funding can be a better solution. But frequently the reverse also happens, when projects buy hardware but do not receive sustained funding to pay for its maintenance and operation. Finally, there are legal concerns in depending on a commercial cloud provider, such as what happens to the data stored with a cloud provider that closes.

9 Conclusions

Many factors have to be weighed when deciding between building and operating a computing infrastructure or buying the service from a cloud provider. Independently of the cost issues, many of these aspects suggest that commercial clouds may not be the ideal solution for the scientific research community, and that scientific clouds operated by the scientific community may provide a more adequate service. In a first approach, commercial clouds may only be feasible for high throughput computing applications without large data processing requirements. In many countries the national grid initiatives have been deploying computing facilities that, although initially intended for grid computing, are also suitable for other types of distributed computing. The availability of computing resources made possible by the grid can now be used to promote other types of usage, such as cloud computing, to complement or maximize the return on these investments. An interesting solution for the scientific community would be the creation of scientific clouds on top of the existing computing resources operated by the national grid initiatives. These clouds could be used to provide grid computing, cloud computing and other types of services, thus covering a wider range of users and needs. Applications that do not fit well the present grid computing models could be run directly on the cloud resources, or the users could themselves deploy in the cloud whatever middleware they consider more adequate for their applications. Users would have the power to choose and also take care of their specific needs themselves. In addition, services such as databases, repositories, web servers and others that are not adequately supported in grid computing could be deployed in the cloud. Since cloud computing provides a more generic approach suitable for a wider range of requirements, it can be more widely accepted and adopted than grid computing. Organizations that do not see a benefit in grid computing could become more easily interested in joining their resources to a scientific cloud infrastructure. By increasing the resources available in the cloud, the potential universe of computing resources for the grids running on top of it also increases, and as such grid computing would benefit from the increased capacity offered by the clouds. This approach could be complemented by an economic model that would grant the organizations sharing resources some processing time in other cloud sites, thus enabling organizations to get something in return for the idle time that they share. These scenarios show that infrastructures mixing cloud computing and grid computing can be mutually beneficial, and that grid computing can profit from the flexibility and elasticity of cloud technologies.

References

1. Enabling Grids for E-SciencE. Web site: http://www.eu-egee.org/
2. Lightweight Middleware for Grid Computing. Web site: http://glite.web.cern.ch/glite/
3. Open Grid Forum. Web site: http://www.ogf.org/
4. The International Grid Trust Federation. Web site: http://www.igtf.net/
5. Pan-European data network for research and education. Web site: http://www.geant.net/
6. European Grid Initiative. Web site: http://web.eu-egi.eu/
7. Integrated Sustainable Pan-European Infrastructure for Researchers in Europe. Web site: http://www.egi.eu/
8. Resources and Services Virtualization without Barriers. Web site: http://www.reservoir-fp7.eu/
9. Enhancing Grid Infrastructures with Cloud Computing. Web site: http://www.stratuslab.org/


Privacy for Google Docs: Implementing a Transparent Encryption Layer

Lilian Adkinson-Orellana1, Daniel A. Rodríguez-Silva1, Felipe Gil-Castiñeira2, Juan C. Burguillo-Rial2

1 GRADIANT, R&D Centre in Advanced Telecommunications, Lagoas-Marcosende s/n, 36310, Vigo, Spain {ladkinson, darguez}@gradiant.org

2 Engineering Telematics Department, Universidade de Vigo, C/ Maxwell, s/n, Campus Universitario de Vigo, 36310, Vigo, Spain {xil, jrial}@det.uvigo.es

Abstract. Cloud Computing is emerging as a mainstream technology thanks to the cost savings it provides in deployment, installation, configuration and maintenance. But not everything is positive in this new scenario: users' (or companies') confidential information is now stored on servers possibly located in foreign countries and under the control of other companies acting as infrastructure providers, so its security and privacy can be compromised. This fact discourages companies and users from adopting new solutions implemented following the Cloud Computing paradigm. In this paper we propose a solution for this problem. We have conceived a new transparent user layer for Google Docs, and implemented it as a Firefox add-on, which encrypts the information before storing it on Google servers, making it virtually impossible to access the information without the right password. Keywords: cloud computing, google docs, security, privacy, firefox add-on.

1 Introduction

The continuous evolution of Information Technologies (IT) and the lower cost of servers and desktop PCs (which are becoming more and more powerful) is promoting the emergence of new IT services. Among them stands out the Cloud Computing (or simply "Cloud") paradigm. We can say that Cloud Computing was born as the evolution and combination of several technologies, mainly distributed computing [1], distributed storage [2] and virtualization [3]. Cloud Computing implies a change in the traditional paradigms, basically because the infrastructure is completely hidden from the final user. In Cloud Computing we can find three levels or layers, as shown in Fig. 1.


Fig. 1. Cloud Computing levels: Infrastructure, Platform and Software as a Service.

IaaS (Infrastructure as a Service) is the lower level and includes infrastructure services, i.e. (virtual) machines used to run applications. We can find examples of IaaS in Amazon EC2, GoGrid or RackSpace. PaaS (Platform as a Service) is the next abstraction level. Here we can find a platform that allows developers to build applications following a specific API. Examples of PaaS are Google App Engine, Microsoft Azure or Salesforce. SaaS (Software as a Service) is the highest level and involves applications offered as a service that are executed in the Cloud (over a PaaS or an IaaS). Examples of SaaS are Google Apps, Salesforce or Zoho. Some of the specific advantages of using cloud services are scalability, ubiquity, pay-per-use and no hardware/maintenance investment; however, there are some problems related to integration with current systems and especially with the security and reliability of the service [4]. As well known examples of Cloud Computing applications (belonging to the SaaS level) we may cite Google's Gmail email service or Google Docs, a web based editor for text documents and spreadsheets that offers its service to users who have a Google account. This paper explains the development of an add-on for the Firefox browser [5], which allows users of the Google Docs service to use a security layer to protect their documents in a transparent way. This paper is organized as follows. Section 2 describes in more depth the problem of privacy in Cloud Computing, as well as the services that offer document editing on the Cloud, with particular emphasis on Google Docs. Section 3 describes the functionality and the internal structure of the presented add-on, as well as an example of the requirements and the behavior for users. Finally, section 4 gives some conclusions and raises possible future enhancements to extend the add-on functionality, explaining the current difficulties in implementing them.


2 Privacy in Cloud Computing

2.1 Cloud Privacy

Security in the Cloud Computing paradigm does not only include aspects of confidentiality or privacy of the information; it can also involve the loss of data, although that is out of the scope of this paper. Since the processing of applications is moved to the cloud servers, sensitive user data is exposed to the infrastructure provider. This means that users must trust the providers; nevertheless this is not always feasible, so some security mechanisms are required to solve the problem. The case where users store sensitive data remotely is especially critical, because if the cloud servers containing that information suffered an attack, users' data would be compromised. For example, [6] explores information leakage in third-party clouds (Amazon EC2) and describes how it is possible, under specific circumstances, to access the information of a cloud server (a virtual machine) from a different virtual machine hosted on the same physical server. In the case of a service like Google Docs, the documents of the user are simply protected by the password associated with his Google account. If the session is not properly closed or the password is stolen, all the documents kept using this service would be exposed.

2.2 Document editing cloud solutions (SaaS)

Currently, there are many SaaS solutions that offer the possibility of editing documents on the Cloud. In Table 1 some known solutions are compared taking into account their main features. The table shows a representative set of solutions, most of them free, but many more web based editors with characteristics similar to those described in the table can be found, mainly because Cloud software solutions are becoming more and more common. For example, OpenOffice offers an online version too, which is very similar to the desktop application, but it is still a beta version. Many people use this kind of software for free, and this means that they have to be careful with the sensitive information they are storing in the Cloud.


Table 1. Comparison among different Cloud office applications.

                        Max. document size  Max. storage  Price          Real time collaboration  Edit uploaded documents  Type of documents
Google Docs             500K                1 GB          free           Yes                      Yes                      Text, Spreadsheets, Presentations
Zoho                    -                   1 GB          free           No                       Yes                      Text, Spreadsheets, Presentations
Microsoft Office Live   25 MB               5 GB          free           No                       No                       Text, Spreadsheets, Presentations
ThinkFree               10 MB               1 GB          30 days trial  Yes                      -                        Text, Spreadsheets, Presentations
Feng Office             -                   300 MB        30 days trial  Yes                      -                        Text, Spreadsheets, Presentations
Adobe BuzzWord          10 MB               -             free           Yes                      No                       Text

The reason why we have chosen Google Docs is that it is a very popular and free service, with a complete and well-documented API. The use of this API simplifies the development of possible extensions. In addition, it has a simple interface that allows users to make changes in real time on shared documents.

3 Security layer to protect Google Docs documents

3.1 Firefox add-on to protect Google Docs documents

The security layer we have implemented to add privacy to Google Docs documents relies on a Firefox add-on based on JavaScript [7] and XUL [8], a language similar to XML used to create Firefox extensions. The add-on uses two hidden documents created using the Google Docs API, which contain all the information needed to encrypt and decrypt the user's information. One of the documents contains the data about the user's ciphered documents (algorithm used, password and encryption options, if necessary). The other one maintains the same information, but only about the documents that are currently being shared. When the add-on is enabled, it starts an asynchronous communication with the Google Docs servers using the API, sending AJAX requests to authenticate the user and get data about all the documents owned by the user, their sharing permissions or the content of the documents. In this way, the add-on also gets access to the hidden documents described before, which act as indices of the ciphered documents, whether they are shared or not. Furthermore, as we can see in Fig. 2, when the process starts, it creates two channel listeners to capture all the data that is sent to and received from the servers. When, for example, a document is saved, the message with the data is intercepted, encrypting only the user's content and leaving the rest unmodified. A password chosen by the user is required in the encryption process, so nobody else can access the information of the document. Afterwards, the plaintext is replaced with the ciphered text, and the message is released, so it continues its way to the server. With this method, the user's information received by the server is indecipherable, but the server will not notice any difference because only the document's content is modified. It is also remarkable that every time a document is encrypted, the information related to the process is stored in the indices. If any condition of the ciphering changes, such as the document's password or the algorithm used, the indices are automatically updated with the new information.

Fig. 2. Behavior of the add-on: components and actions involved in a secure editing.

When an encrypted document is requested, the same process is executed, but in the opposite direction. After identifying the document, the hidden index is read and the information associated with the encrypted document is obtained. Then the ciphered data of the incoming message is accessed, decrypted with the information recovered from the index (algorithm, key size, mode, etc.) and finally replaced with the plaintext. When the document is finally shown to the user, it is completely readable, and he can work with it as if it were a normal one.
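The complete save/load flow can be summarised by the following sketch, written in Python rather than the add-on's JavaScript/XUL and using a deliberately toy cipher; the index layout and function names are illustrative assumptions, not the add-on's actual code.

# Conceptual sketch of the intercept-encrypt-release flow described above.
# The index layout, the cipher wrapper and the message format are invented
# for illustration; the "cipher" is a toy and must not be used in practice.
index = {}   # hidden index document: docId -> encryption metadata

def encrypt(plaintext, password, algorithm):
    key = sum(password.encode())
    return bytes((b + key) % 256 for b in plaintext.encode()).hex()

def decrypt(ciphertext, password, algorithm):
    key = sum(password.encode())
    return bytes((b - key) % 256 for b in bytes.fromhex(ciphertext)).decode()

def on_save(doc_id, message, password, algorithm="AES"):
    """Called when the browser is about to send a 'save document' request."""
    message["content"] = encrypt(message["content"], password, algorithm)
    index[doc_id] = {"algorithm": algorithm}   # record how the document was ciphered
    return message                             # released unchanged except for the content

def on_load(doc_id, message, password):
    """Called when a document arrives from the server, before it is displayed."""
    meta = index.get(doc_id)
    if meta:                                   # only ciphered documents are touched
        message["content"] = decrypt(message["content"], password, meta["algorithm"])
    return message

saved = on_save("doc-1", {"content": "secret budget"}, "pw")
assert on_load("doc-1", saved, "pw")["content"] == "secret budget"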


3.2 Using the secure Google Docs

In this section we describe the functionality of the add-on from the user's perspective. The first step to be able to use the add-on is to install the .xpi file, which is a compressed file that follows the typical structure of a Firefox add-on. Once the add-on has been installed and the user accesses Google Docs with his Google account (http://docs.google.com), it is necessary to activate the add-on by pressing a new button with a lock image that appears in the status bar, or alternatively through the browser's tools menu. Once it has been enabled, the main difference the user will find with respect to the normal use of Google Docs is that the index table with his/her documents contains more information, indicating which documents have been previously ciphered and which algorithm was used in each case (see Fig. 3).

Fig. 3. Google Docs interface showing the index table of available documents

Supported algorithms are shown in Table 2 with their main properties. The user can choose any of them, and his/her choice will influence the security level and the speed of the encryption process [9]. The choice of the user's password is very important too, since the most secure environment could be compromised by the use of a weak password. If the add-on is enabled, when the user is about to save the changes in a new document, or in one that had not been ciphered until that moment, a new popup window appears, asking the user for the password to cipher the information and offering the possible encryption algorithms with their corresponding options (such as the key size, or the mode where applicable).


Table 2. List of supported encryption algorithms and their main features.

            Name                               Block size               Key size            Security           Speed      Speed depends on key size?
AES         Advanced Encryption Standard       128 bits                 128, 192, 256 bits  Secure             Fast       Yes
DES         Data Encryption Standard           64 bits                  56 bits             Insecure           Slow       No
Triple DES  Triple Data Encryption Algorithm   64 bits                  56-168 bits         Moderately secure  Very Slow  No
Blowfish    -                                  64 bits                  32-448 bits         Moderately secure  Fast       No
RC4         Rivest Cipher 4                    - (stream cipher)        8-2048 bits         Insecure           Very fast  No
TEA         Tiny Encryption Algorithm          64 bits                  128 bits            Moderately secure  Fast       No
xxTEA       Corrected Block TEA                arbitrary (min 64 bits)  128 bits            Insecure           Fast       No
After this step, the user will be able to work with the data as usual, being completely transparent the process of ciphering the data. If the add-on is disabled, and the user tries to access to his ciphered documents, the result will be unintelligible to him, as it can be observed in Fig. 4.

Fig. 4. User’s ciphered document opened without using the add-on

Once the parameters of the encryption have been set for a document, it is also possible to modify them, for example by changing the password or the algorithm that was used the last time the document was saved. If the user wants to remove the ciphering of a particular document (deleting its password), he can do that as well.


4 Conclusion and future work

In this paper we have presented a new security mechanism for SaaS applications that gives users of the Google Docs service an additional privacy layer to protect their documents on the cloud server side, with a very simple interface. Without the user's password used to encrypt the documents, the information cannot be recovered, not even by the owner; so if the user forgets it, the data will not be readable. This application is currently being improved with the possibility of sharing encrypted documents with other users, with the only condition that all users have the Firefox add-on installed and know the shared password. As a future enhancement of the service, we are working on using the same solution to manage spreadsheets, but in this case some interesting problems arise: it is possible to encrypt the data of the spreadsheet, but the operations usually involved in this type of document are not performed on the client side; instead they are carried out on the Cloud servers. Therefore, a specific platform allowing operations to be performed on encrypted data would be required [10,11], and this feature depends exclusively on the provider.
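Performing operations on encrypted data relies on homomorphic properties of the encryption scheme. As a toy illustration only (textbook RSA with tiny, insecure parameters and no padding, not a scheme any provider actually offers for spreadsheets), the product of two ciphertexts decrypts to the product of the plaintexts.

# Toy demonstration of a homomorphic property (textbook RSA, insecure parameters).
p, q = 61, 53
n, e, d = p * q, 17, 2753           # d satisfies e*d = 1 mod lcm(p-1, q-1)

enc = lambda m: pow(m, e, n)
dec = lambda c: pow(c, d, n)

c1, c2 = enc(6), enc(7)
product_cipher = (c1 * c2) % n       # multiplication performed on ciphertexts only
assert dec(product_cipher) == 6 * 7  # ... decrypts to the product of the plaintexts
print(dec(product_cipher))           # 42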

5 References

1. Mei-Ling Liu, "Computacion Distribuida. Fundamentos Y Aplicaciones", Ed. Pearson Educación, 2004
2. F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, R. E. Gruber, "Bigtable: A Distributed Storage System for Structured Data", Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI'06), November 6-8, 2006, Seattle, WA (USA)
3. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., and Warfield, A. 2003. Xen and the art of virtualization. SIGOPS Oper. Syst. Rev. 37, 5 (Dec. 2003), 164-177
4. Tim O'Reilly, "The Fuss About Gmail and Privacy: Nine Reasons Why It's Bogus", http://oreillynet.com/pub/wlg/4707
5. Firefox add-ons website: https://addons.mozilla.org
6. T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. "Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds", In CCS'09: Proceedings of the 16th ACM Conference on Computer and Communications Security, pages 199-212, New York, NY, USA, 2009
7. T. Negrino, D. Smith, "Javascript", Pearson - Prentice Hall, 5th ed., Madrid, 2005
8. Mozilla Development Center: XUL [Online]. Available: https://developer.mozilla.org/en/XUL. [Accessed: April 9, 2010]
9. A. A. Tamimi, "Performance Analysis of Data Encryption Algorithms". Available: www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf.pdf [Accessed May 11, 2010]
10. Ernest F. Brickell, Yacov Yacobi. "On Privacy Homomorphisms (Extended Abstract)", Advances in Cryptology - EUROCRYPT'87, LNCS, Springer-Verlag 1987, pp. 117-125
11. Juan Ramón Troncoso-Pastoriza, Stefan Katzenbeisser, and Mehmet Celik. "Privacy preserving error resilient DNA searching through oblivious automata". In 14th ACM Conference on Computer and Communications Security, pages 519-528, Alexandria, Virginia, USA, October 29-November 2, 2007. ACM Press.


Web based collaborative editor for LATEX documents

Fábio Costa1 and António Pinto2

1 CIICESI, Escola Superior de Tecnologia e Gestão de Felgueiras, Politécnico do Porto, 8050148@estgf.ipp.pt
2 CIICESI, Escola Superior de Tecnologia e Gestão de Felgueiras, Politécnico do Porto; INESC Porto, Portugal, apinto@inescporto.pt

Abstract. Document editing is one of the tasks that is now seen as possible in a cloud computing environment. This is mainly due to novel applications such as Google Docs that are now offered as a service. Collaborative document editing implies that multiple authors are allowed to perform document edition, supporting functionalities such as revision, commenting and modification management. Cloud computing consists in the idea of moving all files and applications to the "Cloud", allowing user access from any system and platform and requiring only an Internet connection. Moreover, these applications are expected to work similarly to desktop applications. The capability of these applications to work without connectivity to the Internet appears as the key issue to address.

1 Introduction

Novel applications such as Google Docs are now offering document editing functionalities as a service. The large adoption of this type of application triggered the evolution of web standards, namely HTML, to cope with their requirements. This type of application is classified as Software as a Service (SaaS). SaaS is software that is available directly from the Internet, not requiring a typical software installation. This eliminates compatibility problems between different platforms, with the browser acting as the homogenization layer. This model reduces implementation costs and makes software maintenance easier [6,9]. The present work consists in developing a web application that allows creation, storage, and collaborative editing of LATEX files with revision history support. The key innovation of this work relies on the foreseen capability to work off line. There are some feature-rich online LATEX editors [2,3,4]; others focus on particular uses of LATEX such as equation editing [5]. However none allows its use without a permanent Internet connection.


Listing 1.1. Google Gears sample manifest file

{
  "betaManifestVersion": 1,
  "version": "my_version_string",
  "redirectUrl": "login.html",
  "entries": [
    { "url": "main.html", "src": "main_offline.html" },
    { "url": ".", "redirect": "main.html" },
    { "url": "main.js" },
    { "url": "formHandler.html", "ignoreQuery": true }
  ]
}

2 Off line operation

Temporary off line operation, and the required data synchronization when the service comes back on line, can be supported by Google Gears [8] or, more recently, by HTML 5 [7].

2.1 Google Gears

Google Gears is a JavaScript API package that includes a local server, a database and a worker pool. The main purpose of the local server is to enable web applications to start and operate without Internet connectivity by caching and serving resources locally, using HTTP. The database allows the developer to store users' structured data on the client side; when the application reconnects to the Internet, this data will demand synchronization with the main database server. The worker pool allows for the execution of JavaScript functions in the background without hampering the main page execution. For instance, this asynchronous operation of the worker pool is helpful when synchronizing large amounts of data while still maintaining the application's responsiveness.

The local server is the component that manages the application-specific cache. To do so, Google Gears makes available the ResourceStore and the ManagedResourceStore classes. The first implements a typical URL cache, storing ad-hoc URLs (such as PDF files) locally. The latter implements mechanisms to automatically download and update a set of URLs identified in a manifest file.


Listing 1.2. HTML 5 sample manifest file
1 CACHE MANIFEST
2 NETWORK:
3 comm.cgi
4 CACHE:
5 images/sound-icon.png
6 images/background.png
7 style/default.css

An example manifest is shown in Listing 1.1. The manifest file is composed of attribute-value pairs that identify the contents to be cached. Namely, the attribute entries lists the URLs of all resources to be used by the application while operating without connectivity. 2.2

HTML 5

On the other hand, the newer version of HTML (HTML 5) contains features that help web applications to operate off line, using manifest files on the web server similarly to Google Gears. Manifests include a list of files required by the application when working off line. The web browser then stores these files on a local cache so that they can still be accessed even if the Internet connection is lost.

An example HTML5 manifest is shown in Listing 1.2. These manifests may be composed of up to three sections: cache, network and fall back. The first section (Lines 4 to 7) lists the resources that will be downloaded and cached locally, and that will be used instead of the online resources whenever there is no Internet connection. The network section (Lines 2 and 3) enumerates the resources that must never be cached locally, always requiring an Internet connection to be used. A fall back section can also be used to identify substitutes for online resources that were not cached successfully, for whatever reason.

Listing 1.3 exemplifies how to associate an HTML5 cache manifest to an HTML page. Such association, shown in line 2, must be done for every HTML page that will require offline operation.


Listing 1.3. Sample inclusion of a manifest file in HTML
<!DOCTYPE HTML>
<html manifest="cache.manifest">
<body>
...
</body>
</html>

2.3 Summary

Both Google Gears and HTML5 support the development of web applications that are capable of operating in scenarios of intermittent connection to the Internet. The use of manifest files that list the resources to be cached locally is present in both approaches. The major drawback of HTML5 resides in the fact that it requires an active Internet connection on the initial access to the web application. Google Gears, by including its local server, is capable of serving the application even if there is no active Internet connection. The major drawback of Google Gears resides in the fact that it is not an open standard, which reduces its availability on client desktops. Google Gears also requires installation on the client desktop prior to its use.

3 Related work

Multiple web based LATEX document editors exist online, namely ScribTeX [3], MonkeyTex [2] and Verbosus [4]. Table 1 compares their functionalities. In particular, it shows that none of the three enables off line operation, Verbosus does not allow document editing by more than one user, only ScribTeX enables users to impose document access permissions, and MonkeyTex does not render LATEX files into PDF files.

Table 1. Comparison of online LaTeX document editors

            Off line operation   File sharing   Permission management   PDF rendering
ScribTeX    N                    Y              Y                       Y
Verbosus    N                    N              N                       Y
MonkeyTex   N                    Y              N                       N



4 Proposed solution

The proposed solution consists in a web application for collaborative editing of LATEX documents, using the novel features of HTML5 combined with LATEX document segmentation and latexdiff [1]. The requirements identified for this application include:

1. User and session authentication
2. User management
3. User's permissions management
4. Revision history
5. File differences mark-up

The first requirement imposes that a user must not access the application without being previously authenticated. The second requirement imposes that each document owner must be able to identify other users that will be able to collaboratively edit one or more LATEX documents. The third requirement imposes that each document owner must be able to specify which type of operations the remaining users are allowed to perform over the document. The fourth requirement imposes that the document owner must be able to view all modifications made to documents. The fifth requirement imposes that users, when editing the same document, must be able to visibly identify changes introduced by other users. The proposed solution, named LATEX Web Editor (LWE), will make use of HTML5 to support off line operation of the application; latexdiff to generate PDF files that visually mark up significant differences; and LATEX document segmentation, which will enable synchronization and manipulation of document parts while being edited by multiple users.
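As a sketch of requirement 5, the mark-up step can be driven with latexdiff's standard command-line form (latexdiff old.tex new.tex > diff.tex) followed by a LaTeX compilation; the file names and the pdflatex invocation below are illustrative assumptions about the deployment, not the implemented LWE code.

# Sketch of the revision mark-up step: run latexdiff between two stored
# revisions and compile the marked-up result. Paths are illustrative.
import subprocess

def markup_revision_diff(old_tex="rev_12.tex", new_tex="rev_13.tex", out_tex="diff.tex"):
    # latexdiff writes a LaTeX source in which additions/deletions are highlighted
    with open(out_tex, "w") as out:
        subprocess.run(["latexdiff", old_tex, new_tex], stdout=out, check=True)
    # Compile the marked-up source so the user can view the changes as a PDF
    subprocess.run(["pdflatex", "-interaction=nonstopmode", out_tex], check=True)
    return out_tex.replace(".tex", ".pdf")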

5 Conclusion

We propose to develop a web based collaborative LATEX editor that may operate despite temporary losses of Internet connection, using the novel features of HTML5 that enable off line operation.

References

1. latexdiff, available on line at http://www.ctan.org/tex-archive/support/latexdiff, April 2010.
2. MonkeyTex, available on line at http://monkeytex.bradcater.webfactional.com, April 2010.
3. ScribTeX, available on line at http://www.scribtex.com, April 2010.
4. Verbosus, available on line at http://www.verbosus.com, April 2010.
5. CodeCogs. Online equation editor, available on line at http://www.codecogs.com/components/equationeditor/equationeditor.php, April 2010.
6. M. Corlan, D. Nickul and J. Wilber. Software as a service: A pattern for modern computing. May 2009.
7. Ian Hickson. HTML5 (including next generation additions still in development). February 2010.
8. Omar Kilani. Taking web applications offline with Gears. August 2007.
9. Lizhe Wang and Gregor von Laszewski. Scientific cloud computing: Early definition and experience. October 2008.


Managing Cloud Frameworks through Mainstream and Emerging NSM Platforms

Pedro Assis
School of Engineering, Porto Polytechnic Institute, Portugal
pfa@isep.ipp.pt

Abstract. In the future, Cloud interoperability shall be a key requirement, notably in hybrid model federated spaces, a scenario that can be envisaged for "academic Clouds." This paper proposes the integration of Cloud computing software frameworks, commonly called Infrastructure as a Service, with Network and Systems Management (NSM) platforms. In spite of current efforts addressing Cloud interoperability, the author argues that state of the art management technologies can provide such support. This proposal envisages the development of several adapters to expose Cloud framework data as an SNMP agent, a CIM provider and a SPARQL endpoint. Hence, Cloud monitoring, configuration and event handling heterogeneities are addressed through integration with mainstream and emerging management domains. Design issues concerning the development of a CIM provider for OpenNebula are highlighted. Keywords: Network and Systems Management Standards, Cloud Computing, IaaS Management

1 Introduction

According to Mell and Grance, NIST (National Institute of Standards and Technology) researchers, Cloud Computing is both a deployment and a service model [1] that aims at transforming ICT (Information & Communication Technologies) platforms into elastic, highly available, fault tolerant, secure and multi-tenant systems. As this evolution takes place, it is expected that ICT technicians will focus their work on their companies' core business and not on technology complexity. This complexity has been referred to by IBM researchers Kephart and Chess as the "main obstacle to further progress in the ICT industry" [2]: complexity results from the deployment of larger, more sophisticated computer-based systems, revealing an increasing need to access "everything", "anywhere", at "anytime." The IBM Autonomic Computing Manifesto identifies system complexity and the human inability to manage it as key issues that must be addressed. According to IBM, to enable the sustained evolution of ICT platforms the solution, and the challenge, is to develop self-managing computer-based systems. This vision, named Autonomic Computing, relates to natural self-organizing systems, which account for large numbers of interacting entities at different levels.

As Cloud computing technology matures, different platforms are being deployed that make use of specific interfaces and tools to implement management functions. This proposal addresses management interoperability between Cloud computing software frameworks (e.g. OpenNebula, Eucalyptus and Nimbus), and between these and mainstream NSM platforms, namely the Simple Network Management Protocol (SNMP) and Web-Based Enterprise Management (WBEM), as well as with emerging semantic Web technologies that are being applied to NSM, namely the Resource Description Framework (RDF) and the Web Ontology Language (OWL). The development of several adapters to expose Cloud Computing Framework (CCF) data as an SNMP (sub)agent, a CIM (Common Information Model) provider and a SPARQL (Simple Protocol and RDF Query Language) endpoint promotes the integration of CCF management with current management domains. It is common sense that Cloud computing adoption will benefit from synergies with other research and standardization initiatives, namely in what concerns management. Why? First, by promoting the integration of Cloud computing frameworks with mainstream management domains, CCF management shall profit from widely deployed management standards and widespread knowledge regarding their use. Secondly, CCFs will capitalize on emerging management technologies and tools, which address contemporary management requirements. Thirdly, NSM platforms offer a common "interface" to unify Cloud framework monitoring, configuration and event handling. Although the proposed approach sounds valuable, several questions must be investigated in the course of this research: What Cloud computing scenarios are most likely to profit from this work? What initiatives, including standardization efforts, are taking place? What are the CCF management requirements? Can Cloud computing management be seamlessly integrated into current management domains? To answer these questions two case studies are presented. The first concerns the identification and analysis of a scenario that shall benefit from this research: a case study on European academic Clouds. The second aims to evaluate architectural and design issues of OpenNebula's management integration within the WBEM/CIM platform. This effort is based on the OpenPegasus management broker. The remainder of this paper offers an overview of network and systems management (Section 2), based on three key questions: Why, What and How. In Section 3 a brief introduction to Cloud computing is given. A Cloud federation hybrid model for academic Clouds is discussed in Section 4. In Section 5, related research and the project proposal are presented. As a proof of concept, a case study on WBEM/CIM and the OpenNebula CCF is discussed in Section 6. Main conclusions and further work are presented in Section 7.


2 Network and Systems Management: Why, What and How?

Management functional areas follow OSI's classification: Fault, Configuration, Accounting, Performance and Security (FCAPS). Fault, accounting and performance account for different views (e.g. abnormal operation detection) of system monitoring, while configuration and security are related to system control (e.g. user access). The deployment of such management functions enables network and systems administrators to pursue users' and organizations' requirements. Some of the most cited characteristics a system must provide from the users' perspective are (in no particular order) improved automation, personalization and ease of use; security and monitoring features; and adequate response time and restore capability. On the other hand, from the organizations' point of view, the most relevant features are the ability to control corporate strategic assets, complexity and costs; to reduce downtime and improve services; and to support integrated management.

In the 1990s, to address the evolution of networks and computer-based systems, with which the centralized management paradigm could not cope, distributed management found its way out of research labs. Although earlier versions of TCP/IP SNMP only supported a "weak" form of distribution (based on the Client/Server paradigm), SNMPv3, RMON (Remote Monitoring) and OSI CMIP (Common Management Information Protocol) support management by delegation [3] (hierarchical management). In recent times, the challenge has been to leap from mere delegation to collaboration (between the entities involved in the management process). Collaboration requires a "strong" form of the distributed paradigm, based either on object distribution or on code mobility. In what concerns object distribution, SNMP Script MIB, Sun JMAPI (Java Management API), OMG CORBA (Common Object Request Broker Architecture) and DMTF WBEM are examples of such an approach. These systems support object distribution across heterogeneous environments (object model), supporting a set of interactions and access to common services (reference model). In what concerns code mobility, there are two kinds of mobility: strong and weak [4]. In the first case, the management code plus the execution state migrates between manageable nodes. Telescript, Agent Tcl and Emerald are examples of frameworks that support strong mobility. On the other hand, weak mobility concerns code migration only, meaning that the embedded management tasks are reinitialized every time they move to another location. Mole, Tacoma, M0, Facile, Obliq and Safe-Tcl are examples of such platforms. Although object and code distribution reveal true distributed systems, these approaches only address the "What" and "How" questions; answers to these questions should disclose object and code distribution strategies, respectively. To achieve true collaboration the "Why" question must also be tackled. Answers to such a question are goal driven and should provide the required guidelines for the previous questions. The Semantic Web effort (RDF, OWL, SPARQL, etc.) and Distributed Artificial Intelligence (DAI) can play an important role in the deployment of semantic management environments. Further information regarding distributed network and systems management taxonomies can be found in [5].


3 Cloud Computing

The Cloud computing paradigm has been under the scrutiny of researchers and business. In between criticism and praise, Cloud computing is affirming itself as being capable of integrating existing technologies and tools into its ecosystem. In the author's view, the reuse of technologies and standards, on-demand provisioning (elasticity) and the new business model (pay-as-you-go) are among the Cloud computing highlights that justify this paradigm's added value for ICT evolution. The roots of Cloud computing lie in utility computing back in the 1990s, as Application Service Providers (ASP) started to deliver software as a service. Web services followed, and with them the promise of a new model for software delivery based on a registry for dynamic binding and discovery. Tightly coupled with Web services, Service-Oriented Architecture (SOA) generalized the service provider-consumer pattern. Finally, Grid computing stands side-by-side with Cloud computing, although the latter offers much more than a simple batch submission interface. According to Keahey et al., "Cloud computing represents a fundamental change from the Grid computing assumption: when a remote user "leases" a resource, the service provider turns control of that resource over to the user" [6]. In the real world, Cloud computing should provide the means to handle user demand for services, applications, data, and infrastructure in such a way that these requests can be rapidly orchestrated, provisioned, and scaled up/down through a pool of resources related to computing, networking, and storage facilities. Cloud services are made available through different deployment models. Mell and Grance envisage the following: Private, Community, Public and Hybrid Clouds. A Private Cloud is operated by a single entity, while a Community Cloud is operated by a set of organizations that share common interests. Public Clouds are made available to the public or a large industry group; they are owned by an organization that sells Cloud services. A Hybrid Cloud is a composition of two or more Clouds as described before. Such organizational models do not require an in-house Cloud infrastructure, nor its management or control. This can be provided by a third party under an outsourcing agreement. OpenCrowd's (www.opencrowd.com) taxonomy establishes four areas for Cloud computing: Infrastructure services (e.g. storage and computational resources), Cloud services (e.g. appliances, file storage and Cloud management), Platform services (e.g. business intelligence, database, development and testing), and Software services (e.g. billing, financial, legal, sales, desktop productivity, human resources and content management). On the other hand, NIST advises that Cloud computing should offer three main types of services, each addressing specific user needs: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). IaaS offers the provision of raw computing resources including processing, storage, and network. The consumer has control over the assigned resources, not over the underlying Cloud platform. Among the examples of Cloud computing software frameworks are OpenNebula, Eucalyptus and Nimbus. PaaS provides a development platform, comprising programming languages and tools, which enables consumers to develop and deploy applications onto the Cloud infrastructure. Following the e-Science initiative (www.eu-egee.org), European higher education institutions currently providing Grid Computing services would be integrated into this virtualized infrastructure, offering Grid services as platform as a service. This way, native Grid applications could cohabit as resources in a Cloud computing ecosystem, alongside Google Apps, Microsoft Windows Azure, SalesForce.com and others. Finally, SaaS provisions applications/services running on top of the Cloud platform. The consumer does not have any control except over user configuration data (e.g. Facebook, Gmail). The main difference between these two taxonomies is due to the emphasis that OpenCrowd places on the need to "create customized Clouds," while Mell and Grance's work does not.

4 Cloud Federation Hybrid Model: A case study on European academic Clouds

Cloud computing is an opportunity to promote cooperation among Higher Education Institutions (HEIs) in what concerns knowledge and resource disclosure, as it provides technological solutions to share infrastructures, applications, services and data. Cloud technology enhances the ability to cooperate, speeds up processes, increases service availability, and allows resource scaling with potential cost reductions. A federated space of academic Clouds will embrace HEIs' private Clouds, as well as public resources. Such a hybrid model requires interoperability among infrastructures to overcome technology heterogeneity. A federation of European academic Clouds would involve higher education institutions with different backgrounds that possibly had no prior contact with Cloud computing technology. Such multidisciplinary cooperation between partners is a strong point: the input of participating institutions does not originate in one specific area but rather comprises different areas, bringing together institutions that have not had joint projects and promoting cooperation between different areas. HEI globalization allows students and higher education staff to acquire the proficiency demanded to ensure their success in a global workplace. Such competences are no longer confined to scientific and technical issues, but include language skills, as well as social, cultural, political and ethical knowledge. Cross-credit processes and international dual awards are among the initiatives that HEIs are already deploying and which require technological support to make such efforts effective. Cloud computing SaaS services might present feasible solutions to this use scenario, as common interfaces are required to promote interoperability among HEIs' academic/administrative applications and information systems. One area that must be addressed is the enhancement of the European Authentication and Authorization Infrastructure (AAI) to support secure academic information transactions using standard procedures related to metadata description and information mapping, authentication, and data confidentiality [7]. It is likely that some of the open issues can be tackled using Semantic Web and Policy management standards, namely in what concerns the enrichment of information description, data consolidation, account management interoperability and Service Level Agreements (SLA).


Internationalization requires a steady flow of financial support for institutions and mobility scholarships. In years to come, this may prove problematic for Southern European countries like Portugal. According to OECD (Organization for Economic Co-operation and Development), overall funding per student in OECD countries "has slowed down since the early 1990s" [8]. The same study concludes that direct public funding in 2003 was still the main source of revenue for most European public HEIs, namely the Portuguese ones (about 90%), while in Asia/Pacific the scenario was quite different. In 2003, direct public funding in Japan, the Republic of Korea and Australia was less than 40%, with the remainder coming from household expenditure. In OECD countries private funding has a small impact on HEIs' budgets, except in the United States, but it grew 5% from the early 1990s to 2003. Despite these facts, internationalization is not only about costs, but an investment that provides important direct revenues: in Australia, international (in-bound) students accounted, in the academic year 2007-08, for the third place in the export balance ($14.1 billion). The trend should be the establishment of partnerships and, through them, the improvement of HEIs' portfolios, attracting more foreign students and reducing operational costs by sharing "academic commodities." It is in this context that the Cloud computing paradigm can make a difference: datacenter consolidation, cluster resource sharing, and the use of third-party (business) Clouds for academic services (email and others) shall bring, in the mid term, financial advantages, as costs associated with software and hardware acquisitions (on-site installations) and technical staff are reduced. The University of Westminster reports, "The cost of using Google Mail was literally zero. It was estimated that providing the equivalent storage on offer on internal systems would cost the University around £1,000,000" [9]. To gain economic advantages from Cloud computing, HEIs can, on the one hand, start to use free/low-cost services provided by business (education programs) and, on the other hand, migrate their monolithic datacenters to the (private) Cloud. However, this is just "the tip of the iceberg." The deployment of a Cloud community whose members, assuming both the provider and consumer roles, openly cooperate in a Cloud federation supporting transparent and elastic provision, i.e. allowing the dynamic scaling up/down of HEIs' resources (IaaS), will unlock the full potential of the Cloud computing paradigm. In this case, each federated HEI should take the provider role and contribute to a common resource pool, accept common management and control policies, deploy common provision rules, and agree on SLA principles. In this context, a federation of identity providers must be established, similar to the AAI platform deployed to support the federated space of Learning Management Systems (LMS).

5 NSM & CCF Partnership: Related research and project proposal

Although several initiatives are under way toward Cloud computing framework interoperability, none is yet widely supported, as each IaaS platform has its own management console and internal API.


Fig. 1. Integration of CCF management with current NSM platforms

CCF management interoperability is being addressed, among others, by the Open Grid Forum OCCI (Open Cloud Computing Interface) working group, the DMTF Open Cloud Standards Incubator and Zend's Simple Cloud API. OCCI (www.occi-wg.org) is developing an API specification for the remote management of Cloud computing infrastructures, to support the deployment, autonomic scaling and monitoring required by the life-cycle management of virtualized infrastructures. With the Open Cloud Standards Incubator (www.dmtf.org/about/cloudincubator), the DMTF is working toward the standardization of Cloud management and interactions to facilitate interoperability. Finally, the Simple Cloud API is designing a common API to support file and document storage, as well as simple queue services.

The present proposal (Figure 1) has goals similar to the above initiatives, as it addresses CCF management interoperability through integration with current management domains. However, the main focus of this proposal is to reuse current management technologies, promote the identification of eventual shortcomings, and lay out feasible solutions. Pursuing such an “integrated management scenario,” the development of a set of adapters is proposed to “expose” CCF monitoring, configuration and event-handling data as an SNMP (Simple Network Management Protocol) agent (or subagent), a CIM (Common Information Model) provider, and a SPARQL (Simple Protocol and RDF Query Language) endpoint.

The SNMP adapter consists of a Management Information Base (MIB) module and an SNMP (sub)agent. The MIB describes the Cloud framework's data model, limited to what is exposed through the respective API. The adapter processes SNMP Protocol Data Units (PDUs), encoded using BER (Basic Encoding Rules), by invoking native (CCF) scripts; a minimal sketch of such a pass-through handler is given below. As far as the WBEM adapter is concerned, it comprises a CIM provider, a CIM-to-XML (Extensible Markup Language) encoder/decoder and a HyperText Transfer Protocol (HTTP) server. The CIM provider interacts with the Cloud computing framework, while the remaining modules support the interaction with the WBEM platform using HTTP/XML.
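As an illustration of the SNMP adapter idea, the sketch below shows a minimal net-snmp “pass” handler, written in Python, that maps part of a hypothetical enterprise OID sub-tree to the output of OpenNebula's command-line tools. The enterprise number (99999), the MIB layout and the use of the onevm command are illustrative assumptions only; an actual adapter would follow the MIB module described above and cover GETNEXT and SET as well.

#!/usr/bin/env python3
# Sketch of a net-snmp "pass" handler for CCF monitoring data.
# Assumed registration in snmpd.conf (enterprise OID is hypothetical):
#   pass .1.3.6.1.4.1.99999.1 /usr/local/bin/ccf_snmp_adapter.py
import subprocess
import sys

BASE = ".1.3.6.1.4.1.99999.1"          # hypothetical CCF MIB sub-tree

def running_vms():
    """Count VMs by invoking a native OpenNebula CLI command (assumption:
    'onevm list' is available to the snmpd user)."""
    out = subprocess.run(["onevm", "list"], capture_output=True, text=True)
    # Skip the header line and count the remaining entries.
    return max(len(out.stdout.strip().splitlines()) - 1, 0)

# Hypothetical scalar objects exposed by the adapter.
OBJECTS = {
    BASE + ".1.0": ("integer", running_vms),   # e.g. ccfRunningVMs.0
}

def main():
    # Only the GET case (-g OID) of the net-snmp pass protocol is sketched.
    if len(sys.argv) < 3 or sys.argv[1] != "-g":
        return
    oid = sys.argv[2]
    if oid in OBJECTS:
        snmp_type, getter = OBJECTS[oid]
        # net-snmp pass protocol: print OID, type and value, one per line.
        print(oid)
        print(snmp_type)
        print(getter())

if __name__ == "__main__":
    main()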


Fig. 2. WBEM/CIM framework architecture

Finally, the SPARQL endpoint enables the storage of CCF data in RDF/OWL triple stores, allowing its integration and reasoning with other data sources (possibly in the Linked Data space). To provide an integrated view of heterogeneous IaaS infrastructures, the exposed (manageable) data must be described according to a common RDF/OWL ontology, or ontology merging mechanisms must be explored.
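As a minimal sketch of this idea, the example below uses Python's rdflib to publish one item of CCF monitoring data as RDF triples and query it with SPARQL. The ccf namespace and the class and property names (VirtualMachine, state, memory) are illustrative assumptions and do not stand for a proposed ontology.

# Sketch: map a CCF monitoring record to RDF and query it with SPARQL.
# The 'ccf' vocabulary used here is a hypothetical example, not a real ontology.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

CCF = Namespace("http://example.org/ccf#")   # illustrative namespace

g = Graph()
g.bind("ccf", CCF)

# One OpenNebula VM exposed as triples (values are made up for the sketch).
vm = CCF["vm42"]
g.add((vm, RDF.type, CCF.VirtualMachine))
g.add((vm, CCF.state, Literal("RUNNING")))
g.add((vm, CCF.memory, Literal(2048, datatype=XSD.integer)))

# A management application asks the endpoint for all running VMs.
query = """
PREFIX ccf: <http://example.org/ccf#>
SELECT ?vm ?mem WHERE {
    ?vm a ccf:VirtualMachine ;
        ccf:state "RUNNING" ;
        ccf:memory ?mem .
}
"""
for vm_uri, mem in g.query(query):
    print(vm_uri, mem)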

6 A Case Study on WBEM/CIM and OpenNebula

The Distributed Management Task Force (DMTF) WBEM/CIM standards, which evolved from an industry initiative in 1996 into an industry standard in 1999, use Web technologies to promote management interoperability between different management domains. These standards reflect the DMTF understanding of the fundamental requirements for management success: a common data description, an on-the-wire encoding, and a set of operations to manipulate data. The Common Information Model infrastructure (currently in version 2.6) is an object-oriented modeling tool, described by the MOF (Managed Object Format) language and supported by a UML profile (version 1.0.0b) developed by the DMTF in partnership with the OMG (Object Management Group). CIM establishes a three-layer model: the Core, Common and Extension models. The Core model (version 2.25.0) captures notions applicable to all areas of management, while the Common model (version 2.25.0) captures notions common to particular management areas but independent of technology or implementation. The Extension models are technology-specific extensions of the previous ones. Web-Based Enterprise Management (WBEM) allows CIM implementations to operate in an open, standardized manner. To achieve this, it encapsulates in HTTP messages XML documents describing CIM constructs and operations (versions 2.3.1 and 1.3.1, respectively). HTTP messages expose CIM operation information in their headers to allow efficient firewall/proxy handling. A CIM query language specification (version 1.0.0) is also available. The WBEM/CIM framework architecture is presented in Figure 2. In this architecture, the OpenNebula provider acts as a broker between the CIM Object Manager (CIMOM) and the OpenNebula CCF. Several open source projects provide integrated WBEM/CIM frameworks: OpenPegasus (currently in version 2.10.0), developed and maintained by The Open Group; WBEM Services (version 1.0.2), based on JSR 48, by Sun Microsystems; OpenWBEM (version 3.2.2), by Quest Software and Novell; and the SNIA CIMOM, now obsolete.
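To give a flavour of how a management application consumes such a framework, the sketch below uses the pywbem client library to enumerate instances from a CIMOM such as OpenPegasus. The endpoint URL, credentials, namespace and the choice of CIM_ComputerSystem as the enumerated class are illustrative assumptions, not part of the provider specification.

# Sketch: a WBEM client enumerating instances from a CIMOM (e.g. OpenPegasus).
# URL, credentials, namespace and class name are illustrative assumptions.
import pywbem

conn = pywbem.WBEMConnection(
    "http://cimom.example.org:5988",    # hypothetical CIM-XML endpoint
    ("operator", "secret"),             # hypothetical credentials
    default_namespace="root/cimv2",
)

# Enumerate managed systems; a virtualization provider would typically expose
# virtual systems through classes defined by the DMTF virtualization profiles.
for instance in conn.EnumerateInstances("CIM_ComputerSystem"):
    print(instance["Name"])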


Fig. 3. OpenNebula provider interfaces

The OpenNebula provider specification establishes OpenPegasus as the development framework (Figure 3) due to its CMPI (Common Manageability Programming Interface) support, performance, completeness and development platform. In addition, this project is actively supported, providing updates in a timely manner. Besides full-fledged WBEM/CIM frameworks, another interesting option is IBM's WBEM/CIM server SFCB (Small Footprint CIM Broker), developed by the SBLIM project. This product is available for many Linux distributions and supports the CMPI interface. According to results published by IBM, SFCB has a smaller footprint and, in some scenarios, better response times than OpenPegasus (the test scenarios are described in [10]).

The CIMOM/OpenNebula provider interface (Figure 3) will be based on The Open Group's CMPI, version 2.0 [11]. The main advantage of this interface is provider re-use in any WBEM-based management server that supports it (“write once, run everywhere”). The interface supports C bindings, although C++ and others are also possible (but not standardized); it reduces the effort required to write a provider (e.g. memory management issues), offers interoperability between CMPI-compliant CIMOMs, supports all common CIMOM functions, is scalable (i.e. thread-safe), and allows remote management (using Remote CMPI).

The OpenNebula provider/CCF interface (Figure 3) can be based either on the XML-RPC interface (version 1.4), which has support for Java, Ruby and C/C++, or on OCA (OpenNebula Cloud API), version 1.4, which has Ruby and Java bindings; a minimal sketch of a direct XML-RPC call is given after the profile list below. Both interfaces provide access to data related to the framework's monitoring, configuration and event handling. Such data is organized in a set of categories, e.g. host management, virtual machine management, virtual network management, and user management. These concepts are suitably described by a set of CIM classes, namely CIM_Application, CIM_User, CIM_Device, CIM_Network, CIM_Security, and CIM_System. However, to ensure a consistent description of management domains and to attain interoperability between management applications, the guidelines provided by the DMTF management profiles must be followed. These profile documents identify the classes that must be instantiated, as well as the properties, methods and values that must be manipulated to “represent and manage a given domain.” The DMTF provides several profile documents related to virtualization, namely:

− Resource Allocation Profile and Allocation Capabilities Profile (both abstract patterns)
− System Virtualization Profile and Virtual System Profile (autonomous profiles)
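As announced above, the sketch below illustrates the CCF side of the provider by calling the OpenNebula XML-RPC interface directly from Python's standard xmlrpc.client. The endpoint address, the “user:password” session token and the arguments passed to one.vmpool.info are assumptions; they vary between OpenNebula releases and should be checked against the XML-RPC reference of the version in use (1.4 in this proposal).

# Sketch: direct call to OpenNebula's XML-RPC interface (Python stdlib only).
# Endpoint, session format and method arguments are assumptions for the 1.4
# release and must be validated against its XML-RPC reference documentation.
import xmlrpc.client

ONE_ENDPOINT = "http://frontend.example.org:2633/RPC2"   # default oned port
SESSION = "oneadmin:oneadmin"                             # "user:password" token

server = xmlrpc.client.ServerProxy(ONE_ENDPOINT)

# one.vmpool.info is assumed to return (success flag, XML payload or error, ...);
# the filter value -2 (all VMs) follows the documented convention of later releases.
result = server.one.vmpool.info(SESSION, -2)
success, payload = result[0], result[1]

if success:
    print(payload)          # XML description of the VM pool, to be mapped to CIM
else:
    print("oned error:", payload)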


7 Conclusion

In this paper the author presented a feasible approach, based on an OpenNebula CIM provider, to endorse Cloud management interoperability through integration with the network and systems management domains: NSM platforms offer common interfaces to unify Cloud framework monitoring, configuration and event handling. Moreover, the development of NSM adapters should allow the management of CCF platforms to be integrated with the management of the remaining resources, virtualized or not. The merits of the proposal were identified, and an application scenario based on a hybrid-model federated space of academic Clouds was analyzed. Although the OpenNebula CIM provider is still an ongoing effort and some use-scenarios remain to be addressed, from the study (based on on-the-wire encoding, data descriptions and supported operations) and the implementation made so far (based on the OpenPegasus management broker), the author is confident of the suitability of WBEM/CIM to seamlessly support all of OpenNebula's management functions. In the future, a similar provider is envisaged for EC2-based Clouds (Figure 3). Only then can true interoperability in a federated Clouds scenario be evaluated.

References

1. Mell, P. and Grance, T. (2009) NIST Definition of Cloud Computing v15. NIST.
2. Kephart, J. O. and Chess, D. M. (2003) "The Vision of Autonomic Computing", IEEE Computer, Vol. 36, No. 1, pp. 41-50.
3. Goldszmidt, G. and Yemini, Y. (1995) "Distributed Management by Delegation", Proceedings of the 15th International Conference on Distributed Computing Systems.
4. Ghezzi, C. and Vigna, G. (1997) "Mobile Code Paradigms and Technologies: A Case Study", Mobile Agents 97, Rothermel, K. and Popescu-Zeletin, R. (Eds.), Lecture Notes in Computer Science, Vol. 1128, Springer-Verlag.
5. Martin-Flatin, J.-P. and Znaty, S. (2000) "Two Taxonomies of Distributed Network and Systems Management Paradigms", Emerging Trends and Challenges in Network Management, Ho, L. and Ray, P. (Eds.), Plenum Publishers.
6. Keahey, K., Tsugawa, M., Matsunaga, A. and Fortes, J. (2009) "Sky Computing", IEEE Internet Computing, Vol. 13, No. 5, pp. 43-51.
7. Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R. H., Konwinski, A., Lee, G., Patterson, D. A., Rabkin, A., Stoica, I. and Zaharia, M. (2009) Above the Clouds: A Berkeley View of Cloud Computing. Electrical Engineering and Computer Sciences, University of California at Berkeley, Technical Report No. UCB/EECS-2009-28.
8. Kärkkäinen, K. (2006) "Emergence of Private Higher Education Funding Within the OECD Area". OECD.
9. Sultan, N. (2010) "Cloud computing for education: A new dawn?", International Journal of Information Management, Elsevier, Vol. 30, pp. 109-116.
10. Schuur, A. (2005) SFCB: Small Footprint CIM Broker. Linux Technology Center System Management, IBM.
11. The Open Group (2006) Systems Management: Common Manageability Programming Interface (CMPI), Issue 2.0.
