Issuu on Google+

The International Journal of Engineering And Science (IJES) ||Volume|| 1 ||Issue|| 1 ||Pages|| 13-17 ||2012|| ISSN: 2319 – 1813 ISBN: 2319 – 1805

Research of elastic management strategy for cloud storage 1

SHAO Bi-lin, 2BIAN Gen-qing, 3ZHU Xu-dong

1, First Author

School of Management, Xi’an University of Architecture and Technology, School of Information and Control Engineering, Xi’an University of Architecture and Technology, 3 Schools of Information and Control Engineering, Xi’an University of Architecture and Technology ,

*2, Corresponding Author

-------------------------------------------------------------Abstract------------------------------------------------------------In order to sol ve those issues including limited storage capacity, high cost of storage and fault recovery in traditional HDFS , cloud storage can effectively solve these problems by using virtual resources of the IaaS based on HDFS . But it cannot assure cloud storage to utilize virtual resources more effectively. In order to solve these problems, the paper proposes a elastic cloud storage framework based on HDFS , and introduces the thought of virtual resources management into the framework, and proposes a elastic management strategy of virtual resource allocation and scheduling based on this framework. The simulation experiment shows cloud storage can effectively improve the efficiency in the use of virtual resources.

Keywords: HDFS; Elastic Cloud Storage; Virtual Resource; Resource Allocation and Scheduling; feedback control theory.

------------------------------------------------------------------------------------------------------------------------Date of Submission: 19, October, 2012 Date of Publication: 10, November 2012 -----------------------------------------------------------------------------------------------------------------------I

Introduce

With the development of Web2.0 technology, especially the popularity of social network and popular, the traditional storage technologies (network Storage or distributed storage) are difficult to meet the demand of their vast amounts of unstructured data storage [1]. So more and more commercial websites began to use cloud storage architecture based on HDFS, such as Google, Amaze, Facebook, etc., the development is gathering mo mentu m. Design ideas of HDFS -based cloud storage derived fro m the virtual cluster computing, which is expanded fro m a single v irtual storage servers to a virtual storage server cluster (ie: Hadoop virtual cluster). Hadoop virtual cluster can be comb ined with mu ltiple virtual storage servers, just as a storage server which has a great computing power and high-throughput provides users with a transparent access interface [2]. However, the existing cloud storage has certain limitat ions in study and implementation, which concerned for high-throughput and high reliability excessively, and ignored the cloud storage’s actual use rate on virtual resource, thus it seriously affects the cloud storage application and popularization. Cloud storage for load-handling capability and disturbance switching capability is important, but www.thei jes.com

we should be even more concerned about the actual efficiency of cloud storage to the virtual resources. Therefore, through introducing the thought of virtual resource management, this article proposes a kind of elastic storage architecture and elastic management strategy of virtual resources. The simu lation results show that cloud storage can effectively improve the efficiency in the use of virtual resources. II HDFS on El astic Cloud Storage The paper makes the virtual resource scheduling units as Slot. Slot represents the characterizat ion of virtual resources calculation ability, wh ich is decided by the virtual resource CPU and memory size. Set CPU as the unit CPU capacity, taking 1GHz; mem on behalf of the unit memo ry capacity, taking 1GB; a virtual server Pi is able to provide a number of Slot

 cpui memi  Sloti  min  ,   cpu mem 

The IJES

(1)

Page 13


Research of elastic management strategy for cloud storage Among them,   says round down, cpui is the

dynamically start, stop, move a service instance, equivalent dynamic start, stop, migration of a

CPU size of virtual server Pi , memi is the memory

HDFS data node; secondly, through the virtual

size of virtual server Pi . In a mo ment t , HDFS

resource management platform based on the

loads is loadt , the nu mber of virtual server is N ,

virtual resource management capabilit ies for the

then available virtual resources:

HDFS data node provides a flexib le v irtual

N

Slottotal   Sloti

(2)

of Hadoop virtual cluster preformed resource N

i 1

And HDFS virtual resource use rate:

u

loadt  Slottotal

size, and allows mult iple HDFS shared resources;

loadt  cpui

N

 min  cpu 

i 1

resource allocation strategy, dynamic adjustment

total

at the same time, through the encapsulation of a

,

memi  memtotal 

virtual resource scheduling service, real-t ime monitoring of HDFS and virtual resource load

(3)

informat ion, and according to the load informat ion

If the HDFS v irtual resource usage in an expected

and the elastic scheduling strategy on data node,

high and low water level, it is judged as effective

HDFS elastically scheduling. Thus achieved

resource. The problem to be solved is dynamic

without affecting the existing HDFS structure and

programming R, such that when a certain amount

function ,to solve the resource waste and the

of loadt . 0  ulow  u  uhigh  1 . Therefore, to

problem of resource sharing, making use of many existing virtual resource management platform to

make the HDFS schedule resource elastically

achieve them is proposed, such as OpenNebula [3][4],

according to the real-time load state and resource

Platform[5]. in order to simulation test, virtual

use rate, it must be introduced the virtual resource

resource management conduct ISF of Platform

flexib ility

will be used.

management

strategy.

Flexible

management strategy should include: flexible

Application Program

Application Program

Application Program

HDFS Client

HDFS Client

HDFS Client

resource allocation strategy and flexib le resource scheduling strategy. But if the new resource

Internet/Intranet

management component, or resource scheduling

Elastic scheduling service resources

Hadoop Cluster Service(HDFS2)

Hadoop Cluster Service(HDFS1)

strategy is added to the existing HDFS, it will not Vitual Resource Management Platform(VM Orchestra)

only increase the complexity of HDFS, but also increase the HDFS manager addit ional cost, and

Service instance

Service instance

Service instance

Service instance

Service instance

Service instance

Service instance

Service instance

Service instance

Service instance

increase the load, thus affect the overall HDFS

VM

VM

VM

VM

VM

VM

VM

VM

VM

VM

Hadoop-based elastic cloud storage

load capacity and stability.

Hypervisor(XEN)

This paper introduces the virtual resource

Hypervisor(XEN)

...

Amazon EC2

Physical resource pool

management platform, and puts forward Elastic cloud storage system based on HDFS. First of all,

Fig 1. Elastic Cloud Storage based on HDFS

the HDFS cluster is encapsulated into a virtual

Shown in Figure 1, a elastic cloud storage

resource management platform as a service, each

based on HDFS is made up of four parts : a virtual

service instance represents a HDFS data node. The

resource, virtual resource management platform

use of virtual resource management plat form for

(VM Orchestra), Hadoop virtual cluster service,

service instance lifetime management to realize

Hadoop virtual cluster service flexib le scheduling.

virtual resource management platfo rm to the

Virtual resource is the carrier of Each HDFS

HDFS data node lifetime management, i.e., by

data nodes, managed by the virtual resource

www.thei jes.com

The IJES

Page 14


Research of elastic management strategy for cloud storage management

platform.

Virtual

resource

Elastic

cloud

storage

virtual

resource

management platform can be deployed to run a

allocation strategy, consists of reservation strategy,

range of services, the p latform can assign a virtual

sharing strategy, borrowing and lending strategy,

resource and start one or more service instance for

recycling strategies. Reservation and sharing

each service configuration.

strategy provides a group of static exclusive and

Hadoop virtual cluster service, namely HVCS.

shared resources belonging to HCS. Reservation

Each HVCS service represents a particular HDFS

policy allows directly to HCS allocation reserved

Hadoop cluster, consisting of a configuration and

resources, regardless of the HCS can be used or

one or more instances. In this paper, each instance

not be used, even if HCS load is small; and

of HVCS represents a HDFS data node, the

sharing strategy provides shared resources for

configuration of HVCS specifies the required

HCS, including 2 parts: shared resource belonging

informat ion when HDFS runs, and a group of

to HCS and overload using shared resource.

flexib le resource allocation strategy, namely

Shared resource does not immed iately distribute

HDFS flexib le resource allocation strategy of data

HCS resource, HCS can use resource according to

node.

the configuration of the sharing proportion only

Hadoop virtual cluster elastic scheduling

HCS has more demand, wh ich requires more

service, namely HVRS. A data node is increased,

virtual nodes, and reserve resources has been

removed or transferred fro m the HDFS cluster by

exhausted; when reserved resources and sharing

HVRS through the current HDFS resources

resources still cannot meet the demand of HCS

informat ion and the load informat ion, HDFS data

virtual node, sharing strategy allows HCS to use

node lifet ime control and service scheduling

other HCS free sharing resources. Borrowing and

interface and the provisions of flexibility principle

lending strategy are used for HCS to reserve

to perform HDFS data node elastic scheduling.

resources sharing, this strategy will take effect

HVCS resource usage was obtained by the

only when shared resource has overload. By

interface wh ich is provided by the virtual resource

borrowing, lending the idle reserved resource,

management platform, and the HDFS load

HCS can effectively use the free reserved

informat ion is provided by HDFS.

resources in the Hadoop cluster, which effectively improves the environment resources use rate.

III

Virtual Resource Elastic Management Strategy of Elastic Cloud Storage

Recycling strategy used to recycle HCS idle

Virtual resource elastic management strategy

out, support HCS in its spare time, and because of

of elastic cloud storage includes 2 parts: flexible

increasing load need more virtual node, the others

resource allocation strategy and flexib le resource

can use the resources which belongs to HCS. The

scheduling strategy. Flexib le resource allocation

research results found that, after introducing the

strategy, used to specify a set of the v irtual

virtual resource allocation strategy, compared with

resources for HDFS to ensure the elastic

the conventional HDFS fixed using a group of the

scheduling object available; the elastic scheduling

allocation of resources, cloud storage on HDFS

strategy, elastically schedules HDFS data node

available

through the usage of virtual resource, so as to

according to load state.

resources which is required sharing or borrowing

effectively use the virtual resource.

resources

are

dynamic

regulated

3.2 Virtual Resource El astic Scheduling

IV Flexi ble Allocati on Strateg y on Virtual

Strategy

Resource www.thei jes.com

The IJES

Page 15


Research of elastic management strategy for cloud storage For every HDFS, the data node elastic

Slottotal , i.e.:

scheduling strategy is composed of a three tuple structure P(S, A, R), wherein S is a sample space (Samp ling),

A

for

the

action

space

of

L(t )  s1L(t 1)  r1Slottotal (t 1)  r2 Slottotal (t  2)

implementation (Actions), R (Rule) represents

(5)

scheduling rules.

The model parameters

s1 , r1 and r2 , which

S represents sampling space, wh ich consists of the virtual node load information {u(l (hi ))}iN1 ,

can use regression minimu m variance (Recursive

wherein i represents a virtual node, hi represents

Least-Squares, RLS) method assessment. In order

the virtual node load capacity, l (hi ) represents a virtual node load, u (l (hi )) represents virtual node

to realize the flexib le control based on this model,

resource consumption. Then

equation (6):

S  {S (hi ) |{u(l (hi )) |1  i  N}}

(4)

A said elastic scheduling execution action set, A  ( Aover , Awasted ) , wherein

the equation (5) is transformed the control

Slottotal (t ) 

Aover represents

for resource use overload handling Awasted represents waste treatment action.

(6)

action,

By (3) type R knowledge, scheduling rules are

Where in Lref  L(t  1) , namely : the next time step expected HDFS load information.

defined as follo ws:

The scheduling rule R is introduced to the

* if the current HDFS in the resources

u h igher than uhigh , judged as

utilizat ion rate

resource usage overload, HCRS will perform the action Aover on the HDFS data node overload operation, namely: if u  uhigh , then Aover . * if the current HDFS resource using rate is lower than

s r 1 Lref  1 L(t )  2 Slottotal (t  1) r1 r1 r1

control equation (6), wh ich can get elastic control equation (7): s1 r2 1  r Lhigh  r L(t )  r Slottotal (t  1) if Lhigh  L(t ) 1 1 1  s1 r2 1 Slottotal (t )   Llow  L(t )  Slottotal (t  1) if Llow  L(t ) r1 r1  r1  Slottotal (t  1) otherwise  

u Namely: the HSC can work only when the

ulow , it is judged to exist HDFS

waste of resources, HCRS will perform Awasted ,

Lhigh  L(t )

(h igher

than

supply)

or

HDFS node waste processing, namely : if u  ulow ,

Llow  L(t ) (less supply), and set new Slottotal (t ) ,

then Awasted .

namely the executive Aover or Awasted .

The elastic scheduling is the process of tracking HDFS load in formation {l (hi )}iN1 , based

V

Simulati on test

Virtual resource

elastic

management

on the scheduling rules R, it schedules and controls timely HDFS's Slottotal , and realization

strategies are tested, specific methods are as

of virtual resource is dispatching. In order to

using virtual resource management product ISF of

realize virtual resource automated flexible

Platform , including 8 adjustable virtual resource

scheduling, introduced into the classical feedback

Slots; and HDFS1 cluster contains 2 slots reserved

[6][7]

control theory , in modeling HDFS, the relations of load informat ion L  {l (hi )}iN1 and www.thei jes.com

follows: v irtual resource management platform:

resources and 1 slots of shared resources; at the same time HDFS2 cluster contains 1 slots resource The IJES

Page 16


Research of elastic management strategy for cloud storage reservation and 1 slots of shared resources; load

difficult ies to share resources among mult iple

sampling

HDFS, and the problems and difficulties are

through

the

use

of

CPU

rate

measurement, high water level uhigh and low

caused

by

the

lack

of

flexib le

resource

management pattern. Against this problem, this

water level ulow is set to 85%, 40%; Aover to

paper proposes elastic cloud storage framewo rk

increase a slots, Awasted to remove a slots; HDFS

based on the HDFS, and puts forward HDFS data

testing tools as HDFS Bench. Testing server test

node flexible resource allocation strategy and

tool over a period of time, respectively for 2

elastic scheduling strategy according to its

HDFS cluster time tested, through the observation

characteristic. The simulat ion experiment shows

of 2 virtual cluster with load change for virtual

that elastic management strategy can improve

resources (slots) usage. The test results as shown

HDFS existing problems, and effectively imp rove

in figure 2.

the efficiency in the use of virtual resource. Acknowledg ment This research is supported by National Natural Science Foundation of China (61073196&61272458) and Natural Science Research Foundation of Shanxi Province. China (2011JM 8026).

Reference [1]

M. Armbrust, A. Fox, D. A. Patterson, N. Lanham, B. Trushkowsky,J.Trutna,

Fig 2. Simu lation Result

Scale-independent

The results show that, when HDFS1 loading amount is small, it uses only one slots, while the

and

storage

H. for

Oh. social

Scads: computing

applications. In Proc. of CIDR, 2009. [2]

S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google

remainder of the resource is the part to high load

file system. In Proc. of SOSP, 2003.Volume 37 Issue 5,

HDFS2 ;with the increasing of HDFS1 loading

December 2003, Pages: 29-43.

amount, shared slots used by HDFS2 and

[3]

Open Nebula. http://opennebula.org.

borrowing slots will gradually returned to the

[4]

B. Sotomayor, R. Montero, I. Llorente, and I. Foster.

HDFS1, then their maximu m load using slots;

Capacity Leasing in Cloud Systems using the Open

With the HDFS2 loading amount gradually

Nebula Engine. In Workshop on Cloud Computing and its

decreased, HDFS2 gradually reduced the use of

Applications (CCA08)ďźŒ2009.

slots, thus released slots are gradually used by the

[5]

Platform: http://www.platform.com/

full load of HDFS1; through the use of slots

[6]

X. Zhu, M. Uysal, Z. Wang, S. Singhal, A. Merchant, P.

sharing and borrowing fro m HDFS2, wh ich

Padala, and K. Shin. What does control theory bring to

gradually reduces load. Along with its load

systems research? SIGOPS Operating Systems Review,

reduction, HDFS2 gradually reduces the use of

2009, 43(1):62-69.

slots, and tends to smooth. VI

[7]

Conclusion

Automated control in cloud computing: Challenges and

Through studying of the existing HDFS constructing

methods

and

the

H. C. Lim,S. Babu, J. S. Chase, and S. S. Parekh.

data

opportunities. ACDC '09 Proceedings of the 1st workshop

node

on Automated control for dat acenters and clouds. Volume:

controlling mode, what makes the utilizat ion rate

C, Issue: 09, Publisher: ACM, Pages: 13-18.

of HDFS for the system res ource low is the problems that the waste of resource usage and the www.thei jes.com

The IJES

Page 17


Research of elastic management strategy for cloud storage