Database Systems: A Practical Approach to Design, Implementation, and Management, 6th Edition, Connolly and Begg

SOLUTIONS TO REVIEW QUESTIONS AND EXERCISES FOR PART 6 – DISTRIBUTED DBMSs AND REPLICATION (CHAPTERS 24 – 26)

Contents

Chapter 24 Distributed DBMSs - Concepts and Design

Chapter 25 Distributed DBMSs - Advanced Concepts

Chapter 26 Replication and Mobile Databases

Chapter 24 Distributed DBMSs - Concepts and Design

Review Questions

24.1 Explain what is meant by a DDBMS and discuss the motivation in providing such a system.

See Section 24.1.1; motivation given at start of Section 24.1.

24.2 Compare and contrast a DDBMS with distributed processing. Under what circumstances would you choose a DDBMS over distributed processing?

Distributed processing defined at end of Section 24.1.1. Would choose a DDBMS, for example, if each site needed control over its own data, sites had their own existing DBMSs, communication costs would be significantly reduced, and so on.

24.3 Compare and contrast a DDBMS with a parallel DBMS. Under what circumstances would you choose a DDBMS over a parallel DBMS?

Parallel DBMS defined at end of Section 24.1.1. Parallel DBMSs tend to be used over short distances, usually within the same site. If the requirements call for distribution over sites with a large geographic spread, then the choice of a DDBMS should be straightforward.

24.4 Discuss the advantages and disadvantages of a DDBMS.

See Section 24.1.2.

24.5 What is the difference between a homogeneous and heterogeneous DDBMS? Under what circumstances would such systems generally arise?

See Section 24.1.3. Heterogeneous DDBMSs may arise as the result of integration of disparate systems. Homogeneous DDBMSs are more likely to be the result of a strategic decision to move to a DDBMS and implement the system in a top-down fashion.

24.6 What are the main differences between a LAN and a WAN?

LAN – Local Area Network and WAN – Wide Area Network. The distinction is usually based on the geographic dispersal of the systems: a LAN covers a relatively short distance, for example, within an office building, a school or college, or a home. A WAN covers systems that are networked over larger distances. See also Table 24.2.

24.7 What functionality do you expect in a DDBMS?

Expect the same functionality as a centralized DBMS, plus:

• extended communications services to provide access to remote sites and allow transfer of queries and data among the sites using a network;

• extended system catalog to store data distribution details;

• distributed query processing, including query optimization and remote data access;

• extended security control to maintain appropriate authorization/access privileges to the distributed data;

• extended concurrency control to maintain consistency of replicated data;

• extended recovery services to take account of failures of individual sites and the failures of communication links.

See Section 24.3.1.

24.8 What is a multidatabase system? Describe a reference architecture for such a system.

An MDBS is a distributed DBMS in which each site maintains complete autonomy (see end of Section 24.1.3). Reference architecture provided in Figure 24.5.

24.9 One problem area with DDBMSs is that of distributed database design. Discuss the issues that have to be addressed with distributed database design. Discuss how these issues apply to the global system catalog.

The question being addressed is how the database and the applications that run against it should be placed across the sites. There are two basic alternatives: partitioned or replicated. In a partitioned scheme, the database is divided into a number of disjoint partitions, each of which is placed at a different site. Replicated designs can be fully or partially replicated.

The two fundamental design issues are fragmentation and distribution. Work in this area mostly involves mathematical programming to minimize the combined cost of storing the database, processing transactions against it, and communication. The problem is NP-hard; therefore, proposed solutions are based on heuristics.

The global system catalog (GSC) is only relevant for a distributed DBMS or multidatabase system that uses a global conceptual schema. The problems are similar to those above. Briefly, a GSC may be either global to the entire database or local; it may be maintained centrally at one site or in a distributed fashion over a number of sites; and it may be replicated, so that there is either a single copy of the directory or multiple copies. These three dimensions are orthogonal to one another.

24.10 What are the strategic objectives for the definition and allocation of fragments?

See start of Section 24.4.

24.11 Describe alternative schemes for fragmenting a global relation. State how you would check for correctness to ensure that the database does not undergo semantic change during fragmentation.

Alternative schemes are: primary horizontal, vertical, mixed, and derived horizontal fragmentation (see Section 24.4).

Correctness rules are: completeness, reconstruction, and disjointness (see Section 24.4 again).
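As a minimal sketch of how the three correctness rules might be checked for a primary horizontal fragmentation, the Python below treats a relation as a list of dictionaries with no duplicate tuples. The sample rows, the predicates, and the helper names `fragment` and `check_correctness` are illustrative assumptions and do not come from the text.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def fragment(relation: List[Row], predicates: List[Predicate]) -> List[List[Row]]:
    """Primary horizontal fragmentation: one fragment per selection predicate."""
    return [[row for row in relation if pred(row)] for pred in predicates]


def check_correctness(relation: List[Row], fragments: List[List[Row]]) -> Dict[str, bool]:
    """Check completeness, reconstruction, and disjointness of horizontal fragments."""
    def key(row: Row):
        # Hashable form of a tuple, independent of attribute order.
        return tuple(sorted(row.items()))

    original = [key(r) for r in relation]
    placed = [key(r) for frag in fragments for r in frag]
    return {
        # Completeness: every tuple of the relation appears in some fragment.
        "completeness": set(original) <= set(placed),
        # Reconstruction: the union of the fragments gives back the relation.
        "reconstruction": set(placed) == set(original),
        # Disjointness: no tuple appears in more than one fragment.
        "disjointness": len(placed) == len(set(placed)),
    }


# Made-up sample rows, fragmented by region.
department = [
    {"deptNo": 1, "region": "Scotland", "businessArea": "SE"},
    {"deptNo": 2, "region": "Wales", "businessArea": "ME"},
    {"deptNo": 3, "region": "England", "businessArea": "EL"},
]
by_region = [
    lambda r: r["region"] == "Scotland",
    lambda r: r["region"] == "Wales",
    lambda r: r["region"] == "England",
]
result = check_correctness(department, fragment(department, by_region))
```

With predicates that overlap, the same check reports a disjointness failure while completeness and reconstruction still hold, which is why the three rules are tested separately.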

24.12 What layers of transparency should be provided with a DDBMS? Give examples to illustrate your answer. Justify your answer.

See Section 24.5.

24.13 A DDBMS must ensure that no two sites create a database object with the same name. One solution to this problem is to create a central name server. What are the disadvantages with this approach? Propose an alternative approach that overcomes these disadvantages.

See Section 24.5.1 - Naming Transparency.

Problems with the central name server, which has the responsibility for ensuring uniqueness of all names in the system, are:

• loss of some local autonomy;

• performance problems, if the central site becomes a bottleneck;

• low availability: if the central site fails, the remaining sites cannot create any new database objects.

An alternative solution is to prefix an object with the identifier of the site that created it. For example, a relation BRANCH created at site S1 might be named S1.BRANCH. Similarly, we would need to be able to identify each fragment and each of its copies. Thus, copy 2 of fragment 3 of the branch relation created at site S1 might be referred to as S1.BRANCH.F3.C2. However, this results in loss of distribution transparency.

An approach that resolves the problems with both these solutions uses aliases for each database object. Thus, S1.BRANCH.F3.C2 might be known as local_branch by the user at site S1. The DDBMS has the task of mapping aliases to the appropriate database object.
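The alias idea can be sketched as a lookup table held at each site. The following Python is only an illustration under stated assumptions: the class names `GlobalName` and `AliasTable`, the four-part name layout (site, relation, fragment, copy), and the rule that an unknown name is passed through unchanged are all hypothetical choices, not the book's design.

```python
from typing import Dict, NamedTuple


class GlobalName(NamedTuple):
    """Site-qualified name such as S1.BRANCH.F3.C2 (site, relation, fragment, copy)."""
    site: str
    relation: str
    fragment: str
    copy: str

    def __str__(self) -> str:
        return ".".join(self)


def parse_global_name(text: str) -> GlobalName:
    # Assumes exactly four dot-separated parts; raises ValueError otherwise.
    site, relation, fragment, copy = text.split(".")
    return GlobalName(site, relation, fragment, copy)


class AliasTable:
    """Hypothetical per-site alias table maintained by the DDBMS."""

    def __init__(self) -> None:
        self._aliases: Dict[str, GlobalName] = {}

    def define(self, alias: str, global_name: str) -> None:
        self._aliases[alias] = parse_global_name(global_name)

    def resolve(self, name: str) -> str:
        # Names without an alias are assumed to be global names already.
        target = self._aliases.get(name)
        return str(target) if target is not None else name


aliases_at_s1 = AliasTable()
aliases_at_s1.define("local_branch", "S1.BRANCH.F3.C2")
resolved = aliases_at_s1.resolve("local_branch")
```

Keeping the table per site means the user at S1 never sees the site-qualified name, which is the point of restoring distribution transparency.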

24.14 What are the four levels of transactions defined in IBM’s DRDA? Compare and contrast these four levels. Give examples to illustrate your answer.

See Section 24.5.2 and Figure 24.14.

Exercises

A multinational engineering company has decided to distribute its project management information at the regional level in mainland Britain. The current centralized relational schema is as follows:

Employee (NIN, fName, lName, address, DOB, sex, salary, taxCode, deptNo)

Department (deptNo, deptName, managerNIN, businessAreaNo, regionNo)

Project (projNo, projName, contractPrice, projectManagerNIN, deptNo)

WorksOn (NIN, projNo, hoursWorked)

Business (businessAreaNo, businessAreaName)

Region (regionNo, regionName)

where Employee contains employee details and the national insurance number NIN is the key.

Department contains department details and deptNo is the key. managerNIN identifies the employee who is the manager of the department. There is only one manager for each department.

Project contains details of the projects in the company and the key is projNo. The project manager is identified by projectManagerNIN, and the department responsible for the project by deptNo.

WorksOn contains details of the hours worked by employees on each project and (NIN, projNo) forms the key.

Business contains names of the business areas and the key is businessAreaNo. Region contains names of the regions and the key is regionNo.

Departments are grouped regionally as follows:

Region 1: Scotland; Region 2: Wales; Region 3: England

Information is required by business area which covers: Software Engineering, Mechanical Engineering and Electrical Engineering. There is no Software Engineering in Wales and all Electrical Engineering departments are in England. Projects are staffed by local department offices.

As well as distributing the data regionally, there is an additional requirement to access the employee data either by personal information (by Personnel) or by work related information (by Payroll).

24.15 Draw an Entity-Relationship (ER) diagram to represent this system.

Crow’s foot notation used for expediency.

[Figure: ER diagram of Employee, WorksOn, Project, Business, Department, and Region]

24.16 Using the ER diagram from Exercise 24.15, produce a distributed database design for this system, and include:

(a) a suitable fragmentation schema for the system;

(b) in the case of primary horizontal fragmentation, a minimal set of predicates;

(c) the reconstruction of global relations from fragments.

State any assumptions necessary to support your design.

Possible solution as follows:

Don’t fragment Business or Region; replicate these relations at all sites, as they contain only a small number of records.

Department

Use primary horizontal fragmentation for Department with minterm predicates:

D1: Region = ‘Scotland’ and businessArea = ‘SE’

D2: Region = ‘Scotland’ and businessArea = ‘ME’

D3: Region = ‘Wales’ and businessArea = ‘ME’

D4: Region = ‘England’ and businessArea = ‘SE’

D5: Region = ‘England’ and businessArea = ‘ME’

D6: Region = ‘England’ and businessArea = ‘EL’

Reconstruction: D1 ∪ D2 ∪ D3 ∪ D4 ∪ D5 ∪ D6

Employee

Use vertical fragmentation for Employee:

E1: Π_{NIN, fName, lName, address, DOB, sex, deptNo}(Employee)

E2: Π_{NIN, salary, taxCode}(Employee)

Then use derived fragmentation on fragment E1:

E1i: E1 ⋉_{deptNo} Di, 1 ≤ i ≤ 6

Reconstruction: (E11 ∪ E12 ∪ E13 ∪ E14 ∪ E15 ∪ E16) ⋈_{NIN} E2

Projects

Use derived fragmentation for Projects:

Pi: Projects ⋉_{deptNo} Di, 1 ≤ i ≤ 6

Reconstruction: (P1 ∪ P2 ∪ P3 ∪ P4 ∪ P5 ∪ P6)

WorksOn

Use derived fragmentation for WorksOn:

Wi: WorksOn ⋉_{NIN} E1i, 1 ≤ i ≤ 6

24.17 Repeat Exercise 24.16 for the DreamHome case study documented in Appendix A.

Possible solution as follows:

Don’t fragment Branch; replicate this relation at all sites, as it contains only a small number of records.

PropertyForRent

Use primary horizontal fragmentation for PropertyForRent with minterm predicates (for example):

P1j: price ≤ 39999 AND branchNo = j

P2j: 40000 ≤ price ≤ 69999 AND branchNo = j

P3j: price ≥ 70000 AND branchNo = j

where 1 ≤ j ≤ maximum number of branches.

Reconstruction: ∪_{i=1..j} (P1i ∪ P2i ∪ P3i), where j is the maximum number of branches

Staff

Assume salaries paid by head office (branch 1 say), so use vertical fragmentation first:

S1: Π_{staffNo, fName, lName, branchNo}(Staff)

S2: Π_{staffNo, position, sex, DOB, salary}(Staff)

Then use horizontal fragmentation on fragment S1:

S1i: σ_{branchNo = i}(S1), 1 ≤ i ≤ j

Reconstruction: (S11 ∪ S12 ∪ S13 ∪ … ∪ S1j) ⋈_{staffNo} S2

Client

Use horizontal fragmentation:

Ci: σ_{branchNo = i}(Client), 1 ≤ i ≤ j

Reconstruction: (C1 ∪ C2 ∪ C3 ∪ … ∪ Cj)

Viewing, Owner

Use derived fragmentation for Viewing and Owner:

Vik: Viewing ⋉ Pik, 1 ≤ i ≤ 3, 1 ≤ k ≤ j

Oik: Owner ⋉ Pik, 1 ≤ i ≤ 3, 1 ≤ k ≤ j

Reconstruction: as for PropertyForRent

24.18 Repeat Exercise 24.16 for the EasyDrive School of Motoring case study documented in Appendix B.2.

24.19 Repeat Exercise 24.16 for the Wellmeadows case study documented in Appendix B.3.

24.20 In Section 24.5.1 when discussing naming transparency, we proposed the use of aliases to uniquely identify each replica of each fragment. Provide an outline design for the implementation of this approach to naming transparency.

FUNCTION map(name) {
    IF name appears in the replica table THEN
        result = name of replica of name;
    IF name appears in the fragment table THEN {
        result = expression to construct fragment;
        FOR each iname IN result {
            replace iname in result with map(iname);
        }
    }
    RETURN result;
}

IF name appears in the alias table THEN
    expression = map(name);
ELSE
    expression = name;

24.21 Compare a distributed DBMS that you have access to against Date’s 12 rules for a DDBMS. For each rule for which the system is not compliant, give your reasons why you think there is no conformance to this rule.

This is a small student project, the result of which is dependent on the system analyzed.

Chapter 25 Distributed DBMSs - Advanced Concepts

Review Questions

25.1 In a distributed environment, locking-based algorithms can be classified as centralized, primary copy, or distributed. Compare and contrast these algorithms.

See Section 25.2.3.

25.2 One of the most well-known methods for distributed deadlock detection was developed by Obermarck. Explain how Obermarck’s method works and how deadlock is detected and resolved.

See Section 25.3 under Distributed Deadlock Detection.

25.3 Outline two alternative two-phase commit topologies to the centralized topology.

Alternative topologies: linear and distributed 2PC. See end of Section 25.4.3.

25.4 Explain the term nonblocking protocol and explain why two-phase commit protocol is not a nonblocking protocol.

A nonblocking protocol should cater for both site and communication failures to ensure that the failure of one site will not affect processing at another site. In other words, operational sites should not be left blocked.

In the event that a participant has voted COMMIT but has not received the global decision and is unable to communicate with any other site that knows the decision, that site is blocked. Although 2PC has a cooperative termination protocol that reduces the likelihood of blocking, blocking is still possible, and the blocked process will just have to keep trying to unblock as failures are repaired.

25.5 Discuss how the three-phase commit protocol is a non-blocking protocol in the absence of complete site failure.

The basic idea of 3PC is to remove the uncertainty period for participants who have voted commit and are waiting for the global abort or global commit from the coordinator. 3PC introduces a third phase, called pre-commit, between voting and global decision. See Section 25.4.4.

25.6 Specify the layers of distributed query optimization and detail the function of each layer.

See start of Section 25.6.

25.7 Discuss the costs that need to be considered in distributed query optimization and discuss two different cost models.

See Section 25.6.3.

25.8 Discuss the distributed query optimization algorithms used by R* and SDD-1.

See Section 25.6.3.

25.9 Briefly describe the distributed functionality of Oracle11g.

See Section 25.7.1.

Exercises

25.10 You have been asked by the Managing Director of DreamHome to investigate the data distribution requirements of the organization and to prepare a report on the potential use of a distributed DBMS. The report should compare the technology of the centralized DBMS with that of the distributed DBMS, and should address the advantages and disadvantages of implementing a DDBMS within the organization, and any perceived problem areas. The report should also address the possibility of using a replication server to address the distribution requirements. Finally, the report should contain a fully justified set of recommendations proposing an appropriate solution. A well-presented report is expected. Justification must be given for any recommendations made.

25.11 Give full details of the centralized two-phase commit protocol in a distributed environment. Outline the algorithms for both coordinator and participants.

Algorithm (a) 2PC coordinator algorithm

begin
    STEP C1 VOTE INSTRUCTION
        write ‘begin global commit’ message to log
        send ‘vote’ message to all participants
        do until votes received from all participants
            wait
            on timeout go to STEP C2b
        end-do
    STEP C2a GLOBAL COMMIT
        if all votes are ‘commit’ then
        begin
            write ‘global commit’ record to log
            send ‘global commit’ to all participants
        end
    STEP C2b GLOBAL ABORT
        at least one participant has voted abort or coordinator has timed out
        else
        begin
            write ‘global abort’ record to log
            send ‘global abort’ to all participants
        end
        end-if
    STEP C3 TERMINATION
        do until acknowledgement received from all participants
            wait
        end-do
        write ‘end global transaction record’ to log
        finish
end

Algorithm (b) 2PC participants algorithm

begin
    STEP P0 WAIT FOR VOTE INSTRUCTION
        do until ‘vote’ instruction received from coordinator
            wait
        end-do
    STEP P1 VOTE
        if vote = ‘commit’ then send ‘commit’ to coordinator
        else send ‘abort’ and go to STEP P2b
        do until global vote received from coordinator
            wait
        end-do
    STEP P2a COMMIT
        if global vote = ‘commit’ then
            perform local commit processing
    STEP P2b ABORT
        at least one participant has voted abort
        else
            perform local abort processing
        end-if
    STEP P3 TERMINATION
        send acknowledgement to coordinator
        finish
end
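The coordinator and participant steps above can be sketched as a small single-process simulation. This is a simplified illustration, not a definitive implementation: the `Participant` class, its `will_commit` and `responds` flags, and the use of a plain list as the log are assumptions made for the example, and a missing vote stands in for the coordinator timing out.

```python
from enum import Enum
from typing import List, Optional


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Participant:
    """Toy participant; `will_commit` and `responds` are illustrative knobs."""

    def __init__(self, name: str, will_commit: bool = True, responds: bool = True) -> None:
        self.name = name
        self.will_commit = will_commit
        self.responds = responds
        self.state = "initial"

    def vote(self) -> Optional[Vote]:
        # Returning None models a vote that never arrives (coordinator timeout).
        if not self.responds:
            return None
        if self.will_commit:
            self.state = "prepared"
            return Vote.COMMIT
        self.state = "aborted"  # a participant that votes abort can abort at once
        return Vote.ABORT

    def decide(self, decision: str) -> bool:
        # STEP P2a / P2b: apply the global decision, then acknowledge (STEP P3).
        self.state = "committed" if decision == "global commit" else "aborted"
        return True


def run_coordinator(participants: List[Participant], log: List[str]) -> str:
    # STEP C1: log the start and collect the votes.
    log.append("begin global commit")
    votes = [p.vote() for p in participants]

    # STEP C2a / C2b: commit only if every vote arrived and every vote is 'commit'.
    if all(v is Vote.COMMIT for v in votes):
        decision = "global commit"
    else:
        decision = "global abort"
    log.append(decision)

    # STEP C3: send the decision and close the transaction once all have acknowledged.
    acks = [p.decide(decision) for p in participants if p.responds]
    if len(acks) == len(participants):
        log.append("end global transaction")
    return decision


commit_log: List[str] = []
outcome = run_coordinator([Participant("P1"), Participant("P2")], commit_log)
```

In this sketch an unreachable participant forces a global abort and leaves the transaction without its end record, mirroring the coordinator waiting in STEP C3 until every acknowledgement arrives.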

Algorithm Cooperative termination protocol for 2PC

begin
    do while P0 is blocked
        STEP 1 HELP REQUESTED FROM Pi
            P0 sends a message to Pi asking for help to un-block
            if Pi knows the decision (Pi received global commit/abort or Pi unilaterally aborted) then
            begin
                Pi conveys decision to P0
                P0 unblocks and finishes
            end
            end-if
        STEP 2 HAS Pi VOTED?
            if Pi has not voted then
            begin
                Pi unilaterally aborts
                P0 told to abort
                P0 unblocks and finishes
            end
            end-if
        STEP 3 Pi CANNOT HELP; TRY Pi+1
            next Pi
    end-do
end
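One pass of the cooperative termination protocol can be sketched as a scan over what each other participant knows. The status strings used below and the function name `cooperative_termination` are assumptions made for this example. The sketch makes a single pass over the peers, whereas the algorithm above keeps retrying while P0 remains blocked.

```python
from typing import List, Optional

# What each peer Pi can report, as assumed for this sketch:
#   "commit" / "abort" -> Pi knows the global decision (or aborted unilaterally)
#   "not voted"        -> Pi has not voted yet, so it may abort unilaterally
#   "uncertain"        -> Pi voted commit but has no decision, so it cannot help
#   None               -> Pi cannot be reached


def cooperative_termination(peers: List[Optional[str]]) -> Optional[str]:
    """Return the decision the blocked participant P0 learns, or None if still blocked."""
    for status in peers:
        if status is None:
            continue  # unreachable, try the next participant
        if status in ("commit", "abort"):
            return status  # STEP 1: Pi conveys the decision to P0
        if status == "not voted":
            return "abort"  # STEP 2: Pi aborts unilaterally, so P0 is told to abort
        # STEP 3: Pi cannot help; move on to Pi+1
    return None


learned = cooperative_termination(["uncertain", "commit"])
still_blocked = cooperative_termination(["uncertain", None])
```

The second call returns `None`, which corresponds to the blocking case: every reachable peer is itself uncertain, so P0 must wait for failures to be repaired.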

Algorithm 2PC participant restart following failure

begin
    do while Pr is blocked
        STEP 1 ASCERTAIN STATUS OF Pr IMMEDIATELY PRIOR TO FAILURE
            if Pr voted ‘commit’ then go to STEP 2
            else
            begin
                Pr voted ‘abort’ prior to failure or had not voted
                Pr aborts unilaterally
                Pr recovers independently and finishes
            end
            end-if
        STEP 2 IS GLOBAL DECISION KNOWN?
            if Pr knows global decision then
            begin
                Pr takes action in accordance with global decision
                Pr recovers independently and finishes
            end
            end-if
        STEP 3 Pr CANNOT RECOVER INDEPENDENTLY AND ASKS FOR HELP
            Pr asks for help from participant Pr+1 using the cooperative termination protocol
    end-do
end

25.12 Give full details of the three-phase commit protocol in a distributed environment. Outline the algorithms for both coordinator and participants.

Algorithm (a) 3PC coordinator algorithm

begin
    STEP C1 VOTE INSTRUCTION
        write ‘begin global commit’ message to log
        send ‘vote’ message to all participants
        do until votes received from all participants
            wait
            on timeout go to STEP C2b
        end-do
    STEP C2a PRE-COMMIT
        if all votes are ‘commit’ then
        begin
            write ‘pre-commit’ message to log
            send ‘pre-commit’ message to all participants
        end
    STEP C2b GLOBAL ABORT
        at least one participant has voted abort or coordinator has timed out
        else
        begin
            write ‘global abort’ record to log
            send ‘global abort’ to all participants
            go to STEP C4
        end
        end-if
    STEP C3 GLOBAL COMMIT
        do until all (pre-commit) acknowledgements received
            wait
        end-do
        write ‘global commit’ record to log
        send ‘global commit’ to all participants
    STEP C4 TERMINATION
        do until acknowledgement received from all participants
            wait
        end-do
        write ‘end global transaction record’ to log
        finish
end

Algorithm (b) 3PC participants algorithm

begin
    STEP P0 WAIT FOR VOTE INSTRUCTION
        do until ‘vote’ instruction received from coordinator
            wait
        end-do
    STEP P1 VOTE
        if participant is prepared to commit then send ‘commit’ message to coordinator
        else send ‘abort’ message to coordinator and go to STEP P2b
        do until global vote received from coordinator
            wait
        end-do
    STEP P2a PRE-COMMIT
        if global instruction = ‘pre-commit’ then
            go to STEP P3 (and wait for global commit)
    STEP P2b ABORT
        at least one participant has voted abort
        else
        begin
            perform local abort processing
            go to STEP P4
        end
        end-if
    STEP P3 COMMIT
        do until ‘global commit’ received from coordinator
            wait
        end-do
        perform local commit processing
    STEP P4 TERMINATION
        send acknowledgement to coordinator
        finish
end

25.13 Analyze the DBMSs that you are currently using and determine the support each provides for the X/Open DTP model and for data replication.

This is a small student project, the result of which is dependent on the system analyzed.

25.14 Consider five transactions T1, T2, T3, T4, and T5 with:

T1 initiated at site S1 and spawning an agent at site S2,

T2 initiated at site S3 and spawning an agent at site S1,

T3 initiated at site S1 and spawning an agent at site S3,

T4 initiated at site S2 and spawning an agent at site S3,

T5 initiated at site S3.

The locking information for these transactions is shown in the following table.

(a) Produce the local wait-for-graphs (WFGs) for each of the sites. What can you conclude from the local WFGs?

Conclusion: There is no local deadlock at any site.

(b) Using the above transactions, demonstrate how Obermarck’s method for distributed deadlock detection works. What can you conclude from the global WFG?

Cycle at Site 1, so move the WFG from Site 1 to Site 3. The resulting WFG for Sites 1 and 3 shows a cycle, which implies that the system is in global deadlock and one of the transactions must be selected to be aborted and restarted.
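The effect of combining local WFGs can be illustrated with a short cycle check. This is a simplified sketch under stated assumptions: the edges below are hypothetical and are not taken from the exercise's locking table, and whole graphs are merged directly rather than following Obermarck's external-node and transmission rules. The names `find_cycle` and `merge` are made up for the example.

```python
from typing import Dict, List, Optional


def find_cycle(wfg: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph as a list of transactions, or None."""
    visiting: List[str] = []  # current depth-first search path
    done = set()              # nodes fully explored without finding a cycle

    def visit(node: str) -> Optional[List[str]]:
        if node in visiting:
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in wfg.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in wfg:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


def merge(*graphs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Combine local WFGs, as happens when one site ships its graph to another."""
    merged: Dict[str, List[str]] = {}
    for graph in graphs:
        for node, waits_for in graph.items():
            merged.setdefault(node, [])
            for other in waits_for:
                if other not in merged[node]:
                    merged[node].append(other)
    return merged


# Hypothetical edges: "T1": ["T2"] means T1 is waiting for T2.
site_a = {"T1": ["T2"]}
site_b = {"T2": ["T1"]}
global_cycle = find_cycle(merge(site_a, site_b))
```

Neither hypothetical local graph contains a cycle on its own, yet the merged graph does, which is the situation the exercise describes: no local deadlock, but a global one.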

[Figures: local wait-for graphs for Site 1, Site 2, and Site 3, and the combined wait-for graph for Sites 1 and 3, over transactions T1 to T5 and the external node Text]

Chapter 26 Replication and Mobile Databases

Review Questions

26.1 What is data replication?

Data replication is the process of generating and reproducing multiple copies of data at one or more sites.

26.2 Identify the benefits of using replication in a distributed system.

For a discussion of benefits see Section 26.1.

26.3 Provide examples of typical applications that use replication.

For a discussion of typical applications see Section 26.1.

26.4 Compare and contrast eager with lazy replication.

In Chapter 25 we examined protocols for updating data that worked on the basis that all updates are carried out as part of the enclosing transaction. This was necessary because a distributed transaction accesses different fragments on different sites; in other words, the updates are immediately applied at every site. Atomicity is ensured by using the 2PC (two-phase commit) protocol. The immediate propagation of updates in a replicated database is called eager or synchronous update propagation. Eager update propagation ensures that all copies are updated within the enclosing transaction, and voting at the end ensures atomicity.

An alternative mechanism to eager replication is called lazy or asynchronous update propagation. With this mechanism, the target database is updated after the source database has been modified. The delay in regaining consistency may range from a few seconds to several hours or even days. However, the data eventually synchronizes to the same value at all sites (eventual consistency). Although not all applications can cope with such a delay, it appears to be a practical compromise between data integrity and availability that may be more appropriate for organizations that are able to work with replicas that do not necessarily have to be always synchronized and current.
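The contrast between the two propagation styles can be shown with a toy in-memory model. The classes `Replica`, `EagerPrimary`, and `LazyPrimary` are hypothetical names for this sketch, which assumes a single updating site and ignores failures; in a real eager scheme the loop over replicas would be protected by a commit protocol such as 2PC.

```python
from typing import Dict, List, Tuple


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, int] = {}


class EagerPrimary:
    """Eager (synchronous) propagation: every copy is updated inside the write."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.data: Dict[str, int] = {}
        self.replicas = replicas

    def write(self, key: str, value: int) -> None:
        self.data[key] = value
        for replica in self.replicas:  # all copies change before the write returns
            replica.data[key] = value


class LazyPrimary:
    """Lazy (asynchronous) propagation: updates are queued and shipped later."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.data: Dict[str, int] = {}
        self.replicas = replicas
        self.pending: List[Tuple[str, int]] = []

    def write(self, key: str, value: int) -> None:
        self.data[key] = value
        self.pending.append((key, value))  # replicas stay stale until propagate()

    def propagate(self) -> None:
        # Apply queued updates in order so every replica converges on the same value.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


eager_copy, lazy_copy = Replica(), Replica()
EagerPrimary([eager_copy]).write("price", 450)
lazy_source = LazyPrimary([lazy_copy])
lazy_source.write("price", 450)
stale_view = dict(lazy_copy.data)  # snapshot taken before propagation
lazy_source.propagate()
```

The snapshot taken before `propagate()` is empty, which is the window of inconsistency that lazy replication accepts in exchange for not holding the writer until every copy has been updated.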

26.5 What is a replication server?

The replication server is an alternative and potentially simpler approach to data distribution. The functionality of a replication server is described in Section 26.2.2.

26.6 Compare and contrast the different types of data ownership models available in the replication environment. Provide an example for each model.

The main types of ownership are primary/secondary copy, workflow, and update-anywhere, sometimes referred to as peer-to-peer or symmetric replication. See Section 26.2.5.

26.7 Discuss the functionality required of a replication server.

At its basic level, we expect a distributed data replication service to be capable of copying data from one database to another, synchronously or asynchronously. However, there are many other functions that need to be provided, including:

• Scalability - The service should be able to handle the replication of both small and large volumes of data.

• Mapping and transformation - The service should be able to handle replication across heterogeneous DBMSs and platforms. As we noted in Section 24.1.3, this may involve mapping and transforming the data from one data model into a different data model, or the data in one data type to a corresponding data type in another DBMS.

• Object replication - It should be possible to replicate objects other than data. For example, some systems allow indexes and stored procedures (or triggers) to be replicated.

• Specification of replication schema - The system should provide a mechanism to allow a privileged user to specify the data and objects to be replicated.

• Subscription mechanism - The system should provide a mechanism to allow a privileged user to subscribe to the data and objects available for replication.

• Initialization mechanism - The system should provide a mechanism to allow for the initialization of a target replica.

• Easy administration - It should be easy for the DBA to administer the system and to check the status and monitor the performance of the replication system components.

26.8 Discuss the implementation issues associated with replication.

Some implementation issues associated with the provision of data replication by the replication server include:

• transactional updates;

• conflict detection and resolution.

For discussion see Section 26.3.

26.9 Discuss how mobile databases support the mobile worker.

We are currently witnessing increasing demands on mobile computing to provide the types of support required by a growing number of mobile workers. Such individuals need to work as if in the office, but in reality they are working from remote locations including homes, clients’ premises, or simply while en route to remote locations. The ‘office’ may accompany a remote worker in the form of a laptop, smartphone, tablet, or other Internet access device. With the rapid expansion of cellular, wireless, and satellite communications, it will soon be possible for mobile users to access any data, anywhere, at any time. However, business etiquette, practicalities, security, and costs may still limit communication such that it is not possible to establish online connections for as long as users want, whenever they want. Mobile databases offer a solution for some of these restrictions.

See Section 26.4.

26.10 Describe the functionality required of a mobile DBMS.

All the major DBMS vendors now offer a mobile DBMS. In fact, this development is partly responsible for driving the current dramatic growth in sales for the major DBMS vendors. Most vendors promote their mobile DBMS as being capable of communicating with a range of major relational DBMSs and of providing database services that require limited computing resources to match those currently provided by mobile devices. The additional functionality required of mobile DBMSs includes the ability to:

• communicate with the centralized database server through modes such as wireless or Internet access;

• replicate data on the centralized database server and mobile device;

• synchronize data on the centralized database server and mobile device;

• capture data from various sources such as the Internet;

• manage data on the mobile device;

• analyze data on a mobile device;

• create customized mobile applications.

26.11 Discuss the issues associated with mobile DBMSs.

Section 26.4.2 discusses issues with mobile DBMSs. Of particular interest are issues relating to:

• Security

• Transactions

• Query processing

• Query optimization

26.12 Discuss the Kangaroo Transaction model.

Exercises

26.13 You are requested to undertake a consultancy on behalf of the Managing Director of DreamHome that requires an investigation into the data distribution requirements of the organization and to prepare a report on the potential use of a database replication server. The report should compare the technology of the centralized DBMS with that of the replication server, and should address the advantages and disadvantages of implementing database replication within the organization, and any perceived problem areas. The report should also address the possibility of using a replication server to address the distribution requirements. Finally, the report should contain a fully justified set of recommendations proposing an appropriate solution.

The format and the appropriate content to be covered in answering this question are described in the question itself.

26.14 You are requested to undertake a consultancy on behalf of the Managing Director of DreamHome to investigate how mobile database technology could be used within the organization. The result of the investigation should be presented as a report that discusses the potential benefits associated with mobile computing and the issues associated with exploiting mobile database technology for an organization. The report should also contain a fully justified set of recommendations proposing an appropriate way forward for DreamHome.

The format and the appropriate content to be covered in answering this question are described in the question itself.
