
The OlivaNova Model Execution System (ONME) and its Optimization through Linguistic Validation Methods

Günther Fliedl Christian Winkler Horst A. Kandutsch Department of Applied Computer Science Department of Linguistics and Computational Linguistics Universität Klagenfurt, Universitätsstraße 65-67, A-9020 Klagenfurt +43 463 2700 2814 Guenther.Fliedl@uni-klu.ac.at Christian.Winkler@uni-klu.ac.at Horst.Kandutsch@softwaregutachten.at

Abstract: Independent studies show that MDA and CASE approaches give software companies a clear advantage in the increasingly tough competition among international enterprises. Astonishingly, these promising tools still lack an adequate position on the market, mostly for the following reasons: the development departments' fear of losing power and importance within their companies, and the allegedly scarce possibilities for project supervision by management. ONME (OlivaNova Model Execution System) not only offers arguments against such objections, it goes beyond them: it provides project supervision and avoids logical (and other) errors with the help of linguistic methods in a standardized development process. The present paper gives an overview of model-driven source code generation by means of the programming machine, an outlook on current work, and some indispensable linguistic reflections. Key words: Model Driven Software Development (MDSD), Natural Language Processing (NLP), Source Code Validation



1. MODEL DRIVEN SOFTWARE DEVELOPMENT (MDSD): THE CLASSICAL PROGRAMMING APPROACH

With respect to MDA in general, some studies by an internationally acclaimed benchmark group [1] are worth mentioning. They have confirmed that with the OlivaNova Model Execution System (ONME1), as described below, a 23-fold increase in productivity is achieved compared with similar processes and products on the market. In detail, the studies compared the performance of six new development projects carried out by CARE Technologies, who implemented and used ONME, against a peer group and against a previous CARE study from 2001. The peer group of companies was selected using the following criteria [1]:

– Studies completed in the last 18 months

– Project type: new development

– Platforms: Web, Windows

– Technologies: Web development (including JSP, EJB, ActiveX, COM+, ColdFusion)

Briefly, the studies' results show that all measures exceeded the peer group average by far and were a marked improvement over the previous study. CARE development tools and processes deliver productivity levels that greatly exceed their peers' in every lifecycle phase. Overall productivity is, as mentioned before, 23 times higher; consequently, time to market is 6 times faster and defect rates are 15 times lower than the peer group's [1]. The following chart (Figure 1) shows the effort of the projects in function points [10] and the productivity of the peer groups and CARE.

[Chart omitted: Productivity/Effort (FPs/Day) for the projects Escrituras, D Tecnico, Control Project, NUCA, Ventas, and Visitas, comparing the series CARE03, CARE01, PAVG03, and PAVG01.]

Figure 1: Productivity Chart [1]

As Barry Boehm, Director of the University of Southern California Center for Software Engineering, puts it, software's complexity and accelerated development schedules make avoiding defects difficult. He also claims that finding and fixing a software problem after delivery is often 100 times more expensive than finding and fixing it during the requirements and design phase [2]. The following excerpt from the studies' defect detection results (Figure 2) clearly illustrates these facts:

1 The OlivaNova Model Execution System offers a so-called ON-Modeler. This object-oriented conceptual modelling tool allows the creation, editing, and formal validation of high-level conceptual models.



[Chart omitted: Defect Detection (%) per lifecycle phase (Requirements, Design, Detailed Design, Coding and Unit Tests, External Testing, Acceptance Tests, Installation, Post Start-up) for CARE03, CARE01, PAVG03, and PAVG01.]

Figure 2: Defect Detection Chart [1]

As can be seen, defects were recognised and fixed earlier when using the CARE model-driven approach for development. Having mentioned some comparative results, the absolute number of defects needs to be considered as well. The following chart (Figure 3) shows the comparison of defect rates measured during the studies. This striking result can be explained by one of the big advantages of MDSD: well-defined, optimized, and proven code is generated corresponding to the developed model. Applications generated by the discussed CARE system have been certified by Microsoft as so-called Microsoft Certified products [10].

[Chart omitted: Defect Rate comparison for CARE03, CARE01, PAVG03, and PAVG01; the plotted values range from 7 to 149.]

Figure 3: Defect Rate Chart [1]




Automatically generated program code stands out in comparison with manually programmed software through its higher quality and more comprehensive logical consistency. The result is instant, error-free, and completely functioning program code. Oscar Pastor, inventor of ONME and professor for object-oriented development methods at the Valencia University of Technology, and Juan Carlos Molina, Research and Development Manager of CARE, put it as follows: "It seems that the time of Model Transformation Technologies is finally here." [3]

2. AUTOMATED VALIDATION OF SPECIFICATION AGAINST SOURCECODE

The PhD thesis Avanguide (Automated Validation of Specification against Sourcecode) [14] presents experiments on the comparison of specifications and the resulting source code, using the ONME to gather source examples from complete information systems. "It's no secret that software sucks" [7] is one of the main motivations for Avanguide's determining the coverage between natural language (as used in the specification) and the resulting source code. As illustrated in Figure 4 (below), a set of requirements R should correspond to its resulting model (as XML files) and, after compilation through the transformation engines, to the generated source code S. Depending on the maturity of the implementation, there will be a measurable value for this coverage, here called C.

[Diagram omitted: requirements R are transformed into a model (XML files) and then into the source code S; the coverage C between R and S is to be determined.]

Figure 4: Coverage Detection

Accordingly, there are three major tasks to deal with (see the sketch after this list):

1. Information retrieval (IR) on unstructured natural language, with the result R′. For this major task, different linguistic toolsets (MinTool with the result R′, MedTool with the result R′′, MaxTool with the result R′′′), distinguished by their specific linguistic potential, will be defined.

2. IR on structured code, with the result S′. This task is of lower complexity because of the well-defined keywords and structure of source code. Also, unique semantic information should be easy to determine.

3. The comparison of R′ and S′, here called c′ (coverage), and the comparison of R′′ and S′′, called c′′.
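To make task 3 concrete, the following is a minimal sketch of how a word-level coverage value c′ might be computed. The tokenization, the identifier-splitting heuristic, and the formula c′ = |R′ ∩ S′| / |R′| are illustrative assumptions, not Avanguide's actual implementation:

```python
import re

def extract_terms(text: str) -> set[str]:
    """R': the lower-cased word set of a natural language requirement."""
    return {w.lower() for w in re.findall(r"[A-Za-z]+", text)}

def extract_identifiers(source: str) -> set[str]:
    """S': identifier fragments from source code, split on camel case."""
    parts = set()
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        parts.update(p.lower() for p in re.findall(r"[A-Z]?[a-z]+", ident))
    return parts

def coverage(requirement: str, source: str) -> float:
    """Assumed measure: fraction of requirement terms recovered in the code."""
    r_prime = extract_terms(requirement)
    s_prime = extract_identifiers(source)
    return len(r_prime & s_prime) / len(r_prime) if r_prime else 0.0

# Example using the requirement R1 discussed later in this paper:
req = "Administrators are allowed to create statistics about the co-workers costs."
src = "class Administrator { void createStatistics(Coworker c) { /* ... */ } }"
print(f"c' = {coverage(req, src):.2f}")  # low without further linguistic processing
```

Note that in this naive form "Administrators" fails to match "Administrator"; this is precisely where the lemmatization of Section 3 comes in.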



[Diagram omitted: MinTool, MedTool, and MaxTool extract R′, R′′, and R′′′ from the requirements R and compare them with S′, S′′, and S′′′ from the source code S, yielding the coverage values c′, c′′, and c′′′.]

Figure 5: Extraction of Information in three steps
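To give a feel for the layering in Figure 5, here is a deliberately naive sketch of the three extraction levels; the stop-word and verb lists and all heuristics are placeholder assumptions standing in for the real tools:

```python
import re

# Toy stand-ins for the three extraction levels of Figure 5. The real
# MinTool, MedTool and MaxTool are far richer; only the layering matters here.

STOP = {"are", "to", "about", "the", "this", "not", "of", "and"}
VERBS = {"create", "see", "allowed", "be"}

def min_tool(sentence: str) -> list[str]:
    """R': significant words (stop words removed)."""
    return [w for w in re.findall(r"[a-z-]+", sentence.lower()) if w not in STOP]

def med_tool(sentence: str) -> list[str]:
    """R'': crude noun phrases, i.e. runs of significant non-verb words."""
    phrases, current = [], []
    for w in min_tool(sentence):
        if w in VERBS:
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(" ".join(current))
    return phrases

def max_tool(sentence: str):
    """R''': one predicate-argument triple (verb, left args, right args)."""
    words = min_tool(sentence)
    for i, w in enumerate(words):
        if w in {"create", "see"}:
            return (w, " ".join(words[:i]), " ".join(words[i + 1:]))
    return None

s = "Administrators are allowed to create statistics about the co-workers costs."
print(min_tool(s))  # R'
print(med_tool(s))  # R''
print(max_tool(s))  # R'''
```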

The most sophisticated task is without doubt the extraction of significant words (R′), phrases (R′′), and later on semantics (R′′′), as shown in Figure 5. As a matter of principle, conclusions can also be drawn from any variation of C in general, from C in relation to the function point count (FPC) [9] of a project, and from the deviations between c′′′, c′′, and c′ as well as between R′′′, R′′, R′, and R. As an example, it could be suggested that a wide difference between R′′′ and R′′ allows conclusions about the complexity of the resulting software [8]. As another example, an ascending FPC while C remains unaltered could indicate non-essential work associated with unnecessary costs. Determining C thus also makes it possible to anticipate unnecessary costs, to automatically monitor the progress of software projects (as required in most process models), and to find mistakes and non-essential work that may lead to bad usability [7].

2.1 Feasibility of determining C

In order to simplify the idea of detecting coverage between requirements and source code, let us consider a brief and simplified example. Assume that somewhere in the requirements document we find the following requirements R1:

– The users of cost statistics are administrators and co-workers.

– Administrators are allowed to create statistics about the co-workers' costs.

– Co-workers are not allowed to see this statistics.

The resulting model would be very simple to create and easy to understand (Figure 6).




Figure 6: Example Object Model

Presumably, there should be an indicator in the source code that administrator and co-worker are agents with attributes like username and password. The administrator class should be able to view the attributes of the class statistic via an interface. The existence of such an interface for co-worker would be not only superfluous but even a mistake. Other assumptions concern the possibilities in the resulting system to create statistics and costs, and so on. These assumptions are very easily readable in the resulting XML file, and less easily, but still readable, in the resulting source code. The following example (Figure 7) shows the re-occurrence of the implicit requirement administrators and co-workers have passwords to log on to the system in the resulting source code. From this fact we can infer that the implementation of the software system treats the objects administrator and co-worker as users.

The users of coststatistics are Administrators and Coworkers

Figure 7: Source code example 1
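Such checks lend themselves to automation. The sketch below probes a model file for the agent classes and login attributes postulated above; the XML structure shown is a made-up stand-in for illustration, not ONME's actual model format:

```python
import xml.etree.ElementTree as ET

# Hypothetical model fragment; ONME's real XML schema will differ.
MODEL = """
<model>
  <class name="Administrator" agent="true">
    <attribute name="username"/><attribute name="password"/>
    <interface view="Statistic"/>
  </class>
  <class name="Coworker" agent="true">
    <attribute name="username"/><attribute name="password"/>
  </class>
  <class name="Statistic"/>
</model>
"""

def check_agents(xml_text: str, expected_agents: list[str]) -> dict[str, bool]:
    """Verify that each expected agent class exists, is marked as an agent,
    and carries username/password attributes."""
    root = ET.fromstring(xml_text)
    result = {}
    for name in expected_agents:
        cls = root.find(f"./class[@name='{name}']")
        result[name] = (cls is not None and cls.get("agent") == "true"
                        and cls.find("./attribute[@name='username']") is not None
                        and cls.find("./attribute[@name='password']") is not None)
    return result

def coworker_sees_statistics(xml_text: str) -> bool:
    """R1 forbids this: a Coworker interface onto Statistic would be a defect."""
    root = ET.fromstring(xml_text)
    cw = root.find("./class[@name='Coworker']")
    return cw is not None and cw.find("./interface[@view='Statistic']") is not None

print(check_agents(MODEL, ["Administrator", "Coworker"]))  # both True
print(coworker_sees_statistics(MODEL))                     # False, as required
```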




The explicit requirement that administrator is a class is reflected in the CREATE statement of the generated SQL script for the system's database, or in the administrator class in the project itself. As demonstrated in this example, it would be easy to match information gathered from natural language with the correlating information extracted from structured source code. However, beyond R1 with its obvious structure, real-world circumstances have not yet been examined. Avanguide has to struggle with problems well known in (computational) linguistics, such as ambiguity, redundancy, typographic and orthographic mistakes, acronyms and abbreviations, and many more. Requirements delivered by partners like ASFINAG GmbH, Mazda Austria, and Microsoft Austria are quite helpful in analyzing these real-world circumstances. To sum up, Avanguide can be seen as a bilateral chance for developers and clients to obtain an objective report about the quality of the work already done.

3. LINGUISTIC METHODS

Linguistic analysis is based on automated methods of natural language processing:

– Tagging
– Chunking
– Predicate structure identification
– Lemmatizing

The tagger employed for English texts is based on rules as proposed by Eric Brill [12]. MontyKlu was developed in 2004 at Klagenfurt University. The tool is based on MontyLingua, a natural language processing engine primarily developed by Hugo Liu at the MIT Media Lab using the Python programming language, which is described as an end-to-end natural language processor with common sense [13]. MontyKlu adds the following functionality to MontyLingua [11]:

– GUI
– Editing facility for the lexicon and the rule complex
– Usability with or without database
– XML output
– Web interface: http://montyklu.knospi.com

As for the example R1, with the help of the MontyKlu tagger all NNs and NNPs can be identified, as shown below:

Tagger:
The/DT users/NNS of/IN coststatistics/NN are/VBP Administrators/NNPS and/CC Coworkers/NNPS ./.
Administrators/NNPS are/VBP allowed/VBN to/TO create/VB statistics/NNS about/IN the/DT Coworkers/NNPS costs/NNS ./.
Coworkers/NNPS are/VBP not/RB allowed/VBN to/TO see/VB this/DT statistics/NNS ./.

All these items are candidates for class names or attributes; after their lemmatization they can be found in the source code or in the database, as shown by the lemmatizer output below.

Lemmatizer:
The/DT/The users/NNS/user of/IN/of coststatistics/NN/coststatistic are/VBP/be Administrators/NNPS/Administrator and/CC/and Coworkers/NNPS/Coworker ././.
Administrators/NNPS/Administrator are/VBP/be allowed/VBN/allow to/TO/to create/VB/create statistics/NNS/statistics about/IN/about the/DT/the Coworkers/NNPS/Coworker costs/NNS/cost ././.
Coworkers/NNPS/Coworker are/VBP/be not/RB/not allowed/VBN/allow to/TO/to see/VB/see this/DT/this statistics/NNS/statistics ././.

The respective stem forms of the NNPs now occur as object names. With the help of the chunker it can be determined whether there are specially related noun phrases, not necessarily representing classes or attributes. The following output of the MontyKlu chunker makes clear that natural language patterns under certain circumstances match patterns in the source code.

Chunker:
(NX The/DT users/NNS NX) of/IN (NX coststatistics/NN NX) (VX are/VBP VX) (NX Administrators/NNPS and/CC Coworkers/NNP NX) ./.
(NX Administrators/NNPS NX) (VX are/VBP allowed/VBN to/TO create/VB VX) (NX statistics/NNS NX) about/IN (NX the/DT Coworkers/NNP costs/NNS NX) ./.
(NX Coworkers/NNP NX) (VX are/VBP not/RB allowed/VBN to/TO see/VB VX) (NX this/DT statistics/NNS NX) ./.
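A brief sketch of how such output could be consumed to collect class-name and attribute candidates; the word/TAG/lemma token format is taken from the lemmatizer output above, while the function name and the tag filter are assumptions:

```python
# Collect candidate class/attribute names from lemmatizer output in the
# word/TAG/lemma format shown above. NN* tags mark nouns; their lemmas are
# the candidates later looked for in the generated model and source code.

LEMMATIZER_OUTPUT = (
    "The/DT/The users/NNS/user of/IN/of coststatistics/NN/coststatistic "
    "are/VBP/be Administrators/NNPS/Administrator and/CC/and "
    "Coworkers/NNPS/Coworker ././."
)

def noun_candidates(tagged: str) -> set[str]:
    """Return the lemmas of all NN/NNS/NNP/NNPS tokens."""
    candidates = set()
    for token in tagged.split():
        parts = token.split("/")
        if len(parts) == 3:
            word, tag, lemma = parts
            if tag.startswith("NN"):
                candidates.add(lemma)
    return candidates

print(noun_candidates(LEMMATIZER_OUTPUT))
# {'user', 'coststatistic', 'Administrator', 'Coworker'}
```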


ICSSEA 2008-9 Fliedl/Winkler/Kandusch

Furthermore, the chunker example shows the direct relationship between co-workers and costs. This relation is found in the source code, but also as a relation in the object model and in the chunking element NX. Sense evaluation offers the possibility to decide whether predicate-argument structures are relevant for the implementation. Phrases of the type (full) verb + noun, for instance, are candidates for relevant structures.

List of predicates:
("be" "coststatistic" "Administrator and Coworker")
("create" "Administrator" "statistics" "about Coworker cost")
("not see" "Coworker" "statistics")

The partial pattern "create" "Administrator" "statistics" is represented in the statistics GUI (Figure 8). The administrator is allowed to enter this GUI, where a button "create" allows creating new statistics.

Administrators can create statistics.

Figure 8: Design example 1
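One way to exploit such predicate lists, sketched here under assumptions: each extracted triple (verb, agent, object) is checked against the actions the generated system actually offers. The allowed-actions table below is invented for illustration; in practice it would be derived from the generated model or GUI definitions.

```python
# Match extracted predicate structures against the actions offered by the
# generated system. ALLOWED is a hypothetical stand-in for what would really
# be read out of the generated model or GUI definitions.

ALLOWED = {
    ("create", "Administrator", "statistics"),  # the "create" button in Figure 8
}

PREDICATES = [
    ("create", "Administrator", "statistics"),
    ("not see", "Coworker", "statistics"),
]

def validate(predicates, allowed):
    """Positive requirements must be offered; negated ones must not be."""
    findings = []
    for verb, agent, obj in predicates:
        negated = verb.startswith("not ")
        action = (verb.removeprefix("not "), agent, obj)
        offered = action in allowed
        if negated and offered:
            findings.append(f"DEFECT: {agent} must not {action[0]} {obj}")
        elif not negated and not offered:
            findings.append(f"MISSING: {agent} should be able to {verb} {obj}")
    return findings or ["all predicate structures covered"]

print(validate(PREDICATES, ALLOWED))
```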

To summarize, it is worth underlining that with the help of linguistic tools it is possible to identify candidates in natural language texts corresponding to those in the respective source code of the model involved, which is a promising way of validation.




Conclusion

According to independent studies, MDA and CASE approaches offer clear advantages to competing software companies. The OlivaNova Model Execution System provides project supervision with the help of linguistic methods in a standardized development process, thus avoiding not only logical errors. Avanguide (Automated Validation of Specification against Sourcecode) [14] presents experiments on the comparison of specifications and the resulting source code, using the ONME to gather source examples from complete information systems. MontyKlu, based on MontyLingua, provides linguistic analysis based on automated methods of natural language processing: tagging, chunking, predicate structure identification, and lemmatizing. Identifying candidates in natural language texts corresponding to those in the respective source code of the model might be a promising way of validation.

References

[1] Computer-Aided Design of User Interfaces IV. Proceedings of the Fifth International Conference on Computer-Aided Design of User Interfaces CADUI'2004, sponsored by ACM and jointly organised with the Eighth ACM International Conference on Intelligent User Interfaces IUI'2004, 13-16 January 2004, Funchal, Isle of Madeira.

[2] Barry Boehm, Victor R. Basili: Software Defect Reduction Top 10 List; Computer, January 2001 (Vol. 34, No. 1).

[3] Oscar Pastor, Juan Carlos Molina: Model-Driven Architecture in Practice; Springer, 2007; ISBN 9783540718673.

[4] Joachim Fischer: CARE 2004, CARE Technologies, by courtesy of Joachim Fischer, head of integranova GmbH; www.care-t.com

[5] Joachim Fischer: integranova 2008, Company Profile integranova, by courtesy of Joachim Fischer, head of integranova GmbH; www.integranova.de

[6] Joachim Fischer: CARE 2004, Introduction OlivaNova Modeler, by courtesy of Joachim Fischer, head of integranova GmbH; www.integranova.de

[7] David S. Platt: Why Software Sucks ... and What You Can Do About It; Addison-Wesley, 2006; ISBN 0321466756.

[8] Maurice H. Halstead: Elements of Software Science (Operating and Programming Systems Series, Volume 7); New York, Elsevier, 1977.

[9] Benjamin Poensgen: Function-Point-Analyse; dpunkt.verlag, 2005; ISBN 3898643328.

[10] Microsoft Certified 2005: http://www.microsoft.com/germany/government/newsletter/april10.mspx

[11] MontyKlu 2005: http://Montyklu.knospi.com

[12] E. Brill: Some advances in transformation-based part of speech tagging; in AAAI-94, Seattle, WA, 1994.

[13] H. Liu: MontyLingua: An end-to-end natural language processor with common sense; 2004, http://web.media.mit.edu/~hugo/montylingua/

[14] Horst Kandutsch: Avanguide (Automated Validation of Specification against Sourcecode); doctoral thesis, Universität Klagenfurt (work in progress).


