
Article

Developing a Chance Discovery Method with Consideration of Sequential Relationship

Journal of Information Science XX (X) pp. 1-19 © The Author(s) 2013 Reprints and Permissions: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/016555150000000 jis.sagepub.com

Cho-Wei Shih Institute of Manufacturing Information and System, NCKU, Taiwan

Shih-Cheng Chang Institute of Manufacturing Information and System, NCKU, Taiwan

Hui-Chuan Chu Department of Special Education, NUTN, Taiwan

Yuh-Min Chen Institute of Manufacturing Information and System, NCKU, Taiwan

Abstract With the richness of the Internet, information such as word of mouth and product appraisals has become easy to obtain, and it supports product innovation and service improvement for enterprises. However, because markets change frequently, variations in time, causal events, and user ratings can form different connections between events. Understanding these connections has become an important factor for enterprises seeking to seize opportunities ahead of competitors. This study proposes a chance discovery method that considers sequential relationships and integrates text mining and semantic analysis to explore general, time, causal, and hierarchical relations. The method can assist enterprises in understanding market trends, detecting change, and enhancing competitiveness.

Keywords Chance discovery; text mining; semantic analysis; sequential relationship

1. Introduction Internet technology is developing rapidly and has become the fastest growing new medium. The abundance of easily accessible information on the Internet, including word of mouth and evaluations, provides insight into markets and consumer responses, and this information has already become a crucial reference for enterprise product innovation and service improvement [1]. Therefore, companies have started using Internet resources to create business value by understanding market demands [2], identifying market trends [3], and exploring potential opportunities [4]. The approaches to creating business value include discovering new technologies, improving existing products, understanding customer preferences, and providing new services. Understanding market trends is a critical process within these approaches because it affects final product (or service) acceptance on the market [5]. Many studies have been conducted in this area [6]–[9]. However, numerous dimensions must be considered when identifying market trends and exploring potential opportunities, including consumer, company, and overall market dimensions [10]. Additionally, because the relationships among these dimensions are complex, companies may make incorrect decisions if they do not understand market trends and demands [11]. Corresponding author: Cho-Wei Shih, No.1, University Road, Tainan City 701, Taiwan cwshih@imis.ncku.edu.tw


Previous scholars [12] have proposed the chance discovery model for identifying market chances (such as an issue name, company name, or event) to understand customer demands and to identify new technologies that have development potential. Chance discovery refers to understanding a previously unnoticed event (or situation) that rarely appears, yet is significant and has a crucial influence on decision-making processes [13],[14]. Companies can use the chance discovery model to identify key factors that influence customers' purchase decisions, understand customer demands, master rare technologies, and assess whether development potential exists. This enables companies to engage in business development, create business value, and enhance competitiveness. Previous scholars have used patent documents [15] and industry news [4] to conduct market chance discovery. However, these data were organized based on individual subjective perceptions instead of referring directly to consumers' experiences and feelings. Patent documents and industry news are insufficient for understanding consumer demands and preferences because these data are not representative enough and analyzing them consumes large amounts of time and energy. Conducting market chance discovery on online forums and blog posts, which contain consumers' most direct word of mouth and evaluations, enables an understanding of market dynamics and consumer responses [16]. Therefore, consumer preferences can be analyzed immediately, real market chances extracted, and products or services that satisfy market demands developed. Previous studies [17]–[19] have focused primarily on explaining the possible scenarios among different events. However, events typically involve chain effects, such as cause and effect relationships, time contexts, and semantic structures [3]. Therefore, what is observed is sometimes only the surface or a fragment of an event, instead of its real essence or core [20]. If the real sequence and essence of events is not understood, problems may appear, such as missing chances for novel technologies that have development potential [4] or suffering the consequences of cooperation with poorly managed companies [3]. Therefore, identifying the sequential relationships (time, causal, and hierarchical sequences) of events facilitates company decision makers' overall understanding of events and allows for appropriate decisions. In summary, this study designed a chance discovery model with consideration of sequential relationship (SRCDM) and developed a chance discovery method based on this model. By identifying the sequential relationships among events, this method can be used to examine market trends, facilitate companies in identifying market chances, and enhance company competitiveness. To accomplish these goals, this study used blog posts to examine the time, causal, and hierarchical relations, as well as association rules, among various events. Additionally, this study analyzed the chance values of the events or elements to assist companies in understanding the sequential relationships among events, such as causes and effects. This enables companies to identify market chances that have development potential.


Figure 1 Chance Discovery Model with Consideration of Sequential Relationship (SRCDM)

2. Design and Modelling Chance Discovery This section reviews the categories of market chances and previous chance discovery models in order to design a market chance discovery model and process that consider sequential relationships.


2.1. Design of the chance discovery model Market chances can be classified into the following five categories: people, event, time, location, and product (Table 1). This study defines company and organization names as "people" because companies or organizations can obtain patent rights or technologies through cooperation or mergers and acquisitions to enhance their market competitiveness. When companies or organizations demonstrate the characteristic of "appearing simultaneously," this study attributes the characteristic to association rules. Issue, disaster, technology, and methodology names are included in the "event" category, and the chance characteristics in this category are considered causal or time relations. Based on the categories of market chance and the problems of previous chance discovery models [21], this study proposes the SRCDM shown in Figure 1. In this model, association rules, as well as time, causal, and hierarchical relations among data elements, are used to identify market trends and discover chances that satisfy market demands or have development potential. To understand market trends immediately and summarize objective results, this study used blog posts, which are immediate and objective, as the primary source for market chance discovery. This model consists of five sections: (1) blog data collection, where users can search for blog posts on the Internet based on themes they determine; (2) blog content preprocessing, where noise (e.g., ads or Web site links) in the blog posts is removed and relevant blog content is preserved and separated into sentences for word segmentation and part-of-speech tagging; (3) term refinement, where the large number of words or terms is condensed and representative terms are selected to reduce the amount of information for follow-up analysis; (4) data and sequential relation mining, where association rules, time relations, causal relations, and hierarchical relations among words or terms are identified, and related terms are connected to form a mixed graph composed of directed and undirected edges, that is, a chance guideline map; and (5) chance value evaluation, where the value of each candidate chance in the chance guideline map is evaluated and the chances that have high values are marked for the user's reference.

Table 1 Categories of market chances

Categories of chances | Potential chances | Chance characteristics
People (Company name, Organization name) | Obtaining relevant patents or technologies through cooperation or mergers and acquisitions | Association rule
Event (Issue name) | Developing specific demands following political or legal changes | Causal relation
Event (Disaster name) | Developing certain demands following disasters | Causal relation
Event (Technology name) | Having a competitive edge by mastering specific technologies | Time relation
Event (Methodology name) | Having a competitive edge by mastering specific methodologies | Time relation
Time (Holiday name) | Developing specific demands because of certain holidays | Association rule & time relation
Location (Place name) | Specific demands developed because of the features of the local culture | Association rule
Product (Product name) | Advantages and disadvantages of the products manufactured by one's own company and similar products manufactured by other companies; these imply consumers' demands and preferences | Hierarchical relation
Product (Service name) | Advantages and disadvantages of the services provided by one's own company and similar services provided by other companies; these imply consumers' demands and preferences | Hierarchical relation
Product (Raw material name) | Controlling specific raw materials and manufacturing products before competitors | Association rule
Product (Parts name) | Controlling specific parts and manufacturing products before competitors | Association rule


2.2. Design of the chance discovery process Based on the proposed SRCDM, this study designed a chance discovery process with consideration of sequential relationship (SRCDP), as shown in Figure 2. This process consists of blog data collection, blog content preprocessing, term refinement, data and sequential relation mining, and chance value evaluation, detailed as follows: (a) Blog data collection. Automatic approaches are used: keywords (selected based on the data the users wish to collect) are submitted to online search engines; the search engines yield Web sites related to the theme determined by the user, and the HTML content of the blog Web sites is saved into the database. (b) Blog content preprocessing. Blog data obtained in the previous stage are processed and unstructured content is transformed into structured content to enable automatic processing and analysis in follow-up procedures, including natural language processing procedures such as noise filtering, sentence splitting, tokenization, part-of-speech tagging, stop word removal, and stemming. (c) Term refinement. The large number of words extracted during the blog content preprocessing stage is screened and representative words are selected to reduce the amount of data processing and enhance overall efficiency. (d) Data and sequential relation mining. Sentence structures, the co-occurrence of words, and the times of the posts containing these sentences and words are analyzed to mine the time, causal, and hierarchical relations among the words and to establish association rules. This procedure consists of the following steps: i. Time relation mining. Domain terminology and the times these nouns appear in blog content are documented to extract the time sequential relations of the appearance of nouns in specific domains. ii. Causal relation mining. Causal relation grammar structures are established in advance and each sentence is compared against them to identify sentences that have causal relations and to extract the causal relations among the proper nouns. iii. Hierarchical relation mining. Pre-established "adjective of domain concept maps" are used to compare sentence adjectives and understand their semantic hierarchical concepts. If the adjectives used in a certain domain tend to involve a certain semantic hierarchical concept, the relatively important concepts in this domain can be identified. iv. Association rule mining. Dependency among certain nouns is determined by examining the co-occurrence statistics of the nouns within sentences. v. Candidate chance relation construction. The nouns mined in the previous four steps are defined as candidate chances, and a mixed graph composed of directed and undirected edges is constructed based on the relations among the nouns to form a chance guideline map. (e) Chance value evaluation. The chance guideline map is used to evaluate chance values: the candidate chances (nodes) and the relations (edges) are examined to estimate importance. The market chances that have high importance are marked on the chance guideline map.


Figure 2 Chance Discovery Process with Consideration of Sequential Relationship (SRCDP)

3. Enabling Technologies for Chance Discovery Based on the designed process, this study develops a Chance Discovery MEthod with consideration of Sequential Relationship (SRCDMe). This method consists of five primary stages (Figure 2), similar to the SRCDP, which are detailed as follows:

3.1. Blog content collection The purpose of blog content collection is to obtain blog content relevant to specific topics through automatic searches using online search engines. These data are used as the data sources for subsequent chance discovery. For example, once a specific industry is determined, blog content regarding companies, products, methods, or technologies in that industry is automatically collected. The background of the selected topics must be understood to construct an automatic and effective search system that collects blog content relevant to those topics or themes. Therefore, this study must construct a domain vocabulary that can be used for rapid keyword matching on search engines. The domain vocabulary contains terms related to specific domains. This study manually constructed the domain vocabulary based on the categories of chances shown in Table 1. The information sources of the domain vocabulary comprise the categories and entries summarized in Wikipedia (http://www.wikipedia.org).

3.2. Blog Content Preprocessing In addition to the aforementioned natural language processing, blog content preprocessing consists of the following steps: (1) Series pattern extraction. The domain vocabulary and the time of each blog post (i.e., its time stamp) are used to extract time series information regarding the first appearance of each noun. (2) Causal pattern extraction. A comparison of the domain vocabulary, the causal verb list [22], and grammatical structures is conducted to identify the sentences with causal relations in the blog posts and the causal relations among the nouns (the causal verb list is a collection of causative verbs, and grammatical structures are examined to identify causal sentences characterized by a "noun-verb-noun" structure) [23]. (3) Adjective pattern extraction. Sentence structures are examined to identify grammatical structures characterized by "adjective-noun" and "noun-adjective" combinations in order to extract adjectives.
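A minimal sketch of this preprocessing stage is given below, assuming NLTK as the toolkit (the paper does not name a specific library); the causal verb set is a small illustrative subset rather than the causal verb list of [22], and only the "adjective-noun" case of the adjective patterns is shown.

```python
# Minimal preprocessing sketch (assumption: NLTK; requires punkt, tagger and stopwords data).
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

def preprocess(blog_text):
    """Split a blog post into sentences, tag parts of speech, and return stemmed
    content words plus simple causal (noun-verb-noun) and adjective-noun patterns."""
    stemmer = PorterStemmer()
    stop = set(stopwords.words('english'))
    causal_verbs = {'cause', 'causes', 'make', 'makes', 'trigger', 'triggers'}  # illustrative
    results = []
    for sentence in nltk.sent_tokenize(blog_text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))        # [(word, POS), ...]
        terms = [stemmer.stem(w.lower()) for w, t in tagged
                 if w.isalpha() and w.lower() not in stop]         # stop-word removal + stemming
        causal = [(tagged[i - 1][0], w, tagged[i + 1][0])
                  for i, (w, t) in enumerate(tagged[1:-1], start=1)
                  if t.startswith('VB') and w.lower() in causal_verbs
                  and tagged[i - 1][1].startswith('NN') and tagged[i + 1][1].startswith('NN')]
        adjectives = [(w1, w2) for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
                      if t1.startswith('JJ') and t2.startswith('NN')]
        results.append({'terms': terms, 'causal': causal, 'adjectives': adjectives})
    return results
```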

3.3. Term Refinement The purpose of this stage is to reduce the volume of data for subsequent processing. Therefore, the importance of the time series, causal, and hierarchical information identified during the previous stages must be evaluated, and data that are not representative are removed. This study uses term frequency and inverse document frequency (TF-IDF; Eq. 1) to calculate the importance of the terms in the document set and filters out the terms whose values fall below the threshold value.

TFIDF_{i,j} = (n_{i,j} / Σ_k n_{k,j}) × log(|D| / |D_i|)   (1)

where
n_{i,j}: number of times term i appears in document j
|D|: total count of blog documents
|D_i|: total count of blog documents that contain term i
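A minimal sketch of the TF-IDF screening in Eq. (1) follows; the default threshold of 1.7 is the value reported for series patterns in Section 4.2 and is otherwise an assumption.

```python
import math
from collections import Counter

def tfidf_filter(documents, threshold=1.7):
    """Score every term with Eq. (1) and keep terms whose TF-IDF exceeds the threshold.
    documents: list of token lists, one list per blog post."""
    doc_count = len(documents)                                          # |D|
    doc_freq = Counter(term for doc in documents for term in set(doc))  # |D_i|
    kept = set()
    for doc in documents:
        term_freq = Counter(doc)                                        # n_{i,j}
        total_terms = sum(term_freq.values())                           # sum_k n_{k,j}
        for term, n_ij in term_freq.items():
            score = (n_ij / total_terms) * math.log(doc_count / doc_freq[term])
            if score > threshold:
                kept.add(term)
    return kept
```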

3.4. Data and sequential relation mining In this stage, the association rules and sequential relations among the terms are mined. This consists of time series mining, causal relation mining, hierarchical relation mining, and association rule mining, which are detailed as follows:

• Time series mining

The purpose of time series mining is to extract the time series of nouns. This study constructs the time series of the first appearances of nouns based on the times at which specific nouns and blog posts are published. This study then calculates the association or correlation weights of two nouns using Eq. 2, Eq. 3, and Eq. 4 [24]. Equation 2 can be applied to general scale series information and Eq. 3 can be applied to log scale series information. Although Eq. 4 is similar to Eq. 2, it can be applied to series information where the time units of the time intervals are identical.

Based on experiments, this study determined the weight calculation method suitable for the application domain and screened the series information that showed weights above the threshold value to obtain a sequential pattern dataset.

w_g(t_i) = σ^((t_{i+1} − t_i)/u)   (2)

w_l(t_i) = σ^(log_2(1 + (t_{i+1} − t_i)/u))   (3)

w_c(t_i) = σ^⌈(t_{i+1} − t_i)/u⌉   (4)

where
t_i: time when term i appears
u: unit time, u > 0
σ: amount of weight decrease per unit time u, 0 < σ < 1
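The three weight functions of Eqs. (2)-(4) are simple to compute; a minimal sketch follows, with the default parameters u = 86400 and σ = 0.5 taken from the experiment in Section 4.2.

```python
import math

def w_g(t_i, t_next, u=86400, sigma=0.5):
    """Eq. (2): geometric decay over the gap between two first-appearance times."""
    return sigma ** ((t_next - t_i) / u)

def w_l(t_i, t_next, u=86400, sigma=0.5):
    """Eq. (3): log-scale variant of the decay."""
    return sigma ** math.log2(1 + (t_next - t_i) / u)

def w_c(t_i, t_next, u=86400, sigma=0.5):
    """Eq. (4): ceiling variant for series whose time units are identical."""
    return sigma ** math.ceil((t_next - t_i) / u)

# Example: the first row of Table 2 (gap of 1347 s) gives w_g ~= 0.9893.
print(round(w_g(1217549914, 1217551261), 4))
```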

• Causal relation mining

Previous scholars have primarily used Bayesian nets to infer the causal relations among events. Bayesian nets are suitable for inferring the causal relations among events because they feature conditional probability [22]. Therefore, Bayesian nets are also used for the causal relation mining in this study. This study first calculated the number of appearances of the extracted causal data and constructed a probability distribution table. The combination of several probability distribution tables constitutes an event inferencer that can be used to infer, given the appearance or occurrence of a specific event, the probability of another event occurring. Therefore, this study can construct a Bayesian net that contains multiple candidate chances, each having a probability distribution table, to infer the probability of the candidate chances affecting each other. Figure 3 shows an example of the Bayesian nets and the probability distribution tables among earthquakes, tsunamis, and Sony. This study first examined the probability of earthquake occurrence and constructed an earthquake occurrence probability distribution table. This study then calculated whether earthquake occurrence triggered tsunamis. Finally, this study used the earthquake occurrence probability distribution table and the probability distribution table of earthquake and tsunami occurrence to infer the likelihood of earthquakes and tsunamis affecting the Sony Company.


Figure 3 Examples of Bayesian nets and the probability distribution table
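A minimal sketch of the chain inference illustrated in Figure 3 is given below. The probability tables are illustrative placeholders, not the values used in the study.

```python
# Earthquake -> tsunami -> Sony chain, marginalising over the parent events.
p_earthquake = 0.4                              # P(Earthquake = T), assumed
p_tsunami_given = {True: 0.7, False: 0.3}       # P(Tsunami = T | Earthquake), assumed
p_sony_given = {True: 1.0, False: 0.0}          # P(Sony affected = T | Tsunami), assumed

def prob_sony_affected():
    """P(Sony affected = T) by summing over all earthquake/tsunami states."""
    total = 0.0
    for quake in (True, False):
        p_q = p_earthquake if quake else 1 - p_earthquake
        for tsunami in (True, False):
            p_t = p_tsunami_given[quake] if tsunami else 1 - p_tsunami_given[quake]
            total += p_q * p_t * p_sony_given[tsunami]
    return total

print(prob_sony_affected())   # 0.4*0.7 + 0.6*0.3 = 0.46 with the assumed tables
```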

• Hierarchical relation mining

In hierarchical relation mining, the previously constructed adjective domain concept map is used for term comparison, and the number of times the adjectives correspond to specific concepts is recorded. The concept maps are structured through "concept-attribute" relationships, where the attributes of a concept are represented by the adjectives that describe them. For example, the domain concept map of optical storage consists of three concepts (capacity, speed, and price). The adjectives related to these concepts are shown in Figure 4.

Domains refer to a specific domain, such as optical storage or food products. Concepts refer to the higher-level concepts within a specific domain, such as appearance, capacity, and price. Attributes refer to the specific contents contained within a higher-level concept, such as using "tall," "short," "fat," and "thin" to describe appearance. This study first compared the extracted adjective dataset and the domain concept map. If the same adjectives are identified among the attributes of the adjective concept map, the sentence is considered to have hierarchical relations. This study then counts the number of appearances of the various attributes in the blog article set to understand consumer preferences in that domain.

• Association rule mining

In association rule mining, this study attempts to identify other associations that may exist between two nouns in a sentence when no time, causal, or hierarchical relations can be identified between them. This study used the Apriori algorithm to identify the association rules among nouns [25] and summarized the results into the association rule dataset.
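The sketch below shows only the pairwise (two-itemset) case of an Apriori-style pass rather than the full algorithm; the default support and confidence thresholds are the values reported in Section 4.2.

```python
from collections import Counter
from itertools import combinations

def pairwise_rules(sentences, min_support=0.5, min_confidence=0.95):
    """Count noun co-occurrence within sentences and keep rules A -> B that
    clear the support and confidence thresholds.
    sentences: list of noun collections, one per sentence."""
    n = len(sentences)
    item_counts = Counter()
    pair_counts = Counter()
    for nouns in sentences:
        nouns = set(nouns)
        item_counts.update(nouns)
        pair_counts.update(combinations(sorted(nouns), 2))
    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):                 # both rule directions
            confidence = c / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules
```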

[Figure 4 depicts the optical storage domain with three concepts and their attribute adjectives: Capacity (Large, Medium, Small), Speed (Fast, Moderate, Slow), and Price (Expensive, Normal, Cheap).]

Figure 4 Adjective domain concept map for the optical storage domain
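For the hierarchical relation mining step, the concept-map lookup of Figure 4 amounts to counting how often extracted adjectives fall under each concept. A minimal sketch follows; the dictionary mirrors Figure 4 and the example pairs are hypothetical.

```python
# Concept map for the optical storage domain, mirroring Figure 4.
concept_map = {
    'capacity': {'large', 'medium', 'small'},
    'speed':    {'fast', 'moderate', 'slow'},
    'price':    {'expensive', 'normal', 'cheap'},
}

def count_concept_hits(adjective_noun_pairs):
    """Count how often blog adjectives correspond to each concept, indicating
    the relatively important concepts in the domain."""
    counts = {concept: 0 for concept in concept_map}
    for adjective, _noun in adjective_noun_pairs:
        for concept, attributes in concept_map.items():
            if adjective.lower() in attributes:
                counts[concept] += 1
    return counts

# e.g. count_concept_hits([('fast', 'burning'), ('cheap', 'disc')]) -> speed: 1, price: 1
```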

3.5. Chance Value Evaluation This study used the diffusion index (DI), influence index (INI), and closeness index (CI) to evaluate the value of candidate chances, and determined the overall value of a candidate chance based on an integration of these indices. This became the basis for market chance evaluation. These objective indices are detailed as follows:
• DI (Eq. 6) is used to evaluate the surprise level (or uncertainty, amount of information) when a specific candidate chance appears [26]. The lower the probability of a certain candidate chance appearing, the higher the surprise level (or uncertainty, amount of information) it causes.

DI_i = −prob(TF_i / TF_all) × log(1 / TF_all)   (6)

where
prob(·): normal distribution function
TF_i: total count of appearances of candidate chance i in the blog dataset
TF_all: total count of appearances of all candidate chances in the blog dataset

• INI (Eq. 7) is used to evaluate the difference between the graphs with and without a certain candidate chance [26]. This study used the average path length of the whole graph to measure the influence of each candidate chance.

INI_i = APL_G − APL_{G−i}   (7)

APL_G = (1 / (n(n−1))) × Σ_{j,k∈G_V} dist_{jk}

where
APL_G: average path length of the whole graph
APL_{G−i}: average path length of the whole graph without candidate chance i
dist_{jk}: path length from candidate chance j to candidate chance k
n: number of candidate chances (nodes) in the graph

• CI (Eq. 8) is used to evaluate the closeness of association between a specific candidate chance and the other candidate chances by examining the sum of the path lengths from that candidate chance [27].

CI_i = (Σ_{j∈G_V} dist_{ij})^(−1)   (8)

Because each index reflects only one dimension of candidate chances, this study employs logistic regression [28] to evaluate the probability of each candidate chance being a market chance. A higher probability indicates that the candidate chance is more likely to be a market chance that has development potential. This study uses the DI, INI, and CI of each candidate chance as the input parameters, transforms them into standard scores, and applies a logistic function to calculate the probability value of the candidate chance (Eq. 9). The parameters α and β_j can be obtained through maximum likelihood estimation.

p(x_i) = 1 / (1 + exp(−(α + Σ_j β_j x_{ij}))) = exp(α + Σ_j β_j x_{ij}) / (1 + exp(α + Σ_j β_j x_{ij}))   (9)

where
α: intercept term in the regression
β_j: regression coefficient
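A minimal sketch of the chance value evaluation is given below, using networkx and assuming the chance guideline map is a connected graph whose edge "weight" attributes hold the distance costs. The DI line is a simplified stand-in because the normal-distribution prob(·) of Eq. (6) is not fully specified here; INI and CI follow Eqs. (7) and (8) directly, and the logistic combination follows Eq. (9).

```python
import math
import networkx as nx

def chance_indices(G, node, term_freq):
    """Compute DI, INI and CI for one candidate chance in the chance guideline map G.
    term_freq: dict mapping each candidate chance to its appearance count."""
    tf_all = sum(term_freq.values())
    rel = term_freq[node] / tf_all
    di = -rel * math.log(1.0 / tf_all)            # simplified stand-in for Eq. (6)
    apl = nx.average_shortest_path_length(G, weight='weight')
    G_wo = G.copy()
    G_wo.remove_node(node)                        # assumes the graph stays connected
    ini = apl - nx.average_shortest_path_length(G_wo, weight='weight')   # Eq. (7)
    ci = 1.0 / sum(nx.shortest_path_length(G, source=node, weight='weight').values())  # Eq. (8)
    return di, ini, ci

def chance_probability(x, alpha, betas):
    """Eq. (9): logistic combination of the standardised index values x."""
    return 1.0 / (1.0 + math.exp(-(alpha + sum(b * v for b, v in zip(betas, x)))))
```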

4. Experimental Results and Analysis 4.1. Experiment Design To verify the advantages and characteristics of SRCDMe, we conducted a set of experiments. We used the English blog article sets from the Third International AAAI Conference on Weblogs and Social Media (ICWSM 2009; http://www.icwsm.org/2009/), which include blog posts published in 2008, as the experimental data sets. Because studies on chance discovery focusing on optical storage have been conducted previously [6], we also conducted chance discovery in the optical storage domain to cross-validate the advantages and disadvantages of the methods and compare them. To objectively determine whether the market chances identified in this study matched actual market conditions, we used Google Trends (http://www.google.com/trends/) to examine the market chances mined in this study. Additionally, we used the search volume index (SVI) to evaluate the degree of attention each market chance attracts on search engines. If the prediction results from the experiments match the subsequent industry conditions, the proposed method can be considered feasible for market chance discovery. The experiment design consists of the following parts: (1) employ the implementation results to observe and review the process of market chance discovery; (2) use the objective Google Trends to examine whether the prediction results match current industry conditions; and (3) use Google Trends to verify the results of previous studies, compare the results, and discuss whether the methods used in this study and previous studies are appropriate. The results of the experiments are detailed in the following section.

4.2. Implementation Results SRCDMe consists of five stages. However, the blog content collection stage is not included in the implementation results because we had already selected the data set from ICWSM 2009 as the experimental data. We first constructed a domain vocabulary for optical storage comprising 1,098 keywords. After filtering blog posts with keywords related to optical storage, we obtained 272,090 blog posts. The implementation results of blog content preprocessing, term refinement, data and sequential relation mining, and chance value evaluation are detailed as follows:
• In Stage 1, we used the blog content preprocessing phase to extract sequence, causal, and adjective data. We extracted 17,867 sets of sequence data, 699 sets of causal data, and 170,171 sets of adjective data, as shown in Figure 5. These implementation results comprise part of the blog post content and the screenshots of data extraction.

Although the blog content underwent preprocessing, the implementation screenshots show that no sequential or causal data were extracted in this example. Adjective data were extracted (e.g., "higher" was used to describe the speed of DVD burning).

Figure 5 Screenshot of blog content preprocessing

• In Stage 2, we used the term refinement phase to screen crucial words, using TF-IDF as the basis for screening. The refinement threshold value for series patterns was set at 1.7, which resulted in 10,894 data entries being preserved (78.87%). Because the number of causal patterns was only 614, we preserved all of them. The threshold value for the adjective patterns was set at 3, which resulted in 32,020 data entries (81.85%) being preserved. The implementation screenshots are shown in Figure 6. We calculated the TF, IDF, and TF-IDF for all nouns; [n] represents the noun data of the nth group. For example, Group 1 contains two nouns, ArcSoft and Photo CD. The TF of both terms is 0.5, their IDF values are 6.4 and 8.437, respectively, and their TF-IDF values are 3.2 and 4.218, respectively; the sum of the TF-IDF values is therefore 7.419. After calculating the TF-IDF for each group of noun data, we screened the groups based on their TF-IDF sums. The noun data whose TF-IDF sums were greater than the threshold value were selected as candidate chances for mining in subsequent stages. Because the TF-IDF sum of ArcSoft and Photo CD was the highest among all data sets, ArcSoft and Photo CD were selected as candidate chances in the subsequent relation mining.

Figure 6 Screenshot of term refinement

• In Stage 3, we used the data and sequential relation mining phase to mine the association rules and sequential relations among candidate chances. The implementation screenshots are shown in Figure 7. This stage consists of the following steps: (1) Time relation mining. We mined the time sequences of technologies and noted the time stamps of the first appearance of each technology name. This included 50 technology names (parameters u = 86400, σ = 0.5).

For example, the time stamp of CD technology was 1217549914. Table 2 shows the correlation weight values of the 50 technology names. We observed that the correlation weight values w_g and w_l were close. When the difference t_{i+1} − t_i was small, w_c deviated more from the other two; when the difference t_{i+1} − t_i grew, w_g ranged between w_l and w_c, reflecting the correlation weights in a neutral manner. Therefore, we used w_g as the correlation weight calculation method, and the distance cost between two adjacent technology names was 1 − w_g. (2) Causal relation mining. We mined the causal relations of candidate chances, noted the number of appearances of the candidate chances that had causal relations, and calculated the probability of their appearances or occurrences (initial probability value of 0.5, cumulative probability value of 0.005). For example, Audio CD - make - CD appeared once, with a probability of 0.5, and the distance cost between Audio CD and CD was 1 − 0.5 = 0.5. (3) Hierarchical relation mining. We mined the concepts and attributes contained in technology names. For example, the concepts of CD technology include novelty, and the attributes of this concept include new, earlier, and old. The distance cost between CD and novelty is 1/4072 (the reciprocal of the number of co-occurrences), and the distance cost between novelty and old is 1/8. (4) Association rule mining. We mined the association rules among the candidate chances (Support = 50%, Confidence = 95%). For example, one association rule is 3tPro and Ultra, where the confidence is 100% and the distance cost is 1/e^(100%) ≈ 0.37.
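The four edge costs used when assembling the chance guideline map reduce to simple expressions; the following short recap uses the worked numbers quoted above.

```python
import math

# Distance costs for the chance guideline map (values from the examples in Section 4.2):
d_time   = 1 - 0.9893          # time relation: 1 - w_g between two adjacent technology names
d_causal = 1 - 0.5             # causal relation: 1 - P(effect | cause), e.g. Audio CD -> CD
d_hier   = 1 / 4072            # hierarchical relation: reciprocal of the co-occurrence count
d_assoc  = 1 / math.exp(1.0)   # association rule: 1 / e^confidence with confidence = 100%

print(round(d_assoc, 2))       # ~0.37, matching the 3tPro - Ultra example
```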

Figure 7 Screenshot of data and sequential relations mining

Table 2 Time stamps and correlation weight values of the 50 technology names

Technical Name | Timestamp (t_i) | t_{i+1} − t_i | w_g | w_l | w_c
DVD | 1217549914 | 1347 | 0.9893 | 0.9846 | 0.5000
Blu-ray Disc | 1217551261 | 1116 | 0.9911 | 0.9872 | 0.5000
DVD-Audio | 1217552377 | 0 | 1.0000 | 1.0000 | 1.0000
CD | 1217552377 | 548 | 0.9956 | 0.9937 | 0.5000
DVD-RW | 1217552925 | 1300 | 0.9896 | 0.9852 | 0.5000
DVDRAM | 1217554225 | 5667 | 0.9556 | 0.9384 | 0.5000
HD DVD | 1217559892 | 4856 | 0.9618 | 0.9468 | 0.5000
DVD-R DL | 1217564748 | 0 | 1.0000 | 1.0000 | 1.0000
DVD+R DL | 1217564748 | 0 | 1.0000 | 1.0000 | 1.0000
CD-ROM | 1217564748 | 0 | 1.0000 | 1.0000 | 1.0000
DVD-RAM | 1217564748 | 0 | 1.0000 | 1.0000 | 1.0000
DVD-R | 1217564748 | 2504 | 0.9801 | 0.9718 | 0.5000
DVD+RW | 1217567252 | 0 | 1.0000 | 1.0000 | 1.0000
CD-RW | 1217567252 | 0 | 1.0000 | 1.0000 | 1.0000
DVD+R | 1217567252 | 0 | 1.0000 | 1.0000 | 1.0000
CD-R | 1217567252 | 69719 | 0.5716 | 0.5534 | 0.5000
DVDRW | 1217636971 | 473 | 0.9962 | 0.9946 | 0.5000
CDROM | 1217637444 | 36495 | 0.7462 | 0.7030 | 0.5000
CDRW | 1217673939 | 1007 | 0.9920 | 0.9885 | 0.5000
Joliet | 1217674946 | 10410 | 0.9199 | 0.8925 | 0.5000
Laserdisc | 1217685356 | 14608 | 0.8894 | 0.8554 | 0.5000
DVD authoring | 1217699964 | 4583 | 0.9639 | 0.9496 | 0.5000
Photo CD | 1217704547 | 4395 | 0.9654 | 0.9516 | 0.5000
ISO 9660 | 1217708942 | 14152 | 0.8927 | 0.8593 | 0.5000
Audio CD | 1217723094 | 39570 | 0.7280 | 0.6859 | 0.5000
Virtual drive | 1217762664 | 2325 | 0.9815 | 0.9738 | 0.5000
CD+G | 1217764989 | 40376 | 0.7233 | 0.6815 | 0.5000
Optical disc | 1217805365 | 76091 | 0.5431 | 0.5317 | 0.5000
MagicISO | 1217881456 | 8168 | 0.9366 | 0.9136 | 0.5000
Packet writing | 1217889624 | 17991 | 0.8656 | 0.8277 | 0.5000
Red Book | 1217907615 | 63897 | 0.5989 | 0.5749 | 0.5000
Compact Disc | 1217971512 | 7258 | 0.9434 | 0.9225 | 0.5000
Finalize | 1217978770 | 17 | 0.9999 | 0.9998 | 0.5000
CDR | 1217978787 | 82003 | 0.5180 | 0.5131 | 0.5000
MiniDisc | 1218060790 | 5155 | 0.9595 | 0.9437 | 0.5000
GD-ROM | 1218065945 | 292790 | 0.0955 | 0.2279 | 0.0625
Video CD | 1218358735 | 160077 | 0.2769 | 0.3505 | 0.2500
blue laser | 1218518812 | 290426 | 0.0973 | 0.2293 | 0.0625
HDDVD | 1218809238 | 274118 | 0.1109 | 0.2397 | 0.0625
El Torito | 1219083356 | 15214 | 0.8851 | 0.8503 | 0.5000
CD-Text | 1219098570 | 353990 | 0.0584 | 0.1962 | 0.0313
Universal Media Disc | 1219452560 | 5291 | 0.9584 | 0.9423 | 0.5000
CD Video | 1219457851 | 93194 | 0.4735 | 0.4811 | 0.2500
Master recording | 1219551045 | 245969 | 0.1390 | 0.2600 | 0.1250
China Blue High-definition Disc | 1219797014 | 162386 | 0.2718 | 0.3473 | 0.2500
Rock Ridge | 1219959400 | 105185 | 0.4301 | 0.4510 | 0.2500
Drive Letter Access | 1220064585 | 1464732 | 0.0000 | 0.0557 | 0.0000
Hi-MD | 1221529317 | 492031 | 0.0193 | 0.1494 | 0.0156
CD-ROM XA | 1222021348 | 689595 | 0.0040 | 0.1113 | 0.0039
Mount Rainier | 1222710943 | | | |


• In Stage 4, we used the chance value evaluation phase to evaluate the value of the candidate chances, as shown in Figure 8. The upper half of the figure shows the numerical results of evaluating the indicators of the candidate chances, such as -7.6532, -0.0028, -0.0026, and 40.8014 for the DI, INI, CI, and RI of the DVD technology, respectively. The lower half of the figure shows the results of an overall evaluation of the value of each candidate chance, such as 1 for the market chance probability of the Technicolor Company. Because the final chance guideline map was highly complex, Figure 9 shows only part of the relations. Additionally, the figure highlights the market chances obtained based on the overall chance value evaluation results; in this figure, Audiovox and Virtual Drive are shown as market chances.

Figure 8 Screenshot of chance value evaluation


Figure 9 Final chance guideline map


4.3. Neutral Experimental Results In this experiment, we used Google Trends to determine the degree to which the identified market chances matched current industry conditions. An SVI greater than 1 for a market chance indicates that users have searched for that market chance online, reflecting the degree of attention paid to this chance. Therefore, when the SVI of a market chance was at some point greater than 1 (after 2008), the experimental result is considered to match current industry conditions (i.e., the chance receives attention). The parameters of the logistic regression after the final training were α = -4.4040, β1 = 1.3768, β2 = 9.0786, and β3 = 0.1416. Table 3 shows the overall experimental results. When the threshold value is 0.4, the first eight of the top ten items with the highest predicted probability of being market chances are companies, with an event ranked ninth and a technology ranked tenth. A comparison shows that seven of these results matched the results from Google Trends, an accuracy rate of 70%. As an example, Audiovox Corporation (NASDAQ ticker symbol: VOXX) is an American consumer electronics company. Figure 10 shows its share prices and trading volume between 2008 and 2012. Although the company declined to a relatively low level in 2009, it subsequently completed a series of acquisitions and has come to occupy an important position in the consumer electronics market. Ranked ninth, the pollution event (environmental pollution issues) has received significant market attention and remains a key consideration for improvement in companies' research and development, as well as production.

Table 3 Top 10 of predicted market chances for companies

Rank | Type | Chance | DI | INI | CI | p | SVI > 1
1 | Company Name | Audiovox | -0.1567 | 6.4E-03 | -0.1873 | 1.0000 | Y
2 | Company Name | Humax | -0.1566 | 1.6E-01 | -0.1905 | 1.0000 | Y
3 | Company Name | JVC | -0.1635 | 6.4E-03 | -0.1869 | 1.0000 | N
4 | Company Name | Nero AG | -0.1560 | 6.4E-03 | -0.1873 | 1.0000 | N
5 | Company Name | Pacific Digital | -0.1560 | 8.4E-02 | -0.1876 | 1.0000 | Y
6 | Company Name | Sentinel | -0.1636 | 6.4E-03 | -0.1873 | 1.0000 | Y
7 | Company Name | Technicolor | -0.1577 | 6.4E-03 | -0.1873 | 1.0000 | N
8 | Company Name | Universum | -0.1560 | 6.4E-03 | -0.1873 | 1.0000 | Y
9 | Event Name | Pollution | -0.1587 | -7.1E-02 | 0.4586 | 0.9774 | Y
10 | Technical Name | Virtual Drive | -0.1699 | 0.3174 | -0.1884 | 0.4142 | Y

Because technology is usually a core capability of companies, we also examined the market chances in the technology category. We employed the rare index (RI; Eq. 10) [15] to evaluate the degree to which technologies were mastered by companies. Table 4 shows the experimental results for the technology category with the threshold value set at 0.2. In Google Trends, five technologies were evaluated to be active on the market, matching 50% of the results of this study. Ranked first, the virtual drive is a virtual storage medium that can be understood as a cloud storage service, such as Google's recently released Google Drive, Microsoft SkyDrive, Apple iCloud, and Dropbox, which are among the most widely used storage media at the current time. Ranked second, Blu-ray Discs possess a similarly high application value because of their hardware technology and consumers' audio and video quality requirements.

RI_i = TFIDF_i × log(N_c / N_{c,i}) × ((t_i − t_s) / (t_e − t_s) + 0.0001)   (10)

where
N_c: total number of company names
N_{c,i}: number of co-occurrences of company names and technology name i
t_i: time of the first appearance of technology name i
t_s: time of the first appearance among all technology names
t_e: time of the final appearance among all technology names
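Eq. (10) translates directly into code; the following minimal sketch assumes the TF-IDF score of the technology and the co-occurrence counts have already been computed.

```python
import math

def rare_index(tfidf_i, n_c, n_ci, t_i, t_s, t_e):
    """Eq. (10): rare index of technology i.
    n_c: total count of company names; n_ci: co-occurrences of company names with technology i;
    t_s / t_e: first / last appearance times over all technology names."""
    return tfidf_i * math.log(n_c / n_ci) * ((t_i - t_s) / (t_e - t_s) + 0.0001)
```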


Figure 10 Audiovox share prices and trading volumes between 2008 and 2012

Table 4 Top 10 of predicted market chances for technologies

Rank | Chance | DI | INI | CI | p | RI | SVI > 1
1 | Virtual Drive | -0.1699 | 0.3174 | -0.1884 | 0.4142 | -1.3612 | Y
2 | Blu-ray Disc | -0.1566 | 0.1619 | -0.1900 | 0.3827 | -1.3612 | Y
3 | DVD-R DL | -0.1635 | 0.1619 | -0.1866 | 0.3719 | -0.6348 | N
4 | DVD-R | -0.1560 | 0.0842 | -0.1876 | 0.3623 | -0.6132 | N
5 | Master Recording | -0.1560 | 0.0842 | -0.1879 | 0.3571 | -0.1093 | N
6 | DVD-Audio | -0.1565 | 0.0842 | -0.1879 | 0.3516 | -0.1093 | Y
7 | Hi-MD | -0.1560 | 0.0842 | -0.1886 | 0.3448 | 0.7911 | N
8 | China Blue High-Definition Disc | -0.1560 | 0.0842 | -0.1888 | 0.3399 | 1.0803 | N
9 | ISO 9660 | -0.1561 | -0.0075 | -0.1906 | 0.2350 | 1.0803 | Y
10 | DVDRAM | -0.1608 | -0.0713 | -0.1861 | 0.2176 | 1.2375 | Y

4.4. Comparison and Discussion The results show that the RI and CI values remained largely stable, whereas the INI values influenced the predicted market chance probability more significantly. This is because, when candidate chances are evaluated from the overall relation structure, the differences between the relation structures generated with and without each candidate chance vary considerably, resulting in significantly different INI results for the various candidate chances and strong differentiation when evaluating them. The chance discovery results of previous studies [6] indicated that the Victor Company of Japan, Ltd. (JVC) and Eternal Chemical Company Ltd. were market chances. The SVI of JVC was less than 1, whereas that of Eternal was greater than 1.

Š The Author(s), 2013


We predicted that the market chances included JVC; thus, the prediction that JVC would be a market chance was consistent with the results of previous studies. In the optical storage domain, "Eternal" refers to a company name, Eternal Chemical Company Ltd. However, "eternal" is also an ordinary adjective, and is therefore a word with multiple meanings. Consequently, when we input "eternal" into Google Trends, the search results may not fully represent the market attention that the Eternal company attracts. Therefore, we used "Eternal Chemical" instead of "Eternal" in the Google Trends searches to obtain accurate data regarding Eternal Chemical Company Ltd. However, because of insufficient searches for Eternal Chemical, no trend results could be generated on Google Trends, indicating that the Eternal company had not attracted market attention; hence, it was not a market chance. In summary, the method for market chance discovery proposed in this study can accurately discover and identify market chances that have development potential or that were undiscovered by previous studies. This demonstrates that the method proposed in this study is superior to that proposed in previous studies. In the logistic regression, the parameters were α = -4.4040, β1 = 1.3768, β2 = 9.0786, and β3 = 0.1416. This indicates that DI and INI are crucial indices for evaluating market chances, whereas CI is not. However, in Tables 3 and 4, neither the DI nor the CI values clearly reflect the values of the candidate chances, with the majority of the DI and CI values of all chances ranging between -0.156 and -0.191; INI was the index with significant differences among its values. The DI and CI values were close because the value evaluation was based on the co-occurrences of candidate chances; however, these indices cannot clearly reflect the highs and lows of the values because technology names usually do not appear in the same sentences. Additionally, judging from the parameter α, relations other than association rules and sequential relations exist among market chances and were neglected by this study. The parameter β3 indicates that CI does not have significant effects; thus, this index could be removed to reduce the amount of calculation.

5. Conclusion and Future Work Using the SRCDMe developed in this study, researchers can mine the association rules and sequential relations among candidate chances. Additionally, this study objectively evaluated the value of candidate chances to discover and mine real market chances. The contributions of this study are as follows: (a) Developed a data mining and sequential relation mining method, a mining method for market element associations that can be used for subsequent market trend analysis. This method enables the analysis process to capture the overall appearance of the market. (b) Developed a chance value evaluation method that assists people with insufficient experience or background knowledge in conducting chance discovery for the market and identifying market chances that have development potential. (c) Developed the chance guideline map, a visualization of the time, causal, and hierarchical relations, as well as the association rules, among candidate chances. On this map, the market chances that have high value are also marked, which helps users understand the relationships among candidate chances and provides guidance on the positions of market value. (d) The method developed by this study achieved satisfactory benefits. A comparison between the experimental results and objective Google Trends evaluations shows that the SRCDMe is superior to the method used in previous studies. Additionally, this method can be used to discover new market chances not identified in previous studies, which demonstrates its advantages. This study used logistic regression to evaluate the value of candidate chances. The intercept term α = -4.4040 represents the error generated because this study neglected other possible relations among candidate chances. Future studies can discuss the positive and negative relations or indirect relations among candidate chances, as well as other relations not considered in this research methodology. This will enable chance discovery to identify more valuable market chances and provide closer associations with current market conditions. For diversification, many enterprises develop products or services that span multiple industries. To increase market share, they must develop new businesses, decentralize operational risks, and improve operational security. However, because different industries have different specific technologies, methodologies, and attributes, a domain vocabulary must be constructed for each domain to apply the methodology developed in this study to interdisciplinary market chance discovery. Future studies can develop an automatic domain terminology or proper noun identification method for domain vocabulary construction. This will enable the application of the SRCDMe to other industries to stimulate industry development and develop new market chances.

References
[1] Chen H, Chau M, Zeng D. CI Spider: a tool for competitive intelligence on the Web. Decision Support Systems. 2002 Dec;34(1):1–17.
[2] Li Y-R, Wang L-H, Hong C-F. Extracting the significant-rare keywords for patent analysis. Expert Systems with Applications. 2009 Apr;36(3):5200–5204.
[3] Goda S, Ohsawa Y. Chance discovery in credit risk management; estimation of chain reaction bankruptcy structure by chance discovery method. In: IEEE International Conference on Systems, Man and Cybernetics (SMC'06). Taipei, Taiwan; 2006. p. 2127–2132.
[4] Hong C-F. Qualitative chance discovery – extracting competitive advantages. Information Sciences. 2009 May;179(11):1570–1583.
[5] Karkkainen H, Elfvengren K. Role of careful customer need assessment in product innovation management – empirical analysis. International Journal of Production Economics. 2002;80(1):85–103.
[6] Hong C-F. Discovering the rare opportunity by strategy based interactive value-focused thinking model. International Journal of Knowledge-Based and Intelligent Engineering Systems. 2007;11(5):259–271.
[7] Hong C-F, Lin M-H, Yang H-F, Huang C-J. A novel chance model for building innovation diffusion scenario. In: IEEE International Conference on Systems, Man and Cybernetics (SMC 2010). Istanbul, Turkey; 2010. p. 3845–3852.
[8] Iwase Y, Seo Y, Takama Y. Scenario to data mapping for chance discovery process. Soft Computing – A Fusion of Foundations, Methodologies and Applications. 2007;11(8):773–781.
[9] Ohsawa Y. Chance discovery as value sensing by data based meta cognition. In: Chbeir R, Badr Y, Abraham A, Laurent D, Koppen M, Ferri F, et al., editors. CSTST 2008: Proceedings of the 5th International Conference on Soft Computing as Transdisciplinary Science and Technology. Cergy-Pontoise, France: ACM; 2008. p. 5–6.
[10] Rochford L, Rudelius W. New product development process: stages and successes in the medical products industry. Industrial Marketing Management. 1997 Jan;26(1):67–84.
[11] Kim WC, Mauborgne R. Blue Ocean Strategy: How to Create Uncontested Market Space and Make Competition Irrelevant. 1st ed. Harvard Business Review Press; 2005.
[12] Ohsawa Y. Chance discoveries for making decisions in complex real world. New Generation Computing. 2002;20(2):143–163.
[13] Ohsawa Y, Nara Y. Understanding Internet users on double helical model of chance-discovery process. In: Proceedings of the 2002 IEEE International Symposium on Intelligent Control; 2002. p. 844–849.
[14] Sakakibara T, Ohsawa Y. Gradual-increase extraction of target baskets as preprocess for visualizing simplified scenario maps by KeyGraph. Soft Computing. 2007;11(8):783–790.
[15] Li Y-R, Wang L-H, Hong C-F. Extracting the significant-rare keywords for patent analysis. Expert Systems with Applications. 2009 Apr;36(3, Part 1):5200–5204.
[16] Chen H, Chau M, Zeng D. CI Spider: a tool for competitive intelligence on the Web. Decision Support Systems. 2002 Dec;34(1):1–17.
[17] Abe A, Ohsawa Y. Special issue on chance discovery. International Journal of Knowledge-based and Intelligent Engineering Systems. 2007;11(5):255–257.
[18] Maeno Y, Ohsawa Y. Human–computer interactive annealing for discovering invisible dark events. IEEE Transactions on Industrial Electronics. 2007 Apr;54(2):1184–1192.
[19] Wang H, Ohsawa Y, Nishihara Y. Innovation support system for creative product design based on chance discovery. Expert Systems with Applications. 2012 Apr;39(5):4890–4897.
[20] Kim K-J, Jung M-C, Cho S-B. KeyGraph-based chance discovery for mobile contents management system. International Journal of Knowledge-based and Intelligent Engineering Systems. 2007 Dec;11(5):313–320.
[21] Ohsawa Y. Chance discoveries for making decisions in complex real world. New Generation Computing. 2002 Jun;20(2):143–163.
[22] Atkinson J, Rivas A. Discovering novel causal patterns from biomedical natural-language texts using Bayesian nets. IEEE Transactions on Information Technology in Biomedicine. 2008 Nov;12(6):714–722.
[23] Girju R, Moldovan D. Text mining for causal relations. In: Proceedings of the 15th International Florida Artificial Intelligence Research Society Conference (FLAIRS). Pensacola Beach, Florida, USA: AAAI Press; 2002. p. 360–364.
[24] Chang JH. Mining weighted sequential patterns in a sequence database with a time-interval weight. Knowledge-Based Systems. 2011 Feb;24(1):1–9.
[25] Tan P-N, Steinbach M, Kumar V. Introduction to Data Mining. Pearson Addison Wesley; 2006.

[26] Chiu T-F, Hong C-F, Chiu Y-T. An experiment model of grounded theory and chance discovery for scenario exploration. In: Nguyen N, Katarzyniak R, Chen S-M, editors. Advances in Intelligent Information and Database Systems. Springer Berlin/Heidelberg; 2010. p. 291–301.
[27] Yang H-F, Lin M-H. Innovative chance discovery – extracting customers' innovative concept. In: Giacobini M, Brabazon A, Cagnoni S, Di Caro G, Ekart A, Esparcia-Alcazar A, et al., editors. Applications of Evolutionary Computing. Springer Berlin/Heidelberg; 2009. p. 462–466.
[28] Hosmer DW, Lemeshow S. Applied Logistic Regression. John Wiley and Sons; 2000.
