S0806637

Maintaining Reputation through Online Analytics Why the Public Relations Industry must Adapt or Die

Presented as part of the requirement for an award within the Undergraduate Modular Scheme at the University of Gloucestershire (April 2012)

Michael White © 2012

Word Count: 9,876


April 2012

Maintaining Reputation through Online Analytics

PUR334

Declaration

This dissertation is the product of my own work. I agree that it may be made available for reference and photocopying at the discretion of the University.

Author’s Signature:

Michael White

Date 11/04/2012


Abstract

Over the last six years the communication landscape has changed significantly. The advent of Facebook, Twitter, YouTube and other social networks introduced a range of additional communication channels. Never has it been more important for the public relations industry to maintain reputation. Whilst the industry now understands social networking tools, confusion remains over the range of metrics available online and how these can be used to provide Return on Investment (ROI) for clients. A number of third-party measuring tools now exist, offering similar ‘search, measure, understand and engage’ solutions.

This report uncovers in-depth alternative solutions to the terms ‘measure and understand’ for capturing quantitative and qualitative data. These measuring metrics and techniques may be used by the public relations industry to achieve their campaigns’ objectives.

Main Findings:

1. ROI is relied upon for reputation management and direct sales.
2. Third party measuring tools exist but are not perfect.
3. The PR industry needs standardisation.
4. Semantic Analysis works but has not yet been perfected.


Acknowledgements

While writing this dissertation on online measurement the author drank approximately 180 cups of coffee, smoked 450 cigarettes and listened to 900 hours’ worth of music. Despite this congenial lifestyle, writing this dissertation was only made possible by the following people.

The author’s parents: For having hope in their seven year old child with dyslexia who could not read or write.

Lecturer, Practitioner and Extraordinaire, David Phillips: For his guidance surrounding semantic analytics.

Microsoft Librarian, David K. Stewart: For providing research through the Microsoft UK library.

Graduated PR student, Michael Healey: Who once interviewed the author for his own dissertation and was an inspiration for writing this one.

Wikipedia: The author’s unreferenced secret weapon.


Table of Contents

Declaration
Abstract
Acknowledgements
Introduction
1.0 Literature Review
1.1 Public Relations Industry: Adapt or Die
1.2 Web Analytics 2.0
1.3 How to Measure Sales and Relationships
1.4 Introducing the Semantic Web
2.0 Methodology
2.1 Research Sample Design
2.2 Ethical Considerations
3.0 Latent Semantic Indexing Research into Neville Hobson’s Twitter timeline
3.1 LSI Python Script
3.2 Retrieval, Filter and Identification
3.3 Term Count Model and Singular Value Decomposition
3.4 The Results
4.0 Evaluation
4.1 Evaluation of Latent Semantic Indexing
4.2 Bayesian Inference and Other Interpretations
5.0 Conclusion
5.1 ROI is relied upon for reputation management and direct sales
5.2 Third party measuring tools exist but are not perfect
5.3 The PR industry needs standardisation
5.4 Semantic Analysis works but has not yet been perfected
References
Illustrations
Appendix


Introduction

The public relations industry is in a state of rapid change. On 1 March 2012 the Public Relations Society of America (PRSA) announced the results of a vote which settled on its modern definition of PR (White, 2012):

“Public relations is a strategic communication process that builds mutually beneficial relationships between organizations and their publics.”

This definition is similar to that of the UK’s Chartered Institute of Public Relations (CIPR) (CIPR, 2012):

“Public relations is about reputation – the result of what you do, what you say and what others say about you. Public relations is the discipline which looks after reputation, with the aim of earning understanding and support and influencing opinion and behaviour. It is the planned and sustained effort to establish and maintain good will and mutual understanding between an organisation and its publics”.

The definition of PR given by the Public Relations Consultants Association (PRCA), a UK organisation, is extremely similar to the CIPR’s (PRCA, 2012):

“Public relations is all about reputation. It’s the result of what you do, what you say, and what others say about you. It is used to gain trust and understanding between an organisation and its various publics – whether that’s employees, customers, investors, the local community – or all those stakeholder groups…”

For this PR society, chartered institute and association the emphasis on ‘reputation’ is clear, but a modern definition of PR must take into consideration how the growth of digital communication channels provides an opportunity for the PR industry to expand into additional service areas outside of reputation management.

Furthermore, a viable method of measuring reputation has not yet been discovered. Measuring online sentiment levels fulfils the CIPR’s understanding of reputation as “what others say about you”, but it is not yet possible to align sentiment with the global values of a brand, product or service.

Whilst the sharp increase in communication channels available across a range of communication platforms will inevitably affect reputation management, the definition of PR should also be in question. Since 2006 the introduction of now-popular social networks has made additional measurement metrics available. Some of these metrics are already being used by the online advertising industry to generate direct sales for clients, and some small public relations agencies already manage online advertising campaigns for their clients (Jefkins, 2000). This dissertation explores the possibility of public relations finding additional ways to measure reputation online and of understanding that digital PR is concerned not just with reputation but also with direct sales.

Within the literature review a succinct but broad assessment of the various online measurement metrics is made before an in-depth study into how semantic analysis could be used to measure public relations activities. All documents associated with the study can be found in the appendices.


1.0 Literature Review

This review seeks to identify, examine and compare key forms of online measurement. The purpose is to understand the scale of online metrics available for digital public relations campaigns and the interpretation of data involved. The information within this literature review serves as a necessary foundation for the research presented in section 3.0.

Preparing this review has involved consolidating relevant published texts, gaining insights through marketing based blogs, examining online journal databases and keyword searches on micro-blogging platform Twitter. The author’s personal experiences within the field of public relations and online advertising are also included.

This review is comprised of the following sections:

Public relations industry: adapt or die

Web Analytics 2.0

How to measure sales and relationships

Introducing the semantic web


1.1 Public Relations Industry: Adapt or Die

Edelman’s annual 8095 report surveyed 3,100 millennials across 8 different countries. The millennial generation comprises those aged between 17 and 32 as of 2010, and their behaviours differ starkly from those of baby boomers (born 1946 – 1964) and generation X (born early 1960s – 1980). Evidence in the report illustrated the close relationship millennials have with brands online (Gould, 2010):

28% relied upon brands to make a positive impact in the world

36% relied upon brands to learn about new trends

18% announced they would switch to a competing brand if they were offered tools to help them in other areas of life

16% relied upon brands to help achieve personal goals

Organisations must ensure that their brands adapt effectively to fit the online environment, often referred to as Web 2.0 (Gordon, 2011). The pressure is on the public relations industry to have the confidence to manage customer relations on the front lines. The latest PRCA barometer revealed a gloomy outlook, accurately summarised in a response to the report by Weber Shandwick’s vice-president (Owens, 2012):

“Clients are saying there is an uncertain market and that we have got to be smarter with our budgets. We are seeing more quarter-by-quarter release of budgets – there is a desire for more control”.

PRCA’s barometer revealed a worrying lack of confidence, which the public relations industry is beginning to face on the verge of a possible double-dip recession. Clients are holding back their budgets and the public relations industry needs to prove effective ROI. The rise of social networking platforms over the last 6 years has pressured a vast array of industries to adapt or die. Whilst public relations agencies, in-house professionals and consultants are all gradually endorsing social media as part of a wider campaign strategy, knowing strategy and tactics is not enough. Calculations of performance measurement must reach a standard which not only upholds the values in the definition of public relations but will also be endorsed by the CIPR (Chartered Institute of Public Relations). Not only have the tools which public relations professionals use changed, but the industry’s very definition must be adapted.

Figure 1 - ROI

The formula for ROI calculates the return of an investment divided by the costs (Investopedia, 2011).
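As a minimal sketch of the calculation described above, the ROI formula can be written as the net return divided by the cost. The function name and the example figures are hypothetical and chosen only for illustration; they do not come from any campaign discussed in this dissertation.

```python
def return_on_investment(gain, cost):
    """Return ROI as a ratio: (gain from investment - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be greater than zero")
    return (gain - cost) / cost

# Hypothetical campaign: 1,000 spent, 1,500 returned.
roi = return_on_investment(gain=1500, cost=1000)
print(f"ROI: {roi:.0%}")
```

The values a practitioner places in `gain` depend on the campaign objective, which is why the key values from social media have to be identified first.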

The public relations industry has to identify the key values from social media, in relation to the campaign they are running, in order to conclude the necessary ROI calculation. Public relations theory is integral to understanding how communication channels should adapt.

Prominent thought leader Brian Solis announces in his latest book “The End of Business as Usual” (2011) that the medium is no longer the message, a play on Marshall McLuhan’s famous coinage from many years before, “the medium is the message”. Audiences are sharing heavily on social networks, which is transforming behaviours and, in western society, suggesting that the hypodermic needle theory is ineffective. According to hypodermic needle theory (also known as magic bullet theory), “the mass media could influence a very large group of people directly and uniformly by ‘shooting’ or ‘injecting’ them with appropriate messages designed to trigger a response” (Gupta, 2006, p. 36). According to Brian Solis, “media channels that compete for our attention are transforming our behaviours, empowering users to take control of the information that reaches them… messages are reborn through context and the relevant experiences of people and organisations we value” (Solis, 2011, p. 15).

The public relations industry has reached a critical stage requiring quick but considered evolution. All content-focused industries must evolve, much as species do through natural selection. When the British evolutionary biologist Richard Dawkins wrote “Climbing Mount Improbable” he used the analogy of creatures reaching the peak of their evolution, which resulted either in their fixture in the natural world or in extinction as another creature continued through natural selection. The same applies to the public relations industry as the internet landscape is shared with online advertising. The CIPR must protect the industry by defining its role through the purpose of public relations campaigns. The public relations we see today may be unrecognisable in three years’ time.

Only three years ago many websites were designed with landing pages for users referred through a search engine (Phillips & Young, 2009). Last year Facebook could have been considered the social hub for many users before visiting a website. In the last few months Google+’s effect upon the Google search algorithm has ushered in an era of social search (Goold, 2012). The landing page of a website could be considered less significant in an era when online recommendation has already taken place. This is a powerful factor considering Edelman’s 8095 report (at the beginning of this chapter), as more millennials discover through sharing. This is only one of many developments which public relations has experienced in the 21st Century. In a recent CIPR interview Dr Jon White provided a quick definition of public relations as a social psychology (CIPR TV, 2011); the public relations industry must understand how to measure and understand. Discovering ROI measurements starts with evaluating a message’s context, which explains relevancy for a public relations campaign. The industry must adapt, not die.


1.2 Web Analytics 2.0

The public relations industry must adapt or die, which is why measurement is integral for every business to survive. To understand corporate reputation, relationships must be measured for success (Paine, 2011). The term Web 2.0 is frequently used in the context of the evolution of the web, in which websites provide the facilities for information sharing and collaboration. This form of communication can be likened to several of Grunig and Hunt’s four models (Grunig & Hunt, 1984):

1. Press Agentry
Description: One Way Communication. Publicity focused.
In Practice: Little research into the audience necessary. Half-truths can be told with the outcome of behaviour manipulation.

2. Public Information
Description: One Way Communication. Accuracy Necessary.
In Practice: Little research into the audience necessary. Accuracy is essential but feedback is not measured.

3. Two Way Asymmetric
Description: Feedback used to change attitudes.
In Practice: Feedback from the audience used to adapt messages for behavioural change, not manipulation.

4. Two Way Symmetric
Description: A conversation.
In Practice: Removes the need of a journalist as a mediator, allowing conversation and adaptation from both parties involved.


In terms of online communication channels, Web 1.0 describes how messages were communicated across websites as one way communication through the use of the ‘Press Agentry’ and ‘Public Information’ models. Just as the information was communicated one way, it could be said that analytics 1.0 was in use: the metrics available were based on clickstream data. This data has its limitations. Avinash Kaushik is the author of the leading research and analytics blog, Occam’s Razor. Within his latest book “Web Analytics 2.0” he makes the distinction that clickstream answers the question ‘what?’ rather than ‘why?’

Clickstream data includes (Google Analytics, 2012):

Visits – The total number of visits to a website

Unique Visits – The unduplicated number of visits to a website

New visits – A measurement of new visits versus returning visits

Page views – The number of pages viewed on a website

Time – The average amount of time spent across all visits

Frequency of Visit – The total number of times a user has returned to a website

Bounce Rate – The percentage of single-page visits in which the person left your site from the landing page

Traffic Sources – This includes data from search engines, referring sites and other traffic sources.

Keywords – This shows the keywords a user typed into a search engine before arriving on a website.
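To make a few of the clickstream definitions above concrete, the sketch below derives them from a small visit log. The log, its field layout and the function name are hypothetical; a tool such as Google Analytics collects and reports these figures itself, so this is only an illustration of what the numbers mean.

```python
# Hypothetical visit log: (visitor_id, pages_viewed, seconds_on_site)
visits = [
    ("a", 1, 10),
    ("b", 4, 200),
    ("a", 3, 90),
    ("c", 1, 5),
]

def clickstream_summary(visits):
    """Summarise a visit log using the clickstream definitions listed above."""
    total = len(visits)
    # A bounce is a visit in which only a single page was viewed.
    bounces = sum(1 for _, pages, _ in visits if pages == 1)
    return {
        "visits": total,
        "unique_visits": len({visitor for visitor, _, _ in visits}),
        "page_views": sum(pages for _, pages, _ in visits),
        "average_time": sum(seconds for _, _, seconds in visits) / total,
        "bounce_rate": bounces / total,
    }

summary = clickstream_summary(visits)
print(summary)
```

Each figure answers a ‘what?’ question about the visits; none of them says why a visitor behaved that way.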

What clickstream statistics can mean for a digital public relations campaign is increased revenue, reduced costs and improved customer satisfaction (Kaushik, 2010). Google Analytics specialises in clickstream data analysis; it is free to use and vital for measuring results online. The formula for use depends upon the values you place in your ROI.


Figure 2 - ROI

The origin of online collaboration is almost impossible to pinpoint. It may have begun with Morse code in the 1800s. In reality collaboration began with the arrival of two historical developments: the CTSS (Compatible Time-Sharing System) and the invention of HTML (Hypertext Mark-up Language). Both are examples of the human and technological developments which explain where Web 2.0 is today. Email, which grew out of time-sharing systems such as CTSS, was human communication, and HTML used hyperlinks, which is what defines the internet as the WWW (World Wide Web). The Barabasi-Albert model is drawn from an algorithm which generates scale-free networks (Barabasi et al, 1999). It is an example of the interconnected structure of the internet but also of how humans connect across a social network.
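The growth rule behind the Barabasi-Albert model, often called preferential attachment, can be sketched in a few lines: each new node links to existing nodes with a probability proportional to how many links they already have. The function below is an illustrative sketch under that assumption, not the code used to draw the figure in this dissertation; the function name, the seed handling and the choice of one link per new node are all hypothetical.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a network of n nodes in which each new node links to m existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = []
    # 'endpoints' lists every node once per link it has, so a uniform choice
    # from it is a degree-proportional choice of node.
    endpoints = []
    targets = list(range(m))  # the first new node links to the m seed nodes
    for new_node in range(m, n):
        edges.extend((new_node, target) for target in targets)
        endpoints.extend(targets)
        endpoints.extend([new_node] * m)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return edges

# An 18-node network, one link per new node (seed value is arbitrary).
edges = barabasi_albert(n=18, m=1, seed=1)
print(len(edges), "links between 18 nodes")
```

Because well-connected nodes are more likely to attract each newcomer, a few nodes end up as hubs, which is the pattern seen both in hyperlinks and in social networks.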

Figure 3 - Barabasi-Albert model


The Barabasi-Albert model above is shown with 18 points of connection. Imagine the scale of Facebook with its 800 million active users, 800 million points of connection (Facebook, 2012). Web Analytics 2.0 is made possible through the transparency which is exhibited through the content sharing across a vast array of social networking platforms. Content is flowing freely across the internet and it must be listened to and measured because:

You need to keep track of your stakeholders

You need to provide your client the best ROI

We need the public relations industry to evolve

Over the last 6 years we have seen the rise not just of social networking platforms but also of 3rd party measuring tools such as Brandwatch, Radian6 and Sysomos. These leading self-service social media analytics tools all offer services playing on a variation of ‘search, measure, understand and engage’ technology. Organisations which use these tools as part of a social media strategy type in search terms along with Boolean strings; the results not only show what customers may be saying but also allow organisations to plan engagement tactics. In evaluating this data it is necessary to grasp the definitions upheld by the industry:

Quantitative: Data that refers to numbers and frequencies (number of updates, average subscribing rate, etc.)

Qualitative: Data that provides information of meaning (status updates, tweets, etc.)

Correlation: Works with quantifiable data to find relation between variables.

The exponential growth of social media requires the public relations industry to consider the correlation of data before the data mining processes of 3rd party providers. We are currently heading towards an era of correlation-based digital public relations where mass sentiment results in reaction-based communication.
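As an illustration of what finding a relation between two quantitative variables involves, the sketch below computes a Pearson correlation coefficient by hand. The choice of Pearson's coefficient is an assumption made for this example, and the weekly figures are invented purely for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical weekly figures: status updates posted and new subscribers gained.
updates = [1, 2, 3, 4, 5]
subscribers = [12, 15, 21, 24, 30]
print(round(pearson(updates, subscribers), 3))
```

A value close to 1 means the two series rise together; it does not show that one causes the other, which is why qualitative data is still needed to explain the result.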


1.3 How to measure sales and relationships

Web Analytics 2.0 is concerned with presenting the ‘What?’ and ‘Why?’ behind clickstream data (Kaushik, 2010). During the 1990s the online advertising industry witnessed the revolution of ‘one-to-one’ marketing, which is “where direct response, direct mail, the internet and the interactive opportunities of digital TV come together” (White, 2000, p. 203). Over the following 10 years the online advertising industry formed its own standardisation for measurement. This involves measuring the metrics below (Gay, Charlesworth and Esen, 2007):

CPC (Cost Per Click)

CPA (Cost Per Action)

CPL (Cost Per Lead)

CTR (Click Through Rate)

CR (Conversion Rate)

CPM (Cost Per Thousand)

Calculations

CTR = CLICKS / IMPRESSIONS
CR = ACTIONS / CLICKS
CPM = (TOTAL COST / IMPRESSIONS) * 1000
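The calculations above can be sketched as three small functions. This sketch treats the conversion rate as the number of completed actions (sales or leads) divided by clicks, which is an assumption about how the metric is counted; the function names and campaign figures are hypothetical.

```python
def click_through_rate(clicks, impressions):
    """CTR: share of ad impressions that were clicked."""
    return clicks / impressions

def conversion_rate(actions, clicks):
    """CR: share of clicks that ended in a completed action."""
    return actions / clicks

def cost_per_thousand(total_cost, impressions):
    """CPM: cost of serving one thousand impressions."""
    return total_cost * 1000 / impressions

# Hypothetical campaign figures.
impressions, clicks, actions, total_cost = 200_000, 4_000, 200, 1_000
print(f"CTR {click_through_rate(clicks, impressions):.2%}")
print(f"CR  {conversion_rate(actions, clicks):.2%}")
print(f"CPM {cost_per_thousand(total_cost, impressions):.2f}")
```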

Using the above metrics and calculations correlation may then be found between the advertised product/brand and advertising MPU (Media Placement Units). For instance a clothing brand may be advertising jeans for males between the ages of 18 – 25. Costs within network advertising can be attributed to individual metrics; usually the CPC, CPA or CPM. In some instances a hybrid cost method may be attributed (CPC and CPA) to provide the client with a better ROI.

This advertising campaign is run on the basis of sales, which means an action tag attributed with a cost of £5.00 will be placed on the client’s website; this is a CPA cost method. For each sale through advertising the client spends £5.00, while the RRP (Recommended Retail Price) of the jeans on the website is £19.99 each. If the advertising campaign were to run then the graph of results may appear as below.

Figure 3 – Online advertising example statistics

In the above example the client spent £16,835 running the network advertising campaign (excluding internal marketing costs), with £5.00 being spent on each sale through advertising. However, the actual sale price of the jeans on the client’s website was £19.99, leaving a £14.99 profit gap per sale. Gross revenue was therefore £67,306.33, leaving net revenue of £50,471.33.

TOTAL SALES – ADVERTISING SPEND = NET PROFIT
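The figures in this example can be checked with a short calculation: the number of sales follows from the total spend divided by the £5.00 cost per action, and net revenue is the revenue from those sales minus the advertising spend. The function and variable names below are hypothetical; the amounts are those quoted in the example.

```python
from decimal import Decimal

def campaign_result(advertising_spend, cost_per_action, sale_price):
    """Return (sales, gross revenue, net revenue) for a CPA-priced campaign."""
    sales = advertising_spend / cost_per_action
    gross_revenue = sales * sale_price
    net_revenue = gross_revenue - advertising_spend
    return sales, gross_revenue, net_revenue

# Decimal avoids floating-point rounding when working with currency.
sales, gross, net = campaign_result(
    advertising_spend=Decimal("16835.00"),
    cost_per_action=Decimal("5.00"),
    sale_price=Decimal("19.99"),
)
print(sales, gross, net)
```

The result works out to 3,367 sales, £67,306.33 gross and £50,471.33 net, matching the figures in the text.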

As stated, the above calculations are set out as an example of how analytics are used in network advertising to generate sales. These analytics are gathered through the same JavaScript method as Google Analytics (and a host of other free analytics tools); they are not exclusive to advertising.

Advertising is tasked with tracking sales based upon clickstream data. Could 2012 be the year when the public relations industry uses these metrics not only to raise awareness through social networks but also to track sales? It would not be the first time that the public relations industry has relied upon the advertising industry for validity. Even though the CIPR does not officially endorse AVE (Ad Value Equivalency), a research paper published in 2003 by the IPR (see footnote 1) describes the demand for its use by bosses and clients (Fox, 2003). The calculation for AVE is:

COLUMN INCHES * ADVERTISING RATES = EQUIVALENT COST

Or

SECONDS WITHIN BROADCAST MEDIA * ADVERTISING RATES = EQUIVALENT COST
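Both versions of the AVE calculation reduce to the same multiplication, which the sketch below makes explicit. The function name, the amount of coverage and the advertising rates are hypothetical and used only to show the arithmetic.

```python
def ad_value_equivalency(units, advertising_rate):
    """AVE: coverage (column inches or broadcast seconds) multiplied by the
    advertising rate charged per unit."""
    return units * advertising_rate

# Hypothetical print coverage: 12 column inches at a rate of 150 per inch.
print(ad_value_equivalency(12, 150))
# Hypothetical broadcast coverage: 30 seconds at a rate of 40 per second.
print(ad_value_equivalency(30, 40))
```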

The comparison between advertising and public relations is a cause for concern, as clients may presume that the messages have an equal effect. It also ignores the additional multiplier of 1.5 to 1.6 (industry standard rates) which may be applied to the figure to manipulate ROI for the client. In essence AVE follows the same calculation as CPM in online advertising – with a greater concept of accuracy. Within public relations the outcome of relationships could be measured through symmetrical communication (Childers, 1999) which assists with:

Understanding the needs of stakeholders

Tracking the effectiveness of messages

Listening to mediators (Journalists, Bloggers and Opinion Leaders) to provide them with relevant content.

Footnote 1: The IPR (Institute of Public Relations) gained Chartered status in 2005, making it the CIPR (http://publicsphere.typepad.com/mediations/2005/02/ipr_wins_charte.html).

In terms of clickstream data the contextual relevancy of messages is found through correlation. Patterns within data are evaluated against performance objectives to assume poor or positive results. With social networks it is possible to focus upon quantitative and qualitative statistics with the introduction of the semantic web.


1.4 Introducing the Semantic Web

Today social networks are largely comprised of text-based content, which requires an algorithm for detecting linguistics and presenting such data as qualitative data sets. Semantic analytics are therefore an amalgamation of text analytics and network ontologies. Recent research presents a dependency upon RDF (Resource Description Framework; see footnote 2), a model which allows data sets to be placed within web pages (RDF Working Group, 2004). This creates a noteworthy distinction between hyperlinks and RDF (Lee, 2009):

“Like the web of hypertext, the web of data is constructed with documents on the web. However, unlike the web of hypertext where links are relationships anchors in hypertext documents written in HTML, for data they links between arbitrary things described by RDF” (see footnote 3).

The creation of RDF links allows navigation across one data source to many others; with the addition of a FOAF (Friend of a Friend) data link it is possible to attribute identification to another author (Lee, 2009). When FOAF is used with RDF a social network is created between individuals and data sets (Golbeck & Rothstein, 2008), allowing a significant degree of accuracy between content, context and a network of relationships.
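The idea of linking people and content through RDF and FOAF can be illustrated with plain subject-predicate-object triples. The sketch below uses the FOAF properties `name`, `knows` and `made` under the standard FOAF namespace; the people, posts and the `example.org` namespace are hypothetical, and a real project would normally use an RDF library and a serialisation such as RDF/XML rather than Python tuples.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical namespace for illustration only

# Each triple is (subject, predicate, object), as in RDF.
triples = {
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "alice", FOAF + "made", EX + "post/2"),
    (EX + "bob", FOAF + "name", "Bob"),
    (EX + "bob", FOAF + "made", EX + "post/1"),
}

def objects(subject, predicate):
    """All objects linked from a subject by a given predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

def content_from_network(person):
    """Content made by the people a given person knows."""
    found = []
    for friend in objects(person, FOAF + "knows"):
        found.extend(objects(friend, FOAF + "made"))
    return found

print(content_from_network(EX + "alice"))
```

Following the `knows` link and then the `made` link is what ties a piece of content to an author and to that author's network of relationships.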

Currently data mining for semantic data is achieved through semantic search engines (through crawlers) or semantic web browsers. Presenting data in an understandable format is accomplished through the use of OWL (Web Ontology Language), a sublanguage for applying additional vocabulary for when data needs to be processed by machines rather than humans (McGuiness & Harmelen, 2004).

Footnote 2: RDF is written using XML (Extensible Mark-up Language).

Footnote 3: This quote, written in 2006 (revised in 2009) by the founding father of the World Wide Web, Tim Berners-Lee, signalled the founding of the Linking Open Data project, which aims to make data freely available to everyone.


The semantic web is made possible through all the above technical elements; the question is how to utilise the conventions for analytical processing. A research paper published by the University of Georgia and the University of Maryland entitled “Semantic Analytics on Social Networks: Experience in Addressing the Problem of Conflict of Interest Detection” describes the semantic research method as follows (Meza, 2005):

1) Obtaining high quality data
Extraction of data from sites, including metadata extraction from sources to ensure relevancy.

2) Data preparation
Mostly data clean-up and evaluation.

3) Entity disambiguation
Attach relevant data to the correct entity.

4) Metadata and ontology representation
Importing or exporting data as RDF/RDFS and OWL.

5) Querying and inference techniques
Data processing to enable semantic analytics and discovery.

6) Visualization
Prepare data in a readable format.

7) Evaluation
Comparison needed between shown data and other evidence to see if a correlation appears.


The process of obtaining semantic analytics depends upon the task; research is new and therefore experimental. Semantic measurement methods require a reimagining of the ranking methods which may have been used in the past to measure blogs based simply upon clickstream data, as proposed by Katie Delahaye Paine (2011); such methods could now be considered archaic.

A notable semantic measurement method is Latent Semantic Analysis (LSA), which evaluates the underlying meaning and concepts behind language to build relationships between nouns and adjectives (Puffinware, 2010). Content is no longer king, context is king (Solis, 2011), and LSA provides contextual relationships which closely illustrate natural language recognition (Landauer, Foltz & Laham, 1998).
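The way LSA finds context that plain word matching misses can be shown with a toy example. The sketch below builds a term count matrix for five invented documents and projects them onto their two strongest latent concepts. LSA normally relies on a singular value decomposition from a linear algebra library; to keep the example free of dependencies, power iteration on the document-by-document matrix is swapped in here, which yields the same leading singular directions for this small case. The corpus, function names and iteration count are all hypothetical and are not the script used later in this dissertation.

```python
import math

# Hypothetical mini-corpus. Documents 0 and 2 share no words at all.
docs = [
    "brand reputation",
    "reputation trust",
    "trust publics",
    "clicks sales",
    "sales revenue",
]

terms = sorted({word for doc in docs for word in doc.split()})
# Term count model: one row per term, one column per document.
A = [[doc.split().count(term) for doc in docs] for term in terms]
n = len(docs)
# G = A^T A; its eigenvectors are the right singular vectors of A.
G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]

def top_eigenpair(M, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of M."""
    size = len(M)
    v = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(size)) for i in range(size)]
    return sum(a * b for a, b in zip(v, Mv)), v

def document_coordinates(G, k=2):
    """Position of each document on the k strongest latent concepts."""
    M = [row[:] for row in G]
    coords = [[] for _ in G]
    for _ in range(k):
        eigenvalue, v = top_eigenpair(M)
        sigma = math.sqrt(max(eigenvalue, 0.0))  # singular value of A
        for j, component in enumerate(v):
            coords[j].append(sigma * component)
        # Deflation: remove the concept just found before looking for the next.
        M = [[M[i][j] - eigenvalue * v[i] * v[j] for j in range(len(M))]
             for i in range(len(M))]
    return coords

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

coords = document_coordinates(G)
print("doc 0 vs doc 2:", round(cosine(coords[0], coords[2]), 3))
print("doc 0 vs doc 3:", round(cosine(coords[0], coords[3]), 3))
```

Documents 0 and 2 have no word in common, yet they land close together in the concept space because both co-occur with the vocabulary of document 1, while the sales-related documents sit on a separate concept.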

With regards to the semantic web, PR professionals are already ahead of the game with their knowledge of values behind relationships. Just as Brian Solis observed that new technologies are adjusting our behaviours (Solis, 2011), the public relations industry must change their behaviour of how they utilise new media – adapt or die. This begins with:

Adjusting our terminology from referring to stakeholders as ‘audiences’ to instead ‘publics’, removing the illusion of control that public relations professionals still believe they have (Grunig, 2009).

Building relationships on a symmetrical basis rather than asymmetrical (Grunig, 2011).

Understanding that intent is necessary (see footnote 4) so that a stakeholder understands that a message is relevant (Theaker, 2008), and listening to feedback.

Footnote 4: This theory is discussed in the textbook “Human Communication” written by Michael Burgoon, Frank G. Hunsaker and J. Dawson.


Based upon the nexus of values associated with multiple entities (individuals) created from semantic analytics, a study of linguistic pragmatics can be used to form the correct rhetoric for stakeholders. The approach considers the context of content online and provides a method for public relations professionals to provide meaning behind their messages (Mackey, 2005). This allows the completion of campaign objectives to assist in raising awareness and changing behaviour (which may even result in direct sales).


2.0 Methodology

As stated in the introduction to this dissertation, the author intends to research three different areas:

1) The unprecedented growth of digital communication channels.
2) To assess the current usage of online metrics for evaluating web 1.0 and web 2.0 platforms.
3) To assess the potential usage of semantic analysis for the public relations industry.

To research each of these areas accurately the author deployed a variety of research methods. The main research piece of this dissertation is the study of semantic analysis; the material in the literature review is used to complement it and to provide perspective for its conclusion. Due to the modern nature of the research within this dissertation it was not possible for the author to reference or interview anyone involved with PR campaigns using semantic measurement, as nobody is practising it yet.

The table below outlines the mixture of secondary and primary research utilised, along with how these align with research aims and objectives. The first research aim is designed to provide an academic insight into the growth of digital communication channels and how they are measured. The second research aim relies heavily upon primary research as it is an experimental piece of research.

Research Aim | Objectives | Secondary | Primary
To assess the current usage of online metrics for evaluating web 1.0 and web 2.0 platforms. | To explore the increasing use of digital communication channels | Literature Review | N/A
 | To explore the symmetry between traditional and digital communication channels | Literature Review | N/A
 | To explore current metrics for online measurement | Literature Review | Observations
 | To explore the potential of the semantic analysis method | Literature Review | N/A
To assess the potential usage of semantic analysis for the public relations industry. | Conducting research into latent semantic indexing (LSI) | Literature Review, Published Texts | Observations, Testing

Figure 4 – Research table



To make it clear how the range of research methods and several research aims provide conclusions to the questions provided at the start of this dissertation, the author has constructed a visual table.

Figure 5 – Research layout (diagram labels: Research Methods; Literature Review; Findings; Observations and Testing; Research; Evaluation; Conclusion)

Data for the semantic analysis research was collected by extracting tweets from Neville Hobson’s Twitter timeline and interpreting them both manually and through a Python script (see footnote 5). This interpretation includes visually displaying results using a singular value decomposition algorithm. The results of this research can be found within the conclusion of this dissertation.

2.1 Research sample design

Literature Review

The literature review was conducted to gain an understanding of the progress of digital communication and the range of metrics available, and to gain perspective on semantic analysis. The review was carried out by reading a wide range of PR publications, books published by practitioners and wider reading on online marketing. All material was selected on the basis of its relevance. This also included using digital communication channels:

Facebook
Twitter
Google+
Google Reader
Online Journals
Online Databases

Footnote 5: This script is available to view in the appendices.

Due to the nature of the research within the literature review, no primary sources of data collection were chosen. However, all secondary evidence was selected based upon the credentials of its authors.

Data Analysis

Before approaching the research into Latent Semantic Indexing (LSI) it was important for the author to note the types of data which would be collected:

Quantitative Data: This data takes the form of numerical figures.

Qualitative Data: This data takes the form of letters, words and sentences.

Correlating Data: Observing patterns between two or more pieces of data and presenting these patterns as results. In terms of LSI this could take the form of contextual patterns.

Each stage of the LSI analysis was learned through additional research, all of which has been referenced within this dissertation. Stages of this research have been included within section 3.0 in order to maintain research integrity.


2.2 Ethical considerations

The decision about which online data should be collected for the LSI research was made with care. It was important that the data used had a clear human source so that patterns could be detected. Neville Hobson’s Twitter timeline was eventually selected due to its public nature, but care was still taken not to publish a tweet widely if it had the potential to distress the original author.

The script used for LSI analysis was not originally programmed by the author of this dissertation. However, modifications were made to the data input into the script, to variables so that appropriate results are shown, and to correct the script following an update made to Python 2.7. This script has been made available in the appendices.

All other material referenced within this dissertation is publicly available.


3.0 Latent Semantic Indexing (LSI) Research into Neville Hobson’s Twitter timeline

An ideal example for presenting the benefit of Latent Semantic Indexing (LSI) is to observe how search engines such as Google operate. When a user provides a search term an exact lexical match would not be appropriate due to the existence of synonyms and of words with more than one meaning (Duz, 2008). An example search of “Cheap gardening spades shop” could therefore produce lexical matches relating to card playing, gambling, gardening, etc. A plain Boolean search query would return every Google-indexed webpage that includes all four words. Instead Google uses a version of LSI to understand the patterns of words across every indexed webpage (among other methods). This mathematical technique uses Singular Value Decomposition (SVD) to identify the context between words. The process assumes that similar words will be used within the same contexts, which are discovered through the relationships between words. Through the contextual basis of word weightages LSI is able to identify the category of written documents. For public relations professionals this method, when delivered through an automated algorithm, reimagines stakeholder analysis:

Words that are usually written about a celebrity can be analysed to understand associated values.

Research into competitors can be done to understand related terms which can then be targeted in Search Engine Optimisation (SEO) adaptations.

Understanding the values behind stakeholder groups to craft messages effectively.


Automatic categorisation of media releases to understand the contexts they should appear in.

Brand values become something being referred to by users online rather than fixed in a marketing department.

The possibilities of LSI in public relations will become clear through time. As a piece of research into LSI this technique has been used within this dissertation to identify the key themes surrounding Neville Hobson’s Twitter timeline.

Neville Hobson first began blogging in 2002, a hobby which grew to cover how a business should communicate using digital communication channels. Today he has over 25 years’ experience in public relations, marketing communication and financial relations (Hobson, 2012). His standing is illustrated by his popular Twitter profile, which had over 10,000 followers as of 12/02/2012.

3.1 LSI Python Script

This LSI research was conducted using a modified version of this Python Latent Semantic Analysis code: http://www.puffinwarellc.com/index.php/news-and-articles/articles/33-latent-semantic-analysis-tutorial.html?start=2. The script was run using Python 2.7 with the additional scientific libraries NumPy and SciPy. Modifications to the script include a change of subject data, a change of stop words, a display command to print index words and a line to stop the program automatically closing upon build.

Evaluating Neville Hobson’s Twitter timeline using LSI has involved the following steps:

1. Retrieve 50 tweets from Hobson’s timeline (9 – 11 February 2012).
As a piece of manual LSI research 50 tweets provided an adequate sample. An automated algorithm could pull hundreds of tweets for analysis.

2. Filter URLs, hashtags, retweets and numerical values.
LSI is concerned with qualitative data in the form of words taken out of their English syntax. All the data needs to be associated with Neville Hobson (hence no retweets).

3. Identify index words.
These are words which occur twice or more in the sample data, are not stop words (such as ‘it’, ‘the’, ‘a’, ‘if’, etc.) and carry meaning.

4. Discover correlation using the Term Count Model (TCM).
The TCM represents the initial stage of LSI, capturing the frequency of index words in the retrieved data.

5. Apply weightages to index words.
Once the frequency of index words has been discovered an algorithm is used to apply contextual weightages to words.

6. Visual display of results.
Each index word, with its unique weightage, is presented in a graph. Words plotted in certain sections of the graph indicate categories.

7. Interpretation of results.
Understand the data.
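Step 5 does not name the weighting algorithm. One common choice is TF-IDF, which a comment in the appendix script also mentions; whether it was applied to the results reported here is not stated, so the following Python sketch is only an illustration under that assumption. The function name tfidf and the small example matrix are hypothetical and are not taken from the author's script.

```python
from math import log


def tfidf(term_counts):
    """Weight a term count matrix (rows = index words, columns = tweets).

    Each count is divided by the number of index words in its tweet and
    multiplied by log(number of tweets / number of tweets containing the word).
    """
    n_tweets = len(term_counts[0])
    words_per_tweet = [sum(row[j] for row in term_counts) for j in range(n_tweets)]
    weighted = []
    for row in term_counts:
        tweets_with_word = sum(1 for count in row if count > 0)
        idf = log(n_tweets / tweets_with_word) if tweets_with_word else 0.0
        weighted.append([
            (count / words_per_tweet[j]) * idf if words_per_tweet[j] else 0.0
            for j, count in enumerate(row)
        ])
    return weighted


# Hypothetical counts: two index words across two tweets.
example_counts = [
    [1, 1],  # a word that appears in both tweets
    [2, 0],  # a word that appears only in the first tweet
]
example_weights = tfidf(example_counts)
```

A word that appears in every tweet receives a weight of zero under this scheme, which is why very common words contribute little to the later stages.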


3.2 Retrieval, Filter and Identification

Retrieving 50 tweets from Neville Hobson’s Twitter timeline involved a simple copy and paste into a Word document (see footnote 6). The sample of tweets was extracted from a single period, 9th – 11th February 2012. Any tweets which were Re-Tweets (RTs) were disregarded as this research into LSI requires data unique to Neville Hobson.

The data filtration process describes the clean-up process of extracting purely qualitative data. In the context of data usually found posted on Twitter this involved removing:

URLs

Hashtags

Re-Tweets (RTs)

Numerical values

@replies to other users
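The filtration above was carried out by hand for this research. As a rough illustration of how it might be automated, the Python sketch below removes the listed items with regular expressions. The function name clean_tweet and the patterns are assumptions made for this example; real tweets would probably need further rules.

```python
import re

# Hypothetical patterns for the items listed above.
URL = re.compile(r"https?://\S+|www\.\S+")
TAG_OR_REPLY = re.compile(r"[#@]\w+")    # hashtags and @replies
NUMBER = re.compile(r"\b\d[\d,.]*\b")    # standalone numerical values


def clean_tweet(text):
    """Return the tweet text with URLs, hashtags, @replies and numbers removed.

    Returns None for a retweet, because retweets are discarded entirely.
    """
    if text.lstrip().upper().startswith("RT "):
        return None
    for pattern in (URL, TAG_OR_REPLY, NUMBER):
        text = pattern.sub(" ", text)
    # Collapse the gaps left behind by the removed items.
    return " ".join(text.split())


sample = "Good advice from @kerry on #pr http://t.co/abc 10 tips"
cleaned = clean_tweet(sample)
```

Removing a hashtag outright also removes the word inside it, which matches the manual process described here but may discard meaningful terms on other channels.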

Once the data has been filtered the second stage of LSI is to identify the “index words” of the document. These are words that appear twice or more within the captured data. So for instance, if the first tweet contained the word “social” and the thirtieth also contained “social” – this makes “social” an index word. All index words are connotative which means that their meanings can be interpreted against other index words.

Retrieving the index words of this document involved several forms of verification. The first stage involved manually reading over Neville Hobson’s tweets, highlighting each index word and measuring its frequency of appearance. To verify this manual process, which is subject to error, an adjustment to the Python script was made to display the self.keys variable (line 95) to show the index words:

1. Advice
2. Business
3. Comments
4. Daily
5. Era
6. Event
7. Fun
8. Global
9. Google
10. Hobson
11. Looks
12. Media
13. Morning
14. Networks
15. Neville
16. Perspectives
17. Post
18. Reading
19. Recording
20. Sharing
21. Snow
22. Social
23. Today

Footnote 6: This document can be found in the appendix.

In doing so it was possible to identify any “stop words” within the sample data through the process of elimination. This involves examining English sentence syntax to identify words such as coordinating conjunctions, pronouns, adjectives and verbs that carry little meaning on their own. For this sample data this included the omission of the following words:

'on','just','to','for', 'great', 'i', 'between','and','a','good', 'is', 'the', 'of', 'some', 'in', 'other', 'why', 'get', 'by', 'I', 'as', 'use', 'says', 'out', 'too', 'via', 'here', 'it', 'about', 'an', 'at', 'be', 'coming', 'especially', 'I', 'into', 'its', 'make', 'need', 'not', 'one', 'prime', 'still', 'thanks','that', 'we', 'well', 'what', 'will', 'with'.
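The identification of index words can be sketched in a few lines of Python. This is an illustration rather than the script used for this research: the function name index_words is hypothetical, STOP_WORDS holds only a handful of the words listed above, and the three example tweets are invented.

```python
from collections import Counter
import string

# A small subset of the stop words listed above, for illustration only.
STOP_WORDS = {"on", "just", "to", "for", "great", "i", "and", "a", "good", "is", "the", "of"}


def index_words(tweets, stop_words=STOP_WORDS, min_count=2):
    """Return the words that occur at least min_count times and are not stop words."""
    counts = Counter()
    for tweet in tweets:
        for raw in tweet.lower().split():
            word = raw.strip(string.punctuation)
            if word and word not in stop_words:
                counts[word] += 1
    return sorted(word for word, n in counts.items() if n >= min_count)


# Invented example tweets, already cleaned of URLs, hashtags and numbers.
example_tweets = [
    "Good advice on social media",
    "Social business is great",
    "Morning snow",
]
found = index_words(example_tweets)
```

In this example only "social" occurs twice, so it is the only index word returned.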

3.3 Term Count Model and Singular Value Decomposition

There are several ways to measure the initial results of LSI. These include the Term Count Model (TCM) and Singular Value Decomposition (SVD). The TCM marks the initial stage of LSI for understanding the frequency of index word mentions. LSI works by setting aside the structured syntax of language and instead recognising individual key words. The TCM places the retrieved data into a count model so that it is possible to understand how frequently key words appear in each extracted tweet. This process alone does not produce any interpretable results but does allow SVD to take place later in the process.
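A minimal Python sketch of a term count model is given below, assuming one row per index word and one column per tweet. The function name term_count_matrix and the two example tweets are hypothetical; the figures reported in this dissertation were produced with the script in the appendix, not with this code.

```python
import string


def term_count_matrix(tweets, index_words):
    """Count how often each index word appears in each tweet.

    Rows follow the order of index_words; columns follow the order of tweets.
    """
    row_of = {word: i for i, word in enumerate(index_words)}
    matrix = [[0] * len(tweets) for _ in index_words]
    for col, tweet in enumerate(tweets):
        for raw in tweet.lower().split():
            word = raw.strip(string.punctuation)
            if word in row_of:
                matrix[row_of[word]][col] += 1
    return matrix


# Invented example: two cleaned tweets and two index words.
example_tweets = ["social media social business", "media advice"]
example_matrix = term_count_matrix(example_tweets, ["media", "social"])
```

Summing a row of the matrix gives the total frequency of that index word across the sample.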

The TCM results of Neville Hobson’s Twitter timeline data are shown in figures 6 and 7 (see footnote 7), first as the initial spreadsheet table and then as a graph. At this early stage it is already apparent that the key word ‘social’ is by far the most frequent word.

Figure 6 - TCM

Figure 7 - Visualisation of TCM

Footnote 7: Larger versions of figures 6 and 7 can be found in the illustrations.

Now that the TCM table has been constructed it is necessary to return to the Python script to have the selected data broken down into different dimensions. This process is called Singular Value Decomposition (SVD), an algorithm used here to show visually the relationship between each key word and the tweets from which it originates. The number of dimensions used in SVD depends upon the data sets selected and the purpose of the SVD process. For evaluating Twitter timeline data three dimensions have been used. A histogram can be used to understand the importance of each singular value based upon the data sets used (Puffinware, 2010). The meaning behind each dimension is as follows:

Dimension 1: Broadly, the TCM frequency of each index word.
Dimension 2: The X value relationship dimension.
Dimension 3: The Y value relationship dimension.

As the first dimension of SVD largely reflects the frequency of each index word it will not be necessary to use it. Therefore dimensions two and three will be utilised for the SVD model. In turn these will form the X and Y axes of a comparative scatter graph: the dimension two and dimension three values of each key word form its coordinates on the graph. As each dimension has been discovered by an algorithm which notes each key word’s relationship with the tweets it originates from, the data should show clusters of similar words around particular values. For instance ‘advice’, ‘comments’ and ‘sharing’ may closely align with each other and may be interpreted as a social category.
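The decomposition described above can be sketched with NumPy. The script in the appendix imports svd from SciPy; numpy.linalg.svd is used here as a stand-in, and the function name svd_coordinates and the small term count matrix are invented for illustration.

```python
import numpy as np


def svd_coordinates(term_counts):
    """Return the dimension two and dimension three values for every index word.

    term_counts has one row per index word and one column per tweet.
    Dimension one (column 0 of U) is discarded, as described above.
    """
    a = np.asarray(term_counts, dtype=float)
    u, singular_values, vt = np.linalg.svd(a, full_matrices=False)
    return u[:, 1:3]


# Invented counts: 'neville' and 'hobson' appear in exactly the same tweet.
words = ["neville", "hobson", "social", "media", "advice"]
term_counts = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
coords = svd_coordinates(term_counts)
for word, (x, y) in zip(words, coords):
    print(f"{word:8s} {x: .3f} {y: .3f}")
```

Words with identical rows receive identical coordinates, which is the behaviour later observed for ‘Neville’, ‘Hobson’ and ‘Daily’. The sign of each dimension is arbitrary, so a plot may appear mirrored between runs or libraries without changing which words sit together.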

Figure 8 lists each key word and its associated values under dimension two and dimension three.


Figure 8 – SVD dimensions table

Once these values have been plotted using a Microsoft Excel spreadsheet the results appear as shown in figure 9.


3.4 The Results

Figure 9 – Visual SVD

As expected certain key words have aligned more closely with others, depending upon their relationship with the tweets from which they originated. This explains why the individual key words ‘Neville’, ‘Hobson’ and ‘Daily’ have aligned to form their original simple phrase again, as each word appears in exactly the same tweets. The original fifty tweets have not been included on this graph as their sheer number would have made it impossible to interpret the key word results, and including them would not help to fulfil the research task of this dissertation. If smaller data sets had been used (perhaps evaluating a handful of newspaper articles) then plotting the original documents would have been meaningful when compared against the extracted key words. The final stage of this LSI research concludes with a manual interpretation of the weighted key word sets.


Figure 10 – Visual SVD with categorisation

This final stage requires manual interpretation of the categories which are present as a result of SVD and LSI research. The three circled categories could be classed as the following: 

Red: Broadcasting
These three words are loosely based around the application of broadcasting.

Blue: Community
These key words are all associated with community activities and social business. Notice how all four of the words are to do with the creation and sharing of information on Twitter. This may also show that Neville Hobson has some influence as a user on Twitter.

Yellow: Authority & Teaching
This could also be labelled as a social category but, with respect to Neville Hobson’s timeline, suggests that he holds authority and a teaching role. Notice how ‘comments’, ‘reading’ and ‘advice’ are closely weighted on the scatter graph, which may indicate that some tweets are about commenting on and publishing articles.
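The categories above were drawn by eye. One simple way to support such a judgement is to measure the distance between words on the scatter graph, as in the Python sketch below. The coordinates are made up for illustration and are not the values from figure 8; the function name nearest_word is also hypothetical.

```python
from math import hypot


def nearest_word(word, coords):
    """Return the other word whose (x, y) position lies closest to the given word."""
    x, y = coords[word]
    distances = (
        (hypot(x - other_x, y - other_y), other)
        for other, (other_x, other_y) in coords.items()
        if other != word
    )
    return min(distances)[1]


# Made-up coordinates, NOT the dimension two and three values from figure 8.
example_coords = {
    "neville": (0.5, 0.1),
    "hobson": (0.5, 0.1),
    "social": (-0.3, 0.4),
    "media": (-0.2, 0.5),
}
closest_to_social = nearest_word("social", example_coords)
```

A measure like this only reports which words are close together; naming the resulting category still requires human interpretation.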


4.0 Evaluation

4.1 Evaluation of Latent Semantic Indexing

Despite the apparent success of the LSI research within this dissertation, the author must note five important areas in which this system needs improvement.

Small data set
For this research 50 tweets were captured for analysis, which has left words such as “era”, “networks” and “snow” uncategorised when weighted by SVD. A larger data sample would provide increased accuracy and depth of insight into Neville Hobson’s online activity.

Shared meaning
LSI is unable to understand that some words may be spelt exactly the same while their meanings differ. Whilst the word ‘reading’ was categorised under “Authority & Teaching”, the sentence it originated from may actually have referred to the location Reading. In order for LSI to understand the actual meanings behind words an additional research process would need to be used before SVD.

The clean-up process
Extracting tweets from Twitter for analysis requires a large amount of data clean-up. For an automated process an algorithm would need to be constructed in order to identify hashtags, URLs and @replies. As LSI can be implemented on a number of different digital communication channels, separate algorithms would need to be constructed to implement the different data clean-up processes.


Interpretation
LSI represents patterns of words. Within this example we can see how the words “Neville”, “Hobson” and “Daily” have all been attributed the same weightage through SVD, as all three only appear in the same tweets. As LSI groups words by the tweets in which they appear rather than by their meaning, this leaves the word “Morning” entirely separate from “Daily” even though both share a close meaning. In the same way the words “social” and “networks” have been grouped differently even though the two are frequently used together in a single term, “social networks”. Therefore LSI provides a pattern but additional interpretation is needed to identify word categories.

Automation is key
The research into LSI in this dissertation is extremely basic in comparison to the large data sets that would exist within a PR agency or in-house environment. It has taken a month for the author to fully understand the process of LSI and to process a small data set from Neville Hobson’s Twitter timeline. For this measurement process to be used professionally an automated system would need to be constructed which can quickly crawl, extract, clean up and process data. Despite extensive searching, the author found no organisation or agency offering these services.


4.2 Bayesian Inference and Other Interpretations

The key stage of LSI is concerned with the nature of the SVD which takes place. For the research within this dissertation the author has approached key word weighting through a three-dimensional analysis, discarding the first dimension for more accurate results. However, curating the results of LSI can take many forms, all of which take place after the SVD process. These processes have not been applied to the data within this research piece due to the small data set, but they are listed below.

Bayesian inference
Bayesian inference is a mathematical method used to estimate how likely a notion is to be true given the evidence available. Boolean logic, which works in the background of almost all variable-based computing solutions, treats a notion as strictly true or false; Bayesian inference instead assigns it a probability (MS Research, 1998). In this respect, “all forms of uncertainty are expressed in terms of probability” (Radford, 1998). The system therefore works upon a posteriori (see footnote 8) justifications, which makes it well suited to curating the results of LSI. If an LSI system used an advanced Bayesian inference script then the LSI algorithm could be completely automated, based upon an initial human evaluation that categorises key words against sub-set categories.

Benefits: Fully automated system; Machine learning environment.
Considerations: Advanced script needed; Risk of misinterpretation of words.
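As a worked illustration of the idea, Bayes' rule can be applied to the ‘Reading’ ambiguity noted in section 4.1. The Python sketch below is not part of the research in this dissertation: the function name posterior is hypothetical and every probability in the example is an invented figure chosen only to show the arithmetic.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule over a set of competing hypotheses.

    prior maps each hypothesis to its probability before the evidence is seen.
    likelihood maps each hypothesis to the probability of the evidence under it.
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())
    return {h: value / evidence for h, value in unnormalised.items()}


# Invented figures: before looking at the tweet, 'reading' is assumed to be
# an activity 80% of the time and a location 20% of the time.
prior = {"activity": 0.8, "location": 0.2}
# Invented figures: how likely a travel-related word is to share the tweet
# under each meaning.
likelihood = {"activity": 0.05, "location": 0.4}

belief = posterior(prior, likelihood)
```

With these invented figures the belief that ‘reading’ is a location rises from 0.2 to about 0.67 once the travel-related word is seen, which shows how an initial human judgement could be updated automatically as more evidence arrives.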

Natural language analysis
This process would involve taking the end results of LSI and putting them through a further process so that each key word is categorised under certain concepts. For instance the word ‘Reading’ can be defined as either an activity or a location. This would be achieved by manually weighting the word closer to one of the two concepts by reinforcing its relationship with nearby words. For instance if ‘Reading’ and ‘Car’ were to appear within the same sentence then natural language analysis would treat ‘Reading’ as a location in this instance. This process works on the basis of probability, which means it could be used in parallel with Bayesian inference.

Benefits: Fully automated system; Machine learning environment; More accurate results.
Considerations: Advanced script needed; Risk of multiple languages; Unknown semantic concepts.

Footnote 8: The term ‘a posteriori’ is Latin for “from the later” and in philosophy describes knowledge gained from empirical evidence or experience.

Manual weighting system
The simplest way to curate the results from LSI would be to adopt a manual weighting system. This would involve users of a partially automated LSI programme making judgements concerning the results of analysis. This may take the form of a star-based rating system, a numbered relevance system (1 – 10) or manually grouping certain results together under their own set categories.

Benefits: Easy to set up.
Considerations: Time consuming; Risk of human error; No machine learning.


5.0 Conclusion

1. ROI is relied upon for Reputation Management and Direct Sales
The public relations industry has always relied on some form of calculation in order to understand how a client receives their ROI. In the past this has involved the use of AVE models, but online it is necessary for the CIPR to introduce a standard for practitioners to utilise.

2. Third party measuring tools exist but are not perfect
Clickstream data exists to answer the ‘What?’ and ‘Why?’ questions behind data. However, there is a range of third party measuring tools that capture this clickstream data and use their own algorithms to provide sentiment levels. These programmes can be used, but only at a professional’s own discretion, as the calculations for sentiment are not usually publicly available.

3. The PR industry needs standardisation
The online advertising industry has been used as an example within this dissertation to show how that particular industry has applied its own standardisation to online metrics. In this respect the public relations industry is years behind; not only is there no standard for measuring traditional PR, but a standard does not yet exist for digital public relations either. As the chartered body, the CIPR must organise standard measurement metrics so that services can be better understood by clients and by the agencies offering them.

4. Semantic Analysis works but has not yet been perfected
The research into LSI shows how this measurement method could be utilised by PR professionals to measure reputation online. As of the publication of this dissertation the author is aware of no organisation offering this form of measurement. However, this may change in the next couple of years. This form of measurement is already being utilised by Google to deliver their search results and will most likely be used by the PR industry to measure its activities online. A bigger research study would be required to show fully how LSI could revolutionise the digital PR industry.


References

1. Barabasi, et al. (1999) ‘Emergence of Scaling in Random Networks’, Science Journal, 509-512 [online]. Available at: http://www.sciencemag.org/content/286/5439/509.full (Accessed: 26 January 2012)
2. Childers, L. (1999) ‘Guidelines for Measuring Relationships in Public Relations’, the Institute for Public Relations. University of Florida.
3. CIPR TV. (2011) ‘CIPR TV Discusses Broadcast PR and the PR 2020 Report’. Retrieved January 26, 2012 from YouTube: http://www.youtube.com/watch?v=pzUYBEm-E6w&feature=youtu.be
4. CIPR. (2012) ‘What is PR?’ Retrieved April 04, 2012 from CIPR website: http://www.cipr.co.uk/content/careers-cpd/careers-pr/what-pr
5. Duz, M. (2008) ‘Latent Semantic Indexing LSI Explained’. Retrieved April 04, 2012 from SEO blog: http://www.seo-blog.com/latent-semanticindexing-lsi-explained.php
6. Facebook. (2012) ‘Statistics’. Retrieved January 26, 2012 from Facebook: https://www.facebook.com/press/info.php?statistics
7. Fox, J. B. (2003) ‘A Discussion of Advertising Value Equivalency (AVE)’, The Institute for Public Relations. University of Florida.
8. Gay, R. Charlesworth, A and Esen, R. (2007) Online Marketing: a customer-led approach. Oxford: Oxford University Press.
9. Golbeck, J. and Rothstein, M. (2008) ‘Linking Social Networks on the Web with FOAF: A Semantic Web Case Study’. University of Maryland.
10. Google Analytics. (2012) ‘Google Analytics Product Tour’. Retrieved January 26, 2012 from Google Analytics website: http://www.google.com/analytics/tour.html
11. Goold, P. (2012) ‘Google’s ‘Search, plus Your World’ Highlights the Additional Benefits of Social Activity, says Punch Communications’. Retrieved January 26, 2012 from Yahoo News website: http://news.yahoo.com/google-search-plus-world-highlights-additionalbenefits-social-081625020.html
12. Gordon, A. (2011) Public Relations. Oxford: Oxford University Press.
13. Gould, D. (2010) ‘8095 Report: For Millennials, Brand Preference is a Form of Self Expression’. Retrieved January 26, 2012 from PSFK website: http://www.psfk.com/2010/10/8095-report-for-millennials-brandpreference-is-a-form-of-self-expression.html
14. Grunig, J. E. and Hunt, T. T. (1984) Managing Public Relations. United States: Holt, Rinehart & Winston.
15. Grunig, J. E. (2009) Paradigms of global public relations in an age of digitalisation. Prism 6(2): http://praxis.massey.ac.nz/prisms_online_journ.html
16. Gupta, O. (2006) Encyclopaedia of Journalism and Mass Communication. India: Isha Books.
17. Hobson, N. (2012) ‘About’. Retrieved April 04, 2012 from Neville Hobson’s blog: http://www.nevillehobson.com/about/
18. Investopedia. (2011) ‘Return on Investment – ROI’. Retrieved January 26, 2012 from Investopedia website: http://www.investopedia.com/terms/r/returnoninvestment.asp#axzz1jzpgEI6N
19. Jefkins, F. (2000) Advertising. Edinburgh: Pearson Education Limited.
20. Kaushik, A. (2010) Web Analytics 2.0: The art of online accountability & science of customer centricity. Indiana: Wiley Publishing.
21. Landauer, K. T., Foltz, W. P. and Laham, D. (1998) ‘An Introduction to Latent Semantic Analysis’, Discourse Processes Journal, 25, 259-284 [online]. Available at: http://lsa.colorado.edu/papers/dp1.LSAintro.pdf (Accessed: 26 January 2012)
22. Lee, B. T. (2009) ‘Linked Data’. Retrieved January 26, 2012 from W3 website: http://www.w3.org/DesignIssues/LinkedData.html
23. Mackey, S. (2005) ‘Rhetorical Theory of Public Relations: Opening the door to semiotic and pragmatism approaches’, The Annual Meeting of Australian and New Zealand Communication Association. Deakin University.
24. McGuinness, L. D. and Harmelen, V. F. (2004) ‘OWL Web Ontology Language Overview’. Retrieved January 26, 2012 from W3 website: http://www.w3.org/TR/owl-features/
25. Meza, A. B, et al. (2005) ‘Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection’. University of Georgia & University of Maryland.


26. MS Research. (1998) ‘Basics of Bayesian Inference and Belief Networks’. Retrieved April 07, 2012 from Microsoft Research website: http://research.microsoft.com/enus/um/redmond/groups/adapt/msbnx/msbnx/basics_of_bayesian_inference.htm
27. Owens, J. (2012) ‘PRCA Trends Barometer Reveals Concerns About Industry Outlook’. Retrieved January 26, 2012 from PR Week: http://www.prweek.com/news/rss/1112783/PRCA-trends-barometerreveals-concerns-industry-outlook/
28. Paine, D. K. (2007) ‘How to set benchmarks in social media: Exploratory research for social media, lessons learned’. KDPaine & Partners.
29. Paine, D. K. (2011) Measure what Matters: Online Tools for Understanding Customers, Social Media, Engagement, and Key Relationships. New Jersey: John Wiley & Sons.
30. Phillips, D. and Young, P. (2009) Online Public Relations: A practical guide to developing an online strategy in the world of social media (2nd Ed). London: Kogan Page.
31. PRCA. (2012) ‘What is PR?’ Retrieved April 04, 2012 from PRCA website: http://www.prca.org.uk/What_is_PR
32. Puffinware. (2010) ‘Latent Semantic Analysis (LSA) Tutorial’. Retrieved January 26, 2012 from iMetaSearch website: http://www.puffinwarellc.com/index.php/news-and-articles/articles/33-latent-semantic-analysis-tutorial.html
33. Radford, M. (1998) ‘Philosophy of Bayesian Inference’. Retrieved April 07, 2012 from Toronto University website: http://www.cs.toronto.edu/~radford/res-bayes-ex.html
34. RDF Working Group. (2004) ‘Resource Description Framework (RDF)’. Retrieved January 26, 2012 from W3C website: http://www.w3.org/RDF/
35. Solis, B. (2011) The End of Business as Usual: Rewire the way you work to succeed in the consumer revolution. New Jersey: John Wiley & Sons.
36. Theaker, A. (2008) The Public Relations Handbook (3rd Ed). Oxon: Routledge.
37. White, M. (2012) ‘Considering PRSA’s Definition of PR’. Retrieved April 04, 2012 from Michael White’s blog: http://www.mikewhite.co.uk/2012/03/19/considering-prsas-definitionof-pr/
38. White, R. (2000) Advertising (4th Ed). Berkshire: McGraw-Hill Publishing Company.


Illustrations Figure 1: ROI

Figure 2: Barabasi-Albert model

Figure 3: Online advertising example statistics

51 | P a g e

PUR334


April 2012

Maintaining Reputation through Online Analytics

PUR334

Figure 4: Research table Research Aim

Objectives

Secondary

Primary

To assess the

To explore the

Literature Review

N/A

current usage of

increasing use of

Literature Review

N/A

Literature Review

Observations

Literature Review

N/A

Literature Review

Observations

online metrics for digital evaluating web

communication

1.0 and web 2.0

channels

platforms.

To explore the symmetry between traditional and digital communication channels To explore current metrics for online measurement To explore the potential of the semantic analysis method

To assess the

Conducting

potential usage

research into latent Published Texts

of semantic

semantic indexing

analysis for the

(LSI)

public relations industry.

52 | P a g e

Testing


April 2012

Maintaining Reputation through Online Analytics

Figure 5: Research layout


Figure 6: TCM


Figure 7: Visualisation of TCM


Figure 8: SVD Dimensions table


Figure 9: Visual SVD


Figure 10: Visual SVD with categorisation


Appendix

Copy of Python LSI Tweet Analysis Code

# Python 2 script: relies on print statements, str.translate(None, ...) and raw_input
from numpy import zeros
from scipy.linalg import svd
#following needed for TFIDF
from math import log
from numpy import asarray, sum

# Sample tweets; each string is treated as one document
titles = [
    "Firefox on Win just updated to version. Critical security fix",
    "March release for Samsung Galaxy S II Android update. Anywhere between and days away",
    "Asus Transformer Prime review stars and a good q: What is the point of the Prime",
    "yw. Great post, some good contribs to the issue in the other comments",
    "Added to the conversation on Guardian post about recording phone interviews for podcasts",
    "Why Social Media Jobs Get Filled By Younger Folks: Infographic",
    "Viewpoint: V for Vendetta and the rise of Anonymous. Great read",
    "FTW",
    "I tend to take a power strip with sockets and only one adapter",
    "things you still need to know about social media social business. Spot on. especially",
    "tips for managing negative comments online. Good advice",
    "thanks, Kerry, good refocus",
    "The Neville Hobson Daily is out",
    "Breakfast supplies",
    "U.S. Air Force May Buy 18,000 Apple IPad2s for Flight Crews. Businessweek via",
    "we can do that, Ellee, would be fun",
    "Morning. Beautiful, sunny and, terrific start to the weekend",
    "Thinking that Google Hangouts is a pretty neat tool, especially the screen sharing feature",
    "An imaginative approach to a difficult (macabre, perhaps) topic to talk about - what happens to your digital conte",
    "fyi re Feel free to RT",
    "thanks. A good story. Almost as good as",
    "hi Sylvie. Not aware of any recent surveys on smallbiz and use of social networks in Australia",
    "of UK small businesses use social networks for business, says survey",
    "Global perspectives on social media",
    "Blog Global perspectives on social media",
    "The Neville Hobson Daily is out",
    "Google is getting into the music hardware business says the",
    "The FTSE social media index. Ranking methodology explained, too. Via",
    "Texas Jury Strikes Down Patent Trolls Claim to Own the Interactive Web. Good result",
    "What does (and doesn't) on Twitter and Facebook. Hard to get English plainer than this",
    "there's a good shoe shiner in the enclosed courtyard at Devonshire Square, EC",
    "yes, same here, not much traffic coming in to Reading from the A4 east",
    "Ads coming to the LinkedIn mobile app",
    "Seeing that Harry Redknapp is still a news headline. Come on, FA, just give him the job",
    "of course a lot of snow is a relative expression",
    "Driving into Reading shortly should be fun",
    "Morning. Quite a bit of snow out there. Well, an inch or two anyway",
    "uksnow Will it settle? Looks unlikely although tomorrow morning will tell",
    "The tone of life on social networking sites Behavioural study by Pew, interesting findings",
    "File Sharing in the Post MegaUpload Era Mainly, staggeringly less efficient.",
    "End of an era: Kodak discontinues its camera business",
    "I suspect is the one to ask that: is anyone recording the Google session at",
    "Looks a must-be-there event: Google at",
    "we need to make that happen",
    "that looks a great event, Holly, thanks But I won't be in London that day unfortunately",
    "Many thanks to for his superb insight & advice on social media monitoring today <= my pleasure",
    "Windows Consumer Preview due February: why it's not called beta",
    "The Neville Hobson Daily is out! Top stories today via",
    "wrestles with microblog revenue plan user loyalty, monetize",
    "we'll make it work"
]

# Common words excluded from the analysis
stopwords = ['on','just','to','for', 'great', 'i', 'between','and','a','good',
             'is', 'the', 'of', 'some', 'in', 'other', 'why', 'get', 'by', 'I',
             'as', 'use', 'says', 'out', 'too', 'via', 'here', 'it', 'about',
             'an', 'at', 'be', 'coming', 'especially', 'I', 'into', 'its',
             'make', 'need', 'not', 'one', 'prime', 'still', 'thanks','that',
             'we', 'well', 'what', 'will', 'with']
# Punctuation stripped from each word before counting
ignorechars = ''',:'!'''

class LSA(object):
    def __init__(self, stopwords, ignorechars):
        self.stopwords = stopwords
        self.ignorechars = ignorechars
        self.wdict = {}   # word -> list of document indices in which it occurs
        self.dcount = 0   # number of documents parsed so far

    def parse(self, doc):
        # Record every non-stopword of one document in the word dictionary
        words = doc.split()
        for w in words:
            w = w.lower().translate(None, self.ignorechars)
            if w in self.stopwords:
                continue
            elif w in self.wdict:
                self.wdict[w].append(self.dcount)
            else:
                self.wdict[w] = [self.dcount]
        self.dcount += 1

    def build(self):
        # Build the term-by-document count matrix from words seen more than once
        self.keys = [k for k in self.wdict.keys() if len(self.wdict[k]) > 1]
        self.keys.sort()
        self.A = zeros([len(self.keys), self.dcount])
        for i, k in enumerate(self.keys):
            for d in self.wdict[k]:
                self.A[i,d] += 1

    def calc(self):
        # Singular value decomposition of the count matrix
        self.U, self.S, self.Vt = svd(self.A)

    def TFIDF(self):
        # Replace raw counts with TF-IDF weights (defined here but not called below)
        WordsPerDoc = sum(self.A, axis=0)
        DocsPerWord = sum(asarray(self.A > 0, 'i'), axis=1)
        rows, cols = self.A.shape
        for i in range(rows):
            for j in range(cols):
                self.A[i,j] = (self.A[i,j] / WordsPerDoc[j]) * log(float(cols) / DocsPerWord[i])

    def printA(self):
        print self.keys
        print 'Here is the count matrix'
        print self.A

    def printSVD(self):
        print 'Here are the singular values'
        print self.S
        print 'Here are the first 3 columns of the U matrix'
        print -1*self.U[:, 0:3]
        print 'Here are the first 3 rows of the Vt matrix'
        print -1*self.Vt[0:3, :]

# Parse every tweet, build the count matrix, then run and print the SVD
mylsa = LSA(stopwords, ignorechars)
for t in titles:
    mylsa.parse(t)
mylsa.build()
mylsa.printA()
mylsa.calc()
mylsa.printSVD()
raw_input("\n\nPress The Enter Key To Exit")
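The appendix code defines a TFIDF method, but the script never calls it, so the SVD is computed on raw counts. The following is a minimal Python 3 sketch, not the code used in this study, of how the same weighting could be applied to a count matrix before decomposition. The four toy documents, the shortened stopword set and the function names build_counts and tfidf are assumptions made for illustration only. It uses the standard library alone, and the guard against documents that contain no retained words is an addition that the appendix code does not have.

```python
from math import log

# Toy stand-ins for the tweets; documents and stopwords are illustrative only.
docs = [
    "social media survey for small business",
    "small business use of social networks",
    "snow in reading this morning",
    "morning snow will it settle",
]
stopwords = {"for", "of", "in", "this", "will", "it"}


def build_counts(docs, stopwords):
    """Return (terms, matrix), where matrix[i][j] counts term i in document j.

    As in the appendix, only terms that occur more than once overall are kept.
    """
    occurrences = {}
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            if word not in stopwords:
                occurrences.setdefault(word, []).append(j)
    terms = sorted(t for t, hits in occurrences.items() if len(hits) > 1)
    matrix = [[0.0] * len(docs) for _ in terms]
    for i, term in enumerate(terms):
        for j in occurrences[term]:
            matrix[i][j] += 1
    return terms, matrix


def tfidf(matrix):
    """Weight each count by (count / words in document) * log(documents / documents with term)."""
    n_docs = len(matrix[0])
    words_per_doc = [sum(row[j] for row in matrix) for j in range(n_docs)]
    docs_per_word = [sum(1 for c in row if c > 0) for row in matrix]
    weighted = []
    for i, row in enumerate(matrix):
        idf = log(n_docs / docs_per_word[i])
        weighted.append([
            # Added guard: a document with no retained words gets weight 0.
            (c / words_per_doc[j]) * idf if words_per_doc[j] else 0.0
            for j, c in enumerate(row)
        ])
    return weighted


terms, counts = build_counts(docs, stopwords)
weights = tfidf(counts)
for term, row in zip(terms, weights):
    print(term, [round(w, 3) for w in row])
```

In the appendix script, the equivalent step would be a call to mylsa.TFIDF() between mylsa.build() and mylsa.calc(). Whether that would change the groupings reported in this study was not tested.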


