Morgan Floyd - Intuit's Live Community


Floyd Morgan
Floyd_Morgan@intuit.com
@fmorgan
Lucene Revolution, 2011


Agenda
• About Me
• About Live Community
• Live Community Search
• NLP
• Next Steps
• Questions? Answers?


About Me •  Principal Software Engineer at Intuit


Intuit QuickBase

Intuit Inc. is a leading provider of business and financial management solutions for small and mid-sized businesses; financial institutions, including banks and credit unions; consumers and accounting professionals. It has more than 200 applications and 7,700 employees worldwide.


About Me •  Principal Software Engineer at Intuit •  TurboTax Engineering


TurboTax is the nation’s No. 1 rated, best-selling, do-it-yourself tax preparation software. TurboTax helps more than 20 million people a year and generates $1 billion in revenue.


About Me
• Principal Software Engineer at Intuit
• TurboTax Engineering
  – Core tax engine
  – TurboTax Online
  – TurboTax Live Community
• Central Technology Organization
  – Live Community Platform



About Live Community
• It’s a user contribution system
  – Q&A
• It can be integrated into an application, contextually
  – Page-to-page relevance
• We use social, technology and data
  – To create our value proposition…assisting users
• We launched our Beta in 2007
  – TurboTax Online Home & Business
• We use open source…primarily open source
  – Apache HTTP, Ruby on Rails, MySQL, memcached ...
• It’s a platform
  – APIs, skinning, dynamic provisioning (AWS in progress)


Intuit Money Manager, India


QuickBooks Online, UK


devZone, Intuit dev


QuickBooks Online, US


TurboTax Desktop & Online, US


Terminology


Consumers (in the millions)


Contributors (in the thousands)


Top Contributors (in the hundreds)


Employees (contribute too)


Tax Season

Officially begins on December 1 and ends on April 15.


About TurboTax Live Community
• Largest community
  – 150+ servers, 200 thousand concurrent users
• Over 23 million users have used the service
  – Over 8 million last tax season alone
• Over 32 million page views last tax season
  – In-product views in the billions
• Over 750 thousand answered questions
  – 10 thousand questions asked on peak day
• Our contributors answer thousands of questions
  – Top contributor: 70 thousand answers


Demo


Live Community Search
• Why Solr?
• Auto suggest
• In-product search
• Web-site search
• Instant answer
• Instant question
• Answer bot
• Advertising
• Search everywhere
• Architecture



Why Solr?
• Lots of features/functionality
• Ease of integration
• We can scale it independently
• You’ll need some search expertise…that’s OK
  – Community and Lucid Imagination!
• Search is really important
  – Search everywhere…


Auto suggest
• Provides a glimpse of our vast content
• facet query (Solr 1.2)
• We use NLP…
• It’s used on every search touch point
• Second most frequent request
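A common way to build this kind of suggestion from a facet query is Solr's `facet.prefix` parameter: ask for zero documents and for the most frequent indexed values that start with what the user has typed. The Ruby sketch below only builds the request URL and parses Solr's JSON facet list. The core URL comes from the slides; the field name `suggest_terms` (assumed to hold whole, already normalized phrases as untokenized strings) and the helper names are hypothetical.

```ruby
require 'uri'

# Build a Solr request that returns no documents (rows=0), only the most
# frequent values of `field` that start with the typed prefix.
# `field` and the default base URL are illustrative assumptions.
def suggest_url(prefix, field: "suggest_terms", limit: 10,
                base: "http://localhost:8090/solr/posts/select")
  params = {
    "q"              => "*:*",
    "rows"           => "0",
    "facet"          => "true",
    "facet.field"    => field,
    "facet.prefix"   => prefix.downcase, # indexed values assumed lowercased
    "facet.limit"    => limit.to_s,
    "facet.mincount" => "1",
    "wt"             => "json"
  }
  "#{base}?#{URI.encode_www_form(params)}"
end

# Solr's JSON facet output is a flat list: ["term1", count1, "term2", count2, ...].
# Convert it to [term, count] pairs; a missing section yields [].
def parse_suggestions(response, field: "suggest_terms")
  flat = response.dig("facet_counts", "facet_fields", field) || []
  flat.each_slice(2).to_a
end
```

Because the field is assumed to be untokenized, a multi-word prefix such as "form 1099" can still match; on a tokenized text field `facet.prefix` only matches single indexed terms.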


In-product “mini” search
• Primary search interface for consumers
• It appears integrated
• Now the most utilized search interface
• It makes all content available
• Over 3 million users last tax season


# using Solr is easy!
require 'solr'

c = Solr::Connection.new( "http://localhost:8090/solr/posts" )
c.search( "how do i input 1099", :filter_queries => "post_status: #{Post::ANSWERED}" )


Web-site “full” search
• Primary search interface for contributors and employees
• More real estate, more facets, more suggestions ...
• Faceted search empowers development teams to narrow in on issues
• 200+ TurboTax issues discovered last tax season




# using Solr is easy!
require 'solr'

c = Solr::Connection.new( "http://localhost:8090/solr/posts" )
c.search( "bug", :filter_queries => "post_status: #{Post::OPEN}" )
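Narrowing in on issues, as the full search does, usually means adding one `fq` (filter query) per selected facet value, so each click shrinks the result set and Solr can cache each filter independently. A minimal Ruby sketch, assuming hypothetical facet fields `product` and `tax_year`; values are quoted as phrases and escaped so that text like "Home & Business" is matched literally.

```ruby
require 'uri'

# Hypothetical facet fields; the real Live Community schema is not shown.
FACET_FIELDS = %w[product tax_year].freeze

# Quote the value as a phrase and escape embedded quotes and backslashes,
# so characters like "&" or spaces are matched literally.
def fq_for(field, value)
  escaped = value.gsub(/["\\]/) { |c| "\\#{c}" }
  %(#{field}:"#{escaped}")
end

# One fq per selected facet value: each selection narrows the results,
# and Solr can cache every filter query separately.
def drill_down_url(query, selections, base: "http://localhost:8090/solr/posts/select")
  params = [["q", query], ["facet", "true"], ["facet.mincount", "1"]]
  FACET_FIELDS.each { |f| params << ["facet.field", f] }
  selections.each { |field, value| params << ["fq", fq_for(field, value)] }
  "#{base}?#{URI.encode_www_form(params)}"
end
```

For example, `drill_down_url("bug", { "product" => "Home & Business" })` keeps the free-text query unchanged and moves the facet selection into a cacheable filter.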


Instant answer
• Present similar answered questions
• Search with the terms of the new question
• Narrow the focus to the subject
• Show snippet of a recommended answer
• Accidental A/B test


Demo


# using Solr is easy!
require 'solr'

c = Solr::Connection.new( "http://localhost:8090/solr/posts" )
c.search( "how do i input 1099",
          { :query_fields   => "subject",
            :filter_queries => "post_status: #{Post::ANSWERED}" } )


Instant question
• Present similar unanswered questions
• Answer reuse
• Search with the terms of the answered question
• Narrow the focus to the subject
• We also use a date filter


“Aren’t we addicted enough!”


Demo


# using Solr is easy!
require 'solr'

c = Solr::Connection.new( "http://localhost:8090/solr/posts" )
# start of today, converted to UTC, then back 7 days (ActiveSupport, loaded by Rails)
today = DateTime.now.at_beginning_of_day.utc.to_time
date_from = 7.days.ago( today ).getutc.iso8601
c.search( "how do i input 1099",
          { :query_fields   => "subject",
            :filter_queries => "post_status: #{Post::OPEN} AND created_at_d:[#{date_from} TO *]" } )
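The date filter can also be written with Solr date math, which lets Solr compute "seven days ago, rounded to the day" itself instead of the app server. A stdlib-only Ruby sketch of both forms; the field name `created_at_d` comes from the slide, while the method names are hypothetical. The client-side version uses midnight UTC, which may differ slightly from the slide's server-local start of day.

```ruby
require 'time'

# Client-side form, like the slide: compute midnight (UTC here) seven days
# back and format it as ISO 8601, the date format Solr expects.
def created_since_fq(days, now: Time.now.utc)
  midnight = Time.utc(now.year, now.month, now.day)
  from = (midnight - days * 86_400).iso8601
  "created_at_d:[#{from} TO *]"
end

# Server-side form: Solr date math rounds NOW down to the day and
# subtracts the days itself.
def created_since_fq_date_math(days)
  "created_at_d:[NOW/DAY-#{days}DAYS TO *]"
end
```

`NOW/DAY` rounds down to midnight UTC, so the filter string stays identical for a whole day and Solr's filter cache can reuse it.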


Answer bot
• We continue to search for you
  – The day after you ask
• Send an email
• Runs for 7 days
• We only send another email if the results have changed
• From our explicit feedback
  – 39% said it answered their question
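The answer bot's rules (start the day after the question, stop after 7 days, email again only when the results change) reduce to two small checks. A hypothetical Ruby sketch; the method names, the use of post ids as the result fingerprint, and skipping empty result sets are assumptions, not details from the talk.

```ruby
require 'date'

BOT_DAYS = 7 # "Runs for 7 days"

# The bot is active from the day after the question was asked
# through the seventh day after it.
def bot_active?(asked_on, today)
  (today - asked_on).to_i.between?(1, BOT_DAYS)
end

# Email again only when the matching posts differ from the last email.
# Comparing sorted ids ignores ranking changes (an assumption), and an
# empty result set never triggers an email.
def send_email?(last_sent_ids, current_ids)
  return false if current_ids.empty?
  last_sent_ids.nil? || last_sent_ids.sort != current_ids.sort
end
```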



Advertising
• We use our user-generated content in advertising
• It has a 300% higher click-through rate than static banner ads
• Ads are displayed throughout the tax season on many ad networks
• Content selection is automated and continuous



Diagram (content selection): three log sources feeding MapReduce, Carrot2, Solr and Heuristics
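The diagram pairs log files with MapReduce, Carrot2 and Solr. The counting step can be sketched in a single Ruby process as a map phase that emits `[query, 1]` pairs and a reduce phase that sums them. The one-query-per-line log format and the normalization are assumptions, and the Carrot2 clustering, Solr lookup and heuristics are not shown; the output fields loosely mirror `<rank>` and `<text>` in the feed below.

```ruby
# Map: one normalized query per log line, paired with a count of 1.
def map_phase(log_lines)
  log_lines.map { |line| [line.strip.downcase.squeeze(" "), 1] }
           .reject { |query, _| query.empty? }
end

# Reduce: sum the counts for each distinct query.
def reduce_phase(pairs)
  pairs.each_with_object(Hash.new(0)) { |(query, n), counts| counts[query] += n }
end

# Rank by count (ties broken alphabetically) and keep the top entries.
def trending(log_lines, top: 3)
  reduce_phase(map_phase(log_lines))
    .sort_by { |query, count| [-count, query] }
    .first(top)
    .each_with_index.map { |(query, count), i| { rank: i + 1, text: query, count: count } }
end
```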


<?xml version="1.0" encoding="UTF-8"?> 
 <lc_trending end_date="2011-05-21" include_popular="true" type="queries" duration="day"> 
 <topic> 
 <rank>1</rank>

<text>Ptp</text>

<post> 
 <post_id>aBHMBWxzar4lKMacfArRo0</post_id> 
 <subject>Final K-1 Disposition of PTP Units</subject> 
 <detail>I bought units in a PTP in five separate transactions in 2008; I sold all my units in five separate transactions in 2010. TT does not allow me to report all 5 transactions while stepping through the K-1 form -- these transactions are reported on Schedule D, but also need to be on Form 4797, Part II, Box 10. I can't seem to make the linkage work. I would appreciate some guidance on how to make this happen.</detail> 
 <response>OK, several steps needed for your situation:
 1) on the K-1 on the screen entitled Describe the Partnership Disposal, choose "Disposition was not via a sale"
 2) Then search for the topic "sale of business property" you will be taked to a topic entitled "Any Other Property Sales?" - select the first option. Ove rthe next few screens here you will have the opportunityut to enter the sale amounts associated witht he Form 4797.

3) then choose the topic on the income landing table for "Stocke, Mutual Funds, Bonds, other - here you will enter the rest of the sale, that portion attributable to capital gains.
 Hope this helps you,
 </response> 
 <viewsCount>60</viewsCount> 
 <answersCount>2</answersCount> 
 <asker>Xuxan</asker> 
 <display_post_url>https://ttlc.intuit.com/post/show_full/aBHMBWxzar4lKMacfArRo0?rmode=ad</display_post_url> 
 </post>


Search everywhere
• Search first, ask second
  – Used to be ask first, search later or never!
• Auto complete everywhere too
  – 64-bit Linux, 10 (8-core) slaves, 300 req/s
• Search requests
  – 900% increase
• Questions asked
  – 50% decrease…is that good?
• Increased consumption
  – 38% users, 43% content…very good!


Architecture diagram: App server, Search cluster, Indexing server, Database cluster
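With 10 search slaves (from the Search everywhere slide), the app server has to spread read traffic across them. A hypothetical Ruby sketch of round-robin selection, assuming the indexing server is the replication master that receives all updates and that nodes are addressed as host:port strings; neither detail is shown on the slide.

```ruby
# Hypothetical sketch: reads rotate across the search slaves, while all
# index updates go to the single indexing server (assumed replication master).
class SearchPool
  attr_reader :update_host

  def initialize(slaves, indexer)
    raise ArgumentError, "need at least one slave" if slaves.empty?
    @slaves = slaves.dup.freeze
    @update_host = indexer
    @counter = 0
    @lock = Mutex.new # app servers may serve requests from several threads
  end

  # Round-robin: each call returns the next slave in turn.
  def query_host
    @lock.synchronize do
      host = @slaves[@counter % @slaves.size]
      @counter += 1
      host
    end
  end
end
```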


NLP
• Search is not enough…unfortunately
• Our domain is noisy…ugly at times


Uh, what?


Too much what!


?


I wish NLP could help!


NLP
• Search is not enough…unfortunately
• Our domain is noisy…ugly at times
• How it works…


HwO do iput 10 99 i don,t know what to do need help help me.


Where do I enter a 1099?


schema.xml

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>


dictionary

<?xml version="1.0" encoding="US-ASCII"?>
<dictionary>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="suitcas">suitcase</entry>
  <entry score="10" root="form" synonym="none" domain="ttlc" id="2210"></entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="xrai">x-ray</entry>
  <entry score="10" root="none" synonym="townhom" domain="ttlc" id="townhous">townhouse</entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="grosssal">gross sale</entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="trinidad">Trinidad</entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="home"></entry>
  <entry score="10" root="none" synonym="know" domain="ttlc" id="knew"></entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="massachusett">Massachusetts</entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="denver">Denver</entry>
  <entry score="5" root="none" synonym="none" domain="ttlc" id="instead"></entry>
  <entry score="10" root="none" synonym="unallow" domain="ttlc" id="disallow">not allowed</entry>
  <entry score="5" root="none" synonym="see" domain="ttlc" id="saw"></entry>
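Entries like these can be read as a map from a stemmed `id` to a canonical `synonym`, so variants such as "knew"/"know" or "saw"/"see" collapse to one token. A hypothetical Ruby sketch using a few of the entries above as plain structs (no XML parsing). It treats `synonym="none"` as "keep the stem" and ignores `score` and `root`, whose exact meaning the slide does not explain.

```ruby
# A few entries from the dictionary above, as [id, synonym] pairs.
DictEntry = Struct.new(:id, :synonym)

ENTRIES = [
  DictEntry.new("suitcas", "none"),
  DictEntry.new("townhous", "townhom"),
  DictEntry.new("knew", "know"),
  DictEntry.new("disallow", "unallow"),
  DictEntry.new("saw", "see")
].freeze

# Assumption: synonym="none" means the stem is already canonical.
SYNONYMS = ENTRIES.each_with_object({}) { |e, map|
  map[e.id] = e.synonym unless e.synonym == "none"
}.freeze

# Replace each stemmed token by its canonical synonym, if it has one.
def canonicalize(stems)
  stems.map { |s| SYNONYMS.fetch(s, s) }
end
```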


regular expressions (many)

# each pattern expects space-delimited words (text padded with spaces)
if text =~ / any/
  text.gsub!(/ any where /, ' anywhere ')
  text.gsub!(/ any(body| body| one) /, ' anyone ')
  text.gsub!(/ any( thing| things|things) /, ' anything ')
  text.gsub!(/ any(one|thing|where) else /, ' any\1 ')
end
if text =~ / don /
  text.gsub!(/ don i /, ' do not i ')
  text.gsub!(/ don (have|know|see|want) /, ' do not \1 ')
  text.gsub!(/ (are|be|have|is|was|were) don /, ' \1 done ')
  text.gsub!(/ don (not|nt|t) /, ' do not ')
end
text.gsub!(/ (do|can) (ai|ii) /, ' \1 i ')
text.gsub!(/ d (oyou|you) /, ' do you ')
text.gsub!(/ (1|ai|ii|my) (did|do|had|have|was) /, ' i \2 ')
text.gsub!(/ crap{1,10} /, ' crap ')
text.gsub!(/ gr{1,} /, ' ')
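The rules above run in sequence on the same string and rely on the leading and trailing spaces in each pattern. A runnable Ruby wrapper around a few of them; the lowercasing, the punctuation-to-space step and the space padding are assumptions about what happens before these rules, and the method name `normalize` is hypothetical.

```ruby
# Apply a subset of the cleanup rules to one raw question.
def normalize(raw)
  # assumed preprocessing: lowercase, punctuation to spaces, pad both ends
  text = " #{raw.downcase.gsub(/[^a-z0-9]+/, ' ')} "
  if text =~ / any/
    text.gsub!(/ any where /, ' anywhere ')
    text.gsub!(/ any( thing| things|things) /, ' anything ')
  end
  if text =~ / don /
    text.gsub!(/ don (have|know|see|want) /, ' do not \1 ')
    text.gsub!(/ don (not|nt|t) /, ' do not ')
  end
  text.gsub!(/ d (oyou|you) /, ' do you ')
  text.squeeze(' ') # collapse runs of spaces left by earlier steps
end
```

For example, `normalize("I don,t know any thing")` turns the comma into a space first, so the `don t` rule can fire.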


• Spell Checker
• Stemmer (Porter)
• Word Collocation
• Stop Phrase Correction
• Stop Word Removal
• Synonyms Substitution
• Tax Domain Correction
• Phrase Encoding


# NLP is not easy!
# this class wraps our NLP
sf = SemanticFilter.new
# does it work?
sf.act_on_post( "HwO do iput 10 99 i don,t know what to do need help help me." )
# => [" wheretoent 1099 "]
sf.act_on_post( "Where do I enter a 1099?" )
# => [" wheretoent 1099 "]
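The stage names on the NLP slide suggest a chain of token transforms. A toy Ruby version with tiny hand-made tables (hypothetical, not Intuit's `SemanticFilter` or its dictionaries) reproduces the example above, with both questions collapsing to the same key. Stemming and synonym substitution are left out, and `wheretoent` is simply copied from the slide as the encoded phrase.

```ruby
# Tiny hypothetical tables; real dictionaries would be far larger.
SPELLING     = { "hwo" => "how", "iput" => "input" }.freeze
FORM_SPLITS  = { %w[10 99] => %w[1099] }.freeze
STOP_PHRASES = { %w[i don t know what to do] => [], %w[need help] => [], %w[help me] => [] }.freeze
STOP_WORDS   = %w[a i do the].freeze
PHRASES      = { %w[how input] => %w[wheretoent], %w[where enter] => %w[wheretoent] }.freeze

def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Replace every occurrence of a key token sequence with its value (left to right).
def replace_sequences(tokens, table)
  out = []
  i = 0
  while i < tokens.size
    seq = table.keys.find { |k| tokens[i, k.size] == k }
    if seq
      out.concat(table[seq])
      i += seq.size
    else
      out << tokens[i]
      i += 1
    end
  end
  out
end

PIPELINE = [
  ->(t) { t.map { |w| SPELLING.fetch(w, w) } }, # spell checker
  ->(t) { replace_sequences(t, FORM_SPLITS) },   # tax domain correction
  ->(t) { replace_sequences(t, STOP_PHRASES) },  # stop phrase correction
  ->(t) { t - STOP_WORDS },                      # stop word removal
  ->(t) { replace_sequences(t, PHRASES) }        # phrase encoding
].freeze

def semantic_key(text)
  tokens = PIPELINE.reduce(tokenize(text)) { |t, stage| stage.call(t) }
  " #{tokens.join(' ')} "
end
```

Each stage is a lambda over an array of tokens, so stages can be reordered or tested in isolation. Order matters here: stop phrases such as "i don t know what to do" must be removed before the stop word "do".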


“Stop guessing what I’m looking for!”


NLP
• Search is not enough…unfortunately
• Our domain is noisy…ugly at times
• How it works…
• It works well, but it’s not perfect
• Not just for search…



Recommendations
• Deliver unanswered questions to contributors
• Too much content to scan manually
• Based on past answering behavior
• Recommend a question to multiple contributors
• Uses the Mahout machine learning library


Diagram: Answered posts → NLP → User vectors; Unanswered posts → NLP → Post vectors; Mahout, Heuristics
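One way to match unanswered posts to contributors is cosine similarity between a post's term vector and each contributor's vector built from past answers. The talk uses Mahout (a Java library) for this step; the sketch below is a plain-Ruby stand-in, not Mahout's API, with hypothetical sparse `term => weight` hashes and a hypothetical `min_score` cutoff. Returning the top `k` contributors reflects "recommend a question to multiple contributors".

```ruby
# Cosine similarity between two sparse vectors (term => weight hashes).
def cosine(a, b)
  dot = a.sum { |term, w| w * b.fetch(term, 0.0) }
  norm = Math.sqrt(a.values.sum { |w| w * w }) * Math.sqrt(b.values.sum { |w| w * w })
  norm.zero? ? 0.0 : dot / norm
end

# Recommend one unanswered post to the `k` most similar contributors,
# so the same question reaches several people at once.
def recommend_contributors(post_vec, user_vecs, k: 2, min_score: 0.1)
  user_vecs.map { |user, vec| [user, cosine(post_vec, vec)] }
           .select { |_, score| score >= min_score }
           .sort_by { |user, score| [-score, user] }
           .first(k)
           .map(&:first)
end
```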



Next Steps
• We’re going to rewrite it! … most of it ;)
• Real-time indexing
• Question vs. Query
• Social feedback
  – Page ranking
• Social dictionaries
  – Content classification
• Beer?!


Thank you. Floyd_Morgan@intuit.com @fmorgan


Appendix
• User search
• SEO

