The Safe Search annotations returned by the Vision AI API were perhaps the most interesting and revealing results (Fig. 12). These insights came from analyzing a screenshot of each website. The fake media websites clearly stand out in several categories: Racy (Possible, Likely, Very Likely), Medical (Very Likely), Spoof (Possible, Likely, Very Likely) and Violence (Very Likely). In those same categories, credible media websites generally did quite well, with the exception of Racy (Likely, Possible) and Spoof (Likely, Possible). Although none of these categories represented a majority in percentage terms, they do indicate big visual differences between credible and fake news media websites.
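The per-category comparison described above can be sketched as a simple aggregation. This is a minimal illustration, not the code used in the project. It assumes each screenshot's Safe Search result has already been reduced to a dictionary that maps a category name to a likelihood name. The category and likelihood names follow the Vision API's Safe Search annotation, while the function name `likelihood_shares` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Likelihood levels and categories as named in the Vision API Safe Search annotation.
LIKELIHOODS = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
CATEGORIES = ["adult", "spoof", "medical", "violence", "racy"]


def likelihood_shares(annotations):
    """Return {category: {likelihood: percent of screenshots}}.

    `annotations` is a list with one dict per screenshot, mapping a
    category name to a likelihood name. A category missing from a
    dict is counted as "UNKNOWN" (an assumption of this sketch).
    """
    total = len(annotations)
    if total == 0:
        return {}

    counts = defaultdict(Counter)
    for annotation in annotations:
        for category in CATEGORIES:
            counts[category][annotation.get(category, "UNKNOWN")] += 1

    return {
        category: {
            level: 100.0 * counts[category][level] / total
            for level in LIKELIHOODS
        }
        for category in CATEGORIES
    }


if __name__ == "__main__":
    # Invented sample for illustration only; not data from the study.
    fake_sites = [
        {"racy": "LIKELY", "spoof": "POSSIBLE", "violence": "VERY_UNLIKELY"},
        {"racy": "VERY_UNLIKELY", "spoof": "VERY_UNLIKELY", "violence": "VERY_LIKELY"},
    ]
    for category, shares in likelihood_shares(fake_sites).items():
        print(category, shares)
```

Running the same function once on the fake group and once on the credible group gives two tables of percentages that can be compared category by category, which is the kind of comparison Fig. 12 summarizes.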
Conclusion & Next Steps
This was a project sparked by curiosity. From an engineering perspective, I wanted to see what type of insights could be gained by applying Google's pre-trained ML models to different types of metadata. From a data scientist's point of view, would those insights be valuable in solving a complex real-world problem such as credible vs fake media? It can't be said that these methods would be able to classify a single website as fake or credible. However, by aggregating the insights gathered from multiple fake and credible websites over the course of a few days, it was possible to clearly distinguish the two groups.
Regarding next steps, there are definitely some interesting possibilities to explore further. Typically, credible news cycles have a dominant story in the headlines for a few days (this week, Brexit). Therefore, extending the data collection period from days to months could lead to a deeper understanding and some interesting findings. Another possibility would be to use more advanced methods to extract the initial metadata from the websites, since good-quality, filtered data is quite important before using the AI/ML APIs. Lastly, an interesting research avenue to pursue would be to apply the same concept to other public vendors' AI/ML APIs, such as Amazon Web Services and/or Microsoft Azure. This would allow additional information to be discovered, and it would also provide a very interesting comparison and a way to validate the results.