VOLUME 9 ISSUE 11
QURIOSITY MONTHLY NEWSLETTER OF QUANTINUUM – THE QUANT & ANALYTICS COMMITTEE @SIMSR
THE CURIOUS CASE OF SELF DRIVING CARS!! Self-driving cars and their complexity | p. 04
Interview with Dr. Nilakantan Narasinganallur | p. 20
Surprising puzzles and questions to rack your brain | p. 24
EDITOR’S NOTE Welcome to the latest issue of Quriosity, the monthly newsletter of Quantinuum! Quantinuum, the Quant and Analytics committee of K.J. Somaiya Institute of Management Studies and Research, aims to empower students and professionals alike to organize and analyze numbers and, in turn, to make sound, rational decisions as future managers. The newsletter, published monthly, consists of articles that enrich young minds by covering contributions made in the fields of quant, analytics, and mathematics. The objective of Quriosity is to publish up-to-date articles on data analytics, alongside relevant and insightful news. In this way the magazine aspires to be vibrant, engaging, and accessible, while remaining integrative. This issue covers an article on the "driverless car" as the future of travel, one engulfed in an ethical conundrum: should it save the car's passenger or the pedestrian in the scenario of an accident? Two sub-articles explain the concepts of ‘Matrix Factorization’ and ‘Understanding Boxplots’. In future issues we will attempt to include practical examples of how to plot a boxplot using tools like MegaStat and how to utilize Excel for analytics. In the QuantGuru section we have highlighted the significant contributions of Raman Parimala to the field of mathematics. We have also included an interview with Dr. N.S. Nilakantan and a movie review of Moneyball, which takes a sabermetric approach to scouting and analyzing baseball players. Finally, we have some interesting updates in data science and technology, followed by some intriguing puzzles. If you wish to submit articles or news items, either individually or collaboratively, you are welcome to write to us at firstname.lastname@example.org Thank you and Happy Reading!
Quriosity Editorial Team Quantinuum@SIMSR Editorial Team: VVNS Anudeep (+91 9441201685) Khushbu Mehta (+91 9930158610) Tanmay Nikam (+91 9699288587) Dhyan Baby K (+91 9809245308) Akshay Nandan R
Saumya Joshi Abhishek Bawa Kaustubh Karanje Himanshu Data Shubham Thakur
IN THIS ISSUE
Cover story The curious case of self-driving cars!! by VVNS Anudeep … pg.04 Sub article Matrix Factorization – Recommendation System: Netflix Prize by Shubham Thakur … pg.09 Sub article Understanding Boxplots by Abhishek Bawa … pg.12 Quant Guru Raman Parimala by Kaustubh K… pg.16 Curiosity Update by Saumya Joshi … pg.17 News Digest by Saumya Joshi … pg.18 Event Report by Tanmay Nikam … pg.19 Faculty Interview by Tanmay Nikam… pg.20 Movie Review by Dhyan Baby K … pg.23
Quant Fun by Kaustubh K … pg.24 Quant Connect … pg.28
QUANT COVER STORY
The curious case of self-driving cars!! by VVNS Anudeep PG IB 2017-19 All Tom Cruise fans will remember the scenes from the movie Minority Report where a self-driving car chases him all around the city. At the time it seemed futuristic, pure fiction to many people. Today it's not futuristic, and it is definitely not fiction anymore. A self-driving car, also known as a robot car or driverless car, is a vehicle that is capable of sensing its environment and moving with little or no human input. Try closing your eyes and walking across a closed room guided only by a friend's instructions; it is extremely tough to stay on course, isn't it? Making a self-driving car work is far more complex and demands much greater precision. Why go for self-driving cars? To be factual, the full-scale benefits and the costs associated with self-driving cars are largely hypothetical today. Their impact on the environment, economy, safety, and public health cannot yet be directly established; much more information is needed to reach conclusions. But the projected benefits are the reason the idea is being pursued. Safety: Year on year, the number of people who die in motor vehicle crashes is rising rapidly. Hypothetically, self-driving cars could bring these numbers down, since algorithm-driven software could be less error-prone and more consistent in following traffic rules. Equity: Self-driving cars can mobilize people who cannot drive themselves, such as the elderly or disabled. Shipping and delivery: The shipping and delivery industry is expected to benefit the most, as removing human-centric constraints such as driver hunger and fatigue would cut shipping and delivery times considerably. Beyond these, there are other expected benefits such as energy conservation and increased productivity.
Self-driving cars and their complexity Humans are built to process vast amounts of information very quickly and make logical decisions in a split second. Consider walking, or even driving a car: with a single glance the human brain intuitively processes the distance to obstacles, its current position, and the shortest route to the destination, and then follows that route. If any of these parameters suddenly change, humans reprocess, reassess, and adjust accordingly. All of this is done effortlessly by most humans. While it is so simple for us, a self-driving car has to develop a sense of vision, understand its position, and keep reassessing both until the destination is reached. The entire cycle has to be extremely fast to come anywhere near the speed of a human being. This requires highly sensitive equipment, efficient algorithms, and quick data processing, and therein lies the complexity.
How do self-driving cars work?
The three major processes involved are a) mapping and localization, b) obstacle avoidance, and c) path planning. Although different manufacturers use different sensor suites and algorithms depending on their cost and operational constraints, the processes are similar across vehicles. Mapping & Localization Before making any navigation decisions, the vehicle needs to understand a map of its surroundings and precisely locate itself within that map. A laser range finder (LIDAR, light detection and ranging) is generally used to scan the environment. The mapping is based on the time taken by the laser pulse to reach an object and travel back. Why use a laser range finder when a video camera could do the job? The answer is simple: a video camera makes it easy to extract the scene and build a 2D picture of the environment, but with laser range finders the depth of the scene is readily available for building a 3D map. The vehicle filters and discretizes the data collected from each sensor and often aggregates the information to create a comprehensive map, which can then be used for path planning.
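The time-of-flight idea behind LIDAR mapping can be sketched in a few lines. This is an illustrative calculation only, not any manufacturer's code: the laser pulse travels out and back, so the range is half the round-trip distance.

```python
# Illustrative sketch: converting a LIDAR time-of-flight measurement
# into a range, as described above.
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def lidar_range(round_trip_time_s: float) -> float:
    """One-way distance to the object: the pulse travels out and back,
    so the range is half the round-trip distance."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

# A pulse that returns after 0.5 microseconds hit an object about 75 m away.
print(round(lidar_range(0.5e-6), 1))  # -> 74.9
```

Real units scan millions of such pulses per second across many beams, which is where the gigabyte-per-second data rates mentioned below come from.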
An example of a Google car's internal map at an intersection was tweeted by Idealab founder Bill Gross, who claims that Google's self-driving car gathers almost 1 GB of data per second. Laser beams diverge as they travel long distances, so it becomes very difficult to get accurate readings beyond about 100 m. For the vehicle to know its global position it has to use GPS, but GPS estimates can be off by many meters owing to atmospheric conditions, reflections from buildings, and position error accumulated over time. Uber's self-driving prototypes use sixty-four laser beams, along with other sensors, to construct their internal map; Google's prototypes have, at various stages, used lasers, radar, high-powered cameras, and sonar. No single sensor is sufficient for running the car safely. The algorithm fuses data from all the sensors with pre-built models of road users, their behaviour patterns, and commonly used traffic signals. For example, it is reported that Google's self-driving cars can identify a cyclist extending an arm to signal a manoeuvre and slow down enough to let the cyclist pass. Obstacle avoidance Obstacles are defined in a library with predefined shape and motion descriptors. The vehicle uses a probabilistic model to track the predicted future path of each moving object based on its shape and prior trajectory. For example, a two-wheeled object traveling at 40 mph is most likely a motorcycle rather than a bicycle and will be categorized accordingly. The previous, current, and predicted future locations of all obstacles in the vehicle's vicinity help it choose the right course of action, enabling smarter decision-making, especially at busy intersections. Path Planning Path planning, as the name indicates, uses the information processed in the stages above to plan a safe path to the destination that avoids obstacles while following road rules.
Different manufacturers have different algorithmic strategies based on their objectives, the kind of equipment used, and so on. The algorithm generally maintains a long-range plan and a short-range plan. The short-range plan is updated dynamically based on the obstacles and their movement, and the long-range plan is in turn updated based on the short-range plan. Once the path is defined based on speed, safety, and time requirements, the instructions are passed to on-board processors and
actuators. Altogether, this process takes 50 ms on average, although it can be longer or shorter depending on the amount of collected data, the available processing power, and the complexity of the path-planning algorithm. The cycle of localization, mapping, obstacle detection, and path planning is repeated until the vehicle reaches its destination. Control & Execution: As the name suggests, this step executes the path plan made earlier. The path can be broken into a position (x, y), an angle (yaw: rotation about the vertical axis of a rigid body), and a speed (v). The control algorithm generates instructions for the vehicle, such as a steering-wheel angle or an acceleration level, while considering the constraints of the road, obstacles, wheel slip, etc. A PID (proportional-integral-derivative) algorithm helps execute the plan and exercise control over the vehicle. This is a simple, preliminary algorithm used in the basic controller of a self-driving car. The PID controller calculates an error value: the difference between the actual and planned course of action. The correction has three major components:
P: Proportional component. This term applies a correction proportional to the error. Depending on the frequency at which the algorithm calculates the error, the resulting oscillation is more or less pronounced; the coefficient Kp sets the degree of oscillation:

    a = -Kp × e

D: Derivative component. The PD controller recognizes that the error is decreasing and slightly reduces the steering angle it adopts, so the vehicle approaches the planned path smoothly:

    a = -Kp × e - Kd × (de/dt)

I: Integral component. Error accumulates over time because of mechanical bias that makes the vehicle turn a little more or less than commanded; an integral term is added to the earlier components to correct for it:

    a = -Kp × e - Kd × (de/dt) - Ki × ∫ e dt
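A minimal sketch of the PID controller described above, in Python. The gain values used here are illustrative, not tuned for any real vehicle, and the error is assumed to be the cross-track error between the car and its planned path:

```python
# A minimal PID cross-track controller sketch, following the equations above.
class PID:
    def __init__(self, kp, kd, ki):
        self.kp, self.kd, self.ki = kp, kd, ki
        self.prev_error = 0.0   # for the derivative term de/dt
        self.integral = 0.0     # accumulated error for the integral term

    def steering(self, error, dt):
        """a = -Kp*e - Kd*(de/dt) - Ki*integral(e dt)"""
        derivative = (error - self.prev_error) / dt
        self.integral += error * dt
        self.prev_error = error
        return (-self.kp * error
                - self.kd * derivative
                - self.ki * self.integral)

# Example: a shrinking cross-track error as the car converges on its path.
pid = PID(kp=0.2, kd=3.0, ki=0.004)
for e in [1.0, 0.8, 0.5, 0.2]:
    angle = pid.steering(e, dt=0.05)  # steering command each control tick
```

Each control tick the planner supplies a fresh error and the controller emits a steering command; the same structure applies to throttle control with a speed error.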
The PID controller algorithm is simplistic and is used in standalone cases; it is difficult to combine lateral and longitudinal control within it. When humans drive, they adjust naturally to the size, mass, and dynamics of the vehicle. To replicate this, more sophisticated controllers (MPC, i.e. Model Predictive Control, among others) are being developed and used. Role of Machine learning: Machine learning algorithms are used extensively to solve various challenges in self-driving car applications, including speech and gesture recognition and language translation. Both supervised and unsupervised algorithms are used in the steps of mapping and localization, obstacle avoidance, path planning, and control and execution. The major areas where machine learning algorithms are used are the rendering of the
surrounding environment and the forecasting of possible changes to those surroundings. Broadly, the machine learning algorithms used can be classified as decision matrix algorithms, clustering algorithms, and regression algorithms, and one category can serve multiple key areas; for example, regression algorithms can be used for object localization as well as object detection or movement prediction. Decision matrix algorithms: These algorithms identify relationships between sets of data and then make a decision from them. Whether a car needs to take a turn, or to decelerate to a certain level before turning, is decided using these algorithms. Regression algorithms: Adaptive boosting is the regression-based machine learning algorithm most commonly used in this area, and it can also be used for classification in certain applications. It is an adaptive learning algorithm: to create a powerful learner, adaptive boosting runs multiple iterations (hence "adaptive"), adding a new weak learner each round and adjusting the example weights so that the next learner focuses on the examples misclassified in previous rounds. Regression algorithms can be used for short-range path planning and long-term learning. Other regression algorithms used in self-driving cars include decision forest regression, neural network regression, and Bayesian regression. Hurdles in the path: A fair number of scientists believe that these cars can generate as much as 1 GB of data per second. Theoretically, this would amount to petabytes of data generated per year by a single car. Although not all the generated information is important, the question of how to store and process it remains unanswered. Another concern is that current testing is confined entirely to pre-mapped areas, and it is very unsafe to test in an unmapped area.
Even if the data problem is solved, the threat of these vehicles being hacked is very real and raises serious security concerns. Public acceptance, personal data usage, and the threat to data privacy remain the biggest hurdles to the evolution of these vehicles.
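The adaptive boosting idea described in the machine-learning discussion above (add a new weak learner each round, then boost the weights of the examples the ensemble got wrong) can be illustrated with a toy implementation on one-dimensional data. This is a teaching sketch with simple threshold "stumps" as weak learners, not a production algorithm:

```python
import math

# Toy AdaBoost: each round picks the threshold stump with the lowest
# weighted error, then re-weights the examples it misclassified.
def train_adaboost(xs, ys, rounds=5):
    n = len(xs)
    w = [1.0 / n] * n                 # example weights, updated each round
    ensemble = []                     # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        for t in xs:                  # candidate stump: sign(pol * (x - t))
            for pol in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if (1 if pol * (x - t) >= 0 else -1) != y)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)        # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # this learner's vote
        ensemble.append((alpha, t, pol))
        for i in range(n):            # boost weights of misclassified examples
            pred = 1 if pol * (xs[i] - t) >= 0 else -1
            w[i] *= math.exp(-alpha * ys[i] * pred)
        total = sum(w)
        w = [wi / total for wi in w]  # renormalize
    return ensemble

def predict(ensemble, x):
    score = sum(a * (1 if pol * (x - t) >= 0 else -1)
                for a, t, pol in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 7, 8, 9]
ys = [-1, -1, -1, 1, 1, 1]
model = train_adaboost(xs, ys)
```

The real-world versions used in perception pipelines work the same way in principle, but with richer weak learners over image or sensor features.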
Reference: https://www.ucsusa.org/clean-vehicles/how-self-driving-cars-work#.W-k5dZMzbIV http://www.alphr.com/cars/7038/how-do-googles-driverless-cars-work https://www.landmarkdividend.com/self-driving-car/ https://www.iotforall.com/how-do-self-driving-cars-work/ https://www.telegraph.co.uk/cars/features/how-do-driverless-cars-work/ https://medium.com/udacity/how-self-driving-cars-work-f77c49dca47e https://www.nytimes.com/2018/03/19/technology/how-driverless-cars-work.html https://www.hyundai.news/eu/technology/how-do-self-driving-cars-work/ https://robohub.org/how-do-self-driving-cars-work/
Matrix factorization Recommendation system: Netflix prize by Shubham Thakur PG Core 2018-20 Staying relevant is essential for any business to survive, but what should one do when consumer demand is ever changing and a business has to upgrade itself constantly? That is the scenario every business faces nowadays, and it is especially true for enterprises selling to customers who have plenty of choices. Technology helps companies stay relevant in these cases.
Modern consumers are flooded with choices. E-commerce portals and content providers offer a huge selection of products to meet a variety of needs. Matching consumers with the most apt products is the key to enhancing user satisfaction and creating loyal customers. Hence many retailers have become interested in recommendation systems, which track users' behavior and their interest in particular kinds of products and provide personalized recommendations that suit them. Because good personalized recommendations add another dimension to the user experience, e-commerce leaders like Amazon and Alibaba and online content services like Netflix have made recommender systems an essential part of their websites. These systems are basically of two types: the content filtering approach and the collaborative approach. In the content-based filtering approach, the system first creates a profile for each user or product and characterizes its nature. For example, a movie profile could include attributes like genre, actors, box-office popularity, release date, and so forth. A user profile might include demographics provided by the user at sign-up. The profile allows the program to associate users with suitable movies and serials. The major drawback is that these strategies require a certain amount of external information that might not be available or easy to collect. The alternative is collaborative filtering. Here we rely on previous transactions or ratings without requiring the creation of explicit profiles. The major highlight of this technique is that it is domain-free: it can be applied to any product or service, and it can address data aspects that are often elusive and difficult to capture in a profile using the previous technique.
It is generally more accurate than content-based techniques but suffers from the cold-start problem: it cannot address new products and new users, where content-based techniques are superior. Collaborative filtering has two primary areas: neighborhood methods and latent factor models. Neighborhood methods center on computing the relationships between items or between users. The item-oriented approach estimates a user's preference for an item based on that user's ratings of "neighboring" items, where neighboring products are other products that tend to get similar ratings when rated by the same users. For example, take the movie "Saving Private Ryan". Its neighbors might include war movies, Spielberg movies, and Tom Hanks movies, among others. To predict a particular user's rating for Saving Private Ryan, one would look among the movie's nearest neighbors for those this user actually rated. The user-oriented approach finds like-minded users who can complement each other's ratings. One of the most successful realizations of latent factor models is based on matrix factorization. In its basic form, matrix factorization characterizes both items and users by vectors of factors inferred from item-rating behavior. High correspondence between item and user factors leads to a recommendation. These methods have become popular in recent years by combining great scalability with predictive accuracy. They also offer much-needed flexibility for modeling various real-life situations. Recommender systems rely heavily on different types of
input data, which are often placed in a matrix with one dimension representing users and the other representing items of interest. The most valuable data is high-quality explicit feedback, in which users state their interest in products directly. For example, Netflix collects star ratings for movies, and TiVo users indicate their preferences for TV shows by pressing thumbs-up and thumbs-down buttons. Such explicit user feedback is called ratings. Explicit feedback usually forms a sparse matrix, since any single user is likely to have rated only a small percentage of the possible items. One strength of matrix factorization is that it allows additional information to be incorporated. When explicit feedback is not available, systems can infer user preferences from implicit feedback, which indirectly reflects opinion through observed user behavior: purchase history, browsing history, search patterns, or even mouse movements. Implicit feedback usually denotes the presence or absence of an event, so it is typically represented by a densely filled matrix. In 2006, Netflix announced a competition to improve its existing recommender system, named "Cinematch", offering a prize of $1 million to any team that could deliver an alternative system improving on it by at least 10%. The contest was won by the team "BellKor's Pragmatic Chaos", with a significant improvement of 10.06%. References: https://www.ionos.com/digitalguide/online-marketing/online-sales/how-to-use-recommendationsystems-in-e-commerce/ https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5197422 https://medium.com/recombee-blog/machine-learning-for-recommender-systems-part-1algorithms-evaluation-and-cold-start-6f696683d0ed https://ieeexplore.ieee.org/document/5197422/metrics#metrics https://www.marutitech.com/recommendation-engine-benefits/
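The basic latent-factor idea described in the article above can be sketched with plain stochastic gradient descent: each user and each item gets a small factor vector, and the predicted rating is their dot product, trained only on the observed (sparse) entries. This is a toy illustration, not Cinematch or the prize-winning system; all numbers below are made up for the example:

```python
import random

# Minimal matrix factorization via SGD with L2 regularization.
def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500):
    random.seed(0)  # deterministic toy initialization
    P = [[random.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[random.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:          # iterate only observed entries
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            e = r - pred                  # rating residual
            for f in range(k):            # gradient step on both factor vectors
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (e * qi - reg * pu)
                Q[i][f] += lr * (e * pu - reg * qi)
    return P, Q

# (user, item, rating) triples: a tiny sparse rating matrix
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (2, 1, 1), (2, 2, 5)]
P, Q = factorize(ratings, n_users=3, n_items=3)

# Predict an unobserved cell, e.g. user 1's rating for item 1.
predicted = sum(P[1][f] * Q[1][f] for f in range(2))
```

The learned factor vectors play the role of the inferred "taste" dimensions discussed above; at Netflix scale the same idea is run over hundreds of millions of ratings with many more factors.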
Understanding Boxplots by Abhishek Bawa PG Core A 2018-20
What is a boxplot? A boxplot is a descriptive-statistics method for graphically depicting groups of numerical data using the concept of quartiles. It is a standardized way of displaying the distribution of data with the help of a five-point summary, which consists of the following components:
1. Minimum
2. Q1
3. Median
4. Q3
5. Maximum
Understanding Boxplot The five-point summary helps an analyst find out how tightly the dataset is clustered, any potential outliers (data points lying beyond the fences described below), and the symmetry of the dataset (i.e. how well the values in the dataset are spread out).
Fig. 1. Representation of a boxplot
The image above shows the representation of a boxplot. In it:
1. Q2 (50th percentile): the middle value of the dataset.
2. Q1 (25th percentile): the middle value between the smallest observation and the median.
3. Q3 (75th percentile): the middle value between the median and the largest observation.
4. IQR (inter-quartile range) = Q3 - Q1. This covers the data between the 25th and 75th percentiles.
5. OF = outer fence, IF = inner fence.
There are two types of outliers:
1. Mild outliers: values lying between the inner fence and the outer fence on either side of the dataset.
2. Extreme outliers: values lying beyond the outer fence on either side of the dataset.
The boundaries of the fences are defined as follows:

                      Lower Bound (LB)    Upper Bound (UB)
    Inner Fence (IF)  Q1 - (1.5 * IQR)    Q3 + (1.5 * IQR)
    Outer Fence (OF)  Q1 - (3 * IQR)      Q3 + (3 * IQR)
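A short sketch of how the five-point summary and the fences above can be computed. Note that statistics packages differ slightly in their quartile interpolation rules; this version takes the median of each half of the sorted data:

```python
# Five-point summary and outlier fences for a small sample.
def five_point_summary(data):
    xs = sorted(data)
    def median(vals):
        n = len(vals)
        mid = n // 2
        return vals[mid] if n % 2 else (vals[mid - 1] + vals[mid]) / 2
    mid = len(xs) // 2
    q1 = median(xs[:mid])                                  # lower half
    q3 = median(xs[mid + 1:] if len(xs) % 2 else xs[mid:]) # upper half
    return xs[0], q1, median(xs), q3, xs[-1]

data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 14, 30]
mn, q1, med, q3, mx = five_point_summary(data)
iqr = q3 - q1
inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fence (mild outliers)
outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # outer fence (extreme outliers)
mild = [x for x in data if not inner[0] <= x <= inner[1]]
print(mild)  # -> [30]: 30 lies beyond the inner fence but inside the outer one
```

Here Q1 = 4, the median = 8, Q3 = 12, so IQR = 8 and the inner fence is (-8, 24): the value 30 is flagged as a mild outlier.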
Taking an example Let’s take Virat Kohli’s ODI career. While the entire dataset cannot be shown here, assume the following statistics: No. of innings played = 208 Total runs scored = 10232 Batting Average = 59.8 Highest Score = 183 Now depicting Virat Kohli’s dataset of runs scored in each innings on a boxplot, the following output is obtained:
[Fig. 2. Boxplot of Virat Kohli's ODI career scores per innings (median = 35)]
In the boxplot above, the median score is 35, which is less than the batting average (59.8); since the mean exceeds the median, the dataset is right-skewed. As expected, there are no outliers on the lower side, as Kohli cannot score fewer than 0 runs in an innings. The observations are:
1. Minimum: 0
2. Q1: 10
3. Median: 35
4. Q3: 80.5
5. Maximum: 183
Advantages of using boxplot 1. Box plots may seem more primitive than a histogram or kernel density estimate. However, they consume less space than the other two and are particularly useful for comparing distributions across several groups or sets of data. a. The choice of the number and width of bins can heavily influence the appearance of a histogram.
b. The choice of bandwidth can heavily influence the appearance of a kernel density estimate. 2. Boxplots can easily handle large datasets. 3. They are a simple and intuitive method for depicting outliers. Disadvantages of using boxplot 1. Exact values are not retained, so the precision of individual data points is lost. 2. A boxplot is not as visually appealing as other graphical techniques. Conclusion A boxplot is a visually effective method for quickly viewing a clear summary of one or more datasets and for comparing sets of results from different experiments. At a glance, a boxplot displays the distribution of results and provides indications of symmetry and skewness within the data.
References: https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51 https://sciencing.com/advantages-disadvantages-box-plot-12025269.html
Raman Parimala by Kaustubh Karanje PG Core A 2018-20
Raman Parimala is an Indian mathematician known for her contributions to the field of algebra. She has been called a "supreme and powerful algebraist" for her work. She was born on 21st November 1948 in Tamil Nadu. She completed her M.Sc. at Madras University in 1970 and her Ph.D. at Bombay University in 1976. She then worked as a professor at the Tata Institute of Fundamental Research (TIFR) in Mumbai (Bombay) and as visiting faculty at many other renowned universities. Her guide at TIFR was R. Sridharan, who was also an expert in algebra. Raman Parimala has done research in number theory, algebraic geometry, and topology, and is considered an expert in algebraic geometry. Her study of quadratic forms additionally led her to analyze real algebraic geometry as well as complex algebraic geometry, and the cohomology theories (sequences of abelian groups associated to a topological space, often defined from a cochain complex) linked to them. Parimala has put this expertise to work in a series of elegant publications either supporting or refuting long-standing conjectures. Her study of low-rank quadratic spaces, for example, led her to a new definition of the discriminant, an invariant for involutions of central simple algebras, which allowed her to settle decomposability questions for involutions dating back to Albert in the 1930s. She was invited as a plenary speaker at the International Congress of Mathematicians, where she gave a talk on the study of quadratic forms and some connections with geometry. She has also spoken on the arithmetic of linear algebraic groups over two-dimensional fields. Prof. Parimala enjoys teaching and inspires students to pursue careers in mathematics. She received the Shanti Swarup Bhatnagar Award in 1987, an honorary doctorate from the University of Lausanne in 1999, and, in 2003, the Srinivasa Ramanujan Birth Centenary Award.
“Math has the beauty of poetry; its abstractions are combined with rigor”- Raman Parimala
References: https://www.agnesscott.edu/lriddle/women/parimala.htm http://nobelprizeseries.in/tbis/r-parimala
CURIOSITY UPDATE by Saumya Joshi MMS 2018-20 ISRO's successful launch of GSAT-29 on board the second development flight GSLV-Mk III D2 On November 14, 2018, ISRO launched the communication satellite GSAT-29 on board its second development flight, GSLV-Mk III D2, from the Satish Dhawan Space Centre at Sriharikota. The launch vehicle has double the payload capacity of the Mk II version. At 43.5 m tall, with a lift-off mass of 641 tons, it is India's heaviest launch vehicle, capable of lifting up to 4 tons to geosynchronous transfer orbit or a 10-ton payload to low Earth orbit. The vehicle is powered by two S200 solid motors, an L110 liquid core stage, and a powerful C25 liquid cryogenic stage. The payload, GSAT-29, is an advanced high-throughput, multi-band, multi-beam communication satellite with a lift-off mass of 3.4 tons; its payloads are configured to cater to the communication requirements of users in Jammu & Kashmir and the north-eastern regions of India. ISRO's successful launch of a PSLV rocket carrying India's first hyperspectral imaging satellite At exactly 9.58 am on 29th November, the Indian Space Research Organisation's reliable workhorse PSLV rocket soared into the skies from Sriharikota's first launchpad carrying India's first hyperspectral imaging satellite (HysIS), an advanced earth-observation satellite, along with 30 other foreign satellites. NASA's InSight touched down on Mars In news from NASA, InSight touched down on Mars at 11:52:59 a.m. PT (2:52:59 p.m. ET) on Nov. 26, 2018. The lander plunged through the thin Martian atmosphere, heat shield first, and used a parachute to slow down. It fired its retro rockets to descend slowly to the surface of Mars and land on the smooth plains of Elysium Planitia. InSight's goal is to study the interior of Mars and take the planet's vital signs: its pulse and temperature.
To look deep into Mars, the lander must be at a place where it can stay still and quiet for its entire mission. That's why scientists chose Elysium Planitia as InSight's home.
References: https://mars.nasa.gov/insight/timeline/landing/summary/ https://www.cnet.com/news/nasa-insight-wows-with-mars-landing-but-work-just-getting-started/ https://timesofindia.indiatimes.com/india/isro-launches-indias-first-hyperspectral-imaging-satalong-with-30-foreign-sats/articleshow/66859630.cms
NEWS DIGEST by Saumya Joshi MMS 2018-20 The Menace of Deep Fakes As if fake news were not enough of a problem, we now have the menace of deep-fake news. A few months back, a video of Donald Trump appeared online stating objectionable views on the Paris climate agreement; it caused quite an uproar, but was later revealed to be a doctored clip. Fake videos can now be created using a machine learning technique called a "generative adversarial network", or GAN. A GAN can look at thousands of photos of a person and generate an entirely new photo; it can do the same with audio. Recognizing deep fakes is a tedious task. Because creating them still requires considerable expertise in machine learning and AI, people spreading propaganda have not yet made much use of them, but they are sure to be a serious threat in the near future. https://www.theguardian.com/technology/2018/nov/12/deep-fakes-fake-news-truth
Top 3 Big Data Analytics Trends For 2019 1) Dark Data Dark data is data acquired through computer network operations that could not be used for deriving insights for one reason or another. Organizations collect vast amounts of data and are generally able to analyze only a small fraction of it; sometimes they are not even aware that the data is being collected. It is important to understand that any data left unexplored is a missed opportunity and may become a security threat. 2) Chief Data Officers Though a relatively new role, demand for CDOs is rising in organizations. Human resource managers are on the lookout for CDOs, so if you have expertise in enterprise-wide data cleaning, analysis, and visualization, CDO is the profile to aim for. 3) Edge Computing In edge computing, data is processed closer to the source that produces it rather than in a central location. In the context of the Internet of Things, this means the sensors and other embedded devices themselves would process the data. Edge computing will play a crucial role in realizing ideas such as smart cities, Industry 4.0, and ubiquitous computing. https://www.cioapplicationseurope.com/news/top-big-data-trends-that-will-dominate-2019nid-541.html
Introduction to Classical Machine Learning workshop by Tanmay Nikam PG FS 2018-20 “People worry that computers will get too smart and take over the world, but the real problem is that they're too stupid and they've already taken over the world.” ― Pedro Domingos Quantinuum, the Quant and Analytics committee of K.J. Somaiya Institute of Management Studies and Research, is organizing a workshop on ‘Introduction to Classical Machine Learning’. The workshop will be conducted on 8th December between 9 a.m. and 1:30 p.m. The event is free to attend, with only prior registration required; seats are limited and allocated on a first-come, first-served basis. The workshop will be conducted by Mr. Raj Darshan Dhyani and Mr. Prashant K. Sharma. Mr. Raj Darshan Dhyani is currently the Chief Technology Officer at iMarkServe. He is an IIT Bombay graduate (1984), completed his Post Graduation in Management at JBIMS (1987), and has 30 years of industry experience. Mr. Prashant K. Sharma is a Research Assistant at IIT Bombay who won a Best Presentation Award in Beijing, China, for a machine learning project on quadcopters. He has worked on projects such as music generation using Generative Adversarial Networks (GANs), diabetic retinopathy (detecting the stage of the disease from retinal scans), and FIFA playing agents (a reinforcement learning agent trained to play FIFA skill games). Machine learning is the study of algorithms and mathematical models that let computers act without being explicitly programmed, giving them the ability to progressively improve their algorithms and models from experience. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome, and that's just the beginning. It is already disrupting the IT industry and is opening up new avenues for the Internet of Things, NLP, and self-teaching AI.
Machine learning has the potential to disrupt every industry and forever change how humans interact with the world.
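To make "classical machine learning" concrete, here is a minimal sketch (our illustration, not part of the workshop materials) of one of the simplest classical algorithms, k-nearest-neighbour classification: a new point is labelled by a majority vote of its k closest training points, with the toy data below entirely made up.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of ((features...), label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters labelled "A" and "B"
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]

print(knn_predict(train, (2, 2)))  # A
print(knn_predict(train, (9, 9)))  # B
```

Notice there is no explicit "program" for telling A from B; the behaviour comes entirely from the labelled examples, which is the sense in which such systems learn rather than follow hand-written rules.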
Interview with Dr. Nilakantan Narasinganallur by Tanmay Nikam PG FS 2018-20

Quantinuum members interacted with our mentor Prof. Nilakantan Narasinganallur, who shared his insights on various subjects and answered a few questions raised by the team. Prof. Nilakantan has 32+ years of experience across industries and has been teaching in management institutes for over 10 years. He specializes in Operations Research/Management Science modeling and applications, statistical techniques, business simulation exercises, and analytics-oriented education, and was ever eager to answer the students and share his views. Below are excerpts from the interview:

Q: Can you share with us some inspiring moments or interesting anecdotes from your student life?
A: During my OR days, we realized that we had to solve most of our problems ourselves and could not seek faculty help; in those days, faculty members were not easily reachable. A few of us used to work out problems on our own and present them in class. One such problem, framed in a book from the US, concerned travel and the comparative costs of taxis and public transport. We solved the problem and arrived at the solution of using a taxi instead of public transport. In the context of the US this was plausible, but we could not convince an Indian audience. Thus, we realized the importance of not only solving a problem but also selling the solution to the user or audience, and of securing buy-in from the user.
Q: What was your inclination to choose your field of specialization? A: I was always inclined towards mathematics but was not much into pure mathematics. Applied mathematics appealed to me. Operations Research is a bundle of techniques applying mathematics to management problems.
Q: What are the major changes happening in your field of specialization? A: OR is slowly being integrated into an all-encompassing field of Analytics. However, every specialized field in Analytics, be it OR, machine learning, forecasting, or visualization, has its own niche importance.
Q: Please share with us some of the most impactful experiences you had while teaching or researching. A: I am pleased with the realization that I spearheaded a small revolution at SIMSR: that of teaching OR, Statistics, and related subjects with computers. Today, faculty members from streams outside quantitative methods are also migrating to computer-based teaching. This is not to say that it could not have been done by others; I was there at the right time, and we took the right decisions, such as emphasizing computer-based learning and expanding the computer network within SIMSR.
Q: What are the current trends in research and innovation that you find interesting? A: A lot is happening across the broad spectrum of Analytics. The future will see more complex systems, with solutions provided by complex processes and software. Broadly, Analytics can be divided into three areas that interact and merge: statistics/quantitative methods, machine learning, and IT systems. Specializing in some section of Analytics is quite important.
Q: What advice would you give to a young researcher? A: Established academicians and researchers have indicated that you require a minimum of 10,000 hours of study and practice to master anything. This is very much true of research as well. You become a master of your domain if you spend 10,000 hours studying and researching in it. That is my advice.
Q: What advice would you give to students? A: People have the wrong notion that an MBA is general knowledge and that you can master it by collecting tidbits of knowledge and dropping them in appropriate circumstances. In my interactions with students, I have always emphasized the importance of deep knowledge in one's area of specialization. As Samuel Johnson said, there are two types of knowledge: what we know ourselves, and knowing where to find it.
Q: Why should students pursue research? A: Because research reinforces your learning and teaches you to streamline your work.
The Quantinuum team extends a big thank you to Prof. Nilakantan for sharing his valuable insights with the team.
Moneyball by Dhyan Baby K PG FS 2018-20 Directed by: Bennett Miller

“Baseball isn't just numbers. It's not science.” This is the response Billy Beane, General Manager of the Oakland Athletics, got from one of his scouts when he tried to reinvent baseball by taking a sophisticated sabermetric approach to scouting and analyzing players. “You can't approach baseball from a statistical bean-counting point of view,” the scout said.

Moneyball is a 2011 American sports movie about baseball which follows the Oakland Athletics' 2002 season. After another disappointing season, which ended with the departure of three of his key players, Beane, played by an impeccable Brad Pitt, faces the conundrum of how to replace his star players and keep up with other franchises ready to spend huge money to build their dream teams. He meets Peter Brand, an economist from Yale played by Jonah Hill, who believes that baseball thinking is medieval and needs to change. Brand shows Beane that they can build a title-winning team on their limited budget if they use simple statistics: their aim should not be buying players but buying wins, by estimating how many runs they can get from each player. The movie revolves around how Beane puts his entire career on the line by backing this theory, and the team finally goes on to win 20 consecutive games, then an American League record, recording one of the most famous seasons in franchise history by finishing first in the American League West.

Sabermetrics is a term coined by Bill James in 1980. It is derived from SABR, the Society for American Baseball Research, and James defined it as "the search for objective knowledge about baseball". The approach uses statistical analysis as a better way to understand a player's importance, in place of earlier measurements like batting averages and pitcher wins, which had many flaws.
Better sabermetric measurements such as weighted on-base average, secondary average, and runs created gave teams a better understanding of players and more of an edge while recruiting. The Boston Red Sox went on to hire Bill James in 2003 and subsequently won two World Series; Time magazine once named James one of the 100 most influential people in the world.

Scouts used to overlook good players for a variety of biased reasons and flaws. Reducing a player to numbers showed how much value they really brought to the team. Yes, games are won on the field with fundamental play, but these numbers help scouts and teams find the talent that is ideal for them, and this approach has gone on to change baseball as we know it. Moneyball is a romantic rendition of a sport that means much more than that to the American people, and is a highly recommended movie for all.
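As a small illustration of the kind of statistic James popularized, his basic Runs Created formula estimates a player's run contribution from hits (H), walks (BB), total bases (TB), and at-bats (AB) as RC = (H + BB) × TB / (AB + BB). A quick sketch, using a hypothetical season line rather than any real player's numbers:

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Bill James's basic Runs Created estimate:
    RC = (H + BB) * TB / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)

# Hypothetical season: 150 hits, 50 walks, 250 total bases, 500 at-bats
print(round(runs_created(150, 50, 250, 500), 1))  # 90.9
```

The appeal of such a formula is exactly the one the movie dramatizes: it converts a player's raw statistics into a single number measured in runs, the currency that actually decides games.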
QUANTFUN by Kaustubh Karanje PG Core A 2018-20 1) A group of 5 people wants to keep a secret document in a safe. They want to make sure that, in future, only a majority (>=3) can open the safe. So they want to put some locks on the safe, each of which has to be opened to access the safe. Each lock can have multiple keys, but each key opens only one lock. How many locks are required at a minimum? How many keys will each member carry?
2) Loop the Loop puzzle
3) KAKURO Kakuro puzzles are similar to crosswords, but instead of letters the board is filled with digits (from 1 to 9).
The board's squares need to be filled in with these digits in order to sum up to the specified numbers. You are not allowed to use the same digit more than once to obtain a given sum. Each Kakuro puzzle has a unique solution. Good luck!
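Because each run must use distinct digits from 1 to 9, only certain digit sets can produce a given sum, which is what makes Kakuro solvable by logic. A short script (our sketch, not part of the puzzle) enumerates the valid sets for a clue:

```python
from itertools import combinations

def kakuro_sets(total, cells):
    """All ways to fill `cells` squares with distinct digits 1-9
    that sum to `total` (order within the run ignored)."""
    return [c for c in combinations(range(1, 10), cells) if sum(c) == total]

print(kakuro_sets(7, 3))   # [(1, 2, 4)]
print(kakuro_sets(23, 3))  # [(6, 8, 9)]
```

Clues like 7-in-three or 23-in-three have a unique digit set, so experienced solvers start from such "forced" runs and work outward.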
Answer: 1) 10 locks, and 6 keys per member.
Solution: For each group of 2 people, there must be a lock that neither of them has a key to, but whose key is held by each of the remaining 3 members (so that any majority can still open it). Thus, we need at least 5C2 = 10 locks. Each lock has 3 keys, given to a unique 3-member subgroup, so each member carries 10 × 3 / 5 = 6 keys.
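The counting argument above can be checked exhaustively with a few lines of code (a verification sketch, not part of the original solution): build one lock per 2-member minority, hand its keys to the other 3 members, and test every group.

```python
from itertools import combinations

members = range(5)
# One lock per 2-member pair; its keys go to the 3 members NOT in the pair.
locks = list(combinations(members, 2))           # 5C2 = 10 locks
keys = {m: [lock for lock in locks if m not in lock] for m in members}

def can_open(group):
    """A group opens the safe iff, for every lock, someone holds its key."""
    return all(any(lock in keys[m] for m in group) for lock in locks)

print(len(locks))                                            # 10
print({m: len(keys[m]) for m in members})                    # 6 keys each
print(all(can_open(g) for g in combinations(members, 3)))    # True
print(any(can_open(g) for g in combinations(members, 2)))    # False
```

Every 3-member group can open the safe, and no 2-member group can, confirming that 10 locks and 6 keys per member suffice.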
QUANT CONNECT Quantinuum, the Quant and Analytics committee of K.J. Somaiya Institute of Management Studies and Research, aims to empower students and professionals alike to organize and understand numbers and, in turn, to make good and rational decisions as future managers. The newsletter, published monthly, consists of a gamut of articles for readers ranging from beginners to advanced learners, to help young minds understand the contributions made to the field of mathematics, along with a couple of brain-racking puzzle sections to tickle the gray cells. For any further queries and feedback, please contact the following address: K.J. Somaiya Institute of Management Studies and Research, Vidya Nagar, Vidyavihar, Ghatkopar East, Mumbai - 400077, or drop us a mail at email@example.com Mentor: Prof. N.S. Nilakantan (+919820680741) Email: firstname.lastname@example.org Team Leaders:
Purav Shah (+91 8511929416)
VVNS Anudeep (+91 9441201685)
Yatharth Jaiswal (+91 9969698361)

Editorial Team:
VVNS Anudeep
Khushbu Mehta (+91 9930158610)
Tanmay Nikam (+91 9699288587)
Dhyan Baby K (+91 9809245308)
Akshay Nandan R
Saumya Joshi
Abhishek Bawa
Kaustubh Karanje
Shubham Thakur
Quriosity Volume 09 Issue 11