CLEAR Journal March 2018 Edition



CLEAR Journal (Computational Linguistics in Engineering And Research), M.Tech Computational Linguistics, Dept. of Computer Science and Engineering, Govt. Engineering College, Sreekrishnapuram, Palakkad-678633. www.simplegroups.in | simplequest.in@gmail.com

Contents

Editorial ……………………………………… 3
Teaching Machines to Spot the Essentials (Fahma Bakkar K A) ……… 6
Smart Reply for Gmail (Muhsina V P) ……… 10
Natural Language Processing of Clinical Notes for Identification of Critical Limb Ischemia (Poornima M) ……… 14
Automatically Adding Sounds to Silent Videos (Sandeep Nithyanandan) ……… 18
CLEAR June 2018 Invitation ……… 21
Last word ……… 22

Chief Editor
Shine S, Assistant Professor, Dept. of Computer Science and Engineering, Govt. Engineering College, Sreekrishnapuram, Palakkad-678633

Editors
Bhavya K, Muhammed Shameem K, Pavithra C P, Remya K R

Cover page and Layout
Muhammed Shameem K



Dear Readers,

This latest edition of CLEAR Journal contains articles covering topics such as teaching machines to spot the essentials, the Smart Reply system for Gmail, identification of critical limb ischemia from clinical notes, and automatically adding sounds to silent videos. The previous edition covered trending topics such as AI and self-driving cars, modern trends in deep learning, Hebbian learning, the Dragon personal digital assistant, and long short-term memory RNNs. We have been receiving good feedback from our readers, who are very interested in the field of computational linguistics; we are glad to get such responses and will act on our readers' suggestions. On this hopeful note, I proudly present this edition of CLEAR Journal to our faithful readers and look forward to your opinions and criticisms.

Best Regards, Shine S (Chief Editor)



WORKSHOP ON TKINTER

A workshop on GUI programming using Tkinter was organized at GEC, Sreekrishnapuram. The workshop was conducted in the M.Tech Computational Linguistics Lab for S4 Computer Science students.


The inaugural ceremony launching the SIMPLE Groups activities for the academic year 2017-2018 was held on


CONOZENSA-TECHNICAL TREASURE HUNT

As part of INVENTO-18, the technical fest of GEC Palakkad, a technical treasure hunt named CONOSENZA was conducted. Students from various branches participated.

INTERNSHIP

• Rahul M, Nayana R M and Sandhini S have been selected for an internship program at ICFOSS, Trivandrum.



Teaching Machines to Spot the Essentials Fahma Bakkar K A M.Tech Computational Linguistics Government Engineering College, Palakkad fahmabakkarka@gmail.com

A novel machine-learning algorithm is described that analyses large data sets from a physical system and extracts the essential information needed to understand the underlying physics. The algorithm is capable of identifying the relevant degrees of freedom and executing RG steps iteratively without any prior knowledge about the system; an artificial neural network based system performs this task. The algorithm can be applied to classical statistical physics problems in one and two dimensions. The model demonstrates that machine-learning techniques can extract abstract physical concepts and consequently become an integral part of theory- and model-building. Machine learning is an application of artificial intelligence that gives systems the ability to learn and improve automatically from experience without being explicitly programmed. It has enabled ground-breaking advances in computer vision, speech recognition and game-playing, achieving super-human performance in tasks in which humans excelled while more traditional algorithmic approaches struggled. It has also been applied to physics problems, typically for the classification of physical phases and the numerical simulation of ground states. Maciej Koch-Janusz, a researcher at the Institute for Theoretical Physics at ETH

Zurich, Switzerland, and Zohar Ringel of the Hebrew University of Jerusalem, Israel, have now explored the exciting possibility of harnessing machine learning not as a numerical simulator or a 'hypothesis tester', but as an integral part of the physical reasoning process. Physical systems that differ in their microscopic details can nevertheless share the same behaviour at macroscopic scales. Such universal properties are revealed by the powerful renormalization group (RG) procedure, which systematically retains the 'slow' degrees of freedom and integrates out the rest. However, the relevant degrees of freedom may be difficult to identify; the proposed artificial neural network, based on a model-independent, information-theoretic characterization of a real-space RG procedure, performs exactly this task. Since its inception, the renormalization group approach has been one of the conceptually most profound tools of theoretical physics. In theoretical physics, the renormalization group refers to a mathematical apparatus that allows systematic investigation of the changes of a physical system as viewed at different distance scales. In particle physics, it reflects the changes in the underlying force laws as the energy scale at which physical processes occur varies, energy/momentum and


resolution distance scales being effectively conjugate under the uncertainty principle. A change in scale is called a scale transformation. The renormalization group is intimately related to scale invariance and conformal invariance, symmetries in which a system appears the same at all scales (so-called self-similarity). Identifying the relevant degrees of freedom is often a challenging conceptual step, particularly for strongly interacting systems, and may involve a sequence of mathematical mappings to models whose behaviour is better understood. The authors introduce an ANN algorithm for identifying the physically relevant degrees of freedom in a spatial region and performing an RG coarse-graining step. The input data are samples of the system configurations drawn from a Boltzmann distribution; no further knowledge about the microscopic details of the system is provided. The internal parameters of the network, which ultimately encode the degrees of freedom of interest at each step, are optimized ('learned', in neural network parlance) by a training algorithm based on evaluating the real-space mutual information (RSMI) between spatially separated regions. The identification of the important degrees of freedom, and the ability to execute a real-space RG procedure, has not only quantitative but also conceptual significance: it allows one to gain insight into the correct way of thinking about the problem at hand, raising the prospect that machine-learning techniques may augment scientific inquiry in a fundamental fashion.
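The training signal here is the mutual information between a candidate coarse-grained variable and a spatially separated environment region, estimated from samples. The toy sketch below computes a plug-in estimate of that quantity for a hand-picked block summary on an artificial correlated spin chain; the sampler, the block choice and all names are invented for illustration and this is not the authors' RBM-based RSMI estimator:

import numpy as np

def mutual_information(x, y):
    # Plug-in estimate of I(X;Y) in nats from paired samples of two
    # discrete variables, via their joint histogram.
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

# Toy correlated chain: each spin copies its left neighbour with high
# probability, so a block summary shares information with distant spins.
rng = np.random.default_rng(0)
n_samples, length = 5000, 12
spins = np.zeros((n_samples, length), dtype=int)
spins[:, 0] = rng.integers(0, 2, n_samples)
for i in range(1, length):
    flip = rng.random(n_samples) < 0.1
    spins[:, i] = np.where(flip, 1 - spins[:, i - 1], spins[:, i - 1])

block = spins[:, 0:2]                          # visible region V
env = spins[:, 4]                              # a distant environment spin E
coarse = (block.sum(axis=1) >= 1).astype(int)  # candidate coarse variable H
print("I(H; E) in nats:", round(mutual_information(coarse, env), 3))

In the actual algorithm the map from V to H is not hand-picked but parameterized by the network weights and chosen to maximize exactly this kind of quantity.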


The RSMI Algorithm

The algorithm, named RSMI, is order-recursive and allows data processing in fast-time, an idle period for most algorithms (the data acquisition period). The RSMI optimum output converges in the mean-square sense to the SMI optimum output as the number of pulses increases from 1 to the SMI order, always with smaller computational cost.

Fig. 1: The RSMI algorithm.

The RSMI neural network architecture. The hidden layer H is directly coupled to the visible layer V via the weights λ (red arrows). However, the training algorithm for the weights estimates mutual information between H and the environment E. The buffer B is introduced to filter out local correlations within V.



The workflow of the algorithm is as follows. The CD-algorithm-trained RBMs learn to approximate the probability distributions P(V, E) and P(V). Their final parameters are inputs for the main RSMI network, which learns to extract P_λ(H|V) by maximizing the real-space mutual information I_λ. The final weights λ_ij of the RSMI network identify the relevant degrees of freedom. Three distinct RBMs are used: two are trained as efficient approximators of the probability distributions P(V, E) and P(V), using the celebrated contrastive divergence (CD) algorithm, and their trained parameters are used by the third network, which has a different objective: to find the P_λ(H|V) maximizing I_λ. To this end the real-space mutual information (RSMI) network is introduced, whose architecture is shown in Fig. 1a. The hidden units of RSMI correspond to the coarse-grained variables H. The results of deep learning models get better with more training data and larger models, which in turn require more computation to train. Better algorithms, new insights and improved techniques also help to achieve better results. Another benefit of deep learning models is their ability to perform automatic feature extraction from raw data, also called feature learning. Deep learning algorithms seek to exploit unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features. Automatically learning features at multiple levels of

abstraction allows a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features. Just think of the time this self feature-learning property saves! The trained weights λ define a probability P_λ(H|V) of Boltzmann form, which is used to generate Monte Carlo (MC) samples of the coarse-grained system. Those, in turn, become the input to the next iteration of the RSMI algorithm. The estimates of mutual information, the weights of the trained RBMs and the sets of generated MC samples at every RG step can be used to extract quantitative information about the system, as shown below and in the Supplementary Information. Note also that the parameters λ identifying the relevant degrees of freedom are re-computed at every RG step, which potentially allows RSMI to capture the evolution of the degrees of freedom along the RG flow.
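For reference, the two auxiliary RBMs in this workflow are trained with the standard contrastive divergence update. The sketch below shows a generic CD-1 step for a small binary RBM on toy data; it is not the paper's code, and the data, layer sizes and learning rate are invented:

import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b, c, lr=0.05):
    # One CD-1 update for a binary RBM.
    # v0: batch of visible configurations, shape (batch, n_visible)
    # W, b, c: weights, visible bias, hidden bias (updated in place)
    ph0 = sigmoid(v0 @ W + c)                       # positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b)                     # one Gibbs step down...
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)                       # ...and back up
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)   # approximate gradient
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)

# Toy data: 8-unit configurations in which all units agree, so the RBM
# only has to learn one strongly correlated mode.
data = np.repeat(rng.integers(0, 2, (500, 1)), 8, axis=1).astype(float)
W = 0.01 * rng.standard_normal((8, 2))
b, c = np.zeros(8), np.zeros(2)
for _ in range(200):
    cd1_step(data, W, b, c)
print("learned weights:", np.round(W, 2).tolist())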

Fig. 2: The output of the algorithm.



Technically speaking, the machine performs one of the crucial steps of one of the conceptually most profound tools of modern theoretical physics, the so-called renormalization group. The algorithm of Koch-Janusz and Ringel provides a qualitatively new approach: the internal data representations discovered by suitably designed machine-learning systems are often considered 'obscure', but the results yielded by their algorithm provide fundamental physical insight, reflecting the underlying structure of the physical system. Artificial neural networks based on RSMI optimization have proved capable of extracting complex information about physically relevant degrees of freedom and using it to perform a real-space RG procedure. The proposed RSMI algorithm allows for the study of the existence and location of critical points, and of the RG flow in their vicinity, as well as the estimation of correlation functions, critical exponents and so on. This approach is an example of a new paradigm in applying machine learning in physics: the internal data representations discovered by suitably designed algorithms are not just technical means to an end, but a clear reflection of the underlying structure of the physical system. Thus, in spite of their 'black box' reputation,


the innards of such architectures may teach us fundamental lessons. This raises the prospect of employing machine learning in science in a collaborative fashion, combining the machines' power to distil subtle information from vast data sets with human creativity and background knowledge.

References
[1] Koch-Janusz, M. & Ringel, Z. Mutual information, neural networks and the renormalization group. Nature Physics, DOI: 10.1038/s41567-018-0081-4.
[2] LeCun, Y., Bengio, Y. & Hinton, G. E. Deep learning. Nature 521, 436–444 (2015).
[3] Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 584–589 (2016).
[4] Torlai, G. & Melko, R. G. Learning thermodynamics with Boltzmann machines. Phys. Rev. B 94, 165134 (2016).
[5] Hershey, J. R., Rennie, S. J., Olsen, P. A. & Kristjansson, T. T. Super-human multi-talker speech recognition: A graphical modelling approach. Comput. Speech Lang. 24, 45–66 (2010).
[6] Carrasquilla, J. & Melko, R. G. Machine learning phases of matter. Nat. Phys. 13, 431–434 (2017).



Smart Reply for Gmail Muhsina V P M.Tech Computational Linguistics Government Engineering College, Palakkad muhsinavp5@gmail.com

Email is one of the most popular modes of communication on the Web. Most internet users worldwide turn to email rather than other social networks to connect and share information. Due to increasing email overload, it has become challenging for users to process and reply to incoming messages: reading emails is easy, but responding to them takes much more effort, and on smartphones typing a reply takes even longer. To overcome this difficulty, Google introduced a new Gmail feature called Smart Reply, which scans incoming mail and suggests quick responses, saving time by automatically generating replies. Semantically diverse suggestions are generated, i.e. complete email responses that can be sent with just one tap on mobile. In previous work, long short-term memory networks (LSTMs) were used to predict sequences of text; here, the input sequences are incoming messages and the output distribution is over the space of possible responses. Training this framework on a large corpus of conversation data produces a fully generative model that can produce a response to any sequence of input text. In order to deploy the system in a product used globally by millions, several challenges had to be addressed:

• Response Quality: How to ensure that the individual response options are always high quality in language and content.
• Utility: How to select multiple options to show a user so as to

maximize the likelihood that one is chosen.
• Scalability: How to build a scalable system by efficiently searching the target response space.

To handle these challenges, Smart Reply, a novel end-to-end method and system for automated email response generation, is proposed. Smart Reply uses the same machine-learning technology that Gmail already uses to index email and to add reminders, calendar invitations and smart attachments. It suggests three possible responses based on the email received. For instance, if a contact emails a question such as "Do you have any documentation on how to use the software?", Inbox will display replies like "I don't, sorry", "I will have to look for it" or "I will send it to you". Tapping on a suggestion automatically pastes that text into the email, and users can add further text if necessary. An example is given in Figure 1.

Architecture

The Smart Reply system consists of the following components.

Preprocess email: Actions such as tokenisation, language detection and sentence segmentation are performed on the input email.

Triggering model: A feed-forward neural network decides whether or not to suggest responses. The network


consists of an embedding layer and a fully connected hidden layer. The triggering module is the entry point of the Smart Reply system: if its decision is positive, responses are selected by the LSTM; otherwise execution ends and no suggestion is shown. The feed-forward network of the triggering component produces a probability score for every incoming mail, and if the score is above a threshold the LSTM scoring stage is run.

Figure 1: Example of Smart Reply suggestions

Data: The training set consists of pairs (o, y), where o is the incoming message and y is a boolean variable indicating whether the message had a response.

Features: Features such as unigrams and bigrams are extracted from the message body and subject. The model also uses signals such as whether the recipient is in the sender's contact list.

Response selection: The central task of the Smart Reply system is to find the most likely response for an incoming message. An LSTM network is used to predict the approximate best response to the original message, which improves scalability.

Network: The LSTM is a sequence-to-sequence learning model which reads the input message token by token and encodes a vector representation. After encoding, a softmax is computed to obtain the probability of the first output token given the input token sequence. The probability of each subsequent token is computed by feeding the previously generated response tokens back in together with the input sequence. During inference, the most likely response is approximated greedily by taking the most likely token at each timestep and feeding it back, as in the sketch below.
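A minimal sketch of that greedy decoding loop; the next-token distribution is a toy stub standing in for the trained LSTM's softmax, and the vocabulary and canned response are invented for illustration:

import numpy as np

VOCAB = ["<eos>", "i", "will", "send", "it", "to", "you", "sorry"]

def next_token_probs(input_tokens, generated):
    # Stand-in for the seq2seq LSTM softmax over the next token.
    # A real model would encode input_tokens and condition on `generated`;
    # here a fixed toy distribution is returned purely for illustration.
    canned = ["i", "will", "send", "it", "to", "you", "<eos>"]
    step = len(generated)
    probs = np.full(len(VOCAB), 0.01)
    probs[VOCAB.index(canned[min(step, len(canned) - 1)])] = 1.0
    return probs / probs.sum()

def greedy_decode(input_tokens, max_len=10):
    # Greedy inference: take the most likely token at each timestep
    # and feed it back, stopping at the end-of-sequence token.
    generated = []
    for _ in range(max_len):
        probs = next_token_probs(input_tokens, generated)
        token = VOCAB[int(np.argmax(probs))]
        if token == "<eos>":
            break
        generated.append(token)
    return " ".join(generated)

print(greedy_decode(["do", "you", "have", "the", "document"]))
# prints: i will send it to you

In the full system the search is performed over the target response set rather than by unconstrained token-by-token generation, which is what keeps it scalable.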


Response set generation: Two of the core challenges when building a Smart Reply system are response quality and utility: creating a set of high-quality responses that deliver a positive user experience. The system must also ensure that responses capturing the same intent appear only once among the suggestions; for example, minor lexical variations such as "Yes, I'll be there." and "I will be there." express the same intent. The goal is to generate a structured response set that effectively captures the various intents people use in natural-language conversations. The first step is to automatically generate a set of canonical response messages that capture the variability in language; the responses are canonicalized by extracting their semantic structure using a dependency parser.



In the next step, all response messages are partitioned into semantic clusters.

Figure 2: Architecture of the Smart Reply system

Here, each cluster represents a meaningful response intent: all messages within a cluster share the same intent, while messages in different clusters convey different intents. The semantic clusters define the response space for scoring and selecting possible responses and for promoting diversity among the suggestions. Since a large labelled dataset is not available for semantic intent clustering, a graph-based, semi-supervised approach is used. A few clusters are manually defined, and a small number of example responses are added as seeds for each cluster. A base graph is then constructed with frequent response messages as response nodes (VR). For each response message, a set of lexical features, i.e. n-grams and skip-grams, is extracted, and these features are added as feature nodes in the same graph, with edges created between response nodes and feature nodes. By propagating semantic information throughout the graph, semantic labels for all responses are learned in a semi-supervised fashion. After some iterations, a sample of the unlabelled nodes is taken from the graph, these sampled nodes are manually labelled, and the algorithm is repeated until convergence. Finally, the top k members of each semantic cluster are extracted and their quality is validated with the help of human evaluators.
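A toy sketch of the graph-based label propagation idea described above, with invented responses, features, seeds and adjacency; the production system uses a large-scale streaming approximation (reference [7]) rather than this simple iteration:

import numpy as np

# Toy graph: 4 response nodes (r0-r3) and 3 feature nodes (f0="thanks",
# f1="see you", f2="yes"), connected by a symmetric adjacency matrix.
# Label columns: 0 = "thanks" intent, 1 = "confirmation" intent.
A = np.array([
    # r0 r1 r2 r3 f0 f1 f2
    [0, 0, 0, 0, 1, 0, 0],   # r0 "thanks a lot"
    [0, 0, 0, 0, 1, 0, 0],   # r1 "thank you!"
    [0, 0, 0, 0, 0, 1, 1],   # r2 "yes, see you then"
    [0, 0, 0, 0, 0, 0, 1],   # r3 "yes sure"
    [1, 1, 0, 0, 0, 0, 0],   # f0 feature "thanks"
    [0, 0, 1, 0, 0, 0, 0],   # f1 feature "see you"
    [0, 0, 1, 1, 0, 0, 0],   # f2 feature "yes"
], dtype=float)

labels = np.zeros((7, 2))
labels[0] = [1, 0]           # seed: r0 belongs to the "thanks" cluster
labels[3] = [0, 1]           # seed: r3 belongs to the "confirmation" cluster
seeds = {0, 3}

# Each node repeatedly takes the degree-normalised average of its
# neighbours' label distributions; the seed nodes stay clamped.
for _ in range(30):
    spread = A @ labels / np.maximum(A.sum(axis=1, keepdims=True), 1)
    for s in seeds:
        spread[s] = labels[s]
    labels = spread

print(np.round(labels[:4], 2))   # cluster scores for the four response nodes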

Suggestion diversity: After finding a set of most likely responses from the LSTM, a small set is selected to show to the user. Enforcing diverse semantic intents is critical to making the suggestions useful, so the user is given a varied set of responses by omitting redundant suggestions and by enforcing the presence of negative (or positive) options: if the top two responses contain at least one positive (negative) response and none of the top three responses is negative (positive), the third response is replaced with a negative (positive) one. This is done by performing a second LSTM pass in which the search is restricted to only positive or only negative responses in the target set (a toy version of this rule is sketched below).

Smart Reply is thus a novel end-to-end method for automatically generating short, complete email responses. The system is currently used in Inbox by Gmail; the core of the system is a state-of-the-art deep LSTM model that can predict full responses. Around 10% of mobile replies in Inbox are now composed with assistance from the Smart Reply system. Smart Reply utilises machine learning to give better responses, and it is rolling out globally on Android and iOS.
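Purely as an illustration of the positive/negative diversity rule above (the real system performs a second LSTM pass restricted to the missing polarity; here a pre-ranked list and a toy sentiment function stand in):

def enforce_sentiment_diversity(ranked, sentiment):
    # If the top two suggestions include a positive (negative) response and
    # none of the top three is negative (positive), swap the third suggestion
    # for the best-ranked response of the missing polarity.
    top3 = ranked[:3]
    top2_polarity = {sentiment(r) for r in ranked[:2]}
    top3_polarity = [sentiment(r) for r in top3]
    for have, need in (("pos", "neg"), ("neg", "pos")):
        if have in top2_polarity and need not in top3_polarity:
            swap = next((r for r in ranked[3:] if sentiment(r) == need), None)
            if swap is not None:
                top3[2] = swap
            break
    return top3

ranked = ["Yes, I'll be there.", "Sure, sounds good.", "See you then.",
          "Sorry, I can't make it."]
sentiment = lambda r: "neg" if "sorry" in r.lower() or "can't" in r else "pos"
print(enforce_sentiment_diversity(ranked, sentiment))
# the third suggestion is replaced with the negative response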


References
[1] I. G. P. Affairs. "Interconnected world: Communication & social networking." http://www.ipsosna.com/newspolls/pressrelease.aspx?id=5564. Press Release, March 2012.
[2] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems (NIPS), 2014.
[3] H. Sak, A. Senior, and F. Beaufays. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2014.

[4] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[5] O. Vinyals and Q. V. Le. A neural conversation model. In ICML Deep Learning Workshop, 2015.
[6] L. Shang, Z. Lu, and H. Li. Neural responding machine for short-text conversation. In Proceedings of ACL-IJCNLP, 2015.
[7] S. Ravi and Q. Diao. Large scale distributed semi-supervised learning using streaming approximation. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), 2016.

Capsule networks-emulating the brain's visual processing strengths: Capsule networks, a new type of deep neural network, process visual information in much the same way as the brain, which means they can maintain hierarchical relationships. For typical identification tasks, capsule networks promise better accuracy via reduction of errors, by as much as 50 percent. Generative adversarial networks: A generative adversarial network (GAN) is a type of unsupervised deep learning system that is implemented as two competing neural networks. One network, the generator, creates fake data that looks exactly like the real data set. The second network, the discriminator, ingests real and synthetic data. Over time, each network improves, enabling the pair to learn the entire distribution of the given data set.



Natural Language Processing of Clinical Notes for Identification of Critical Limb Ischemia Poornima M, M.Tech Computational Linguistics Government Engineering College, Palakkad poornima198m@gmail.com

Critical limb ischemia (CLI), also referred to as limb threat, is an advanced stage of peripheral artery disease (PAD). It is defined as a triad of ischemic rest pain, arterial insufficiency ulcers, and gangrene; the latter two conditions are jointly referred to as tissue loss, reflecting the development of surface damage to the limb tissue due to the most severe stage of ischemia. CLI is a complication of advanced PAD. PAD affects 8.5 million people in the US and 200 million worldwide, and is associated with a high risk of mortality and morbidity, yet there is a lack of awareness of PAD-associated risks for adverse cardiovascular outcomes. PAD is diagnosed by the ankle-brachial index (ABI) test, but this method remains underutilized. The aim of this natural language processing (NLP) work is to develop and validate a sub-phenotyping NLP algorithm (CLI-NLP) for identification of CLI cases from clinical notes. Lower extremity PAD affects millions of people worldwide, and advanced cases may manifest as CLI, which is associated with considerable morbidity, mortality and a high risk of major cardiovascular events. Population ageing and the high prevalence of diabetes are risk factors for CLI, and it has been estimated that the number of CLI patients is likely


to increase in both developing and developed countries. Electronic health records (EHRs) have been widely used to improve the quality of patient care and as a source for automated identification of patients for research studies. The clinical diagnosis of CLI is based on the presence of signs and symptoms, while billing codes are used primarily for administrative purposes. Billing codes are used to mine structured information, while natural language processing (NLP) is used to extract meaningful information from unstructured data. ICD billing codes serve primarily administrative transactions and reimbursements; additionally, they are used for diverse secondary purposes including epidemiology studies, cohort identification and health services research. NLP applied to the clinical narrative may overcome the limitations of billing-code algorithms for identification of CLI by recognizing text which describes the signs and symptoms used to establish a diagnosis. The PAD-NLP algorithm automatically ascertained cases from clinical notes using PAD-related keywords and a set of rules for classification of a PAD patient. It consisted of two main components: text processing and patient classification. The text-processing component analyzed the text of each clinical note by breaking down sentences


into words using MedTagger, an open-access NLP tool, and identified PAD-related concepts which were mapped to specific categories used for patient classification. NLP technology was used for automated extraction and encoding of clinical information from narrative clinical notes. MedTagger is a knowledge-driven clinical NLP system used for sentence detection, word tokenization, section identification, contextual information and concept identification. After sentence detection, word tokens are identified using the spaces between words. Additionally, MedTagger identifies contextual information from clinical notes, including assertion, temporality and experiencer; here it is used to identify the assertion, temporality and experiencer of CLI-related keywords in clinical notes. The steps for the NLP algorithm were:

First, a list of PAD-related terms was identified by cardiovascular experts from narrative clinical notes. Second, these terms were mapped to corresponding concepts and their synonyms in the Unified Medical Language System (UMLS) Metathesaurus, which were also added to this list. Third, this list was further expanded during the interactive refinement of the PAD-NLP algorithm, when additional synonyms and other lexical variants were identified in the clinical notes and added. Fourth, the PAD-NLP algorithm produced output on two levels: document and patient. At the document level, each clinical note was processed to find PAD-related keywords and, if any were found, output was produced in the form of the PAD-related keywords along with ±2 sentences from the clinical note. Fifth, since CLI keywords were included in the list of keywords used by the PAD-NLP algorithm, these concepts were also identified; their relevant category along with certainty (positive, negative or possible), temporality (current, historical) and experiencer (patient or someone else) were also extracted during the text-processing phase.

Figure 1: System Overview

The CLI-NLP algorithm identified keywords from the document-level output. Keywords were categorized as diagnostic and location keywords. The list of diagnostic keywords included both isolated mechanisms for a wound and combined mechanisms (wounds occurring as a consequence of a combination of co-existing mechanisms); importantly, "ischemia" had to be one of these mechanisms. In the absence of these criteria, patients were classified as not having CLI (controls). The rule for CLI cases was:

One diagnostic keyword + one location keyword within two sentences, anchored by a diagnostic keyword, in the same note.
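As a hedged illustration of how such a rule could be applied to a note, the sketch below checks sentences against hypothetical diagnostic and location keyword lists; the published algorithm's lexicons, sentence splitting and exact windowing are more elaborate, and the two-sentence window used here is an assumed interpretation:

import re

# Hypothetical keyword lists; the real PAD/CLI-NLP lexicons were curated
# by cardiovascular experts and are far larger.
DIAGNOSTIC = {"ulcer", "gangrene", "rest pain", "ischemia"}
LOCATION = {"foot", "toe", "heel", "ankle", "leg"}

def is_cli_note(note_text):
    # A note is flagged when a diagnostic keyword and a location keyword
    # co-occur within a two-sentence window anchored at the sentence that
    # contains the diagnostic keyword (the window choice is an assumption).
    sentences = [s.lower() for s in re.split(r"(?<=[.!?])\s+", note_text) if s.strip()]
    for i, sent in enumerate(sentences):
        if any(k in sent for k in DIAGNOSTIC):
            window = sentences[i:i + 2]
            if any(k in w for k in LOCATION for w in window):
                return True
    return False

note = ("Patient reports worsening rest pain. "
        "Non-healing ulcer noted on the left foot.")
print(is_cli_note(note))   # True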

The billing codes listed in Table 2 were used for identification of CLI. These billing codes


were previously used by other studies for identification of CLI patients. A physician abstracted CLI-relevant information from each PAD patient's clinical notes using a keyword-based method. A board-certified cardiologist verified the abstracted information and conducted independent chart abstraction to clarify questions about patient classification raised by the primary abstractor, while following predefined criteria documented in the abstraction manual. A consensus decision was then reached by the two abstractors; in cases of disagreement, the case was reviewed by another cardiologist and a consensus opinion was used for classification. The clinical criteria used for CLI diagnosis by manual chart abstraction were based upon current clinical practice guidelines, and the inter-annotator agreement between the two annotators was 95%. The CLI-related ICD-9 codes and the CLI-NLP algorithm were each compared with the gold-standard manual abstraction to calculate the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F1-score of each method, along with 95% confidence intervals (CI) for each measure. These measures were calculated as follows:

PPV = True Positives / (True Positives + False Positives)

Sensitivity = True Positives / (True Positives + False Negatives)
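The same confusion-matrix quantities, together with the specificity, NPV and F1-score reported below, can be computed with a small helper; the counts in the example are purely illustrative and are not the study's data:

def evaluate(tp, fp, tn, fn):
    # Summary measures from confusion-matrix counts (confidence intervals omitted).
    ppv = tp / (tp + fp)                 # positive predictive value (precision)
    npv = tn / (tn + fn)                 # negative predictive value
    sensitivity = tp / (tp + fn)         # recall
    specificity = tn / (tn + fp)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"PPV": ppv, "NPV": npv, "sensitivity": sensitivity,
            "specificity": specificity, "F1": f1}

# Illustrative counts only: 250 correctly identified CLI cases, 10 false
# alarms, 450 correctly rejected controls, 45 missed cases.
print({k: round(v, 2) for k, v in evaluate(250, 10, 450, 45).items()})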

The PAD cohort dataset contained 792 PAD cases: 295 CLI cases (37%) and 497 controls without CLI (63%). The dataset contained 270,336 clinical notes, and on average each note contained 395 word tokens. The average age of patients in the dataset was 71

years. The dataset comprised 44% women, and 90% of patients were white. The CLI-NLP algorithm had a superior PPV (96%) compared to the CLI-related billing codes (67%) for identification of CLI (p < 0.001). Sensitivity of the two approaches was similar (CLI-NLP 84%; billing codes 88%; p < 0.12). However, the F1-score was superior for CLI-NLP (90%) compared to billing codes (76%) (p < 0.001), and the CLI-NLP algorithm had higher specificity than the billing codes (CLI-NLP 98%, billing codes 74%; p < 0.001). Clinical diagnosis of CLI is based on the presence of signs and symptoms which are recorded in narrative clinical notes. Billing codes under-performed for identification of CLI while NLP had excellent performance, because the NLP system enables automated extraction from narrative clinical notes of the clinical characteristics used by clinicians to diagnose CLI, which are not expressed in billing codes. CLI is a complication of advanced PAD whose diagnosis is based on the presence of clinical signs and symptoms, and automated identification of cases is challenging due to the absence of a single definitive International Classification of Diseases (ICD-9 or ICD-10) code for CLI. Clinical notes in EHRs contain documentation of patient-provider encounters, as well as the assessment and management plan, which are not available in billing codes. NLP technology enables automated data extraction from narrative clinical notes and has played a vital role in the meaningful use of EHRs for clinical and translational research. Moreover, the novel PAD-NLP sub-phenotyping algorithm described here will eventually be linked to clinical decision support systems for prompt implementation of evidence-based management



strategies at the point of care. Therefore, automatic identification of CLI cases may contribute to a learning health care system in cardiovascular care thereby bringing innovation to healthcare delivery and responding to the 2017 Learning Healthcare System scientific statement from the American Heart Association.

References
[1] I.J. Kullo, T.W. Rooke, Peripheral artery disease, N. Engl. J. Med. 374 (2016) 861–871.
[2] M.D. Gerhard-Herman, H.L. Gornik, C. Barrett, et al., 2016 AHA/ACC guideline on the management of patients with lower extremity peripheral artery disease: a report of the American College of Cardiology/American Heart Association task force on clinical practice guidelines, J. Am. Coll. Cardiol. 69 (11) (2017) e71–e126.
[3] L. Norgren, W.R. Hiatt, J.A. Dormandy, M.R. Nehler, K.A. Harris, F.G. Fowkes, Inter-society consensus for the management of peripheral arterial disease (TASC II), J. Vasc. Surg. (2007) S5–67.
[4] A. Farber, R.T. Eberhardt, The current state of critical limb ischemia: a systematic review, JAMA Surg. 151 (11) (2016) 1070–1077.
[5] M.H. Shishehbor, C.J. White, B.H. Gray, et al., Critical limb ischemia: an expert statement, J. Am. Coll. Cardiol. 68 (18) (2016) 2002–2015.
[6] P.P. Goodney, L.L. Travis, B.K. Nallamothu, et al., Variation in the use of lower extremity vascular procedures for critical limb ischemia, Circ. Cardiovasc. (2012).
[7] P.R. Cavanagh, B.A. Lipsky, A.W. Bradbury, G. Botek, Treatment for diabetic foot ulcers, Lancet 366 (9498) (2005) 1725–1735.

Lean and augmented data learning-addressing the labelled data challenge: Using these techniques, we can address a wider variety of problems, especially those with less historical data. Expect to see more variations of lean and augmented data, as well as different types of learning applied to a broad range of business problems. Probabilistic programming-languages to ease model development: Probabilistic programming languages have the ability to accommodate the uncertain and incomplete information that is so common in the business domain. We will see wider adoption of these languages and expect them to also be applied to deep learning.



Automatically Adding Sounds to Silent Videos Sandeep Nithyanandan M.Tech Computational Linguistics Government Engineering College, Palakkad sandeepn96@gmail.com

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome; it is pervasive today. Machine learning is one of the fastest-growing and most exciting fields out there, and deep learning represents its true bleeding edge. Deep learning is a newer area of machine-learning research, introduced with the objective of moving machine learning closer to one of its original goals, artificial intelligence; it is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks. There is a lot of excitement around artificial intelligence, machine learning and deep learning at the moment. Automatically adding sound to silent videos is one of the new applications that uses deep learning. In this task, the system must synthesize sounds to match a silent video. The system is trained using 1,000 examples of video with the sound of a drumstick striking different surfaces and creating different sounds. A deep learning model associates the video frames with a database of pre-recorded sounds in order to select a sound to play that best matches what is happening in the scene. The system was then evaluated using a Turing-test-like setup where humans had to determine

which video had the real or the fake (synthesized) sounds. The algorithm that teaches computers to predict sounds was presented by researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). When shown a silent video clip of an object being hit, the algorithm can produce a sound for the hit that is realistic enough to fool human viewers.

How It Works

1) The first step in training a sound-producing algorithm is to give it sounds to study. For this, 1,000 videos containing an estimated 46,000 sounds, representing various objects being hit, scraped and prodded with a drumstick, were collected.

2) These videos were fed to a deep-learning algorithm that deconstructed the sounds and analysed their pitch, loudness and other features.

3) To predict the sound of a new video, the algorithm looks at the sound properties of each frame of that video and matches them to the most similar sounds in the database (see the sketch after this list).

The result is that the algorithm can accurately simulate the subtleties of different hits, from the staccato taps of a rock to the longer waveforms of rustling ivy. Pitch is no problem either: it can synthesize hit-sounds ranging from the low-pitched "thuds" of a soft couch to the high-pitched "clicks" of a hard wood railing.
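Step 3 is essentially a nearest-neighbour lookup in the sound database. A minimal sketch of that matching step with invented feature dimensions and random placeholder data (the real system predicts cochleagram subband envelopes with a neural network, as described below):

import numpy as np

# Hypothetical database: one feature vector (pitch, loudness, ...) per stored
# sound, plus an id for the clip it came from.
rng = np.random.default_rng(0)
sound_features = rng.standard_normal((46000, 16))   # 46k sounds, 16-D features
clip_ids = np.arange(46000)

def best_matching_sound(frame_features):
    # Return the id of the stored sound whose features are closest
    # (Euclidean distance) to the features predicted for a video frame.
    d = np.linalg.norm(sound_features - frame_features, axis=1)
    return int(clip_ids[np.argmin(d)])

predicted = rng.standard_normal(16)   # placeholder features for one silent frame
print("play sound clip", best_matching_sound(predicted))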


Predicting Visually Indicated Sounds

We can formulate this task as a regression problem: the goal is to map a sequence of video frames to a sequence of audio features. The problem is solved using a recurrent neural network that takes colour and motion information as input and predicts the subband envelopes of an audio waveform. Finally, a waveform is generated from these sound features. The neural network and synthesis procedure are shown in the figure.

Fig: Train a neural network to map video sequences to sound features. These sound features are subsequently converted into a waveform using either parametric or example-based synthesis. The images are represented using a convolutional network, and the time series using a recurrent neural network. Also shown is a subsequence of images corresponding to one impact.

Regressing Sound Features

Given a sequence of input images, we would like to estimate a corresponding sequence of sound features. These sound features correspond to blocks of the cochleagram shown in the figure. This regression problem is solved using a recurrent neural network (RNN) that takes image features computed with a convolutional neural network (CNN) as input.

Image Representation

In this model it is helpful to represent motion information explicitly using a two-stream approach. While two-stream models often use optical flow, it is challenging to obtain accurate flow estimates due to the presence of fast, non-rigid motion. Instead, space-time images are computed for each frame: images whose three channels are grayscale versions of the previous, current and next frames. This image representation is closely related to 3D video CNNs, as derivatives across channels correspond to temporal derivatives.

Sound Prediction Model

A recurrent neural network (RNN) with long short-term memory (LSTM) units takes the CNN features as input. To compensate for the difference between the video and audio sampling rates, each CNN feature vector is replicated k times, where k = T/N (here k = 3).

This results in a sequence of CNN features that is the same length as the sequence of audio features. At each timestep of the RNN, the current image feature vector is used to update the vector of hidden variables, and the sound features are then computed by an affine transformation of the hidden variables. During training, the difference between the predicted and ground-truth sound features is minimized at each timestep. To increase the robustness of the loss, the square root of the subband envelopes is predicted rather than the envelope values themselves. To make


the learning problem easier, PCA is used to project the 42-dimensional feature vector at each timestep down to a 10-dimensional space, and this lower-dimensional vector is predicted. To evaluate the network, the PCA transformation is inverted to obtain the sound features. The RNN and CNN are trained jointly using stochastic gradient descent; when training from scratch, the data are augmented by applying cropping and mirroring transformations to the videos. There are two methods for generating a waveform from the predicted sound features. The first is a simple parametric synthesis approach which iteratively imposes the subband envelopes on a sample of white noise. This method is useful for examining what information is captured by the audio features, since it represents a fairly direct conversion from features to sound. However, for the task of generating sounds that are plausible to a human ear, it is more effective to impose a strong natural-sound prior during the conversion from features to waveform: a query vector is formed by concatenating the predicted sound features, its nearest neighbour in the training set (as measured by distance) is found, and the corresponding waveform is transferred.
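A loose sketch of the parametric synthesis idea just described, imposing per-band envelopes on band-pass filtered white noise; the band edges, hop size and envelope values are invented and this is not the paper's synthesis code:

import numpy as np
from scipy.signal import butter, lfilter

def parametric_synth(envelopes, band_edges, sr=22050, hop=512):
    # envelopes: array of shape (n_bands, n_frames) of subband envelopes.
    # band_edges: list of (low_hz, high_hz) pairs, one per band.
    n_bands, n_frames = envelopes.shape
    n_samples = n_frames * hop
    out = np.zeros(n_samples)
    noise = np.random.default_rng(0).standard_normal(n_samples)
    t_frames = np.arange(n_frames) * hop
    t_samples = np.arange(n_samples)
    for band, (lo, hi) in enumerate(band_edges):
        bb, aa = butter(2, [lo, hi], btype="bandpass", fs=sr)
        band_noise = lfilter(bb, aa, noise)
        env = np.interp(t_samples, t_frames, envelopes[band])  # upsample envelope
        out += band_noise * env                                 # impose the envelope
    return out / (np.abs(out).max() + 1e-9)

env = np.abs(np.random.default_rng(1).standard_normal((3, 40)))
wave = parametric_synth(env, [(100, 400), (400, 1600), (1600, 6400)])
print(wave.shape)   # one synthesized waveform, 40 * 512 samples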





This technology opens two possible directions for future research. The first is producing realistic sounds from videos, treating sound production as an end in itself. The second is to use sound and material interactions as steps toward physical scene understanding.

References:
[1] https://medium.com/@vratulmittal/top-15deep-learning-applications-that-will-rule-theworld-in-2018-and-beyond-7c6130c43b01
[2] https://youtu.be/0FW99AQmMc8
[3] Visually Indicated Sounds, Andrew Owens, arXiv:1512.08512v2 [cs.CV], 30 Apr 2016.
[4] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia, pages 675–678. ACM, 2014.
[5] https://www.engadget.com/2016/06/13/machines-can-generate-sound-effects-that-foolhumans/

Hybrid learning models-combining approaches to model uncertainty: Hybrid learning models make it possible to expand the variety of business problems to include deep learning with uncertainty. This can help us achieve better performance and explainability of models, which in turn could encourage more widespread adoption. Expect to see more deep learning methods gain Bayesian equivalents while a combination of probabilistic programming languages start to incorporate deep learning. Automated machine learning (AutoML) - model creation without programming: Developing machine learning models requires a time-consuming and expert-driven workflow, which includes data preparation, feature selection, model or technique selection, training, and tuning. AutoML aims to automate this workflow using a number of different statistical and deep learning techniques.



M.Tech Computational Linguistics Dept. of Computer Science and Engg, Govt. Engg. College, Sreekrishnapuram Palakkad www.simplegroups.in simplequest.in@gmail.com

SIMPLE Groups Students Innovations in Morphology Phonology and Language Engineering

Article Invitation for CLEAR - June 2018

We are inviting thought-provoking articles, interesting dialogues and healthy debates on multifaceted aspects of Computational Linguistics for the forthcoming issue of the CLEAR (Computational Linguistics in Engineering And Research) Journal, to be published in June 2018. The suggested areas of discussion are:

The articles may be sent to the Editor on or before 10th June, 2018 through the email simplequest.in@gmail.com. For more details visit: www.simplegroups.in

Editor, CLEAR Journal
Representative, SIMPLE Groups



Hello world,

This issue of CLEAR Journal covers many topics related to areas such as deep learning, artificial intelligence, natural language processing and speech processing. How deep learning and artificial intelligence can be used to add sounds to silent videos is particularly interesting. The article on teaching machines to spot the essentials demonstrates that machine-learning techniques can extract abstract physical concepts and consequently become an integral part of theory- and model-building. It is remarkable that natural language processing has captured the attention of the medical field too: recently, identification of critical limb ischemia has been carried out successfully using NLP-based techniques. It is also interesting to note that Google has launched an automatic response system that suggests suitable replies to received emails, reducing the burden of a busy schedule. It is clear that the field of computational linguistics is advancing day by day and capturing real attention in almost every other field. These articles are based on recent work and research in computational linguistics. CLEAR is thankful to all who have given their valuable time and effort to contribute their thoughts and ideas. SIMPLE Groups invites more aspirants in this field. Wish you all success in your future endeavours!

Bhavya K


