Students’ Corner
Smarter, Faster, More Responsible AI
Mr. Abhishekh Pahadi, Student, SIBM, Bengaluru
Preamble
Recent work in the field of NLP (Natural Language Processing) has shown a dramatic breakthrough thanks to GPT-3, which stands for Generative Pre-trained Transformer. This is the third version; both earlier versions worked on the same model of pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific datasets of tens of thousands of examples. The technology is developed by OpenAI, a US-based for-profit organization working in the field of AI. GPT-3 is an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and it was pre-trained on roughly 570 GB of filtered text. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, there are some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, GPT-3 can also generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We will see why GPT-3, in spite of being such a phenomenal invention, is still in an experimental phase, and take an early glimpse at how it could change the world.
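The preamble notes that tasks and few-shot demonstrations are specified purely via text interaction, with no gradient updates. As an illustration only, here is a minimal Python sketch of how such a few-shot prompt might be assembled for the 3-digit arithmetic task mentioned above; the `complete()` call at the end is a hypothetical stand-in for whatever text-completion interface is used, not part of any specific API.

```python
# A minimal sketch of few-shot prompting: the task is specified purely as
# text, with K worked examples followed by the query. No gradient updates,
# no fine-tuning -- the model only ever sees this string.

def build_few_shot_prompt(examples, query, instruction="Add the two numbers."):
    """Assemble an instruction, K demonstrations, and the query into one prompt."""
    lines = [instruction]
    for a, b, answer in examples:          # K in-context demonstrations
        lines.append(f"Q: {a} + {b} = ?")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {query[0]} + {query[1]} = ?")  # the task instance to solve
    lines.append("A:")                     # the model continues from here
    return "\n".join(lines)

# Three 3-digit demonstrations (K = 3) and one unseen query.
demos = [(123, 456, 579), (250, 311, 561), (904, 87, 991)]
prompt = build_few_shot_prompt(demos, (432, 265))
print(prompt)

# Hypothetical completion call -- a stand-in, not a real library function:
# answer_text = complete(prompt)   # the model would ideally continue with "697"
```

The number of demonstrations K in this sketch is the quantity plotted on the "Number of Examples in Context (K)" axis referenced in Fig. 1 below.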
Introduction
Recent years in AI have shown a trend towards pre-trained language representations in NLP systems, mainly because they are agile, flexible and applied in increasingly task-agnostic ways. Earlier, single-layered representations were learnt using word vectors and fed to task-specific architectures. Later, RNNs (Recurrent Neural Networks) with multiple layers and contextual state representations were used to form stronger models. Now, pre-trained recurrent or transformer language models are fine-tuned directly, removing the need for task-specific architectures. However, a major limitation of this approach is that while the architecture may be task-agnostic, the datasets still need to be task-specific to achieve strong performance on the desired tasks. Removing this limitation would be desirable for numerous reasons stated later in this article. Here is why models that are pre-trained with
Fig. 1: Number of Examples in Context (K)
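To contrast with the in-context approach sketched earlier, the introduction describes the older paradigm in which single-layered representations learnt as word vectors were fed to task-specific architectures. The following is a rough numpy sketch of that pipeline, using made-up toy vectors and a logistic-regression-style head purely for illustration; it is not drawn from any particular system.

```python
import numpy as np

# Toy stand-in for pre-trained word vectors (in practice: word2vec/GloVe tables).
rng = np.random.default_rng(0)
vocab = ["good", "great", "bad", "awful", "movie", "plot"]
word_vectors = {w: rng.normal(size=8) for w in vocab}   # 8-dim embeddings

def embed(sentence):
    """Single-layer representation: average the word vectors of known words."""
    vecs = [word_vectors[w] for w in sentence.split() if w in word_vectors]
    return np.mean(vecs, axis=0)

# Tiny labelled, task-specific dataset (sentiment: 1 = positive, 0 = negative).
train = [("good movie", 1), ("great plot", 1), ("bad movie", 0), ("awful plot", 0)]
X = np.stack([embed(s) for s, _ in train])
y = np.array([label for _, label in train])

# Task-specific head: a single logistic-regression layer trained from scratch.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):                        # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
    grad = p - y
    w -= 0.5 * (X.T @ grad) / len(y)
    b -= 0.5 * grad.mean()

print(np.round(1.0 / (1.0 + np.exp(-(X @ w + b))), 2))  # should approach [1, 1, 0, 0]
```

The point of the sketch is that the classifier head is trained from scratch on labelled, task-specific data; this dependence on task-specific datasets is precisely the limitation that GPT-3's few-shot, in-context approach aims to remove.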