Elizabeth Horishny - Student Research and Creativity Forum - Hofstra University


Romantic Computing: Generating Poetry with Neural Networks
Elizabeth Horishny, under the advisement of Dr. Simona Doboli, Hofstra University

May 16, 2022

Character-Level Recurrent Neural Network with LSTM
Tokens are characters rather than words. I trained models with slight variations in:
❍ Hidden layer size
❍ Number of LSTM layers
❍ Length of training
All models had the same chunk size, batch size, and learning rate.
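A minimal sketch, assuming PyTorch and illustrative layer sizes (not the exact configuration trained here), of what such a character-level LSTM language model can look like:

import torch
import torch.nn as nn

class CharRNN(nn.Module):
    # Character-level language model: embed each character, run it through
    # stacked LSTM layers, and predict a distribution over the next character.
    def __init__(self, vocab_size, hidden_size=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, chars, hidden=None):
        x = self.embed(chars)                 # (batch, chunk_len, hidden_size)
        x, hidden = self.lstm(x, hidden)      # hidden state carries the context
        return self.out(x), hidden            # logits for the next character

model = CharRNN(vocab_size=100)               # e.g. roughly 100 distinct characters
logits, _ = model(torch.randint(0, 100, (8, 64)))   # batch of 8 chunks of 64 characters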

Traditional language models probabilistically generate meaningful text.

P(w_1, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1}) \quad (1)

I think, in the end, it is better to die, And live on the sea, and be what you are– But die, and be what you wish to be. It is a sad reality–but true– That death is what makes us human, And that we are the consequence. But in the end, life is better than death– And that the consequence is life’s loss. I do not pretend to say what’s best, But merely that the world is hell. I say that life is, and ever was, The last of human perfection. But I would not pretend to say it– To soothe my penitents’ fears, That life could not be happiness;

Models

What is a Language Model?


Generative Pre-Trained Transformer NEO

Experiment

Background

Example next-word probabilities after the prompt "The bird": flies 0.5, dies 0.25, chirps 0.2.

Figure 1: where w: word, m: sequence length, i: i-th token.
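As a toy illustration of the factorization in (1), the sketch below multiplies made-up conditional probabilities (hypothetical values, extending the "The bird" example) into a sequence probability:

# Toy illustration of Equation (1): the probability of a sequence is the
# product of each token's probability given the tokens before it.
# The conditional probabilities below are invented for illustration only.
cond_probs = [
    0.20,   # P("The")
    0.60,   # P("bird" | "The")
    0.50,   # P("flies" | "The", "bird")
]

p_sequence = 1.0
for p in cond_probs:
    p_sequence *= p

print(p_sequence)   # 0.06 = P("The", "bird", "flies")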

What is a Neural Network?

Dearest audience! She was a child of Nature; Her very bloom was of the air, And her very sparkles of the eye; And she was the first of those, Who saw the sun rise

Transformer Models
GPT models are pre-trained, meaning they require far less time and data for further training.

              GPT-2    GPT-NEO   GPT-3
Vocabulary    50,257   50,257    50,257
Layers        48       n/a       96
Batch-size    512      n/a       3.2M
Context-size  1024     2048      2048
Parameters    1.5B     2.7B      175B
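A minimal sketch, assuming the public Hugging Face checkpoints "gpt2-xl" and "EleutherAI/gpt-neo-2.7B" (GPT-3 weights are API-only), of how the architecture figures in the table can be read from each model's configuration:

from transformers import AutoConfig

# Read architecture hyperparameters from public checkpoints on the
# Hugging Face Hub (network access required to fetch the configs).
for name in ["gpt2-xl", "EleutherAI/gpt-neo-2.7B"]:
    cfg = AutoConfig.from_pretrained(name)
    layers = getattr(cfg, "n_layer", None) or getattr(cfg, "num_layers", None)
    context = getattr(cfg, "n_positions", None) or getattr(cfg, "max_position_embeddings", None)
    print(f"{name}: vocab={cfg.vocab_size}, layers={layers}, context={context}")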

Generalizations: A clear improvement over GPT-2, especially in longer generations.

Generative Pre-Trained Transformer 3

Methods
Traditional neural networks cannot be expected to perform sequential tasks. Language models must have an additional element of 'remembering' previous outputs.

The experiment was run with the intention of keeping generation parameters as similar as possible. Max length was 300 for the Char-RNN and 70 words for the transformer models. Temperature was 0.7 across all models except GPT-3, which produced repetitive, bland generations at that setting.
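A hedged sketch of these generation settings with the Hugging Face transformers API; the base "gpt2" checkpoint stands in for the fine-tuned models used here, and max_new_tokens only approximates the 70-word cap:

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")               # stand-in checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Dearest audience,", return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,                # sample instead of greedy decoding
    temperature=0.7,               # the setting used for most models here
    max_new_tokens=100,            # roughly the 70-word cap for transformers
    pad_token_id=tok.eos_token_id,
)
print(tok.decode(out[0], skip_special_tokens=True))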

Dearest audience, I welcome you all To my dark and dreary play Which I have written In hopes that it will Bring you some small measure Of the joy that I have found In its crafting. Please do not be alarmed By the blood and gore Which stains

Figure 2: Simplified depiction of a neural network

Challenges of Text Generation
Language models are trained until they adhere to certain standards. Generally, the language must be:
❍ Grammatical and easy to read
❍ Non-redundant
❍ Focused on a theme, especially over the long term
❍ Made of sentences that flow into each other naturally [1]

I love you all so much, I wish I could take you with me On all my journeys. You are the stars that light up my sky, The sun that warms my soul, The moon that guides my way. You are everything to me,

Generalizations: Outperforms all other models in terms of coherence, form, and meaning, and with zero training. Creativity could be considered lacking at times.
Figure 6: Blue prompts are GPT-3 zero-shot commands.

Figure 7: Hugging Face models are GPT-2 models.

Coherence and Meter Across All Poems
GRUEN is a formulaic approach to calculating the coherence of text.

Data the Models Were Trained On
All models (except GPT-3, at least directly) were either trained or fine-tuned on the complete works of two English Romantic poets:
Percy Bysshe Shelley: 334 poems
Lord Byron: 294 poems
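A minimal sketch, with a hypothetical file name, of how the corpus statistics quoted elsewhere on the poster (lines, words, characters, bytes) can be computed from the combined Shelley/Byron text:

# Hypothetical file name for the combined Shelley/Byron corpus.
with open("romantic_corpus.txt", encoding="utf-8") as f:
    text = f.read()

print(len(text.splitlines()), "lines,",
      len(text.split()), "words,",
      len(text), "chars,",
      len(text.encode("utf-8")), "bytes")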

In other words, does it pass the Turing Test?
Additional Challenges of Poetry: Generated poems must sufficiently
❍ Follow aesthetic form (rhyme, meter) [2]
❍ Carry poetic, emotive themes [2]
❍ Make heavy use of figurative language [3]

The winged words on which my soul would pierce Into the height of Love’s rare Universe, Are chains of lead around its flight of fire— I pant, I sink, I tremble, I expire!

Recurrent Neural Networks with Long Short-Term Memory
Like a traditional neural network, a recurrent neural network (RNN) still has input, output, and hidden layers in between. But now, output is fed back as additional input. All previous outputs are kept, which gives the model context.
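A minimal sketch of that recurrence, assuming PyTorch's RNNCell and toy dimensions: the hidden state produced at each step is passed back in with the next input.

import torch
import torch.nn as nn

rnn_cell = nn.RNNCell(input_size=8, hidden_size=16)   # toy dimensions

x_seq = torch.randn(5, 1, 8)        # a sequence of 5 inputs (batch of 1)
h = torch.zeros(1, 16)              # initial hidden state ("memory")
for x in x_seq:
    h = rnn_cell(x, h)              # previous output is fed back in as context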

She walks in beauty, like the night Of cloudless climes and starry skies; And all that’s best of dark and bright Meet in her aspect and her eyes;

188,883 lines, 917,634 words, 5,278,155 chars, 5,356,670 bytes
Figure 8: Higher GRUEN = more likely to pass the Turing test

Results

Figure 4: LSTM unit with 3 inputs and 2 outputs

Dearest audience, mad shadows, Who lose like that gentle heart, and will die That upon the burning vapoured in high, Wrought out of his sea and are his und with power, And in the sea of sunless land to wear The driver, at last displayed the winds. The rapid that oneselv

Figure 9: Lower Deviation = more syllabic similarity between lines
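A rough sketch of how the per-line syllabic deviation behind Figure 9 might be computed; the vowel-group syllable counter and the sample stanza (from Byron's lines quoted above) are assumptions, not the exact method used:

import re
import statistics

def count_syllables(line):
    # Crude heuristic: one syllable per group of consecutive vowels in each word.
    words = re.findall(r"[A-Za-z']+", line)
    return sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)

def syllabic_deviation(stanza):
    # Standard deviation of per-line syllable counts within a stanza.
    counts = [count_syllables(line) for line in stanza.splitlines() if line.strip()]
    return statistics.pstdev(counts)

stanza = "She walks in beauty, like the night\nOf cloudless climes and starry skies;"
print(syllabic_deviation(stanza))   # lower = lines are more metrically alike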

Meter consists of the syllabic content of each line, and how much lines vary within a stanza.

Conclusions

Character-Level Recurrent Neural Network

Figure 3: Ultra-simplified RNN

Dearest audience,

Dearest audience, The thin the stern doft howling burn, The love who some be trinkled wide thee the stars Comes a which when the delight thy prouctory stars The has own since, for the gaze which talt that self? Have pines unteath spectation atennal day W

❍ Format picked up by all models
❍ Language models are certainly capable of creative text generation
❍ GPT-3 performed best on the meter and GRUEN metrics (https://beta.openai.com/playground)
❍ Much more to explore!

Long Short-Term Memory (LSTM) helps differentiate the most important information at a given point in a sequence. Without LSTMs, an RNN's memory becomes increasingly saturated. In other words, the RNN remembers too much.
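A minimal sketch, again assuming PyTorch and toy dimensions, of the gated update an LSTM cell performs: it carries both a hidden state and a cell state, and its gates decide what to keep and what to forget at each step.

import torch
import torch.nn as nn

lstm_cell = nn.LSTMCell(input_size=8, hidden_size=16)   # toy dimensions

x_seq = torch.randn(5, 1, 8)                       # 5 steps, batch of 1
h, c = torch.zeros(1, 16), torch.zeros(1, 16)      # hidden state and cell state
for x in x_seq:
    h, c = lstm_cell(x, (h, c))                    # gates update what is kept and what is forgotten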

Generalizations: Vocabulary and themes are very Shelley-like. Many words are not real words, but that could be considered a creative choice. Meter is picked up fairly well; rhyme is not seen often.

W. Zhu and S. Bhat, “Gruen for evaluating linguistic quality of generated text,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pp. 94–108, 2020.

Transformers

Generative Pre-Trained Transformer 2

J. H. Lau, T. Cohn, T. Baldwin, J. Brooke, and A. Hammond, “Deep-speare: A joint neural model of poetic language, meter and rhyme,” 2018.

Transformers process text all at once rather than token by token. This allows transformer-based models to 1) process text concurrently and 2) build associations between words. Essentially, Transformers have the ability to understand what a pronoun is referring to. [4]
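A minimal sketch of the scaled dot-product self-attention at the heart of the Transformer [4], with toy dimensions; every token scores every other token at once, which is how such word associations are built:

import torch
import torch.nn.functional as F

def attention(q, k, v):
    # Scaled dot-product attention over all tokens simultaneously.
    scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)     # how strongly each token attends to the rest
    return weights @ v

x = torch.randn(6, 32)          # 6 token embeddings of size 32 (toy sizes)
out = attention(x, x, x)        # self-attention: queries, keys, values all come from x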

Roses are red, When the mountains are blue; When the stars are purple, And when the clouds are quivering, And when the wind is wild; We feel that our bodies are a thing Immensely fixed on sensation, And that, whatever we feel, Might be already.

Dearest Audience, I Audience! Audie, e’tis a pity that the owlet’s name Should not wear upon me, and such an awful fate!

Non mai ai noi buce trava’i dirge.Oh, no! the owlet’
Generalizations: Far more coherent, with a striking poetic resemblance. But long generations are prone to repetition. Strange random tangents also appear; thematically inconsistent at times.

References

P. Gervás, “Computational modelling of poetry generation.”
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” 2017.
X. Zhang and M. Lapata, “Chinese poetry generation with recurrent neural networks,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
A. Zugarini, L. Pasqualini, S. Melacci, and M. Maggini, “Generate and revise: Reinforcement learning in neural poetry,” in 2021 International Joint Conference on Neural Networks (IJCNN), 2021.

Figure 5: Here, the Transformer associates ‘tasty’ with ‘grass’ and ‘hungry’ with ‘cow’ and ‘ate’.

Transformers, explained: Understand the model behind GPT, BERT, and T5. Google Cloud Tech, Sep 2021.
