


Ruby Debugger

RUBY DEBUGGER Graduation Project 2013-2014

1st semester handbook Students (ID) Riham Mohamed Hassan Mohamed Magdy Mohamed Dokmak MennatAllah Mahmoud Mohamed Nuha Khaled Zakaria Yomna Khaled Ahmed Lamiaa Elmorsy Saad

(941) (1133) (1200) (1235) (1281) (1611)

Sarah Hassan Moawad Marwa Ahmed Salem Noha Elsherbiny Yassin Yomna Anwar Tageldin Dina Ossama Ali Ahmed Sultan

(953) (1173) (1234) (1280) (1317)

Supervisors Dr. Hicham G. Elmongui Dr. Riham Mansour Technical Sponsor NEZAL Entertainment Computer and Communication Program – SSP Faculty of Engineering, Alexandria University



TABLE OF CONTENTS
PREFACE
OVERVIEW
CHAPTER ONE: INTRODUCTION
1.1 WHAT IS RUBY?
1.2 WHAT IS A DEBUGGER?
1.3 RUBY DEBUGGERS SURVEY
1.4 DEBUGGERS SURVEY
CHAPTER TWO: PREPARATION STAGE
2.1 LEARN RUBY
2.2 PREPARE YOUR ENVIRONMENT
CHAPTER THREE: RUBY CORE
3.1 INSTALLING RUBY
3.2 HOW TO READ THE SOURCE CODE?
3.3 THE COMPOSITION OF RUBY
3.4 EXTENDING RUBY WITH C EXTENSION
CHAPTER FOUR: MEMORY PROFILER
4.1 WHAT IS MEMORY PROFILING?
4.2 PHASE ONE: OBJECT DUMPING
4.3 STEP BY STEP TO IMPLEMENT OBJECT DUMPER
CHAPTER FIVE: RUBY'S MEMORY
5.1 RUBY'S HEAP AND STACK
5.2 MEMORY INSPECTION
CHAPTER SIX: RUBY'S VIRTUAL MACHINE
6.1 OVERVIEW
6.2 RUBY'S YARV MACHINE
6.3 METHOD CALLS
6.4 APPROACH FOR PAUSE/RESUME OF THE YARV MACHINE
FUTURE WORK
REFERENCES


PREFACE This book documents our graduation project for the Bachelor's degree in Computer Engineering. The purpose of a graduation project is to give students the opportunity to apply what they have learned during their studies. We decided to choose a project that would challenge all of our abilities, extend our learning and stretch our potential. We had two options: either take a field that we already knew to a new, challenging level, or work on something new and discover something we had never done or known before. After many team discussions we decided to go for a 'Ruby Debugger'. Building a Ruby debugger would give us valuable experience in both low-level infrastructure coding and open source development. The project integrates most of the basic concepts we have learned, boosts our experience, increases our intellectual curiosity and makes us more flexible and able to face future challenges in the field. In this book, we explain everything we have gone through from the very beginning: the preparations, getting into the Ruby community, tracing and understanding the source code, and finally the implementation. The biggest challenge we faced in this project was the lack of resources for beginners, as well as not finding clear answers to our questions. This was to be expected, as we chose a field that not every computer engineer works in. That is why, while writing this book, we have tried to make clear all the steps we took to reach the resulting output. Through our work, we learned the Ruby programming language, and from it many programming methodologies and basic concepts. We learned how to deal with a professional code base such as Ruby's, how to trace and track the code, and how to adapt to the problems we faced. We also wrote C extensions for Ruby, and learned how a code base can be extended through such an interface.
The whole experience was worth it.


We express our deep gratitude to our supervisors Dr. Hicham G. Elmongui and Dr. Riham Mansour, and to Nezal Entertainment CEO Eng. Mohamed Ali and his team, for their mentoring, guidance, support and inspiring collaboration. Our thanks extend to the Ruby source code community, who helped by answering our questions and guiding us. We are looking forward to finishing this project in June 2014, as this was our goal from the start. The Challenge is ON.

Ruby debugger team,


OVERVIEW The book consists of six chapters describing our work on the Ruby debugger project. Chapter 1 introduces Ruby and debuggers. Chapter 2 describes how we prepared for the project. Chapter 3 gives a brief guide to working with Ruby's source code. Chapter 4 covers the first of our three main tasks, the memory profiler. Chapter 5 discusses the stack and the heap and how we used them to implement the memory profiler. Chapter 6 gives a general overview of Ruby's virtual machine, which is one of the main things that could help us in all three tracks. Meanwhile, the focus now is on both code profiling and line debugging.




CHAPTER ONE: INTRODUCTION In this chapter we introduce the Ruby language, explain what a debugger is, and explain why we chose to build a Ruby debugger.

1.1 What is Ruby? Ruby is a dynamic programming language with a complex but expressive grammar and a core class library with a rich and powerful API. Its creator, Yukihiro “Matz” Matsumoto, blended parts of his favorite languages (Perl, Smalltalk, Eiffel, Ada, and Lisp) to form a new language that balances functional programming with imperative programming. Ruby is a pure object-oriented language whose reference implementation is written in C, but it is also suitable for procedural and functional programming styles. It includes powerful metaprogramming capabilities.

Matz on Ruby: Yukihiro Matsumoto, known as Matz to the English-speaking Ruby community, is the creator of Ruby and the author of Ruby in a Nutshell (O'Reilly), which was later updated and expanded into The Ruby Programming Language. He says:

“I knew many languages before I created Ruby, but I was never fully satisfied with them. They were uglier, tougher, more complex, or more simple than I expected. I wanted to create my own language that satisfied me, as a programmer. I knew a lot about the language's target audience: myself. To my surprise, many programmers all over the world feel very much like I do. They feel happy when they discover and program in Ruby.

Throughout the development of the Ruby language, I've focused my energies on making programming faster and easier. All features in Ruby, including object-oriented features, are designed to work as ordinary programmers (e.g., me) expect them to work. Most programmers feel it is elegant, easy to use, and a pleasure to program.”

Matz's guiding philosophy for the design of Ruby is summarized in an oft-quoted remark of his: “Ruby is designed to make programmers happy.”



Ruby is easy to learn. Everyday tasks are simple to code, and once you've done them, they are easy to maintain and grow. Apparently difficult things often turn out to have been not difficult after all. Ruby follows the Principle of Least Surprise: things work the way you would expect them to, with very few special cases or exceptions. And that really does make a difference when you're programming. Ruby's first version was published in December 1995. Many newsgroups and online forums were created for the sole purpose of talking about Ruby and how to improve the language in its next version. Although a small core of programmers are the main developers of Ruby, many people help in its development. New versions are released regularly; the latest, version 2.1, was published in December 2013. For many reasons Ruby is becoming a favorite of many programmers and developers, and its fan base grows every day. But what does Ruby still need? Many developers think that Ruby would be like a dream with a proper debugger. Before explaining what we mean by a “proper debugger”, let's have a short introduction to debuggers.

1.2 What is a debugger? A debugger is a tool used by programmers to trace, test and find errors in a program. It becomes more helpful when programs become longer and more complicated. It saves you hours of searching for a small error that could be a semicolon or a bracket. A crash happens when the program cannot normally continue because of a programming bug, for example, the program might have tried to use an instruction not available on the current version of the CPU or attempted to access unavailable or protected memory. When the program crashes or reaches a preset condition, the debugger typically shows the location in the original code. Most debuggers also are capable of running programs in a step-by-step mode, besides stopping on specific points. They also can often modify the state of programs while they are running. Some debuggers operate on a single specific language while others can handle multiple languages transparently.



Debuggers have many interesting features and here is a list of some of them:

A. Breakpoints: A breakpoint is a signal that tells the debugger to temporarily suspend execution of your program at a certain point. When execution is suspended at a breakpoint, your program is said to be in break mode. Entering break mode does not terminate or end the execution of your program. Execution can be resumed at any time. Breakpoints provide a powerful tool that enables you to suspend execution where and when you need to. Rather than stepping through your code line-by-line or instruction-by-instruction, you can allow your program to run until it hits a breakpoint, then start to debug. This speeds up the debugging process; without breakpoints, it would be virtually impossible to debug a large program. There are many types of breakpoints:

Simple breakpoint: that stops at a particular point in your code.

Complex breakpoint: that stops when the program has passed the specified point a number of times or stops at the specified point only when an expression is true.

Code breakpoints: halt program execution when a particular instruction is executed. Code breakpoints are supported directly in hardware on most machines and are fast.

Data breakpoints: halt program execution when a variable is referenced. Data breakpoints are slow.

Method breakpoints: act in response to the program entering or exiting a particular method. They let you target your debugging sessions by method you wish to investigate, rather than by line number. Method breakpoints let you follow the program flow at the method level as well as check entry and exit conditions.

Exception breakpoints: are triggered when the specified exception is thrown.

Line breakpoints: which require specific source references.

Conditional breakpoints: are breakpoints with a condition attached. The thread stops at the specified line only if the condition holds, instead of stopping every time that line is reached as a line breakpoint does.
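
To make the idea of a conditional breakpoint concrete, the following is a minimal sketch, assuming Ruby 2.1 or later, that emulates one with the standard TracePoint class. It is not how any of the debuggers surveyed here are implemented. The method names accumulate and conditional_hits are hypothetical and exist only for this illustration, and instead of pausing the program the sketch simply records each point where a real debugger would stop.

```ruby
# Sample method to debug; the condition below inspects its local variable `i`.
def accumulate(limit)
  total = 0
  i = 1
  while i <= limit
    total += i
    i += 1
  end
  total
end

# Hypothetical helper: runs the given block with a "conditional breakpoint"
# installed on every line of `method_name`. Instead of pausing, each hit is
# recorded as [line_number, value_of_variable].
def conditional_hits(method_name, var, wanted)
  hits = []
  trace = TracePoint.new(:line) do |tp|
    next unless tp.method_id == method_name        # only lines of that method
    frame = tp.binding
    next unless frame.local_variable_defined?(var)
    value = frame.local_variable_get(var)
    hits << [tp.lineno, value] if value == wanted  # the breakpoint condition
  end
  trace.enable { yield }                           # trace only while the block runs
  hits
end

hits = conditional_hits(:accumulate, :i, 3) { accumulate(5) }
hits.each { |line, value| puts "condition met at line #{line}: i = #{value}" }
```

A real debugger would hand control to the user at each recorded hit; collecting the hits in an array keeps the sketch non-interactive.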


B. Watchpoints: In its simplest form, a watchpoint halts a program when a specified register or variable is changed. The watchpoint halts the program at the next statement or machine instruction after the one that triggered the watchpoint. There are two types of watchpoints:
• Simple watchpoint: stops when a specified variable changes.
• Complex watchpoint: stops when the variable has changed a specified number of times, or stops when the variable is set to a specified value.
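
A simple watchpoint can be sketched in Ruby in the same spirit. The example below is an illustration under assumptions, not an existing debugger's implementation: it uses the standard TracePoint class (Ruby 2.1 or later), and the names scale and watch_changes are hypothetical. It compares the watched local variable between consecutive line events, so a change is noticed at the statement after the one that caused it, as described above.

```ruby
# Sample method whose local variable `result` changes three times.
def scale(value)
  result = value
  result = result * 2
  result = result + 1
  result
end

# Hypothetical helper: records every change of local variable `var` inside
# `method_name` as an [old_value, new_value] pair.
def watch_changes(method_name, var)
  changes = []
  previous = nil
  seen = false
  trace = TracePoint.new(:line, :return) do |tp|
    next unless tp.method_id == method_name
    frame = tp.binding
    next unless frame.local_variable_defined?(var)
    current = frame.local_variable_get(var)
    # The change is detected one event late: at the next line (or at return).
    changes << [previous, current] if seen && current != previous
    previous = current
    seen = true
  end
  trace.enable { yield }
  changes
end

watch_changes(:scale, :result) { scale(5) }.each do |old, new|
  puts "result changed from #{old.inspect} to #{new.inspect}"
end
```

A complex watchpoint would add a counter or a value test before recording the change.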

C. Single step mode: Single-step debugging lets you execute your code one line at a time. You first find a location in your code where you want to pause execution, then start single-stepping from there.
• Step over: Executes one step in the program, at the current level. If the program calls a function, that function is executed all at once, rather than by stepping through it.
• Step into: Enters the first function called on this line and stops at its first line.
• Step out: Runs the program until the current function returns.
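
The difference between stepping over and stepping into a call can be sketched by tracking the call depth. The code below is a rough illustration using the standard TracePoint class; the names helper, main_task and stepped_methods are hypothetical. It lists the method in which a stepping debugger would stop for each executed line, rather than actually pausing.

```ruby
# Two small methods so that there is a nested call to step into or over.
def helper(x)
  y = x + 1
  y * 2
end

def main_task
  a = helper(1)
  b = helper(a)
  a + b
end

# Hypothetical helper: returns the names of the methods in which a stepping
# debugger would stop. mode is :into (stop on every line) or :over (stay at
# the level of the first method entered).
def stepped_methods(mode)
  depth = 0
  stops = []
  trace = TracePoint.new(:call, :return, :line) do |tp|
    case tp.event
    when :call   then depth += 1
    when :return then depth -= 1
    when :line
      next if depth.zero?                # line outside the code under test
      stops << tp.method_id if mode == :into || depth == 1
    end
  end
  trace.enable { yield }
  stops
end

p stepped_methods(:over) { main_task }
p stepped_methods(:into) { main_task }
```

In :over mode only the lines of main_task are reported, while :into mode also reports the lines executed inside helper. Step out could be sketched the same way by resuming until the depth drops below its current value.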

D. Change values at runtime: When you are debugging, sometimes, you would want to change the value of a variable. For example a variable is null and you know you are going to encounter the dreaded “NullPointerException”. If you could just set the correct value to it you can proceed with the debugging and worry about the null check later.

E. View process memory: It allows you to view the content of all memory blocks allocated in the heap of the process that you select. This tool can be useful for developers that need to trace memory leaks in their software.



F. Detect uninitialized variables: An uninitialized variable is a variable that is declared but is not set to a definite known value before it is used. It will have some value, but not a predictable one. As such, it is a programming error and a common source of bugs in software, so it needs to be detected.

G. Reverse debugging: Also known as “historical debugging”. These debuggers make it possible to step a program's execution backwards in time. Various debuggers include this feature.

As we have seen, debuggers have many functionalities that make programming an easier process. We said before that Ruby needs a proper debugger, but does this mean that Ruby doesn't have any debuggers? The answer is no. There are a few debuggers for Ruby, but they are not efficient enough to be used, and Ruby itself has some instructions with debugging functionality, but these are limited to a few features. That is why Ruby needs a proper debugger: one that is highly efficient and has enough functionality. And that is what we decided to build. The first step in solving a problem is to define it. To do so, we made a small survey of the recent Ruby debuggers to learn what is bad about them, and another small survey of the most popular debuggers in general to learn what is good about them. Our aim from these surveys is to get a list of features for our debugger; in other words, we wanted to know exactly what we need to do.



1.3 Ruby debuggers survey In this section we discuss the existing Ruby debuggers and their pros and cons.

A. RubyMine: RubyMine is a commercial IDE for Ruby and Ruby on Rails built on JetBrains' IntelliJ IDEA platform. RubyMine provides intelligent code completion for Ruby and Ruby on Rails code, on-the-fly code analysis and refactoring support for both plain Ruby projects and web applications built with Ruby on Rails. It offers a query processor, symbol resolver, expression interpreter, and debug support interface at its top level. It also offers more sophisticated functions such as running a program step by step, stopping (breaking) at some event or specified instruction by means of a breakpoint, and tracking the values of variables. RubyMine can also modify program state while the program is running.

Features of the RubyMine debugger:
• Click And Follow
• Git Compare
• Switch Between File and Spec
• Run Tests Right From The Spec
• Git Annotate
• Remote Debugging
• Explore frames
• Print stack
• Examine a suspended program
• Evaluate expressions
• Set watches

Negative feedback on the RubyMine debugger: it is not free and not open source, it is a little slow, and users also complain about regex file open and wrapping of highlighted text.


B. RubyGems & the debugger gem: RubyGems is software that lets you easily download, install and use Ruby software packages on your system. These packages are called gems and contain a Ruby application or library. The gem command lets you interact with RubyGems. To check which gems are installed, type the command “gem list” in the terminal (in recent RubyGems versions “gem search” queries the remote repository by default). Debugger is a fast implementation of the standard Ruby debugger debug.rb. The gem for debugging is called debugger (version 1.6.3 at the time of writing); to get its source code and have a look at it, type the command “gem unpack debugger”.

How to install? To install the Ruby debugger, first check whether it is already installed using the gem list command; if it is not installed, type the command “gem install debugger”.

How to use? At the beginning of the Ruby program add the line require 'debug'. Example:

require 'debug'
x = 5
puts x * 5

Now to start debugging, type the command “ruby -rdebug (your filename)” (the -r option loads debug.rb even if the require line is absent). To see all the commands that help in debugging and how to use them, type “help”. Commands available include:
• break “line_number”: inserts a breakpoint.
• wat “expression”: sets a watchpoint on some expression.
• n: steps over a line.
• c: runs until the program ends or hits a breakpoint.
• p “expression”: evaluates the expression and prints its value.
• Many other commands, which are shown once you type the “help” command at the debugger prompt.


Ruby also has some classical yet efficient tools: the profiling tools. Profiling is a runtime analysis that gathers information about memory usage, function calls, time elapsed in functions, etc. There are different methods to collect information from a running program:
• Instrumentation: adds program instructions.
• Event-based: adds hooks to trap program events.
• Sampling: interrupts the program to look inside its memory space.
Depending on what you are looking for, the profiler outputs different results: call graph, object allocation, etc.

C. Ruby's default profiler: Ruby has a built-in module called “Profiler” that records function calls. It is an event-based profiler that uses the “Kernel#set_trace_func” method to trap all function calls. To use this profiler, run ruby with the “-r profile” option, which requires the “profile.rb” file. Neither the output nor the performance of this module is satisfying enough.
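
The event-based approach can be illustrated with a few lines of Ruby. The sketch below is not the code of profile.rb; it only shows, under simplifying assumptions, how Kernel#set_trace_func can trap Ruby-level method calls and count them. The names fib and count_calls are hypothetical. Newer Ruby versions recommend TracePoint for the same purpose.

```ruby
# Sample workload: a naive recursive Fibonacci.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Hypothetical helper: counts Ruby-level method calls made while the block
# runs, keyed by "Class#method". C-level calls arrive as 'c-call' events and
# are ignored here.
def count_calls
  counts = Hash.new(0)
  set_trace_func proc { |event, _file, _line, id, _binding, klass|
    counts["#{klass}##{id}"] += 1 if event == 'call'
  }
  yield
  counts
ensure
  set_trace_func(nil)   # always remove the hook again
end

counts = count_calls { fib(5) }
counts.each { |name, n| puts "#{name}: #{n} calls" }
```

Because the hook runs on every event, the traced program becomes noticeably slower, which matches the performance concern raised above.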

D. Sampling with gperftools: Sampling profilers have an advantage over event-based profilers like ruby-prof: they can be used in a production environment without changing anything in your configuration and with a small overhead. “perftools.rb” is one of them. Its output is a file containing the captured data. A readable representation can be built with the command “pprof.rb”. The major drawback of the sampling method is that we only see what happens at the moments when the profiler interrupts the program.

Profiling your memory usage in Ruby: If you want more elaborate statistics, you can use external frameworks like memprof or ruby-prof, or the built-in GC::Profiler module. But all of them either have problems with specific versions of Ruby or lack proper support for getting information about memory usage.



E. GC::Profiler: The GC profiler provides access to information on GC runs including time, length and object space size.
• GC::Profiler.clear
• GC::Profiler.disable
• GC::Profiler.enable
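
As an illustration of how these calls fit together, here is a small sketch that wraps a block of code with the GC profiler. The helper name gc_profile is hypothetical; GC::Profiler.raw_data and GC::Profiler.total_time are standard methods that are not listed above. The explicit GC.start is only there so that the profiler has something to record.

```ruby
# Hypothetical helper: runs the block with the GC profiler enabled and
# returns how many GC runs were recorded and the total time they took.
def gc_profile
  GC::Profiler.clear             # drop data from earlier runs
  GC::Profiler.enable
  yield
  GC.start                       # force a collection so there is data to read
  { runs: GC::Profiler.raw_data.size, seconds: GC::Profiler.total_time }
ensure
  GC::Profiler.disable
end

stats = gc_profile { Array.new(50_000) { |i| "object #{i}" } }
puts "GC runs recorded: #{stats[:runs]}, total GC time: #{stats[:seconds]} s"
```

GC::Profiler.report prints the same data as a table, which is convenient when exploring interactively.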

F. memprof: The default option only shows a count of the objects instantiated. The framework has more options to dump individual objects and to see how many bytes you have used.

G. ruby-prof gem: The Ruby community provides a gem called “ruby-prof”. It is a C extension and it produces output in many different formats, which makes it faster and richer than “profile.rb”. To use all the features of ruby-prof, including memory statistics, we need a patched version of the Ruby interpreter, which is very ugly and time consuming. Once installed, you can use the ruby-prof gem and get a nice PDF graph of calls. The call graph always shows 50% of the time used to execute the code. Finally, we haven't found any support for memory and time profiling in the IDEs, so the best option for profiling right now is to go to the line of code that you want to test and call “get_memory_usage” before and after it to see how much memory is used to execute this line, which is not efficient, especially for big programs.
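
The before-and-after measurement described above can be sketched with Ruby's own GC statistics. This is only an approximation under stated assumptions: the helper name objects_allocated_by is hypothetical, it counts allocated objects rather than bytes, and it relies on the :total_allocated_objects key of GC.stat, which has this name in Ruby 2.2 and later (Ruby 2.1 used :total_allocated_object).

```ruby
# Hypothetical helper: returns how many objects were allocated while the
# block ran, by reading the cumulative allocation counter before and after.
def objects_allocated_by
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

count = objects_allocated_by { Array.new(1000) { |i| "item #{i}" } }
puts "objects allocated: #{count}"
```

Wrapping every line of interest this way quickly becomes tedious in a large program, which is the inefficiency noted above.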

1.4 Debuggers survey In this survey we gathered information about the most successful debuggers in general, not only those for Ruby, so that we can benefit from their successful features in our debugger.


A. GDB: The GNU Debugger, called GDB, is the most popular debugger on UNIX systems for debugging C and C++ programs. GDB helps you find out:
• if a core dump happened, which statement or expression the program crashed on;
• if an error occurs while executing a function, which line of the program contains the call to that function, and what the parameters are;
• what the values of program variables are at a particular point during execution;
• what the result of a particular expression in the program is.
GDB lets you run the program up to a certain point, then stop and print out the values of certain variables at that point, or step through the program one line at a time and print out the values of each variable after executing each line. GDB uses a simple command line interface. Even though GDB can help you find bugs related to memory leaks, it is not a tool for detecting memory leaks.

B. Visual Studio: The Microsoft Visual Studio Debugger is a debugger that ships with all versions of Microsoft Visual Studio. The more advanced features of its most recent versions include:
• full symbol and source integration;
• edit-and-continue support, which allows source code to be modified and recompiled on the fly without having to exit the running program or restart the debugger;
• remote machine debugging;
• attaching to and detaching from processes;
• tracing into DLL code when symbolic debugger information is present;
• standard as well as more advanced breakpoint features, including conditional, address and data breakpoints;
• many ways of viewing program state and data, including multiple watch windows, threads, call stack and modules.
The main shortcoming of the Visual Studio Debugger is its inability to trace into kernel-mode code. However, this is possible using the free VisualDDK extension.




CHAPTER TWO: PREPARATION STAGE

“If I had eight hours to chop down a tree, I'd spend six hours sharpening my ax.” ~Abraham Lincoln

The project preparation is a very important phase because it is the basis for a successful project. In this phase, the necessary basics and knowledge will be defined as well as knowing the important tools that would help us. A good project preparation will ensure that the project can be realized with the lowest possible lead-time and investment. In this chapter, we share our preparation for the project through learning Ruby and illustrating the tools we used.

2.1 Learn Ruby There are many online resources for learning Ruby in a few days. “The Ruby Programming Language” by David Flanagan and Yukihiro Matsumoto is one of the best references for the Ruby programming language. If you don't enjoy reading formal books, you may enjoy “Why's (Poignant) Guide to Ruby” by “Why The Lucky Stiff”. It covers similar material to the first book, but presents it through cartoon stories in a way that suits everyone. After you learn Ruby, it is always helpful to follow the Ruby communities to stay involved and informed about the latest updates, questions and solutions to common problems. Here is a list of some online communities that will help you a lot:

A. Ruby-lang: It has many things that you would need both for developing in Ruby and for getting into the source code:
• Downloads: to get the source code and compilation instructions.
• Get Started: gives a track of tutorials and resources for learning Ruby.
• Explore a new world (Documentation, Books, Libraries, Success Stories).
• Community (Mailing list, User Groups, Blogs, Ruby core, Issue tracking): Subscribing to this community helps you get all news and updates about Ruby. It is a good way to participate and get more detailed information.
URL:

B. Ruby-doc: This site contains documentation for the Ruby programming language. It is available in many languages (English, French, etc.). You will find:
• Getting Started: gives a track of tutorials and resources to learn Ruby with.
• Core API: documentation and explanation of the Ruby core API.
• Standard Library API: documentation and explanation of the Ruby Standard Library API.
• How to get started.
• Ruby Books: lists all books for Ruby.
URL:

After learning the Ruby language, it is time to get into the Ruby source code. The following list contains some useful resources that can help you understand the source code. Most of these resources are for previous versions of Ruby, so some of their details won't apply to Ruby 2.1, but the main concepts are mostly the same.

A. Ruby Hacking Guide: The definitive resource for people who want to learn the C programming details of how Ruby works internally. Intended for C hackers. It was just recently translated into English from the original Japanese. URL:

B. Ruby Under a Microscope: A book by Pat Shaughnessy. The book is a visual, conceptual explanation of how Ruby works, intended for Ruby developers who don't know C. It does, however, contain some C code snippets and explanations for people who are interested in the C code.


C. Pat Shaughnessy Blog: This blog contains many articles about ruby MRI source code and its internal conceptual details. URL:

D. Ruby MRI code walk tour: It is a very high level cruise through the MRI code base. URL:

2.2 Prepare your environment In this section we are going to give you some tools and describe how to prepare and install them. These tools helped us to start the long journey of the Ruby debugger. The purpose of this section is to save your time and effort from surfing the Internet trying to find these tools or installing them. The operating system used was Linux (Ubuntu 12.10). So all the commands are on Linux terminal. Tools used throughout the whole process

A. SVN (Subversion): It is a version control (revision control) system distributed as free software under the Apache license. Developers use Subversion to maintain current and historical versions of files such as source code files or any other files. Follow these steps to use Subversion:
1. You must have the PuTTY client software. Command to install PuTTY:
sudo apt-get install putty

2. Make sure that svn is installed on the system by typing:
which svn
This command will return the path containing svn.
3. Go inside this directory:
cd /svn
4. Create another directory (the repository) inside the /svn directory:
sudo svnadmin create /svn/repository
5. Restart the server. With the Apache server type:
sudo service apache2 restart
6. Make the 3 directories that are the main directories in any repository (it is not necessary to create all 3; it depends on the project itself):
svn mkdir --parents file:///svn/repository/{trunk,tags,branches} -m "Directory creation"

Note: You can remove the braces and execute this command 3 times, once for each of trunk, tags and branches. After that you're ready to work with svn.

Regular commands involved in using svn 1. checkout / co: creates a working copy of the project.

svn checkout <Path of the repository on the server>


svn co <Path of the repository on the server>

2. add: schedules the file, directory, or symbolic link to be added to the repository when you next commit.

svn add <Path to certain file or folder>

3. commit: commits the changes made in the working copy to the repository (files added, deleted, modified, etc.).

svn commit

For more about svn, check the references.

B. Eclipse-cdt for C/C++ A good interface that enables you to debug Ruby MRI using the GDB debugger, which is a C debugger. It is very easy to install and use. First, install eclipse-cdt for C/C++ by typing:

sudo apt-get install eclipse-cdt

Note: make sure that the gcc compiler is already installed on the system by typing

gcc --version

If there is a response, the gcc compiler is successfully configured on your system; if not, please visit: “”. Then import the MRI repository into Eclipse. Steps to import the MRI repository into Eclipse: 1. Before importing the Ruby repository into Eclipse for debugging, we must download the repository (we started with Ruby 2.0, then moved to Ruby 2.1-preview2).


2. Open Eclipse and go to File → New → Makefile Project with Existing Code.
3. Choose the Linux GCC compiler and browse for the Ruby source you've just downloaded from the Ruby repository.
4. Go to Run → Run Configurations.
5. In the C/C++ section, browse for the ../ruby-2.1.0-preview2/ruby file, which will execute your Ruby program.
6. On your Desktop or elsewhere, create a Ruby file with the extension “.rb”, e.g. grad.rb.
7. In Eclipse click on the Arguments tab.
8. Inside the text area, paste the path of the file grad.rb.
9. Click on Run and you're ready to debug the source code.

C. Doxygen Doxygen is a documentation generator, a tool for writing software reference documentation. The documentation is written within the code, and is thus relatively easy to keep up to date. Doxygen can cross-reference documentation and code so that the reader of a document can easily refer to the actual code. Benefits of using Doxygen:
• It can have a huge impact on your understanding of the source code and helps you locate functions or structs easily through its good web interface.
• It documents the code by drawing the call graph of each function, which lets you find how the initial call of some function is reached (e.g. where the parser begins to parse Ruby code).

D. Sublime: Sublime Text is a sophisticated text editor for code, markup and prose. You'll love the slick user interface, extraordinary features and amazing performance. Although it is just a text editor, it has features that help you get through the whole source code with just a few commands.

Sublime Features: Examples of the main features we used in Sublime.
1. Go to Anything: Use “Go to Anything” to open files with only a few keystrokes, and instantly jump to symbols, lines or words. Triggered with Ctrl+P, it is possible to:
• Type part of a file name to open it.
• Type @ to jump to symbols, # to search within the file, and : to go to a line number.
These shortcuts can be combined, so tp@rf may take you to the function read_file in the file text_parser.y. Similarly, tp:100 would take you to line 100 of the same file.
2. Go to Symbol: Use “Go to Symbol” to instantly jump to symbols in the current file by pressing Ctrl+R.
3. Go to Symbol in Project: Use “Go to Symbol in Project” to instantly jump to symbols in the project folder by pressing Ctrl+Shift+R.
4. Find in Files: Use “Find in Files” to search for a keyword in all files in the project folder by pressing Ctrl+Shift+F.

Sublime Text has many other features that make going through the source code fast and efficient.

Sublime Installation: Sublime Text is available for OS X, Windows and Linux. Its cross-platform support makes it easy to use everywhere.
• Windows: you can download Sublime Text directly from the official website.
• Linux: to install it from the terminal, you may use the following commands:

sudo add-apt-repository ppa:webupd8team/sublime-text-3
sudo apt-get update
sudo apt-get install sublime-text-installer


CHAPTER THREE: RUBY CORE

The reference Ruby implementation is written in C. There are also Ruby implementations written in Java (JRuby) and in Ruby itself (Rubinius). The C version is referred to as Ruby MRI, or Matz's Ruby Interpreter (also called CRuby). Since Ruby MRI is the reference implementation of the Ruby programming language, we decided to build our work on it. In this chapter we take a short journey through the Ruby source code: how to install it, how to dig through it, a general map of its contents, and finally how to add to it or change it.

3.1 Installing ruby

As we stated before, we are working in the Linux environment, so here are the steps to get the Ruby source code onto your Linux machine.

Ruby 2.1 installation
1. Download the TAR file ruby-2.1.0-preview2.tar.gz.
2. Open the terminal (Ctrl+Alt+T).
3. Write the following commands:

cd Downloads
tar -xvf ruby-2.1.0-preview2.tar.gz
cd ruby-2.1.0-preview2
./configure
make
sudo make install


3.2 How to read the source code?

The two main principles that you need to work on when dealing with the source code are:
• Setting your goal.
• Visualizing your goal.

You need to know what you are searching for or what you want to do; after that you need to know how to do it. Let the first step be exploring the contents of the source code: open files, try to read them and learn what they do. When you feel familiar enough with these files, go to the next step, which is defining as many black boxes as you can. These black boxes are files that are certainly important in the source code but do not help you with your goal, so setting them aside narrows your work area down to fewer files. To do so, you need to analyze the code. There are two methods of analysis:
A. Static analysis.
B. Dynamic analysis.

A. Static analysis:

Static analysis means reading and analyzing the source code without running the program. Here are some tips that can guide you when doing static analysis:
• Names are important. Source code analysis is actually an analysis of names: file names, function names, variable names, type names, member names, etc. Names help you know what an entity does or represents.
• Know the coding rules beforehand to some extent. In C, for example, extern functions often use a prefix to indicate the kind of function. In object-oriented programs, the prefix of a function name sometimes tells you where the function belongs, which is valuable information (e.g. rb_str_length).
• Read files bottom up. This will help you follow the sequence of functions.
• Read the file documentation. Sometimes the documentation helps you understand more about the code.
• Read the directory structure. Look at the policy by which the directories are divided. Try to grasp the overview, such as how the program is structured and what its parts are.
• Read the file structure. While browsing (the names of) the functions, also look at the policy by which the files are divided. Pay attention to the file names, because they are like comments with a very long lifetime.
• Investigate abbreviations. As you encounter ambiguous abbreviations, make a list of them and investigate each of them as early as possible.
• Understand the data structures. Try to find and understand as many as possible of the data structures used or implemented in the code; it's better to start with header files. Try to imagine the data structure from the file names. For example, if you find frame.h, it is probably the stack frame definition. You can also understand many things from the member names of a struct and their types. For example, if you find a member next that points to its own type, it will be a linked list. Similarly, when you find members such as parent, children, and sibling, it must be a tree structure. When you find prev, it will be a stack.
• Understand the calling relationships between functions. After names, the next most important thing to understand is the relationships between functions. So, make a "call graph".
• Read functions. What is important when reading functions is not "what to read" but "what not to read".

There are many tools that can help you with static analysis; we used Doxygen and Sublime Text, described before in chapter 2. You can choose any other tool, but it should be equipped with at least the following features:
• List the function names contained in a file.
• Find the location of a function or variable from its name (preferably by jumping to that location).
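To make the first feature concrete, here is a minimal sketch of a function lister written in Ruby. The helper name list_c_functions, the regular expression, and the sample text are all assumptions made for illustration: the pattern is a rough heuristic that relies on the function name starting a line (the layout used in Ruby's C sources) and is not a real C parser.

```ruby
# Hypothetical helper: list the names of C functions defined in a source text.
# The pattern is a rough heuristic (an assumption of this sketch): it only
# matches lines that begin with an identifier followed by "(".
def list_c_functions(source)
  names = []
  source.each_line.with_index(1) do |line, number|
    match = line.match(/\A([A-Za-z_]\w*)\s*\(/)
    names << [match[1], number] if match
  end
  names
end

# Made-up sample text written in the style of Ruby's C sources.
sample = <<~C
  static VALUE
  rb_str_length(VALUE str)
  {
      return LONG2NUM(str_strlen(str, NULL));
  }

  VALUE
  rb_ary_new(void)
  {
      return rb_ary_new2(RARRAY_EMBED_LEN_MAX);
  }
C

list_c_functions(sample).each do |name, number|
  puts "#{name} (line #{number})"
end
```

Indented lines such as the return statements are skipped because they do not start with an identifier, which is why only the two definitions are reported.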

B. Dynamic analysis:

Dynamic analysis means watching the actual behavior using tools like a debugger. Here are some tips that may help you in the dynamic analysis process:
• Follow the behavior using the debugger. If you want to see the path of code execution and the data structures produced as a result, it's quicker to look at the result by actually running the program than to emulate the behavior in your brain. To do so easily, use a debugger. We used the gdb and Eclipse debuggers.
• Print everywhere. There is a term called "printf debugging". This method also works for analysis other than debugging. If you are watching the history of one variable, for example, it may be easier to read the output of embedded print statements than to track the variable with a debugger.
• Modify the code and run it. In a place where the behavior is not easy to understand, make a small change in some part of the code or in a particular parameter and then re-run the program. Naturally the behavior will change, and you will be able to infer the meaning of the code from that change.


It's better to start studying a program with dynamic analysis, because what you see there is "fact". The results of static analysis, since the program is not actually run, are "predictions" to a greater or lesser extent. If you want to know the truth, you should start by watching the facts.

3.3 The composition of Ruby

Now it is time to start reading the source code, but what should we do first? Look over the directory structure. In most cases the directory structure, meaning the source tree, directly indicates the module structure of the program. The files at the top level can be categorized into six groups:
• Documents.
• The Ruby source code itself.
• The tool to build Ruby.
• Standard extension libraries.
• Standard Ruby libraries.
• Others.

The source code and the build tool are obviously important. Aside from them, there are also some useful files such as:
• ChangeLog: This file records the changes made to Ruby. It is very important when investigating the reason for a certain change.
• README.EXT: This file describes how to create an extension library, but in the course of doing so it also covers things relating to the implementation of ruby itself.


This is a listing of the most important files and what they contain:

Ruby Language Core:

class.c : Class-related API
        : Exception-related API
        : Evaluator
        : Garbage collector
        : Reserved word table
        : Object system
        : Parser
        : Constants, global variables, class variables
        : The main macros and prototypes of ruby
        : The prototypes of the C API of ruby. "Intern" seems to be an abbreviation of "internal", but the functions declared here can be used from extension libraries.
        : The header file containing the macros relating to signals
        : The definitions relating to the syntax tree nodes
        : The definitions of the structs that express the context of the evaluator

Class Libraries:

array.c   : Class Array
bignum.c  : Class Bignum
compar.c  : Module Comparable
dir.c     : Class Dir
enum.c    : Module Enumerable
file.c    : Class File
hash.c    : Class Hash (its actual body is st.c)
io.c      : Class IO
marshal.c : Module Marshal
math.c    : Module Math
numeric.c : Class Numeric, Integer, Fixnum, Float
pack.c    : Array#pack, String#unpack
prec.c    : Module Precision
process.c : Module Process
random.c  : Kernel#srand(), rand()
range.c   : Class Range
re.c      : Class Regexp (its actual body is regex.c)
signal.c  : Module Signal
sprintf.c : Ruby-specific sprintf()
string.c  : Class String
struct.c  : Class Struct
time.c    : Class Time

The Ruby class libraries are implemented in basically the same way as ordinary Ruby extension libraries. This means that these libraries are also examples of how to write an extension library.

After reading, analyzing and understanding the code, it's time for implementation. As we stated before, our goal is to make a Ruby debugger with the following main features:
• Line Debugging.
• Time Profiling.
• Memory Profiling.

We decided to start with the memory profiler, and the first question that faced us was "How will we add our code to the source code?" There are two options:
1. Write C extensions.
2. Change the source code directly and recompile.

For the first stage of our work, and as beginners, we adopted the extension method because we found it cleaner: we do not change anything in the source code that might cause an error. Also, with this method users can use our debugger by simply downloading it; they do not have to use our modified version of the source code. In the following section we describe what a C extension is and how to write one.


3.4 Extending Ruby with C extension

It is easy to extend Ruby with new features by writing code in Ruby. But every now and then you need to interface with things at a lower level. Extending Ruby at a low level has two advantages:
• The possibilities are endless. You are interfacing with the source code of MRI at a low level, so you can do almost anything.
• Speed. Ruby is slow compared to compiled languages like C. It gets the job done, but sometimes it takes quite a long time doing it, and it becomes necessary to speed things up a bit.

To write a C extension you need, at a bare minimum, two things:
1. An extconf.rb file. This file is used by Ruby to generate the Makefile that is used to compile the extension.
2. The source file for the extension.

You may write a C extension for two reasons:
• To do something that can be done with Ruby code but that you need to run faster.
• To add extra features to Ruby, and then require the extension in your Ruby program to use the new features.

Before exploring how an extension is written, let's first explore how Ruby objects are represented in C and how to retrieve C data types from Ruby and vice versa.


A. Data type conversion between C and Ruby:

Data in Ruby is represented by the C type VALUE. Each VALUE has its own data type. To retrieve C data from a VALUE, you need to:
1. Identify the VALUE's data type.
2. Convert the VALUE into C data.

The macro TYPE(), defined in ruby.h, gives the data type of a VALUE. TYPE() returns one of the constants T_XXXX. To handle data types, your code will look something like this:

switch (TYPE(obj)) {
  case T_FIXNUM:
    /* process Fixnum */
    break;
  case T_STRING:
    /* process String */
    break;
  case T_ARRAY:
    /* process Array */
    break;
  default:
    /* raise exception */
    rb_raise(rb_eTypeError, "not valid value");
    break;
}
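For comparison, the same dispatch can be sketched at the Ruby level with case/when. This is only an analogy, not how TYPE() is implemented: describe_value is a hypothetical name, and Integer here covers both small and big integers, whereas T_FIXNUM covers only the small ones.

```ruby
# Ruby-level counterpart of the C switch on TYPE(obj).
# describe_value is a hypothetical name used only in this sketch.
def describe_value(obj)
  case obj
  when Integer then "integer #{obj}"
  when String  then "string of length #{obj.length}"
  when Array   then "array with #{obj.size} elements"
  else
    # Same role as rb_raise(rb_eTypeError, ...) in the C version.
    raise TypeError, "not valid value"
  end
end

puts describe_value(42)
puts describe_value("ruby")
puts describe_value([1, 2, 3])
```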


There are many macros in the ruby.h file to convert a VALUE to a C data type. For example:
• FIX2INT() or FIX2LONG(): convert a T_FIXNUM to a C integer.
• NUM2INT() and NUM2LONG(): convert any Ruby number into a C integer.
• NUM2DBL(): retrieves the value as a C double.
• StringValuePtr(): gets a char* from a VALUE.

B. Extension Configuration:

"extconf" and "mkmf" are the parts of the Ruby extension build system that generate the header files and Makefile(s) needed to compile the C part of the program. A file named extconf.rb is generally placed in the ext subdirectory of the project, and extconf.rb requires mkmf to do all of the heavy lifting for it. An example extconf.rb looks like:

require "mkmf"

if find_header("header file name", "header_file_path") &&
   find_header("another header file", "its_path")
  create_makefile("Extension_name")
else
  puts "No support available"
end

find_header() lets you include a header file so that you can use the functions and structs it declares. The first parameter is the header file name, for example vm_core.h. The second parameter is the path of the header file on your computer. Save this file as extconf.rb in a folder inside the ext folder of the MRI folder. Once the C code of the extension is written, we will come back to this file to build the extension.


C. The C File:

Here we write the C implementation of the extension. Let's begin with a small extension that prints "Hello Ruby". An example of the C file:

#include <stdio.h>
#include "ruby.h"

void Init_myext(void)
{
    printf("Hello Ruby from C!\n");
}

When the Ruby interpreter loads this C extension, it calls the function whose name begins with Init_, so every extension must have an Init_xxx function. The part after Init_ must be the name of the extension given in the extconf.rb file. Save this file as myext.c in the same folder that contains the extconf.rb file. Now your extension is ready to be compiled.

Compiling the extension: There are two ways to compile an extension:

A. The first way:
1. Open the terminal.
2. Go to the folder containing the extension files.
3. Then type:

ruby extconf.rb
make

4. Create a test file: to test your extension you can create a test file in the same folder. An example of the test file:

require_relative "myext"

require_relative loads the compiled extension from the folder of the test file. Once the extension is installed on Ruby's load path, require "myext" works as well. Save the file as test.rb.
5. Run test.rb from the terminal:

ruby test.rb

Output: Hello Ruby from C!

B. The second way:
1. Add the extension to the ext folder in the source code.
2. Inside the extension folder, create a new file called "depend", which Ruby's build uses to check the extension's dependencies on header files.
3. Add the following line inside extconf.rb:

$INCFLAGS << " -I$(topdir) -I$(top_srcdir)"

This appends include directories (in the same spirit as -I<directory> on a command line) to the $INCFLAGS variable, which is written into the generated Makefile. The topdir and top_srcdir variables are set automatically during Ruby compilation.
4. Recompile Ruby from the terminal.
5. Use the ruby executable in your source code path to run Ruby from the terminal. For example, use

~/Desktop/rubydbg/trunk/src/ruby-2.0.0-p247/ruby test.rb

instead of "ruby test.rb", because the paths included in the Makefile are now set to the paths of the ruby folder.

Note: Each time you want to recompile the extension, don't use the old step "ruby extconf.rb", because it generates a Makefile with the default paths, which are in the /usr/ folder. Always recompile the ruby folder.

C APIs to create Modules, Classes, and Methods

In a C extension you can create methods, modules, classes, etc. that can be used by the Ruby program that requires the extension. There are many APIs to create classes, modules, structs, constants, and so on; we show some of them here.

A. Create new Ruby modules:

VALUE rb_define_module(const char *name);

Where:
• name: the name of the module in Ruby.
Ex: VALUE m_module = rb_define_module("Example");


B. Create new Ruby classes:

VALUE rb_define_class_under(VALUE module, const char *name, VALUE super);

Where:
• module: the module that contains the class. To define a top-level class that is not inside a module, use rb_define_class(name, super) instead.
• name: the name of the class in Ruby. Like any Ruby constant, it must start with an uppercase letter.
• super: the superclass. It can be one of the pre-defined classes (rb_cObject, rb_cArray, etc.) or a class that has been defined in this module.
Ex: VALUE r_class = rb_define_class_under(m_module, "RClass", rb_cObject);

C. Create a new method for a class:

rb_define_method(class, "method_name", implementation, number_of_args)

Where:
• class: the class containing the method.
• "method_name": the method name used in the Ruby program.
• implementation: the C function that implements the method, defined outside the Init_ function.
• number_of_args: the number of arguments the method takes (not counting self).

Ex: rb_define_method(r_class, "rmethod", c_method, 1);

Here is another simple extension, myext2.c:

#include <stdio.h>
#include "ruby.h"

/* C implementation of rmethod: prints the string it receives. */
static VALUE c_method(VALUE self, VALUE arg)
{
    char *str = StringValuePtr(arg);
    printf("%s\n", str);
    return Qnil;
}

void Init_myext2(void)
{
    /* Module and class names are Ruby constants, so they are capitalized. */
    VALUE m_module = rb_define_module("RModule");
    VALUE r_class = rb_define_class_under(m_module, "RClass", rb_cObject);
    rb_define_method(r_class, "rmethod", c_method, 1);
}

Follow the same steps to compile the extension.

To test the extension, here is a sample test file, testb.rb:

require_relative "myext2"
include RModule

str = "hello"
c = RClass.new
c.rmethod str
RClass.new.rmethod str

After running the test file, the word hello will be printed twice. For more information about the APIs used to extend Ruby, you can refer to the README.EXT file in the Ruby source code.
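To see what the C API calls correspond to, here is a sketch of the same module, class and method written in plain Ruby. The capitalized names RModule and RClass are our assumption, chosen because Ruby module and class names are constants and must begin with an uppercase letter.

```ruby
# Pure-Ruby counterpart of a small extension with one module, one class
# and one method (a sketch for comparison, not generated from the C code).
module RModule            # corresponds to rb_define_module
  class RClass            # corresponds to defining a class under the module
    def rmethod(str)      # corresponds to rb_define_method with 1 argument
      puts str
      nil                 # a C method would return Qnil here
    end
  end
end

include RModule
RClass.new.rmethod("hello")
```

The nesting that Ruby expresses with module and class blocks is what the C version builds step by step by passing the module VALUE to the class definition call and the class VALUE to the method definition call.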


CHAPTER FOUR: MEMORY PROFILER

4.1 What is Memory Profiling?

In software engineering, profiling is a form of dynamic program analysis that measures, for example:
• The space (memory) complexity of a program.
• The time complexity of a program.
• The usage of particular instructions.
• The frequency and duration of function calls.

The most common use of profiling information is to aid program optimization. Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler). Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods.

A memory profiler is a tool for finding memory leaks and optimizing the memory usage of programs. With the help of profiling guides, an automatic memory analyzer, and specialized trackers, you can check that your program has no memory or resource leaks and that its memory usage is as close to optimal as possible.

Memory Profiler is needed for:
• Getting proper statistics for memory usage in a Ruby program.
• Dumping individual objects.
• Knowing the number of bytes reserved by each object.
• Getting a pointer to the memory portion allocated for each object.
• Getting CPU time statistics.
• Getting the sequence of method calls (i.e., which method calls the currently executed method).
• Getting the instance variable table and methods for each class.
• Getting the parameters of a method.


We explored many memory profilers for Ruby, but found that they have several problems: they do not get the exact memory size of each object in the program, they do not specify the type of each object, and they do not get the scope of the object. So we decided to solve these problems in our Memory Profiler by writing an extension to the Ruby source code that gets this information. Our output is a table as follows:

Objects | Type | Memory Size | Scope
Each object in the user program | Class of the object | Memory allocated to each object | Class or method to which it belongs

The implementation of this output is divided into two phases:
Phase one: take each Ruby object and get its class, the memory allocated to it, and the class or method to which it belongs.
Phase two: get all the Ruby objects created or used in the user program, whether they are stored in the heap or on the stack, and output them in the same sequence in which they were created.
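As a rough point of comparison for the first three columns, Ruby's bundled objspace library can already report a class and a size for an object. The sketch below is not our extension: memory_rows is a hypothetical helper, ObjectSpace.memsize_of returns an implementation-dependent figure rather than an exact size, and the Scope column is not covered at all.

```ruby
require "objspace"

# Build rows of [short inspect text, class, reported size in bytes].
# memory_rows is a hypothetical helper name used only in this sketch.
def memory_rows(objects)
  objects.map do |obj|
    [obj.inspect[0, 20], obj.class, ObjectSpace.memsize_of(obj)]
  end
end

samples = ["hello", [1, 2, 3], { key: "value" }, Object.new]

puts format("%-22s %-8s %s", "Object", "Type", "Memory Size")
memory_rows(samples).each do |text, klass, bytes|
  puts format("%-22s %-8s %d", text, klass, bytes)
end
```

The reported sizes vary between Ruby versions and platforms, so no particular numbers should be expected.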


4.2 Phase one: Object dumping

4.2.1 Object Details in the source code:

One of the main concerns of the Memory Profiler is the ability to dump an object and view its contents. When large hierarchies are involved in the Ruby code, it is helpful to understand the memory layout of Ruby objects. We need an algorithm that dumps an object's layout in a useful manner. This algorithm is then used by the all-objects dumper to track memory and detect memory leaks.

In this section, we take you on a journey of developing an Object Dumper algorithm. We assume we have a reference to a Ruby object and are required to return its layout, contents and memory-related details. As a first step, we need to understand how Ruby object structures and functionality are implemented in the source code, since this helps in extracting the required information. This required an exploration of the source code: important files, structs, functions and implementations were explored in preparation for implementing the Ruby Dumper extension. The target file is mainly ruby.h, but we also briefly explore other files and Ruby APIs.

"VALUE" and object struct:

In Ruby, the body of an object is expressed by a struct and always handled via a pointer. A different struct type is used for each class, but the pointer type will always be "VALUE".

The definition of VALUE in ruby.h:

typedef unsigned long VALUE;


The structs, on the other hand, have several variations; a different struct is used based on the class of the object.

struct RObject  : Ordinary objects, i.e. everything not covered by the structs below
struct RClass   : Class object
struct RFloat   : Floating point numbers
struct RString  : String
struct RArray   : Array
struct RRegexp  : Regular expression
struct RHash    : Hash table
struct RFile    : IO, File, Socket, etc.
struct RData    : All the classes defined at C level, except the ones mentioned above
struct RStruct  : Ruby's `Struct` class
struct RBignum  : Big integers

Let's look at the definition of a few object structs. Examples of object structs from ruby.h:

struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            char *ptr;
            union {
                long capa;
                VALUE shared;
            } aux;
        } heap;
        char ary[RSTRING_EMBED_LEN_MAX + 1];
    } as;
};

Another struct:

struct RArray {
    struct RBasic basic;
    union {
        struct {
            long len;
            union {
                long capa;
                VALUE shared;
            } aux;
            const VALUE *ptr;
        } heap;
        const VALUE ary[RARRAY_EMBED_LEN_MAX];
    } as;
};


Another important point to mention is that most object structs start with a member `basic` of type `struct RBasic`. As a result, if you cast this `VALUE` to `struct RBasic*`, you will be able to access the content of `basic`, regardless of the type of struct pointed to by `VALUE`.


Because it is purposefully designed this way, `struct RBasic` must contain very important information for Ruby objects. Here is the definition for `struct RBasic`:

struct RBasic {
    VALUE flags;
    const VALUE klass;
};

`flags` holds multipurpose flags, mostly used to register the struct type (for instance `struct RObject`). The type flags are named `T_xxxx` and can be obtained from a `VALUE` using the macro `TYPE()`. Here is an example:

VALUE str;
str = rb_str_new("", 0);   /* creates a Ruby string (its struct is RString) */
TYPE(str);                 /* the return value is T_STRING */

All the type flags are named `T_xxxx`, like `T_STRING` for `struct RString` and `T_ARRAY` for `struct RArray`. They correspond very straightforwardly to the type names. The other member of `struct RBasic`, `klass`, contains the class this object belongs to. As the `klass` member is of type `VALUE`, what is stored is (a pointer to) a Ruby object. In short, it is a class object.


A. Struct RClass:

In Ruby, classes exist as objects during execution, so there must be a struct for class objects. That struct is `struct RClass`. Its struct type flag is `T_CLASS`. As classes and modules are very similar, there is no need to differentiate their content. That's why modules also use the `struct RClass` struct, and are differentiated by the `T_MODULE` struct flag.

struct RClass {
    struct RBasic basic;
    rb_classext_t *ptr;
    struct st_table *m_tbl;
    struct st_table *iv_index_tbl;
};

First, let's focus on the `m_tbl` (Method TaBLe) member. `struct st_table` is a hash table used everywhere in `ruby`. Basically, it is a table mapping names to objects. In the case of `m_tbl`, it keeps the correspondence between the names (`ID`) of the methods possessed by this class and the method entities themselves.


You can get the superclass of any class using the macro RCLASS_SUPER(c). As its result is a `VALUE`, it's (a pointer to) the class object of the superclass. In Ruby there is only one class that has no superclass (the root class): `BasicObject` since Ruby 1.9 (it was `Object` in earlier versions).
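At the Ruby level the same chain can be followed with Class#superclass. The sketch below uses a hypothetical helper, superclass_chain, and is only an analogy for repeated RCLASS_SUPER() calls: Class#superclass skips included modules such as Kernel, and in Ruby 1.9 and later the chain ends at BasicObject.

```ruby
# Walk the superclass chain of a class until the root class is reached.
# superclass_chain is a hypothetical helper name used only in this sketch.
def superclass_chain(klass)
  chain = []
  while klass
    chain << klass
    klass = klass.superclass   # nil once the root class has been reached
  end
  chain
end

puts superclass_chain(String).inspect   # [String, Object, BasicObject]
```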

There's one thing to be careful about. As `struct RClass` is the struct of a class object, its instance variable table (`iv_tbl`) is for the class object itself. In Ruby programs, it corresponds to something like the following:

class C
  @ivar = "content"
end

Only `T_OBJECT`, `T_MODULE` or `T_CLASS` have an `iv_tbl` member. Some functions in the source code help in fetching the table contents. For example, `rb_obj_instance_variables` in variable.c returns the instance variables of the passed object in an array:

VALUE
rb_obj_instance_variables(VALUE obj)
{
    VALUE ary;

    ary = rb_ary_new();
    rb_ivar_foreach(obj, ivar_i, ary);
    return ary;
}


It uses rb_ivar_foreach, which takes the object, the function to apply to each entry in the table, and a data structure in which to save the results.
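The difference between the instance variables of a class object and those of its instances can be checked from Ruby with instance_variables, the Ruby-level method for listing an object's instance variables. The class C and its variable names below are illustrative only.

```ruby
class C
  @ivar = "content"            # stored in the class object C itself

  def initialize
    @name = "instance"         # stored in each instance of C
  end
end

p C.instance_variables         # [:@ivar]
p C.new.instance_variables     # [:@name]
```

The two lists do not overlap: @ivar belongs to the class object, so instances of C never see it.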

B. Struct RString:

"struct RString" is the struct for the instances of the String class and its subclasses.

#define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1))
struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            char *ptr;
            union {
                long capa;
                VALUE shared;
            } aux;
        } heap;
        char ary[RSTRING_EMBED_LEN_MAX + 1];
    } as;
};

`ptr` is a pointer to the string, `len` is the length of that string, and `capa` is the capacity of the string.
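The `as` union means a short string can be stored inside the struct itself (in `ary`), while a longer one lives in separately allocated memory reached through `heap.ptr`. The arithmetic of RSTRING_EMBED_LEN_MAX can be worked through in Ruby; this is a sketch under assumptions: rstring_embed_len_max is a hypothetical helper, sizeof(VALUE) is taken to be the native pointer size, and newer Ruby versions use different embedding limits than the macro shown here.

```ruby
require "objspace"

# Mirrors the macro RSTRING_EMBED_LEN_MAX for a given sizeof(VALUE);
# sizeof(char) is 1 by definition in C.
def rstring_embed_len_max(sizeof_value)
  (sizeof_value * 3) / 1 - 1
end

# Size of a VALUE in bytes, taken here as the native pointer size (assumption).
sizeof_value = [0].pack("J").bytesize
puts "sizeof(VALUE) = #{sizeof_value}"
puts "RSTRING_EMBED_LEN_MAX = #{rstring_embed_len_max(sizeof_value)}"

short = "a" * 8
long  = "a" * 10_000
puts "short string: #{ObjectSpace.memsize_of(short)} bytes"
puts "long string:  #{ObjectSpace.memsize_of(long)} bytes"
```

With an 8-byte VALUE the macro gives 23 characters, and with a 4-byte VALUE it gives 11. The long string reports a much larger size because its characters no longer fit inside the struct.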

C. Struct RArray:

#define RARRAY_EMBED_LEN_MAX 3
struct RArray {
    struct RBasic basic;
    union {
        struct {
            long len;
            union {
                long capa;
                VALUE shared;
            } aux;
            const VALUE *ptr;
        } heap;
        const VALUE ary[RARRAY_EMBED_LEN_MAX];
    } as;
};

The layout mirrors RString, except that ptr points to the content of the array (its elements, each a VALUE), and len is the number of elements. aux is exactly the same as in struct RString: aux.capa is the "real" length of the memory pointed to by ptr, and if ptr is shared, aux.shared stores the shared original array object.

D. Struct RRegexp:

struct RRegexp {
    struct RBasic basic;
    struct re_pattern_buffer *ptr;
    const VALUE src;
    unsigned long usecnt;
};

“ptr” is the compiled regular expression.


E. Struct RHash:

struct RHash {
    struct RBasic basic;
    struct st_table *ntbl;      /* possibly 0 */
    int iter_lev;
    const VALUE ifnone;
};

It's a wrapper for `struct st_table`. `st_table` will be detailed later on. `ifnone` is the value returned when a key does not have an associated value; its default is `nil`. `iter_lev` tracks the nesting level of iterations in progress over the table, which is used to make the hash table reentrant.

F. Struct RFile:

struct RFile {
    struct RBasic basic;
    struct rb_io_t *fptr;
};

Where rb_io_t is a struct defined in io.h :

typedef struct rb_io_t {
    int fd;                     /* file descriptor */
    FILE *stdio_file;           /* stdio ptr for read/write if available */
    int mode;                   /* mode flags: FMODE_XXXs */
    rb_pid_t pid;               /* child's pid (for pipes) */
    int lineno;                 /* number of lines read */
    VALUE pathv;                /* pathname for file */
    void (*finalize)(struct rb_io_t*,int); /* finalize proc */

    rb_io_buffer_t wbuf, rbuf;

    VALUE tied_io_for_writing;

    /*
     * enc  enc2 read action                      write action
     * NULL NULL force_encoding(default_external) write the byte sequence of str
     * e1   NULL force_encoding(e1)               convert str.encoding to e1
     * e1   e2   convert from e2 to e1            convert str.encoding to e2
     */
    struct rb_io_enc_t {
        rb_encoding *enc;
        rb_encoding *enc2;
        int ecflags;
        VALUE ecopts;
    } encs;

    rb_econv_t *readconv;
    rb_io_buffer_t cbuf;

    rb_econv_t *writeconv;
    VALUE writeconv_asciicompat;
    int writeconv_pre_ecflags;
    VALUE writeconv_pre_ecopts;
    int writeconv_initialized;

    VALUE write_lock;
} rb_io_t;

All the details of RFile are defined in io.h, as they are not used much outside the IO implementation.


G. Struct RData: “struct RData” has a different tenor from what we saw before. It is the struct for implementation of extension libraries. Of course structs for classes created in extension libraries are necessary, but as the types of these structs depend on the created class, it’s impossible to know their size or struct in advance. That’s why a “struct for managing a pointer to a user defined struct” has been created on ruby’s side to manage this. This struct is struct RData.

struct RData {
    struct RBasic basic;
    void (*dmark)(void*);
    void (*dfree)(void*);
    void *data;
};

"data" is a pointer to the user-defined struct, "dfree" is the function used to free that user-defined struct, and "dmark" is the function that performs the "mark" step of mark and sweep. Because explaining "struct RData" is still too complicated, for the time being let's just look at its representation (the following figure).

The other structs are straightforward: each has only an RBasic member and its main contents.


A VALUE is not always a pointer. The 6 cases in which VALUE is not a pointer are the following:

A. Small integers
B. Symbols
C. "true"
D. "false"
E. "nil"
F. "Qundef"

A. Small integers:

All data are objects in Ruby, thus integers are also objects. But since there are so many kinds of integer objects, expressing each of them as a struct would risk slowing down execution significantly. For example, when incrementing from 0 to 50000, we would hesitate to create 50000 objects for only that purpose. On a 32-bit machine these integers have 1 bit for the sign and 30 bits for the integer part; the remaining bit of the VALUE marks it as a small integer. Integers in this range belong to the `Fixnum` class and the other integers belong to the `Bignum` class.

Also, to convert `int` or `long` to `VALUE`, we can use macros like `INT2NUM`. Any conversion macro `XXXX2XXXX` with a name containing `NUM` can manage both `Fixnum` and `Bignum`. For example, `NUM2INT` converts both `Fixnum` and `Bignum` to `int`. If the number can't fit in an `int`, an exception is raised, so there is no need to check the value range.
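On MRI the embedding of small integers can be observed from Ruby: for a small integer, object_id is derived from the VALUE itself, which is the integer shifted left by one bit with the lowest bit set to 1. This is a sketch based on that MRI behavior, with tagged_value as a hypothetical helper name; other Ruby implementations may number objects differently.

```ruby
# For small integers on MRI, the VALUE is the integer shifted left by one bit
# with the lowest bit set to 1. tagged_value is a hypothetical helper name.
def tagged_value(n)
  (n << 1) | 1
end

[0, 1, 2, 50_000].each do |n|
  puts "#{n}: object_id=#{n.object_id} tagged=#{tagged_value(n)}"
end
```

Because the lowest bit is always 1, these values are odd and can never collide with pointers to structs, which are aligned to even addresses.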

B. Symbols:

What are symbols? As this question is quite troublesome to answer, let's start with the reasons why symbols were necessary. In the first place, there's a type named ID used inside Ruby. Here it is, as defined in ruby.h:

typedef unsigned long ID;


This ID is a number having a one-to-one association with a string. However, it's not possible to have an association between all strings in this world and numerical values; the association is limited to the one-to-one relationships inside one Ruby process. Symbol objects are used a lot, especially as keys for hash tables. That's why Symbol, like Fixnum, was made embedded in VALUE.
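The one-to-one association between a name and its symbol is visible from Ruby: the same name always yields the same Symbol object, while equal strings are separate objects. The variable names below are illustrative only.

```ruby
a = :name
b = "name".to_sym
s1 = String.new("name")
s2 = String.new("name")

puts a.equal?(b)                  # true: one Symbol object per name
puts a.object_id == b.object_id   # true
puts s1 == s2                     # true: the contents are equal
puts s1.equal?(s2)                # false: two distinct String objects
```

This is what makes symbols cheap hash keys: comparing two symbols is an identity check rather than a character-by-character comparison.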

C. "true", "false", "nil"

These three are Ruby special objects. `true` and `false` represent the boolean values. `nil` is an object used to denote that there is no object. Their values at the C level are defined like this in ruby.h:

#define Qfalse 0 /* Ruby's false */
#define Qtrue  2 /* Ruby's true */
#define Qnil   4 /* Ruby's nil */

This time the values are even numbers, but as small values such as 0 or 2 can't be used by pointers, they can't overlap with any other VALUE. This is because the first block of virtual memory is usually not allocated, so that programs dereferencing a NULL pointer crash. And as Qfalse is 0, it can also be used as false at the C level. In practice, when a function in Ruby returns a boolean value, it is often made to return an int or a VALUE, and returns Qtrue/Qfalse. For Qnil, there is a dedicated macro to check whether a VALUE is Qnil or not, NIL_P(). NIL_P() in ruby.h:

#define NIL_P(v) ((VALUE)(v) == Qnil)


The name ending with p is a notation coming from Lisp denoting that it is a function returning a boolean value. In other words, NIL_P means “is the argument nil?” It seems the “p” character comes from “predicate.” This naming rule is used at many different places in Ruby.

D. “Qundef”
In ruby.h:
#define Qundef 6    /* undefined value for placeholder */

This value is used to express an undefined value in the interpreter. It can’t (must not) be found at all at the Ruby level.

“st_table”
st_table has already appeared several times as a method table and an instance table. st_table is a hash table. It is a data structure that records one-to-one relations, for example, a variable name and its value, or a function name and its body, etc. The following is the data type of st_table in st.h:

struct st_table {
    const struct st_hash_type *type;
    st_index_t num_bins;
    unsigned int entries_packed : 1;
#ifdef __GNUC__
    /*
     * C spec says,
     *   A bit-field shall have a type that is a qualified or unqualified
     *   version of _Bool, signed int, unsigned int, or some other
     *   implementation-defined type. It is implementation-defined whether
     *   atomic types are permitted.
     * In short, long and long long bit-field are implementation-defined
     * feature. Therefore we want to supress a warning explicitly.
     */
    __extension__
#endif
    st_index_t num_entries : ST_INDEX_BITS - 1;
    union {
        struct {
            struct st_table_entry **bins;
            struct st_table_entry *head, *tail;
        } big;
        struct {
            struct st_packed_entry *entries;
            st_index_t real_entries;
        } packed;
    } as;
};

struct st_table_entry in st.c:

typedef struct st_table_entry st_table_entry;

struct st_table_entry {
    st_index_t hash;
    st_data_t key;
    st_data_t record;
    st_table_entry *next;
    st_table_entry *fore, *back;
};

st_table is the main table structure. st_table_entry is a holder that stores one value. st_table_entry contains a member called `next` which of course is used to make st_table_entry into a linked list. This is the chain part of the chaining method.

To get st_table size, st_memsize(const st_table *table) in st.c would be used as it takes the st_table and returns its size.
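The chaining idea behind st_table and st_table_entry can be illustrated with a toy table. This is only a sketch of the general technique: `Entry`, `ToyTable` and their methods are hypothetical names that loosely echo the C fields above, and the real st.c differs in many details (packed entries, rehashing, the fore/back list).

```ruby
# Toy chained hash table; the names are illustrative and only echo st_table.
Entry = Struct.new(:key, :record, :next_entry)

class ToyTable
  attr_reader :num_entries

  def initialize(num_bins = 4)
    @bins = Array.new(num_bins)   # each bin holds the head of a chain (or nil)
    @num_entries = 0
  end

  def insert(key, record)
    entry = find(key)
    if entry
      entry.record = record       # key already present: overwrite its record
    else
      idx = bin_index(key)
      # A new entry is linked in front of the existing chain of its bin.
      @bins[idx] = Entry.new(key, record, @bins[idx])
      @num_entries += 1
    end
  end

  def lookup(key)
    entry = find(key)
    entry && entry.record
  end

  private

  def bin_index(key)
    key.hash % @bins.size
  end

  # Walk the chain of one bin by following the next pointers.
  def find(key)
    entry = @bins[bin_index(key)]
    entry = entry.next_entry while entry && entry.key != key
    entry
  end
end

table = ToyTable.new(2)           # three keys in two bins force a chain
%w[x y z].each_with_index { |name, i| table.insert(name, i) }
puts table.lookup("y")
puts table.num_entries
```

With only two bins and three keys, at least two entries must share a bin, so the lookup has to follow a `next_entry` link for one of them.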

4.2.2 Using API for dumping objects:
A. Objspace
The objspace library extends the ObjectSpace module and adds several methods for getting internal statistics about object and memory management. This library is for (memory) profiler developers and MRI developers who need to know about MRI memory usage. The ObjectSpace module contains a number of routines that interact with the garbage collection facility and allow you to traverse all living objects with an iterator. ObjectSpace also provides support for object finalizers, procs that will be called when a specific object is about to be destroyed by garbage collection. Here we give a brief explanation of the objspace library and the functions in it that are useful to our memory profiler:


• static VALUE allocation_class_path(VALUE self, VALUE obj)
If we have a class with some objects in it and we need to know to which class an object belongs, this function is useful, as it returns the class of the given object (it helps in getting the scope).
• static VALUE allocation_method_id(VALUE self, VALUE obj)
Returns the method identifier for the given object.
• static VALUE memsize_of_m(VALUE self, VALUE obj)
Gets the size of an object in memory.
• static VALUE memsize_of_all_m(int argc, VALUE *argv, VALUE self)
Returns the total memory size of all living objects, or, when a class is given, the total memory size of the instances of that class.
• static VALUE count_objects_size(int argc, VALUE *argv, VALUE os)
Counts the size of objects (in bytes) for each internal type, such as hash or class.
• static VALUE reachable_objects_from(VALUE self, VALUE obj)
Returns all objects reachable from “obj”. If “obj” has two or more references to the same object “x”, the returned array includes “x” only once. If “obj” is a non-markable (non-heap-managed) object such as true, false, nil, a symbol or a Fixnum, it simply returns nil. If “obj” has references to an internal object, it returns instances of the ObjectSpace::InternalObjectWrapper class. Such an object contains a reference to an internal object, and you can check the type of the internal object with the “type” method. If “obj” is itself an instance of ObjectSpace::InternalObjectWrapper, this method returns all objects reachable from the internal object that “obj” points to. With this method, you can find memory leaks.


4.3 Step By Step to Implement Object Dumper:
In this part we discuss the steps, from scratch, that lead to our final output, the memory profiler. We organized this section as a guide for beginners in this field, so that they become familiar with every detail. The section presents the Object Dumper extension piece by piece, with the output of each piece and a detailed explanation of it. When we started thinking of memory profiling as dumping Ruby objects (assuming that we already have the object; see the phase two part), we divided our target into three main parts:
• Getting the type of each object
• Getting the related or important information about the object
• Getting the memory size of objects
We saw in chapter three how to create a simple extension; we use the same structure here, with different functionality to suit our purpose. We assume that the extension is named Object Dumper, with a C file named myext.c, a test file named test.rb and a configuration file named extconf.rb. You can follow the steps for creating an extension from chapter three, using these file names, and copy and paste our code.


4.3.1 Getting the data type of each object:
The purpose is to get the type of the given object as shown in this table:

Input
“memory profiler”
[10, “Something”, 102.5]
There are two ways to get the type: the macro “TYPE” or the macro “CLASS_OF”.

Using “TYPE”: Go to myext.c in your extension and type the following code:

#include "ruby.h"

static VALUE SimpleDebugger(VALUE self, VALUE obj)
{
    int type = TYPE(obj);   /* integer tag identifying the object's internal type */
    switch (type) {
    case T_STRING:
        printf("String \n");
        break;
    case T_ARRAY:
        printf("ARRAY \n");
        break;
    case T_COMPLEX:
        printf("Complex number\n");
        break;
    case T_MODULE:
        printf("module\n");
        break;
    /* list all types with their handling */
    /* . . . */
    }
    return Qnil;
}

void Init_myext()
{
    VALUE klass;
    klass = rb_define_class("Debug", rb_cObject);
    rb_define_method(klass, "SimpleDebugger", SimpleDebugger, 1);
    rb_define_method(rb_cObject, "SimpleDebugger", SimpleDebugger, 1);
}

We use “TYPE”, which is found in the “ruby.h” file in the source code. This macro gets the internal type number of each object, which tells us which built-in type the object belongs to, such as T_HASH, T_ARRAY, etc. To better understand “TYPE”, here is its definition in the source code:

#define TYPE(x) rb_type((VALUE)(x))


Using “CLASS_OF”: CLASS_OF returns the class object that represents the type of the object. You would call it in the SimpleDebugger function in your extension like this:

static VALUE SimpleDebugger(VALUE self, VALUE obj)
{
    VALUE L = CLASS_OF(obj);
    return L;
}

The Ruby test file for the 2 previous extensions would be like this:

require_relative 'myext'
debug = Debug.new
str = "memory profiler"
debug.SimpleDebugger str
puts Object.SimpleDebugger str

Output in both extensions:
String

We use CLASS_OF, which is found in the “ruby.h” file in the source code. This macro gets the class that each object belongs to directly. Here is the definition of CLASS_OF in the source code:

#define CLASS_OF(v) rb_class_of((VALUE)(v))
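For the inputs used in this section, the class can also be checked from plain Ruby with Object#class, which is a convenient way to verify what the extension should print. The sketch below uses illustrative inputs. As far as we understand, CLASS_OF reads the class pointer stored in the object itself, so for an object with singleton methods it may return the hidden singleton class, which Object#class skips.

```ruby
inputs = ["memory profiler", [10, "Something", 102.5], Complex(2, 3), { a: 1 }]
inputs.each { |obj| puts "#{obj.inspect} -> #{obj.class}" }

# An object with a singleton method gets a hidden singleton class in front
# of its ordinary class; Object#class does not report it.
str = String.new("memory profiler")
def str.shout
  upcase
end

puts str.class
puts str.singleton_class.inspect
```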


4.3.2 Dumping each data type:
The next stage is to get the important information related to each object. One way to get this information is by calling macros, as for String and Regexp. Here is a simple example of how to get the length of a string and a pointer to the string using macros already defined in the source code. This method also applies to Regexp. In the SimpleDebugger function in myext.c:

static VALUE SimpleDebugger(VALUE self, VALUE obj)
{
    int type = TYPE(obj);
    long s_len;
    char *ptr;
    switch (type) {
    case T_STRING:
        printf("String \n");
        s_len = RSTRING_LEN(obj);
        printf("length of string = %ld\n", s_len);
        ptr = RSTRING_PTR(obj);
        printf("pointer on string = 0x%lx -> \"%s\"\n", (VALUE)ptr, ptr);
        break;
    }
    return Qnil;
}

In test.rb:
require_relative 'myext'
debug = Debug.new
str = "memory profiler"
debug.SimpleDebugger str
puts Object.SimpleDebugger str


Output:
String
length of string = 15
pointer on string = 0xef59d0 -> "Memory Profiler"

We use macros implemented in the ruby.h file of the source code that relate to strings, such as:
• RSTRING_LEN(obj)
This macro gets the length of the input string from the RString struct:

#define RSTRING_LEN(str) \
    (!(RBASIC(str)->flags & RSTRING_NOEMBED) ? \
     RSTRING_EMBED_LEN(str) : \
     RSTRING(str)->as.heap.len)

• RSTRING_PTR(obj)
This macro gets the pointer to the location of the string in memory from the RString struct:

#define RSTRING_PTR(str) \
    (!(RBASIC(str)->flags & RSTRING_NOEMBED) ? \
     RSTRING(str)->as.ary : \
     RSTRING(str)->as.heap.ptr)


Another way to get the important information is to read it directly from the structs, as in the case of RCOMPLEX, RRATIONAL and RHASH. Here is the example of a complex number and how to get its important information. In the SimpleDebugger function in myext.c:

static VALUE SimpleDebugger(VALUE self, VALUE obj)
{
    int type = TYPE(obj);
    int real, imag;
    switch (type) {
    case T_COMPLEX:
        /* FIX2INT assumes both parts are Fixnums, as in Complex(2,3) */
        real = FIX2INT(RCOMPLEX(obj)->real);
        printf("real is %d\n", real);
        imag = FIX2INT(RCOMPLEX(obj)->imag);
        printf("imaginary is %d\n", imag);
        printf("Complex");
        break;
    }
    return Qnil;
}

In test.rb:
require_relative 'myext'
debug = Debug.new
str = Complex(2,3)
puts Object.SimpleDebugger str


Output:
real is 2
imaginary is 3
Complex

In the case of complex numbers, this is how the struct looks in ruby.h in the source code:

struct RComplex {
    struct RBasic basic;
    VALUE real;
    VALUE imag;
};

To access “VALUE real” in the example we use “->”, as in “RCOMPLEX(obj)->real”. As you can see from the code, we use “FIX2INT”, which converts a Fixnum VALUE to a C int so that it is displayed correctly. While accessing data from structs, we noticed that some structs contain unions. A union is like a structure in which all members are stored at the same address, so only one member holds a meaningful value at a time and the union is as large as its largest member. This lets different kinds of data share the same fixed-size storage. Just like with structures, the members of unions can be accessed with the . and -> operators.


Here is the example of arrays, which brings us to another way of accessing information: going through the unions inside the struct, as in the case of Array, Object, Class and Bignum. This extension gets the capacity and length of an array and a pointer to its location in memory.

static VALUE SimpleDebugger(VALUE self, VALUE obj)
{
    int type = TYPE(obj);
    long capacity, sharing, length;
    VALUE *pointer;
    switch (type) {
    case T_ARRAY:
        /* as.heap is only meaningful for arrays that are not embedded */
        capacity = RARRAY(obj)->as.heap.aux.capa;
        printf("capacity of the array is %ld\n", capacity);
        /* aux is a union: shared occupies the same storage as capa */
        sharing = (long)RARRAY(obj)->as.heap.aux.shared;
        printf("shared value of the array is %ld\n", sharing);
        length = RARRAY(obj)->as.heap.len;
        printf("length of the array is %ld\n", length);
        pointer = RARRAY(obj)->as.heap.ptr;
        printf("pointer on array = 0x%lx ", (VALUE)pointer);
        break;
    }
    return Qnil;
}

In test.rb:
require_relative 'myext'
debug = Debug.new
str = [2, 3, 32333, "memory", 35768899, 77777, 98787]
puts Object.SimpleDebugger str


Output:
capacity of the array is 7
shared value of the array is 7
length of the array is 7
pointer on array = 0x25b23a0
size of Array in memory = 56
Array

In the case of arrays, this is how the struct with its unions looks in the source code:

struct RArray {
    struct RBasic basic;
    union {
        struct {
            long len;
            union {
                long capa;
                VALUE shared;
            } aux;
            VALUE *ptr;
        } heap;
        VALUE ary[RARRAY_EMBED_LEN_MAX];
    } as;
};


Another important extension is the part related to classes. On our way to getting the information related to each type, classes were a big problem for us because of their instance variables, which are stored in the instance variable table. We tried many ways to access this table and get its contents, and we finally succeeded, as this extension shows:

static VALUE SimpleDebugger(VALUE self, VALUE obj)
{
    int type = TYPE(obj);
    switch (type) {
    case T_CLASS:
        /* INT2FIX shifts its argument left by one bit and sets the lowest
           bit, so the values printed here are tagged, not raw addresses */
        printf("pointer of the Class is = 0x%lx", (INT2FIX(RCLASS(obj)->ptr)));
        printf("pointer of the Class method table is = 0x%lx ", (INT2FIX(RCLASS(obj)->m_tbl)));
        printf("pointer of the Class index table is = 0x%lx", (INT2FIX(RCLASS(obj)->iv_index_tbl)));
        printf("pointer of the Class type is = 0x%lx", (INT2FIX(RCLASS(obj)->m_tbl->type)));
        printf("pointer of the Class no_of_bins is = 0x%lx", (INT2FIX(RCLASS(obj)->m_tbl->num_bins)));
        printf("pointer of the Class is = 0x%lx", (INT2FIX(RCLASS(obj)->m_tbl->num_entries)));
        break;
    }
    /* Return the content of the class's instance variable table */
    return rb_obj_instance_variables(obj);
}


In test.rb:
require_relative 'myext'
require 'objspace'
class BigNumbers
end
class Numbers < BigNumbers
  @x = 2;
  @y = 4;
  def getno
  end
end
puts Object.SimpleDebugger Numbers

Output:
pointer of the Class is = 0xff8126465881
pointer of the Class method table is = 0xff8126465981
pointer of the Class index table is = 0x1
pointer of the Class type is = 0xff812352cde1
pointer of the Class no_of_bins is = 0x25
pointer of the Class is = 0x3
@x
@y


4.3.3 Getting Memory Size:
In this part, our main concern is getting the memory size used by each object. To get the memory size we need to use functions that are already partly implemented in the “objspace.c” file. We were using the Ruby 2.0 source code, and when we tried to call objspace’s functions from our extension in the usual way, we faced a problem: this file does not have a header file, so its functions cannot be called from outside. The next extension gets the memory size of a Hash. The same approach applies to all other types, as illustrated in our complete extension. In myext.c, the SimpleDebugger function:

static VALUE SimpleDebugger(VALUE self, VALUE obj)
{
    size_t size = 0;
    size += st_memsize(RHASH(obj)->ntbl);
    printf("size of Hash in memory = %zu\n", size);
    return Qnil;
}

In test.rb:
require_relative 'myext'
debug = Debug.new
str = {}
str["memory"] = 10
str["profiler"] = 20
puts Object.SimpleDebugger str


Output: size of Hash in memory = 192

To get the size, we first need to include the “st.h” file, because it declares the function “st_memsize”. This function gives us the size of any st_table. In our case of Hash, we first access the hash's table (ntbl), which contains the keys and values of the hash, and then pass this table to st_memsize to get its size.

In Ruby 2.1, the objspace API was integrated more closely with the core source code: these implementations were moved to gc.c. Now we can call, directly from our extension, a function that gets the size for all types. The function implemented in gc.c is:

rb_obj_memsize_of(VALUE obj)

We called this function directly from the extension without including any files.
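The same measurement can be cross-checked from Ruby through ObjectSpace.memsize_of, which is handy for validating the numbers printed by the extension. The sketch below uses illustrative hashes; the byte counts depend on the Ruby version and platform.

```ruby
require 'objspace'

empty = {}
small = { "memory" => 10, "profiler" => 20 }
large = {}
10_000.times { |i| large[i] = i }

[empty, small, large].each do |h|
  puts "#{h.size} entries -> #{ObjectSpace.memsize_of(h)} bytes"
end
```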


CHAPTER FIVE: RUBY’S MEMORY
In this chapter we take a closer look at Ruby's heap and stack: how they are implemented, how to access them, and how we used them to build a memory profiler.

5.1 Ruby’s Heap and Stack
5.1.1 Ruby objects:
Almost everything in Ruby is an object. Everything: strings, variables, classes and even procedures and the AST. For instance, we have the following structs in Ruby:
RFloat: represents a float value in the language.
RString: represents a string value in the language.
RClass: represents a class in the language.
…etc
All those types are actually child types of a single parent type: RBasic. This is how RClass and RFloat are defined:

struct RClass {
    struct RBasic basic;
    struct st_table *iv_tbl;
    struct st_table *m_tbl;
    VALUE super;
};

struct RFloat {
    struct RBasic basic;
    double value;
};


Each type has an RBasic field as the first field in the struct. This means that each type can essentially be cast to RBasic. So far so good: this is actually a fairly standard way to implement inheritance in C. All those types can be cast to an RVALUE struct, which is defined roughly as follows:

typedef struct RVALUE {
    union {
        struct {
            VALUE flags;    /* always 0 for freed obj */
            struct RVALUE *next;
        } free;
        struct RBasic basic;
        struct RObject object;
        struct RClass klass;
        struct RFloat flonum;
        struct RString string;
        struct RArray array;
        struct RRegexp regexp;
        struct RHash hash;
        struct RData data;
        struct RTypedData typeddata;
        struct RStruct rstruct;
        struct RBignum bignum;
        struct RFile file;
        struct RNode node;
        struct RMatch match;
        struct RRational rational;
        struct RComplex complex;
    } as;
#ifdef GC_DEBUG
    const char *file;
    int line;
#endif
} RVALUE;


This is a big union. An RVALUE can be cast to an RBasic, RClass, RFloat, RString, etc. Since they share one union, they all occupy the same size in a heap slot.

5.1.2 The Heap:
A heap is essentially a big array of RVALUE slots. Every time a new object is created, it is added as an RVALUE to a data structure called the heap. Ruby does not have only one heap: at the beginning it creates a heap of free slots of a certain limited size, and each time a new object is created it is put in one of the free slots in the heap. Some slots in the heap are empty; a slot is considered empty if its flags field is 0. All the free slots from the different heaps form a free list, and the free list variable points to the beginning of this list. When the heap is filled, another heap is created which is bigger than the previous heap by a factor called GC_HEAP_GROWTH_FACTOR, which is defined to be 1.8. So each newly created heap is 1.8 times the size of the previous one.
When is the heap filled? For efficient use of memory, Ruby always recycles the chunks of memory that hold unused objects, through a process called garbage collection. If, after garbage collection has been applied to the heap, the heap still has no empty slots, then the heap is completely filled and a new heap is created (actually, many heaps are created at a time). So let's talk about the garbage collection operation.
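The growth rule described above is simple arithmetic and can be sketched as follows. This is a simplified model of the text, not the allocator itself: `INITIAL_SLOTS` is a hypothetical starting size chosen for illustration, and `heap_sizes` is a helper written for this handbook.

```ruby
# Hypothetical starting size; the real initial slot count depends on the Ruby
# version and build, so 10_000 is only an assumption for illustration.
INITIAL_SLOTS = 10_000
GROWTH_FACTOR = 1.8   # the GC_HEAP_GROWTH_FACTOR value quoted in the text

# Size of each successive heap if every new heap is 1.8 times the previous one.
def heap_sizes(count, initial = INITIAL_SLOTS, factor = GROWTH_FACTOR)
  sizes = [initial]
  (count - 1).times { sizes << (sizes.last * factor).round }
  sizes
end

heap_sizes(5).each_with_index do |slots, i|
  puts "heap #{i + 1}: #{slots} slots"
end
```

Under this model the fifth heap is already more than ten times the size of the first, which shows how quickly the geometric growth reduces the number of times new heaps must be created.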

5.1.3 Garbage Collection: As mentioned when the heap has no empty slots, which happens very frequently, Ruby runs its “garbage collection” (GC) code. The garbage collector’s job is to find which of these RValue’s are no longer being referenced by your program and can be recycled and reused for some other value. Here’s how it works, at a high level.


The objects pointed to by global variables and the objects on the stack of a language are surely necessary. And objects pointed to by instance variables of these objects are also necessary. Furthermore, the objects that are reachable by following links from these objects are also necessary. These necessary objects are all objects which can be reached recursively via links from the “surely necessary objects” as the start points. They are called live objects. This is depicted in the following figure:

Figure: live objects

Ruby uses the mark-and-sweep strategy of garbage collection. First, it puts “marks” on the root objects. Taking them as the start points, it puts “marks” on all reachable objects. This is the mark phase. Once there is no reachable object left to mark, it checks all objects in the object pool, and all objects that have not been marked are “swept” into a single linked list using the “next” pointer in each RVALUE structure.


Once all the reachable RVALUEs in the system are marked, the remaining RVALUEs are “swept” into a single linked list (the free list) using the “next” pointer in each RVALUE structure, as shown in the next figure, where the marked slots carry the letter M. If, after the mark phase, no slots are found to sweep, new heaps are created.

Figure: Marking Heap
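The two phases can be sketched as a toy model. This is a simplified illustration of mark and sweep in general, not the code in gc.c: `Slot`, `ToyHeap` and their methods are hypothetical names, and the recursion in `mark` stands in for Ruby's real marking machinery.

```ruby
# Toy mark-and-sweep over a fixed array of slots. Slot, ToyHeap and the
# method names are illustrative and are not taken from gc.c.
Slot = Struct.new(:name, :refs, :marked, :next_free)

class ToyHeap
  def initialize(slots)
    @slots = slots
    @free_list = nil
  end

  # Mark phase: follow references recursively from a root object.
  def mark(slot)
    return if slot.marked
    slot.marked = true
    slot.refs.each { |child| mark(child) }
  end

  # Sweep phase: link every unmarked slot into the free list via next_free,
  # and clear the marks of the survivors for the next collection.
  def sweep
    @slots.each do |slot|
      if slot.marked
        slot.marked = false
      else
        slot.refs = []
        slot.next_free = @free_list
        @free_list = slot
      end
    end
  end

  def collect(roots)
    @free_list = nil
    roots.each { |root| mark(root) }
    sweep
  end

  # Names of the slots currently on the free list, head first.
  def free_names
    names = []
    slot = @free_list
    while slot
      names << slot.name
      slot = slot.next_free
    end
    names
  end
end

c = Slot.new("c", [], false, nil)
b = Slot.new("b", [c], false, nil)
a = Slot.new("a", [b], false, nil)   # root -> a -> b -> c
d = Slot.new("d", [a], false, nil)   # refers to a, but nothing refers to d
e = Slot.new("e", [], false, nil)    # unreferenced

heap = ToyHeap.new([a, b, c, d, e])
heap.collect([a])
p heap.free_names
```

Slot d shows that holding a reference to a live object does not keep an object alive; only being reachable from a root does.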

5.1.4 Stack:
A general C program has the following parts in its memory space:
1. The text area: where the code lies.
2. A place for static and global variables.
3. The machine stack: arguments and local variables of functions pile up in the machine stack.
4. The heap: the place allocated by `malloc()`.
In this section we are interested in the machine stack, so let’s have a closer look at it.
Machine Stack: The machine stack is a memory area managed with the properties of the stack data structure. The stack consists of a number of frames. One stack frame corresponds to one function call. Each time a procedure is called, the information necessary to execute it, such as the local variable space and the place to return to, is stored in a stack frame that is pushed on the stack. When returning from a procedure, the frame on the top of the stack is popped and the state is restored to that of the previous method.


For a typical procedural language (such as C), what changes during execution is only the stack; the program itself remains unchanged wherever it is. The execution of Ruby is also basically nothing but chained calls of methods, which are procedures. C has only one stack, while Ruby in its early versions had seven stacks; by simple arithmetic, the executing image of Ruby is at least seven times more complicated than C. But it is actually not seven times at all, it’s at least twenty times more complicated.

Stack Pointer   Stack Frame Type    Description
ruby_frame      struct FRAME        The records of method calls.
ruby_scope      struct SCOPE        The local variable scope.
ruby_block      struct BLOCK        The block scope.
ruby_iter       struct iter         Whether or not the current FRAME is an iterator.
ruby_class      VALUE               The class to define methods on.
ruby_cref       NODE (NODE_CREF)    The class nesting information.

As we can see from the table, each stack has a different pointer name and a different frame type whose name somehow describes its function.


The basic stack is the one consisting of frames of type FRAME, and its function is to record method calls. But the question is: why does Ruby need more than one stack? A simple answer is that Ruby has many capabilities that C doesn't have, and to achieve these capabilities Ruby needed more stacks. For example, Ruby has something called closures. A closure is a function, or a reference to a function, together with a referencing environment: a table storing a reference to each of the non-local variables of that function. A closure allows a function to access those non-local variables even when invoked outside its immediate lexical scope. Example (in Python syntax):

def counter():
    x = 0
    def increment(y):
        nonlocal x
        x += y
        print(x)
    return increment

In the previous example the function increment uses the non-local variable x. If increment(y) is called inside counter(), counter is still on the stack, as it hasn't returned yet. But what happens if increment(y) is called outside counter()? How would it get the value of x left by the last execution of counter, if counter is no longer on the stack? This is one of the reasons why Ruby needed more stacks: to keep track of some values after they become unavailable on the machine stack. When Ruby version 1.9 was released, major changes were made to the Ruby core code; the main change was the introduction of the YARV machine.
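The same closure can be written in Ruby itself. A minimal sketch, assuming a lambda is used to capture the local variable (a nested `def` in Ruby would not see `x`); the names simply follow the example above.

```ruby
# counter returns a lambda that keeps using the local variable x
# after counter itself has returned.
def counter
  x = 0
  lambda do |y|
    x += y
    x
  end
end

increment = counter
puts increment.call(1)
puts increment.call(5)

# Each call to counter creates a fresh x, so this closure starts from 0 again.
other = counter
puts other.call(10)
```

The value of x survives between the two calls to `increment` even though the frame of `counter` was popped long before, which is exactly the situation a single machine stack cannot represent.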


“YARV (Yet Another Ruby VM) is a bytecode interpreter that was developed for the Ruby programming language by Koichi Sasada. The goal of the project was to greatly reduce the execution time of Ruby programs.”
After the introduction of YARV, the implementation of the stack changed completely, but the main concepts were retained. Now we will talk about YARV's internal stack and how it differs from Ruby's stack. YARV is a stack-oriented virtual machine: it uses a stack internally to keep track of intermediate values, arguments and return values. But alongside YARV’s internal stack, Ruby also keeps track of your Ruby program’s call stack: which methods called which other methods, functions, blocks, lambdas, etc. The Ruby program’s call stack has the same structure as the stack we described before, which means it consists of frames. These frames are called rb_control_frame_t and are defined in the vm_core.h file as the following struct:

typedef struct rb_control_frame_struct {
    VALUE *pc;                      /* cfp[0] */
    VALUE *sp;                      /* cfp[1] */
    rb_iseq_t *iseq;                /* cfp[2] */
    VALUE flag;                     /* cfp[3] */
    VALUE self;                     /* cfp[4] / block[0] */
    VALUE klass;                    /* cfp[5] / block[1] */
    VALUE *ep;                      /* cfp[6] / block[2] */
    rb_iseq_t *block_iseq;          /* cfp[7] / block[3] */
    VALUE proc;                     /* cfp[8] / block[4] */
    const rb_method_entry_t *me;    /* cfp[9] */
#if VM_DEBUG_BP_CHECK
    VALUE *bp_check;                /* cfp[10] */
#endif
} rb_control_frame_t;


The Ruby stack represents the path YARV has taken through your Ruby program and its current location. The CFP pointer is the “current frame pointer.” Each stack frame in your Ruby program's stack contains, in turn, a different value for the self, PC and SP registers; we will explain more about them shortly. The type field in each rb_control_frame_t structure indicates what type of code is running at this level of your Ruby call stack. As Ruby calls the methods, blocks or other structures in your program, the type might be set to METHOD, BLOCK or one of a few other values. To simplify things, let's imagine how a small Ruby program consisting of only “puts 2+2” is implemented.

After the tokenization, parsing and compilation steps, the Ruby instructions are converted into YARV instructions, as seen in the right block. The rb_control_frame_t struct has several fields; the most important for now are: pc (program counter), a pointer to the YARV instruction currently being executed; sp (stack pointer), a pointer to the top of the YARV internal stack; self; and the type field that we explained before. Now, what really happens when the YARV instructions start to be executed? In the previous example, YARV, following the instructions, pushes the self value, then the first object, which is the number 2, then the second object, which is also the number 2. It then performs the opt_plus operation by popping the top of the stack (the argument), popping once more (the receiver), performing the addition and pushing the result back on the top of the stack. Finally, the result is popped from the stack and sent to the puts function, which prints it on your screen! This was a one-line Ruby program. As programs get longer, more of the structures from which Ruby gets its essence become involved, and instead of having seven stacks as described before, Ruby now has one stack (the Ruby call stack) with nine types of frames. We can see them in the following function defined in the vm.c file:

static const char *
vm_frametype_name(const rb_control_frame_t *cfp)
{
    switch (VM_FRAME_TYPE(cfp)) {
      case VM_FRAME_MAGIC_METHOD: return "method";
      case VM_FRAME_MAGIC_BLOCK:  return "block";
      case VM_FRAME_MAGIC_CLASS:  return "class";
      case VM_FRAME_MAGIC_TOP:    return "top";
      case VM_FRAME_MAGIC_CFUNC:  return "cfunc";
      case VM_FRAME_MAGIC_PROC:   return "proc";
      case VM_FRAME_MAGIC_IFUNC:  return "ifunc";
      case VM_FRAME_MAGIC_EVAL:   return "eval";
      case VM_FRAME_MAGIC_LAMBDA: return "lambda";
      default:
        rb_bug("unknown frame");
    }
}
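Returning to the “puts 2+2” example, the YARV instructions for a snippet can be printed from Ruby itself, which is a practical way to follow the push and pop sequence described earlier. A minimal sketch, assuming the reference implementation (MRI), where the RubyVM::InstructionSequence class is available; the exact listing differs between Ruby versions.

```ruby
# RubyVM is specific to the reference implementation (MRI / CRuby).
iseq = RubyVM::InstructionSequence.compile("puts 2+2")

# Print the human-readable instruction listing for the snippet.
puts iseq.disasm
```

On the versions discussed in this handbook one would expect the listing to show the self value being pushed, the two integer operands, the opt_plus instruction and finally the call to puts.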

5.1.5 GC.C File:
All the memory management operations and structs are placed in the gc.c file. Let’s take a tour of the important structs and functions in this file in Ruby 2.0.0:
• Line 128: RVALUE struct: the struct of any object in Ruby and the struct of the heap slots mentioned previously.


• Line 208: struct rb_objspace: the general struct for the heap, which carries all the information about the heap. It points to the beginning of each existing heap and holds their number, the linked list of the free slots in all heaps, the heap size, etc. It also carries some variables related to the GC operation and memory allocation.
• Line 272: some macros carrying data about the heap, such as the heap length or the slots used; this data is read from the structs.
• Line 467: link_free_heap_slot(): adds a free slot to the list of free slots (free_slots).
• Line 474: unlink_free_heap_slot(): removes a slot from the list of free slots (free_slots).
• Line 481: assign_heap_slot(): the function that creates a new heap.
• Line 558: add_heap_slots(): calls assign_heap_slot().
• Line 577: init_heap(): does some initialization, then calls add_heap_slots(), which calls assign_heap_slot(), which creates a new heap.
• Line 624: set_heaps_increment(): as said before, each heap created is bigger than the previously created heap by a factor of 1.8. This function is responsible for determining the size of the next heap.
• Line 635: newobj(): gets one of the free slots of the heap and puts a new object in it. It also removes this slot from the list of free slots.


• Line 1052: rb_objspace_each_objects(): a special C API to walk through the Ruby object space, which we will discuss shortly when dumping the heap.
• Line 1592: int is_live_object(): checks if the object is live.
• Line 2582: gc_mark(): takes a pointer and marks the object if it is live.
• Line 1880: slot_sweep(): takes a slot and adds it to the free list.
• Line 2091: gc_sweep(): loops over the slots in the heap that are not marked and passes each of them to slot_sweep().

5.2 Memory Inspection
To profile the memory we explored two ways:
1. Inspecting the heap.
2. Inspecting the stack.

5.2.1 Heap Inspection:
Our main goal is profiling memory, so we thought that dumping the heap could help in determining what exists in memory. As mentioned before, rb_objspace_each_objects() is a special C API to walk through the Ruby object space, which holds the heap. It takes two parameters:
1. Callback function: loops inside the heap and does whatever you want with each slot. It has a fixed layout defined by the Ruby team.


This is sample callback code to iterate over live objects:

int sample_callback(void *vstart, void *vend, int stride, void *data)
{
    VALUE v = (VALUE)vstart;
    for (; v != (VALUE)vend; v += stride) {
        if (RBASIC(v)->flags) { /* liveness check */
            /* do something with live object 'v' */
        }
    }
    return 0; /* continue the iteration */
}

Where:
vstart: points to the first slot of the heap to examine.
vend: points to the end of the heap.
stride: the size of an RVALUE, i.e. the size of each slot in the heap.
data: carries the data parameter passed to the API.
2. Data: a variable you can use to carry some data of your own.

The sequence of the API:
• The "rb_objspace_each_objects()" function puts the callback function and the data in a struct and sends it as a parameter to the "objspace_each_objects()" function.
• The "objspace_each_objects()" function loops over the object space to get each existing heap, then sends the start of that heap to the callback function.
• The callback function loops over the living objects in the heap, and for each object you can do whatever you want, such as determining its type.
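The sequence above has a Ruby-level counterpart in ObjectSpace.each_object, which also walks the live objects and hands each one to a block. The sketch below counts live objects per class; it is an illustration in plain Ruby rather than the C API itself, and the `counts` variable is only an example name.

```ruby
# Walk the live objects the interpreter exposes and count them per class.
# Passing Object restricts the walk to ordinary objects that respond to #class.
counts = Hash.new(0)
ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }

# Show the five classes with the most live instances.
counts.sort_by { |_klass, n| -n }.first(5).each do |klass, n|
  puts "#{klass}: #{n}"
end
```

Even an almost empty script reports thousands of objects here, most of them created by the interpreter itself.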


The heap dump Extension
• Implementation: In this extension we used the rb_objspace_each_objects() API to loop over each heap and dump its contents. We followed the same sequence as the API, and inside the callback function we call the function that retrieves the contents of the objects (see the other extension).
• Result: It returns a large number of objects. They contain the variables of the Ruby program currently running and also some pieces of code used by the interpreter.
• Problems with the heap dump from the memory profiling view: We faced two problems:
1. The dump contains not only the objects created by the running Ruby program but also some internal instructions that are outside the profiling scope.
2. The heap dump does not run continuously; it dumps the contents of memory only when it is called, so changes that happen while the program runs cannot be traced this way.
What we looked for next was a way to access the memory either continuously or every time a new object is created.

5.2.2 Stack Inspection:

After the heap dump we thought of accessing the stack, so that every newly created object is placed on it and can be used to profile the memory.

Some useful APIs in Ruby 2.1

Ruby 2.1 came with some new features and APIs that help in accessing the stack and in profiling Ruby. Some of them are declared in the debug.h file, such as:

• VALUE rb_tracepoint_new(0, rb_event_flag_t events, void (*func)(VALUE, void *), void *data): creates a tracepoint. Besides the first argument, which we pass as 0, it takes:
1. rb_event_flag_t events: one of the predefined events in the ruby.h file, such as:
    RUBY_INTERNAL_EVENT_NEWOBJ          0x100000
    RUBY_INTERNAL_EVENT_FREEOBJ         0x200000
    RUBY_INTERNAL_EVENT_GC_START        0x400000
    RUBY_INTERNAL_EVENT_GC_END          0x800000
    RUBY_INTERNAL_EVENT_OBJSPACE_MASK   0xf00000
    RUBY_INTERNAL_EVENT_MASK            0xfffe0000
2. Void function: the function that is executed every time the event happens.
3. Data: a variable to carry any data the user may need.

• VALUE rb_tracepoint_enable(VALUE tpval): takes as a parameter the VALUE returned from rb_tracepoint_new() and enables tracing.
How to use example:

    objtracer = rb_tracepoint_new(0, RUBY_INTERNAL_EVENT_NEWOBJ, newobj_handler, 0);
    rb_tracepoint_enable(objtracer);

• VALUE rb_tracepoint_disable(VALUE tpval): takes as a parameter the VALUE returned from rb_tracepoint_new() and disables tracing.
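Ruby also exposes tracepoints at the script level through the TracePoint class. The sketch below illustrates the same create, enable, disable pattern from Ruby code; it is not the code of our extension. It traces the :call and :return events, because the internal events such as RUBY_INTERNAL_EVENT_NEWOBJ are only reachable through the C API. The method add and the events array are names chosen for this example.

```ruby
# Record method call and return events with the Ruby-level TracePoint API.
events = []

def add(integer)
  integer + integer
end

# TracePoint.new plays the role of rb_tracepoint_new: it takes the events
# to watch and a block that runs each time one of them happens.
tracer = TracePoint.new(:call, :return) do |tp|
  events << [tp.event, tp.method_id]
end

tracer.enable    # counterpart of rb_tracepoint_enable
result = add(5)
tracer.disable   # counterpart of rb_tracepoint_disable

events.each { |event, method_id| puts "#{event} #{method_id}" }
```

The block receives the tracepoint object itself, in the same way that the C handler receives the tracepoint VALUE as its first argument.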


• int rb_profile_frames(0, int limit, VALUE *buff, int *lines): collects stack frames and returns the number of frames collected. It provides low-overhead C API access to the VM's call stack; no object allocations occur in this path. Besides the first argument, which we pass as 0, it takes:
Limit: the maximum number of frames to collect.
Buff: the buffer that receives the frames.
Lines: calc_lineno is called for each frame and the result is stored in this buffer.
How to use example: The void function passed to rb_tracepoint_new() can call rb_profile_frames() and give it as parameters:
o The limit on the number of frames.
o Two empty arrays: the first (buff) is an array of VALUEs to carry the frames, and the second (lines) is an array of integers to carry the line numbers.
Then you can loop over buff to do whatever you want with each frame.

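    /* The limit is a number of frames, so divide the buffer size in bytes by the element size. */
    int num = rb_profile_frames(0, sizeof(stackprofSt.frames_buffer) / sizeof(VALUE),
                                stackprofSt.frames_buffer, stackprofSt.lines_buffer);
    int i;
    for (i = 0; i < num; i++) {
        int line = stackprofSt.lines_buffer[i];
        /* Reading the frame as an rb_iseq_t relies on interpreter internals (vm_core.h). */
        rb_iseq_t *frame = (rb_iseq_t *)stackprofSt.frames_buffer[i];
        int l = frame->local_table_size;
        printf("%d\n", l);
    }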

There are some APIs that can be used with a frame to get information about it, such as:
• VALUE rb_profile_frame_path(VALUE frame);
• VALUE rb_profile_frame_absolute_path(VALUE frame);
• VALUE rb_profile_frame_label(VALUE frame);
• VALUE rb_profile_frame_base_label(VALUE frame);
• VALUE rb_profile_frame_full_label(VALUE frame);
• VALUE rb_profile_frame_first_lineno(VALUE frame);
How to use example: In the previous example we can use them with each frame:

    for (i = 0; i < num; i++) {
        int line = stackprofSt.lines_buffer[i];
        rb_iseq_t *frame = (rb_iseq_t *)stackprofSt.frames_buffer[i];
        /* The rb_profile_frame_* functions take the frame VALUE itself. */
        VALUE path = rb_profile_frame_absolute_path(stackprofSt.frames_buffer[i]);
        int l = frame->local_table_size;
        printf("%d\n", l);
    }
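The per-frame information listed above has a close Ruby-level counterpart in Kernel#caller_locations, which returns Thread::Backtrace::Location objects. The sketch below is an illustration under that assumption, not our C code; describe_frames, inner and outer are hypothetical names.

```ruby
# Collect up to `limit` frames of the current call stack and describe each one.
# describe_frames, inner and outer are hypothetical names used for this sketch.
def describe_frames(limit)
  # caller_locations(0, limit) starts at the current frame; each element
  # is a Thread::Backtrace::Location.
  caller_locations(0, limit).map do |frame|
    {
      path: frame.path,
      absolute_path: frame.absolute_path,
      label: frame.label,
      base_label: frame.base_label,
      lineno: frame.lineno
    }
  end
end

def inner
  describe_frames(3)
end

def outer
  inner
end

frames = outer
frames.each do |f|
  puts "#{f[:base_label]} at #{f[:path]}:#{f[:lineno]}"
end
```

Unlike rb_profile_frames(), this version allocates Ruby objects for every frame, so it is suitable for experiments rather than for low-overhead profiling.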

• rb_trace_arg_t *rb_tracearg_from_tracepoint(VALUE tpval): returns an rb_trace_arg_t struct containing information about the event that happened. It takes as a parameter the tracepoint VALUE that is passed to the handler function.
• VALUE rb_tracearg_object(rb_trace_arg_t *trace_arg): when the tracepoint created by rb_tracepoint_new() detects that the traced event happened, this function returns the object that caused the event. It takes as a parameter the return value of rb_tracearg_from_tracepoint().
How to use example: When a new object is created and the event flag given to rb_tracepoint_new() is RUBY_INTERNAL_EVENT_NEWOBJ, rb_tracearg_object() returns the newly created object.


    /* Inside the handler, tpval is the tracepoint VALUE it receives as its first argument. */
    rb_trace_arg_t *tparg = rb_tracearg_from_tracepoint(tpval);
    VALUE obj = rb_tracearg_object(tparg);

Stack Trace Extension:

We used some of the previously explored APIs to trace the creation of any new object and to get the information we need to profile the memory, such as the size of each object and its contents.

Implementation: The APIs used:
• rb_tracepoint_new()
• rb_tracepoint_enable()
• rb_tracepoint_disable()
• rb_tracearg_from_tracepoint()
• rb_tracearg_object()

The extension is supposed to trace any new object created in the running Ruby program and then retrieve its type, contents and size in memory.

The result:
• Every newly created object is returned immediately when it is created.
• The type of each object is returned.

Problems with Stack Inspection from the memory profiling view:
• A few objects are returned that were not created in the running Ruby program.
• When we try to retrieve the contents of each object, it is reported as an empty object, so we suspect that the object is returned immediately when it is created, before being supplied with data.
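For comparison, Ruby 2.1 also ships a Ruby-level way to observe object creation in its objspace standard library. The sketch below is an illustration and not our extension: it records where an object was allocated and then reads its type, size and instance variables. Point and traced_allocation are hypothetical names.

```ruby
require 'objspace'

# Point and traced_allocation are hypothetical names used for this sketch.
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end
end

def traced_allocation
  info = nil
  # While the block runs, the interpreter records allocation details
  # for every object that is created.
  ObjectSpace.trace_object_allocations do
    obj = Point.new(1, 2)
    info = {
      type: obj.class,
      file: ObjectSpace.allocation_sourcefile(obj),
      line: ObjectSpace.allocation_sourceline(obj),
      size: ObjectSpace.memsize_of(obj),   # size in bytes
      contents: obj.instance_variables     # read after initialize has run
    }
  end
  info
end

info = traced_allocation
puts info
```

Here the instance variables are visible because the object is inspected after initialize has finished, which fits the suspicion above that the creation event fires before the object is supplied with data.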


CHAPTER 6: RUBY'S VIRTUAL MACHINE

6.1 Overview

6.1.1 Introduction to Virtual Machines:

A virtual machine is software that mimics a machine (i.e., a computer) and executes programs exactly like a physical machine. Virtual machines are separated into two major classifications, based on their use and degree of correspondence to any real machine:

1- A system virtual machine provides a complete system platform which supports the execution of a complete operating system. These usually emulate an existing architecture, and are built with the purpose of either providing a platform to run programs where the real hardware is not available for use (for example, executing software on otherwise obsolete platforms), or of having multiple instances of virtual machines leading to more efficient use of computing resources, in terms of energy consumption, cost effectiveness, or both.

2- A process virtual machine (also, language virtual machine) is designed to run a single program, which means that it supports a single process. Such virtual machines are usually closely suited to one or more programming languages and built with the purpose of providing program portability and flexibility (amongst other things).

An essential characteristic of a virtual machine is that the software running inside is limited to the resources and abstractions provided by the virtual machine; it cannot break out of its virtual environment.

Ruby's virtual machine is a process virtual machine, so our main focus will be on the concept of a process virtual machine. A process virtual machine, sometimes called an application virtual machine or Managed Runtime Environment (MRE), runs as a normal application inside a host OS and supports a single process. It is created when that process is started and destroyed when it exits. Its purpose is to provide a platform-independent programming environment that abstracts away details of the underlying hardware or operating system, and allows a program to execute in the same way on any platform. Ruby actually runs as a single process with a process id. This process is called “ruby” and is used to interpret (run) Ruby programs.

$ ruby test.rb    # creates a process called ruby, which executes the file

To examine this, open the System Monitor (on Linux systems) or the Windows Task Manager (on Windows) and look for a process called ruby. If you are using Ubuntu, as in my case, type inside the terminal:

$ top

This command gives a general profile of the current state of your computer (e.g. processor usage, number of tasks, used memory, cached memory and information about the processes currently in execution). A process virtual machine provides a high-level abstraction, that of a high-level programming language. Process virtual machines are implemented using an interpreter (in our case, Ruby's YARV machine). Next we will talk about virtual machine architectures and link them with Ruby's YARV machine.


6.1.2 Virtual Machine Architectures:

A discussion of VMs is also a discussion about computer architecture in the pure sense of the term. Because VM implementations lie at architected interfaces, a major consideration in the construction of a VM is the fidelity with which it implements these interfaces. Architecture, as applied to computer systems, refers to a formal specification of an interface in the system, including the logical behavior of resources managed via the interface. Implementation describes the actual embodiment of an architecture. Abstraction levels correspond to implementation layers, whether in hardware or software, each associated with its own interface or architecture. Three concepts are important for VM construction: the instruction set architecture, the application binary interface, and the application programming interface.


High level language VMs “HLLs”:

For process VMs, cross-platform portability is clearly a key objective. However, emulating one conventional architecture on another provides cross-platform compatibility only on a case-by-case basis and requires considerable programming effort. Full cross-platform portability is more readily achieved by designing a process-level VM as part of an overall HLL application development environment. The resulting HLL VM does not directly correspond to any real platform; rather, it is designed for ease of portability and to match the features of a given HLL or set of HLLs.


Figure 4 shows the difference between a conventional platform-specific compilation environment and an HLL VM environment. In a conventional system, shown in Figure 4a, a compiler front end first generates intermediate code that is similar to machine code but more abstract. Then, a code generator uses the intermediate code to generate a binary containing machine code for a specific ISA and operating system. This binary file is distributed and executed on platforms that support the given ISA/OS combination. In an HLL VM, as shown in Figure 4b, a compiler front end generates abstract machine code in a virtual ISA that specifies the VM’s interface. This virtual ISA code, along with associated data structure information (metadata), is distributed for execution on different platforms. Each host platform implements a VM capable of loading and executing the virtual ISA and a set of library routines specified by a standardized API. In its simplest form, the VM contains an interpreter. More sophisticated, higher performance VMs compile the abstract machine code into host machine code for direct execution on the host platform.

An advantage of an HLL VM is that application software is easily ported once the VM and libraries are implemented on a host platform. While the VM implementation takes some effort, it is much simpler than developing a full-blown compiler for a platform and porting every application through recompilation. The Sun Microsystems Java VM architecture and the Microsoft Common Language Infrastructure, which is the foundation of the .NET framework, are widely used examples of HLL VMs. The ISAs in both systems are stack-based to eliminate register requirements and use an abstract data specification and memory model that supports secure object-oriented programming.

6.1.3 Stack Machines vs Register Machines:

A stack machine is a real or emulated computer that uses a pushdown stack, rather than individual machine registers, in the execution of its instructions. In our example here it will be the “Control frame pointer”. A register machine is also a real or emulated computer, in which each instruction explicitly names the specific registers to use for its operands and result values. In our example here these are the PC, SP, EP, etc. Ruby's YARV machine is actually a mix of a stack machine and a register machine, which tries to avoid the disadvantages of both.


A stack based virtual machine implements the general features needed by a virtual machine, but the memory structure where the operands are stored is a stack data structure. Operations are carried out by popping data from the stack, processing them and pushing the results back in LIFO (Last In First Out) fashion. In a stack based virtual machine, the operation of adding two numbers would usually be carried out in the following manner (where 20, 7, and ‘result’ are the operands).

In this figure, the SP (stack pointer) points to the first value (the top of the stack).
1. POP 20
2. POP 7
3. ADD 20, 7, result
4. PUSH result

This way of carrying out operations is called reverse Polish notation (postfix notation). Because of the PUSH and POP operations, four lines of instructions are needed to carry out an addition operation. An advantage of the stack based model is that the operands are addressed implicitly by the stack pointer (SP in the above image). This means that the virtual machine does not need to know the operand addresses explicitly, as calling the stack pointer will give (pop) the next operand. In stack based VMs, all the arithmetic and logic operations are carried out by pushing and popping the operands and results on the stack.

In the register based implementation of a virtual machine, the data structure where the operands are stored is based on the registers of the CPU. There are no PUSH or POP operations here, but the instructions need to contain the addresses (the registers) of the operands. That is, the operands for the instructions are explicitly addressed in the instruction, unlike the stack based model where we had a stack pointer to point to the operand. Since there are no POP or PUSH operations, the instruction for adding is just one line, but unlike the stack, we need to explicitly mention the addresses of the operands. The advantage here is that the overhead of pushing to and popping from a stack is non-existent, and instructions in a register based VM execute faster within the instruction dispatch loop. Another advantage of the register based model is that it allows for some optimizations that cannot be done in the stack based approach. One such instance is when there are common subexpressions in the code: the register model can calculate a subexpression once and store the result in a register for future use when the subexpression comes up again, which reduces the cost of recalculating the expression.

The problem with a register based model is that the average register instruction is larger than an average stack instruction, as we need to specify the operand addresses explicitly. Whereas the instructions for a stack machine are short due to the stack pointer, the respective register machine instructions need to contain operand locations, which results in larger register code compared to stack code.
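The contrast between the two models can be sketched with two toy interpreters. This is a minimal illustration written for this handbook, not YARV code: the instruction names (:push, :add), the register names (:r1, :r2, :r3) and both method names are assumptions made for the example. In the stack version the add instruction itself pops its two operands and pushes the result.

```ruby
# Stack-based evaluation: operands are addressed implicitly through the stack.
def run_stack_machine(program)
  stack = []
  program.each do |op, arg|
    case op
    when :push
      stack.push(arg)
    when :add
      right = stack.pop            # POP the top operand
      left = stack.pop             # POP the next operand
      stack.push(left + right)     # PUSH the result
    else
      raise ArgumentError, "unknown instruction #{op}"
    end
  end
  stack.pop
end

# Register-based evaluation: each instruction names its operand registers.
def run_register_machine(program, registers)
  program.each do |op, dest, src1, src2|
    case op
    when :add
      registers[dest] = registers[src1] + registers[src2]
    else
      raise ArgumentError, "unknown instruction #{op}"
    end
  end
  registers
end

stack_program = [[:push, 20], [:push, 7], [:add]]
register_program = [[:add, :r3, :r1, :r2]]

stack_result = run_stack_machine(stack_program)
register_result = run_register_machine(register_program, { r1: 20, r2: 7 })[:r3]

puts stack_result
puts register_result
```

The stack program needs three short instructions, while the register program needs a single instruction that carries three operand addresses, which is the size trade-off described above.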


6.2 Ruby’s YARV Machine

6.2.1 Introduction to Ruby’s YARV Machine:

Koichi Sasada designed and coded the YARV machine, which was introduced in Ruby 1.9.x and then became the official virtual machine for CRuby. The YARV machine has its own instruction language: any instruction written in the higher level program (i.e. written in Ruby) is first tokenized, then parsed, then compiled and interpreted by the VM. The term “compiled” here does not mean compilation to machine code; it means that the instruction has been translated into a series of “bytecode” instructions that will be executed by the VM afterwards, as we will see later on. The figure below shows the running time of benchmarks on old-ruby and YARV. These results were measured on a Pentium-M 1.2 GHz with 1024 MB of memory, running Windows XP with Cygwin and gcc 3.4.4.

Before the introduction of the YARV machine, the old interpreter (matzruby) traversed the abstract syntax tree (AST) naively, which is obviously slow, while YARV compiles that AST to YARV bytecode and runs it on the YARV machine itself, which gave a huge performance improvement.


YARV reuses many parts of old-ruby, namely the Ruby script parser, the object management mechanism, the garbage collector and more. In fact, YARV is implemented as an extension module for old-ruby. However, it takes a vastly different approach from old-ruby to the problem of running Ruby programs. In old-ruby, when a program is parsed, an abstract syntax tree (AST) is created. This is a hierarchical tree of all tokens (identifiers, keywords, operators, etc.) in the program. MRI would iterate over this tree directly when running Ruby programs. While this may be simpler to implement, it takes more memory and is quite inefficient in the long run. With YARV, on the other hand, after the program is parsed and the AST is created, the compiled instructions for the YARV machine are generated. At this point the higher level program, written in the Ruby programming language, has been translated to another language, represented as a set of YARV instructions. The YARV machine then executes this set of compiled bytecode instructions (YARV instructions) without having to traverse the AST.
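The translation from Ruby source to YARV instructions can be observed from Ruby itself through the RubyVM::InstructionSequence class, which is specific to CRuby. The sketch below is an illustration; the two-line program in source is a made-up example, and the exact listing depends on the Ruby version.

```ruby
# Compile a small Ruby program to YARV bytecode and print the listing.
source = <<~RUBY
  x = 5
  x + x
RUBY

# RubyVM::InstructionSequence is specific to CRuby (MRI).
iseq = RubyVM::InstructionSequence.compile(source)
listing = iseq.disasm

puts listing

# The compiled sequence can be executed directly, without parsing the source again.
result = iseq.eval
puts result
```

The listing shows one instruction per line, including a setlocal instruction for the assignment and an optimized instruction for the addition, which are the kind of internal YARV instructions discussed in this chapter.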

Also, the YARV machine began to expand more and more. Method frames in Ruby 1.8 were classified into several kinds of method frames for cfunc (C functions), which was neither good style nor abstract. With the YARV machine the concept of a “control frame” was introduced, together with a “control frame pointer” virtual register. This created an abstraction in which any method, whether it is a cfunc method or a Ruby method, has a control frame. Moreover, there have been some changes in “method dispatching” inside the VM, where C functions implemented for Ruby are called (dispatched) according to the Ruby script written in the higher level code. In some cases special YARV instructions had to be constructed to avoid the overhead caused by method dispatching, such as for arithmetic operations (+, -, /, *, %, etc.) and logical operations (AND <&>, OR <||>, etc.), but we will talk about that later.

Ruby was once considered one of the slowest widely used programming languages. It was slow and inefficient, sometimes prohibitively so. Now, with YARV, performance has increased dramatically, putting Ruby closer to Python (which is considered fast for a modern scripting language) in terms of performance.

In the following sections we introduce the YARV stack, which is the stack used to execute compiled instructions. Next we talk about a special YARV instruction called “trace”, made for special debugging functions. Then we introduce the concept of instruction sequencing in Ruby, which is about how instructions are executed in the proper sequence. Finally we see how variables are stored in the stack. This section is still incomplete, because we have not yet examined everything related to the YARV machine; there are more surprises ahead.

6.2.2 YARV Stack:

As mentioned earlier, Ruby's YARV machine is a combination of a stack machine and a register machine, in which compiled instructions are placed in a stack data structure along with other variables or arguments to be executed by the YARV machine. We look here at the stack machine side of YARV. It is a stack machine that not only keeps track of intermediate values but also keeps track of your Ruby program's call stack. In fact, YARV is a “double stack machine”: it has to track the arguments and return values for its own internal instructions, and it has to do the same for your Ruby arguments and return values as well, for example:

    setlocal <variable_name>    // internal YARV instruction
    puts add(x)                 // Ruby

Starting point, Some Failure/Success Trials:

Before we reached the YARV stack, when we were first introduced to the CRuby source code, we were confused and kept asking ourselves “Where should we start?”. We first explored some of the files that we suspected to be directly related to the virtual machine. One problem was that the code is written in C in a more complicated way than we were used to. It was a bit hard at first, but we managed to understand things gradually while reading and tracing the source code. Our journey began with wandering through the files that implement various parts of Ruby; most of the files were ignored because they have no direct relation with the YARV machine (like parse.y).

The second question we asked ourselves was “What files should we focus on to be able to trace the YARV machine?”. The answer was hard at first, but we started to narrow our view to specific files that are closely related to YARV. We began by pressing the “debug” button in Eclipse, and it stopped at the call setlocale(LC_CTYPE, ""), which just checks the language settings and the encoding type (UTF-8) on your machine. Then Ruby starts its own setup phase, allocating memory for its stacks, object space for its garbage collector and so forth (it is a rather complicated phase). Then Ruby initializes its functionality, including the classes implemented inside Ruby and things related to garbage collection, in the file inits.c (you can find more details in the previous chapters). Then we watched the parsing phase (we did not figure out the whole process, since it is not of interest for the construction of a debugger). With time and debugging, we learned how to narrow our search space and focus on certain files or even certain parts of a file. The main files that we considered after this long journey were “vm_eval.c , vm.c , vm_exec.c , vm_core.h , insns.def , , vm_trace.c”. We will now start exploring the YARV stack.

Trials in io.c file:

Several trials were made on the file io.c to explore how Ruby outputs something to the screen (console). The reason behind those trials was to try to interrupt the output and stop the execution at that point. The io_write() function in io.c was explored first. Code snippet:

    static VALUE
    io_write(VALUE io, VALUE str, int nosync)
    {
        rb_io_t *fptr;
        long n;
        VALUE tmp;

        rb_secure(4);
        io = GetWriteIO(io);
        str = rb_obj_as_string(str);
        tmp = rb_io_check_io(io);
        if (NIL_P(tmp)) {
            /* port is not IO, call write method for it. */
            return rb_funcall(io, id_write, 1, str);
        }
        io = tmp;
        if (RSTRING_LEN(str) == 0) return INT2FIX(0);

        str = rb_str_new_frozen(str);

        GetOpenFile(io, fptr);
        rb_io_check_writable(fptr);

        n = io_fwrite(str, fptr, nosync);
        if (n == -1L) rb_sys_fail_path(fptr->pathv);

        return LONG2FIX(n);
    }

We did not find anything in io_write that was useful for this purpose. Another trial was with the function rb_io_puts(), which is closely related to the puts functionality in Ruby (e.g. puts “Something”).

    VALUE
    rb_io_puts(int argc, VALUE *argv, VALUE out)
    {
        int i;
        VALUE line;

        /* if no argument given, print newline. */
        if (argc == 0) {
            rb_io_write(out, rb_default_rs);
            return Qnil;
        }
        for (i=0; i<argc; i++) {
            if (RB_TYPE_P(argv[i], T_STRING)) {
                line = argv[i];
                goto string;
            }
            if (rb_exec_recursive(io_puts_ary, argv[i], out)) {
                continue;
            }
            line = rb_obj_as_string(argv[i]);
          string:
            rb_io_write(out, line);
            if (RSTRING_LEN(line) == 0 ||
                !str_end_with_asciichar(line, '\n')) {
                rb_io_write(out, rb_default_rs);
            }
        }
        // Added printf statement in the code
        printf("in rb_io_puts in io.c\n");
        return Qnil;
    }

Output:

    x+y=
    in rb_io_puts in io.c
    12
    in rb_io_puts in io.c
    x*y=
    in rb_io_puts in io.c
    32
    in rb_io_puts in io.c
    x/y=
    in rb_io_puts in io.c
    2
    in rb_io_puts in io.c


How "puts" works in Ruby:

Most object oriented programming languages provide an interface that an object can implement to describe itself as a string, so that when it is given as an argument to a "printing" function, it is converted to its desired format. In Ruby, this is achieved by implementing the to_s method, so if you define a class implementing this method, puts will behave like this:

    class MyClass
      def to_s
        "<MyClass Instance>"
      end
    end

    instance = MyClass.new
    puts instance

Output: <MyClass Instance>

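So everything works as expected until you decide to subclass the Array or the String class. Modifying the above program with this snippet:

    class MyArray < Array
      def to_s
        "<MyArray instance>"
      end
    end

    # MyString is assumed to be a String subclass that overrides to_s in the
    # same way; its definition is not shown here.
    mystring = MyString.new("Hello World!")
    myarray = MyArray.new
    puts mystring

Output: Hello World!

    puts myarray

Output: nil (this is what Ruby 1.8 prints; Ruby 1.9 and later print an empty line for an empty array)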

As you can see, I've implemented the to_s method, but it didn't get called when I passed the instances to puts. That means I did something wrong, right?

    mystring.to_s    # => "<MyString Hello World!>"
    myarray.to_s     # => "<MyArray instance>"

Well, apparently not. So what kind of trickery is going on here? I've implemented the necessary method but it's not being called for those specific subclasses, so it's bound to be an implementation detail of the function we are calling. Let's take a look at how puts is implemented in Ruby. Opening the io.c file and searching for "puts", we end up finding the implementation of the rb_io_puts(...) function (see the implementation of rb_io_puts() above). TYPE is a macro that expands to rb_type, which ultimately checks the flags field of the object, cast as struct RBasic, to identify its type. So, for the string behavior it's simple: TYPE(argv[i]) == T_STRING is true, which makes the code jump to the string label and simply write its contents.

For the array case, the code calls rb_check_array_type (from array.c), which cascades to testing TYPE(argv[i]) == T_ARRAY && T_ARRAY != T_DATA. That results in true, so the code enters the if (!NIL_P(line)) block, calling io_puts_ary (which ends up calling rb_io_puts for each item of the array), and goes on to handle the next parameter. Only if both tests above fail does the interpreter end up calling rb_obj_as_string, which, as you may have guessed, is responsible for calling the higher level to_s implementation.

So now you might be thinking "if I subclass neither Array nor String, Ruby will certainly call my to_s implementation, right?". Well, wrong! Returning to Ruby's source code, you will notice the following:

    VALUE
    rb_check_array_type(VALUE ary)
    {
        return rb_check_convert_type(ary, T_ARRAY, "Array", "to_ary");
    }

Without digging deeper into the code, what might this mean? Let's do a little test. Remember our first example, the MyClass implementation? We'll build on it:

    class MyClass
      def to_s
        "<MyClass Instance>"
      end
    end

    instance = MyClass.new
    puts instance

Output: <MyClass Instance>

    class MyClass
      def to_ary
        %W(MyClass Instance as Array)
      end
    end

    puts instance

Output:

    MyClass
    Instance
    as
    Array

Weird, isn't it? It turns out that rb_check_array_type returns a valid (non-nil) value if the parameter implements the to_ary method. This means that whenever an object implements this method, it is treated as an array by puts and its to_s method is never called. So, to summarize, puts does call to_s as long as you don't:
1. Subclass String or Array
2. Implement the to_ary method
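The three cases can be checked in one self-contained script. The sketch below is an illustration written for this summary: the class names PlainObject, LoudString and ArrayLike and the helper puts_output are hypothetical, and the output is captured in a StringIO, whose puts is assumed here to follow the same rules as Kernel#puts.

```ruby
require 'stringio'

# Three hypothetical classes used only for this sketch.
class PlainObject
  def to_s
    "<PlainObject instance>"
  end
end

class LoudString < String
  def to_s
    "<LoudString #{length} chars>"
  end
end

class ArrayLike
  def to_s
    "<ArrayLike instance>"
  end

  def to_ary
    %w[treated as an array]
  end
end

# Return exactly what puts writes for obj, captured in memory.
def puts_output(obj)
  io = StringIO.new
  io.puts(obj)
  io.string
end

p puts_output(PlainObject.new)       # to_s is used
p puts_output(LoudString.new("hi"))  # the string contents are written, to_s is ignored
p puts_output(ArrayLike.new)         # to_ary wins, one line per element
```

Only the first object gets its to_s called; the String subclass is written as-is and the object with to_ary is expanded like an array.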


The YARV stack: The main concept:

The YARV stack, as mentioned above, is used to store internal YARV instructions, or simply “Ruby compiled instructions”, that will be executed by the two files and insns.def files. The YARV stack works as follows. As an example, we will look at this code sample written in Ruby:

    def add(integer)
      integer + integer
    end

    x = 5
    puts add(x)

1- For each line, or part of a line, of Ruby that is executed (a line could contain more than one function call, an iterating block such as a loop, or more than one event), a “trace” instruction must be pushed to the YARV stack that holds the instructions to be executed, so YARV will enter this code:

    INSN_ENTRY(trace){
    {
      rb_num_t nf = (rb_num_t)GET_OPERAND(1);

      DEBUG_ENTER_INSN("trace");   // skip this
      ADD_PC(1+1);   // increments the address in the PC to point to the next instruction on the stack
      . . .
    }


After it increments the contents of the PC register, it passes control to the insns.def file, which executes the actual implementation of the YARV instruction. For our first instruction this is the trace instruction.

    DEFINE_INSN
    trace
    (rb_num_t nf)
    ()
    ()
    {
        rb_event_flag_t flag = (rb_event_flag_t)nf;

        if (RUBY_DTRACE_METHOD_ENTRY_ENABLED() ||
            RUBY_DTRACE_METHOD_RETURN_ENABLED() ||
            RUBY_DTRACE_CMETHOD_ENTRY_ENABLED() ||
            RUBY_DTRACE_CMETHOD_RETURN_ENABLED()) {

            switch (flag) {
              case RUBY_EVENT_CALL:
                RUBY_DTRACE_METHOD_ENTRY_HOOK(th, 0, 0);
                break;
              case RUBY_EVENT_C_CALL:
                RUBY_DTRACE_CMETHOD_ENTRY_HOOK(th, 0, 0);
                break;
              case RUBY_EVENT_RETURN:
                RUBY_DTRACE_METHOD_RETURN_HOOK(th, 0, 0);
                break;
              case RUBY_EVENT_C_RETURN:
                RUBY_DTRACE_CMETHOD_RETURN_HOOK(th, 0, 0);
                break;
            }
        }

        EXEC_EVENT_HOOK(th, flag, GET_SELF(), 0, 0 /* id and klass are resolved at callee */,
                        (flag & (RUBY_EVENT_RETURN | RUBY_EVENT_B_RETURN)) ? TOPN(0) : Qundef);
    }


The trace instruction belongs to the TracePoint API, set_trace_func and the DTrace API. It is still not working in Ruby 2.0.x, and we have not yet tried to activate it in Ruby 2.1, but it seems very interesting if we want to use its events to pause the execution at some point. After that, it jumps back to the previous file and calls the routine “END_INSN(trace)”, which means that it has finished executing this instruction.

2- Next is the “putspecialobject” YARV instruction, which means that it saw some “special” line of code that is not present in Ruby's internal implementation, which in our case is the function add(integer).

    INSN_ENTRY(putspecialobject){
    {
      VALUE val;
      rb_num_t value_type = (rb_num_t)GET_OPERAND(1);

      DEBUG_ENTER_INSN("putspecialobject");
      ADD_PC(1+1);
      . . .
    }

The value_type here means “What is the type of the expression written in this line of code?”, so when it jumps to the insns.def file, it switches on the type of the “special object” that is written in the Ruby language.

    putspecialobject
    (rb_num_t value_type)
    ()
    (VALUE val)
    {
        enum vm_special_object_type type = (enum vm_special_object_type)value_type;

        switch (type) {
          case VM_SPECIAL_OBJECT_VMCORE:
            val = rb_mRubyVMFrozenCore;
            break;
          case VM_SPECIAL_OBJECT_CBASE:
            val = vm_get_cbase(GET_ISEQ(), GET_EP());
            break;
          case VM_SPECIAL_OBJECT_CONST_BASE:
            val = vm_get_const_base(GET_ISEQ(), GET_EP());
            break;
          default:
            rb_bug("putspecialobject insn: unknown value_type");
        }
    }

The type in our example is VM_SPECIAL_OBJECT_VMCORE. (I am not yet sure of its true function, but this instruction seems to be pushed whenever the programmer writes something that is not already known inside Ruby, such as a variable or a method, in our example add(integer).) In our example, this sets up a control frame for the function add(integer), and the call will be of the vm_call_general type instead of a cfunc method call.

3- The steps above are repeated for the argument of the “add” function, which in this case is “integer”. Here, however, the value_type is VM_SPECIAL_OBJECT_CBASE, and “val” is computed from the EP pointer and the ISEQ pointer, which means that both pointers may point to the same stack or to different stacks (I am still not sure about that).

4- The instructions “putobject” and “putiseq” are pushed onto the stack. I am still not sure about their function, but they seem to be related to the two macro calls GET_EP() and GET_ISEQ(). Here “integer” is pushed as a local variable of the function add(), together with an instruction sequence for “integer”, which means that integer is the first parameter of the function add (that is the only explanation I have come up with for the iseq pointer so far).


    INSN_ENTRY(putobject){
    {
      VALUE val = (VALUE)GET_OPERAND(1);

      DEBUG_ENTER_INSN("putobject");
      ADD_PC(1+1);
      . . .
    }

    INSN_ENTRY(putiseq){
    {
      VALUE ret;
      ISEQ iseq = (ISEQ)GET_OPERAND(1);

      DEBUG_ENTER_INSN("putiseq");
      ADD_PC(1+1);
      . . .
    }

Then, in insns.def, it returns the instruction sequence related to the parameter “integer” of the function add(integer).

5- The next instruction, “opt_send_simple”, is pushed, and here it starts constructing a control frame for the function add(integer).

    INSN_ENTRY(opt_send_simple){
    {
      VALUE val;
      CALL_INFO ci = (CALL_INFO)GET_OPERAND(1);

      DEBUG_ENTER_INSN("opt_send_simple");
      ADD_PC(1+1);
      . . .
    }

Here it searches for the method add(integer) inside Ruby's method entry data structure (possibly the m_tbl).

    DEFINE_INSN
    opt_send_simple
    (CALL_INFO ci)
    (...)
    (VALUE val) // inc += -ci->orig_argc;
    {
        vm_search_method(ci, ci->recv = TOPN(ci->argc));
        CALL_METHOD(ci);
    }

In the function vm_search_method:

    vm_search_method(rb_call_info_t *ci, VALUE recv)
    {
        VALUE klass = CLASS_OF(recv);

        /* This is something to do with method caching */
    #if OPT_INLINE_METHOD_CACHE
        if (LIKELY(GET_METHOD_SERIAL() == ci->method_serial &&
                   RCLASS_EXT(klass)->class_serial == ci->class_serial)) {
            /* cache hit! */
            return;
        }
    #endif

        ci->me = rb_method_entry(klass, ci->mid, &ci->defined_class);
        ci->klass = klass;
        ci->call = vm_call_general;

        /* This is something to do with method caching */
    #if OPT_INLINE_METHOD_CACHE
        ci->method_serial = GET_METHOD_SERIAL();
        ci->class_serial = RCLASS_EXT(klass)->class_serial;
    #endif
    }

6- The next instruction is the “pop” instruction, which pops something from a certain stack. I am not yet sure which one: maybe the control frame stack referenced by the control frame pointer, maybe the stack referenced by the EP, maybe the ISEQ stack, or possibly the stack holding the YARV instructions.

    INSN_ENTRY(pop){
    {
      VALUE val = TOPN(0);

      DEBUG_ENTER_INSN("pop");
      ADD_PC(1+0);
      PREFETCH(GET_PC());
      POPN(1); // Here it pops the top of the stack
      . . .
    }

That is the end of the line containing the method add(integer). Now it jumps to the line “x=5” to execute its instructions, which means that the “trace” instruction is pushed onto the YARV stack again (as it has jumped to a new line).

7- The line x=5 is now in execution, which means that the value assigned to x is an object that has to be identified in the Ruby program, so the YARV instruction “putobject” is pushed and then executed.


    INSN_ENTRY(putobject){
    {
      VALUE val = (VALUE)GET_OPERAND(1);

      DEBUG_ENTER_INSN("putobject");
      ADD_PC(1+1);
      . . .
    }

But after this instruction is pushed, x is not alone: it is assigned the value 5. In addition, x is a “local variable” of the “main” Ruby program (the meaning of main is discussed later on), so the EP (“Environment Pointer”) is used to store x and its value 5 on the stack. So it moves on to the next instruction, setlocal, and pushes it on top of the stack.

8- Next is the instruction “setlocal”, which is used to assign the value 5 to x.

    INSN_ENTRY(setlocal_OP__WC__0){
    {
    #define level 0
      lindex_t idx = (lindex_t)GET_OPERAND(1);
      VALUE val = TOPN(0); /* possibly gets the x variable and pushes it with the value 5 */

      DEBUG_ENTER_INSN("setlocal_OP__WC__0");
      ADD_PC(1+1);
      PREFETCH(GET_PC());
      POPN(1); /* pops what “putobject” pushed for x and inserts its value, possibly into another stack */
      . . .
    }


Here it stores the value that “putobject” pushed for x, which is 5.

    DEFINE_INSN
    setlocal
    (lindex_t idx, rb_num_t level)
    (VALUE val)
    ()
    {
        int i, lev = (int)level;
        VALUE *ep = GET_EP();

        for (i = 0; i < lev; i++) {
            ep = GET_PREV_EP(ep);
        }
        *(ep - idx) = val;
    }

After that it executes the next line, puts add(x), and of course the trace instruction is pushed, executed and then popped.

9- The call to puts begins with the YARV instruction “putself”, which pushes self, the receiver of puts, onto the stack.

    INSN_ENTRY(putself){
    {
      VALUE val;

      DEBUG_ENTER_INSN("putself");
      ADD_PC(1+0);
      . . .
    }

Then, in insns.def, it calls GET_SELF(). Since this simple script contains no Ruby objects or classes, the self pointer is set to the default “top self” object. This is an instance of the Object class that Ruby creates automatically when YARV starts up. It serves as the receiver for method calls and the container for instance variables in the top-level scope. The “top self” object has a single predefined to_s (“to string”) method, which returns the string “main”. You can call this method by running this command in your console:

$ ruby -e 'puts self'

Later, YARV uses this self value on the stack when it executes the “send” instruction. Self is the receiver of the puts method, since I didn't specify a receiver for this method call.

10- puts has an argument, add(x), and add(integer) must be executed before puts is executed, which means that add(integer) will be called. So the “getlocal” YARV instruction is pushed onto the stack to be executed next.

    INSN_ENTRY(getlocal_OP__WC__0){
    {
      VALUE val;
    #define level 0
      lindex_t idx = (lindex_t)GET_OPERAND(1);

      DEBUG_ENTER_INSN("getlocal_OP__WC__0");
      ADD_PC(1+1);
      . . .
    }

Then it gets the top value of the stack pointed to by the EP pointer, which in this case is the value 5.

    DEFINE_INSN
    getlocal
    (lindex_t idx, rb_num_t level)
    ()
    (VALUE val)
    {
        int i, lev = (int)level;
        VALUE *ep = GET_EP();

        for (i = 0; i < lev; i++) {
            ep = GET_PREV_EP(ep);
        }
        val = *(ep - idx);
    }

11- The argument of puts is a function call, which means that a control frame has to be set up for that function. It therefore runs the code of the YARV instruction “opt_send_simple” and then calls the function add(), which in turn moves to another line of code and pushes the “trace” YARV instruction as well.

12- Ruby's method stack now already has the function “add(integer)”, so the method is already registered and does not have to be registered again. It skips that line and goes on to the next one.

13- The line “return integer+integer” contains two things, an operation between two values and then another instruction, which means that there are two instructions to be pushed, executed and then popped from the top of the stack. Let's look at integer+integer: both operands are local variables of the function add, and Ruby must “get” their values, which means that it pushes getlocal twice for the same variable (maybe this will be improved in the future). After that comes the addition operation and, surprisingly, there is a special YARV instruction called “opt_plus” that is used specifically for addition.

    INSN_ENTRY(opt_plus){
    {
      VALUE val;
      CALL_INFO ci = (CALL_INFO)GET_OPERAND(1); // Get the function that performs the addition operation.
      VALUE recv = TOPN(1); // The first integer
      VALUE obj = TOPN(0);  // The second integer

      DEBUG_ENTER_INSN("opt_plus");
      ADD_PC(1+1);
      PREFETCH(GET_PC());
      POPN(2); // Pop the two instructions related to getlocal
      . . .
    }

Then it calls the insns.def routine that performs the addition operation:

    DEFINE_INSN
    opt_plus
    (CALL_INFO ci)
    (VALUE recv, VALUE obj)
    (VALUE val)
    {
        if (FIXNUM_2_P(recv, obj) &&
            BASIC_OP_UNREDEFINED_P(BOP_PLUS, FIXNUM_REDEFINED_OP_FLAG)) {
            /* fixnum + fixnum */
    #ifndef LONG_LONG_VALUE
            val = (recv + (obj & (~1)));
            if ((~(recv ^ obj) & (recv ^ val)) &
                ((VALUE)0x01 << ((sizeof(VALUE) * CHAR_BIT) - 1))) {
                val = rb_big_plus(rb_int2big(FIX2LONG(recv)),
                                  rb_int2big(FIX2LONG(obj)));
            }
    #else
            long a, b, c;
            a = FIX2LONG(recv);
            b = FIX2LONG(obj);
            c = a + b;
            if (FIXABLE(c)) {
                val = LONG2FIX(c);
            }
            else {
                val = rb_big_plus(rb_int2big(a), rb_int2big(b));
            }
    #endif
        }
        else if (FLONUM_2_P(recv, obj) &&
                 BASIC_OP_UNREDEFINED_P(BOP_PLUS, FLOAT_REDEFINED_OP_FLAG)) {
            val = DBL2NUM(RFLOAT_VALUE(recv) + RFLOAT_VALUE(obj));
        }
        else if (!SPECIAL_CONST_P(recv) && !SPECIAL_CONST_P(obj)) {
            if (RBASIC_CLASS(recv) == rb_cFloat && RBASIC_CLASS(obj) == rb_cFloat &&
                BASIC_OP_UNREDEFINED_P(BOP_PLUS, FLOAT_REDEFINED_OP_FLAG)) {
                val = DBL2NUM(RFLOAT_VALUE(recv) + RFLOAT_VALUE(obj));
            }
            else if (RBASIC_CLASS(recv) == rb_cString && RBASIC_CLASS(obj) == rb_cString &&
                     BASIC_OP_UNREDEFINED_P(BOP_PLUS, STRING_REDEFINED_OP_FLAG)) {
                val = rb_str_plus(recv, obj);
            }
            else if (RBASIC_CLASS(recv) == rb_cArray &&
                     BASIC_OP_UNREDEFINED_P(BOP_PLUS, ARRAY_REDEFINED_OP_FLAG)) {
                val = rb_ary_plus(recv, obj);
            }
            else {
                goto INSN_LABEL(normal_dispatch);
            }
        }
        else {
            INSN_LABEL(normal_dispatch):
            PUSH(recv);
            PUSH(obj);
            CALL_SIMPLE_METHOD(recv);
        }
    }

As the listing shows, it handles addition for fixnums (falling back to big numbers on overflow), floats, strings and arrays, and otherwise performs a normal method dispatch.

14- Next it “returns” the result of the operation, which is calculated in postfix notation: the two values that the “getlocal” instructions pushed for the same variable integer are popped, and the result, which in this case is 10, is pushed onto the stack. So the value 10 is pushed onto the stack and the two 5's are popped. At that point the trace instruction is called, and then it executes the “end” keyword written in Ruby to “leave” the block. This is done via the “leave” instruction, which in turn uses the C function vm_pop_frame() to pop the frame that is currently in execution (or possibly the instruction sequence that was pushed previously).

    INSN_ENTRY(leave){
    {
      VALUE val = TOPN(0);

      DEBUG_ENTER_INSN("leave");
      ADD_PC(1+0);
      PREFETCH(GET_PC());
      POPN(1);
      . . .
    }

    DEFINE_INSN
    leave
    ()
    (VALUE val)
    (VALUE val)
    {
        if (OPT_CHECKED_RUN) {
            if (reg_cfp->sp != vm_base_ptr(reg_cfp)) {
                rb_bug("Stack consistency error (sp: %"PRIdPTRDIFF", bp: %"PRIdPTRDIFF")",
                       VM_SP_CNT(th, reg_cfp->sp), VM_SP_CNT(th, vm_base_ptr(reg_cfp)));
            }
        }

        RUBY_VM_CHECK_INTS(th);

        if (UNLIKELY(VM_FRAME_TYPE_FINISH_P(GET_CFP()))) {
            vm_pop_frame(th); // The control frame currently executed will be popped.
    #if OPT_CALL_THREADED_CODE
            th->retval = val;
            return 0;
    #else
            return val;
    #endif
        }
        else {
            vm_pop_frame(th);
            RESTORE_REGS();
        }
    }

15- Then it goes back to the puts line, where the function that outputs something on the screen is called. This means that the “putself” YARV instruction comes into play at this point, with the “opt_send_simple” YARV instruction executed to output 10 on the screen. That is the end of the execution, and this instruction is then popped from the YARV stack.

The main aim of this section was to describe how YARV “pushes” onto and “pops” from the stack. There are still more details and functionalities, but our aim was to trace the execution and then see how to pause/resume the execution of the YARV machine.
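The push/pop pattern walked through in the steps above can be imitated with a few lines of Ruby. This is only a toy sketch under our own assumptions, not YARV itself: the method name run_toy_vm, the locals hash standing in for the slots reached through the EP, and the reduced instruction set are all made up for illustration.

```ruby
# A toy operand-stack interpreter, NOT YARV itself: it only mimics the
# push/pop pattern of the instructions discussed above.
def run_toy_vm(instructions)
  stack  = []   # operand stack (what TOPN/POPN/PUSH work on)
  locals = {}   # hypothetical stand-in for the slots addressed through the EP
  instructions.each do |name, operand|
    case name
    when :putobject then stack.push(operand)              # push a literal
    when :setlocal  then locals[operand] = stack.pop      # pop into a local
    when :getlocal  then stack.push(locals.fetch(operand)) # push a local's value
    when :opt_plus
      obj  = stack.pop          # second operand (TOPN(0))
      recv = stack.pop          # first operand  (TOPN(1))
      stack.push(recv + obj)    # push the result
    when :leave then return stack.pop  # return the value on top of the stack
    else raise ArgumentError, "unknown instruction #{name}"
    end
  end
  nil
end

# x = 5; x + x
program = [
  [:putobject, 5],
  [:setlocal, :x],
  [:getlocal, :x],
  [:getlocal, :x],
  [:opt_plus],
  [:leave]
]
puts run_toy_vm(program)
```

The two getlocal entries mirror the observation in step 13 that the same variable is fetched twice before opt_plus pops both values and pushes their sum.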

6.2.3 YARV Instruction Trace:

As we mentioned earlier, the “trace” instruction is a specially made instruction for hooking into the line-by-line execution of YARV instructions, and it serves the TracePoint API and the set_trace_func method. In the insns.def file I added some code that prints the sequence of instructions as they are executed. Let's have a look.

    def add(integ)
      return integ+integ
    end
    x=5
    puts add(x)
    x=6
    puts add(x)
    x=7
    puts add(x)
    x=8
    puts add(x)

Output:

    10
    12
    14
    16
    ~/hello.rb:12:in '<main>' & '<main>'
    ~/hello.rb:16:in '<main>' & '<main>'
    ~/hello.rb:17:in '<main>' & '<main>'
    ~/hello.rb:12:in 'add' & 'add'
    ~/hello.rb:13:in 'add' & 'add'
    ~/hello.rb:14:in 'add' & 'add'
    ~/hello.rb:18:in '<main>' & '<main>'
    ~/hello.rb:19:in '<main>' & '<main>'
    ~/hello.rb:12:in 'add' & 'add'
    ~/hello.rb:13:in 'add' & 'add'
    ~/hello.rb:14:in 'add' & 'add'
    ~/hello.rb:20:in '<main>' & '<main>'
    ~/hello.rb:21:in '<main>' & '<main>'
    ~/hello.rb:12:in 'add' & 'add'
    ~/hello.rb:13:in 'add' & 'add'
    ~/hello.rb:14:in 'add' & 'add'
    ~/hello.rb:22:in '<main>' & '<main>'
    ~/hello.rb:23:in '<main>' & '<main>'
    ~/hello.rb:12:in 'add' & 'add'
    ~/hello.rb:13:in 'add' & 'add'
    ~/hello.rb:14:in 'add' & 'add'

Where “~/” stands for the file path, which changes from one device to another.

If we take a closer look here, we can see how YARV executes the program line by line, which could be beneficial to us in future work. Another program:

    def palindrome(string)
      str=string.downcase;
      str=str.gsub(/[^a-z]/,'')
      return str.eql? str.reverse
    end

    set_trace_func proc { |event, file, line, id, binding, classname|
      puts "#{file}:#{line} #{classname} #{id} called" if event=='c-call'
    }
    palindrome("abcbc")

Output:

    ~/palindrome with set_trace.rb:2 String downcase called
    ~/palindrome with set_trace.rb:3 String gsub called
    ~/palindrome with set_trace.rb:4 String reverse called
    ~/palindrome with set_trace.rb:4 String eql? called
    ~/palindrome with set_trace.rb:1:in '<main>' & '<main>'
    ~/palindrome with set_trace.rb:7:in '<main>' & '<main>'
    ~/palindrome with set_trace.rb:7:in 'block in <main>' & '<main>'
    ~/palindrome with set_trace.rb:8:in 'block in <main>' & '<main>'
    ~/palindrome with set_trace.rb:9:in 'block in <main>' & '<main>'
    ~/palindrome with set_trace.rb:10:in '<main>' & '<main>'
    ~/palindrome with set_trace.rb:7:in 'block in <main>' & '<main>'
    ~/palindrome with set_trace.rb:8:in 'block in <main>' & '<main>'
    ~/palindrome with set_trace.rb:9:in 'block in <main>' & '<main>'
    ~/palindrome with set_trace.rb:1:in 'palindrome' & 'palindrome'

Where “~/” stands for the file path, which changes from one device to another.


The rest can be viewed in Eclipse. Notice here the label “block in <main>”, which refers to the block passed to set_trace_func. This is the piece of code that was added inside the implementation of the “trace” instruction:

    #ifdef RUBY_DEBUGGER_ENABLE
    /* Dokmak Note: call the function that will stop the VM */
    if (beginstopVM) {
        rb_control_frame_t *cfp = th->cfp;
        rb_iseq_t *iseq = cfp->iseq;
        int lineNumber = rb_vm_get_sourceline(cfp);
        char *fileName = RSTRING_PTR(iseq->location.path);

        fprintf(stderr, "%s:%d:in '%s' & '%s' \n",
                fileName, lineNumber,
                RSTRING_PTR(iseq->location.label),
                RSTRING_PTR(iseq->location.base_label));

        rb_thread_t *runTH = iseq->is_entries->once.running_thread;
    }
    #endif
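A similar line-by-line trace can be observed from plain Ruby, without patching insns.def, through the TracePoint API that the “trace” instruction serves. The sketch below is only an illustration; the helper name trace_lines is our own, and the exact set of recorded lines may differ between Ruby versions.

```ruby
# Collect [line, method] pairs for every :line event raised while the
# given block runs, similar in spirit to the fprintf added to "trace".
def trace_lines
  events = []
  tracer = TracePoint.new(:line) do |tp|
    events << [tp.lineno, tp.method_id]
  end
  tracer.enable { yield }   # tracing is active only inside this block
  events
end

def add(integ)
  return integ + integ
end

events = trace_lines do
  x = 5
  add(x)
end

# method is nil for lines that run outside any method (top level)
events.each { |line, meth| puts "line #{line} in #{meth.inspect}" }
```

Each recorded pair corresponds to one line being entered, so lines inside add show up with the method id :add, much like the 'add' labels in the output above.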


6.2.4 Instruction Sequencing:

Instruction sequencing, or simply the “ISEQ” structure, describes the sequence of instructions that are supposed to be executed. Let's look at puts add(integer) in the previous example:

    def add(integer)
      return integer+integer
    end
    x=5
    puts add(x)

Here, puts add(x) first has to execute the add function and then “puts” the result on the screen (if that is a valid output). From what I traced, the ISEQ pointer changes whenever a function call needs to be carried out. Let's look at another example:

    def a
      puts caller
    end

    def b
      puts a
    end

    def c
      puts b
    end

    c


Output:

    caller.rb:6:in `b'
    caller.rb:10:in `c'
    caller.rb:13:in `<main>'

In this example, the ISEQ pointer moves to c, which in turn needs b, and b needs a. After a is executed, control passes back to b to output the result of a, and then c outputs the result of b. By “output” I mean the caller location of each function (its line number, its file and its scope or control frame).

Also try adding this line to a new Ruby file:

    RubyVM::InstructionSequence.compile('puts 2+2').to_a

Output using interactive Ruby (irb):


That line outputs the actual sequence of instructions needed to execute puts 2+2: push 2, push 2 again, perform the addition, which pops both 2's and pushes 4, and then output 4 on the screen through the routine inside the “opt_send_simple” YARV instruction, which calls the function in io.c that is responsible for printing the result on the console.

Important note: Since the simple script above (“puts 2+2”) contains no Ruby objects or classes, the self pointer is set to the default “top self” object. This is an instance of the Object class that Ruby creates automatically when YARV starts up. It serves as the receiver for method calls and the container for instance variables in the top-level scope.
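To inspect such a sequence without reading the whole array, the instruction names can be pulled out of the compiled result. The sketch below assumes MRI (RubyVM is not available on other Ruby implementations) and assumes that the last element of to_a is the instruction body; the helper name instruction_names is ours, and the exact names vary between Ruby versions (for example, the send instruction has been renamed since Ruby 2.1).

```ruby
# MRI-only sketch: list the instruction names of a compiled snippet.
def instruction_names(source)
  # Assumption: the last element of the array form is the instruction body.
  body = RubyVM::InstructionSequence.compile(source).to_a.last
  # Instructions are arrays such as [:putobject, 5]; bare integers and
  # symbols in the body are line numbers, labels and event markers.
  body.select { |entry| entry.is_a?(Array) }.map { |insn| insn.first.to_s }
end

puts instruction_names("x = 5\nputs x + x")

# A human-readable listing of the same kind of information:
puts RubyVM::InstructionSequence.compile("puts 2+2").disasm
```

The listing makes the order visible: the receiver and the operands are pushed first, then the addition runs, and only then is the call to puts sent.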

6.2.5 How variables are stored in the stack:

This is a small section that will be expanded as we explore more of the source code. What I am sure about is that local and global (“main”) variables are stored on the stack, but we are still unsure whether that is the original YARV stack or another stack related to the EP. This EP pointer serves as a pointer to local variables on the stack and tends to be involved in arithmetic and logical operations. Let's have a look at our previous sample program. The instructions “setlocal” and “getlocal” are used to set the value of x, which is 5, or to get the value of integer, which is also 5, so they are pushed on top of the stack.

    INSN_ENTRY(setlocal_OP__WC__0){
    {
    #define level 0
      lindex_t idx = (lindex_t)GET_OPERAND(1);
      VALUE val = TOPN(0); // possibly gets the x variable and push it with the value 5

      DEBUG_ENTER_INSN("setlocal_OP__WC__0");
      ADD_PC(1+1);
      PREFETCH(GET_PC());
      POPN(1); // it'll pop the instruction of x “putobject” and will insert its value possibly another stack.
      . . .
    }


Here it stores the value that “putobject” pushed for x, which is 5.

    DEFINE_INSN
    setlocal
    (lindex_t idx, rb_num_t level)
    (VALUE val)
    ()
    {
        int i, lev = (int)level;
        VALUE *ep = GET_EP();

        for (i = 0; i < lev; i++) {
            ep = GET_PREV_EP(ep);
        }
        *(ep - idx) = val;
    }

You might have noticed that the macro GET_EP() is involved whenever an object (in this case an integer, a Fixnum in Ruby) is read or written. The environment pointer is used only for objects that represent values in Ruby, which will be our focus in future work.

Finally, the YARV section is still incomplete and certainly full of surprises and interesting things to learn. What is left to explore in this section is whether there is just one stack, the YARV stack, or whether more stacks are involved.
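The way setlocal and getlocal walk `level` environments outwards before touching a slot can be modeled in Ruby. This is a toy model under our own assumptions, not the VM's real memory layout: the Env struct, its slots array and its prev link are hypothetical stand-ins for the EP, the local slots and GET_PREV_EP(ep).

```ruby
# A toy model of the environment chain, NOT the VM's real layout.
# Each Env holds the local slots of one scope plus a link to the
# enclosing scope, which plays the role of GET_PREV_EP(ep).
Env = Struct.new(:slots, :prev)

# Walk `level` links outwards, like the for-loop in getlocal/setlocal.
def env_at(env, level)
  level.times { env = env.prev }
  env
end

def getlocal(env, idx, level)
  env_at(env, level).slots[idx]
end

def setlocal(env, idx, level, val)
  env_at(env, level).slots[idx] = val
end

main_env  = Env.new([nil], nil)        # top-level scope with one local, x
block_env = Env.new([nil], main_env)   # a block nested inside it

setlocal(main_env, 0, 0, 5)            # x = 5 at the top level (level 0)
puts getlocal(block_env, 0, 1)         # the block reads x one level up
```

With level 0 the current environment is used directly, which matches the specialised setlocal_OP__WC__0 form shown above.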


6.3 Method Calls

6.3.1 Introduction to method “function” calls:

A method in Ruby is a set of expressions that returns a value. With methods, one can organize code into subroutines that can be easily invoked from other areas of the program. Other languages sometimes refer to this as a function. A method may be defined as part of a class or separately. Methods are called using the following syntax:

    method_name(parameter1, parameter2,…)

If the method has no parameters, the parentheses can usually be omitted, as in the following:

    method_name

If you don't have code that needs to use the method result immediately, Ruby allows you to specify parameters without parentheses. When the result is used immediately, as with .reverse below, the parentheses are needed:

    results = method_name parameter1, parameter2
    results = method_name(parameter1, parameter2).reverse

Methods are defined using the keyword “def” followed by the method name. Method parameters are specified between parentheses following the method name. The method body is enclosed by this definition at the top and the word end at the bottom. By convention, method names that consist of multiple words have each word separated by an underscore.

    def output_something(value)
      puts value
    end


Methods return the value of the last statement executed. The following code returns the value x+y.

    def calculate_value(x,y)
      x+y
    end
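The calling conventions described in this section can be tried together in one short script. It is only a small illustration; the method greeting is made up for the example.

```ruby
def calculate_value(x, y)
  x + y            # the last expression is the return value
end

def greeting       # a method without parameters
  "hello"
end

sum      = calculate_value(2, 3)                 # call with parentheses
same_sum = calculate_value 2, 3                  # parentheses omitted
reversed = calculate_value("ab", "cd").reverse   # parentheses needed to chain
puts sum, same_sum, reversed, greeting
```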

6.3.2 Control Frames and the Control Frame pointer:

Let's look at our previous sample program:

    def add(integer)
      integer+integer
    end
    x=5
    puts add(x)

Here the function add has a control frame, the top-level program has a control frame and puts has a control frame, so there are 3 control frames in this program. A control frame holds a lot of information about a function, a block, a loop such as “100.times”, a proc (short for “procedure”), a lambda (similar to a proc) and so forth. The last three are examples of “blocks” in Ruby. So far we have focused on function calls, in order to be able to get the call stack in Ruby. Let's take a look at the components of a control frame in Ruby:



The program counter: a “virtual” register that points to (holds the memory address of) the next instruction to be executed in Ruby.

VALUE* sp

The stack pointer: points to the top of the value stack used by this control frame.

VALUE flag

Possibly a flag related to the control frame → Still unknown

VALUE self

So far, self looks similar to the “self” keyword in the Ruby language, which is much like the “this” keyword in Java: it is the receiver that is using this control frame → Still not clear

VALUE klass

The Ruby class type of this control frame → Still unknown and confusing

VALUE* ep

Possibly the control frame can have arguments, such as function call arguments, or local variables inside this block, which are referenced by the ep pointer. It could possibly also hold the last local variable accessed by the YARV machine when it used the control frame → Still needs more exploration

rb_iseq_t* block_iseq

Maybe some blocks have a sequence of YARV instructions that is required to execute a block call, possibly similar to the iseq struct used in the file insns.def → Still not tested or used

VALUE proc

Unknown and confusing.

const rb_method_entry_t* me

Each and every method or function, whether it is written in Ruby or implemented in C (a method like puts must have a function implemented in the C language), must have some sort of method entry created for it. rb_method_entry_t is itself a struct; let's have a look at it.

rb_method_flag_t flag

Flags attached to each method entry → Still unknown

char mark

Unknown.

rb_method_definition_t* def

The method definition is a structure; the main thing about it is that it holds “detailed” information about the components of each function (the function definition, its arguments, its body and so forth).

ID called_id

The id of the method caller. It could be another method or a recv (“receiver”), that is, an instance of a class that receives the method call: in tea.put_cream(), tea is the receiver of the call.

VALUE klass

This is the class of the method. Its exact role is still unknown and confusing.
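Some of the information kept in control frames and method entries is also visible from Ruby itself, which may help when checking what the C structures should contain. The sketch below is only an illustration at the Ruby level; it does not read rb_control_frame_t or rb_method_entry_t directly, and the helper describe is our own.

```ruby
# Each entry of caller_locations corresponds to one frame above the
# current one, so it gives a Ruby-level view of the call stack.
def a
  caller_locations.map(&:base_label)
end

def b
  a
end

def c
  b
end

# A Method object exposes a few facts that the VM keeps per method.
def describe(name)
  m = method(name)
  { name: m.name, owner: m.owner, arity: m.arity, parameters: m.parameters }
end

def add(integer)
  integer + integer
end

p c.first(2)        # the frames of b and c, innermost first
p describe(:add)
```

Here owner plays a role comparable to the klass field, and parameters reflects part of what the method definition stores about arguments.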


6.3.3 C method calls and Ruby method calls:

We must differentiate between C method calls and Ruby method calls. Ruby method calls are calls to methods defined at the higher level (written in Ruby), while C method calls are calls to methods written in C that provide some functionality to Ruby. Both kinds of methods have their control frames inserted in the stack, which raises the question of how we are going to get our call stack from this control frame stack or YARV stack; it is still unknown whether the control frames have their own separate stack. Once we find the data structure that holds all control frames, we can check the type of each call, which will mainly be “vm_call_general”, meaning a method written at the higher level (in Ruby). This section will be extended in future work, because C method calls and Ruby method calls are related internally.

Finally, we've reached the end of this section, and of course we'll discover and learn more and more about Ruby internals. The challenge is still on.
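From the Ruby side, one simple way to tell the two kinds of methods apart is Method#source_location, which returns nil for methods implemented in C. The sketch below relies only on that behaviour; the helper name implemented_in_ruby? is ours. Note that a non-nil result only means the method has Ruby source, which in newer Ruby versions can include some core methods.

```ruby
# Method#source_location is nil for methods implemented in C and
# [file, line] for methods defined in Ruby source.
def implemented_in_ruby?(meth)
  !meth.source_location.nil?
end

def add(integer)
  integer + integer
end

puts implemented_in_ruby?(method(:add))           # defined in Ruby above
puts implemented_in_ruby?("abc".method(:upcase))  # String#upcase is written in C
```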

6.4 Approach for Pause/Resume of the YARV machine

6.4.1 Thread Signaling

Before we go into further detail, we first have to define what a thread is and how it relates to this heading. A thread, also referred to as a “light-weight process”, allows multiple concurrent executions within a single process running on the CPU. Ruby, as mentioned above, is a single process with a single “main thread”, which is represented by the (rb_thread_t) structure. So far, then, Ruby follows the approach of a single process with a single thread. This makes us wonder how Ruby is able to create threads at the “higher level”. The answer lies in the fact that the single “main thread” is able to hold other threads along with it (in the source code: living threads, running threads, etc.), and their type is (rb_thread_t) as well. So Ruby contains a main thread and, along with it, the threads that are created at the higher level of programming in Ruby.


How did we start <Thread Sleep>?

We started by debugging the gets() function, which is responsible for “waiting” for input from the user and so stops the execution of Ruby until the user takes an action. After he/she presses the Enter key, Ruby, or more precisely its main thread, resumes working.

1st program used:


This function led us to widen our search and debug another program.

2nd program used:

    sleep 6   # sleep_timeval(..)
    /* Stops execution for a certain quantum of time specified by the timeval struct. */

    sleep     # sleep_forever(...)
    /* Stops the main thread from executing forever. */

    sleep_timeval(GET_THREAD(), time, 1)
    /* time is an instance of the timeval struct having two parameters */
    /* Insert one of them inside the implementation of the "trace" YARV instruction and recompile. */

    struct timeval {
        time_t      tv_sec;   // Time in seconds
        suseconds_t tv_usec;  // Time in microseconds
    };


Let's look at the function sleep_forever. ‘sleep_forever(GET_THREAD(), 0, 1)’ sleeps forever (without being interrupted): the thread cannot be signalled to resume. It takes the main thread as its first argument, and the arguments 0 and 1 are essential for the function to work (they relate to deadlock checking and concurrency). This has a great disadvantage, because it stops the virtual machine forever without the ability to resume it again, so it is a bad approach. But it proves that threads are involved if we want to control instruction execution.

Notes:
• If time.tv_sec is assigned INFINITY (the C macro referring to infinity), it works like the function sleep_forever(...).
• Remove the static keyword from the method stub and from the method implementation itself so that it can be accessed from other files.
• Include <thread.c> inside the insns.def file that contains the implementation of the "trace" YARV instruction.
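For comparison, at the Ruby level a thread that sleeps with no timeout can still be woken by another thread, which is exactly what is missing when the only running thread is put to sleep inside the VM. The sketch below shows this with plain Ruby threads; the helper name sleep_until_woken is ours, and it says nothing about how sleep_forever behaves when called from inside the trace instruction.

```ruby
# Wake a thread that is sleeping "forever" from another thread.
def sleep_until_woken
  sleeper = Thread.new do
    sleep          # no argument: sleep until another thread wakes us
    :resumed
  end
  Thread.pass until sleeper.status == "sleep"  # wait until it is asleep
  sleeper.run      # wake it up and let it be scheduled
  sleeper.value    # join the thread and return the block's result
end

puts sleep_until_woken
```

The important point is that the wake-up comes from a second thread, so some thread must stay runnable if a paused VM is ever to be resumed.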

Perfect solution: Thread Signaling

A signal is a software interrupt delivered to a process. The operating system uses signals to report exceptional situations to an executing program. Some signals report errors such as references to invalid memory addresses; others report asynchronous events, such as disconnection of a phone line.

The GNU C library defines a variety of signal types, each for a particular kind of event. Some kinds of events make it inadvisable or impossible for the program to proceed as usual, and the corresponding signals normally abort the program. Other kinds of signals that report harmless events are ignored by default.

If you anticipate an event that causes signals, you can define a handler function and tell the operating system to run it when that particular type of signal arrives. Finally, one process can send a signal to another process; this allows a parent process to abort a child, or two related processes to communicate and synchronize.


This means that we can control Threads through signals. For more information, please check our references.
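The handler mechanism described above can be tried from Ruby with Signal.trap and Process.kill. The sketch below is an illustration under assumptions: it needs a Unix-like system, and it uses SIGUSR1 purely as an example signal, since SIGSTOP itself cannot be caught by a handler on POSIX systems. The helper name signal_round_trip is ours.

```ruby
# Install a handler for SIGUSR1, send the signal to our own process,
# and wait (up to `timeout` seconds) for the handler to run.
def signal_round_trip(timeout = 2.0)
  received = false
  Signal.trap("USR1") { received = true }   # handler runs when USR1 arrives
  Process.kill("USR1", Process.pid)         # send the signal to ourselves
  deadline = Time.now + timeout
  sleep 0.01 until received || Time.now > deadline
  received
ensure
  Signal.trap("USR1", "DEFAULT")            # restore the default disposition
end

puts signal_round_trip
```

The handler only sets a flag here; a debugger would use the same hook to decide whether execution should stop or continue.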

What was actually done?

    *************
    **In insns.def**
    *************
    #ifdef RUBY_DEBUGGER_ENABLE
    /* Dokmak Note: call the function that will stop the VM */
    if (beginstopVM) {
        /* Some code <Refer to it above> */
        . .
        /***************New Code*****************/
        char keyPressed = getchar();
        //if(keyPressed != '\n') // If the key pressed is not the “ENTER” key
        //{
        //    rb_signal_exec(th, SIGSTOP);
        //}
        //else if(keyPressed == '\n')
        //{
        //    rb_signal_exec(th, 0);
        //}
        /****************************************/
    }
    #endif

‘rb_signal_exec(th, SIGSTOP)’ sends a signal to the main thread [th] to stop execution.

Observation while debugging the source code: SIGSTOP is treated as an interrupt signal to the thread, so there is an interrupt handler, implemented in the file signal.c, which also covers various other signals, e.g. SIGTERM, which terminates the thread that is to be interrupted.

Note: We should include signal.c in the insns.def file as well, to be able to see the signal macros.

‘rb_signal_exec(th, 0)’ means continue execution: ‘0’ means that it sends an EXIT interrupt, which directs the thread to continue its execution.
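The pause-then-resume behaviour we are after can be sketched at the Ruby level with a mutex and a condition variable. This is not the rb_signal_exec approach above, only a plain-Ruby model of it: the class PauseGate, its checkpoint method and the helper run_with_gate are hypothetical names introduced for illustration.

```ruby
# A gate that running code checks at every point where it may be paused.
class PauseGate
  def initialize(paused: false)
    @mutex  = Mutex.new
    @cond   = ConditionVariable.new
    @paused = paused
  end

  def pause
    @mutex.synchronize { @paused = true }
  end

  def resume
    @mutex.synchronize do
      @paused = false
      @cond.broadcast            # wake every thread parked at a checkpoint
    end
  end

  # Blocks the calling thread for as long as the gate is paused.
  def checkpoint
    @mutex.synchronize { @cond.wait(@mutex) while @paused }
  end
end

# Runs `steps` units of work in a thread, checking the gate before each.
def run_with_gate(gate, steps)
  done = []
  worker = Thread.new do
    steps.times do |i|
      gate.checkpoint
      done << i
    end
  end
  [worker, done]
end

gate = PauseGate.new(paused: true)
worker, done = run_with_gate(gate, 3)
Thread.pass until worker.status == "sleep"   # worker is parked at checkpoint
puts "paused, steps done: #{done.size}"
gate.resume
worker.join
puts "resumed, steps done: #{done.size}"
```

In a debugger, the checkpoint call would sit where the trace hook fires for each line, and resume would be triggered by the user's key press.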


FUTURE WORK

As you have seen, we have come a long way in understanding Ruby's source code, from staring at the code for hours and feeling hopeless to wandering freely in it, searching for specific functions and structs and even making changes to it. Our focus now is to finish the Memory Profiler as fast as possible, in parallel with learning more about Ruby's VM. Accomplishing this task will also help us in the line debugging and Time Profiler tasks, as it will supply us with tools and experience that should make the next steps easier and faster. Below we state some steps that we think will help us achieve our goal.

A. Virtual Machine:

1. Looking for the actual stack that holds local/global variables, the stack that holds the control frames and the stack that holds instruction sequences, or finding out whether they are all handled in one stack, the YARV stack holding the YARV instructions. This will be our basis for forming a “call stack” and “break points on functions and other blocks”.
2. Pausing and resuming the virtual machine: trying to pause the execution on some line of code and get some information from this line (“line debugging”).
3. By understanding point 1, we will be able to link the line debugging work with the memory profiling work, on the basis of obtaining the values of objects at each pause of execution.
4. Solving the problems that Ruby developers face with the current debuggers, and then moving on to the problems of other debuggers.
5. Tracing the gets() functionality in Ruby, which waits for input from the user: it could serve as a method that stops the execution of Ruby and resumes it once the input is taken from the user.


B. Memory Profiler:
1. Accessing objects from the stack: We tried using the API described previously in chapter 5, but it did not give us the desired functionality. We are therefore going to try accessing the stack by changing the source code, as this will give us more control over the functionality we need.
2. Objspace API: We spoke earlier about this API and noted that its implementation was enhanced in Ruby 2.1. We need to explore it and start using it to get nearer to our goals. Its implementation can be found in the objspace extension in the Ruby source, and we are mainly interested in these files:
• gc.c
• objspace.c
• objspace_dump.c
• objspace_tracing.c
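At the Ruby level the extension is loaded with require 'objspace'. As a small sketch of its size-related calls, the following uses ObjectSpace.memsize_of and ObjectSpace.count_objects_size. The variable names are illustrative, and the exact byte counts depend on the Ruby version and platform.

```ruby
require 'objspace'

small = "a"
large = "a" * 10_000

# memsize_of reports the memory consumed by one object, in bytes.
small_size = ObjectSpace.memsize_of(small)
large_size = ObjectSpace.memsize_of(large)

# count_objects_size returns a hash of total bytes per internal type,
# with the grand total under :TOTAL.
totals = ObjectSpace.count_objects_size

puts "small string: #{small_size} bytes"
puts "large string: #{large_size} bytes"
puts "all strings:  #{totals[:T_STRING]} bytes"
puts "all objects:  #{totals[:TOTAL]} bytes"
```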

The only function that has been useful to us so far is the memory size function. It is time to start testing and using the rest of the API.
3. Scope: We want to get the scope of an object, to identify where it was declared, called, or changed. Knowing the scope would help our users track memory more efficiently. We explored this part extensively but have not produced an output yet. In our trials we found three paths that may take us to our goal; otherwise we will need to reach the scope through the VM's features.

I - Getting the function location and its parameters:
• proc.c contains most of the functions that may help here, such as rb_method_parameters and rb_method_location.
• Some of the proc.c functions and other useful macros are defined in method.h, such as rb_method_entry_location and rb_obj_method_location.
• We are still trying to use these functions, either through a header or by re-implementing them.
• A strange bug appeared here that we are still working on: after including a header file, some of its functions are not visible. They may need an extern declaration or some other solution we have not found yet.

II - Using the functions of objspace_tracing.c:
objspace_tracing.c has functions such as allocation_sourceline, allocation_class_path, and allocation_method_id. We are still exploring how to use these functions or re-implement them for our purpose. Koichi Sasada provided the following interesting information:

From Ruby 2.1, we can store object allocation related information with ObjectSpace.trace_object_allocations. Currently, this feature supports storing the following information about "where/when".
* Where
  * file name
  * line number
  * class name (representing String)
  * method id
* When
  * GC generation (This is not about "Generational GC". It is GC.count at allocation)

To see the expected output of the objspace tracing functions, you can try the following in Ruby:

####
require 'objspace'

def where_are_you_from? obj
  file = ObjectSpace.allocation_sourcefile(obj)
  line = ObjectSpace.allocation_sourceline(obj)
  "#{file}:#{line}" if file && line
end

ObjectSpace.trace_object_allocations{
  $a = a = Object.new      # created at t.rb:10
  b = Object.new           # created at t.rb:11
  p where_are_you_from?(a) #=> "t.rb:10"
  p where_are_you_from?(b) #=> "t.rb:11"
}

c = Object.new             # created at t.rb:16
p where_are_you_from?($a)  #=> "t.rb:10"
p where_are_you_from?(c)   #=> nil

ObjectSpace.trace_object_allocations_clear
p where_are_you_from?($a)  #=> nil
####

In this listing, Object.new stands in for the allocated values, and the file is assumed to be saved as t.rb with the blank lines shown, so that the line numbers in the comments match.
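Path I can also be explored from the Ruby side before working with the C headers. As far as we can tell, rb_method_location and rb_method_parameters in proc.c back the Ruby methods Method#source_location and Method#parameters, so the following sketch shows the kind of output to expect. The method deposit and its arguments are hypothetical.

```ruby
# Hypothetical method whose location and parameters we want to inspect.
def deposit(amount, note = nil, *tags, currency: "EGP", &callback)
end

m = method(:deposit)

# source_location returns [file, line] for methods written in Ruby.
file, line = m.source_location

# parameters returns [kind, name] pairs describing the signature.
params = m.parameters

puts "defined at #{file}:#{line}"
params.each { |kind, name| puts "#{kind}: #{name}" }
```

For methods implemented in C, source_location returns nil, so a profiler built on this path would need a fallback for core methods.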

III - Using Nodes:
Since Ruby code is converted into AST nodes, one idea is to obtain the node that corresponds to an object and derive the object's information from the node's information. On the node side, we implemented ways to get a node's line (nd_line), its type, and so on, although the outputs were unexpected. We are working on finding out whether it is possible to pass the object's node or not. We are also still considering the efficiency of this path.

IV - Integration:
Until now, all our work has been about gathering as much information as we can. We still need to integrate these outputs so that we can report the information of each object as well as the information of the whole memory, which is what a strong memory profiler requires.
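The integration step can be pictured with a minimal sketch that combines the pieces discussed above using only the public objspace API from Ruby 2.1: the memory size function and the allocation tracing functions. The report structure and the sample objects are assumptions made for illustration, not the planned profiler design.

```ruby
require 'objspace'

report = []

ObjectSpace.trace_object_allocations do
  # Sample objects allocated while tracing is active.
  objects = [Object.new, String.new("text"), Array.new(3)]

  objects.each do |obj|
    report << {
      class:         obj.class.name,
      bytes:         ObjectSpace.memsize_of(obj),
      file:          ObjectSpace.allocation_sourcefile(obj),
      line:          ObjectSpace.allocation_sourceline(obj),
      method_id:     ObjectSpace.allocation_method_id(obj),
      gc_generation: ObjectSpace.allocation_generation(obj)
    }
  end
end

# Per-object information.
report.each do |r|
  puts "#{r[:class]} (#{r[:bytes]} bytes) allocated at #{r[:file]}:#{r[:line]}"
end

# Whole-memory information for comparison.
puts "total: #{ObjectSpace.count_objects_size[:TOTAL]} bytes"
```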

