Visualizing movie data 2: Time Series
by Henk Lamers


A long time ago I heard the word ‘ubiquitous’ for the first time and was curious what it meant. I looked it up: it means present, appearing, or found everywhere. Time series graphs are exactly that, a type of graph you can find everywhere. Because this project is about visualizing our movie data, I need three or more data sets to compare with each other. I hope to find out how our ratings relate to those of IMDb (Internet Movie Database), Metacritic and/or Rotten Tomatoes. For example, I would like to see our results for the first one hundred films next to the results from these websites. That would allow me to draw conclusions. It will involve a lot of manual work, but for now I accept that because I am in a learning process.

143 Movie reviews of the 200 movies that we watched in 2015.


The shape of data

I felt it necessary to briefly sketch the context in which the time series originated. Everyone who works with Excel or another software package knows the bar graph. But where does it originally come from? Not from Microsoft. The first ‘inventors’ of the bar graph are Johann Heinrich Lambert (1728-1777) and William Playfair (1759-1823). At least, that is generally claimed. If you search a bit further back in history, however, there is another person who influenced both of them: Joseph Priestley (1733-1804). He is best remembered for his pioneering work in chemistry, in particular for the discovery of oxygen, which he isolated in its gaseous state. But he was also a prolific theologian, an innovative educator, and a liberal political philosopher.

Johann Heinrich Lambert

In 1765 Priestley published the ‘Chart of Biography’, which is the first use of a line to indicate the length of a lifetime. It covers a fixed timespan, from 1200 BC to AD 1800, and includes two thousand names. Priestley organized his list in six categories: Statesmen and Warriors, Divines and Metaphysicians, Mathematicians and Physicians, Poets and Artists, Orators and Critics, and Historians and Antiquarians (lawyers were placed here). Priestley’s principle of selection was fame, not merit; therefore, as he mentions, ‘the chart is a reflection of current opinion’. The chart was also arranged in order of importance: ‘statesmen are placed on the lower margin, where they are easier to see, because they are the names most familiar to readers’.

These timelines directly inspired William Playfair’s invention of the bar chart, which first appeared in his Commercial and Political Atlas, published in 1786. Playfair was driven to this invention by a lack of data. In his Atlas he had collected a series of 34 plates about the imports and exports of different countries over the years, which he presented as line graphs or surface charts: line graphs shaded or tinted between abscissa and function. (The abscissa of a point is the number whose absolute value is the perpendicular distance of the point from the vertical axis. For the point (2, 3), 2 is called the abscissa and 3 the ordinate.) Because Playfair lacked the necessary time series data for Scotland, he graphed its trade data for a single year as a series of 34 bars, two (imports and exports) for each of 17 trading partners. For Playfair, graphics were preferable to tables because graphics showed the shape of the data. He invented several types of diagrams: in 1786 the line, area and bar chart of economic data, and in 1801 the pie chart and circle graph, used to show part-whole relations.

Johann Heinrich Lambert was a Swiss polymath who made important contributions to mathematics, physics (particularly optics), philosophy, astronomy and map projections. He was the first to prove that the number π (pi) is irrational, that is, that it cannot be written as a fraction of two integers. A good example of his graphs shows how the temperature varies during the course of the year at various depths below ground level.

Joseph Priestley

William Playfair


Detail from Joseph Priestley’s Chart of Biography.

In this bar chart Scotland’s imports and exports from and to 17 countries in 1781 are represented.

Lambert constructed a ‘suitable’ curve through the points that correspond with the observations. The graph below shows how this curve changes with the depth below ground.


VMD_02_01

I could imagine placing the numbers 1 to 10 on the left side of the graph and the titles of the films at the bottom. That seems logical, but it is not, because movie titles can be very long. For example: ‘A Pigeon Sat on a Branch Reflecting on Existence’. So you would rather expect the film titles on the left side and the numbers 1 to 10 at the bottom. At this moment I think the best solution would be to display the movie title when you place your cursor on a data point. But maybe I am getting ahead of myself. For now I have loaded the original data of Ben Fry’s Time Series chapter into the program and changed the display format.


VMD_02_02

When focusing on the data, the first thing you notice about the IMDb, Metacritic and Rotten Tomatoes reviews is that they work with floats. Our own movie data also works with floats, but the end results are ints. So I actually have to run all 100 film programs again and see what the end result is using floats. When I have those results, I have to type them into a text file. Then I do the same with the results of IMDb, Metacritic and Rotten Tomatoes. In the end I left out Metacritic. It sometimes happens that we have seen a film that cannot be found on IMDb or Rotten Tomatoes. In that case, the film gets a zero.

The first thing I noticed in the chart that uses our own data is that it looks quite messy. There is no real logic in the positioning of the points. The reason is that our films are chosen randomly, which results in random positions for the points. The sequence is, however, the real sequence of the first 100 films we have seen in 2015. Furthermore, the points are all positioned near the bottom. This is caused by the largest value in the other data series: our data set ranges from 0.0 to 10.0, while the other two data sets range from 5.1 to 46.4. These other two sets therefore still have to be adjusted, but I do not have the right data for them yet.
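Why the points end up near the bottom can be shown with a small calculation. The sketch below is written in Python rather than Processing and is only an illustration: the function map_range mimics a linear rescaling such as Processing's map(), and the plot edges (440 and 40 pixels) are assumed values, not taken from the actual program.

```python
def map_range(value, in_min, in_max, out_min, out_max):
    """Linearly rescale value from [in_min, in_max] to [out_min, out_max]."""
    return out_min + (value - in_min) / (in_max - in_min) * (out_max - out_min)

# Assumed plot area: y = 440 is the bottom edge, y = 40 is the top edge.
PLOT_BOTTOM, PLOT_TOP = 440.0, 40.0

def y_position(score, data_max, data_min=0.0):
    """Vertical pixel position of a score when the axis runs up to data_max."""
    return map_range(score, data_min, data_max, PLOT_BOTTOM, PLOT_TOP)

# A perfect 10.0 drawn on an axis that runs up to 46.4 (the other series' maximum)
y_shared = y_position(10.0, data_max=46.4)
# The same score drawn on an axis that runs up to 10.0
y_own = y_position(10.0, data_max=10.0)

print(round(y_shared, 1), round(y_own, 1))
```

With the shared maximum, even the best possible score stays in the lowest quarter of the plot; with an axis from 0.0 to 10.0 it reaches the top edge.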


VMD_02_03

At this moment I have added all the scores from IMDb and Rotten Tomatoes. I can now press the ‘]’ and ‘[’ keys to step through the three different graphs. It all looks a bit sparse, but you do get an impression of how the scores are distributed. I have also added titles as placeholders: We, IMDb and RT (Rotten Tomatoes).
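Stepping through the data sets with the two bracket keys comes down to moving an index forwards or backwards. The following is a minimal Python sketch of that idea, not the actual Processing code; it assumes that the selection wraps around at both ends, and the name next_index is made up for this example.

```python
DATASETS = ["We", "IMDb", "RT"]  # the three placeholder titles

def next_index(current, key, count=len(DATASETS)):
    """Return the data set index after a key press (assumed to wrap around)."""
    if key == "]":
        return (current + 1) % count   # forward, wrapping from last to first
    if key == "[":
        return (current - 1) % count   # backward, wrapping from first to last
    return current                     # any other key leaves the view unchanged

index = 0
for key in "]]][":
    index = next_index(index, key)
    print(key, DATASETS[index])
```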


VMD_02_04

I have increased the number of films to 150. It now looks somewhat less sparse. Eleven films have no rating on Rotten Tomatoes, which puts them at zero at the bottom of the chart. These films are, however, rated on IMDb and by us.


VMD_02_05

At the bottom, I added the numbers of the films we have seen. I also reduced the white background space slightly, so that everything is shown less cramped in the display window. It would be even better if you could read the titles of the movies instead of our numbering, but perhaps I can add that at a later stage. After all, these graphs are only about comparing our voting behaviour with IMDb and RT. A quick look suggests that our scores are slightly more widely spread. They range from 3.3 to 5.9 points. IMDb ranges from 4.1 to 9.3. The Rotten Tomatoes scores go from 4.5 to 9.8 (if you do not count the 0.0).
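Leaving out the 0.0 placeholders when reading off a range can also be done in code. This is a small Python sketch under the assumption that a score of exactly 0.0 always means "not rated"; the list rt_sample contains invented values for illustration, not the real Rotten Tomatoes data.

```python
def score_range(scores, missing=0.0):
    """Return (lowest, highest) score, ignoring the placeholder for unrated films."""
    rated = [s for s in scores if s != missing]
    if not rated:
        return None  # nothing was rated at all
    return min(rated), max(rated)

# Invented sample: two unrated films (0.0) among four rated ones.
rt_sample = [0.0, 4.5, 7.2, 9.8, 0.0, 6.1]
print(score_range(rt_sample))
```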


VMD_02_06


I have added horizontal and vertical grid lines that may help to compare the data points. The scores from 0.0 to 10.0 are now displayed on the left side of the graph. As a result, there is no need for additional tick marks; the horizontal and vertical lines do that work instead. The score numbers were displayed with too many digits: four after the decimal point, because we are working with floats. The function ceil does not help in this case, because it rounds everything up, and floor rounds everything down. The function I’ve used is nf. Now just one digit is shown after the decimal point.
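The difference between rounding to a whole number and formatting with one decimal is easy to see in a few lines. The sketch below uses Python, where a format specification plays the role that nf plays in Processing; the helper name one_decimal and the sample score are made up for this illustration.

```python
import math

def one_decimal(score):
    """Format a score as text with exactly one digit after the decimal point."""
    return f"{score:.1f}"

score = 7.3456
print(math.ceil(score))    # rounds up to a whole number
print(math.floor(score))   # rounds down to a whole number
print(one_decimal(score))  # keeps one decimal for the axis label
```

Both ceil and floor throw away the decimal part altogether, which is why neither is suitable for labels such as 7.3.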

I use two weights of Futura: Medium and Bold. I also labelled the numbers, which makes the chart clearer.


VMD_02_07

I now go ahead and replace the points with a line. Actually this is a bit rubbish: the scores of the films have nothing to do with each other. Each film’s score is a value on its own, so there is no need to connect them with a line. But as a variation it is perhaps interesting. I also changed the colours. The white field is replaced with a dark gray, because the coloured lines stand out better against it.


VMD_02_08

In this version all scores are displayed on top of each other to see where the differences are. The title should change along with the data set you choose. But I don’t like it anyway. It is a poor and chaotic image, so this does not seem to be a good option.


VMD_02_09

I have now retrieved some elements from one of the earlier sessions. The connecting lines remain blue and the points themselves are white. The points are the most important, so they are allowed to stand out. I’ve made them a little smaller. As a result, points that are close to each other overlap less.


VMD_02_10

This proposal introduces roll-overs. I now get the feedback that I could already read from the x and y axes, but much more precisely. Actually you would like to see the movie title when your cursor is at a data point. I think I’m going to do that at a later stage, but I am unsure about it. I think it’s more important that I get some sense of what you can do with the data.
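A roll-over needs a test that decides which data point, if any, is close enough to the cursor. The following Python sketch shows one possible way to do that with a distance check; the function name, the default radius of 3 pixels and the sample coordinates are assumptions for this example and are not taken from the actual program.

```python
import math

def point_under_cursor(points, mouse_x, mouse_y, radius=3.0):
    """Return the index of the closest point within radius of the cursor, or None."""
    best_index, best_dist = None, radius
    for i, (x, y) in enumerate(points):
        d = math.hypot(x - mouse_x, y - mouse_y)  # straight-line distance in pixels
        if d <= best_dist:
            best_index, best_dist = i, d
    return best_index

# Invented screen positions of three data points.
points = [(100.0, 200.0), (104.0, 260.0), (108.0, 180.0)]
hit = point_under_cursor(points, 105.0, 259.0)   # cursor next to the second point
miss = point_under_cursor(points, 300.0, 300.0)  # cursor far away from all points
print(hit, miss)
```

The returned index could later be used to look up the movie title that belongs to the point.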


VMD_02_11

I do have the feeling that the lines have become too dominant, especially now that you’re getting direct feedback at the cursor. The lines are no longer functional. I will also try to make the middle block more square. The drawback is that the smaller rectangles are no longer square. However, it does create more room in the width. I also reduced the cursor proximity and increased the point size from 10 to 12. And Futura Bold is used for the values under the cursor.


VMD_02_12

I filled the area under the line with a bluish colour. I am still not sure about the connections between the data points.


VMD_02_13

I have made the background of the chart the same colour as the overall background. That gives a completely different picture. I had initially accentuated the vertical lines, but I think it is better to accentuate the horizontal lines. These lead you to much more meaningful data. At first I gave the horizontal lines 50% transparency, but afterwards I got a better result by decreasing the line width to 0.5 pixels, which strictly speaking is impossible.


VMD_02_14

It seems silly to transform this graph into a bar graph. I must then let the program draw rectangles instead of one flat plane. But then I have a problem, because I have 150 bars in a width of 600 pixels. This means that each bar has 4 pixels of horizontal space, so its width can be 3 pixels at most. At 4 pixels the bars touch each other and the lower surface is filled completely again. But with 3 pixels I think it’s just about acceptable, and it even has some form of sophistication.
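The arithmetic behind the bar width can be written out as follows. This is a tiny Python sketch using the two numbers from the text (600 pixels, 150 bars); the names slot and gap are made up for this example.

```python
PLOT_WIDTH = 600   # width of the plot area in pixels
BAR_COUNT = 150    # one bar per film

slot = PLOT_WIDTH / BAR_COUNT  # horizontal space available per bar

def gap(bar_width):
    """Empty space left between two neighbouring bars of the given width."""
    return slot - bar_width

for width in (2, 3, 4):
    print(width, gap(width))
```

A gap of zero means the bars merge into one plane again, so 3 pixels is the widest bar that still leaves a visible separation.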


VMD_02_15

As a last proposal I introduced tabs for the three different data sets. But I found the Futura Bold way too heavy in these white tabs. So I opted for the Futura Medium.


VMD_02_16

Now I have to do a few more things. The white area behind the title is way too loud and is almost visually independent of the graph. Plus, the bar chart layout is not the best I’ve seen so far. As a final detail I go back to the design of VMD_02_12. I now use only Futura Medium. I also adjusted the colours: I chose red and green, two distinctly different colours. The strong contrast between them makes the dividing line between the two planes stand out more. And thus it seems to me that this session is finished.

But there is one more thing: I made a very simple animation between the three different data sets. The data sets of us, IMDb and Rotten Tomatoes interpolate their points. Watch it here: Loftmatic at Vimeo
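Interpolating between two data sets means moving every point a fraction of the way from its old value to its new one. The Python sketch below assumes plain linear interpolation; the actual animation may well use a different easing. The scores in ours and imdb are invented for illustration.

```python
def lerp(a, b, t):
    """Linear interpolation: t = 0 gives a, t = 1 gives b."""
    return a + (b - a) * t

def blend(series_a, series_b, t):
    """Interpolate two equally long series point by point."""
    return [lerp(a, b, t) for a, b in zip(series_a, series_b)]

# Invented scores for three films.
ours = [8.0, 3.0, 9.5]
imdb = [7.0, 5.0, 8.5]

halfway = blend(ours, imdb, 0.5)
print(halfway)
```

Calling blend with t running from 0 to 1 over successive frames would produce the in-between states of such an animation.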


Conclusion

With regard to the scores, I can conclude that our scores are further apart than those of IMDb and RT. Of the 150 films, we have seen six that score 9.5. IMDb has only one. RT has one film with a score of 9.8. We also give the lowest score: a 3.0. The lowest score on IMDb is 4.1, and on RT it is 4.5. We consider a film with eight points a good film. We have seen thirty films with an eight or more. IMDb has twenty-three films with an eight or more, and RT has twenty. We consider a film that scores five or lower a very bad film. We have seen nine very bad films. IMDb has two and Rotten Tomatoes has three.

It is of course interesting that you can make all kinds of graphs, and I certainly learned from that. But the end result is wrong. It is simply not true. The visual horizontal relationship between films does not exist and is purely arbitrary. For example, if you have seen three films that all have a different score, there is no horizontal connection between them, because it just depends on the dates on which you have seen those three films. If you see them on different dates, the horizontal connection is different. So for that matter the version with dots is fine, and the vertical bars are also suitable, because dots and bars are visually separated from each other. Just like the scores of films.
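Counts like those in the conclusion can be tallied with two small functions. This Python sketch uses the thresholds named in the text (eight or more is good, five or lower is very bad) and assumes that a 0.0 marks an unrated film and should not count as very bad; the sample list is invented and is not the real data.

```python
def count_good(scores, threshold=8.0):
    """Number of films scoring the threshold or higher."""
    return sum(1 for s in scores if s >= threshold)

def count_very_bad(scores, threshold=5.0):
    """Number of rated films scoring the threshold or lower (0.0 = unrated, skipped)."""
    return sum(1 for s in scores if 0.0 < s <= threshold)

# Invented sample of six scores, one of them an unrated placeholder.
sample = [9.5, 8.0, 7.9, 5.0, 3.0, 0.0]
print(count_good(sample), count_very_bad(sample))
```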

