Toxic Dump - Blog updates
http://toxicdump.org/
ToxicDump.org
A dumping ground for science, art and everything in between.
Tue, 07 Sep 2010 18:11:31 -0700
lucas@toxicdump.org (Lucas Vieira)

Creating human-readable scales for data visualization in charts.
http://toxicdump.org/blog/view/1/Creating-human-readable-scales-for-data-visualization-in-charts

If you ever had the task to generate a graphic from scratch, then you probably encountered this problem: how do we pick a "round" number for the maximum range of the graphic which will fit all of our data nicely?

Here's an example. Your app has a statistics section which displays a line chart of the number of entries versus time. At the beginning, when the number of entries is small - say 6 entries - your graphic could display a range from 0 to 10. But after a while you'll have some hundreds, or even thousands of entries.

How do you find which is the next "best display range" for any given maximum value m? For a maximum value of 332, which would be better: a range from 0 to 340, 400, 500 or 1000?

In this post, I'll show you how to create a function \mbox{R}(m) that will give you that display range value, and we'll be using logarithms to achieve that.

A very brief introduction to logarithms

For those not too familiar with logarithms, they are the inverse function of exponentials. Say you have the equation 2^x = 5. In order to find the value of x, you need to use logarithms.

A logarithm is a number: the power to which a certain base (in this case, 2) must be raised to return another number (in this case, 5). This would be written as \log_2 5. In our example, the value of the logarithm is \log_2 5 \approx 2.32192809... In short, you have this relationship between logarithms and exponential functions:

b^x = a \Leftrightarrow \log_b a = x

When the base isn't specified for the logarithm, it is usually assumed to be e \approx 2.71828..., also known as Euler's number. This seemingly random number has several useful properties, but not for our particular problem here. Note that a logarithm with this base is called a natural logarithm, usually written unambiguously as \ln x.

A very useful property of logarithms is that you can easily convert a logarithm in a base to any other base you wish. Say you have the logarithm of x in a base a, but you actually want it in base b. You can just do this:

\log_b x = \frac{\log_a x}{\log_a b}

So even if your logarithm function uses an unknown base, you can convert it to any base you wish: just divide the value the function gives for your number by the value it gives for the base you want.
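As a rough Python sketch of that conversion (the helper name log_base is made up for illustration; math.log is Python's natural logarithm):

```python
import math

def log_base(x, base):
    """Logarithm of x in an arbitrary base, using the change-of-base formula."""
    # math.log is the natural logarithm; dividing by log(base) converts the base.
    return math.log(x) / math.log(base)

print(log_base(5, 2))      # close to 2.3219
print(log_base(1452, 10))  # close to 3.16
```

Keep in mind that the division is done in floating point, so the result can be off by a tiny rounding error. That matters when the value is about to be rounded down, so a dedicated base 10 function is the safer choice when the language has one.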

This is useful - and worth pointing out - because a lot of programming languages give you only a natural logarithm function, and we'll be working with a base 10 logarithm from now on.

Finding the order of magnitude using decadic logarithms

A decadic logarithm (also known as common logarithm) has base 10. It is useful in our case because we happen to use a decimal system for our numbers.

In any positional numerical base, you can figure out how many digits a certain integer value will have by taking the logarithm of that value using the numerical base as the base for the logarithm, rounding down to the nearest integer and then adding one.

For example, the number 1452 has \lfloor \log_{10} 1452 \rfloor + 1 = \lfloor 3.16... \rfloor + 1 = 3 + 1 = 4 digits. Here, \lfloor x \rfloor is the floor function, which gives the largest integer less than or equal to x. It is a very standard mathematical function in most programming languages.

Let's call the maximum value in our dataset m, and we'll create a function \mbox{D}(m) which will give us the number of digits in the integer part of m:

\mbox{D}(m) = \lfloor \log_{10} m \rfloor + 1

So for any m > 0, we can find the next power of ten above m by simply calculating 10^{D(m)}. This will give us 1 for any 0.1 \leq m < 1, 10 for any 1 \leq m < 10, 100 for any 10 \leq m < 100, etc. In effect, the function \mbox{D}(m) gives us the exponent for the next decimal power (or order of magnitude) after m.
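Here is a minimal Python sketch of \mbox{D}(m) and the next power of ten. The function names are illustrative, and it uses Python's built-in math.log10 rather than the natural-logarithm conversion; m is assumed to be greater than zero, since the logarithm is undefined otherwise.

```python
import math

def num_digits(m):
    """D(m): number of digits in the integer part of m (for m >= 1)."""
    return math.floor(math.log10(m)) + 1

def next_power_of_ten(m):
    """10^D(m): the first power of ten strictly greater than m."""
    return 10 ** num_digits(m)

print(num_digits(1452))        # 4
print(next_power_of_ten(9))    # 10
print(next_power_of_ten(100))  # 1000
```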

Breaking our range into smaller steps

Now, when m = 100, our next order of magnitude is 3, that is, 10^3 = 1000. If we use that for the range of our chart, we'll have all the data drawn in just the bottom 10% of the graphic.

That doesn't look too good, does it? It's just too much empty space, and it makes the maximum value in our dataset look much smaller than it really is.

Each order of magnitude is a 10x increase, and we want to break it into n equal parts. We'll do that by using a sibling of the floor function, the ceiling function, \lceil x \rceil, which returns the smallest integer greater than or equal to x. This function is also common in most programming languages, usually by the name "ceil".

We'll use the ceiling function to create a "step-factor" which will have steps of size \tfrac{1}{n}. You do this simply by evaluating \tfrac{1}{n} \lceil x n \rceil. By itself, this is a staircase: as x grows, the value jumps up by \tfrac{1}{n} every \tfrac{1}{n} units.

But note that for each new order of magnitude we reach, it should take 10 times longer to step up to the next. To do that, we plug in our next power of ten inside the ceiling function, but as a division, so we end up with:

\frac{1}{n} \left\lceil \frac{m n}{ 10^{D(m)} } \right\rceil

Now, if you evaluate this as a function of m, you'll see it steps up in increments of \tfrac{1}{n} until it reaches 1, then drops back down as m crosses into the next order of magnitude, at which point each step up takes 10 times longer.
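To see the stepping behaviour concretely, here is a small Python sketch of the step-factor on its own. The name step_factor is hypothetical, and m is assumed to be positive.

```python
import math

def step_factor(m, n):
    """(1/n) * ceil(m * n / 10^D(m)): a value in (0, 1] that moves in steps of 1/n."""
    power = 10 ** (math.floor(math.log10(m)) + 1)  # 10^D(m)
    return math.ceil(m * n / power) / n

# With n = 4, the same factor shows up again one order of magnitude later.
print(step_factor(332, 4))   # 0.5
print(step_factor(3320, 4))  # 0.5
print(step_factor(999, 4))   # 1.0
```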

We just have to make it step across powers of ten now, and to do that we multiply the whole step-factor by 10^{D(m)}. The result is our function \mbox{R}(m), which looks like this:

\mbox{R}(m) = \frac{10^{D(m)}}{n} \left\lceil \frac{m n}{ 10^{D(m)} } \right\rceil

Where n is the number of steps. Good values for n are 2, 4, 5 and 10.

Here's what this would look like in pseudocode, with the appropriate decadic logarithm conversion:

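function getGraphRange(maxValue, numSteps):
    // decadic logarithm via change of base; maxValue must be > 0
    numDigits = floor( log(maxValue) / log(10) ) + 1
    nextDecimalPower = pow(10, numDigits)
    // how many steps of size nextDecimalPower / numSteps are needed to cover maxValue
    step = ceil( maxValue * numSteps / nextDecimalPower )
    range = ( nextDecimalPower / numSteps ) * step
    return range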
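For reference, here is one way the pseudocode above might be written in Python. This is only a sketch: the name get_graph_range and passing the number of steps as a parameter are my own choices, and math.log10 stands in for the log(x) / log(10) conversion.

```python
import math

def get_graph_range(max_value, num_steps):
    """R(m): a round-ish display range that fits max_value (which must be > 0)."""
    num_digits = math.floor(math.log10(max_value)) + 1
    next_decimal_power = 10 ** num_digits
    # Number of steps of size next_decimal_power / num_steps needed to cover max_value.
    step = math.ceil(max_value * num_steps / next_decimal_power)
    return (next_decimal_power / num_steps) * step

# The example from the introduction: a maximum value of 332.
for n in (2, 4, 5, 10):
    print(n, get_graph_range(332, n))
```

For a maximum of 332, this gives a range of 500 with n = 2 or n = 4, and 400 with n = 5 or n = 10.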

But what about the subdivisions and their labels?

Now that we have a round-ish range for our chart that our users can understand, we still have the problem of dividing that range into an appropriate number of parts so the labels will be easily understood. You don't really want the labels to mark 0, 110, 220, etc.

This can be accomplished using our function \mbox{D}(m). First, we need to find the first power of ten below our range (and not the maximum m), and that's simply 10^{D(R(m))-1}. We divide our range by this value, and then multiply by the number of steps n we defined before. The result is the number of subdivisions you need.

d = n \frac{\mbox{R}(m)}{ 10^{D(R(m))-1} }

Tip: If you're using n = 10 for the range, it's probably a good idea to use 5 for this equation in order to avoid clutter in your graphic, as 10 usually gives you too many subdivisions.
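The subdivision formula can be sketched in Python as follows. The function name is hypothetical, and the range value is assumed to come from \mbox{R}(m) as defined earlier.

```python
import math

def num_subdivisions(range_value, n):
    """d = n * R / 10^(D(R) - 1), where range_value is R(m)."""
    digits = math.floor(math.log10(range_value)) + 1  # D(R(m))
    power_below = 10 ** (digits - 1)
    return n * range_value / power_below

# A range of 500 with n = 4 gives 20 subdivisions, i.e. a mark every 25 units.
d = num_subdivisions(500, 4)
print(d, 500 / d)
```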

Conclusion

Data visualization is not as trivial a task as it may seem at first. Doing it improperly, without considering the quirks of human perception, will seriously corrupt the information you are trying to convey. So it is very important to consider both the mathematics and psychology involved.

As others have mentioned, mathematical ignorance and the limitations of our instinctive notions of numbers can be dangerous and have serious political and personal consequences, so be careful with your data!

Sun, 28 Feb 2010 06:08:30 -0800

The amazing work of Graham Annable, AKA "Grickle"
http://toxicdump.org/blog/view/4/The-amazing-work-of-Graham-Annable_-AKA-Grickle
lucas@toxicdump.org (Lucas Vieira)

About a month ago I came across this Reddit post, "Totally creeped out by a cartoon." The link pointed to The Hidden People, a short YouTube video by artist Graham "Grickle" Annable. Turn on your audio and watch it:

I thought that was brilliant, so I had to check out more of this guy's work. He has a few more shorts on his YouTube channel. They're all good; even though the first ones repeat the same joke, they're still quite amusing.

Here are a few of my favorites:

Space Wolf

Day Off

The Last Duet on Earth

Botched

Zoo

In his animated shorts, Grickle often uses Krzysztof Penderecki's "Threnody to the Victims of Hiroshima" for kicking up the creep factor, but there's more than creepiness and humor in Grickle's work.

More than just amusing

After watching all of his videos, I decided to explore more of his work. He has a few drawings and comics up on his blog and his Flickr account, and a quick Google search reveals a few of his comics online as well.

By then I was convinced of his worth, and decided to buy one of his books: The Book of Grickle published by Dark Horse Comics. It arrived today, and I eagerly read all of it in a single sitting. Twice. It was damn good!

The thing is, Grickle has an unusual and refreshing style of storytelling. While his drawings and themes are minimalist and sparse, they are subtly expressive and well drawn, and they carry a whole lot of depth. His work has been described as "comic book poetry," and I honestly can't sum it up better myself.

He manages to be extremely profound, moving and entertaining, page after page, using nothing but snapshots of anonymous, generic characters that tell us more about ourselves and human nature than you'd expect. The stories always seem to build upon those subtle nonevents of normal life, sometimes with a twist, sometimes with a deeper commentary, but sometimes not. And that feels more than enough.

It is not uncommon for an entire page to be devoted to a silent moment of a character's curiosity or discomfort, which not only brings characters to life, but adds a new hint to something more meaningful than a cheap laugh. It's amazing!

I can't help but be reminded of Scott McCloud's Big Triangle, in which he describes how abstract characters are more effective at connecting with the reader. But it's not just visuals: it's the details of situations, and not the situations themselves, that Grickle uses to captivate. It is very effective, and I haven't seen anyone do that as well as he does.

All in all, reading Grickle is a great experience in every sense of the word, and to me this was the beginning of a long appreciation for this guy's work.

But wait, there's more

Currently, Graham Annable is the creative director at Telltale Games. They have recently released the pilot "episode" for a game based on his work. Since it was so cheap, I decided to try it out too.

Nelson Tethers: Puzzle Agent is a fun little puzzle/adventure game based on the "Grickle mythos", including the Hidden People. The game successfully captured the gist of Grickle's art and humor. While some of the puzzles are rather easy, the game certainly has a lot of potential. Here's hoping for more of it.

Wed, 11 Aug 2010 13:55:39 -0700