2D Graphics Algorithms (part 2) – transcript
Previously, we’ve gone over a so-called “naive” line drawing algorithm, which, while workable, doesn’t have the best performance. Bresenham’s line algorithm, named for its inventor Jack Bresenham, is the most commonly used method for drawing lines, as it requires using only addition and subtraction per pixel instead of slower division and multiplication operations.
Like with our naive line algorithm, in Bresenham’s, we treat line drawing as two separate cases:
When our slope is between 1 and negative 1, we draw each pixel such that x increases by 1 while y increases by the slope. When our slope is greater than 1 or less than negative 1, we do the reverse, drawing each pixel such that y increases by 1 while x is incremented by the inverse of the slope.
The difference from our previous method is that, in Bresenham, we aren’t using the equation of our line except once to find the slope: instead of plugging in x values to get y values or vice versa, we simply increment by the slope (or inverse of the slope) for each pixel.
So here for example, we have our line starting dead center on the leftmost pixel and sloping up to the right with a slope less than 1. For each subsequent pixel-center x coord, we can find the line’s corresponding y coord by simply accumulating the line slope. So for a slope of 0.4, the y coord for the second pixel is 0.4 units higher, the third pixel is 0.8 units higher, the fourth pixel is 1.2 units higher, and so forth. Be clear that these are coordinates relative from the center of the first pixel.
The next question is how to use these relative y coordinates of the line to get pixel y coordinates. The most straight forward solution is to round to the nearest integer, giving us relative pixel y coordinates we can add to the starting pixel’s y coord. Rounding 0.4 gets us 0, 0.8 gets us 1, 1.2 also gets us 1, etc. We add these values to the y pixel coordinate of the first pixel, yielding our y pixel coordinates for the rest of the line.
The problem is that rounding, like multiplication and division, is a relatively costly operation which we’d rather avoid. So the Bresenham algorithm instead proceeds by observing that the y pixel coordinates here get incremented by 1 every time the accumulated slope reaches a threshold of point 5. So when the accumulated slope first passes 0.5, we go up a pixel; when it hits 1.5, we go up another pixel; and when we hit 2.5, we go up another pixel, and so forth. So rather than rounding, we can simply increment our y pixel coordinate each time the accumulated slope reaches these point 5 thresholds.
Let’s see how this plays out in code. The basic structure is the same as our previous algorithm, with vertical lines treated exactly the same as their own special case. We also again treat lines with a run greater than rise as one case and lines with a run greater than rise as another. Zooming in on the run greater than rise case, where the slope, m, is between 1 and -1, things are very similar, except we no longer calculate the line intercept value b; instead, we have an offset that starts at 0 and a threshold to check against that starts at 0.5. Again, the idea is that we iterate from the smallest x value to the largest x value, computing the proper y value coord for that x, and drawing a pixel at that coord. So depending upon whether x1 or x2 is smaller, we designate one of the end points as our starting x and y values. We then use a for in loop to loop from the smaller starting x up to the larger end x, drawing a pixel at the current x and y, incrementing the offset by delta (which is our slope), and if the offset is now greater than the threshold, we increment y by one pixel and increment the threshold. Notice that our offset and threshold are always positive, but the variable adjust is negative 1 if our slope is negative because in that case we want to move the y coordinate down, not up.
So again, we’re accumulating the offset by delta and then incrementing the y coordinate and the threshold every time the offset passes the threshold.
The case for a rise greater than run is virtually identical but for swapping the x’s and y’s (except in the call to setPixel) and but for the initial delta of run over rise instead of rise over run.
So far so good: we have a working Bresenham algorithm that works using just simple addition and comparisons without any multiplication or division (aside from the one time we compute the slope). However, we could get even better performance on most hardware by forgoing the use of any float values and instead just using integers. This is perfectly doable once we observe that the particular values of the offsets and thresholds only matter in proportion to each other; we can multiply them both by anything we want as long as we scale them equally. So first off we want to double everything so that our threshold values go from 0.5, 1.5, 2.5, 3.5, and so forth to 1, 3, 5, 7, and so forth. This requires not only doubling our starting threshold value but also doubling the value by which we increment the threshold. So the threshold starts out at 1 and each time we increment it, we increment it by 2, not 1.
Now our threshold values are integers, but what about the delta? Well when the delta is rise divided by run, we can get an integer by multiplying by run, and when the delta is run divided by rise, we can get an integer by multiplying by rise.
So in sum, we multiply everything by 2 and also either the rise or the run, depending on the slope. Remember that when the slope is between 1 and -1, our delta is normally rise over run, our starting threshold is 0.5, and the threshold increment is 1. If we multiply them each by absolute(run) and 2, we get a delta of absolute rise times 2, a starting threshold of absolute run, and a threshold increment of absolute run times 2. For the other case of a slope not between 1 and -1, our delta is normally run over rise, our starting threshold is again 0.5, and the threshold increment is again just 1. If we multiply them each by absolute rise and 2, we get a delta of absolute run times 2, a starting threshold of absolute rise, and a threshold increment of absolute rise times 2.
So we’ve effectively gotten rid of floats in our algorithm. You can see this code at work in our draw module as the bresenhamLineOpt method.
Before moving on, I should note that details of the Bresenham line algorithm often differ in various implementations: the core idea in all cases is to use an accumulated offset tested against a threshold, but it is not always described in those terms, and some implementations handle the offset and threshold differently. In particular, it’s very common to keep the threshold constant and instead adjust down the accumulated offset. So instead of a threshold that goes from 0.5 to 1.5 to 2.5 and so forth, the threshold can be kept at 0.5, instead decrementing the offset when it passes the threshold. Here for example, you see that when the threshold is met and the pixel coordinate adjusted by one, the offset gets decremented by one, going into negative territory (denoted as the blue lines). Because the offset passes the threshold at the same pixels this way, it all works out the same. Does adjusting just one variable in the comparison instead of two make the loop perform better? I’m not sure, but I imagine it depends upon the hardware. In either case, I just find this variant initially more difficult to visualize and explain, which is why I presented the other variant.
Here we see the result of the same large image shrunken down to a quarter size, but using different algorithms. On the left, we used nearest neighbor interpolation, but on the right, we used bilinear interpolation. The bilinear technique avoids the pixilation of nearest neighbor and so is preferred in almost all circumstances. The one exception is with pixel art. Here we see the same small pixel image doubled in size using nearest neighbor on the left and bilinear on the right. Whereas bilinear interpolation effectively blurs the image, nearest neighbor preserves the hard-edged, pixely look of the smaller original.
Let’s look at how these two interpolation techniques work.
First off, consider the simple case of scaling a one-dimensional image using nearest neighbor. We have a row of 5 pixels we want to resize to a row of 8 pixels. For our purposes here, we can think of pixels as little squares, even if that isn’t totally accurate in other contexts.
The essence of what we want to do here is translate from one coordinate system to another. The five pixel image is a coordinate system that runs from 0.0 at the left edge of its leftmost pixel to 5.0 at the right edge of its rightmost pixel. Likewise, the eight pixel image is a coordinate system that runs from 0.0 on the left edge to 8.0 on the right edge. The question is how the coordinates in our five pixel “source” image line up with the coordinates in our eight pixel “destination” image. Clearly 0.0 in the source cooresponds to 0.0 in the destination, and 5.0 in the source corresponds to 8.0 in the destination, but what about everything in between? Well that’s a simple math problem of scaling. The point that lies at N percent along one axis lies at the N percent point along the other axis, e.g. the point that lies halfway along one axis corresponds to the point that lies halfway along the other, and the point that lies one third along one axis corresponds to the point that lies one third along the other. In this case, halfway between 0.0 and 5.0 is 2.5, and halfway between 0.0 and 8.0 is 4.0, so 2.5 in our source corresponds to 4.0 in our destination.
Given a particular point on one axis, we can find the corresponding point by first figuring the proportion of the point along the axis by dividing by the length of the axis, then multiplying the result by the length of the other axis. For our purposes, we want to go from destination coordinates to source coordinates, so our formula is source coord = destination coord divided by destination length multiplied by the source length. So given a destination coordinate 5.5, we find the corresponding source coordinate by dividing by 8 and then multiplying by 5, yielding 3.4375.
In nearest neighbor interpolation, the idea is that, for every pixel center in the destination, we find the corresponding point in the source and simply copy the pixel value there. So because 5.5 corresponds to 3.4375, which is falls in the bounds of the purple pixel, that destination pixel will have the same purple color.
If we do the same for the destination pixel at coordinate 1.5, we divide by 8, multiply by 5, and get 0.9375, which in the source is in bounds of the first pixel, so that destination pixel will have the same blue color.
We use this method for every pixel and this is what we get. Every pixel value in the destination comes verbatim from the nearest corresponding pixel in the source.
Applied in two dimensions, we simply interpolate along two axes. (Note that whether our y axis goes up down is arbitrary as long as we’re consistent between the source and destination.) Here, we find the source coordinate corresponding to the pixel center at 1.5, 0.5, yielding 0.6, 0.33, which lies in the bounds of the red pixel, so the destination pixel will have that same red color. Again, exact same principle, just applied twice for two dimensions.
Consider though the case of the pixel center at 2.5, 0.5. The corresponding point in the source lies perfectly on a boundary between two pixels, raising the question which color to copy. Whether we consider points on boundaries to fall one way or the other is arbitrary, but we should be consistent. In this case we decide that points on vertical boundaries count as inside the pixel to the right, so we copy the purple value.
Linear interpolation starts from the same place: we want to find the source coordinate corresponding to each pixel center coordinate in the destination. The difference in linear interpolation is that we don’t take the color value of the overlapping pixel verbatim but rather factor in the color of the second-nearest pixel. So here, when a pixel center corresponds to a point 7.8 inside this blue pixel of the source, we don’t just take the blue color verbatim but factor in the orange of the second-nearest pixel.
The question, now, is how to blend the two colors. We want the blending to be proportional to how close the coordinate is to the other pixel. A coordinate lying directly on the border between two pixels should take equally from both, while a coordinate lying on the exact center of a pixel should take nothing from other pixels. So were our coordinate here 8.0, it would lie directly on the border between the blue and orange pixels and so should result in an equal mix of the two. But were our coordinate here 7.5, the center of the blue pixel, we would just take the blue color verbatim. Another way to put it is that, at coordinate 7.5, we take 100 percent of the blue and 0 percent of the orange; at coordinate 8.0 we take 50 percent of the blue and 50 percent of the orange; at coordinate 8.5, we take 0 percent of the blue and 100 percent of the orange. So, for a coordinate 7.8, which lies 30 percent of the distance from the blue pixel center to the orange pixel center, we take 70 percent of the blue and 30 percent of the orange.
As a formula, we first find the distance of the coordinate between the pixel centers expressed as a percentage along the way. The hiccup is that we can’t simply subtract 7.5 from 7.8 because we don’t have the figure 7.5: we just have the coordinate. To find the coordinate of the center of the pixel to the left, we subtract 0.5, take the floor (i.e. we round down), and add back 0.5. The distance is then our coordinate minus this pixel center. This distance is effectively the percentage of where the coordinate lies along the distance between the two surrounding pixel centers. Here 7.8 minus 7.5 gets us a distance of 0.3, which correctly denotes that our coordinate is 30% of the way between the left pixel center and the right pixel center.
As a small optimization, we can avoid adding back the 0.5. We subtract 0.5 and use that as an “adjusted coord” from which we subtract the floor of that same adjusted coord. We effectively moved both our start and end point by 0.5, so we get the same distance between them.
For another example, say our coordinate is instead 8.2. Again, the problem is to find the distance between the coordinate and the first pixel center to its left. That pixel center is here again 7.5, and 8.2 minus 7.5 gets us 0.7. 8.2 is 70 percent of the way from the blue pixel center to the orange pixel center, so we want a color value that takes 30 percent from the blue and 70 percent from the orange. To figure this, we first note that the ratio we want to take from the right pixel is the same as our distance, 0.7, and the ratio we want to take from the left pixel is the inverse, 1 – 0.7, which is 0.3. With these ratios, we can blend the two pixel colors by calculating each color channel separately. For the red channel value, for example, we multiply the left pixel red value by the left ratio and multiply the right pixel red value by the right ratio, and then add those two products together to get a our blended red value. Do this for red, green, and blue, and we get our color value.
So let’s say here our blue pixel has a red channel value of 30 and the orange pixel has a red channel value of 150. Then leftRed multiplied by leftRatio is 30 * 0.3, yielding 9; rightRed multiplied by rightRatio is 150 multiplied by 0.7, yielding 105; add 9 and 105 together gets us our red value, 114.
Understand that this formula finds the value that lies 70 percent along the distance between our left red value 30 and our right red value 150. While the more intuitive method is to subtract 30 from 150 to get 120, then take 70 percent of that to get 84, and then add back in 30 to get 114, that method requires first identifying which color value is lesser to decide which to subtract from the other and so requires a branch when implemented in code. This less intuitive formula here requires no branching. On the other hand, it does require two multiplications instead of just one. Which formula ends up being more efficient probably depends upon the hardware.
Linear interpolation applied in two dimensions we call bilinear interpolation. Again, like in nearest neighbor, the first thing to do is find our coordinate in both dimensions. So say our destination pixel center maps to source coordinate 8.2, 4.9. What we want to do then is factor in all four nearest pixels: the blue, purple, and red, as well as the orange. We can do this by first interpolating between the top pair and the bottom pair of pixels to get another pair of color values, then interpolate that pair to get our final color. Alternatively, we could first interpolate the two left pixels and separately the two right pixels, then interpolate their results together. The choice is arbitrary. In this case, we’ll pair them off top and bottom.
So to interpolate the blue and orange pixels together and then separately the purple and red pixels together, we need the distance from coordinate 8.2 to the center of the left pixels, which will again be 0.7. So using this ratio, we can get two new color values, which we can then in turn interpolate together, but this time using the ratios between the top and bottom we get by subtracting 4.5 from 4.9, yielding 0.4. So the top color contributes 60 percent while the bottom color contributes 40 percent.
It may seem odd that we can interpolate the pixels in pairs instead of all together, but we can see it all checks out if we ask how much each of the four pixels should contribute to the end result. The left pair should contribute 30 percent while the right contributes 70, and the top pair should contribute 60 percent while the bottom contributes 40. Multiplying these values for each pixel tells us how much each contributes: the blue contributes 18 percent, the orange 42, the purple 12, and the red 28, all adding up to 100 percent, as it should. And note that, in line with intuition, the orange contributes the most, the purple the least, and the red more than the blue because the coordinate is closer to the red pixel than it is to the blue.
Looking at code, we use our coordinate to get distances and from the distances ratios. We then use the ratios to first compute the red component interpolated between the two top pixels, then the red component interpolated between the two bottom pixels, and last we get our final red value by interpolating between these composited top and bottom red values.
Lastly, there is the question of how to deal with the edge case–the literal edge case–of coordinates lying on the edge of the image. How can we interpolate between pixels that don’t exist? Well, the simplest and most common solution is to treat the edges, for the purpose of interpolation, as surrounded by duplicates of the outermost pixels. So here now we could do a four way interpolation on this coordinate. To be clear, the four-way interpolation results in the same color as the two-way interpolation would, but this way we can use the same code for all coordinates without branching for special cases, which, depending upon hardware, may be the more efficient approach.
Another solution is to wrap the image around, which is a good solution when our image is being tiled, like in a wallpaper.
If we’re compositing our image over a solid background color, we’ll get the cleanest looking edges if we interpolate with that background color as the border around our image. So here, if we’re going to display this resampled image on a solid black background, it makes sense to interpolate using a black border.
For an actual implementations of image resampling with nearest neighbor and bilinear interpolation, we won’t belabor the actual code here, but you’ll find their implementations in our draw module.