2D Graphics Algorithms (part 1) – transcript
First off, we’ll need a simple means to render an image pixel by pixel. Pyglet doesn’t allow this directly, but we can use Pyglet to generate an image and then display that image in a window. As we saw in our Tetris game, Pyglet includes a class Image which represents an image we can blit to a location in the window. In Tetris, we created image objects by loading images from files. For our purposes here, we want to create a new blank image without loading anything from a file, so we use the imagae.create function. In the first line here, we’ve created a 500 by 500 blank image. These pyglet image objects though don’t directly allow for modification of their pixels, so for that purpose I’ve created a wrapper class, MutableImage, in a module I called draw. (You’ll find the draw module in the code example directory.) So here we import this draw module, and create a MutableImage from the Pyglet Image. We can then use the methods of MutableImage to set individual pixels. Pixel coordinate are oriented from the bottom-left pixel, 0, 0, and colors are specified as tuples of the Red, Green, and Blue components as values from 0 to 255 (255 being max intensity). So the call to setPixel here sets the pixel at coordinate 120, 50 to the color (0, 255, 0), which means fully green.
Be clear, though, that the setPixel method and MutableImage’s other mutating methods don’t immediately affect the Pyglet Image object. Only when we invoke updateImage do the changes in the MutableImage show up in the Pyglet Image.
Lastly here, the draw module contains a displayWindow function that takes a Pyglet image and displays it in a window of the same size as the image. DisplayWindow does all the business of creating an on_draw handler and invoking app.run(), so this here is a complete program that creates a 500 by 500 blank image in memory, sets one of its pixels to green, and then displays it in a 500 by 500 window.
Understand that we can create a MutableImage with a non-blank image. Here, we load an image from a file, wrap it in a MutableImage, and using the getPixel method, get the color of the pixel at 200, 350. We then set the pixel at 120, 50 to that same color, then update the image and display it in a window.
There are actually many ways of representing colors, but the most commonly used is the RGB scheme, where the three additive primary colors, red, green, and blue, are each represented by a number, usually an integer from 0 to 255 because then each component can be represented by a single byte. If all three color components are 0, the color is black; if all three color components are 255, the color is max white. Commonly, though, we wish the pixels of an image to also have a component representing transparency, such that we can composite the image over another with the image underneath showing through, in whole or in part, behind some or all of the pixels of the top image. This measure of pixel transparency is usually called alpha, which is most commonly stored as a byte per pixel, just like the color components, and so represented as an integer between 0 and 255. For the purpose of operations, however, we usually think of alpha as a float value between 0.0 and 1.0. The alpha component of a fully transparent pixel is 0.0 as a float (and 0 as an integer), while the alpha component of a fully opaque pixel–a pixel that’s not transparent at all–is 1.0 as a float (and 255 as an integer). When a fully transparent pixel is drawn on top of another image, it’s effectively not drawn at all because it’s totally see-through; when a fully opaque pixel is drawn on top of another, its color simply replaces whatever it draws over. The complicated part is when a semi-transparent pixel is drawn over another, because then the color and alpha components of the resulting pixel is a mix of the pixel on top and the pixel underneath.
So first off consider what happens when you shine a light through two overlapping tinted windows. The percentage of light that gets filtered out depends upon the percentage of light filtered by each window individually, but we can’t simply sum the percentages together because once the light shines through one window, there is less light for the second window to filter. So once the light passes through window1, the percentage of light then shining through the other window is (1 – window1), so the amount filtered by window2 is actually (1 – window1) * window2. Now that we have the proper percentage filtered by the second window, we can add the percentages together to get the total amount of light filtered by both windows togther.
So now let’s look at what’s called alpha compositing, or alpha blending, where we draw one image with transparency on top of another image.
First off, semi-transparent pixels are like tinted windows in that they partly let light through. So the formula is the same for the compositeAlpha, which is the alpha component of the resulting pixel when one pixel is drawn on top of another. You take the alpha of one of the pixels and add it to the quantity 1 minus that value, multiplied by the alpha of the other pixel. And just like with tinted windows, it doesn’t matter which pixel we designate as alpha1 and which as alpha2. We can prove this very quickly in a few steps. Just do the multiplication, swap the position of the terms, then factor, and we end up with the same formula but with alpha1 and alpha2 swapped. This is consistent with the intuition that the amount of light that gets through is the same no matter which of the two transparent surfaces is on top.
Also consistent with our intuition, when one or both alpha values are 1, the compositeAlpha is also 1, because of course if either or both surfaces are fully opaque, no light is getting through at all. And when one alpha value is 0, the compositeAlpha equals the other alpha value, whatever it is, because of course, if one surface is totally see-through, it doesn’t contribute to the result.
As for computing the composite color components, there it matters which pixel is drawn on top. For example, here we see two semi-transparent squares drawn partially overlapping twice. On the left, the red square is drawn on top, but on the right, the orange square is drawn on top. Notice that the overlapping portion on the left is a darker, more saturated shade of orange than the one on the right. Consistently with intuition, the surface on top has the stronger contribution to the resulting color because the bottom surface color is filtered through the top. So the overlapping portion on the left has a stronger contribution from the red than on the right. (Though, actually in this case, I would characterize the overlapping portion in both examples more orange than red, but that’s because our orange square here is considerably brighter than our shade of red. If the red square were bright and the orange square dark, the overlapping portions would likely look more red than orange.)
While the composite color differs in the two examples, the overlapping composite alpha value does not. In both examples, the orange square had an alpha value of 0.6 and the red square had an alpha value of 0.3, producing a composite alpha value of 0.72 no matter whether the orange or red is on top. But looking at the blue color component, our orange color had a blue value of 175, and our red color had a blue value of 50. When the red is on top, this produces a composite blue value of 123, but when the orange is on top, it produces a composite blue value of 154.
The formula that gets us these composite color components looks quite similar to the composite alpha formula:
First, we refer to the pixel drawn on top as the source pixel (abbreviated src), and we refer to the pixel on top of which we’re drawing the destination pixel (abbreviated dest).
Second, be clear that when we consider a pixel by itself, whatever color values it has are effectively diminished by the pixel’s alpha value. For example, when a pixel’s alpha value is 0.7, the alpha is effectively cutting the intensity of the color components to 7/10ths, so if the blue component is 100, the actual visible blueness of the pixel after taking alpha into account is 70 (because 100 * 0.7 is 70). So in our formula, we first account for the alpha: the blue component of the source pixel is multiplied by the alpha of the source pixel, and the blue component of the destination pixel is multiplied by the alpha of the destination pixel. So the blueness of the composite pixel is a combination of these two values of blueness, but like with our alpha formula, the contribution of the second value of blueness is diminished because it is underneath the other: the amount of blue light we see through the top pixel is limited by the transparency of the top pixel; if the top pixel (the source pixel) has an alpha of 0.7, we only get (1 – 0.7)–30%–of the blueness underneath. That’s why we multiply the destination pixel blueness by (1 – srcAlpha) before adding it.
Once we add our intensities of blueness together, what we have is the actual visible intensity of blueness of the composite pixel. By multiplying in the srcAlpha and destAlpha, we’ve already accounted for the alpha of the composite pixel. However, what we usually want is the color components without alpha taken into account, which is why we then factor out the alpha by dividing by the composite alpha. To be sure, when we actually display our composite pixels, we’ll want to get the ‘actual’ color values, but if we’re generating image data, we want it in a form we can correctly composite with other images. We want it in the same form as the original source pixel and destination pixel.
So looking at the calculation for our example:
First , the srcBlue value is 50, the srcAlpha is 0.3, the destBlue is 175, the destAlpha is 0.6, and the compAlpha is 0.72.
So we plug these values into our formula, getting 50 times 0.3 plus the quantity 1 – 0.6 times 175 times 0.6, all divided by 0.72. This gives us approximately 122.917, which we round to the integer 123.
Again, this is the intensity of blueness of the overlapping pixels without alpha factored in, so it is not the intensity of blueness you actually see here on screen. When our transparent squares get displayed, they of course must get composited on to the black background, but when you composite on top of opaque black pixels, the color values all come out as the ‘actual’ color values. What we are actually seeing on screen here on this black background is an orange color with a blue intensity of 89.
So you can think of the ‘actual’ values as the colors you see against a black background. Just be clear that, were we to display our transparent surfaces on any other color background (including white), the color values would change.
These ‘actual’ color values, the values multiplied by the alpha, are more commonly known as precomputed values. If you’re going to be compositing images a lot, it can be more efficient to store their colors as precomputed values because it simplifies the composite formula, removing two multiplications and a division.
If we ever need the ‘true’ color values without alpha factored in, we simply divide by the alpha value.
I’m sure you recall the slope-intercept form of the equation of a line. y equals the slope, m, multiplied by x, and added the y-intercept, b. Given any two points on the line, we can find the values of m and b. To find m, we divide the rise by the run, where rise and run are, respectively, the differences between the two y values and the two x values of the two points. Once we have m, we can find b by plugging the x and y value of either of our two points into the equation and solving for b.
So, here for example, with the points 10, 20 and 50, 80, the slope is 80 – 20 over 50 – 10, which is 1.5, and then plug in the x y values 10 and 20, then solve for b to get 5. So the slope-intercept equation of this line is y = 1.5x + 5.
For our line drawing algorithm, we have to distinguish between two cases: when the slope is between -1 and 1, it runs more horizontal than vertical, and so, while every pixel has a unique x value, some adjacent pixels share the same y value. But when the slope is greater than 1 or less than -1, it runs more vertical than horizontal, so while every pixel has a unique y value, some adjacent pixels share the same x value. There’s actually a third case, where the slope is equal to 1 or -1, in which case the line runs perfectly diagonally such that each pixel of the line has both a unique x and unique y value. For our purposes, though, we can lump this third case in with one of the other cases, so we’ll lump it in with the case of a slope between -1 and 1.
So now, in each of these two cases, we can frame the problem of drawing a line in different ways. When drawing a line with a slope between -1 and 1, the problem is to find the corresponding y value for each unique x value. But when drawing a line with a slope less than -1 or greater then +1, the problem is to find the corresponding x value for each unique y value. If the slope equals -1 or 1, we can treat it as a case of finding the corresponding y value for each unique x value.
So now looking at the naive line drawing algorithm, we’ll define a function naiveLine that takes two points as arguments, each specified as a tuple of x and y coords. So first off, we assign the x and y values to local variables x1 and y1 for the first point and x2 and y2 for the second point. (If you’re confused by the syntax here, Python allows us to conveniently assign the elements of a tuple or list to variables using commas to separate the target variables. x1 here receives the first element of point1 and y1 receives the second. The number of variables must match the number of elements; so if point1 here had something other than 2 elements, this would trigger an exception. In this case, though, we know point1 should have two elements, an x value and y value.) In any case, with our coordinate values we can compute the rise and run. It doesn’t matter which y coordinate we subtract from the other, but which ever order we choose, we must use the same order when subtracting the x coordinates. If the run turns out to be 0, that means the slope is infinite, meaning vertical, which we must treat as a special case. In a vertical line, the x coord is unchanged for all pixels, so we simply loop through all y values in the line segment and draw a pixel at x1, y. (We could just as well use x2, because if the run is 0, x1 and x2 are equal.) For our purposes now, assume a function plot, which draws a pixel at the specified coordinates. You’ll note that to iterate through the range of y values, we use the Python range function, which returns a list of integers, starting at the first argument and running up to (but not including) the second argument. So to loop from y1 up to and including y2, we make the second argument y2 + 1. The other complication here is that range’s first argument must be lesser than its second, which might not be the case if y1 is greater than y2. So before the loop, we test if y2 is less than y1 and, if so, swap the values of y2 and y1. This ensures that y1 is less than y2.
Looking now at the rest of the function, if the run is not equal to 0, we compute the slope by dividing rise by run. (Note that we must explicitly make one of the values a float because, in Python 2, the division of an integer by an integer always returns an integer, whether it’s mathematically correct or not.) Once we have our slope we can compute b, the y-intercept. Now that we have our equation of the line, we decide which case we’re dealing with: if the slope is between 1 and -1, then our line is more horizontal than vertical, so we need to find the corresponding y value for each x value; otherwise, our line is more vertical than horizontal, so we need to find the corresponding x value for each y value. The logic in both cases looks very much like that for the case of a straight up vertical line, except we must actually calculate the corresponding y and x values. We get each y value by plugging each x into our line equation, and we get each x value by plugging each y value into our line equation. Note that in both cases, we have to round our answers and convert to an int because we’re looking for the nearest integer coordinates. (Strangely, the round function returns a float value instead of an int, such as 3.0 instead of 3.)
With a few changes, we can include naiveLine as a method in the MutableImage class. The first argument must be self, and we need another parameter for the color to draw. Then our calls to plot are replaced by calls to MutableImage’s setPixel method, with the color as third argument.
So now to draw lines on our mutable images, we can invoke their naiveLine method.
This is a workable way of drawing lines, but in the next video, we’ll look at a more efficient approach, the Bresenham line drawing algorithm.
testing points in polygons
For the case of a rectangle, testing whether a point lies inside the bounds of a polygon is quite trivial. If the point’s x coord is greater than or equal to the left x-coord and less than or equal to the right x coord, and if the point’s y coord is greater than or equal to the bototm y coord and less tha nor equal to the top y coord, then the point lies inside the rectangle. But what about testing whether a point lies inside a complex polygon with jagged edges?
The most common solution is sometimes called ray casting (not to be confused with some other rendering techniques of the same name.) The principle is that if we cast a ray from outside the polygon to the point, the number of edges that intersect the ray tells us if the point lies inside the polygon. If the number of intersecting edges is even, then the point lies outside, but if odd, the point lies inside. This follows from the intuition that the first intersection is when the ray crosses into the polygon, and any subsequent intersection thereafter is the ray leaving and re-entering. Here, for example, the bottom ray intersects three times: at the first intersection, it enters; at the second intersection, it leaves, and at the third and last intersection, it enters again. So an odd number of intersections means the point lies inside.
Now, be clear that the ray can be cast to the point from any direction, but most commonly it is cast horizontally from the left, as this makes the edge intersection determinations simpler. To test if this ray intersects this edge, we first check if the y coordinate of the ray is in the bounds of the edge. If so, we then find the x coord on the edge for the ray’s y coord, and if this x coord is less than the point’s x coord, then the ray must pass through that edge to get to the point.
Also be clear that the odd/even rule doesn’t apply when the point is itself on an edge. Assuming we consider edge points to lie within the polygon bounds, we must check if the point itself has the same x coord as any intersection with an edge.
There’s also the matter of when the ray passes through a vertex. Because the vertex joins two edges, it will get counted as two intersections. To prevent this double counting, our intersection test treats vertices as part of an edge only when the other vertex of the edge is higher or lower: it doesn’t matter which convention we choose, as long as we’re consistent. So here, if we consider vertices to belong only to edges where the other vertex lies above them, the top edge here (highlighted green) counts as an intersection while the bottom edge here (highlighted red) does not. So now we have an odd number of intersections that correctly identifies this point as lying inside the polygon.
This vertex over-or-under rule works even when the ray runs along a horizontal line. Without the rule, this here is treated as four intersections, erroneously suggesting the point lies outside the polygon. Once we take the rule into account, though, only one edge counts as an intersection because the others are intersected at their vertices yet their other vertices do not have a greater y coord. The rule also works here if we follow the other convention of including vertex intersections only when the other vertex lies below: the horizontal line gets excluded, and so we again correctly have an odd number of edge intersections.
Lastly, if you’re going to be testing whether a lot of points lie inside a polygon, it’s probably more efficient to first test whether they lie inside the rectangular bounds around the polygon, which can be found simply by finding the min and max x and y coords of the polygon. If the point to test doesn’t lie inside the bounding rectangle, then the point can’t lie inside the polygon, and so we can skip the more complicated test.
Scanning a triangle.
A similar problem is how to fill in the pixels bound by a polygon. We’ll just look at the case of triangles because it’s both simple and most relevant to 3d graphics. Any polygon can be decomposed into triangles, and GPU’s generally deal best with filling triangles, as the algorithm is more efficient than the algorithms for filling more complex polygons.
There are a few ways of filling or scanning a triangle, as the process is called. We’ll demonstrate the simplest technique, which is by no means definitive or the most used in practice.
In this examples here, we have a triangle with points a b and c. What we want to do is fill in the pixels, row by row, filling both the rows between edge AB and AC (shown in green) and the rows between AB and BC (shown in blue). Notice that one edge, the tallest edge, bounds all of the rows. This actually holds for any triangle: one edge will span the whole height of the triangle. The only exception is a triangle with a horizontal edge, in which case two of the edges will span the whole height of the triangle. As we’ll see, though, we won’t have to treat this as a special case.
We can simplify our code by encapsulating edges as a class. Our edge constructor takes two points as argument, and we assign their coords to the instance variables x1 y1 and x2 y2. For simplicity, we’ll only deal with cases of whole integer coordinates. In our later code, we’ll need to determine which y coord is greater than the other, so we’ll do that work here, assigning the greater y value to self.topY and the lesser y value to self.bottomY. We also calculate the height, taking the absolute value of the difference between the y’s. We then calculate the slope and y-intercept of the line.
The xFromY method, given a y value, returns the corresponding x value on the line. Notice we have to again treat vertical lines as a special case.
Lastly, we’ll later need to determine whether a point lies to the left or right of the edge. We can do this using the equation of the line. First we put all the terms of the equation on one side such that it is equal to 0. Assuming we have our slope and y-intercept, we can plug any x and y coordinates into the formula: if the result is zero, then the point lies on the line itself, but if the result is non-zero, its sign tells us which side of the line the point falls on. If the slope is positive, a positive result means the point is to the left and a negative result means the point is to the right. For a negative slope, the reverse is true: a positive result means the point is on the right and a negative result means it’s on the left.
So in our Edge class, we’ll include an isPointOnRight method which plugs a point into the equation, but flipping the result if the slope is positive. Also note that we must treat a vertical slope as a special case. For a vertical slope, we simply test if the x coord is less than or greater than either x coord of the edge.
Now that we have our Edge class, we start off our triangle scanning function by identifying the tallest edge. In the case of two tallest edges, it doesn’t matter which of the two we select. The findTallestEdge function takes a list of edges, and returns a tuple of the tallest edge and a list of the other edges.
If you’re not familiar with the syntax of colons inside square brackets, this is what Python calls a slice, which create a copy of the list given a range of indices. You specify a start index left of the colon and an end index right of the colon; when the left index is unspecified, Python assumes index 0, and when the right index is unspecified, Python assumes the last index. So when we write ‘edges bracket 1 colon’, that returns a copy of the list but only including the items starting at index 1. Then later, when we write ‘edges bracket colon’ without specifying any indices, Python returns a copy of the whole list.
In any case, now we can easily identify the tallest edge. To identify the integer x values between two edges, we define a findBoundXs function, which takes a left edge, right edge, and a y coordinate. Our function will assume that the y value intersects both edges, and that for these intersections, the x value of the leftedge is lesser than that of the right edge. So using the xFromY method, we get the x coordinates of both edges for this y coordinate, but note that we need to round to integers. Because we want the integers inside the bounds, we want to round the left x up and round the right x down. To do so, we use Python’s math.ceil to round up and math.floor to round down. ‘ceil’, here, is short for ‘ceiling’. Note that in Python 2, ceil and floor both return float values, so we explicitly convert their results to ints with the int function. Otherwise, we’d get an error passing float values to the range function.
The next step is to draw pixels for each bound x between the edges. The drawAllBoundXs function again takes a left and right edge, this time with an assumption that every y value of the shorter is common to the other edge, and that for every y interception, the x of the left edge is lesser than that of the right edge. These assumptions should hold true if our edges are from the same triangle and one is the tallest of the three edges. Anyway, the function finds the shorter of the two edges because it’s the y values of this edge that we loop over. So we loop from the short edge’s bottomY to its top y, and for each scanline, we find the x’s and draw a pixel for each.
One more function we’ll need is a function that returns the point shared by two edges. It takes a list of two edge and returns the point they share in common. As we’ve written it, the function assumes that there is in fact a shared point, and only one shared point.
Finally we can define the function that orchestrates these others, a scanTriangle function that takes a list of three edges of a triangle. First, we identify the tallestEdge with our findTallestEdge function, assigning the list of the other two edges to a variable ‘others’. Then we need to determine whether the tallest edge is on the left or the right by comparing it to one of the other two edges. We can do so by determining if the common point of the other two edges is to the right of the tall edge. Once we have that, we can then loop over our two other edges, and when the other edge is not horizontal, invoking drawAllBoundXs with the tall edge and other edge as argument.
Finally, to actually use this function as part of our MutableImage class, we’ll make scanTriangle a method of the class, add a color parameter, and lump all of the helper functions into scanTriangle for lack of a better place to put them. Note that in drawAllBoundXs, we replace plot with self.setPixel.
So here we’re drawing a green triangle, formed by the points (200, 400), (300, 100), and (100, 200).