Game Input and Output (transcript)
What’s called alternately the display device or display adaptor, or graphics device or graphics adaptor, is the component of the computer responsible for sending to the monitor the image data stored in what’s called the framebuffer.
Each time the screen is to be redrawn, the display device reads the data in the framebuffer and transmits it to the monitor. The frequency at which this is done is known as the refresh rate; most monitors today run at a refresh rate of at least 60 Hz (meaning 60 times per second).
Display devices come in two kinds: discrete devices (aka video cards), and integrated devices (aka on-motherboard video). With a discrete graphics device, the framebuffer occupies a portion of the card’s own dedicated memory. In contrast, most integrated graphics devices have no dedicated memory of their own, so their framebuffers simply reside in main system memory. This incurs a significant performance penalty because the graphics device must contend with the CPU and other devices to access memory. Consequently, most PC gamers opt for video cards instead of integrated graphics.
When the framebuffer content is modified, it’s quite possible that one or more screen refreshes may occur during the modification. This produces visual artifacts, mainly flickering and tearing. Here you can see an example of tearing in a game, where the framebuffer contains a partly updated image: one portion of the screen is displaying the newest frame while the other portion is displaying the previous frame.
A double buffering scheme can eliminate these artifacts by using two framebuffers: while the content of one buffer is read and displayed, the next frame to display is written to the other buffer; when the next frame is complete, the buffers trade places, such that the new frame is displayed while the frame after is rendered into the undisplayed framebuffer. Thus the next frame is always written to the buffer not currently being displayed. The swapping of buffers here is known as ‘page-flipping’, and as long as the page flip is synchronized to always occur in between screen refreshes, users should not see any flickering or tearing. This synchronization is called ‘vsync’, ‘v’ standing for vertical because a screen refresh is done top to bottom. Though vsync eliminates flicker and tearing, it can degrade performance by making the program sometimes wait before rendering the next frame. That’s why many games give you the option to disable vsync: you may prefer tearing to a lower framerate.
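To make the page flip concrete, here is a minimal sketch in Python. The DoubleBuffer class and its method names are made up for illustration, and the sketch leaves out the vsync timing entirely: on real hardware the flip would be scheduled to happen in between screen refreshes.

```python
class DoubleBuffer:
    """Two framebuffers: 'front' is displayed, 'back' is rendered into."""

    def __init__(self, width, height):
        self.front = [[0] * width for _ in range(height)]
        self.back = [[0] * width for _ in range(height)]

    def draw(self, color):
        # Rendering only ever touches the buffer that is NOT displayed.
        for row in self.back:
            for x in range(len(row)):
                row[x] = color

    def flip(self):
        # Page flip: the buffers trade places; no pixel data is copied.
        self.front, self.back = self.back, self.front


buffers = DoubleBuffer(4, 2)
buffers.draw(7)
shown_before_flip = buffers.front[0][0]  # the old frame is still displayed
buffers.flip()
shown_after_flip = buffers.front[0][0]   # now the completed frame is displayed
```

Because the display only ever reads the front buffer, it never sees a partly drawn frame, which is the whole point of the scheme.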
On some systems, the framebuffer is mapped to process address space such that programs can read and write the framebuffer directly. On PC’s however, this capability is usually curtailed by the OS, as you wouldn’t want any program to write anything to screen at just any time. Most PC programs draw to the screen indirectly using the OS’s windowing system: a program renders the content to display in its window, but it’s the windowing system that copies the appropriate portion of that content to the appropriate portion of the framebuffer, making sure windows are drawn in the right position and right order.
Some applications, though, namely games, may want to run in a fullscreen mode where they have control over the whole screen; in such modes, applications are typically given direct access to the framebuffer.
Now, the earliest PC video cards didn’t do anything more than provide a dedicated framebuffer and transmit its contents to the screen. Then came the so-called ‘graphics accelerators’, which assisted the CPU in generating the image by performing common 2D rendering tasks. In the mid-90’s, these graphics accelerators added 3D rendering capabilities and soon after became known as GPU’s, graphics processing units.
The earliest of these devices, such as the 3dfx Voodoo card, focused on the portion of 3D rendering called texture mapping, in which the surfaces of the polygons get filled in with textures. Then in 1999, Nvidia’s Geforce 256 was the first consumer card with hardware transform and lighting, which effectively offloaded many of the geometry calculations from the CPU. At this point, these 3D acceleration features were all fixed-function, meaning the algorithms were baked into the hardware. To give programmers more flexibility and thereby achieve different visual effects, GPU’s, starting with the Geforce 3, added shaders. Shader, in this context, refers to a programmable portion of the 3D rendering pipeline. Rather than performing the same algorithms in all cases, programmers can write their own shaders to customize how the vertices and pixels get computed. A few years later, GPU’s added geometry shaders, and most recently, general-purpose shaders, shaders which can run any kind of code, much like a CPU can. This doesn’t mean that CPU’s are obsolete: GPU’s are optimized for performing repetitive operations on many pieces of data, such as many vertices or many pixels. CPU’s, in contrast, are optimized for performing tasks with lots of branching. So GPU’s are good for important parts of 3D rendering and some other things, like physical simulation, but not so good for the general logic that makes up the bulk of most programs.
To take advantage of the 3D acceleration of GPU’s, we most commonly use one of two API’s: Direct3D, which is the graphics portion of Microsoft’s DirectX for Windows, or OpenGL (GL standing for graphics library), which is a cross-platform standard. As previously discussed, OpenGL is the API used by pyglet and the API we’ll be covering in detail.
Sound is of course, an analog, continuous phenomenon, but to represent sound digitally, we have to use discrete numbers. The solution is what’s called sampling: at regular intervals we take a reading, a sample, of the amplitude of the sound wave and record it as a number. The more frequently and more accurately we record these amplitudes, the higher quality the digital representation of the sound. For example, CD standard audio is sampled as two separate waveforms, two separate channels, for stereo, each at a frequency of 44.1 kilohertz (meaning 44,100 samples per second), and each sample recorded as a 16-bit number.
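As a rough illustration of sampling, here is a small Python sketch that takes readings of a pure sine tone at the CD rate and records each one as a signed 16-bit number. The function name and the choice of a 440 Hz tone are just for the example; a real recording would of course read its amplitudes from an ADC rather than from a formula.

```python
import math

SAMPLE_RATE = 44_100   # samples per second, per channel (CD standard)
BITS_PER_SAMPLE = 16
CHANNELS = 2           # stereo


def sample_sine(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return one channel of signed 16-bit samples of a sine wave."""
    max_amplitude = 2 ** (BITS_PER_SAMPLE - 1) - 1  # 32767
    count = round(sample_rate * duration_s)
    return [
        round(max_amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(count)
    ]


samples = sample_sine(440.0, 0.01)  # 10 milliseconds of a 440 Hz tone

# Uncompressed data rate of CD audio: rate x channels x bytes per sample.
bytes_per_second = SAMPLE_RATE * CHANNELS * BITS_PER_SAMPLE // 8
```

So 10 milliseconds of one channel comes out to 441 samples, and uncompressed CD-quality stereo works out to 176,400 bytes per second.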
The electronic components which convert analog sound signals to digital data are called ADC’s (analog to digital converters), and the components which translate the other way, from digital data to analog signal, are called DAC’s (digital to analog converters). So to play back a digital recording, the computer uses a DAC.
Because sound, unlike images, is always a continuous phenomenon over time, a ‘framebuffer’ equivalent for sound doesn’t really make sense: there’s not really such a thing as the ‘current’ sound at any one moment. Still, sound devices include some amount of playback buffer, usually enough for at least a few moments of audio. This buffer is typically read and output by the device in a loop, which explains why audio sometimes stutters when the program feeding the audio gets bogged down: the program failed to update the buffer in time before the device looped back to the start of the buffer.
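The looping buffer and the stutter it can produce might be modeled like this in Python. This is only a toy model under my own assumptions: the PlaybackBuffer class is hypothetical, and real devices and drivers differ in how they handle a program that falls behind.

```python
class PlaybackBuffer:
    """Fixed-size buffer that the 'device' reads in a loop."""

    def __init__(self, size):
        self.data = [0] * size
        self.read_index = 0   # where the device reads next
        self.write_index = 0  # where the program writes next
        self.fresh = 0        # samples written but not yet played

    def write(self, samples):
        """Program side: add samples until the buffer is full."""
        written = 0
        for s in samples:
            if self.fresh == len(self.data):
                break  # no room: the rest must wait until the device catches up
            self.data[self.write_index] = s
            self.write_index = (self.write_index + 1) % len(self.data)
            self.fresh += 1
            written += 1
        return written

    def play(self, count):
        """Device side: always outputs `count` samples, fresh or not."""
        out, stale = [], 0
        for _ in range(count):
            out.append(self.data[self.read_index])
            self.read_index = (self.read_index + 1) % len(self.data)
            if self.fresh > 0:
                self.fresh -= 1
            else:
                stale += 1  # the program fell behind: old audio repeats
                self.write_index = self.read_index  # assumption: writer resyncs
        return out, stale


buf = PlaybackBuffer(4)
buf.write([1, 2, 3, 4])
first, stale_first = buf.play(4)    # all fresh samples
second, stale_second = buf.play(2)  # nothing new was written: a stutter
```

In the second call the device has looped back to the start and replays samples it already played, which is the audible stutter.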
discrete vs integrated
The trend with PC sound devices is in a sense backwards from the trend with display devices. While PC’s used to require discrete cards for sound, most computers of the last decade have a sound device integrated into the motherboard. And whereas discrete sound cards usually have fancy sound processing features, such as channel mixing, frequency modulation, and 3D positioning, integrated sound devices rely upon the CPU to do all of that stuff. An integrated sound device typically doesn’t do much more than convert up to 8 channels of digital sound to an analog signal, with all of the mixing and other effects done in software, usually in the driver. While in the past, this would create an undesirable strain on the system, today’s CPU’s are fast enough that the strain is hardly noticeable. And unlike 3D graphics, which continue to get more and more complex, sound in even the most demanding games seems to have hit a complexity plateau in recent years. While discrete sound cards still do find use today in high-end audio recording, such as in a studio environment, most gamers today don’t bother with discrete sound cards, as the difference in performance and sound quality on today’s systems is almost always imperceptible.
On Windows, the sound API used most commonly for games is DirectSound. For similar functionality cross-platform, two options are OpenAL (as in open audio library) and OpenSL ES (as in open sound library for embedded systems). OpenAL isn’t an entirely open standard, as the name might imply, but there is an open source implementation. OpenSL ES is an open standard controlled by the same group as OpenGL, but as the name implies, it is mainly targeted at embedded systems. Another standard from the same group, OpenMAX AL, is available on desktops, but it lacks some features used in games, namely 3D positional sound. In our code, we’re using pyglet, which itself uses DirectSound on Windows, OpenAL on Mac, and PulseAudio on Linux.
Keyboards, mice, and game controllers these days all use USB, so their connection to the CPU is through the USB controller in the system chipset. So when the user, say, hits a button or moves the mouse, the USB controller sees this and sends an interrupt to the CPU; the USB driver then queries the USB controller for the cause of the interrupt; and then it is the responsibility of the windowing system to determine which process should receive a message (called an ‘event’) with this information. For example, when I click on my web browser window, the windowing system determines that the click should be directed to my web browser based on the current position of the mouse cursor and so sends the event to just my browser, not any other windows I might have open. Or in the case of keyboard input, the windowing system keeps track of which window currently has keyboard focus and so sends keyboard events to only that window.
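The routing decision made by the windowing system can be sketched in a few lines of Python. The Window and WindowingSystem classes here are hypothetical, and I’m assuming the common click-to-focus policy, where clicking a window also gives it keyboard focus; actual windowing systems are far more involved.

```python
from dataclasses import dataclass


@dataclass
class Window:
    name: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


class WindowingSystem:
    def __init__(self, windows):
        self.windows = list(windows)  # ordered back to front
        self.focused = None

    def route_click(self, px, py):
        # The topmost window under the cursor receives the click.
        for window in reversed(self.windows):
            if window.contains(px, py):
                self.focused = window  # assumption: click-to-focus
                return window
        return None  # the click landed on no window

    def route_key(self, key):
        # Keyboard events go only to the window with keyboard focus.
        return self.focused


browser = Window("browser", 0, 0, 100, 100)
editor = Window("editor", 50, 50, 100, 100)  # overlaps and sits on top
system = WindowingSystem([browser, editor])

clicked = system.route_click(10, 10)   # only the browser is under the cursor
key_target = system.route_key("a")     # so the browser also gets the key
overlap_target = system.route_click(60, 60)  # both overlap; the top one wins
```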
Programs may also get user input by querying the current momentary state of a device, such as querying which gamepad buttons are currently pressed down. However, a mouse can only report its relative movements as they occur, not its absolute position. In some games, such as a first person shooter, we need this movement data so that we can translate it into player movements, and while windowing systems send the mouse movements as events, this data is not always reported with high-enough frequency or fidelity. On Windows, you can more accurately read mouse input with the DirectInput API (which is also what we use on Windows to get gamepad input). On Linux and Mac, you can instead use SDL, the simple direct media layer, which provides similar functionality.
In many games, player actions may depend upon sequences of controller input. In Street Fighter, for example, Ryu shoots fireballs with the sequence down, down-forward, forward-punch. To implement such input gestures, a game must keep track of recent input for a second or two rather than just input of the current moment.
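One simple way to keep track of recent input is a queue of timestamped inputs that drops anything older than the time window. Here is a sketch in Python; the InputHistory class, the one-second window, and the input names are my own choices for illustration, not how any particular game does it.

```python
from collections import deque


class InputHistory:
    """Remembers inputs from the last `window_s` seconds."""

    def __init__(self, window_s=1.0):
        self.window_s = window_s
        self.events = deque()  # (timestamp in seconds, input name)

    def record(self, timestamp, name):
        self.events.append((timestamp, name))
        # Forget inputs that are now too old to be part of a gesture.
        while self.events and timestamp - self.events[0][0] > self.window_s:
            self.events.popleft()

    def ends_with(self, sequence):
        """True if the most recent inputs match `sequence` in order."""
        recent = [name for _, name in self.events]
        return (len(recent) >= len(sequence)
                and recent[-len(sequence):] == list(sequence))


FIREBALL = ("down", "down-forward", "forward-punch")

quick = InputHistory()
quick.record(0.0, "down")
quick.record(0.1, "down-forward")
quick.record(0.2, "forward-punch")
quick_fireball = quick.ends_with(FIREBALL)  # entered fast enough

slow = InputHistory()
slow.record(0.0, "down")
slow.record(0.5, "down-forward")
slow.record(1.6, "forward-punch")  # the earlier inputs have expired by now
slow_fireball = slow.ends_with(FIREBALL)
```

The game would call ends_with after each new input, once for each gesture it cares about.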
Games need timing devices in order to measure the passage of time, and some games may need them to simply get the so-called ‘wall time’ (which is the time displayed by the clock on your wall).
If we were to have just one timer in our system, we would want it to be:
1) high-resolution (meaning the timer reports fine-grained units of time, such as nanoseconds, which are a billionth of a second)
2) low jitter (meaning the timer reports its values with high accuracy, with little random drift)
3) monotonic, aka non-decreasing (meaning that despite whatever amount of jitter, reported values are always greater than or at least equal to any previously reported values, such that time never seems to go backwards, even by just a tiny fraction of a second)
and lastly, 4) absolute (meaning the reported value is in absolute terms, aka wall time, not in relative terms). Some timers report values which are only meaningful relative to the values they report at other moments; for example, if a relative timer reports values in milliseconds, you can measure the passage of milliseconds by subtracting the earlier value from the later value. What a relative timer cannot tell you is the time of day or the date.
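Python’s standard time module happens to expose both kinds of timer, which makes the distinction easy to see. In this sketch, time.time() is the absolute (wall time) clock, while time.monotonic_ns() is a relative clock whose values only mean something when subtracted from one another. Which hardware timer sits behind each call depends on the OS, so take this as an illustration rather than a statement about any particular timer.

```python
import time

# Absolute: seconds since the epoch, so it can be turned into a date.
wall_now = time.time()
date_text = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(wall_now))

# Relative: the raw value is meaningless on its own, but the difference
# between two readings measures elapsed time.
start_ns = time.monotonic_ns()
time.sleep(0.01)
elapsed_ns = time.monotonic_ns() - start_ns

# Monotonic: a later reading is never smaller than an earlier one.
readings = [time.monotonic_ns() for _ in range(1000)]
never_backwards = all(b >= a for a, b in zip(readings, readings[1:]))

# Python reports each clock's properties, including its resolution.
mono_info = time.get_clock_info("monotonic")
wall_info = time.get_clock_info("time")
```

Note that the wall clock is reported as adjustable (the user or a time sync service can set it), which is exactly why it isn’t guaranteed to be monotonic.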
The PC has accumulated several timers over the years, many of them legacy features kept around only for the sake of compatibility with old code. Unfortunately, even the newest timer, the HPET (High-Precision Event Timer) does not meet all of our ideal criteria, as it reports relative time, not absolute time.
To keep track of wall time, PC’s use the RTC (real-time clock), which was added way back in the 80’s. The RTC, running on battery-power, keeps track of the time of day and date when the system is off, storing it in the CMOS; the OS reads this value from the CMOS on startup, and while the system is running, the OS keeps this internal value current using other timers, such as the HPET.
For querying the OS’s wall time value or the HPET, the OS provides system calls. Now, the overhead of a CPU interrupt triggered by a system call would defeat the purpose of accurate time readings: just think, what good would a nanosecond resolution timer be if it takes the system whole milliseconds to read its value? Fortunately, today’s x86 processors include instructions for invoking system calls without the overhead of an interrupt, so the HPET, for example, can be queried relatively quickly, in about 1 microsecond. 1 microsecond of overhead of course means we still lose some of the timer’s precision, but it’s adequate for most purposes, including games.
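If you want a rough feel for this overhead on your own machine, you can time a large number of timer queries and take the average. Here is a sketch using Python’s time.perf_counter_ns(); the function name is made up, and the result includes the cost of the Python loop and function call themselves, so it overstates the cost of the underlying query.

```python
import time


def average_query_cost_ns(calls=100_000):
    """Average cost of one timer query, in nanoseconds (loop overhead included)."""
    start = time.perf_counter_ns()
    for _ in range(calls):
        time.perf_counter_ns()
    end = time.perf_counter_ns()
    return (end - start) / calls


cost_ns = average_query_cost_ns()
print(f"about {cost_ns:.0f} ns per timer query")
```

The number will vary from system to system and from run to run, depending on which timer the OS uses behind the call.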
If you’re curious about more PC timer details, I refer you to this short pdf, which I’ve found is the best source on the topic.