Quote:
By this point you essentially know how SLI works inside and out, and you have all the necessary equipment to get your very own SLI system up and running. Now it's time to put it all to good use! In this final section we'll get down to assembling and configuring your SLI setup. In the interest of time and convenience, we'll assume that you already have a fundamental understanding of how to build your own computer and know what goes where. For SLI systems it's not all too different, but there are a handful of points that you should keep in the back of your mind during the building process.
INITIAL INSTALLATION
After the primary card has been installed, there are some points of interest that should be noted depending on your operating system type. There will also be some general statements of what is true and what isn't in terms of installation procedures regarding the graphics cards themselves.
WINDOWS XP
Windows XP is still a fine operating system for those who don't wish to upgrade to Vista or 7 as long as the limitation to exclusively 2-way SLI and the lack of DirectX 10/11 are acceptable. For Windows XP users, upgrading from a single card to SLI is very simple. As mentioned before, there are no specific drivers or special software that need to be installed to use SLI, and this goes for all operating systems. With the system powered down, simply install the new card into the secondary (bottom-most) PCIe slot and connect the appropriate power leads. Upon startup the system should recognize the new hardware, and the option to enable SLI will become available in the NVIDIA Control Panel. When building a system from scratch, it is best to have only one card installed while the operating system is installed and drivers are loaded. After the core components have been successfully initialized, you can add the second card. Many users have reported no problems when installing Windows XP with both cards installed; however, problems can always arise, and there's nothing wrong with taking a bit of extra precaution. Because of the restriction to the standard 2-way SLI mode (barring the older and less efficient version of Quad SLI with the 7900/7950 GX2s), some users may not need to use a 64-bit operating system.
WINDOWS VISTA
After many patches and updates, Windows Vista has become just as stable as its predecessor, XP, if not more so. Unlike XP, Vista supports 3-way and Quad SLI as well as DirectX 10, making it a primary choice for gaming and hardware enthusiasts. As stated previously, there is no additional driver or software that needs to be downloaded to make use of SLI. Just like XP, if upgrading from a single card to two (or more), simply install and connect the cards when powered down and allow the OS to recognize the new devices. When installing a new copy of Vista, it is strongly encouraged that only one graphics card is installed at the time. A significant number of reports have indicated that Vista can cause problems when multiple video devices are present during a new installation, with behaviors ranging from Device Manager error codes to outright exclusion of a card, most of the time not permitting the use of SLI. Thankfully, these errors are most commonly solved by removing all but one card and reintroducing the others one at a time between shutdowns and startups, giving the OS the opportunity to recognize each new card in turn. Due to Vista's heightened resource consumption as well as the ability to support up to four GPUs in SLI, users are encouraged to have at least 2 GB of high-grade DDR2 RAM or better and the 64-bit variant of Windows Vista.
WINDOWS 7
Windows 7 stands as the ideal gaming operating system at the time of this article; compared to Vista, it consumes fewer system resources, has a more responsive and customizable interface, and is noticeably more stable. 7 supports DirectX 11 out of the box whereas this must be patched in for Vista. With the exception of a few new and improved features, 7 remains largely similar to Vista, which for SLI systems means the installation process should be treated the same. For PCs in general, hardware requirements have been relaxed slightly, although with continuing technology trends it has become relatively easy and inexpensive to obtain mid-range hardware.
SLI: COMPATIBILITY
For some time, SLI was very specific in relation to the type of hardware that could be used. Cards had to be from the same manufacturer and have the same BIOS and the same clock speeds; they had to be essentially identical. As driver complexity progressed, the limitations have been relaxed greatly. In order to clear up any remnants of confusion that may still linger, here are the current rules for cards to be used in SLI: they must be from the same family (7950, 8600, 9800, GTX 280, etc.), be of the same model (GS, GT, GTX, and so on), and have the same amount of memory. The cards do not have to be from the same manufacturer, have the same clock speeds, or have matching VGA BIOS versions. The rule used to be that the slower card had to be installed as the primary device, but this is no longer so; the graphics cards can be run asynchronously while in SLI without causing any negative effects on performance or frame consistency.
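The matching rules above can be summarized as a small check. This is only an illustrative sketch: the `Card` type, its field names, and the example cards are made up for this post and do not correspond to any NVIDIA tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    # Hypothetical description of a graphics card; field names are illustrative.
    vendor: str      # board manufacturer
    family: str      # e.g. "8800"
    model: str       # e.g. "GT"
    memory_mb: int   # amount of on-board memory
    core_mhz: int    # core clock speed


def can_pair_in_sli(a: Card, b: Card) -> bool:
    """Apply the three rules from the text: same family, same model, same memory."""
    return (a.family, a.model, a.memory_mb) == (b.family, b.model, b.memory_mb)


stock = Card("EVGA", "8800", "GT", 512, 600)
factory_oc = Card("XFX", "8800", "GT", 512, 650)
bigger = Card("EVGA", "8800", "GT", 1024, 600)

print(can_pair_in_sli(stock, factory_oc))  # vendor and clocks differ: allowed
print(can_pair_in_sli(stock, bigger))      # memory size differs: not allowed
```

Vendor and clock speed are deliberately left out of the comparison, since the text says they no longer need to match.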
SLI: BRIDGE INSTALLATION
After the cards have been installed, it is strongly encouraged that users also install the SLI bridge which is provided with all SLI-ready nForce motherboards. A simple two-pronged bridge is used for the standard two-way form of SLI while the six-pronged bridge is meant for 3-way SLI systems. Note that there is no "proper" orientation for the 2-way bridge, and if two cards running in SLI each have two goldfinger ports (indicating support for 3-way SLI), the bridge does not need to be installed on a particular set; two 2-way bridges may also be used, but this will serve no purpose other than aesthetic appeal. The 3-way SLI bridge can be used to connect cards in a two-way configuration, but because of its wider size and the placement of the goldfingers on the cards, it can only serve this purpose when the cards being used support 3-way SLI. The bridge is not required to enable SLI; however, it provides a dedicated data link for the GPUs to communicate over, and when this link is not present, the PCIe bus becomes the means of communication. In many systems this can saturate the PCIe bus, which will cause performance to suffer.
SLI: HIERARCHY
Becoming familiar with SLI's device hierarchy convention will allow users to gauge idle and load temperatures more efficiently as well as communicate possible issues to the community more effectively. For the standard 2-way SLI configuration, the GPUs are labeled in a top-down fashion; the primary (top-most) graphics card is GPU 0 and the secondary (bottom-most) is GPU 1. For 3-way SLI, the labels remain the same, with the tertiary (middle) card being GPU 2. For Quad SLI, the numbering system becomes slightly more confusing: the primary card contains GPUs 0 (primary) and 3 (quaternary) while the secondary houses GPUs 1 (secondary) and 2 (tertiary). For a visual aid on the numbering convention, click here for 3-way SLI and here for Quad SLI.
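For quick reference, the numbering convention described above can be written down as a lookup table. This is just a sketch of the text; the dictionary layout and the `locate_gpu` helper are my own naming, not something exposed by the driver.

```python
# configuration -> {GPU index: (physical card, role)}, as described in the text.
SLI_HIERARCHY = {
    "2-way": {
        0: ("top card", "primary"),
        1: ("bottom card", "secondary"),
    },
    "3-way": {
        0: ("top card", "primary"),
        1: ("bottom card", "secondary"),
        2: ("middle card", "tertiary"),
    },
    "quad": {
        0: ("primary card", "primary"),
        3: ("primary card", "quaternary"),
        1: ("secondary card", "secondary"),
        2: ("secondary card", "tertiary"),
    },
}


def locate_gpu(config: str, index: int) -> str:
    """Return a readable description of where a GPU index physically lives."""
    card, role = SLI_HIERARCHY[config][index]
    return f"GPU {index} is the {role} GPU on the {card}"


print(locate_gpu("3-way", 2))
print(locate_gpu("quad", 3))
```

This can be handy when a monitoring tool reports a high temperature for "GPU 2" and you need to know which physical card to look at.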
ACTIVATION AND CONFIGURATION
Now that the cards are installed and have been properly detected by the operating system, the time has come to make them work together, and do so effectively. Here we'll go over how to enable SLI and also how to configure rendering behaviors through the NVIDIA Control Panel.
ACTIVATION
The process of enabling SLI is a very simple one. After installing the cards physically (preferably with the SLI bridge connecting them) and installing the video driver, it is recommended to check the Windows Device Manager and ensure that no cards have been flagged with any hardware faults or issues. If all cards appear in the Device Manager and are functioning normally, open the NVIDIA Control Panel. On the left should be a tree menu with "3D Settings" at the top. Select this section (expand if necessary) and open "Set SLI and PhysX configuration." As shown below, all that is needed is for the radio button to be set to "Enable SLI" and for the setting to be applied. During this time your screen will go black and may flash several times with a blinking white cursor in the top-left corner; this is normal and may take several minutes depending on your setup. Once the desktop returns you will be asked if you'd like to keep or revert your settings within the next few seconds, so if for any reason your display does not return to normal, the drivers will default your system back to independent rendering.
CONFIGURATION
There is much more to SLI than simply enabling the load engine: the NVCP is bursting with features to help you refine your gameplay experience to suit your preferences and your system's capabilities. Settings range from maximum performance to pure image quality with plenty of levels in between, ensuring flexibility and powerful control over vastly different configurations. For 3D applications, these tools are all found in the "Manage 3D Settings" subcategory of your NVCP. There are many options that can be changed, and knowing their function can help tremendously in diagnosing the source of a potential performance snag, so this subsection will focus strictly on defining them and detailing what they do.
Ambient Occlusion: (R185 driver set and later) Ambient occlusion, sometimes known as SSAO, is a pixel shading technique that enhances scene realism by reducing the level of ambient light on a render target by taking shadow projection from other visible objects into account. Shadow volumes are simulated by comparing depth buffer values of one pixel and a randomly-selected neighbor (to reduce computing requirements as well as noise generation.) The end result is a very accurate portrayal of natural shadow behavior, but often at a high cost in performance depending on the scene's pixel density. Applications that support ambient occlusion natively do not have support for driver-level ambient occlusion. See here for a few examples of typical game scenes with and without ambient occlusion.
Anisotropic Filtering: This is an image improvement technique which reduces the blurriness of textures when they are viewed at an oblique angle, such as a runway in a flight simulator or a billboard in a racing game. Instead of using an orthogonal filter pattern (e.g. bilinear and trilinear filtering), which cannot replicate perspective distortions of non-perpendicular faces, the GPU will scale the height and/or width of a mipmap by an integer ratio relative to the perspective distortion of the texture and then take the appropriate samples; the ratio is limited by the maximum sampling value specified. AF can function with anisotropy levels between 1 (no scaling) and 16, defining the maximum degree by which a mipmap can be scaled, but AF is commonly offered to the user in powers of two: 2x, 4x, 8x, and 16x. The difference between these settings is the maximum angle at which AF will filter the texture. For example: 4x will filter textures at angles twice as steep as 2x, but will still apply standard 2x filtering to textures within the 2x range to optimize performance. Most mid- and high-end GPUs can perform 16x filtering without any appreciable performance reduction.
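To illustrate how the maximum setting acts as a cap rather than a fixed cost, here is a rough sketch. It assumes a simplified model in which the "perspective distortion" is a single number (how stretched the texture footprint is) and the driver rounds it up to an integer ratio; real hardware is more involved, and `applied_anisotropy` is a made-up name.

```python
import math


def applied_anisotropy(distortion_ratio: float, max_level: int) -> int:
    """Integer scaling ratio used for a surface, capped by the chosen AF setting.

    distortion_ratio: assumed measure of how stretched the texture appears
                      (1.0 = viewed head-on).
    max_level:        the AF setting selected by the user (1 to 16).
    """
    wanted = math.ceil(distortion_ratio)
    return max(1, min(wanted, max_level))


# A mildly angled surface only needs 2x even when 4x is selected.
print(applied_anisotropy(1.8, 4))
# A very steep surface is capped at the selected maximum.
print(applied_anisotropy(10.0, 4))
```

Under this model a higher setting only costs more on the surfaces that are steep enough to use it, which matches the point that 4x still applies 2x filtering within the 2x range.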
Anti-Aliasing - Gamma Correction: Anti-aliasing, explained in detail below, can manipulate a frame to appear more smooth and realistic by reducing the subjective level of jagged edges on objects and actors. However, color and alpha values produced by the sampling process aren't always representative of how an edge would look in the real world. Gamma correction substitutes color sampling algorithms during the AA process to give edges a more believable coloration by comparing color values to the neighboring pixels, preventing an edge from being presented as too dark or too light. This has a minimal cost on performance and can be enabled on low-end systems with little effect.
Anti-Aliasing - Mode: Not every application can be treated the same, especially with regard to anti-aliasing, which is where this option becomes useful to users; by controlling the way anti-aliasing is applied in an application, diagnosing a possibly oversaturated frame buffer, enforcing a higher level of AA, or overcoming a CPU bottleneck becomes a much simpler task. Four choices are available: the application can be allowed to determine AA sampling rates and methods, these rates can be overridden by the driver, both the driver and 3D engine can sample the frame by a specified amount (enhancing the application setting), or AA can be disabled entirely. Overriding the application settings can cause lower performance than normal because of additional buffers used for compatibility; enhancing the application settings poses the same problem and can disproportionately reduce performance given the effective sample rate.
Anti-Aliasing - Setting: Aliasing, better known as the "stair-step effect" or "jaggies," is the result of one of the largest principal problems with modern 3D hardware: rasterization, or the act of reducing an image of infinite detail to a finite one (pixels). When this occurs, visual data is lost, which can create noticeable visual inconsistencies between objects with varying depth and alpha values, often taking the shape of jagged, contrasting blocks where our eyes expect smoothness. Anti-aliasing is the definitive method for reducing the prominence of these jagged edges; by increasing the frame resolution by a predetermined factor and merging color/alpha and Z values of neighboring pixels, the resulting image can be manipulated to look much softer and more realistic. However, this setting can be highly computationally intensive depending on the frame's pixel density and the selected sample size, as it may drastically increase memory and bandwidth consumption and, because of its nature, amplifies the pixel fill rate required to produce a given number of frames per second. For a demonstration of various AA levels, click here for a "face value" spread of popular sample sizes; click here for closer examination, and here for significant magnification.
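The "render bigger, then merge neighbors" idea can be shown with a tiny example. This is a minimal sketch using a plain box filter on grey values; it is not how the driver's multisampling modes are actually implemented, and `downsample` is a name invented here.

```python
def downsample(image, factor):
    """Average each factor x factor block of a 2D grid of grey values."""
    height = len(image) // factor
    width = len(image[0]) // factor
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [
                image[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


# A hard diagonal edge rendered at twice the target resolution
# (0 = black, 1 = white).
hi_res = [
    [1, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

print(downsample(hi_res, 2))  # [[0.75, 1.0], [0.0, 0.75]]
```

The pixels along the edge come out as intermediate greys instead of pure black or white, which is what softens the stair-step. It also shows where the cost comes from: a 2x factor in each direction means four times as many pixels to fill and store.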
Anti-Aliasing - Transparency: This form of AA controls how textures with alpha (opacity) channels are sampled during the rendering phase. Multisampling is the setting of choice because of its respectable performance-versus-image quality yield, but can produce some anomalous results in older applications. Supersampling is much more process-intensive but offers the best-looking results with a low chance to return any artifacts.
Conformant Texture Clamp: This feature controls how texture boundaries are handled in OpenGL applications. For best performance and image quality this is best left at "Use Hardware," but if artifacts or poor update times are observed, the "Use OpenGL Specification" setting is recommended. Finally, if these issues are still being exhibited, simply disable this option.
Error Reporting: This OpenGL-specific function handles error checking throughout the rendering pipeline. Disabling this will ignore any potential errors that may arise, but it will also grant better performance. This only needs to be enabled for troubleshooting purposes.
Extension Limit: To allow for maximum compatibility, the driver can truncate certain code sequences to prevent looping errors and unresponsiveness in older applications, which is primarily what this option governs. When disabled, the code will run as-is with no alteration to its length, and is generally the best setting unless consistent problems are being noted. When enabled, this can splice code that has been erroneously flagged as extraneous and can lead to execution halts and crashes.
Force Mipmaps: (Removed in drivers R190 and later) Mipmaps are a sequence of textures used at progressive distances to improve the subjective appearance of an object. Disabling this will not force texture filtering and can often give the best performance with a low anomalous return rate. Bilinear filtering will use a moderate form of linear interpolation to more accurately blend overlaying pixels, giving better image quality with a modest performance hit. Trilinear filtering uses a stronger linear interpolation method for better image quality at a higher cost of performance. When used in conjunction with anisotropic filtering, linear interpolation can produce very clear textures at extreme viewing angles.
Maximum Pre-Rendered Frames: When permitted, this will allow the CPU to prepare vertex data for a set number of frames ahead of the current one being displayed to maximize performance. This can smooth performance out at low frame rates, but can also affect input latency quite noticeably when set to a high value. A setting of 3 frames ahead is recommended for the best compromise between performance and real-time representation.
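As a rough way to see why a high value hurts responsiveness, the worst-case delay added by the queue can be estimated as the number of queued frames multiplied by the time per frame. This is a back-of-the-envelope sketch that assumes the queue is always full; `added_latency_ms` is an illustrative name.

```python
def added_latency_ms(pre_rendered_frames: int, fps: float) -> float:
    """Worst-case extra input delay if every queued frame is waiting ahead of your input."""
    return pre_rendered_frames * 1000.0 / fps


for fps in (30, 60):
    print(fps, "FPS:", added_latency_ms(3, fps), "ms")
```

At the recommended setting of 3, this estimate gives 100 ms at 30 FPS but only 50 ms at 60 FPS, which is why the added latency is most noticeable at low frame rates.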
Multi-Display/Mixed-GPU Acceleration: This is a setting to ensure compatibility with single or multi-monitor configurations. When using a single monitor, this should be set as "Single display performance mode" unless significant visual artifacting (unrelated to GPU overclocking or overheating) is seen, in which case "Compatibility performance mode" should be selected instead. If using more than one monitor, this ought to be set at "Multiple monitor performance mode." If you notice artifacting on a multi-display setup, you may select the compatibility mode to remedy the situation.
SLI Performance Mode: This controls the SLI rendering mode used globally or with the specified program. As covered previously, the options available are: single-GPU, split-frame rendering, alternate frame rendering 1, alternate frame rendering 2; 3-way split-frame rendering, 4-way alternate frame rendering 1, 4-way alternate frame rendering, and 4-way alternate frame/split-frame hybrid rendering. Access to the 3-way and 4-way modes requires a 3-way SLI or Quad SLI system respectively.
Texture Filtering - Anisotropic Filtering Optimization: This simple control handles texture compression optimizations applied to mipmaps used in anisotropic filtering. Enabling this may improve performance slightly but also reduce image quality, while disabling it has the opposite effect. Rarely will disabling this cause any noticeable effect on performance even with a mainstream gaming configuration.
Texture Filtering - Negative LOD Bias: The LOD bias, or a Level Of Detail bias, is a function set by the graphics engine to determine how clearly a texture appears at a given distance from the camera. Some applications use a negative LOD bias to sharpen stationary textures, but this can produce shimmering (similar to artifacting; an abnormally bright pixel) when moving, especially when alpha-dependent frame effects are used such as motion blurring. Setting this to "Clamp" will prevent the LOD bias from dropping below zero while setting it to "Allow" will permit negative LOD biases.
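The behavior of the two options can be expressed in a few lines. This sketch only mirrors the description above; the function name is made up and the mode strings simply reuse the control panel labels.

```python
def effective_lod_bias(requested_bias: float, mode: str) -> float:
    """Return the LOD bias actually used for a given control panel mode."""
    if mode == "Clamp":
        # Never let the bias drop below zero.
        return max(0.0, requested_bias)
    if mode == "Allow":
        # Pass the application's value through untouched.
        return requested_bias
    raise ValueError(f"unknown mode: {mode}")


print(effective_lod_bias(-1.5, "Clamp"))  # 0.0
print(effective_lod_bias(-1.5, "Allow"))  # -1.5
```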
Texture Filtering - Quality: This function controls the level of filtering optimizations applied to rendered textures, granting four separate levels: High Quality, Quality, Performance, and High Performance. High Quality applies no optimizations to a texture while Quality allows the use of those that reduce storage size without reducing subjective detail. Performance employs stronger optimizations to further reduce space consumption, but can produce slightly muddled textures when viewed at moderately close distances; High Performance will apply numerous optimizations to offer the best performance at the cost of appreciable image quality.
Texture Filtering - Trilinear Filtering: This filtering control allows the driver to make an educated guess of whether or not to apply trilinear filtering to a texture plane by comparing its relative angle to the camera. Enabling this will have a minimal effect on image quality and can improve performance on low-end machines slightly. When disabled, trilinear filtering (if enabled) will be applied to all visible textures normally.
Threaded Optimization: This feature lets the driver allow multithreaded 3D applications to utilize a second physical (execution, not logical) processor. "Auto" is the recommended option for best compatibility with both older and newer programs; the feature may need to be disabled if problems occur.
Triple Buffering: Modern graphics cards, by default, operate using a technique called double buffering. The video card will store the frame currently being displayed in buffer A while it draws the next frame in buffer B. When the back buffer (B) completes its frame, the display buffer (A) will be purged and then the completed frame will be flipped over to be output to the monitor. Triple buffering forces the card to create a third buffer, in effect increasing the memory used for frame storage by 50%. The display buffer will store and transmit completed frames as usual, but instead of the GPU only being able to write to one back buffer, it can now write into two, which will be copied to the display buffer according to the frame queue. This technique can prove useful when the frame rate falls below the refresh rate of the monitor, a critical point at which the GPUs can complete their work just before or just after the monitor has refreshed. With double buffering, this will cause both buffers to be filled, and the GPU is forced to wait before starting the succeeding frame; with the addition of another back buffer, the GPU can continue to render frames without being locked out. Triple buffering can only be forced when vertical synchronization (covered below) is active; this is to reduce the chance of frame tearing. Triple buffering can reduce performance because of the increased memory consumption, and will offer no benefit if the frame output rate is higher than the refresh rate of the monitor. Additionally, triple buffering can introduce slightly higher input latencies because of prolonged storage times.
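To put a number on the extra memory, here is a quick estimate. It assumes plain 32-bit color buffers with no anti-aliasing and ignores depth buffers and textures, so treat it as a lower bound; the resolution is just an example.

```python
def frame_storage_bytes(width: int, height: int, buffers: int,
                        bytes_per_pixel: int = 4) -> int:
    """Memory needed for the color buffers alone (assumes 32-bit color, no AA)."""
    return width * height * bytes_per_pixel * buffers


double = frame_storage_bytes(1920, 1200, buffers=2)
triple = frame_storage_bytes(1920, 1200, buffers=3)

print(double / 2**20, "MiB with double buffering")
print(triple / 2**20, "MiB with triple buffering")
print(triple / double)  # 1.5
```

The third buffer always adds half again to whatever double buffering was using, and that amount grows quickly once anti-aliasing multiplies the size of each buffer.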
V-Sync: V-Sync, or Vertical Synchronization, is the process of linking the frame output of the GPU(s) to the refresh rate of the monitor. For example, a display with a vertical refresh rate of 75 Hz updates the screen 75 times per second; when V-Sync is enabled the GPU(s) synchronize their output frames to that of the monitor, effectively limiting the maximum output to 75 FPS. When V-Sync is disabled, the GPU(s) are not limited by the refresh rate of the monitor and produce as many frames as their hardware allows. While it may appear that system performance has increased with V-Sync disabled, you may actually be seeing fewer complete frames than if it were enabled. Since the frames are not in sync with the monitor's update rate, some frames are not displayed at all or are only partially displayed; the latter consequence is known as tearing. A torn frame is not completely displayed on the screen and can show either a black line running across the bottom or half of the screen missing, which is a common result of V-Sync being disabled. However, V-Sync isn't entirely advantageous: because vertical synchronization forces the graphics subsystem to wait for the monitor's refresh rather than operating independently of it, the overall frame rate will drop significantly if the GPU(s) cannot produce frames at least as fast as the display refreshes, no matter how slight the shortfall. For a monitor with a refresh rate of 60 Hz, any frame rate between 59 FPS and 30 FPS will effectively equate to 30 FPS because of how the frames are stored in the frame buffers and how the completion times correspond to the monitor's refresh rate.
Assuming the GPUs can render a constant 45 FPS: at the first refresh, frame 1 is displayed for the first time while the GPU starts writing frame 2 to the back buffer. One refresh interval (1/60 of a second) is only three-quarters of the time needed for a frame (1/45 of a second), so when the next refresh happens frame 2 is not yet complete and the monitor continues to display the first frame. The last quarter of frame 2 is written shortly afterward, but the completed frame cannot be flipped to the display buffer until the following refresh, and with both buffers occupied the GPU has to wait before starting frame 3. Each frame therefore occupies two refresh cycles: two frames are completed every four cycles, giving a frame rate of exactly 30 FPS. The equivalent frame rates for double buffering can be calculated with the following: rr / ceiling(rr / or), where rr is the refresh rate of the monitor and or is the output rate (frame rate) of the GPUs in any case where it is below the refresh rate.
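The formula above is easy to try out for your own monitor. The sketch below implements it directly; the function name is my own, and the case where the GPUs are at or above the refresh rate (which the formula does not cover) is assumed to simply cap at the refresh rate.

```python
import math


def vsync_frame_rate(refresh_hz: float, output_fps: float) -> float:
    """Displayed frame rate with V-Sync and double buffering: rr / ceiling(rr / or)."""
    if output_fps >= refresh_hz:
        # Assumption: at or above the refresh rate, V-Sync caps output at rr.
        return float(refresh_hz)
    return refresh_hz / math.ceil(refresh_hz / output_fps)


for fps in (59, 45, 30, 29):
    print(fps, "FPS rendered ->", vsync_frame_rate(60, fps), "FPS displayed")
```

On a 60 Hz display, 59, 45, and 30 FPS all come out as 30.0, while 29 FPS falls to the next step down, 20.0. This stair-stepping is the main reason triple buffering is worth considering when your frame rate hovers just below the refresh rate.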