
High-Performance Memory Technologies

8.1 SYNCHRONOUS DRAM

As system clock frequencies increased well beyond 50 MHz, conventional DRAM devices with asynchronous interfaces became more of a limiting factor in overall system performance. Asynchronous DRAMs have associated pulse width and signal-to-signal delay specifications that are tied closely to the characteristics of their internal memory arrays. When maximum bandwidth is desired at high clock frequencies, these specifications become difficult to meet. It is easier to design a system in which all interfaces and devices run synchronously so that interface timing becomes an issue of meeting setup and hold times, and functional timing becomes an issue of sequencing signals on discrete clock edges.

Synchronous DRAM, or SDRAM, is a twist on basic asynchronous DRAM technology that has been around for more than three decades. SDRAM can essentially be considered as an asynchronous DRAM array surrounded by a synchronous interface on the same chip, as shown in Fig. 8.1. A key architectural feature in SDRAMs is the presence of multiple independent DRAM arrays, usually either two or four banks. Multiple banks can be activated independently and their transactions interleaved with those of other banks on the IC's synchronous interface. Rather than creating a bottleneck, this functionality allows higher efficiency, and therefore higher bandwidth, across the interface. One factor that introduces latency in random accesses across all types of DRAM is the row activation time: a row must first be activated before the column address can be presented and data read or written. An SDRAM allows a row in one bank to be activated while another bank is actively engaged in a read or write, effectively hiding the row activation time in the other bank. When the current transaction completes, the previously activated row in the other bank can be called upon to perform a new transaction without delay, increasing the device's overall bandwidth.

The synchronous interface and internal state logic direct interleaved multibank operations and burst data transfers on behalf of an external memory controller. Once a transaction has been started, one data word flows into or out of the chip on every clock cycle. Therefore, an SDRAM running at 100 MHz has a theoretical peak bandwidth of 100 million words per second. In reality, of course, this number is somewhat lower because of refresh and the overhead of beginning and terminating transactions. The true available bandwidth for a given application is very much dependent on that application’s data transfer patterns and the capabilities of its memory controller.
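To make the arithmetic concrete, the short sketch below computes the theoretical peak transfer rate for an assumed 100 MHz, 16-bit SDRAM interface; the figures are illustrative, and the real sustained rate is lower for the reasons given above.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative figures only: a 100 MHz interface moving one word
         * per rising clock edge once a burst is under way. */
        const double clock_hz   = 100e6;   /* interface clock        */
        const int    width_bits = 16;      /* assumed data bus width */

        double words_per_s = clock_hz;                       /* 100 million words/s */
        double bytes_per_s = clock_hz * (width_bits / 8.0);  /* 200 MB/s peak       */

        printf("peak: %.0f words/s, %.0f bytes/s\n", words_per_s, bytes_per_s);
        return 0;
    }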

Rather than implementing a DRAM-style asynchronous interface, the SDRAM's internal state logic operates on discrete commands that are presented to it. There are still familiar-sounding signals such as RAS* and CAS*, but they are sampled synchronously and combined with other control signals to form commands rather than acting as simple strobes. Commands begin and terminate transactions, perform refresh operations, and configure the SDRAM for interface characteristics such as default burst length.

SDRAM can provide very high bandwidth in applications that exploit the technology's burst transfer capabilities. A conventional computer with a long-line cache subsystem might be able to fetch 256 words in as few as 260 cycles: 98.5 percent efficiency! Bursts amortize a fixed number of overhead cycles across the entire transaction, greatly improving bandwidth. Bandwidth can also be improved by detecting transactions to multiple banks and interleaving them. This mode of operation allows some new burst transfers to be requested prior to the current burst ending, thereby hiding the initial startup latency of the subsequent transaction.
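A minimal sketch of the amortization arithmetic behind the 256-word example, assuming four fixed overhead cycles per transaction (a number consistent with the 260-cycle figure quoted above, not taken from a specific data sheet):

    #include <stdio.h>

    int main(void)
    {
        const int data_words      = 256;  /* words fetched in the burst       */
        const int overhead_cycles = 4;    /* assumed ACTV/RD/latency overhead */
        const int total_cycles    = data_words + overhead_cycles;   /* 260    */

        /* 256 / 260 = 98.5 percent of cycles carry data. */
        printf("efficiency = %.1f %%\n", 100.0 * data_words / total_cycles);
        return 0;
    }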

Most of the input signals to the state logic shown in Fig. 8.1 combine to form the discrete commands listed in Table 8.1. A clock enable, CKE, must be high for normal operation. When CKE is low, the SDRAM enters a low-power mode during which data transactions are not recognized. CKE can be tied to logic 1 for applications that are either insensitive to power savings or require continual access to the SDRAM. Interface signals are sampled on the rising clock edge. Many SDRAM devices are manufactured in multibyte data bus widths. The data mask signals, DQM[], provide a convenient way to selectively mask individual bytes from being written or being driven during reads. Each byte lane has an associated DQM signal, which must be low for the lane to be written or to enable the lane's tri-state buffers on a read.
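In a controller, DQM is typically derived from the host bus's byte-enable information. The helper below is a hypothetical illustration of that mapping for a 32-bit-wide SDRAM with four DQM lines; only the mask sense (DQM high = lane masked) comes from the text above, the rest is assumed.

    #include <stdint.h>

    /* Convert active-high byte enables (bit i set = byte lane i carries valid
     * data) into the DQM value to drive.  A DQM bit driven high masks its
     * lane; driven low, it lets the lane be written or driven on a read.
     * A 32-bit device with DQM[3:0] is assumed for illustration. */
    static uint8_t byte_enables_to_dqm(uint8_t byte_enables)
    {
        return (uint8_t)(~byte_enables & 0x0Fu);   /* invert, keep four lanes */
    }

    /* Example: writing only the two low bytes of a word -> DQM[3:0] = 1100b. */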

Some common functions include activating a row for future access, performing a read, and precharging a row (deactivating a row, often in preparation for activating a new row). For complete descriptions of SDRAM interface signals and operational characteristics, SDRAM manufacturers' data sheets should be referenced directly. Figure 8.2 provides an example of how these signals are used to implement a transaction and serves as a useful vehicle for introducing the synchronous interface.

CS* and CKE are assumed to be tied low and high, respectively, and are not shown for clarity.

The first requirement to read from an SDRAM is to activate the desired row in the desired bank. This is done by asserting an activate (ACTV) command, which is performed by asserting RAS* for one cycle while presenting the desired bank and row addresses. The next command issued to continue the transaction is a read (RD). However, the controller must wait a number of cycles that translates into the DRAM array's row-activate to column-strobe delay time. The timing characteristics of the underlying DRAM array are expressed in nanoseconds rather than clock cycles. Therefore, the integer number of delay cycles is different for each design, because it is a function of the clock period and the internal timing specification. If, for example, an SDRAM's RAS* to CAS* delay is 20 ns, and the clock period is 20 ns or slower, an RD command could be issued on the cycle immediately following the ACTV.

TABLE 8.1 Basic SDRAM Command Set

Command                    CS*  RAS*  CAS*  WE*  Address        AP/A10
Bank activate              L    L     H     H    Bank, row      A10
Read                       L    H     L     H    Bank, column   L
Read with auto-precharge   L    H     L     H    Bank, column   H
Write                      L    H     L     L    Bank, column   L
Write with auto-precharge  L    H     L     L    Bank, column   H
No operation               L    H     H     H    X              X
Burst terminate            L    H     H     L    X              X
Bank precharge             L    L     H     L    X              L
Precharge all banks        L    L     H     L    X              H
Mode register set          L    L     L     L    Configuration  Configuration
Auto refresh               L    L     L     H    X              X
Device deselect            H    X     X     X    X              X
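Table 8.1 maps directly onto a small lookup structure in a behavioral controller model. The sketch below records the CS*/RAS*/CAS*/WE* levels for several of the commands; the struct layout and names are invented for illustration and are not any particular controller's implementation.

    #include <stdint.h>

    /* Logic levels (1 = high, 0 = low) for the active-low control pins,
     * sampled on the rising clock edge.  Encodings follow Table 8.1. */
    typedef struct {
        uint8_t cs_n, ras_n, cas_n, we_n;
    } sdram_cmd_pins;

    static const sdram_cmd_pins CMD_ACTV  = { 0, 0, 1, 1 };  /* bank activate     */
    static const sdram_cmd_pins CMD_READ  = { 0, 1, 0, 1 };  /* read              */
    static const sdram_cmd_pins CMD_WRITE = { 0, 1, 0, 0 };  /* write             */
    static const sdram_cmd_pins CMD_NOP   = { 0, 1, 1, 1 };  /* no operation      */
    static const sdram_cmd_pins CMD_BST   = { 0, 1, 1, 0 };  /* burst terminate   */
    static const sdram_cmd_pins CMD_PRE   = { 0, 0, 1, 0 };  /* (bank) precharge  */
    static const sdram_cmd_pins CMD_MRS   = { 0, 0, 0, 0 };  /* mode register set */
    static const sdram_cmd_pins CMD_REF   = { 0, 0, 0, 1 };  /* auto refresh      */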


Figure 8.2 shows an added cycle of delay, indicating a clock period less than 20 ns but greater than 10 ns (a 50–100 MHz frequency range). During idle cycles, a no-operation (NOP) command is indicated by leaving RAS*, CAS*, and WE* inactive.
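A controller design typically converts the data sheet's nanosecond specifications into whole clock cycles by rounding up, which is the conversion the paragraph above walks through. A minimal sketch, with the 20 ns RAS*-to-CAS* delay as the example (the helper name is invented):

    /* Round a nanosecond timing specification up to a whole number of clock
     * cycles; the data sheet value is a minimum, so partial cycles round up. */
    static unsigned ns_to_cycles(double t_ns, double clk_period_ns)
    {
        unsigned cycles = (unsigned)(t_ns / clk_period_ns);
        if ((double)cycles * clk_period_ns < t_ns)
            cycles++;
        return cycles;
    }

    /* ns_to_cycles(20.0, 20.0) == 1 : RD may follow ACTV on the next cycle.
     * ns_to_cycles(20.0, 13.3) == 2 : ~75 MHz clock, one extra NOP cycle.
     * ns_to_cycles(20.0, 10.0) == 2 : 100 MHz clock, one extra NOP cycle.   */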

The RD command is performed by asserting CAS* and presenting the desired bank select and column address along with the auto-precharge (AP) flag. A particular bank must be selected, because the multibank SDRAM architecture enables reads from any bank. AP is conveyed by address bit 10 during applicable commands, including reads and writes. Depending on the type of command, AP has a different meaning. In the case of a read or write, the assertion of AP tells the SDRAM to automatically precharge the activated row after the requested transaction completes. Precharging a row returns it to a quiescent state and also clears the way for another row in the same bank to be activated in the future. A single DRAM bank cannot have more than one row active at any given time.

Automatically precharging a row after a transaction saves the memory controller from explicitly precharging the row after the transaction. If, however, the controller wants to take full advantage of the SDRAM's back-to-back bursting capabilities by leaving the same row activated for a subsequent transaction, it may be worthwhile to let the controller decide when to precharge a row. This way, the controller can quickly reaccess the same row without having to issue a redundant ACTV command.

AP also comes into play when issuing separate precharge commands. In this context, AP determines if the SDRAM should precharge all of its banks or only the bank selected by the address bus.

Once the controller issues the RD command (it would be called RDA if AP is asserted to enable auto-precharge), it must wait a predetermined number of clock cycles before the data is returned by the SDRAM. This delay is known as CAS latency, or CL. SDRAMs typically implement two latency options: two and three cycles. The example in Fig. 8.2 shows a CAS latency of two cycles. It may sound best to always choose the lower latency option, but as always, nothing comes for free. The SDRAM trades off access time (effectively, tCO) for CAS latency. This becomes important at higher clock frequencies where fast tCO is crucial to system operation. In these circumstances, an engineer is willing to accept one cycle of added delay to achieve the highest clock frequency. For example, a Micron Technology MT48LC32M8A2-7E 256-Mb SDRAM can operate at 143 MHz with a CAS latency of three cycles, but only 133 MHz with a CAS latency of two cycles.* One cycle of additional delay will be more than balanced out by a higher burst transfer rate. At lower clock rates, it is often possible to accept the slightly increased access time in favor of a shorter CAS latency.

* 256MSDRAM_D.p65-RevD; Pub. 1/02, Micron Technologies, 2001, p. 11.
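The trade-off can be quantified by comparing transaction times. The sketch below plugs the Micron figures quoted above (CL = 3 at 143 MHz versus CL = 2 at 133 MHz) into a hypothetical 64-word transfer; it only illustrates the arithmetic and ignores row activation and precharge.

    #include <stdio.h>

    /* Total time for a single read transaction of 'burst' words:
     * CAS latency cycles of waiting, then one word per clock. */
    static double burst_ns(double clk_mhz, int cas_latency, int burst)
    {
        double t_clk_ns = 1000.0 / clk_mhz;
        return (cas_latency + burst) * t_clk_ns;
    }

    int main(void)
    {
        int burst = 64;  /* a long transfer built from back-to-back bursts */
        printf("CL=3 @ 143 MHz: %.0f ns\n", burst_ns(143.0, 3, burst)); /* ~469 ns */
        printf("CL=2 @ 133 MHz: %.0f ns\n", burst_ns(133.0, 2, burst)); /* ~496 ns */
        return 0;
    }

For short transfers the extra latency cycle weighs more heavily, which is why the lower CAS latency can win at lower clock rates, as noted above.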

FIGURE 8.2 Four-word SDRAM burst read (CL = 2, BL = 4).


Once the CAS latency has passed, data begins to flow on every clock cycle. Data will flow for as long as the specified burst length. In Fig. 8.2, the standard burst length is four words. This parameter is configurable and adds to the flexibility of an SDRAM. The controller is able to set certain parameters at start-up, including CAS latency and burst length. The burst length then becomes the default unit of data transfer across an SDRAM interface. Longer transactions are built from multiple back-to-back bursts, and shorter transactions are achieved by terminating a burst before it has completed.

SDRAMs enable the controller to configure the standard burst length as one, two, four, or eight words, or the entire row. It is also possible to configure a long burst length for reads and only single-word writes. Configuration is performed with the mode register set (MRS) command by asserting the three primary control signals and driving the desired configuration word onto the address bus.
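The MRS configuration word is simply a set of bit fields driven onto the address bus. The sketch below packs burst length, burst type, CAS latency, and write burst mode into such a word; the bit positions shown are typical of SDR SDRAMs but are an assumption here and must be taken from the specific device's data sheet.

    #include <stdint.h>

    /* Build a mode-register value for the MRS command.  Assumed layout:
     * burst length in A[2:0], burst type in A3, CAS latency in A[6:4],
     * write burst mode in A9.  Verify against the device data sheet. */
    static uint16_t sdram_mode_word(unsigned burst_len_code, /* 0=1,1=2,2=4,3=8,7=page */
                                    unsigned interleaved,    /* 0 = sequential bursts  */
                                    unsigned cas_latency,    /* 2 or 3                 */
                                    unsigned single_write)   /* 1 = single-word writes */
    {
        return (uint16_t)((burst_len_code & 0x7u)
                        | ((interleaved   & 0x1u) << 3)
                        | ((cas_latency   & 0x7u) << 4)
                        | ((single_write  & 0x1u) << 9));
    }

    /* Example: CL = 2, burst length 4, sequential bursts, burst writes:
     * sdram_mode_word(2, 0, 2, 0) -> 0x022, driven onto the address bus
     * while the MRS command is asserted. */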

As previously mentioned, DQM signals function as an output disable on a read. The DQM bus (a single signal for SDRAMs with data widths of eight bits or less) follows the CAS* timing and, therefore, leads read data by the number of cycles defined in the CAS latency selection. The preceding read can be modified as shown in Fig. 8.3 to disable the two middle words.

In contrast, write data does not have an associated latency with respect to CAS*. Write data begins to flow on the same cycle that the WR/WRA command is asserted, as shown in Fig. 8.4.

FIGURE 8.3 Four-word SDRAM burst read with DQM disable (CL = 2, BL = 4).

FIGURE 8.4 Four-word SDRAM burst write with DQM masking (BL = 4).


This example also shows the timing of DQM to prevent writing the two middle words. Since DQM follows the CAS* timing, it is also directly in line with write data. DQM is very useful for writes, especially on multibyte SDRAM devices, because it enables the uniform execution of a burst transfer while selectively preventing the unwanted modification of certain memory locations. When working with an SDRAM array composed of byte-wide devices, it would be possible to deassert chip select to those byte lanes that you don't want written. However, there is no such option for multibyte devices other than DQM.
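As an illustration of the Fig. 8.4 behavior, a controller that wants to skip the two middle words of a four-word burst write simply raises DQM during those two data beats. The loop below is a behavioral sketch; the pin-driving functions are placeholders, not a real API.

    #include <stdint.h>
    #include <stdbool.h>

    /* Placeholder hooks standing in for whatever actually drives the pins. */
    extern void drive_data(uint16_t word);
    extern void drive_dqm(bool masked);   /* true = DQM high, lane(s) masked */
    extern void wait_clock(void);

    /* Burst write of four words in which beats 1 and 2 are masked, matching
     * the Fig. 8.4 example.  DQM is presented in the same cycle as the data. */
    static void burst_write_masked(const uint16_t data[4])
    {
        const bool mask[4] = { false, true, true, false };
        for (int beat = 0; beat < 4; beat++) {
            drive_dqm(mask[beat]);
            drive_data(data[beat]);
            wait_clock();
        }
    }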

When the transaction completes, the row is left either activated or precharged, depending on the state of AP during the CAS* assertion. If left activated, the controller may immediately issue a new RD or WR command to the same row. Alternatively, the row may be explicitly precharged. If automatically precharged, a new row in that bank may be activated in preparation for other transactions.

A new row can be activated immediately in most cases, but attention must be paid to the SDRAM's specifications for minimum times between active-to-precharge commands and active-to-active commands.

Once an SDRAM has been configured with a particular default burst length, it will expect all transactions to be that default length. Under certain circumstances, it may be desirable to perform a shorter transaction. Reads and writes can be terminated early by either issuing a precharge command to the bank that is currently being accessed or by issuing a burst-terminate command. There are varying restrictions and requirements on exactly how each type of transaction is terminated early. In general, a read or write must be initiated without automatic precharge for it to be terminated early by the memory controller.

The capability of performing back-to-back transactions has already been mentioned. In these situations, the startup latency of a new transaction can be accounted for during the data transfer phase of the previous transaction. An example of such functionality is shown in Fig. 8.5. This timing diagram uses a common SDRAM presentation style in which the individual control signals are replaced by their command equivalent. The control signals are idle during the data portion of the first transaction, allowing a new request to be asserted prior to the completion of that transaction. In this example, the controller asserts a new read command for the row that was previously activated. By asserting this command one cycle (CAS latency minus one) before the end of the current transaction, the controller guarantees that there will be no idle time on the data bus between transactions. If the second transaction were a write, the assertion of WR would come the cycle after the read transaction ended to enable simultaneous presentation of write data in phase with the command. However, when following a write with a read, the read command cannot be issued until after the write data completes, causing an idle period on the data bus equivalent to the selected CAS latency.
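The scheduling rule described above reduces to a single observation: to keep the data bus busy, the next RD must be issued so that its first data word follows immediately after the current burst's last word, and the CAS latencies cancel out of that calculation. A small sketch of the arithmetic (names are illustrative):

    /* For a read issued at relative cycle 0 with CAS latency 'cl' and burst
     * length 'bl', data occupies cycles [cl, cl + bl - 1].  To avoid any
     * idle data-bus cycle, the next RD must place its first data word on
     * cycle cl + bl, i.e. it must be issued at cycle bl, which is (cl - 1)
     * cycles before the current burst's last data word. */
    static int next_read_issue_cycle(int cl, int bl)
    {
        (void)cl;          /* the CAS latencies cancel out                   */
        return bl;         /* e.g. cl = 2, bl = 4 -> issue next RD at cycle 4 */
    }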

This concept can be extended to the general case of multiple active banks. Just as the controller is able to assert a new RD in Fig. 8.5, it could also assert an ACTV to activate a different bank. Therefore, any of an SDRAM's banks can be activated independently during the idle command time of an in-progress transaction. When these transactions end, the previously activated banks can be seamlessly read or written in the same manner as shown. This provides a substantial performance boost and can eliminate most overhead other than refresh in an SDRAM interface.

FIGURE 8.5 Back-to-back read transactions (CL = 2, BL = 4).


Periodic refresh is a universal requirement of DRAM technology, and SDRAMs are no exception.

An SDRAM device may contain 4,096 rows per bank (or 8,192, depending on its overall size) with the requirement that all rows be refreshed every 64 ms. Therefore, the controller has the responsibility of ensuring that 4,096 (or 8,192) refresh operations are carried out every 64 ms. Refresh commands can be evenly spaced every 15.625 µs (or 7.8125 µs), or the controller might wait until a certain event has passed and then rapidly count out 4,096 (or 8,192) refresh commands. Different SDRAM devices have slightly differing refresh requirements, but the means of executing refresh operations is standardized. The first requirement is that all banks be precharged, because the auto-refresh (REF) command operates on all banks at once. An internal refresh counter keeps track of the next row across each bank to be refreshed when a REF command is executed by asserting RAS* and CAS* together.
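A quick way to size the refresh burden is to convert the row count and 64 ms retention period into an average refresh interval and then into clock cycles. The sketch below uses the 4,096-row case and assumes a 100 MHz controller clock.

    #include <stdio.h>

    int main(void)
    {
        const int    rows_per_bank = 4096;   /* 8192 for larger devices        */
        const double retention_ms  = 64.0;   /* all rows must refresh in 64 ms */
        const double clock_hz      = 100e6;  /* assumed controller clock       */

        double interval_us = (retention_ms * 1000.0) / rows_per_bank;  /* 15.625 us */
        double cycles      = interval_us * 1e-6 * clock_hz;            /* 1562.5    */

        printf("one REF every %.3f us, i.e. every %.1f clock cycles\n",
               interval_us, cycles);
        return 0;
    }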

It can be easy to forget the asynchronous timing requirements of the DRAM core when designing around an SDRAM's synchronous interface. After a little time spent studying state transition tables and command sets, the idea that an asynchronous element is lurking in the background can become an elusive memory. Always be sure to verify that discrete clock cycle delays conform to the nanosecond timing specifications that are included in the SDRAM data sheet. The tricky part of these timing specifications is that they affect a system differently, depending on the operating frequency. At 25 MHz, a 20-ns time delay is less than one cycle. However, at 100 MHz, that delay stretches to two cycles. Failure to recognize subtle timing differences can cause errors that may manifest themselves as intermittent data corruption problems, which can be very time-consuming to track down.

SDRAM remains a mainstream memory technology for PCs and therefore is manufactured in substantial volumes by multiple manufacturers. The SDRAM market is a highly competitive one, with faster and denser products appearing regularly. SDRAMs are commonly available in densities ranging from 64 to 512 Mb with 4-, 8-, and 16-bit data bus widths. Older 16-Mb parts are becoming harder to find. For special applications, 32-bit wide devices are available, though sometimes at a slight premium as a result of lower overall volumes.
