Robust Intermediate Read-Out for Deep Submicron Technology CMOS Image Sensors

Chen Shoushun, Student Member, IEEE, Farid Boussaid, Senior Member, IEEE, and Amine Bermak, Senior Member, IEEE

Abstract—In this paper, a CMOS image sensor featuring a novel spiking pixel design and a robust digital intermediate read-out is proposed for deep submicron CMOS technologies. The proposed read-out scheme exhibits a relative insensitivity to the ongoing aggressive scaling of the supply voltage. It is based on a novel compact spiking pixel circuit, which combines digitizing and memory functions. Illumination is encoded into a Gray code using a very simple yet robust Gray 8-bit counter memory. Circuit simulations and experiments demonstrate the successful operation of a 64 × 64 image sensor, implemented in a 0.35 μm CMOS technology. A scalability analysis is presented. It suggests that deep sub-0.18 μm will enable the full potential of the proposed Gray encoding spiking pixel. Potential applications include multiresolution imaging and motion detection.

Index Terms—CMOS image sensor, intermediate read-out, scalability, spiking pixel.

I. INTRODUCTION

THE PAST decade has seen the emergence of CMOS image sensors as a major player in the market of solid-state image sensors [1]–[3]. An increasingly large number of high-volume consumer imaging products now integrate CMOS image sensors. Examples include cell phones, cameras, fax machines, and scanners, to name a few. The selling point behind the success of CMOS image sensors lies in the use of the well-established, industry-standard CMOS process, which results in reduced development and fabrication costs. With the advent of deep submicron CMOS processes, which feature a minimum lithographic feature size below 0.18 μm, it is now possible to build high-performance single-chip cameras integrating image capture and advanced on-chip processing circuitry. The fully integrated camera-on-chip promises to offer significant advantages in terms of manufacturing cost, system volume and weight, power dissipation, and increased built-in functionality [1], [3].

Examples of recently reported high-performance CMOS image sensors include a 19.9 V/Lux.s 512 × 512 digital CMOS image sensor with 12-bit column-parallel cyclic ADCs [4], a 9 V/Lux.s 512 × 512 sensor operating at 5000 frames/s with a column-parallel ADC architecture [5], and a 352 × 288 CMOS digital pixel sensor with per-pixel single-slope ADC and dynamic memory [6]. Such high-performance systems find a wide range of applications, including nuclear science, machine vision, automotive, crash analysis, tactical imaging, and scientific research. The performance requirements can be very stringent in terms of frame rate (for tactical IR imaging [7], [8]), high dynamic range (90 dB and more to cope with outdoor illumination conditions), low noise and high sensitivity (to allow for single-particle detection in nuclear and physics experiments), and massive data throughput (>50 Gpixels/s for high-resolution multimegapixel arrays) [9]. The ability of CMOS image sensors to capture intermediate “snapshots” of the scene during image formation can be used to further improve sensor performance by [3]: i) extending the sensor dynamic range [10]; ii) implementing early vision tasks such as tracking or pattern recognition; and iii) estimating the optical flow for tasks such as noise correction, video compression, super-resolution, or motion compensation [10]. The acquisition of intermediate “snapshots” requires the read-out to be nondestructive, that is, it must not affect the sensed photogenerated charge at any point during the integration phase. In [11], Kawahito et al. proposed a bidirectional multiple-charge transfer active pixel that enables such a read-out during the integration phase. The proposed pixel is essentially a photogate active pixel with an additional transistor in the floating diffusion node to temporarily store the signal charge. Other reported implementations [12] also rely on storing the sensed charge on a temporary floating node.

In this paper, we propose instead a digital implementation enabling robust intermediate read-out in deep submicron CMOS technologies. The proposed VLSI implementation is based on a pulse-frequency modulation (PFM) or spiking pixel, which encodes illumination information into a train of spikes or pulses [13]. Such a scheme combines a number of advantages [14], such as digital output, linear response, wide dynamic range, and a relative insensitivity to the ongoing aggressive power supply scaling, which severely degrades the signal-to-noise ratio (SNR) and dynamic range of existing mainstream active pixel sensors [15], [16]. The proposed new spiking pixel circuitry provides intermediate read-out capability during the integration phase, with no perturbation, error, or loss introduced at the sensing node. This feature enables PFM sensors to operate in a “high frame rate mode” and to provide intermediate snapshots of the scene early in the integration phase. To the best of our knowledge, such a capability has not yet been reported. Research on PFM pixels has so far focused mainly on: (i) extending the dynamic range [17], [18]; (ii) retinal prosthesis [19], [20]; and (iii) pulse coding processing [21]. The actual integration of advanced spiking in-pixel circuitry has so far received little attention and has been limited to a conventional flip-flop construction, resulting in a prohibitively high number of transistors and a significantly degraded fill-factor [22]. In this paper, we present a novel compact in-pixel circuit topology combining counting and memory functions. The in-pixel circuitry uses Gray code to prevent multibit count errors and enable robust digital intermediate read-out during the integration phase. Potential advantages and applications of the proposed implementation include robust operation at low voltages, multiresolution imaging, motion vector estimation, real-time imaging, as well as dynamic range extension [3].

This paper is organized as follows. Section II discusses the pixel design and operation principle. Section III describes the proposed intermediate read-out strategy, while Section IV describes its VLSI implementation. Section V reports experimental results and discusses the potential scalability of pixel circuitry in deep submicron CMOS technologies. Finally, conclusions are drawn in Section VI.

II. PIXEL DESIGN AND OPERATION

Fig. 1 shows the block diagram of the proposed pixel. Each pixel includes a photosensitive element, reset circuitry, a comparator, a delay chain, and an 8-bit Gray counter memory.

Operation of the pixel is as follows. Initially, a reset operation is performed with the global reset signal GR maintained low. This disables the in-pixel comparator and resets the voltage $V_N$ to $V_{DD}$. The integration phase starts when transistor M2 is opened (i.e., GR high), enabling the comparator and leaving the photodiode floating. Incident light generates electron-hole pairs in the depletion region of the photodiode, causing the voltage at the sensing node to decrease from $V_{DD}$ in response to the generated photocurrent. $V_N$ decreases at a rate set by the intensity of the light incident on the photodiode, with higher illumination levels resulting in faster voltage drops. When $V_N$ reaches the reference voltage $V_{ref}$, the output of the comparator goes high, causing the photodiode to be self-reset through the reset transistor M1. This has the effect of switching back the output of the comparator, which in turn deactivates the reset transistor M1. To allow sufficient time to pull $V_N$ up to $V_{DD}$, an inverter delay chain is used. Note that the voltage at the sensing node $V_N$ is not reset from 0 to $V_{DD}$ but from $V_{ref}$ to $V_{DD}$. A pulse is generated and received by the Gray code counter each time this self-reset operation occurs. This process is repeated until the end of the integration phase (i.e., when the global reset signal GR goes low). The time separating successive pulses depends on the rate of decrease of $V_N$. In fact, if we assume that the intensity of incident light is constant during the integration process, then the frequency of the generated pulse train is a linear function of the incident light intensity. The pulse train Clk at the output of the delay chain is used as a clock signal by the in-pixel counter memory, which counts and stores the number of generated pulses in the form of an 8-bit digital Gray code (Fig. 1).
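The linear relation between photocurrent and pulse frequency can be made concrete with a first-order model. The sketch below is illustrative only and is not taken from the paper: it assumes an ideal photodiode with a constant sensing-node capacitance (`c_sense`), a constant photocurrent, and a fixed self-reset delay (`t_reset`); all numerical values are assumptions chosen for readability.

```python
def pulse_period(i_photo, c_sense, v_dd, v_ref, t_reset=0.0):
    """Time between two self-resets of the spiking pixel, in seconds.

    The sensing node discharges linearly from v_dd to v_ref under a constant
    photocurrent, then needs t_reset to be pulled back up to v_dd.
    """
    if i_photo <= 0:
        raise ValueError("photocurrent must be positive")
    t_discharge = c_sense * (v_dd - v_ref) / i_photo
    return t_discharge + t_reset


def pulse_count(i_photo, t_int, c_sense, v_dd, v_ref, t_reset=0.0, n_bits=8):
    """Number of pulses counted during t_int, saturating at the counter maximum."""
    period = pulse_period(i_photo, c_sense, v_dd, v_ref, t_reset)
    return min(int(t_int / period), 2 ** n_bits - 1)


if __name__ == "__main__":
    # Assumed values: 10 fF sensing node, 3.3 V supply, 2.3 V reference.
    params = dict(c_sense=10e-15, v_dd=3.3, v_ref=2.3)
    for i_photo in (1e-12, 2e-12, 4e-12):
        freq = 1.0 / pulse_period(i_photo, **params)
        print(f"I = {i_photo:.0e} A -> f = {freq:.1f} Hz")
```

With `t_reset = 0` the frequency is exactly proportional to the photocurrent; a nonzero `t_reset` makes the response sub-linear once the discharge time becomes comparable to the reset delay.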
Note that the duty cycle of the Clk signal is a function of the number of delay elements present at the output of the comparator. For the case of three inverters, the active pulsewidth is around 600 ps, which means that a simple dynamic memory can be used to maintain charge during this period. This feature is behind the compact Gray code counter memory cell structure shown in Fig. 2(a). The basic idea is to combine counting and memory functions into a compact single circuit. Each bit circuitry comprises an SRAM cell, a DRAM cell, and a toggling combinational logic control circuitry [Fig. 2(a)]. The SRAM cell is implemented by means of two coupled inverters, and the DRAM cell by means of a simple MOS capacitor. The role of the DRAM is to hold the value of the pulse count $X$ while it is being incremented in the SRAM, as shown in Fig. 2(b). Since the duration of the generated Clk pulses is very short, there is no need to refresh the content of the DRAM $A_i (i = 0 \rightarrow 7)$. 
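The behaviour of such a counter memory can be checked with a short software model. The sketch below is a behavioural illustration only, not the authors' circuit: it assumes a parity (dummy) bit `dum` that is 0 at reset and flips on every pulse, a DRAM copy that holds the previous count while the SRAM bits are updated, and the standard Gray increment rule (the lowest bit toggles on even counts; otherwise the bit just above the lowest set bit toggles). Wrap-around past the maximum count is not modelled.

```python
def clk_pulse(dram, dum):
    """Apply one Clk pulse.

    dram: list of held bits A_0..A_{n-1}, least significant bit first.
    dum:  parity bit, assumed to be 0 for an even number of received pulses.
    Returns the new held bits and the new parity bit.
    """
    sram = list(dram)  # B_i starts as a copy of A_i
    if dum == 0:
        sram[0] ^= 1  # even count: B_0 toggles
    else:
        for i in range(1, len(dram)):
            lower_clear = all(a == 0 for a in dram[: i - 1])
            if dram[i - 1] == 1 and lower_clear:
                sram[i] ^= 1  # odd count: toggle the bit above the lowest set bit
                break
    # Clk inactive: the DRAM copy is refreshed from the SRAM, parity flips.
    return sram, dum ^ 1


def count_pulses(n_pulses, n_bits=8):
    """Gray-coded counter content after n_pulses pulses (no wrap-around)."""
    if not 0 <= n_pulses < 2 ** n_bits:
        raise ValueError("n_pulses outside the counter range")
    dram, dum = [0] * n_bits, 0
    for _ in range(n_pulses):
        dram, dum = clk_pulse(dram, dum)
    return dram


def bits_to_int(bits):
    """Pack an LSB-first bit list into an integer."""
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    for k in range(8):
        print(k, format(bits_to_int(count_pulses(k, n_bits=3)), "03b"))
```

Because the toggle decision is computed from the held copy rather than from the bits being written, exactly one bit changes per pulse, which is the property the intermediate read-out relies on.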

---

**Fig. 1.** In-pixel building blocks.

**Fig. 2.** Gray code counter/memory circuitry.

---

During the update (Clk high), the content $B_i$ of the SRAM is inverted and fed back to the SRAM if the toggle condition of the bit is fulfilled (transistor $M_{N3}$ ON). When the Clk pulse goes inactive, the DRAM and SRAM cells become connected (transistor $M_{D3}$ ON) and the DRAM cell content is updated.

To explain the operation principle of the Gray counter/memory, let us examine the case of the 3-bit Gray code sequence shown in Table I. Note that $B_0$ toggles every two clock cycles. To monitor whether the number of clock cycles is even or odd, a dummy bit $Dum$ is added. From Table I, one can deduce that the toggling conditions for bits $B_0$ and $B_1$ are $\overline{Dum}\cdot Clk$ and $A_0\cdot Dum\cdot Clk$, respectively. In the same manner, one can deduce a general “toggling condition” for bit $B_i$, $i \geq 2$, expressed as

$$Dum\cdot Clk\cdot A_{i-1}\cdot \prod_{k=0}^{i-2} \overline{A_k}.$$

Fig. 2(b) shows the corresponding toggling circuitry for the in-pixel 8-bit Gray counter memory, which comprises over 200 transistors. The number of transistors was reduced to 139 by implementing the bit “toggling condition” circuitry using transmission gates. Fig. 3 illustrates the implementation methodology in the case of bit $B_1$. The basic idea is to use cascaded transmission gates in series with transistor $M_{N3}$. When the clock Clk goes high, the content of the SRAM is updated provided that the transmission gates are enabled. As a result, the toggling condition for each bit can be implemented with a limited number of cascaded transmission gates, which brings the overall transistor count from over 200 down to 139.

---

### Table I: Counter Sequence for a 3-Bit Gray Code and Corresponding Bit Toggling Conditions. The Gray Counter Increments With Each Incoming Clock Pulse Clk

<table>
<thead>
<tr>
<th>Decimal Number</th>
<th>Gray $B_2B_1B_0$</th>
<th>Dummy Bit $Dum$</th>
<th>Cycle #</th>
<th>Toggled Bit $B_i$</th>
<th>Toggle Condition</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>000</td>
<td>0</td>
<td>0</td>
<td>$B_0$</td>
<td>$\overline{Dum}\cdot Clk$</td>
</tr>
<tr>
<td>1</td>
<td>001</td>
<td>1</td>
<td>1</td>
<td>$B_0$</td>
<td>$\overline{Dum}\cdot Clk$</td>
</tr>
<tr>
<td>2</td>
<td>011</td>
<td>0</td>
<td>2</td>
<td>$B_1$</td>
<td>$A_0\cdot Dum\cdot Clk$</td>
</tr>
<tr>
<td>3</td>
<td>010</td>
<td>1</td>
<td>3</td>
<td>$B_0$</td>
<td>$\overline{Dum}\cdot Clk$</td>
</tr>
<tr>
<td>4</td>
<td>110</td>
<td>0</td>
<td>4</td>
<td>$B_2$</td>
<td>$A_1\cdot \overline{A_0}\cdot Dum\cdot Clk$</td>
</tr>
<tr>
<td>5</td>
<td>111</td>
<td>1</td>
<td>5</td>
<td>$B_0$</td>
<td>$\overline{Dum}\cdot Clk$</td>
</tr>
<tr>
<td>6</td>
<td>101</td>
<td>0</td>
<td>6</td>
<td>$B_1$</td>
<td>$A_0\cdot Dum\cdot Clk$</td>
</tr>
<tr>
<td>7</td>
<td>100</td>
<td>1</td>
<td>7</td>
<td>$B_0$</td>
<td>$\overline{Dum}\cdot Clk$</td>
</tr>
</tbody>
</table>

---

**Fig. 3.** Improved implementation using transmission gates for B1.

---

III. ROBUST INTERMEDIATE READ-OUT

Fig. 4 illustrates the operation of the proposed in-pixel Gray counter memory when no intermediate read-out (i.e., read-out during the integration of optically generated charges) is requested. In Fig. 4, the waveforms for bits $B_0$ to $B_3$ are given for a sensed photocurrent varying as a sine wave. Observe that the frequency of the generated Clk pulses is directly proportional to the sensed photocurrent or incident light. In addition, two consecutive counts differ by one bit and one bit only (Fig. 4). The use of Gray encoding limits bit switching activity and power dissipation, and prevents multibit count errors if intermediate read-out of the counter memory is carried out during the integration phase. This situation is depicted in Fig. 5 for the case of a constant photocurrent or uniform illumination, which results in a pulse train Clk of fixed period.

When intermediate read-out is carried out (Select high), the Clk pulses are blocked and the pixel digital count cannot be incremented during this read-out operation. The simulation results of Fig. 5 show the impact of intermediate read-out on the pixel digital count, with two possible scenarios highlighted: (a) a Clk pulse occurs during the Select pulse, resulting in a missed pulse count, and (b) no Clk pulse occurs during the Select pulse and thus no pulse count error is introduced.

An important feature of the proposed in-pixel Gray counter memory is that the inputs to each bit toggling circuitry (example of B1 shown in Fig. 3) are constant upon generation of a Clk pulse. As a result, Clk is the only critical signal in the proposed design. The Clk pulse should, thus, remain high long enough to enable the update of the content $B_i$ of the SRAM [Fig. 2(b)]. An important design parameter is the width of the Clk pulses, which is set by the inverter delay chain shown in Fig. 1. The impact of the Clk pulsewidth on bit $B_0$ is depicted in the parametric simulation illustrated in Fig. 6, where the proper write operation of bit $B_0$ is simulated as a function of the clock pulsewidth. Note that a minimum Clk pulsewidth of about 170 ps is required to correctly toggle the bit value and ensure correct operation of the Gray counter memory. A wider pulse will not affect the circuit toggling capability, as the upper bound of the pulsewidth must only ensure that there is no loss of information at the DRAM level. We have carried out extensive simulations for different values of the DRAM capacitor and for the maximum possible clock pulsewidths. Our simulations show that even for a capacitor of $3\times10^{-16}$ F (0.3 fF), charges can be kept for as long as 80 ns. Consequently, the proposed circuitry is not affected by variations in the Clk pulsewidth, which is in the range of nanoseconds.

The proposed spiking pixel architecture can achieve lossless “parallel counting and readout” by using column-based buffers. Under these conditions, simultaneous counting and readout become possible provided that data are not buffered into and read out from the same buffer at the same time. For instance, row $i$ could be read out from one buffer while row $i+1$ is being buffered. Subsequently, row $i+1$ would be read out and row $i+2$ buffered, and this process would continue for all rows. With this “parallel counting and readout” mode, each in-pixel counter could operate continuously during integration. As a result, intermediate read-out would not cause any count error at the pixel level. Since the pixel output is in digital form, pixel readout is much faster (at fast SRAM speeds) and more accurate than if the pixel output were in analog form. High throughput could be achieved by tightly coupling additional on-chip memory [6].

IV. VLSI IMPLEMENTATION

The implementation of the proposed intermediate read-out scheme is based on the VLSI architecture depicted in Fig. 7, for the case of an $m \times n$ CMOS image sensor, where $m$ and $n$ refer to the number of rows and columns, respectively. A pixel $P_{i,j}$ of the $m \times n$ pixel array is read out if the row and column address signals $RS_i$ and $CS_j$ are both active. The Select signal of Fig. 1 is the Boolean AND of the row and column select signals. An 8-bit-wide column bus is used to output pixel digital values (Fig. 7). The read-out of the pixel array values can be chosen to be sequential or random. Sequential read-out of the entire pixel array is implemented by means of a counter, which generates the address signals to the row and column decoders. Random read-out of individual pixels or regions of interest, on the other hand, is controlled via two externally provided 6-bit address words fed to the row and column decoders, which are implemented at the pixel pitch. A set of externally controlled switches and multiplexers is used to define the read-out mode. Intermediate read-out is carried out in the same manner as read-out at the end of the integration phase, using the same row and column address decoders (Fig. 7).
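The addressing scheme can be illustrated with a small behavioural model. The sketch below is hypothetical and only mirrors the description above: two 6-bit addresses are decoded into one-hot row and column select lines, and a pixel is selected when both of its lines are active. The way the sequential counter is split into row and column fields (upper 6 bits for the row, lower 6 bits for the column) is an assumption, not a detail given in the paper.

```python
N_ADDR_BITS = 6
N_LINES = 1 << N_ADDR_BITS  # 64 rows and 64 columns


def decode(address):
    """One-hot decoder: returns N_LINES select lines with a single line active."""
    if not 0 <= address < N_LINES:
        raise ValueError("address does not fit in 6 bits")
    return [int(k == address) for k in range(N_LINES)]


def pixel_select(rs, cs, i, j):
    """Select signal of pixel (i, j): AND of its row and column select lines."""
    return rs[i] & cs[j]


def sequential_addresses():
    """Raster-scan (row, column) pairs from a single counter (assumed split)."""
    for counter in range(N_LINES * N_LINES):
        yield counter >> N_ADDR_BITS, counter & (N_LINES - 1)


if __name__ == "__main__":
    rs, cs = decode(5), decode(9)  # random access to pixel (5, 9)
    selected = [(i, j) for i in range(N_LINES) for j in range(N_LINES)
                if pixel_select(rs, cs, i, j)]
    print(selected)
```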

A 64 $\times$ 64 CMOS image sensor prototype was implemented in full custom using AMIS 0.35 $\mu$m CMOS technology. This mixed analog/digital process, available through the Europractice IC service, features five metal layers, self-aligned twin-tub N- and P-poly gates, W-plug filling of stackable contacts and vias, nitride-based passivation, and a 2.0–3.6 V power supply. The fabricated image sensor is operated at 3.3 V. Each pixel has a size of $50 \times 50\ \mu$m with a fill-factor of 20%. The pixel layout is shown in Fig. 8, with the main building blocks labeled. The photosensitive elements are $n^+p$ photodiodes chosen for their high quantum efficiency. Except for the photodiode, the entire in-pixel circuitry (Fig. 1) is shielded from incoming photons to minimize parasitic light-induced currents that would otherwise contribute to the signal. Guard rings are used extensively to limit substrate coupling and as a means to shield the ... Careful attention was paid to the floor planning in order to facilitate the routing of the control signals at the pixel pitch and to isolate sensitive analog parts from the digital circuitry.

Fig. 8. Pixel layout in AMIS 0.35 μm CMOS technology with the main building blocks labeled.

Fig. 9. Experimental setup.

V. EXPERIMENTS AND DISCUSSION

In order to characterize the prototype, an experimental platform was designed. It includes the device under test (DUT), mounted on a printed circuit board (PCB) and connected to a PC through a National Instruments data acquisition board (Fig. 9). The optical part of the experimental setup comprises an integrating sphere, a light source, and a digital light meter (Fig. 9). A close-up photograph of the DUT is shown as an inset in the top left corner of Fig. 9. The National Instruments data acquisition board is used to generate and supply the various control signals for the DUT. The imager’s 8-bit digital output code is acquired and the corresponding frames can be displayed on the PC.

Fig. 10 gives the pixel response, with a nonlinearity of less than 3% over about five decades of variations in illumination level. Note that the available experimental setup could not bring the sensor to saturation.

At higher illumination levels, the pixel input–output characteristic would become nonlinear because the time needed to recharge the photodiode (the reset delay) becomes a significant fraction of the pulse period. At lower illumination levels, the photocurrent becomes comparable to the dark current, which in turn places a lower limit on the detectable optical power. The dark current’s contribution across the pixel array was estimated by recording the “dark time,” that is, the time it takes for the dark current to discharge the photodiode and generate a pulse in the absence of light. The mean dark time was evaluated to be 38 s for $V_{ref}$ = 2.3 V. For the maximum illumination ($10^5$ lux) provided by our optical setup, a period of 2 μs was recorded for the Clk pulse train. Read-out was carried out with a Select pulse duration of around 50 ns (Fig. 1) to ensure reliable read-out. The dynamic range of a CMOS sensor is typically defined as the ratio of the largest detected signal to the smallest simultaneously detectable signal (or noise floor) [1]. In the case of our time-domain PFM pixel, only the linear part of the characteristic is to be considered, as the frequency of the generated Clk should be a linear function of the incident light [13].

The dynamic range is, thus, defined here as the illumination range for which the pixel characteristic is linear. The experimental measurements shown in Fig. 10 reveal that the sensor exhibits a dynamic range better than 90 dB. Note that the available experimental setup was not able to saturate our PFM pixel, which is capable of around 120 dB of linear dynamic range [14]. It is, however, important to note that the lower bound of the dynamic range is limited not only by the noise floor (mainly, the dark current) but also by the frame rate of the sensor. In Fig. 10, the dynamic range is reported assuming no limitation with respect to the frame rate of the sensor. If a minimum of 20 frames/s is required, the minimum read-out frequency will be 20 Hz, which sets the lower bound of the dynamic range to about 10 lux (refer to Fig. 10). A higher frame rate will further increase the minimum detectable level, further reducing the dynamic range at the lower end of the illumination range. This is one of the major limitations of time-domain image sensors. As we shall see later in this section, the proposed intermediate read-out enables PFM sensors to operate in a “high frame rate mode,” providing intermediate snapshots early in the integration phase.
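The trade-off between frame rate and the lower end of the dynamic range can be put into numbers. The sketch below is a rough illustration under stated assumptions: the pulse frequency is taken to be proportional to illumination with a slope derived from the single operating point quoted above (20 Hz at about 10 lux), at least one pulse per frame is required for detection, and the dynamic range is expressed with the 20·log10 convention. The upper illumination bound passed to `dynamic_range_db` is a free parameter, not a measured value.

```python
import math

# Slope of the assumed linear response, from "20 Hz at about 10 lux".
HZ_PER_LUX = 20.0 / 10.0


def min_illumination(frame_rate_hz, hz_per_lux=HZ_PER_LUX, min_pulses=1):
    """Lowest illumination (lux) that still yields min_pulses pulses per frame."""
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    return min_pulses * frame_rate_hz / hz_per_lux


def dynamic_range_db(l_max, l_min):
    """Illumination ratio expressed in dB (20*log10 convention assumed)."""
    return 20.0 * math.log10(l_max / l_min)


if __name__ == "__main__":
    l_max = 1e5  # illustrative upper bound in lux
    for fps in (20, 200, 2000):
        l_min = min_illumination(fps)
        print(f"{fps:5d} frames/s -> L_min = {l_min:7.1f} lux, "
              f"DR = {dynamic_range_db(l_max, l_min):.0f} dB")
```

In this simple model each tenfold increase in frame rate raises the minimum detectable illumination tenfold and costs 20 dB of dynamic range at the low end.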

One can also deduce from Fig. 10 that the duty cycle of the Clk signal is very small, given that the Clk period is in the range of $10^{-7}$–$10^{-6}$ s (depending on the illumination level) and that the width of the Clk pulse is 2–4 ns. For a read-out time of 50–100 ns, one can thus deduce that at most a single pulse count can be missed during the intermediate read-out. This corresponds to the worst-case scenario, in which Clk and Select happen to be synchronized.
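The single-missed-count bound follows from simple counting, which the sketch below makes explicit. It is an idealized check rather than a circuit simulation: Clk pulses are treated as instantaneous events of fixed period, the Select window is treated as a half-open interval, and times are given as integer picoseconds to avoid floating-point rounding. The function name and units are choices made for this example.

```python
def max_missed_pulses(select_ps, clk_period_ps):
    """Upper bound on the number of Clk pulses falling inside one Select window.

    For periodic events of period clk_period_ps, a half-open window of length
    select_ps contains at most ceil(select_ps / clk_period_ps) events.
    """
    if select_ps <= 0 or clk_period_ps <= 0:
        raise ValueError("durations must be positive")
    return -(-select_ps // clk_period_ps)  # integer ceiling division


if __name__ == "__main__":
    for select_ns in (50, 100):
        for period_ns in (100, 1000):
            missed = max_missed_pulses(select_ns * 1000, period_ns * 1000)
            print(f"Select = {select_ns} ns, Clk period = {period_ns} ns "
                  f"-> at most {missed} missed pulse(s)")
```

As long as the Select window is no longer than the shortest Clk period, the bound stays at one, in line with the worst case described above.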

The experimental setup of Fig. 9 was used to perform fixed-pattern noise (FPN) measurements. The FPN was evaluated as the standard deviation of pixel values from the array mean under flat-field illumination. In order to minimize random noise, a total of 150 images were acquired and averaged to form the flat-field image. The histograms in Fig. 11 report FPN measurements obtained for four different chips. The standard deviation of each of these distributions was found to be between 1.63 and 1.82 LSB. Off-chip digital FPN correction could reduce the level of FPN by a factor of at least 30 [9].
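The FPN evaluation procedure described above can be written down in a few lines. The following sketch is an illustration, not the authors' measurement script: frames are assumed to be flattened lists of 8-bit pixel codes, and the population standard deviation is used, since the text does not say which estimator was applied.

```python
from statistics import mean, pstdev


def flat_field(frames):
    """Average a stack of frames pixel by pixel to suppress temporal noise."""
    n_pix = len(frames[0])
    if any(len(f) != n_pix for f in frames):
        raise ValueError("all frames must have the same number of pixels")
    return [mean(f[p] for f in frames) for p in range(n_pix)]


def fpn_lsb(frames):
    """FPN in LSB: standard deviation of the flat-field image about its mean."""
    return pstdev(flat_field(frames))


if __name__ == "__main__":
    # Two tiny synthetic 2 x 2 frames (flattened); the first pixel carries
    # temporal noise of +/-1 LSB that averages out.
    frames = [[101, 102, 98, 100],
              [99, 102, 98, 100]]
    print(flat_field(frames))
    print(round(fpn_lsb(frames), 3))
```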

During FPN measurements, it was observed that the level of FPN increases with $V_{ref}$. This indicates that the FPN is primarily an offset FPN, the most likely source being pixel-to-pixel variations of the comparator offset and the reset transistor. No autozeroing capability was included in this design, to favor compactness and minimize power consumption. Table II summarizes the chip characteristics and performance. The power dissipation was observed to be 9.2 mW at 3.3 V, which is slightly lower than that of its PWM counterpart [24], [25]. One would have expected this imager to consume more power, since the PFM pixel is allowed to fire continuously, whereas in a PWM scheme the pixel fires only once during each frame capture. It is, however, interesting to note that the PWM scheme may result in higher overall power, since the pixel data are globally routed to all pixels and a global timer switching at high frequency results in significantly increased activity on a global bus (hence, a large switched capacitance). Note that the proof-of-concept prototype described in this paper has not been optimized for low-power operation, which would require a better design of the circuitry outside the pixel array and of the output buffers.

<table>
<thead>
<tr>
<th>Table II: Summary of Prototype Features</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
</tr>
<tr>
<td>Supply Voltage</td>
</tr>
<tr>
<td>Pixel size</td>
</tr>
<tr>
<td>Photo-element</td>
</tr>
<tr>
<td>Fill-Factor</td>
</tr>
<tr>
<td>Responsivity</td>
</tr>
<tr>
<td>Mean Dark time</td>
</tr>
<tr>
<td>Dynamic range</td>
</tr>
<tr>
<td>Raw FPN</td>
</tr>
<tr>
<td>Non-linearity</td>
</tr>
<tr>
<td>Power consumption</td>
</tr>
</tbody>
</table>

Sample images were acquired from the prototype, and three patterns were captured at three different illumination levels: 10, 100, and 1000 lux. Fig. 12 shows a series of images labeled A1–A3, B1–B3, and C1–C3, acquired for four different integration times: $T$, $T/2$, $T/4$, and $T/8$, where $T = 100\, \mu\text{s}$. These images correspond to the four top rows in Fig. 12. The top row gives the output of the imager when no intermediate read-out is requested during the integration phase. Note that in the case of high-speed intermediate images, the signal increases gradually as more electron-hole pairs are collected by the photodiode. From Fig. 12, one can note that the SNR improves with higher illumination levels as well as with increased integration time. If we attempt to reconstruct the final image (obtained at time $t = T$) from the intermediate images, we obtain the images shown in the three bottom rows of Fig. 12. Note that patterns can be recognized as early as $T/8$, where $T$ is the integration time. This feature allows the user to trade off imaging quality for high-speed imaging. Another benefit of the proposed high-speed intermediate read-out is the possibility to extend the sensor dynamic range [10].
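One simple way to estimate the final image from an intermediate snapshot is to scale the intermediate counts by the ratio of integration times. The sketch below is an assumption about how such a reconstruction could be done; the exact procedure used to produce Fig. 12 is not detailed here. It assumes constant illumination during the frame, so that the pulse count grows linearly with time, and it clips the result to the 8-bit counter range.

```python
def reconstruct_final(counts, t_read, t_int, n_bits=8):
    """Estimate the end-of-frame image from counts read out at time t_read.

    counts: 2-D list of intermediate pixel counts.
    t_read: time of the intermediate read-out (same unit as t_int).
    t_int:  full integration time T.
    """
    if not 0 < t_read <= t_int:
        raise ValueError("t_read must lie in (0, t_int]")
    full_scale = 2 ** n_bits - 1
    gain = t_int / t_read
    return [[min(full_scale, round(c * gain)) for c in row] for row in counts]


if __name__ == "__main__":
    snapshot = [[1, 10],
                [20, 40]]  # counts read at T/8 with T = 100 us
    print(reconstruct_final(snapshot, t_read=12.5, t_int=100.0))
```

Since an intermediate count read at $T/8$ is multiplied by 8, each count step in the snapshot becomes 8 steps in the estimate, which is consistent with the coarser appearance of images reconstructed from early snapshots.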

The ongoing aggressive scaling of the power supply is rapidly limiting the analog signal swing at the sensing node, degrading sensor SNR [15], [16]. At the same time, noise source contributions increase with device scaling [15], [16]. For example, as the thickness of the gate dielectric is scaled down below 3 nm, significant tunneling current may flow from the drain to the gate in an off-state device or from the gate to the source in an on-state device [15]. This leakage current depends exponentially on the oxide thickness. For sub-3 nm gate oxide thicknesses, the tunneling current can be five orders of magnitude larger than acceptable photodiode dark current densities [15], significantly degrading the performance of conventional active pixel sensors. Their dynamic range, commonly defined as the ratio of the largest nonsaturating signal to the standard deviation of the noise under dark conditions, will consequently worsen significantly with device scaling, since the analog signal swing is reduced and noise contributions increase due to the predominance of short-channel effects [16]. The sensor dynamic range and peak SNR are directly proportional to the well capacity [26] $Q_{\text{sat}} = V_{\text{swing}} \times C_{\text{sense}}$, where $V_{\text{swing}}$ and $C_{\text{sense}}$ represent the voltage swing and capacitance at the sensing node, respectively. The peak SNR of conventional active pixel sensors can be expressed [26] as $\text{SNR}_{\text{peak}} = Q_{\text{sat}}/q$. For a 0.13 µm technology, the projected peak SNR is less than 30 dB, which is inadequate [10]. Our interest in PFM pixels is also motivated by the fact that these self-reset pixels reuse the small well capacity several times during integration. The effective well capacity for PFM pixels becomes $m \times Q_{\text{sat}}$, where $m$ is the number of self-resets performed during integration. This results in an $m$-fold increase in peak SNR [10]. This important feature means that the proposed spiking pixel not only provides a relative insensitivity to the ongoing aggressive scaling of power supply voltages, but also benefits from the advent of deep submicron CMOS technologies with smaller pixel sizes. With a nominal power supply of 3.3 V, the prototype described in this paper could still be operated down to 2.4 V.

Furthermore, because the signal is digitized at the pixel level, noise contributions such as column read-out noise and column fixed-pattern noise are eliminated. However, the integration of advanced in-pixel circuitry comes at the cost of larger pixel sizes and a degraded fill-factor. As for the number of transistors, the proposed spiking pixel uses a total of 139 transistors, which is relatively low when compared with a conventional flip-flop construction (214 transistors reported for Andoh’s digital image sensor [22]). It is important to note that spiking pixels and digital pixel sensors are not designed to compete with conventional three-transistor pixels, whether in terms of fill-factor or resolution. Instead, they are best geared towards high-speed imaging and video-oriented applications. In the following, we evaluate the scaling prospects of our spiking pixel. Fig. 13 shows the achievable pixel size and fill-factor for different technologies. The highlighted points for 0.35 µm and 0.18 µm are based on actual full-custom layout implementations. The data corresponding to sub-0.18 µm technologies are the result of analytical estimations. For instance, for a fill-factor of 20%, it is possible to achieve a pixel size of about 26.5, 19, and 13 µm using 0.18, 0.13, and 0.09 µm CMOS processes, respectively. The pixel size can be reduced further by replacing the SRAM in Fig. 2 by either a 3 T or a 1 T DRAM cell. Fig. 13 illustrates the potential benefits of 3 T DRAM implementations in deep sub-0.18 µm CMOS technologies. Note that for a 20% fill-factor, it becomes possible to achieve a pixel size of about 23.5, 17, and 11.5 µm using 0.18, 0.13, and 0.09 µm CMOS processes, respectively. Using a 1 T DRAM structure would further decrease the pixel size and improve the fill-factor, but would complicate the peripheral control circuitry, since the 1 T DRAM requires a refresh controller and a special sense amplifier that allows the pixel data to be read and rewritten at the same time. This could be achieved, for example, using a latch-based sense amplifier initialized at its metastable operating point.

Fig. 13. Estimated fill-factor versus pixel size for different technology generations. Highlighted points are based on real layout design.

VI. CONCLUSION

In this paper, a CMOS imager based on a novel spiking pixel and a robust intermediate read-out is presented to enable robust video processing in deep submicron CMOS processes. The proposed read-out technique allows for the capture of high-speed intermediate “snapshots” of the scene while the frame is being acquired. The image sensor uses a novel compact spiking pixel circuit, which combines digitizing and memory functions. Illumination is encoded into a Gray code using a very simple yet robust 8-bit Gray counter memory. It is demonstrated that the frame rate limitation of time-domain sensors can be overcome using a reduced integration time combined with a simple interpolation technique. Circuit simulations and experiments demonstrate the successful operation of a 64 × 64 image sensor implemented in a 0.35 µm CMOS technology. It is shown that deep sub-0.18 µm technologies will enable the full potential of the proposed Gray encoding spiking pixel. Potential applications include multiresolution imaging and motion detection.


Chen Shoushun (S’04) received the B.S. degree from the Department of Microelectronics, Peking University, Beijing, China, the M.E. degree from the Institute of Microelectronics, Chinese Academy of Sciences, Beijing, and the Ph.D. degree in electronic and computer engineering from the Hong Kong University of Science and Technology, Hong Kong, China, in 2000, 2003, and 2007, respectively. His master’s thesis was related to signal integrity in the design of the “Loongson-1” CPU, which was the first general-purpose CPU designed in China. His Ph.D. research work involved the design of low-power CMOS image sensors and image processing operations using time-to-first-spike (TFS) encoding and asynchronous read-out techniques.

He is now a Postdoctoral Research Associate at the Hong Kong University of Science and Technology. His research interests are smart vision sensors, integrated biomedical sensors, asynchronous VLSI circuits and systems, wireless sensor networks, and VLSI signal processing architectures.

Farid Boussaid (M’00–SM’04) received the M.S. and Ph.D. degrees, both in microelectronics, from the National Institute of Applied Science (INSA), Toulouse, France, in 1996 and 1999, respectively.

From May 1999 to February 2000, he was a Research Associate within the Microsystems and Microstructures Research Group, French National Centre for Scientific Research (LAAS-CNRS), France. In March 2000, he joined Edith Cowan University, Perth, Australia, as a Postdoctoral Research Fellow and a member of the Visual Information Processing Research Group. In December 2001, he was the recipient of an Australian Research Council APD Fellowship to develop a new generation of smart vision sensors featuring on-chip and pixel-level implementation of human vision-based algorithms. In January 2005, he joined the University of Western Australia as a Lecturer. His research interests include smart CMOS vision sensors, neuromorphic systems, device simulation, modeling, and characterization in deep submicron CMOS processes.

Dr. Boussaid was the recipient of the 2004 IEEE Chester Sall Award and the Best Paper Award at the 2005 IEEE International Workshop on System-On-Chip for Real-Time Applications.

Amine Bermak (M’99–SM’04) received the M.Eng. and Ph.D. degrees, both in electronic engineering, from Paul Sabatier University, Toulouse, France, in 1994 and 1998, respectively.

During his Ph.D., he was part of the Microsystems and Microstructures Research Group at the French National Research Center LAAS-CNRS, where he developed a 3-D VLSI chip for artificial neural network classification and detection applications. He then joined the Advanced Computer Architecture Research Group at York University, York, U.K., where he was working as a Postdoc on VLSI implementation of CMM neural network for vision applications in a project funded by British Aerospace. In 1998, he joined Edith Cowan University, Perth, Australia, first as a Research Fellow working on smart vision sensors, then as a Lecturer and a Senior Lecturer in the School of Engineering and Mathematics. He is currently an Associate Professor with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology (HKUST), where he is also serving as the Associate Director of Computer Engineering. His research interests are related to VLSI circuits and systems for signal, image processing, sensors, and microsystems applications. He has published extensively on the above topics in various journals, book chapters, and refereed international conferences.

Dr. Bermak has received many distinguished awards, including the 2004 IEEE Chester Sall Award, the HKUST Bicentenary Foundation Engineering Teaching Excellence Award in 2004, and the Best Paper Award at the 2005 International Workshop on System-on-Chip for Real-Time Applications. He is a member of the technical program committees of a number of international conferences, including the IEEE Custom Integrated Circuit Conference (CICC 2006 and CICC 2007), the IEEE Consumer Electronics Conference (CEC 2007), and Design Automation and Test in Europe (DATE 2007 and DATE 2008). He is the General Co-Chair of the 2008 IEEE International Symposium on Electronic Design Test and Applications. He is also on the editorial boards of the IEEE TRANSACTIONS ON VLSI SYSTEMS, the IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS, and the JOURNAL OF SENSORS. He is a member of the IEEE CAS Committee on Sensory Systems.