
PCI project

FPGAs make powerful PCI development platforms, thanks to their re-programmability and operating speed.

The interface
    
Part 0: How to create a very simple PCI interface
Part 1: How PCI works
Part 2: PCI Reads and Writes
Part 3: PCI logic analyzer
Part 4: PCI plug-and-play

The software
 
Part 5: PCI driver for Windows
Part 6: PCI driver for Linux

The hardware
We used a Dragon board for this project.

Link
  
An overview on How the PCI Bus Works from Tech-Pro.

Simple PCI interface

This is an example of PCI code. We control an LED using PCI write commands. Writing a "0" turns the LED off, writing a "1" turns the LED on!

// Very simple PCI target
// Just 3 flipflops for the PCI logic, plus one to hold the state of an LED

module PCI(CLK, RSTn, FRAMEn, AD, CBE, IRDYn, TRDYn, DEVSELn, LED);
input CLK, RSTn, FRAMEn, IRDYn;
input [31:0] AD;
input [3:0] CBE;
inout TRDYn, DEVSELn;
output LED;

parameter IO_address = 32'h00000200;  // we respond to an "IO write" at this address
parameter CBECD_IOWrite = 4'b0011;

////////////////////////////////////////////////////
reg Transaction;
wire TransactionStart = ~Transaction & ~FRAMEn;
wire TransactionEnd = Transaction & FRAMEn & IRDYn;
wire Targeted = TransactionStart & (AD==IO_address) & (CBE==CBECD_IOWrite);
wire LastDataTransfer = FRAMEn & ~IRDYn & ~TRDYn;

always @(posedge CLK or negedge RSTn)
if(~RSTn)
  Transaction <= 0;
else
case(Transaction)
  1'b0: Transaction <= TransactionStart;
  1'b1: Transaction <= ~TransactionEnd;
endcase

reg DevSelOE;
always @(posedge CLK or negedge RSTn)
if(~RSTn)
  DevSelOE <= 0;
else
case(Transaction)
  1'b0: DevSelOE <= Targeted;
  1'b1: if(TransactionEnd) DevSelOE <= 1'b0;
endcase

reg DevSel;
always @(posedge CLK or negedge RSTn)
if(~RSTn)
  DevSel <= 0;
else
case(Transaction)
  1'b0: DevSel <= Targeted;
  1'b1: DevSel <= DevSel & ~LastDataTransfer;
endcase

assign DEVSELn = DevSelOE ? ~DevSel : 1'bZ;
assign TRDYn = DevSelOE ? ~DevSel : 1'bZ;

wire DataTransfer = DevSel & ~IRDYn & ~TRDYn;

reg LED;
always @(posedge CLK) if(DataTransfer) LED <= AD[0];
endmodule

How PCI works
We concentrate on 32-bit PCI 2.2 here, which is what is used in today's PCs. Newer PCI versions include PCI 2.3 and PCI 3.0.

The PCI specification
The PCI specification is developed and maintained by a group called the PCI Special Interest Group (PCI-SIG for short). Unlike the Ethernet specification, the PCI specification cannot be downloaded for free. You need to be a member of the PCI-SIG to access the specification. As becoming a member is expensive, you might want to check with your company's hardware group (assuming you work in the semiconductor industry) to see if you can get access to the specification. Otherwise, here's a short introduction, followed by some links for more info.

PCI characteristics
The PCI bus has 4 main characteristics:

- Synchronous
- Transaction/Burst oriented
- Bus mastering
- Plug-and-play

PCI is synchronous
The PCI bus uses one clock. The clock runs at 33MHz by default but can run slower (all the way down to idle = 0MHz) to save power, or faster (66MHz) if your hardware supports it.

PCI is Transaction/Burst oriented
PCI is transaction oriented:

1. You start a transaction
2. You specify the starting address (one clock cycle)
3. You send as much data as you want (many following clock cycles)
4. You end the transaction

PCI is a 32-bit bus, and so has 32 lines to transmit data. At the beginning of a transaction, the bus is used to specify a 32-bit address. Once the address is specified, many data cycles can go through. The address is not re-transmitted but is auto-incremented at each data cycle. To specify a different address, the transaction is stopped and a new one started. So PCI bandwidth is best utilized in burst mode.

PCI allows bus mastering
PCI transactions work in a master-slave relationship. A master is an agent that initiates a transaction (can be a read or a write). While the host CPU is often the bus master, all PCI boards can potentially claim the bus and become a bus master.

PCI is plug-and-play
PCI boards are plug-and-play. That means that the host-CPU/host-OS can:

- Determine the identity of each PCI board on the PCI bus (manufacturer & function (video, network...))
- Determine the abilities/requirements of each board (how much memory space it requires, how many interrupts...)
- Relocate each board's memory space

The last feature is an important part of plug-and-play. Each board responds to some addresses, but the addresses to which it responds can be programmed (i.e. each board generates its own board/chip-select signals). That allows the OS to "map" the address space of each board where it wants.

PCI "spaces"
PCI defines 3 "spaces" where you can read and write. When a transaction starts, the master specifies the starting address of the transaction, whether it's a read or a write, AND which space it wants to talk to.

1. Memory space
2. IO space
3. Configuration space

They work as follows:

 

- The memory and IO spaces are the workhorse spaces. They are "relocatable" (i.e. the addresses at which each board responds can be moved).
- The configuration space is used for plug-and-play. It's a space where each board has to implement very specific registers at very specific addresses, so that the host-CPU/OS can figure out each board's identity/abilities/requirements. From there, the host CPU/OS enables and configures the other two spaces. This space is fixed and always starts at address 0 for all PCI boards; so one line of the PCI connector is used as board-select (for this space only).

To be compliant, a PCI board needs to implement configuration space. Memory and IO spaces are optional, but one or both is always used in practice.

PCI bridge
PCI devices don't connect directly to a host CPU, but go through a "bridge" chip. That's because CPUs typically don't "speak" PCI natively, so a bridge has to translate the transactions from the CPU's bus to the PCI bus. Also, CPUs never have 3 address spaces like PCI devices do. Most CPUs have 1 space (memory space), while other CPUs have 2 (memory & IO). The bridge has to play some tricks so that the CPU can still access all 3 PCI spaces.

PCI voltage
PCI boards can use 3.3V or 5V signaling. Interestingly, current PCs all use 5V signaling. PCI board connectors have one or two slots that identify whether the board is 3.3V or 5V compliant. This is to ensure that, for example, a 3.3V-only board cannot be plugged into a PC's 5V-only PCI bus. Here is an example of a 5V-only board:

while this board is both 5V and 3.3V compliant:

PCI timing
PCI specifies timing related to its clock. With a 33MHz clock, we have:

 

- 7ns/0ns Tsu/Th (setup/hold) constraint on inputs
- 11ns Tco (clock-to-output) constraint on outputs

Links
   
- A more detailed technical description in this PCI Local Bus Technical Summary from TechFest
- A short PCI Bus Operation page
- Many interesting links on Craig's PCI Pages
- Also An Experiment to Build a PCI Board

PCI Reads and Writes
Let's do some real PCI transactions now...

IO transactions
The easiest PCI space to work with is the IO space.

 

- No virtualization from the CPU/OS (i.e. CPU address = hardware address)
- No driver necessary (true on Win98/Me; on WinXP/2K a driver is required, but generic ones are provided below)

The disadvantage of the IO space is that it's small (limited to 64KB on PCs, even though PCI supports 4GB) and pretty crowded.

Finding a free space
On Windows 98/Me, open the "Device Manager" (from "Control Panel"/System), then show Computer/Properties and check the "Input/Output (I/O)" panel.

On Windows XP/2000, open the "System Information" program (Programs/Accessories/System Tools/System Information) and click on "I/O".

Lots of peripherals are using the IO space, so free space candidates take a little research.

Device driver
The IO space is left unprotected on Win98/Me, so no driver is necessary there. For WinXP/2K, GiveIO and UserPort are free generic drivers that open up the IO space.

A RAM PCI card
Let's implement a small RAM in our PCI card. The RAM is 32 bits x 16 locations. That's small enough to fit in the IO space using "direct addressing" (the IO space is so crowded that indirect addressing is otherwise necessary). We need to pick a free IO area in the host PC. Each 32-bit location takes 4 byte addresses, so we require 4x16=64 contiguous free addresses. We chose 0x200-0x23F here, but you may have to choose something else.

First the module declaration.

module PCI_RAM(
  PCI_CLK, PCI_RSTn,
  PCI_FRAMEn, PCI_AD, PCI_CBE, PCI_IRDYn, PCI_TRDYn, PCI_DEVSELn
);
input PCI_CLK, PCI_RSTn, PCI_FRAMEn, PCI_IRDYn;
inout [31:0] PCI_AD;
input [3:0] PCI_CBE;
output PCI_TRDYn, PCI_DEVSELn;

parameter IO_address = 32'h00000200;  // 0x200 to 0x23F
parameter PCI_CBECD_IORead  = 4'b0010;
parameter PCI_CBECD_IOWrite = 4'b0011;

Then we keep track of what is happening on the bus through a "PCI_Transaction" register. "PCI_Transaction" is asserted when any transaction is going on, either for us or for any other card on the bus.

reg PCI_Transaction;
wire PCI_TransactionStart = ~PCI_Transaction & ~PCI_FRAMEn;
wire PCI_TransactionEnd = PCI_Transaction & PCI_FRAMEn & PCI_IRDYn;

always @(posedge PCI_CLK or negedge PCI_RSTn)
if(~PCI_RSTn)
  PCI_Transaction <= 0;
else
case(PCI_Transaction)
  1'b0: PCI_Transaction <= PCI_TransactionStart;
  1'b1: PCI_Transaction <= ~PCI_TransactionEnd;
endcase

// We respond only to IO reads/writes, 32-bits aligned
wire PCI_Targeted = PCI_TransactionStart & (PCI_AD[31:6]==(IO_address>>6)) & (PCI_AD[1:0]==0) &
                    ((PCI_CBE==PCI_CBECD_IORead) | (PCI_CBE==PCI_CBECD_IOWrite));

// When a transaction starts, the address is available for us to register
// We just need a 4 bits address here
reg [3:0] PCI_TransactionAddr;
always @(posedge PCI_CLK) if(PCI_TransactionStart) PCI_TransactionAddr <= PCI_AD[5:2];

Now a few more registers to be able to claim the transaction and remember if it's a read or a write.

wire PCI_LastDataTransfer = PCI_FRAMEn & ~PCI_IRDYn & ~PCI_TRDYn;

// Is it a read or a write?
reg PCI_Transaction_Read_nWrite;
always @(posedge PCI_CLK or negedge PCI_RSTn)
if(~PCI_RSTn)
  PCI_Transaction_Read_nWrite <= 0;
else
if(~PCI_Transaction & PCI_Targeted)
  PCI_Transaction_Read_nWrite <= ~PCI_CBE[0];

// Should we claim the transaction?
reg PCI_DevSelOE;
always @(posedge PCI_CLK or negedge PCI_RSTn)
if(~PCI_RSTn)
  PCI_DevSelOE <= 0;
else
case(PCI_Transaction)
  1'b0: PCI_DevSelOE <= PCI_Targeted;
  1'b1: if(PCI_TransactionEnd) PCI_DevSelOE <= 1'b0;
endcase

// PCI_DEVSELn should be asserted up to the last data transfer
reg PCI_DevSel;
always @(posedge PCI_CLK or negedge PCI_RSTn)
if(~PCI_RSTn)
  PCI_DevSel <= 0;
else
case(PCI_Transaction)
  1'b0: PCI_DevSel <= PCI_Targeted;
  1'b1: PCI_DevSel <= PCI_DevSel & ~PCI_LastDataTransfer;
endcase

Let's claim the transaction.

// PCI_TRDYn is asserted during the whole PCI_Transaction because we don't need wait-states
// For read transactions, delay by one clock to allow for the turnaround cycle
reg PCI_TargetReady;
always @(posedge PCI_CLK or negedge PCI_RSTn)
if(~PCI_RSTn)
  PCI_TargetReady <= 0;
else
case(PCI_Transaction)
  1'b0: PCI_TargetReady <= PCI_Targeted & PCI_CBE[0];  // active now on writes, next cycle on reads
  1'b1: PCI_TargetReady <= PCI_DevSel & ~PCI_LastDataTransfer;
endcase

// Claim the PCI_Transaction
assign PCI_DEVSELn = PCI_DevSelOE ? ~PCI_DevSel : 1'bZ;
assign PCI_TRDYn = PCI_DevSelOE ? ~PCI_TargetReady : 1'bZ;

Finally, the RAM itself is written or read, with the PCI_AD bus driven accordingly.

wire PCI_DataTransferWrite = PCI_DevSel & ~PCI_Transaction_Read_nWrite & ~PCI_IRDYn & ~PCI_TRDYn;

// Instantiate the RAM
// We use Xilinx's synthesis here (XST), which supports automatic RAM recognition
// The following code creates a distributed RAM, but a blockram could also be used
// (we have an extra clock cycle to get the data out)
reg [31:0] RAM [15:0];
always @(posedge PCI_CLK) if(PCI_DataTransferWrite) RAM[PCI_TransactionAddr] <= PCI_AD;

// Drive the AD bus on reads only, and allow for the turnaround cycle
reg PCI_AD_OE;
always @(posedge PCI_CLK or negedge PCI_RSTn)
if(~PCI_RSTn)
  PCI_AD_OE <= 0;
else
  PCI_AD_OE <= PCI_DevSel & PCI_Transaction_Read_nWrite & ~PCI_LastDataTransfer;

// Now we can drive the PCI_AD bus
assign PCI_AD = PCI_AD_OE ? RAM[PCI_TransactionAddr] : 32'hZZZZZZZZ;
endmodule

Now we can read and write the PCI card!

Design considerations
1. The PCI_CBE byte enables are not used, so the software is supposed to issue only aligned 32-bit transactions.
2. You might be surprised to find that the PCI "PAR" signal (bus parity) is not used either. While PAR generation is required for PCI compliance, its checking might not be, because the PCs I have access to work fine without it... And since I cannot test it on real hardware, I omitted it.
3. The above code supports burst transfers, but current PC bridges don't seem to issue bursts (at least for the IO space). x86 processors have support for burst IO instructions (REP INS/OUTS), but they end up being broken into individual transactions on the PCI bus. Also, I'm not sure if burst IO would require auto-incrementing the IO address, especially since the REP INS/OUTS instructions don't. But as not incrementing has happy consequences on timing (more details below), I kept the code this way.

Issue IO read/write transactions
On a PC, you use the x86 "IN" and "OUT" processor instructions to issue IO transactions. Some compilers don't have native support for these, so you may have to use inline-assembly functions. Here are examples for Visual C++:

void WriteIO_DWORD(WORD addr, DWORD data)
{
  __asm
  {
    mov dx, addr
    mov eax, data
    out dx, eax
  }
}

DWORD ReadIO_DWORD(WORD addr)
{
  __asm
  {
    mov dx, addr
    in eax, dx
  }
}

GUI PCI IO exerciser software
You can use this simple IOtest application to issue 32-bits IO reads and writes on a PC. That works directly on Win98/Me. Be sure to have GiveIO or UserPort running on WinXP/2K.

One important thing: free spaces return 0xFFFFFFFF on reads.

Timing considerations

Remember that PCI requires:

 

- 7ns/0ns Tsu/Th (setup/hold) constraint on inputs
- 11ns Tco (clock-to-output) constraint on outputs

Most PCI cores are complex enough that the Tsu is impossible to meet without registering the inputs right in the IO blocks. Tco is also hard to meet without doing the same for the outputs. But these registers add latencies to the design. The above code is simple enough that IO block registers are not required. The code was tested using the Dragon board and Xilinx's ISE software. It gives something like:

Timing summary:
---------------
Timing errors: 0  Score: 0

Design statistics:
  Minimum period: 9.667ns (Maximum frequency: 103.445MHz)
  Minimum input required time before clock: 5.556ns
  Minimum output required time after clock: 10.932ns

Clock frequency was met with a large margin (103MHz against the required 33MHz). Tsu was met comfortably (5.556ns against 7ns) while Tco was barely met (10.932ns against 11ns) on the PCI_DEVSELn and PCI_TRDYn signals. Tco would not have been met on the AD bus if the IO address had to be auto-incremented on burst reads. Since the address is static, and since (for read cycles only) the PCI bus requires a turnaround cycle after the address phase, the data has an extra clock cycle to get ready. Without it, the Tco was around 13ns, above the 11ns maximum. But with the extra clock cycle, we actually meet the timing with a 28ns slack (= margin), which is very comfortable. The only timing not met is the input hold time (0ns), but the violation was fortunately small (0.3ns for the worst violator). Xilinx doesn't provide a way to constrain the hold time, maybe because using IO block registers guarantees a 0ns hold time "by design" (of the FPGA).

PCI logic analyzer
Now that we can issue read and write transactions on the bus, wouldn't it be fun to "see" what the transactions actually look like? Here's a very simple transaction that was captured with Dragon.

During the address phase, CBE is 0x3, which means "IO Write". It's an IO Write, data 0x00000000, at address 0x0200.

The FPGA as a PCI logic analyzer
Being able to see the bus operation can be interesting to:

  

- Get a better understanding of its operation.
- Check the bus latencies within and in-between transactions.
- Do post-mortem analysis (if you have functional problems in your PCI core).

Looking at the signals usually requires expensive equipment, like bus extenders and logic analyzers. That can be tricky because the PCI specification doesn't allow more than one IO load on each PCI signal (per PCI card of course). That's because the bus is sensitive to capacitive loads or wire stubs that would distort the high-speed signals. But couldn't the FPGA act like a logic analyzer? The FPGA is already connected to the bus, and has internal memories that can be used to capture the bus operation in real time. Dragon also has a USB interface that can be used to dump out the PCI captures without disturbing the PCI interface implementation, even if the PCI bus "dies". The FPGA can also easily create complex trigger conditions that would outsmart most logic analyzers... what if you want to capture the 17th write after the second read at address 0x1234?

Capturing the PCI signals
We build a "state" (= synchronous) logic analyzer here. The signals captured are:

wire [47:0] dsbr = {
  PCI_AD, PCI_CBE,
  PCI_IRDYn, PCI_TRDYn, PCI_FRAMEn, PCI_DEVSELn,
  PCI_IDSEL, PCI_PAR, PCI_GNTn, PCI_LOCKn,
  PCI_PERRn, PCI_REQn, PCI_SERRn, PCI_STOPn
};

Just 48 signals! They fit perfectly in 3 blockrams if we choose a depth of 256 clocks. Implementation is easy: an 8-bit counter starts feeding the blockrams once a trigger condition is met, and another counter allows the USB to read the blockram data. Logic was also added to allow some level of pre-trigger acquisition - details in the Dragon board files. The blockram outputs are muxed out to the USB controller in this order:

case(USB_readaddr[2:0])
  3'h0: USB_Data <= bro[ 7: 0];
  3'h1: USB_Data <= bro[15: 8];
  3'h2: USB_Data <= bro[23:16];
  3'h3: USB_Data <= bro[31:24];
  3'h4: USB_Data <= bro[39:32];
  3'h5: USB_Data <= bro[47:40];
  3'h6: USB_Data <= 8'h01;  // padding, added for ease of implementation
  3'h7: USB_Data <= 8'h02;  // padding, added for ease of implementation
endcase

and finally, with a USB bulk read command, the data is acquired and saved into a ".pciacq" file for further analysis.

PCI bus viewer
The software used to view the ".pciacq" file can be downloaded here. A sample ".pciacq" file is included, which is the captured result of this list of transactions:

ReadIO_DWORD( 0x200 );
ReadIO_DWORD( 0x204 );
ReadIO_DWORD( 0x208 );
ReadIO_DWORD( 0x210 );

WriteIO_DWORD( 0x204, 0x12345678 );
WriteIO_DWORD( 0x208, 0x87654321 );
WriteIO_DWORD( 0x210, 0xDEADBEEF );

ReadIO_DWORD( 0x200 );
ReadIO_DWORD( 0x204 );
ReadIO_DWORD( 0x208 );
ReadIO_DWORD( 0x210 );

The software looks like:

One interesting thing: during a read turnaround-cycle, the AD bus shows the data of the previous read... see cycle 151 for example... no idea why.

More PCI bus captures
If we issue an IO write transaction that is not claimed by anybody, the bridge used here retries 12 times! See this WriteNotClaimed.pciacq file (the first IO Write is claimed, the subsequent one is not and gets retried many times). To view it, just un-zip and replace the original ".pciacq" file. See also this ReadNotClaimed.pciacq file.

PCI plug-and-play
Now that read and write accesses are going through, what does it take for PCI plug-and-play to work?

Our PCI card is not yet in the list...

Configuration space
Remember that PCI cards have three "spaces" where transactions (reads and writes) take place?

1. Memory space
2. IO space
3. Configuration space

The configuration space is the heart of PCI plug-and-play. The OS (Windows, Linux...) reads there first to find out if PCI cards are plugged in, and their characteristics. For simple boards, the configuration space consists of just 64 bytes. The important fields are:

Offset  Name       Length        Function                                     Note
------  ---------  ------------  -------------------------------------------  ----------------------------------------------
0       Vendor ID  2 bytes       Manufacturer number                           ... allocated by the PCI-SIG
2       Device ID  2 bytes       Device number                                 ... allocated by the manufacturers themselves
4       Command    2 bytes       Turns accesses to the PCI board on and off    ... but configuration space accesses are always on
16      BAR0       4 bytes each  Base address register 0: address at which     ... followed by BAR1 through BAR5
                                 the PCI board should respond

By implementing the right values and registers at these locations, the OS can "find" the PCI card.

Configuration space transactions
Each PCI slot has a signal called IDSEL. The IDSEL signal is not shared along the bus; each PCI slot has its own. When a PCI card sees a configuration space transaction on the bus, and its own IDSEL is asserted, it knows it should respond.

parameter PCI_CBECD_CSRead  = 4'b1010;  // configuration space read
parameter PCI_CBECD_CSWrite = 4'b1011;  // configuration space write

wire PCI_Targeted = PCI_TransactionStart & PCI_IDSEL &
                    ((PCI_CBE==PCI_CBECD_CSRead) | (PCI_CBE==PCI_CBECD_CSWrite)) &
                    (PCI_AD[1:0]==0);

After that, it can be a read or a write, but it works the same way as the memory or IO spaces do. A few details:

   

- For the Vendor ID, let's just pick a number; we are just experimenting, right? OK, 0x0100 works fine.
- Device ID can be left at 0.
- Command bit 0 is the "on/off" bit for the IO space, while bit 1 is the "on/off" bit for the Memory space.
- BAR0 is a register that is written by the OS, once it decides at which address the PCI card should be located.

There are a few other details left out, like some bits of BAR0 are read-only... Please refer to a PCI specification/book for the down-to-earth details.

Windows plug-and-play
Once these registers are implemented, the OS can discover the new hardware.

But the OS requires a driver before...

... it agrees to allocate the memory resource.

Links

Many interesting things on Craig's PCI & PnP ID's Pages

PCI software driver for Windows
Now that we need a driver for our PCI card, there are two ways to get one.

The easy way
The easy way consists of having someone else do the hard work for you! Check out WinDriver. That's a commercial toolkit that can build a PCI plug-and-play driver solution for you in minutes. It works like this:

 

- You run a wizard that detects your plug-and-play devices, including the PCI cards.
- You select your card of interest, give a name to your device, and create an ".inf" file. That's enough for Windows to be able to recognize the hardware and convince it that it should use WinDriver's driver.
- You quit the wizard, and go through Windows' plug-and-play hardware detection to install the driver.
- Once the driver is installed, you run the wizard again, this time to build some example source code to access the PCI card.

WinDriver gives you 30 days to try it out. WinDriver may be nice, but at $2000 it's expensive if all you want to do is experiment with PCI plug-and-play mechanisms.

The hard way
Use Microsoft Windows DDK and the Online DDK documentation.

Installing Windows DDK
The latest Windows DDK releases are not free, while earlier incarnations (98/2000) were free to download. The DDKs are easy to install. For the Win98 and Win2000 DDKs, first install Visual C++ 5.0 or 6.0, then the DDK itself. Then follow the "install.htm" instructions to build a few sample drivers using the "build" command.

A minimum WDM Plug-and-Play driver
Here's the very minimum code required for the Windows device manager to allocate the memory resource used by our PCI card. Since it's a WDM driver, it works on WinXP/2000/98. The entry point of a WDM driver is a "DriverEntry" function (like the "main" of a C program). Its main purpose is to publish the addresses of callback functions. Our minimum driver just needs two.

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
  DriverObject->DriverExtension->AddDevice = DevicePCI_AddDevice;
  DriverObject->MajorFunction[IRP_MJ_PNP] = DevicePCI_PnP;
  return STATUS_SUCCESS;
}

A WDM driver creates at least one "device" (if your PC has multiple similar items, the same WDM driver may create multiple devices). Before the driver can create a device, we need a "Device Extension" structure. The structure is used by each device to store information. We can make it as big as we want, and a typical device will store many fields in there. Our minimum device just needs one field.

typedef struct
{
  PDEVICE_OBJECT NextStackDevice;
} DevicePCI_DEVICE_EXTENSION, *PDevicePCI_DEVICE_EXTENSION;

What is this "NextStackDevice" for? A WDM implementation detail... WDM devices process IRPs ("I/O Request Packets": create/read/write/close...). WDM devices don't work alone but are assembled into logical "stacks" of devices. IRP requests are sent along the stack and are processed on the way. Stacks are created from bottom to top (bottom = hardware layers, top = logical layers). When a stack is created, each device attaches itself to the device just below. A device typically stores the info about the device just below itself in the Device Extension, so that later it can forward IRP requests along. A device doesn't really know where it is in the stack; it just processes or forwards requests as they come.

Anyway, now we can implement DevicePCI_AddDevice. It creates a device object and attaches the device to the device stack.

NTSTATUS DevicePCI_AddDevice(PDRIVER_OBJECT DriverObject, PDEVICE_OBJECT pdo)
{
  // Create the device and allocate the "Device Extension"
  PDEVICE_OBJECT fdo;
  NTSTATUS status = IoCreateDevice(DriverObject, sizeof(DevicePCI_DEVICE_EXTENSION),
                                   NULL, FILE_DEVICE_UNKNOWN, 0, FALSE, &fdo);
  if(!NT_SUCCESS(status)) return status;

  // Attach to the driver below us
  PDevicePCI_DEVICE_EXTENSION dx = (PDevicePCI_DEVICE_EXTENSION)fdo->DeviceExtension;
  dx->NextStackDevice = IoAttachDeviceToDeviceStack(fdo, pdo);

  fdo->Flags &= ~DO_DEVICE_INITIALIZING;
  return STATUS_SUCCESS;
}

Finally we can process the plug-and-play IRP requests. Our minimum device processes only the START_DEVICE and REMOVE_DEVICE requests.

NTSTATUS DevicePCI_PnP(PDEVICE_OBJECT fdo, PIRP IRP)
{
  PDevicePCI_DEVICE_EXTENSION dx = (PDevicePCI_DEVICE_EXTENSION)fdo->DeviceExtension;
  // remember the device below us: "dx" is gone once we delete our device
  PDEVICE_OBJECT NextStackDevice = dx->NextStackDevice;
  PIO_STACK_LOCATION IrpStack = IoGetCurrentIrpStackLocation(IRP);
  ULONG MinorFunction = IrpStack->MinorFunction;

  switch(MinorFunction)
  {
    case IRP_MN_START_DEVICE:
      // we should check the allocated resources here...
      break;
    case IRP_MN_REMOVE_DEVICE:
      if(NextStackDevice) IoDetachDevice(NextStackDevice);
      IoDeleteDevice(fdo);
      break;
  }

  // call the device below us
  IoSkipCurrentIrpStackLocation(IRP);
  return IoCallDriver(NextStackDevice, IRP);
}

The START_DEVICE request is the one where we accept or refuse the memory resources. Here we don't do anything but forward the request down the stack, where it is always accepted.

Now, our device gets some memory resources, but doesn't do anything with them. To be more useful, the driver would need to:

    

- Check the memory resources before accepting them
- Export a device name
- Implement some "DeviceIOControl" to communicate with a Win32 application
- Handle more IO requests ("IRPs")
- ...

Get the code here. Your turn to experiment! You can get more sample code by studying the "portio" project in the Windows 2000 DDK for example.

Links
    
- Jungo's WinDriver and Compuware's DriverStudio toolkits
- Microsoft DDK and the Online DDK documentation
- The articles Surveying the New Win32 Driver Model and Implementing the New Win32 Driver Model from the MSJ
- Examples of NT4-style drivers: Kamel from ADP GmbH, DumpPCI from Microsoft
- The Programming the Microsoft Windows Driver Model book from Walter Oney

PCI software driver for Linux
Fedora is an impressive Linux release. Microsoft should be worried...

Writing a Plug-and-Play PCI driver for Linux
It's actually easier than on Windows.

1. Create the init_module and cleanup_module
These functions are called when the driver is loaded or unloaded.

int init_module(void)
{
  return pci_module_init(&pci_driver_DevicePCI);
}

void cleanup_module(void)
{
  pci_unregister_driver(&pci_driver_DevicePCI);
}

The "pci_driver_DevicePCI" structure is shown next...

2. Create tables describing the PCI board
#define VENDOR_ID 0x1000
#define DEVICE_ID 0x0000

struct pci_device_id pci_device_id_DevicePCI[] = {
  {VENDOR_ID, DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
  {}  // end of list
};

struct pci_driver pci_driver_DevicePCI = {
  name: "MyPCIDevice",
  id_table: pci_device_id_DevicePCI,
  probe: device_probe,
  remove: device_remove
};

device_probe and device_remove are 2 callback functions, created next...

3. Create the "probe" and "remove" callbacks
int device_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
  int ret;

  ret = pci_enable_device(dev);
  if(ret < 0) return ret;

  ret = pci_request_regions(dev, "MyPCIDevice");
  if(ret < 0)
  {
    pci_disable_device(dev);
    return ret;
  }

  return 0;
}

void device_remove(struct pci_dev *dev)
{
  pci_release_regions(dev);
  pci_disable_device(dev);
}

That should be enough to allocate the memory resource... Thanks to Ian Johnston's help, I got the current files (for Fedora Core 2 - kernel 2.6) to compile. Build them using "make", followed by "insmod DevicePCI.ko" to load the driver and "rmmod DevicePCI.ko" to unload it. Your turn to experiment!

Link
 
- The Linux Device Drivers, 2nd Edition online book, and in particular the "Handling Hot-Pluggable Devices" section of chapter 15
- A nice Writing a PCI driver in 5+3 steps presentation