Friday, 14 May 2010

Looking at Tegra2

Obviously, when you've reverse engineered Nvidia desktop GPUs and written a driver for them, you're always interested in knowing how new chip models look at the hardware level. And when an embedded variant like Tegra2 comes along, it's logical to wonder how different it is from the mainline desktop GPUs (ok, it's only logical for guys like me, but whatever).

Well, it turns out that Tegra2 has a bunch of interesting differences compared to its desktop big brother (at least judging by the first reverse engineering I did of it).

The first obvious oddity is that the nvrm module can now be either a user space daemon or a kernel module. Questionable as that design is, a user space daemon makes reverse engineering easier, which is a bonus for me.

Second, it seems like the beast only uses PIO objects (yes, like the ones last used on the Riva128 in 1997). There is no command fifo like on the desktop variants. The fifo machinery seen on desktop cards probably requires too many transistors and too much juice to be worth it here.
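To make the contrast concrete, here is a rough C sketch of the two submission models. All the offsets and the object layout are made-up placeholders, not actual Tegra2 or desktop NV values; only the shape of each path matters.

    /* PIO-style submission: each method is a direct MMIO write into the
     * object's register window, one register write per method. */
    #include <stdint.h>

    static void pio_submit(volatile uint32_t *obj_mmio,
                           uint32_t method, uint32_t data)
    {
        obj_mmio[method / 4] = data;   /* CPU pays for every single write */
    }

    /* FIFO-style submission (desktop cards): methods are appended to a
     * ring buffer in memory and the GPU fetches them asynchronously. */
    struct ring {
        uint32_t *buf;
        uint32_t  put;     /* CPU write pointer, in entries */
        uint32_t  size;    /* ring size, in entries */
    };

    static void fifo_submit(struct ring *r, volatile uint32_t *put_reg,
                            uint32_t method, uint32_t data)
    {
        r->buf[r->put++ % r->size] = method;
        r->buf[r->put++ % r->size] = data;
        *put_reg = r->put; /* kick: tell the GPU how far it may fetch */
    }

Dropping the second path means dropping the ring buffers, the put/get pointers and the fetch engine, which is presumably where the transistor savings come from.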

Third, all hardware GLES2 state registers can be read back, which means you don't need to store the state twice (in memory and in the hardware), but only once, in the hardware. This can be a big saving in driver size/complexity when implementing the glGet() functions and friends. «What is the current size of the viewport? Let me look at the hardware, one second» might sound strange at first, but it makes sense in memory-limited environments.
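As an illustration, here is what a glGet-style query could look like when the hardware is the only copy of the state. The register offsets are invented for the example and are not the real Tegra2 ones.

    #include <stdint.h>

    /* Hypothetical viewport register offsets -- not real Tegra2 values. */
    #define VIEWPORT_X 0x0a00
    #define VIEWPORT_Y 0x0a04
    #define VIEWPORT_W 0x0a08
    #define VIEWPORT_H 0x0a0c

    static uint32_t reg_read(volatile uint32_t *mmio, uint32_t offset)
    {
        return mmio[offset / 4];
    }

    /* glGetIntegerv(GL_VIEWPORT, v) without a driver-side shadow struct:
     * the hardware itself holds the single copy of the state. */
    void get_viewport(volatile uint32_t *mmio, int32_t v[4])
    {
        v[0] = (int32_t)reg_read(mmio, VIEWPORT_X);
        v[1] = (int32_t)reg_read(mmio, VIEWPORT_Y);
        v[2] = (int32_t)reg_read(mmio, VIEWPORT_W);
        v[3] = (int32_t)reg_read(mmio, VIEWPORT_H);
    }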

So do I plan to make an open source Tegra2 driver? Not in the foreseeable future; I'm doing this just out of curiosity. But if you want to look at the findings to make your own driver, I've put up a git tree with the stuff online:
http://cgit.freedesktop.org/~marcheu/tegra/

Saturday, 24 April 2010

DRI misunderstanding

In http://downloadmirror.intel.com/9871/ENG/relnotes_Linux_5_1.txt (the release notes of IEGD, the embedded graphics driver from Intel) we can read:
«Due to the use of direct rendering technology, system designers should
take special care to ensure that only trusted clients are allowed to use
the OpenGL library. A malicious application could otherwise use direct
rendering to destabilized the graphics hardware or, in theory, elevate
their permissions on the system.»
Seems like Intel did not really understand the point of DRI, whose purpose is precisely to provide that kind of security (when compared to the older, user-space only approach).

I could also point out that this portion of the release notes is misleading: malicious clients do not need the OpenGL library to exploit such a security hole and achieve privilege elevation. An attacker would simply use a program that acts like that library and talks to the device directly.
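To illustrate, here is a trivial program talking to the graphics device with no OpenGL library in sight. It only issues the harmless DRM_IOCTL_VERSION ioctl, but a malicious client would open the same device node and submit its own command stream in exactly the same way; restricting access to libGL restricts nothing.

    /* Build against the kernel/libdrm headers; the header path and the
     * device node name may vary between systems. */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <drm/drm.h>

    int main(void)
    {
        int fd = open("/dev/dri/card0", O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        struct drm_version v;
        memset(&v, 0, sizeof(v));    /* first call just fills the lengths */
        if (ioctl(fd, DRM_IOCTL_VERSION, &v) == 0)
            printf("driver version %d.%d.%d\n",
                   v.version_major, v.version_minor, v.version_patchlevel);

        close(fd);
        return 0;
    }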

Linux graphics acceleration is still a long way off...

Thursday, 28 January 2010

Introducing Fatgrind

Fatgrind is a new valgrind plugin I've been working on recently. It lets you track the numerical accuracy of your code without heavy instrumentation or a recompile: you can take a binary compiled with gcc -g and have it produce a list of the source-level instructions causing the most numerical error in your computations. As opposed to other tools with a similar purpose, this one is not based on static code analysis (static analysis uses a model which always overestimates the errors, and is therefore not useful in real situations) but on runtime analysis. In particular, this allows catching data-dependent issues, which is the reason I wrote it in the first place.

It works as follows:
  • First, thanks to valgrind the code is analyzed and all floating point instructions are found and instrumented.
  • The floating point instructions are doubled: while they still get executed, a high-precision version (using GMP, the GNU multiple precision arithmetic library) is executed at the same time.
  • At runtime, each floating point computation is compared with its high precision counterpart, and the resulting error (difference between the floating point result and the high precision result) is computed.
  • The error of each operation is accumulated against the source-level instruction that produced it.
At the end, what you get is an annotated listing of code with the total amount of error that each instruction is responsible for. In short, you know what parts of your code are to blame for those pesky floating point errors.
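For the curious, here is a stripped-down, standalone illustration of the shadow computation idea. It runs outside valgrind, so it shadows a C function instead of translated instructions, but the principle is the same: a float addition is mirrored by a 256-bit GMP addition, and the difference between the two results is accumulated as that operation's error.

    #include <stdio.h>
    #include <math.h>
    #include <gmp.h>

    static double total_error;  /* Fatgrind keeps one counter per instruction */

    /* Native float add, shadowed by a 256-bit GMP add of the same operands. */
    static float shadow_addf(float a, float b)
    {
        float native = a + b;

        mpf_t ha, hb, hsum;
        mpf_init2(ha, 256);
        mpf_init2(hb, 256);
        mpf_init2(hsum, 256);
        mpf_set_d(ha, a);
        mpf_set_d(hb, b);
        mpf_add(hsum, ha, hb);

        /* error = |float result - high precision result| */
        total_error += fabs((double)native - mpf_get_d(hsum));

        mpf_clear(ha);
        mpf_clear(hb);
        mpf_clear(hsum);
        return native;
    }

    int main(void)
    {
        /* classic absorption: in float, 1e8 + 1.0 loses the 1.0 entirely */
        float s = shadow_addf(1e8f, 1.0f);
        printf("result = %f, accumulated error = %g\n", s, total_error);
        return 0;
    }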

Ain't life great?

Not yet, as many things are still pending:
  • First and foremost, valgrind plugins cannot use libc functions, but GMP does use them, so I have to use a hacked GMP version which relies on valgrind's internal functions instead. Not so clean. I need to take the relevant GMP files and have them compiled using the valgrind build system.
  • Second, I need to support more floating point instructions. I couldn't find any program that uses the missing ones on x86/x86_64, but that doesn't mean they won't show up. If computers teach you one thing, it's that shit happens. All the time.
  • Finally, I want to release this code and hopefully get it upstream into valgrind. We'll see how that goes.