OpenGL MD5 cracker

Entering the world of GPGPU by brute-forcing MD5 hashes using OpenGL compute shaders. The exhaustive search can be restricted to a certain character range.

An MD5 checksum computation on short strings with output comparison requires only few data to be transferred, but a high amount of (unfortunately, discrete) calculations. Otherwise, brute-forcing hashes seems a natural fit for GPGPU, as it consists of many but easy tasks that are virtually independent from each other. OpenGL compute shaders offer a simple interface for many independent runs in parallel and the general MD5 algorithm is implementable with little effort.

Given an ever-incrementing prefix, the hashes of three digits as suffix are computed and compared with the target for each invocation. For N chars to try, this results in N3 glDispatchCompute shader work groups. To further increase parallelism, each work group consists of 32 invocations/threads, as actually 32 prefixes are pre-determined and passed at once. Inputs are bound to uniforms – whereas upon success, the result is written to a one-dimensional texture, which is checked periodically. A built-in selftest at startup ensures functionality, but also estimates the number of rounds to draw at once in order to not block the GPU for an unreasonable amount of time.

Cracking MD5 hashes

Building is simply done via make and requires headers and libraries for the -lGLEW, -lglfw, and -lGL linker flags (installable e.g. via apt-get install libglew-dev libglfw3-dev). Additionally, -lcrypto (OpenSSL) is needed for an MD5 reference implementation during the built-in startup selftest.

The resulting binary runs on a target hash in 32 byte hex representation as argument. Optionally, a custom character search space can be given – per default all 95 “printable” chars from 0x20 to 0x7E are used. Progress and hashing rate will be shown periodically.

make && ./ogl_md5 "ef238ea00a26528de40ff231e5a97f50" "" "0123456789abcdefghijklmnopqrstuvwxyz"
[●] starting at seed 0 ''
[●] looking for a08e23ef 8d52260a 31f20fe4 507fa9e5...
[●] initializing shader ...
[●] creating textures ...
[●] using 1492992 (36^3 * 32) shaders instances
[●] performing selftest ...
[○] testing: '4s3pk5' (311bc80b 402f8407 4935ad39 d88379b4)
[○] iterations: 1 (1492992 draws, 6 msec, 248832000 H/sec)
[○] testing: 'kplbow' (ab7e6e1a ffc0665d 9f21d9b2 fa4d10fd)
[○] iterations: 512 (764411904 draws, 1766 msec, 432849312 H/sec)
[✔] success: 764411904 draws in 1766 msec, 432849312 H/sec
[●] running: '∗∗∗' (<16384)
[○] current speed: 432604352 H/sec, overall: 432604352 H/sec
[●] running: '3mb∗∗∗' (<32768)
[○] current speed: 432359680 H/sec, overall: 432481984 H/sec
[●] running: '79o∗∗∗' (<49152)
[○] current speed: 432604352 H/sec, overall: 432522752 H/sec
[✔] result: 'foo123' (316f6f66 00003332 00000000 00000000) in 5 seconds
[✔] verified: ef238ea0 0a26528d e40ff231 e5a97f50

For resuming or distributed computations, restoring state from a prefix as start seed is supported – here empty to start from scratch.

Example results

About 500 MH/s (megahashes per second) can be achieved with a relatively low-end RTX 3060 LHR GPU on a Core i5-12600K machine. Given this performance, the runtime for a full exhaustive search can be estimated as follows:

DigitsCharsDuration
4a-z 0-91 sec
a-z A-Z 0-91 sec
all 951 sec
6a-z 0-95 sec
a-z A-Z 0-92 min
all 9530 min
8a-z 0-990 min
a-z A-Z 0-95 day
all 955 mon

As expected, the time needed grows quickly (exponentially) with the length and number of permissive characters. Further restricting the search range by prior knowledge is thus advisable.

Overall, this seems not a big leap in the long run when considering CPU-only or other GPU-driven implementations – the relatively naïve by-the-book shader implementation could surely be further optimized, though.

Code & Download