< Back

Assignment 2: MD5


MD-5 is a cryptographic hash function that is commonly used to compute message digests. These digests are typically used in order ensure data integrity, by comparing the received message to the digest. As long as the hash function is secure, and the digests match, we can have some assurance that the message was unaltered in transit.

The MD5 algorithm is quite straight-forward to implement, just based on the standard specifications. In this assignment, you are going to implement the full protocol, using C++.


Details

MD5 consists of 64 rounds. These 64 rounds are grouped into four 16 round groups, where each group used a unique function F. There are a handful of items that must be hard-coded in your programs. The first set are the initial values of the output vector:

A = 0x67452301
B = 0xefcdab89
C = 0x98badcfe
D = 0x10325476

These values make up the 4 initial inputs to the entire algorithm, described in detail below.

Next, the per-round shift amounts, where r is the round number, zero based:

Round Group Round F Message Index Shift Group Shift Amount
0 ≤ r ≤ 15 (B & C) | (!B & D) g = r r % 4 == 0 7
r % 4 == 1 12
r % 4 == 2 17
r % 4 == 3 22
16 ≤ r ≤ 31 (B & D) | (C & !D) g = (5 × r + 1) % 16 r % 4 == 0 5
r % 4 == 1 9
r % 4 == 2 14
r % 4 == 3 20
32 ≤ r ≤ 47 B ⊕ C ⊕ D g = (3 × r + 5) % 16 r % 4 == 0 4
r % 4 == 1 11
r % 4 == 2 16
r % 4 == 3 23
48 ≤ r ≤ 63 C ⊕ (B | !D) g = (7 × r) % 16 r % 4 == 0 6
r % 4 == 1 10
r % 4 == 2 15
r % 4 == 3 21

And finally, you need an array K of size 64. This array contains a unique value added into each round of the algorithm. These values can be computed using the sin function as follows:

  K[i] = floor(abs(sin(i + 1)) * pow(2, 32));

The MD5 algorithm begins by padding out the length of the message until its length (in bits) is divisble by 512. This is done by appending a 1, followed by 0's until the length + 64 is divisble by 512. The last 64 bits are filled with the length of the original message, modulo 264.

Figure 1: MD5 Single Round

Figure 1 shows a single round of MD5. F, Ki, and s are defined as above. The index i for M is g from the table above. The red boxes represent addition modulo 232. Copies of the current values of A, B, C, and D are used when processing a block of 512 bits.

This round function is executed 64 times per 512 bits of the message. After all 64 executions, the values of A, B, C, and D are then added to the global values of A, B, C, and D. It is sequentially executed on every block of 512 bits.

One VERY important thing to note, is that values in MD5 are read in (and processed) in little endian format. See class notes for details.


Specifics

Write a C++ program that compute the MD5 sum of a block of text. Your program should read this block of text from standard input, and write the MD5 sum to standard output in hexadecimal. You can handle the internals of the program however you feel, but the input and output must operate as specified. You can redirect input for your program as follows:

  $ cat empty
  $ ./md5 < empty
  d41d8cd98f00b204e9800998ecf8427e
  $ cat a
  a
  $ ./md5 < a
  0cc175b9c0f1b6a831c399e269772661
  $ cat abc
  abc
  $ ./md5 < abc
  900150983cd24fb0d6963f7d28e17f72
  $

This assignment is due Thursday, October 27th by the beginning of class. You must write this program in C++. Other languages will not be graded. You should follow all typical code conventions, which can be found here course website. Specifically you should avoid memory leaks and magic numbers, and include Pre and Post conditions for all defined functions. Submissions will be made through inquire.roanoke.edu.

Notice

This is a big project that can be difficult to debug. Towards that end, I have written a version of MD5 that might help you debug some issues. It is accessible on cs.roanoke.edu in the ~cssmith/MD5 directory. It outputs additional useful information for you to use.