A few thingz

Joseph Basquin


An attempt to generate random data with audio (and your computer's built-in microphone)

You probably know that generating some real random data is not so easy to do with a computer. How to design a good Random Number Generator (or a pseudo-random one) is a math topic that you can work years on ; it's also something very important for real-life applications such as security/cryptography, for example when you need to generate strong passwords.

Usually (and this is true in general in cryptography), designing your own algorithm is bad, because unless you're a professional in this subject and your algorithm has been approved by peers, you're guaranteed to have flaws in it, that could be exploited.

But here, for fun (don't use it for critical applications!), let's try to generate 100 MB of good random data.

1) Record 20 minutes of audio in 96khz 16bit mono with your computer's built-in microphone. Try to set the mic input level so that the average volume is neither 0 dB (saturation) nor -60 dB (too quiet). Something around -10 dB looks good. What kind of audio should you record? Nothing special, just the noise in your room is ok. You will get around 20*60*96000*2 ~ 220 MB of data. In these 220 MB, only the half will be really useful (because many values in the signal - an array of 16-bit integers - won't use the full 16-bit amplitude: many integers "encoding" the signal might be for example of absolute value < 1024, i.e. will provide only 10 bits)

2) Now let's shuffle these millions of bits of data with some Python code:

from scipy.io import wavfile
import numpy as np
import functools

sr, x = wavfile.read('sound.wav')  # read a mono audio file, recorded with your computer's built-in microphone

L = []  # list of bits
for i in range(len(x)):
    bits = format(abs(x[i]), "b")  # get binary representation of the data
                                   # don't use "016b" format because it would create a bias: small integers (those not using
                                   # the full bit 16-bit amplitude) would have many leading 0s!
    L += map(int, bits)[1:]        # discard the first bit, which is always 1!

print L.count(1)
print L.count(0)  # check if it's equidistributed in 0s and 1s

n = 2 ** int(np.log2(len(L)))
L = L[:n]  # crop the array of bits so that the length is a power of 2; well the only requirement is that len(L) is coprime with p (see below)

# The trick is: don't use **consecutive bits**, as it would recreate something close to the input audio data. 
# Let's take one bit every 96263 bits instead! Why 96263? Because it's a prime number, then we are guaranteed that
# 0 * 96263 mod n, 1 * 96263 mod n, 2 * 96263 mod n, ..., (n-1) * 96263 mod n will cover [0, 1, ..., n-1].  (**)
# This is true since 96263 is coprime with n. In math language: 96253 is a "generator" of (Z/nZ, +).

p = 96263  # The higher this prime number, the better the shuffling of the bits! 
           # If you have at least one minute of audio, you have at least 45 millions of useful bits already, 
           # so you could take p = 41716139 (just a random prime number I like around 40M)

M = set()
with open('random.raw', 'wb') as f:
    for i in range(0, n, 8):
        M.update(set([(k * p) % n for k in range(i, i+8)]))  # this is optional, here just to prove that our math claim (**) is true
        c = [L[(k * p) % n] for k in range(i, i+8)]   # take 8 bits, in shuffled order
        char = chr(functools.reduce(lambda a, b: a * 2 + b, c))  # create a char with it

print M  == set(range(n))  # True, this shows that the assertion (**) before is true. Math rulez!

Done, your random.raw file is filled with random data!


My personal blog,
Joseph Basquin (Orléans, France)


Data / AI / Python consulting and freelancing.

Articles about:

Some of my projects:
BSQN Digital
Jeux d'orgues