Having fun with python searching a neeedled signal in a random haystack

Mardi 29 novembre 2011, par julien // un peu de technique
Version imprimable de cet article Version imprimable


0 vote

I’am not a CS graduated, I don’t care about frameworks, and new technology, I just care about having an automata do my work. In this article the objective is to try to both present non computer point of view of coding, and also all the fun there is in coding to create castles from pure air.

I guess this article about python violates most of the tao

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.

Coding is not always about purity. Because fun beats purity :)

Foreword

The code that is presented in this article belongs to the almighty GCU [1] church. It is ugly, is is mean and it works for me© but there is a reason for that : before factoring your code, first try to discover your problem ....

The story before the code

A long time ago I was striding in the lab building in an university and I eardropped a conversation of a technician working on some of the first CCD used in astronomy. At this time (in the 90’s) CCD were bleeding edge and people used to do their own hardware. And the guy told me that in his former job at the astronomic lab of Marseilles they did make a very efficient filter not very costly that would extract a signal in noise with a NR of 10 (signal amplitude being tenfold less than the noise amplitude) and that they prevently patented it for it not to be used. It consisted as I understood to take the median value in a moving window.

Why did they need this algorithm ?

  1. electronic in very sensitive instrument tends to generate noise in the form of thermal noise which’s signature is one of a white noise ;
  2. signal happens as a random event, but constantly, therefore FFT won’t work.

And I wondered how the heck could such a simplistic solution works ?

Years passed, I forgot it, and discovered matplotlib, and decided to give it a try since wikipedia had no hint on this soution, I had to give it a try.

If you are used to prototyping then plan to throw one away, cause you’ll do it anyway. The following code will be therefore perlish, very unpythonistic and it will make your eyes bleed as much as mine when I re-read it, but it so more much fun doing stuff that are nice :)

Since I did it without internet on the train (where I coded this), it is made with some quircks (which explains the stupid use of 3D plots when 2D works well and the absence of title and legends).

Let’s have some fun

Ploting in python

#!/usr/bin/python
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import random
from math import sin, cos, pi
from mpl_toolkits.mplot3d import Axes3D
import sys

import random

from collections import Counter
SOF_SPREAD = 10
SOF_SAMPLE = 4000
noise_amplitude = 4.0
PERIOD = 200
PRESENCE_PROBABILITY = .3
def D( msg ):
   pass

fig=plt.figure()
ax=Axes3D(fig)
noise = [  ( .5 -  random.random() )  * noise_amplitude for k in range( 0, SOF_SAMPLE ) ]
signal  = [ 0 ] *  len(noise)

for i in range( 0, SOF_SAMPLE ):
   ## Square signal
   signal[ i ] = ( ( i % PERIOD )  > ( PERIOD >> 1 )  and .5 or -.5 )  
   ### muxing
   if random.random() < PRESENCE_PROBABILITY:
       noise[ i ] = signal[ i ]


ax.plot(range(0, len(noise) ), noise,color='blue')
ax.plot(range(0, len(noise) ), signal , color ='yellow' )

plt.show()

Simple plot legend In blue the resulting curb in yellow the signal.

Madness’s one step beyond

Now in discrete signal processing we learnt some ways of treating signal we canonically use the Moving Average to get rid of the noise. So let’s have fun : and code 2 filters :

def gen_arma( sof_spread ):
   """ generate a moving average for a size of spread (I cheat it is 2 * sofspread + 1  since it is easier to code)"""

   D( "sof_spread %d" % sof_spread )
   def arma(an_array, index):
       lower =  index - sof_spread
       lower = lower > 0 and lower or 0
       upper = ( index + sof_spread > len(an_array ) ) and len( an_array )  or index + sof_spread
       D("i %d l %d u %d" % (index, lower, upper) )
       sample_size =  upper - lower
       local_splice = an_array[ lower : upper ]
       res = 1.0 * sum( local_splice ) / sample_size  
       D("%r" % res)
       return res

   return arma
def gen_median(sof_spread):
   """ generate a median filter  for a size of spread (I cheat it is 2 * sofspread + 1  since it is easier to code)"""

   def median(an_array, index):
       lower =  index - sof_spread
       lower = lower > 0 and lower or 0
       upper = ( index + sof_spread > len(an_array ) ) and len( an_array )  or index + sof_spread
       sample_size =  upper - lower
       local_splice = an_array[ lower : upper ]
       local_splice.sort()
       D("i %d l %d u %d size %d" % (index, lower, upper, sample_size) )
       
       return local_splice[ sample_size >> 1  ]
   return median

Well now we need to see what is more efficient : moving average or median filter.

.....

No physics exists without measure

def array_distance(this, that):
   if len(this) != len(that):
       raise( "arg! size matters")
   return sum( [  abs( 1.0  * (a - b) )  for a, b in zip( this, that) ] )  / len(this)  

At this point, any matplotlib user, pythonistas, scientists are chuckling in anger : I don’t use numpy, I dont write pythonistic code, I dont write fortran like code.

Well, this is just dirty code to explore the problem, heavy refactoring for this code will be needed for this code to be usefull :

  1. DRY : for instance splicing is common to all my solution so I should mutualize it ;
  2. a map on the signal would be a more idiomatic way of filtering filter ;
  3. numpy would be an improvement in performance in readability and in functionalities.

But I have the physicist answer to all these people : you are right and wrong :

  1. don’t let code stand between what you want to do, and code the way you are sure of your result ;
  2. early optimization is root of all evil ;
  3. rather have obvious defaults that ease the refactoring than pseudo smart code.

If all of these people were to recode this, the code is straight forward to be easily fixed. [2].

Back to fun

Now, for the aforementioned condition let’s find an optimum by adding a 3D plot of distance from filtered noise to the signal in regard to the value of size of the window, and the presence probability, based on an ugly code derived from the following code :

distance_median=[]
distance_arma = []
tosee = range(0,( SOF_SAMPLE  / SOF_SPREAD )>> 1 )
for multi in tosee:
   
   medianer = gen_median( SOF_SPREAD * (  multi + 1   )  )
   
   medianed = [ medianer( noise , i ) for i in range(0, SOF_SAMPLE)  ]
   distance_median  +=  [ array_distance(medianed,  signal) ]
   armater =  gen_arma( SOF_SPREAD * ( multi + 1 ) )
   armated =  [ armater( noise, i )   for i in range(0, SOF_SAMPLE) ]
   distance_arma  += [ array_distance(armated, signal)]

print "min for distance_arma is %r" % ( (distance_arma.index( min(distance_arma ) )+ 1  ) * SOF_SPREAD )
print "min for distance_median is %r" % (( distance_median.index( min(distance_median ) )  + 1 )  * SOF_SPREAD )
ax.plot(tosee, distance_median, color = 'green')
ax.plot(tosee, distance_arma , color = 'yellow')
plt.show()

And what do we have ? ....

Distance between signal and filtered result (z) in function of (x) size of the window, (y) the probability of presence legend Moving average in red, median filter in green.

Tadadam : Our filter surpasses Moving Average for a signal interlaced in noise with an amplitude of 8 for all presence of probabiity between 0.1 and .3

Furthermore : our optimal finder has a side effect : it is able to tell us with a better sensitivity when a signal is encoded in noise (if no signal is present you won’t have extremum).

That’s when coding is fun : having fun gives results ! And, simple algorithms works too ! Simple takes time to imagine, but results make you childishly enthousiastic.

Now let’s vizualize the optimum given by the program for p = .1 and p = .3. For p = 0.1

Filtered signal for probability if signal presence = 0.1

Filtered signal for probability if signal presence = 0.3

legend : Input of filter is blue Input signal muxed in noise is yellow MA result is green Median filter is red

Median filter works best in both case

Well, now, more fun is to come :
- can I make an even better filter ? [3]
- what happens with a continuous signal (sinusoid) ? [4]
- what is the relationship between size of the window and size of the period ? [5]
- How would I write nice code with this ? [6]

Conclusion

Matplotlib is good and fun : come get some.

And always separate the way of coding proof of concept and production code. I would never let this code slip in production. And I would never accept a POC which size would an order of magnitude bigger. POC has to stick to the point.

If you liked this, I may prepare an article proving why forbiding (Grece’s) bankrupty is a way of making some people richer without benefiting the community with matplotlib and a multi agent simulation https://github.com/jul/KISSMyAgent/wiki.

Yes, we can do politics with code ! And I would enjoy trying to do it. (But it is against a lot of coder’s magazine policy)

P.-S.

Being dylsexic is a pain. I must have missed quite a few mistakes but this is in -no way- meant to be disrepectful to the readers and I will correct any mistakes as soon as pointed.

Répondre à cet article


Mots clés