In this post, I’ll look at how we can use the iPad camera to detect movement. This post is different in that I’m starting it before I know the ending, so it should be interesting.

Research

Looking around the net, it’s pretty clear (and obvious) that motion detection requires you to have a background image, and you then take pictures and compare them with the background image, to detect changes. It sounds simple, but there are quite a few problems to deal with.

Backgrounds can change
You can’t assume the background will never change, and that the first image you capture is the best reference. If, for example, you want to build a Wii type control system where you wave your hand at the iPad, the background is going to be the rest of your body, which is going to be moving about, and the iPad may not be perfectly steady, either.

What commercial software seems to do is gradually iterate the background towards the current image, i.e. add a small percentage of the current image to the background image each frame, so that over time, any change to the background is gradually absorbed. (This is probably why it's possible to beat motion detectors by moving very slowly!)

Pictures can be noisy
Background pictures aren’t always going to stay exactly the same. As I just said, if we are using the front camera to catch gestures, we won’t sit absolutely still. In addition, hand movements are noisy because we have fingers that wave about, and arms attached to our hands.

We may therefore need to set a noise “threshold” below which any change is ignored.

Pixel by pixel comparison is slow
It’s going to be hard to work in real time if we’re comparing every pixel, for each of r,g,b and a.

There seem to be two solutions to this. One is to reduce each pixel to a single value, such as luminosity. This could be measured as (r+g+b)/3, but I prefer the approach used by Gimp, described here, which calculates luminosity as 0.21r + 0.71g + 0.08b (weighted towards green because our eyes are most sensitive to it). So I'll start with that.
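As a language-neutral illustration (the actual project code below is written in Codea's Lua), the weighted luminosity calculation looks like this:

```python
def luminosity(r, g, b):
    """Single-value luminosity, weighted towards green as in the Gimp formula."""
    return 0.21 * r + 0.71 * g + 0.08 * b

# a pure green pixel reads as far brighter than an equally bright blue one
print(luminosity(0, 255, 0))  # ~181
print(luminosity(0, 0, 255))  # ~20
```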

The other obvious solution is to look at a sample set of pixels instead of all of them, eg every 10th pixel. How many? I don’t know yet.

First attempt at motion detection

Based on the above, I’m going to try the following:

• capture images in real time
• store the luminosity of a sample of pixels
• calculate differences in luminosity between background and current image
• compare with threshold and do any other analytical work
• update the background image to include some of the current image

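Before writing any Codea code, the comparison loop at the heart of this plan can be sketched in a few lines of Python. This is an illustrative sketch only – the frame is assumed to be a flat list of sampled luminosities, and the parameter values mirror the setup code below:

```python
def detect_motion(frame, bg, pixel_noise=100, iteration_weight=0.02):
    """Compare sampled luminosities against the background, count the changes,
    and drift the background towards the current frame (sketch only)."""
    changed = 0
    for n, lum in enumerate(frame):
        if abs(bg[n] - lum) > pixel_noise:
            changed += 1
        # mix a little of the current frame into the background
        bg[n] = bg[n] * (1 - iteration_weight) + lum * iteration_weight
    return changed / len(frame)  # fraction of sampled pixels that changed

bg = [50.0] * 100
frame = [50.0] * 90 + [250.0] * 10  # 10% of the pixels jump in luminosity
print(detect_motion(frame, bg))      # 0.1
```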

Setup

So this is how I went about it. I started with the demo Camera app built into Codea, and my setup function looks like this (actually, I split some of it into a Settings function, but this is easier to follow):

```
function setup()
    cameraSource(CAMERA_FRONT) --choose camera facing me
    lum_r,lum_g,lum_b=0.21,0.71,0.08 --luminosity weightings
    pixelNoise=100 --luminosity difference allowed before a pixel counts as changed
    noiseThreshold=0.05 --fraction of changed pixels required to raise an alert
    sampleSize=20 --examine every X pixels
    iterationWeight=0.02 --weight given to the new image when merging with the background
    bgTable={} --table holding sampled luminosities for the background
    camWidth,camHeight=0,0 --camera dimensions, set in draw once the camera is ready
    FPS=60 --smoothed frame rate estimate, updated in draw
end
```

So I’ve set a threshold of 100 in luminosity. Any pixel changing by less than this won’t be counted. And I’m not going to get excited until at least 5% of my sampled pixels are changing. Both of these help me ignore background noise.

I’m only looking at one in every 20 pixels, so if (say) the image were 800×400, that would be 40×20=800 pixels.

The iteration weight is used to merge the background picture with the latest image. Each frame, a pixel’s new background value is set to 0.98 x the old value + 0.02 x the new value (using the iterationWeight of 0.02 set above). This means I don’t give the new value too much weight, but if it stays there long enough, it will gradually replace the old value.
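To see how this plays out, here is a quick Python sketch of the merge (illustrative only – the code uses iterationWeight = 0.02, i.e. a 0.98/0.02 mix, and repeated application pulls the background towards any persistent change):

```python
def update_background(bg_value, new_value, weight=0.02):
    # keep 98% of the old background, blend in 2% of the new frame
    return bg_value * (1 - weight) + new_value * weight

# a pixel jumps from 0 to 255 and stays there
bg = 0.0
for frame in range(100):
    bg = update_background(bg, 255.0)
print(round(bg))  # after 100 frames the background has absorbed most of the change
```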

Draw

The draw function looks like this:

```
function draw()
    background(220)
    --the camera size isn't available for the first few frames,
    --so keep trying until we get it (see below)
    while (camWidth or 0)==0 do
        img=image(CAMERA) --capturing an image prompts the camera to start up
        camWidth, camHeight = spriteSize( CAMERA )
        screenCamWidth=math.min(WIDTH,camWidth)
        return
    end
    --update the smoothed frame rate estimate
    FPS=(FPS or 60)*0.9+0.1/DeltaTime
    --draw the latest camera image
    sprite( CAMERA,WIDTH/2,HEIGHT/2,screenCamWidth)
    --analyse the image; takePhoto may return an overlay showing changed pixels
    img=takePhoto()
    if img~=nil then sprite(img,WIDTH/2,HEIGHT/2,screenCamWidth) end
end
```

There is something I need to explain here. The demo Camera app starts off by setting camWidth and camHeight to the size of the camera image. The problem is that this doesn’t work straight away – the camera doesn’t report its size until about the third pass through the draw function, especially if you aren’t drawing the photo on screen. You don’t notice this in the demo app because it all happens pretty fast, and all that app does is draw the image on screen.

So I start the draw function with a while loop that keeps trying to get the image dimensions until it succeeds. After that, the draw function runs the takePhoto function, and also draws the latest photo on the screen. takePhoto may also return an image showing which pixels have changed, and this can be overlaid on the latest photo, so you can see what the program is identifying as changes.

Analysis

The really important stuff happens in the takePhoto function, which analyses the latest images for differences against a set of stored background pixels.

So here is the first part of takePhoto

```
function takePhoto()
    local img=image(CAMERA) --take picture
    local img2 --secondary image, overlaid to show moving pixels
    --fill the background table if this is the initial background image
    if #bgTable==0 then
        for i=1,camWidth,sampleSize do
            for j=1,camHeight,sampleSize do
                local r,g,b=img:get(i,j)
                --calculate luminosity
                --use a sequential table because we always work through it in the same order
                bgTable[#bgTable+1]=r*lum_r+g*lum_g+b*lum_b
            end
        end
        --also set the number of pixels that can change before we raise an alert
        pixelChangesAllowed=noiseThreshold*#bgTable
        --initialise directional variables
        --these store the average (centre) of the changing pixels
        lastX,lastY=0,0
    end
```

So this chunk of code sets up the initial background image data, if it hasn’t been done already. It loops through the image pixels by row and column, storing the luminosity of every Nth pixel, as specified by sampleSize.

The variables lastX and lastY are initialised, and this is a good time to explain how they are used. If we do our analysis and find a whole mess of pixels have changed, that may be enough for a burglar alarm, but we may want to know direction, if we want to use the camera as a game controller. How do we figure out which way a bunch of changed pixels are moving?

I do it by calculating the average x,y value of the changed pixels, and comparing it to the average x,y position from the last frame we processed. So if our x value is much bigger than last time, but y is much the same, I know the movement is to the right.

```
    --if we want to show changed pixels on screen,
    --create an image to draw on  ---------1
    if Picture then img2=image(camWidth,camHeight) end
    pushStyle()
    fill(255,255,0)
    if Picture then setContext(img2) end
    --analysis starts here  --------2
    local n,count=0,0 --n marks our place in the table, count adds up changed pixels
    local weightX,denomX,weightY,denomY=0,0,0,0 --weighted centre of moving pixels
    for i=1,camWidth,sampleSize do
        for j=1,camHeight,sampleSize do
            local r,g,b=img:get(i,j)
            --calculate luminosity
            local lum=r*lum_r+g*lum_g+b*lum_b
            n=n+1
            --compare with background
            local u=math.abs(bgTable[n]-lum)
            if u>pixelNoise then
                count=count+1
                --calculate weighted average of x and y of above-threshold changes
                weightX=weightX+u*i denomX=denomX+u
                weightY=weightY+u*j denomY=denomY+u
                if Picture then ellipse(i,j,3) end --draw marker on secondary image
            end
            --mix the background and current image  --------3
            --this is so the background gradually iterates toward the current image,
            --allowing any changes in the background to gradually be included
            bgTable[n]=bgTable[n]*(1-iterationWeight)+lum*iterationWeight
        end
    end
    AlertLevel=count*10/#bgTable --a parameter value roughly indicating activity level
    popStyle()
    if Picture then setContext() end
```

There are several stages in this code, which I’ve numbered like this ———1

The first stage is that if I’m going to overlay a picture showing dots where there was a change, I need to create an image to draw on, and that’s what the first part of the code does. (The variable Picture is a parameter you can turn on and off.)

The next stage is to loop through the new image in the same way we did for the initial background image, comparing the luminosity with the stored background figure. If it exceeds the threshold, we increase our count, and also add to our weighted totals that will be used to calculate average x,y. I am weighting the x,y position by the size of the change, so the big changes have more effect on the overall average.
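The weighted average can be illustrated with a small Python sketch (the tuples are hypothetical sampled changes, not output from the Codea program):

```python
def weighted_centre(changes):
    """changes: list of (x, y, diff) for pixels over the noise threshold.
    Larger luminosity differences pull the centre towards them."""
    wx = sum(d * x for x, y, d in changes)
    wy = sum(d * y for x, y, d in changes)
    total = sum(d for _, _, d in changes)
    return wx / total, wy / total

# a strong change at x=100 outweighs a weak one at x=10
print(weighted_centre([(10, 50, 20), (100, 50, 180)]))  # (91.0, 50.0)
```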

The third stage, once we’ve gone through all the pixels, is to update the background image by mixing it with the new image, as described earlier.

Finally, I set an alert level, which is just a crude burglar alarm. It just says “hey, something’s happening here”, but we want a bit more than that.

Unfortunately, this is where it gets really hard.

If we want to use this to detect specific things like hand movements, we still have some way to go, because

• it’s slow – the frame rate drops below 20, which makes it tough to fit a game in as well
• we need to pick up the direction of movement
• hand movements can be very noisy, not least because there is an arm attached

I’ve linked to code here which tries to pick up the direction of a swipe by calculating the centre of the moving pixels and seeing how they change from one redraw to the next. The program shows the moving pixels as yellow dots.

But this only works well if the only moving pixels are around your hand. Often, other things will be moving as well, and that distorts the average centre of the movement. Also, if your hand wanders off the edge of the screen and just a couple of moving pixels remain on the other side, they suddenly become the new average, and the centre jumps all the way across.
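The jump is easy to demonstrate with made-up numbers – a plain (unweighted) centre is enough to show the effect:

```python
def centre(xs):
    return sum(xs) / len(xs)

# changed pixels clustered around a hand near the right of the frame
hand = [300, 310, 320, 330]
print(centre(hand))   # 315.0

# the hand leaves the frame; only two stray pixels remain on the far left
stray = [20, 40]
print(centre(stray))  # 30.0 -- the centre leaps across the screen in one frame
```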

I tried another approach of “training” the program to recognise a hand by getting the user to take a picture of their hand, so the program can try to choose moving pixels that belong to a hand rather than anything else. But this is really difficult, and will take more work.

So I’m sorry I can’t give you a nice solution, but life is often like that.


Comments

Thanks for this project, it’s really helpful. I had written an app to manage my Foscam home cameras, and adding motion detection within the app might be fun.

• That’s good. I’m sorry the code isn’t cleaner, but I was still trying to find a good solution, and you know how it is – you try this and you try that, and your code suffers.

Basic motion detection is relatively easy, but tracking something like hand movement is much more difficult.