# 248. Optimising speed in Codea

Codea is amazingly fast. For example, it can run the tan function (which is not simple) between 100,000 and 600,000 times per second, depending on your iPad.

This means you rarely have to worry about speed in most projects. But if you push Codea so hard that the frame rate falls below 60 frames per second, you may need to improve it. And because some things run faster than others, there are some tricks you can use, as shown below.

### 1. Don’t optimise

This surprising advice comes from someone who should know – the main developer of Lua, the language behind Codea – and many other experts agree.

What they mean is that, rather than fiddling about with optimising the code itself, you can usually get much bigger improvements by changing what your code is doing.

For example, if I am drawing 1,000 spaceships on the screen at once, and I am using readImage in the draw loop to get them from disk, I will get much better performance by reading the image into memory once in the setup function, and spriting it in draw. If I want even better performance, I can put the spaceships into individual meshes, and to improve it yet again, I can put them all in a single mesh. Obviously, you need greater skills to do these things, so you should try to learn as much as you can, especially from other people and their code.
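The first step of that example, loading once instead of every frame, can be sketched in plain Lua. This is only an illustration: loadImage is a hypothetical stand-in for an expensive disk read such as readImage, the counter replaces the real sprite call, and the loop at the bottom imitates Codea calling draw 60 times in a second.

```lua
-- Hypothetical stand-in for an expensive disk read such as readImage
local loadCount = 0
local function loadImage(name)
    loadCount = loadCount + 1
    return { name = name }
end

local shipImage   -- cached image, filled in once by setup
local drawn = 0   -- counts how many ships we "drew"

function setup()
    shipImage = loadImage("spaceship")   -- expensive work done once
end

function draw()
    -- reuse the cached image; do NOT call loadImage in here
    for i = 1, 1000 do
        if shipImage then
            drawn = drawn + 1   -- in Codea this would be the sprite call
        end
    end
end

-- imitate one second of Codea's frame loop
setup()
for frame = 1, 60 do
    draw()
end
print(loadCount, drawn)
```

However many frames run, the image is only loaded once, which is the whole point.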

Similarly, if you are working in 3D, you may have hundreds of trees or other objects in a scene. Normally, you can leave it to OpenGL to cull (ie ignore) objects that are out of view of the camera, but if performance is suffering, there may be something you can do. Although OpenGL is quicker than Codea, it has to look at every vertex one at a time, whereas we can look at the whole object to decide whether to draw it or not. So if we have an object with 100 vertices, we can save OpenGL 100 vertex checks by testing whether the object is in the field of view. Even billboard objects, with only 6 vertices, are worth checking for visibility in this way.
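A whole-object visibility check can be very simple. The sketch below is one possible approach, not Codea's or OpenGL's own method: it treats the field of view as a cone around the camera's view direction and only tests the object's centre, so you would use a generous angle to avoid objects popping at the edges. The function name isInView and the field names (x, y, z, radius, and fx, fy, fz for a unit-length view direction) are all made up for the example.

```lua
local sqrt, cos, rad = math.sqrt, math.cos, math.rad

-- cam: position x,y,z and unit view direction fx,fy,fz
-- obj: centre x,y,z and a rough radius
-- halfFovDeg: half the viewing angle, in degrees
local function isInView(cam, obj, halfFovDeg)
    local dx, dy, dz = obj.x - cam.x, obj.y - cam.y, obj.z - cam.z
    local dist = sqrt(dx*dx + dy*dy + dz*dz)
    -- camera is inside (or touching) the object, so always draw it
    if dist <= obj.radius then return true end
    -- cosine of the angle between the view direction and the object
    local c = (dx*cam.fx + dy*cam.fy + dz*cam.fz) / dist
    return c >= cos(rad(halfFovDeg))
end

-- camera at the origin, looking down the negative z axis
local cam = { x = 0, y = 0, z = 0, fx = 0, fy = 0, fz = -1 }
local ahead  = { x = 0,  y = 0, z = -10, radius = 1 }
local behind = { x = 0,  y = 0, z = 10,  radius = 1 }
local side   = { x = 10, y = 0, z = -1,  radius = 1 }

print(isInView(cam, ahead, 45), isInView(cam, behind, 45), isInView(cam, side, 45))
```

In a draw loop you would call this once per object and skip drawing whenever it returns false.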

So optimising what work you do is probably going to give you much greater improvements than optimising how you do it.

But if you’ve done all that, and your program still isn’t fast enough, read on.

### 2. Localise

Codea’s built in functions are held in a table named _G. Looking them up takes time and “localising” them can improve performance dramatically.

Suppose we have some code that uses math.sin and math.rad frequently.

    s = math.sin(math.rad(angle))

We can make it run much faster (about 3x faster on my iPad!) by putting this “localisation” code first

    local sin,rad=math.sin,math.rad

and then using this code for our calculation

    s = sin(rad(angle))

This is because of the way Lua stores references. Local variables can be looked up much more quickly than global variables.

Important note – there is no point “localising” every time we calculate sin, because Codea has to look up the math.sin function in _G before it can localise it. *So you don’t save any time by just using it once*. The trick is to localise once, and then use the local value many times.

So the best place to put the localisation code is often at the very top of the code tab where you are going to do the calculations (or maybe just before the function that is going to use the localisation). Then the localisation lookup will only be done once, when Codea starts.
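Here is a small plain-Lua sketch of that layout, with the localisation at the top and two otherwise identical functions to compare. The function names and the loop size are just for illustration, and the timings it prints will depend on your device, so don't expect them to match my figures.

```lua
-- Localise once, at the top of the tab
local sin, rad = math.sin, math.rad

-- Looks up math, then sin and rad, on every pass through the loop
local function sumGlobal(n)
    local total = 0
    for angle = 1, n do
        total = total + math.sin(math.rad(angle))
    end
    return total
end

-- Uses the local copies made above
local function sumLocal(n)
    local total = 0
    for angle = 1, n do
        total = total + sin(rad(angle))
    end
    return total
end

-- Returns the CPU time taken by fn(n), followed by its result
local function timeIt(fn, n)
    local start = os.clock()
    local result = fn(n)
    return os.clock() - start, result
end

local tGlobal = timeIt(sumGlobal, 1000000)
local tLocal = timeIt(sumLocal, 1000000)
print(string.format("global: %.3fs   local: %.3fs", tGlobal, tLocal))
```

Both functions return exactly the same answer; only the lookup cost differs.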

Does the same apply to the functions we write ourselves, since they are also stored in the global table? I tested a function that used sin and rad, to see whether it made a difference if it was put in a different tab from the code that used it, and whether it made a difference if it was localised. The result was that it didn’t seem to make any difference at all. Go figure.

### Use shortcut functions

Lua has several shortcuts which are faster than math functions. These shortcuts date back to the early days when there were no such things as math libraries, and one character function codes saved space.

Each of these shortcuts is up to 2x faster

- 13.5//3 = 4 instead of math.floor(13.5/3)
- 3^4 = 81 instead of math.pow(3,4)
- 13%3 = 1 (remainder) instead of math.fmod(13,3)

But this one is about the same speed

4^0.5 = 2 instead of math.sqrt(4)

These results vary for different iPads, as you will see below.
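If you want to check that a shortcut really gives the same answer before swapping it in, a few lines of plain Lua will do it. This sketch assumes Lua 5.3 or later (the // operator doesn't exist before that), and it leaves out math.pow because not every Lua 5.3 build includes it. The last line shows one thing to watch: % and math.fmod disagree when the number is negative.

```lua
-- Assumes Lua 5.3 or later, because of the // operator
local floorDiv  = 13.5 // 3               -- shortcut
local floorFunc = math.floor(13.5 / 3)    -- library function
print(floorDiv, floorFunc)

local power = 3 ^ 4                       -- shortcut for "3 to the power 4"
print(power)

local remOp   = 13 % 3                    -- shortcut
local remFunc = math.fmod(13, 3)          -- library function
print(remOp, remFunc)

local rootOp   = 4 ^ 0.5                  -- shortcut
local rootFunc = math.sqrt(4)             -- library function
print(rootOp, rootFunc)

-- Careful: % takes the sign of the divisor, fmod takes the sign of the number
print(-7 % 3, math.fmod(-7, 3))
```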

### Loops and Tables

If you have a table T with items T[1], T[2], etc, then for i=1, #T do is more than twice as fast as for i, j in pairs(T) do.

If you are inserting items in a table T, then T[#T+1]=a seems to be about the same speed as table.insert(T,a). However, other testers have reported that the first method is faster.
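The sketch below puts both comparisons side by side in plain Lua, so you can time them on your own device. The table size and function names are arbitrary choices for the example, and the timings are whatever your machine gives, so treat them as rough.

```lua
local insert = table.insert   -- localised, as suggested earlier
local N = 100000

-- Two ways of appending to a table
local T = {}
for i = 1, N do
    T[#T + 1] = i
end

local U = {}
for i = 1, N do
    insert(U, i)
end

-- Two ways of looping over a table
local function sumNumeric(t)
    local total = 0
    for i = 1, #t do
        total = total + t[i]
    end
    return total
end

local function sumPairs(t)
    local total = 0
    for _, v in pairs(t) do
        total = total + v
    end
    return total
end

-- Returns the CPU time taken by fn(t)
local function timeIt(fn, t)
    local start = os.clock()
    fn(t)
    return os.clock() - start
end

print(string.format("numeric for: %.4fs   pairs: %.4fs",
    timeIt(sumNumeric, T), timeIt(sumPairs, T)))
```

Remember that the numeric loop only works for tables indexed 1, 2, 3 and so on; for tables with other kinds of keys you still need pairs.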

### Things that don’t matter

The usual advice that multiplying is faster than dividing doesn’t seem to apply. There isn’t a significant difference.

### Speed of built in functions

I carried out tests on many built in functions, using an iPad Mini 1, and an iPad Air 2. The results are given below, as the number of times each function can run in one frame, which is 1/60 of a second. The bigger the number, the better.

Please note the speeds are only approximate, so ignore small differences.
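If you want to produce figures like these for your own device, one possible way (not necessarily how my tests were written) is to count how many calls fit into 1/60 of a second. The sketch below uses os.clock, which is standard Lua; the name callsPerFrame and the batch size of 100 are just choices for the example.

```lua
local clock = os.clock
local FRAME = 1 / 60   -- length of one frame, in seconds

-- Counts roughly how many times fn(arg) can run in one frame.
-- Calls are made in batches of 100 so the clock check itself
-- doesn't dominate the measurement.
local function callsPerFrame(fn, arg)
    local count = 0
    local start = clock()
    repeat
        for _ = 1, 100 do
            fn(arg)
        end
        count = count + 100
    until clock() - start >= FRAME
    return count
end

print("tan:  ", callsPerFrame(math.tan, 0.5))
print("sqrt: ", callsPerFrame(math.sqrt, 2))
```

Run it a few times and take a rough average, because the numbers jump around from one run to the next.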

Shortcut functions

First, the shortcuts I talked about above. This table compares them with the built in functions, for the two iPads, and then I compare the two iPads.

Trigonometry functions

This time, I’ve compared the two iPads.

Other math functions

Vector functions

All figures are for vec3, ie (x,y,z)

Can you see how slow the vector functions are, compared to the other functions? (Of course, they are still very fast, considering these numbers are for just 1/60 of a second.)

The relative slowness is because of the overhead of creating the vector, as Codea’s developer, Simeon, explains:

> The problem is that vec2 (and friends) are Lua “user data” objects, which are basically Lua-managed interfaces to C code and data. The problem is that performing operations on them needs to return a result, which needs to ask Lua to create a new user data. This causes Lua to allocate memory with its memory allocator. And that’s the source of the performance difference.

So I tried doing the calculations without vectors by “decomposing” (ie splitting) the vectors into three numbers and doing the math separately on each of them. So v1+v2 becomes x1+x2,y1+y2,z1+z2.

And this is the result, which is quite dramatic.

It means we can improve basic vector calculation speed by 5-6 times by decomposing the vectors. There isn’t much improvement for the length/distance functions.

The problem, of course, is that the main reasons we use vectors are to keep our code tidy, to use the special vector functions, and to avoid having to do calculations for x, y and z separately. We can possibly reduce the clutter by writing functions for our decomposed calculations, like this

    --normal vector code
    v1 = vec3(x1,y1,z1)
    v2 = vec3(x2,y2,z2)
    v3 = v1 + v2

    --decomposed addition
    x3,y3,z3 = x1+x2, y1+y2, z1+z2

    --using a function for the decomposed addition
    --to make the code a little more readable
    function VecAdd(x1,y1,z1,x2,y2,z2)
        return x1+x2, y1+y2, z1+z2
    end

    x3,y3,z3 = VecAdd(x1,y1,z1,x2,y2,z2)

However, you still have to manage 3 times as many variables, and the extra function call can halve the speed.

Unless you abandon vectors completely and just work with x, y and z values (which could make debugging much more difficult), the best way to take advantage of the greater speed of decomposed vectors is probably where you are doing a lot of calculations with the same vectors. In that case you can decompose them into x, y, z once at the start, and then do all the calculations very quickly. This is similar to the suggestion for localising functions earlier, ie it’s not worth doing unless you have a lot of calculations to do.
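As a rough sketch of that "decompose once, calculate many times" idea, here is a plain-Lua function that moves a position along a velocity for a number of steps. Because vec3 only exists inside Codea, ordinary tables with x, y, z fields stand in for the vectors here; the function name moveSteps and its parameters are made up for the example.

```lua
-- pos and vel are anything with x, y, z fields
-- (a vec3 in Codea, a plain table in this sketch)
local function moveSteps(pos, vel, steps, dt)
    -- decompose once, before the loop
    local px, py, pz = pos.x, pos.y, pos.z
    local vx, vy, vz = vel.x, vel.y, vel.z

    -- all the repeated work is done on plain numbers,
    -- so nothing new has to be created inside the loop
    for _ = 1, steps do
        px, py, pz = px + vx * dt, py + vy * dt, pz + vz * dt
    end

    -- hand back the three numbers; the caller can rebuild a vector once
    return px, py, pz
end

local pos = { x = 0, y = 0, z = 0 }
local vel = { x = 1, y = 2, z = 3 }
local x, y, z = moveSteps(pos, vel, 4, 0.5)
print(x, y, z)
```

In Codea you could wrap the three returned numbers back into a vec3 at the end, so the rest of your code still sees a tidy vector and only one new vector is created instead of one per step.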

### Finally

I’ll end by repeating the main points

- Codea is extremely fast, so you will rarely need to worry about performance
- Start by trying to reduce the work done by Codea, before you look at the code you are using
- Localise, localise, localise
- Use shortcut functions
- Think about the options for vector functions,

and, of course, save up for a faster iPad!