Filtering Primes


This is an attempt to explain the pattern I've found in prime numbers. More accurately, what I've found is not a pattern in the prime numbers themselves, but in the filtering out of composite numbers when searching for primes. I have no idea whether or not this has been done before. I assume so, but I have yet to find any references to it, so I write this out as my own work at this point.

This algorithm uses the Sieve of Eratosthenes to eliminate composite numbers from a set of natural numbers greater than 1 and less than a pre-defined limit; leaving us with a set of primes.

What I wanted to do with the first iteration of this is see if there is any pattern in the differences between numbers which are being unflagged that are not already unflagged. The first sequence in this case is obvious, as we're pulling out every multiple of two. The difference between each of the numbers we pull out is thus two, so we see the set {2, 2, 2, 2, 2.....}. When removing our next prime number, three, we have an equally simple series. We remove every odd multiple of three (as the even multiples have already been removed, being multiples of two). So we see the difference between them is {6, 6, 6, 6, 6, ...}.

When we start filtering out the next series, multiples of five, we see it becomes slightly more complicated. The first one we remove is 25, followed by 35, 55, 65, 85, 95, 115, and so on. The series of differences between them then is {10, 20, 10, 20, .....}. The sequences get more complicated with each one we filter out. When filtering out multiples of seven, for instance, we get this sequence: {28, 14, 28, 14, 28, 42, 14, 42} which repeats indefinitely.

The first thing that stands out is that all of these numbers are divisible by the number that is being filtered out. This makes perfect sense. We're filtering out multiples of the number, so of course the differences between those multiples will be divisible by it as well. The next thing to look at then is what those sequences look like when they are divided by the number that's being filtered out. This is where some interesting patterns start to emerge.

In order to see such patterns, I have written a program that goes through using the aforementioned method of finding primes, and keeps track of the differences between the numbers it unflags. It prints those differences out in a table like the one below. The one shown here is in fact the output of that program, but formatted to work a little better in an HTML document. A few things to note about this table:

Table A
P Delta / P
211111111111111111111111111111111111111111111111111111111111111111111111111111111...
322222222222222222222222222222222222222222222222222222222222222222222222222222222...
524242424242424242424242424242424242424242424242424242424242424242424242424242424...
742424626424246264242462642424626424246264242462642424626424246264242462642424626...
11242462642466264264684242486462462664246264242A2A242462642466264264684242486462462664246264242A2A...
13424626424662642646842424E462A26642462A242CA2424626466626426468424686A246266424
1724626424662642646842424E462A2664662A242CC42462A666264264
194626424662642646842424E462A2664662A242CC42462A66
23626424662642646842424E462A2664662A242C
2926424662642646842424E462A2664
316424662642646842424E462A26
37424662642646842424E4
4124662642646842424
434662642646842424
47662642646842
5362642646
59264264
616426
6742

You may note that the first sequences form repeating patterns. I have in fact highlighted the repeating segments to make this clear. I am quite certain that they all form repeating patterns, but the length of them becomes gargantuan extremely quickly. In fact, I tried to find the lengths of the first few patterns by observation, and here is what I have:

Table B
Prime:Pattern Length:
21
31
52
78
1148
13480
175760
1992160
231658880

In order to get the length of the pattern for #23, I had to test up to two billion primes. To count any higher than that, I will need to rewrite the program so that it does not require as much memory.

An interesting thing to note about those numbers is that each one is divisible by the preceding ones. In fact, if we divide each number by the preceding one, then we get some interesting results:

Notation
Symbol:Meaning
P(x) The Xth prime number, so P(1) = 2, P(2)=3, P(3)=5, P(4)=7 and so on.
P'(x) The rate of change of P(x), so P'(x) = P(x) - P(x - 1).
L(x) The length of the repeating pattern we see when filtering out P(x)
Table C
x: P(x): P'(x): L(x): F(x)=L(x)/L(x - 1) F'(x) = F(x) - F(x - 1)
1 2 1
2 3 1 1 1
3 5 2 2 2 1
4 7 2 8 4 2
5 11 4 48 6 2
6 13 2 480 10 4
7 17 4 5760 12 2
8 19 2 92160 16 4
9 23 4 1658880 18 2

At first glance the numbers in the rightmost column may seem unrelated. If we look a bit closer at them though, we can see that they are the same pattern as the differences between prime numbers. That is to say, the values in last column of the table above are equal to P(x - 1) - P(x - 2). I do not have a direct proof of this, and would need to see more data to be satisfied with it by observation, but from the data we have above, this does appear to work.

If that does work, then we can say that L(10), the length of the repeating series for P(10), would be 1658880 * (18 + 4) = 36495360. L(11) would be L(10) * (22 + 6) = 1021870080. Being already into the billions at that point, demonstrating it on a personal computer becomes increasingly difficult. Looking at those lengths as a series of equations, we get the following:

L(1) = 1
L(2) = L(1) * 1
L(3) = L(2) * [1 + P'(1)]
L(4) = L(3) * [1 + P'(1) + P'(2)]
L(5) = L(4) * [1 + P'(1) + P'(2) + P'(3)]
L(6) = L(5) * [1 + P'(1) + P'(2) + P'(3) + P'(4)]

To summarize those equations, we can say:

That's fairly interesting, but what really catches my eye is the fact that we see the same sequence of numbers occurring over and over again. To make that sequence easier to see, it is best if presented in the plain text format output by the program. I will go so far as to truncate some of the longest series, but the rest will remain untouched. All of the sequences ending with "..." are truncated. One thing I did in this program was to line up the output sequences so that common patterns are aligned with each other. That way it is easier to spot relationships between them.

If you would like the source code that produces this output, it is available here. I've been compiling it with gcc using:

gcc -Wall primepattern.c -O3 -o primepattern
This should work on any complier you use, as it only uses standard libraries.

Table D

[   2]0111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111...
[   3] 022222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222...
[   5]  02424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424...
[   7]   0424246264242462642424626424246264242462642424626424246264242462642424626424246264242462642424626424246264242462642424...
[  11]    0242462642466264264684242486462462664246264242A2A242462642466264264684242486462462664246264242A2A24246264246626426468...
[  13]     0424626424662642646842424E462A26642462A242CA2424626466626426468424686A246266424626426A2A242468424C2642646C2424864624...
[  17]      024626424662642646842424E462A2664662A242CC42462A666264264E424686A2462666462648A2A242468424C842646C242C6466626A24626...
[  19]       04626424662642646842424E462A2664662A242CC42462A6662642AE424E6A246266646848A2A242468424C84846C26C6466626A24626642CA...
[  23]        0626424662642646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C26C6A6626A6626642CA2466...
[  29]         026424662642646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  31]          06424662642646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  37]           0424662642646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  41]            024662642646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  43]             04662642646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  47]              0662642646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  53]               062642646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  59]                02642646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  61]                 0642646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  67]                  042646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  71]                   02646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  73]                    0646842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  79]                     046842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  83]                      06842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  89]                       0842424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[  97]                        042424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[ 101]                         02424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[ 103]                          0424E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[ 107]                           024E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[ 109]                            04E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[ 113]                             0E462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[ 127]                              0462A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[ 131]                               062A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[ 137]                                02A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[ 139]                                 0A2664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[ 149]                                  02664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662...
[ 151]                                   0664662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642CA24662
[ 157]                                    064662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6626642C
[ 163]                                     04662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A66266
[ 167]                                      0662A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626A6
[ 173]                                       062A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A6626
[ 179]                                        02A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6A
[ 181]                                         0A242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2I6
[ 191]                                          0242CC42462A6662642AE424E6A24686646848A2A26468424C84846C2
[ 193]                                           042CC42462A6662642AE424E6A24686646848A2A26468424C84846
[ 197]                                            02CC42462A6662642AE424E6A24686646848A2A26468424C8484
[ 199]                                             0CC42462A6662642AE424E6A24686646848A2A26468424C848
[ 211]                                              0C42462A6662642AE424E6A24686646848A2A26468424
[ 223]                                               042462A6662642AE424E6A24686646848A2A264
[ 227]                                                02462A6662642AE424E6A24686646848A2A26
[ 229]                                                 0462A6662642AE424E6A24686646848A2A2
[ 233]                                                  062A6662642AE424E6A24686646848A2
[ 239]                                                   02A6662642AE424E6A24686646848
[ 241]                                                    0A6662642AE424E6A24686646848
[ 251]                                                     06662642AE424E6A246866468
[ 257]                                                      0662642AE424E6A24686646
[ 263]                                                       062642AE424E6A246866
[ 269]                                                        02642AE424E6A2468
[ 271]                                                         0642AE424E6A2468
[ 277]                                                          042AE424E6A246
[ 281]                                                           02AE424E6A24
[ 283]                                                            0AE424E6A24
[ 293]                                                             0E424E6
[ 307]                                                              0424
[ 311]                                                               024
[ 313]                                                                04

Now the repeating series shows itself quite clearly. Once again, the sequence we see here is the differences between prime numbers. Please note that the zeros at the beginning of each line are a special case. Remember that what we're looking at is factors of the differences between numbers being unflagged. When the first one is unflagged, there is no number preceding it, and thus no difference. The zeros then, are the number shown when unflagging the first number in the sequence, which is the square of the number that is being filtered out.

Before continuing with this document, and trying to describe some observations, the legend will need to be extended:

Notation
Symbol:Represents:
P(x) The Xth prime number, so P(1) = 2, P(2)=3, P(3)=5, P(4)=7 and so on.
S(x) The series of numbers output by the program when filtering out P(x). For example, S(1) = {22222.....}, S(2) = {24242424.....}, S(3) = {4242462642424626...} and so on.
S(x, y) The y'th element of the series S(x)
D The series of differences between prime numbers, {1, 2, 2, 4, 2, 4, 2, ...} D(x) would represent the xth element of that series. In other words, D(x) = P(x + 1) - P(x)
** "To the power of", eg. 5**2 = 25.

Some observations

  1. With the exception of S(1), each number we see in these series is divisible by two. This makes perfect sense. It happens because all multiples of two have already been removed. This means that the numbers we are filtering out will all be odd ones. Because they are all odd numbers, the differences between them have to be even numbers. If you take the difference between any two odd numbers, it has to be an even number. In other words, if (x mod 2) = (y mod 2), then (Delta(x, y) mod 2) = 0. That is to say, if any two numbers divided by a third number return the same remainder, then the difference between those numbers will also be divisible by that third one.

  2. S(x) enters the D sequence in the xth position.

  3. All numbers in S(x) are less than P(x). This means that, when filtering out multiples of x, the largest difference between any two compound numbers which have not been removed already will be x * (x - 1).

  4. At any point where S(x + 1) differs from S(x), it does so by replacing two numbers in the series with their sum. For instance, if you look at S(4) again, and then at S(5), you can see this:

    S(4):42424626424246264242
    S(5):02424626424662642646

    You'll notice that the 6 in the twelfth character shown in S(5) replaces a 2 and a 4 in S(4). Then, as it progresses, another 6 replaces a 4 and a 2. I've yet to find any sequence that does not follow this behavior.

  5. The first iteration of S(x) that does not deviate from D is also the first one in which P(x)**3 is greater than the number up to which we're counting. For example, if we're counting up to 32768, then S(11) will be the last one that deviates, because P(11) = 31, and 32768**(1/3) = 32, which is greater than 31 and less than the next prime number, 37. This probably doesn't have much meaning or value from an abstract mathematical point of view, but is very good to know as far as writing an algorithm for handling primes is concerned. It means that the highest number we really need to do any calculations up to is the cube root of our maximum test value.

    Please note that this is not visible in the table above, as it is truncated to make it semi browser-friendly. In order to see this you will need to look directly at the output of the program

  6. If "y" represents the first element of P whose value is greater than P(x)**2, then S(x) deviates from D at the same point that S(y) enters it.

    That was a rather difficult sentence to compose and I'll need to reread it a few times to make sure I'm stating it properly. To explain a little more clearly, I'll try by example:

    You may note that S(4) (where we're filtering out the number 7) follows the series D up until S(4, 12). That is the same as D(21). D(21) also happens to be the point at which S(16) begins. S(16) is testing multiples of 53, which is the next prime greater than 7**2.