Fixing the Scoring Metric 03/05/2018

This week is all about making sure my scoring metric works, namely by getting rid of the static that keeps appearing. Each day I will start by making some changes, then test them, and eventually run the program for an extended period of time.

Day 1

To try and fix the scoring metric, the first thing I will try is adjusting how the program defines a medium-sized shape. I strongly suspect that this is off, and that some larger pixelated shapes are getting counted as medium. This would give the image an artificially inflated score.

For my first test I merely re-adjusted the definition of medium, basing it on the image that scored best with the survey participants. I had the program randomly generate 35 images to see which ones were getting rated the best.
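The post does not show how shapes are measured, so here is a minimal sketch of the idea, assuming a shape is a 4-connected region of same-colored cells and that "small", "medium", and "big" are decided by cell-count thresholds. The names `shape_sizes` and `classify`, the background value, and the threshold numbers are all hypothetical; redefining medium amounts to moving `medium_max`.

```python
from collections import deque

def shape_sizes(grid, background=0):
    """Cell counts of each 4-connected region of equal value, skipping background."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            # Breadth-first flood fill over neighbours with the same value.
            value, size = grid[r][c], 0
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if value != background:
                sizes.append(size)
    return sizes

def classify(sizes, small_max=4, medium_max=40):
    """Bucket shape sizes; the thresholds are placeholder values."""
    counts = {"small": 0, "medium": 0, "big": 0}
    for size in sizes:
        if size <= small_max:
            counts["small"] += 1
        elif size <= medium_max:
            counts["medium"] += 1
        else:
            counts["big"] += 1
    return counts

grid = [[1, 1, 0, 2],
        [1, 1, 0, 0],
        [0, 0, 0, 3]]
print(classify(shape_sizes(grid)))
print(classify(shape_sizes(grid), small_max=1, medium_max=3))
```

Changing only the thresholds changes how the same image is counted, which is why a badly placed medium boundary can let clumps of pixels inflate a score.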

The good news is that I saw a trend toward images with larger shapes scoring better; the bad news is that pixelated images still received scores. However, I have yet to see whether this is a problem in the long run.

The best: 0.5185208953077559
Pixelated Score: 0.0659560000685043

We can see from the above images that larger shapes are forming and getting rated better. However, the first generation tends to have images that are less pixelated and sparser than those generated later on. This could just be due to the pool the program has to judge from.

Testing this in the long run yielded the following results:

A pixelated mess with a score of: 0.5250059715377846

This is not at all what we wanted, and it is disappointing to see considering how many different automata were produced in this run-through.

I will try this once again, except this time I will increase the k-score the program uses by 1.

Once again, bad: 0.5337039721989159

Day 2

In light of yesterday's unsuccessful tests, I began today by changing the database so that we can see all four aspects of the score.

This should help me through the rest of the week when I am trying to develop a decent scoring metric.

database score breakdown

Now when the “View Best” button is clicked, we get not only the automata with their scores, but also a breakdown of why they got those scores. As we can see here, it is much easier for an automaton to score well on shape than on color.
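One way to store a breakdown like this is to save each sub-score in its own column and sort on the total when "View Best" is clicked. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the table name, column names, and helper functions are assumptions, and because the post only names the color and size components, only those two are shown rather than all four aspects.

```python
import sqlite3

def make_db():
    """Create an in-memory table holding each automaton's score breakdown."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE automata ("
        " id INTEGER PRIMARY KEY,"
        " score REAL NOT NULL,"
        " colorscore REAL NOT NULL,"
        " sizescore REAL NOT NULL)"
    )
    return conn

def save(conn, score, colorscore, sizescore):
    """Record one automaton's total score alongside its components."""
    conn.execute(
        "INSERT INTO automata (score, colorscore, sizescore) VALUES (?, ?, ?)",
        (score, colorscore, sizescore),
    )

def view_best(conn, n=5):
    """Return the n highest-scoring rows, breakdown included."""
    return conn.execute(
        "SELECT id, score, colorscore, sizescore FROM automata"
        " ORDER BY score DESC LIMIT ?",
        (n,),
    ).fetchall()
```

Keeping the components as separate columns, rather than only the total, is what makes it possible to see at a glance which half of the metric an automaton is winning on.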

Day 3

Day 3 did not happen; I did not work on the project that day.

Day 4

After having some time to sit down with the program again, I ran a test to see more specifically what scores the program was producing.

Score: 0.5045024828987243

With a breakdown of:

Colorscore: 0.48729443948165907

Sizescore: 0.5217105263157895

This means that the sizescore is still off, but from the above image we can see that the images are getting less pixelated. More medium-sized shapes are forming, but shapes larger than that are not.

I am going to run some tests where I increase the number of big shapes the program wants to find from 2 – 5 to 3 – 8.
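The post does not say exactly how the big-shape range feeds into the score. As a hypothetical illustration, `range_score` below gives full marks when the count falls inside the wanted range and decays as 1 / (1 + distance) outside it. Under that assumption, moving the range from 2 – 5 to 3 – 8 stops penalizing images that form more big shapes.

```python
def range_score(count, lo, hi):
    """Hypothetical: 1.0 inside [lo, hi], shrinking with distance outside it."""
    if lo <= count <= hi:
        return 1.0
    distance = lo - count if count < lo else count - hi
    return 1.0 / (1 + distance)

# An image with 7 big shapes under the old and the new range.
print(range_score(7, 2, 5))  # old range: penalized
print(range_score(7, 3, 8))  # new range: full marks
```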

Score: 0.44558746313763953

As we can see, that was a grand success, and the program ended up picking an image that was not pixelated at all, and formed many big shapes.

The breakdown of this was:

Colorscore: 0.2017809868813396

Sizescore: 0.6893939393939394

This is a score I would agree with. Obviously the shapes formed are much larger than before, but the colors are ugly. This is expressed in the score, where the colors are rated rather low.
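The logged totals line up with a simple average of the two sub-scores: for example, (0.48729443948165907 + 0.5217105263157895) / 2 ≈ 0.5045024828987243. The post does not state this formula, so treat the check below as an inference from the Day 4 numbers rather than the project's actual code.

```python
import math

def overall_score(colorscore, sizescore):
    """Inferred combination rule: the mean of the color and size scores."""
    return (colorscore + sizescore) / 2

# (score, colorscore, sizescore) as logged on Day 4.
DAY_4_RUNS = [
    (0.5045024828987243, 0.48729443948165907, 0.5217105263157895),
    (0.44558746313763953, 0.2017809868813396, 0.6893939393939394),
]

for total, color, size in DAY_4_RUNS:
    # Allow for floating-point rounding in the last digits.
    assert math.isclose(overall_score(color, size), total, rel_tol=1e-12)
```

If this reading is right, a very high size score can carry a weak color score (and vice versa), which matches the second run: ugly colors at about 0.20, but an overall score of about 0.45.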

Previously, I used an extra, heavy-handed function to punish images that had more than a certain number of small shapes. Now I will go back and test whether this approach is pushing the size scores up by a significant amount.

It does to a degree, but it is still possible for the color scores to catch up.
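The original penalty function is not shown in the post. A minimal sketch of the idea, with a hypothetical `strength` knob: every small shape past `limit` divides the size score further, so a large `strength` plays the role of a heavy-handed penalty and a small one a milder version.

```python
def penalize_small(sizescore, small_count, limit, strength):
    """Hypothetical penalty: shrink sizescore for each small shape over limit."""
    excess = max(0, small_count - limit)
    return sizescore / (1 + strength * excess)

# Same image (30 small shapes, limit of 20) under two penalty strengths.
heavy = penalize_small(0.6, small_count=30, limit=20, strength=0.5)
mild = penalize_small(0.6, small_count=30, limit=20, strength=0.05)
print(heavy, mild)
```

Images under the limit are left untouched, so the penalty only bites once an image is dominated by small, pixelated shapes.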

I set the range for large shapes to 3 – 7 and got rid of the extra function dealing with small shapes.

This made the shapes once again pixelated.

Score: 0.4711084607047361

Tomorrow I will look at this again and try to correct it in a way that doesn’t overwhelm the color score of the image.

Day 5

I am so close to getting a balanced scoring metric; I just need a small push. To do this, I am going to make it so that the only way for an image to get a perfect chi metric for big shapes is for it to have exactly 7 big shapes, like the example from the survey.
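Assuming "chi metric" here means a chi-square style comparison between the observed number of big shapes and the survey example's count, one hypothetical way to make exactly 7 the only perfect result is to turn the term (observed - expected)² / expected into a score with 1 / (1 + χ²). The function names and the mapping are illustrative, not the project's code.

```python
def chi_term(observed, expected):
    """One chi-square term: squared deviation scaled by the expected count."""
    return (observed - expected) ** 2 / expected

def big_shape_score(big_count, target=7):
    """Hypothetical mapping: 1.0 only when big_count == target."""
    return 1.0 / (1.0 + chi_term(big_count, target))

for n in (5, 6, 7, 8, 9):
    print(n, round(big_shape_score(n), 3))
```

Counts the same distance from 7 on either side score identically, and no count other than 7 reaches 1.0.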

Since I took out the heavy-handed score for small shapes, I re-implemented a smaller, milder approach. We will see if this is enough for the program to keep choosing large shapes.

Looking through the results, we can see that this metric is better, but still far from perfect.

This was the best, with a score of: 0.528529487141361

The breakdown was about 0.5 for each of the size and color scores. However, looking at some of the runners-up revealed that some pixelated automata are almost making the cut.

Below are some of the other automata that the program liked.

As you can see, the program is still confused. I am going to retry an edited version of the heavy-handed approach with some of my changes, and while that runs, look for anything strange in how the automata or the seeds are formed. My hope is that the overabundance of pixelated automata is caused by an error that can be set right.

I moved some things around, but I am not really sure it made any difference.

For the rest of the night I am going to run 2 – 3 longer tests. These will be very hands-off, and when I review them tomorrow I should have a better idea of whether this is a sufficient scoring metric to use.

I will also be purchasing poster board today.

Day 6

16 poster boards were purchased, and I also obtained clear packing tape and fishing wire to hang them up. All that is left is creating my 8 images, printing and gluing them, and typing up some text for the poster boards and brochures.

It is important to note that I will not be choosing these images. I will be running the program exactly 8 times, for the same number of generations. Whatever I receive from these runs is final, unless I make changes and produce exactly 8 more.

This is why it is so important that I get this metric working correctly.

The tests were not very insightful, but I have one small suspicion that I am currently following. I have set the starting iterations for my automata to the low number of 2, which makes it nearly impossible for static to form. I have also set the iterations to remain the same through mutation. If I run through a large number of generations and the automata start getting more complex, this is a sign that somewhere along the line iterations are stacking, and I am no longer getting accurate automata.

There is definitely something going on with my automata, as demonstrated by the difference in how the generations form.

baby automatas

In the original generation, none of the automata have static. It is very rare for them to even touch. This is because their iteration count is so low.

bad generation

It looks like mutation is perhaps doing something strange with its iterations. Moving forward through the generations, crossover tends not to produce as much static. However, the last generation does contain crossover with static. It is perfectly possible that a flawed mutation was later used in crossover, causing this. For now I will review my mutation code to see what might be causing this. I am officially calling this a bug.

The bug

Some images appear to have an overabundance of iterations, yet according to what gets put in the database, these automata officially have an iteration count of ‘2’.

It turned out to be where the program called cell.change, which mutates the cell. The cell did change, but the call also caused the cell to be created again, which was incredibly problematic considering that creation also happens later down the line. This also explains why the problem only occurred in mutation. Removing this call allowed the program to function correctly.
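The real `cell.change` code is not shown, so the sketch below is a hypothetical reconstruction of the failure mode described above: a mutation method that also re-runs creation, after which creation runs again as normal. The class, `create`, and the `steps_applied` counter are illustrative; the point is that the stored `iterations` stays at 2 while the number of growth steps actually applied doubles.

```python
import random

class Cell:
    """Illustrative stand-in for one automaton, not the project's real class."""

    def __init__(self, iterations, color, buggy=False):
        self.iterations = iterations  # the value that would be saved to the database
        self.color = color
        self.buggy = buggy
        self.steps_applied = 0        # growth steps that actually ran

    def create(self):
        # Grow the automaton by applying its rule `iterations` times.
        for _ in range(self.iterations):
            self.steps_applied += 1

    def change(self, rng):
        # Mutation: pick a new color. The buggy version also calls create(),
        # so its growth steps stack on top of the later create() call.
        self.color = rng.randrange(256)
        if self.buggy:
            self.create()

def mutate_then_create(cell, rng):
    """Mutate a cell, then create it as the pipeline would later on."""
    cell.change(rng)
    cell.create()
    return cell.steps_applied

rng = random.Random(0)
print(mutate_then_create(Cell(2, 10, buggy=True), rng))   # stored 2, applied 4
print(mutate_then_create(Cell(2, 10, buggy=False), rng))  # stored 2, applied 2
```

Because only the mutation path calls `change`, only mutated automata (and anything later bred from them through crossover) would carry the extra growth, which is consistent with where the static showed up.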

With iterations set to four.

It turns out it was more effective to fix a bug than to try to make huge changes to the scoring metric!

Now I can safely run a few more tests to confirm the scoring metric works, and I should be good to go for producing my eight images Monday!

This is good, since I need to be printing them Wednesday.

I ran a test with 6 iterations per automaton and 35 starting images.

I am very content with the results from this test. I find the chosen automaton aesthetically pleasing and a good fit for my new metric. Beyond that, after running seven generations the program came to a clear decision on what it liked.

Score: 0.5049935919222249

With a breakdown of:

Colorscore: 0.48755128640855244

Sizescore: 0.5224358974358975

The colors are clearly similar to each other and fulfill our metric. In the final generation, we can see just how much the program likes this color scheme:

Final generation

It makes perfect sense that this is what would form as the program slowly narrows down on what sort of automata it likes. It is my hope that I will see similar results in future runs of this.
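For readers unfamiliar with the overall loop, here is a generic genetic-algorithm sketch using this run's settings (35 starting individuals, seven generations). It is not the project's code: the genome, fitness function, `keep` count, and mutation rate are placeholders, with a genome of three color values and a fitness that rewards colors close together, loosely echoing the color-similarity behavior above. Keeping the top individuals each generation (elitism) means the best score can never drop, which is one way a population can narrow in on a single look.

```python
import random

def evolve(fitness, make_genome, crossover, mutate,
           pop_size=35, generations=7, keep=10, mutation_rate=0.3, seed=None):
    """Generic GA with truncation selection and elitism; returns (best, history)."""
    rng = random.Random(seed)
    population = [make_genome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:keep]                  # survivors carried over unchanged
        children = []
        while len(children) < pop_size - keep:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

# Placeholder problem: three 0-255 color values that should match each other.
def make_genome(rng):
    return [rng.randrange(256) for _ in range(3)]

def crossover(a, b, rng):
    return [rng.choice(pair) for pair in zip(a, b)]

def mutate(genome, rng):
    child = list(genome)
    child[rng.randrange(len(child))] = rng.randrange(256)
    return child

def fitness(genome):
    return -(max(genome) - min(genome))  # 0 is best: all colors identical

best, history = evolve(fitness, make_genome, crossover, mutate, seed=1)
print(best, history)
```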


  • This week is image generating week
  • Next week is poster week
    • 16 posters with score underneath
    • Poster should walk through the process of scoring
    • Flyer should explain how the images were formed and the point of the project + genetic algorithm
