This is a raw dump of brainstorming from a hack session with Threebean.


    $ sudo yum install python-fedmsg-meta-fedora-infrastructure
    $ hub clone ralphbean/fedora-stats-tools

The Longtail Metric

Though this was only about 90 minutes of cycling, it is the part that is burned most into my brain. This metric is all about helping identify how "flat" the message distribution is, to avoid uneven burnout mode... i.e., take the agent generating the most messages within a time frame (the "Head") and the agent generating the fewest messages in that time frame (the "Tail"), and draw a line between them. The flatter that line is, the more evenly the generated messages are spread across contributors. Still unclear? Me too ;) Here's some python instead:

Longtail.analyze at

    import collections
    import json
    import time

    import requests

    import fedmsg.config
    import fedmsg.meta

    config = fedmsg.config.load_config()

    start = time.time()
    one_day = 1 * 24 * 60 * 60
    whole_range = one_day
    N = 50

    def get_page(page, end, delta):
        url = ''
        # page/end/delta match the query parameters used in the loop below
        response = requests.get(url, params=dict(
            page=page,
            end=end,
            delta=delta,
        ))
        data = response.json()
        return data

    results = {}
    now = time.time()

    for iteration, end in enumerate(range(*map(int, (now - whole_range, now, whole_range / N)))):
        results[end] = collections.defaultdict(int)
        data = get_page(1, end, whole_range)
        pages = data['pages']

        for page in range(1, pages + 1):
            print("* (", iteration, ") getting page", page,
                  "of", data['pages'], "with end", end,
                  "and delta", whole_range)
            data = get_page(page, end, whole_range)
            messages = data['raw_messages']

            for message in messages:
                users = fedmsg.meta.msg2usernames(message, **config)
                for user in users:
                    results[end][user] += 1


    with open('foo.json', 'w') as f:
        f.write(json.dumps(results))

Longtail.analyze at

    import json

    import pygal

    comparator = lambda item: item[1]

    with open('foo.json', 'r') as f:
        all_data = json.load(f)

    for timestamp, data in all_data.items():
        for username, value in data.items():
            all_data[timestamp][username] = float(value)

    timestamp_getter = lambda item: item[0]

    sorted_data = sorted(all_data.items(), key=timestamp_getter)

    results = {}

    for timestamp, data in sorted_data:
        head = max(data.items(), key=comparator)
        tail = min(data.items(), key=comparator)

        x1, y1 = 0, head[1]
        x2, y2 = len(data), tail[1]

        slope = (y2 - y1) / (x2 - x1)
        intercept = y1

        metric = 0

        data_tuples = sorted(data.items(), key=comparator, reverse=True)

        for index, item in enumerate(data_tuples):
            username, actual = item
            # line formula is y = slope * x + intercept
            ideal = slope * index + intercept
            diff = ideal - actual
            metric = metric + diff

        print("%s, %f" % (timestamp, metric / len(data)))
        results[timestamp] = metric / len(data)

    chart = pygal.Line()
    chart.title = 'lol'
    chart.x_labels = [stamp for stamp, blob in sorted_data]
    chart.add('Metric', [results[stamp] for stamp, blob in sorted_data])
    chart.render_to_file('longtail.svg')
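The metric loop can be boiled down to a small standalone function to see the idea on toy data; the usernames and counts here are invented purely for illustration.

```python
def longtail_metric(counts):
    """Average gap between actual per-user counts and the head-to-tail line.

    A perfectly flat distribution scores 0; the bigger the score, the more
    the middle of the pack sags below the line drawn from the busiest
    contributor (the "Head") down to the quietest (the "Tail").
    """
    head = max(counts.values())
    tail = min(counts.values())
    # Line from (0, head) down to (len(counts), tail).
    slope = (tail - head) / len(counts)
    ranked = sorted(counts.values(), reverse=True)
    gaps = (slope * index + head - actual
            for index, actual in enumerate(ranked))
    return sum(gaps) / len(counts)

# Invented toy data: three contributors with uneven activity.
print(longtail_metric({'alice': 10.0, 'bob': 6.0, 'carol': 2.0}))
```

Equal counts for everybody put every point exactly on the line, so the score is 0.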

Stuff to build/consider next?

Radar Charts

We must be concerned with normalizing the data, because koji will always have the highest message volume. This is done by:

  1. querying all messages of a type, to get the total
  2. querying just that user's messages of that type
  3. dividing user messages by total messages
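Those three steps amount to a per-topic ratio. A minimal sketch, assuming the per-user and total counts have already been fetched (the dicts and their values here are made up):

```python
def normalize(user_counts, total_counts):
    # Per-topic share: the user's message count divided by the topic total,
    # so high-volume topics like koji no longer dominate by raw magnitude.
    return {topic: user_counts.get(topic, 0) / float(total)
            for topic, total in total_counts.items()
            if total}

# Made-up counts: 50 of 10000 koji messages vs 5 of 100 git messages.
shares = normalize({'koji': 50, 'git': 5}, {'koji': 10000, 'git': 100})
print(shares)
```

Raw counts would say koji is 10x the activity; the normalized shares say git is the bigger slice for this user.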

Daily +/-
just the diff of topic counts
Weekly +/-
just the diff of topic counts
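That diff could be as simple as subtracting one period's per-topic counts from the next; the topic names and counts below are invented.

```python
def topic_count_diff(current, previous):
    # Signed change per topic; a topic missing from a period counts as 0.
    topics = set(current) | set(previous)
    return {topic: current.get(topic, 0) - previous.get(topic, 0)
            for topic in topics}

# Made-up daily counts: koji is up 20 messages, git is down 5.
print(topic_count_diff({'koji': 120, 'git': 10}, {'koji': 100, 'git': 15}))
```

The same function covers the weekly version; only the width of the two windows changes.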