Geek Out Blog

2nd July 2012

The Death of Demographics

Author: Chris Thorne of Spot Influence

A few months ago, our Co-Founder Dave Angulo gave a presentation at the Defrag conference on “The Death of Demographics” [See SlideShare deck at bottom]. Recently, I was chatting with the president of an advertising agency, and he mentioned a similar theme. When describing our data, he felt that our technology could revolutionize the industry by providing access to people data at scale, eliminating the need for generalizations about large audiences. We’ve gotten a really positive response from people regarding this topic, so I think it’s time for a blog post…

Traditionally, demographic analysis has involved dealing with generalized abstractions of people, and then picking a subset to ask detailed questions in order to understand them better. A great example of this is what Time magazine calls, The Smartphone Mom. According to Time, “Web firms are collecting personal information about moms, including what times of the day they’re logged on, if they are connected from home or on the road, and how often browsing turns into a purchase. Firms such as Procter & Gamble, Walt Disney, Comcast and AT&T… want to use that kind of data to tailor ads to that demographic.”

Demographic data can be very useful for marketers, but it’s limited in the sense that the insights are only generalizations. For example: “52% of Moms said they use their smartphone within 5 minutes of waking up” (Life360). As a business, don’t you want to know who those 52% actually are? Specifically, what are their interests & where can you engage with them online? Are there influencers within that specific group of Moms?

These signals have existed online for some time, but up until now, technology has prevented them from being analyzed at scale. Generalizations were made because understanding the interests of millions of people, at the granular level of each individual, was impossible. There was simply too much detailed data to provide businesses with actionable insights. With the technology we’ve developed, we’re about to change that…

By organizing the web around people, Spot Influence is able to determine who the influencers are for any subject/search online, where these individuals post public content (Twitter, Blogs, LinkedIn, etc.), and what their interests & influential topics are. This data has the potential to drastically change how companies market their product & gain insights into their actual audience. Generalizations no longer need to be made. We can’t wait to see the impact this technology has on Marketing!

Check out our presentation from Defrag 2011 below, and let us know your thoughts on this post.

Death of Demographics

25th June 2012

What is DMARC and Why Should You Care?

Author: Carly Brantz of SendGrid

DMARC is an acronym for “Domain-based Message Authentication, Reporting & Conformance,” a new standard that makes it easier for ISPs to prevent messaging abuse in the email ecosystem. With malicious email on the rise, consumers are having increasing difficulty identifying legitimate email. As a result, they unwittingly give up their personally identifiable information, which exposes them to financial havoc and ultimately reduces trust in the channel and in the brand.

DMARC is the next evolution of email authentication, a process by which ISPs can better distinguish legitimate mailers from spammers. Essentially, DMARC allows email senders to specify how ISPs should treat emails that fail, or do not carry, SPF or DKIM authentication. Senders can opt to have those emails sent to the junk folder or blocked altogether. By doing so, ISPs can better identify spammers and prevent malicious email from invading consumer inboxes, while minimizing false positives and providing better authentication reporting for greater transparency in the marketplace.
This is just another great example of email senders and ISPs working together to protect the email channel. To learn more about DMARC, visit the organization’s website. To learn more about authentication, read this blog post.
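Concretely, a sender publishes its DMARC policy as a DNS TXT record under the _dmarc subdomain of its sending domain. A minimal illustrative record (the domain and report address are placeholders) might look like this:

```
_dmarc.example.com.  IN  TXT  "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"
```

Here p=quarantine asks receiving ISPs to send failing mail to the junk folder (p=reject blocks it outright, and p=none only reports), while rua names the address where ISPs should send aggregate authentication reports.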

19th June 2012

Fixing Top 3 Issues with HTML5 Apps Connecting to Cloud Backends

Author: Mark van Seventer of Kinvey

What are your New Year’s resolutions?

I’ve always hated that question, not least because everyone tends to forget all about their resolutions once it’s February. However, when my aunt asked me this question last year, I replied: “I want to work with a company in the United States.”

Let me introduce myself. I’m Mark, a Computer Science student at VU University Amsterdam, The Netherlands. Over the past year I have been looking for a once-in-a-lifetime internship to conclude my studies. My patience paid off: as of today, I am the latest addition to the Kinvey team. Exciting! So, here we are, one year after I made that fateful resolution…

I will be working on making it really easy to connect HTML5 apps to cloud backends. Although iOS and Android apps are hugely popular at the moment, I am convinced HTML5 apps will get a growing market share in 2012. But, in order to make a real impact, HTML5 apps need to overcome a number of challenges to communicate with cloud backends. Here are my top 3:

1. Browser Compatibility for Mobile App Features

The HTML5 spec is far from complete. This makes it difficult for browsers to implement HTML5-specific features, since these are subject to change. Some browsers don’t even bother implementing certain features. Consider the WebSocket API, which allows HTML5 apps to receive push messages while active: it is not supported by the Android Browser.

How do we solve this? Depending on the importance of the feature, we could simply ignore certain browsers. However, things like push messaging often cannot be left out. So we need some kind of workaround, but only for browsers that don’t offer native support.

By using a library such as Modernizr, it is easy to detect whether an API is natively supported. If not, just use a workaround that mimics the API, and off you go. However, one thing that is often overlooked is that a workaround comes with a performance penalty.
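As an illustration of that detect-then-fall-back pattern, here is a minimal sketch (not Modernizr’s actual API; the function name and transport labels are made up for this example):

```javascript
// Pick a push transport based on what the environment natively supports.
// In a real app, Modernizr's feature checks and an actual fallback
// library would sit behind these two branches.
function chooseTransport(globalObj) {
  if (typeof globalObj.WebSocket === 'function') {
    return 'websocket';    // native support: no workaround needed
  }
  return 'long-polling';   // workaround, with its performance penalty
}
```

In a browser you would call chooseTransport(window); the stock Android Browser mentioned above would land on the long-polling path.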

Many libraries implement the above behavior, and most include workarounds even when they are not needed. The obvious tradeoff here is download size versus number of HTTP requests. For example, jQuery still supports Internet Explorer 6. When using jQuery for your app, it may be worth switching to a smaller, stripped-down build that only supports mobile browsers.

Furthermore, grouping different workarounds together (e.g. CSS3 transitions and CSS3 animations) is an interesting approach to decreasing the download size even more. Performance is a critical success factor of any app, so it will be very interesting to see how this develops in 2012.

2. Data Accessibility

Another big challenge is building an HTML5 library that converts the application’s data model into REST API calls. Since Kinvey targets a huge variety of apps, the library should be able to store any data model. An additional requirement is that all communication should happen asynchronously, to avoid blocking the browser.
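As a sketch of what such a conversion layer might do, the function below maps a generic save operation onto REST verbs, with the HTTP transport injected so the mapping itself stays testable. The URL scheme and field names are illustrative assumptions, not Kinvey’s actual API:

```javascript
// Map a data-model operation onto a REST call. A document with an _id
// already exists on the backend (PUT updates it in place); one without
// an _id is new (POST lets the backend assign an id).
function save(transport, collection, doc) {
  if (doc._id) {
    return transport('PUT', `/appdata/${collection}/${doc._id}`, doc);
  }
  return transport('POST', `/appdata/${collection}`, doc);
}
```

Injecting the transport also makes it easy to swap in an offline queue later without touching the model layer.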

Unfortunately, since the application runs in the browser, it cannot do any data processing once the application is closed. This can, and eventually will, lead to situations where data is not persisted correctly, perhaps because the Wi-Fi connection is slow.

To make things even more complicated, requests must happen in a certain order. For example, it should not be possible to delete a resource first and then edit it. However, due to the asynchronous nature of the requests, this may very well occur. This can be resolved using an asynchronous job queue, such as the one used by persistence.js.

However, unrelated requests should be pipelined in order to optimize performance. This seems tricky at first, but given a NoSQL data store, checking at the document level should be sufficient.
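The queue-plus-pipelining idea can be sketched in a few lines: serialize requests that touch the same document, but let requests for different documents run independently. This is an illustrative sketch, not persistence.js’s or Kinvey’s actual implementation:

```javascript
// Per-document job queue: operations on the same document run in order,
// while operations on different documents are pipelined independently.
class RequestQueue {
  constructor() {
    this.tails = new Map(); // document id -> tail promise of that doc's chain
  }
  enqueue(docId, task) {
    const tail = this.tails.get(docId) || Promise.resolve();
    const next = tail.then(task);
    // Store an error-swallowing tail so one failure doesn't jam the chain.
    this.tails.set(docId, next.catch(() => {}));
    return next;
  }
}
```

With this in place, an edit enqueued before a delete on the same document is guaranteed to reach the backend first, while traffic for other documents proceeds in parallel.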

3. Synchronization

Many applications will only succeed if they can be used offline. This also holds true for HTML5 apps. But how can we implement offline storage, and synchronize the offline data store with the remote backend once the app is back online? This is, in my opinion, the toughest challenge to overcome.

As you can imagine, synchronizing data involves a lot more than simply copying data from one store to another. What if two users alter the same piece of information? Or the data model changes? Both are realistic use cases that need to be taken into account. There are existing solutions that tackle one of these problems, but not the other. That will not work.

The way to overcome this challenge is to abstract the whole service so that it can also run locally. When both the remote and the local backend implement the same policies, synchronizing data is just a matter of following a defined set of rules. This holds true not only for HTML5 apps, but also for native apps.
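To make “a defined set of rules” concrete, here is one of the simplest possible policies, last-write-wins by timestamp, sketched as a merge function that both the local and the remote backend could share. The updatedAt field name is an illustrative assumption, not Kinvey’s schema:

```javascript
// Last-write-wins merge rule: whichever copy was modified most recently
// survives. Running the identical rule on both sides makes the outcome
// of a sync deterministic.
function mergeDocs(localDoc, remoteDoc) {
  return localDoc.updatedAt >= remoteDoc.updatedAt ? localDoc : remoteDoc;
}
```

Real systems need richer rules (field-level merges, schema migrations), but the principle stands: once both sides agree on the policy, syncing is mechanical.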

All in all, I can understand some people referring to HTML5 as immature. However, with some tweaks we can run HTML5 apps almost everywhere. And native support is only going to get better. For me, there is no reason not to create your app in HTML5. And with Kinvey’s upcoming support for HTML5 apps, it’s becoming much, much simpler.

Those are my top 3, what are yours?

11th June 2012

Face Detection Explained – Technology Overview

Co-authored by Jason Sosa of Immersive Labs and Dr. Stephen Moore

Face detection is a computer-vision technology that determines the location and size of a human face in an arbitrary (digital) image. The facial features in the image are detected, and any other objects like trees, buildings, bodies, etc. are ignored. The human face is a rich source of information — by looking at the person’s face, we can immediately identify whether the person is male or female, the person’s approximate age, facial expression, and so on. Face detection can be regarded as a more “general” case of face localization. In face localization, the task is to find the locations and sizes of a known number of faces.



Different Techniques

Face detection is one of the visual tasks that humans can do effortlessly. However, in computer-vision terms, this task is not easy. A general statement of the problem can be defined as follows: given a still or video image, detect and localize an unknown number (if any) of faces. The solution involves segmentation, extraction, and verification of faces, and possibly facial features, from an uncontrolled background. As a visual front-end processor, a face detection system should also be able to achieve this task regardless of illumination, orientation, or camera distance. There are different approaches to detecting a face in a given image. Below are just some of the techniques used for computer-vision face detection.

Finding faces by color

This is the approach where a face is detected using skin color. Given color images, it is possible to use typical skin color to find face segments. But this approach has a drawback: skin color varies widely from person to person, so a single color model does not work well for everyone. In addition, this approach is not very robust under varying lighting conditions.

Finding faces by motion

Faces are usually moving in real-time video. Calculating the moving area will capture the face segment. However, other objects in the video can also be moving, which may affect the results. A specific type of facial motion is blinking. Detecting a blinking pattern in an image sequence can reveal the presence of a face. Eyes usually blink together and are symmetrically positioned, which helps eliminate similar motions in the video. Each image is subtracted from the previous image, and the difference image shows the boundaries of moved pixels. If the eyes happen to be blinking, there will be a small boundary within the face.

A face model can contain the appearance, shape, and motion of faces. There are several shapes of faces; common ones are oval, rectangular, round, square, heart-shaped, and triangular. Motions include, but are not limited to, blinking, raised eyebrows, flared nostrils, a wrinkled forehead, and an opened mouth. A face model will not be able to represent every person making every expression, but the technique does achieve an acceptable degree of accuracy. The models are passed over the image to find faces; this technique works even better for face tracking: once a face is detected, the model is laid over it and the system can track its movements.
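The frame-differencing step described above can be sketched in a few lines. This is a toy version on flat grayscale pixel arrays; real detectors work on 2-D images and add smoothing and region grouping:

```javascript
// Subtract the previous frame from the current one and report which
// pixels changed by more than a threshold -- the "boundaries of moved
// pixels" that frame differencing produces.
function movingPixels(prevFrame, currFrame, threshold) {
  const moved = [];
  for (let i = 0; i < currFrame.length; i++) {
    if (Math.abs(currFrame[i] - prevFrame[i]) > threshold) {
      moved.push(i);
    }
  }
  return moved;
}
```

A small, symmetric pair of changed regions inside a candidate face area is the blink signature the text describes.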

Finding faces in images with a controlled background

This is the easiest of all the approaches. Images are taken against a plain, mono-color background, or against a predefined static background. Removing the background leaves only the face boundaries, assuming the image contains only a frontal face. Intel OpenCV is an open-source library that adopts this method to detect human frontal faces in a given image. The OpenCV library makes it fairly easy to detect a frontal face using its Haar Cascade face detector (also known as the Viola-Jones method). OpenCV comes with several different classifiers for frontal face detection, as well as for profile (side-view) faces, eyes, noses, mouths, whole bodies, etc. Once it detects a human face in a given image, it marks a rectangular box around it.

Differences between Face Detection and Face Recognition

As computers become more ubiquitous, concerns over privacy have come to the forefront of discussions in the media and online. One area in particular that has raised concerns is face detection and face recognition. Media outlets commonly use the two terms interchangeably, which has led to misconceptions about the technology. Simply put, face detection detects human faces, while face recognition searches for faces similar to ones stored in a database. Face recognition can identify and remember your face even years after it was first recorded. With face detection, no faces or identity information is learned or stored, so if a person leaves the camera view and reappears later, that face is not remembered. Face detection software uses certain characteristics of the face to classify demographics (age and gender), and these are immediately discarded. In fact, face detection software is not capable of recognizing individuals.

In contrast to face detection stands the problem of facial recognition. Recognition software concentrates on making a positive identification of the individual against a database that archives identity information. Face recognition requires that the face image, or at the very least information that uniquely identifies the face, be stored and compared to other such identity information in a database. Films such as Minority Report depict retinal scanning and provide a glimpse into a future where advertisers record identity information and market products based on previous purchases.

Real-Time vs Server-Side Face Detection/Recognition

Various technologies exist within face detection/recognition to tackle specific problems. Some companies provide a REST API that allows third-party developers to submit images for analysis and tagging. This is done by sending an image file of a face to their servers. The face data is then returned with tagged attributes such as identity, age, and gender. This server-side technique provides a high level of recognition accuracy, but it does not allow for real-time face detection and tracking, due to the latency of the network connection. Real-time face detection runs locally on the client side and allows for real-world detection of multiple faces simultaneously, with processing times of less than 100 milliseconds.

Benefits of Face Detection

Detection of human faces is becoming a very important task in various applications. With face detection as a foundation, many applications can be built on top of it.

  1. Face detection is most widely used as a pre-processor in face recognition systems. Face recognition systems have many potential applications in computer vision, communication, and automatic access control systems.
  2. Face detection is also being researched for use in energy conservation. Televisions and computers can save energy by reducing screen brightness. People tend to watch TV while doing other tasks and are not always focused 100% on the screen, yet the TV brightness stays at the same level unless the user lowers it manually. A face detection system can determine which direction the viewer’s face is pointing: when the viewer is not looking at the screen, the brightness is lowered, and when the face returns to the screen, the brightness is increased. Programming and advertising can also be tailored based on face recognition.
  3. Gender Recognition: From a given image, we can detect whether the person in the image is male or female. This is particularly useful for advertisers and retailers interested in audience measurement in physical spaces.
  4. Detection of facial expressions (happy, surprised, angry, sad, etc.).


9th June 2012

Averages, Web Performance Data, and How Your Analytics Product Is Lying To You

Author: Josh Fraser of Torbit

Did you know that 5% of the pageviews on Walmart’s site take over 20 seconds to load? Walmart discovered this recently after adding real user measurement (RUM) to analyze their web performance for every single visitor to their site. Walmart used JavaScript to measure their median load time as well as key metrics like their 95th percentile. While 20 seconds is a long time to wait for a website to load, the Walmart story is actually not that uncommon. Remember, this is the worst 5% of their pageviews, not the typical experience.

Walmart’s median load time was reported at around 4 seconds, meaning half of their visitors loaded the site in less than 4 seconds and the other half took longer. With this knowledge, Walmart was prepared to act: they found that reducing page load times by even one second would increase conversions by up to 2%.

The Walmart case study highlights how important it is to use RUM and look beyond averages if you want an accurate depiction of what’s happening on your site. Unlike synthetic tests, which load your website from arbitrary locations around the world, RUM lets you collect real data from your actual visitors. If Walmart hadn’t added RUM and started tracking their 95th percentile, they might never have known about the performance issues that were costing them customers. After all, nearly every performance analytics product on the market just gives you an average loading time. If you only look at Walmart’s average loading time of 7 seconds, it doesn’t seem that bad, right? But as you just read, averages don’t tell the whole story.

There are three ways to measure the central tendency of any data set: the average (or mean), the median, and the mode. In this post we’re only going to focus on the first two. We’re also going to look at percentiles, all of which are reported for you in our real user measurement tool.

It may have been some time since you dealt with these terms so here’s a little refresher:

  • Average (mean): The sum of every data value in your set, divided by the total number of data points in that set. Skewed data or outliers can pull the average away from the center, which could lead you to draw the wrong conclusions.
  • Median: If you lined up each value in a data set in ascending order, the median is the single value in the exact middle. In page speed analytics, using the median gives you a more accurate representation of page load times for your visitors since it’s not influenced by skewed data or outliers. The median represents a load time where 50% of your visitors load the page faster than the median value and 50% load the page slower than that value.
  • Percentiles: A percentile marks the value below which a given percentage of your data falls. We usually hear percentiles used as in “You’re in the 90th percentile,” meaning your score is better than 90 percent of the data in question. In real user measurement, the 90th percentile is a time value such that 90 percent of your audience loads the page at that value or faster. Percentiles show you a time value that you can expect some percentage of your visitors to beat.
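To see why these numbers can tell very different stories, here is a small computation on made-up load times (in seconds), using the nearest-rank percentile method; analytics tools may interpolate slightly differently:

```javascript
// Mean vs median vs percentile on a skewed sample of page load times.
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}
function percentile(xs, p) {
  // Nearest-rank method on a sorted copy of the data.
  const sorted = [...xs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const loadTimes = [1.2, 1.8, 2.1, 2.4, 3.0, 3.3, 4.1, 5.2, 9.8, 24.0];
mean(loadTimes);           // about 5.7 s, pulled up by the 24 s outlier
percentile(loadTimes, 50); // 3.0 s: the typical visitor
percentile(loadTimes, 90); // 9.8 s: the slow tail that the average hides
```

Even on this tiny sample, the mean sits well above the median; on real traffic, that gap is exactly what a histogram makes visible.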

Look at this example histogram showing the loading times for one of our customers. If you’ve studied probability theory, you may recognize this as a log-normal distribution, the kind of distribution that arises when many independent random factors multiply together. When dealing with performance data, a histogram is one of your most helpful visualizations.

In this example, products that only report the average load time would show visitors loading the site in 5.76 seconds. But while the average page load is 5.76 seconds, the median load time is 3.52 seconds: well over half of visitors load the site faster than the reported average, and you’d never know it just looking at averages. Additionally, the 90th percentile here is over 11 seconds! Most people experience load times faster than that, but of course, that last 10% still matters.

For people who care about performance, it’s important to use a RUM product that gives you a complete view into what’s going on. You should be able to see a histogram of the loading times for every visitor to your site. You should be able to see your median load time, your 99th percentile and lots of other key metrics that are far more actionable than just looking at an average.

For any business making money online, you know that every visitor matters. For most sites, it’s not acceptable for 10% of your visitors to have a terrible experience. Those 10% were potential customers that you lost, perhaps for good, simply because your performance wasn’t as great as it should have been. But how do you quantify that?

It all begins with real user measurement.

If you want to accurately measure the speed on your site, it’s important to include RUM in your tool belt. Neither synthetic tests nor averages tell the full story. Without RUM, you’re missing out on important customer experience data that really matters for your business.
