Visitor Identification

03 Feb 2019 » MSA

Welcome back to another basic post about the Adobe Experience Cloud. One of the main pillars of any web analytics tool is visitor identification. It is not only used for the visitors metric, but also as the basis of multiple other features in tools like Target and Audience Manager.

Why visitor identification?

Let’s start by understanding why we need visitor identification. If you are reading this post, I am pretty sure you know the reason very well. However, I am also confident in stating that some of you may have had to explain this to other people. So, I am not writing it for you to understand it, but to help you explain it to others.

In the early days of the web, there was only one KPI: page views. That is all that counted. The value of a website was based on this single metric. Someone even made 1 million dollars by creating a single page with lots of traffic. Over time, that narrow view expanded and marketers needed more metrics, like visits and visitors.

Ideally, we would like to count the number of real people, but this was not possible. Back in the day, people only had a single connected device, a desktop, often shared by the whole family, with only one browser (remember the days of IE6?). So, tracking individual browsers was a good proxy for visitors. Visitors, in turn, were a good proxy for people.

Page views were very simple to measure. You just needed to parse your web server logs and count the times the server served an HTML page. Visitors, on the other hand, were not that simple. So, new techniques had to be devised.

Finally, with visitor identification in place, you can not only count visitors, but also follow them as they navigate through your properties.

Adobe Analytics

Adobe Analytics uses an array of techniques to identify a visitor, which are summarised in the help section. It is the most complex solution in the Adobe Experience Cloud in this respect.

The following list shows, in order, the sequence of techniques that Adobe Analytics uses. This is a strict sequence, from top to bottom, stopping once one succeeds.

(1) `s.visitorID`

Do not use it. STOP.

I could spend a long time explaining its usage, but the current best-practice recommendation is to avoid it completely. It has too many issues to be put to good use.

(2) `s_vi` cookie

If you have been using Adobe Analytics for more than 5 years, you will remember that this was the main visitor identification solution. Basically, Adobe Analytics collection servers generated a random value and stored it in a cookie in the browser with the name s_vi. This meant that the cookie had to be stored under Adobe’s domain, first 2o7.net and, later, omtrdc.net.

Since 3rd party cookies are not as reliable as 1st party cookies and are often blocked, there was an option, using DNS configuration, to give the Analytics collection servers an FQDN under the same domain as the website (remember metrics.mywebsite.com?). Therefore, the browser would store s_vi as a 1st party cookie, even when generated by Adobe’s server. In case you need a refresher, have a quick look at my post about cookies.
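As a rough illustration of the 1st party versus 3rd party distinction, here is a minimal sketch that compares the domain of the collection server with the domain of the page. It assumes the registrable domain is simply the last two labels of the hostname, which is wrong for suffixes like .co.uk; a real check would need the Public Suffix List. The function names and hostnames are made up for the example.

```python
def registrable_domain(host: str) -> str:
    """Naive assumption: the registrable domain is the last two labels."""
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def is_first_party(collection_host: str, page_host: str) -> bool:
    """True if a cookie set by collection_host is 1st party on page_host."""
    return registrable_domain(collection_host) == registrable_domain(page_host)


# A collection server under the website's own domain yields a 1st party cookie.
print(is_first_party("metrics.mywebsite.com", "www.mywebsite.com"))
# A collection server under an Adobe domain yields a 3rd party cookie.
print(is_first_party("mywebsite.sc.omtrdc.net", "www.mywebsite.com"))
```

The first call prints True and the second prints False, which is the difference the DNS configuration was designed to create.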

When the ECID came out, Adobe renamed this parameter to aid. In the Analytics calls, it would be sent as a query string parameter instead of a cookie.

I wanted to show some images with the s_vi cookie in action, but I could not find any website that still uses it. However, Adobe’s server will set it if the ECID is not present in the call.

(3) ECID

This is what you should be using in all your implementations. I wrote a full blog post about the ECID, so I recommend you review it if you need more details about it.

(4) fid

If the s_code (H.25.3 or above) or the AppMeasurement library cannot get an ECID, they will create a cookie named s_fid, with a randomly generated ID. If the s_fid cookie is available, the JavaScript library will then send the fid in all Analytics calls, irrespective of whether it is needed or not. The Analytics servers will use this fid only if all of the previous techniques fail.

Your inquisitive mind may be asking a few questions about this technique. Let me clarify a few details:

The AppMeasurement library cannot tell whether the s_vi cookie is present, as it could be a 3rd party cookie. Therefore, it has to generate the s_fid without knowing whether the s_vi exists. On the other hand, this library knows whether the ECID has been generated.
In the image request, both s_vi (or aid) and fid may be present. Analytics’ servers will follow the rules described in this blog to choose the best value.
The JavaScript random number generator is not very random. The likelihood of two browsers generating the same fid is not negligible, which defeats the whole purpose of visitor identification. Therefore, the fid is not the best technique, which is why the s_vi or the ECID is preferred, as Adobe generates them server-side.
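To get a feel for why the size of the random space matters, here is a minimal sketch of the standard birthday approximation. The bit counts in the usage note are hypothetical; they are not measurements of what the AppMeasurement generator actually delivers.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Birthday approximation: chance that at least two of n_ids
    uniformly random IDs, each with `bits` bits of entropy, coincide."""
    space = 2.0 ** bits
    # 1 - exp(-x), computed with expm1 to stay accurate for tiny x.
    return -math.expm1(-n_ids * (n_ids - 1) / (2.0 * space))
```

For example, `collision_probability(1_000_000, 32)` is very close to 1, while `collision_probability(1_000_000, 128)` is vanishingly small. So, if a weak generator reduces the effective randomness of an ID, collisions quickly stop being a theoretical concern.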

In case you are wondering, F is for “fallback”.

(5) IP address and User-Agent

If all else fails and the image request has no unique ID, Adobe Analytics will fall back to the very basics. As a last resort, Analytics servers will get the User-Agent and the IP address and combine them. The output will be a random-ish ID, which Analytics will use as a visitor ID.
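As a minimal sketch of the idea, the two values can be concatenated and hashed. Adobe does not document the exact function it uses, so the SHA-256 digest, the separator and the function name below are assumptions chosen only for illustration.

```python
import hashlib


def fallback_visitor_id(ip_address: str, user_agent: str) -> str:
    """Derive a stable, random-looking ID from IP address and User-Agent.

    Assumption: SHA-256 stands in for whatever Adobe uses server-side.
    """
    raw = f"{ip_address}|{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:32]


ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/71.0"
# Same browser, two different networks: two different IDs.
print(fallback_visitor_id("203.0.113.10", ua))
print(fallback_visitor_id("198.51.100.7", ua))
```

The same pair of inputs always produces the same ID, so the quality of the identification depends entirely on how unique and stable those two inputs are.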

There are many problems with this technique, which is why it is the last in the list:

If you have a mobile device and move between different networks (home, mobile network, office…), the IP address will be different in each case. So, a single device will be identified as multiple visitors.
If you have multiple identical devices sharing the same IP address, Analytics will consider all of them as a single visitor. For example, consider an office with multiple laptops. The public IP address used by all devices in the office will be the same and many laptops will have exactly the same browser version. In this case, a combination of User-Agent and IP address will not be unique.
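Putting the five techniques together, the strict top-to-bottom order described in this section can be sketched as follows. This is not Adobe's code: the function, its parameter names and the hash used for the last resort are illustrative assumptions.

```python
import hashlib
from typing import Optional, Tuple


def resolve_visitor(
    visitor_id: Optional[str] = None,  # (1) s.visitorID
    aid: Optional[str] = None,         # (2) s_vi cookie / aid
    mid: Optional[str] = None,         # (3) ECID
    fid: Optional[str] = None,         # (4) fallback ID
    ip_address: str = "",
    user_agent: str = "",
) -> Tuple[str, str]:
    """Return (method, id) for the first technique that succeeds."""
    candidates = [
        ("s.visitorID", visitor_id),
        ("s_vi/aid", aid),
        ("ECID", mid),
        ("fid", fid),
    ]
    for method, value in candidates:
        if value:  # stop at the first ID that is present
            return method, value
    # (5) Last resort: combine IP address and User-Agent (hash is assumed).
    raw = f"{ip_address}|{user_agent}".encode("utf-8")
    return "ip+user-agent", hashlib.sha256(raw).hexdigest()[:32]
```

For instance, a call that carries both an ECID and a fid resolves to the ECID, because the fid sits lower in the list and is only reached when everything above it is missing.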

Adobe Target

Adobe Target only uses one method to identify the visitor, the PCID, which is generated server-side, like the s_vi or the ECID. The Target JavaScript library will store its value in the mbox cookie. Depending on your configuration, this cookie can be either a 1st or a 3rd party cookie, so the accuracy will depend on which option you have.

Target also uses the ECID, but it uses this service differently than Analytics:

In web and app, Target will still generate a PCID, even if the ECID is present. Target relies on the PCID as the master key, to protect itself from a failure in the ECID service. Target will still use the ECID to communicate with Analytics and Audience Manager.
In the API, you can only include the ECID, which will then be used as the master key.

You may be wondering: why so much difference with Analytics? Well, remember that Analytics is about accuracy, whereas Target is about experience. If you cannot identify a visitor in Target, the worst thing that will happen is that this visitor will get the default experience. No big deal. Besides, you should not use Target as your reporting source for visits and visitors.

Adobe Audience Manager

As with Target, Audience Manager uses its own specific cookie: demdex, under the demdex.net domain. The value of this cookie is called the UUID and it is generated server-side. It is always a 3rd party cookie, with all its consequences.

What if the browser rejects 3rd party cookies?

For integrations with other Adobe tools, as long as you are using the ECID, there are no consequences. The ECID is a derivative of the UUID and AAM can convert between the two. Therefore, the visitor can be tracked and audiences exchanged between Adobe tools.
For server-to-server integrations with DSPs, there is nothing you can do. That visitor will not be targetable.

As with Target, you should not use the AAM visitor count for reporting.

Problems with visitor identification

From what I have said so far, it may look like all is perfect with visitor identification. The reality is not as rosy, with multiple issues:

Multiple browsers. It is very common to have multiple browsers on our computers, and even on our mobile phones. Remember that each browser has its own cookie jar, which means that browsers on the same device do not use a common set of cookies. Therefore, if someone uses 2 browsers on the same computer, for all our purposes, these 2 browsers will count as 2 visitors.
Multiple devices. As if the previous point was not enough, now we all have multiple devices. I have a tablet, 2 mobile phones and 2 laptops. Again, each device will count as one or more visitors.
3rd party cookies blocked. With Apple’s ITP and Firefox’s equivalent, 3rd party cookies are now anathema. Throughout this post, I have mentioned 3rd party cookies a few times. In those cases, Safari and Firefox will not play nicely with that option.
Cookies cleared. Many people clear cookies often. For our purposes, this means creating a new visitor.
Incognito/private browsing. The whole purpose of this mode is to prevent tracking and, at the end of the session, the browser clears all cookies and local storage. This brings us back to the previous point.
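The effect of these issues on the visitors metric can be shown with a tiny sketch. The hit data and names below are invented; the point is only that the tool counts distinct cookie IDs, not distinct people.

```python
def count_visitors(hits):
    """Visitors as an analytics tool sees them: distinct cookie IDs."""
    return len({cookie_id for _person, cookie_id in hits})


def count_people(hits):
    """Ground truth, which the tool cannot observe: distinct people."""
    return len({person for person, _cookie_id in hits})


# One hypothetical person: two browsers, a phone and one cookie clearing.
sample_hits = [
    ("alice", "laptop-chrome-1"),
    ("alice", "laptop-firefox-1"),
    ("alice", "phone-safari-1"),
    ("alice", "laptop-chrome-2"),  # new ID after clearing cookies
    ("alice", "laptop-chrome-2"),  # returning hit with the same cookie
]

print(count_visitors(sample_hits), count_people(sample_hits))
```

In this made-up data, a single person is reported as 4 visitors.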

Device Fingerprinting

Some research has been done to overcome the limitations of cookies. The idea is to gather enough information, from both JavaScript and HTTP headers, to uniquely identify the browser or the device, regardless of the privacy settings of the user. Note that I said “device”, not “person”. Only the NSA or GCHQ can get to that level of “person” (for now).

Here are some techniques I know of; I am sure there are more.

Evercookie. Using all the browser’s capabilities, an algorithm stores a unique random ID in multiple places in the device, so that it becomes almost impossible to delete. The hope is that, even if the user clears cookies, the browser will miss some of these places. As long as the ID is still present somewhere, the algorithm can retrieve it, store it again in multiple places and reuse it.
Canvas fingerprinting. Understanding this technique requires a lot of technical knowledge. Let me just say that, using an HTML5 feature, an ID can be generated which is very likely to be unique per CPU. And I am not talking about the CPU model, but about each individual CPU. In other words, all browsers in the same device will generate the same ID. Besides, no data is stored, so you cannot delete anything to prevent tracking. This ID can always be regenerated at will.
Statistical methods. An algorithm could gather enough of the information offered by the browser and, using statistical techniques, detect unique traits. The more information you have about a system, the more unique it becomes. As with canvas fingerprinting, no data is stored, so no data can be deleted to prevent it from working. Some reports have shown how effective this technique can be.
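The claim that more information makes a system more unique can be quantified with a small worked example. Each attribute contributes a number of bits equal to minus the base-2 logarithm of the share of browsers with that value. The shares below are invented, and the sketch assumes the attributes are independent, which real attributes are not.

```python
import math


def surprisal_bits(share: float) -> float:
    """Bits of identifying information for a value seen in `share` of browsers."""
    return -math.log2(share)


def fingerprint_bits(shares: dict) -> float:
    """Total bits, assuming (unrealistically) independent attributes."""
    return sum(surprisal_bits(share) for share in shares.values())


# Hypothetical shares of browsers that have the same value as ours.
observed = {
    "user_agent": 1 / 1024,
    "screen_size": 1 / 16,
    "time_zone": 1 / 32,
    "installed_fonts": 1 / 4096,
}

print(fingerprint_bits(observed))
```

With these made-up shares the total is 31 bits, which corresponds to roughly one browser in two billion. A handful of individually common attributes can therefore add up to something close to a unique ID.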

Adobe does not use, endorse or recommend the use of any of these techniques. I am just putting them here for completeness.

Photo by Robert F. on Unsplash
