ALI (AggData Location Identifier)

What is the ALI?

The ALI is an identification number that uniquely identifies a location within a list in the AggData Library. An ALI has up to three parts:

  • List ID: The first digits of the ALI give the ID of the list in which the ALI exists. This value can be up to 5 digits long and is not zero-padded. For example, a List ID of 393 means the location is in the “Starbucks US” list.
  • Core ALI: The next 6 to 7 digits of the ALI contain the Core ALI. This value is calculated from the core attributes of a location: address, city, state, zip_code. For international locations, “Province” or “Postal Code” values are used, where applicable.
  • Extended ALI (optional): The Extended ALI is a 6-digit value that follows the List ID and Core ALI, separated from them by a dash (-) character. It is calculated from the core attributes plus the extended attributes. The current extended attributes are: store_name, distributor_name, address_line_2, and store_type. If none of these fields has a value, the Extended ALI is omitted from the overall ALI.
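
The layout described above can be sketched in Python. This is only an illustration, not AggData's own tooling: the function names split_ali and core_of are hypothetical, the ALI strings used with them are made up, and the sketch assumes that the List ID and Core ALI are written together as one run of digits, with the only dash being the one before the Extended ALI.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "<List ID + Core ALI digits>[-<6 Extended ALI digits>]".
_ALI_PATTERN = re.compile(r"(\d+)(?:-(\d{6}))?")


class AliParts(NamedTuple):
    base: str                # List ID immediately followed by the Core ALI
    extended: Optional[str]  # 6-digit Extended ALI, or None when absent


def split_ali(ali: str) -> AliParts:
    """Split an ALI into its base (List ID + Core ALI) and optional Extended ALI."""
    match = _ALI_PATTERN.fullmatch(ali.strip())
    if match is None:
        raise ValueError(f"not a recognizable ALI: {ali!r}")
    return AliParts(base=match.group(1), extended=match.group(2))


def core_of(base: str, list_id: int) -> str:
    """Strip a known List ID from the base and return the Core ALI.

    The List ID is not zero-padded and varies in length, so it has to be
    supplied by the caller (assumption: you know which list the ALI came from).
    """
    prefix = str(list_id)
    core = base[len(prefix):]
    if not base.startswith(prefix) or len(core) not in (6, 7):
        raise ValueError(f"{base!r} does not fit List ID {list_id}")
    return core
```

With the made-up value "3931234567-654321", split_ali returns a base of "3931234567" and an Extended ALI of "654321"; core_of("3931234567", 393) then returns "1234567". An ALI without a dash yields None for the Extended ALI.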


Why is the ALI split into "Core" and "Extended"?

The ALI is meant to uniquely identify each location in the AggData Library.  Currently, the only consistent way to uniquely identify a location is by using its address data; every other collected attribute is subject to change at the will of the primary source.  For example, a chain that has been using one numbering scheme for their store numbers could completely change that numbering system for any reason.  Or, they may display the store number on their website for a period of time, but then redesign their site and remove it from view.  The only guaranteed consistent information is the specific data about the geographical location of the business.

A problem arises when a list has two or more locations with the same address.  Take the following example of four Starbucks locations:

All four of these locations have exactly the same address information, because they are all located at the Oakland International Airport.  The only way to tell them apart is the Store Name column, or the even less consistent Store Number.  To uniquely identify these locations, we would therefore have to include the store_name (or distributor_name) value in our ID.

Unfortunately, adding extra columns to the way we uniquely identify locations makes ID values fragile over time.  For example, a chain could completely redo the way it lists its Store Names, or add or remove them between versions.  If the ALI calculation depended on these attributes, we would see abrupt changes in values between historical data versions.

The Solution

By separating out the Core ALI and Extended ALI, we can be more robust to the types of changes described previously.  Here are the new ALI values for the above Starbucks example:

Note that across the ALIs for all four of these locations, the Core ALI is the same, and only the Extended ALI changes.  This means that you can use the Core ALI as a default when comparing locations across versions, but when a list does contain duplicate Core ALIs, the Extended ALI will distinguish the locations.
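
A minimal sketch of that matching rule in Python follows. The function names base_of and match_versions are hypothetical, and the sketch assumes the part of the ALI before the dash (List ID plus Core ALI) can be used as the comparison key. A location is matched on that key alone when it is unique in both versions; otherwise the full ALI, including the Extended ALI, must match.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional


def base_of(ali: str) -> str:
    """Return the List ID + Core ALI portion (everything before the dash)."""
    return ali.split("-", 1)[0]


def match_versions(old_alis: Iterable[str],
                   new_alis: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each ALI in the new version to its ALI in the old version, or None."""
    new_alis = list(new_alis)

    # Index the old version by base so duplicate Core ALIs are easy to spot.
    old_by_base = defaultdict(list)
    for ali in old_alis:
        old_by_base[base_of(ali)].append(ali)
    new_base_counts = Counter(base_of(ali) for ali in new_alis)

    matches: Dict[str, Optional[str]] = {}
    for ali in new_alis:
        base = base_of(ali)
        candidates = old_by_base.get(base, [])
        if len(candidates) == 1 and new_base_counts[base] == 1:
            # Unique Core ALI on both sides: the Extended ALI can be ignored.
            matches[ali] = candidates[0]
        elif ali in candidates:
            # Duplicate Core ALIs: fall back to the full ALI.
            matches[ali] = ali
        else:
            matches[ali] = None
    return matches
```

Matching on the base first keeps a location linked to its history even when an Extended ALI appears or changes, while the fallback avoids pairing the wrong locations when several of them share one address.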

Let's take another example.  Say you have a location that only has the following core address information available:

As you can see, the ALI is shortened to only include the List ID and Core ALI.  However, in the next version of this file, we are now able to collect the store name value and add it to our available attributes.  Previously, this would have caused the ULI to change completely.  Now the ALI looks like the following:

Observe that the Core ALI in the new value has stayed the same, even though the store_name attribute was added and the Extended ALI is now included.  You can therefore easily match this location to its entry in the previous version.