Exploring graph databases for capturing the user journey

Before diving into the actual topic, it is important to have some understanding of graph databases.

What is a Graph Database?

A graph consists of two things: 1) vertices and 2) edges. Vertices represent entities and edges represent the relationships between those entities. A graph database is a NoSQL database which stores data in the form of a graph; it is a highly efficient way to store data because you don’t need complex joins to fetch data at runtime. In a graph database, you can move directly from one vertex (object) to another in any direction using the edges (relationships) between them. This process is called traversal.

It has a clear edge over traditional databases in terms of database design and modelling, data ingestion, and retrieval of data involving many-to-many relationships. Vertices and edges can have independent properties, stored as key-value pairs. Some graph databases need a schema specification with datatypes and labels; most allow you to ingest or manipulate data without a fixed schema.

A graph database allows you to traverse through millions of nodes and access specific information by using properties to query only the part of the data which satisfies the condition of the query; the rest of the data, which doesn’t match the query pattern, is never touched. This makes it a very fast and straightforward way to access aggregated data based on relationships.

Graph databases are very popular in domains like fraud detection, asset management and social networks. They can also be used to capture the online events of a user on a website or a mobile app. The goal is to build the journey of a user by tracking all events/activities like ad-click, ad-impression, page-view, add-to-cart, sale etc., and to efficiently access important information from event properties when required.

There are many graph database products available in the market; some of them are managed and some of them are raw. The popular options are: 1) Azure Cosmos DB 2) AWS Neptune 3) Neo4j 4) IBM Graph and 5) DataStax Graph. Of these, Azure Cosmos DB and AWS Neptune are new entrants and are fully managed cloud-based solutions.

Let’s see if a graph database can solve our problem and help us capture the user journey!

What is a User Journey?

It is a series and timeline of views, clicks and other custom events on a website or a mobile application which led to a particular conversion/sale/lead submission. It can also be termed the path to conversion.

To visualize this, consider this example:

I want to buy a mobile phone, so I randomly googled ‘best mobile phone under 30k INR’; Google showed an ad for a device called Moto Z Play. I clicked on it and went through its specs but decided against buying it for some reason.

Now that Google knows my history, it keeps showing me relevant ads, even on third-party sites. I encountered one of those ads but didn’t click on it. After a week or so, an ad popped up in my Facebook newsfeed saying there was a 20% discount on the Moto Z Play on Amazon; at this moment I decided to buy. So I clicked on this Facebook ad and was about to buy the product from Amazon; but before that, I checked for discount coupons on Coupon-Dunia, and finally, I bought the product for 25k from Amazon.

User journey for this particular conversion will be something like this: Click1 (Google, Campaign1, Ad1), View1 (Display, Campaign2, Ad2), View2 (Facebook, Campaign3, Ad3), Click3 (Facebook, Campaign3, Ad4), View3 (Display, Campaign4, Ad5) -> Conversion (Amazon, Revenue:25000, CartItem: MotoZPlay, No of Items: 1)

Modeling the incoming stream of events

We have two main entities: 1) users and 2) events. Both will be represented by nodes, and there will be an edge connecting them called a ‘performed‘ edge. We will have one more edge representing the relationship ‘previous‘; so if event1, event2 and event3 are events performed by a user at times t1, t2 and t3 respectively, where t1 < t2 < t3, then the graph for that user would look as depicted in the diagram below.

User-Event Graph Model

There are two possibilities for linking events: 1) using a ‘next‘ edge from event1 to event2, or 2) using a ‘previous‘ edge from event2 to event1. There is one problem with the first approach: when event1 is inserted at time t1, event2 doesn’t exist in the system yet! Event2 will only be inserted at time t2, where t2 > t1. It makes more sense to add a ‘previous‘ edge from event2 to event1 at time t2, whenever event2 occurs. So, the logic for ingesting a new event node also checks whether there is any event prior to the incoming event for the same user; if there is, it takes the newest such event and adds an edge from the incoming event to it. So, 1) when event1 arrives, no ‘previous’ edge is created as it is the first event in the system; 2) for event2, it finds event1 and adds an edge to it; and 3) for event3, it finds event1 and event2, but chooses event2 as it is the newest event currently in the system, and adds a ‘previous’ edge to it. This approach gives us flexibility and an easy way to traverse from any user node through the events associated with it.
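
To make this concrete, here is a minimal, hypothetical sketch of the ingestion logic in Python using gremlinpython (both AWS Neptune and Azure Cosmos DB expose a Gremlin endpoint). The labels, property names and connection URL are assumptions for illustration, not our exact production model.

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import Order

# Hypothetical local Gremlin endpoint
g = traversal().withRemote(
    DriverRemoteConnection('ws://localhost:8182/gremlin', 'g'))

def ingest_event(user_id, event_id, event_type, ts):
    # get-or-create the user vertex
    user = (g.V().has('user', 'userId', user_id).fold()
             .coalesce(__.unfold(),
                       __.addV('user').property('userId', user_id))
             .next())

    # find the newest event already performed by this user, if any
    prev = (g.V(user).out('performed')
             .order().by('timestamp', Order.desc)
             .limit(1).toList())

    # add the incoming event vertex and its 'performed' edge
    event = (g.addV('event')
              .property('eventId', event_id)
              .property('type', event_type)
              .property('timestamp', ts)
              .next())
    g.V(user).addE('performed').to(__.V(event)).iterate()

    # link the incoming event back to the newest existing event
    if prev:
        g.V(event).addE('previous').to(__.V(prev[0])).iterate()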

Now that we have decided on our model, let’s try to plot the Moto Z Play user-journey example according to it. There are 6 events of different types; their tabular representation and graph model would look something like this.

MotoZ Play – User Journey – Tabular Representation
User Journey Graph Model
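
With the journey stored this way, reading it back is a single backwards walk along the ‘previous’ edges, starting from the user’s newest event. A hedged Gremlin sketch, reusing the same assumed labels and endpoint as above:

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import Order

g = traversal().withRemote(
    DriverRemoteConnection('ws://localhost:8182/gremlin', 'g'))

# Start from the newest event, then keep following 'previous' until the chain ends
journey = (g.V().has('user', 'userId', 'user-123')
            .out('performed')
            .order().by('timestamp', Order.desc)
            .limit(1)
            .emit()                       # emit the starting (newest) event too
            .repeat(__.out('previous'))   # walk backwards in time
            .valueMap('type', 'timestamp')
            .toList())

for step in journey:
    print(step)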

Event analytics and tracking of a user journey require a new way of storing and querying data. I have worked extensively on the tracking and attribution side of things; initially, we had opted for SQL Server as the backend to store data, but a relational database requires multiple joins and heavy table scans to fetch the complicated breakdowns of data required for event filters. Also, the event database can be huge (in GBs/TBs). With our learnings from traditional databases, and after going through a rigorous trial & error phase, we came up with the graph model above.

We have been testing AWS Neptune/Azure Cosmos DB for the last 3 months and we are astonished by the possibilities and performance of graph databases. It’s too early for me to say that a graph database is a silver bullet for our requirement, but it has been a great learning experience as a whole. I will publish a new blog post soon on how to ingest into and query graph databases for complicated outputs like dashboard charts and segmentation filters.

Stay tuned.


DOCKER: Why, What and How?

Why is DOCKER needed?

If you are working in software engineering, application development or any computer/technology-related industry, you will have heard these buzzwords: Docker, virtual machines, compartmentalization, containerization, microservice architecture etc.

There was a time, not so long ago, when a software product’s architecture and technology stack were relatively (much!) simpler. But present-day software products are getting highly complex in terms of the technologies they use and the hardware requirements they have!

Any modern enterprise application, or even a user-intensive mobile/web application, will be a hot & sour soup of ingredients like web APIs, databases (SQL/NoSQL), cloud (AWS/Azure/Google Cloud), analytics DBs, logging tools, background services, API-based hooks & endpoints, funky frameworks, task queues, a web frontend for interactive dashboards, complicated reporting, notifiers; and, to support everything, a dozen programming/scripting languages! And that’s just the software: it then has to be continuously integrated with code repositories and deployed on a large set of diverse devices/servers with complex configurations and different operating systems. Additionally, it will also have to be deployed over virtual machines and cloud platforms with different environments.

Everyone in the industry is busy revamping their technology stacks; the monolithic architecture is doomed and deemed legacy! Like startups, even MNCs are keen (and forced) to take maximum advantage of the cloud services offered by the big tech giants and the toolsets invented by small, innovative startups, both because of the increasing demands and complexity of software and hardware requirements and to make the entire product more scalable, robust, secure and manageable.

In a software development life cycle, a product feature goes through development, testing, staging and production phases. Generally, dev systems are different from the testing & staging environments, which in turn are different from the actual production environment. This requires an intensive level of code, environment and deployment-related settings. Moreover, in a typical microservice architecture, each service or application is deployed on a different machine (an Azure VM or AWS EC2 instance). You would have to create new VM instances every time you set up new services; that isn’t optimal as it wastes a lot of memory, resources and money.

In simple terms, what we need is,

1) a clean way of providing separation of concerns

2) an efficient, cheaper & automated way to ship production-quality code to different machines/environments for an enterprise application

3) a less tedious, easily manageable and more scalable way of running a microservice architecture


What is DOCKER?

Docker is the binding glue and the missing piece between Dev and DevOps!


Docker provides a way to ship the code, database, configuration, APIs and other important building blocks of a service or an application as a secure container which can communicate over the network with other, similar containers. It is a more efficient, secure and hassle-free way of breaking a complex product down into mutually independent, lightweight, portable, self-sufficient and horizontally scalable small units.

Docker is an open-source project designed to easily create lightweight containers from any existing application or service. To understand how Docker works, it’s important to understand these key concepts:

Docker Engine

docker engine

Docker Engine is a client-server application comprising a daemon process, a REST API and a CLI. You communicate with the daemon process through the CLI; the CLI in turn uses the REST API internally to control the daemon process.

Docker Image

A Docker image contains the binaries, libraries, source code and configuration that make up your application. It can be thought of as a blueprint for your application.

Docker Registry

A Docker registry is an image repository where Docker images are stored; it is the GitHub of Docker images. Docker Hub is a public registry for Docker images; you can pull images from or push your images to Docker Hub. You can also create your own private registries.

Docker Container

Containers are the fundamental building blocks of the Docker toolkit. A container is nothing but a process: a running instance of an image. There can be one or more containers running off the same image.

Containers are image executables running on the host VM. They are spawned as processes or daemons. A container can run independently and can also communicate with other container processes on the host VM. Communication with an external application can also be configured through the parent VM’s network by setting up the appropriate protocol/port mappings between the host VM and the running container process.
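
To see how the engine, an image from a registry and a running container fit together, here is a small, hedged sketch using the Docker SDK for Python (pip install docker); the image, port mapping and container name are just illustrative choices.

import docker

# The client talks to the local Docker daemon through its REST API
client = docker.from_env()

# Pull the image (the blueprint) from the registry and run it as a container
container = client.containers.run(
    "nginx:latest",            # image name
    detach=True,               # run in the background, like a daemon
    ports={"80/tcp": 8080},    # map container port 80 to host port 8080
    name="demo-nginx",         # hypothetical container name
)

print(container.short_id, container.status)

# Containers are just processes: stop and remove them when done
container.stop()
container.remove()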

docker-architecture

A Docker container is not a miniature VM! A VM has its own kernel space, whereas a container shares its kernel space with the host machine.

vm vs container

Docker is application-specific, while a VM isn’t! A lot of tools use containers as smaller servers, which is nothing but treating them as tiny VM versions of the parent VM. Docker does things differently: a container is like a process, not a VM. Docker internally uses namespaces, union file systems and control groups (cgroups) to sandbox a process from other processes. So, logically, Docker provides a way to run two or more versions of a piece of software with conflicting dependencies, or even two or more different, conflicting software products, side by side!

If a system or a VM has Docker installed, then one or more images can be built and executed as containers on it, irrespective of the host machine’s OS or internal configuration, as long as sufficient RAM and SSD/HD space are available on the host machine.

This much theory and understanding is more than enough to get us started with Docker’s initial setup and basic commands. There are advanced concepts like Docker Compose and Docker Swarm, but they are for another day! I will cover them in depth in my next blog post.


How to set up DOCKER?

Docker was originally built for Linux, so it supports almost all flavours of Linux, such as Ubuntu, RHEL, CentOS and Fedora. I am not so sure about Amazon Linux.

Docker also supports macOS, and recently it has made huge progress on Windows platforms like Windows Server 2016 and Windows 10 (Professional & Enterprise editions only).

Docker comes in two variants: 1) Docker CE and 2) Docker EE. CE stands for Community Edition; this is the free version and is good enough for small to mid-size projects. EE stands for Enterprise Edition; it is a paid version.

Installing docker-ce

Please refer – https://docs.docker.com/install/linux/docker-ce/ubuntu/

Installing docker-machine

Please refer – 1) https://docs.docker.com/machine/install-machine/ and 2) https://github.com/docker/machine/releases

The reason I am not providing download/install commands here is that Docker moves very fast in terms of the updates and fixes they push; so it’s always good to refer to the official Docker website and repository to get the latest stable version onto your system.

Explore docker commands

Please refer – https://www.devteam.space/blog/list-of-docker-commands-with-examples/


I hope this post helped you understand all the important concepts of Docker and got you kickstarted on building your first Docker container.

Subscribe and follow this blog for more insights.


Introduction to Elasticsearch

Elasticsearch is the most popular open-source, cross-platform, distributed and scalable search engine based on Lucene. It is written in Java and released under the terms of the Apache License.

Elasticsearch is developed alongside Logstash and Kibana. Logstash is a data-collection and log-parsing engine, while Kibana is an analytics and visualization platform. The three products combined are referred to as the Elastic Stack (formerly known as the ELK stack); they are developed to provide integrated solutions and are designed to be used together.

The data stored in Elasticsearch is in the form of schema-less JSON documents, similar to NoSQL databases. You communicate with the Elasticsearch server through an HTTP REST API, and the response comes back as a JSON object. Elasticsearch is designed to take chunks of big data from different sources, analyze them and search through them. It is optimized to work well with huge data sets; searches happen very quickly, i.e., in near real-time!
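
As a quick illustration of that REST interface, here is a hedged Python sketch that indexes a document and then searches for it using the standard endpoints; the index name, document contents and local URL are assumptions.

import json
import requests

BASE = "http://localhost:9200"

# Index (store) a JSON document under a hypothetical 'events' index
doc = {"user": "john", "event": "add-to-cart", "timestamp": "2018-02-22T10:15:00"}
print(requests.put(f"{BASE}/events/_doc/1", json=doc).json())

# Full-text search for it; the response is again a JSON object.
# Note: because of the near real-time refresh described below, the new
# document can take about a second to become searchable.
query = {"query": {"match": {"event": "add-to-cart"}}}
hits = requests.post(f"{BASE}/events/_search", json=query).json()
print(json.dumps(hits["hits"]["hits"], indent=2))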

To understand Elasticsearch in detail; we need to understand its core concepts and terminologies.

We will go through each one of them in brief:

Near real-time

In Elasticsearch, data is distributed across the nodes of a cluster. So when a change is made to an index, it may not be visible immediately; a latency of 1-2 seconds is expected! In contrast, a change made in a relational database is visible instantly, as it typically runs on a single machine. We can live with this slight delay because it is a consequence of the distributed architecture, which is what makes Elasticsearch scalable and robust.

By the end of this post, you will have a clear picture of what happens internally and why this latency is expected!

Cluster

A cluster is a group of Elasticsearch servers; these servers are called nodes. Depending upon the use case and scalability requirements, a cluster can have any number of nodes. Each node is identified by a unique name, and all the data is distributed amongst the nodes of the cluster. A cluster allows you to index and search the stored data.

Node

A node is a server; it is a single unit of the cluster. In a single-node cluster, all data is stored on that one node; otherwise the data is distributed amongst the n nodes which are part of that cluster. Nodes participate in the cluster’s indexing and search capabilities; depending upon the type of query fired, they collaborate and return the matching response.

Index

An index is a collection or grouping of documents; an index has a property called type. In relational database terms, an index is something like a database and a type is something like a table. This comparison is not always accurate because it very much depends on how you design your cluster, but in most cases it holds true.

Any number of indexes can be defined; just like nodes and clusters, an index is identified by a unique name, and this name must be lowercase.

Type

A type is a category or class of similar documents; as explained above, it comes close to a table in relational database terms. It consists of a unique name and a mapping. Whenever you query an index, Elasticsearch reads _type from a metadata field and applies a filter on this field; Lucene internally has no idea of what a type is! An index can have any number of types, and each type can have its own mapping.

Mapping

A mapping is somewhat similar to the schema of a relational database. It is not mandatory to define a mapping explicitly; if one is not provided, Elasticsearch adds it dynamically, based on the data, when a document is added. A mapping generally describes the fields in a document and their datatypes. It also includes information on how fields are indexed and stored by Lucene.
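
For instance, an explicit mapping for a hypothetical ‘events’ index could be supplied at index-creation time like this; a hedged sketch assuming an Elasticsearch 6.x cluster, where an index still carries a single mapping type (here named ‘_doc’).

import requests

mapping = {
    "mappings": {
        "_doc": {                                  # the mapping type
            "properties": {
                "user":      {"type": "keyword"},  # exact-match field
                "event":     {"type": "text"},     # full-text analysed field
                "revenue":   {"type": "double"},
                "timestamp": {"type": "date"},
            }
        }
    }
}

# Create the index with this mapping; otherwise Elasticsearch infers one dynamically
print(requests.put("http://localhost:9200/events", json=mapping).json())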

Document

A document is the smallest and most basic unit of information that can be indexed. It consists of key-value pairs; the values can be of datatype string, number, date, object etc. An index can have any number of documents stored within it. In object-oriented terms, a document is something like an object; it is expressed as JSON. In relational database terms, a document can be thought of as a single row of a table.

Shards

An index can be divided into multiple independent sub-indexes; these sub-indexes are fully functional on their own and are called shards. They are useful when an index needs to hold more data than the hardware of a single node (server) supports; for example, 800 GB of data on a 500 GB disk!

Sharding allows you to scale horizontally by volume and space; it also enhances the performance of a cluster by running operations in parallel and distributing load across shards. By default, Elasticsearch creates 5 primary shards for an index; this can be manually configured to suit your requirements.

Replica

A replica is a copy of an individual shard. Elasticsearch creates a replica for each shard; a replica and its original shard never reside on the same node.

Shards and Replicas - Image
This image is downloaded from google; copyright infringement is not intended.

Replicas come into the picture when nodes in a cluster fail, shards on a node fail, or a spike in read throughput is encountered; replicas guarantee high availability of the data in such situations. When a write query is fired, the original (primary) shard is updated first and then the replicas are updated, with some additional latency. Read queries, however, can run in parallel across replicas, which improves overall read performance.

By default, a single copy of each primary shard is created, but a shard can have more than one replica when required.
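
Both defaults (the number of primary shards and the number of replicas per shard) can be overridden when an index is created. A hedged sketch, again assuming a local cluster and a hypothetical index name:

import requests

settings = {
    "settings": {
        "number_of_shards": 5,     # primary shards (the pre-7.0 default)
        "number_of_replicas": 2,   # two replica copies of every primary shard
    }
}

print(requests.put("http://localhost:9200/journeys", json=settings).json())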

Inverted-Index

It is a special data structure that Lucene uses to make a huge dataset readily searchable. An inverted index is a set of words, phrases or tokens mapped to the documents that contain them, to allow full-text search. In simple terms, an inverted index is something like the index at the back of a book; it maps words to documents.

Inverted Index Image
This image is downloaded from google; copyright infringement is not intended.
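
A toy illustration of the idea (this is not Lucene’s actual implementation, just the shape of the data structure):

# Map every token to the set of documents that contain it
docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "quick dogs and quick foxes",
}

inverted_index = {}
for doc_id, text in docs.items():
    for token in text.lower().split():
        inverted_index.setdefault(token, set()).add(doc_id)

print(inverted_index["quick"])   # {1, 3} -> documents containing 'quick'
print(inverted_index["brown"])   # {1, 2}
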
Segment

Each shard consists of multiple segments; these segments are nothing but inverted indexes. Segments are searched in parallel, and their results are combined into the final output for that particular shard.

ES Architecture Image
Visual representation of Internal ES Architecture

As documents are indexed, Elasticsearch writes them to new segments, refreshes the search data and updates the transaction log. This happens frequently, to make the data in new segments visible to queries. Segments are never modified in place, so when data needs to be deleted or updated, Elasticsearch simply marks the old document as deleted and indexes a new document; the merge process later expunges these deleted documents. Elasticsearch constantly merges similar segments into bigger segments in the background, because querying too many small segments is not optimal. After a bigger segment is written, the smaller segments are dropped and the log files are updated again to reflect the new changes.

It may seem complicated… but

You don’t have to deal with the internal workings of Lucene and Elasticsearch, as they are abstracted away; you just have to configure clusters with the right number of nodes and create indexes with appropriate mappings! Everything else is done internally. Several providers like IBM, AWS, Searchly and Elastic Cloud offer Elasticsearch as a managed service, so you don’t have to worry about managing servers, doing deployments, taking backups etc.; the provider takes care of these things, saving you the time and effort of operating the servers.

This post was meant to cover the basics of Elasticsearch and give a brief idea of how it works internally; I hope I have done justice to it. In my next post, I aim to cover ‘How to query an Elasticsearch index using Kibana?’.

Stay tuned.


A layman’s guide to Attribution Models

Understanding attribution models requires a basic knowledge of some digital-marketing jargon and concepts. Let us quickly go through some of them.

Ad

A short description of a product, an idea or an event, strategically aimed at creating awareness for a particular brand.

Campaign

A collection of ads through which an agency wants to convey an idea or a message to users. A campaign generally has a theme around it, aimed at creating an appeal for a set of products among the target users.

View or Impression Event

Whenever a user sees an ad, a view (or impression) event is triggered.

Click Event

Whenever a user clicks on an ad, a click event is triggered.

Conversion Event

Whenever a user completes some sort of transaction, e.g., buying a product, downloading an app, filling in a form etc., a conversion event is triggered.

User Journey

It is a series and timeline of views and clicks which led to a particular conversion. It can also be termed the path to conversion.

To visualize everything, consider this example:

I want to buy a mobile phone, so I randomly googled ‘best mobile phone under 30k’; Google showed an ad for a device called Moto Z Play. I clicked on it and went through its specs but decided against buying it for some reason.

Now that Google knows my history, it keeps showing me relevant ads, even on third-party sites. I encountered one of those ads but didn’t click on it. After a week or so, an ad popped up in my Facebook newsfeed saying there was a 20% discount on the Moto Z Play on Amazon; at this moment I decided to buy. So I clicked on this Facebook ad and was about to buy the product from Amazon; but before that, I checked for discount coupons on Coupon-Dunia, and finally, I bought the product for 25k.

User journey for this particular conversion will be something like this: Click1 (Google, Campaign1, Ad1), View1 (Display, Campaign2, Ad2), View2 (Facebook, Campaign3, Ad3), Click3 (Facebook, Campaign3, Ad4), View3 (Display, Campaign4, Ad5) -> Conversion (Amazon, Revenue:25000)

Now that we have a basic idea about ads, campaigns, impressions/views, clicks, conversions and the user journey, we can dive deeper into attribution.

What is attribution?

In simple terms, the process of identifying which ad/campaign led to a conversion is called attribution. Attribution is necessary to understand how your ads/campaigns are performing across different search, social and display networks; using this insight, you can plan budget allocation and targeting rules for different campaigns. It gives a clear picture in terms of ROI and lead generation.

Attribution can be performed using different models; these models can be categorized into two types: 1) Single-Touch models and 2) Multi-Touch models.

Single-Touch Models

The philosophy of single-touch models is very simple: attribute the conversion to only one event, either the first event or the last event. These are generally ‘click only‘ models; views are not given credit for conversions.

First-Touch (First-Click)

In this model, the entire credit for the conversion goes to the first event. In our case, considering the above user journey, the entire conversion weight and revenue will be attributed to Click1 (Google, Campaign1, Ad1).

Last-Touch (Last-Click)

In this model, the entire credit for the conversion goes to the last event. In our case, considering the above user journey, the entire conversion weight and revenue will be attributed to Click3 (Facebook, Campaign3, Ad4). If you look carefully at our user journey, the last event was actually View3 (Display, Campaign4, Ad5), but as this is a ‘click only‘ model, we give credit to the last click.

That is all about single-touch models; they are very easy to understand and fairly easy to implement. They make sense when your use case is simple and you are running individual campaigns on a fixed network. They are not reliable when multiple engines and multiple campaigns are involved; in that case, the weight and revenue distribution will not give you correct insight into your ad/campaign performance versus spend!
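
To illustrate just how simple they are, here is a small, hedged Python sketch of both rules applied to the click-only view of our example journey:

# The five touchpoints from the example journey, oldest first
journey = [
    ("Click1", "Google",   "Campaign1", "Ad1"),
    ("View1",  "Display",  "Campaign2", "Ad2"),
    ("View2",  "Facebook", "Campaign3", "Ad3"),
    ("Click3", "Facebook", "Campaign3", "Ad4"),
    ("View3",  "Display",  "Campaign4", "Ad5"),
]

# Single-touch models are 'click only': drop the views
clicks = [e for e in journey if e[0].startswith("Click")]

def first_touch(events):
    return {events[0]: 1.0}    # 100% of the credit to the first click

def last_touch(events):
    return {events[-1]: 1.0}   # 100% of the credit to the last click

print(first_touch(clicks))     # Click1 (Google, Campaign1, Ad1)
print(last_touch(clicks))      # Click3 (Facebook, Campaign3, Ad4)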

Multi-Touch Models

Multi-touch models credit each and every event (both clicks and views) that leads to the conversion. The weight distribution varies from model to model, but every event’s contribution to the conversion is recognized!

There are four commonly used multi-touch models: 1) Linear 2) U-Shape 3) Time-decay 4) Custom. We will go through them one by one in detail.

Linear

In the Linear model, weights are distributed evenly amongst all the events; each ad/campaign which contributed to the conversion gets an equal slice of the revenue.

Linear model weight distribution with graph

In our case, there are a total of 5 events, two clicks and three views; each of these events gets an equal weight and the corresponding ad/campaign gets a fifth of the revenue!

U-Shape

In the U-Shape model, weights are assigned in a 40-20-40 ratio; the first and the last event get 40% of the weight each, and the remaining events share the remaining 20% equally.

U-Shape model weight distribution with graph

In our case, Click1 (Google, Campaign1, Ad1) and View3 (Display, Campaign4, Ad5) will get 40% each, while View1 (Display, Campaign2, Ad2), View2 (Facebook, Campaign3, Ad3) and Click3 (Facebook, Campaign3, Ad4) will get (20/3)% each!

This model is quite distinct from the models above: it gives more credit to the edge events, i.e., the events which actually start or end a user journey. From a campaign-strategy perspective, these are the events which genuinely called for an action.

Time-decay

In the Time-Decay model, the weight of an event decreases with the time difference between the event and the conversion. In other words, events closer to the conversion get higher weights and events further from the conversion get lower weights.

Time-decay model weight distribution with graph

In our case, Click1 (Google, Campaign1, Ad1), View1 (Display, Campaign2, Ad2), View2 (Facebook, Campaign3, Ad3), Click3 (Facebook, Campaign3, Ad4), View3 (Display, Campaign4, Ad5) will get credit in increasing order; i.e., Click1<View1<View2<Click3<View3.
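
To make the three distributions concrete, here is a hedged Python sketch that computes Linear, U-Shape and Time-Decay weights for the five touchpoints of our journey; the timestamps and the 7-day half-life are arbitrary illustrative choices, and the U-Shape helper assumes at least three touchpoints.

from datetime import datetime, timedelta

events = ["Click1", "View1", "View2", "Click3", "View3"]   # oldest first
conv_time = datetime(2017, 2, 22, 12, 0)                   # illustrative conversion time
event_times = [conv_time - timedelta(days=d) for d in (10, 8, 7, 1, 0.5)]

def linear(events):
    # every touchpoint gets an equal share
    return {e: 1.0 / len(events) for e in events}

def u_shape(events):
    # 40% to the first and last touch; the middle ones split the remaining 20%
    middle = events[1:-1]
    weights = {events[0]: 0.4, events[-1]: 0.4}
    for e in middle:
        weights[e] = 0.2 / len(middle)
    return weights

def time_decay(events, times, half_life_days=7.0):
    # a touch's weight halves for every `half_life_days` between it and the conversion
    raw = [0.5 ** ((conv_time - t).total_seconds() / 86400 / half_life_days)
           for t in times]
    total = sum(raw)
    return {e: r / total for e, r in zip(events, raw)}

print(linear(events))
print(u_shape(events))
print(time_decay(events, event_times))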

From a theoretical viewpoint, Time-Decay seems to be the most sensible weight-distribution mechanism. But in reality, model selection depends on the type of business, the type of product and the client’s use case; for some businesses U-Shape will make more sense, while for others Time-Decay will be preferable.

Custom

There are times when a business solution or strategy requires more meticulous analytics. U-Shape and Time-Decay are widely used for attribution, but they may not be enough! If we want to go one level deeper, we can design a custom model which assigns weights according to the type of events, campaigns and ads, and may also take the timeline of the events into account.

Sometimes there are sequences of events which do not lead to a conversion; these are called negative paths. We can use this data to design a custom algorithm which assigns and updates weights at the entity level (engine, campaign, ad) depending upon the event type (click or view).

There are some engines/channels which are not that important in a user journey; it doesn’t matter whether such an engine is part of the path or not! In our case, the last event, View3 (Display, Campaign4, Ad5), was triggered only because, as a user, I went looking for discount coupons on Coupon-Dunia; this event is relevant but doesn’t add value to the path, because the conversion would still have happened without it. Yet it gets a high weight in Time-Decay and U-Shape. These discrepancies can be removed in a custom algorithmic model once we have enough data to find and connect these invisible dots!

Time-decay vs U-Shape vs Linear

To wrap things up: it is really difficult to say that any particular model is 100% perfect and efficient; you need to experiment and figure out which model works best for you. Ideally, to derive optimum insights, a combination of two or three models should be tried, and that data should then be used to design a custom algorithmic model that caters to your needs!

Stay tuned for more insights!


How to use PIVOT in SQL SERVER for more than one column?

Hi, folks! As a developer, I have to query SQL Server on a regular basis, but we hardly ever use PIVOT in our day-to-day work. Even when we do use PIVOT, we generally use it for a single column.

Today, I had to generate a report for one of our clients; according to their business requirement, I had to show them aggregated data which was dynamically calculated by a group by query.

Sample data after the above step looked something like this,

Model        Date       Engine    Revenue      Conversions
First Click  2/22/2017  bing      2388103.29   2649
First Click  2/22/2017  facebook  289642       387
First Click  2/22/2017  google    8199735.05   6566
Last Click   2/22/2017  bing      2534385.59   2903
Last Click   2/22/2017  facebook  304360       380
Last Click   2/22/2017  google    7884433.75   6148
Weighted     2/22/2017  bing      2522729.029  2930.342273
Weighted     2/22/2017  facebook  300418.4666  394.102303
Weighted     2/22/2017  google    8282122.841  6526.552723


I wanted this to be in the format below,


Date       SearchEngine  FirstClickConversion  LastClickConversion  WeightedConversion  FirstClickRevenue  LastClickRevenue  WeightedRevenue
2/22/2017  bing          2649                  2903                 2930.342273         2388103.29         2534385.59        2522729.029
2/22/2017  facebook      387                   380                  394.102303          289642             304360            300418.4666
2/22/2017  google        6566                  6148                 6526.552723         8199735.05         7884433.75        8282122.841

If you look carefully, all of these are dynamically calculated columns, so I needed to PIVOT on the ‘Conversion’ and ‘Revenue’ columns simultaneously. I surfed around the internet hoping to find a quick workaround and went through a couple of accepted solutions on Stack Overflow, but they were pretty complex to understand and implement.

I thought of a simple jugaad (quick workaround) to implement this: I stored the results of pivot1 and pivot2 into #tempTable1 and #tempTable2 respectively, and then used an inner join to get the aggregated, transposed result I required!

-- Pivot 1: revenue per attribution model, by date and campaign
SELECT * INTO #tempTable1
FROM
(
    SELECT CASE
               WHEN modelid = 1 THEN 'FirstClickRevenue'
               WHEN modelid = 2 THEN 'LastClickRevenue'
               WHEN modelid = 3 THEN 'WeightedRevenue'
           END AS ModelType,
           CONVERT(date, conversiondate) AS Date,
           campaign,
           SUM(revenue) AS Revenue
    FROM SOMETABLENAME
    WHERE conversiondate >= @FromDate
      AND conversiondate <= @ToDate
    GROUP BY modelid,
             CONVERT(date, conversiondate),
             campaign
) AS s
PIVOT (SUM(Revenue) FOR [ModelType] IN (FirstClickRevenue, LastClickRevenue, WeightedRevenue)) AS pvt1

-- Pivot 2: conversions (weights) per attribution model, by date and campaign
SELECT * INTO #tempTable2
FROM
(
    SELECT CASE
               WHEN modelid = 1 THEN 'FirstClickConversion'
               WHEN modelid = 2 THEN 'LastClickConversion'
               WHEN modelid = 3 THEN 'WeightedConversion'
           END AS ModelType,
           CONVERT(date, conversiondate) AS Date,
           campaign,
           SUM(weight) AS Conversions
    FROM SOMETABLENAME
    WHERE conversiondate >= @FromDate
      AND conversiondate <= @ToDate
    GROUP BY modelid,
             CONVERT(date, conversiondate),
             campaign
) AS s
PIVOT (SUM(Conversions) FOR [ModelType] IN (FirstClickConversion, LastClickConversion, WeightedConversion)) AS pvt2
ORDER BY date, campaign

-- Join the two pivoted results on date and campaign to get both sets of columns
SELECT p1.*,
       p2.FirstClickRevenue,
       p2.LastClickRevenue,
       p2.WeightedRevenue
FROM #tempTable2 p1
INNER JOIN #tempTable1 p2
        ON p1.date = p2.date
       AND p1.campaign = p2.campaign
ORDER BY p1.date,
         p1.campaign

It’s a simple solution; a workaround for using PIVOT in special conditions like these! Nothing fancy but it efficiently serves the purpose; I hope it helps someone in need! 🙂

Here is the link to Microsoft documentation on using PIVOT.

Cheers! Happy coding and happy querying!
