Examples of SQL queries: Analysis of Google Analytics table data in BigQuery.

The "General" table stores data for the period from "2022-01-01" to "2022-03-31".

Each row represents a user's page view on a website.

Date: the date of the page view

Hour: the hour of the page view

Minute: the minute of the page view

Country: the user's country

User_id: a unique code for the user in our system

Registration_date: the date when the user registered on the website

Transaction_revenue: the amount by which the user topped up their account

Transaction_id: the transaction ID (unique code)

Cost: the amount spent on attracting the user to the site (for example, if a user clicks on an ad and visits the site, we deduct $1 in the advertising account, and this $1 is reflected in the "Cost" column).

1. Calculate the total Revenue for each user over the entire period and sort by the Revenue amount in descending order.

Query:

SELECT 

  user_id,

  SUM(transaction_revenue) AS Revenue

FROM `general`

GROUP BY user_id

ORDER BY Revenue DESC;

2. Calculate the average revenue per paying user for the country Brazil during the period from January to March 2022, considering users who registered in January 2022. A paying user is defined as a user who has made at least one transaction.

Query:

SELECT

  country,

  AVG(UserTotalRevenue) AS AVG_Revenue

FROM (

  SELECT

    country, user_id,

    SUM(transaction_revenue) AS UserTotalRevenue

  FROM

    `general`

  WHERE

    country = 'Brazil'

    AND transaction_revenue > 0

    AND registration_date >= DATE('2022-01-01')

    AND registration_date <= DATE('2022-01-31')

    AND date >= DATE('2022-01-01')

    AND date <= DATE('2022-03-31')

  GROUP BY

    country, user_id

)

GROUP BY

    country;

3. Calculate the total Revenue, Cost, and ROI as of March 31, 2022, for users who registered in January 2022.

Query:

SELECT

  SUM(transaction_revenue) AS Revenue,

  SUM(cost) AS Cost,

  (SUM(transaction_revenue) / SUM(cost)) * 100 AS ROI

FROM

  `general`

WHERE

  registration_date >= DATE('2022-01-01')

  AND registration_date <= DATE('2022-01-31')

  AND date >= DATE('2022-01-01')

  AND date <= DATE('2022-03-31');

4. Determine the first and second transactions of the user (Assume that all first payments from users after January 1, 2022, are considered the first).

For example: if a user makes the first transaction on January 1, 2022, the 'type' column should display '1st,' and then when the user makes the second transaction, the 'type' column should display '2nd.' For the 3rd and subsequent transactions, display '3rd' and so on.

Query:

WITH RankedTransactions AS (

  SELECT

    user_id,

    transaction_id,

    ROW_NUMBER() OVER(

      PARTITION BY user_id ORDER BY date, hour, minute

      ) AS TransactionNumber

  FROM

    `general`

  WHERE

    date >= DATE('2022-01-01')

    AND transaction_revenue > 0

)

SELECT

  transaction_id,

  CASE

    WHEN TransactionNumber = 1 THEN '1st'

    WHEN TransactionNumber = 2 THEN '2nd'

    ELSE '3rd'

  END AS type

FROM

  RankedTransactions;

5. Calculate the retention for two cohorts:

  1. Users who registered in January 2022.
  2. Users who registered in February 2022.

Query description:

  1. Inside UserCohort, a temporary table is created with data about users who registered in January or February 2022. For each user, the month of registration, the viewing date, and the ordinal number by the viewing date are determined.
  2. Another temporary table is created inside UserCount, aggregating data from UserCohort. It calculates the total number of unique users, the number of users with at least 2 visits in month_0, as well as the number of users in month_1 and month_2.
  1. Retention creates yet another temporary table where Retention is calculated for each cohort. Percentages for month_0, month_1, and month_2 are calculated relative to the total number of users.
  2. In the final SELECT statement, CASE is used to rename the month numbers into their textual representation ("January 2022" and "February 2022"). The result contains columns registration_month and Retention values for each month after registration.

Query:

WITH UserCohort AS (

  SELECT

    user_id,

    EXTRACT(MONTH FROM registration_date) AS registration_month,

    date,

    ROW_NUMBER() OVER (

      PARTITION BY user_id ORDER BY date

    ) AS VisitNumber

  FROM

    `general`

  WHERE

    EXTRACT(YEAR FROM registration_date) = 2022

    AND (EXTRACT(MONTH FROM registration_date) = 1 OR EXTRACT(MONTH FROM registration_date) = 2)

),

UserCount AS (

  SELECT

    registration_month,

    COUNT(DISTINCT user_id) AS total_users,

    COUNT(DISTINCT CASE WHEN VisitNumber >= 2 THEN user_id END) AS month_0,

    COUNT(DISTINCT CASE WHEN EXTRACT(MONTH FROM date) = registration_month + 1 THEN user_id END) AS month_1,

    COUNT(DISTINCT CASE WHEN EXTRACT(MONTH FROM date) = registration_month + 2 THEN user_id END) AS month_2

  FROM

    UserCohort

  GROUP BY

    registration_month

),

Retention AS (

  SELECT

    registration_month,

    ROUND(month_0 * 100.0 / total_users, 0) AS month_0,

    ROUND(month_1 * 100.0 / total_users, 0) AS month_1,

    ROUND(month_2 * 100.0 / total_users, 0) AS month_2

  FROM

    UserCount

)

SELECT

  CASE

    WHEN registration_month = 1 THEN 'Январь 2022'

    WHEN registration_month = 2 THEN 'Февраль 2022'

  END AS registration_month,

  month_0,

  month_1,

  month_2

FROM

  Retention;