Examples of SQL queries: Analysis of Google Analytics table data in BigQuery.
The "General" table stores data for the period from "2022-01-01" to "2022-03-31".
Each row represents a user's page view on a website.
Date: the date of the page view
Hour: the hour of the page view
Minute: the minute of the page view
Country: the user's country
User_id: a unique code for the user in our system
Registration_date: the date when the user registered on the website
Transaction_revenue: the amount by which the user topped up their account
Transaction_id: the transaction ID (unique code)
Cost: the amount spent on attracting the user to the site (for example, if a user clicks on an ad and visits the site, we deduct $1 in the advertising account, and this $1 is reflected in the "Cost" column).
1. Calculate the total Revenue for each user over the entire period and sort by the Revenue amount in descending order.
Query:
SELECT
user_id,
SUM(transaction_revenue) AS Revenue
FROM `general`
GROUP BY user_id
ORDER BY Revenue DESC;
2. Calculate the average revenue per paying user for the country Brazil during the period from January to March 2022, considering users who registered in January 2022. A paying user is defined as a user who has made at least one transaction.
Query:
SELECT
country,
AVG(UserTotalRevenue) AS AVG_Revenue
FROM (
SELECT
country, user_id,
SUM(transaction_revenue) AS UserTotalRevenue
FROM
`general`
WHERE
country = 'Brazil'
AND transaction_revenue > 0
AND registration_date >= DATE('2022-01-01')
AND registration_date <= DATE('2022-01-31')
AND date >= DATE('2022-01-01')
AND date <= DATE('2022-03-31')
GROUP BY
country, user_id
)
GROUP BY
country;
3. Calculate the total Revenue, Cost, and ROI as of March 31, 2022, for users who registered in January 2022.
Query:
SELECT
SUM(transaction_revenue) AS Revenue,
SUM(cost) AS Cost,
(SUM(transaction_revenue) / SUM(cost)) * 100 AS ROI
FROM
`general`
WHERE
registration_date >= DATE('2022-01-01')
AND registration_date <= DATE('2022-01-31')
AND date >= DATE('2022-01-01')
AND date <= DATE('2022-03-31');
4. Determine the first and second transactions of the user (Assume that all first payments from users after January 1, 2022, are considered the first).
For example: if a user makes the first transaction on January 1, 2022, the 'type' column should display '1st,' and then when the user makes the second transaction, the 'type' column should display '2nd.' For the 3rd and subsequent transactions, display '3rd' and so on.
Query:
WITH RankedTransactions AS (
SELECT
user_id,
transaction_id,
ROW_NUMBER() OVER(
PARTITION BY user_id ORDER BY date, hour, minute
) AS TransactionNumber
FROM
`general`
WHERE
date >= DATE('2022-01-01')
AND transaction_revenue > 0
)
SELECT
transaction_id,
CASE
WHEN TransactionNumber = 1 THEN '1st'
WHEN TransactionNumber = 2 THEN '2nd'
ELSE '3rd'
END AS type
FROM
RankedTransactions;
5. Calculate the retention for two cohorts:
Query description:
Query:
WITH UserCohort AS (
SELECT
user_id,
EXTRACT(MONTH FROM registration_date) AS registration_month,
date,
ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY date
) AS VisitNumber
FROM
`general`
WHERE
EXTRACT(YEAR FROM registration_date) = 2022
AND (EXTRACT(MONTH FROM registration_date) = 1 OR EXTRACT(MONTH FROM registration_date) = 2)
),
UserCount AS (
SELECT
registration_month,
COUNT(DISTINCT user_id) AS total_users,
COUNT(DISTINCT CASE WHEN VisitNumber >= 2 THEN user_id END) AS month_0,
COUNT(DISTINCT CASE WHEN EXTRACT(MONTH FROM date) = registration_month + 1 THEN user_id END) AS month_1,
COUNT(DISTINCT CASE WHEN EXTRACT(MONTH FROM date) = registration_month + 2 THEN user_id END) AS month_2
FROM
UserCohort
GROUP BY
registration_month
),
Retention AS (
SELECT
registration_month,
ROUND(month_0 * 100.0 / total_users, 0) AS month_0,
ROUND(month_1 * 100.0 / total_users, 0) AS month_1,
ROUND(month_2 * 100.0 / total_users, 0) AS month_2
FROM
UserCount
)
SELECT
CASE
WHEN registration_month = 1 THEN 'Январь 2022'
WHEN registration_month = 2 THEN 'Февраль 2022'
END AS registration_month,
month_0,
month_1,
month_2
FROM
Retention;