Ethical YouTube Analytics Pipeline

Audience: South Texas College – Networking & Cybersecurity Students

Motivation: Gain a competitive edge in a YouTube analytics competition by building a private, opt‑in data collection and analysis system.


1. Project Overview

This project demonstrates how to ethically collect and analyze your own YouTube search traffic using opt‑in DNS routing, a Python analytics node, and a local SQLite database.

The goal is to simulate a real‑world analytics pipeline without intercepting or modifying any third‑party traffic. All clients in this design voluntarily opt in by pointing their DNS to the analytics node.


2. Ethical Network Map (Opt‑In DNS Routing)

Below is the network topology used in the lab. Clients intentionally configure their DNS to route youtube.com lookups to a local analytics server.

Client A (Laptop) ----\
                       \
Client B (Phone) --------> [ Local DNS Resolver ]
Client C (Desktop) ----/             |
                                     |
                                     v
                           [ Analytics Node ]
                           IP: 192.168.1.50
                           DNS Entry: youtube.com → 192.168.1.50
                                     |
                                     v
                           [ Outbound Gateway ]
                                     |
                                     v
                           [ Spectrum ISP → Internet ]

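This setup allows the analytics node to receive the opted‑in clients' youtube.com requests, log their search terms, and forward the requests to the real YouTube servers.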

No traffic is intercepted without consent. This is a controlled, ethical, educational environment.


3. DNS Configuration (Opt‑In Only)

Each client manually sets DNS to:

Primary DNS:   192.168.1.50
Secondary DNS: 1.1.1.1 (fallback)

The analytics node runs a lightweight DNS server (e.g., dnslib in Python) that resolves:

youtube.com → 192.168.1.50

All other domains are forwarded upstream to Spectrum’s DNS or Cloudflare.
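The override-or-forward decision described above can be sketched as a small lookup function. This is only an illustration of the rule, not the DNS server itself: the table name OVERRIDES and the function resolve_override are hypothetical, and wiring the result into dnslib (or any other DNS library) is left out. It assumes exact-name matching, so a subdomain such as www.youtube.com would need its own entry.

```python
from typing import Optional

# Hypothetical override table; the address is the analytics node from the lab map.
OVERRIDES = {"youtube.com": "192.168.1.50"}


def resolve_override(qname: str) -> Optional[str]:
    """Return the local address for an overridden name, or None to forward upstream."""
    # DNS names are case-insensitive and may arrive with a trailing root dot.
    name = qname.rstrip(".").lower()
    return OVERRIDES.get(name)


print(resolve_override("YouTube.com."))  # 192.168.1.50
print(resolve_override("example.org"))   # None
```

A None result is the signal to pass the query to the upstream resolver unchanged.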


4. Python Analytics Node Workflow

The analytics node performs four tasks:

  1. Accept HTTP requests for youtube.com/results?search_query=...
  2. Extract the search term
  3. Store it in a local SQLite database
  4. Forward the request to the real YouTube servers
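Step 2 above, extracting the search term, might look like the following sketch. It uses only the standard library; the function name extract_search_query is an assumption made for illustration, and the sketch covers only parsing of the request target, not receiving or forwarding the request.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_search_query(request_target: str) -> Optional[str]:
    """Return the search term from a /results?search_query=... target, else None."""
    parts = urlsplit(request_target)
    if parts.path != "/results":
        return None
    # parse_qs decodes percent-escapes and turns '+' into spaces.
    values = parse_qs(parts.query).get("search_query")
    return values[0] if values else None


print(extract_search_query("/results?search_query=subnetting+basics"))  # subnetting basics
print(extract_search_query("/watch?v=abc"))                             # None
```

Returning None for anything that is not a search request lets the caller skip logging and go straight to forwarding.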

Database location:

C:\Temp\projectYouTubeAnalytics\DB\searches.db

5. SQLite Schema

CREATE TABLE IF NOT EXISTS youtube_searches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    client_ip TEXT,
    query TEXT,
    timestamp TEXT
);
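As a rough sketch of how a row could be written against this schema, the snippet below creates the table and inserts one search. The helper name log_search, the sample client address, and the choice of a UTC ISO 8601 string for the timestamp column are assumptions; an in-memory database is used here so the example runs anywhere, whereas the lab would open the searches.db file instead.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS youtube_searches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    client_ip TEXT,
    query TEXT,
    timestamp TEXT
);
"""


def log_search(conn: sqlite3.Connection, client_ip: str, query: str) -> int:
    """Insert one search record and return its row id."""
    # Assumption: timestamps are stored as UTC ISO 8601 text.
    ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO youtube_searches (client_ip, query, timestamp) VALUES (?, ?, ?)",
            (client_ip, query, ts),
        )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")  # the lab would pass the searches.db path here
conn.execute(SCHEMA)
row_id = log_search(conn, "192.168.1.21", "subnetting basics")  # hypothetical client
print(conn.execute(
    "SELECT client_ip, query FROM youtube_searches WHERE id = ?", (row_id,)
).fetchone())  # ('192.168.1.21', 'subnetting basics')
```

The parameterized INSERT keeps search terms containing quotes or other SQL metacharacters from breaking the statement.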

6. Python Script Outline

This is the high‑level structure of the analytics script:

1. Start a local DNS server
   - Resolve youtube.com → analytics node
   - Forward all other domains

2. Start a local HTTP proxy
   - Listen for GET requests to /results?search_query=
   - Extract the search term
   - Log to SQLite

3. Forward the request to the real YouTube server
   - Return the response to the client

4. Store analytics
   - client_ip
   - search term
   - timestamp
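Steps 2 through 4 of the outline could be wired together roughly as follows, using the standard library's http.server. This is a sketch under several assumptions: the names SearchLogHandler, forward_upstream, and make_server are hypothetical, the database is in-memory, the handler speaks plain HTTP only (no TLS handling), and forwarding is a placeholder that returns a fixed body rather than contacting YouTube.

```python
import sqlite3
import threading
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlsplit

DB_LOCK = threading.Lock()
# In-memory database for the sketch; shared across handler threads under DB_LOCK.
DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute(
    "CREATE TABLE IF NOT EXISTS youtube_searches ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, client_ip TEXT, query TEXT, timestamp TEXT)"
)


def forward_upstream(path: str) -> bytes:
    """Hypothetical placeholder: a real node would relay the request upstream here."""
    return b"forwarding not implemented in this sketch\n"


class SearchLogHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parts = urlsplit(self.path)
        terms = parse_qs(parts.query).get("search_query")
        # Log only search requests; everything else is passed along unrecorded.
        if parts.path == "/results" and terms:
            with DB_LOCK, DB:
                DB.execute(
                    "INSERT INTO youtube_searches (client_ip, query, timestamp) "
                    "VALUES (?, ?, ?)",
                    (self.client_address[0], terms[0],
                     datetime.now(timezone.utc).isoformat(timespec="seconds")),
                )
        body = forward_upstream(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request stderr logging


def make_server(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Build the server; port 0 asks the OS for any free port."""
    return ThreadingHTTPServer((host, port), SearchLogHandler)
```

Calling make_server().serve_forever() starts the listener. The row is written before the response is sent, so a client that has received a reply can rely on its search already being stored.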

This creates a complete, ethical analytics pipeline suitable for competition use.


7. Why This Gives a Competitive Edge

All without violating privacy, laws, or ISP policies.


8. Extensions