Technical Case Study · Marathon Help System · Khojant LLC

From 900MB of Silent Video to a Fully Searchable Help System

How one automation project eliminated 95% of information-retrieval time, compressed 900MB of video knowledge into lightweight HTML, and made every tutorial step discoverable in under 5 seconds.

97 Video Tutorials Converted · Python & Batch Automation · 100% Full-Text Searchable · Jeff Hojka, Khojant LLC
Jeff Hojka
// FOUNDER · SOFTWARE ENGINEER · CLOUD ARCHITECT
95% Faster Retrieval
90% Lighter Footprint
100% Keyword Searchable
97 Tutorials Converted

// Section 01

The Challenge — A Library That Couldn’t Be Read

Over time, a library of 97 screen-recorded tutorials (~900MB) had accumulated in a local video folder. On paper, this represented a comprehensive knowledge base covering dozens of workflows, tools, and step-by-step procedures. In practice, the library was functionally inaccessible.

Video files are opaque containers. Operating-system search, AI assistants, and keyword queries can match a file's name and metadata, but not what is shown on screen. Every time a team member needed to locate a specific instruction or revisit a process step, the only option was to scrub through timelines by hand, which was slow, frustrating, and error-prone.

  • Finding a single step required ~3 minutes of manual video scrubbing per search — with no guarantee of success on the first attempt.
  • Video content was a black box — completely invisible to desktop search tools, browser find functions, and AI-powered assistants.
  • Knowledge was effectively siloed inside the video format, making cross-referencing, sharing, or updating individual steps laborious.
  • With 97 videos, the cumulative cost of information retrieval across a team compounded quickly into a significant hidden productivity drain.
“The knowledge existed — it just couldn’t be found. The video library had become a write-only archive: information went in, but retrieving it cost more time than it saved.”

// Section 02

The Solution — Automated Extraction Pipeline

Manually transcribing 97 videos was estimated to be a multi-day effort. Instead, a custom automation pipeline built from Python and Windows Batch scripts converted the entire library systematically. The pipeline extracted the meaningful content from each video and rendered it as structured, responsive HTML pages, culminating in a unified help system accessible from a single entry point.

The design philosophy was deliberate: the output had to be fast to load, easy to navigate, and fully indexable by any search tool — human or machine. No heavyweight frameworks, no databases, no cloud dependencies. Pure, portable, lightweight HTML.
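The case study does not show the generated markup, so the following is only a minimal sketch of what a self-contained, framework-free tutorial page could look like when produced with the Python standard library. The `Step` structure, the page template, and the `render_tutorial` function are hypothetical names introduced for illustration; the actual pipeline may structure its output differently.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Step:
    """One extracted tutorial step (hypothetical structure)."""
    number: int
    text: str


# Single-file page: inline CSS, no scripts, no external assets.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>{title}</title>
<style>
body {{ font-family: sans-serif; max-width: 40em; margin: auto; padding: 1em; }}
</style>
</head>
<body>
<h1>{title}</h1>
<ol>
{items}
</ol>
<p><a href="tutorials.html">Back to all tutorials</a></p>
</body>
</html>
"""


def render_tutorial(title: str, steps: list[Step]) -> str:
    """Return a self-contained HTML page for one tutorial."""
    # Escape extracted text so characters like < and & cannot break the markup.
    items = "\n".join(
        f'<li value="{s.number}">{escape(s.text)}</li>' for s in steps
    )
    return PAGE_TEMPLATE.format(title=escape(title), items=items)


page = render_tutorial(
    "Export <Report>",
    [Step(1, "Click File & Save"), Step(2, "Choose PDF")],
)
```

Because every step ends up as plain text inside an ordinary list item, the page is readable by browser find, desktop search, and any tool that can parse HTML, which is the property the design philosophy above is aiming for.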

🐍 Python Extraction Engine
Core script to process each video file, extract content, and generate structured HTML output per tutorial.
⚡ Batch Automation Layer
Windows Batch orchestrator to iterate across all 97 source videos and invoke the Python pipeline without manual intervention.
🌐 Responsive HTML Output
Each tutorial rendered as a self-contained, mobile-friendly HTML page with consistent navigation structure.
🔍 Unified Search Index
Master tutorials.html hub with full-text keyword search across all 97 converted pages — no backend required.
The entire conversion process — a task that would have required days of manual transcription — was fully automated, repeatable, and extensible to any future video additions.
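How the tutorials.html hub implements its backend-free search is not described here. One common approach, sketched below under that assumption, is to precompute an inverted index at build time and ship it as JSON alongside the hub page. The page filenames, the `build_index` and `search` functions, and the AND-matching rule are all illustrative choices rather than the project's confirmed design.

```python
import json
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return TOKEN.findall(text.lower())


def build_index(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each word to the sorted list of pages that contain it."""
    index = defaultdict(set)
    for filename, text in pages.items():
        for word in tokenize(text):
            index[word].add(filename)
    return {word: sorted(files) for word, files in index.items()}


def search(index: dict[str, list[str]], query: str) -> list[str]:
    """Return the pages containing every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Hypothetical page text standing in for two converted tutorials.
pages = {
    "export-report.html": "Open the File menu and choose Export as PDF",
    "import-data.html": "Open the File menu and choose Import",
}
index = build_index(pages)
index_json = json.dumps(index)  # could be saved next to tutorials.html

print(search(index, "file export"))  # ['export-report.html']
```

Rebuilding the index is a pure function of the generated pages, so adding a new video would only require rerunning the conversion and regenerating one JSON file.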

// Section 03

The Impact — Measured, Meaningful, Immediate

The results of this conversion project are quantifiable across three distinct dimensions of operational efficiency: speed of retrieval, storage footprint, and searchability coverage.

95% FASTER
Information Retrieval Speed
Locating a specific step or process within the library dropped from an average of ~3 minutes of manual video scrubbing to under 5 seconds with a keyword search. For a library of 97 tutorials used repeatedly across workflows, this represents an outsized cumulative time saving — especially as team size and usage frequency scale.
Before: ~3 min scrubbing · After: <5 sec keyword search
90% LIGHTER
Storage Efficiency & Information Weight
The original video library occupied approximately 900MB of storage. The converted HTML help system delivers the same instructional content in roughly a tenth of that space, a ~90% reduction in information weight. Lightweight HTML pages load near-instantly, require no media player, and need far less bandwidth than video when served over a network or from the cloud.
Before: 900MB of video files · After: ~90MB of HTML (approx.)
100% INDEXED
Full Search Discoverability
The contents of a video file are invisible to search tools: OS search, browser Ctrl+F, and AI assistants can see the filename but cannot read what happens on screen. The converted HTML system achieves 100% keyword discoverability. Every step, label, and instruction is now plain text that can be indexed and surfaced by internal search, browser find functions, and AI-powered tools. The knowledge base has shifted from a black box to a glass box.
Before: 0% searchable (black box) · After: 100% indexed & discoverable
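The before and after figures quoted in the cards above can be sanity-checked with a few lines of arithmetic. The sketch below uses the stated values (~3 minutes versus 5 seconds, 900MB versus ~90MB); the `SEARCHES_PER_DAY` value is a purely hypothetical usage level added to illustrate the cumulative saving, not a number from the project.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) * 100 / before


# Retrieval time in seconds: ~3 minutes of scrubbing vs. a 5-second search.
retrieval = percent_reduction(before=180, after=5)   # about 97.2

# Storage in MB: the video library vs. the approximate HTML footprint.
storage = percent_reduction(before=900, after=90)    # 90.0

# Hypothetical usage level, NOT taken from the case study.
SEARCHES_PER_DAY = 20
saved_minutes_per_day = SEARCHES_PER_DAY * (180 - 5) / 60  # about 58.3

print(f"Retrieval time reduced by {retrieval:.1f}%")
print(f"Storage reduced by {storage:.1f}%")
print(f"Minutes saved per day at {SEARCHES_PER_DAY} searches: "
      f"{saved_minutes_per_day:.1f}")
```

With a 5-second upper bound on search time, the computed reduction is at least 97%, so the 95% headline figure reads as a conservative rounding rather than an overstatement.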
Beyond raw metrics, the project fundamentally changed the relationship between the team and their knowledge base. Content that was once trapped is now active — findable, shareable, and ready to be surfaced by the next generation of AI-assisted workflows.