shaka-packager/include/packager/packager.h
Torbjörn Einarsson 19dbd203b0 fix: DVB-Teletext: heartbeat mechanism and segment alignment with video/audio (#1535)
I've finally generated test material and tests to make a PR to fix text
segment generation for MPEG-2 TS streams with sparse teletext data,
making a big improvement compared to the original teletext support in
#1344.


## Problem

When packaging MPEG-TS input containing DVB-Teletext subtitles for
DASH/HLS output, two fundamental issues arise:

1. **Sparse input handling** - Teletext streams only contain data when
subtitles are displayed. During gaps (which can span multiple segments),
no PES packets arrive, leaving the text chunker with no timing
information to drive segment generation. This results in missing
segments or segments with incorrect timing.

2. **Misaligned segment boundaries** - Even when segments are generated,
the text segment timestamps and boundaries differ from video/audio
segments. This causes `<SegmentTimeline>` mismatches in the MPD,
playback issues on some players, and sometimes fewer text segments than
video segments.

## Solution

This PR introduces two complementary mechanisms:

### 1. Heartbeat mechanism (sparse input handling)

The `Mp2tMediaParser` now sends periodic "heartbeat" signals to text
streams:

- Video PTS timestamps are forwarded to all text PIDs as
`MediaHeartBeat` samples
- `EsParserTeletext` emits `TextHeartBeat` samples when PES packets
arrive without displayable content
- `TextChunker` uses these heartbeats to drive segment generation even
during gaps in subtitle content
- Ongoing cues that span segment boundaries are properly split and
continued
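
As a rough illustration of the heartbeat idea, here is a minimal sketch
of a chunker whose clock is advanced by heartbeats as well as cues. The
`ToyTextChunker`, `Cue`, and `Segment` names are hypothetical and are not
the classes in this PR; the sketch also assumes timestamps arrive in
non-decreasing order and that segments are a fixed duration starting at 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the Shaka Packager types.
struct Cue {
  int64_t start;
  int64_t end;
  std::string text;
};

struct Segment {
  int64_t start;
  int64_t end;
  std::vector<Cue> cues;  // Empty when the segment falls in a subtitle gap.
};

class ToyTextChunker {
 public:
  explicit ToyTextChunker(int64_t segment_duration)
      : duration_(segment_duration) {}

  // A cue advances the clock and adds content.
  void OnCue(const Cue& cue) {
    AdvanceTo(cue.start);
    pending_.push_back(cue);
  }

  // A heartbeat advances the clock without adding content.
  void OnHeartbeat(int64_t pts) { AdvanceTo(pts); }

  const std::vector<Segment>& segments() const { return segments_; }

 private:
  // Closes every segment whose end is at or before `pts`.
  void AdvanceTo(int64_t pts) {
    while (pts >= segment_start_ + duration_) {
      const int64_t seg_end = segment_start_ + duration_;
      Segment seg{segment_start_, seg_end, {}};
      std::vector<Cue> carried;
      for (const Cue& cue : pending_) {
        // Clip the cue to the segment being closed.
        seg.cues.push_back({std::max(cue.start, segment_start_),
                            std::min(cue.end, seg_end), cue.text});
        // Continue the remainder of the cue in the next segment.
        if (cue.end > seg_end)
          carried.push_back({seg_end, cue.end, cue.text});
      }
      pending_ = std::move(carried);
      segments_.push_back(std::move(seg));
      segment_start_ = seg_end;
    }
  }

  int64_t duration_;
  int64_t segment_start_ = 0;
  std::vector<Cue> pending_;
  std::vector<Segment> segments_;
};
```

With a segment duration of 10, a cue at 1..3 followed only by a heartbeat
at 35 still closes three segments (two of them empty), which is the
behavior the heartbeats are meant to provide during gaps.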

A new `heartbeat_shift` stream descriptor parameter (default: 2 seconds,
i.e. 180000 ticks at 90 kHz) controls the timing offset between video PTS
and text segment generation, compensating for pipeline processing delays.
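
The default shift works out as follows. This is only a sketch of the
arithmetic: `SecondsToTicks` and `ShiftedHeartbeat` are hypothetical
helpers, and the direction of the shift (subtracting it from the video
PTS) is an assumption made for illustration, not something stated above.

```cpp
#include <cassert>
#include <cstdint>

// MPEG-2 TS presentation timestamps tick at 90 kHz.
constexpr int64_t kMpeg2Timescale = 90000;

// Converts a whole number of seconds to 90 kHz ticks.
constexpr int64_t SecondsToTicks(int64_t seconds) {
  return seconds * kMpeg2Timescale;
}

// Hypothetical helper. ASSUMPTION: the shift is subtracted from the video
// PTS, so the text heartbeat trails video by `shift_ticks`.
constexpr int64_t ShiftedHeartbeat(int64_t video_pts, int64_t shift_ticks) {
  return video_pts - shift_ticks;
}

// The default of 2 seconds corresponds to 180000 ticks.
static_assert(SecondsToTicks(2) == 180000, "2 s at 90 kHz is 180000 ticks");
```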

### 2. SegmentCoordinator (segment boundary alignment)

A new N-to-N media handler (`SegmentCoordinator`) ensures text segments
align precisely with video:

- Passes all streams through unchanged
- Replicates `SegmentInfo` from video/audio `ChunkingHandler` to
registered teletext streams
- `TextChunker` in "coordinator mode" uses received `SegmentInfo` events
to determine segment boundaries instead of calculating from text
timestamps

This guarantees identical segment timelines across all adaptation sets.
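
The replication step can be pictured with the toy model below. It is a
sketch under stated assumptions, not the handler added in this PR:
`ToySegmentCoordinator` and its methods are hypothetical, the
`SegmentInfo` struct is a simplified stand-in, and a single reference
stream is assumed to be the source of boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Simplified stand-in; not the Shaka Packager SegmentInfo.
struct SegmentInfo {
  int64_t start_timestamp;
  int64_t duration;
};

// Hypothetical N-to-N pass-through that copies segment boundaries from one
// reference stream (ASSUMED to be video) to registered text streams.
class ToySegmentCoordinator {
 public:
  ToySegmentCoordinator(std::size_t num_streams, std::size_t reference_stream)
      : reference_stream_(reference_stream), received_(num_streams) {}

  void RegisterTextStream(std::size_t stream_index) {
    text_streams_.insert(stream_index);
  }

  // Passes `info` through to its own stream unchanged. If it came from the
  // reference stream, also replicates it to every registered text stream.
  void OnSegmentInfo(std::size_t source_stream, const SegmentInfo& info) {
    received_.at(source_stream).push_back(info);
    if (source_stream != reference_stream_) return;
    for (std::size_t text_stream : text_streams_)
      received_.at(text_stream).push_back(info);
  }

  // Segment boundaries seen downstream of the given stream.
  const std::vector<SegmentInfo>& received(std::size_t stream_index) const {
    return received_.at(stream_index);
  }

 private:
  std::size_t reference_stream_;
  std::set<std::size_t> text_streams_;
  std::vector<std::vector<SegmentInfo>> received_;
};
```

In this model a text stream registered with the coordinator sees exactly
the boundaries of the reference stream, so a chunker downstream of it
could cut segments on those boundaries instead of on text timestamps.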

## Testing

- **Integration tests** in `packager_test.cc`:
  - `TeletextSegmentAlignmentTest.VideoAndTextSegmentsAligned` - Verifies
    segment count, start times, and durations match between video and text
  - `TeletextSegmentAlignmentTest.VideoAndTextSegmentsAlignedWithWrapAround` -
    Same verification with PTS timestamps near the 33-bit wrap-around
    point (~26.5 hours)

- **Test files** (synthetic teletext with known cue timings at 1.0s,
3.5s, 13.0s):
  - `test_teletext_live.ts` - Normal PTS range
  - `test_teletext_live_wrap.ts` - PTS near wrap-around boundary

- **Unit tests** for `SegmentCoordinator` and updated `TextChunker`
tests
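
For reference, the ~26.5 hour figure follows from a 33-bit PTS counter
ticking at 90 kHz. The sketch below shows that arithmetic and one common
way to extend wrapped timestamps onto a monotonic timeline. `UnwrapPts`
is a hypothetical helper, not code from this PR; it assumes the previous
timestamp is non-negative and that consecutive timestamps are less than
half a wrap period apart.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kPtsModulus = int64_t{1} << 33;  // PTS is a 33-bit counter.
constexpr int64_t kTimescale = 90000;              // 90 kHz clock.

// 2^33 / 90000 is about 95443.7 seconds, i.e. about 26.5 hours.
constexpr double kWrapHours =
    static_cast<double>(kPtsModulus) / kTimescale / 3600.0;

// Hypothetical helper: maps a wrapped 33-bit PTS onto the 64-bit timeline
// of `previous_unwrapped` by choosing the nearest equivalent timestamp.
int64_t UnwrapPts(int64_t previous_unwrapped, int64_t pts33) {
  int64_t delta = pts33 - previous_unwrapped % kPtsModulus;
  if (delta < -kPtsModulus / 2) {
    delta += kPtsModulus;  // The counter wrapped forward.
  } else if (delta >= kPtsModulus / 2) {
    delta -= kPtsModulus;  // A slightly older timestamp from before a wrap.
  }
  return previous_unwrapped + delta;
}
```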

## Documentation

- Extended `docs/source/tutorials/text.rst` with DVB-Teletext section
covering:
  - Page numbering (3-digit cc_index format)
  - Heartbeat mechanism explanation
  - Segment alignment behavior
  - `--ts_ttx_heartbeat_shift` parameter tuning
  - Troubleshooting guide

- Added teletext processing pipeline diagram to `docs/source/design.rst`

## Future work

The heartbeat and `SegmentCoordinator` mechanisms would likely benefit
**DVB-SUB (bitmap subtitles)** as well (Issue #1477), which faces
similar challenges with sparse subtitle data in MPEG-TS input and
segment alignment. The infrastructure is now in place to extend this
support.

## Example usage

```bash
packager \
  --segment_duration 6 \
  --mpd_output manifest.mpd \
  'in=input.ts,stream=video,init_segment=video/init.mp4,segment_template=video/$Number$.m4s' \
  'in=input.ts,stream=audio,init_segment=audio/init.mp4,segment_template=audio/$Number$.m4s' \
  'in=input.ts,stream=text,cc_index=888,lang=en,init_segment=text/init.mp4,segment_template=text/$Number$.m4s,dash_only=1'
```


Fixes #1428
Fixes #1401
Fixes #1355
Fixes #1430
2026-03-11 16:06:30 -07:00


// Copyright 2017 Google LLC. All rights reserved.
//
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file or at
// https://developers.google.com/open-source/licenses/bsd
#ifndef PACKAGER_PUBLIC_PACKAGER_H_
#define PACKAGER_PUBLIC_PACKAGER_H_
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <vector>
#include <packager/ad_cue_generator_params.h>
#include <packager/buffer_callback_params.h>
#include <packager/cea_caption.h>
#include <packager/chunking_params.h>
#include <packager/crypto_params.h>
#include <packager/export.h>
#include <packager/file.h>
#include <packager/hls_params.h>
#include <packager/mp4_output_params.h>
#include <packager/mpd_params.h>
#include <packager/status.h>
namespace shaka {
/// Parameters used for testing.
struct TestParams {
  /// Whether to dump input stream info.
  bool dump_stream_info = false;
  /// Inject a fake clock which always returns 0. This allows deterministic
  /// output from packaging.
  bool inject_fake_clock = false;
  /// Inject and replace the library version string if specified, which is used
  /// to populate the version string in the manifests / media files.
  std::string injected_library_version;
};
/// Packaging parameters.
struct PackagingParams {
  /// Specify temporary directory for intermediate temporary files.
  std::string temp_dir;
  /// MP4 (ISO-BMFF) output related parameters.
  Mp4OutputParams mp4_output_params;
  /// The offset to be applied to transport stream (e.g. MPEG2-TS, HLS packed
  /// audio) timestamps to compensate for possible negative timestamps in the
  /// input.
  int32_t transport_stream_timestamp_offset_ms = 0;
  /// The threshold used to determine if we should assume that the text stream
  /// actually starts at time zero.
  int32_t default_text_zero_bias_ms = 0;
  /// Chunking (segmentation) related parameters.
  ChunkingParams chunking_params;
  /// Out of band cuepoint parameters.
  AdCueGeneratorParams ad_cue_generator_params;
  /// Create a human readable format of MediaInfo. The output file name will be
  /// the name specified by output flag, suffixed with `.media_info`.
  bool output_media_info = false;
  /// Only use a single thread to generate output. This is useful in tests to
  /// avoid non-deterministic outputs.
  bool single_threaded = false;
  /// DASH MPD related parameters.
  MpdParams mpd_params;
  /// HLS related parameters.
  HlsParams hls_params;
  /// Encryption and Decryption Parameters.
  EncryptionParams encryption_params;
  DecryptionParams decryption_params;
  /// Buffer callback params.
  BufferCallbackParams buffer_callback_params;
  /// CEA-608 / CEA-708 captions.
  std::vector<CeaCaption> closed_captions;
  /// Parameters for testing. Do not use in production.
  TestParams test_params;
};
/// Defines a single input/output stream.
struct StreamDescriptor {
  /// Index of the stream, used to enforce ordering.
  std::optional<uint32_t> index;
  /// Input/source media file path or network stream URL. Required.
  std::string input;
  /// Stream selector, can be `audio`, `video`, `text` or a zero-based stream
  /// index. Required.
  std::string stream_selector;
  /// Specifies output file path or init segment path (if segment template is
  /// specified). Can be empty for self-initializing media segments.
  std::string output;
  /// Specifies segment template. Can be empty.
  std::string segment_template;
  /// Optional value which specifies output container format, e.g. "mp4". If not
  /// specified, will detect from output / segment template name.
  std::string output_format;
  /// If set to true, the stream will not be encrypted. This is useful, e.g. to
  /// encrypt only video streams.
  bool skip_encryption = false;
  /// Specifies a custom DRM stream label, which can be a DRM label defined by
  /// the DRM system. Typical values include AUDIO, SD, HD, UHD1, UHD2. If not
  /// provided, the DRM stream label is derived from stream type (video, audio),
  /// resolutions etc.
  std::string drm_label;
  /// If set to a non-zero value, will generate a trick play / trick mode
  /// stream with frames sampled from the key frames in the original stream.
  /// `trick_play_factor` defines the sampling rate.
  uint32_t trick_play_factor = 0;
  /// Optional user-specified content bit rate for the stream, in bits/sec.
  /// If specified, this value is propagated to the `$Bandwidth$` template
  /// parameter for segment names. If not specified, its value may be estimated.
  uint32_t bandwidth = 0;
  /// Optional value which contains a user-specified language tag. If specified,
  /// this value overrides any language metadata in the input stream.
  std::string language;
  /// Optional value for the index of the sub-stream to use. For some text
  /// formats, there are multiple "channels" in a single stream. This allows
  /// selecting only one channel.
  int32_t cc_index = -1;
  /// Required for audio when outputting HLS. It defines the name of the output
  /// stream, which is not necessarily the same as output. This is used as the
  /// `NAME` attribute for EXT-X-MEDIA.
  std::string hls_name;
  /// Required for audio when outputting HLS. It defines the group ID for the
  /// output stream. This is used as the GROUP-ID attribute for EXT-X-MEDIA.
  std::string hls_group_id;
  /// Required for HLS output. It defines the name of the playlist for the
  /// stream. Usually ends with `.m3u8`.
  std::string hls_playlist_name;
  /// Optional for HLS output. It defines the name of the I-Frames only playlist
  /// for the stream. For video only. Usually ends with `.m3u8`.
  std::string hls_iframe_playlist_name;
  /// Optional for HLS output. It defines the CHARACTERISTICS attribute of the
  /// stream.
  std::vector<std::string> hls_characteristics;
  /// Optional for DASH output. It defines Accessibility elements of the stream.
  std::vector<std::string> dash_accessiblities;
  /// Optional for DASH output. It defines Role elements of the stream.
  std::vector<std::string> dash_roles;
  /// Set to true to indicate that the stream is for DASH only.
  bool dash_only = false;
  /// Set to true to indicate that the stream is for HLS only.
  bool hls_only = false;
  /// Optional value which specifies input container format.
  /// Useful for live streaming situations, like auto-detecting webvtt without
  /// its initial header.
  std::string input_format;
  /// Optional, indicates if this is a Forced Narrative subtitle stream.
  bool forced_subtitle = false;
  /// Optional for DASH output. It defines the Label element in Adaptation Set.
  std::string dash_label;
};
class SHAKA_EXPORT Packager {
 public:
  Packager();
  ~Packager();
  /// Initialize packaging pipeline.
  /// @param packaging_params contains the packaging parameters.
  /// @param stream_descriptors a list of stream descriptors.
  /// @return OK on success, an appropriate error code on failure.
  Status Initialize(const PackagingParams& packaging_params,
                    const std::vector<StreamDescriptor>& stream_descriptors);
  /// Run the pipeline until it completes, fails, or is cancelled. Note that
  /// this call blocks until then.
  /// @return OK on success, an appropriate error code on failure.
  Status Run();
  /// Cancel packaging. Note that it has to be called from another thread.
  void Cancel();
  /// @return The version of the library.
  static std::string GetLibraryVersion();
  /// Default stream label function implementation.
  /// @param max_sd_pixels The threshold to determine whether a video track
  ///                      should be considered as SD. If the max pixels per
  ///                      frame is no higher than max_sd_pixels, i.e. [0,
  ///                      max_sd_pixels], it is SD.
  /// @param max_hd_pixels The threshold to determine whether a video track
  ///                      should be considered as HD. If the max pixels per
  ///                      frame is higher than max_sd_pixels, but no higher
  ///                      than max_hd_pixels, i.e. (max_sd_pixels,
  ///                      max_hd_pixels], it is HD.
  /// @param max_uhd1_pixels The threshold to determine whether a video track
  ///                        should be considered as UHD1. If the max pixels
  ///                        per frame is higher than max_hd_pixels, but no
  ///                        higher than max_uhd1_pixels, i.e. (max_hd_pixels,
  ///                        max_uhd1_pixels], it is UHD1. Otherwise it is
  ///                        UHD2.
  /// @param stream_attributes Attributes of the encrypted stream.
  /// @return the stream label associated with `stream_attributes`. Can be
  ///         "AUDIO", "SD", "HD", "UHD1" or "UHD2".
  static std::string DefaultStreamLabelFunction(
      int max_sd_pixels,
      int max_hd_pixels,
      int max_uhd1_pixels,
      const EncryptionParams::EncryptedStreamAttributes& stream_attributes);

 private:
  Packager(const Packager&) = delete;
  Packager& operator=(const Packager&) = delete;
  struct PackagerInternal;
  std::unique_ptr<PackagerInternal> internal_;
};
} // namespace shaka
#endif // PACKAGER_PUBLIC_PACKAGER_H_