Content Manager
Interface

Brands

Get all brands

/brands

Get detail of brand by id

/brands?id={brand_id}

Shows

Get all shows across brands

/shows

Get specific show

/shows?id={show_id}&brand={brand_id}

Get shows for brand

/shows?brand={brand_id}

Get featured shows for brand

/shows?featured=true&brand={brand_id}

Get shows for platform

/shows?brand=cc&platform=ios

Episodes

Get specific episode

/episodes?id={episode_id}&brand={brand_id}

Get episodes for brand

/episodes?brand={brand_id}

Get featured episodes for brand

/episodes?featured=true&brand={brand_id}

Get episodes for platform

/episodes?brand=cc&platform=ios

Seasons

Get seasons for a show

/seasons?show={show_id}&brand={brand_id}

Get a season for a show

/seasons?id={season_id}
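The endpoints above can be exercised from any HTTP client. Here is a minimal sketch in Python of composing query URLs for these resources; the host and port are placeholders, not the real deployment address:

```python
from urllib.parse import urlencode

# Hypothetical base URL -- substitute the real Content Manager host.
BASE_URL = "http://localhost:5000"

def build_query(resource, **params):
    """Compose a Content Manager query URL from a resource path and filters."""
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{resource}?{query}" if query else f"{BASE_URL}/{resource}"

# Fetch all featured shows for the Comedy Central brand on iOS:
url = build_query("shows", brand="cc", featured="true", platform="ios")
# -> http://localhost:5000/shows?brand=cc&featured=true&platform=ios
```

The same helper covers every resource listed above, e.g. `build_query("seasons", show=show_id, brand=brand_id)`.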

Sorting

/shows?sort_by={attribute1},{attribute2}

Sort order

Items can be sorted in ascending and descending order. To sort in ascending order, just specify the sort_by parameter. To sort in descending order, prefix the parameter with -. The following parameters are supported:

  • publish_date
  • title
  • episode.publish_date (only for /shows resource)
  • featured

If no sort_by criteria are specified, the following default sort order is used:

/shows?sort_by=-publish_date,title
/episodes?sort_by=-publish_date,title
/seasons?sort_by=-publish_date,title

Ascending:

/shows?sort_by=publish_date

Descending:

/shows?sort_by=-publish_date

Pagination

Offset

/shows?offset=1

Limit

/shows?limit=5
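Offset and limit combine with the metadata block returned in every response (see the sample response below). A sketch of the slicing, assuming a default limit of 25 (the real default may differ):

```python
def paginate(items, offset=0, limit=25):
    """Apply offset/limit and return the response's metadata shape.
    The default limit of 25 is an assumption, not documented behavior."""
    page = items[offset:offset + limit]
    return {
        "data": page,
        "metadata": {
            "limit": limit,
            "offset": offset,
            "total_count": len(items),
            "item_count": len(page),
        },
    }
```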

Field Mappings

The following are mappings from the data sources to Content Manager's internal fields:

<source> -> <Content Manager field>

Brand

Fields

  • id -> id
  • logos -> logos
    • url -> url
    • usage -> usage
  • name -> name
  • platforms -> platforms
    • link -> link
    • name -> name

Show

Fields

  • description -> description
  • featured -> featured
  • id -> id
  • originalAirDate -> originalAirDate
  • publishedTo_dt -> publish_date
  • rating -> rating
  • ratingType -> ratingType

Relationships

  • images -> image model
    • aspectRatio -> aspectRatio
    • url -> url
    • width -> width
    • height -> height
  • platforms -> platform model
    • authRequired -> authRequired
    • downloadLink -> downloadLink
    • contentLink -> contentLink
    • startDate -> startDate
    • endDate -> endDate
    • platform -> platform
  • brand.id -> brand model

Season

Fields

  • id -> id
  • title -> title
  • genre_s -> genre
  • publishedTo_dt -> publish_date
  • rating -> rating
  • ratingType -> ratingType
  • seasonNumber_i -> seasonNumber

Relationships

  • images -> image model
    • image_uri_s_mv -> url
    • image_aspectRatio_s_mv -> aspectRatio
  • brand_id_s -> brand model
  • series_id_s -> show model

Episode

Fields

  • description -> description
  • shortDescription -> shortDescription
  • duration -> duration
  • episodeNumber_i -> episodeNumber
  • id -> id
  • originalAirDate -> originalAirDate
  • publishedTo_dt -> publish_date
  • rating -> rating
  • ratingType -> ratingType
  • title -> title

Relationships

  • images -> image model
    • aspectRatio -> aspectRatio
    • url -> url
    • width -> width
    • height -> height
  • platforms -> platform model
    • authRequired -> authRequired
    • downloadLink -> downloadLink
    • contentLink -> contentLink
    • startDate -> startDate
    • endDate -> endDate
    • platform -> platform
  • brand.id -> brand model
  • show.id -> show model
  • season.id -> season model
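Applying a mapping amounts to renaming source-feed keys to their internal names. A sketch using the Episode field table above (the `map_fields` helper is illustrative, not part of the codebase):

```python
# Source-field -> Content Manager field mappings for the Episode type,
# transcribed from the table above.
EPISODE_FIELD_MAP = {
    "description": "description",
    "shortDescription": "shortDescription",
    "duration": "duration",
    "episodeNumber_i": "episodeNumber",
    "id": "id",
    "originalAirDate": "originalAirDate",
    "publishedTo_dt": "publish_date",
    "rating": "rating",
    "ratingType": "ratingType",
    "title": "title",
}

def map_fields(source, field_map):
    """Rename source-feed fields to their internal names, dropping unmapped keys."""
    return {field_map[k]: v for k, v in source.items() if k in field_map}

episode = map_fields(
    {"id": "abc", "publishedTo_dt": "2016-05-01", "ignored": 1},
    EPISODE_FIELD_MAP,
)
# episode == {"id": "abc", "publish_date": "2016-05-01"}
```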

Response

Sample response

{
  "data": [
    {
      "id": "28193daa-ecfd-11e0-aca6-0026b9414f30",
      "featured": -1,
      "platforms": [
        {
          "contentLink": "ccnetworkapp://series/28193daa-ecfd-11e0-aca6-0026b9414f30",
          "authRequired": "false",
          "downloadLink": "https://itunes.apple.com/us/app/comedy-central/id799551807?mt=8",
          "platform": "ios",
          "endDate": "9999-12-30T19:00:00Z",
          "startDate": "2015-03-23T10:58:00Z"
        },
        {
          "contentLink": "ccnetworkapp://series/28193daa-ecfd-11e0-aca6-0026b9414f30",
          "authRequired": "false",
          "downloadLink": "https://play.google.com/store/apps/details?id=com.vmn.android.comedycentral",
          "platform": "android",
          "endDate": "9999-12-30T19:00:00Z",
          "startDate": "2015-03-23T10:58:00Z"
        },
        {
          "contentLink": "entity=series&amp;mgid=mgid:arc:series:comedycentral.com:28193daa-ecfd-11e0-aca6-0026b9414f30&amp;autoplay=false",
          "startDate": "2015-03-23T10:58:00Z",
          "authRequired": "false",
          "endDate": "9999-12-30T19:00:00Z",
          "platform": "roku"
        },
        {
          "contentLink": "http://www.cc.com/shows/louis-c-k---hilarious",
          "startDate": "2015-03-23T10:58:00Z",
          "authRequired": "false",
          "endDate": "9999-12-30T19:00:00Z",
          "platform": "web"
        }
      ],
      "description": "Louis C.K. takes an unflinching look at life from the perspective of a fortysomething divorced dad who has to confront dating, child care and dumb guys in coffee shops. Along the way, he offers sharp critiques of cell phones, language and his own sexual history."
      "related": {
        "brands_count": 1,
        "episodes_count": 0
        },
    }
  ],
  "metadata": {
    "limit": "1",
    "offset": 0,
    "total_count": 265,
    "item_count": 1
  }
}

Schema

Brand

  • id (PK)
  • complete (Boolean)
  • data (JSONB)
  • M:M shows
  • M:M episodes
  • 1:M brandShows
  • 1:M brandEpisodes

Show

  • id (PK)
  • brand (PK)
  • complete (Boolean)
  • data (JSONB)
  • M:M brands
  • M:M platforms
  • 1:M brandShows
  • 1:M showPlatforms

Episode

  • id (PK)
  • complete (Boolean)
  • data (JSONB)
  • M:M brands
  • M:M shows
  • M:M platforms
  • 1:M brandEpisodes
  • 1:M showEpisodes
  • 1:M episodePlatforms

BrandShow

  • brand_id (FK)
  • show_id (FK)
  • show_brand_id (FK)
  • M:1 brand
  • M:1 show
  • 1:M brandShows

BrandEpisode

  • brand_id (FK)
  • episode_id (FK)
  • auth_required (Boolean)
  • start_date (Date)
  • end_date (Date)
  • M:1 brand
  • M:1 episode
  • 1:M brandEpisodes

Platform

  • id (PK)
  • M:M shows
  • 1:M showPlatforms

ShowPlatform

  • platform_id (FK)
  • show_id (FK)
  • show_brand_id (FK)
  • auth_required (Boolean)
  • start_date (Date)
  • end_date (Date)
  • M:1 show
  • M:1 platform
  • 1:M showPlatforms

EpisodePlatform

  • platform_id (FK)
  • episode_id (FK)
  • auth_required (Boolean)
  • start_date (Date)
  • end_date (Date)
  • M:1 episode
  • M:1 platform
  • 1:M episodePlatforms

ShowEpisode

  • show_id (FK)
  • brand_id (FK)
  • episode_id (FK)
  • M:1 show
  • M:1 episode
  • 1:M showEpisodes
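The M:M relationships above resolve through the join tables. A minimal sketch of the brand/episode relationship, using SQLite for illustration only (the service uses PostgreSQL, and the real column types differ):

```python
import sqlite3

# Sketch of the BrandEpisode join table from the schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brand   (id TEXT PRIMARY KEY, complete BOOLEAN, data TEXT);
CREATE TABLE episode (id TEXT PRIMARY KEY, complete BOOLEAN, data TEXT);
CREATE TABLE brand_episode (
    brand_id      TEXT REFERENCES brand(id),
    episode_id    TEXT REFERENCES episode(id),
    auth_required BOOLEAN,
    start_date    TEXT,
    end_date      TEXT,
    PRIMARY KEY (brand_id, episode_id)
);
""")
conn.execute("INSERT INTO brand VALUES ('cc', 1, '{}')")
conn.execute("INSERT INTO episode VALUES ('ep1', 1, '{}')")
conn.execute("INSERT INTO brand_episode VALUES ('cc', 'ep1', 0, '2015-01-01', '9999-12-30')")

# "Get episodes for brand" resolves through the join table:
rows = conn.execute(
    "SELECT e.id FROM episode e "
    "JOIN brand_episode be ON be.episode_id = e.id "
    "WHERE be.brand_id = ?", ("cc",)
).fetchall()
```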

Content Manager algorithm explained

Ingestion

The Content Manager supports periodic ingestion of content using the Celery task queue library. A background worker is responsible for ingesting content, and a client periodically tells the worker to perform the ingestion. Celery workers and the client automatically retry in the event of connection loss or failure. Celery uses RabbitMQ as its message broker.

The Content Manager container is reused for this ingestion service, albeit using a different Docker run command. This container is deployed as part of the CAPI ECS cluster.

To start up a worker to handle tasks locally:

celery -A content_manager.tasks worker -l info

To schedule tasks on this worker, call the following script from the project root directory:

python3 scheduler.py

Schema switch

Postgres supports the concept of schemas: namespaced groups of tables, analogous to directories at the operating-system level except that schemas cannot be nested. This is the strategy used to keep Content Manager reads consistent while ingestion is processed in the background, since it allows two identical sets of tables to live in the same database. There are two schemas in the database: primary and shadow. The primary schema serves as the current snapshot of the data and is read-only. The ingestion process always writes to the shadow schema. When ingestion completes, the schemas are swapped, so the shadow schema, which contains the most up-to-date data, becomes the primary schema. The state of the schema switch is stored in the database as well, outside of the schemas.

The schema switch is scheduled as a celery task that is part of the ingestion workflow. The file scheduler.py contains the logic that invokes this workflow.
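The swap itself can be sketched as a transactional three-way rename; the schema names and the temporary name here are placeholders, not the names used in scheduler.py:

```python
def schema_swap_sql(primary="primary", shadow="shadow"):
    """Return SQL that promotes the shadow schema to primary via a
    three-way rename inside one transaction. Names are placeholders."""
    return [
        "BEGIN;",
        f"ALTER SCHEMA {primary} RENAME TO swap_tmp;",
        f"ALTER SCHEMA {shadow} RENAME TO {primary};",
        f"ALTER SCHEMA swap_tmp RENAME TO {shadow};",
        "COMMIT;",
    ]
```

Because the renames happen in one transaction, readers never observe a state where the primary schema is missing.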

Dashboard

The Content Manager offers a dashboard to view ingestion history and trigger manual ingestion of data sources. The dashboard can be accessed at the following endpoint:

/dashboard

The dashboard feature is toggleable. By default it is disabled. It can be enabled by setting the following environment variable:

ENABLE_CM_DASHBOARD=true

This is useful with parameterized Docker containers: the same image can be reused, with the feature disabled in most production Content Manager containers while the one used by Operations has it enabled.
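A toggle like this typically reduces to an environment-variable check at startup; a sketch (the accepted truthy values here are an assumption, not the app's exact parsing):

```python
import os

def dashboard_enabled():
    """Feature toggle: the dashboard is off unless ENABLE_CM_DASHBOARD
    is set to a truthy value. Accepted values are an assumption."""
    return os.environ.get("ENABLE_CM_DASHBOARD", "").lower() in ("true", "1", "yes")
```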

Task API

The Dashboard uses the Task API for all its data.

Get the list of completed tasks, including ingestions

GET /tasks

Response

[
    {
        "content": "Show",
        "source_url": "http://source-url",
        "status": "failed",
        "duration": 2390490234
    }
]

Get list of source urls

GET /source-urls

[
    {
      "name": "Brands",
      "url": "https://api.viacom.com/contents/v2/brands?apiKey=0IdibN1F8tndFwXG9iro1M6CLxERLKMt"
    },
    {
      "name": "CC Shows",
      "url": "http://origin.feeds.fep.mtvnservices.com/fep/view/cc-shows-source-api-feed/default"
    }
]

Trigger ingestion for endpoint

One ingestion:

POST /ingestions

Body
{
    "ingestions": [
        "http://data-source.url"
    ]
}

Batch ingestions:

POST /ingestions

Body
{
    "ingestions": [
        "http://data-source.url",
        "http://data-source2.url"
    ]
}
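Both forms use the same body shape, differing only in how many URLs are listed. A sketch of building the request body (the data-source URLs are the placeholders from above):

```python
import json

def ingestion_payload(urls):
    """Build the POST /ingestions body for one or many data-source URLs."""
    return json.dumps({"ingestions": list(urls)})

# Batch ingest two feeds (placeholder URLs):
body = ingestion_payload([
    "http://data-source.url",
    "http://data-source2.url",
])
```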

RabbitMQ

Start local rabbitMQ container:

docker run -d --hostname my-rabbit -p 15672:15672 -p 5672:5672 --name some-rabbit rabbitmq:3-management

Setup

Docker is required for setup. Mac users should follow the Docker installation guide and make sure to set up a default VM using the Docker Quickstart Terminal app in the Mac's Applications folder.

A simple script is included with this project which will set up the containers required to run the Content Manager. Run the following command:

./quickstart.sh

Make sure to run:

eval "$(docker-machine env default)

Note, if command fails make sure "execute" permission is given to quickstart.sh.

This script does the following:

  • sets up the PostgreSQL database container
  • creates a default database
  • sets up the Content Manager container
  • links the containers so each can access services on the other
  • pre-populates the database with data from local JSON files

Development

Unit Tests

Flask-Testing is used for unit tests. Make sure to set the environment variable APP_SETTINGS:

export APP_SETTINGS=content_manager.config.TestConfiguration

Run the following from project root directory:

./run_tests_local.sh

Docker Compose

To run services locally via Docker Compose, run the following:

docker-compose -f docker-compose.prod.yml up -d

Docker deployment

Build the docker image from root directory of project:

docker build -t cm .

Tag the image:

docker tag {image_id} quay.vmn.io/linkapi/content-manager:{tag}

In this case image_id is cm. Tags should use semantic versioning; when no tag is specified, the image is tagged as latest.

Remove a local container and its volume:

docker rm -v {container-name}

Remove tag:

docker rmi quay.vmn.io/linkapi/content-manager:{tag}

quay.vmn.io

  1. To build the image, use the following command. This will tag it as "latest".

    docker build -t quay.vmn.io/linkapi/content-manager .

  2. Push "latest" to quay:

    docker push quay.vmn.io/linkapi/content-manager

To explicitly specify a tag version, do the following instead:

docker build -t quay.vmn.io/linkapi/content-manager:{version} .
docker push quay.vmn.io/linkapi/content-manager:{version}

Amazon ECS

1) Retrieve the docker login command that you can use to authenticate your Docker client to your registry:

aws ecr get-login --region us-east-1

2) Run the docker login command that was returned in the previous step.
3) Build your Docker image using the following command. For information on building a Dockerfile from scratch, see the instructions here. You can skip this step if your image is already built. This builds and tags the image as 206121644395.dkr.ecr.us-east-1.amazonaws.com/capi_content_manager:latest:

docker build -t 206121644395.dkr.ecr.us-east-1.amazonaws.com/capi_content_manager .

4) Run the following command to push this image to your newly created AWS repository:

docker push 206121644395.dkr.ecr.us-east-1.amazonaws.com/capi_content_manager:latest

Setting up ECS cluster: http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_GetStarted.html

Logging for containers

In the context of Amazon ECS, every service (a group of co-located containers) includes a SumoLogic container. The Sumo container hosts a data volume shared with the other containers for storing log files, and it collects all logs from this volume. Every service container, such as the Content Manager and Responder, uses this shared volume for storing logs.

The Sumo Logic container mounts a volume at /tmp/clogs/** by default. To integrate with Sumo Logic, a container needs to mount this volume using the volumes-from parameter. Each container saves its logs to a namespaced directory, i.e. /tmp/clogs/contentmanager{containerId}/. This allows filtering results by container id to monitor resource allocation across containers. Here is an example of the implementation:

docker run -v /tmp/clogs -d --name="sumo" sumologic/collector:latest-file [Access ID] [Access key]

docker run -d -e APP_SETTINGS='content_manager.config.TestDockerConfiguration' -e SUMO_LOG_PATH=/tmp/clogs/content_manager/logfile.log --name cm -p 5000:5000 --volumes-from sumo 206121644395.dkr.ecr.us-east-1.amazonaws.com/capi_content_manager:latest

Here we run two containers: the SumoLogic collector and the Content Manager. The Content Manager internally uses the SUMO_LOG_PATH environment variable to set the log path for its service. It's important to note that the path convention for the shared volume is always /tmp/clogs/{service_name}{containerId}.
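The path convention can be captured in a small helper; this is an illustrative sketch, and the exact directory naming (separator, filename) is an assumption based on the examples above:

```python
def sumo_log_path(service_name, container_id, root="/tmp/clogs"):
    """Build the namespaced log path a container writes to on the shared
    Sumo Logic volume. Convention assumed: {root}/{service_name}{container_id}/."""
    return f"{root}/{service_name}{container_id}/logfile.log"
```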

JIRA integration

To link a commit to a Jira issue, use Atlassian's Smart Commits feature: https://confluence.atlassian.com/fisheye/using-smart-commits-298976812.html

git commit -m "CAPE-429"

This will then link this commit to the Jira issue CAPE-429.

Start postgres

pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start

Stop postgres

pg_ctl -D /usr/local/var/postgres stop -s -m fast

Start server

python3 app.py

Initialize migration

python3 manage.py db init

Create migration file

python3 manage.py db migrate

Apply migration to database

python3 manage.py db upgrade

Revert to previous revision

python3 manage.py db downgrade

Revert to base revision

python3 manage.py db downgrade base

Continuous Integration

Amazon ECS

Use the provided Dockerfile_awscli.
Build the image first:

docker build -t awscli .

Run AWS CLI in container to get the docker login credential:

docker run -e AWS_ACCESS_KEY_ID={KEY} -e AWS_SECRET_ACCESS_KEY={SECRET} awscli aws ecr get-login --region us-east-1

Now use the returned docker login command to login into the Elastic Container repository.

RabbitMQ

docker run -d --hostname my-rabbit -p 5672:5672 -p 15672:15672 --name some-rabbit rabbitmq:3-management

Outstanding issues

Guarantee associations are up to date

When ingestion happens, the content types (brand, show, episode, season) and their associations are updated. Associations should always be up to date; however, the current implementation keeps appending new associations and never removes any. This only applies to associations between the following:

  • brand has many shows
  • show has many seasons
  • show has many episodes
  • season has many episodes

This becomes a problem over time, as some associations may no longer be valid. We can't simply remove all relationships (for example, a show's episodes) ahead of time, because we can't ingest all episodes for a show in a single transaction; doing so would leave a window of time in which the associations are unavailable. This has to do with the way feeds are ingested.
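One possible direction, sketched here as an assumption rather than the planned fix, is to diff the association set after ingestion completes: anything present before ingestion but absent from the freshly ingested feed is stale and can be deleted.

```python
def prune_stale(existing, ingested):
    """Sketch: associations present before ingestion but absent from the
    new feed are stale candidates for deletion after ingestion completes."""
    return set(existing) - set(ingested)

# show -> episode associations as (show_id, episode_id) pairs
existing = {("s1", "e1"), ("s1", "e2")}
ingested = {("s1", "e1")}
stale = prune_stale(existing, ingested)   # {("s1", "e2")}
```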

Reset Episode's people association before ingestion

  • episode has many people

Reset Show and Episode's franchise association before ingestion

  • episode has one franchise
  • show has one franchise

Missing association for episode

Noticed strange issue where an episode's platform record was missing. The id of the episode observed is 8275ead7-ea85-4f3d-aed6-90548858e6ee. It was missing the web platform record. A re-ingest of the CC episode feed solved the issue. This needs to be investigated further.

Batch ingest via Dashboard not working

Troubleshooting

alembic util command error can't find identifier

Alembic stores the version history in your database, so it uses the value stored there to look up the revision. The version number is stored in the table alembic_version:

mysql> SELECT * FROM alembic_version;
+-------------+
| version_num |
+-------------+
| c8ad125e063 |
+-------------+
1 row in set (0.00 sec)

Hint: use the command SHOW TABLES to list the tables if it's a SQL-based database.

To solve the problem, simply drop the table:

DROP TABLE alembic_version;

Owner: rubberviscous