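Role of Multiple Platform Deployments

The goal of this recipe is to describe the scenarios in which multi-platform deployments become necessary. We start with the use cases and then dive into how to set them up in Spring Cloud Data Flow.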

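Use Cases

  • For certain use cases, you may want to isolate the deployment of streaming and batch data pipelines to dedicated environments. For example, you may want to run a memory-intensive predictive-model training routine on compute that has well-defined boundaries, where only specific workloads are allowed to run. In other words, you do not want regular applications to use the high-compute resource pool and saturate it. This matters most when you pay per use for the machines and want to avoid high costs.
  • Similar to the previous use case, you may need to run applications close to the message broker (that is, run the business logic near where the data lives). Doing so avoids I/O latency and helps meet service-level agreements (SLAs) for high throughput and low latency. Here too, you need to orchestrate a deployment pattern in which streaming applications are targeted at the same VMs that run the message broker.
  • Users sometimes use a single Spring Cloud Data Flow instance to orchestrate a deployment model in which streaming and batch data pipelines are deployed and launched to multiple environments. This pattern is mainly used to organize the deployment topology with well-defined boundaries, while a single SCDF instance centrally orchestrates, monitors, and manages the data pipelines.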

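The preceding scenarios require Spring Cloud Data Flow to deploy streaming and batch applications with flexible platform configurations. Fortunately, Spring Cloud Data Flow has supported multi-platform deployments since v2.0. Users can declaratively configure as many platform accounts as they need up front and then use the defined accounts at deployment time to separate the boundaries.

Now that we understand the use-case requirements, we can review the steps to configure multiple platform accounts in Kubernetes and Cloud Foundry.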

Configuration

This section covers the configuration for Kubernetes and Cloud Foundry.

Kubernetes

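Suppose you want to deploy a stream with three applications to kafka-namespace and, similarly, launch a batch job to highmemory-namespace. You can define both in the SCDF deployment files.

Since streaming data pipelines are managed through Skipper, you can change skipper-config-kafka.yaml with the following: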

apiVersion: v1
kind: ConfigMap
metadata:
  name: skipper
  labels:
    app: skipper
data:
  application.yaml: |-
    spring:
      cloud:
        skipper:
          server:
            platform:
              kubernetes:
                accounts:
                  default:
                    namespace: default
                    environmentVariables: 'SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT},SPRING_CLOUD_STREAM_KAFKA_BINDER_ZK_NODES=${KAFKA_ZK_SERVICE_HOST}:${KAFKA_ZK_SERVICE_PORT}'
                    limits:
                      memory: 1024Mi
                      cpu: 500m
                    readinessProbeDelay: 120
                    livenessProbeDelay: 90
                  kafkazone:
                    namespace: kafka-namespace
                    environmentVariables: 'SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT},SPRING_CLOUD_STREAM_KAFKA_BINDER_ZK_NODES=${KAFKA_ZK_SERVICE_HOST}:${KAFKA_ZK_SERVICE_PORT}'
                    limits:
                      memory: 2048Mi
                      cpu: 500m
                    readinessProbeDelay: 180
                    livenessProbeDelay: 120
      datasource:
        url: jdbc:mariadb://${MARIADB_SERVICE_HOST}:${MARIADB_SERVICE_PORT}/skipper
        username: root
        password: ${mariadb-root-password}
        driverClassName: org.mariadb.jdbc.Driver
        testOnBorrow: true
        validationQuery: "SELECT 1"

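If RabbitMQ is the broker, you must change skipper-config-rabbit.yaml instead.

Note that a platform account named kafkazone is included. Also, the default memory for pods deployed through that account is set to 2GB, and the readiness and liveness probe delays are customized.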
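Spring Boot binds nested YAML like this to dotted property names, so each platform account is simply a key under the accounts map. The following Python sketch makes that mapping visible for a trimmed copy of the Skipper accounts. It uses only the standard library; the flatten helper and the trimmed dictionary are illustrative assumptions and are not part of SCDF or Skipper.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into dotted property names (hypothetical helper)."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Trimmed copy of the Skipper platform accounts from the ConfigMap above.
config = {
    "spring": {"cloud": {"skipper": {"server": {"platform": {"kubernetes": {
        "accounts": {
            "default": {"namespace": "default", "limits": {"memory": "1024Mi"}},
            "kafkazone": {"namespace": "kafka-namespace", "limits": {"memory": "2048Mi"}},
        }
    }}}}}}
}

properties = flatten(config)
for name in sorted(properties):
    # Each line is one property as the server would see it after binding.
    print(f"{name}={properties[name]}")
```

Reading the configuration this way shows that adding another account only means adding another key next to default and kafkazone.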

For batch data pipelines, however, you must change the configuration in server-config.yaml, as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: scdf-server
  labels:
    app: scdf-server
data:
  application.yaml: |-
    management:
      metrics:
        export:
          prometheus:
            enabled: true
            rsocket:
              enabled: true
              host: prometheus-proxy
              port: 7001
    spring:
      cloud:
        dataflow:
          metrics:
            dashboard:
              url: 'https://grafana:3000'
          task:
            platform:
              kubernetes:
                accounts:
                  default:
                    namespace: default
                    limits:
                      memory: 1024Mi
                  highmemory:
                    namespace: highmemory-namespace
                    limits:
                      memory: 4096Mi
      datasource:
        url: jdbc:mariadb://${MARIADB_SERVICE_HOST}:${MARIADB_SERVICE_PORT}/mariadb
        username: root
        password: ${mariadb-root-password}
        driverClassName: org.mariadb.jdbc.Driver
        testOnBorrow: true
        validationQuery: "SELECT 1"

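Note that a platform account named highmemory is included. Also, the default memory for pods launched through that account is set to 4GB.

With these configurations, you can choose the platform when you deploy a stream or launch a task from SCDF. To do so, list the available platforms and pick one: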

dataflow:>stream platform-list
╔═════════╤══════════╤═══════════════════════════════════════════════════════════════════════════════════════╗
║  Name   │   Type   │                                   Description                                         ║
╠═════════╪══════════╪═══════════════════════════════════════════════════════════════════════════════════════╣
║default  │kubernetes│master url = [https://10.0.0.1:443/], namespace = [default], api version = [v1]        ║
║kafkazone│kubernetes│master url = [https://10.0.0.1:443/], namespace = [kafka-namespace], api version = [v1]║
╚═════════╧══════════╧═══════════════════════════════════════════════════════════════════════════════════════╝

dataflow:>task platform-list
╔═════════════╤═════════════╤════════════════════════════════════════════════════════════════════════════════════════════╗
║Platform Name│Platform Type│                                   Description                                              ║
╠═════════════╪═════════════╪════════════════════════════════════════════════════════════════════════════════════════════╣
║default      │Kubernetes   │master url = [https://10.0.0.1:443/], namespace = [default], api version = [v1]             ║
║highmemory   │Kubernetes   │master url = [https://10.0.0.1:443/], namespace = [highmemory-namespace], api version = [v1]║
╚═════════════╧═════════════╧════════════════════════════════════════════════════════════════════════════════════════════╝

Create a stream.

dataflow:>stream create foo --definition "cardata | predict | cassandra"
Created new stream 'foo'

Deploy the stream.

dataflow:>stream deploy --name foo --platformName kafkazone

Verify the deployment.

kubectl get svc -n kafka-namespace
NAME          TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)                      AGE
kafka         ClusterIP      10.0.7.155    <none>          9092/TCP                     7m29s
kafka-zk      ClusterIP      10.0.15.169   <none>          2181/TCP,2888/TCP,3888/TCP   7m29s

kubectl get pods -n kafka-namespace
NAME                                READY   STATUS    RESTARTS   AGE
foo-cassandra-v1-5d79b8bdcd-94kw4   1/1     Running   0          63s
foo-cardata-v1-6cdc98fbd-cmrr2      1/1     Running   0          63s
foo-predict-v1-758dc44575-tcdkd     1/1     Running   0          63s

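Alternatively, you can use the platform drop-down menu in the SCDF dashboard to create and launch tasks. The following image shows how to launch a task:

Figure: Launch against a platform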
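The shell and the dashboard are both clients of the server's REST API, so the same platform choice can be scripted. The sketch below builds, but does not send, one request that deploys the foo stream to kafkazone and one that launches a task on highmemory. Several things here are assumptions to verify against the REST API guide for your SCDF version: the server URL, the two endpoint paths, the property keys spring.cloud.dataflow.skipper.platformName and spring.cloud.dataflow.task.platformName, and the task name train-model, which is hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SERVER = "http://localhost:9393"  # assumption: replace with your SCDF server URL


def stream_deploy_request(stream, platform):
    """Build (but do not send) a stream deployment request for a platform account."""
    # Assumed deployment property that carries the platform account name.
    body = {"spring.cloud.dataflow.skipper.platformName": platform}
    return Request(
        f"{SERVER}/streams/deployments/{stream}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def task_launch_request(task, platform):
    """Build (but do not send) a task launch request for a platform account."""
    # Assumed launch property that carries the platform account name.
    form = {
        "name": task,
        "properties": f"spring.cloud.dataflow.task.platformName={platform}",
    }
    return Request(
        f"{SERVER}/tasks/executions",
        data=urlencode(form).encode("utf-8"),
        method="POST",
    )


for request in (stream_deploy_request("foo", "kafkazone"),
                task_launch_request("train-model", "highmemory")):
    # Print the method, URL, and payload so they can be reviewed before sending.
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

To actually send a request, pass it to urllib.request.urlopen once the URL and any required authentication are in place.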

Cloud Foundry

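For the same use-case requirements, if you want to deploy a stream with three applications to an org and space where the Kafka service is running, and likewise launch a batch job to an org and space with more compute power, the configuration in SCDF for Cloud Foundry could look like the following listings.

Since streaming data pipelines are managed through Skipper, you can change Skipper's manifest.yml file to include the connection credentials for the Kafka org and space.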

applications:
  - name: skipper-server
    host: skipper-server
    memory: 1G
    disk_quota: 1G
    instances: 1
    timeout: 180
    buildpack: java_buildpack
    path: <PATH TO THE DOWNLOADED SKIPPER SERVER UBER-JAR>
    env:
      SPRING_APPLICATION_NAME: skipper-server
      SPRING_PROFILES_ACTIVE: cloud
      JBP_CONFIG_SPRING_AUTO_RECONFIGURATION: '{enabled: false}'
      SPRING_APPLICATION_JSON: |-
        {
          "spring.cloud.skipper.server" : {
             "platform.cloudfoundry.accounts":  {
                   "default": {
                       "connection" : {
                           "url" : "<cf-api-url>",
                           "domain" : "<cf-apps-domain>",
                           "org" : "<org>",
                           "space" : "<space>",
                           "username": "<email>",
                           "password" : "<password>",
                           "skipSslValidation" : false
                       },
                       "deployment" : {
                           "deleteRoutes" : false,
                           "services" : "rabbitmq",
                           "enableRandomAppNamePrefix" : false,
                           "memory" : 2048
                       }
                   },
                   "kafkazone": {
                       "connection" : {
                           "url" : "<cf-api-url>",
                           "domain" : "<cf-apps-domain>",
                           "org" : "kafka-org",
                           "space" : "kafka-space",
                           "username": "<email>",
                           "password" : "<password>",
                           "skipSslValidation" : false
                       },
                       "deployment" : {
                           "deleteRoutes" : false,
                           "services" : "kafkacups",
                           "enableRandomAppNamePrefix" : false,
                           "memory" : 3072
                       }
                   }
             }
          }
        }
services:
  - <services>

Note that a platform account named kafkazone is included. Also, the default memory for applications deployed through that account is set to 3GB.
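Because SPRING_APPLICATION_JSON is passed as a plain string, a syntax slip such as a missing comma between the connection and deployment objects is typically not reported until the application starts and tries to parse it. A quick local check can catch this before you push. The following Python sketch uses only the standard json module; the two sample documents are trimmed, hypothetical fragments, and check is an illustrative helper rather than anything shipped with SCDF.

```python
import json

# Trimmed, hypothetical fragment with a missing comma after "connection".
BROKEN = '''
{
  "connection": {"org": "kafka-org", "space": "kafka-space"}
  "deployment": {"services": "kafkacups", "memory": 3072}
}
'''
# The same fragment with the comma restored.
FIXED = BROKEN.replace('}\n  "deployment"', '},\n  "deployment"')


def check(document):
    """Return the parsed JSON, or None if the document is not valid JSON."""
    try:
        return json.loads(document)
    except json.JSONDecodeError as error:
        print(f"invalid JSON at line {error.lineno}: {error.msg}")
        return None


broken_result = check(BROKEN)
fixed_result = check(FIXED)
if fixed_result is not None:
    print("deployment memory:", fixed_result["deployment"]["memory"])
```

Running the same check on the real value, with the placeholders filled in as quoted strings, confirms that the account definitions are at least well-formed JSON.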

For batch data pipelines, however, you must change the configuration in SCDF's manifest.yml file, as follows:

applications:
  - name: data-flow-server
    host: data-flow-server
    memory: 2G
    disk_quota: 2G
    instances: 1
    path: <PATH TO THE DOWNLOADED DATA FLOW SERVER UBER-JAR>
    env:
      SPRING_APPLICATION_NAME: data-flow-server
      SPRING_PROFILES_ACTIVE: cloud
      JBP_CONFIG_SPRING_AUTO_RECONFIGURATION: '{enabled: false}'
      SPRING_CLOUD_SKIPPER_CLIENT_SERVER_URI: https://<skipper-host-name>/api
      SPRING_APPLICATION_JSON: |-
        {
           "maven" : {
               "remoteRepositories" : {
                  "repo1" : {
                    "url" : "https://repo.spring.io/libs-snapshot"
                  }
               }
           },
           "spring.cloud.dataflow" : {
                "task.platform.cloudfoundry.accounts" : {
                    "default" : {
                        "connection" : {
                            "url" : "<cf-api-url>",
                            "domain" : "<cf-apps-domain>",
                            "org" : "<org>",
                            "space" : "<space>",
                            "username" : "<email>",
                            "password" : "<password>",
                            "skipSslValidation" : true
                        },
                        "deployment" : {
                          "services" : "postgresSQL"
                        }
                    },
                    "highmemory" : {
                        "connection" : {
                            "url" : "<cf-api-url>",
                            "domain" : "<cf-apps-domain>",
                            "org" : "highmemory-org",
                            "space" : "highmemory-space",
                            "username" : "<email>",
                            "password" : "<password>",
                            "skipSslValidation" : true
                        },
                        "deployment" : {
                          "services" : "postgresSQL",
                          "memory" : 5120
                        }
                    }
                }
           }
        }
services:
  - postgresSQL

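Note that a platform account named highmemory is included. Also, the default memory for applications launched through that account is set to 5GB.

With these configurations, you can choose the platform when you deploy a stream or launch a task from SCDF. To do so, list the available platforms and pick one: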

dataflow:>stream platform-list
╔═════════╤════════════╤════════════════════════════════════════════════════════════════════════════╗
║  Name   │    Type    │                               Description                                  ║
╠═════════╪════════════╪════════════════════════════════════════════════════════════════════════════╣
║default  │cloudfoundry│org = [scdf-%%], space = [space-%%%%%], url = [https://api.run.pivotal.io]  ║
║kafkazone│cloudfoundry│org = [kafka-org], space = [kafka-space], url = [https://api.run.pivotal.io]║
╚═════════╧════════════╧════════════════════════════════════════════════════════════════════════════╝

dataflow:>task platform-list
╔═════════════╤═════════════╤══════════════════════════════════════════════════════════════════════════════════════╗
║Platform Name│Platform Type│                               Description                                            ║
╠═════════════╪═════════════╪══════════════════════════════════════════════════════════════════════════════════════╣
║default      │Cloud Foundry│org = [scdf-%%], space = [space-%%%%%], url = [https://api.run.pivotal.io]            ║
║highmemory   │Cloud Foundry│org = [highmemory-org], space = [highmemory-space], url = [https://api.run.pivotal.io]║
╚═════════════╧═════════════╧══════════════════════════════════════════════════════════════════════════════════════╝

Create a stream.

dataflow:>stream create foo --definition "cardata | predict | cassandra"
Created new stream 'foo'

Deploy the stream.

dataflow:>stream deploy --name foo --platformName kafkazone

Verify the deployment.

cf apps
Getting apps in org kafka-org / space kafka-space as <email>...
OK

name                           requested state   instances   memory   disk   urls
j6wQUU3-foo-predict-v1          started           1/1         3G       1G     j6wQUU3-foo-predict-v1.cfapps.io
j6wQUU3-foo-cardata-v1          started           1/1         3G       1G     j6wQUU3-foo-cardata-v1.cfapps.io
j6wQUU3-foo-cassandra-v1        started           1/1         3G       1G     j6wQUU3-foo-cassandra-v1.cfapps.io

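Alternatively, you can use the platform drop-down menu in the SCDF dashboard to create and launch tasks.

Mixed Cloud Foundry and Kubernetes Deployments

In some cases, you need to orchestrate a deployment model in which specific workloads are deployed to Kubernetes and the rest to Cloud Foundry. The two platforms offer different levels of runtime support, so the flexibility to place each workload on the platform that suits it is an added advantage.

Consider a scenario in which Spring Cloud Data Flow runs on Cloud Foundry. Through configuration alone, you can also define and stage one or more Kubernetes accounts in the same SCDF instance. This flexibility opens up deployment scenarios in which streaming and batch data pipelines can be deployed to a variety of platforms.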

Let us take the same Cloud Foundry scenario. In addition to the default and kafkazone platform accounts, gpuzone appears as another account in Skipper's manifest.yml file, as follows:

applications:
  - name: skipper-server
    host: skipper-server
    memory: 1G
    disk_quota: 1G
    instances: 1
    timeout: 180
    buildpack: java_buildpack
    path: <PATH TO THE DOWNLOADED SKIPPER SERVER UBER-JAR>
    env:
      SPRING_APPLICATION_NAME: skipper-server
      SPRING_PROFILES_ACTIVE: cloud
      JBP_CONFIG_SPRING_AUTO_RECONFIGURATION: '{enabled: false}'
      SPRING_APPLICATION_JSON: |-
        {
          "spring.cloud.skipper.server" : {
             "platform.cloudfoundry.accounts":  {
                   "default": {
                       "connection" : {
                           "url" : "<cf-api-url>",
                           "domain" : "<cf-apps-domain>",
                           "org" : "<org>",
                           "space" : "<space>",
                           "username": "<email>",
                           "password" : "<password>",
                           "skipSslValidation" : false
                       },
                       "deployment" : {
                           "deleteRoutes" : false,
                           "services" : "rabbitmq",
                           "enableRandomAppNamePrefix" : false,
                           "memory" : 2048
                       }
                   },
                   "kafkazone": {
                       "connection" : {
                           "url" : "<cf-api-url>",
                           "domain" : "<cf-apps-domain>",
                           "org" : "kafka-org",
                           "space" : "kafka-space",
                           "username": "<email>",
                           "password" : "<password>",
                           "skipSslValidation" : false
                       },
                       "deployment" : {
                           "deleteRoutes" : false,
                           "services" : "kafkacups",
                           "enableRandomAppNamePrefix" : false,
                           "memory" : 3072
                       }
                   }
             },
             "platform.kubernetes.accounts":  {
                   "gpuzone": {
                       "fabric8" : {
                           "masterUrl" : "<k8s-master-api-url>",
                           "namespace" : "gpuzone-namespace",
                           "trustCerts" : "true"
                       }
                   }
             }
          }
        }
services:
  - <services>

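In this case, gpuzone targets a GPU VM node pool in Kubernetes. With a simple declarative configuration, the same SCDF instance can now deploy data pipelines to three different compute environments.

With this setup, you can choose one of the three platform accounts (default, kafkazone, or gpuzone) when you deploy a streaming data pipeline. Platform accounts for batch data pipelines are declared separately in the SCDF server configuration, as shown earlier.

List the available platforms.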
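To see which account belongs to which platform type before starting Skipper, you can read the account maps straight out of the JSON. The following Python sketch does that for a trimmed, hypothetical version of the configuration, assuming both account maps sit under the spring.cloud.skipper.server key. The list_accounts helper is illustrative only and is not part of Skipper.

```python
import json

# Trimmed, hypothetical version of the SPRING_APPLICATION_JSON value.
APPLICATION_JSON = """
{
  "spring.cloud.skipper.server": {
    "platform.cloudfoundry.accounts": {
      "default": {"connection": {"org": "<org>", "space": "<space>"}},
      "kafkazone": {"connection": {"org": "kafka-org", "space": "kafka-space"}}
    },
    "platform.kubernetes.accounts": {
      "gpuzone": {"fabric8": {"namespace": "gpuzone-namespace"}}
    }
  }
}
"""


def list_accounts(document):
    """Return (account name, platform type) pairs found in the JSON document."""
    server = json.loads(document)["spring.cloud.skipper.server"]
    accounts = []
    for key, entries in server.items():
        # Keys of interest look like "platform.<type>.accounts".
        parts = key.split(".")
        if len(parts) == 3 and parts[0] == "platform" and parts[2] == "accounts":
            accounts.extend((name, parts[1]) for name in entries)
    return accounts


accounts = list_accounts(APPLICATION_JSON)
for name, platform_type in accounts:
    print(f"{name:<10} {platform_type}")
```

The pairs it prints should line up with the Name and Type columns that stream platform-list reports once the server is running.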

dataflow:>stream platform-list
╔═════════╤════════════╤═══════════════════════════════════════════════════════════════════════════════════════════╗
║  Name   │    Type    │                               Description                                                 ║
╠═════════╪════════════╪═══════════════════════════════════════════════════════════════════════════════════════════╣
║default  │cloudfoundry│org = [scdf-%%], space = [space-%%%%%], url = [https://api.run.pivotal.io]                 ║
║kafkazone│cloudfoundry│org = [kafka-org], space = [kafka-space], url = [https://api.run.pivotal.io]               ║
║gpuzone  │kubernetes  │master url = [https://10.0.0.1:443/], namespace = [gpuzone-namespace], api version = [v1]  ║
╚═════════╧════════════╧═══════════════════════════════════════════════════════════════════════════════════════════╝

Create a stream.

dataflow:>stream create foo --definition "cardata | predict | cassandra"
Created new stream 'foo'

Deploy the stream.

dataflow:>stream deploy --name foo --platformName gpuzone

Verify the new pods in Kubernetes.

kubectl get pods -n gpuzone-namespace
NAME                                READY   STATUS    RESTARTS   AGE
foo-cassandra-v1-aakhslff-94kw4     1/1     Running   0          73s
foo-cardata-v1-fdalsssdf2-cmrr2     1/1     Running   0          73s
foo-predict-v1-p1j35435-tcdkd       1/1     Running   0          73s

However, no new applications should have been deployed in Cloud Foundry. We can verify that:

cf apps
Getting apps in org scdf-%%% / space space-%%%%% as $$$$$@com.io...
OK

name                         requested state   instances   memory   disk   urls
sabby-skipper                started           1/1         1G       1G     sabby-skipper.....
sabby-test-dataflow-server   started           1/1         1G       1G     sabby-test-dataflow-server....