@@ -17,18 +17,8 @@ RAPIDS Accelerator For Apache Spark is supported on Dataproc 2.0+ (Spark 3.0)+.
 ## RAPIDS Accelerator For Apache Spark
 
 ### Prerequisites
-
-To use RAPIDS Accelerator For Apache Spark, XGBoost4j with Spark 3
-
-* Apache Spark 3.0+
-* Hardware Requirements
-  * NVIDIA Pascal™ GPU architecture or better (V100, P100, T4 and later)
-  * Multi-node clusters with homogenous GPU configuration
-* Software Requirements
-  * NVIDIA GPU driver 440.33+
-  * CUDA v11.5/v11.0/v10.2/v10.1
-  * NCCL 2.11.4+
-  * Ubuntu 18.04, Ubuntu 20.04 or Rocky Linux 7, Rocky Linux8, Debian 10, Debian 11
+Please find the [RAPIDS Accelerator For Apache Spark](https://nvidia.github.io/spark-rapids/)
+official document for the hardware and software [requirements](https://nvidia.github.io/spark-rapids/docs/download.html).
 
 This section describes how to create
 [Google Cloud Dataproc](https://cloud.google.com/dataproc) cluster with
@@ -59,20 +49,17 @@ export GCS_BUCKET=<your bucket for the logs and notebooks>
 export REGION=<region>
 export NUM_GPUS=1
 export NUM_WORKERS=2
-export CUDA_VER=11.5
 
 gcloud dataproc clusters create $CLUSTER_NAME \
     --region $REGION \
-    --image-version=2.0-ubuntu18 \
+    --image-version=2.2-ubuntu22 \
     --master-machine-type n1-standard-4 \
     --master-boot-disk-size 200 \
     --num-workers $NUM_WORKERS \
     --worker-accelerator type=nvidia-tesla-t4,count=$NUM_GPUS \
     --worker-machine-type n1-standard-8 \
     --num-worker-local-ssds 1 \
     --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/spark-rapids/spark-rapids.sh \
-    --optional-components=JUPYTER,ZEPPELIN \
-    --metadata gpu-driver-provider="NVIDIA",rapids-runtime="SPARK",cuda-version="$CUDA_VER" \
     --bucket $GCS_BUCKET \
     --subnet=default \
     --enable-component-gateway