I am running an AWS Glue job with PySpark, where I read a JSON file from S3, do some transformations on it, and then save it back to the same location. The behaviour is roughly the following:
df = spark.read.json(path)
# checkpoint the df and read it back
df = df.transformation1()  # placeholders for the actual transformations
df = df.transformation2()
print(df.count())
df.write.json(s3_path)
The count gets printed, but the write to S3 fails with the exception: org.apache.spark.SparkException: Job 207 cancelled because SparkContext was shut down
After searching online, it seems like this is an OOM issue. The data in question is only a few MB, so it is pretty small. Considering that the count operation succeeds and the write fails, would it make sense to persist the dataframe before the count, so that the write is more likely to succeed because the data to write has already been computed and persisted?
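For reference, a minimal sketch of what that persist-before-count approach could look like; the path variables and the transformation calls are placeholders standing in for the real job, not its actual code:

from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.getOrCreate()

df = spark.read.json(path)         # path: placeholder for the S3 input prefix
df = df.transformation1()          # placeholders for the job's real transformations
df = df.transformation2()

# Keep the computed result on executor memory/disk so that count() and the
# subsequent write reuse the same materialized partitions instead of
# recomputing the whole lineage (and re-reading the input) twice.
df = df.persist(StorageLevel.MEMORY_AND_DISK)
print(df.count())                  # the count action is what actually fills the cache

df.write.mode("overwrite").json(s3_path)   # s3_path: placeholder for the output prefix
df.unpersist()

Note that persist() itself is lazy; it is the count() action that materializes and caches the data, so the write afterwards can serve from the cached partitions rather than re-reading the original S3 location that is about to be overwritten.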
Other error logs that I found:
Lost executor 2 on 172.36.186.165: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN