linux运行pyspark,pyspark使用方法-程序员宅基地

技术标签: linux运行pyspark  

在pycharm上配置pyspark

在pycharm上配置pyspark

在windows上下面的错误,linux上应该正常

C:\ProgramData\Anaconda3\envs\tensorflow\python.exe E:/github/data-analysis/tf/SparkTest.py

2018-07-19 10:35:41 ERROR Shell:397 - Failed to locate the winutils binary in the hadoop binary path

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)

at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)

at org.apache.hadoop.util.Shell.(Shell.java:387)

at org.apache.hadoop.util.StringUtils.(StringUtils.java:80)

at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)

at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)

at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)

2018-07-19 10:35:41 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

Binarizer output with Threshold = 5.100000

[Stage 0:> (0 + 1) / 1]2018-07-19 10:35:51 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)

java.io.IOException: Cannot run program "python": CreateProcess error=2, ϵͳ�Ҳ���ָ�����ļ���

at java.lang.ProcessBuilder.start(Unknown Source)

at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:133)

at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:76)

at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)

at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:86)

Caused by: java.io.IOException: CreateProcess error=2, ϵͳ�Ҳ���ָ�����ļ���

at java.lang.ProcessImpl.create(Native Method)

at java.lang.ProcessImpl.(Unknown Source)

at java.lang.ProcessImpl.start(Unknown Source)

... 35 more

2018-07-19 10:35:51 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.io.IOException: Cannot run program "python": CreateProcess error=2, ϵͳ�Ҳ���ָ�����ļ���

at java.lang.ProcessBuilder.start(Unknown Source)

at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:133)

at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:76)

at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)

at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:86)

at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:64)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

Caused by: java.io.IOException: CreateProcess error=2, ϵͳ�Ҳ���ָ�����ļ���

at java.lang.ProcessImpl.create(Native Method)

at java.lang.ProcessImpl.(Unknown Source)

at java.lang.ProcessImpl.start(Unknown Source)

... 35 more

2018-07-19 10:35:51 ERROR TaskSetManager:70 - Task 0 in stage 0.0 failed 1 times; aborting job

Traceback (most recent call last):

File "E:/github/data-analysis/tf/SparkTest.py", line 21, in

binarizedDataFrame.show()

File "E:\hadoop-common\spark-2.3.1-bin-hadoop2.7\python\pyspark\sql\dataframe.py", line 350, in show

print(self._jdf.showString(n, 20, vertical))

File "E:\hadoop-common\spark-2.3.1-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__

File "E:\hadoop-common\spark-2.3.1-bin-hadoop2.7\python\pyspark\sql\utils.py", line 63, in deco

return f(*a, **kw)

File "E:\hadoop-common\spark-2.3.1-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py", line 328, in get_return_value

py4j.protocol.Py4JJavaError: An error occurred while calling o49.showString.

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.io.IOException: Cannot run program "python": CreateProcess error=2, 系统找不到指定的文件。

at java.lang.ProcessBuilder.start(Unknown Source)

at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:133)

at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:76)

at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)

at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:86)

at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:64)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

Caused by: java.io.IOException: CreateProcess error=2, 系统找不到指定的文件。

at java.lang.ProcessImpl.create(Native Method)

at java.lang.ProcessImpl.(Unknown Source)

at java.lang.ProcessImpl.start(Unknown Source)

... 35 more

Driver stacktrace:

at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1602)

at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1590)

at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1589)

at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)

at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1589)

at

Caused by: java.io.IOException: Cannot run program "python": CreateProcess error=2, 系统找不到指定的文件。

at java.lang.ProcessBuilder.start(Unknown Source)

at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:133)

at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:76)

at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)

at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:86)

at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:64)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

Caused by: java.io.IOException: CreateProcess error=2, 系统找不到指定的文件。

Process finished with exit code

windows操作系统的原因

Anaconda上jupyter notebook使用pyspark

打开anaconda navigator,可以安装pyspark。可以运行,不用配置hadoop

d2188251b707

image.png

另外参考

版权声明:本文为博主原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。
本文链接:https://blog.csdn.net/weixin_33962326/article/details/116823068

智能推荐

SpringBoot项目引入外部jar包_springboot引入外部jar包-程序员宅基地

文章浏览阅读695次。SpringBoot项目引入外部jar包_springboot引入外部jar包

Java系列(面试必备):HashMap和Hashtable的区别!_java hashtable的优缺点-程序员宅基地

文章浏览阅读3.1k次,点赞19次,收藏84次。Java系列(面试必备):HashMap 和 Hashtable 的 6 个区别!前言今天博主将为大家分享:Java系列(面试必备):HashMap 和 Hashtable 的 6 个区别!不喜勿喷,如有异议欢迎讨论!首先推荐结合博主的这篇文章进行阅读===>Java系列(面试必备):简单的hashCode和equals面试题,有好多坑!HashMap 和 Hashtable 是 J..._java hashtable的优缺点

elasticSearch - es报错:exception [type=search_phase_execution_exception, reason=all shards failed]_elasticsearch exception [type=search_phase_executi-程序员宅基地

文章浏览阅读5.7k次。查询语句中,字段类型使用错误,在es中查询字段类型为int,而查询语句中错误地用成了string。_elasticsearch exception [type=search_phase_execution_exception, reason=all s

WIZnet W5300-TOE MQTT 发布和订阅 (micropython)_w5500 mqtt onenet-程序员宅基地

文章浏览阅读113次。这些部分将指导您完成一系列步骤,从配置开发环境到使用 STM32f429zi (nuleo-f429zi) 和 W5300-TOE 运行以太网示例 基本设置请参阅“入门”指南。_w5500 mqtt onenet

day24.|回溯法01-程序员宅基地

文章浏览阅读381次,点赞5次,收藏7次。1.也可以叫做,它是一种搜索的方式。2.回溯是递归的副产品,只要有递归就会有回溯。回溯与递归相辅相成,只要有递归就有回溯。通常递归函数的下面就是回溯的逻辑。3.,如果想让回溯法高效一些,可以加一些剪枝的操作,但也改不了回溯法就是穷举的本质。(暴力查找)5.回溯法解决的问题:(1)组合问题:N个数里面按一定规则找出k个数的集合;(2)切割问题:一个字符串按一定规则有几种切割方式;(3)子集问题:一个N个数的集合里有多少符合条件的子集;(4)排列问题:N个数按一定规则全排列,有几种排列方式;

Esp32-CAM(ESP32带camera)使用说明_esp32 cam说明书-程序员宅基地

文章浏览阅读6.7k次,点赞2次,收藏28次。1、如下图,只需在丝印 VCC GND 处供 3.3V 电源即可启动开发板2、上电后开发板会释放热点。其中SSID: wireless-tagPwd: wireless-tag3、电脑或手机连接此热点后,登录网页 http://192.168.4.1 进入 WEBSERVER 界面。如下图:4、上图中有两个红色框 Get Still 和 Start Stream(Stop Stream)。点击 Get Still,摄像头抓取一张图片,并显示在黑色区域;点击 Start Stream_esp32 cam说明书

随便推点

BZOJ 4034 HAOI2015 T2 DFS序+线段树_haoi2015 t2-bzoj4034-程序员宅基地

文章浏览阅读2.4k次。题目大意:给定一棵树,每个点有点权,支持下列操作: 1.某个点的点权+a 2.某棵子树所有点权+a 3.查询某个点到根路径上的点权和 这个用入栈出栈序就可以了 入栈为正,出栈为负,那么一个点到根路径上的权值和就是入栈出栈序中[1,入栈位置]的和 而子树在入栈出栈序中是连续的,因此用线段树维护一下就可以了 (似乎只要无脑链剖就可以了?#include #include_haoi2015 t2-bzoj4034

3D数学基础(一)——左手坐标系和右手坐标系_3d左手坐标系-程序员宅基地

文章浏览阅读3.7k次。3D数学基础(一)——左手坐标系和右手坐标系1、左手坐标系左手坐标系的定义伸出左手,让拇指和是指成L型,大拇指向右,食指向前,中指指向前方,这样便定义好了一个左手坐标系,其中拇指为x轴,食指为Y轴,中指为Z轴。图示2、右手坐标系右手坐标系的定义右手坐标系与左手坐标系相反图示3、意义左手坐标系和右手坐标系虽然定义简单,并且可以相互转换,但是在一个场景中定义好坐标系是左手坐标..._3d左手坐标系

Java NIO之Selector_java nio selecter-程序员宅基地

文章浏览阅读278次。一、Java NIO 的核心组件Java NIO的核心组件包括:Channel(通道),Buffer(缓冲区),Selector(选择器),其中Channel和Buffer比较好理解 简单来说 NIO是面向通道和缓冲区的,意思就是:数据总是从通道中读到buffer缓冲区内,或者从buffer写入到通道中。二、Java NIO Selector1. Selector简介选择器..._java nio selecter

1. 一键安装oracle11g数据库_linux安装oracle11g数据库-程序员宅基地

文章浏览阅读2.3k次,点赞4次,收藏20次。Oracle数据库在Linux系统上安装步骤比较多,为了方便Oracle数据库的安装,编写了以下脚本,简化了Oracle数据库的安装。_linux安装oracle11g数据库

TCP/IP协议详解-程序员宅基地

文章浏览阅读60次。1、TCP/IP协议栈四层模型 TCP/IP这个协议遵守一个四层的模型概念:应用层、传输层、互联层和网络接口层。 网络接口层 模型的基层是网络接口层。负责数据帧的发送和接收,帧是独立的网络信息传输单元。网络接口层将帧放在网上,或从网上把帧取下来。 互联层 互联协议将数据包封装成internet数据报,并运行必要的路由算法。 这里有四个互联协议: 网际协议IP...

网络安全的隐形守护者——白帽黑客-程序员宅基地

文章浏览阅读2.3k次。提起黑客我们的脑海中总是会浮现那些“啪啪啪”敲键盘,进入别人电脑或是企业服务器的“神秘人”,他们来无影去无踪,但是每次造访总会将所到之处破坏个淋漓尽致,直到拿到自己想要的..._白帽黑客小青