Code from the book Agile Data Science 2.0

Summary: Code from the book Agile Data Science 2.0, covering PySpark + Elasticsearch.

I am noting this down because it may come in handy at work: PySpark + Elasticsearch. GitHub repository: https://github.com/rjurney/Agile_Data_Code_2

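To write data from PySpark to Elasticsearch (or to read data from Elasticsearch into PySpark), we use Elasticsearch for Hadoop (https://www.elastic.co/products/hadoop). In the image I have prepared, PySpark is already configured for this project, so you can load the library without doing anything further. If you installed everything manually, the install script gives you a similar configuration.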

Making PySpark data searchable. We use ch02/pyspark_elasticsearch.py (https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch02/pyspark_elasticsearch.py) to save data from PySpark to Elasticsearch:

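# Load the CSV file as an RDD of lines (sc is the SparkContext from the PySpark shell)
csv_lines = sc.textFile("data/example.csv")
# Split each line into its comma-separated fields
data = csv_lines.map(lambda line: line.split(","))
# Build (key, document) pairs; the key is ignored by the output format
schema_data = data.map(
    lambda x: ('ignored_key', {'name': x[0], 'company': x[1], 'title': x[2]})
)
# Write each document to the agile_data_science index, executives type
schema_data.saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={"es.resource": "agile_data_science/executives"})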
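
The introduction also mentions the opposite direction, reading from Elasticsearch into PySpark. Here is a minimal sketch of that, assuming the same Elasticsearch for Hadoop jar is available and that `sc` is the SparkContext from the PySpark shell. The helper names `es_read_conf` and `read_from_elasticsearch` are hypothetical and do not come from the book's repository. The sketch uses `EsInputFormat` as the counterpart of the `EsOutputFormat` used for writing.

```python
def es_read_conf(resource="agile_data_science/executives"):
    """Build the Hadoop configuration naming the index/type to read.

    The default resource matches the one written above; change it as needed.
    """
    return {"es.resource": resource}


def read_from_elasticsearch(sc, resource="agile_data_science/executives"):
    """Return an RDD of pairs whose values hold the stored document fields.

    `sc` is an existing SparkContext. This helper is a hypothetical
    illustration, not part of the book's code.
    """
    return sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=es_read_conf(resource),
    )
```

In the PySpark shell this would be called as `executives = read_from_elasticsearch(sc)`, after which `executives.first()` should show one stored record.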


Searching the data:
curl 'localhost:9200/agile_data_science/executives/_search?q=name:Russell*&pretty' 
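
The same query can be issued from Python with only the standard library. This is a rough sketch, assuming Elasticsearch is listening on localhost:9200 as in the curl command. The function names `build_search_url` and `search` are hypothetical, and `pretty=true` is written out explicitly instead of the bare `pretty` flag.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(query, host="localhost:9200",
                     resource="agile_data_science/executives"):
    """Build a URI-search URL; urlencode percent-escapes ':' and '*'."""
    params = urlencode({"q": query, "pretty": "true"})
    return "http://{}/{}/_search?{}".format(host, resource, params)


def search(query, **kwargs):
    """Run the query against a live server and return the decoded JSON body."""
    with urlopen(build_search_url(query, **kwargs)) as response:
        return json.load(response)


url = build_search_url("name:Russell*")
# url == "http://localhost:9200/agile_data_science/executives/_search?q=name%3ARussell%2A&pretty=true"
```

Calling `search("name:Russell*")` needs a running Elasticsearch instance; `build_search_url` on its own does not touch the network.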


Search results:
{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "agile_data_science",
        "_type": "executives",
        "_id": "AVrfrAbdfdS5Z0IiIt78",
        "_score": 1.0,
        "_source": {
          "company": "Relato",
          "name": "Russell Jurney",
          "title": "CEO"
        }
      },
      {
        "_index": "agile_data_science",
        "_type": "executives",
        "_id": "AVrfrAbdfdS5Z0IiIt79",
        "_score": 1.0,
        "_source": {
          "company": "Data Syndrome",
          "name": "Russell Jurney",
          "title": "Principal Consultant"
        }
      }
    ]
  }
}
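
The stored records are nested under `hits.hits[*]._source` in the response. As a small sketch of pulling them back out, the hypothetical helper `extract_sources` below works on a decoded response dictionary. `sample_response` is a trimmed copy of the result above, keeping only the fields the helper reads.

```python
def extract_sources(response):
    """Return the _source document of every hit, in response order."""
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]


# Trimmed copy of the search result shown above
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "AVrfrAbdfdS5Z0IiIt78",
             "_source": {"company": "Relato", "name": "Russell Jurney",
                         "title": "CEO"}},
            {"_id": "AVrfrAbdfdS5Z0IiIt79",
             "_source": {"company": "Data Syndrome", "name": "Russell Jurney",
                         "title": "Principal Consultant"}},
        ],
    }
}

sources = extract_sources(sample_response)
companies = [doc["company"] for doc in sources]
# companies == ["Relato", "Data Syndrome"]
```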
