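I recently used Sqoop to export data from Hive to MySQL and ran into a series of problems. I'm writing them down here so I don't fall into the same pit again!
Export command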
sqoop export --connect jdbc:mysql://192.168.1.78:3306/data \
--username root \
-P \
--export-dir '/user/hive/warehouse/personas.db/user_attribute/000000_0' \
--table dm_user_attribute \
--input-fields-terminated-by '|' \
--input-null-non-string '\\N' \
--input-null-string '\\N' \
--lines-terminated-by '\n' \
-m 1
Environment
CentOS 7 + CDH 5.7.2 + the Sqoop bundled with CDH
Error output
Below is what the console printed after I ran the command on the server.
Warning: /opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/07/23 11:54:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.2
18/07/23 11:54:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/23 11:54:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/07/23 11:54:45 INFO tool.CodeGenTool: Beginning code generation
18/07/23 11:54:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_user_attribute` AS t LIMIT 1
18/07/23 11:54:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_user_attribute` AS t LIMIT 1
18/07/23 11:54:45 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/2322b82e8ef7190a66357528d5fbddae/dm_user_attribute.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/07/23 11:54:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/2322b82e8ef7190a66357528d5fbddae/dm_user_attribute.jar
18/07/23 11:54:47 INFO mapreduce.ExportJobBase: Beginning export of dm_user_attribute
18/07/23 11:54:47 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/07/23 11:54:47 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
18/07/23 11:54:48 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/07/23 11:54:48 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/07/23 11:54:48 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/07/23 11:54:48 INFO client.RMProxy: Connecting to ResourceManager at 192.168.1.152:8032
18/07/23 11:54:49 INFO input.FileInputFormat: Total input paths to process : 1
18/07/23 11:54:49 INFO input.FileInputFormat: Total input paths to process : 1
18/07/23 11:54:49 INFO mapreduce.JobSubmitter: number of splits:1
18/07/23 11:54:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528444677205_1338
18/07/23 11:54:50 INFO impl.YarnClientImpl: Submitted application application_1528444677205_1338
18/07/23 11:54:50 INFO mapreduce.Job: The url to track the job: http://daojia02:8088/proxy/application_1528444677205_1338/
18/07/23 11:54:50 INFO mapreduce.Job: Running job: job_1528444677205_1338
18/07/23 11:54:55 INFO mapreduce.Job: Job job_1528444677205_1338 running in uber mode : false
18/07/23 11:54:55 INFO mapreduce.Job: map 0% reduce 0%
18/07/23 11:55:00 INFO mapreduce.Job: map 100% reduce 0%
18/07/23 11:55:01 INFO mapreduce.Job: Job job_1528444677205_1338 failed with state FAILED due to: Task failed task_1528444677205_1338_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
18/07/23 11:55:01 INFO mapreduce.Job: Counters: 8
Job Counters
Failed map tasks=1
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2855
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=2855
Total vcore-seconds taken by all map tasks=2855
Total megabyte-seconds taken by all map tasks=2923520
18/07/23 11:55:01 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
18/07/23 11:55:01 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 13.576 seconds (0 bytes/sec)
18/07/23 11:55:01 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
18/07/23 11:55:01 INFO mapreduce.ExportJobBase: Exported 0 records.
18/07/23 11:55:01 ERROR tool.ExportTool: Error during export: Export job failed!
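When I saw this console output, I wanted to scream. What on earth is this? It only tells you that the export failed and the job was aborted, but where is the actual error message? Maybe you feel the same way looking at it. How do you fix this, and where do you even start?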
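Sqoop's error log
After two days of trying all sorts of things, I finally figured out how to deal with this. The console output does not point to any specific problem: the specific error message never appears in the console, and you have to go to the CDH web management UI to see it. Here is how to find the log for this Sqoop job in the CDH management UI.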
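The console output does at least contain the YARN application ID and the tracking URL, which are what you need to locate the job later. Here is a minimal sketch for pulling them out of a saved copy of the console output. The file name sqoop_export.log and both function names are assumptions made for illustration; they are not something Sqoop provides.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: extract the YARN application ID and the tracking URL
# from Sqoop console output that was saved to a file.

extract_app_id() {
  # Prints the first application_<clusterTimestamp>_<sequence> ID found.
  grep -oE 'application_[0-9]+_[0-9]+' "$1" | head -n 1
}

extract_tracking_url() {
  # Prints the URL from the "The url to track the job:" line.
  sed -n 's/.*The url to track the job: \(http[^ ]*\).*/\1/p' "$1" | head -n 1
}

# Usage (assumes the output was captured with something like
#   sqoop export ... 2>&1 | tee sqoop_export.log):
# extract_app_id sqoop_export.log
# extract_tracking_url sqoop_export.log
```

For the run shown above, the ID is application_1528444677205_1338, which is the entry to look for in the YARN application list.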
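Step 1
As shown in the figure below, click YARN to open the YARN details page. You may ask why this is not a Sqoop page. Sqoop ultimately turns the export into a MapReduce job, so to see how the Sqoop job ran you still have to go to the YARN details page.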
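Step 2
The figure below shows the YARN details page. Click the applications list to open the list of job results, where you can see each job that ran and its outcome. The figure below clearly shows one failed job. Follow the steps shown to go to the next page.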
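Step 3
This page shows fairly detailed information about the single job, but it is not the page we are ultimately looking for. Find the logs hyperlink boxed in the figure below and click it to go to the next page.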
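Step 4
This looks like the log page, but sorry, not yet. Scroll down and you will see the text shown in the figure. This page only shows the flow of the job's execution; the specific error message is on yet another page. Click the here hyperlink shown in the figure to go to the next page.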
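Step 5
After all those pages, we finally arrive at the page we wanted: our beloved error page. Here you can see why the job failed and fix the problem based on the error message. Solutions for the errors shown on this page can mostly be found online, so you can search for the specific message yourself.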
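If log aggregation is enabled on the cluster (an assumption; I only used the web UI here), the same task log can usually be fetched from the command line with yarn logs -applicationId, which saves the five clicks. A sketch follows; the helper name first_errors is hypothetical, and the patterns it searches for are only a guess at what an error line looks like.

```shell
# Hypothetical helper: from log text on stdin, print lines that look like
# errors or exception headers, each followed by up to 5 lines of context,
# capped at 40 lines of output in total.
first_errors() {
  grep -E -A 5 'ERROR|Exception|Caused by' | head -n 40
}

# Usage, with the application ID printed in the console output:
# yarn logs -applicationId application_1528444677205_1338 | first_errors
```

If the command reports that no logs are available, aggregation is probably off or has not finished yet, and the web UI route above is still the way to go.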
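In my case, the problem was caused by a mismatch between the time field types in Hive and MySQL. Changing the time field type in either MySQL or Hive so that both sides use the same type solved the problem.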
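One way to spot this kind of mismatch before running the export is to scan the time field in the exported file and flag values that do not look like a full date-time. The sketch below assumes the '|' delimiter and the \N null marker from the export command. The field position (3 in the usage line) and the function name are hypothetical, and the "YYYY-MM-DD HH:MM:SS" layout is only an assumption about what the MySQL column expects, since the exact mismatch is not recorded here.

```shell
# Hypothetical helper: read '|'-delimited rows on stdin and print every row
# whose Nth field is neither \N (NULL) nor shaped like "YYYY-MM-DD HH:MM:SS"
# with optional fractional seconds. Output format: "<row number>: <row>".
bad_time_rows() {
  awk -F'|' -v f="$1" '
    $f != "\\N" && $f !~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9](\.[0-9]+)?$/ {
      print NR ": " $0
    }'
}

# Usage (field 3 is a placeholder for wherever the time column actually is):
# hdfs dfs -cat '/user/hive/warehouse/personas.db/user_attribute/000000_0' | bad_time_rows 3
```

Any rows it prints are worth comparing against the column type on the MySQL side before deciding which of the two schemas to change.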
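I really did not expect CDH to be this painful. This problem tormented me for two full days. Still, it got solved in the end, and the next time I run into it I will be able to fix it right away.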