Setting the Hadoop file encoding


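How to change the encoding Hadoop uses when reading and writing files: add -Dfile.encoding=utf-8 to the value of the mapred.child.java.opts property, which sets the default charset of each task's child JVM.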


       
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx200m -Dfile.encoding=utf-8</value>
      <description>Java opts for the task tracker child processes.  Subsumes
      'mapred.child.heap.size' (If a mapred.child.heap.size value is found
      in a configuration, its maximum heap size will be used and a warning
      emitted that heap.size has been deprecated). Also, the following symbols,
      if present, will be interpolated: @taskid@ is replaced by current TaskID;
      and @port@ will be replaced by mapred.task.tracker.report.port + 1 (A second
      child will fail with a port-in-use if mapred.tasktracker.tasks.maximum is
      greater than one). Any other occurrences of '@' will go unchanged. For
      example, to enable verbose gc logging to a file named for the taskid in
      /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
          -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
      </description>
    </property>
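
To confirm that the -Dfile.encoding option really reaches a JVM, a small check along the following lines may help. This is only a minimal sketch and not part of the original post: the class name `EncodingCheck` and its helper methods are hypothetical names chosen for illustration, and it uses only the standard Java library rather than any Hadoop API. The same two lookups could be logged from inside a task to inspect the child JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class (not part of Hadoop) for inspecting a JVM's encoding settings.
public class EncodingCheck {

    // Returns the name of the charset the JVM uses when no charset is given explicitly.
    public static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    // Encodes text as UTF-8 regardless of what -Dfile.encoding is set to.
    public static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Value of the file.encoding system property as seen by this JVM.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());

        // The two Hangul syllables below each take 3 bytes in UTF-8.
        System.out.println("UTF-8 byte count = " + encodeUtf8("\uD55C\uAE00").length);
    }
}
```

Running it as `java -Dfile.encoding=utf-8 EncodingCheck` shows what the JVM picked up from the flag. Passing a charset explicitly, as `encodeUtf8` does, is a way to keep output independent of the JVM default; whether that is preferable to the configuration setting depends on how much of the job's code you control.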
   

   


