Converting PDF to Word in Java is easy: don't waste your money


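Recently a family member needed to convert a PDF document to Word. I assumed I could easily find a tool online, but to my surprise I could not find one that was easy to use, and several of the candidates actually charged money.
Is PDF conversion really that hard? Why should it cost money? Can't powerful Java and the handy Apache tool family solve this? So I decided to look into it.
First I found the Apache dependency for parsing PDFs.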


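        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <version>2.0.4</version>
        </dependency>
        <dependency>
            <groupId>net.coobird</groupId>
            <artifactId>thumbnailator</artifactId>
            <version>0.4.8</version>
        </dependency>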

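The second dependency is an image-processing package.
A PDF contains both images and text, so all I need to do is extract the images and text and add them to a Word document. For that I add the POI dependencies.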


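        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
            <version>3.9</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.9</version>
        </dependency>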

Here is the code.

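import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

import javax.imageio.ImageIO;

import net.coobird.thumbnailator.Thumbnails;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class Pdf2word {
    public static void main(String[] args) throws InvalidFormatException {

        String pdfFileName = "H:\\xuweichao.pdf";
        // XWPFDocument produces the OOXML format, so use the .docx extension
        String docFileName = pdfFileName.substring(0, pdfFileName.lastIndexOf(".")) + ".docx";

        // try-with-resources closes the PDF and the output stream even on failure
        try (PDDocument pdf = PDDocument.load(new File(pdfFileName));
             FileOutputStream fos = new FileOutputStream(docFileName)) {
            int pageNumber = pdf.getNumberOfPages();
            CustomXWPFDocument document = new CustomXWPFDocument();

            // Make sure the folder for the extracted images exists
            File imgDir = new File("H:\\img");
            imgDir.mkdirs();

            // Walk through every page and copy its images and text into the Word document
            for (int i = 0; i < pageNumber; i++) {

                PDPage page = pdf.getPage(i);
                PDResources resources = page.getResources();

                for (COSName cosName : resources.getXObjectNames()) {
                    if (resources.isImageXObject(cosName)) {
                        PDImageXObject imageXObject = (PDImageXObject) resources.getXObject(cosName);
                        File outImgFile = new File(imgDir, System.currentTimeMillis() + ".jpg");
                        Thumbnails.of(imageXObject.getImage()).scale(0.9).rotate(0).toFile(outImgFile);

                        BufferedImage bufferedImage = ImageIO.read(outImgFile);
                        int width = bufferedImage.getWidth();
                        int height = bufferedImage.getHeight();
                        if (width > 600) {
                            // Shrink wide images to 550 px, keeping the aspect ratio.
                            // (Rounding the ratio would leave widths of 601-824 px unscaled.)
                            double ratio = width / 550.0;
                            System.out.println("scale ratio: " + ratio);
                            width = (int) (width / ratio);
                            height = (int) (height / ratio);
                        }

                        System.out.println("width: " + width + ",  height: " + height);
                        // Read the whole image file; a single InputStream.read() may return fewer bytes
                        byte[] ba = Files.readAllBytes(outImgFile.toPath());
                        ByteArrayInputStream byteInputStream = new ByteArrayInputStream(ba);

                        XWPFParagraph picture = document.createParagraph();
                        // Register the image data with the document
                        document.addPictureData(byteInputStream, CustomXWPFDocument.PICTURE_TYPE_JPEG);
                        // Place the image in the paragraph with the given width and height
                        document.createPicture(document.getAllPictures().size() - 1, width, height, picture);
                    }
                }

                PDFTextStripper stripper = new PDFTextStripper();
                stripper.setSortByPosition(true);
                // PDFTextStripper page numbers are 1-based, while the loop index is 0-based
                stripper.setStartPage(i + 1);
                stripper.setEndPage(i + 1);
                // Extract the text of the current page
                String text = stripper.getText(pdf);

                XWPFParagraph textParagraph = document.createParagraph();
                XWPFRun textRun = textParagraph.createRun();
                textRun.setText(text);
                // Font family: placeholder value, replace it with a font installed on your system
                textRun.setFontFamily("  ");
                textRun.setFontSize(11);
                // Enable word wrap
                textParagraph.setWordWrap(true);
            }
            document.write(fos);
            System.out.println("PDF conversion finished");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}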
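
The width check inside the loop can also be isolated into a small helper, which makes the proportional scaling easier to reason about and to test. The following is only a sketch: the class name `ImageFit`, the method `fitToWidth`, and the `maxWidth` parameter are hypothetical names introduced for illustration and are not part of the program above.

```java
public class ImageFit {

    /**
     * Returns {width, height} scaled down so that the width does not exceed
     * maxWidth, keeping the aspect ratio. Smaller images are returned unchanged.
     */
    public static int[] fitToWidth(int width, int height, int maxWidth) {
        if (width <= maxWidth) {
            return new int[] {width, height};
        }
        // Use the exact ratio; rounding it first would distort the result
        double ratio = (double) width / maxWidth;
        return new int[] {maxWidth, (int) Math.round(height / ratio)};
    }

    public static void main(String[] args) {
        int[] size = fitToWidth(1200, 800, 550);
        System.out.println(size[0] + " x " + size[1]); // prints "550 x 367"
    }
}
```

Keeping the ratio as an unrounded double is the design choice that matters here: it guarantees that every image wider than the limit ends up exactly at the limit.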

The custom document class:

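import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.xmlbeans.XmlException;
import org.apache.xmlbeans.XmlToken;
import org.openxmlformats.schemas.drawingml.x2006.main.CTNonVisualDrawingProps;
import org.openxmlformats.schemas.drawingml.x2006.main.CTPositiveSize2D;
import org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTInline;

public class CustomXWPFDocument extends XWPFDocument {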
    public CustomXWPFDocument(InputStream in) throws IOException {
        super(in);
    }

    public CustomXWPFDocument() {
        super();
    }

    public CustomXWPFDocument(OPCPackage pkg) throws IOException {
        super(pkg);
    }

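    /**
     * Places a picture that was already registered with addPictureData into a paragraph.
     *
     * @param id
     *            index of the picture in getAllPictures()
     * @param width
     *            picture width in pixels
     * @param height
     *            picture height in pixels
     * @param paragraph
     *            paragraph that receives the picture
     */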
    public void createPicture(int id, int width, int height,
                              XWPFParagraph paragraph) {
        final int EMU = 9525;
        width *= EMU;
        height *= EMU;
        String blipId = getAllPictures().get(id).getPackageRelationship()
                .getId();
        CTInline inline = paragraph.createRun().getCTR().addNewDrawing()
                .addNewInline();
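        String picXml = ""
                + "<a:graphic xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\">"
                + "   <a:graphicData uri=\"http://schemas.openxmlformats.org/drawingml/2006/picture\">"
                + "      <pic:pic xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\">"
                + "         <pic:nvPicPr>" + "            <pic:cNvPr id=\"" + id + "\" name=\"Generated\"/>"
                + "            <pic:cNvPicPr/>"
                + "         </pic:nvPicPr>"
                + "         <pic:blipFill>"
                + "            <a:blip r:embed=\"" + blipId + "\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\"/>"
                + "            <a:stretch>"
                + "               <a:fillRect/>"
                + "            </a:stretch>"
                + "         </pic:blipFill>"
                + "         <pic:spPr>"
                + "            <a:xfrm>"
                + "               <a:off x=\"0\" y=\"0\"/>"
                + "               <a:ext cx=\"" + width + "\" cy=\"" + height + "\"/>"
                + "            </a:xfrm>"
                + "            <a:prstGeom prst=\"rect\">"
                + "               <a:avLst/>"
                + "            </a:prstGeom>"
                + "         </pic:spPr>"
                + "      </pic:pic>"
                + "   </a:graphicData>" + "</a:graphic>";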

        inline.addNewGraphic().addNewGraphicData();
        XmlToken xmlToken = null;
        try {
            xmlToken = XmlToken.Factory.parse(picXml);
        } catch (XmlException xe) {
            xe.printStackTrace();
        }
        inline.set(xmlToken);

        inline.setDistT(0);
        inline.setDistB(0);
        inline.setDistL(0);
        inline.setDistR(0);

        CTPositiveSize2D extent = inline.addNewExtent();
        extent.setCx(width);
        extent.setCy(height);

        CTNonVisualDrawingProps docPr = inline.addNewDocPr();
        docPr.setId(id);
        docPr.setName("Picture " + id);
        docPr.setDescr("Image extracted from PDF");
    }
}
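
The constant 9525 in createPicture converts pixels to EMU (English Metric Units), the unit that DrawingML uses for sizes. One inch is 914400 EMU, so at an assumed screen resolution of 96 DPI one pixel corresponds to 914400 / 96 = 9525 EMU. The sketch below spells out that arithmetic; the class name `EmuConverter` and its members are hypothetical and only serve as an illustration.

```java
public class EmuConverter {

    public static final int EMU_PER_INCH = 914_400;
    // Assumption: images are measured at 96 pixels per inch
    public static final int ASSUMED_DPI = 96;
    public static final int EMU_PER_PIXEL = EMU_PER_INCH / ASSUMED_DPI; // 9525

    /** Converts a length in pixels to EMU, using long to avoid int overflow. */
    public static long pixelsToEmu(int pixels) {
        return (long) pixels * EMU_PER_PIXEL;
    }

    public static void main(String[] args) {
        System.out.println(pixelsToEmu(550)); // prints 5238750
    }
}
```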

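The program is that simple: it walks through every page of the file and extracts the images and text from the PDF. The text styling problem is not solved yet, but the generated Word file scales large images down proportionally, and for PDFs with a simple layout the result is quite good.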
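
One small step toward better text layout, assuming you want one Word paragraph per extracted line instead of a single run holding the whole page, is to split the extracted text before writing it. This is only a sketch of the splitting step in plain Java; `TextLines` and `splitLines` are hypothetical names, and each returned line would still have to be written to its own paragraph created with document.createParagraph().

```java
import java.util.ArrayList;
import java.util.List;

public class TextLines {

    /** Splits extracted page text into trimmed, non-empty lines. */
    public static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        // Accept both Windows (\r\n) and Unix (\n) line endings
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(splitLines("Title\r\n\r\n first line \nsecond line"));
    }
}
```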
