How to Read the Contents of Doc, Docx and Pdf Documents in C#
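This article uses an example to show how to read the text of Doc, Docx and Pdf documents from a C# program. It is shared here for your reference. The libraries used are: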
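Doc files: Microsoft Word 14.0 Object Library (an interop assembly in the GAC; Word must be installed before it can be called, and the COM version number differs depending on the installed Word version)
Docx files: also Microsoft Word 14.0 Object Library, with the same requirements
Pdf files: PDFBox (the .NET build, whose classes live in the org.pdfbox.* namespaces)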
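Because the Word object library only works on a machine with Word installed, it may help to know that a .docx file (unlike the binary .doc format) is a ZIP package whose body text is stored in word/document.xml, so its text can also be read without Word. The sketch below is a swapped-in alternative, not the method this article uses. It assumes .NET Framework 4.5+ (with a reference to System.IO.Compression) or .NET Core, and the class name DocxText is hypothetical. It only reads body paragraphs, so headers, footers, footnotes, tabs and line breaks are skipped, and text inside text boxes may appear twice.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reads the body text of a .docx without Word by
// opening the package as a ZIP archive and collecting the w:t text runs.
public static class DocxText
{
    // WordprocessingML main namespace used in word/document.xml
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string ReadText(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            return ReadText(fs);
        }
    }

    public static string ReadText(Stream docxStream)
    {
        // leaveOpen: true, because the caller owns the stream
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; not a .docx package?");

            using (Stream xmlStream = entry.Open())
            {
                XDocument doc = XDocument.Load(xmlStream);
                // One output line per paragraph (w:p), made of its text runs (w:t)
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                return string.Join("\n", paragraphs);
            }
        }
    }
}
```

For example, DocxText.ReadText(@"C:\resume.docx") returns the résumé's paragraphs separated by newlines, which can then be cleaned the same way as the Word output.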
/* Author: GhostBear */
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;
using org.pdfbox.pdmodel;
using org.pdfbox.util;
using Microsoft.Office.Interop.Word;

namespace TestPdfReader
{
    class Program
    {
        static void Main(string[] args)
        {
            // PDF: extract the text with PDFBox, and always close the document
            PDDocument doc = PDDocument.load(@"C:\resume.pdf");
            string text;
            try
            {
                PDFTextStripper pdfStripper = new PDFTextStripper();
                text = pdfStripper.getText(doc);
            }
            finally
            {
                doc.close();
            }
            // Turn tabs and line breaks into spaces, then drop all spaces
            string result = text.Replace('\t', ' ').Replace('\n', ' ').Replace('\r', ' ').Replace(" ", "");
            Console.WriteLine(result);

            // Doc, Docx: Word opens both formats; pass docxPath instead of docPath to read the .docx file
            object docPath = @"C:\resume.doc";
            object docxPath = @"C:\resume.docx";
            object missing = System.Reflection.Missing.Value;
            object readOnly = true;
            string text2;
            Application wordApp = new Application();
            try
            {
                Document wordDoc = wordApp.Documents.Open(ref docPath, ref missing, ref readOnly,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing);
                text2 = FilterString(wordDoc.Content.Text);
                wordDoc.Close(ref missing, ref missing, ref missing);
            }
            finally
            {
                // Always shut Word down, otherwise WINWORD.EXE keeps running
                wordApp.Quit(ref missing, ref missing, ref missing);
            }
            Console.WriteLine(text2);
            Console.Read();
        }

        // Remove Word's end-of-cell marker (\a) and all whitespace
        private static string FilterString(string input)
        {
            return Regex.Replace(input, @"(\a|\t|\n|\s+)", "");
        }
    }
}
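In FilterString, \s already matches \t, \n and \r, so the alternation can be shortened; the \a it lists is the bell character U+0007, which Word places (after \r) as the end-of-cell marker in the text of a table. The sketch below, using a hypothetical class TextCleaner, performs the same cleanup with a single character class and adds a variant that collapses whitespace to one space instead of deleting it: deleting everything suits Chinese text such as the résumé here, but would run English words together.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; a sketch of the same cleanup FilterString performs.
public static class TextCleaner
{
    // \s already covers \t, \n and \r; \u0007 (the \a bell character) is
    // the end-of-cell marker Word places in the text of table cells.
    private static readonly Regex Noise = new Regex(@"[\u0007\s]+");

    // Same effect as FilterString: delete every whitespace/marker run.
    // Suits Chinese text, where words are not separated by spaces.
    public static string RemoveAll(string input)
    {
        return Noise.Replace(input, "");
    }

    // Variant for space-separated languages such as English: collapse each
    // run to a single space so neighbouring words do not merge.
    public static string Collapse(string input)
    {
        return Noise.Replace(input, " ").Trim();
    }
}
```

For example, TextCleaner.Collapse(" Hello\r\aWorld\t\n") yields "Hello World", while RemoveAll on the same input yields "HelloWorld".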
I hope this article helps you with your C# programming.