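Reading the Content of Doc, Docx, and Pdf Documents in C#
This article walks through an example of reading the content of Doc, Docx, and Pdf documents in C#. It is shared here for reference. The details are as follows: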
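Doc documents: Microsoft Word 14.0 Object Library (a GAC object; Word must be installed before it can be called. The COM version number depends on the installed version of Word.)
Docx documents: Microsoft Word 14.0 Object Library (a GAC object; Word must be installed before it can be called. The COM version number depends on the installed version of Word.)
Pdf documents: PDFBox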
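The mapping above can be sketched as a small lookup by file extension. This is only an illustration: the `ReaderChooser` class, the `PickReader` helper, and the returned labels are hypothetical names that do not appear in the original code, which simply hard-codes one path per format.

```csharp
using System;
using System.IO;

static class ReaderChooser
{
    // Hypothetical helper: names the library this article uses for a given file,
    // based only on the file extension (case-insensitive).
    public static string PickReader(string path)
    {
        string ext = Path.GetExtension(path).ToLowerInvariant();
        switch (ext)
        {
            case ".doc":
            case ".docx":
                return "Word interop";   // Microsoft Word Object Library
            case ".pdf":
                return "PDFBox";
            default:
                throw new NotSupportedException("Unsupported extension: " + ext);
        }
    }

    static void Main()
    {
        Console.WriteLine(PickReader(@"C:\resume.pdf"));   // prints "PDFBox"
        Console.WriteLine(PickReader(@"C:\resume.DOCX"));  // prints "Word interop"
    }
}
```

Because .doc and .docx share the same branch, a real program would only need two reading routines: one that goes through Word and one that goes through PDFBox.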
/*
Author: GhostBear
*/
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;
using org.pdfbox.pdmodel;
using org.pdfbox.util;
using Microsoft.Office.Interop.Word;
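namespace TestPdfReader
{
    class Program
    {
        static void Main(string[] args)
        {
            // PDF: extract the text with PDFBox, then strip tabs, line breaks, and spaces
            PDDocument doc = PDDocument.load(@"C:\resume.pdf");
            PDFTextStripper pdfStripper = new PDFTextStripper();
            string text = pdfStripper.getText(doc);
            doc.close(); // release the file once the text has been extracted
            string result = text.Replace('\t', ' ').Replace('\n', ' ').Replace('\r', ' ').Replace(" ", "");
            Console.WriteLine(result);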
            // Doc, Docx: both formats are opened through Word in the same way
            object docPath = @"C:\resume.doc";
            object docxPath = @"C:\resume.docx"; // pass this instead of docPath to read the .docx file
            object missing = System.Reflection.Missing.Value;
            object readOnly = true;
            Application wordApp;
            wordApp = new Application();
            Document wordDoc = wordApp.Documents.Open(ref docPath,
                ref missing,
                ref readOnly,
                ref missing,
                ref missing,
                ref missing,
                ref missing,
                ref missing,
                ref missing,
                ref missing,
                ref missing,
                ref missing,
                ref missing,
                ref missing,
                ref missing,
                ref missing);
            string text2 = FilterString(wordDoc.Content.Text);
            // Close the document and quit Word so no WINWORD process is left running
            wordDoc.Close(ref missing, ref missing, ref missing);
            wordApp.Quit(ref missing, ref missing, ref missing);
            Console.WriteLine(text2);
            Console.Read();
        }
        // Removes bell characters (\a), tabs, line breaks, and all other whitespace
        private static string FilterString(string input)
        {
            return Regex.Replace(input, @"(\a|\t|\n|\s+)", "");
        }
    }
}
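The filtering step can be tried without Word or PDFBox installed. The sketch below is a minimal, self-contained illustration: `StripAll` uses the pattern `[\a\s]+`, which should behave the same as the pattern in `FilterString`, while `Collapse` is a hypothetical variant (not part of the original code) that keeps word boundaries by collapsing each run to a single space. The class name, method names, and sample string are all assumptions made for the example. The bell character `\a` typically shows up in text read from Word at the end of table cells.

```csharp
using System;
using System.Text.RegularExpressions;

static class TextCleaner
{
    // Removes bell characters and every whitespace character,
    // mirroring the effect of the article's FilterString.
    public static string StripAll(string input)
    {
        return Regex.Replace(input, @"[\a\s]+", "");
    }

    // Hypothetical variant: replaces each run of bell/whitespace characters
    // with one space, then trims the ends, so words stay separated.
    public static string Collapse(string input)
    {
        return Regex.Replace(input, @"[\a\s]+", " ").Trim();
    }

    static void Main()
    {
        string sample = "Name:\tGhost Bear\r\nSkills:\a C#\n";
        Console.WriteLine(StripAll(sample));  // Name:GhostBearSkills:C#
        Console.WriteLine(Collapse(sample));  // Name: Ghost Bear Skills: C#
    }
}
```

Removing all whitespace works well for text without spaces between words, such as Chinese; for English text the collapsing variant is usually the more readable choice.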
I hope this article is helpful for your C# programming.