How to read image/PDF binary as MemoryStream

  • Last Post 18 October 2018
nanpass posted this 06 September 2018

Hello,

Our company is looking for an OCR solution for text extraction, and I am evaluating FineReader Engine SDK. I am trying to read image binary data from a database and extract the text with C#. I have read the documentation and the sample programs, but none of the samples reads binary data from a database. I want to extract text from image/PDF binary data that is stored in a database rather than from a physical image/PDF file.

Below are my readImageAndExtractText method and my CustomReadStream class. I tried to pass the binary data (as a MemoryStream) to FRDocument, but the FRDocument object (document) is empty.

Our system keeps all files in a SQL database, so file data has to be read from and written to the database. Is it possible to read binary data from the DB? If so, could you provide a code sample? Also, is it possible to export the recognized text to the DB?

 

public void readImageAndExtractText()
{
    // Create document
    FREngine.FRDocument document = engineLoader.Engine.CreateFRDocument();

    using (SqlConnection conn = new SqlConnection())
    {
        conn.ConnectionString = "server=*******;Database=***;uid=*******;pwd=******";
        conn.Open();
        int contractId = 874;

        List<SqlParameter> sqlParameters = new List<SqlParameter>();
        sqlParameters.Add(new SqlParameter("@parentFileId", contractId));
        DataTable dt = SqlProcess.executeSelectQuery(conn, "STORED_PROCEDURE_GET_IMAGE_BINARY",
            sqlParameters.ToArray());
        if (dt != null && dt.Rows.Count > 0)
        {
            foreach (DataRow row in dt.Rows)
            {
                byte[] content = Convert.IsDBNull(row["FILE_DATA"]) ? null : (byte[]) row["FILE_DATA"];
                CustomReadStream readStream = new CustomReadStream(content);
                // Trying to get image binary from database and store it into document object
                document.AddImageFileFromStream(readStream, null, null, null, "0");
                string plainTextText = document.PlainText.Text;
                Console.Write(plainTextText);
            }
        }
    }
}

public class CustomReadStream : FREngine.IReadStream
{
    private MemoryStream fileBytes = null;

    public CustomReadStream(byte[] _fileBytes)
    {
        fileBytes = new MemoryStream(_fileBytes);
    }

    public void Close()
    {
        fileBytes.Close();
    }

    public int Read(out byte[] data, int count)
    {
        data = new byte[count];
        int readBytes = fileBytes.Read(data, 0, count);
        return readBytes;
    }
}

Helen Osetrova posted this 18 October 2018

Hello,

 

Sorry for not answering your query earlier.

document.PlainText.Text does not contain any text until recognition has been performed. Please add the following line after the document.AddImageFileFromStream() call:

document.Process();

 

The IFRDocument::Process() method called without any parameters performs all stages of document processing with the default parameters. To learn how to tune the processing parameters, please see the Developer's Help > API Reference > Parameter Objects > Preprocessing, Analysis, Recognition, and Synthesis Parameters > DocumentProcessingParams Object article. A detailed description of all processing stages can be found in the Guided Tour > Advanced Techniques > Tuning Parameters of Page Preprocessing, Analysis, Recognition, and Synthesis section.
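
For illustration, the fix might be applied to one database row roughly as follows. This is only a sketch under assumptions, not a tested implementation: the DbOcr class and the ExtractText helper are hypothetical names, the FREngine.IEngine parameter type and the document.Close() call are assumed to be available in your SDK version, and CustomReadStream is the class from the original post. The AddImageFileFromStream arguments are copied unchanged from that post.

```csharp
using System;

public static class DbOcr
{
    // Hypothetical helper. 'engine' is an already loaded FineReader Engine
    // instance (engineLoader.Engine in the original post); 'content' is the
    // FILE_DATA value of a single row.
    public static string ExtractText(FREngine.IEngine engine, byte[] content)
    {
        // A DBNull column becomes null in the original code, and
        // new MemoryStream(null) would throw, so reject it up front.
        if (content == null)
            throw new ArgumentNullException(nameof(content));

        // One document per row, so each result holds only that row's text.
        FREngine.FRDocument document = engine.CreateFRDocument();
        try
        {
            // CustomReadStream is the IReadStream wrapper from the original post.
            document.AddImageFileFromStream(
                new CustomReadStream(content), null, null, null, "0");

            // PlainText stays empty until processing has run.
            document.Process();

            return document.PlainText.Text;
        }
        finally
        {
            // Assumption: FRDocument exposes Close() to release its resources.
            document.Close();
        }
    }
}
```

Inside the foreach loop of the original method, this could be called as Console.Write(DbOcr.ExtractText(engineLoader.Engine, content)); after skipping rows where content is null.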

 

Hope this information will be helpful!

 
