Technical Support Forum
DocumentExtractor doesn't work properly for some text files
Yogesh

Joined: 18 Jun 2010
Posts: 15

Posted: Mon Nov 15, 2010 3:04 pm

Hi,

I have a text file containing this text:

A dicationic bis-hydrazone compound according to claim 1  wherein the compound is chosen from the following compounds: 4-{(E)-[methyl(phenyl)hydrazono]methyl}-1-[3-(4-{(E)-[methyl(phenyl)hydrazono]methyl}pyridinium-1-yl)propyl]pyridinium dibromide  4-{(E)-[methyl(phenyl)hydrazono]methyl}-1-[4-(4-((E)-[methyl(phenyl)hydrazono]methyl}pyridinium-1-yl)butyl]pyridinium dibromide 4-{(E)-[methyl(phenyl)hydrazono]methyl}-1-[5-(4-{(E)-[methyl(phenyl)hydrazono]methyl}pyridinium-1-yl)pentyl]pyridinium dibromide 4-{(E)-[methyl(phenyl)hydrazono]methyl}-1-[6-(4-{(E)-[methyl(phenyl)hydrazono]methyl}pyridinium-1-yl)hexyl]pyridinium dibromide 4-{(E)-[methyl(phenyl)hydrazono]methyl}-1-(6-{[4-({[6-(4-{(E)-[methyl(phenyl)hydrazono]methyl}pyridin ium-1-yl)hexyl]amino}carbonyl)benzoyl]amino}hexyl)pyridinium dibromide 4-{(E)-[methyl(phenyl)hydrazono]methyl}-1-[6-(4-{(E)-[methyl(phenyl)hydrazono]methyl}quinolinium-1-yl)hexyl]quinolinium dibromide 4-{(E)-[methyl(phenyl)hydrazono]methyl}-1-[6-(4-{(E)-[methyl(phenyl)hydrazono]methyl}pyridinium-1-yl)hexyl]quinolinium dibromide 1-methyl-3-[5-(1-methyl-2-{(E)-[methyl(phenyl)hydrazono]methyl}-1H-imidazol-3-ium-3-yl)pentyl]-2-{(E)-[methyl(phenyl)hydrazono]methyl}-1H-imidazol-3-ium dibromide 1-methyl-3-[4-(1-methyl-2-{(E)-[methyl(phenyl)hydrazono]methyl}-1H-benzimidazol-3-ium-3-yl)butyl]-2-{(E)-[methyl(phenyl)hydrazono]methyl}-1H-benzimidazol-3-ium dibromide 1-[6-(1-methyl-2-{(E)-[methyl(phenyl)hydrazono]methyl}-1H-imidazol-3-ium-3-yl)hexyl]-4-{(E)-[methyl(phenyl)hydrazono]methyl}quinolinium dibromide 1-[6-(1-methyl-2-{(E)-[methyl(phenyl)hydrazono]methyl}-1H-benzimidazol-3-ium-3-yl)hexyl]-4-{(E)-[methyl(phenyl)hydrazono]methyl}quinolinium dibromide  1 1-pentane-1 5-diylbis(4-{(E)-[methyl(phenyl)hydrazono]methyl}quinolinium) dibromide  1 1-butane-1 4-diylbis(4-{(E)[methyl(phenyl)hydrazono]methyl}quinolinium) dibromide  1 1-propane-1 3-diylbis(4-{(E)-[methyl(phenyl)hydrazono]methyl)quinolinium) dibromide  2-{(E)-[[4-[(4-methoxyphenyl) 
(methyl)amino]phenyl}(methyl)hydrazono]methyl}-3 3-dimethyl-1-[6-(4{(E)-[methyl(phenyl)hydrazono]methyl}pyridinium-1-yl)hexyl]-3H-indolium dichloride  1 1-hexane-1 6-diylbis(2-{(E)-[{4-[(4-methoxyphenyl)(methyl)amino]phenyl}(methyl)hydrazono]methyl]-3 3-dimethyl-3H-indolium) dichloride.

 

 

DocumentExtractor x = new DocumentExtractor(srcFile);
x.processHTML();

System.out.println("ok");

The DocumentExtractor call hangs and never prints "ok".

Can you tell me why this happens? For other text files it works properly.

JChem version: 5.3.4

Java: jdk1.6.0_16

OS: Windows XP

 

Thanks & Regards

Yogesh

Szabolcs
ChemAxon personnel
Joined: 03 Jun 2004
Posts: 1924

Posted: Mon Nov 15, 2010 3:19 pm

Hi Yogesh,

 

I have moved your question to the naming section of our forum. My colleagues will check it and answer soon.

Best regards,

 

Szabolcs

dbonniot
ChemAxon personnel
Joined: 20 Mar 2006
Posts: 322

Posted: Tue Nov 16, 2010 10:44 am
Post subject: 5.4

Dear Yogesh,


Thank you very much for reporting this issue. After an initial assessment, it looks like the upcoming version 5.4 will improve the situation: it finishes processing this text, unlike 5.3. I will let you know more once I have finished investigating.
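For anyone who has to stay on 5.3 in the meantime, one general way to keep a call that never returns from blocking the whole program is to run it on a worker thread with a time limit. The sketch below is plain Java and not part of JChem; `TimeoutRunner` and `runWithTimeout` are hypothetical names, and the `Runnable` passed in would wrap the `processHTML()` call from the original post. It avoids lambdas so that it should also compile on Java 6.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not a JChem class.
final class TimeoutRunner {

    private TimeoutRunner() {
    }

    /**
     * Runs the task on a daemon worker thread.
     * Returns true if it finished within the limit, false if it timed out
     * or the calling thread was interrupted while waiting.
     */
    static boolean runWithTimeout(Runnable task, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "extractor-worker");
                // Daemon thread: a stuck worker does not keep the JVM alive.
                t.setDaemon(true);
                return t;
            }
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; code that ignores
            // interrupts keeps running in the background.
            future.cancel(true);
            return false;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            // The task itself failed; surface the original cause.
            throw new RuntimeException(e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller could then log the file name and move on to the next document when `runWithTimeout` returns false. This only contains the symptom; it does not make the extraction succeed on the problematic text.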

Best regards,

Daniel

dbonniot
ChemAxon personnel
Joined: 20 Mar 2006
Posts: 322

Posted: Fri Nov 19, 2010 6:06 pm

So, 5.4 does finish on this text, though it still has problems with the structures, partly because of OCR/formatting errors. I have made further improvements, which will probably be released later, in 5.4.1. With these, all of the names are now recognized. This includes automatically fixing the OCR errors in the text, like the missing commas in "1 1-hexane-1 6-diyl" instead of "1,1-hexane-1,6-diyl".
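To illustrate the kind of OCR damage described here, the following is a naive pre-processing sketch that restores a comma between two locants separated by a single space. It is not the algorithm used in JChem, which is not shown in this thread; `OcrLocantFixer` and `fixLocants` are hypothetical names, and the rule (a space between a digit and a number that is followed by a hyphen) is an assumption chosen to fit the example above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a JChem class.
final class OcrLocantFixer {

    // A single space that has a digit before it and "digits + hyphen" after it,
    // e.g. the space in "1 6-diyl".
    private static final Pattern LOST_COMMA = Pattern.compile("(?<=\\d) (?=\\d+-)");

    private OcrLocantFixer() {
    }

    /** Replaces each such space with a comma and leaves everything else untouched. */
    static String fixLocants(String text) {
        return LOST_COMMA.matcher(text).replaceAll(",");
    }
}
```

With this rule, `OcrLocantFixer.fixLocants("1 1-hexane-1 6-diyl")` returns `"1,1-hexane-1,6-diyl"`. The heuristic can also produce false positives, for example when a claim number is directly followed by a name that starts with a locant, so it is only a starting point.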

The only remaining issue is that without any formatting between the names, it can be hard to be sure where one compound ends and the next one starts, so several names are understood together as a larger compound.
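For this particular text, a caller could work around the boundary problem before extraction, because every compound name in the list ends with a counterion word ("dibromide" or "dichloride"). The sketch below splits the text after those words. It is an input-specific heuristic and an assumption of this example, not something JChem is stated to do; `NameSplitter` and `split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a JChem class.
final class NameSplitter {

    // Shortest run of text that starts at a non-space character and ends
    // with one of the counterion words; DOTALL lets a name span a line break.
    private static final Pattern NAME =
            Pattern.compile("(\\S.*?\\b(?:dibromide|dichloride))\\b", Pattern.DOTALL);

    private NameSplitter() {
    }

    /** Returns the chunks of text that end in "dibromide" or "dichloride". */
    static List<String> split(String text) {
        List<String> names = new ArrayList<String>();
        Matcher m = NAME.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}
```

Each chunk could then be written on its own line before the text is handed to the extractor. Note that the first chunk would still carry the introductory claim wording, and text after the last counterion word is dropped, so the result needs a quick review.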

Thanks again for your report. I hope the improvements in 5.4 and 5.4.1 will be useful to you. Let me know if you find other issues or need specific support.
