arXiv reaDer
SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images
Optical chemical structure recognition (OCSR) systems aim to extract the molecular structure information, usually in the form of molecular graph or SMILES, from images of chemical molecules. While many tools have been developed for this purpose, challenges still exist due to different types of noises that might exist in the images. Specifically, we focus on the 'arrow-pushing' diagrams, a typical type of chemical images to demonstrate electron flow in mechanistic steps. We present Structural molecular identifier of Molecular images in Chemical Reaction Mechanisms (SMiCRM), a dataset designed to benchmark machine recognition capabilities of chemical molecules with arrow-pushing annotations. Comprising 453 images, it spans a broad array of organic chemical reactions, each illustrated with molecular structures and mechanistic arrows. SMiCRM offers a rich collection of annotated molecule images for enhancing the benchmarking process for OCSR methods. This dataset includes a machine-readable molecular identity for each image as well as mechanistic arrows showing electron flow during chemical reactions. It presents a more authentic and challenging task for testing molecular recognition technologies, and achieving this task can greatly enrich the mechanisitic information in computer-extracted chemical reaction data.
updated: Thu Jul 25 2024 18:52:10 GMT+0000 (UTC)
published: Thu Jul 25 2024 18:52:10 GMT+0000 (UTC)
参考文献 (このサイトで利用可能なもの) / References (only if available on this site)
被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)
Amazon.co.jpアソシエイト