A WWW image search engine needs to build index information for the images it encounters on the Web. It must be able to analyze and discriminate images, annotate them, store the extracted index information, and build an index library. An ideal image search engine should also support content-based image retrieval. Image recognition methods:
1. Automatic search for images and text: A displayable image file can be detected through two HTML constructs, namely IMG SRC (the SRC attribute of the IMG tag) and HREF (the HREF attribute of a link tag). IMG SRC means "display the following image file", and HREF means "the following is a link"; both often point to an image file. The search engine determines whether a link targets an image file by checking the file extension: if the extension is .GIF or .JPG, the file is treated as a displayable image.
2. Manual identification and classification of images: Images and sites on the Internet are selected by hand. This method can produce an accurate query system, but it is labor-intensive, which limits the number of images that can be processed.
Unlike text, an image has to be interpreted by a person, who explains its meaning according to their own understanding, so image retrieval is much more difficult than text query and matching. Most current image search engines support two retrieval methods: keyword retrieval and category browsing. Some also offer retrieval by visual attributes, but only in a very limited form. Their main retrieval methods are as follows:
a. Based on external information about the image: Retrieval uses external information such as the image's file name, directory name, path name, link, ALT tag, and the text surrounding the image. This is currently the method most commonly used by image search engines. After locating the image file, the engine infers the file's content from its file name or path name, so the result depends on how descriptive those names are.
b. Based on a description of the image content: This is matching at the semantic level. The content of the image (objects, background, composition, color characteristics, etc.) must be described and classified by hand and assigned descriptors. At search time, the user's search terms are matched primarily against these descriptors. This query method is relatively accurate, but it requires manual participation and is labor-intensive, which limits the number of images that can be processed, and it requires agreed specifications and standards. Its effectiveness depends on the accuracy of the manual descriptions.
c. Based on extracted visual features of the image: Image analysis software automatically extracts features such as color, shape, and texture and builds a feature index library. The user only needs to describe the general features of the desired image to find images with similar characteristics. This is mechanical matching at the level of image features, and it is particularly suitable for queries with a clear retrieval goal (such as trademark retrieval), where its results come closest to what the user wants. However, this relatively mature technology is currently used mainly for retrieval in image databases; applying it in online image search engines still poses certain difficulties.
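The feature-index idea in method c can be illustrated with a small sketch. The text does not say which features or which matching rule a real engine uses, so everything here is an assumption made for illustration: the feature is a coarse color histogram, the similarity measure is histogram intersection, and the names `color_histogram`, `histogram_intersection`, and `rank_by_similarity` are hypothetical. Images are represented as plain lists of (r, g, b) tuples so that the example needs no imaging library.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized color histogram for a list of (r, g, b) pixels.

    Each channel (0-255) is quantized into `bins_per_channel` bins, giving
    bins_per_channel ** 3 buckets in total. This is an assumed feature,
    chosen only because it is easy to compute and compare.
    """
    step = 256 // bins_per_channel
    top = bins_per_channel - 1
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Clamp so that channel values near 255 never overflow the last bin.
        rb = min(r // step, top)
        gb = min(g // step, top)
        bb = min(b // step, top)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels)
    if total == 0:
        return hist
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalized histograms of equal length."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_similarity(query_hist, feature_index):
    """Return image names from the feature index, most similar first."""
    scored = [
        (histogram_intersection(query_hist, hist), name)
        for name, hist in feature_index.items()
    ]
    return [name for _score, name in sorted(scored, reverse=True)]


# A tiny "feature index library" built from two made-up images.
feature_index = {
    "mostly_red": color_histogram([(250, 10, 5)] * 3 + [(0, 0, 255)]),
    "blue": color_histogram([(0, 0, 255)] * 4),
}

# The user describes the image they want as "red"; here that description
# is simply a solid red sample.
query = color_histogram([(255, 0, 0)] * 4)
ranking = rank_by_similarity(query, feature_index)
```

In this sketch the query only has to resemble the target in its overall color distribution, which mirrors the "general features" style of query described above. Shape and texture features would need their own extractors and are not covered here.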
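The automatic detection in step 1 and the external information used in method a can be sketched together in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not a description of how any particular engine works: the names `ImageLinkFinder`, `find_images`, `is_image_url`, and `name_keywords` are hypothetical, the `.jpeg` extension is added alongside the `.GIF` and `.JPG` named in the text, and keyword extraction simply splits the path on non-alphanumeric characters.

```python
import os
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Extensions treated as displayable images. The text names .GIF and .JPG;
# .jpeg is an added assumption.
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg"}


def is_image_url(url):
    """Return True if the URL path ends in a known image extension."""
    path = urlparse(url).path
    return os.path.splitext(path)[1].lower() in IMAGE_EXTENSIONS


def name_keywords(url):
    """Split the directory and file name into lowercase word tokens."""
    path = os.path.splitext(urlparse(url).path)[0]
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", path) if t]


class ImageLinkFinder(HTMLParser):
    """Collect image candidates from IMG SRC and (A) HREF attributes."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "img":
            url = attrs.get("src")
        elif tag == "a":
            url = attrs.get("href")
        else:
            return
        if url and is_image_url(url):
            self.images.append({
                "url": url,
                "source_tag": tag,
                "alt": attrs.get("alt") or "",
                "keywords": name_keywords(url),
            })


def find_images(html):
    """Return one record per image reference found in the HTML text."""
    finder = ImageLinkFinder()
    finder.feed(html)
    return finder.images


page = (
    '<p><IMG SRC="/photos/red_rose.JPG" ALT="A red rose">'
    '<a href="gallery/sunset-beach.gif">sunset</a>'
    '<a href="about.html">about</a></p>'
)
records = find_images(page)
```

The quality of the `keywords` field depends entirely on how descriptive the file and directory names are, which is the limitation noted for method a: a file named `img0042.jpg` would yield nothing useful to search on.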