Dynamic Indexing

Introduction

The structure of the community digest collections is informed by an approach called "Dynamic Indexing".

The Patent Search Problem -- What problem(s) are we trying to solve?

Overview of Patent Documents

First, what is a patent document, how is it structured, and what are the different types of patent document?

Types of Patent Searches

  • Claims-based search (examination, validity)
  • Patentability
  • Infringement
  • Clearance
  • State of the Art

The features of the Dynamic Indexing system described here are directed to claims-based searching.

Broad vs. Narrow Specification of Claim Limitations

A patent claim sets forth the subject matter that an inventor wishes to protect. In the context of patent searching, a claim can be considered as a recipe for conducting the search.

What are some prior concepts that might be helpful?

Existing conceptual frameworks that are pertinent to patent searching.

Every New Idea is a Combination of Old Ideas

The key concept underlying this project is that every new idea can be considered a combination of old ideas. If this is so, an effective search for a 'new idea' entails decomposing the new idea into old ideas and an efficient way of locating documents that disclose those old ideas.

In patent jurisprudence, the concept that new ideas are derived from old ideas is frequently made explicit. For example:

In KSR International Co. v. Teleflex Inc. (2007), the Supreme Court invalidated a patent for an adjustable vehicle pedal as an obvious combination of known elements, declaring: "This is so because inventions in most, if not all, instances rely upon building blocks long since uncovered, and claimed discoveries almost of necessity will be combinations of what, in some sense, is already known."

The principle was articulated decades earlier in Graham v. John Deere Co. (1966), where the Court examined patents as "a combination of old mechanical elements".

Patent searching does not carry the same philosophical overlay as its jurisprudential cousin, but the idea is implicit in the activity of searching.

Hierarchical (enumerative?) Classification

In library science, hierarchical classification theory posits that knowledge can be systematically arranged in a tree-like structure, progressing from general to specific categories to reflect logical subdivisions of subjects. This approach, rooted in Aristotelian principles of genus and species, enables efficient resource location by mirroring the natural relationships among concepts. It contrasts with flat or non-hierarchical systems by imposing a top-down order, where broader classes encompass narrower ones, often denoted through notation or indentation.

A foundational example is the Dewey Decimal Classification (DDC), introduced by Melvil Dewey in 1876, which divides knowledge into ten main classes (e.g., 000–099 for Computer Science, Information, and General Works), each further broken into divisions and sections (e.g., 500 for Natural Sciences, with 510 for Mathematics). This hierarchy facilitates browsing and shelving in libraries, as users can drill down from overarching disciplines to precise topics. Similarly, the Library of Congress Classification (LCC) employs alphanumeric codes in a hierarchical manner, with main classes like "Q" for Science subdividing into "QA" for Mathematics and further into "QA75" for Electronic Computers.

The theory emphasizes mnemonic devices and hospitality—allowing for insertions of new subjects without disrupting the structure—while addressing limitations such as rigidity in interdisciplinary areas. In practice, hierarchical systems promote consistency in cataloging but may require auxiliary tools like indexes for cross-references. This framework has influenced patent classification, where hierarchies aid in delineating technological scopes, though it can sometimes overlook multifaceted inventions.
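The drill-down behavior described above can be sketched as a lookup in a small tree. The DDC codes below (500, 510, 516) are real classes, but the fragment and the `drill_down` helper are illustrative, not a complete catalog:

```python
# A hand-picked fragment of the Dewey Decimal hierarchy: each node maps a
# class number to a (label, children) pair. Illustrative, not exhaustive.
DDC_FRAGMENT = {
    "500": ("Natural Sciences", {
        "510": ("Mathematics", {
            "516": ("Geometry", {}),
        }),
    }),
}

def drill_down(tree, path):
    """Walk from a broad class to a narrow one, collecting labels."""
    labels = []
    node = tree
    for code in path:
        label, children = node[code]
        labels.append(f"{code} {label}")
        node = children
    return " > ".join(labels)

print(drill_down(DDC_FRAGMENT, ["500", "510", "516"]))
# → 500 Natural Sciences > 510 Mathematics > 516 Geometry
```

The point of the sketch is that each step in the path narrows the scope, which is exactly what a searcher does when navigating from a broad class toward a specific technology.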

Faceted Classification

Faceted classification theory, pioneered by S.R. Ranganathan in the 1930s through his Colon Classification (CC), revolutionizes knowledge organization by breaking subjects into independent facets or attributes that can be combined dynamically to describe resources. Unlike enumerative systems that pre-list all possible classes, faceted approaches offer flexibility, allowing multidimensional access tailored to user needs. Ranganathan's Five Laws of Library Science underpin this, emphasizing that libraries should save users' time through adaptable structures.

Core to the theory are fundamental facets, often summarized in Ranganathan's PMEST model: Personality (core topic), Matter (materials), Energy (processes), Space (location), and Time (period). For instance, a book on "Indian agriculture in the 20th century" might be classified by combining facets: Agriculture (Personality), Crops (Matter), Cultivation (Energy), India (Space), and 1900s (Time), yielding a notation like "J:2;4.44'N". This synthesis enables precise, context-specific retrieval without exhaustive enumeration.

In digital environments, faceted classification enhances search interfaces, as seen in e-commerce sites or library catalogs where users filter by multiple criteria (e.g., author, format, subject). It addresses hierarchies' limitations by accommodating complexity, such as in interdisciplinary fields, but requires careful facet design to avoid ambiguity. Applied to patent literature, faceted systems could dissect inventions by technical features, inventors, or applications, paving the way for innovative indexing schemes that improve discoverability in rapidly evolving technologies.

In short, faceted classification avoids redundancy by modularly combining attributes rather than enumerating every possible permutation.

Background: Dynamic Programming and Memoization

Dynamic Programming (DP) is a computational paradigm used to solve complex problems by breaking them into smaller, overlapping subproblems, solving each subproblem only once, and storing the solutions for reuse. It is particularly effective for optimization problems where decisions at one stage affect future outcomes, such as shortest paths, resource allocation, or sequence alignment. DP is grounded in the principle of optimality, which states that an optimal solution to a problem contains optimal solutions to its subproblems. It contrasts with divide-and-conquer by addressing overlapping subproblems, avoiding redundant computations.

DP approaches typically follow two strategies: top-down (recursive with memoization) and bottom-up (iterative with tabulation). In the top-down approach, the problem is recursively divided, and solutions are cached to avoid recalculating results for identical subproblems. The bottom-up approach builds solutions iteratively from smaller to larger subproblems, storing results in a table; it is often more space-efficient but less intuitive for some problems.

A classic example is the Fibonacci sequence. Computing Fibonacci numbers naively (F(n) = F(n-1) + F(n-2)) leads to exponential time complexity due to repeated calculations. DP reduces this to linear time by storing intermediate results. For instance, calculating F(5) involves F(4) and F(3), but F(3) is reused, so storing its value eliminates redundant work.

Memoization is the key technique in the top-down approach: intermediate results are stored in a data structure (e.g., an array or hash table), and when a subproblem is encountered again, the cached result is retrieved instead of recomputed. In the Fibonacci case, a memoization table stores F(n) for each n computed, ensuring each subproblem is solved only once. This reduces the time complexity from O(2^n) to O(n), though it requires O(n) space for storage.

Memoization also shines in problems like the knapsack problem, where items are selected to maximize value within a weight constraint. By caching results for subproblems (e.g., the value achievable with a subset of items and remaining capacity), memoization avoids recalculating overlapping combinations. However, it may consume more memory than bottom-up DP due to recursive call-stack overhead, and it is less natural for problems best expressed as iteration over states.
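The Fibonacci example above can be written in a few lines of Python; here the standard library's `functools.lru_cache` supplies the memoization table, so each F(n) is computed exactly once:

```python
from functools import lru_cache

# Top-down DP: the naive recursion recomputes the same subproblems
# exponentially often; caching each F(n) once makes the work linear in n.
@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(5))   # → 5
print(fib(50))  # instant with memoization; infeasible with naive recursion
```

Without the `@lru_cache` decorator, `fib(50)` would make on the order of 2^50 recursive calls; with it, each of the 51 subproblems is solved once and then served from the cache.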

In the realm of computer science, which intersects with library and information science through algorithmic approaches to organizing and retrieving data—such as in patent classification systems—the term "memoization" holds particular significance. This technique, often employed in dynamic programming to optimize searches and classifications by caching results, traces its etymological roots to the Latin word "memorandum," meaning "to be remembered." The word was deliberately coined by British artificial intelligence researcher Donald Michie in 1968, drawing from "memo" (a common abbreviation for memorandum in American English) and appending the suffix "-ization" to denote the process of recording or storing for future recall. Michie introduced it in the context of computational efficiency, emphasizing how it transforms recursive functions into more structured, reusable forms—much like how faceted classification in library science breaks down complex subjects into combinable attributes for precise retrieval. This origin underscores memoization's role in enhancing systems akin to hierarchical patent classifications, where avoiding redundant computations mirrors the efficient navigation from broad categories to specific innovations in frameworks like the International Patent Classification (IPC).

Current Approaches to the Patent Search Problem

Overview of Standard Patent Classification Systems -- What existing systems are we trying to improve?

Patent classification systems serve as essential tools for organizing and retrieving patent documents by grouping inventions according to their technical domains. These systems facilitate efficient searching, examination, and analysis of intellectual property across jurisdictions. The primary purpose is to enable effective management of vast patent literatures, ensuring that similar technologies are clustered for comparison and prior art assessment.

Among the most prominent systems is the International Patent Classification (IPC), a hierarchical framework established under the Strasbourg Agreement of 1971 and administered by the World Intellectual Property Organization (WIPO). The IPC divides technology into eight sections (A through H), further subdivided into classes, subclasses, groups, and subgroups, using a notation system like "A61K" for preparations for medical purposes. It is language-independent and applied globally to over 70 million patent documents, allowing for consistent classification regardless of the issuing authority.

In the United States, the U.S. Patent Classification (USPC) has historically organized patents into approximately 470 main classes and over 160,000 subclasses, based on subject matter similarity. Although the USPTO has been transitioning to the Cooperative Patent Classification (CPC) since 2013—a joint system with the European Patent Office (EPO) that builds on the IPC with enhanced detail—the USPC remains relevant for older documents and design/plant patents. The CPC refines the IPC by incorporating additional subdivisions, making it particularly useful for precise technology mapping in fields like biotechnology or electronics.

Other notable systems include the European Classification (ECLA), which preceded the CPC and focused on EPO-specific needs, and national variants like Japan's File Index (FI) and F-terms for thematic searching. These classifications are typically hierarchical, enabling navigation from broad categories to specific innovations, though they face challenges in adapting to emerging technologies, often requiring periodic revisions. Overall, such systems underscore the need for structured organization in patent literature to support innovation and legal processes.
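The hierarchical notation described above can be made concrete by decomposing an IPC/CPC symbol into its parts. This sketch handles only the common "A61K 31/00" shape (section, class, subclass, main group, subgroup); real symbols have edge cases, such as subclass-only entries, that it ignores:

```python
import re

# Section (A-H), two-digit class, subclass letter, then "group/subgroup".
# Covers the common full-symbol shape only; not a complete IPC parser.
IPC_RE = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d+)/(\d+)$")

def parse_ipc(symbol):
    """Split an IPC/CPC symbol like 'A61K 31/00' into its hierarchy levels."""
    m = IPC_RE.match(symbol)
    if not m:
        raise ValueError(f"unrecognized IPC symbol: {symbol}")
    section, cls, subclass, group, subgroup = m.groups()
    return {"section": section, "class": cls, "subclass": subclass,
            "main_group": group, "subgroup": subgroup}

print(parse_ipc("A61K 31/00"))
```

Each field narrows the scope of the previous one, mirroring the drill-down navigation from broad technology areas to specific subgroups that these systems are built around.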

Digests of Patents

Types of Digests

Corpus Digests, i.e., s-type digests

d-type digests

b-type digests

h-type digests

How to organize Digests and why do it?

Abstraction Layers

Architectural Patterns

Technological Patterns or Templates

Putting These All Together -- Dynamic Indexing

Anti-Patterns --- Text Searching / Aritificial Intelligence

Remember, the point here is to save conceptual understandings as they occur so that we do not have to repeat ourselves in a future task!