Doro's Python Life in Words Journal

Retrieving posts by similarity


I previously added a tagging system (django-taggit) to my blog posts, and tags open up a lot of interesting possibilities. Tags let you categorise posts in a non-hierarchical way, and posts on similar topics tend to share several tags. To retrieve the posts most similar to a given post, follow these steps:

    1. Get all the tags for the current post.
    2. Get all posts tagged with any of these tags.
    3. Exclude the current post from this list to avoid recommending the same post.
    4. Order the results by the number of tags shared with the current post. If two or more posts share the same number of tags, put the most recent post first.
    5. Limit the query to the number of posts you want to recommend.
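
# blog/views.py

from django.db.models import Count
from django.shortcuts import get_object_or_404
from django.views.generic import DetailView

# Assumed project layout: CommentForm in blog/forms.py, Post in blog/models.py.
from .forms import CommentForm
from .models import Post
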
class PostDetailView(DetailView):
    model = Post
    template_name = "post/detail.html"
    context_object_name = "post"

    def get_object(self):
        return get_object_or_404(
            Post,
            status=Post.Status.PUBLISHED,
            slug=self.kwargs["slug"],
        )

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        context["comments"] = self.object.comments.filter(active=True)
        context["form"] = CommentForm()
        context["similar_posts"] = self.get_similar_posts()
        return context

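    def get_similar_posts(self):
        # IDs of every tag attached to the current post.
        post_tags_ids = self.object.tags.values_list("id", flat=True)
        # Published posts sharing at least one of those tags, minus the current post.
        similar_posts = Post.published.filter(tags__in=post_tags_ids).exclude(
            id=self.object.id
        )
        # Count the shared tags per post, rank by that count, then by recency,
        # and keep the top four.
        similar_posts = similar_posts.annotate(same_tags=Count("tags")).order_by(
            "-same_tags", "-publish"
        )[:4]
        return similar_posts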

Legend:

    • post_tags_ids: The ids of all tags attached to the current post instance.
    • similar_posts (first assignment): All published posts that have at least one tag whose id is in post_tags_ids, excluding the current post instance.
    • similar_posts (second assignment): The same queryset, annotated with the Count aggregation function to generate a calculated field (same_tags) holding the number of tags each post shares with the current post. The result is ordered by the number of shared tags in descending order and then by publish date in descending order, so that among posts with the same number of shared tags the most recent come first. Finally, the result is sliced to the first 4 posts.
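
To make the ordering concrete, here is a minimal pure-Python sketch of the same ranking without the ORM. SketchPost, rank_similar and the sample data are hypothetical and exist only for illustration; the view itself relies on the database query described above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SketchPost:
    """Hypothetical stand-in for the blog's Post model."""
    title: str
    publish: date
    tags: frozenset


def rank_similar(current, all_posts, limit=4):
    """Rank other posts by shared tag count, newest first on ties."""
    candidates = []
    for post in all_posts:
        shared = len(post.tags & current.tags)
        # Skip the current post and anything with no tag in common.
        if post != current and shared > 0:
            candidates.append((shared, post))
    # Descending by shared tag count, then descending by publish date.
    candidates.sort(key=lambda pair: (pair[0], pair[1].publish), reverse=True)
    return [post for _, post in candidates[:limit]]


# Made-up sample data.
current = SketchPost("Django tags", date(2024, 1, 10),
                     frozenset({"django", "python", "tags"}))
posts = [
    current,
    SketchPost("Python tips", date(2024, 2, 1), frozenset({"python"})),
    SketchPost("Taggit deep dive", date(2023, 12, 1), frozenset({"django", "tags"})),
    SketchPost("Django views", date(2024, 3, 1), frozenset({"django", "python"})),
    SketchPost("Gardening", date(2024, 4, 1), frozenset({"plants"})),
]

ranked_titles = [post.title for post in rank_similar(current, posts)]
print(ranked_titles)
```

In this sample, "Django views" and "Taggit deep dive" both share two tags with the current post, so the more recent of the two is listed first; "Python tips" shares one tag and follows, and "Gardening" is left out because it shares none.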
<!-- blog/templates/post/detail.html -->

  <h2>Similar posts</h2>
  {% for post in similar_posts %}
    <p>
      <a href="{{ post.get_absolute_url }}">{{ post.title }}</a>
    </p>
  {% empty %}
    <p>There are no similar posts yet.</p>
  {% endfor %}

 

