Step by Step Guide on How to Build an AI News Summarizer Using Streamlit, Groq and Tavily

by CryptoExpert
Bybit


Introduction

In this tutorial, we will build an advanced AI-powered news agent that can search the web for the latest news on a given topic and summarize the results. This agent follows a structured workflow:

  • Browsing: Generate relevant search queries and collect information from the web.
  • Writing: Extracts and compiles news summaries from the collected information.
  • Reflection: Critiques the summaries by checking for factual correctness and suggests improvements.
  • Refinement: Improves the summaries based on the critique.
  • Headline Generation: Generates appropriate headlines for each news summary.
  • To enhance usability, we will also create a simple GUI using Streamlit. Similar to previous tutorials, we will use Groq for LLM-based processing and Tavily for web browsing. You can generate free API keys from their respective websites.

    Setting Up the Environment

    We begin by setting up environment variables, installing the required libraries, and importing necessary dependencies:

    Install Required Libraries

    bybit
    pip install langgraph==0.2.53 langgraph-checkpoint==2.0.6 langgraph-sdk==0.1.36 langchain-groq langchain-community langgraph-checkpoint-sqlite==2.0.1 tavily-python streamlit

    Import Libraries and Set API Keys

    import os
    import sqlite3
    from langgraph.graph import StateGraph
    from langchain_core.messages import SystemMessage, HumanMessage
    from langchain_groq import ChatGroq
    from tavily import TavilyClient
    from langgraph.checkpoint.sqlite import SqliteSaver
    from typing import TypedDict, List
    from pydantic import BaseModel
    import streamlit as st

    # Set API Keys
    os.environ[‘TAVILY_API_KEY’] = “your_tavily_key”
    os.environ[‘GROQ_API_KEY’] = “your_groq_key”

    # Initialize Database for Checkpointing
    sqlite_conn = sqlite3.connect(“checkpoints.sqlite”, check_same_thread=False)
    memory = SqliteSaver(sqlite_conn)

    # Initialize Model and Tavily Client
    model = ChatGroq(model=”Llama-3.1-8b-instant”)
    tavily = TavilyClient(api_key=os.environ[“TAVILY_API_KEY”])

    Defining the Agent State

    The agent maintains state information throughout its workflow:

  • Topic: The topic on which user wants the latest news Drafts: The first drafts of the news summaries 
  • Content: The research content extracted from the search results of the Tavily 
  • Critique: The critique and recommendations generated for the draft in the reflection state. 
  • Refined Summaries: Updated news summaries after incorporating suggesstions from Critique 
  • Headings: Headlines generated for each news article class

    class AgentState(TypedDict):
    topic: str
    drafts: List[str]
    content: List[str]
    critiques: List[str]
    refined_summaries: List[str]
    headings: List[str]

    Defining Prompts

    We define system prompts for each phase of the agent’s workflow:

    BROWSING_PROMPT = “””You are an AI news researcher tasked with finding the latest news articles on given topics. Generate up to 3 relevant search queries.”””

    WRITER_PROMPT = “””You are an AI news summarizer. Write a detailed summary (1 to 2 paragraphs) based on the given content, ensuring factual correctness, clarity, and coherence.”””

    CRITIQUE_PROMPT = “””You are a teacher reviewing draft summaries against the source content. Ensure factual correctness, identify missing or incorrect details, and suggest improvements.
    ———-
    Content: {content}
    ———-“””

    REFINE_PROMPT = “””You are an AI news editor. Given a summary and critique, refine the summary accordingly.
    ———–
    Summary: {summary}”””

    HEADING_GENERATION_PROMPT = “””You are an AI news summarizer. Generate a short, descriptive headline for each news summary.”””

    Structuring Queries and News

    We use Pydantic to define the structure of queries and News articles. Pydantic allows us to define the structure of the output of the LLM. This is important because we want the queries to be a list of string and the extracted content from web will have multiple news articles, hence a list of strings.

    from pydantic import BaseModel

    class Queries(BaseModel):
    queries: List[str]

    class News(BaseModel):
    news: List[str]

    Implementing the AI Agents

    1. Browsing Node

    This node generates search queries and retrieves relevant content from the web.

    def browsing_node(state: AgentState):
    queries = model.with_structured_output(Queries).invoke([
    SystemMessage(content=BROWSING_PROMPT),
    HumanMessage(content=state[‘topic’])
    ])
    content = state.get(‘content’, [])
    for q in queries.queries:
    response = tavily.search(query=q, max_results=2)
    for r in response[‘results’]:
    content.append(r[‘content’])
    return {“content”: content}

    2. Writing Node

    Extracts news summaries from the retrieved content.

    def writing_node(state: AgentState):
    content = “\n\n”.join(state[‘content’])
    news = model.with_structured_output(News).invoke([
    SystemMessage(content=WRITER_PROMPT),
    HumanMessage(content=content)
    ])
    return {“drafts”: news.news}

    3. Reflection Node

    Critiques the generated summaries against the content.

    def reflection_node(state: AgentState):
    content = “\n\n”.join(state[‘content’])
    critiques = []
    for draft in state[‘drafts’]:
    response = model.invoke([
    SystemMessage(content=CRITIQUE_PROMPT.format(content=content)),
    HumanMessage(content=”draft: ” + draft)
    ])
    critiques.append(response.content)
    return {“critiques”: critiques}

    4. Refinement Node

    Improves the summaries based on critique.

    def refine_node(state: AgentState):
    refined_summaries = []
    for summary, critique in zip(state[‘drafts’], state[‘critiques’]):
    response = model.invoke([
    SystemMessage(content=REFINE_PROMPT.format(summary=summary)),
    HumanMessage(content=”Critique: ” + critique)
    ])
    refined_summaries.append(response.content)
    return {“refined_summaries”: refined_summaries}

    5. Headlines Generation Node

    Generates a short headline for each news summary.

    def heading_node(state: AgentState):
    headings = []
    for summary in state[‘refined_summaries’]:
    response = model.invoke([
    SystemMessage(content=HEADING_GENERATION_PROMPT),
    HumanMessage(content=summary)
    ])
    headings.append(response.content)
    return {“headings”: headings}

    Building the UI with Streamlit

    # Define Streamlit app
    st.title(“News Summarization Chatbot”)

    # Initialize session state
    if “messages” not in st.session_state:
    st.session_state[“messages”] = []

    # Display past messages
    for message in st.session_state[“messages”]:
    with st.chat_message(message[“role”]):
    st.markdown(message[“content”])

    # Input field for user
    user_input = st.chat_input(“Ask about the latest news…”)

    thread = 1
    if user_input:
    st.session_state[“messages”].append({“role”: “user”, “content”: user_input})
    with st.chat_message(“assistant”):
    loading_text = st.empty()
    loading_text.markdown(“*Thinking…*”)

    builder = StateGraph(AgentState)
    builder.add_node(“browser”, browsing_node)
    builder.add_node(“writer”, writing_node)
    builder.add_node(“reflect”, reflection_node)
    builder.add_node(“refine”, refine_node)
    builder.add_node(“heading”, heading_node)
    builder.set_entry_point(“browser”)
    builder.add_edge(“browser”, “writer”)
    builder.add_edge(“writer”, “reflect”)
    builder.add_edge(“reflect”, “refine”)
    builder.add_edge(“refine”, “heading”)
    graph = builder.compile(checkpointer=memory)

    config = {“configurable”: {“thread_id”: f”{thread}”}}
    for s in graph.stream({“topic”: user_input}, config):
    # loading_text.markdown(f”*{st.session_state[‘loading_message’]}*”)
    print(s)

    s = graph.get_state(config).values
    refined_summaries = s[‘refined_summaries’]
    headings = s[‘headings’]
    thread+=1
    # Display final response
    loading_text.empty()
    response_text = “\n\n”.join([f”{h}\n{s}” for h, s in zip(headings, refined_summaries)])
    st.markdown(response_text)
    st.session_state[“messages”].append({“role”: “assistant”, “content”: response_text})

    Conclusion

    This tutorial covered the entire process of building an AI-powered news summarization agent with a simple Streamlit UI. Now you can play around with this and make some further improvements like:

    • A better GUI for enhanced user interaction.
    • Incorporating Iterative refinement to make sure the summaries are accurate and appropriate.
    • Maintaining a context to continue conversation about particular news.

    Happy coding!

    Also, feel free to follow us on Twitter and don’t forget to join our 75k+ ML SubReddit.

    🚨 Recommended Open-Source AI Platform: ‘IntellAgent is a An Open-Source Multi-Agent Framework to Evaluate Complex Conversational AI System’ (Promoted)

    Vineet Kumar is a consulting intern at MarktechPost. He is currently pursuing his BS from the Indian Institute of Technology(IIT), Kanpur. He is a Machine Learning enthusiast. He is passionate about research and the latest advancements in Deep Learning, Computer Vision, and related fields.

    ✅ [Recommended] Join Our Telegram Channel



    Source link

    You may also like