1

AgriBot Locator Project

The AgriBot Locator Project is a weed detector built entirely from off-the-shelf components, very simple green-detection algorithms (with the capacity to upgrade to in-crop detection) and 3D-printable parts. It integrates weed detection on a mini computer with a relay control board or custom driver board, housed in a custom-designed enclosure. The unit can be mounted on robots, vehicles and bicycles for spot spraying.
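The green-detection step can be illustrated with the Excess Green index (ExG = 2G - R - B), a widely used vegetation index. This is a minimal sketch, not the project's own code: it assumes each camera frame arrives as an RGB NumPy array, and the function names `green_mask` and `should_spray` and both threshold values are hypothetical and would need tuning for real lighting and cameras.

```python
import numpy as np

def green_mask(rgb: np.ndarray, threshold: int = 25) -> np.ndarray:
    """Boolean mask of 'green' pixels using the Excess Green index.

    rgb: H x W x 3 uint8 array in R, G, B order (assumption; frames read
    with OpenCV are BGR and would need their channels swapped first).
    threshold: hypothetical ExG cut-off.
    """
    # Cast to a signed type so 2G - R - B cannot wrap around as uint8 would.
    img = rgb.astype(np.int16)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    exg = 2 * g - r - b
    return exg > threshold

def should_spray(mask: np.ndarray, min_fraction: float = 0.01) -> bool:
    # Hypothetical trigger policy: fire the relay when enough of the frame is green.
    return bool(mask.mean() >= min_fraction)

frame = np.zeros((4, 4, 3), dtype=np.uint8)  # bare soil (black)
frame[1, 1] = (40, 200, 30)                  # one green pixel
mask = green_mask(frame)
print(int(mask.sum()), should_spray(mask))   # 1 True
```

Thresholding a single index keeps the per-frame cost low enough for a mini computer; upgrading to in-crop detection would replace `green_mask` with a trained detector while keeping the same spray trigger.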

2

Legal AI Advisor

The Legal AI Advisor is a system designed to provide basic legal advice through a natural language interface powered by an LLM such as GPT. Users type queries like "What should I do if I’ve been involved in a car accident?" into a web or mobile application, where the LLM interprets their intent and retrieves relevant information from a structured legal knowledge base. This back-end database organizes laws, FAQs, and case summaries by domain (e.g., Criminal Law, Contracts) and uses semantic search to retrieve the entries most relevant to each query. Middleware connects the LLM to the knowledge base, translating user queries into database searches and returning the results as simplified, conversational responses.

The system emphasizes security and compliance, anonymizing user data and adhering to privacy regulations such as the GDPR. It can be built with React for the front end, Python for the middleware, and a vector database such as Pinecone, and scaled through cloud hosting (e.g., AWS). While it makes legal information more accessible, it includes disclaimers urging users to consult professionals on complex issues.
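The semantic-search step can be sketched as ranking knowledge-base entries by cosine similarity to the query vector. This is a minimal sketch under loud assumptions: the word-count `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database such as Pinecone (whose API is not used here), and the entries and the "Traffic Law" domain are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical knowledge-base entries organised by legal domain.
KNOWLEDGE_BASE = [
    {"domain": "Traffic Law", "text": "After a car accident, stop, exchange insurance details and report the accident to the police."},
    {"domain": "Contracts", "text": "A contract generally requires an offer, acceptance and consideration."},
    {"domain": "Criminal Law", "text": "You have the right to remain silent and to speak to a lawyer if arrested."},
]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lower-cased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query: str, k: int = 1, domain: Optional[str] = None) -> list:
    """Return the k entries most similar to the query, optionally within one domain."""
    q = embed(query)
    candidates = [e for e in KNOWLEDGE_BASE if domain is None or e["domain"] == domain]
    return sorted(candidates, key=lambda e: cosine(q, embed(e["text"])), reverse=True)[:k]

print(search("What should I do if I've been involved in a car accident?")[0]["domain"])  # Traffic Law
```

The optional `domain` filter mirrors the per-domain organisation described above; in the middleware, the returned entries would be passed to the LLM as context for its conversational answer.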

3

Data Privacy

The Privacy Project using Apache Atlas and Apache Ranger teaches students how to manage data privacy and governance in a big data environment. Using Apache Atlas for metadata management and Apache Ranger for access control, students will catalog sensitive datasets, assign privacy tags (such as PII or Academic Records), and implement role-based access control (RBAC) policies to enforce privacy. The project involves setting up mock datasets in Hive or HDFS, using Atlas to classify the data and Ranger to define policies that restrict access based on user roles (e.g., Admin, Student, Researcher). Students will also use Ranger’s auditing features to monitor access attempts and verify compliance. The project provides hands-on experience with data governance concepts such as metadata tagging, RBAC, and policy enforcement, while demonstrating how Apache Atlas and Ranger are applied to data privacy and security in real-world scenarios.
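The interplay between tags and role-based policies can be sketched in plain Python before configuring the real tools. This is an illustrative model only, not the Atlas or Ranger API: the `CATALOG`, `TAG_POLICIES` and `AUDIT_LOG` structures and the column names are hypothetical, and every access attempt is logged the way Ranger's audit trail would record it.

```python
# Hypothetical catalog: column -> classification tags, as Atlas-style tagging would assign.
CATALOG = {
    "students.name": {"PII"},
    "students.email": {"PII"},
    "students.grade": {"Academic Records"},
    "students.enrolment_year": set(),
}

# Hypothetical tag-based policies: roles allowed to read columns carrying each tag.
TAG_POLICIES = {
    "PII": {"Admin"},
    "Academic Records": {"Admin", "Researcher"},
}

AUDIT_LOG = []  # (user, role, column, allowed) for every access attempt

def can_read(user: str, role: str, column: str) -> bool:
    """Allow access only if every tag on the column permits the role.

    Untagged columns are readable by any role; unknown columns are denied.
    """
    if column not in CATALOG:
        allowed = False  # deny by default for columns missing from the catalog
    else:
        # A tag with no policy entry yields an empty role set, so it denies everyone.
        allowed = all(role in TAG_POLICIES.get(tag, set()) for tag in CATALOG[column])
    AUDIT_LOG.append((user, role, column, allowed))
    return allowed
```

Because access is decided by tags rather than by individual tables, classifying a new dataset as PII is enough to bring it under the existing policy.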

4

Data Governance

The Data Governance Project using Apache Atlas focuses on managing metadata and ensuring data privacy and quality within a big data environment. Students will use Apache Atlas to catalog, classify, and track data across sources such as HDFS and Hive, applying data governance principles such as metadata management, data lineage, and classification. They will define custom metadata types, classify datasets (e.g., PII, Sensitive Data), and visualize data flow to support transparency and compliance. Apache Ranger will enforce role-based access policies that secure sensitive information. Students will also build data quality practices into the project, including data profiling, validation, and automated checks during ingestion, so that only high-quality data is ingested, categorized, and governed.
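Data lineage is essentially a directed graph from source datasets to the datasets derived from them. The sketch below, which does not use the Atlas API, walks that graph breadth-first and propagates a classification to every downstream dataset, mirroring the idea of carrying tags along lineage. The dataset names and the `LINEAGE` mapping are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: dataset -> datasets derived from it.
LINEAGE = {
    "hdfs:/raw/students.csv": ["hive:staging.students"],
    "hive:staging.students": ["hive:analytics.grades_by_course", "hive:analytics.contact_list"],
    "hive:analytics.grades_by_course": [],
    "hive:analytics.contact_list": [],
}

def downstream(dataset: str) -> list:
    """Breadth-first walk over lineage edges; returns every dataset derived from `dataset`."""
    seen, order = {dataset}, []
    queue = deque([dataset])
    while queue:
        node = queue.popleft()
        for child in LINEAGE.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

def propagate_tag(dataset: str, tag: str, tags: dict) -> None:
    """Apply `tag` to the dataset and to everything derived from it."""
    for name in [dataset, *downstream(dataset)]:
        tags.setdefault(name, set()).add(tag)
```

Tagging the raw file as PII then marks the staging table and both analytics tables as well, so access policies keyed on that tag cover derived data automatically.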

The project will also integrate data quality into the governance framework by defining data quality rules for accuracy, completeness, and consistency, and tracking these metrics within Apache Atlas. Students will define data quality tags and metadata so that quality information is visible alongside other metadata. Through data lineage visualization and auditing, they will monitor the integrity of data as it moves through the system. Policies in Apache Ranger will restrict access to low-quality data, so that only validated, high-quality data is available for reporting and analysis. This hands-on experience gives students practical knowledge of data governance and data quality management, both vital for keeping data secure, reliable, and useful in any organization.
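The three quality dimensions can be turned into measurable rules and a single quality tag. This is a minimal sketch, assuming a student-records feed where each row is a dict; the specific rules (required fields, an email pattern, grades between 0 and 100, unique IDs), the tag names `DQ_PASSED`/`DQ_FAILED` and the 0.95 threshold are all hypothetical choices, not part of Atlas.

```python
import re
from collections import Counter

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED = ("id", "email", "grade")  # hypothetical required fields

def is_accurate(row: dict) -> bool:
    """Accuracy rule: well-formed email and a numeric grade in [0, 100]."""
    grade = row.get("grade")
    return (bool(EMAIL_RE.match(row.get("email") or ""))
            and isinstance(grade, (int, float))
            and 0 <= grade <= 100)

def quality_report(rows: list) -> dict:
    """Fraction of rows passing each rule."""
    n = len(rows)
    if n == 0:
        return {"completeness": 0.0, "accuracy": 0.0, "consistency": 0.0}
    complete = sum(all(r.get(f) not in (None, "") for f in REQUIRED) for r in rows)
    accurate = sum(is_accurate(r) for r in rows)
    id_counts = Counter(r.get("id") for r in rows)
    consistent = sum(id_counts[r.get("id")] == 1 for r in rows)  # consistency: no duplicate IDs
    return {"completeness": complete / n, "accuracy": accurate / n, "consistency": consistent / n}

def quality_tag(report: dict, threshold: float = 0.95) -> str:
    """Tag to attach as metadata; an access policy could then exclude DQ_FAILED data."""
    return "DQ_PASSED" if min(report.values()) >= threshold else "DQ_FAILED"
```

Running the check during ingestion and storing both the report and the tag as metadata lets lineage views and access policies use the same quality signal.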

5

Data Privacy

The Personal Data Analyser is a privacy-enhancing tool designed to automate privacy-preserving data monitoring and assess privacy risks associated with data transactions. It leverages a hybrid mechanism combining Regular Expressions, Natural Language Processing (NLP), and machine learning models (e.g., Multilayer Perceptron and Random Forest) to detect privacy-sensitive data patterns and evaluate risk factors. The tool uses custom-built crisp and fuzzy models that consider data sensitivity, processor reputation, and other transaction details to assess potential privacy risks. Integrated with the PoSeID-on platform, the system provides real-time alerts to users whenever privacy risks are detected, helping organizations mitigate privacy issues before they escalate.
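The combination of pattern-based detection and fuzzy risk assessment can be sketched as follows. This is an illustration of the general technique, not the Personal Data Analyser's actual models: the regular expressions, sensitivity weights, membership functions, rule set and 0.7 alert threshold are all hypothetical, and the NLP and machine learning detectors (e.g. MLP, Random Forest) are not reproduced. The fuzzy part is a zero-order Sugeno-style inference: rule strengths use min for AND, and the output is the strength-weighted average of fixed risk levels.

```python
import re

# Hypothetical detectors: (pattern, sensitivity weight).
PII_PATTERNS = {
    "email": (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), 0.6),
    "iban": (re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"), 0.9),
}

def sensitivity(text: str) -> float:
    """Weight of the most sensitive pattern found in the text (0.0 if none)."""
    return max((w for pat, w in PII_PATTERNS.values() if pat.search(text)), default=0.0)

def clamp(x: float) -> float:
    return max(0.0, min(1.0, x))

def risk(sens: float, reputation: float) -> float:
    """Fuzzy risk in [0, 1] from data sensitivity and processor reputation (both in [0, 1])."""
    s_high, s_low = clamp(sens), clamp(1 - sens)
    r_high, r_low = clamp(reputation), clamp(1 - reputation)
    # (rule strength, singleton risk level)
    rules = [
        (min(s_high, r_low), 1.0),   # sensitive data, poorly reputed processor -> high risk
        (min(s_high, r_high), 0.5),  # sensitive data, reputable processor -> medium risk
        (s_low, 0.1),                # little sensitive data -> low risk
    ]
    total = sum(w for w, _ in rules)
    return sum(w * level for w, level in rules) / total if total else 0.0

def should_alert(text: str, reputation: float, threshold: float = 0.7) -> bool:
    return risk(sensitivity(text), reputation) >= threshold

msg = "Contact jane.doe@example.com, IBAN DE44500105175407324931"
print(should_alert(msg, reputation=0.2), should_alert(msg, reputation=0.9))  # True False
```

Unlike a crisp cut-off, the fuzzy rules let risk rise smoothly as processor reputation falls, so the same transaction can trigger an alert for one recipient and not for another.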

Validated through real-world use cases and pilot testing with actual users, the tool has proven effective in improving privacy assurances between organizations and their users. By automating privacy risk assessments and raising awareness of data privacy, the Personal Data Analyser helps organizations comply with privacy regulations and protect personally identifiable information (PII). Beyond supporting regulatory compliance, the tool enhances user trust by promoting responsible data handling practices.

6

Chatbots

The Automated Will Creation project uses Chatfuel, a popular chatbot platform, to guide users through the process of creating a legally binding will via an interactive conversational interface. The chatbot collects essential information such as personal details, asset data, beneficiary names, and specific instructions for the will (e.g., guardianship, funeral preferences). It then uses this information to generate a draft will in a legal format, which users can download, review, and finalize. The tool offers a simple, user-friendly way for individuals to create a will without needing legal expertise, ensuring the process is accessible and straightforward.

The system integrates with a document generation tool to automatically format the gathered data into a standardized will template. It also includes disclaimers encouraging users to have legal professionals validate the final document, so that the will complies with local regulations. To protect user data, the project employs encryption and secure communication channels, safeguarding sensitive personal and financial information. Through this chatbot, users can easily create and personalize their will, making the process more convenient and efficient while supporting privacy and legal compliance. The system could also offer users the option to connect with legal professionals for further review, increasing trust in the overall process.
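The template-filling step can be sketched with Python's standard `string.Template`. This is a minimal sketch, assuming the attributes collected by the chatbot have already been exported as a Python dict (for example via a webhook); it does not use any Chatfuel API. The field names, the `draft_will` function and the template wording are hypothetical placeholders, not legal text.

```python
from string import Template

# Hypothetical template; real wording must come from a legal professional.
WILL_TEMPLATE = Template(
    "LAST WILL AND TESTAMENT OF $full_name\n\n"
    "I, $full_name, of $address, declare this to be my will.\n"
    "I appoint $executor as executor of this will.\n"
    "I leave my estate to: $beneficiaries.\n"
    "$guardianship"
)

REQUIRED = ("full_name", "address", "executor", "beneficiaries")

def draft_will(answers: dict) -> str:
    """Render a draft will from chatbot answers; raise if required answers are missing."""
    missing = [f for f in REQUIRED if not answers.get(f)]
    if missing:
        raise ValueError(f"missing answers: {', '.join(missing)}")
    guardian = answers.get("guardian")  # optional: only rendered when provided
    fields = {
        "full_name": answers["full_name"],
        "address": answers["address"],
        "executor": answers["executor"],
        "beneficiaries": ", ".join(answers["beneficiaries"]),  # expects a list of names
        "guardianship": f"I appoint {guardian} as guardian of my minor children.\n" if guardian else "",
    }
    return WILL_TEMPLATE.substitute(fields)
```

Validating required answers before rendering lets the chatbot loop back and ask for anything missing instead of producing an incomplete draft.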

7

Cyber Security Data Visualization

This project involves building a cybersecurity data visualization platform that integrates data from multiple sources, using Apache Kafka for real-time streaming and a graph database (such as Neo4j) to map and visualize relationships between security-related entities. The platform aggregates data from network traffic logs, system logs, vulnerability databases, and external threat intelligence feeds, with Kafka carrying these streams in from departments such as IT, security, and network monitoring. As data is streamed, it is processed and stored in the graph database, which helps security analysts visualize complex relationships among entities such as IP addresses, users, devices, malware, and attack vectors. This approach enables in-depth analysis and faster identification of potential threats, such as malware spreading across devices or unauthorized access patterns.
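The graph idea can be sketched in memory before wiring up Kafka and Neo4j. In this sketch, which uses neither library, a list of event dicts stands in for messages consumed from Kafka topics and a dict of sets stands in for the graph database; the event schema and node naming are hypothetical. A breadth-first search from infected devices then finds every device reachable over observed connections, i.e. where malware could have spread.

```python
from collections import defaultdict, deque

# Hypothetical events; in the described platform these would arrive on Kafka topics.
EVENTS = [
    {"type": "login", "user": "alice", "device": "ws-01"},
    {"type": "connection", "src": "ws-01", "dst": "srv-db"},
    {"type": "connection", "src": "srv-db", "dst": "srv-backup"},
    {"type": "connection", "src": "ws-02", "dst": "printer"},
    {"type": "malware", "device": "ws-01", "family": "example-trojan"},
]

def build_graph(events):
    """Return directed edges (node -> set of nodes) and the set of infected device nodes."""
    edges = defaultdict(set)
    infected = set()
    for e in events:
        if e["type"] == "login":
            edges[f"user:{e['user']}"].add(f"device:{e['device']}")
        elif e["type"] == "connection":
            edges[f"device:{e['src']}"].add(f"device:{e['dst']}")
        elif e["type"] == "malware":
            infected.add(f"device:{e['device']}")
    return edges, infected

def at_risk(edges, infected):
    """Nodes reachable over edges from any infected device (possible spread)."""
    seen = set(infected)
    queue = deque(infected)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - infected
```

With the sample events, the infected workstation `ws-01` puts `srv-db` and `srv-backup` at risk, while `printer`, reached only from `ws-02`, stays outside the affected set. In a graph database the same question becomes a variable-length path query.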

Using tools such as Grafana or Kibana, the platform offers real-time visualization of cybersecurity events, including threat maps, attack timelines, and network traffic anomalies. Integrating threat intelligence allows the system to cross-reference incoming data with known attack patterns, improving the detection of advanced threats. Security incidents can be tracked in real time, and automated alerts are generated when suspicious activity is detected, helping security teams respond quickly. By combining streaming data, graph relationships, and interactive dashboards, this project provides a comprehensive solution for cybersecurity threat detection, analysis, and incident response. It gives a unified view of security events across a hybrid infrastructure, improving the organization’s ability to detect, analyze, and mitigate cyber threats.
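Two common alerting rules, a threat-intelligence match and a sliding-window count of failed logins, can be sketched over a time-ordered event stream. This is a minimal sketch under stated assumptions: the event fields (`ts`, `src_ip`, `action`), the `KNOWN_BAD_IPS` set (addresses from documentation-only ranges standing in for a real feed), the 60-second window and the five-failure limit are all hypothetical.

```python
from collections import defaultdict, deque

# Stand-in threat-intelligence feed (documentation-range addresses).
KNOWN_BAD_IPS = {"203.0.113.7", "198.51.100.23"}

def alerts(events, window_s=60, max_failures=5):
    """Yield alert strings for threat-intel matches and bursts of failed logins.

    Assumes events arrive ordered by their `ts` timestamp (seconds).
    """
    failures = defaultdict(deque)  # src_ip -> timestamps of recent failed logins
    for ev in events:
        ip, ts = ev["src_ip"], ev["ts"]
        if ip in KNOWN_BAD_IPS:
            yield f"{ts}: traffic from known-bad IP {ip}"
        if ev.get("action") == "login_failed":
            q = failures[ip]
            q.append(ts)
            # Drop failures that have fallen out of the sliding window.
            while q and ts - q[0] > window_s:
                q.popleft()
            # Fire once when the count reaches the limit, not on every later failure.
            if len(q) == max_failures:
                yield f"{ts}: {max_failures} failed logins from {ip} within {window_s}s"

stream = [{"ts": t, "src_ip": "10.0.0.5", "action": "login_failed"} for t in (0, 10, 20, 30, 40)]
stream.append({"ts": 50, "src_ip": "203.0.113.7", "action": "allow"})
for line in alerts(stream):
    print(line)
```

Because `alerts` is a generator, it can sit directly in a stream-processing loop and forward each alert to the dashboard or notification channel as soon as it is produced.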